Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Bug #3 – Obscure Behavior

> We claim this is a bug – intentional or not.

I like seeing this described as a bug. Bug #3 is documented behavior (there's a comment right there in the source calling it out), and it does what it's intended to do, and the relevant RFC says that it should be doing that. It's only a "bug" in the sense that the behavior is nevertheless metaphysically incorrect.

I once wrote java code doing bit shifts of arbitrary length as part of a programming contest at my school. It failed on a single test case for mysterious reasons. I eventually discovered that my algorithm was totally correct -- but I had been assuming that if you shift a 32-bit integer by 32 or more bits, you'll get zero. In fact, and this is required by the java specification, you get unpredictable garbage data. More specifically, the java standard mandates that "a << b" be silently compiled to "a << (b % 32)". So I had to rewrite code like this:

    bit_pattern = bit_pattern << potentially_large_value;
into something more like this:

    // if you shift by x, Java will silently rewrite it as a shift by (x % 32)
    while( potentially_large_value > 31 ) {
      bit_pattern = bit_pattern << 31;
      potentially_large_value -= 31;
    }
    bit_pattern = bit_pattern << potentially_large_value;
I can imagine no circumstance where this would be useful or helpful in any way. Even later, I found out that the JVM bytecode for bit shifting uses a 5-bit operand, and I surmise that the standard is written the way it is to make it slightly (slightly!) simpler to compile a Java shift statement into a JVM bytecode instruction. I can't call this a bug in javac -- it's doing exactly what it's required to do! But the only way it makes sense to me to describe this is as a bug in the Java specification. If I say to do something 45 times, that should be the same thing as doing it 15 times, and then another 15 times, and then 15 more times. It shouldn't instead be the same as doing it 13 times.


This looks wonky. Why are you writing a loop here?

    if (potentially_large_value >= 32) bit_pattern = 0;
    else bit_pattern = bit_pattern << potentially_large_value;
That does the same thing in constant time, and doesn't loop 2^27 times on bad input.


Simple, it preserves the algorithm. Just setting to 0 is conceptually a different thing.

I might also note that in the specific instance, potentially_large_value was the length of a string in memory.


The x86 shift instructions do the same, so I'd guess that's why the JVM does it that way.


Sure, the JVM can use a 5-bit value for doing shifts on 32-bit integers, there's no problem there. But why would Java compile "x << 37" to "x << 5"? If you're working raw in the JVM, there's no such concept as x << 37, because 37 requires more than 5 bits to specify. Java explicitly allows you to say x << 37, but secretly defines it to do something insane.


x % 32 === x & 31

So if they use a 5-bit number, they can just stick your second operand into that slot after it's been ANDed with 31 (which might even be implicit in how the x86 shift instructions work). To do otherwise would require them to branch (just like you did in your solution).


So what?


...except on the 8086/8088, where it will continue shifting and filling the register with 0s or sign bits (it's done in a microcode loop) up to 255, the maximum encodable shift amount.

Intel changed the behaviour to mask off the shift count to the low 5 bits, starting with the 80186/80188.


I'm surprised this (is/was) the behaviour of Java. Java doesn't have much (any?) undefined behaviour with simple expressions. I think you only start to get to undefined behaviour when you write programs that are racy according to Java or if you find bugs. And I think this was a very deliberate design decision.

Anyway here is the current JLS which claims the behaviour is defined: https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.htm...

Oh. Never mind I just read the rest of your comment and you linked to the behaviour that is defined :)


I can imagine no circumstance where this would be useful or helpful in any way.

The one application that stands out is encryption algorithms which do data-dependent rotates.


Well, that's not a bit-shift, it's a rotate.


What about circular rotates? a<<(x%32) & a>>(32-(x%32))


having worked an example to see how this could possibly work, I feel compelled to note for other people who might have wondered that a circular rotate looks like "(a << (x % 32)) | (a >> (32 - (x % 32)))"; as written, you're replacing every bit with zero.


a = 1111 0000 0000 1111 1111 0000 0000 1111

a << 37 == a << (37 % 32) == a << 5 == 0000 0001 1111 1110 0000 0001 1110 0000

a >> (32 - (37 % 32)) == a >> 27 == 0000 0000 0000 0000 0000 0000 0001 1110

anding the two together you get == 0000 0001 1111 1110 0000 0001 1111 1110

which is a circular rotation of the bit string...


0 & 1 is not 1, it's 0.

      0000 0001 1111 1110 0000 0001 1110 0000
    & 0000 0000 0000 0000 0000 0000 0001 1110
    -----------------------------------------
      0000 0000 0000 0000 0000 0000 0000 0000


You're right! My bad. Definitely meant to | it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: