Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save edefazio/85cd0c1264a0938b88a8 to your computer and use it in GitHub Desktop.
Save edefazio/85cd0c1264a0938b88a8 to your computer and use it in GitHub Desktop.
Unsafe.allocateMemory(...) performs 2x BETTER when we pass in an int verses a long (of the same value)
package strangeunsafe;
import java.lang.reflect.Field;
import java.util.Random;
import sun.misc.Unsafe
/**
* Illustrate "strange" issue where Unsafe.allocateMemory(...) performs BETTER
* when we pass in an int parameter verses a long parameter (of the same value).
* (more detail in description)
* <PRE>
* Took 665ms to scan LANE 1 100000 8bit bytes 10000x and find 197690000
* Took 1138ms to scan LANE 2 100000 8bit bytes 10000x and find 197690000
*
* Took 6597ms to scan LANE 1 100000000 8bit bytes 100x and find 2000811000
* Took 11335ms to scan LANE 2 100000000 8bit bytes 100x and find 2000811000
*
* Took 6489ms to scan LANE 1 100000000 8bit bytes 100x and find 1999797500
* Took 11319ms to scan LANE 2 100000000 8bit bytes 100x and find 1999797500
* </PRE>
*/
public class UnsafeAllocateMemoryConstructorEffectsScanPerformance
{
public static final int LANE1_DATASET_SIZE = 100000000;
public static final long LANE2_DATASET_SIZE = LANE1_DATASET_SIZE;
public static final long ITERATIONS = 100;
public static void main ( String[] args )
throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException
{
Unsafe theUnsafe = getUnsafe();
//NOTE: the addresses are BOTH longs
long startLane1 = theUnsafe.allocateMemory( LANE1_DATASET_SIZE );
long startLane2 = theUnsafe.allocateMemory( LANE2_DATASET_SIZE );
// I "thought" perhaps there was some optimization going on
// with the addressing, but BOTH of the addresses SEEM to be
// using only 31/32-bits of the 64-bit address...
// (so both physical memory addresses are using only 32-bits of the 64-bits)
System.out.println ("LANE1 MEMORY ADDRESS BINARY "+ Long.toString ( startLane1 , 2 ) );
System.out.println ("LANE2 MEMORY ADDRESS BINARY "+ Long.toString ( startLane2 , 2 ) );
Random r = new Random();
long lane1Address = startLane1;
long lane2Address = startLane2;
for ( int i = 0; i < LANE1_DATASET_SIZE; i++ )
{
//populate data [1...5] in each byte (the same data in each "lane")
byte oneTo5 = (byte) ( r.nextInt( 5 ) + 1 );
theUnsafe.putByte( lane1Address, oneTo5 );
theUnsafe.putByte( lane2Address, oneTo5 );
lane1Address++;
lane2Address++;
}
long start = System.currentTimeMillis();
//lets scan through lane1, counting the number of matches
int count = 0;
for ( int it = 0; it < ITERATIONS; it++ )
{
lane1Address = startLane1;
for ( int i = 0; i < LANE1_DATASET_SIZE; i++ )
{
if ( theUnsafe.getByte( lane1Address ) == 5 )
{
count++;
}
lane1Address++;
}
}
long end = System.currentTimeMillis();
System.out.println ("Took "+(end-start)+"ms to scan LANE 1 "+ LANE1_DATASET_SIZE+" 8bit bytes "+ITERATIONS+"x and find "+count);
theUnsafe.freeMemory( startLane1 );
//lets do the same with lane 2
start = System.currentTimeMillis(); //RESET START
count = 0;
for ( int it = 0; it < ITERATIONS; it++ )
{
lane2Address = startLane2;
for ( int i = 0; i < LANE2_DATASET_SIZE; i++ )
{
if ( theUnsafe.getByte( lane2Address ) == 5 )
{
count++;
}
lane2Address++;
}
}
end = System.currentTimeMillis();
System.out.println ("Took "+(end-start)+"ms to scan LANE 2 "+ LANE1_DATASET_SIZE+" 8bit bytes "+ITERATIONS+"x and find "+count);
theUnsafe.freeMemory( startLane2 );
}
private static Unsafe getUnsafe()
throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException
{
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);
return unsafe;
}
}
@edefazio
Copy link
Author

Just to immortalize my stupidity for posterity: I'll keep this comment here but the problem (as Alexsey and Nitsan point out) has nothing to do with Unsafe.get()... or with Unsafe.allocateMemory(); but rather dealing in the fact that loop unrolling (in the second for loop:
for ( int i = 0; i < LANE2_DATASET_SIZE; i++ )
...doesn't take place by the VM 'cause LANE2_DATASET_SIZE is a long.
to "fix" this issue, change it to:
for ( int i = 0; i < LANE1_DATASET_SIZE; i++ )

Here's the original comment:

when Unsafe.allocateMemory(...) passes in:
a 64-bit long;
-rather than-
a 32-bit int (each having the same numeric value)
...the scan performance of the memory lane allocated with an int is 2x as fast as the one allocated with a long.

You can comment one or the other out and achieve the same results, I am putting
them both "together" for conciseness/illustration.

--------------- S Y S T E M ---------------
OS: Windows 7 , 64 bit Build 7601 (6.1.7601.19018)
CPU:total 4 (4 cores per cpu, 1 threads per core) family 6 model 58 stepping 9, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, aes, clmul, erms, tsc, tscinvbit, tscinv
Memory: 4k page, physical 8073868k(3073852k free), swap 16145900k(10420676k free)
vm_info: Java HotSpot(TM) 64-Bit Server VM (25.60-b23) for windows-amd64 JRE (1.8.0_60-b27), built on Aug 4 2015 11:06:27 by "java_re" with MS VC++ 10.0 (VS2010)

Same result with alternate Sun VM:
vm_info: Java HotSpot(TM) 64-Bit Server VM (25.11-b03) for windows-amd64 JRE (1.8.0_11-b12), built on Jun 16 2014 20:57:32 by "java_re" with MS VC++ 10.0 (VS2010)

Same with alternate OpenJDK (Zulu) VM:
Took 6490ms to scan LANE 1 100000000 8bit bytes 100x and find 2000606200
Took 10905ms to scan LANE 2 100000000 8bit bytes 100x and find 2000606200
JRE version: OpenJDK Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
Java VM: OpenJDK 64-Bit Server VM (25.60-b23 mixed mode windows-amd64 compressed oops)
vm_info: OpenJDK 64-Bit Server VM (25.60-b23) for windows-amd64 JRE (1.8.0_60-b27), built on Aug 20 2015 03:44:36 by "tester" with MS VC++ 10.0 (VS2010)

@shipilev
Copy link

Would you please use a proper benchmarking harness? The effects you are seeing might as well be caused by bad benchmarking. But if the issue is real, then you'd need to provide profiles. I think JMH will provide enough insights with -prof perfnorm and -prof perfasm.

@nitsanw
Copy link

nitsanw commented Oct 30, 2015

Blind guess, the second loop somehow got a safepoint by having a limit that is a long.

@nitsanw
Copy link

nitsanw commented Oct 30, 2015

Validated via a JMH benchmark. The problem is not your allocation it is the second loop through the second allocation being limited by a long: 'for ( int i = 0; i < LANE2_DATASET_SIZE; i++ )'
If you change it to be 'for ( int i = 0; i < LANE1_DATASET_SIZE; i++ )' the results should be the same (and you should use JMH!!!!)

@bourgesl
Copy link

I have rewritten your test to be more correct = allocate a single lane, reproduce the test several times and remove the if (byte == 5) which is very costly and leads to branch mispredictions !

With my Test class, I do not see any difference on JDK 1.7 or 1.8 LINUX 64:

package test;

import java.lang.reflect.Field;
import java.util.Random;

import sun.misc.Unsafe;

/**
Java version: 1.8.0_60
Testing useInt= true
LANE1 MEMORY ADDRESS BINARY 11111111110100111110101000001010000000000010000
Took 2594ms to scan LANE 50000000 8bit bytes  100x and find 2115247912
LANE1 MEMORY ADDRESS BINARY 11111111110100111110101000001010000000000010000
Took 2565ms to scan LANE 50000000 8bit bytes  100x and find 2113732012
LANE1 MEMORY ADDRESS BINARY 11111111110100111110101000001010000000000010000
Took 2566ms to scan LANE 50000000 8bit bytes  100x and find 2114594612
LANE1 MEMORY ADDRESS BINARY 11111111110100111110101000001010000000000010000
Took 2565ms to scan LANE 50000000 8bit bytes  100x and find 2115537912
Testing useInt= false
LANE1 MEMORY ADDRESS BINARY 11111111110100111110101000001010000000000010000
Took 2565ms to scan LANE 50000000 8bit bytes  100x and find 2116939912
LANE1 MEMORY ADDRESS BINARY 11111111110100111110101000001010000000000010000
Took 2565ms to scan LANE 50000000 8bit bytes  100x and find 2114448212
LANE1 MEMORY ADDRESS BINARY 11111111110100111110101000001010000000000010000
Took 2566ms to scan LANE 50000000 8bit bytes  100x and find 2115045012
LANE1 MEMORY ADDRESS BINARY 11111111110100111110101000001010000000000010000
Took 2565ms to scan LANE 50000000 8bit bytes  100x and find 2115605712
 */
public class Test 
{
    public static final int LANE1_DATASET_SIZE  = 50000000;    

    public static final long LANE2_DATASET_SIZE = LANE1_DATASET_SIZE; 

    public static final long ITERATIONS = 100;

    public static void main ( String[] args ) 
        throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException
    {       
        System.out.println("Java version: "+System.getProperty("java.version"));

        for ( int n = 0; n < 4; n++ ) 
        {           
            test(true);
            test(false);
        }
    }

    public static void test(boolean useInt) 
        throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException
    {
        System.out.println("Testing useInt= "+useInt);
        Unsafe theUnsafe = getUnsafe();

        Random r = new Random();

        long start, end;

        for ( int n = 0; n < 4; n++ ) 
        {           
            // Init:

            //NOTE: the addresses are BOTH longs
            long startLane = 
                (useInt) ? theUnsafe.allocateMemory( LANE1_DATASET_SIZE ) :
                           theUnsafe.allocateMemory( LANE2_DATASET_SIZE );

            // I "thought" perhaps there was some optimization going on
            // with the addressing, but BOTH of the addresses SEEM to be
            // using only 31/32-bits of the 64-bit address...
            // (so both physical memory addresses are using only 32-bits of the 64-bits)
            System.out.println ("LANE1 MEMORY ADDRESS BINARY "+ Long.toString ( startLane , 2 ) );

            long addr = startLane;

            for ( int i = 0, len = LANE1_DATASET_SIZE; i < len; i++ )
            {
                //populate data [1...5] in each byte (the same data in each "lane")
                byte oneTo5 = (byte) ( r.nextInt( 5 ) + 1 );
                theUnsafe.putByte( addr, oneTo5 );
                addr++;
            }


            int count = 0;

            //lets scan through lane, counting the number of matches
            start = System.currentTimeMillis();
            for ( int it = 0; it < ITERATIONS; it++ ) 
            {           
                addr = startLane;

                for ( int i = 0, len = LANE1_DATASET_SIZE; i < len; i++ )
                {
                    count += theUnsafe.getByte( addr );
                    addr++;
                }           
            }
            end = System.currentTimeMillis();
            System.out.println ("Took "+(end-start)+"ms to scan LANE "+ LANE1_DATASET_SIZE+" 8bit bytes  "+ITERATIONS+"x and find "+count);

            // Free memory:
            theUnsafe.freeMemory( startLane );
        }
    }

    private static Unsafe getUnsafe() 
            throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException 
    {
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) theUnsafe.get(null);
        return unsafe;
    }
}

@bourgesl
Copy link

You're right: the mistake was on the loop condition I have written differently:

 for ( int i = 0, len = LANE1_DATASET_SIZE; i < len; i++ )

@edefazio
Copy link
Author

edefazio commented Nov 1, 2015

Nitsan's (Video) "The Illusion of Execution" describing JVM execution.
The whole video is great, relevant to this faux paus (non-counted Loops and safe points) starts at about ~12:15:
https://vimeo.com/138863976

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment