
@twillouer
Created November 15, 2014 20:41
Benchmarking of toArray
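import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

// Note: these methods discard their results; JMH recommends returning the value or
// passing it to a Blackhole so the JIT cannot treat the call as dead code.
@State(Scope.Benchmark)
public class ToArrayBench {

    ArrayList<Byte> list;

    @Setup
    public void setup() throws Throwable
    {
        list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add((byte) i);
        }
    }

    @Benchmark
    public void zero_sized_array()
    {
        list.toArray(new Byte[0]);
    }

    @Benchmark
    public void simple_toArray()
    {
        list.toArray();
    }

    @Benchmark
    public void sized_array_from_list()
    {
        list.toArray(new Byte[list.size()]);
    }

    @Benchmark
    public void sized_array_fixed_size()
    {
        list.toArray(new Byte[100]);
    }

    @Benchmark
    public void defensive_copy()
    {
        new ArrayList<>(list);
    }

    public static void main(String[] args) throws RunnerException, IOException
    {
        Options opt = new OptionsBuilder().include(".*" + ToArrayBench.class.getSimpleName() + ".*")
                .warmupIterations(20)
                .warmupTime(TimeValue.seconds(1))
                .measurementIterations(20)
                .timeUnit(TimeUnit.MILLISECONDS)
                .forks(1)
//                .addProfiler(LinuxPerfProfiler.class)
                .build();

        new Runner(opt).run();
    }
}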
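Before timing anything, it can help to confirm what each benchmarked expression actually produces. The plain-Java check below needs no JMH; the class name ToArrayVariantsCheck and the helper buildList are hypothetical. It rebuilds the 10-element list from setup() and evaluates each expression once, which shows the variants are not interchangeable: toArray() returns an Object[] rather than a Byte[], and toArray(new Byte[100]) hands back the caller's 100-slot array for 10 elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToArrayVariantsCheck {

    // Hypothetical helper: builds the same kind of list setup() creates, n boxed bytes 0..n-1.
    static ArrayList<Byte> buildList(int n) {
        ArrayList<Byte> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add((byte) i);
        }
        return list;
    }

    public static void main(String[] args) {
        ArrayList<Byte> list = buildList(10);

        Byte[] zero = list.toArray(new Byte[0]);               // zero_sized_array
        Object[] plain = list.toArray();                        // simple_toArray
        Byte[] sized = list.toArray(new Byte[list.size()]);     // sized_array_from_list
        Byte[] fixed = list.toArray(new Byte[100]);             // sized_array_fixed_size
        List<Byte> copy = new ArrayList<>(list);                // defensive_copy

        // toArray() yields an Object[]; the typed variants yield a Byte[].
        System.out.println(plain.getClass().getSimpleName());   // Object[]
        System.out.println(zero.getClass().getSimpleName());    // Byte[]
        // Same contents whether the list allocates the array or the caller does.
        System.out.println(Arrays.equals(zero, sized));         // true
        // A caller array that is larger than the list is filled and returned as-is.
        System.out.println(fixed.length);                       // 100
        System.out.println(copy.equals(list));                  // true
    }
}
```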
@jerrinot

This is giving me similarly confusing results:

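import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.profile.LinuxPerfProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

@State(Scope.Thread)
public class ToArrayBench {

    @Param("10")
    private int size;
    private Byte[] buffer;

    @Setup
    public void setup() throws Throwable {
        buffer = new Byte[size];
        // int counter: a byte counter would wrap to negative indices once size > 127
        for (int i = 0; i < size; i++) {
            buffer[i] = (byte) i;
        }
    }

    // Library call: Arrays.copyOf with an explicit array type
    @Benchmark
    public void fast(Blackhole bh) {
        int s = buffer.length;
        Byte[] copy = Arrays.copyOf(buffer, s, Byte[].class);
        bh.consume(copy);
    }

    // The same steps spelled out: reflective allocation, then System.arraycopy
    @Benchmark
    public void slow(Blackhole bh) {
        int s = buffer.length;
        Byte[] copy = (Byte[]) Array.newInstance(Byte[].class.getComponentType(), s);
        System.arraycopy(buffer, 0, copy, 0, s);
        bh.consume(copy);
    }

    public static void main(String[] args) throws Throwable {
        Options opt = new OptionsBuilder().include(".*" + ToArrayBench.class.getSimpleName() + ".*")
                .warmupIterations(10)
                .warmupTime(TimeValue.seconds(1))
                .measurementIterations(20)
                .timeUnit(TimeUnit.MILLISECONDS)
                .threads(1)
                .forks(1)
                .addProfiler(LinuxPerfProfiler.class)
                .build();

        new Runner(opt).run();
    }
}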
Benchmark                         (size)   Mode  Samples       Score  Score error   Units
o.o.j.s.ToArrayBench.fast             10  thrpt       20  134199.249    15573.195  ops/ms
o.o.j.s.ToArrayBench.fast:@cpi        10  thrpt        1       0.367          NaN     CPI
o.o.j.s.ToArrayBench.slow             10  thrpt       20   49499.140     2661.012  ops/ms
o.o.j.s.ToArrayBench.slow:@cpi        10  thrpt        1       0.764          NaN     CPI
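JMH reports these scores as throughput in ops/ms. Converting them to average time per operation makes the gap easier to read. The arithmetic sketch below (class and method names are hypothetical) uses only the score column of the table above and ignores the error bars.

```java
import java.util.Locale;

public class ThroughputMath {

    // JMH throughput score in ops/ms -> average time per operation in ns (1 ms = 1,000,000 ns).
    static double nsPerOp(double opsPerMs) {
        return 1_000_000.0 / opsPerMs;
    }

    public static void main(String[] args) {
        double fast = 134199.249; // ops/ms, "fast" row
        double slow = 49499.140;  // ops/ms, "slow" row
        System.out.println(String.format(Locale.ROOT, "fast: %.2f ns/op", nsPerOp(fast))); // fast: 7.45 ns/op
        System.out.println(String.format(Locale.ROOT, "slow: %.2f ns/op", nsPerOp(slow))); // slow: 20.20 ns/op
        System.out.println(String.format(Locale.ROOT, "ratio: %.2fx", fast / slow));        // ratio: 2.71x
    }
}
```

So for a 10-element Byte[], the slow() shape costs roughly 13 ns more per copy on this machine.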

perf stats for fast():

      23044.197935 task-clock (msec)         #    0.632 CPUs utilized          
            14,410 context-switches          #    0.625 K/sec                  
             3,227 cpu-migrations            #    0.140 K/sec                  
               423 page-faults               #    0.018 K/sec                  
    78,964,852,601 cycles                    #    3.427 GHz                     [30.93%]
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   214,945,315,883 instructions              #    2.72  insns per cycle         [38.77%]
    40,570,072,182 branches                  # 1760.533 M/sec                   [39.00%]
         8,633,173 branch-misses             #    0.02% of all branches         [38.99%]
    55,630,805,896 L1-dcache-loads           # 2414.092 M/sec                   [39.18%]
     2,727,361,923 L1-dcache-load-misses     #    4.90% of all L1-dcache hits   [38.89%]
     1,403,751,595 LLC-loads                 #   60.916 M/sec                   [30.76%]
   <not supported> LLC-load-misses:HG      
   <not supported> L1-icache-loads:HG      
        10,486,830 L1-icache-load-misses:HG  #    0.00% of all L1-icache hits   [31.88%]
    54,778,270,818 dTLB-loads:HG             # 2377.096 M/sec                   [31.78%]
           584,070 dTLB-load-misses:HG       #    0.00% of all dTLB cache hits  [31.70%]
        23,538,830 iTLB-loads:HG             #    1.021 M/sec                   [31.62%]
           312,318 iTLB-load-misses:HG       #    1.33% of all iTLB cache hits  [31.64%]
   <not supported> L1-dcache-prefetches:HG 
                 0 L1-dcache-prefetch-misses:HG #    0.000 K/sec                   [31.57%]

      36.459311589 seconds time elapsed

perf stats for slow():

      23121.720993 task-clock (msec)         #    0.635 CPUs utilized          
            14,615 context-switches          #    0.632 K/sec                  
             3,168 cpu-migrations            #    0.137 K/sec                  
               463 page-faults               #    0.020 K/sec                  
    79,644,122,769 cycles                    #    3.445 GHz                     [30.98%]
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   104,254,073,200 instructions              #    1.31  insns per cycle         [38.76%]
    17,489,978,097 branches                  #  756.431 M/sec                   [38.87%]
        42,822,998 branch-misses             #    0.24% of all branches         [38.74%]
    21,942,957,608 L1-dcache-loads           #  949.019 M/sec                   [38.73%]
     1,022,663,441 L1-dcache-load-misses     #    4.66% of all L1-dcache hits   [38.71%]
        88,585,236 LLC-loads                 #    3.831 M/sec                   [30.86%]
   <not supported> LLC-load-misses:HG      
   <not supported> L1-icache-loads:HG      
        11,208,770 L1-icache-load-misses:HG  #    0.00% of all L1-icache hits   [31.33%]
    21,891,428,217 dTLB-loads:HG             #  946.791 M/sec                   [31.21%]
           807,245 dTLB-load-misses:HG       #    0.00% of all dTLB cache hits  [31.04%]
        24,236,835 iTLB-loads:HG             #    1.048 M/sec                   [30.99%]
           387,664 iTLB-load-misses:HG       #    1.60% of all iTLB cache hits  [31.06%]
   <not supported> L1-dcache-prefetches:HG 
                 0 L1-dcache-prefetch-misses:HG #    0.000 K/sec                   [31.13%]

      36.428755980 seconds time elapsed
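The per-cycle figures perf prints can be recomputed from the raw counters. This sketch (hypothetical class PerfCounterMath; values copied from the two perf runs above) derives instructions per cycle, its inverse CPI, and the branch-miss rate. The CPI values come out at about 0.367 and 0.764, matching the @cpi rows of the JMH table. Both runs spend roughly the same ~79 billion cycles, so fast() retires about twice as many instructions in the same time. The counters appear to cover the whole forked run (about 36 s each, warmup and JVM activity included), so treat them as indicative rather than per-operation.

```java
import java.util.Locale;

public class PerfCounterMath {

    static double ipc(long instructions, long cycles) {
        return (double) instructions / cycles;
    }

    static double cpi(long instructions, long cycles) {
        return (double) cycles / instructions;
    }

    static double percent(long part, long whole) {
        return 100.0 * part / whole;
    }

    static void report(String name, long cycles, long instructions, long branches, long branchMisses) {
        System.out.println(String.format(Locale.ROOT, "%s: IPC=%.2f CPI=%.3f branch-miss=%.2f%%",
                name, ipc(instructions, cycles), cpi(instructions, cycles),
                percent(branchMisses, branches)));
    }

    public static void main(String[] args) {
        // cycles, instructions, branches, branch-misses from the perf output above
        report("fast", 78_964_852_601L, 214_945_315_883L, 40_570_072_182L, 8_633_173L);
        // fast: IPC=2.72 CPI=0.367 branch-miss=0.02%
        report("slow", 79_644_122_769L, 104_254_073_200L, 17_489_978_097L, 42_822_998L);
        // slow: IPC=1.31 CPI=0.764 branch-miss=0.24%
    }
}
```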

@twillouer (author)

@jerrinot Thanks for your time.
I still have trouble understanding the problem :)

@klinham

klinham commented Jan 10, 2015

No idea either. Can someone please elaborate on that?
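One point that may help frame the question: the two methods in @jerrinot's benchmark compute the same thing. In the JDK sources, Arrays.copyOf(U[], int, Class) is written in plain Java as the slow() sequence for any non-Object[] target type: Array.newInstance on the component type, then System.arraycopy. The sketch below copies that logic into a hypothetical helper, copyOfByHand, and checks that both routes produce equal Byte[] arrays. If so, the speed gap has to come from how the JIT compiles each shape rather than from different work in the Java source. HotSpot is known to treat Arrays.copyOf as an intrinsic, which is one plausible lead, not a confirmed explanation.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

public class CopyOfEquivalence {

    // Hypothetical helper mirroring the pure-Java body of Arrays.copyOf(U[], int, Class):
    // allocate an array of the requested type, then System.arraycopy into it.
    @SuppressWarnings("unchecked")
    static <T, U> T[] copyOfByHand(U[] original, int newLength, Class<? extends T[]> newType) {
        T[] copy = ((Object) newType == (Object) Object[].class)
                ? (T[]) new Object[newLength]
                : (T[]) Array.newInstance(newType.getComponentType(), newLength);
        System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
        return copy;
    }

    public static void main(String[] args) {
        Byte[] buffer = new Byte[10];
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = (byte) i;
        }
        Byte[] viaLibrary = Arrays.copyOf(buffer, buffer.length, Byte[].class); // fast() shape
        Byte[] viaHand = copyOfByHand(buffer, buffer.length, Byte[].class);     // slow() shape
        System.out.println(Arrays.equals(viaLibrary, viaHand));                  // true
        System.out.println(viaHand.getClass() == Byte[].class);                 // true
    }
}
```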

@twillouer (author)

Updated:

import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

@State(Scope.Benchmark)
public class ToArrayBench {

    private static final int SIZE = 100;

    ArrayList<Byte> list;

    @Setup
    public void setup() throws Throwable
    {
        list = new ArrayList<>();
        for (int i = 0; i < SIZE; i++) {
            list.add((byte) i);
        }
    }

    @Benchmark
    public void zero_sized_array(Blackhole bh)
    {
        bh.consume(list.toArray(new Byte[0]));
    }

    @Benchmark
    public void simple_toArray(Blackhole bh)
    {
        bh.consume(list.toArray());
    }

    @Benchmark
    public void sized_array_from_list(Blackhole bh)
    {
        bh.consume(list.toArray(new Byte[list.size()]));
    }

    @Benchmark
    public void sized_array_fixed_size(Blackhole bh)
    {
        bh.consume(list.toArray(new Byte[SIZE]));
    }

    @Benchmark
    public void defensive_copy(Blackhole bh)
    {
        bh.consume(new ArrayList<>(list));
    }

    public static void main(String[] args) throws RunnerException, IOException
    {
        Options opt = new OptionsBuilder().include(".*" + ToArrayBench.class.getSimpleName() + ".*")
                .warmupIterations(20)
                .warmupTime(TimeValue.seconds(1))
                .measurementIterations(20)
                .timeUnit(TimeUnit.MILLISECONDS)
                .forks(1)
//                .addProfiler(LinuxPerfAsmProfiler.class)
                .build();

        new Runner(opt).run();
    }
}

@twillouer (author)

Benchmark                             Mode  Cnt     Score        Error  Units
ToArrayBench.defensive_copy          thrpt  200  16714192 ± 129515.217  ops/s
ToArrayBench.simple_toArray          thrpt  200  17918950 ± 102801.298  ops/s
ToArrayBench.sized_array_fixed_size  thrpt  200   5799136 ±  65921.564  ops/s
ToArrayBench.sized_array_from_list   thrpt  200   5643162 ±  85215.009  ops/s
ToArrayBench.zero_sized_array        thrpt  200   6529068 ±  78960.062  ops/s
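The scores fall into two groups: the untyped copies (simple_toArray, defensive_copy) at about 17 to 18 million ops/s, and the typed toArray(T[]) variants at about 5.6 to 6.5 million ops/s. As background, here is a sketch of the branches inside ArrayList.toArray, modeled on the JDK 8 sources (other JDK versions may differ); ToArrayPathsSketch and its fields are hypothetical stand-ins for ArrayList internals. With SIZE = 100, new Byte[SIZE] and new Byte[list.size()] take the same "big enough" branch, which fits their close scores, while new Byte[0] takes the Arrays.copyOf branch. In JDK 8, new ArrayList<>(c) also starts from c.toArray(), which would put defensive_copy on the untyped path; the sketch does not model that. It describes code paths only and does not by itself explain the timings.

```java
import java.util.Arrays;

public class ToArrayPathsSketch {

    // Hypothetical stand-ins for ArrayList's private backing array and element count.
    private final Object[] elementData;
    private final int size;

    ToArrayPathsSketch(Object[] elementData, int size) {
        this.elementData = elementData;
        this.size = size;
    }

    // Shape of ArrayList.toArray(): an Object[] copy of the live elements.
    Object[] toArray() {
        return Arrays.copyOf(elementData, size);
    }

    // Shape of ArrayList.toArray(T[]).
    @SuppressWarnings("unchecked")
    <T> T[] toArray(T[] a) {
        if (a.length < size) {
            // Too small (e.g. new Byte[0]): allocate a new array of a's runtime type.
            return (T[]) Arrays.copyOf(elementData, size, a.getClass());
        }
        // Big enough (e.g. new Byte[list.size()]): copy into the caller's array.
        System.arraycopy(elementData, 0, a, 0, size);
        if (a.length > size) {
            a[size] = null; // documented List.toArray(T[]) behavior for oversized arrays
        }
        return a;
    }

    public static void main(String[] args) {
        // Hypothetical backing array: capacity 128, first 100 slots used (SIZE = 100 above).
        Object[] data = new Object[128];
        for (int i = 0; i < 100; i++) {
            data[i] = (byte) i; // boxed to Byte
        }
        ToArrayPathsSketch list = new ToArrayPathsSketch(data, 100);
        System.out.println(list.toArray().getClass().getSimpleName());            // Object[]
        System.out.println(list.toArray(new Byte[0]).getClass().getSimpleName()); // Byte[]
        System.out.println(list.toArray(new Byte[100]).length);                    // 100
    }
}
```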

@shipilev

I routinely chew out people who can't use the perfasm profiler, but it is not your fault that it wasn't helping here ;) Only JMH 1.5+ (released yesterday) can decode the VM stubs in perfasm output, and the VM stubs are the crucial piece of information needed to untangle this. See: http://cr.openjdk.java.net/~shade/scratch/ToArrayBench.java
