@cykl
Last active August 29, 2015 13:57
Companion code for JDK 8 @contended article

Java 8 was released this month; you even got a news post right here covering lambdas, the Stream API and so on.

However, behind these big changes that affect the heterogeneous world of Java developers, there are smaller changes aimed rather at the developers who build basic building blocks, infrastructure, or code that has to go fast. So I propose we explore a few OpenJDK JDK Enhancement Proposals.

For this first post, we start with JEP 142: Reduce Cache Contention on Specified Fields, that is, the @Contended annotation, which aims to offer a solution to false sharing problems.

What is false sharing?

False sharing is a performance problem in parallel environments caused by a leaky abstraction of the hardware. The following presentation is extremely coarse and only aims to make the problem understandable to people who do not know the field at all.

As developers, we like to picture memory as a continuous address space. The higher-level the language we work in, the truer this is; alignment issues, for instance, are a completely unknown notion to many. Yet in this ideal world, the reality of the hardware periodically resurfaces.

Since a CPU is much faster than main memory, and since the principle of locality was discovered, it comes with memory caches: small pieces of buffer memory running at a speed much closer to the CPU's. The bet is that once the expensive access to main memory has been made, the value will be reused; it can then be fetched from the cache, saving a great deal of time.

False sharing stems from two things:

  • The architecture of the caches. A cache is made up of a number of fixed-size lines (64 bytes, for example). When a modification is made, it affects the whole line.
  • The CPU has to maintain coherence between these caches using a coherence protocol, which ensures that when a CPU/core makes a modification it becomes visible to the others. Each architecture has its own coherence model; x86, for example, is a particularly strong one. This model is exposed in low-level languages, while high-level languages often define their own memory model, and it is up to the compiler to translate the language's model into the platform's.

Put these two things together and you get false sharing: two theoretically independent variables end up on the same cache line. Each is accessed/modified by a distinct CPU, yet the CPUs have to spend their time synchronizing, which makes performance collapse.

In short, our nice uniform memory space has just taken a hit. Packing two variables together or spacing them apart can change the performance of a data structure by one or several orders of magnitude.

Example

Let's start with a very simple benchmark: a class with a single field, and 4 threads. The first thread constantly reads the value of this field while the other three do nothing. The benchmark is written with JMH:

  @State(Scope.Benchmark)
  public static class StateNoFalseSharing {
    public int readOnly;
  }

  @GenerateMicroBenchmark
  @Group("noFalseSharing")
  public int reader(StateNoFalseSharing s) { return s.readOnly; }

  @GenerateMicroBenchmark
  @Group("noFalseSharing")
  public void noOp(StateNoFalseSharing s) { }

which gives us the following result:

Benchmark                                        Mode   Samples         Mean   Mean error    Units
g.c.Benchmarks.noFalseSharing:noOp               avgt        18        0.297        0.002    ns/op
g.c.Benchmarks.noFalseSharing:reader             avgt        18        0.743        0.003    ns/op

As expected, this is very fast, and it will be hard to measure anything smaller.

Now let's evolve our benchmark. We add a second field that will be accessed by the three threads that previously did nothing. The first thread does not change at all, and if caches were not organized in lines there would be no reason for its performance to be affected.

  @State(Scope.Group)
  public static class StateFalseSharing {
    int readOnly;
    volatile int writeOnly;
  }

  @GenerateMicroBenchmark
  @Group("falseSharing")
  public int reader(StateFalseSharing s) {
    return s.readOnly;
  }

  @GenerateMicroBenchmark
  @Group("falseSharing")
  public int writer(StateFalseSharing s) {
    return s.writeOnly++;
  }

Let's look at the results:

Benchmark                                        Mode   Samples         Mean   Mean error    Units
g.c.Benchmarks.falseSharing:reader               avgt        18        5.038        0.617    ns/op
g.c.Benchmarks.falseSharing:writer               avgt        18       78.530        3.598    ns/op

We just took almost a 10x hit.

We can check the memory layout of our StateFalseSharing object with jol to see that the compiler did indeed pack our two variables together:

gist.contended.Benchmarks.StateFalseSharing object internals:
 OFFSET  SIZE  TYPE DESCRIPTION                    VALUE
      0    12       (object header)                N/A
     12     4   int StateFalseSharing.readOnly     N/A
     16     4   int StateFalseSharing.writeOnly    N/A
     20     4       (loss due to the next object alignment)
Instance size: 24 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

Without going into details, statistically there is a strong chance they end up in the same cache line.
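
For the record, the layout dumps in this article were produced with jol, which is essentially what the Layouts class shipped with this gist does:

  System.out.println(VMSupport.vmDetails());
  System.out.println(ClassLayout.parseClass(Benchmarks.StateFalseSharing.class).toPrintable());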

@Contended

The solution to our problem is therefore simply to space these two variables apart, even at the cost of wasting memory. That sounds easy, but before OpenJDK 8 it required seriously knowing, and fighting against, the VM.

Relying on the principle of locality, the VM's logical behavior is to pack the various fields as tightly as it sees fit. An object's layout can change according to many criteria, and the GC does not help since it may decide to move just about anything around (notably the arrays used for padding). In short, finding a strategy that works is an inexhaustible source of amusement. Aleksey Shipilёv has documented a few of them in a JMH benchmark, as has Martin Thompson.
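
To give an idea of what that fight looks like, here is a minimal sketch of one classic hand-rolled padding trick, under the assumption that the VM keeps fields of the same type together; the class and field names are illustrative, and nothing in the specification guarantees this layout, which is precisely the problem:

  public class PaddedCounter {
    long p1, p2, p3, p4, p5, p6, p7;   // padding, hopefully filling the rest of the cache line
    volatile long value;               // the hot field we want isolated
    long q1, q2, q3, q4, q5, q6, q7;   // padding, hopefully pushing the next field/object away

    public void increment() { value++; }
  }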

JEP 142 proposes adding a @Contended annotation to mark the variables, or classes, that must end up alone on their cache line to avoid false sharing.

Let's try to use it:

  @State(Scope.Group)
  public static class StateContended {
    int readOnly;
    @Contended volatile int writeOnly;
  }

  @GenerateMicroBenchmark
  @Group("contented")
  public int reader(StateContended s) {
    return s.readOnly;
  }

  @GenerateMicroBenchmark
  @Group("contented")
  public int writer(StateContended s) {
    return s.writeOnly++;
  }

Let's check with jol and then with JMH:

gist.contended.Benchmarks.StateContended object internals:
 OFFSET  SIZE  TYPE DESCRIPTION                    VALUE
      0    12       (object header)                N/A
     12     4   int StateContended.readOnly        N/A
     16   128       (alignment/padding gap)        N/A
    144     4   int StateContended.writeOnly       N/A
    148     4       (loss due to the next object alignment)
Instance size: 152 bytes (estimated, the sample instance is not available)
Space losses: 128 bytes internal + 4 bytes external = 132 bytes total


Benchmark                                        Mode   Samples         Mean   Mean error    Units
g.c.Benchmarks.contented:reader                  avgt        18        0.742        0.006    ns/op
g.c.Benchmarks.contented:writer                  avgt        18       70.811        3.572    ns/op

We can see that the variable has indeed been pushed away, and we get the initial performance back.

Limitations

@Contended is an OpenJDK JEP, meaning it is not part of the Java or JVM specification. The annotation lives in a private sun.misc package and, by default, is only honored for JDK classes (like many things the JDK jealously keeps to itself). If you want to use it, and thereby tie yourself to OpenJDK, you have to pass the -XX:-RestrictContended option.
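
For instance, the benchmark accompanying this article passes the flag to its forked JVMs through JMH's OptionsBuilder (see the main() method in the full source below):

  Options opt = new OptionsBuilder()
      .include(".*")
      .jvmArgs("-XX:-RestrictContended")   // honor @Contended outside of JDK classes
      .forks(3)
      .build();
  new Runner(opt).run();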

Of course, given the impact on memory consumption and the risk of reducing cache efficiency, you need to know what you are doing and use it sparingly.

How to detect false sharing

Our example was very simple and we knew the problem beforehand. Unfortunately, in real life it is not that obvious, and to my knowledge there is no simple tool for detecting false sharing, whatever the language. You can follow Intel's advice and apply it with their tool or with perf, but it remains fairly empirical.

Keeping the principle of false sharing in mind lets you watch for bad patterns in the pieces of infrastructure that may be affected. In general, things have to be going seriously fast, hence a dedicated data structure, before this starts becoming a problem.

Common cases

Just like in our example, a single object has two fields that are used by two different threads. This happens, for instance, when an object keeps statistics. In that case, we annotate the field with @Contended.

There is also the case of several instances of the same class, each of which would rather sit on its own cache line. In that case, we annotate the class. This also works when the instances are put into an array, a common situation when several threads work in parallel.
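
This is what the contendedInstances and contendedArray benchmarks of this gist do, with a tiny counter class annotated at the class level:

  @Contended
  static class Counter extends AtomicInteger { }

  @State(Scope.Group)
  public static class ContendedCounterState {
    final Counter counter1 = new Counter();
    final Counter counter2 = new Counter();
  }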

The last case is matrix-style computation with several threads. Here the annotation cannot help, and you have to design your algorithm with false sharing in mind (just as you iterate in the right order). Dr. Dobb's provides such an example.
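
As a hedged sketch of the kind of design this implies, not taken from the Dr. Dobb's article: give each thread a contiguous block of rows to fill, so that two threads never write into the same cache line. All names and sizes below are made up for illustration:

  import java.util.concurrent.*;

  public class RowPartitioned {
    public static void main(String[] args) throws InterruptedException {
      final int rows = 2048, cols = 2048, nbThreads = 4;
      final double[][] result = new double[rows][cols];

      ExecutorService pool = Executors.newFixedThreadPool(nbThreads);
      int chunk = rows / nbThreads;
      for (int t = 0; t < nbThreads; t++) {
        final int from = t * chunk;
        final int to = (t == nbThreads - 1) ? rows : from + chunk;
        pool.submit(() -> {
          // Each worker only touches its own block of rows, so its writes
          // land on cache lines no other worker is updating.
          for (int r = from; r < to; r++)
            for (int c = 0; c < cols; c++)
              result[r][c] = r * 0.5 + c;   // stand-in for the real computation
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.MINUTES);
    }
  }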

I have tried to provide a few examples in the benchmark.

Conclusion

@Contended should not change the lives of many people, apart from those who build high-performance service infrastructure. But it opens the door to a long-standing demand: marrying the benefits of the JVM with the needs of high-performance applications, by opening up access to the hardware and to techniques that go against the original spirit of Java but are nonetheless required.

This annotation does not at all address the need to control an object's layout, or to choose which fields of an object should be grouped together. Nor does it solve the indirection problems caused by object references. But the pressure is slowly rising with the number of applications going off-heap or using mmap when needed.

Finally, false sharing is not the only cache-related problem; here is another example. And of course, exploiting caches correctly has a definite impact on an application's performance.

# Run progress: 0.00% complete, ETA 00:07:21
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 1 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.contendedArray
# Warmup Iteration 1: 66.198 ns/op
# Warmup Iteration 2: 65.109 ns/op
# Warmup Iteration 3: 66.666 ns/op
Iteration 1: 65.355 ns/op
"counter1": 37.084 ns/op (1 threads)
"counter2": 87.609 ns/op (3 threads)
Iteration 2: 65.101 ns/op
"counter1": 37.170 ns/op (1 threads)
"counter2": 86.876 ns/op (3 threads)
Iteration 3: 66.439 ns/op
"counter1": 38.518 ns/op (1 threads)
"counter2": 87.636 ns/op (3 threads)
Iteration 4: 65.368 ns/op
"counter1": 37.016 ns/op (1 threads)
"counter2": 87.778 ns/op (3 threads)
Iteration 5: 65.142 ns/op
"counter1": 37.263 ns/op (1 threads)
"counter2": 86.788 ns/op (3 threads)
Iteration 6: 65.411 ns/op
"counter1": 36.957 ns/op (1 threads)
"counter2": 87.994 ns/op (3 threads)
Result : 65.469 ±(99.9%) 1.379 ns/op
Statistics: (min, avg, max) = (65.101, 65.469, 66.439), stdev = 0.492
Confidence interval (99.9%): [64.090, 66.849]
Result "counter1": 37.335 ±(99.9%) 1.654 ns/op
Statistics: (min, avg, max) = (36.957, 37.335, 38.518), stdev = 0.590
Confidence interval (99.9%): [35.680, 38.989]
Result "counter2": 87.447 ±(99.9%) 1.391 ns/op
Statistics: (min, avg, max) = (86.788, 87.447, 87.994), stdev = 0.496
Confidence interval (99.9%): [86.055, 88.838]
# Run progress: 4.76% complete, ETA 00:07:08
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 2 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.contendedArray
# Warmup Iteration 1: 65.009 ns/op
# Warmup Iteration 2: 62.378 ns/op
# Warmup Iteration 3: 62.638 ns/op
Iteration 1: 62.962 ns/op
"counter1": 37.374 ns/op (1 threads)
"counter2": 81.572 ns/op (3 threads)
Iteration 2: 64.111 ns/op
"counter1": 36.886 ns/op (1 threads)
"counter2": 85.056 ns/op (3 threads)
Iteration 3: 62.097 ns/op
"counter1": 36.704 ns/op (1 threads)
"counter2": 80.750 ns/op (3 threads)
Iteration 4: 62.656 ns/op
"counter1": 36.699 ns/op (1 threads)
"counter2": 81.964 ns/op (3 threads)
Iteration 5: 63.910 ns/op
"counter1": 36.898 ns/op (1 threads)
"counter2": 84.540 ns/op (3 threads)
Iteration 6: 62.072 ns/op
"counter1": 36.779 ns/op (1 threads)
"counter2": 80.534 ns/op (3 threads)
Result : 62.968 ±(99.9%) 2.461 ns/op
Statistics: (min, avg, max) = (62.072, 62.968, 64.111), stdev = 0.878
Confidence interval (99.9%): [60.507, 65.429]
Result "counter1": 36.890 ±(99.9%) 0.706 ns/op
Statistics: (min, avg, max) = (36.699, 36.890, 37.374), stdev = 0.252
Confidence interval (99.9%): [36.184, 37.596]
Result "counter2": 82.403 ±(99.9%) 5.425 ns/op
Statistics: (min, avg, max) = (80.534, 82.403, 85.056), stdev = 1.935
Confidence interval (99.9%): [76.978, 87.828]
# Run progress: 9.52% complete, ETA 00:06:46
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 3 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.contendedArray
# Warmup Iteration 1: 63.710 ns/op
# Warmup Iteration 2: 61.207 ns/op
# Warmup Iteration 3: 63.087 ns/op
Iteration 1: 62.778 ns/op
"counter1": 35.747 ns/op (1 threads)
"counter2": 83.925 ns/op (3 threads)
Iteration 2: 62.103 ns/op
"counter1": 35.100 ns/op (1 threads)
"counter2": 83.524 ns/op (3 threads)
Iteration 3: 62.277 ns/op
"counter1": 36.042 ns/op (1 threads)
"counter2": 82.106 ns/op (3 threads)
Iteration 4: 64.119 ns/op
"counter1": 34.926 ns/op (1 threads)
"counter2": 88.883 ns/op (3 threads)
Iteration 5: 63.038 ns/op
"counter1": 35.116 ns/op (1 threads)
"counter2": 85.669 ns/op (3 threads)
Iteration 6: 62.322 ns/op
"counter1": 35.093 ns/op (1 threads)
"counter2": 84.064 ns/op (3 threads)
Result : 62.773 ±(99.9%) 2.091 ns/op
Statistics: (min, avg, max) = (62.103, 62.773, 64.119), stdev = 0.746
Confidence interval (99.9%): [60.682, 64.863]
Result "counter1": 35.337 ±(99.9%) 1.253 ns/op
Statistics: (min, avg, max) = (34.926, 35.337, 36.042), stdev = 0.447
Confidence interval (99.9%): [34.084, 36.591]
Result "counter2": 84.695 ±(99.9%) 6.583 ns/op
Statistics: (min, avg, max) = (82.106, 84.695, 88.883), stdev = 2.348
Confidence interval (99.9%): [78.112, 91.278]
# Run progress: 14.29% complete, ETA 00:06:24
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 1 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.contendedInstances
# Warmup Iteration 1: 21.584 ns/op
# Warmup Iteration 2: 21.378 ns/op
# Warmup Iteration 3: 21.197 ns/op
Iteration 1: 21.402 ns/op
"counter1": 6.853 ns/op (1 threads)
"counter2": 73.316 ns/op (3 threads)
Iteration 2: 21.137 ns/op
"counter1": 6.789 ns/op (1 threads)
"counter2": 71.763 ns/op (3 threads)
Iteration 3: 21.215 ns/op
"counter1": 6.791 ns/op (1 threads)
"counter2": 72.641 ns/op (3 threads)
Iteration 4: 21.586 ns/op
"counter1": 6.930 ns/op (1 threads)
"counter2": 73.144 ns/op (3 threads)
Iteration 5: 21.406 ns/op
"counter1": 6.820 ns/op (1 threads)
"counter2": 74.556 ns/op (3 threads)
Iteration 6: 21.162 ns/op
"counter1": 6.789 ns/op (1 threads)
"counter2": 71.926 ns/op (3 threads)
Result : 21.318 ±(99.9%) 0.492 ns/op
Statistics: (min, avg, max) = (21.137, 21.318, 21.586), stdev = 0.176
Confidence interval (99.9%): [20.826, 21.810]
Result "counter1": 6.829 ±(99.9%) 0.156 ns/op
Statistics: (min, avg, max) = (6.789, 6.829, 6.930), stdev = 0.056
Confidence interval (99.9%): [6.672, 6.985]
Result "counter2": 72.891 ±(99.9%) 2.882 ns/op
Statistics: (min, avg, max) = (71.763, 72.891, 74.556), stdev = 1.028
Confidence interval (99.9%): [70.009, 75.773]
# Run progress: 19.05% complete, ETA 00:06:03
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 2 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.contendedInstances
# Warmup Iteration 1: 20.882 ns/op
# Warmup Iteration 2: 20.552 ns/op
# Warmup Iteration 3: 20.479 ns/op
Iteration 1: 20.245 ns/op
"counter1": 6.878 ns/op (1 threads)
"counter2": 57.515 ns/op (3 threads)
Iteration 2: 20.137 ns/op
"counter1": 6.798 ns/op (1 threads)
"counter2": 58.230 ns/op (3 threads)
Iteration 3: 20.121 ns/op
"counter1": 6.789 ns/op (1 threads)
"counter2": 58.288 ns/op (3 threads)
Iteration 4: 19.851 ns/op
"counter1": 6.796 ns/op (1 threads)
"counter2": 55.339 ns/op (3 threads)
Iteration 5: 20.210 ns/op
"counter1": 6.861 ns/op (1 threads)
"counter2": 57.506 ns/op (3 threads)
Iteration 6: 20.178 ns/op
"counter1": 6.794 ns/op (1 threads)
"counter2": 58.757 ns/op (3 threads)
Result : 20.123 ±(99.9%) 0.395 ns/op
Statistics: (min, avg, max) = (19.851, 20.123, 20.245), stdev = 0.141
Confidence interval (99.9%): [19.728, 20.519]
Result "counter1": 6.819 ±(99.9%) 0.110 ns/op
Statistics: (min, avg, max) = (6.789, 6.819, 6.878), stdev = 0.039
Confidence interval (99.9%): [6.709, 6.929]
Result "counter2": 57.606 ±(99.9%) 3.396 ns/op
Statistics: (min, avg, max) = (55.339, 57.606, 58.757), stdev = 1.211
Confidence interval (99.9%): [54.209, 61.002]
# Run progress: 23.81% complete, ETA 00:05:41
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 3 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.contendedInstances
# Warmup Iteration 1: 20.246 ns/op
# Warmup Iteration 2: 20.079 ns/op
# Warmup Iteration 3: 20.388 ns/op
Iteration 1: 20.104 ns/op
"counter1": 6.878 ns/op (1 threads)
"counter2": 56.066 ns/op (3 threads)
Iteration 2: 19.931 ns/op
"counter1": 6.796 ns/op (1 threads)
"counter2": 56.027 ns/op (3 threads)
Iteration 3: 20.066 ns/op
"counter1": 6.805 ns/op (1 threads)
"counter2": 57.322 ns/op (3 threads)
Iteration 4: 20.121 ns/op
"counter1": 6.808 ns/op (1 threads)
"counter2": 57.798 ns/op (3 threads)
Iteration 5: 19.939 ns/op
"counter1": 6.824 ns/op (1 threads)
"counter2": 55.496 ns/op (3 threads)
Iteration 6: 20.124 ns/op
"counter1": 6.816 ns/op (1 threads)
"counter2": 57.689 ns/op (3 threads)
Result : 20.048 ±(99.9%) 0.251 ns/op
Statistics: (min, avg, max) = (19.931, 20.048, 20.124), stdev = 0.090
Confidence interval (99.9%): [19.796, 20.299]
Result "counter1": 6.821 ±(99.9%) 0.082 ns/op
Statistics: (min, avg, max) = (6.796, 6.821, 6.878), stdev = 0.029
Confidence interval (99.9%): [6.739, 6.903]
Result "counter2": 56.733 ±(99.9%) 2.767 ns/op
Statistics: (min, avg, max) = (55.496, 56.733, 57.798), stdev = 0.987
Confidence interval (99.9%): [53.966, 59.500]
# Run progress: 28.57% complete, ETA 00:05:20
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 1 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.contented
# Warmup Iteration 1: 2.951 ns/op
# Warmup Iteration 2: 2.853 ns/op
# Warmup Iteration 3: 2.872 ns/op
Iteration 1: 2.963 ns/op
"reader": 0.763 ns/op (1 threads)
"writer": 70.431 ns/op (3 threads)
Iteration 2: 2.866 ns/op
"reader": 0.739 ns/op (1 threads)
"writer": 71.104 ns/op (3 threads)
Iteration 3: 2.865 ns/op
"reader": 0.739 ns/op (1 threads)
"writer": 68.682 ns/op (3 threads)
Iteration 4: 2.862 ns/op
"reader": 0.738 ns/op (1 threads)
"writer": 68.694 ns/op (3 threads)
Iteration 5: 2.862 ns/op
"reader": 0.738 ns/op (1 threads)
"writer": 68.679 ns/op (3 threads)
Iteration 6: 2.863 ns/op
"reader": 0.739 ns/op (1 threads)
"writer": 68.403 ns/op (3 threads)
Result : 2.880 ±(99.9%) 0.113 ns/op
Statistics: (min, avg, max) = (2.862, 2.880, 2.963), stdev = 0.040
Confidence interval (99.9%): [2.767, 2.994]
Result "reader": 0.743 ±(99.9%) 0.028 ns/op
Statistics: (min, avg, max) = (0.738, 0.743, 0.763), stdev = 0.010
Confidence interval (99.9%): [0.715, 0.771]
Result "writer": 69.332 ±(99.9%) 3.189 ns/op
Statistics: (min, avg, max) = (68.403, 69.332, 71.104), stdev = 1.137
Confidence interval (99.9%): [66.143, 72.521]
# Run progress: 33.33% complete, ETA 00:04:59
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 2 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.contented
# Warmup Iteration 1: 3.022 ns/op
# Warmup Iteration 2: 2.884 ns/op
# Warmup Iteration 3: 2.932 ns/op
Iteration 1: 2.870 ns/op
"reader": 0.739 ns/op (1 threads)
"writer": 76.293 ns/op (3 threads)
Iteration 2: 2.890 ns/op
"reader": 0.747 ns/op (1 threads)
"writer": 67.411 ns/op (3 threads)
Iteration 3: 2.868 ns/op
"reader": 0.739 ns/op (1 threads)
"writer": 71.179 ns/op (3 threads)
Iteration 4: 2.878 ns/op
"reader": 0.740 ns/op (1 threads)
"writer": 77.780 ns/op (3 threads)
Iteration 5: 2.874 ns/op
"reader": 0.739 ns/op (1 threads)
"writer": 77.804 ns/op (3 threads)
Iteration 6: 2.858 ns/op
"reader": 0.738 ns/op (1 threads)
"writer": 67.847 ns/op (3 threads)
Result : 2.873 ±(99.9%) 0.030 ns/op
Statistics: (min, avg, max) = (2.858, 2.873, 2.890), stdev = 0.011
Confidence interval (99.9%): [2.843, 2.903]
Result "reader": 0.740 ±(99.9%) 0.009 ns/op
Statistics: (min, avg, max) = (0.738, 0.740, 0.747), stdev = 0.003
Confidence interval (99.9%): [0.731, 0.749]
Result "writer": 73.053 ±(99.9%) 13.615 ns/op
Statistics: (min, avg, max) = (67.411, 73.053, 77.804), stdev = 4.855
Confidence interval (99.9%): [59.438, 86.667]
# Run progress: 38.10% complete, ETA 00:04:37
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 3 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.contented
# Warmup Iteration 1: 2.914 ns/op
# Warmup Iteration 2: 2.868 ns/op
# Warmup Iteration 3: 2.941 ns/op
Iteration 1: 2.880 ns/op
"reader": 0.744 ns/op (1 threads)
"writer": 68.218 ns/op (3 threads)
Iteration 2: 2.874 ns/op
"reader": 0.741 ns/op (1 threads)
"writer": 66.589 ns/op (3 threads)
Iteration 3: 2.939 ns/op
"reader": 0.757 ns/op (1 threads)
"writer": 69.042 ns/op (3 threads)
Iteration 4: 2.858 ns/op
"reader": 0.738 ns/op (1 threads)
"writer": 67.866 ns/op (3 threads)
Iteration 5: 2.876 ns/op
"reader": 0.740 ns/op (1 threads)
"writer": 77.468 ns/op (3 threads)
Iteration 6: 2.889 ns/op
"reader": 0.745 ns/op (1 threads)
"writer": 71.107 ns/op (3 threads)
Result : 2.886 ±(99.9%) 0.078 ns/op
Statistics: (min, avg, max) = (2.858, 2.886, 2.939), stdev = 0.028
Confidence interval (99.9%): [2.808, 2.965]
Result "reader": 0.744 ±(99.9%) 0.019 ns/op
Statistics: (min, avg, max) = (0.738, 0.744, 0.757), stdev = 0.007
Confidence interval (99.9%): [0.725, 0.763]
Result "writer": 70.048 ±(99.9%) 11.024 ns/op
Statistics: (min, avg, max) = (66.589, 70.048, 77.468), stdev = 3.931
Confidence interval (99.9%): [59.024, 81.072]
# Run progress: 42.86% complete, ETA 00:04:16
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 1 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.falseSharedArray
# Warmup Iteration 1: 118.383 ns/op
# Warmup Iteration 2: 116.566 ns/op
# Warmup Iteration 3: 116.832 ns/op
Iteration 1: 125.995 ns/op
"counter1": 119.291 ns/op (1 threads)
"counter2": 128.402 ns/op (3 threads)
Iteration 2: 116.918 ns/op
"counter1": 116.748 ns/op (1 threads)
"counter2": 116.974 ns/op (3 threads)
Iteration 3: 116.768 ns/op
"counter1": 116.555 ns/op (1 threads)
"counter2": 116.839 ns/op (3 threads)
Iteration 4: 116.557 ns/op
"counter1": 113.596 ns/op (1 threads)
"counter2": 117.577 ns/op (3 threads)
Iteration 5: 116.969 ns/op
"counter1": 114.996 ns/op (1 threads)
"counter2": 117.642 ns/op (3 threads)
Iteration 6: 116.680 ns/op
"counter1": 120.073 ns/op (1 threads)
"counter2": 115.592 ns/op (3 threads)
Result : 118.315 ±(99.9%) 10.560 ns/op
Statistics: (min, avg, max) = (116.557, 118.315, 125.995), stdev = 3.766
Confidence interval (99.9%): [107.755, 128.875]
Result "counter1": 116.876 ±(99.9%) 6.921 ns/op
Statistics: (min, avg, max) = (113.596, 116.876, 120.073), stdev = 2.468
Confidence interval (99.9%): [109.955, 123.798]
Result "counter2": 118.838 ±(99.9%) 13.301 ns/op
Statistics: (min, avg, max) = (115.592, 118.838, 128.402), stdev = 4.743
Confidence interval (99.9%): [105.537, 132.139]
# Run progress: 47.62% complete, ETA 00:03:54
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 2 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.falseSharedArray
# Warmup Iteration 1: 106.344 ns/op
# Warmup Iteration 2: 105.518 ns/op
# Warmup Iteration 3: 105.477 ns/op
Iteration 1: 105.587 ns/op
"counter1": 105.169 ns/op (1 threads)
"counter2": 105.727 ns/op (3 threads)
Iteration 2: 105.541 ns/op
"counter1": 102.439 ns/op (1 threads)
"counter2": 106.617 ns/op (3 threads)
Iteration 3: 106.169 ns/op
"counter1": 106.758 ns/op (1 threads)
"counter2": 105.974 ns/op (3 threads)
Iteration 4: 106.096 ns/op
"counter1": 106.582 ns/op (1 threads)
"counter2": 105.935 ns/op (3 threads)
Iteration 5: 106.516 ns/op
"counter1": 112.265 ns/op (1 threads)
"counter2": 104.729 ns/op (3 threads)
Iteration 6: 105.553 ns/op
"counter1": 101.975 ns/op (1 threads)
"counter2": 106.802 ns/op (3 threads)
Result : 105.910 ±(99.9%) 1.147 ns/op
Statistics: (min, avg, max) = (105.541, 105.910, 106.516), stdev = 0.409
Confidence interval (99.9%): [104.763, 107.058]
Result "counter1": 105.865 ±(99.9%) 10.465 ns/op
Statistics: (min, avg, max) = (101.975, 105.865, 112.265), stdev = 3.732
Confidence interval (99.9%): [95.400, 116.329]
Result "counter2": 105.964 ±(99.9%) 2.065 ns/op
Statistics: (min, avg, max) = (104.729, 105.964, 106.802), stdev = 0.736
Confidence interval (99.9%): [103.899, 108.029]
# Run progress: 52.38% complete, ETA 00:03:33
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 3 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.falseSharedArray
# Warmup Iteration 1: 108.675 ns/op
# Warmup Iteration 2: 107.597 ns/op
# Warmup Iteration 3: 107.649 ns/op
Iteration 1: 107.634 ns/op
"counter1": 106.037 ns/op (1 threads)
"counter2": 108.177 ns/op (3 threads)
Iteration 2: 107.643 ns/op
"counter1": 105.981 ns/op (1 threads)
"counter2": 108.208 ns/op (3 threads)
Iteration 3: 107.454 ns/op
"counter1": 105.620 ns/op (1 threads)
"counter2": 108.078 ns/op (3 threads)
Iteration 4: 107.515 ns/op
"counter1": 112.283 ns/op (1 threads)
"counter2": 106.015 ns/op (3 threads)
Iteration 5: 107.819 ns/op
"counter1": 101.063 ns/op (1 threads)
"counter2": 110.279 ns/op (3 threads)
Iteration 6: 107.561 ns/op
"counter1": 112.180 ns/op (1 threads)
"counter2": 106.105 ns/op (3 threads)
Result : 107.604 ±(99.9%) 0.357 ns/op
Statistics: (min, avg, max) = (107.454, 107.604, 107.819), stdev = 0.127
Confidence interval (99.9%): [107.248, 107.961]
Result "counter1": 107.194 ±(99.9%) 12.135 ns/op
Statistics: (min, avg, max) = (101.063, 107.194, 112.283), stdev = 4.327
Confidence interval (99.9%): [95.059, 119.329]
Result "counter2": 107.810 ±(99.9%) 4.450 ns/op
Statistics: (min, avg, max) = (106.015, 107.810, 110.279), stdev = 1.587
Confidence interval (99.9%): [103.360, 112.260]
# Run progress: 57.14% complete, ETA 00:03:12
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 1 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.falseSharedInstances
# Warmup Iteration 1: 101.067 ns/op
# Warmup Iteration 2: 100.416 ns/op
# Warmup Iteration 3: 99.306 ns/op
Iteration 1: 100.594 ns/op
"counter1": 101.676 ns/op (1 threads)
"counter2": 100.239 ns/op (3 threads)
Iteration 2: 100.410 ns/op
"counter1": 115.009 ns/op (1 threads)
"counter2": 96.334 ns/op (3 threads)
Iteration 3: 100.411 ns/op
"counter1": 117.612 ns/op (1 threads)
"counter2": 95.743 ns/op (3 threads)
Iteration 4: 100.345 ns/op
"counter1": 102.396 ns/op (1 threads)
"counter2": 99.679 ns/op (3 threads)
Iteration 5: 100.491 ns/op
"counter1": 99.982 ns/op (1 threads)
"counter2": 100.662 ns/op (3 threads)
Iteration 6: 99.977 ns/op
"counter1": 118.139 ns/op (1 threads)
"counter2": 95.103 ns/op (3 threads)
Result : 100.371 ±(99.9%) 0.593 ns/op
Statistics: (min, avg, max) = (99.977, 100.371, 100.594), stdev = 0.211
Confidence interval (99.9%): [99.778, 100.964]
Result "counter1": 109.136 ±(99.9%) 24.196 ns/op
Statistics: (min, avg, max) = (99.982, 109.136, 118.139), stdev = 8.629
Confidence interval (99.9%): [84.939, 133.332]
Result "counter2": 97.960 ±(99.9%) 7.001 ns/op
Statistics: (min, avg, max) = (95.103, 97.960, 100.662), stdev = 2.497
Confidence interval (99.9%): [90.959, 104.961]
# Run progress: 61.90% complete, ETA 00:02:50
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 2 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.falseSharedInstances
# Warmup Iteration 1: 75.313 ns/op
# Warmup Iteration 2: 74.480 ns/op
# Warmup Iteration 3: 74.365 ns/op
Iteration 1: 74.325 ns/op
"counter1": 74.394 ns/op (1 threads)
"counter2": 74.303 ns/op (3 threads)
Iteration 2: 74.370 ns/op
"counter1": 75.034 ns/op (1 threads)
"counter2": 74.151 ns/op (3 threads)
Iteration 3: 74.300 ns/op
"counter1": 75.412 ns/op (1 threads)
"counter2": 73.936 ns/op (3 threads)
Iteration 4: 74.367 ns/op
"counter1": 74.235 ns/op (1 threads)
"counter2": 74.410 ns/op (3 threads)
Iteration 5: 74.413 ns/op
"counter1": 74.797 ns/op (1 threads)
"counter2": 74.286 ns/op (3 threads)
Iteration 6: 73.585 ns/op
"counter1": 73.411 ns/op (1 threads)
"counter2": 73.644 ns/op (3 threads)
Result : 74.227 ±(99.9%) 0.888 ns/op
Statistics: (min, avg, max) = (73.585, 74.227, 74.413), stdev = 0.317
Confidence interval (99.9%): [73.339, 75.114]
Result "counter1": 74.547 ±(99.9%) 1.967 ns/op
Statistics: (min, avg, max) = (73.411, 74.547, 75.412), stdev = 0.701
Confidence interval (99.9%): [72.581, 76.514]
Result "counter2": 74.121 ±(99.9%) 0.800 ns/op
Statistics: (min, avg, max) = (73.644, 74.121, 74.410), stdev = 0.285
Confidence interval (99.9%): [73.322, 74.921]
# Run progress: 66.67% complete, ETA 00:02:29
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 3 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.falseSharedInstances
# Warmup Iteration 1: 75.246 ns/op
# Warmup Iteration 2: 74.455 ns/op
# Warmup Iteration 3: 74.802 ns/op
Iteration 1: 75.126 ns/op
"counter1": 76.063 ns/op (1 threads)
"counter2": 74.818 ns/op (3 threads)
Iteration 2: 75.473 ns/op
"counter1": 77.517 ns/op (1 threads)
"counter2": 74.817 ns/op (3 threads)
Iteration 3: 75.532 ns/op
"counter1": 78.758 ns/op (1 threads)
"counter2": 74.513 ns/op (3 threads)
Iteration 4: 75.784 ns/op
"counter1": 78.480 ns/op (1 threads)
"counter2": 74.929 ns/op (3 threads)
Iteration 5: 75.620 ns/op
"counter1": 75.286 ns/op (1 threads)
"counter2": 75.732 ns/op (3 threads)
Iteration 6: 75.597 ns/op
"counter1": 78.933 ns/op (1 threads)
"counter2": 74.547 ns/op (3 threads)
Result : 75.522 ±(99.9%) 0.619 ns/op
Statistics: (min, avg, max) = (75.126, 75.522, 75.784), stdev = 0.221
Confidence interval (99.9%): [74.903, 76.141]
Result "counter1": 77.506 ±(99.9%) 4.265 ns/op
Statistics: (min, avg, max) = (75.286, 77.506, 78.933), stdev = 1.521
Confidence interval (99.9%): [73.241, 81.771]
Result "counter2": 74.893 ±(99.9%) 1.242 ns/op
Statistics: (min, avg, max) = (74.513, 74.893, 75.732), stdev = 0.443
Confidence interval (99.9%): [73.651, 76.135]
# Run progress: 71.43% complete, ETA 00:02:08
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 1 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.falseSharing
# Warmup Iteration 1: 17.195 ns/op
# Warmup Iteration 2: 19.261 ns/op
# Warmup Iteration 3: 14.899 ns/op
Iteration 1: 14.806 ns/op
"reader": 4.279 ns/op (1 threads)
"writer": 80.668 ns/op (3 threads)
Iteration 2: 18.174 ns/op
"reader": 5.501 ns/op (1 threads)
"writer": 77.400 ns/op (3 threads)
Iteration 3: 15.512 ns/op
"reader": 4.572 ns/op (1 threads)
"writer": 76.903 ns/op (3 threads)
Iteration 4: 19.538 ns/op
"reader": 6.045 ns/op (1 threads)
"writer": 76.334 ns/op (3 threads)
Iteration 5: 18.079 ns/op
"reader": 5.478 ns/op (1 threads)
"writer": 77.524 ns/op (3 threads)
Iteration 6: 14.766 ns/op
"reader": 4.279 ns/op (1 threads)
"writer": 80.737 ns/op (3 threads)
Result : 16.813 ±(99.9%) 5.719 ns/op
Statistics: (min, avg, max) = (14.766, 16.813, 19.538), stdev = 2.039
Confidence interval (99.9%): [11.094, 22.531]
Result "reader": 5.026 ±(99.9%) 2.095 ns/op
Statistics: (min, avg, max) = (4.279, 5.026, 6.045), stdev = 0.747
Confidence interval (99.9%): [2.930, 7.121]
Result "writer": 78.261 ±(99.9%) 5.432 ns/op
Statistics: (min, avg, max) = (76.334, 78.261, 80.737), stdev = 1.937
Confidence interval (99.9%): [72.829, 83.693]
# Run progress: 76.19% complete, ETA 00:01:46
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 2 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.falseSharing
# Warmup Iteration 1: 18.159 ns/op
# Warmup Iteration 2: 15.102 ns/op
# Warmup Iteration 3: 16.262 ns/op
Iteration 1: 14.641 ns/op
"reader": 4.256 ns/op (1 threads)
"writer": 78.610 ns/op (3 threads)
Iteration 2: 15.650 ns/op
"reader": 4.569 ns/op (1 threads)
"writer": 81.649 ns/op (3 threads)
Iteration 3: 18.076 ns/op
"reader": 5.421 ns/op (1 threads)
"writer": 80.867 ns/op (3 threads)
Iteration 4: 15.257 ns/op
"reader": 4.449 ns/op (1 threads)
"writer": 79.109 ns/op (3 threads)
Iteration 5: 20.052 ns/op
"reader": 6.177 ns/op (1 threads)
"writer": 79.839 ns/op (3 threads)
Iteration 6: 18.212 ns/op
"reader": 5.405 ns/op (1 threads)
"writer": 86.682 ns/op (3 threads)
Result : 16.981 ±(99.9%) 5.931 ns/op
Statistics: (min, avg, max) = (14.641, 16.981, 20.052), stdev = 2.115
Confidence interval (99.9%): [11.051, 22.912]
Result "reader": 5.046 ±(99.9%) 2.081 ns/op
Statistics: (min, avg, max) = (4.256, 5.046, 6.177), stdev = 0.742
Confidence interval (99.9%): [2.965, 7.127]
Result "writer": 81.126 ±(99.9%) 8.248 ns/op
Statistics: (min, avg, max) = (78.610, 81.126, 86.682), stdev = 2.941
Confidence interval (99.9%): [72.878, 89.374]
# Run progress: 80.95% complete, ETA 00:01:25
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 3 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.falseSharing
# Warmup Iteration 1: 18.399 ns/op
# Warmup Iteration 2: 18.727 ns/op
# Warmup Iteration 3: 15.181 ns/op
Iteration 1: 17.223 ns/op
"reader": 5.270 ns/op (1 threads)
"writer": 70.710 ns/op (3 threads)
Iteration 2: 16.348 ns/op
"reader": 4.785 ns/op (1 threads)
"writer": 84.062 ns/op (3 threads)
Iteration 3: 15.134 ns/op
"reader": 4.429 ns/op (1 threads)
"writer": 78.036 ns/op (3 threads)
Iteration 4: 15.131 ns/op
"reader": 4.427 ns/op (1 threads)
"writer": 77.993 ns/op (3 threads)
Iteration 5: 19.225 ns/op
"reader": 5.989 ns/op (1 threads)
"writer": 72.953 ns/op (3 threads)
Iteration 6: 17.595 ns/op
"reader": 5.362 ns/op (1 threads)
"writer": 73.468 ns/op (3 threads)
Result : 16.776 ±(99.9%) 4.426 ns/op
Statistics: (min, avg, max) = (15.131, 16.776, 19.225), stdev = 1.578
Confidence interval (99.9%): [12.350, 21.202]
Result "reader": 5.044 ±(99.9%) 1.716 ns/op
Statistics: (min, avg, max) = (4.427, 5.044, 5.989), stdev = 0.612
Confidence interval (99.9%): [3.328, 6.759]
Result "writer": 76.204 ±(99.9%) 13.539 ns/op
Statistics: (min, avg, max) = (70.710, 76.204, 84.062), stdev = 4.828
Confidence interval (99.9%): [62.665, 89.743]
# Run progress: 85.71% complete, ETA 00:01:04
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 1 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.noFalseSharing
# Warmup Iteration 1: 0.548 ns/op
# Warmup Iteration 2: 0.732 ns/op
# Warmup Iteration 3: 0.575 ns/op
Iteration 1: 0.543 ns/op
"noOp": 0.298 ns/op (1 threads)
"reader": 0.749 ns/op (3 threads)
Iteration 2: 0.539 ns/op
"noOp": 0.296 ns/op (1 threads)
"reader": 0.742 ns/op (3 threads)
Iteration 3: 0.541 ns/op
"noOp": 0.299 ns/op (1 threads)
"reader": 0.741 ns/op (3 threads)
Iteration 4: 0.539 ns/op
"noOp": 0.297 ns/op (1 threads)
"reader": 0.741 ns/op (3 threads)
Iteration 5: 0.547 ns/op
"noOp": 0.305 ns/op (1 threads)
"reader": 0.742 ns/op (3 threads)
Iteration 6: 0.539 ns/op
"noOp": 0.296 ns/op (1 threads)
"reader": 0.742 ns/op (3 threads)
Result : 0.541 ±(99.9%) 0.009 ns/op
Statistics: (min, avg, max) = (0.539, 0.541, 0.547), stdev = 0.003
Confidence interval (99.9%): [0.532, 0.550]
Result "noOp": 0.298 ±(99.9%) 0.010 ns/op
Statistics: (min, avg, max) = (0.296, 0.298, 0.305), stdev = 0.003
Confidence interval (99.9%): [0.289, 0.308]
Result "reader": 0.743 ±(99.9%) 0.009 ns/op
Statistics: (min, avg, max) = (0.741, 0.743, 0.749), stdev = 0.003
Confidence interval (99.9%): [0.734, 0.752]
# Run progress: 90.48% complete, ETA 00:00:42
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 2 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.noFalseSharing
# Warmup Iteration 1: 0.711 ns/op
# Warmup Iteration 2: 0.542 ns/op
# Warmup Iteration 3: 0.540 ns/op
Iteration 1: 0.539 ns/op
"noOp": 0.296 ns/op (1 threads)
"reader": 0.741 ns/op (3 threads)
Iteration 2: 0.539 ns/op
"noOp": 0.296 ns/op (1 threads)
"reader": 0.741 ns/op (3 threads)
Iteration 3: 0.541 ns/op
"noOp": 0.299 ns/op (1 threads)
"reader": 0.740 ns/op (3 threads)
Iteration 4: 0.542 ns/op
"noOp": 0.296 ns/op (1 threads)
"reader": 0.751 ns/op (3 threads)
Iteration 5: 0.539 ns/op
"noOp": 0.296 ns/op (1 threads)
"reader": 0.742 ns/op (3 threads)
Iteration 6: 0.539 ns/op
"noOp": 0.296 ns/op (1 threads)
"reader": 0.741 ns/op (3 threads)
Result : 0.540 ±(99.9%) 0.004 ns/op
Statistics: (min, avg, max) = (0.539, 0.540, 0.542), stdev = 0.001
Confidence interval (99.9%): [0.536, 0.543]
Result "noOp": 0.296 ±(99.9%) 0.004 ns/op
Statistics: (min, avg, max) = (0.296, 0.296, 0.299), stdev = 0.001
Confidence interval (99.9%): [0.293, 0.300]
Result "reader": 0.743 ±(99.9%) 0.011 ns/op
Statistics: (min, avg, max) = (0.740, 0.743, 0.751), stdev = 0.004
Confidence interval (99.9%): [0.732, 0.754]
# Run progress: 95.24% complete, ETA 00:00:21
# VM invoker: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.0.x86_64/jre/bin/java
# VM options: -XX:-RestrictContended
# Fork: 3 of 3
# Warmup: 3 iterations, 1 s each
# Measurement: 6 iterations, 3 s each
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: gist.contended.Benchmarks.noFalseSharing
# Warmup Iteration 1: 0.554 ns/op
# Warmup Iteration 2: 0.542 ns/op
# Warmup Iteration 3: 0.540 ns/op
Iteration 1: 0.539 ns/op
"noOp": 0.297 ns/op (1 threads)
"reader": 0.740 ns/op (3 threads)
Iteration 2: 0.538 ns/op
"noOp": 0.296 ns/op (1 threads)
"reader": 0.742 ns/op (3 threads)
Iteration 3: 0.541 ns/op
"noOp": 0.295 ns/op (1 threads)
"reader": 0.749 ns/op (3 threads)
Iteration 4: 0.538 ns/op
"noOp": 0.295 ns/op (1 threads)
"reader": 0.742 ns/op (3 threads)
Iteration 5: 0.539 ns/op
"noOp": 0.295 ns/op (1 threads)
"reader": 0.744 ns/op (3 threads)
Iteration 6: 0.538 ns/op
"noOp": 0.296 ns/op (1 threads)
"reader": 0.742 ns/op (3 threads)
Result : 0.539 ±(99.9%) 0.003 ns/op
Statistics: (min, avg, max) = (0.538, 0.539, 0.541), stdev = 0.001
Confidence interval (99.9%): [0.536, 0.542]
Result "noOp": 0.296 ±(99.9%) 0.002 ns/op
Statistics: (min, avg, max) = (0.295, 0.296, 0.297), stdev = 0.001
Confidence interval (99.9%): [0.293, 0.298]
Result "reader": 0.743 ±(99.9%) 0.009 ns/op
Statistics: (min, avg, max) = (0.740, 0.743, 0.749), stdev = 0.003
Confidence interval (99.9%): [0.734, 0.752]
Benchmark                                        Mode   Samples         Mean   Mean error    Units
g.c.Benchmarks.contendedArray                    avgt        18       63.737        1.341    ns/op
g.c.Benchmarks.contendedArray:counter1           avgt        18       36.521        0.914    ns/op
g.c.Benchmarks.contendedArray:counter2           avgt        18       84.848        2.524    ns/op
g.c.Benchmarks.contendedInstances                avgt        18       20.496        0.573    ns/op
g.c.Benchmarks.contendedInstances:counter1       avgt        18        6.823        0.038    ns/op
g.c.Benchmarks.contendedInstances:counter2       avgt        18       62.410        7.198    ns/op
g.c.Benchmarks.contented                         avgt        18        2.880        0.026    ns/op
g.c.Benchmarks.contented:reader                  avgt        18        0.742        0.006    ns/op
g.c.Benchmarks.contented:writer                  avgt        18       70.811        3.572    ns/op
g.c.Benchmarks.falseSharedArray                  avgt        18      110.610        5.620    ns/op
g.c.Benchmarks.falseSharedArray:counter1         avgt        18      109.978        5.677    ns/op
g.c.Benchmarks.falseSharedArray:counter2         avgt        18      110.871        6.037    ns/op
g.c.Benchmarks.falseSharedInstances              avgt        18       83.373       11.572    ns/op
g.c.Benchmarks.falseSharedInstances:counter1     avgt        18       87.063       15.700    ns/op
g.c.Benchmarks.falseSharedInstances:counter2     avgt        18       82.325       10.715    ns/op
g.c.Benchmarks.falseSharing                      avgt        18       16.857        1.693    ns/op
g.c.Benchmarks.falseSharing:reader               avgt        18        5.038        0.617    ns/op
g.c.Benchmarks.falseSharing:writer               avgt        18       78.530        3.598    ns/op
g.c.Benchmarks.noFalseSharing                    avgt        18        0.540        0.002    ns/op
g.c.Benchmarks.noFalseSharing:noOp               avgt        18        0.297        0.002    ns/op
g.c.Benchmarks.noFalseSharing:reader             avgt        18        0.743        0.003    ns/op
Process finished with exit code 0
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>20140331_java_contented</groupId>
  <artifactId>20140331_java_contented</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>Auto-generated JMH benchmark</name>

  <dependencies>
    <dependency>
      <groupId>org.openjdk.jmh</groupId>
      <artifactId>jmh-core</artifactId>
      <version>0.5.5</version>
    </dependency>
    <dependency>
      <groupId>org.openjdk.jmh</groupId>
      <artifactId>jmh-generator-annprocess</artifactId>
      <version>0.5.5</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.openjdk.jol</groupId>
      <artifactId>jol-core</artifactId>
      <version>0.1</version>
    </dependency>
  </dependencies>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.0</version>
        <configuration>
          <compilerVersion>1.8</compilerVersion>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.0</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <finalName>microbenchmarks</finalName>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>org.openjdk.jmh.Main</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
package gist.contended;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.parameters.TimeValue;
import sun.misc.Contended;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Benchmarks {

  /**
   * Initial app without false sharing.
   */
  @State(Scope.Benchmark)
  public static class StateNoFalseSharing {
    public int readOnly;
  }

  @GenerateMicroBenchmark
  @Group("noFalseSharing")
  public int reader(StateNoFalseSharing s) { return s.readOnly; }

  @GenerateMicroBenchmark
  @Group("noFalseSharing")
  public void noOp(StateNoFalseSharing s) { }

  /**
   * Introduce false sharing between readOnly and writeOnly.
   *
   * The compiler will decide on an object layout according to the following criteria:
   * - Minimize object footprint
   * - Aligned memory access
   *
   * It is quite likely that the two fields will end up in the same cache line.
   */
  @State(Scope.Group)
  public static class StateFalseSharing {
    int readOnly;
    volatile int writeOnly;
  }

  @GenerateMicroBenchmark
  @Group("falseSharing")
  public int reader(StateFalseSharing s) {
    return s.readOnly;
  }

  @GenerateMicroBenchmark
  @Group("falseSharing")
  public int writer(StateFalseSharing s) {
    return s.writeOnly++;
  }

  /**
   * Use @Contended to solve the false sharing issue.
   *
   * Hotspot will add some padding to ensure that writeOnly is in its own cache line.
   */
  @State(Scope.Group)
  public static class StateContended {
    int readOnly;
    @Contended volatile int writeOnly;
  }

  @GenerateMicroBenchmark
  @Group("contented")
  public int reader(StateContended s) {
    return s.readOnly;
  }

  @GenerateMicroBenchmark
  @Group("contented")
  public int writer(StateContended s) {
    return s.writeOnly++;
  }

  /**
   * Show that false sharing also occurs between instances, not only between fields of a single object.
   *
   * In this case the false sharing does not depend on the object layout. We assume that the two object instances
   * will be allocated close together and will share the same cache line, which will most likely happen.
   *
   * Since the GC is free to move objects in the heap, false sharing can occur when an object is created or moved.
   */
  @State(Scope.Group)
  public static class FalseSharedCountersState {
    final AtomicInteger counter1 = new AtomicInteger();
    final AtomicInteger counter2 = new AtomicInteger();
  }

  @GenerateMicroBenchmark
  @Group("falseSharedInstances")
  public int counter1(FalseSharedCountersState s) {
    return s.counter1.incrementAndGet();
  }

  @GenerateMicroBenchmark
  @Group("falseSharedInstances")
  public int counter2(FalseSharedCountersState s) {
    return s.counter2.incrementAndGet();
  }

  /**
   * Use @Contended on the class.
   *
   * Hotspot will enforce that each object has its own cache line. Even the GC cannot break this rule.
   */
  @Contended
  static class Counter extends AtomicInteger { }

  @State(Scope.Group)
  public static class ContendedCounterState {
    final Counter counter1 = new Counter();
    final Counter counter2 = new Counter();
  }

  @GenerateMicroBenchmark
  @Group("contendedInstances")
  public int counter1(ContendedCounterState s) {
    return s.counter1.incrementAndGet();
  }

  @GenerateMicroBenchmark
  @Group("contendedInstances")
  public int counter2(ContendedCounterState s) {
    return s.counter2.incrementAndGet();
  }

  /**
   * Same as falseSharedInstances but with an array.
   */
  @State(Scope.Group)
  public static class FalseSharedCounterArrayState {
    final AtomicInteger[] counters;

    public FalseSharedCounterArrayState() {
      counters = new AtomicInteger[2];
      for (int i = 0; i < counters.length; i++) {
        counters[i] = new AtomicInteger();
      }
    }
  }

  @GenerateMicroBenchmark
  @Group("falseSharedArray")
  public int counter1(FalseSharedCounterArrayState s) {
    return s.counters[0].incrementAndGet();
  }

  @GenerateMicroBenchmark
  @Group("falseSharedArray")
  public int counter2(FalseSharedCounterArrayState s) {
    return s.counters[1].incrementAndGet();
  }

  /**
   * Same as contendedInstances but with an array.
   */
  @State(Scope.Group)
  public static class ContendedCounterArrayState {
    final Counter[] counters;

    public ContendedCounterArrayState() {
      counters = new Counter[2];
      for (int i = 0; i < counters.length; i++) {
        counters[i] = new Counter();
      }
    }
  }

  @GenerateMicroBenchmark
  @Group("contendedArray")
  public int counter1(ContendedCounterArrayState s) {
    return s.counters[0].incrementAndGet();
  }

  @GenerateMicroBenchmark
  @Group("contendedArray")
  public int counter2(ContendedCounterArrayState s) {
    return s.counters[1].incrementAndGet();
  }

  public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
        .include(".*")
        .jvmArgs("-XX:-RestrictContended") //, "-XX:CompileCommand=print,*Benchmarks.reader")
        .forks(3)
        .warmupIterations(3)
        .warmupTime(TimeValue.seconds(3))
        .measurementIterations(6)
        .measurementTime(TimeValue.seconds(3))
        .threadGroups(1, 3)
        .timeUnit(TimeUnit.NANOSECONDS)
        .mode(Mode.SampleTime)
        .build();

    new Runner(opt).run();
  }
}
package gist.contended;

import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.util.VMSupport;

public class Layouts {

  public static void main(String[] args) {
    System.out.println(VMSupport.vmDetails());
    System.out.println(ClassLayout.parseClass(Benchmarks.StateFalseSharing.class).toPrintable());
    System.out.println(ClassLayout.parseClass(Benchmarks.StateContended.class).toPrintable());
  }
}