@piyush-kurur
Last active February 6, 2017 12:43
Performance figure of primitives supported by raaz-0.1.0
Enable the opt-vectorise and opt-native flags with cabal. Effectively, the GCC flags used are then -O2 -ftree-vectorize -march=native.
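Notice the jump in the performance of the portable C implementations between the two runs below: chacha20-cportable-encrypt, for example, runs at 14.41Gbps in one and 4.98Gbps in the other.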
Buffer Size = 32768
Iterations = 10000
memset
time = 545.5 ns
cycles = 1849.2347
rate = 0.48Tbps
secs/byte = 0.0166nsec/byte
cycles/byte = 5.643416442871094e-2
random
time = 37.85 μs
cycles = 128395.1539
rate = 6.92Gbps
secs/byte = 1.15nsec/byte
cycles/byte = 3.9183091400146486
chacha20-cportable-encrypt
time = 18.18 μs
cycles = 61679.9382
rate = 14.41Gbps
secs/byte = 0.55nsec/byte
cycles/byte = 1.8823223327636718
chacha20-vector-256-encrypt
time = 27.60 μs
cycles = 93635.0423
rate = 9.49Gbps
secs/byte = 0.84nsec/byte
cycles/byte = 2.8575147186279297
chacha20-vector-128-encrypt
time = 54.09 μs
cycles = 183468.0962
rate = 4.84Gbps
secs/byte = 1.65nsec/byte
cycles/byte = 5.599001959228516
aes128cbc-cportable-encrypt
time = 179.7 μs
cycles = 609669.0943
rate = 1.45Gbps
secs/byte = 5.48nsec/byte
cycles/byte = 18.605624215698242
aes128cbc-cportable-decrypt
time = 255.5 μs
cycles = 866588.5603
rate = 1.02Gbps
secs/byte = 7.79nsec/byte
cycles/byte = 26.446184091186524
aes192cbc-cportable-encrypt
time = 214.6 μs
cycles = 728057.4486
rate = 1.22Gbps
secs/byte = 6.54nsec/byte
cycles/byte = 22.218550067138672
aes192cbc-cportable-decrypt
time = 306.9 μs
cycles = 1041169.4642
rate = 0.85Gbps
secs/byte = 9.36nsec/byte
cycles/byte = 31.773970465087892
aes256cbc-cportable-encrypt
time = 249.2 μs
cycles = 845231.499
rate = 1.05Gbps
secs/byte = 7.60nsec/byte
cycles/byte = 25.794418304443358
aes256cbc-cportable-decrypt
time = 358.5 μs
cycles = 1215931.5501
rate = 0.73Gbps
secs/byte = 10.93nsec/byte
cycles/byte = 37.107286074829105
sha1-cportable-compress
time = 47.41 μs
cycles = 160813.0104
rate = 5.52Gbps
secs/byte = 1.44nsec/byte
cycles/byte = 4.9076236083984375
sha256-cportable-compress
time = 107.2 μs
cycles = 363558.6518
rate = 2.44Gbps
secs/byte = 3.27nsec/byte
cycles/byte = 11.094929559326172
sha512-cportable-compress
time = 66.92 μs
cycles = 227009.765
rate = 3.91Gbps
secs/byte = 2.04nsec/byte
cycles/byte = 6.927788238525391
Note:
1. Performance is for raw buffer operations. Operations on bytestrings will incur additional overheads of copying and alignment.
2. Rate is measured in bits (not bytes) per second, with decimal prefixes: Giga is 1000^3, not 1024^3.
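As a rough cross-check of how the per-byte figures relate to the raw measurements, the sketch below recomputes rate, secs/byte and cycles/byte from the time, cycle count and buffer size of one entry. It is not the benchmark's own code; the function name `derived_metrics` is made up for illustration, and the formulas are simply the definitions implied by the notes above.

```python
# Illustrative sketch only: recompute the derived figures from raw measurements.
BUFFER_SIZE = 32768  # bytes, the "Buffer Size" reported above


def derived_metrics(time_secs, cycles, buffer_size=BUFFER_SIZE):
    """Return (rate in Gbps, nsec per byte, cycles per byte)."""
    bits = buffer_size * 8
    rate_gbps = bits / time_secs / 1000**3  # Giga = 1000^3, bits not bytes
    nsec_per_byte = time_secs * 1e9 / buffer_size
    cycles_per_byte = cycles / buffer_size
    return rate_gbps, nsec_per_byte, cycles_per_byte


# chacha20-cportable-encrypt from the run above: 18.18 μs, 61679.9382 cycles
rate, nsec, cpb = derived_metrics(18.18e-6, 61679.9382)
print(f"rate = {rate:.2f}Gbps")
print(f"secs/byte = {nsec:.2f}nsec/byte")
print(f"cycles/byte = {cpb}")
```

The results agree with the listed figures up to rounding; the rate can differ in the last digit because the printed time is itself rounded to four significant digits.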
Buffer Size = 32768
Iterations = 10000
memset
time = 1.171 μs
cycles = 3971.7088
rate = 0.22Tbps
secs/byte = 0.0357nsec/byte
cycles/byte = 0.12120693359375
random
time = 48.09 μs
cycles = 163124.9825
rate = 5.45Gbps
secs/byte = 1.46nsec/byte
cycles/byte = 4.978179397583008
chacha20-cportable-encrypt
time = 52.64 μs
cycles = 178556.0558
rate = 4.98Gbps
secs/byte = 1.60nsec/byte
cycles/byte = 5.449098382568359
chacha20-vector-256-encrypt
time = 27.43 μs
cycles = 93043.5375
rate = 9.55Gbps
secs/byte = 0.83nsec/byte
cycles/byte = 2.8394634246826174
chacha20-vector-128-encrypt
time = 53.73 μs
cycles = 182262.4298
rate = 4.87Gbps
secs/byte = 1.63nsec/byte
cycles/byte = 5.5622079406738285
aes128cbc-cportable-encrypt
time = 176.8 μs
cycles = 599657.1242
rate = 1.48Gbps
secs/byte = 5.39nsec/byte
cycles/byte = 18.300083135986327
aes128cbc-cportable-decrypt
time = 256.0 μs
cycles = 868286.8795
rate = 1.02Gbps
secs/byte = 7.81nsec/byte
cycles/byte = 26.498012680053712
aes192cbc-cportable-encrypt
time = 209.5 μs
cycles = 710800.9001
rate = 1.25Gbps
secs/byte = 6.39nsec/byte
cycles/byte = 21.69192200012207
aes192cbc-cportable-decrypt
time = 308.4 μs
cycles = 1046292.8873
rate = 0.84Gbps
secs/byte = 9.41nsec/byte
cycles/byte = 31.93032492980957
aes256cbc-cportable-encrypt
time = 245.0 μs
cycles = 831212.5922
rate = 1.06Gbps
secs/byte = 7.47nsec/byte
cycles/byte = 25.366595220947264
aes256cbc-cportable-decrypt
time = 358.8 μs
cycles = 1217173.9974
rate = 0.73Gbps
secs/byte = 10.95nsec/byte
cycles/byte = 37.14520255737305
sha1-cportable-compress
time = 58.88 μs
cycles = 199738.0163
rate = 4.45Gbps
secs/byte = 1.79nsec/byte
cycles/byte = 6.095520516967773
sha256-cportable-compress
time = 141.0 μs
cycles = 478336.6424
rate = 1.85Gbps
secs/byte = 4.30nsec/byte
cycles/byte = 14.597675854492188
sha512-cportable-compress
time = 97.30 μs
cycles = 330068.3753
rate = 2.69Gbps
secs/byte = 2.96nsec/byte
cycles/byte = 10.07288742980957
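The jump for the portable C implementations can be put as a ratio of cycles/byte between the two runs above. The sketch below does this for the chacha20 and SHA entries; the dictionaries hold values copied from the listings, and the names `first_run`, `second_run` and `cycle_ratios` are chosen here for illustration only.

```python
# Illustrative sketch only: cycles/byte copied from the two runs above.
first_run = {
    "chacha20-cportable-encrypt": 1.8823223327636718,
    "sha1-cportable-compress": 4.9076236083984375,
    "sha256-cportable-compress": 11.094929559326172,
    "sha512-cportable-compress": 6.927788238525391,
}
second_run = {
    "chacha20-cportable-encrypt": 5.449098382568359,
    "sha1-cportable-compress": 6.095520516967773,
    "sha256-cportable-compress": 14.597675854492188,
    "sha512-cportable-compress": 10.07288742980957,
}


def cycle_ratios(a, b):
    """For each primitive in a, return cycles/byte in b divided by that in a."""
    return {name: b[name] / a[name] for name in a}


ratios = cycle_ratios(first_run, second_run)
# Print the largest differences first.
for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name:28s} {ratio:.2f}x")
```

The chacha20 portable implementation shows the largest difference, just under 3x, while the AES-CBC and vector chacha20 figures are nearly identical in both runs.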
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x16
cpu MHz : 3396.679
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips : 6783.98
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x16
cpu MHz : 3401.328
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips : 6783.98
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x16
cpu MHz : 3400.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips : 6783.98
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x16
cpu MHz : 3400.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips : 6783.98
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x16
cpu MHz : 3400.398
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips : 6783.98
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x16
cpu MHz : 3400.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips : 6783.98
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x16
cpu MHz : 2707.914
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips : 6783.98
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x16
cpu MHz : 3400.796
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips : 6783.98
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
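When comparing these figures with another machine, it may help to check which instruction set extensions that CPU reports. The sketch below extracts the flags line from /proc/cpuinfo text. That the vector-256 and vector-128 implementations benefit from avx2 and sse2 respectively is an assumption made here, not something stated above, and `cpu_flags` is a hypothetical helper name.

```python
# Illustrative sketch only: list CPU feature flags from /proc/cpuinfo text.
def cpu_flags(cpuinfo_text):
    """Return the set of flags on the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Shortened sample in the same format as the listing above.
sample = "processor : 0\nflags : fpu sse2 ssse3 aes avx avx2\nbogomips : 6783.98\n"
flags = cpu_flags(sample)
for wanted in ("sse2", "avx2", "aes"):  # assumed to be the relevant extensions
    print(wanted, "yes" if wanted in flags else "no")
```

On a Linux machine the same function can be applied to the real file with `cpu_flags(open("/proc/cpuinfo").read())`.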