Skip to content

Instantly share code, notes, and snippets.

@jzakiya
Last active January 8, 2024 01:58
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jzakiya/0d6987ee00f3708d6cfd6daee9920bd7 to your computer and use it in GitHub Desktop.
Save jzakiya/0d6987ee00f3708d6cfd6daee9920bd7 to your computer and use it in GitHub Desktop.
Cousin Primes generator, multi-threaded, using SSoZ (Segmented Sieve of Zakiya), written in Crystal
# This Crystal source file is a multiple threaded implementation to perform an
# extremely fast Segmented Sieve of Zakiya (SSoZ) to find Cousin Primes <= N.
# Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
# Output is the number of cousin primes <= N, or in range N1 to N2; the last
# cousin prime value for the range; and the total time of execution.
# Code originally developed on a System76 laptop with an Intel I7 6700HQ cpu,
# 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. Parameter tuning
# probably needed to optimize for other hardware systems (ARM, PowerPC, etc).
# Compile as: $ crystal build cousinprimes_ssoz.cr -Dpreview_mt --release
# To reduce binary size do: $ strip cousinprimes_ssoz
# Thread workers default to 4, set to system max for optimum performance.
# Single val: $ CRYSTAL_WORKERS=8 ./cousinprimes_ssoz val1
# Range vals: $ CRYSTAL_WORKERS=8 ./cousinprimes_ssoz val1 val2
# val1 and val2 can be entered as either: 123456789 or 123_456_789
# Mathematical and technical basis for implementation are explained here:
# This Crystal source file is a multiple threaded implementation to perform an
# extremely fast Segmented Sieve of Zakiya (SSoZ) to find Cousin Primes <= N.
# Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
# Output is the number of cousin primes <= N, or in range N1 to N2; the last
# cousin prime value for the range; and the total time of execution.
# Code originally developed on a System76 laptop with an Intel I7 6700HQ cpu,
# 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. Parameter tuning
# probably needed to optimize for other hardware systems (ARM, PowerPC, etc).
# Compile as: $ crystal build cousinprimes_ssozgistnew.cr -Dpreview_mt --release
# To reduce binary size do: $ strip cousinprimes_ssoz
# Thread workers default to 4, set to system max for optimum performance.
# Single val: $ CRYSTAL_WORKERS=8 ./cousinprimes_ssoz val1
# Range vals: $ CRYSTAL_WORKERS=8 ./cousinprimes_ssoz val1 val2
# val1 and val2 can be entered as either: 123456788 or 123_456_789
# Mathematical and technical basis for implementation are explained here:
# https://www.academia.edu/37952623/The_Use_of_Prime_Generators_to_Implement_Fast_
# Twin_Primes_Sieve_of_Zakiya_SoZ_Applications_to_Number_Theory_and_Implications_
# for_the_Riemann_Hypotheses
# https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_
# https://www.academia.edu/19786419/PRIMES-UTILS_HANDBOOK
# https://www.academia.edu/81206391/Twin_Primes_Segmented_Sieve_of_Zakiya_SSoZ_Explained
# This source code, and its updates, can be found here:
# https://gist.github.com/jzakiya/0d6987ee00f3708d6cfd6daee9920bd7
# This code is provided free and subject to copyright and terms of the
# GNU General Public License Version 3, GPLv3, or greater.
# License copy/terms are here: http://www.gnu.org/licenses/
# Copyright (c) 2017-2024; Jabari Zakiya -- jzakiya at gmail dot com
# Last update: 2024/01/07
# Customized gcd to determine coprimality; n > m; m odd
def coprime?(m, n)
while m|1 != 1; t = m; m = n % m; n = t end
m > 0
end
# Compute modular inverse a^-1 to base m, e.g. a*(a^-1) mod m = 1
def modinv(a0, m0)
return 1 if m0 == 1
a, m = a0, m0
x0, inv = 0, 1
while a > 1
inv -= (a // m) * x0
a, m = m, a % m
x0, inv = inv, x0
end
inv += m0 if inv < 0
inv
end
def gen_pg_parameters(prime)
# Create prime generator parameters for given Pn
puts "using Prime Generator parameters for P#{prime}"
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
modpg, res_0 = 1, 0 # compute Pn's modulus and res_0 value
primes.each { |prm| res_0 = prm; break if prm > prime; modpg *= prm }
rescousins = [] of Int32 # save upper cousinpair residues here
inverses = Array.new(modpg + 2, 0) # save Pn's residues inverses here
rc, inc, res = 5, 2, 0 # use P3's PGS to generate residue candidates
midmodpg = modpg >> 1 # mid value for modulus
while rc < midmodpg # find PG's 1st half residues
if coprime?(rc, modpg) # if rc is a residue
mc = modpg - rc # create its modular complement
inverses[rc] = modinv(rc, modpg) # save rc and mc inverses
inverses[mc] = modinv(mc, modpg) # if in cousinpair save both hi residues
rescousins << rc << mc + 4 if res + 4 == rc
res = rc # save current found residue
end
rc += inc; inc ^= 0b110 # create next P3 sequence rc: 5 7 11 13 17 19 ...
end
rescousins << (midmodpg + 2); rescousins.sort! # create pivot hi_cp
inverses[modpg + 1] = 1; inverses[modpg - 1] = modpg - 1 # last 2 residues are self inverses
{modpg, res_0, rescousins.size, rescousins, inverses}
end
def set_sieve_parameters(start_num, end_num)
# Select at runtime best PG and segment size parameters for input values.
# These are good estimates derived from PG data profiling. Can be improved.
nrange = end_num - start_num
bn, pg = 0, 3
if end_num < 49
bn = 1; pg = 3
elsif nrange < 77_000_000
bn = 16; pg = 5
elsif nrange < 1_100_000_000
bn = 32; pg = 7
elsif nrange < 35_500_000_000
bn = 64; pg = 11
elsif nrange < 14_000_000_000_000
pg = 13
if nrange > 7_000_000_000_000; bn = 384
elsif nrange > 2_500_000_000_000; bn = 320
elsif nrange > 250_000_000_000; bn = 196
else bn = 128
end
else
bn = 384; pg = 17
end
modpg, res_0, pairscnt, rescousins, resinvrs = gen_pg_parameters(pg)
kmin = (start_num-2) // modpg + 1 # number of resgroups to start_num
kmax = (end_num - 2) // modpg + 1 # number of resgroups to end_num
krange = kmax - kmin + 1 # number of resgroups in range, at least 1
n = krange < 37_500_000_000_000 ? 4 : (krange < 975_000_000_000_000 ? 6 : 8)
b = bn * 1024 * n # set seg size to optimize for selected PG
ks = krange < b ? krange : b # segments resgroups size
puts "segment size = #{ks} resgroups for seg bitarray"
maxpairs = krange * pairscnt # maximum number of cousinprime pcs
puts "cousinprime candidates = #{maxpairs}; resgroups = #{krange}"
{modpg, res_0, ks, kmin, kmax, krange, pairscnt, rescousins, resinvrs}
end
def sozp5(val, res_0, start_num, end_num)
# Return the primes r0..sqrt(end_num) within range (start_num...end_num)
md, rescnt = 30u64, 8 # P5's modulus and residues count
res = [7,11,13,17,19,23,29,31] # P5's residues
range_size = end_num - start_num # integers size of inputs range
kmax = (val - 2) // md + 1 # number of resgroups upto input value
prms = Array(UInt8).new(kmax, 0) # byte array of prime candidates, init '0'
sqrtn = Math.isqrt(val-1|1) # integer sqrt of largest odd value <= val
k = sqrtn//md; resk = sqrtn-md*k; r=0 # sqrtn resgroup|residue values, 1st res posn
while resk >= res[r]; r += 1 end # find largest residue <= sqrtn posn in its resgroup
pcs_to_sqrtn = k*rescnt + r # number of pcs <= sqrtn
pcs_to_sqrtn.times do |i| # for r0..sqrtn primes mark their multiples)
k, r = i.divmod rescnt # update resgroup parameters
next if prms[k] & (1 << r) != 0 # skip pc if not prime
prm_r = res[r] # if prime save its residue value
prime = md*k + prm_r # numerate its value
rem = start_num % prime # prime's modular distance to start_num
next unless (prime - rem <= range_size) || rem == 0 # skip prime if no multiple in range
res.each do |ri| # mark prime's multiples in prms
kn,rn = (prm_r * ri - 2).divmod md # cross-product resgroup|residue
bit_r = 1 << res.index(rn+2).not_nil! # bit mask value for prod's residue
kpm = k * (prime + ri) + kn # resgroup for prime's 1st multiple
while kpm < kmax; prms[kpm] |= bit_r; kpm += prime end # mark prime's multiples
end end
# prms now contains the prime multiples positions marked for the pcs r0..N
# now along each restrack, identify|numerate the primes in each resgroup
# return only the primes with a multiple within range (start_num...end_num)
primes = [] of UInt64 # create empty dynamic array for primes
res.each_with_index do |r_i, i| # along each restrack|row til end
kmax.times do |k| # for each resgroup along restrack
if prms[k] & (1 << i) == 0 # if bit location a prime
prime = md * k + r_i # numerate its value, store if in range
# check if prime has multiple in range, if so keep it, if not don't
rem = start_num % prime # if rem 0 then start_num is multiple of prime
primes << prime if (res_0 <= prime <= val) && (prime - rem <= range_size || rem == 0)
end end end
primes # primes stored in restrack|row sequence order
end
def nextp_init(rhi, kmin, modpg, primes, resinvrs)
# Initialize 'nextp' array for cousinpair upper residue rhi in 'rescousins'.
# Compute 1st prime multiple resgroups for each prime r0..sqrt(N) and
# store consecutively as lo_cp|hi_cp pairs for their restracks.
nextp = Slice(UInt64).new(primes.size*2) # 1st mults array for cousinpair
r_hi, r_lo = rhi, rhi - 4 # upper|lower cousinpair residue values
primes.each_with_index do |prime, j| # for each prime r0..sqrt(N)
k = (prime - 2) // modpg # find the resgroup it's in
r = (prime - 2) % modpg + 2 # and its residue value
r_inv = resinvrs[r].to_u64 # and residue inverse
rl = (r_inv * r_lo - 2) % modpg + 2 # compute r's ri for r_lo
rh = (r_inv * r_hi - 2) % modpg + 2 # compute r's ri for r_hi
kl = k * (prime + rl) + (r * rl - 2) // modpg # kl 1st mult resgroup
kh = k * (prime + rh) + (r * rh - 2) // modpg # kh 1st mult resgroup
kl < kmin ? (kl = (kmin - kl) % prime; kl = prime - kl if kl > 0) : (kl -= kmin)
kh < kmin ? (kh = (kmin - kh) % prime; kh = prime - kh if kh > 0) : (kh -= kmin)
nextp[j * 2] = kl.to_u64 # prime's 1st mult lo_tp resgroup val in range
nextp[j * 2 | 1] = kh.to_u64 # prime's 1st mult hi_tp resgroup val in range
end
nextp
end
def cousins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
# Perform in thread the ssoz for given cousinpair residues for kmax resgroups.
# First create|init 'nextp' array of 1st prime mults for given cousinpair,
# stored consequtively in 'nextp', and init seg array for ks resgroups.
# For sieve, mark resgroup bits to '1' if either cousinpair restrack is nonprime
# for primes mults resgroups, and update 'nextp' restrack slices acccordingly.
# Return the last cousinprime|sum for the range for this cousinpair residues.
sum, ki, kn = 0_u64, kmin-1, ks # init these parameters
hi_cp, k_max = 0_u64, kmax # max cousinprime|resgroup
seg = BitArray.new(ks) # seg array of ks resgroups
ki += 1 if r_hi - 4 < (start_num - 2) % modpg + 2 # ensure lo tp in range
k_max -= 1 if r_hi > (end_num - 2) % modpg + 2 # ensure hi tp in range
nextp = nextp_init(r_hi, ki, modpg, primes,resinvrs) # init nextp array
while ki < k_max # for ks size slices upto kmax
kn = k_max - ki if ks > (k_max - ki) # adjust kn size for last seg
primes.each_with_index do |prime, j| # for each prime r0..sqrt(N)
# for lower cousinpair residue track
k1 = nextp.to_unsafe[j * 2] # starting from this resgroup in seg
while k1 < kn # until end of seg
seg.unsafe_put(k1, true) # mark primenth resgroup bits prime mults
k1 += prime end # set resgroup for prime's next multiple
nextp.to_unsafe[j * 2] = k1 - kn # save 1st resgroup in next eligible seg
# for upper cousinpair residue track
k2 = nextp.to_unsafe[j * 2 | 1] # starting from this resgroup in seg
while k2 < kn # until end of seg
seg.unsafe_put(k2, true) # mark primenth resgroup bits prime mults
k2 += prime end # set resgroup for prime's next multiple
nextp.to_unsafe[j * 2 | 1]=k2 - kn # save 1st resgroup in next eligible seg
end
cnt = seg[...kn].count(false) # count|store cousinprimes in segment
if cnt > 0 # if segment has cousinprimes
sum += cnt # add segment count to total range count
upk = kn - 1 # from end of seg, count back to largest cp
while seg.unsafe_fetch(upk); upk -= 1 end
hi_cp = ki + upk # set its full range resgroup value
end
ki += ks # set 1st resgroup val of next seg slice
seg.fill(false) if ki < k_max # set next seg to all primes if in range
end # when sieve done, numerate largest cousin
# for ranges w/o cousins set largest to 0
hi_cp = (r_hi > end_num || sum == 0) ? 0 : hi_cp * modpg + r_hi
{hi_cp.to_u64, sum.to_u64} # return largest cousinprime|cousins count
end
def cousinprimes_ssoz()
end_num = {(ARGV[0].to_u64 underscore: true), 3u64}.max
start_num = ARGV.size > 1 ? {(ARGV[1].to_u64 underscore: true), 3u64}.max : 3u64
start_num, end_num = end_num, start_num if start_num > end_num
start_num |= 1 # if start_num even increase by 1
end_num = (end_num - 1) | 1 # if end_num even decrease by 1
start_num = end_num = 7 if end_num - start_num < 4
puts "threads = #{System.cpu_count}"
ts = Time.monotonic # start timing sieve setup execution
# select Pn, set sieving params for inputs
modpg, res_0, ks, kmin, kmax, krange, pairscnt, rescousins, resinvrs = set_sieve_parameters(start_num, end_num)
# create sieve primes <= sqrt(end_num), only use those whose multiples within inputs range
primes = end_num < 49 ? [5] : sozp5(Math.isqrt(end_num), res_0, start_num, end_num)
puts "each of #{pairscnt} threads has nextp[2 x #{primes.size}] array"
te = (Time.monotonic - ts).total_seconds.round(6)
puts "setup time = #{te} secs" # display sieve setup time
puts "perform cousinprimes ssoz sieve"
t1 = Time.monotonic # start timing ssoz sieve execution
cnts = Array(UInt64).new(pairscnt, 0) # cousins count for each cousinpair
lastcousins = Array(UInt64).new(pairscnt, 0) # largest hi_cp for each cousinpair
done = Channel(Nil).new(pairscnt)
threadscnt = Atomic.new(0) # count of finished threads
rescousins.each_with_index do |r_hi,i| # sieve cousinpair restracks
spawn do
lastcousins[i], cnts[i] = cousins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
print "\r#{threadscnt.add(1)} of #{pairscnt} cousinpairs done"
done.send(nil)
end end
pairscnt.times { done.receive } # wait for all threads to finish
print "\r#{pairscnt} of #{pairscnt} cousinpairs done"
cousinscnt = 0u64 # count of cousinprimes in range
last_cousin = 0u64 # largest hi_cp cousin in range
if end_num < 49 # check for cousins in low ranges
small_cousins = [11, 17, 23, 41, 47 ].select { |c_hi| start_num <= c_hi - 4 && c_hi <= end_num }
cousinscnt, last_cousin = small_cousins.size, small_cousins.last? || 0
else # check for cousins from sieve
cousinscnt += cnts.sum # compute number of cousinprimes in range
last_cousin = lastcousins.max # find largest hi_cp cousinprime in range
end
# account for 1st cousin prime, defined as (3, 7)
if start_num == 3 && end_num > 6; cousinscnt += 1; last_cousin = 7 if end_num < 11 end
kn = krange % ks # set number of resgroups in last slice
kn = ks if kn == 0 # if multiple of seg size set to seg size
t2 = (Time.monotonic - t1).total_seconds # sieve execution time
puts "\nsieve time = #{t2.round(6)} secs" # ssoz sieve time
puts "total time = #{(t2 + te).round(6)} secs" # setup + sieve time
puts "last segment = #{kn} resgroups; segment slices = #{(krange - 1)//ks + 1}"
puts "total cousins = #{cousinscnt}; last cousin = #{last_cousin}|-4"
end
cousinprimes_ssoz
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment