twobombs/highqrackguide.gist

## highqrackguide.gist
# First of all, a big thank you to Dan Strano and all the maintainers of the Qrack repository at the Unitary Fund.
# Without their constant effort, focus and guidance, ThereminQ and all its endeavours would simply not exist.
# Please pay their sites a visit at https://github.com/unitaryfund/qrack and https://unitary.fund/ and join them on Discord.

With Qrack installed, this Gist covers the recommended practices for running it.
This part has no required runnable code; the goal is to help you tell Qrack about the system you are running it on.

Qrack is configured through exported environment variables and command-line options. The right mix will help Qrack perform best.

An example script follows ( taken from https://github.com/twobombs/thereminq/blob/master/runscripts/run-tcc-dense-cube-multi )

'
cd /qrack128/_build/
export QRACK_QPAGER_DEVICES=0,0,0,0,0,0,0,0,2,2,2,2,1,1,1,1
export QRACK_OCL_DEFAULT_DEVICE=0
export QRACK_MAX_PAGING_QB=32
time ./benchmarks --proc-stabilizer-qpager --layer-qunit-multi --timeout=2000000 --max-qubits=44 --devices=0,0,0,0,0,0,0,0,2,2,2,2,1,1,1,1 --benchmark-depth=44 --single test_dense_cc_nn --samples=100 --measure-output=/var/log/qrack/"t_cc_dense_$(date +%F_%R_%S).log" &
'

The first line selects the Qrack build used for high-qubit execution.
qrack128 has been compiled with settings for big vectors and is an FP64 environment.
This build will use more vRAM than the regular /qrack.
Selecting it is required because we want a high width and depth, above 32 qubits.

The second line selects the devices for the QPager layer inside Qrack.
The numbers are the OpenCL device IDs as reported by /qrack128/_build/benchmarks -h
Each repetition of an ID is one QPager 'slot' on that device: in the example, device 0 gets eight slots and devices 2 and 1 get four each.
A GPU with 8 to 20 GB of vRAM would typically want four QPager 'slots'.
A GPU with 24 to 32 GB of vRAM would typically want eight QPager 'slots'.
Stacking multiple GPUs goes a long way in stretching the vector space over the combined vRAM of the GPUs.
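
The slot rule above can be sketched as a small bash helper that expands "device:slots" pairs into the comma-separated list used by QRACK_QPAGER_DEVICES and --devices. The function name build_device_list and the "device:slots" notation are made up for this illustration and are not part of Qrack.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Qrack): expand "device:slots" pairs
# into a comma-separated device list, repeating each ID once per slot.
build_device_list() {
    local list="" pair dev slots i
    for pair in "$@"; do
        dev="${pair%%:*}"      # text before the colon: the OpenCL device ID
        slots="${pair##*:}"    # text after the colon: the number of slots
        for ((i = 0; i < slots; i++)); do
            list+="${list:+,}${dev}"   # add a comma only when the list is non-empty
        done
    done
    printf '%s\n' "$list"
}

# Device 0 gets eight slots, devices 2 and 1 get four each, as in the example script.
build_device_list 0:8 2:4 1:4
```

The output can be assigned to one variable and reused for both the export and the command-line option, so the two lists cannot drift apart.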

The third line sets the default GPU. This line is important because that GPU is the 'base of operations' for Qrack.
The default GPU is where the first batch of qubits is created, and it will usually need more vRAM than the other GPUs.
So it is wise to select the GPU with the most vRAM available. If all GPUs are equal, select the one with the least vRAM in use
by other processes. You can check this with NVTOP, which also provides useful telemetry at runtime.
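
As a scripted alternative to reading NVTOP by eye, the sketch below picks the GPU with the most free memory from "index, free MiB" lines. It assumes NVIDIA hardware with nvidia-smi installed, and the function name pick_most_free_gpu is hypothetical. The index printed by nvidia-smi is not guaranteed to match the OpenCL device ID that benchmarks -h reports, so treat the result as a hint and cross-check it before exporting QRACK_OCL_DEFAULT_DEVICE.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "index, free_MiB" lines on stdin and print
# the index of the GPU with the most free memory (first one wins on a tie).
pick_most_free_gpu() {
    awk -F', *' 'NR == 1 || ($2 + 0) > best { best = $2 + 0; idx = $1 }
                 END { print idx }'
}

# Assumption: NVIDIA GPUs. The nvidia-smi index may differ from the
# OpenCL device ID used by Qrack, so verify against "benchmarks -h".
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits \
        | pick_most_free_gpu
fi
```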

The fourth line sets the size of the 'workbench' for Qrack, measured in 'QFT' or maximum qubit vector space.
Of all Qrack's optimisations, this is the most important one for the t_cc_xx benchmarks.
It prevents Qrack from flooding the system with a huge vector space allocation. In the example it is set to 32 qubits.
GPUs with up to 8 GB of vRAM might want to set this to 31 or even 30 qubits.
Again: more vRAM per GPU and more GPUs give Qrack more room to operate.
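
To get a feel for why one qubit more or less matters, here is a rough bash calculation of the size of a dense state vector: 2^n amplitudes times the bytes per amplitude. The 16 bytes per amplitude is an assumption (two FP64 components per complex number); the helper name statevector_gib is hypothetical, and the sketch does not model how Qrack actually splits pages or what overhead it adds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: size in GiB of a dense state vector of 2^qubits
# amplitudes. Integer arithmetic, so results below 1 GiB truncate to 0.
statevector_gib() {
    local qubits=$1 bytes_per_amp=$2
    echo $(( ((1 << qubits) * bytes_per_amp) >> 30 ))
}

# Assumption: 16 bytes per amplitude (complex number made of two FP64 values).
for qb in 30 31 32; do
    printf '%s qubits: %s GiB\n' "$qb" "$(statevector_gib "$qb" 16)"
done
```

Each extra qubit doubles the figure. Under these assumptions 32 qubits comes to 64 GiB, which would be 4 GiB for each of the 16 slots in the example if it were spread evenly.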

On the fifth line the action starts: first the time command, which measures the system time and runtime of the benchmarks invocation,
then the benchmarks command, followed by:

--proc-stabilizer-qpager                    > invoke the qpager stabiliser procedure, this helps performance and vector density
--layer-qunit-multi                         > qunit-multi because we're using multiple GPUs; use qunit when there is only one GPU
--timeout=2000000                           > timeout for the circuit run, measured in ms ( so it is set to 2000 seconds here )
--max-qubits=44                             > maximum amount of qubits to simulate
--devices=0,0,0,0,0,0,0,0,2,2,2,2,1,1,1,1   > the devices are enumerated again here; keep them the same as in line 2
--benchmark-depth=44                        > maximum random depth of the circuit
--single                                    > one run at max-qubits
test_dense_cc_nn                            > the name of the benchmark suite test, in this case the dense nearest neighbour
--samples=100                               > 100 samples is the default, used to ascertain the fidelity of the benchmark run as a percentage
--measure-output=/var/log/qrack/
"t_cc_dense_$(date +%F_%R_%S).log"          > measured values are logged to a file in /var/log/qrack
&                                           > run the process in the background, so the shell stays free for system diagnostic tools or tailing the output
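
Because the device list has to be identical in the export and in --devices, one way to avoid drift is to define it once and derive both from it. The following is a dry-run sketch in bash: the variable names DEVICES and TIMEOUT_MS and the function benchmark_args are hypothetical, the option values are copied from the example script above, and nothing is executed against Qrack.

```shell
#!/usr/bin/env bash
# Single source of truth for the device list (values from the example script).
DEVICES="0,0,0,0,0,0,0,0,2,2,2,2,1,1,1,1"
TIMEOUT_MS=2000000

export QRACK_QPAGER_DEVICES="$DEVICES"
export QRACK_OCL_DEFAULT_DEVICE=0
export QRACK_MAX_PAGING_QB=32

# Hypothetical helper: print the benchmarks arguments, one per line (dry run).
benchmark_args() {
    printf '%s\n' \
        --proc-stabilizer-qpager \
        --layer-qunit-multi \
        "--timeout=${TIMEOUT_MS}" \
        --max-qubits=44 \
        "--devices=${DEVICES}" \
        --benchmark-depth=44 \
        --single test_dense_cc_nn \
        --samples=100 \
        "--measure-output=/var/log/qrack/t_cc_dense_$(date +%F_%R_%S).log"
}

benchmark_args
echo "timeout: $((TIMEOUT_MS / 1000)) s"

# To run for real from /qrack128/_build/ (not executed in this sketch):
#   mapfile -t args < <(benchmark_args); time ./benchmarks "${args[@]}" &
```

Printing the arguments first makes it easy to check the timeout and device list before committing to a long run.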