Consider setting the following optimization flags (environment variables read by the OpenMP runtime) when training.
KMP_BLOCKTIME=0
KMP_AFFINITY=granularity=fine,verbose,compact,1,0
KMP_SETTINGS=1
OMP_NUM_THREADS=8
Also, run the following code after importing tf.
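import os

# Assumes TensorFlow has already been imported as `tf`.
# Thread settings must be applied before TensorFlow runs its first op.
if 'OMP_NUM_THREADS' in os.environ:
    NUM_THREADS = int(os.environ['OMP_NUM_THREADS'])
    # Use the same thread count between ops and within each op.
    tf.config.threading.set_inter_op_parallelism_threads(NUM_THREADS)
    tf.config.threading.set_intra_op_parallelism_threads(NUM_THREADS)
    print(f"Using {NUM_THREADS} threads")
else:
    print("Running with default configuration. Consider setting the optimization flags when starting Jupyter.")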
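The snippet above converts OMP_NUM_THREADS with a plain int(...), which raises ValueError if the variable holds something other than a single integer (OpenMP also accepts a comma-separated list such as 4,2 for nested parallelism). A minimal sketch of a more forgiving parser follows; the helper name parse_num_threads is hypothetical and not part of TensorFlow or of the original notes.

```python
import os


def parse_num_threads(value):
    """Return the first positive integer in an OMP_NUM_THREADS-style string.

    Returns None when the value is missing, empty, or not a positive integer.
    (Hypothetical helper for illustration only.)
    """
    if not value:
        return None
    # OpenMP allows a list such as "4,2"; only the outermost level is used here.
    first = value.split(",")[0].strip()
    try:
        n = int(first)
    except ValueError:
        return None
    return n if n > 0 else None


num_threads = parse_num_threads(os.environ.get("OMP_NUM_THREADS"))
if num_threads is None:
    print("OMP_NUM_THREADS is unset or invalid; keeping the default configuration.")
else:
    print(f"Would configure TensorFlow with {num_threads} threads")
```

The returned value could then be passed to the two tf.config.threading calls in place of NUM_THREADS.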
Examples:
- When starting JupyterLab, use:
  KMP_BLOCKTIME=0 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 KMP_SETTINGS=1 OMP_NUM_THREADS=8 jupyter-lab
- When starting a Python script, use:
  KMP_BLOCKTIME=0 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 KMP_SETTINGS=1 OMP_NUM_THREADS=8 python <script_to_run.py>
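If prefixing the launch command is inconvenient, the same variables can usually be set from Python instead, provided this happens before TensorFlow is imported, since the OpenMP runtime typically reads them when it is loaded. The sketch below is one way to do that under this assumption; the names CPU_TUNING_DEFAULTS and apply_cpu_tuning_defaults are hypothetical, and the values are simply the ones listed above.

```python
import os

# Values copied from the settings listed above; adjust OMP_NUM_THREADS to your machine.
CPU_TUNING_DEFAULTS = {
    "KMP_BLOCKTIME": "0",
    "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
    "KMP_SETTINGS": "1",
    "OMP_NUM_THREADS": "8",
}


def apply_cpu_tuning_defaults(env=os.environ):
    """Set each variable only if it is not already defined.

    Returns a dict of the variables that were newly set, so values passed
    on the command line always take precedence. (Hypothetical helper.)
    """
    applied = {}
    for name, value in CPU_TUNING_DEFAULTS.items():
        if name not in env:
            env[name] = value
            applied[name] = value
    return applied


applied = apply_cpu_tuning_defaults()
print(f"Applied defaults: {applied}")
# import tensorflow as tf  # import only after the variables are in place
```

In a notebook, this would go in the first cell, ahead of any cell that imports TensorFlow.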