Best way to combine Python multiprocessing with TensorFlow multithreading

For TensorFlow > 2.0 and Keras

1. No TensorFlow settings, environment type 1 (1 core or 2 cores)

import everything_here
import tensorflow as tf

import os
import time
# os.environ['MKL_NUM_THREADS'] = '2'           # 1
# os.environ['GOTO_NUM_THREADS'] = '2'          # 1
# os.environ['OMP_NUM_THREADS'] = '2'           # 1

## implement model below
....
print(time.time())

Single core OS

  • Running on 5 cores (1 main core, 4 created thread cores) -- not 100% CPU usage (around 30%) -- 10.4 seconds
    • Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
    • ==> TensorFlow's default number of parallel threads is 4

Multiple core OS

  • Running on 9 cores (1 main core, 8 created thread cores) -- not 100% CPU usage (around 30%) -- 11.3 seconds
    • Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
    • ==> TensorFlow's default number of parallel threads is 8

==>>> TensorFlow automatically tunes the number of threads it creates based on the OS settings
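To see what "OS settings" the defaults above are reacting to, it helps to print how many CPUs the machine has and how many this process may use. This is only a sketch using the standard library. It assumes, as the log lines above suggest, that the default pool size follows the number of visible cores. The helper name `visible_cpus` is made up for this example.

```python
import os

def visible_cpus():
    """Return (logical CPUs in the machine, CPUs this process may run on)."""
    total = os.cpu_count() or 1          # cpu_count() can return None
    if hasattr(os, "sched_getaffinity"):
        # Linux only: the set of cores the scheduler may place this process on.
        allowed = len(os.sched_getaffinity(0))
    else:
        allowed = total                  # no affinity API: assume all cores
    return total, allowed

total, allowed = visible_cpus()
print(f"logical CPUs: {total}, usable by this process: {allowed}")
```

If the two numbers differ, the process has already been restricted (for example by a container or by `taskset`), which may change the defaults reported in the log.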

2. Single-thread TensorFlow settings, environment type 1 (1 core or 2 cores)

import everything_here
import tensorflow as tf

import os
import time
# os.environ['MKL_NUM_THREADS'] = '1'           # 2  
# os.environ['GOTO_NUM_THREADS'] = '1'          # 2  
# os.environ['OMP_NUM_THREADS'] = '1'           # 2  

tf.config.threading.set_intra_op_parallelism_threads(1)     
tf.config.threading.set_inter_op_parallelism_threads(1)    

## implement model below
....
print(time.time())

Single core OS

  • Running on 2 cores (1 main core, 1 created thread core) -- not 100% CPU usage (around 80%) -- 10.34 seconds
    • XLA service 0x558ffd7d01d0 initialized for platform Host (this does not guarantee that XLA will be used)
    • ==> better running time and CPU usage

Multiple core OS

  • Running on 2 cores (1 main core, 1 created thread core) -- not 100% CPU usage (around 80%) -- 10.03 seconds
    • XLA service 0x558ffd7d01d0 initialized for platform Host (this does not guarantee that XLA will be used)
    • ==> better running time and CPU usage

==>>> If we don't use multiple threads in TensorFlow, it doesn't have to spend time creating threads and transferring data to different cores (each created thread is assigned to a different core)
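A possible way to package the single-thread setup is sketched below. It rests on one assumption: the `*_NUM_THREADS` variables are typically read when the native libraries load, so the safe choice is to set them before `import tensorflow`. The helper names `set_thread_env` and `configure_tf_threads` are hypothetical, not part of TensorFlow. Only the two `tf.config.threading` calls come from the snippets above.

```python
import os

THREAD_ENV_VARS = ("MKL_NUM_THREADS", "GOTO_NUM_THREADS", "OMP_NUM_THREADS")

def set_thread_env(n):
    """Export the native-library thread limits; call before importing tensorflow."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n)

def configure_tf_threads(n):
    """Apply the same limit to TensorFlow's own thread pools (hypothetical helper)."""
    import tensorflow as tf  # imported late so the env vars above are already set
    # Both calls have to run before TensorFlow executes its first op.
    tf.config.threading.set_intra_op_parallelism_threads(n)
    tf.config.threading.set_inter_op_parallelism_threads(n)
    return tf

set_thread_env(1)
```

In a real script, `tf = configure_tf_threads(1)` would follow immediately, before any model is built.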

3. Multiple-thread TensorFlow settings, environment type 1 (1 core or 2 cores)

import everything_here
import tensorflow as tf

import os
import time
# os.environ['MKL_NUM_THREADS'] = '1'       # 2          
# os.environ['GOTO_NUM_THREADS'] = '1'      # 2    
# os.environ['OMP_NUM_THREADS'] = '1'       # 2    

tf.config.threading.set_intra_op_parallelism_threads(2)     
tf.config.threading.set_inter_op_parallelism_threads(2)     

## implement model below
....
print(time.time())

Single core OS

Multiple core OS

  • Running on 3 cores (1 main core, 2 created thread cores) -- not 100% CPU usage (around 50%) -- 9.87 seconds
    • XLA service 0x558ffd7d01d0 initialized for platform Host (this does not guarantee that XLA will be used)
    • ==> this case is ok but not what we want.
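The running times and CPU-usage figures in these notes can be collected in a repeatable way with a small timing helper. The sketch below is an assumption about how one might measure them, not the method used for the numbers above. `measure` and `busy` are illustrative names, and `busy` stands in for the model code.

```python
import time

def measure(fn, *args, **kwargs):
    """Run fn once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # user + system CPU time of the whole process
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu

def busy(n):
    # Stand-in workload; replace with the model's fit/predict call.
    return sum(i * i for i in range(n))

result, wall, cpu = measure(busy, 200_000)
print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s  cpu/wall: {cpu / wall:.0%}")
```

The `cpu/wall` ratio is summed over all threads of the process, so it can exceed 100% and is not necessarily the same number a per-core monitor shows.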

4. Single- or multiple-thread TensorFlow settings, environment type 2 (affinity 1)

import everything_here
import tensorflow as tf

import os
import time
os.sched_setaffinity(0, {1})

tf.config.threading.set_intra_op_parallelism_threads(2)     # 1 
tf.config.threading.set_inter_op_parallelism_threads(2)     # 1

## implement model below
....
print(time.time())

Single thread tensorflow

  • Running on 1 core (1 main core) -- 100% CPU usage -- 8.8 seconds
    • The best result so far, and also what we want, because later we will combine this with Python multiprocessing

Multiple threads tensorflow

  • Running on 3 cores (1 main core, 2 created thread cores) -- 50% CPU usage -- 11.4 seconds
    • Almost the worst result, probably because TensorFlow still spends time creating 2 threads while the main script is restricted to only 1 core.
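The snippet in this section hard-codes core 1, which fails on a machine where that core is not available to the process. A slightly more defensive version is sketched below. It is Linux-only, because `os.sched_setaffinity` and `os.sched_getaffinity` do not exist on other platforms, and the function name `pin_current_process` is made up for this example.

```python
import os

def pin_current_process(core):
    """Restrict the calling process to one core and return the resulting mask."""
    allowed = os.sched_getaffinity(0)
    if core not in allowed:
        raise ValueError(f"core {core} is not available; choose from {sorted(allowed)}")
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)

# Pick a core that is known to exist instead of hard-coding 1.
core = min(os.sched_getaffinity(0))
mask = pin_current_process(core)
print(f"pinned to core(s): {sorted(mask)}")
```

Pinning early matters: on Linux, threads started later inherit the mask of the thread that creates them, so this call belongs before any model code runs.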

Combining Python multiprocessing with TensorFlow

  • Assume I have 10 Python scripts (10 files), each a separate task that uses Keras. What is the best way to run all 10?
  1. Run the 10 files in 10 different screens?

    • Too much manual work (creating screens, activating the environment, handling TensorFlow multithreading, ...)
  2. Merge the 10 files into a single file, let Python multiprocessing handle the 10 tasks, and run it in a single screen?

    • All in one: we already know how to handle TensorFlow, so we only have to take care of Python multiprocessing and the assignment of core affinity.
    • Each task runs on a single core and TensorFlow's threads execute inside that core, so each task uses a full 100% of its CPU ===> best performance.
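Option 2 could look roughly like the sketch below: one process per task, each pinned to its own core before doing any work. Several things here are assumptions rather than part of the notes above: the names `run_task` and `run_all`, the use of the `fork` start method (Linux only), the round-robin core assignment, and the dummy workload standing in for the Keras model. The commented lines show where the TensorFlow thread settings would go.

```python
import multiprocessing as mp
import os

def run_task(task_id, core, results):
    # Pin this worker first, so threads created later inherit the mask.
    os.sched_setaffinity(0, {core})
    # In a real script the Keras model would be built and trained here, e.g.
    #   import tensorflow as tf
    #   tf.config.threading.set_intra_op_parallelism_threads(1)
    #   tf.config.threading.set_inter_op_parallelism_threads(1)
    total = sum(i * i for i in range(200_000))   # stand-in workload
    results.put((task_id, core, total))

def run_all(n_tasks):
    ctx = mp.get_context("fork")                 # assumption: Linux
    cores = sorted(os.sched_getaffinity(0))      # cores this process may use
    results = ctx.Queue()
    procs = []
    for task_id in range(n_tasks):
        core = cores[task_id % len(cores)]       # round-robin over the cores
        proc = ctx.Process(target=run_task, args=(task_id, core, results))
        proc.start()
        procs.append(proc)
    finished = [results.get() for _ in procs]    # drain the queue before joining
    for proc in procs:
        proc.join()
    return sorted(finished)

finished = run_all(10)
for task_id, core, _ in finished:
    print(f"task {task_id} ran on core {core}")
```

With fewer cores than tasks, several tasks share a core in this sketch. Starting only as many processes as there are cores would keep the one-task-per-core behaviour described above.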