Python: Write to a file from multiple threads

I recently came across the need to spawn multiple threads, each of which needs to write to the same file. Since the file will experience contention from multiple threads, we need to guarantee thread-safety.

NOTE: The following examples target Python 3.x. To run them under Python 2.7, replace threading.get_ident() with thread.get_ident(), which requires an additional import thread (the threading import is still needed for Lock and Thread).
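For reference, a minimal sketch of the two calls side by side (the Python 2 form is shown only for comparison; everything below assumes Python 3, and the filename is just a placeholder):

# get_ident_example.py
import threading

# Python 3.x: identifier of the current thread
print(threading.get_ident())

# Python 2.7 equivalent (commented out; uses the old "thread" module):
# import thread
# print(thread.get_ident())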

  1. (The following example takes a very long time.) It creates 200 threads, each of which busy-waits until the global lock is available, then acquires it and appends its thread identifier to the file.
# threading_lock.py
import threading

global_lock = threading.Lock()

def write_to_file():
    # Busy-wait until the lock appears to be free, then acquire it
    while global_lock.locked():
        continue

    global_lock.acquire()

    # "with" closes the file automatically; no explicit close() is needed
    with open("thread_writes", "a+") as file:
        file.write(str(threading.get_ident()))
        file.write("\n")

    global_lock.release()

# Create 200 threads, invoke write_to_file() in each of them,
# and wait for all of them to finish
threads = []
for i in range(1, 201):
    t = threading.Thread(target=write_to_file)
    threads.append(t)
    t.start()
for thread in threads:
    thread.join()

As mentioned earlier, the above program takes an unacceptably long time, roughly 125 seconds, largely because every thread burns CPU busy-waiting on the lock:

python threading_lock.py  125.56s user 0.34s system 103% cpu 2:01.57 total

(Addendum: @agiletelescope points out that the following minor change, sleeping briefly instead of spinning while the lock is held, drastically reduces the cost of the busy-wait; it also requires an import time at the top of the file. A full sketch follows below.

...
while global_lock.locked():
    time.sleep(0.01)

)
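For completeness, here is a sketch of the first example with that change folded in. The filename threading_lock_sleep.py is just a placeholder, and timings will of course vary by machine:

# threading_lock_sleep.py
import threading
import time

global_lock = threading.Lock()

def write_to_file():
    # Sleep briefly instead of spinning while the lock is held
    while global_lock.locked():
        time.sleep(0.01)

    global_lock.acquire()
    with open("thread_writes", "a+") as file:
        file.write(str(threading.get_ident()))
        file.write("\n")
    global_lock.release()

threads = []
for i in range(1, 201):
    t = threading.Thread(target=write_to_file)
    threads.append(t)
    t.start()
for thread in threads:
    thread.join()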

  2. A simple modification is to store what each thread wants to write in an in-memory data structure such as a Python list, and to write the contents of the list to the file once all of the threads have joined.
# threading_lock_2.py
import threading

# Global lock
global_lock = threading.Lock()
file_contents = []
def write_to_file():
    # Busy-wait until the lock appears to be free, then acquire it
    while global_lock.locked():
        continue

    global_lock.acquire()
    file_contents.append(threading.get_ident())
    global_lock.release()

# Create 200 threads, invoke write_to_file() in each of them,
# and wait for all of them to finish
threads = []
for i in range(1, 201):
    t = threading.Thread(target=write_to_file)
    threads.append(t)
    t.start()
for thread in threads:
    thread.join()

# "with" closes the file automatically; no explicit close() is needed
with open("thread_writes", "a+") as file:
    file.write('\n'.join([str(content) for content in file_contents]))

The above program takes a significantly shorter, and almost negligible, time:

python threading_lock_2.py  0.04s user 0.00s system 76% cpu 0.052 total

With thread count = 2000:

python threading_lock_2.py  0.10s user 0.06s system 77% cpu 0.206 total

With thread count = 20000:

python threading_lock_2.py  0.10s user 0.06s system 77% cpu 0.206 total
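
As an aside, the busy-wait loop in both examples is not strictly needed: Lock.acquire() already blocks until the lock is free, and using the lock as a context manager (with global_lock:) both blocks and guarantees release even if an exception is raised. A minimal sketch of the second example rewritten this way (threading_lock_3.py is a hypothetical filename):

# threading_lock_3.py
import threading

global_lock = threading.Lock()
file_contents = []

def write_to_file():
    # "with" acquires the lock (blocking until it is free) and
    # releases it automatically, even if an exception is raised
    with global_lock:
        file_contents.append(threading.get_ident())

threads = []
for i in range(1, 201):
    t = threading.Thread(target=write_to_file)
    threads.append(t)
    t.start()
for thread in threads:
    thread.join()

with open("thread_writes", "a+") as file:
    file.write('\n'.join([str(content) for content in file_contents]))

This keeps the code shorter and avoids spinning entirely.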