Skip to content

Instantly share code, notes, and snippets.

@tylerneylon
Last active February 5, 2024 09:29
Show Gist options
  • Star 29 You must be signed in to star a gist
  • Fork 5 You must be signed in to fork a gist
  • Save tylerneylon/a7ff6017b7a1f9a506cf75aa23eacfd6 to your computer and use it in GitHub Desktop.
Save tylerneylon/a7ff6017b7a1f9a506cf75aa23eacfd6 to your computer and use it in GitHub Desktop.
A simple read-write lock implementation in Python.
# -*- coding: utf-8 -*-
""" rwlock.py
A class to implement read-write locks on top of the standard threading
library.
This is implemented with two mutexes (threading.Lock instances) as per this
wikipedia pseudocode:
https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock#Using_two_mutexes
Code written by Tyler Neylon at Unbox Research.
This file is public domain.
"""
# _______________________________________________________________________
# Imports
from contextlib import contextmanager
from threading import Lock
# _______________________________________________________________________
# Class
class RWLock(object):
""" RWLock class; this is meant to allow an object to be read from by
multiple threads, but only written to by a single thread at a time. See:
https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock
Usage:
from rwlock import RWLock
my_obj_rwlock = RWLock()
# When reading from my_obj:
with my_obj_rwlock.r_locked():
do_read_only_things_with(my_obj)
# When writing to my_obj:
with my_obj_rwlock.w_locked():
mutate(my_obj)
"""
def __init__(self):
self.w_lock = Lock()
self.num_r_lock = Lock()
self.num_r = 0
# ___________________________________________________________________
# Reading methods.
def r_acquire(self):
self.num_r_lock.acquire()
self.num_r += 1
if self.num_r == 1:
self.w_lock.acquire()
self.num_r_lock.release()
def r_release(self):
assert self.num_r > 0
self.num_r_lock.acquire()
self.num_r -= 1
if self.num_r == 0:
self.w_lock.release()
self.num_r_lock.release()
@contextmanager
def r_locked(self):
""" This method is designed to be used via the `with` statement. """
try:
self.r_acquire()
yield
finally:
self.r_release()
# ___________________________________________________________________
# Writing methods.
def w_acquire(self):
self.w_lock.acquire()
def w_release(self):
self.w_lock.release()
@contextmanager
def w_locked(self):
""" This method is designed to be used via the `with` statement. """
try:
self.w_acquire()
yield
finally:
self.w_release()
@Xztty
Copy link

Xztty commented Sep 16, 2019

must add try... finally.. or it may be dead lock when exception happened!

try:
        self.w_acquire()
        yield
finally:
        self.w_release()

@tylerneylon
Copy link
Author

Oh, that's a good point.

@tylerneylon
Copy link
Author

Updated.

@jonathan-daniel
Copy link

the try catch is needed because its used with yield?

@nickyfoto
Copy link

I have a simply script read and write a global variable.

import threading
from rwlock import RWLock
marker = RWLock()

WEEKDAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
today = 0

def calendar_reader(id_number):
    global today
    name = 'Reader-' + str(id_number)
    while today < len(WEEKDAYS)-1:
        # print(today)
        with marker.r_locked():
            print(name, 'sees that today is', WEEKDAYS[today])

def calendar_writer(id_number):
    
    global today
    name = 'Writer-' + str(id_number)
    while today < len(WEEKDAYS)-1:
        with marker.w_locked():
            today = (today + 1) % 7
            print(name, 'updated date to ', WEEKDAYS[today])
        

if __name__ == '__main__':
    for i in range(10):
        threading.Thread(target=calendar_reader, args=(i,)).start()
    for i in range(2):
        threading.Thread(target=calendar_writer, args=(i,)).start()

It works on the builtin Lock and readerwriterlock 1.0.4, but to make it work using your script, I have to uncomment print(today) on line 12. I didn't figure out why.

@tylerneylon
Copy link
Author

tylerneylon commented Dec 11, 2019

Hi @nickyfoto — this is actually correct behavior. What's happening is that the read locks are starving the write threads, so today never gets a chance to change. If you have both readers and writers, then it's up to you to ensure that at some point all of the readers will stop reading in order to allow the writers to write. A simple way to do this is to only have a single reader thread that regularly releases the read lock. Another option is to somehow fence the readers so that every once in a while they will all simultaneously let go of the read lock.

The reason this same code may work with other libraries is that their locking stage may be slower than this code. In that case it's less likely to starve the write lock. However, that does not make the other libraries better — the opposite, those libraries are actually slower, and even for those slow libraries it's bad practice to count on them being slow as a means to ensure your write threads are not starved.

The reason it terminates when you uncomment the print() line is that the print() line is slow enough that it dramatically increases the probability that at some point all of the read threads will not be holding on to the read lock.

@nickyfoto
Copy link

Hi @tylerneylon , thank you for the clear explanation. I learned a lot.

@LucasKaur
Copy link

LucasKaur commented Apr 8, 2020

I wonder about other options in realizing this task. I want to create a library for https://eduzaurus.com/free-essay-samples/sociology/ and I wonder how to do it. Thanks for the idea.

@IceflowRE
Copy link

You should make sure that https://gist.github.com/tylerneylon/a7ff6017b7a1f9a506cf75aa23eacfd6#file-rwlock-py-L87 does not get released if a read lock was acquired.

@tylerneylon
Copy link
Author

Hi @IceflowRE — If a read lock is active, then the write lock cannot be acquired until that read lock is complete. So w_release() should not be called unless there are zero readers.

You might be saying that incorrect code could call w_release() when a read lock is active. In that case, it is incorrect usage of this class. I am not trying to protect against bad usage of the class — only to support correct usage of it. The same philosophy is true of many concurrency tools.

@IceflowRE
Copy link

Hi @IceflowRE — If a read lock is active, then the write lock cannot be acquired until that read lock is complete. So w_release() should not be called unless there are zero readers.

You might be saying that incorrect code could call w_release() when a read lock is active. In that case, it is incorrect usage of this class. I am not trying to protect against bad usage of the class — only to support correct usage of it. The same philosophy is true of many concurrency tools.

Ah ok thanks :) because there are other implementation which prevents those.

@nidhimad
Copy link

Hi, in the member functions 'r_acquire' and 'r_release', a different reader thread can end up releasing the write lock than the one that acquired it, how does that work?

@tylerneylon
Copy link
Author

Hi @nidhimad , there is not really an ownership between readers and the write lock. Instead, the write lock is locked when either (there is a writer, and there can only be one writer) or (there are any readers, and there can be many simultaneous readers).

For example, consider this sequence of events for readers A, B, C: [A acquires] [B acquires] [A releases] [C acquires] [B releases] [C releases]. This has interesting overlap between the readers, but we don't actually care about the order -- we only care that the write lock is held until all readers are done. So this is the sequence of events in the example:

[start]
    num_r = 0
    w_lock is unlocked
[A acquires]
    num_r = 1
    w_lock is locked
[B acquires]
    num_r = 2
[A releases]
    num_r = 1
[C acquires]
    num_r = 2
[B releases]
    num_r = 1
[C releases]
    num_r = 0
    w_lock is unlocked

@chungyang
Copy link

chungyang commented Jan 14, 2022

This implementation is not safe at all in my opinion. Someone who uses this read write lock can call w_release to release the write lock that's currently held by another thread. Since write lock should only be released by the thread holding the lock, there should be a variable holding the current thread ID and w_release needs to check whether calling thread is the same before actually releasing it.

Edit:
I see a previous comment that says the implementation is not trying to protect against incorrect usage, in that case, nvm

@nidhimad
Copy link

@tylerneylon, I think your implementation is based on the assumption that a mutex acquired by one thread can be released by another.
Please look at the first implementation in the below link, it is similar to yours:
https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock

@tylerneylon
Copy link
Author

@nidhimad, you are correct that this code may release a lock from a different thread than the acquiring thread. This is supported by Python's threading library. See: https://docs.python.org/3/library/threading.html#lock-objects

Specifically, the documentation for Lock.release() says "Release a lock. This can be called from any thread, not only the thread which has acquired the lock." In other words, this implementation will not have a problem due to using a lock this way.

You are also correct that this implementation is like one listed on Wikipedia. This one may be useful for anyone who wants an already-written Python implementation. :)

@Nibbetts
Copy link

Nibbetts commented Jun 27, 2022

So if I understand correctly, as long as any readers are reading, more can be added on even if there is a writer waiting. Perhaps you could avoid this potentially infinite wait by having w_acquire acquire num_r_lock, and w_release release it?

If so, then I believe it becomes more similar to this other solution: https://www.oreilly.com/library/view/python-cookbook/0596001673/ch06s04.html which, as soon as a writer starts waiting, causes any new readers to have to wait (Edit: I figured out this is not correct. It releases the lock while waiting, so other new readers could still pick it up, so this example is also not the behavior I want). With this behavior of new readers having to wait, it may be slower in some cases, but is much more fair to the writers so it is faster in other cases, and may be less likely to cause your writer thread to hang if readers are consistently run.

However, I'm not sure I quite grasp whether this increases or decreases the risk of deadlock, though I know it depends on the situation. Anyone have any insights there?

Edit: I believe I was correct, that for what I'm looking for, which allows the writer to block new readers until it has had its own turn, can be done by having the reader incrementer lock also be held by the writer upon its turn. Of course, this could create deadlock if readers and writers end up waiting on each other, but that is also always a risk with the other given solutions.

Edit 2: However, I haven't been able to test that, because my situation apparently has a reader requesting to read multiple times in a row before releasing the writer lock, and I deadlock in between recursion levels if I try to have the writer grab the reader lock. Or it just deadlocks and I haven't figured out why. For now, the given solution is good enough, till I learn more.

@tylerneylon
Copy link
Author

Hi @Nibbetts , thanks for the link to the alternative approach, which looks useful! I'll call that other approach a queued lock and this one (the gist above) queueless. (I just made up those names, but they make sense to me.)

When do we want a write request to take precedence over later read requests? (ie use a queued lock vs a queueless lock?)

The advantage of a queueless lock is that readers don't have to wait for each other. The advantage of a queued lock is that writers won't be starved.

Consider the situation where we have a number of incoming readers, and each reader and each writer holds the lock for 1 full second (a long time for a cpu). Suppose we receive requests very quickly in this order, R for read, W for write: R R R W R R R, and then nothing for a while.

  • If we have a queueless lock, then all 6 reads happen in about 1 second, since they are simultaneous; then the write occurs. The average read wait is 0 seconds, and the total time is 2 seconds.
  • If we have a queued lock, then the first 3 reads happen together in a concurrent second, then the write occurs, then the next 3 reads. The average read wait is 1 second (avg of 0,0,0,2,2,2), and the total time is 3 seconds.
  • Note that the writer has to wait about 1 second in both cases. I'm assuming reads don't slow each other down (which, if they do, changes the timing but also affects both kinds of locks).

If we wanted, we could look at specific R/W orderings like R W*n R*m for large values of n and m and produce more pronounced differences.

All of the above is to say that each lock type may be better to use depending on your use case. My take on how to choose between them is:

  • What are the possible time intervals between being in a zero-reader state?
  • If those time intervals are too long between writes, then use a queued lock; otherwise go queueless.

@tylerneylon
Copy link
Author

Just another note: In general, it would be nice if a low-level Lock object had the ability to shout, "Hey! I've been waiting 300 years!" if it was waiting a while. I'd use that while building a system to help gain visibility into any bad behavior, which could help find deadlock cases, or in general to make good decisions like that between a queued and queueless lock.

@Eboubaker
Copy link

A forked version with support for with blocks https://gist.github.com/Eboubaker/6a0b807788088a764b2a4c156fda0e4b

@tylerneylon
Copy link
Author

Hi @Eboubaker ! As a friendly fyi, this gist does include support for with blocks (this is done via contextmanager).

Example usage:

from rwlock import RWLock

rwlock = RWLock()  # This is a lock for my_obj.

# When reading from my_obj:
with rwlock.r_locked():
    do_read_only_things_with(my_obj)

# When writing to my_obj:
with rwlock.w_locked():
    mutate(my_obj)

@Eboubaker
Copy link

Eboubaker commented Oct 29, 2022

def update_things():
   with rwlock.w_locked():
    #  update state

with rwlock.r_locked():
    # read state
    if some_confition:
       update_things()

this example will hang in your implementation if the same thread was doing the work,
you have to move lock from update_things, but this method is being called from multiple places, and you have to keep adding with rwlock.w_locked(): before each call to the method, code will be less readable

@tylerneylon
Copy link
Author

Thanks for explaining. You've designed your lock to be reentrant, which is cool. If I have time I may add this feature here, but for now it is not a reentrant lock.

@Eboubaker
Copy link

Eboubaker commented Oct 29, 2022

@tylerneylon, I think your implementation is based on the assumption that a mutex acquired by one thread can be released by another. Please look at the first implementation in the below link, it is similar to yours: https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock

I believe my implementation does not have this problem I have a list of thread ids that are reading, and my release method will not do anything if the current thread did not call acquire_read before

@Nibbetts So if I understand correctly, as long as any readers are reading, more can be added on even if there is a writer waiting. Perhaps you could avoid this potentially infinite wait by having w_acquire acquire num_r_lock, and w_release release it?

in my implementation i used a condition lock for the read_ready_state any writer or reader will need to acquire the state before they can do read or write requests, so there will be a queue which is managed by the languge and will give the lock by the order of callers. so when the writer requests the state other read requests will have to be after when the writer has released the state

@littlefattiger
Copy link

use time.sleep(1) to give them a chance to gain the lock

import threading
import time
# from rwlock import RWLock
marker = RWLock()

WEEKDAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
today = 0

def calendar_reader(id_number):
    global today
    name = 'Reader-' + str(id_number)
    while today < len(WEEKDAYS)-1:
        # print(today)
        with marker.r_locked():
            print(name, 'sees that today is', WEEKDAYS[today])
        time.sleep(1)

def calendar_writer(id_number):
    
    global today
    name = 'Writer-' + str(id_number)
    while today < len(WEEKDAYS)-1:
        with marker.w_locked():
            today = (today + 1) % 7
            print(name, 'updated date to ', WEEKDAYS[today])
        time.sleep(1)
        

if __name__ == '__main__':
    for i in range(10):
        threading.Thread(target=calendar_reader, args=(i,)).start()
    for i in range(2):
        threading.Thread(target=calendar_writer, args=(i,)).start()

result:

Reader-0 sees that today is Sunday
Reader-1 sees that today is Sunday
Reader-2 sees that today is Sunday
Reader-3 sees that today is Sunday
Reader-4 sees that today is Sunday
Reader-5 sees that today is Sunday
Reader-6 sees that today is Sunday
Reader-7 sees that today is Sunday
Reader-8 sees that today is Sunday
Reader-9 sees that today is Sunday
Writer-0 updated date to  Monday
Writer-1 updated date to  Tuesday
Reader-4 sees that today is Tuesday
Reader-6 sees that today is Tuesday
Reader-0 sees that today is Tuesday
Reader-7 sees that today is Tuesday
Reader-3 sees that today is Tuesday
Reader-5 sees that today is Tuesday
Reader-2 sees that today is Tuesday
Reader-8 sees that today is Tuesday
Reader-1 sees that today is Tuesday
Reader-9 sees that today is Tuesday
Writer-0 updated date to  Wednesday
Writer-1 updated date to  Thursday
Reader-9 sees that today is Thursday
Reader-4 sees that today is Thursday
Reader-2 sees that today is Thursday
Reader-7 sees that today is Thursday
Reader-1 sees that today is Thursday
Reader-0 sees that today is Thursday
Reader-5 sees that today is Thursday
Reader-6 sees that today is Thursday
Reader-3 sees that today is Thursday
Reader-8 sees that today is Thursday
Writer-1 updated date to  Friday
Writer-0 updated date to  Saturday

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment