jbush001/semaphore.md

## semaphore.md

### Problem Description

Currently, the primitive for inter-thread and inter-core synchronization is a spinlock, which this processor supports with the sync_load and sync_store instructions. A spinlock requires busy waiting: the processor checks the lock variable in a tight loop until it changes.

On a multithreaded processor, this steals cycles that other threads could use. In the worst case, multiple threads may be waiting for a lock held by another thread on the same core, which slows down the lock holder and increases contention.
### Proposed solution

Create new synchronization instructions that suspend the thread until the lock is released:

    semacquire 12(s0)
    semrelease 12(s0)

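The semacquire instruction decrements a memory location. If the old value was greater than zero (for example, 1), the acquire succeeds and execution proceeds. If the old value was zero or less, the acquire fails (a value of 0 becomes 0xffffffff) and the current thread is suspended until another thread executes semrelease on the same location, which increments it.
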
The advantage of these instructions is that they avoid busy waiting.
Semaphore values are always 32 bits wide.
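
As a rough behavioral sketch (not the hardware implementation), the value transitions from the Details section (decrement on acquire, increment on release, acquire succeeds only when the old value is greater than zero) can be modeled with 32-bit two's complement arithmetic. The function names are hypothetical, and the semrelease wake condition (old value negative, meaning threads may be blocked) is an assumption, since this document only says the flag is based on the old value.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def semacquire_update(old: int) -> Tuple[int, bool]:
    """Return (new value, acquired). The value is decremented either way."""
    new = (old - 1) & MASK32
    return new, to_signed32(old) > 0


def semrelease_update(old: int) -> Tuple[int, bool]:
    """Return (new value, wake flag). Assumption: wake if the old value was negative."""
    new = (old + 1) & MASK32
    return new, to_signed32(old) < 0
```

For example, semacquire_update(1) gives (0, True), while semacquire_update(0) gives (0xFFFFFFFF, False): the word wraps to all 1s and the thread would be suspended.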
### Details

This implementation adds two new L2 request message types:

- L2REQ_SEMACQUIRE
- L2REQ_SEMRELEASE

There are corresponding response types:

- L2RSP_SEMACQUIRE
- L2RSP_SEMRELEASE
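
One way to picture the new message types is a small Python model of the request and response packets. The enum member names come from this document; the L2Response fields (address, thread_idx, flag), the make_response helper, and the one-to-one request/response pairing are illustrative assumptions, not the actual packet format.

```python
from dataclasses import dataclass
from enum import Enum, auto


class L2RequestType(Enum):
    # Only the two new request types are modeled; existing ones are omitted.
    L2REQ_SEMACQUIRE = auto()
    L2REQ_SEMRELEASE = auto()


class L2ResponseType(Enum):
    L2RSP_SEMACQUIRE = auto()
    L2RSP_SEMRELEASE = auto()


# Assumed pairing: each new request produces exactly one response type.
RESPONSE_FOR = {
    L2RequestType.L2REQ_SEMACQUIRE: L2ResponseType.L2RSP_SEMACQUIRE,
    L2RequestType.L2REQ_SEMRELEASE: L2ResponseType.L2RSP_SEMRELEASE,
}


@dataclass(frozen=True)
class L2Response:
    packet_type: L2ResponseType
    address: int     # address of the 32-bit semaphore word
    thread_idx: int  # hypothetical field: thread that issued the request
    flag: bool       # SEMACQUIRE: acquire succeeded; SEMRELEASE: wake waiters


def make_response(req: L2RequestType, address: int, thread_idx: int,
                  flag: bool) -> L2Response:
    """Build the response that matches a request type (hypothetical helper)."""
    return L2Response(RESPONSE_FOR[req], address, thread_idx, flag)
```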

These instructions are similar to the sync_load and sync_store instructions and will reuse as much of their logic as possible. Like the synchronized memory instructions, they always generate L2 transactions. The semacquire operation must track whether the instruction is being issued for the first or second time. This will reuse the existing logic for synchronized loads: in dcache_data_stage, the dd_sync_load_pending signal tracks this. Perhaps rename the signal to indicate that it can stand for either instruction.

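
The first-issue/second-issue tracking can be sketched as a per-thread flag in the spirit of dd_sync_load_pending. The SemAcquireTracker class and its methods are hypothetical. The sketch assumes the thread is rolled back and suspended on the first issue, retires the instruction when it is re-issued after a successful response, and starts over as a first issue when it is woken to retry after a failed acquire.

```python
from typing import List, Tuple


class SemAcquireTracker:
    """Hypothetical per-thread 'request already sent' bits."""

    def __init__(self, num_threads: int) -> None:
        self.pending: List[bool] = [False] * num_threads
        self.sent: List[Tuple[int, int]] = []  # (thread, address) sent to the L2

    def issue(self, thread: int, address: int) -> bool:
        """Return True if the instruction retires, False if the thread suspends."""
        if not self.pending[thread]:
            # First issue: send L2REQ_SEMACQUIRE, remember it, and roll back.
            self.sent.append((thread, address))
            self.pending[thread] = True
            return False
        # Second issue (after a successful response resumed the thread).
        self.pending[thread] = False
        return True

    def retry(self, thread: int) -> None:
        """Woken by a semrelease after a failed acquire: next issue is a first issue."""
        self.pending[thread] = False
```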
The L2 pipeline will be modified in the l2_cache_update stage. For the semacquire operation, it will emulate a store value of all 1s (0xffffffff, which is -1 in two's complement) and add it to the old value, decrementing it. It will then write the new value back to the cache line. If the old value was less than or equal to zero, it will send the L2RSP_SEMACQUIRE message with a flag indicating that the acquire did not succeed. At this point, the calling thread is already blocked, and it stays blocked. If the old value was greater than zero, the flag indicates success and causes the thread to resume (this logic is in l1_l2_interface).

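
A sketch of the l2_cache_update change for semacquire, operating on one cache line held as a bytearray. The 64-byte line size, the little-endian lane layout, and the function name l2_semacquire_update are assumptions made for illustration.

```python
CACHE_LINE_BYTES = 64  # assumed: 16 lanes of 32 bits per line
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def l2_semacquire_update(line: bytearray, address: int) -> bool:
    """Decrement the lane selected by address in place; return True if acquired."""
    if address % 4:
        raise ValueError("semaphore address must be 4-byte aligned")
    offset = address % CACHE_LINE_BYTES  # byte offset of the lane in the line
    old = int.from_bytes(line[offset:offset + 4], "little")
    # Emulated store value of all 1s: adding 0xffffffff subtracts one mod 2**32.
    new = (old + MASK32) & MASK32
    line[offset:offset + 4] = new.to_bytes(4, "little")  # write back
    # Flag for L2RSP_SEMACQUIRE: success only if the old value was positive.
    return to_signed32(old) > 0
```

With the lane at offset 12 holding 1, a first call returns True and leaves 0; a second call returns False and leaves 0xffffffff, which the requesting thread would see as a failed acquire.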
The semrelease logic is similar to semacquire. It will increment the value in the selected lane and set a flag based on the old value, as semacquire does. However, when l1_l2_interface receives an L2RSP_SEMRELEASE message with the flag set, it will check whether the address matches the address any threads are blocked on and, if so, wake them. All woken threads then retry the lock, and one of them acquires it. This does not guarantee progress or ordering, and it has more overhead than waking a single thread.
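
The wake-up behavior in l1_l2_interface might be modeled as follows. The class, its methods, and the default of four threads per core are hypothetical; the model only tracks which address each thread is blocked on and which threads a flagged L2RSP_SEMRELEASE would wake so they can retry.

```python
from typing import List, Optional


class L1L2InterfaceModel:
    """Hypothetical model of per-thread semaphore blocking in l1_l2_interface."""

    def __init__(self, num_threads: int = 4) -> None:
        # blocked_on[t] is the semaphore address thread t waits on, or None.
        self.blocked_on: List[Optional[int]] = [None] * num_threads

    def block(self, thread: int, address: int) -> None:
        """semacquire issued: the thread blocks before the response arrives."""
        self.blocked_on[thread] = address

    def on_semacquire_response(self, thread: int, success: bool) -> bool:
        """Resume the thread on success; on failure it stays blocked."""
        if success:
            self.blocked_on[thread] = None
        return success

    def on_semrelease_response(self, address: int, flag: bool) -> List[int]:
        """If flagged, wake every thread blocked on address; each must retry."""
        if not flag:
            return []
        woken = [t for t, a in enumerate(self.blocked_on) if a == address]
        for t in woken:
            self.blocked_on[t] = None
        return woken
```

Waking every matching thread avoids keeping a per-address wait queue in hardware, at the cost of the extra retries and lack of ordering described above.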

### Testing

- Functional tests
- Stress tests
- Performance tests