@jbush001
Last active December 6, 2018 06:34

Problem Description

Currently, the primitive for inter-thread/core synchronization is a spinlock, which is supported on this processor using the sync_load and sync_store instructions. This requires busy waiting, as the processor checks the variable in a tight loop until it changes.

On a multithreaded processor, this steals cycles that could be used by other threads. In the worst case, multiple threads may be waiting for a lock held by another thread on the same core, which slows the lock holder down and increases contention.

Proposed Solution

Create new synchronization instructions that will suspend the thread until the lock is released.

semacquire 12(s0)
semrelease 12(s0)

The semacquire instruction will read a memory location. If the value is 0, it will set it to 0xffffffff and proceed with execution. If it is 1, it will suspend the current thread until another thread executes semrelease on the same location.

The advantage of these instructions is that they avoid busy waiting.

Semaphore values are always 32 bits wide.
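
As a rough behavioral model of the two instructions described above (not the hardware implementation), the following Python sketch treats a semaphore as a 32-bit word in a dictionary. The function names and the `memory` dictionary are hypothetical. Treating every nonzero value as "held" is an assumption, since the text only names the values 0 and 1. Release is modeled as a 32-bit increment, as the Details section describes.

```python
MASK32 = 0xFFFFFFFF  # semaphore values are 32 bits wide
HELD = 0xFFFFFFFF    # value written by a successful semacquire

def semacquire(memory, addr):
    """Return True if the caller proceeds, False if it would suspend."""
    if memory.get(addr, 0) == 0:
        memory[addr] = HELD
        return True
    # Assumption: any nonzero value means the semaphore is held.
    return False

def semrelease(memory, addr):
    """Increment the semaphore, wrapping at 32 bits."""
    memory[addr] = (memory.get(addr, 0) + 1) & MASK32

memory = {}
first = semacquire(memory, 12)    # semaphore was 0: acquired
second = semacquire(memory, 12)   # already held: caller would suspend
semrelease(memory, 12)            # 0xffffffff + 1 wraps to 0
third = semacquire(memory, 12)    # free again: acquired
```

In this model a suspended caller simply sees `False`; in the proposal the hardware would park the thread rather than return to it.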

Details

This implementation adds two new L2 message types:

L2REQ_SEMACQUIRE
L2REQ_SEMRELEASE

There are associated response types:

L2RSP_SEMACQUIRE
L2RSP_SEMRELEASE

These instructions are similar to sync_load and sync_store and will reuse their logic as much as possible. Like synchronized memory instructions, they always generate L2 transactions. The semacquire operation must track whether the instruction is being issued for the first or the second time. This will reuse the logic that already exists for synchronized loads: in dcache_data_stage, dd_sync_load_pending tracks this. The signal could be renamed to indicate that it covers either kind of operation.
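
The first-issue/second-issue tracking can be pictured as one pending bit per thread. The sketch below is a Python model under assumptions, not the dcache_data_stage logic: the class name, the `pending` list (standing in for the role of dd_sync_load_pending), and the returned strings are all hypothetical, and it covers only the path where the acquire succeeds.

```python
class SemAcquireTracker:
    """Tracks, per thread, whether semacquire is on its first or second issue."""

    def __init__(self, num_threads):
        # One pending bit per thread, analogous in role to dd_sync_load_pending.
        self.pending = [False] * num_threads

    def issue(self, thread):
        if not self.pending[thread]:
            # First issue: send the L2 request and suspend the thread.
            self.pending[thread] = True
            return "send L2REQ_SEMACQUIRE, suspend thread"
        # Second issue: the response has arrived, so consume it.
        self.pending[thread] = False
        return "consume L2RSP_SEMACQUIRE result"

tracker = SemAcquireTracker(num_threads=4)
steps = [tracker.issue(0), tracker.issue(0), tracker.issue(0)]
```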

The L2 pipeline will be modified in the l2_cache_update stage. For the semacquire operation, it will emulate a store with a value of all 1s and write the new value back to the cache line. If the old value is less than or equal to zero, it will send the L2RSP_SEMACQUIRE message with a flag indicating that the acquire did not succeed. At this point, the calling thread will already be blocked. If the old value was greater than zero, the flag will instead indicate success, which causes the thread to resume (this logic is in l1_l2_interface).

The semrelease logic is similar to semacquire. It will increment the value in the selected lane and set a flag based on the old value, as semacquire does. However, when the l1_l2_interface receives an L2RSP_SEMRELEASE message with the flag set, it will check whether the address matches the address that threads are blocked on and wake them if so. At that point, all waiting threads will retry the lock and one will acquire it. This does not guarantee progress or ordering, and it has more overhead than waking a single thread.
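
The wake-all behavior on release can be sketched as follows. This is a Python model under assumptions, not the l1_l2_interface logic: the function name, the `blocked` mapping from thread id to blocked address, and the `flag` parameter (standing in for the flag carried by L2RSP_SEMRELEASE) are hypothetical.

```python
def wake_on_release(blocked, release_addr, flag):
    """Return (threads to wake, threads still blocked).

    blocked maps thread id -> address that thread is waiting on.
    """
    if not flag:
        # Flag clear: no thread is woken by this response.
        return [], dict(blocked)
    woken = sorted(t for t, addr in blocked.items() if addr == release_addr)
    remaining = {t: a for t, a in blocked.items() if a != release_addr}
    return woken, remaining

blocked = {0: 0x100, 1: 0x200, 2: 0x100}
woken, remaining = wake_on_release(blocked, 0x100, flag=True)
# Threads 0 and 2 both retry semacquire; only one of them can win.
```

Every thread waiting on the released address is woken, so the losers of the retry block again, which is the extra overhead noted above.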

Testing

  • Functional tests
  • Stress tests
  • Performance tests