Skip to content

Instantly share code, notes, and snippets.

What would you like to do?
Tool to isolate bottleneck mutex locks in C-family code.
-- time_lock_inj.lua
local usage_str = [[
lua time_lock_inj.lua <list_of_input_source_files>
Injects lock timing code for all the given files.
This assumes you're locking things with calls to
In a C-based repo, I suggest using the glob syntax
*.{c,cpp,m} to capture all source files in a given dir,
using only the suffixes that exist; so, drop the m if
you have no objective-C files, for example.
-- Internal globals.
local timelock_h_lines = {[[
// timelock.h
// This file has been autogenerated by time_lock_inj.lua.
void add_interval_to_timer(double interval, int timer_index);
local timelock_h_suffix = [[
local timelock_c_lines = {[[
// timelock.c
// This file has been autogenerated by time_lock_inj.lua.
#include "timelock.h"
#include "oswrap/oswrap.h"
#include <stdio.h>
#include <stdlib.h>
// Inserted lines below will vaguely look like this:
// #define num_timers 34
// double timelock_timers[num_timers] = {0, 0, 0, ..., 0};
// const char *lock_names[num_timers] = {"filename.c:123", "other_file.c:234"};
timelock_c_suffix = [[
// When the first timer reach this many seconds, we print out the current
// lock timing stats.
#define timer_trigger_sec 0.05
typedef struct {
double perc;
double time;
int index;
const char * lock_name;
} TimerInfo;
static double start_time = 0;
// Compare two TimerInfos given via pointers.
// For use as a qsort callback.
static int cmp(const void *vp1, const void *vp2) {
// We want to return an int with the opposite sign as diff.
// (Opposite sign gives us a largest-to-smallest sorting.)
TimerInfo *ti1 = (TimerInfo *)vp1;
TimerInfo *ti2 = (TimerInfo *)vp2;
double diff = ti1->perc - ti2->perc;
if (diff < 0) return 1;
if (diff > 0) return -1;
return 0;
static void print_stats() {
// Get the total time.
double total = 0;
for (int i = 0; i < num_timers; ++i) {
total += timelock_timers[i];
// Print out the percent of total time used.
double total_wall_time = now() - start_time;
dbg__printf("\n %7s %9s %9s\n", "lock%", "lock time", "wall time");
dbg__printf("%7.4f%% %9.4fs %9.4fs\n", (total / total_wall_time) * 100,
total, total_wall_time);
// Compute percentages and sort.
TimerInfo info[num_timers];
for (int i = 0; i < num_timers; ++i) {
info[i].perc = timelock_timers[i] / total;
info[i].time = timelock_timers[i];
info[i].index = i;
info[i].lock_name = lock_names[i];
qsort(info, num_timers, sizeof(TimerInfo), cmp);
// Print out the stats.
dbg__printf("\n%5s %5s %5s location\n", "index", "perc", "time");
for (int i = 0; i < num_timers; ++i) {
TimerInfo *ti = info + i;
dbg__printf("%5d %5.2f%% %5.3fs %s\n",
ti->index, ti->perc * 100, ti->time, ti->lock_name);
static int done_timing = 0;
void add_interval_to_timer(double interval, int timer_index) {
if (done_timing) return;
if (start_time == 0) start_time = now();
timelock_timers[timer_index] += interval;
if (timelock_timers[timer_index] >= timer_trigger_sec) {
// This use of done_timing isn't 100% thread-proof, but it avoids needing to
// check any lock behavior within this function, which is important because
// we want to minimize interference with the true timing data.
// In addition, in practice, the resulting inaccuracies are likely to
// be small or zero so that the data is still quite useful.
done_timing = 1;
local first_line_suffix = ' // This file was modified by time_lock_inj.lua.'
local num_timers = 0
local lock_names = {}
-- Internal functions.
local function modify_file(filename)
local did_modify = false
-- Read in the file as an array of lines.
local lines = {}
local header_injected = false
local line_num = 1
for line in io.lines(filename) do
if line_num == 1 then
line = line .. first_line_suffix
if line:match('^//') == nil and not header_injected then
lines[#lines + 1] = '#include "timelock.h"'
header_injected = true
-- The optional slash at the end can help handle some cases where a
-- lock is being used within a multiline macro definition.
ind, lock, slash = line:match('^(%s*)dbgcheck__named_lock(%b());%s*(\\?)')
if ind and lock then
num_timers = num_timers + 1
lock_names[#lock_names + 1] = filename .. ':' .. line_num
-- Surround the lock call by timing lines.
tmp_name = 'tmp_' .. line_num
local fmt = '%sdouble %s = now();%s'
lines[#lines + 1] = string.format(fmt, ind, tmp_name, slash);
lines[#lines + 1] = line
local fmt = '%sadd_interval_to_timer(now() - %s, %d);%s'
lines[#lines + 1] = string.format(fmt, ind, tmp_name, num_timers, slash);
did_modify = true
lines[#lines + 1] = line
line_num = line_num + 1
if not did_modify then return end
-- Write out the modified version.
print(string.format('Modifying %s', filename))
f =, 'w')
for _, line in ipairs(lines) do
f:write(line, '\n') -- TODO Ok for windows?
local function write_out_timelock_hc()
-- Add timer definition to timelock.c.
table.insert(timelock_c_lines, '#define num_timers ' .. num_timers)
local timer_defn = 'double timelock_timers[num_timers] = {'
if num_timers > 0 then
timer_defn = timer_defn .. '0' .. string.rep(', 0', num_timers - 1)
timer_defn = timer_defn .. '};'
table.insert(timelock_c_lines, timer_defn)
names_defn = 'const char *lock_names[num_timers] = {'
for i, name in ipairs(lock_names) do
if i > 1 then names_defn = names_defn .. ', ' end
names_defn = names_defn .. '"' .. name .. '"'
names_defn = names_defn .. '};'
table.insert(timelock_c_lines, names_defn)
-- Write out the files.
local filenames = { 'timelock.h', 'timelock.c' }
local line_sets = { timelock_h_lines, timelock_c_lines }
local suffixes = { timelock_h_suffix, timelock_c_suffix }
for i = 1, 2 do
f =[i], 'w')
for _, line in ipairs(line_sets[i]) do
f:write(line, '\n')
-- Main.
if #arg == 0 then
for _, filename in ipairs(arg) do

This comment has been minimized.

Copy link
Owner Author

@tylerneylon tylerneylon commented Apr 9, 2015

It can be tricky to measure which mutex locks may be a bottleneck in your code. Your code is typically not running when a mutex is waiting, so user time is not accumulating, making lag due to these look similar to slow system calls, which you can't do anything about. It's also difficult to measure since you're working with multithreaded code, and need to measure the times involved in a thread-safe way, but without adding any measurable overhead from new locks. This script gives you a way to isolate mutex lock speed bottlenecks in spite of those challenges.

Use this code like so from the directory of the code you want to measure:

$ lua time_lock_inj.lua *.{c,cpp}  # Modifies your source - use version control plz thx

Then build and run your code, also building and linking in the generated time_lock.c file. When you run your app, a print-out like the following will be sent to either stdout on mac/linux or to the debug console in Visual Studio:

   lock%   lock time   wall time
 2.0388%     0.0805s     3.9465s

index   perc   time location
   14 75.64% 0.061s walk.c:26
    7 13.90% 0.011s land_mem.c:187
    4  9.98% 0.008s land_draw.c:376
   13  0.18% 0.000s loop.c:74
    1  0.10% 0.000s edit.c:201

The first line shows the total percentage of time used waiting for any mutex lock. If this percentage is low, then you have no bottlenecks in your use of mutex locks - good job! Otherwise, look at the top line of the index/perc/time table to see which mutex lock has used up the most time so far.

This script modifies your source files, so after you're done measuring things you can run:

$ git checkout -- .

To drop the modifications.

This script makes two depedency assumptions that you can avoid if needed:

  1. That your code uses the dbgcheck__named_lock macro in dbgcheck to lock mutexes. You can really use any lock command you want; modify the ind, lock, slash = line:match(...) line to match your preferred locking mechanism.
  2. That your code uses oswrap, which provides the useful dbg__printf and now functions. You can replace these with anything you like that provides both printf-like funcitonality and a way to measure time passing at high resolution.

To be extra clear, if you want to avoid those dependencies, you have to modify the script, but in theory it's not a difficult modification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment