
@pavel-a
Last active June 24, 2024 22:42
Preprocessor that converts C printf-like statements to something like Linux bprintf and extracts format strings. Can be used in small embedded systems to save memory and processing time on target.
#!/usr/bin/python
"""
Debug prints preprocessor
v2.beta3 06-aug-2017
(c) pavel_a@fastmail.fm, 2017
Debug statements:
    DEBUG_LOG(filter, fmt, ...)
    DEBUG_LOG_BUFF(filter, fmt, ...)
    DEBUG_LOG_NF(fmt, ...)     (no filter)
    DEBUG_LOG_NFB(fmt, ...)    (no filter, buffered)
    DEBUG_MEM_DUMP(filter, pdata, size [, fmt, ...])
Notes:
- String args (%s) must be read-only and reside in the literals segment defined by the compiler.
- Only one print/dump statement can be in one source line.
- Conditionals (#if #else #endif) can surround print parameters,
  but the entire format string must be kept together with the statement.
- Valid format specs: see the documentation (the accepted syntax is the expr_fmt regex below).
- Format specifications cannot be wrapped in other macros, such as the inttypes.h PRId32 etc.
- This version can modify the original format string!
  A single l modifier is removed (%lu -> %u), %p is replaced by %#x. TODO: remove other no-op modifiers (z, t).
Inspired by the WPP tool from the Windows Driver Kit (but without configuration, ETW, GUIDs, tmh files and so on).
Unlike WPP, all configuration (names of macros...) is done by editing this script.
Usage: python dbg_preprocessor.py INFILE OUTFILE
  * Input file is a C file to scan for debug print statements.
  * Output file is the .h file for the input C file.
The .h file is injected into its C file with -include in the makefile (/FI for the MS compiler).
The C file can also #include the generated .h file explicitly.
Format strings are placed in a special section that is extracted after the build.
Then the block containing the strings is copied to the host side (server), which does the formatting and output.
As an alternative, format strings can be written to a file.
The segment with string literals must be copied to the server as well if the %s format is used.
The preprocessor uses numeric IDs for C files, assigned manually in the file "dbg/file_id.h",
in the form '#define __FNAME__C__FILEID__ number' (e.g. '#define __FOO__C__FILEID__ 12' for foo.c);
the IDs must be in the range 1..254.
"""
from __future__ import print_function
import sys, os, os.path
import re
file_db_filename= "dbg/file_id.h"
MAX_PRINT_OFF_PARAMS = 18
my_pattern = r'(DEBUG_LOG\w*|DEBUG_MEM_DUMP)\s*\('
my_re = re.compile(my_pattern)
expr_fmt=R'[ +0#-]?(\d{0,2}(\.\d?)?)?(ll|[lh])?[udxXcsp]'
fmt_re = re.compile(expr_fmt)
infilename, outfilename = None,None
_whereAmI = None # file,line for diagnostics
of = None
def die(msg):
    print("*** ERROR:", msg, "at", _whereAmI)
    raise Exception(msg)
def get_file_id(fname):
    s = os.path.basename(fname).upper()
    s = "__" + s.replace(".", "__") + "__FILEID__"
    #print fname, s
    with open(file_db_filename, 'rt') as f:
        for line in f:
            if s in line:
                return line.split()[2]
    return ""
def wr(text):
    global of
    print( text.rstrip(), file=of )
# ----------------------------------
def parse_D_xxxPrefixes(d1_arg, d2_arg):
    # Handle D_xxx severity and module2 in any order, both can be None
    # Return (severity, module) in this order
    sev = { "D_CRT": "DBG_ERROR", "D_ERR": "DBG_ERROR", "D_WRN": "DBG_WARNING",
            "D_INF": "DBG_INFO", "D_VRB": "DBG_VRB" }
    mods = { "D_M1": 1, "D_M2": 2, "D_M3": 3, "D_M4": 4,}
    d_sev, d_mods = None,None
    for x in (d1_arg, d2_arg):
        if not x: continue
        i = sev.get(x)
        if i:
            if not d_sev:
                d_sev = i
            else:
                die("Two D_ severity prefixes: %r %r" % (d1_arg,d2_arg) )
        else:
            i = mods.get(x)
            if i:
                if not d_mods:
                    d_mods = i
                else:
                    die("Two D_ module prefixes: %r %r" % (d1_arg,d2_arg) )
            else:
                die('Unknown D_XXX prefix:%r' % x)
    return d_sev, d_mods
# Parse extracted print statement:
def pr_parse(fileline, stmt, tail):
    #print(fileline, stmt, len(tail))
    assert len(tail) > 1
    filter_arg = None # filter expression
    is_buff = False # Buffered
    if "DEBUG_MEM_DUMP" == stmt:
        # DEBUG_MEM_DUMP(filter, pdata, size, [fmt, ...])
        # Parse 1st arg off:
        filter_arg,_,tail = tail.partition(",")
        filter_arg = filter_arg.strip()
        assert len(filter_arg) > 0
        # Parse pdata, size:
        dump_pdata_arg,_,tail = tail.partition(",")
        dump_pdata_arg = dump_pdata_arg.strip()
        assert len(dump_pdata_arg) > 1 and tail
        dump_size_arg,_,tail = tail.partition(",")
        dump_size_arg = dump_size_arg.strip()
        assert len(dump_size_arg) > 1
        tail = tail.strip()
        dump_no_msg = False
        if not tail:
            dump_no_msg = True
            tail = '"")'
            dump_size_arg,_,tail2 = dump_size_arg.partition(')')
        #print("DUMP:",dump_pdata_arg,dump_size_arg, tail )
        # Continue parsing format string and params ...
    if stmt.endswith("_BUFF"): is_buff = True
    elif stmt == "DEBUG_LOG_NF":
        stmt = "DEBUG_LOG_NO_FILTER"
    elif stmt == "DEBUG_LOG_NFB":
        stmt = "DEBUG_LOG_NO_FILTER"
        is_buff = True
    if stmt == "DEBUG_LOG" or stmt == "DEBUG_LOG_BUFF":
        # Parse 1st arg off:
        filter_arg,_,tail = tail.partition(",")
        filter_arg = filter_arg.strip()
        assert len(filter_arg) > 0
    # Parse format string:
    # Syntax: [D_xxx [D_yyy]] "string" ["string2" ....] ,|)
    # D_ prefixes ?
    d1_arg, d2_arg = None,None
    tail = tail.strip()
    if tail.startswith("D_"):
        d1_arg,_,tail = tail.partition(" ")
        tail = tail.strip()
        assert tail
        if tail.startswith("D_"):
            d2_arg,_,tail = tail.partition(" ")
            tail = tail.strip()
            assert tail
    if d1_arg or d2_arg:
        d1_arg, d2_arg = parse_D_xxxPrefixes(d1_arg, d2_arg)
    # Find format string; may consist of several segments
    fmtstr = ""
    while tail and tail[0] != ',' and tail[0] != ')' :
        if len(tail) <= 0:
            print("ERROR: format str not ends ok in ", fileline)
            return
        if tail[0] == '"':
            ix = tail.find('"', 1)
            if ix <= 0:
                print("ERROR: bad format 2 in ", fileline)
                return
            # Escaped quote?
            assert tail[ix] == '"'
            if tail[ix-1] == "\\"[0]:
                fmtstr += tail[1:ix+1]
                tail = tail[ix:].rstrip()
                continue
            fmtstr += tail[1:ix]
            tail = tail[ix+1:].strip()
            continue
        print("ERROR:format must start with literal string in ", fileline)
        print( " >>%r<<" % tail )
        print( " >>{%r}<<" % fmtstr )
        return
    if "DEBUG_MEM_DUMP" == stmt:
        if dump_no_msg:
            fmtstr,d1_arg, d2_arg = None,None,None
        do_generate_dump(dump_pdata_arg, dump_size_arg, fileline, fmtstr, filter_arg, d1_arg, d2_arg)
        return
    # All others:
    assert len(fmtstr) > 0
    do_generate( stmt, fileline, fmtstr, filter_arg, d1_arg, d2_arg, is_buff )
# Generate C code...
def do_generate( stmt, fileline, fmtstr, filter_arg, filt2 = None, mod2 = None, is_buff = False ):
    num_args, new_fmtstr = check_format_count_params(fmtstr)
    if new_fmtstr is not None:
        fmtstr = new_fmtstr
    # Format file num and line as escaped characters, so whole descriptor struct looks as a string to compiler:
    s_file_line = R'\x%2.2X@\x%2.2X\x%2.2X' % (fileline[0], fileline[1] & 0xFF, fileline[1] >> 8)
    #print( "filters:", filter_arg, filt2)
    if filt2 is None: filt2 = "0"
    #print( "[[ Nargs=%u|%s | %r]]\n" % (num_args, s_file_line, fmtstr) )
    if is_buff:
        num_args |= 0x80000000
    if stmt == "DEBUG_LOG" or stmt == "DEBUG_LOG_BUFF":
        wr( R"#define _DBGPF_%u(_filter, _fmt, ...) \ " % fileline[1])
        wr( R" while(DBG_FLTR(%s,%s)) { \ " % (filter_arg, filt2) )
        wr( R' X_ATTRF static const char a_fmt[] ="%s" "%s" ; \ ' % (s_file_line, fmtstr))
        wr( R" dbgpn_off(%d, a_fmt, ## __VA_ARGS__ ); break; }" % num_args)
    elif stmt == "DEBUG_LOG_NO_FILTER" or stmt == "DEBUG_LOG_NO_FILTER_BUFF":
        wr( R"#define _DBGP_%u(_fmt, ...) \ " % fileline[1])
        wr( R" while(DBG_FLTR(0, %s)) { \ " % filt2 )
        wr( R' X_ATTRF static const char a_fmt[] = "%s" "%s" ; \ ' % (s_file_line, fmtstr))
        wr( R" dbgpn_off(%d, a_fmt, ## __VA_ARGS__ ); break; }" % num_args )
    #elif stmt == DEBUG_MEM_DUMP:
    #   below...............
    else:
        die("unhandled statement: %s at %r" % (stmt, fileline) )
    wr("")
def do_generate_dump(dump_pdata_arg, dump_size_arg, fileline, fmtstr, filter_arg, filt2 = "0", mod2 = None):
    is_buff = True
    num_args = 0
    if fmtstr :
        num_args, new_fmtstr = check_format_count_params(fmtstr)
        if new_fmtstr is not None:
            fmtstr = new_fmtstr
    # Format file num and line as escaped characters, so whole descriptor struct looks as a string to compiler:
    s_file_line = R'\x%2.2X@\x%2.2X\x%2.2X' % (fileline[0], fileline[1] & 0xFF, fileline[1] >> 8)
    if filt2 is None: filt2 = "0"
    #print( "[[ Nargs=%u|%s | %r]]\n" % (num_args, s_file_line, fmtstr) )
    #print( "filters:", filter_arg, filt2)
    if is_buff:
        num_args |= 0x80000000
    if fmtstr is not None:
        wr( R"#define _DBGXD_%u(_filter, _addr, _size, _fmt, ...) \ " % fileline[1])
        wr( R" while(DBG_FLTR(_filter,%s)) { \ " % filt2 )
        wr( R' X_ATTRF static const char a_fmt[] ="%s" "%s" ; \ ' % (s_file_line, fmtstr))
        wr( R" dbgpn_off(%d, a_fmt, ## __VA_ARGS__ ); \ " % num_args )
    else:
        wr( R"#define _DBGXD_%u(_filter,_addr, _size) \ " % fileline[1])
        wr( R" while(DBG_FLTR(_filter,0)) { \ " )
    wr( R" dbg_hexdump(_addr, _size); \ " )
    wr( R" break; }")
##############################################################################
ALLOW_64 = True
def check_format_count_params(fmtstr):
    """ Check format string, count parameters.
    Returns: 1. size of parameters in u32 'words'.
             2. Fixed format string, or None if no fixups were made
    If at least one invalid spec is found, stops and throws an exception.
    The fixed copy (single l removed, %p -> %#x) is returned; the caller substitutes it.
    If %p and single l are allowed, host must handle these.
    Any parameter except a 64-bit one counts as one 'word'.
    """
    def my_sub(src, pos, cnt, replacement):
        """ replace substring... no battery for this? """
        return src[0:pos] + replacement + src[pos+cnt:]
    fcount = 0
    t = fmtstr
    res = False
    pos = 0
    while True:
        n = t.find('%', pos)
        if n < 0:
            res = True; break # Normal end, ok
        #if % in last position -> fail
        if (n + 1) == len(t):
            print("Failed [%d]: illegal %% at end of string" % n); break
        pos = n + 1
        #check for escaped \% : it is invalid in raw string, must be only %%
        if n > 0 and t[n-1] == '\\':
            print(R"ERROR: illegal \ before %, use double % !"); break
        if t[n+1] == '%': # double %%, skip both
            pos = n + 2
            continue
        # Match format spec by regexp:
        x = fmt_re.match( t, n + 1 )
        if x:
            #print("%d: match [%s]" % (n, x.span()))
            pos = x.span()[1] # next pos after match
            spec = x.group()
            assert 1 <= len(spec) < 10
            fcount += 1 # Most args count as one (32-bit) word after int promotion
            if 'll' in spec: # but 64 bit arg counts as 2 words
                if not ALLOW_64: die("64-bit arguments (%llX) not supported")
                fcount += 1
            elif 'l' in spec:
                #Always remove single l. For xcc, long is same as int, but host side can be confused.
                ix = spec.index('l')
                t = my_sub(t, n+1+ix, 1, '')
                pos -= 1
                #print( "Single %l subst ", _whereAmI) #dbg
            if spec.endswith('p'):
                fcount -= 1 # oh wait...
                # Must be no other modifiers in %p
                if spec != 'p':
                    print("Modifiers in %%p format spec not allowed: %s" % spec)
                    break
                if pos < len(t):
                    if t[pos].isalnum(): # printk extensions like %pM, %ph - not supported
                        print("Format %%p, followed by letter/digit [%s] not allowed" % t[pos:])
                        break
                # replace %p -> %#x
                pos += (len('#x') - len(spec))
                t = my_sub(t, n+1, len(spec), '#x')
                #print("*** Replaced p-> [%r]" % t, _whereAmI) # dbg
                fcount += 1
            continue
        # Format spec not recognized
        print("ERROR: bad print format spec at [%r]" % t[n:] )
        break
    if not res:
        die("Unsupported format spec (see above), please fix the error")
    num_args = fcount
    if num_args > MAX_PRINT_OFF_PARAMS :
        die("Too many print parameters, max=%d" % MAX_PRINT_OFF_PARAMS)
    return num_args, t if t != fmtstr else None
####################### COMMON CODE ###########
# Function _X_DBG_FILT should behave exactly as in legacy code:
# * 1. Check the filter of DEBUG_LOG: it can be one or two bits (or 0 for _NF).
#      Fail unless all of its bits are set in the current level (for _NF this never fails).
# * 2. If D_sev is specified (not 0), check it.
#      It is 1 bit. If the bit is not set, fail.
# * 3. Pass.
# This is an (inline) function to ensure the order of operations is not optimized.
# The D_mod prefix is not checked; do it later if needed.
h_comm_defs = R"""
#include <stdint.h>
#define __DBG_PRINTS_OFFLOAD__ 1
#define NO_DEBUG_LOG_DEFS
#define _PASTE2(x,y) x ## y
#define _PASTEDL1(x) _PASTE2(_DBGPF_, x)
#define _PASTEDN1(x) _PASTE2(_DBGP_, x)
#define _PASTEDXD1(x) _PASTE2(_DBGXD_, x)
#define DEBUG_LOG _PASTEDL1(__LINE__)
#define DEBUG_LOG_BUFF _PASTEDL1(__LINE__)
#define DEBUG_LOG_NF _PASTEDN1(__LINE__)
#define DEBUG_LOG_NFB _PASTEDN1(__LINE__)
#define DEBUG_MEM_DUMP _PASTEDXD1(__LINE__)
#define DBG_FLTR(flt1, flt2) _X_DBG_FILT((flt1), (flt2), g_DEBUG_LEVEL)
static inline _Bool _X_DBG_FILT(unsigned flt1, unsigned D_sev_mask, unsigned lvl)
{
    if ((flt1 & lvl) != flt1) return 0;
    if (D_sev_mask == 0) return 1;
    return ((D_sev_mask & lvl) != 0);
}
extern void dbg_hexdump(const unsigned char *p, unsigned size);
extern void dbg_offload_put(_Bool buffered, void *args, unsigned cbytes);
static inline void dbgpn_off(int argcount, const void *fmtp, ... )
{
    _Bool buffered = !(argcount & 0x80000000);
    unsigned cbytes = sizeof(int32_t) * ((argcount & 0xFF) + 1);
    uint32_t *p = (uint32_t*)(&fmtp);
    dbg_offload_put(buffered, p, cbytes); // this hack works on my arch. YMMV.
}
#define X_ATTRF __attribute__((section("rodata.offload")))
"""
#----------------------------------------------------------------------------------------------------------
def generate(P):
    global _whereAmI
    _whereAmI = (P.file_id,P.stmt_1st_line)
    if not P.stmt_text:
        die("generate w/no text?")
    if not P.stmt:
        die("generate w/no kwd?")
    delta=P.stmt_last_line - P.stmt_1st_line
    #print(" >> generate: %s @ %u%s" % (P.stmt, P.stmt_last_line, "" if delta == 0 else " [C+%u]"%(delta)) )
    #global of
    #if of is None: of=open( R'/mnt/disk/data2/svn_co/lmac_B0_CTO/_offl/prints.log', 'at')
    #of.write( ">>> %u:%u [%s]\n" %(P.file_id, P.stmt_last_line, P.stmt))
    #of.write(P.stmt_text)
    #of.write('\n')
    #Sanity
    assert delta < 25 # too many cont. lines, missed last line??
    if not P.stmt_text.endswith(');') and not P.cond:
        die("MISSING closing ')'")
    # Split the macro name and opening (
    stmt,_,tail = P.stmt_text.partition('(')
    assert tail
    # Parse what we get and generate code:
    pr_parse( (P.file_id,P.stmt_1st_line), P.stmt, tail.strip() )
    # cleanup for next cycle (cond must be reset too, or later statements skip the ')' check)
    P.stmt_text = ""
    P.stmt = None
    P.stmt_1st_line = 0
    P.cond = False
# -----------------------------------
# Parser for debug prints
def parse_dbg_prints(infilename, infile, outfile):
    global of
    of = outfile #TODO pass as param.?
    assert infilename.endswith(".c")
    file_id = get_file_id(infilename)
    if file_id == "" :
        return True # do not parse, return empty file
    class P:
        stmt = None
        stmt_text = ""
        stmt_1st_line = 0
        stmt_last_line = 0
        cond = False
    P.file_id = int(file_id)
    assert 0 < P.file_id < 255
    linenum = 0
    cont_flag = False
    cond_flag = False
    for line in infile:
        linenum += 1
        # try to remove comments before and after statements
        line = line.strip()
        if line.startswith('//') or line.startswith('/*'):
            continue
        x = line.rfind("//")
        if x > 0 :
            line = line[:x].rstrip()
        if len(line) <= 0 :
            continue
        if cont_flag: # Continuation lines ...
            if line.startswith('#'): # preprocessor directive forces end of format
                cond_flag = True
                P.cond = True
                continue
            if not cond_flag:
                P.stmt_text += " " + line
            if line.endswith(';') :
                P.stmt_last_line = linenum
                generate(P)
                cont_flag = False
                cond_flag = False
            continue
        # Match the print macros:
        match = re.match(my_re, line)
        if match:
            P.stmt = match.group()[:-1].strip()
            assert len(P.stmt) > 1
            P.stmt_1st_line = linenum
            #print( " %s:%d >> [%s]" % (infilename, linenum, P.stmt) )
            P.stmt_text = line
            if line.endswith(';') :
                P.stmt_last_line = linenum
                generate(P)
                cont_flag = False
            else:
                #print("Cont... line %u %s" % (linenum, P.stmt))
                cont_flag = True # wait for ";"
                cond_flag = False
    P = None
    # Add common defs ...
    outfile.write(h_comm_defs)
    outfile.write(h_D_xxx)
    return True
# ---------------------------------------------------------
# Define all D_xxx prefixes as nothing for the compiler
h_D_xxx = R"""
#define D_M1
#define D_M2
#define D_M3
#define D_M4
#define D_ERR
#define D_CRT
#define D_WRN
#define D_INF
#define D_VRB
"""
##########################################
# Special case for dbg_print.c:
for_dbg_print_c = """
#define __DBG_PRINTS_OFFLOAD__ 1
#define MAX_PRINT_OFF_PARAMS %d
""" % MAX_PRINT_OFF_PARAMS
#############################################################
# Handle special cases ...
def suppr(fname):
    if "dbg_print.c" in fname: # special case
        return True, for_dbg_print_c
    return False, None
# MAIN
if __name__ == "__main__" :
    try :
        infilename = sys.argv[1]
        outfilename = sys.argv[2]
        # Text mode: the header is written as str, which a binary ('wb') file rejects in Python 3.
        with open(infilename, 'rt') as inf, open(outfilename, 'w') as outf:
            rc, subst_text = suppr(infilename)
            if rc:
                if subst_text :
                    outf.write(subst_text)
            else:
                if not parse_dbg_prints(infilename, inf, outf):
                    print("dbg_prepr: ERROR processing file: %s" % infilename)
                    sys.exit(1)
    except:
        print("dbg_prepr: error processing file: %s" % infilename, _whereAmI)
        raise
    sys.exit(0)
This is a preprocessor that converts printf-like statements in C code into something like the Linux bprintf and extracts the format strings.
It can be used in small embedded systems to save memory and processing time on the target.
The following describes our custom implementation for a 32-bit little-endian machine (Tensilica Xtensa CPU & xcc toolchain). One can tweak it as needed.
Limitations:
* Parser limitations: see the notes in the .py file.
* Code generation limitations: likewise.
Most importantly, all arguments must be scalar.
Format %s is supported only for constant strings (string literals, __func__ or __FUNCTION__).
.........
How it works
============
Format strings are extracted from the source, and a prefix is added to each of them for use by the server (host) side.
The prefix is a packed 4-byte struct:
* uint8_t file ID
* char: flag for server use, initially the character '@'
* uint16_t line number (little-endian, low byte first)
* <here goes the format string, 0-terminated>
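The descriptor layout above maps directly onto Python's `struct` format `'<BcH'` (little-endian, no padding: 1 + 1 + 2 = 4 bytes). The sketch below, which might serve a host-side tool or a unit test, builds and parses one descriptor. The function names are hypothetical, and UTF-8 (or plain ASCII) format text is assumed.

```python
import struct

# '<' = little-endian, standard sizes, no padding:
# B = uint8_t file ID, c = 1-byte flag, H = uint16_t line number.
PREFIX = struct.Struct('<BcH')
assert PREFIX.size == 4


def build_descriptor(file_id, line, fmt, flag=b'@'):
    """Hypothetical helper: prefix + 0-terminated format string, as stored on the target."""
    return PREFIX.pack(file_id, flag, line) + fmt.encode('utf-8') + b'\0'


def parse_descriptor(blob, offset=0):
    """Hypothetical helper: decode the descriptor at blob[offset].

    Returns (file_id, flag, line, format_string).
    """
    file_id, flag, line = PREFIX.unpack_from(blob, offset)
    start = offset + PREFIX.size
    end = blob.index(b'\0', start)  # the format string is 0-terminated
    return file_id, flag, line, blob[start:end].decode('utf-8')


if __name__ == '__main__':
    d = build_descriptor(5, 300, 'x=%d\n')
    print(d)                    # b'\x05@,\x01x=%d\n\x00'
    print(parse_descriptor(d))  # (5, b'@', 300, 'x=%d\n')
```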
The preprocessor emits this struct as a character string, because our compiler or linker could not
digest it as a structure used as an initializer (a union of the structure and a char array).
So the prefix struct is converted to a string of escape sequences and concatenated with the format string. This always works.
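The string trick can be checked on the host. This sketch produces the same escape text as the script's `s_file_line` expression, then decodes those escapes and compares the bytes with a packed struct. It also shows why the script emits the prefix and the format as two adjacent literals (`"\x05@\x2C\x01" "abc"`): a C hex escape consumes every hex digit that follows it, so `"\x01abc"` in one literal would not mean byte 0x01 followed by "abc". Adjacent literals are concatenated only after escapes are processed. The function names below are hypothetical.

```python
import codecs
import struct


def c_prefix_literal(file_id, line):
    """C source text for the 4-byte prefix (same expression as s_file_line in the script)."""
    assert 0 < file_id < 255 and 0 <= line <= 0xFFFF
    return r'\x%2.2X@\x%2.2X\x%2.2X' % (file_id, line & 0xFF, line >> 8)


def literal_bytes(c_text):
    """Decode the \\xNN escapes into bytes (adequate for this escape-only text)."""
    return codecs.decode(c_text.encode('ascii'), 'unicode_escape').encode('latin-1')


if __name__ == '__main__':
    lit = c_prefix_literal(5, 300)
    print(lit)  # \x05@\x2C\x01
    print(literal_bytes(lit) == struct.pack('<BcH', 5, b'@', 300))  # True
```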
Format descriptors (the prefix + format string) can be placed into a special data section by the linker and then extracted, or written to a file, etc.
Parameters are passed to the server (host) as an array of binary 32-bit values, similar to the Linux bprintf function:
[0] 32-bit pointer to format descriptor (see above)
[1] 1st print arg
.....
[n] Nth print arg
Arguments [1] - [n] are optional.
The size of the array is passed along with it (the transport to the server/host and the host-side decoding are not shown here).
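A sketch of the argument array as it might look to a host-side test harness. The function names and the `payload` byte string are hypothetical, since the real transport is not shown here. Each element is a little-endian uint32, and a 64-bit argument takes two consecutive words, low part first. Decoding with an explicit `'<'` struct format gives the right values on both little- and big-endian hosts, so no manual byte swapping is needed.

```python
import struct

MASK32 = 0xFFFFFFFF


def split64(value):
    """Split a 64-bit argument into (low, high) 32-bit words."""
    value &= (1 << 64) - 1
    return value & MASK32, value >> 32


def pack_args(fmt_ptr, words):
    """[0] = descriptor pointer, [1..n] = 32-bit argument words (little-endian)."""
    all_words = [fmt_ptr] + [w & MASK32 for w in words]
    return struct.pack('<%dI' % len(all_words), *all_words)


def unpack_args(payload):
    """Return (descriptor pointer, list of argument words)."""
    if len(payload) < 4 or len(payload) % 4:
        raise ValueError('payload must be a non-empty multiple of 4 bytes')
    words = struct.unpack('<%dI' % (len(payload) // 4), payload)
    return words[0], list(words[1:])


if __name__ == '__main__':
    # Emulates the target side of DEBUG_LOG(..., "%d %llx", -1, 0x123456789).
    payload = pack_args(0x20000010, [-1] + list(split64(0x123456789)))
    print(len(payload))          # 16
    print(unpack_args(payload))  # (536870928, [4294967295, 591751049, 1])
```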
Server/host side
==================
The server should locate the format descriptor in its local copy of the data by the pointer in array[0] (that is, by its offset from the start of the descriptor section), then use the format string and the
parameters to print the message.
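A minimal sketch of that lookup. It assumes the host holds a raw image of the descriptor section (for example, extracted from the ELF file after the build) and knows the address at which the section was linked. `section_image`, `section_base` and `lookup_descriptor` are hypothetical names, not part of the script.

```python
import struct

PREFIX = struct.Struct('<BcH')  # uint8 file ID, char flag, uint16 line (little-endian)


def lookup_descriptor(section_image, section_base, fmt_ptr):
    """Find the descriptor that array[0] points to.

    section_image: bytes of the descriptor section copied from the build (assumed available).
    section_base:  target address at which that section was linked (assumed known).
    fmt_ptr:       the 32-bit pointer received in array[0].
    Returns (file_id, line, format_string).
    """
    offset = fmt_ptr - section_base
    if not 0 <= offset <= len(section_image) - PREFIX.size:
        raise ValueError('pointer 0x%08X is outside the descriptor section' % fmt_ptr)
    file_id, _flag, line = PREFIX.unpack_from(section_image, offset)
    start = offset + PREFIX.size
    end = section_image.index(b'\0', start)  # format string is 0-terminated
    return file_id, line, section_image[start:end].decode('utf-8')


if __name__ == '__main__':
    base = 0x20000000
    # Two descriptors back to back: file 1 line 10 "hi", then file 5 line 300 "x=%d\n".
    image = b'\x01@\x0a\x00hi\x00' + b'\x05@\x2c\x01x=%d\n\x00'
    print(lookup_descriptor(image, base, base + 7))  # (5, 300, 'x=%d\n')
```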
The server can use its own implementation of printf (sprintf, etc.) if its format specifications are the same as on the target.
A 32-bit LE host can likely pass the args array to its vprintf (vsprintf).
A 64-bit LE host (e.g. AMD64) should extend each 32-bit parameter to 64 bits; 64-bit parameters must be combined from
two 32-bit values (low part, high part) into one 64-bit value.
A big-endian host (heaven forbid) must byte-swap the values received from the client.
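On a 64-bit host, the widening rules above can be applied while walking the format string. The sketch below does this in Python rather than by calling the host's C vprintf. It tokenizes conversions with a pattern modeled on the script's `expr_fmt`, takes one 32-bit word per conversion (two for `ll`, low word first), sign-extends for `%d`, and hands each value to Python's own `%` operator. `format_offloaded` and `read_string` are hypothetical. `%s` needs a lookup into the host's copy of the string-literal segment, which is only stubbed here. The `h` modifier is accepted and ignored, since after integer promotion the 32-bit word already holds the value.

```python
import re

# Modeled on expr_fmt in the script. 'p' is omitted because the preprocessor
# has already rewritten %p as %#x before the string reaches the host.
SPEC_RE = re.compile(r'%(?:(%)|([ +0#-]?(?:\d{0,2}(?:\.\d?)?)?)(ll|[lh])?([udxXcs]))')


def format_offloaded(fmt, words, read_string=None):
    """Render a target format string with its 32-bit argument words on the host."""
    out = []
    pos = 0  # next unconsumed character of fmt
    i = 0    # next unconsumed argument word
    for m in SPEC_RE.finditer(fmt):
        out.append(fmt[pos:m.start()])
        pos = m.end()
        if m.group(1):  # "%%" -> literal percent sign
            out.append('%')
            continue
        flags, length, conv = m.group(2), m.group(3), m.group(4)
        if length == 'll':  # 64-bit value: low word, then high word
            value, bits = words[i] | (words[i + 1] << 32), 64
            i += 2
        else:  # everything else was promoted to one 32-bit word
            value, bits = words[i], 32
            i += 1
        if conv == 'd' and value >= 1 << (bits - 1):
            value -= 1 << bits  # sign-extend
        if conv == 's':  # hypothetical lookup in the string-literal segment
            value = read_string(value) if read_string else '<str@0x%x>' % value
        if conv == 'u':
            conv = 'd'  # 'u' is only an obsolete alias of 'd' in Python's %
        out.append(('%' + flags + conv) % value)
    out.append(fmt[pos:])
    return ''.join(out)


if __name__ == '__main__':
    print(format_offloaded('a=%d c=%#x d=%lld 100%%', [0xFFFFFFFF, 31, 0, 1]))
    # a=-1 c=0x1f d=4294967296 100%
```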
If the host needs additional space in the descriptor for processing (e.g. if it compiles the format string), the size of the descriptor
can be increased and the structure changed as needed. Either way, this won't consume real memory on the target system.
Wide format strings, i18n and so on: we don't need it. Keep it simple, use Latin char set or UTF-8.