Simon Hardy-Francis simonhf

@simonhf
simonhf / cache-line-example.c
Last active February 1, 2024 23:30
Experiment with __builtin_prefetch()
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>
#include <locale.h>
#define NUMBYTES (1024*1024*1024)
char bytes[NUMBYTES];
simonhf / _libarchive-read-blocking.md
Last active October 28, 2023 05:30
Experiments with libarchive read blocking: Part 1

Experiments with libarchive read blocking: Part 1

Disclaimer: Don't know much about libarchive... yet!

Step 1: Describe the issue

  • When reading a streamed archive using archive_read_open() [1] and archive_read_extract() [2], a callback is called one or more times to read chunks of the archive.
  • This creates an issue if (a) your program needs to wait for the next chunk to arrive, and/or (b) you want to process multiple archive streams in the same thread.
  • Effectively, archive_read_open() [1] and archive_read_extract() [2] block until all the necessary archive stream chunks have been read via the callback.
simonhf / Chain.java
Last active October 4, 2022 16:18
C versus CPP versus Java; the performance failings of naive OO
//package com.dnene.josephus;
// $ javac Person.java Chain.java && java Chain
public class Chain
{
private Person first = null;
public Chain(int size)
{
simonhf / _golang-profiling.md
Last active June 2, 2021 00:40
Golang profiling techniques

Learning about Golang go-routines, LWPs, async pre-emption, and timing

Create a script which is forced to execute non-cooperative go-routines one at a time because they are only running on a single LWP

  • Try to minimize the number of concurrent threads / LWP via GOMAXPROCS.
  • In theory, the go-routines should execute one after the other since they are non-cooperative?
  • Note: On a 16 core box, by default Golang spots the 16 cores available and GOMAXPROCS is set to 16.
$ cat blocking.go 
simonhf / perl-to-crystal.md
Created May 30, 2020 01:28
Intro to Crystal Lang for Perl developers

Intro to Crystal Lang for Perl developers

Background

I've used Perl for years for all the quick and dirty programs where developing in Perl is much faster than in e.g. C/C++, or in another language like Java, which is very verbose and leaves you writing tons of source code.

And although I do write Perl scripts, I mostly use so-called Perl one-liners in order to extend the command line and act as glue for other command line run scripts and programs, e.g.:

$ perl -e 'foreach(1..3){printf qq[$_\n];}' | \
simonhf / _musl_printf_stack_usage.md
Created February 3, 2020 21:06
Why does musl printf() use so much more stack when printfing floating point numbers?

Why does musl printf() use so much more stack when printfing floating point numbers?

How to test?

  • Create a small test program and run it with different configurations.

Run the test program on Ubuntu with glibc

$ gcc           -O0 -o svnprintf svnprintf.c && ./svnprintf

Reproduction of seg fault caused by musl thread creation race condition

Background

  • Recently I created some code which appeared to work perfectly when compiled with glibc but often failed with a seg fault when compiled with musl.
  • I then tried to create minimal test code to reproduce the issue, but without luck.
  • After much fiddling I noticed that the test code would seg fault, but only if it was run via gdb or strace!
  • The same test code compiled with glibc never seg faults when run via gdb or strace.
  • So I'm hoping this is evidence enough to suggest an issue with musl and not glibc.
  • At least it's a starting point... :-)

Experiments with gcc auto instrumentration of C/C++ source code for run-time function call tree tracing... a working example!

Background

  • In the original experiments [1] we managed to auto instrument C but not C++ via auto generating a wrapper function for every real function to be instrumented. However, this method does not lend itself well to C++, for example, it does not seem possible to double the number of functions in a class without modifying the class source code in place, thus displacing the original source lines.
  • In the further experiments [2] it was possible to auto instrument both C and C++ via the somewhat brittle CastXML mechanism. While this method works, CastXML is not very refined and somewhat slow, making this mechanism a little heavyweight. Also, it relies on the attribute cleanup mechanism too, which isn't the fastest if the instrumentation for a particular function is disabled.
  • In this working example below we show a new mechanism (but also using some elements of both prev

Comparing c++filt against llvm-cxxfilt with 100k+ mangled C++ names

Background

  • In a recent project I wanted to demangle C++ names in order to show a human friendly version of the names.
  • Once compiled to a debug version, the project contained over 100k mangled C++ names.
  • The mangled C++ names were collected by using the -S gcc command line option, which compiles C++ files to assembly files, making the mangled names visible in the source.
  • Once the assembly files are generated, the mangled C++ names (which are labels in assembler) can be harvested.
  • Then they can be fed into c++filt and llvm-cxxfilt for demangling.
  • This is where the issues start:
simonhf / _other_gcc_trace_wish.md
Created December 3, 2019 00:20
Which other missing feature in gcc would make tracing function calls in userspace C/C++ applications easier

Which other missing feature in gcc would make tracing function calls in userspace C/C++ applications easier

Background

Let's say we want to compile a larger C/C++ application with 80,000 unique functions so that at run-time the function call-tree is output for code comprehension and debugging purposes.

Here [1] various methods and implementation options were discussed including the possibility of modifying gcc itself.

However, modifying gcc to insert macros all over the code has an unknown level of difficulty... but it's probably more difficult than expected.