@MaskRay
Last active January 8, 2023 00:25
lld with different malloc implementations
# Build malloc implementations.
mkdir -p /tmp/p
pushd /tmp/p
if [[ ! -d jemalloc ]]; then
git clone https://github.com/jemalloc/jemalloc
(cd jemalloc
./autogen.sh
mkdir -p out/release && cd out/release
../../configure && make -j $(nproc))
fi
if [[ ! -d mimalloc ]]; then
git clone https://github.com/microsoft/mimalloc
(cd mimalloc; cmake -GNinja -S. -Bout/release -DCMAKE_BUILD_TYPE=Release; ninja -C out/release)
fi
if [[ ! -d rpmalloc ]]; then
git clone https://github.com/mjansson/rpmalloc
(cd rpmalloc; ./configure.py; ninja)
fi
if [[ ! -d snmalloc ]]; then
git clone https://github.com/microsoft/snmalloc
(cd snmalloc; cmake -GNinja -S. -Bout/release -DCMAKE_BUILD_TYPE=Release -DSNMALLOC_STATIC_LIBRARY_PREFIX=; ninja -C out/release)
fi
popd
# In a directory extracted from an lld reproduce tarball
# In llvm-project, run: ninja lld; rm bin/lld; ninja -v lld; copy the command line, append -Wl,--reproduce=/tmp/lld.tar
# cd /tmp; tar xf lld.tar; cd lld
pushd /tmp/lld
sed -i '/--chroot/d' response.txt
ld.lld @response.txt -o lld.glibc
ld.lld /tmp/p/mimalloc/out/release/libmimalloc.a @response.txt -o lld.mi
ld.lld /tmp/p/jemalloc/out/release/lib/libjemalloc.a @response.txt -o lld.je
ld.lld /tmp/p/rpmalloc/build/ninja/linux/release/x86-64/rpmalloc-*/librpmalloc*.a @response.txt -o lld.rp
ld.lld /tmp/p/snmalloc/out/release/libsnmallocshim-static.a @response.txt /usr/lib/x86_64-linux-gnu/libatomic.so.1 -o lld.sn
popd
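# Run from the directory that contains response.txt (e.g. /tmp/lld).
for m in glibc mi je rp sn; do
  # Initial run outside the thread sweep (presumably a warm-up); $i is not set yet, so use a fixed trace name.
  /tmp/lld/lld.$m -flavor gnu @response.txt --time-trace=warmup.$m.time-trace
  for i in 1 2 4 8 16 32 64; do
    # Pin the linker to CPUs 0..i-1, then extract the per-phase totals in seconds.
    numactl -C 0-$((i-1)) /tmp/lld/lld.$m -flavor gnu @response.txt --threads=$i --time-trace=$i.$m.time-trace
    jq -r '.traceEvents[] | select(.name|contains("Total")) | "\(.dur/1000000) \(.name)"' < $i.$m.time-trace > $i.$m
  done
done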
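
The loop above leaves one `$i.$m` file per thread count and allocator, each holding lines of the form `<seconds> Total <phase>`. A minimal sketch for viewing one phase as a table (rows are thread counts, columns are allocators) follows. `phase_table` is a hypothetical helper name, not part of the gist, and it assumes it is run in the directory that holds those files.

```shell
# Hypothetical helper: print a threads x allocator table for one phase.
# Reads the "<threads>.<allocator>" files in the current directory.
# The body runs in a subshell so the loop variables do not leak.
phase_table() (
  phase=$1
  printf 'threads'
  for m in glibc mi je rp sn; do printf '\t%s' "$m"; done
  printf '\n'
  for i in 1 2 4 8 16 32 64; do
    printf '%s' "$i"
    for m in glibc mi je rp sn; do
      if [ -r "$i.$m" ]; then
        # Strip the leading "<seconds> " and compare the rest to "Total <phase>".
        # Print "-" when the phase is missing from the file.
        awk -v p="Total $phase" '
          { name = $0; sub(/^[^ ]+ /, "", name) }
          name == p { printf "\t%.3f", $1; found = 1; exit }
          END { if (!found) printf "\t-" }' "$i.$m"
      else
        printf '\t-'
      fi
    done
    printf '\n'
  done
)
```

For example, `phase_table 'Scan relocations'` prints the same numbers as the `grep 'relocations'` command used in the comments, rounded to milliseconds and arranged so that scaling per allocator can be read down a column.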
@MaskRay
Author

MaskRay commented Oct 2, 2022

For glibc malloc and snmalloc, performance peaks at --threads=8; --threads=16 is slower and --threads=64 is much slower.

Scan relocations (parallelized since 2022-09-12, https://reviews.llvm.org/D133003) and Split sections appear to be affected the most.
In each thread, Scan relocations calls sec->relocations.reserve(rels.size()) in RelocationScanner::scan and then makes many sec->relocations.push_back calls in RelocationScanner::processAux, so it allocates heavily from several threads at once.

clang -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=on

% grep 'relocations' {1,2,4,8,16,32,64}.{glibc,mi,je,rp,sn}
1.glibc:0.231926 Total Scan relocations
1.mi:0.202974 Total Scan relocations
1.je:0.215223 Total Scan relocations
1.rp:0.23792 Total Scan relocations
1.sn:0.227665 Total Scan relocations
2.glibc:0.181386 Total Scan relocations
2.mi:0.11957 Total Scan relocations
2.je:0.133877 Total Scan relocations
2.rp:0.135936 Total Scan relocations
2.sn:0.157039 Total Scan relocations
4.glibc:0.104471 Total Scan relocations
4.mi:0.083414 Total Scan relocations
4.je:0.091646 Total Scan relocations
4.rp:0.085893 Total Scan relocations
4.sn:0.086307 Total Scan relocations
8.glibc:0.090597 Total Scan relocations
8.mi:0.065062 Total Scan relocations
8.je:0.058909 Total Scan relocations
8.rp:0.065551 Total Scan relocations
8.sn:0.067661 Total Scan relocations
16.glibc:0.240263 Total Scan relocations
16.mi:0.054076 Total Scan relocations
16.je:0.054905 Total Scan relocations
16.rp:0.050947 Total Scan relocations
16.sn:0.061477 Total Scan relocations
32.glibc:0.270437 Total Scan relocations
32.mi:0.06434 Total Scan relocations
32.je:0.05706 Total Scan relocations
32.rp:0.062589 Total Scan relocations
32.sn:0.069823 Total Scan relocations
64.glibc:0.344121 Total Scan relocations
64.mi:0.072153 Total Scan relocations
64.je:0.062939 Total Scan relocations
64.rp:0.076356 Total Scan relocations
64.sn:0.078987 Total Scan relocations
% for m in glibc mi je rp sn; do a=$(/usr/bin/time -f '%e\t%M' /tmp/lld/lld.$m -flavor gnu @response.txt --threads=16 2>&1); echo -e "$m\t$a"; done
glibc   1.12    1161176
mi      0.66    1334028
je      0.70    1300912
rp      0.73    1328284
sn      0.73    1282372
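
To make the comparison easier to read, the rows printed by the `/usr/bin/time -f '%e\t%M'` loop (allocator, elapsed seconds, maximum RSS in KiB) can be normalized against the glibc row. This is a minimal sketch; `relative_to_glibc` is a hypothetical helper name, and it assumes the glibc row comes first, as it does in the output above.

```shell
# Hypothetical helper: read "<allocator> <seconds> <max RSS in KiB>" rows from
# stdin and print each allocator's time and memory as a ratio of the glibc row.
# Rows that appear before the glibc row are skipped.
relative_to_glibc() {
  awk '
    $1 == "glibc" { t = $2; r = $3 }
    t > 0 { printf "%s\t%.2fx time\t%.2fx RSS\n", $1, $2 / t, $3 / r }'
}
```

Piping the loop output into `relative_to_glibc` shows, for the numbers above, mimalloc at about 0.59x the glibc wall time and about 1.15x the glibc peak RSS.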

chromium

% grep 'relocations' {1,2,4,8,16,32,64}.{glibc,mi,je,rp,sn}
1.glibc:0.478237 Total Scan relocations
1.mi:0.472918 Total Scan relocations
1.je:0.478159 Total Scan relocations
1.rp:0.488223 Total Scan relocations
1.sn:0.535785 Total Scan relocations
2.glibc:0.337231 Total Scan relocations
2.mi:0.273341 Total Scan relocations
2.je:0.274167 Total Scan relocations
2.rp:0.303937 Total Scan relocations
2.sn:0.305474 Total Scan relocations
4.glibc:0.185736 Total Scan relocations
4.mi:0.160852 Total Scan relocations
4.je:0.177224 Total Scan relocations
4.rp:0.178963 Total Scan relocations
4.sn:0.19295 Total Scan relocations
8.glibc:0.17925 Total Scan relocations
8.mi:0.172494 Total Scan relocations
8.je:0.166599 Total Scan relocations
8.rp:0.15678 Total Scan relocations
8.sn:0.141205 Total Scan relocations
16.glibc:0.190787 Total Scan relocations
16.mi:0.169511 Total Scan relocations
16.je:0.176115 Total Scan relocations
16.rp:0.181175 Total Scan relocations
16.sn:0.14422 Total Scan relocations
32.glibc:0.326898 Total Scan relocations
32.mi:0.238637 Total Scan relocations
32.je:0.195906 Total Scan relocations
32.rp:0.220888 Total Scan relocations
32.sn:0.18388 Total Scan relocations
64.glibc:0.305912 Total Scan relocations
64.mi:0.260754 Total Scan relocations
64.je:0.253686 Total Scan relocations
64.rp:0.278348 Total Scan relocations
64.sn:0.235382 Total Scan relocations
% for m in glibc mi je rp sn; do a=$(/usr/bin/time -f '%e\t%M' /tmp/lld/lld.$m -flavor gnu @response.txt --threads=16 2>&1); echo -e "$m\t$a"; done
glibc   6.39    7053376
mi      5.40    7540108
je      5.57    7415076
rp      5.83    7851768
sn      5.87    7444916

@mjp41

mjp41 commented Oct 13, 2022

Just a quick question. When you built snmalloc, did you set -DSNMALLOC_STATIC_LIBRARY_PREFIX= ? It defaults to creating symbols like sn_malloc. The performance numbers look more similar to glibc than I would have expected.

@MaskRay
Author

MaskRay commented Oct 26, 2022

> Just a quick question. When you built snmalloc, did you set -DSNMALLOC_STATIC_LIBRARY_PREFIX= ? It defaults to creating symbols like sn_malloc. The performance numbers look more similar to glibc than I would have expected.

Sorry, I didn't. I have now rebuilt with -DSNMALLOC_STATIC_LIBRARY_PREFIX= and rerun the tests. (I am still updating the chromium numbers.)
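
One way to catch this kind of mistake is to check whether the linked binary actually defines `malloc`, or only the prefixed `sn_malloc`. The sketch below classifies `nm` output read from stdin. `malloc_override_status` is a hypothetical helper name, and it assumes the binary is not stripped and that `nm` prints its default three-column format (address, type, name) for defined symbols.

```shell
# Hypothetical helper: classify `nm` output by which malloc symbols are defined.
# Defined symbols have three fields (address, type, name); undefined ones
# ("U malloc") have two and are therefore ignored.
malloc_override_status() {
  awk '
    NF == 3 && $2 ~ /^[TtWw]$/ && $3 == "malloc"    { plain = 1 }
    NF == 3 && $2 ~ /^[TtWw]$/ && $3 == "sn_malloc" { prefixed = 1 }
    END {
      if (plain)         print "malloc defined in binary"
      else if (prefixed) print "only sn_malloc defined; malloc not replaced"
      else               print "malloc not defined in binary"
    }'
}
```

Usage: `nm /tmp/lld/lld.sn | malloc_override_status`. If the result is "only sn_malloc defined; malloc not replaced", the binary is most likely still using glibc malloc, which would explain numbers close to the glibc ones.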
