Best Performance gains with Memory Allocation using jemalloc

Current version of tcmalloc

According to Sean, this is the memory allocator version currently in use:

   libtcmalloc-minimal4                 2.4-0ubuntu5.16.04.1
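
To confirm what is actually installed on an OSD host, a quick check (assuming the Ubuntu packaging above) is:

# List the installed tcmalloc packages and their versions
dpkg -l | grep tcmalloc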

Test Results for rbd_cache

jewel: 10.2.7

Test                        Type      IOPS
4k-rbd_cache_true           rand rw   10777/13604 (w/r iops) 4k
4k-rbd_cache_false          rand rw   10539/28141 (w/r iops) 4k
4k-rbd_writethrough_cache   rand rw   660/19887 (w/r iops) 4k

jewel: 10.2.10 (client/server)

Test                                          Type      IOPS
4k-jewel-latest-rbd_cache_true                rand rw   9009/20537 (w/r iops) 4k
4k-jewel_latest_rbd_cache_false               rand rw   10592/29959 (w/r iops) 4k
4k-jewel_latest_rbd_cache_writethrough_true   rand rw   5412/11345 (w/r iops) 4k

Test Results for jemalloc

Allocator   Version   Cache          Type      IOPS                         Spreadsheet
tcmalloc    2.4       128 MB cache   rand rw   10355/87073 (w/r iops) 4k    Spreadsheet row 18
tcmalloc    2.4       256 MB cache   rand rw   9926/30893 (w/r iops) 4k     Spreadsheet row 10
jemalloc    3.6                      rand rw   14034/30165 (w/r iops) 4k    Spreadsheet row 2

Note: These tests were all run using rbd_cache=false with attached volumes

Downsides of jemalloc

  • It is not part of the standard install.
  • According to testing done by Red Hat, jemalloc will use significantly more memory than tcmalloc: on the order of 200 MB to 300 MB more per OSD process under normal use, and about 400 MB more during recovery. In our lab tests, however, memory usage only increased by approximately 100 MB.

Test Suggestion:

Look for volume IOPS to increase, while watching for increased memory usage under both an everyday workload and a recovery.

Set up memory monitoring on the OSD boxes.

  1. Metrics to gather prior to the jemalloc change (Is rpc-maas installed on the staging environment?)

    • attached volume IOPS (we should already have IOPS and memory metrics)
    • Perform a recovery test to gather timing and memory usage (a memory-sampling sketch follows this step)
      • Define a valid recovery test with RPC support
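
A minimal sketch for capturing a per-OSD memory (RSS) baseline before and after the change; the sampling interval and the log path are arbitrary choices here, not an existing tool:

# Sample the RSS of every ceph-osd process once a minute (interval and log path are placeholders)
while true; do
    ps -C ceph-osd -o pid=,rss=,args= | awk -v ts="$(date -u +%FT%TZ)" '{print ts, $0}' >> /var/tmp/osd-rss.log
    sleep 60
done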
  2. Install/Setup jemalloc

# Install new memory allocator
sudo apt-get install libjemalloc1 libjemalloc-dev
# Uncomment "#LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1" from /etc/default/ceph
# Restart ceph services on the host
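
One possible way to script the uncomment and restart steps above (a sketch: the sed pattern assumes the commented line appears in /etc/default/ceph exactly as shown, and ceph-osd.target assumes a systemd-based Jewel install on Ubuntu 16.04):

# Uncomment the LD_PRELOAD line in /etc/default/ceph
sudo sed -i 's|^#\s*LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1|LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1|' /etc/default/ceph
# Restart the OSD daemons on this host so they pick up the new allocator
sudo systemctl restart ceph-osd.target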
  3. Verify Ceph is now using jemalloc
lsof -E | grep malloc
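
If the lsof output is noisy, another way to confirm the preload is to check the memory maps of a running OSD (a sketch; pidof ceph-osd assumes at least one OSD is running on the host):

# Look for libjemalloc among the mapped libraries of the first ceph-osd process
grep jemalloc /proc/$(pidof ceph-osd | awk '{print $1}')/maps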
  4. Volume performance increase test

    • Create several VMs and attach a volume to each using rbd_cache = none
    • Run fio to verify the performance gain, based on the documented tests you already have
    • Review the OSD memory profile.
  5. Recovery (track the time of the operation and the memory footprint)

    • Run the same recovery operation performed in step 1 above; a simple recovery exercise is sketched below.
    • Review timing and memory usage.
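
In case a formal recovery test has not yet been defined with RPC support, one simple exercise is sketched below; the OSD id is a placeholder and the exact procedure should be agreed with support before running it in staging:

# Stop one OSD and mark it out so backfill starts, then time how long the cluster takes to heal
sudo systemctl stop ceph-osd@<id>      # <id> is a placeholder for the chosen OSD number
ceph osd out <id>
start=$(date +%s)
until ceph health | grep -q HEALTH_OK; do sleep 10; done
echo "backfill after losing osd.<id> took $(( $(date +%s) - start )) seconds"
# Bring the OSD back in and repeat the timing for the recovery in the other direction
ceph osd in <id>
sudo systemctl start ceph-osd@<id>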

Notes and references

Execute fio test

fio 4k-randrw.fio | tee <situation>.txt

fio test file

$ cat 4k-randrw.fio 
[global]
bs=4k
iodepth=128
direct=1
ioengine=libaio
randrepeat=0
group_reporting
time_based
runtime=60
filesize=10G

[4k-randwrite]
rw=randwrite
stonewall
filename=<device>
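
One possible way to run the documented job against an attached volume on a test VM (a sketch; /dev/vdb and the output file name are assumptions, not values from the tests above):

# Substitute the attached volume device for <device> and run the documented job
sed 's|<device>|/dev/vdb|' 4k-randrw.fio > /tmp/4k-randrw-vdb.fio
fio /tmp/4k-randrw-vdb.fio | tee jemalloc-rbd_cache_false.txt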

Andy has submitted a pull request to ceph-ansible to add a flag to install and configure jemalloc.


Link to RPC Test data


"Ceph default packages use tcmalloc.


Red Hat presentation https://www.youtube.com/watch?v=oxixZPSTzDQ&feature=youtu.be


"For flash optimized configurations, we found jemalloc providing best possible performance without performance degradation over time."

http://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments
