“Vlad and analysis of dtrace was used”

(title from the Google translation of a Japanese blog [edit: it is a technology news site, not a blog] about Firefox memory fragmentation)

Using dtrace and some tools that we’ve built, I’ve been able to get more fragmentation data. I haven’t hooked up all the allocators yet. Vlad just made some changes that will make getting data from a bunch of the other allocators much easier, so expect more data soon.

Let’s compare the Windows standard heap, the Windows Low-Fragmentation Heap (LFH), nedmalloc and jemalloc. I’ve posted pictures from the two Windows allocators before, but here they are again:

Windows standard heap — Small and compact, but very fragmented:

[Image: Windows Standard Heap]

The Windows Low-Fragmentation Heap — bigger and less fragmented:

[Image: Windows Low-Fragmentation Heap]

nedmalloc — faster but more fragmented than the Windows LFH:

[Image: nedmalloc]

jemalloc — faster, smaller than Windows LFH, and less fragmented:

[Image: jemalloc]

So far, I’m seeing that tcmalloc (the latest version, which I’m told is slower than previous releases) and nedmalloc are both about 10% faster at pure allocations than the two Windows heaps, which are about the same speed as each other. jemalloc also looks to be about 10% faster, but I ran those tests on a different machine and need to verify my numbers before making a strong claim about its speed.

jemalloc looks to be a pretty solid contender. Jason Evans, jemalloc’s author, has been super helpful in answering lots of questions I’ve had and has done some investigating of his own. I’ll hold off declaring a winner until I’ve had time to run a few more allocators, but the data shows that we can get good wins by switching allocators. I’m also in the process of generating different sets of logs to run against the allocators so we can see how they behave under stress tests such as loading 200 tabs and then closing them all.

On a side but related note, Olli Pettay and Jonas Sicking are doing lots of great work on moving content nodes and related data into arenas.
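For readers who haven’t run into arenas before, here is a minimal sketch of the general idea in C. It is not the code Olli and Jonas are writing; the `Arena` type and the `arena_*` function names are hypothetical, made up for illustration. The point is that many small objects are carved out of one contiguous block and released together, so they cannot leave scattered holes in the general-purpose heap.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump-pointer arena; names are illustrative only. */
typedef struct {
    unsigned char *base;  /* start of the single backing block */
    size_t capacity;      /* total bytes in the block */
    size_t used;          /* bytes handed out so far */
} Arena;

/* Reserve one contiguous block. Returns 0 on success, -1 on failure. */
int arena_init(Arena *a, size_t capacity) {
    a->base = malloc(capacity);
    a->capacity = capacity;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Hand out the next `size` bytes, rounded up to a multiple of 16.
 * Returns NULL when the arena does not have enough room left. */
void *arena_alloc(Arena *a, size_t size) {
    size_t rounded = (size + 15) & ~(size_t)15;
    if (rounded < size || rounded > a->capacity - a->used) {
        return NULL;  /* overflow in rounding, or arena is full */
    }
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Release every object in the arena with a single free(). */
void arena_destroy(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

The trade-off in this sketch is that individual objects cannot be freed on their own; memory only comes back when the whole arena is destroyed, which suits data with a shared lifetime.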


14 Responses to ““Vlad and analysis of dtrace was used””

  1. Gen Kanai Says:

    Heh, actually that Japanese website is a technology news site, akin to a CNet or whatnot. What’s great is that they are bringing news about our work on fragmentation to Japanese readers.

  2. fredrik Says:

    Off-topic: what’s the deal with Planet.M.O randomly picking a word from the post and using that as the displayed title? For this entry on PMO the title was “jemalloc”. It’s been equally odd with earlier posts as well, but the feed looks fine. Weird.

  3. pavlov Says:

    fredrik: looks to be a bug in the feed parsing picking up image titles instead of the item title

  4. Ian McKellar Says:

    I wonder if all this work to make XPCOM objects GCable will allow us to move from pointers to a handle system so we can move them around. That would help fragmentation a *lot*.

  5. Ian M Says:

    One thing you didn’t mention was which of these are cross-platform or platform-specific.

  6. jimis Says:

    Before using alternative allocators on linux, you should have a look at glibc’s mallopt(). There are *many* ways to make the allocation strategy slower but more space efficient…. Its reference is in the source:
    http://sourceware.org/cgi-bin/cvsweb.cgi/libc/malloc/malloc.c?rev=1.181&content-type=text/x-cvsweb-markup&cvsroot=glibc

    And a very nice article:
    http://www.linuxjournal.com/node/6390/print

  7. jimis Says:

    I think that the slowest but most space efficient settings, set by mallopt() would be:
    M_MXFAST 0 (disables fastbins for small allocations)
    M_TRIM_THRESHOLD 0
    M_MMAP_THRESHOLD 4*1024 (uses mmap() instead of brk() for all memory allocations?)
    M_MMAP_MAX 32*1024*1024 (and uses it even for large allocations, even if mmap() is slower than brk())
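
The settings jimis lists could be applied from C roughly as follows. This is a sketch that assumes glibc (`mallopt()` and the `M_*` constants come from `<malloc.h>` and are not portable); the function name `apply_compact_malloc_settings` is hypothetical. One caveat: the glibc manual page describes `M_MMAP_MAX` as the maximum number of allocations that may be served by `mmap()` at the same time, not a size in bytes, so the value below is kept only to mirror the comment.

```c
#include <malloc.h>  /* glibc-specific: mallopt() and the M_* constants */

/* Apply the four settings from the comment above.
 * mallopt() returns 1 on success and 0 on error, so the return value
 * here is the number of settings that glibc accepted (4 if all did). */
int apply_compact_malloc_settings(void) {
    int accepted = 0;
    accepted += mallopt(M_MXFAST, 0);                 /* disable fastbins */
    accepted += mallopt(M_TRIM_THRESHOLD, 0);         /* trim the heap top eagerly */
    accepted += mallopt(M_MMAP_THRESHOLD, 4 * 1024);  /* mmap() requests of 4 KiB and up */
    accepted += mallopt(M_MMAP_MAX, 32 * 1024 * 1024);/* value copied from the comment */
    return accepted;
}
```

Calling this once at startup, before the program allocates heavily, is the usual place for such tuning; whether it actually reduces fragmentation for a given workload would need to be measured.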

  8. More allocator data — tcmalloc edition « pavlov.net Says:

    [...] pavlov.net Ramblings from the mind of Stuart Parmenter « “Vlad and analysis of dtrace was used” [...]

  9. pavlov Says:

    Ian: everything but jemalloc is mostly cross-platform currently (aside from the platform allocators, obviously). I ported jemalloc to Linux and Mac in a few hours and Windows shouldn’t take much longer.

  10. mark Says:

    Would be curious to get your port of jemalloc to Linux. Would like to try with an application that is having memory fragmentation issues.

  11. pavlov Says:

    mark: it is in cvs under mozilla/memory/jemalloc

  12. mark Says:

    excellent. i will grab it today. thanks.

  13. Firefox 3 Memory Usage « pavlov.net Says:

    [...] carefully studied the fragmentation effects of various allocators and concluded that jemalloc gave us the smallest [...]

  14. brian v Says:

    all this great effort, and still the installer is broken on server 2003. with b3, i could just keep running it a dozen times to get everything (xul.dll, etc). Now, half the dir.

    : - (
