Archive for December, 2007

Bye Bye Allocations

December 18, 2007

Everyone has been working quite hard lately to get rid of allocations.  I’d like to call out jst, sicking, smaug, dbaron, brendan, igor, and crowder for their help, patches and suggestions over the last few weeks.

Just looking at our startup allocation test, we’ve removed about 100,000 allocations.  Other tests show far, far bigger gains.

allocation graph

We’ve been very successful in converting quite a few heap allocations to use small stack buffers instead.  We’ve also found places where we didn’t need to do any allocation at all and just removed them.  Many of these improvements have come in our DOM and XPConnect code, which not only helps reduce fragmentation but also results in performance gains for sites with lots of DOM access (not to mention our UI!).  We’ve also gotten startup speed improvements.
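To make the stack-buffer idea concrete, here is a minimal C++ sketch of the general pattern: scratch space lives inline (on the stack) when the request is small and only falls back to the heap for large requests. The names SmallBuffer and IsPalindromeIgnoringSpaces are hypothetical and made up for illustration; this is not the actual Mozilla code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper (not from the Mozilla tree): scratch storage that is
// inline for requests of up to N bytes and heap-allocated only beyond that.
template <std::size_t N>
class SmallBuffer {
  static_assert(N > 0, "inline capacity must be non-zero");

public:
  explicit SmallBuffer(std::size_t size) : mSize(size) {
    if (size > N) {
      mHeap.reset(new char[size]);  // the only path that touches the heap
    }
  }
  SmallBuffer(const SmallBuffer&) = delete;
  SmallBuffer& operator=(const SmallBuffer&) = delete;

  char* data() { return mHeap ? mHeap.get() : mInline; }
  std::size_t size() const { return mSize; }
  bool onHeap() const { return mHeap != nullptr; }

private:
  std::size_t mSize;
  std::unique_ptr<char[]> mHeap;
  char mInline[N];
};

// Example caller: needs temporary space proportional to its input, but for
// typical (short) inputs it never calls the allocator at all.
bool IsPalindromeIgnoringSpaces(const char* text, std::size_t length) {
  SmallBuffer<64> scratch(length);
  char* out = scratch.data();
  std::size_t count = 0;
  for (std::size_t i = 0; i < length; ++i) {
    if (text[i] != ' ') {
      out[count++] = text[i];
    }
  }
  assert(count <= scratch.size());
  for (std::size_t i = 0; i < count / 2; ++i) {
    if (out[i] != out[count - 1 - i]) {
      return false;
    }
  }
  return true;
}
```

The inline capacity (64 here) is an arbitrary choice for the sketch; in practice it would be picked from measurements of the sizes that actually show up.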

Smaug has done some great work getting content to use arenas.  There are some issues with the arena implementation and we need to get some more stats, but this is a good step in the right direction.
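For readers who haven’t met arenas before, the following is a rough C++ sketch of a simple bump-pointer arena: many small objects are carved out of a few large chunks, and everything is released together when the arena goes away. The Arena class and its chunk size are assumptions for illustration only; the arena implementation used for content differs in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical bump-pointer arena (illustrative, not Mozilla's arena code).
class Arena {
public:
  explicit Arena(std::size_t chunkSize = 4096) : mChunkSize(chunkSize) {}
  Arena(const Arena&) = delete;
  Arena& operator=(const Arena&) = delete;

  // Returns storage aligned for any fundamental type. Individual blocks are
  // never freed; all chunks are released when the arena is destroyed.
  void* Allocate(std::size_t size) {
    const std::size_t align = alignof(std::max_align_t);
    if (size == 0) {
      size = 1;  // keep every returned pointer distinct
    }
    size = (size + align - 1) & ~(align - 1);  // round up to the alignment
    if (mChunks.empty() || size > mRemaining) {
      // Start a new chunk; oversized requests get a chunk of their own size.
      const std::size_t chunk = size > mChunkSize ? size : mChunkSize;
      std::unique_ptr<char[]> mem(new char[chunk]);
      mCursor = mem.get();
      mRemaining = chunk;
      mChunks.push_back(std::move(mem));
    }
    assert(size <= mRemaining);
    void* result = mCursor;
    mCursor += size;
    mRemaining -= size;
    return result;
  }

  std::size_t ChunkCount() const { return mChunks.size(); }

private:
  std::size_t mChunkSize;
  std::vector<std::unique_ptr<char[]>> mChunks;
  char* mCursor = nullptr;
  std::size_t mRemaining = 0;
};
```

The trade-off in this sketch is that space left at the end of a chunk is simply abandoned when a new chunk starts, and nothing is returned to the system until the whole arena is destroyed.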

We’ve got more things in the pipeline so I expect this number to continue to drop.

More allocator data — tcmalloc edition

December 6, 2007

I hooked tcmalloc up today and ran some numbers. As I mentioned before, it is pretty fast, but this image certainly doesn’t place it ahead of jemalloc. It looks to be a little less fragmented than nedmalloc, but not as fast. (I’ll note that this image isn’t quite as accurate as some of the previous ones. The sizes I’m using don’t take the allocator’s overhead into account. Including it would most likely make the gray blocks a little darker, but wouldn’t really affect the number of blocks spread out.)

tcmalloc

“Vlad and analysis of dtrace was used”

December 4, 2007

(title from the Google translation of a Japanese blog [edit: it is a technology news site, not a blog] about Firefox memory fragmentation)

Using dtrace and some tools that we’ve built I’ve been able to get more fragmentation data. I haven’t hooked up all the allocators yet — Vlad just made some changes that will make getting data from a bunch of the other allocators much easier, so expect more data soon.

Let’s compare the Windows standard heap, the Windows Low-Fragmentation Heap, nedmalloc and jemalloc. I’ve posted pictures from the two Windows allocators before, but here they are again:

Windows standard heap — Small and compact, but very fragmented:

Windows Standard Heap

The Windows Low-Fragmentation Heap — bigger and less fragmented:

Windows Low-Fragmentation Heap

nedmalloc — faster but more fragmented than the Windows LFH:

nedmalloc

jemalloc — faster, smaller than Windows LFH, and less fragmented:

jemalloc

So far, I’m seeing that tcmalloc (the latest version, which I’m told is slower than previous releases) and nedmalloc are both about 10% faster at pure allocations than the Windows heaps (which are about the same speed). jemalloc looks to be also about 10% faster but I ran my tests on a different machine and need to verify my numbers before making a strong claim about its speed.

jemalloc looks to be a pretty solid contender. Jason Evans, jemalloc’s author, has been super helpful in answering lots of questions I’ve had and has done some investigating of his own. I’ll hold off declaring a winner until I’ve had time to run with a few more allocators, but the data is showing that we can get good wins by switching allocators. I’m also in the process of generating different sets of logs to run against the allocators so we can see how they behave with stress tests such as loading 200 tabs and then closing them all.
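As a rough illustration of what running a log against an allocator can look like, here is a small C++ sketch that replays a recorded sequence of malloc/free events through whatever allocator the program is linked with. The AllocEvent record layout and the Replay function are hypothetical; the real logs and tools mentioned in these posts have their own format, which isn’t shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical log record: each block is identified by an id within the log.
struct AllocEvent {
  enum Kind { kMalloc, kFree } kind;
  unsigned id;
  std::size_t size;  // only meaningful for kMalloc
};

struct ReplayStats {
  std::size_t mallocs = 0;        // successful allocations replayed
  std::size_t frees = 0;          // frees that matched a live block
  std::size_t peakLiveBytes = 0;  // high-water mark of requested bytes
};

// Replays the events against the process allocator (std::malloc/std::free),
// so swapping in a different allocator at link time changes what is measured.
ReplayStats Replay(const std::vector<AllocEvent>& events) {
  std::unordered_map<unsigned, std::pair<void*, std::size_t>> live;
  ReplayStats stats;
  std::size_t liveBytes = 0;
  for (const AllocEvent& ev : events) {
    if (ev.kind == AllocEvent::kMalloc) {
      if (live.count(ev.id) != 0) {
        continue;  // malformed log: id already live, skip it
      }
      void* block = std::malloc(ev.size);
      if (block == nullptr) {
        continue;
      }
      live[ev.id] = std::make_pair(block, ev.size);
      liveBytes += ev.size;
      ++stats.mallocs;
      if (liveBytes > stats.peakLiveBytes) {
        stats.peakLiveBytes = liveBytes;
      }
    } else {
      auto it = live.find(ev.id);
      if (it == live.end()) {
        continue;  // free of a block the log never allocated
      }
      std::free(it->second.first);
      liveBytes -= it->second.second;
      live.erase(it);
      ++stats.frees;
    }
  }
  // Release anything the log left live so the replay itself does not leak.
  for (auto& entry : live) {
    std::free(entry.second.first);
  }
  return stats;
}
```

Timing a call to Replay, or inspecting the heap while it runs, is where per-allocator comparisons would come from; the sketch itself only tracks counts and the peak of requested bytes.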

On a side but related note, Olli Pettay and Jonas Sicking are doing lots of great work on moving content nodes and related data into arenas.