Windows Low Fragmentation Heap
Be sure that you’ve read my previous blog post about memory fragmentation before reading this.
I previously stated that I had tried to use the Windows Low-fragmentation Heap and hadn’t seen much difference. I realized that I wasn’t setting it on all the heaps.
I’ve added the following code to the top of main():
HANDLE heaps[1025];
DWORD nheaps = GetProcessHeaps(1024, heaps);
for (DWORD i = 0; i < nheaps; i++) {
ULONG HeapFragValue = 2;
HeapSetInformation(heaps[i],
HeapCompatibilityInformation,
&HeapFragValue,
sizeof(HeapFragValue));
}
What do I see? Well, not quite what I expected.
After startup, loading about:blank:

This represents:
total: 19,247,554 bytes
used: 13,381,970 bytes
free: 5,865,584 bytes
This looks like about 1mb in used blocks more than without the LFH turned on. Notice though that we have 4mb more free blocks than with LFH off.
I repeated the same steps I followed in my last blog post. After loading Tripadvisor, clicking check rates, waiting for pages to load, closing them all and then loading about:blank. After clearing caches, I see:

This represents:
total: 54,253,757 bytes
used: 26,337,381 bytes
free: 27,916,376 bytes
Whoa! What happened here? We’re now using 14mb when we loaded the browser and have 27mb of free blocks on our heap! While we are less fragmented than without the low fragmentation heap, we’re using quite a bit more memory. There seems to be a lot of overhead to using the low fragmentation heap. Being far less fragmented means that future allocations will be a lot easier and we won’t have to keep putting them at the end.
“What does this mean after running for a while?”
A fine question, really. It certainly means that early on you’re going to be using a lot more memory. However, over time you may end up using less memory as you will be less fragmented and thus new allocations as you load new pages should be able to fill in to the free space more easily. From my quick testing, loading lots of pages, gmail, etc we don’t seem to grow much than we are at this state.
“If it doesn’t grow much more that sounds awesome, how do I try it?”
Watch this space. I’ll upload a build that people can test with and report back their results. If we really don’t grow much more than the initial growth from loading lots of pages at once, this might be worth turning on.
“I hate Windows, what about Mac and Linux”
If this does turn out to being a good thing on Windows, it certainly doesn’t solve all our problems. There is lots of work that we need to do to continue to reduce fragmentation. As you can see, there is still some fragmentation going on and we need to work to reduce it. I’m a bit scared of the amount of extra space we’re using with this enabled and would like to better understand what is going on. We’re looking at using cross-platform allocators, pools and arenas, etc. More on those results soon…
Tags: firefox, leaks, memory fragmentation, Mozilla, performance, Technology, windows
November 12, 2007 at 12:59 am
Keep up the good work. It’s well past high time that someone looked at the memory issue in this fashion
November 12, 2007 at 2:30 am
Obligatory Question:
Will this work have influence on the gecko 1.9 release?
November 12, 2007 at 2:34 am
Obligatory Question:
Will this work have influence on the gecko 1.9 release?
November 12, 2007 at 4:26 am
I believe removing fragmenting is hard, and on the long term, only acting on the root cause can really help.
Acting on the root cause means to stop allocating memory blocks that have a long lifetime.
Now that is hard because everytime you keep a memory of what has happened since the browser started, you allocate such a block.
But if it’s just too hard to do, there’s another easier approach. If you go the GC way, it becomes foreseeable to go even further and do unfragmenting through memory reallocation ? Even if this is impossible to do for every type of allocation, once the more damaging one are identified, it could be possible to make them run through a memory reallocation friendly allocator ?
November 12, 2007 at 7:27 am
How difficult would it be to hook up a moving garbage collector? Isn’t there already a garbage collector in place for debugging to find memory leaks?
November 12, 2007 at 8:23 am
This is very interesting. Keep us all posted on progress, and future memory mysteries you encounter!
November 12, 2007 at 8:42 am
The documentation for the Windows Low-fragmentation Heap mentions that the feature was added for Windows XP and 2003. Does this imply that it is not supported by Windows 2000? If not, would a separate build be required for Win2000 or could it just be a conditional feature within the same build?
November 12, 2007 at 11:43 am
Aaron: We would just have to check OS version before calling it. Depending on if the function existed at all in W2k we might have to dynamically load the symbol. Both of these things are super easy and wouldn’t require a separate build. You just wouldn’t get the benefit (if there is one) on Win2k.
November 13, 2007 at 12:47 am
This is probably a really stupid question, but one which I’ve been wondering about the past day.
If we have so many free pages, why doesn’t the OS happily reuse them for other apps?
Why is the “total memory usage” of an application seemingly determined by where the *first* used page is, until the *last* free pages, not accounting for the free blocks in between that the OS may reuse however it wishes?
Sincerely,
H
November 13, 2007 at 2:37 am
Hello, I had the same problem: I needed to load millions of street names (as std::string) to memory at once. Albeit size of combined strings is some 100 MB, program easily used all of my 1GB of RAM and crashed. Therefore, I created simple class “StringContainer” (300 lines of C++ code) that stores all strings as one huge string plus additional binary tree for fast access. You can StrHandle h = container.put(std:string) and later use this handle to get string back std::string s = container.get(h). Now all strings use as little memory as combined size of all strings (also, multiple identical strings are stored only once!). Get method is using special binary tree to retrieve string quickly, but tree nodes are stored in std::vector, so tree itself also occupy single chunk of memory! Removing single string from this container is not possible, but it is not problem in way I am using it. For example you can setup one container for each HTML page or each Java Script so container lives as long as its domain is alive. If you want this class, send me your mail to mirza@seznam.cz.
November 13, 2007 at 8:00 am
Good work, Stuart.
This could help a lot of other projects (longer running processes in non-GC environments), so please keep your analysis abstract enough that things != FF could profit too.
November 13, 2007 at 3:07 pm
Use Doug Lea’s malloc. It’s far better than Windows Low Frag heap.
http://g.oswego.edu/
November 13, 2007 at 3:16 pm
take a look at these papers on pools and fragmentation:
http://slashdot.org/comments.pl?sid=120623&cid=10160124
http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf
November 13, 2007 at 3:28 pm
Is there any facility to tag memory allocations with info from the call stack so you see where they came from?
combining something like that with these pretty heapmap images so you could hover and see what caused the allocation would help shed light on the issue.
There is the possibilty of heap defragmentation ofr any objects that allocate memory for their own use and don’t pass pointers to others. Only one reference would have to be maintained. You’d need to have those objects select an alternate allocation system. or just add a ReallocatePrivateMemory method to base objects.
The lifetime hint comment by max in the previous blogpost is another good idea.
both of those will involve delving into the actual allocating code, but I suspect the quick fix of an alternate heap manager won’t be the best solution.
November 13, 2007 at 4:02 pm
I’ve posted some Windows Minefield beta 1 builds on my blog with the low-fragmentation heap enabled. Please try them out and give feedback on that post.
November 14, 2007 at 8:04 am
Did you try to see if _heapmin() had any effect? I made some experiments on OS/2 last night, adding that to the cycle collector and it had at least a bit of an effect.
I also experimented with OS/2 system APIs (DosAllocMem/DosFreeMem) instead of malloc for allocations of large buffers (e.g. the pixel buffers in cairo). This basically allocates memory outside the C heaps and when freed gets released to the system immediately. Made a huge difference! Not sure if there is something like that on the Win or the other platforms.
November 14, 2007 at 8:56 am
Peter: I’ve tried using HeapCompact() but I don’t see a lot of big wins from it. I should probably do some more tests though.
November 21, 2007 at 1:22 pm
[...] such as tcmalloc, nedmalloc, Hoard, and jemalloc. Also native platform specific ones such as the Windows low-fragmentation heap. I’m having to dig in to some of their internals to pull out the data that we need which is [...]
December 4, 2007 at 2:41 pm
[...] compare the Windows standard heap, the Windows Low-Fragmentation Heap, nedmalloc and jemalloc. I’ve posted pictures from the two Windows allocators before, but [...]