Archive for the ‘Mozilla’ Category

New contributors + Big Patches + HG = ?

June 5, 2008

We’ve seen tremendous interest in Firefox and the Mozilla platform, not just from consumers, but also from groups of developers that would like to build on top of and contribute to Mozilla itself.  One of the challenges that these groups often face is that if their work is any more extensive than a simple patch, it’s difficult for them to effectively publish their work and to collaborate with others during development.

Transitioning to a distributed version control system like Mercurial has helped this situation some; branching is easy, as is merging back in to the mainline.  But even with that, these developers would still be isolated, working within essentially their own private repository.

We’d like to make it easy for these people to give their work wider exposure within the community, without having to make a decision up front as to whether the work will be included in mozilla-central or not.

Our rules for giving new committers access to the main repository don’t work well for groups with large changes. We’d like to come up with a different process whereby these people would still have to go through the same effort as other contributors to become full “Mozilla contributors” but, in the meantime, can make their work available and collaborate with others.

I’ve been working with Mitchell and Brendan on coming up with a policy that allows people to more easily work together in cases such as these.  Mitchell will be posting what we’ve come up with shortly.

Firefox 3 Memory Usage

March 11, 2008

As the web and web browsers have matured, people have started expecting different things out of them. When we first released Firefox, few people were browsing with tabs or add-ons. I’ve written before about how web usage patterns have changed; so too have our strategies for making effective use of system resources such as memory.

While Firefox 2 used less memory than its predecessor, Firefox 1.5, we intentionally restricted the number of changes to the Gecko platform on which Firefox was built (Gecko 1.8.1 was only slightly different from Gecko 1.8). However, while the majority of people were working on Firefox 2 / Gecko 1.8.1, others of us were already ripping into the platform that Firefox 3 was to be built on: Gecko 1.9.

We’ve made more significant changes to the platform than I can count, including many to reduce our memory footprint. The result has been dramatic, and you can see for yourself by getting a copy of the recently released Firefox 3 Beta 4.

Here’s What We’ve Done:

Reduced Memory fragmentation

As I’ve written about before, long running applications such as ours can wind up wasting a lot of space due to memory fragmentation. This can occur as a result of mixing lots of various sized allocations and can leave a lot of small holes in memory that are hard to reuse.
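To make the "small holes" problem concrete, here is a toy sketch in Python. It is not Firefox or jemalloc code: the 32-cell heap, the first-fit policy and the block sizes are all assumptions picked for illustration. It interleaves small and large allocations, frees only the small ones, and then fails to place a medium-sized block even though enough total space is free.

```python
def first_fit(heap, size):
    """Return the start index of the first run of `size` free cells, or None."""
    run = 0
    for i, cell in enumerate(heap):
        run = run + 1 if cell is None else 0
        if run == size:
            return i - size + 1
    return None


def allocate(heap, size, tag):
    """Mark `size` contiguous cells as used; return the start index or None."""
    start = first_fit(heap, size)
    if start is None:
        return None
    heap[start:start + size] = [tag] * size
    return start


def free(heap, start, size):
    """Mark `size` cells starting at `start` as free again."""
    heap[start:start + size] = [None] * size


heap = [None] * 32  # hypothetical heap of 32 equal-sized cells

# Interleave 1-cell and 3-cell allocations until the heap is full.
blocks = []
for _ in range(8):
    blocks.append((allocate(heap, 1, "s"), 1))
    blocks.append((allocate(heap, 3, "L"), 3))

# Free only the small blocks, leaving eight isolated one-cell holes.
for start, size in blocks:
    if size == 1:
        free(heap, start, size)

free_cells = heap.count(None)   # total free space
big = allocate(heap, 4, "X")    # needs 4 contiguous cells
```

Here `free_cells` is 8, yet the 4-cell request fails because no two free cells are adjacent. Real allocators are far more sophisticated, but the effect is the same in kind.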

One of the things we did to help was to minimize the total number of allocations we make, to avoid unnecessarily churning memory. We’ve managed to reduce allocations in almost all areas of our code base. The graph below shows the number of allocations we do during startup: we were able to get rid of over 1/3 of them! Olli Pettay, Jonas Sicking, Johnny Stenback, and Dan Witte all made a big difference here.

alloccount.png

I carefully studied the fragmentation effects of various allocators and concluded that jemalloc gave us the smallest amount of fragmentation after running for a long period of time. I’ve worked closely with the jemalloc author, Jason Evans, to port and tune jemalloc for our platforms. It was a huge effort resulting in Jason doubling the number of lines of code in jemalloc over a 2 month period, but the results paid off. As of beta 4 we now use jemalloc on both Windows and Linux. Our automated tests on Windows Vista showed a 22% drop in memory usage when we turned jemalloc on.

Fixed cycles with the Cycle collector

Some leaks are harder to fix than others. One of the most difficult ones is where two objects have references to each other, holding each other alive. This is called a cycle, and cycles are bad. In previous versions, we’ve used very complex and annoying code to manually break cycles at the right times, but getting the code right and maintaining it always proved to be difficult. For Gecko 1.9, we’ve implemented an automated cycle collector that can recognize cycles in the in-memory object graph and break them automatically. This is great for our code as we can get rid of lots of complexity. It is especially significant for extensions, which can often inadvertently introduce cycles without knowing it because they have access to all of Firefox’s internals. It isn’t reasonable to expect all those authors to write code to manually break the cycles themselves.

Basically, the cycle collector means there are whole classes of leak that we can easily avoid in both our code and in extensions, and that’s good for everyone. You can thank Graydon Hoare, Peter Van der Beken and David Baron for their amazing hard work on this.
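For readers who haven’t run into cycles before, here is a small analogy in Python. It uses CPython’s own garbage collector, not Gecko’s cycle collector, and the `Node` class is made up for the example. Two objects point at each other, so reference counting alone never frees them; a collector that walks the object graph does.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can hold a reference to another Node."""

    def __init__(self, name):
        self.name = name
        self.other = None


gc.disable()  # keep the automatic collector from running mid-demo

a = Node("a")
b = Node("b")
a.other = b   # a holds b alive
b.other = a   # b holds a alive: a cycle

probe = weakref.ref(a)  # lets us observe a without keeping it alive
del a, b                # drop our own references

# Under CPython's reference counting the pair is still alive here,
# because each object still holds a reference to the other.
alive_before = probe() is not None

gc.collect()            # the cycle detector finds and frees the pair
alive_after = probe() is not None

gc.enable()
```

This assumes CPython, where reference counting is the primary mechanism; other Python implementations may free the pair at a different moment.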

Tuned our caches

Firefox uses various in-memory caches to keep performance up including a memory cache for images, a back/forward cache to speed up back and forward navigation, a font cache to improve text rendering speed, and others. We’ve taken a look at how much they cache and how long they cache it for. In many cases we’ve added expiration policies to our caches which give performance benefits in the most important cases, but don’t eat up memory forever.

We now expire cached documents in the back/forward cache after 30 minutes since you likely won’t be going back to them anytime soon. We have timer based font caches as well as caches of computed text metrics that are very short lived.
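A time-based expiration policy like this can be sketched in a few lines of Python. This is an illustration only, not how the back/forward cache is implemented: the `ExpiringCache` class, its injectable clock, and the choice to restart the timer on every access are all assumptions.

```python
import time


class ExpiringCache:
    """Toy cache whose entries expire after `ttl_seconds` without use."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, time of last use)

    def put(self, key, value):
        self.entries[key] = (value, self.clock())

    def get(self, key):
        self.expire()
        if key not in self.entries:
            return None
        value, _ = self.entries[key]
        self.entries[key] = (value, self.clock())  # restart the timer
        return value

    def expire(self):
        """Drop every entry that has been idle for at least the TTL."""
        now = self.clock()
        stale = [k for k, (_, used) in self.entries.items()
                 if now - used >= self.ttl]
        for k in stale:
            del self.entries[k]


# Drive the cache with a fake clock so the example runs instantly.
now = [0.0]
cache = ExpiringCache(ttl_seconds=30 * 60, clock=lambda: now[0])
cache.put("page-1", "<cached document>")

now[0] = 29 * 60             # 29 minutes later: still cached
hit = cache.get("page-1")

now[0] += 30 * 60            # another 30 idle minutes: expired
miss = cache.get("page-1")
```

The fake clock is only there to make the behaviour easy to check; with the default `time.monotonic` the same class works in real time.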

We also throw away our uncompressed image data as I describe below…

Adjusted how we store image data

Another big change we’ve made in Firefox 3 is improving how we keep image data around.

Images on the web come in a compressed format (GIF, JPEG, PNG, etc). When we load images we uncompress them and keep the full uncompressed image in memory. In Firefox 2 we would keep these around even if the image was just sitting on a tab that you hadn’t looked at in hours. In Firefox 3, thanks to some work by Federico Mena-Quintero (of GNOME fame), we now throw away the uncompressed data after it hasn’t been used for a short while. Not only does this affect images that are on pages in background tabs but also ones that are in the memory cache that might not be attached to a document. This results in pretty dramatic memory reduction for images that aren’t on the page you’re actively looking at. If you have a 100KB JPEG image which uncompresses to several megabytes, you won’t be charged with the uncompressed size when you’re not viewing it.
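The keep-the-compressed-bytes, drop-the-decoded-pixels idea can be sketched roughly as follows. This is Python rather than Gecko code; zlib stands in for a real image decoder, and the `DiscardableImage` class, its idle limit and its clock are hypothetical names chosen for the example.

```python
import zlib


class DiscardableImage:
    """Toy image that can drop its decoded data and re-decode on demand."""

    def __init__(self, compressed, idle_limit, clock):
        self.compressed = compressed   # always kept
        self.idle_limit = idle_limit   # seconds of idleness before discard
        self.clock = clock
        self.decoded = None            # decoded data, present only while in use
        self.last_used = None

    def pixels(self):
        """Return decoded data, decoding again if it was discarded."""
        if self.decoded is None:
            self.decoded = zlib.decompress(self.compressed)
        self.last_used = self.clock()
        return self.decoded

    def discard_if_idle(self):
        """Throw away decoded data that has not been used recently."""
        if (self.decoded is not None
                and self.clock() - self.last_used >= self.idle_limit):
            self.decoded = None

    def bytes_held(self):
        decoded_size = len(self.decoded) if self.decoded else 0
        return len(self.compressed) + decoded_size


tick = [0]
raw = bytes(1_000_000)  # stand-in for about 1MB of decoded pixel data
img = DiscardableImage(zlib.compress(raw), idle_limit=10,
                       clock=lambda: tick[0])

img.pixels()                     # the image is drawn: decoded data is held
held_active = img.bytes_held()

tick[0] = 60                     # a minute passes in a background tab
img.discard_if_idle()
held_idle = img.bytes_held()     # only the compressed bytes remain
```

The cost of this trade is a re-decode the next time the image is needed, which is why it only makes sense for images that have gone unused for a while.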

Another fantastic change from Alfred Kayser changed the way we store animated GIFs so that they take up a lot less memory. We now store the animated frames as 8-bit data along with a palette rather than storing them as 32 bits per pixel. These savings can be huge for large animations. One extreme example from the bug showed us drop from using 368MB down to 108MB, a savings of 260MB!
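A quick back-of-the-envelope calculation shows why palettized frames help so much. The dimensions, frame count and the 256-entry, 4-bytes-per-entry palette below are assumptions for illustration, not figures from the bug.

```python
def rgba_bytes(width, height, frames):
    """Size of an animation stored at 32 bits (4 bytes) per pixel."""
    return width * height * 4 * frames


def paletted_bytes(width, height, frames,
                   palette_entries=256, bytes_per_entry=4):
    """Size with 1 byte per pixel plus one palette per frame (assumed layout)."""
    per_frame = width * height + palette_entries * bytes_per_entry
    return per_frame * frames


w, h, n = 640, 480, 100        # hypothetical animation
before = rgba_bytes(w, h, n)   # 122,880,000 bytes
after = paletted_bytes(w, h, n)  # 30,822,400 bytes
ratio = before / after         # just under 4x
```

For the pixel data alone the reduction approaches a factor of four, and the palette overhead is small unless the frames are tiny.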

Hunted down leaks

Most leaks are a pain in the ass to find and fix in any complex piece of software. There are small leaks, big leaks, and in-between leaks. If you leak a small piece of text once an hour you probably won’t notice. If you leak a large image every time you move the cursor, you’ve got a big problem. Both are important to fix, because even the little ones add up. Some leaks are only leaks until you leave a page, so they don’t show up with conventional leak-finding tools, but they make a difference if you have a page opened all day long like GMail.

Leak Hunt

Ben Turner has gotten pretty good at Leak Hunt.

We’ve fixed many leaks, ranging from small DOM objects that get leaked on GMail until you leave the site to entire windows that were leaked holding on to everything inside of them when you closed them.

Overall, we’ve been able to close over 400 leak bugs so far, most of which are very uncommon, but can still occur. We’ve greatly improved our tools for detecting leaks. Carsten Book, in particular, has done an amazing job at finding and reporting leaks.

Measuring Memory Use

As I’ve learned the hard way, accurately measuring memory usage is hard.

This part gets a bit technical, so feel free to skip it. The short summary is that Windows Vista (Commit Size) and Linux (RSS) provide pretty accurate memory measurement numbers, while Windows XP and Mac OS X do not.

If you’re running Windows Vista and take a look at Commit Size in Task Manager, you should get some pretty accurate memory numbers. If you’re looking at Memory Usage under Windows XP, your numbers aren’t going to be so great. The reason: Microsoft changed the meaning of “private bytes” between XP and Vista (for the better). On XP the number is the amount of virtual memory your application has reserved for use. For performance reasons you often want to reserve more memory than you actually use. The application can tell the operating system that it isn’t going to use parts of the reserved space and to not back the virtual space with physical space. On Vista, Private Bytes is the commit size, which only counts the memory the application has actually said it is actively using. Since virtual memory size has to be greater than or equal to your commit size, XP memory numbers will always appear bigger than Vista ones, even though the application is using the same amount of memory.
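The difference between the two numbers can be shown with a toy model. This is not the Windows API: the `AddressSpace` class and the 4KB page size are assumptions, and it only mirrors the reserve/commit distinction as described above.

```python
class AddressSpace:
    """Toy model of reserved versus committed pages."""

    PAGE = 4096  # assumed page size in bytes

    def __init__(self):
        self.reserved = set()
        self.committed = set()

    def reserve(self, first_page, count):
        """Claim address range without asking for backing memory."""
        self.reserved.update(range(first_page, first_page + count))

    def commit(self, first_page, count):
        """Mark reserved pages as actually in use."""
        pages = set(range(first_page, first_page + count))
        if not pages <= self.reserved:
            raise ValueError("can only commit pages that are reserved")
        self.committed |= pages

    def decommit(self, first_page, count):
        """Tell the OS these pages are no longer needed; keep them reserved."""
        self.committed -= set(range(first_page, first_page + count))

    def virtual_size(self):
        return len(self.reserved) * self.PAGE

    def commit_size(self):
        return len(self.committed) * self.PAGE


space = AddressSpace()
space.reserve(0, 256)    # reserve 1MB of address range up front
space.commit(0, 64)      # actually use 256KB of it
space.decommit(32, 32)   # hand half of that back, keeping the range

reserved_bytes = space.virtual_size()   # what a reserved-size counter sees
committed_bytes = space.commit_size()   # what a commit-size counter sees
```

In this model the reserved figure stays at 1MB no matter how little is in use, which is the kind of inflation the XP number suffers from.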

On Mac, if you look at Activity Monitor it will look like we’re using more memory than we actually are. Mac OS X has a similar, but different, problem to Windows XP. After extensive testing and confirmation from Apple employees we realized that there was no way for an allocator to give unused pages of memory back while keeping the address range reserved. (You can unmap them and remap them, but that causes some race conditions and isn’t as performant.) There are APIs that claim to do it (both madvise() and msync()) but they don’t actually do anything. It does appear that pages mapped in that haven’t been written to won’t be accounted for in memory stats, but once you’ve written to them they’re going to show as taking up space until you unmap them. Since allocators will reuse space, you generally won’t have that many pages mapped in that haven’t been written to. Our application can and will reuse the free pages, so you should see Firefox hit a peak number and generally not grow a lot higher than that.

Linux seems to do a pretty good job of reporting memory usage. It supports madvise(), allowing us to tell Linux about pages we don’t need, and so its resident set size numbers are fairly accurate. You can use ps or top to measure RSS.
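If you would rather script the measurement than eyeball top, one option on Linux is to read the VmRSS line from /proc/<pid>/status. The sketch below is in Python; the helper names are made up, and the sample text is fabricated purely to show the parsing, not a real measurement.

```python
import os


def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def current_rss_kb(pid="self"):
    """Return the RSS of a process in kB, or None where /proc is unavailable."""
    path = f"/proc/{pid}/status"
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return parse_vm_rss_kb(f.read())


# Made-up sample in the same layout as a real status file.
sample = "Name:\tfirefox\nVmSize:\t  812345 kB\nVmRSS:\t   98765 kB\n"
rss = parse_vm_rss_kb(sample)
```

Calling `current_rss_kb()` in a loop while a test runs gives a simple memory-over-time log on Linux; on other platforms it just returns None.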

Ways to test

There are many ways to measure memory usage in a browser. Open up 10 tabs with your favorite websites in them and see how much memory the browser is using. Close all but the last tab and load about:blank or Google. Measure again. Another simple test is simply loading Zimbra, Google Reader and Zoho each in their own tab and logging in. We’ve learned that users do so many things with the browser it is nearly impossible to construct a single test to measure memory usage.

We wanted more of a stress test, one that was more reproducible than loading random sites from the web. We took our Standalone Talos framework, and Mike Schroepfer modified it to cycle pages through a set of windows while opening and closing them to try and approximate people running for a long period of time. Talos makes it pretty straightforward to get this up and running, and is great for measuring things like memory usage and layout speed. This works great for Firefox and allows measuring performance and other metrics, but the page cycling code doesn’t work with other browsers.

Since we wanted to compare browsers, we modified the tests to run cross-browser and wired up some of our Talos code that uses the Windows Performance Counters to measure Private Bytes (commit size on Vista).

For the results below we loaded 29 different web pages through 30 windows over 11 cycles (319 total page loads), always opening a new window for each page load (closing the oldest window alive once we hit 30 windows). At the end we close all the windows but one and let the browser sit for a few minutes to see if it will reclaim memory, clear short-term caches, etc. There is a 3 second delay between page loads to try and get all the browsers to take the same amount of time. We used the proxy server that is part of Standalone Talos to make sure we were serving up the same content. We had to disable popup blocking to allow the test window to open the 30 windows for running the test. You can get the simple webpage test here and the Python script to monitor memory usage here. These things are built on top of the Standalone Talos framework so you’ll need to drop the Python script in with Talos to get good results. Mad props to Mike Schroepfer for getting this all working.
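The shape of that test loop is easy to sanity-check with a small simulation. This Python sketch is not the Talos code; it assumes the oldest window is closed as soon as a new one would push the count past 30, and it only tracks window bookkeeping, not real page loads.

```python
from collections import deque


def run_cycles(pages, cycles, max_windows):
    """Simulate opening one window per page load, capped at max_windows."""
    open_windows = deque()
    loads = closed = peak = 0
    for _ in range(cycles):
        for page in pages:
            open_windows.append(page)       # new window for every load
            loads += 1
            if len(open_windows) > max_windows:
                open_windows.popleft()      # close the oldest window
                closed += 1
            peak = max(peak, len(open_windows))
    # End of run: close everything except one window.
    while len(open_windows) > 1:
        open_windows.popleft()
        closed += 1
    return loads, peak, closed, len(open_windows)


pages = [f"page-{i}" for i in range(29)]   # placeholder page names
loads, peak, closed, remaining = run_cycles(pages, cycles=11, max_windows=30)
delay_seconds = loads * 3                  # 3 second delay per page load
```

With 29 pages and 11 cycles this gives the 319 page loads mentioned above, and the delays alone add up to about 16 minutes per browser.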

Results

ff3-ff2-ie7.png

Looking at the graph:

  • All browsers increase in memory use slightly over time, but the Firefox 3 slope is closer to 0.
  • The _peak_ of Firefox 3 is lower than the terminal size of Firefox 2!
  • The terminal state of Firefox 3 is nearly 140MB smaller than Firefox 2. 60% less memory!
  • IE7 doesn’t appear to give any memory back, even after all the windows are closed!
  • Firefox 3 ends up about 400MB smaller than IE7 at the end of the test!

This is just one test that I feel shows the great progress that has been made. We’ll continue working on adding additional tests that can measure more of the ways that users use their browser.

Conclusion

Our work has paid off.

We’re significantly smaller than previous versions of Firefox and other browsers.

You can keep the browser open for much longer using much less memory.

Extensions are much less likely to cause leaks.

We’ve got automated tools in place to detect leaks that might result from new code. We’re always monitoring and testing to make sure we’re moving in the right direction.

All of this has been done while dramatically improving performance.

Thanks

Many people have worked on this but I’d like to specifically thank: David Baron, Carsten Book, Peter Van der Beken, Igor Bukanov, Brendan Eich, Jason Evans, Alfred Kayser, Federico Mena-Quintero, Robert O’Callahan, Olli Pettay, Mike Schroepfer, Mike Shaver, Jonas Sicking, Johnny Stenback, Ben Turner, Vladimir Vukicevic, Dan Witte, Boris Zbarsky, and everyone else I’m forgetting who has worked on this. Everyone really pulled together to make this happen.

jemalloc on trunk — linux edition

February 27, 2008

Back on the 12th of February we turned jemalloc on in our Linux builds.  Sorry for not posting sooner!  We saw a good performance increase and a drop in memory.  Neither was as large as the wins we saw on Windows but still good.  I tried tuning the glibc allocator a bit but was mostly unsuccessful at making it both faster and use less memory at the same time.  Also, over time the fragmentation results just weren’t quite what we were hoping for.

Things in the memory world are looking pretty great.  Our work seems to have paid off.  There is always more to do but Firefox 3 beta 4 will be great.

jemalloc now on the trunk

February 5, 2008

Our Windows nightlies (beta4pre, this is not in beta 3) now include jemalloc. These builds are leaps and bounds better than the last build I posted.

Tons of amazing work has gone in to this. I’d like to thank Jason for making all the crazy changes to jemalloc that we wanted and Ted for his days and days of crazy build stuff. Thanks to Benjamin for his work getting the CRT building initially — we wouldn’t be here without it. We’ve worked day and night for weeks to make this happen and it is finally here.

Due to the requirement for you to have Microsoft Visual Studio 2005 SP1 Professional, --enable-jemalloc is off by default in configure. If you have the right stuff installed, toss it in your mozconfig and magic should happen.

We’re still evaluating the switch on Mac and Linux, but you can use the same configure flag to build on those platforms.

APNG

January 24, 2008

I received word recently from Brendan Sera-Shriar at Seneca College about this APNG portal site that he, along with folks from PHUG, got off the ground recently.  The site looks great and should help provide a wealth of information about a great new feature in Firefox 3 (and Opera 9.5 and other products to follow…).

They’ve got several cool samples up and I’m sure they’d love to add more if you’ve got them.

We seem to have introduced some flashing bug while the animations load.  I filed bug 413933.  Should get fixed soon I hope!

3d animating dolphin

jemalloc builds

January 12, 2008

Since just before the holidays, Jason and I have been working on getting jemalloc ported to Windows, Mac and Linux as well as integrated in to our build system.  Each platform has its own set of challenges, with Windows being the most exciting.  I tried several approaches, including dynamically patching over the C runtime allocation functions (malloc, free, etc.) in memory, but kept running in to issues with the allocations the CRT does during startup: functions like putenv() realloc and free allocations that were made before we had the chance to patch in.  I talked to Benjamin about replacing the CRT and he did a bunch of work to get a CRT building with jemalloc.  It turned out not to be very difficult, since Microsoft includes the source as part of Visual Studio Professional (sadly, you can’t redistribute it).  I had to make various changes to the init functions to make sure we could use CRITICAL_SECTIONs and thread local storage during malloc initialization.  It was a bit of a pain to make sure the malloc initialization code didn’t call things that needed to allocate, but eventually I got everything working.  The result: a Windows jemalloc build.  The SunSpider JS test looks to run about 5% faster.  You’ll notice that these builds take up a bit more memory initially but they should level out around 80MB.  There is plenty of tuning left to do, but I’m curious to hear how these run for people after running for a long time.

Linux builds are stable but I don’t have one handy.  Mac builds run for a little while before they crash, but the crash looks like one I fixed on Windows so I expect to have those early next week.

Bye Bye Allocations

December 18, 2007

Everyone has been working quite hard lately to get rid of allocations.  I’d like to call out jst, sicking, smaug, dbaron, brendan, igor, and crowder for their help, patches and suggestions over the last few weeks.

Just looking at our startup allocation test, we’ve removed about 100,000 allocations.  Other tests show far, far bigger gains.

allocation graph

We’ve been very successful in converting quite a few heap allocations to use small stack buffers instead.  We’ve found places where we didn’t need to do any allocation at all and just removed them.  Many of these improvements have come in our DOM and XPConnect code, which not only helps reduce fragmentation but also results in performance gains for sites with lots of DOM access (not to mention our UI!).  We’ve also gotten startup speed improvements.

Smaug has done some great work getting content to use arenas.  There are some issues with the arena implementation and we need to get some more stats, but this is a good step in the right direction.

We’ve got more things in the pipeline so I expect this number to continue to drop.

More allocator data — tcmalloc edition

December 6, 2007

I hooked tcmalloc up today and ran some numbers. As I mentioned before it is pretty fast, but looking at this image certainly doesn’t place it ahead of jemalloc. It looks to be a little less fragmented than nedmalloc, but not as fast. (I’ll note that this image isn’t quite as accurate as some of the previous ones. The sizes I’m using don’t take the allocator’s overhead in to account, which would most likely result in gray blocks becoming a little darker, but wouldn’t really affect the number of blocks spread out.)

tcmalloc

“Vlad and analysis of dtrace was used”

December 4, 2007

(title from the Google translation of a Japanese blog [edit: it is a technology news site, not a blog] about Firefox memory fragmentation)

Using dtrace and some tools that we’ve built I’ve been able to get more fragmentation data. I haven’t hooked up all the allocators yet — Vlad just made some changes that will make getting data from a bunch of the other allocators much easier, so expect more data soon.

Let’s compare the Windows standard heap, the Windows Low-Fragmentation Heap, nedmalloc and jemalloc. I’ve posted pictures from the two Windows allocators before, but here they are again:

Windows standard heap — Small and compact, but very fragmented:

Windows Standard Heap

The Windows Low-Fragmentation Heap — bigger and less fragmented:

Windows Low-Fragmentation Heap

nedmalloc — faster but more fragmented than the Windows LFH:

nedmalloc

jemalloc — faster, smaller than Windows LFH, and less fragmented:

jemalloc

So far, I’m seeing that tcmalloc (the latest version, which I’m told is slower than previous releases) and nedmalloc are both about 10% faster at pure allocations than the Windows heaps (which are about the same speed). jemalloc looks to be also about 10% faster but I ran my tests on a different machine and need to verify my numbers before making a strong claim about its speed.

jemalloc looks to be a pretty solid contender. Jason Evans, jemalloc’s author, has been super helpful in answering lots of questions I’ve had and has done some investigating of his own. I’ll hold off declaring a winner until I’ve had time to run with a few more allocators, but the data is showing that we can get good wins by switching allocators. I’m also in the process of generating different sets of logs to run against the allocators so we can see how they behave with stress tests such as loading 200 tabs and then closing them all.

On a side but related note, Olli Pettay and Jonas Sicking are doing lots of great work on moving content nodes and related data in to arenas.

malloc replacements?

November 21, 2007

We’ve built some great tools lately including one to test fragmentation of different allocators.  I’m currently in the process of hooking up allocators such as tcmalloc, nedmalloc, Hoard, and jemalloc, as well as native platform-specific ones such as the Windows low-fragmentation heap. I’m having to dig in to some of their internals to pull out the data that we need, which is taking a bit of time, but things are progressing well.

If anyone knows of other allocators we should be looking at, would you please leave a comment?  I would like to make sure we’re comparing all of our options.