There are many things that can fall into the broad category of “performance.” The flavor of the month, as of late, has been “JavaScript performance,” perhaps because it’s so easy to get numbers from various test suites that can be used to compare browsers. The major advances that browsers have been making in the last few years enable a whole class of web apps that were previously unthinkable.

But we care about many other things that could fall into the performance bucket, like the time it takes the browser to start up, which has an obvious impact on the user’s experience, especially on mobile, where the browser tends to start and stop more often than on the desktop. Another fairly obvious performance metric is page load speed, something the user will feel every time they use the browser.

Besides measuring how long things take to happen, we also care about how much space browsers take while doing it, both in terms of disk space and memory. Finally there are some things that are traditionally harder to measure, such as responsiveness.

JS Performance

Between Alpha 1 and the final release of Firefox 4 for Android we were able to increase JavaScript performance across the board on major benchmarks. This includes a 1.6x improvement on Kraken, 1.9x on SunSpider and 4.1x on V8. The final release is 3.8x faster than the stock browser on Kraken, 2.2x on SunSpider and 1.7x on V8.

Firefox for mobile features the same JägerMonkey JavaScript engine as desktop Firefox 4.0, which includes both a whole-method JIT and the tracing JIT featured in previous versions of desktop and mobile Firefox. Essentially, the method JIT makes everything fast by compiling almost all JS to native code, while the tracing JIT finds hot spots in the code and compiles even faster code that makes certain assumptions based on the results of an execution trace, such as the data types of variables.



While the performance of JS has steadily improved throughout the development process, we got a particularly large performance win in February when Chris Leary landed PICs (polymorphic inline caches) for ARM, which he discussed in more depth here.

Install Size

Firefox for mobile’s installer is 13.8MB. That includes the ability to render the entire web in 14 languages and locales. To put that in perspective, Angry Birds’ installer is 17MB.

In many blog posts and comments on the internet I’ve seen our apk and install size compared to the stock browser, Dolphin, Opera Mini and SkyFire. I’d like to point out that those are all apples-to-oranges comparisons. For the stock browser and Dolphin, what you see reported in Android’s Application Management screen is essentially only the UI of the browser. The rendering engine, JavaScript engine, networking layer and essentially everything else that makes a browser a browser are considered part of the OS and not included in what’s reported there. In the case of Opera Mini, the rendering engine resides on back-end servers, not on the phone. SkyFire (for Android) is a hybrid of the two, with some rendering happening on back-end servers and some using the OS libraries. Opera Mobile is a fair comparison, since it ships a soup-to-nuts browser like Firefox: Opera’s installer is 10.5MB and its install size is 20.5MB, which is comparable to Firefox. Firefox will also let you move the entirety of the install to external storage, bringing the reported size down to about 100KB.

One unfortunate part of doing native development on Android is that the OS extracts your libraries from the apk to your data directory on first run. The trouble is that you end up with two copies of all of your libraries: one compressed copy in your apk and an extracted copy in your data directory. Opera recently blogged about their own dealings with this issue. Michael Wu was able to improve upon that by writing a custom dlopen that extracts the libraries to mmap’d shared memory. After that change, we had only one copy of our libraries: the compressed one in the apk. Later we changed that behavior slightly for devices with at least 150MB of free disk space: for those devices, we write the extracted libraries out to disk after extracting them to memory. That change was made to improve start-up time, and I talk about it more below.
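The idea can be sketched in Python (illustrative only; the real implementation is a custom dlopen written in C inside Gecko): rather than extracting a library to disk, read it out of the archive into an anonymous mmap’d region, so the only on-disk copy is the compressed one in the apk.

```python
import mmap
import zipfile

def extract_to_shared_memory(apk_path: str, member: str) -> mmap.mmap:
    """Decompress one library from the apk straight into an anonymous
    mmap'd region, leaving the compressed copy inside the apk as the
    only on-disk copy. (A sketch of the idea; the real loader is a
    custom dlopen in C that can also fault in sections lazily.)"""
    with zipfile.ZipFile(apk_path) as zf:
        data = zf.read(member)           # decompress into memory
    region = mmap.mmap(-1, len(data))    # anonymous shared memory
    region.write(data)
    region.seek(0)
    return region
```

The function name and the Python types are hypothetical; the point is simply that the decompressed bytes live only in memory, not in a second file on disk.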

Of course, all of this work is open. It would be great if Android added support for loading libraries directly from apks, but in the meantime, other Android apps using native libraries can take a similar approach.

Late in the development cycle we also implemented support for moving the browser’s profile data to the SD card. Some users, especially those using our Sync feature, can wind up with very large profiles. After this change, if the user moves Firefox to the SD card with Froyo’s “Move to SD Card” option in the Application Management settings, the size on disk reported there is less than 100KB after the next launch (assuming the device has less than 150MB free).

Start-up Time

Start-up time is another item that has improved steadily during the development of mobile and desktop Firefox thanks to the work of many people. We made a couple of improvements that were specific to Android that I’d like to highlight.

Typically, Android applications that include native libraries pay a particularly high price the first time they’re run, as the OS extracts the libraries from the apk file. As I mentioned before, Michael Wu implemented a custom dlopen to load our libraries directly into memory. That change allowed us to reduce the cost of extracting all of our libraries on first run, because we only extract sections of code as they’re needed and only a small subset is required for start-up. However, it’s not uncommon for performance work to trade size for speed or vice versa, and this change was no exception: while it improved first-run start-up time slightly and size on disk drastically, it regressed normal start-up time slightly. A few months later we made some changes that won back that regression. Essentially, after we extract the libraries to the mmap’d shared memory, we kick off a low-priority process that writes that memory out to disk. Around the same time we started calling madvise on our main library. With these changes we get our cake and eat it too: once the libraries have been written out to disk, we use those files rather than the shared memory and don’t need to spend time extracting parts of the libraries that we already extracted in a previous run.
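The write-back step can be sketched like this (a hypothetical Python shape of what is really C++ in Gecko): after extraction, a low-priority child process writes the in-memory library image to a cache file, which later launches can use directly.

```python
import os

def persist_library(image: bytes, cache_path: str) -> int:
    """Write an extracted library image to an on-disk cache from a
    low-priority child process, so the next launch can skip
    decompression entirely. (Hypothetical sketch; the real code is
    C++ in Gecko.)"""
    pid = os.fork()
    if pid == 0:                       # child: do the slow disk write
        os.nice(10)                    # don't compete with startup for CPU
        tmp = cache_path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(image)
        os.replace(tmp, cache_path)    # publish the cache file atomically
        os._exit(0)
    return pid                         # parent continues starting up
```

Writing to a temporary file and renaming it into place means a launch that races with the writer either sees no cache file or a complete one, never a truncated library.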

Another performance improvement that other Android developers may find interesting involves our splash screen. We found that the spinning throbber was taking many more CPU cycles than one might expect, so we switched to drawing the app logo and the word “loading” directly to our surface and picked up about a one-second improvement in start-up time!

In the end we were able to take start-up time from double digits down to 1–5 seconds, depending on what you measure and which device you measure it on.

Page Load Time

One of the major changes between mobile Firefox 1.1 and 4.0 is the introduction of a separate process for content to run and render in, part of the Electrolysis project. The main reason for Firefox for mobile to make that change was to increase our responsiveness (more on that later), but that change almost doubled our page load times. Most of that regression can be blamed on the need to pass messages and data between the content process where the page is being rendered and the chrome process where all the network activity happens.

Since we landed and turned on Electrolysis, we have worked to recover the lost page load performance. Alon Zakai has been doing great work profiling this area of the code base and producing tools that visualize these messages, which helped identify hotspots. The fruits of his labor can be seen in the massive improvements we’ve made since our alpha releases of Fennec, through the beta cycle and now in this release.

Recently, Mark Finkle launched a website to collect crowd-sourced performance information; from it you can see some of the progress we’ve made over the past couple of months.

Memory Footprint

The mobile team has been particularly interested in anything that can drive down our memory usage, for obvious reasons. Besides wanting to be a good citizen on resource-limited devices, in low-memory conditions Android will kill our process as soon as we go into the background, which is not the greatest experience for the user.

One way we reduced memory usage, by not JIT-compiling all JavaScript, is described by Nicholas Nethercote on his blog. We also pay attention to the memory conditions of the device and preemptively save the state of the browser before killing off our content process. With that saved state we can recreate tabs as you focus on them, complete with the state you left them in, while avoiding having the whole application killed off by the OS.
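The save-and-restore idea might look roughly like this in Python (hypothetical fields and file layout; Firefox’s real session store keeps much richer per-tab data such as history and form state):

```python
import json
import os

def save_session(tabs: list, path: str) -> None:
    """Snapshot open-tab state before the content process is killed
    under memory pressure. (Hypothetical fields; the real session
    store records much more than URL and scroll position.)"""
    state = [{"url": t["url"], "scroll": t.get("scroll", 0)} for t in tabs]
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # never leave a half-written session file

def restore_tab(path: str, index: int) -> dict:
    """Recreate a single tab lazily when the user focuses it,
    instead of reloading every tab at once."""
    with open(path) as f:
        return json.load(f)[index]
```

Restoring one tab at a time on focus is what keeps the memory cost low: the saved state is tiny compared to the live pages it stands in for.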

Responsiveness

One of the problems we had with mobile Firefox 1.0 and 1.1 was that if a web page was doing something too intensive, our UI would freeze up. To combat this, we split content rendering and JavaScript execution off into their own process. We also reduce the priority of the content process (also known as “nice’ing” the process) so that the chrome process (and thus the main UI) always takes precedence when the two compete for CPU time.

Panning

Panning responsiveness is a big part of user-perceived performance. When you touch a page you want an immediate response, especially when kinetic pans are involved, since the “weight” of the page affects how far you expect it to go. For Firefox for mobile we’ve made some major architectural changes that allow us to be much more responsive to the user’s panning input than in previous releases. That starts with the layers work led by Robert O’Callahan, which allows us to quickly perform common graphical operations, retain the results from frame to frame and composite the end result. The next major breakthrough was Chris Jones’s work on shared layers, which allows us to render to a layer in the content process while manipulating it (e.g. by panning it) in the chrome process. Finally, we introduced the concept of a display port to the code base, which lets us “know” which parts of the page should be rendered even if they’re not visible, meaning that content past the edge of the screen is already rendered before the user starts to pan. For more on this, check out Ben Stover’s recent blog post with a great overview of all these optimizations.
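A display port can be thought of as the visible viewport inflated by a margin and clamped to the page, something like this simplified sketch (Gecko’s real heuristics are more involved and can, for example, bias the margin in the direction of a pan):

```python
def display_port(viewport, page_size, margin):
    """Return the region to pre-render: the visible viewport plus a
    margin on each side, clamped to the page bounds. Rects are
    (x, y, width, height); a simplified sketch, not Gecko's actual
    display-port logic."""
    x, y, w, h = viewport
    page_w, page_h = page_size
    left = max(0, x - margin)
    top = max(0, y - margin)
    right = min(page_w, x + w + margin)
    bottom = min(page_h, y + h + margin)
    return (left, top, right - left, bottom - top)
```

Because the rect is clamped, no rendering effort is wasted past the page edges, while everywhere else the user finds already-rendered content waiting just beyond the screen.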
