Tuesday, February 19, 2013

OMTC Questions

mayankleoboy1 asked some questions in the comments of another blog post and I figured the answers might be of interest to a wider audience, so here they are (edited for order because it makes them easier to answer):

So when is the expected date for GFX and the m-c trees to merge?

Around March 18th, as long as there are no unexpected problems. This is our goal date, not a promise :-)

[...] a lot of the OMT* is being done on priority for FFOS and Android, and later trickling to desktops. Has the traditional desktop (Win, Lin and Mac) market become a second-tier platform for Mozilla?

No, certainly not, although we have a lot of work to do on mobile, so that is a focus for many engineers right now. OMT* is developed for mobile because it is needed most there. Without OMTC, Firefox for Android is really unusably bad. Without OMTA, Firefox OS is really slow and jittery in some key places. Desktop Firefox works pretty well without them (although it will work better with them).

Why is it that OMT* work lags on Windows, compared to OS X and Linux? AFAIK, Windows makes up 90% of Mozilla users. So shouldn't Windows desktop get more priority?

As I said above, the focus of the OMT* work has been on mobile, and that means OpenGL. We don't support OpenGL on Windows, so we don't have OMT* there. We only have it on Mac and Linux to make mobile development easier; it is not yet a supported configuration on either platform (although it will be in the future). Implementing OMT* on a different graphics backend has been very daunting, and one goal of the layers refactoring is to make that easier. Our current focus for OMTC is Windows, in particular for the Metro browser. Unless there are unforeseen hurdles, Windows will be the next platform to get OMTC. (OMTA has a few other issues to resolve before it can be used anywhere other than Firefox OS, including on Android, not least of which is testing.)

And yes, Windows is a higher priority for Mozilla (in general) than Linux and Mac, although user share is not the sole determinant of priority. For example, Linux gets a lot of love relative to its user share because it is more closely aligned to our mission and a lot of developers use it.

Thursday, February 07, 2013

Skia canvas on Windows XP

Using Skia as the rendering backend for canvas has been an option for a while now. Skia is now the default for Windows XP users. That will filter out to nightlies today or possibly tomorrow. It should make canvas perform a bit better on XP.

At the moment our benchmarking does not make a solid case for making it the default on other platforms. If you are not on XP and would like to experiment (possibly exposing yourself to 'fun' bugs) you can use Skia by setting the pref gfx.canvas.azure.backends to 'skia'.

Thanks to Rik Cabanier, Matt Woodrow, Jet, and George Wright for getting this done.

Tuesday, February 05, 2013

A fun bug

(Actually this is a two for one kind of a deal)

I've spent the last two days finding two tricky bugs in my port of tiled Thebes layers to the async compositing API. I think they are kind of fun, so I'll try and describe them here. I'll try to elide the details a bit. If you want to check out the real code, look at ContentClient.cpp, ContentHost.cpp, and BasicTiledThebesLayer.cpp on the graphics branch.

First, the old way. A tile buffer keeps a bunch of tiles (the actual tiles, not references; that is important) and each tile keeps a reference to a gfxReusableSurfaceWrapper. A gfxReusableSurfaceWrapper is kind of neat: it keeps a reference to a surface and can be locked for reading. When we want to write to it, we ask for its surface. If it is locked, then you get a fresh surface (with a new gfxReusableSurfaceWrapper to wrap it). If it is not locked, you get the same surface as last time.
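To make the locking rules concrete, here is a minimal C++ sketch of the behaviour just described. It is a simplified model, not the real gfxReusableSurfaceWrapper: `ReusableSurfaceWrapper`, `Surface`, `Tile`, `PaintTile`, `ReadLock`/`ReadUnlock` and `GetWritable` are hypothetical names, `std::shared_ptr` stands in for Gecko's reference counting, and copying the old pixels into the fresh surface is an assumption of the sketch.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a surface: just a buffer of pixels.
struct Surface {
  std::vector<unsigned char> pixels;
};

// Hypothetical model of gfxReusableSurfaceWrapper's behaviour; not the Gecko API.
class ReusableSurfaceWrapper
    : public std::enable_shared_from_this<ReusableSurfaceWrapper> {
 public:
  explicit ReusableSurfaceWrapper(std::shared_ptr<Surface> surface)
      : surface_(std::move(surface)) {}

  // Destroying a wrapper that is still locked for reading is a bug.
  ~ReusableSurfaceWrapper() { assert(readLocks_ == 0); }

  void ReadLock() { ++readLocks_; }
  void ReadUnlock() {
    assert(readLocks_ > 0);
    --readLocks_;
  }
  bool IsReadLocked() const { return readLocks_ > 0; }

  // Returns something safe to paint into: this wrapper if nobody is reading
  // it, otherwise a new wrapper around a fresh surface (assumed here to start
  // as a copy of the old pixels).
  std::shared_ptr<ReusableSurfaceWrapper> GetWritable() {
    if (!IsReadLocked()) return shared_from_this();
    auto fresh = std::make_shared<Surface>(*surface_);
    return std::make_shared<ReusableSurfaceWrapper>(fresh);
  }

  Surface& GetSurface() { return *surface_; }

 private:
  std::shared_ptr<Surface> surface_;
  int readLocks_ = 0;
};

// In the old arrangement a tile holds its wrapper directly.
struct Tile {
  std::shared_ptr<ReusableSurfaceWrapper> wrapper;
};

// Painting a tile: store whatever GetWritable() hands back, then draw into it.
// If a fresh wrapper comes back, the tile forgets the old one.
void PaintTile(Tile& tile, unsigned char color) {
  tile.wrapper = tile.wrapper->GetWritable();
  for (auto& p : tile.wrapper->GetSurface().pixels) p = color;
}
```

Note that `PaintTile` drops the tile's reference to a locked wrapper; in this model, something else (the compositor's copy of the buffer) has to keep that wrapper alive until it is unlocked.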

To render the tiled layer, the content thread gets a surface for each tile and paints to it. When the tiled layer is rendered, a copy of the tile buffer is made in the heap and a reference is passed to the compositor thread. The compositor thread locks all of the gfxReusableSurfaceWrappers (via the tiles and buffers) and blits them to the screen.

Note that if the gfxReusableSurfaceWrapper is locked and we get a new surface when painting, then we store the new gfxReusableSurfaceWrapper in the tile and lose track of the old gfxReusableSurfaceWrapper. Also, gfxReusableSurfaceWrappers are reference counted. They are destroyed when there are no more references to them. Finally, it is very important that when a gfxReusableSurfaceWrapper is destroyed it is not locked for reading; we assert that.

This sounds fun already, right? But the fun bit is still to come...

As far as we are concerned, the main effect of refactoring into the new compositing API is that we add another layer between the tiles and the gfxReusableSurfaceWrappers. We add a TextureClient. The tile holds a reference to the TextureClient and the TextureClient holds a reference to the gfxReusableSurfaceWrapper. The TextureClient lives on the heap and is also reference counted.

What could go wrong?

What goes wrong is that we trigger an assertion by trying to destroy a locked gfxReusableSurfaceWrapper. Figuring out why took me a little while. What should happen is that the copy of the buffer and its tiles on the compositor thread keeps the gfxReusableSurfaceWrappers alive once the tiles on the content thread forget about them. That works because we only lock the tiles for reading when we pass them to the compositor, and because when we copy the buffer (a bitwise copy) we copy all the tiles, creating another reference to each gfxReusableSurfaceWrapper. With TextureClients, however, copying the tiles only adds another reference to each TextureClient; the TextureClients themselves are not copied, so there is still only one reference to each gfxReusableSurfaceWrapper. Thus, the next time around, if we get new gfxReusableSurfaceWrappers and forget about the old ones, the old ones are destroyed, even though they are locked by the compositor! The fix is to do a 'deep' copy, copying the TextureClients rather than making another reference to them.
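The difference between the two kinds of copy can be shown with a small, hypothetical C++ model. `SurfaceWrapper`, `TextureClient`, `Tile`, `TileBuffer`, `ShallowCopy`, `DeepCopy`, `Repaint` and `LockedWrapperSurvives` are illustrative names only, `std::shared_ptr` again stands in for reference counting, and instead of asserting, the sketch uses a `std::weak_ptr` to report whether the locked wrapper was destroyed.

```cpp
#include <memory>
#include <vector>

// Simplified, hypothetical stand-ins for the real classes.
struct SurfaceWrapper {
  bool readLocked = false;
};

struct TextureClient {
  std::shared_ptr<SurfaceWrapper> wrapper;
};

struct Tile {
  // A tile is a cheap value object: copying it copies the pointer only.
  std::shared_ptr<TextureClient> texture;
};

struct TileBuffer {
  std::vector<Tile> tiles;
};

// Bitwise-style copy: tiles are copied, but the TextureClients are shared.
TileBuffer ShallowCopy(const TileBuffer& src) { return src; }

// Deep copy: each tile gets its own TextureClient, which takes its own
// reference to the wrapper.
TileBuffer DeepCopy(const TileBuffer& src) {
  TileBuffer dst;
  for (const Tile& t : src.tiles) {
    dst.tiles.push_back(Tile{std::make_shared<TextureClient>(*t.texture)});
  }
  return dst;
}

// Content thread repaints a tile: a locked wrapper is replaced by a fresh one.
void Repaint(Tile& tile) {
  if (tile.texture->wrapper->readLocked) {
    tile.texture->wrapper = std::make_shared<SurfaceWrapper>();
  }
}

// One round: lock, hand a copy to the compositor, repaint on the content
// side, then report whether the locked wrapper is still alive.
bool LockedWrapperSurvives(TileBuffer (*copyForCompositor)(const TileBuffer&)) {
  TileBuffer content;
  content.tiles.push_back(Tile{std::make_shared<TextureClient>(
      TextureClient{std::make_shared<SurfaceWrapper>()})});
  std::weak_ptr<SurfaceWrapper> old = content.tiles[0].texture->wrapper;
  old.lock()->readLocked = true;  // locked for the compositor
  // The compositor's copy stays alive until the end of this round.
  TileBuffer compositorCopy = copyForCompositor(content);
  Repaint(content.tiles[0]);  // content side forgets the old wrapper
  return !old.expired();
}
```

With `ShallowCopy`, the repaint overwrites the wrapper inside the one shared TextureClient, so the locked wrapper loses its last reference. With `DeepCopy`, the compositor's own TextureClient still holds it.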

What could go wrong?

This gives rise to the really fun bug: if you do the 'deep' copy on the compositor thread, you still hit the same assertions, just much less often. What is happening here is that there is a gap between when the tiles are locked (content thread) and when we make the copy (compositor thread). Sometimes we get to repaint (content) before we composite the previous round, which means we un-reference the gfxReusableSurfaceWrappers after we lock and before we copy. That took a while to find, but in retrospect doing the 'deep' copy on the compositor thread was dumb; I'm not sure why I did that. The fix is easy: just move the deep copy to the content thread.
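The timing problem can be replayed deterministically, without real threads, by running the steps in the unlucky order. This is a hypothetical sketch (`SurfaceWrapper`, `TextureClient`, `CopyOn` and `SurvivesEarlyRepaint` are made-up names). It shows that only a copy taken on the content thread, at the moment the tile is locked, keeps the locked wrapper alive.

```cpp
#include <memory>

// Hypothetical, simplified model; not Gecko code.
struct SurfaceWrapper {
  bool readLocked = false;
};

struct TextureClient {
  std::shared_ptr<SurfaceWrapper> wrapper;
};

enum class CopyOn { ContentThread, CompositorThread };

// Replays one unlucky interleaving: the content thread locks a tile and posts
// it, then repaints again before the compositor gets to run. Returns true if
// the locked wrapper is still alive afterwards.
bool SurvivesEarlyRepaint(CopyOn where) {
  auto client = std::make_shared<TextureClient>(
      TextureClient{std::make_shared<SurfaceWrapper>()});
  std::weak_ptr<SurfaceWrapper> locked = client->wrapper;

  // Content thread: lock for the compositor and post the (shared) client.
  client->wrapper->readLocked = true;
  std::shared_ptr<TextureClient> posted = client;
  std::shared_ptr<TextureClient> deepCopy;
  if (where == CopyOn::ContentThread) {
    deepCopy = std::make_shared<TextureClient>(*client);  // copy at lock time
  }

  // Content thread repaints before the compositor runs: the wrapper is
  // locked, so the shared client gets a fresh one and drops the old one.
  client->wrapper = std::make_shared<SurfaceWrapper>();

  // Compositor thread finally runs; a copy made now is too late and picks
  // up the fresh, unlocked wrapper instead.
  if (where == CopyOn::CompositorThread) {
    deepCopy = std::make_shared<TextureClient>(*posted);
  }
  return !locked.expired();
}
```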

Monday, February 04, 2013

Throttling off main thread animations

For the last few months I have been working mostly on throttling off main thread animations (OMTA), in between a little of the layers refactoring, which I'm now returning to. Under OMTA, CSS animations and transitions are animated on the compositor thread. That makes things run faster (because the main thread is free to do other work) and smoother (because if the main thread gets bogged down in some work, the compositor thread can carry on animating smoothly). Much of the work for OMTA was done by one of our awesome interns, David Zbarsky.

The old way of doing CSS animations (and the way we still do things for most properties) is for layout to do all the work. Every frame of the animation, the necessary parts of the webpage are laid out (the process of converting HTML to graphical objects) and rendered (converting those graphical objects to pixels) afresh with the correctly interpolated property value. If we have off main thread composition (where each layer is rendered on the main thread, but layers are composited together on a separate thread), then we can instead lay out the web page once and change the way we composite to take account of the animation. The initial implementation did this in such a way that the main thread still did a layout run for each frame, to keep its model up to date, and the compositor did its own animations too. That got the smoothness but not the speed-up. In fact, since we did the interpolating twice, it presumably slowed things down slightly. My task was to finish off the work to stop animating on the main thread (bug 780692). That is the 'throttling' bit. It has been surprisingly difficult; easily the hardest and most frustrating problem I have worked on at Mozilla. But also lots of fun.

The main difficulty is that we do sometimes need to 'catch up' on the main thread, mostly when we need to respond to some JS/DOM stuff. For example, if we have to test whether the mouse cursor is over an element with an animated scale, we need the current value of that scale to be able to tell whether the cursor is inside that element. That means that layout, which runs on the main thread, needs to have an accurate picture of the state of the animation. We call this update of layout a mini-flush. We do a mini-flush periodically (every 200ms at the moment) and whenever we need accurate information for DOM stuff. During a mini-flush, we calculate the animated values for that moment in time and post them to the compositor. It gets tricky because we want to avoid doing a full (and very expensive) re-layout of everything, and only update the animating property of the animated element. It gets even trickier because the request for up-to-date animation data might itself come from a restyle, and we cannot start a new restyle pass while one is already in progress.
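A very rough sketch of the throttling idea, under heavy assumptions: the compositor is assumed to interpolate every frame on its own (not shown), while the main thread only catches up with a mini-flush every 200ms or when a DOM query needs the exact value. `Animation`, `MainThreadAnimationState`, `Tick` and `QueryForDom` are hypothetical names, the animation is a simple linear interpolation of one number, and the restyle complications are ignored entirely.

```cpp
#include <algorithm>

// A linear animation of one property value (e.g. a scale) from `from` to `to`.
struct Animation {
  double startMs, durationMs, from, to;
  double ValueAt(double nowMs) const {
    double t = (nowMs - startMs) / durationMs;
    t = std::min(1.0, std::max(0.0, t));  // clamp progress to [0, 1]
    return from + (to - from) * t;
  }
};

// Hypothetical main-thread view of one animated property.
class MainThreadAnimationState {
 public:
  explicit MainThreadAnimationState(const Animation& anim,
                                    double flushIntervalMs = 200.0)
      : anim_(anim), intervalMs_(flushIntervalMs), value_(anim.from) {}

  // Called on every main-thread refresh tick. Returns true if a mini-flush
  // happened; otherwise the main thread does nothing for this animation.
  bool Tick(double nowMs) {
    if (nowMs - lastFlushMs_ < intervalMs_) return false;  // throttled
    MiniFlush(nowMs);
    return true;
  }

  // JS/DOM needs an accurate value (e.g. hit testing), so always catch up.
  double QueryForDom(double nowMs) {
    MiniFlush(nowMs);
    return value_;
  }

  int flushCount() const { return flushes_; }

 private:
  // Recompute only the animated value for this moment in time.
  void MiniFlush(double nowMs) {
    value_ = anim_.ValueAt(nowMs);
    lastFlushMs_ = nowMs;
    ++flushes_;
  }

  Animation anim_;
  double intervalMs_;
  double value_;
  double lastFlushMs_ = 0.0;
  int flushes_ = 0;
};
```

With ticks every 20ms over one second, the main thread in this model does 5 mini-flushes instead of 50 full updates.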

I have skipped *a lot* of the details here. There is a lot of interesting discussion in bug 780692 if you are interested in this stuff.

Currently OMTA is only used on Firefox OS. There is work in progress to port it to Firefox on Android, and that shouldn't be too hard. It should work fine on desktop (that is where I did most of the development work), but it requires OMTC (which, in turn, currently requires hardware acceleration), and that is still a little way off for all platforms. Once we have that, OMTA should be good to go.

If you are writing a webpage, there is no way to guarantee you'll get OMTA. But you have a good chance if you use CSS animations or transitions to animate either the transform or opacity, and don't have a 3D transform on that element. For example, most of the 'windowing' animations on Firefox OS (window opening animation, window changing animation, etc.) get OMTA.
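The rule of thumb above can be written down as a predicate. This is only a hypothetical summary of the conditions listed in this paragraph, not the check Gecko performs (which depends on more things, including whether OMTC is available at all); `AnimatedElement` and `LikelyGetsOMTA` are illustrative names.

```cpp
#include <string>

// Hypothetical description of an animated element, for illustration only.
struct AnimatedElement {
  std::string property;  // CSS property being animated, e.g. "transform"
  bool declarative;      // true for CSS animations/transitions, false for script-driven changes
  bool has3DTransform;   // whether the element has a 3D transform
};

// A "good chance", not a guarantee: transform or opacity, animated with CSS
// animations or transitions, on an element without a 3D transform.
bool LikelyGetsOMTA(const AnimatedElement& e) {
  bool compositorFriendly = e.property == "transform" || e.property == "opacity";
  return e.declarative && compositorFriendly && !e.has3DTransform;
}
```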

Saturday, February 02, 2013

Layers refactoring update

I got taken off the layers refactoring last year to work on off-main thread animations for Firefox OS (more on that in another post). In the meantime, Bas and Nical have been carrying on the refactoring work. As of a few weeks ago, I am back on the refactoring, and so is most of the graphics team in some capacity. It has become a high priority for us because it blocks OMTC on Windows, which blocks our Windows Metro browser. I have been converting tiled layers to the refactored setup (more below). Bas has got OMTC going on Windows using Direct3D 11, although it is still early days. There have been some small architectural changes (also below), and work carries on. We're getting there; we hope to merge the graphics branch (where you can follow our progress and contribute; beware, builds can be very shaky) to Mozilla Central around the end of February.

On the architectural side, there are two major changes: textures communicate via their own IPDL protocol, and textures can be changed dynamically. There has also been some renaming: the classes that used to be called BufferHost/Client are now called CompositableHost/Client. Many of the flavours of Texture* and Compositable* have changed as we go for cleaner abstractions rather than trying to closely match existing code.

Textures (and soon Compositables) communicate directly with one another using their own IPDL protocols, rather than using the Layers protocol. Communication mostly still occurs within the Layers transactions, so we avoid any race conditions. The advantage of separate protocols is that each abstraction layer is more fully isolated - the layers don't know what their textures are doing and so forth.

It is a requirement that Textures can be changed dynamically. This is a shame. It would be nice (and sensible) if, once we create a layer, its Textures remained of the same type unless the layer changed them. But this is not the case: for example, async video can change from RGB to YCbCr after the first frame without the layer knowing. So we have to deal with the texture changing under us (i.e., under the Layer and Compositable), which is complicated because the Textures use their own communication mechanism. This has led to a lot of code churn, but hopefully we have a solution now. It will be an interesting challenge for our test frameworks to see if they pick up all the bugs!
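One way to picture "the texture changing under us" is a compositable that checks each incoming frame and replaces its texture host when the format changes. This hypothetical C++ sketch (`Format`, `TextureHost`, `CompositableHost`, `ReceiveFrame`) leaves out the hard part, the separate IPDL protocols, and only shows the swap itself.

```cpp
#include <memory>

// Hypothetical formats a producer might send, e.g. async video.
enum class Format { RGB, YCbCr };

struct TextureHost {
  explicit TextureHost(Format f) : format(f) {}
  Format format;
  int framesUploaded = 0;
  void Upload() { ++framesUploaded; }
};

class CompositableHost {
 public:
  // Called for each frame; neither the layer nor this host is told in
  // advance which format the frame will be in.
  void ReceiveFrame(Format incoming) {
    if (!texture_ || texture_->format != incoming) {
      // The texture changed type under us: replace the host.
      texture_ = std::make_unique<TextureHost>(incoming);
      ++texturesCreated_;
    }
    texture_->Upload();
  }

  Format CurrentFormat() const { return texture_->format; }
  int texturesCreated() const { return texturesCreated_; }

 private:
  std::unique_ptr<TextureHost> texture_;
  int texturesCreated_ = 0;
};
```

In this model the layer never sees the swap; only the compositable notices that the incoming frames no longer match its current texture.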

Personally, I have been concentrating on tiled layers (and tidying up a whole bunch of TODOs and little things). Tiled layers are Thebes layers which are broken up into a grid of tiles. We use tiled layers on Android, but have long-term plans to use them pretty much everywhere. Each tile is a texture, and the tiles are managed by a TiledBuffer which is held by the layer. There is thus an obvious mapping to the refactored layers system. Unfortunately, that didn't work so well. Perhaps in the long term we can end up with something like that. For now, the Compositable owns a TiledBuffer which manages tiles which hold Textures. This is because the buffer is copied from the rendering thread to the compositing thread, and tiles are required to be lightweight value objects, but Textures are ref-counted heap objects. Once we have an initial landing of the refactoring, we can hopefully change the tiled layers architecture to match the refactoring vision and we'll be sweet (which will also allow tiled layers to work cross-process; currently they only work cross-thread).