Saturday, December 15, 2012

Leaving Auckland

We are moving again; back to Christchurch. Hopefully this time to fewer earthquakes and more normalcy. I will continue my tradition of writing a retrospective on my year in a new city, this time Auckland.

I have been pretty much disappointed by Auckland. I had high hopes, I was hoping for big city culture, fun nights out, interesting cafes, and a bit more life than was offered by post-quake Christchurch. Partially because of my own laziness, and partially because of the city, that did not happen.

First the good bits: the Project is the awesome-est little climbing wall in the world. The wall is a good size and has good angles, and the setting has been great. But really, the community of climbers there has been amazing. I have met and hung out with a lot of very, very good people and had a lot of fun climbing and hanging out. I am going to really miss that place. In fact, (and contrary to popular opinion) I found the people of Auckland to be a real upside. My experience has been that they have been a lot more open-minded and less cliquey than people elsewhere in NZ, whilst still being friendly and laid back. I think the Project is mostly to blame for this opinion, and my sample size is small.

The second great thing about Auckland is the Mozilla office. I know it's kind of sad for work to be a highlight, but it has been fantastic. The physical office is pretty good - well-equipped, good atmosphere, everything needed for productivity and comfort. The thing that has made it great is the denizens. It has been a real privilege to work with such a group of smart, friendly, hard-working people. I will really miss our lunches and generally just being part of such an awesome group. It has also been cool to see the office grow from a handful of engineers working on similar things to an office full of people working on graphic design, privacy and security, research work, and of course platform engineering.

Some small things that have been cool: Kohu Road cafe (great ice cream, pretty good coffee, next door to the Project), proper Indian food (in Sandringham and Mount Roskill, Urban Turban, Rasoi. So good to have authentic stuff rather than the kiwi-Indian crap you get everywhere else), the fact that being in a mixed-race relationship is not unusual (and just being much more culturally diverse than the rest of NZ), Piha beach (so nice, so close, and a nice ride on the bike to get there), some fantastic cakes at cafes around Newmarket (in particular Little and Friday, mmmm), the view from my deck over Lynfield Cove and Blockhouse Bay - so lovely.

And some bad things - commuting (I have really got used to not commuting in NZ) and the traffic (oh God, the traffic), public transport (a city as large as Auckland needs decent public transport, without it I find it hard to take advantage of the nightlife or lots of other cool stuff, and also see the traffic), rude people (not as bad as England, but much worse than the rest of NZ, also see traffic again), lack of nearby rock and snow (this really pains me), our landlords (who booted us out six months into a one year contract and then got pissy that we hadn't dusted the blinds or cleaned the drains), the coffee (I thought Auckland was meant to be a Mecca for coffee, but it has been invariably terrible, I think I've had one really good cup all year (Coffee Supreme, Ponsonby)), food in general (too expensive, and for a city of its size, not so many vegetarian options, especially at the weekend), the weather (hot, humid, and lots of rain), probably some other stuff, but I should stop moaning.

I am glad to have spent the year here (the work stuff and the friends I've met have made it worth it), but I would not be keen to come back.

Friday, December 14, 2012

How To Win Benchmarks And Influence People

ParticleAcceleration is a Microsoft demo. If you follow the link, you'll see lots of little spheres arranged in a big sphere, rotating slowly. It's kind of pretty. But as a benchmark, it is misleading. For many combinations of browser, platform, and graphics library, neither canvas nor math performance is the limiting factor; something strange is going on.
    
If you view the page source for the benchmark, then you'll see two background divs - the faint background gradient you can see behind the spinning balls. Why two? One is a gradient from #FFFFFF (white) to #BFBFBF (grey) and the other is from #FFFFFF to #BEBEBE (almost exactly the same grey). Every frame, the divs are swapped (to be precise: one is set to display:none and the other to display:block). This happens so fast and the gradients are so similar that the user cannot see the difference. I don't think there can be any purpose other than to confound browsers that are slow at repeatedly drawing gradients.
    
The demo presents itself as a benchmark of canvas performance*, not rapid switching between slightly different gradients. Releasing benchmarks like this is bad for the web. They put pressure on browser vendors (like us) to devote time and resources into optimising for artificial scenarios, and away from improving the real web experience for users.

Thanks to mayankleoboy1 for the initial bug report (806044) and Matt Woodrow for the detective work.  



*from the page: "Thanks for checking out this Internet Explorer 10 Platform Preview demo. This demo uses mathematical models to draw the particles in 3D, using the 2D Context HTML5 Canvas. The faster your underlying browser and computer, the higher your score and faster the particles will spin."

Appendix: some details, in case you are interested

The specific optimisation I imagine the benchmark is designed to exploit is caching gradients. Drawing gradients is expensive, so some caching makes sense. All modern browsers cache parts of a rendered web page. In Firefox, this can happen in multiple places, but primarily in the layers system. A gradient that doesn't change will only be drawn once into a layer and then retained. Once a rendered layer disappears (e.g., you move away from the web page or, as in this case, the div gets set to display:none) it is discarded (not straight away, but pretty soon). So for this benchmark we have to redraw the gradient when it reappears. On Windows, Direct2D caches gradients for us. So, if a gradient is redrawn soon after it was originally drawn it is cheap, and we are fast. I assume this is why IE is fast here too (Firefox, IE, and Chrome all get the maximum 60fps on my Windows machine).

Using the Cairo canvas backend (which we do for Linux, Android, older versions of Windows, and if your drivers are not up to date) knocks our frame rate from 60fps to 6fps. Here, we don't get the gradient caching behaviour from the graphics library. Removing the code from the benchmark that draws the gradients brings us back up to 11fps (yes, Direct2D is much faster than Cairo, even without the gradients. That is why we use it where we can!). So in the Cairo case, drawing the gradients uses about half our processor cycles. The Skia backend is even more affected by the gradients - 15fps and 50fps with and without the gradients. That is, the important parts of the benchmark use less than a third of the total cycles of the original benchmark.

It is possible to cache gradients within the browser. I think Webkit does this, and we are considering doing that within Azure. The downside to all that caching is that gradients take up a lot of memory and caching them in multiple places (and potentially in both CPU and GPU memory) is not good for our memory footprint. We have already had crashes due to running out of memory where a (faulty) driver has cached too many gradients for us.

Wednesday, November 28, 2012

Forcing active layers

When implementing or debugging layers stuff it is often useful to force a layer to be active - active layers have different behaviour from inactive ones, e.g., only active layers can have mask layers - and it is nice to know whether a glitch is due to code on the active or inactive path, or because a layer should be active and isn't, or vice versa. The easiest way to force a single layer to be active is to give it a 1 pixel 3d transform (this is ultra-hackey, please don't use this except for debugging). But sometimes it is nice to make all layers active, or be able to switch between active/inactive without editing HTML. So, I added a pref to do just that, in about:config, set layers.force-active to true and all possible layers should become active. This shouldn't affect which content gets a layer, if it didn't get its own layer before it won't get one with the pref. This is not meant for normal browsing and may cause all kinds of bugs.

Sunday, October 14, 2012

Thebes canvas is dead

Azure canvas was turned on by default on all platforms for Firefox 18 (now on Aurora). And so we have finally removed Thebes canvas from Firefox 19 (current nightlies).

To recap, we previously had two implementations of canvas (not including WebGL canvases which are different). One using our older graphics library, Thebes, which is a thin wrapper over Cairo; and one using our new graphics library, Azure, which supports multiple backends (Cairo, Skia, Direct2D, and Quartz/CoreGraphics currently). To get to this stage we needed to get Azure canvas working properly on all platforms, which was finally finished off last cycle. Now that Azure canvas works properly, we can dump Thebes canvas, which removes a whole bunch of duplicated code, and thus duplicated effort in maintaining two implementations. Also, it means canvas always uses the new DOM bindings.

In the code, nsCanvasRenderingContext2D has gone and nsCanvasRenderingContext2DAzure has been renamed to mozilla::dom::CanvasRenderingContext2D. The various bits of XPIDL for getting a Thebes canvas rendering context has also gone.

If you are testing nightly, let us know if you spot any new bugs with canvas (including when printing, which uses (sometimes or always, I'm not sure) a canvas).

Monday, September 17, 2012

HTML 5 Canvas Radial Gradients

Some of the trickiest bugs getting Azure canvas working were with radial gradients. Radial gradients in canvases are very flexible, more so than CSS radial gradients or as commonly found in paint software. Other than the spec,which is not particularly user-friendly, I did not find much explanation of how they work. So, here is my understanding.

A gradient is an interpolation between colours across a filled area. The simplest kind of gradient is linear, which is just a linear interpolation. For example, here is a rectangle with a left to right linear gradient from red to green:



A radial gradient interpolates circles, rather than lines, for example here is a simple radial gradient from red to green:
To add clarity, here is the same gradient with the start and end circles added:

The above example is simple because the two circles have the same centre and the 'start' circle is entirely within the 'finish' circle. In addition, there are only two colours involved, the start and end colours. These two issues are pretty much orthogonal, so we'll deal with multiple colours first, and we'll do so in the context of linear gradients, since the issues are the same independent of the type of gradient, and linear ones are simpler.

The start, end, and any intermediate colours are called colour stops. Each gradient must have at least two colour stops - the start and end colours - and may have any number of additional, intermediate colour stops. The syntax for these things is (I've elided the details of gradient creation for now):

  var context = canvas.getContext('2d');
  var gradient = context.create...Gradient(...);
  gradient.addColorStop(0, 'red');
  gradient.addColorStop(1, 'green');
  context.fillStyle = gradient;
  ...


The gradient will blend smoothly from red to green. The first parameter to addColorStop can be thought of as the parameter in a parametric equation; 0 is the start of the gradient, and 1 is the end. In fact, the gradient will shade from negative to positive infinity. Outside the start and end colour stops, the colour is extrapolated. For example, here is a gradient fill that shows the gradient outside the 0,1 range; the black lines are at 0 and 1 on the gradient:

Additional colour stops add intermediate colours, for example adding

  gradient.addColorStop(0.5, 'blue');

gives a gradient from red to blue to green (again the black lines are at the colour stops):


And here's a more complex example,

  gradient.addColorStop(0, 'red');
  gradient.addColorStop(0.1, 'blue');
  gradient.addColorStop(0.3, 'red');
  gradient.addColorStop(1, 'green');


in linear and radial versions:


OK, now back to just two colours, and more complex radial gradients. The syntax to create a radial gradient is:

  createRadialGradient(x0, y0, r0, x1, y1, r1)

where *0 define the position and radius of the starting circle (where the gradient parameter is 0), and the *1 arguments define the end circle (where  the gradient parameter is 1). If we offset the two circles (but circle 0 is still completely inside circle 1), then we get a skewed version of the simple radial gradient:
But, if the circles are not inside one another, then we get a cone:
Huh?

The gradient is interpolated from positive to negative infinity, just like with linear gradients. In the radial case, you can think of the gradient being drawn by stroking many incremental circles between the start and end circles. The position and radius of the circles are interpolated between x0, y0, r0 at t=0 (where t is the gradient parameters discussed above) and x1, y1, r1 at t=1. The interpolation is extrapolated to larger and larger circles as t tends to positive infinity, and to smaller and smaller circles as the radius disappears to 0, somewhere between t=0 and t=-infinity. This is what gives the cone shape. To illustrate, I have added the circles at t=0 and t=1 and a line along the centres of all the hypothetical circles:

The shading of the gradient is also kind of interesting; the shading is limited to the cone and is a gradient from the 'left' side of the t=0 circle to the 'left' side of the  t=1 circle. In particular, the gradient extends across the t=0 circle, but the t=1 circle is a solid colour. That is in contrast to the simple case (where t=0 circle is entirely inside t=1 circle) where t=0 is a solid colour and t=1 is shaded. To understand this we need two concepts: that the gradient is drawn from positive infinity to negative infinity (technically, towards negative infinity until the radius of circles becomes 0), and that what is drawn is not overdrawn. In the example, the gradient is drawn right-to-left, getting less green and more red as we go. The gradient goes between the left side of the circles because we draw from the right and do not overdraw. Note that left or right does not make a difference, we draw from circle 1 to circle 0, so the gradient will go from the side of circle 1 closest to circle 0, to the side of circle 0 furthest from circle 1.


There is nothing which says that circle 0 must be the smaller circle, so we can create a gradient from the 'opposite' sides of the circles by swapping the definition of the circles in the createRadialGradient call and the colours of colour stops:
The final case is if the two circles overlap, but neither is entirely contained in the other. We can apply the same understanding as the separate circles case (and of couse, that understanding applies to the simplest case too). In fact the gradient looks similar, just with a more compressed cone:


We can see the cone stretch out as the circles get further apart (in the first example, the circles are just one pixel apart):
 
 And there is the interesting case where the edges of the circles touch. This looks like the case where circle o is inside circle 1, except that the solid blue colour stops at the edge of the circles. One pixel further, and we are back to the simple case:

Saturday, September 15, 2012

Orcon - why you so useless?

This is actually a rant about software design, but that won't become apparent until later...

I have been an Orcon (an internet/phone provider) customer for the last few years, I forget how many exactly, but somewhere between 1.5 and 3.5. As is par for the course for internet in NZ, the service is slow and expensive, but they have at least been good on the customer service front and they are relatively cheap (compared to other NZ providers). I've recently moved out of my house (not out of choice) and so had to terminate my contract early, there is a charge for this, which is fair enough, I had the option to choose a more expensive non-contract option when I signed up, but chose not to. It slightly irritates me that the shared house I'm now in is with Orcon too, so I'm paying to terminate my contract, but still with them. And it irritates me a lot that I can't pause my contract because I would probably have gone with them again when I move into my new house in a couple of months, not any more, obviously.

Anyway, the charge didn't turn up, and since Orcon emails have a habit of being spam filtered (see the title of the blog post), I thought I would give them a call. It turns out that " a request was made to cancel, but it didn't work". What? It didn't work?! How does a simple cancel request not work, and why does it fail silently? Why didn't you call or email? That is pretty dumb to start with. Of course the phone person didn't know why, but she promised to re-do the request, I suppose I will have to call back to check it works. This is not a good customer service protocol.

Then she asks if I have one of their modems, because if so, I will be charged if I haven't sent it back. So I have a modem, I have no idea if it is theirs, I suspect it is, but I acquired it several years ago and it has no Orcon stickers on it, so who knows? I do remember buying my own wireless router, so at worst I have a cheap wired-modem, probably worth $10 new. It does not seem making a fuss over, given I have been a customer from multiple houses for multiple years. But the really annoying thing is the customer service person has to ask me if it is theirs. Presumably someone at Orcon knows or they wouldn't be able to charge me for it. But the customer-facing people don't.

Here comes the software design bit - presumably this was an explicit constraint - they decided to deliberately separate the knowledge about who has their modems from the customer service record. WHY WOULD YOU DO THAT?! Who sits down and thinks that this is a good idea? What kind of idiot designs a system like this?

And breathe. So, can anyone recommend an internet/telecom provider in NZ?

Sunday, September 09, 2012

More on the Layers refactoring

The refactoring is coming along nicely, I think the hard work is done - I have a set of abstractions I'm happy with, and most of the code has been refactored into the new architecture. There are a lot of rough edges still to iron out, and a lot of things marked TODO. There are also some of the Compositor related refactorings, which are not related to texture passing, still to do.

In this blog post, I want to outline the new abstractions for dealing with graphics and why I've chosen them, I'll mostly describe them as if they are new, but of course, Layers have been there all along, I'm mostly just breaking them up. The main abstractions are: shadowable layers, shadow layers, buffers, and textures. The latter two are divided up into host and client versions, where shadowable ~= client and shadow ~= host. The terminology should settle down as the refactoring finishes off.

There are different kinds of layers for different kinds of content, in particular image, canvas, and Thebes (these last are badly named and we should change the name to ContentLayers soon, thus Thebes ~= Content for Buffers, but more on that later). Colour layers don't transfer any kind of rendered graphics to the compositor, so don't matter here. Container layers are just for organising the layer tree, so also don't matter here. Shadowable layers are always drawn without HWA, so there is only the software backend. They should work with any compositing backend - Shadowable layers should be backend-independent.

A texture represents a lump of graphics data and is responsible for its memory and how that data is transmitted from drawing to compositing processes. Textures are backend dependent. Theoretically any texture can be used by any kind of layer, but in practice each kind of layer only uses a subset of kinds of texture.

A buffer abstracts how a layer uses its textures. A layer has a single buffer, and that buffer has one or more textures. In the simplest case, a buffer just has a single texture. Examples of more complex buffers are YUV images which have a texture for each channel, and tiled images which have a texture for each tile. The buffer defines how the textures are organised: when drawn, when transmitted to the compositor, and when (and how) they are composited. A layer only interacts with textures via a buffer, in fact layers do not even know that textures exist. Buffers are backend independent and (theoretically) can work with any kind of Layer and any kind of texture. In practice, each layer uses different buffers, and the names reflect this (CanvasClient, ImageClient, ContentClient are the different kinds of BufferClient). Also, the buffers are kind of picky about which textures they use. There is some re-use, for example canvas and image layers use the same kind of BufferHosts, and some kinds of texture are used by multiple kinds of buffer. Hopefully re-use will increase with time.

As an example of how this ties together, lets use a tiled Thebes layer as an example (in fact, I haven't converted tiled layers yet): some part of the layer is invalidated, the layer is responsible for knowing which parts are visible and valid, and therefore which regions must be updated. It tells its buffer to update this region, the buffer works out which textures must be repainted and repaints them (by calling back into the layer, because the repainting code is independent of buffers, textures, etc.). The buffer then tells the changed texture to update the compositor, which they do in their own way. On the shadow side, the invalidated textures are updated. When the screen needs updating, the shadow layer knows which region to draw, the buffer host knows how to map that region into textures and how to draw it to the screen.

Communication between renderer and compositor is on a per-layer basis, and I do not change this protocol. But instead of just sending a layer and some data, we now send a layer, a texture identifier, and some data. The texture client sends the message, a reference to the layer is passed in, and it knows its own identifier. On the compositing side, the message is received by the shadow layer (via ShadowLayersParent) and passed directly to its buffer host. The buffer host can then use the identifier as it chooses. The identifier includes the type of buffer and texture and this is verified. If the buffer has only one texture host, then it can be updated with the data from the message. If there is more than one, then the buffer host uses the texture identifier to work out which texture host should be updated. Buffer hosts/clients are free to use the identifier however they like.

Buffer and texture hosts and clients are not created directly, hosts are created by the compositor and clients are created by a specific factory (name TBC). A layer asks for a buffer client of a certain kind, the factory will create that buffer client if possible, or might return some other kind of client if not. Buffer hosts are created in a similar fashion when they are required (when a texture host is added, see later). Texture clients are usually created by a buffer client's constructor (but sometimes not, and sometimes later as well). Texture clients are also created with a call to the factory, the buffer client should know what kind of texture client it wants, and whether it can make do with some other kind if it is not available. When a texture client is created, a message is sent to the compositing side, and the compositor creates a corresponding texture host and adds it to the corresponding shadow layer, the layer should pass the texture host straight to its buffer host, creating one if necessary (what kind of buffer host is included with the message to create the texture host). Likewise destroying a texture client alerts the compositor to destroy the corresponding texture host. But, texture hosts can be created and destroyed at will, without alerting the drawing side. So, host/client pairs (buffers and textures) are loosely associated - there will 'always' be a pair, related by the combination of a layer reference and an identifier, but the exact objects in the pairing might change.

That's enough for one blog post, probably too much. Next time - types!

Thursday, September 06, 2012

Azure canvas on by default

As of tonight, Azure canvas is on by default on all platforms! The last platform to get turned on was Linux, for which Anthony Jones did a lot of work on performance to bring Azure canvas up to par with Thebes, and exceed it in some cases. In the next cycle we will get to remove Thebes canvas from the code-base altogether.

The flavour of Azure canvas you get depends on the platform: Firefox OS, Android, and Linux get Cairo, Mac gets Quartz, and Windows gets Direct2D, if you have Vista or 7, and your drivers are up to date, and your graphics card is not blacklisted; otherwise you get Cairo.

You can see which Azure backend you are getting for Canvas (and possibly content) in about:support. You can also see the fallback backend, this is used if for some reason the preferred backend cannot be used, usually for very large canvases which are larger than the maximum texture size for Direct2D.

You can change the canvas settings in about:config: gfx.canvas.azure.enabled should be true, you can force Thebes canvas by setting it to false (for now). gfx.canvas.azure.backends contains an ordered list of backend names, for example, "direct2d, skia, cairo". Your preferred backend is the first backend in that list which your platform supports. Your fallback backend is the first backend on that list which your platform supports, is not the preferred backend, and is Cairo (which means you might not have a fallback backend).

Wednesday, August 29, 2012

Layers refactoring

OMTC is one of the major performance wins for Firefox at the moment and we want to bring it to all platforms. The major sub-system involved in OMTC is Layers, in particular the shadow layers system (which I blogged about a while back). Unfortunately, it is a bit of a mess at the moment, and it is very hard to reuse code. The Layers system design is very elegant and extendable, and I like it a lot, but the extension to multiple process (or threads) is not the greatest. Bas Schouten and Ali Juma came up with a major redesign for the whole system, which was just lovely, but unfortunately we don't have the resources to work on that, and it seems extremely difficult to do such a major refactoring without being disruptive or getting stuck in an endless cycle of rebasing.

So, we are going to do a slightly less ambitious refactoring, mostly of the shadow classes which work on the compositing thread. Bas and Ali came up with the design, Ali started it off, and I am now carrying the baton. The goal is to be able to implement OMTC on a platform with as little extra layers code as possible. We separate out the shadow layer manager into a layer manager and a compositor. The layer manager takes care of the layer hierarchy and the compositor is responsible for the graphical composition of quads. There will be one compositor per backend, but only only one shadow layer manager. Any backend specific code in the shadow layers will be moved into the compositor, so there will only be one shadow layer of each kind (e.g., Thebes layer, container layer).

The method of sharing graphics memory (for example, textures for the accelerated backends) are backend dependent and will be moved out of the layers classes. We will introduce the concept of texture hosts and texture clients. Texture hosts reside on the compositor thread, and clients on the drawing thread. There will be pairs for each method of sharing a buffer: shared memory, OpenGL textures, Direct3D textures, and so forth. Obviously not all host types will have implementations on all backends, but potentially there could be multiple implementations of each kind of texture host. Texture clients are only used by basic layers, and there is only one implementation for each kind of 'texture' (but, for e.g., OpenGL textures and D3D textures count as different kinds of texture client).

Texture host/clients are designed to be very lightweight, they only manage graphical memory and the communication between host and client processes (most of which is actually done outside of those classes, anyway). I think we also need a higher level, heavier weight abstraction. At the moment (in the refactoring) these are called image hosts and image clients, but the name will probably change. These 'images' represent some buffer of graphical data, which could be a single texture or it could be in YUV form - with one texture for each colour channel - or could be an image made up of many texture tiles. I haven't figured out exactly how this will work yet. These images should be backend independent (they will use different texture host/clients on different backends), and it would be nice to share as much code as possible between layer types, so, for example, canvas layers can use the same image host/client as image layers. There is a lot of duplication there at the moment.

The compositor/layer manager work is mostly done, and the texture host/client work is on its way. The image host/client is also on its way, but the design is evolving as I go along. You can follow on the graphics branch, if you are interested. I will blog some more as the design solidifies.

Friday, August 10, 2012

Azure/Cairo canvas - progress

For those following along at home, as of this morning Azure/Cairo canvas is on for Android (and Firefox OS (b2g), presumably) on inbound (bug 773460). Barring any dramatic last minute backout, it should be on nightly tonight.

I recently fixed a couple of bugs - one on Android with 565 surfaces (which were being incorrectly rendered, bug 779401) and one on Windows where font interop was causing a crash in Google Maps GL (bug 780392).

Now just Linux to go for the Azure canvas treatment. Anthony Jones is working on a performance issue there, but hopefully we will turn that on soon. Hopefully, that means that Azure will be the default canvas on all platforms in Firefox 17, and we can remove Thebes canvas altogether in 18.

Thursday, August 09, 2012

Mask Layers on the OpenGL backend

The OpenGl layers backend is similar to the Direct3D backends, in implementation (using shaders) and design. There are differences due to the architectural quirks of OpenGL and also because it is the main backend for mobile, so it gets much more aggressive optimisations.

The shaders work similarly to the DirectX ones, we do the same trick to account for perspective correct interpolation for layers with a 3D transform. Things are different in that the OpenGL shaders are generated using a Python script from source in a custom macro language. I extended this language with a simple templating/textual substitution system, so we do not have loads of duplicated code for shaders with/without a mask (although we still end up with duplication in the actual shaders). The bulk of a shader can be defined uniformly, with slots for masking code. Then masked and non-masked (and masked for 3D) versions are generated.

I also changed shader management in the OpenGL layers backend. Previously it was managed in LayerManagerOGL, and was rather manual. I moved out most of the logic dealing with shader programs into ShaderProgramOGL and ProgramProfileOGL classes. I created a profile class (ProgramProfileOGL) which keeps references to a set of shaders and variables (that is, which variables are required, not the actual values), for each use of the GL pipeline. All this kind of info is encapsulated within the profile, rather than the layer manager. ShaderProgramOGL represents an instantiation of a profile, it stores actual values for the shaders' parameters and encapsulates the interactions with the shaders (loading the parameters, creating the shaders and programs, loading the mask texture and transform).

OpenGL is the primary backend for compositing for OMTC (off main thread compositing). Mask layers need to work here, and in the end the solution is fairly simple. We will have ensured that any image layers representing a mask have been rendered to shared memory (see the basic layers part of mask layers). Rendering a shadow layer with a mask is just like a regular layer, we load the mask into a texture (here in ShadowImageLayerOGL::LoadAsTexture, and the procedure is different compared with main thread compositing) and then use the mask texture in the shaders. The texture and transform for the mask are auto-magically copied to the shadow image layer by IPDL.

My first attempt at mask layers broke tiled textures. These are used on mobile where there is a maximum texture size smaller than the desired texture (where the texture is the buffer for some layer). In that case, the texture is broken up into tiles, and these can be pushed to GPU memory separately. The problem is that if we do the same thing with the mask, then the mask tiles might not line up with the underlying tiles (for example, if we scroll the underlying layer, but not the mask layer), that could quadruple the effective number of tiles and thus rendering calls. Instead we always use a single tile for a mask, if the tile is too big, then we have to scale it down to the maximum size, and scale back up when rendering, luckily the rescaling pretty much comes out in the wash, because in the shader textures are always in a 1x1 coordinate space anyway. The 'hard' work is done in FrameLayerBuilder, where we find a maximum size for the texture (using LayerManager::GetMaxTextureSize) and, if necessary, incorporate the scaling into the mask layer's transform.

So, that concludes this belated series on my mask layers work. Just in time, because we are embarking on a refactoring of the Layers system, so this will probably all be out of date very soon...

Saturday, August 04, 2012

Azure canvas in Firefox nightlies

Over the past week or so, I've been landing patches by Anthony Jones, Matt Woodrow, and myself to get the Cairo and Skia backends working for Azure canvas. The latest nightly should have Azure/Cairo canvas on by default for Windows users without Direct2D, that is Windows XP users or later Windows users who don't have up to date drivers or have pref'ed off Direct2D or Direct3D 10 layers.

It would be great to find any bugs with these backends, so if you are in the mood, I would appreciate you experimenting with these canvas backends. Cairo has some known problems on Android, but as far as we know works perfectly elsewhere, Skia has only passed testing on Windows, but casual experimentation has shown it works pretty well everywhere, I think there are font problems on Android, but I'm not sure. To use these backends you need to set some prefs in about:config - azure canvas must be enabled (already the case on Windows and Mac), to do this set gfx.canvas.azure.enabled to true (note that there is also gfx.content.azure.enabled, unless you have Windows with Direct2D working, you almost certainly don't want to make this true). You then need to specify a backend for Azure canvases by setting gfx.canvas.azure.backends to "cairo" or "skia" (other options are "direct2d" or "cg" for core graphics). If you set this to an empty string, you will not get any backend and will get the old Thebes canvas. You can set multiple backends (e.g., "direct2d,cairo"), the front of the list has the highest priority. You will never get a backend that is not supported. (In the example, you will get direct2d if you are on recent Windows and have up to date drivers, cairo if not. If your canvas is too large for direct2d to handle, we will fallback to cairo canvas).

If you find any bugs please file a bug using bugzilla and cc me (:nrc). Thanks and have fun experimenting!

If you read one thing this week...

...about the IT industry, it should be this. The usual caveat about nor reading comments on the internet applies.

Tuesday, July 24, 2012

Azure Canvas

Other than the mask layers stuff, most of my work has been on getting Azure canvas working, passing tests, and landed in the tree. Most of the work has been done before, my contribution has been on debugging, testing, and wiring things together, and of course a few interesting edge cases. To describe the work, I'll give a little background on our graphics libraries.

Back in the day, 'all' rendering in Firefox was done using the Cairo graphics library. This was wrapped in a thin wrapper library called Thebes (Cairo is the external library which does all the work, Thebes is the internal wrapper). We would like to support different graphics libraries and we would like a more efficient wrapper layer, and so was born Azure. This is a new internal system which abstracts a lot of our drawing code. Multiple backends can be slotted into Azure, and, in turn, Azure can be used as a backend for Thebes instead of using Cairo. So, for example, our graphics stack could use Thebes on top of Azure on top of Cairo (or Skia or Direct2D, etc.). Azure has some interesting advantages over Thebes, mainly that it is a state-less API (well, except for a transform and clipping rect), Bas and roc have written some interesting blog posts about this before.

The long term plan is for Thebes to be removed, and Azure to be used directly. For now there are people implementing content rendering using Thebes on top of Azure (on top of Direct2D on Windows, Core Graphics on Mac, and Skia or Cairo elsewhere).

HTML 5 canvas uses a different rendering path from the rest of content. We actually have two canvas implementations right now, one on top of Thebes and one on top of Azure. Currently the latter is used only on Windows where Direct2D is present, and the Thebes canvas is used elsewhere. We want to remove the Thebes canvas, because having two canvas implementations is a maintenance burden and affects our DOM bindings implementation.

So, I have been working on getting Azure canvas working with both Skia and Cairo backends, the former only on Windows, the latter everywhere, with Anthony Jones. It is all about ready to land, after that, we'll let it settle for a while to ensure there are no problems, and then we can remove Thebes canvas completely. Some of the interesting bits of this work have been using Skia canvases as source for tab previews on Windows, organising different backend preferences on different platforms, and ensuring we can always fall back to a safer backend, and just getting all the complex logic in Azure canvas and the various backends right. Plus the occasional memory leak, and some interesting interactions between the layers system and Azure.

In terms of the bigger picture, this work should lead to Azure totally replacing Thebes for canvas rendering. Elsewhere, Azure is being developed into the primary backend for Thebes, so that it will be used everywhere for rendering content, rather than using Cairo directly. Then we will work to remove the Thebes layer altogether so we are rendering directly with Azure, this will be fiddly because there is Thebes code everywhere in the code base. But at the end of it all, we should have faster, more efficient code, and be able to easily plugin different graphics backends, which is a win.

Monday, July 16, 2012

Mask layers on the basic backend

Using mask layers on the basic backend is a bit different from the GPU backends because we don't use shaders, and instead must do a bit of a dance with Cairo. Also, the transforms are different in basic layers vs. the GPU layers (relative to user space, rather than relative to the layer's container), which means we must handle the layer transform differently.

When using basic layers, the drawing is done using Cairo, or more precisely, using Thebes our wrapper graphics library. Therefore, we do not (as on the GPU backends) use shaders to do the masking, but resort to actual mask operations. The basic idea is we draw the content un-masked, use that as a source, use the mask layer as the mask, and just paint the source through the mask. Of course it gets more complicated, because sometimes the source layers use paint, and sometimes fill. Furthermore, if the source layer has an opacity, then we must do a push/pop group because there is no mask with opacity operation in Cairo. In this case, we push the current drawing context, do the paint or fill, then use this new context for the masking source.

Thebes layers are more complex because they use a rotating buffer for drawing to minimise redrawing. Masking is not affected too much though, for each quadrant we just mask as if we were drawing the whole layer, using the mask's transform and ignoring any rotation. The only slight complication is that the mask and its transform must be extracted from the source layer in the basic layers code, and passed to the buffer separately.

The transforms in basic layers work differently to the other backends. In layout, we set up the mask transform the same way, but we use a different method of calculating the effective transform. The actual computation for mask layer transforms is the same, but the matrix passed to ComputeEffectiveTransformForMaskLayer will be different. When we come to rendering, we use only the mask layer's transform for when using the mask, that is, we use SetMatrix, we don't add the mask's transform to the current transform like we do on the GPU backends.

The basic layers backend is always used on the drawing thread when we use OMTC (either the OpenGL or basic backends will be used for compositing). Masking is mostly done on the compositing thread. IPDL does most of the heavy lifting for this, the only thing we have to do is to ensure that if we are only drawing (that is, not also compositing), then we must 'draw' the mask to an off-screen buffer so that it can be used by the compositing thread. Masking is only done if we are compositing also, so when we draw a layer, we check whether to draw it with the mask (if we are compositing) or draw it and its mask separately (if we are only drawing). Of course there is an exception, colour layers are masked on the drawing thread.

Sunday, June 24, 2012

Some mask layers optimisations

I've been working on a couple of small optimisations for mask layers, required for Boot to Gecko, but which will help on all platforms.

Drawing the mask layer (bug 757346)

The common case for mask layers is a single rounded rect clip. We would like this to be fast (obviously). The general case is some number of rounded rect clips, in this case the mask is the intersection of these rounded rects. Before mask layers, clipping was done by just drawing each rounded rect, then using that as a clip. Doing this multiple times gives the intersection we want. So, when we did the mask layers work, we kept this method of drawing the mask, we used the rounded rects as clips, and painted alpha = 1 into the whole mask image, giving the desired mask.

Unfortunately, clipping to an arbitrary path like this is often slow (apparently). So, for the common case of one rounded rect, we just draw the rounded rect onto the mask surface (kind of obvious, really). For the general case, we use the first n-1 rounded rects as clips and we draw the last one, again giving the desired effect, and ever so slightly faster, but the real optimisation is for the common case.

Sharing mask images (bug 757347, work in progress)

If we have multiple, similar masks on one page, then currently they get a mask image each. This is not efficient in terms of memory or speed (because the mask has to be created multiple times, which is slow). So we would like multiple mask layers to share a single mask image. This turns out to be more complicated than it sounds.

The basic idea is that we have a global hashtable, and every image we create for a mask is put into it, using the rounded rects used to create it as a key. We keep a count of how many mask layers are using the image, and when that falls to zero, we delete the image. When we need to create a new mask layer, we first check if we can reuse an image from the cache.

This sounds fairly simple, but we need to totally re-organise the way mask layers are built to do this. Fortunately, the changes are limited to the creation of masks and their transforms, the backends did not need to change. Rather than store the part of a mask which overlaps with the masked layer's visible region, we store the whole mask and nothing but the mask. This means that a layer with the same mask in different places can use one image, or different layers can share a single mask if it is at different places within each layer.

The new way to create a mask image is to calculate a bounding rect for the clipping rounded rects, this is in the masked layer's coordinate space. The mask image will be the same size as the bounding rect (modulo a maximum texture size), so the transform for the mask layer simply translates the mask to the right place (formally, it is the transform which takes us from mask space to the masked layer's space, and might involve a scale). For any other layer which reuses the mask, we just need a different transform.

To manage the cache of mask images we create a new class: MaskLayerImageCache, this holds the hashtable which stores the images, and has methods for using the cache. We also need classes for an entry in the hashtable (MaskLayerImageEntry), a key for looking up images (MaskLayerImageKey), and another representation of the rounded rects used to create the mask (PixelRoundedRect). An entry object keeps a reference to the image container (we actually cache the container, not the image itself), and to its key. When we get a hit in the hashtable, we return both the key and the image container, the entry is private to MaskLayerImageCache. The created mask layer (which is an image layer) keeps a reference to the image container. We keep a reference to the key in the mask layer's user data.

(Now it gets interesting), image containers are reference counted and we don't want the container to die when it is still referenced by a mask layer, but nor do we want to keep the image around for ever. So the entry has an owning ref to the image container which keeps it alive (as do the mask layers, but they don't actually need to), we then maintain our own count of how many mask layers reference the image in that image's key, which is primarily maintained by the mask layer's user data (which uses an nsRefPtr). We must separate the key from the entry because the hashtable can move around the entries in memory, and the mask layer (well, its user data) needs a permanent link to something to keep a reference count.

Once the count of mask layers in the key gets to zero we can remove the linked image from the cache, destroying the last reference and causing it to be released. To do this we sweep the cache once the layer tree has been constructed checking for zero reference counts. We do this after building but before the tree is destroyed, so that an image can be unreferenced between layer tree builds and not be destroyed, this means, we can sometimes save building a new mask image if it was used in the previous cycle.

This is all very nice (no, really, it saves us quite a lot of work in some cases), but it is not perfect. The main problem is with OMTC, where the image might be copied across to the parent process multiple times (once for each reference). This is a weakness of our multi-thread system, and needs a separate fix; it ought to be done soon-ish (for other reasons too).

The last problem with this work is that mask images can be shared between backends, and in our earlier work the different backends required different image formats (A8 vs ARGB32). This has been surprisingly difficult to fix, and I'm on about my third attempt, as always, Android is being particularly problematic. Hopefully, I have it sorted now, fingers crossed...

Tuesday, June 12, 2012

Mask Layers on the Direct3D backends

In the next few posts I'll describe how mask layers are used in the various graphic backends. The Direct3D 9 and 10 backends are pretty similar, so I'll describe them together. The Direct3D 10 backend is the cleanest, so that's where I will start.

A mask layer is loaded from the surface created in SetupMaskLayer into a DX10 texture in LayerD3D10::LoadMaskTexture. This should be fast because we use a DX10 texture as the backend to that surface. In fact, LoadMaskTexture does more than that, it also sets everything up for the shader to use the mask and returns a flag indicating the kind of shader to use (with a mask or without).

Each DX10 layer class has a GetAsTexture method which returns a texture snapshot of that layer. Currently, that is only implemented for image layers, and it returns the backing texture. This method is called by LoadMaskTexture to get the texture for the mask layer. GetAsTexture returns a shader resource view (SRV) which wraps the texture memory, LoadMaskTexture can simply set this SRV as a resource for the effect (a DX10 abstraction for a group of shaders and state), called tMask (t for texture).

GetAsTexture also returns the size of the texture. LoadMaskTexture uses this and the mask layer's effective transform to calculate the texture's bounding rectangle in the coordinate space which will be used for rendering, this is passed to the effect as vMaskQuad (v for vector, it is just a vector of length four, containing the position and dimensions of the bounding quad).

When a layer is rendered (RenderLayer), a system of shader flags is used (including the flag returned from LoadMaskTexture) to work out which shader to use. LoadMaskTexture us called as part of this calculation, so when we have the shader flags, ready the mask and its quad will be passed to the GPU is they are needed. SelectShader is used to pick the actual shaders (a technique, technically, which is a group of passes, which are combinations of vertex and fragment shaders), and the other arguments are passed to the effect. The technique is called and the layer is rendered.

There is one kind of vertex shader and a bunch of different kinds of fragment shaders (for different colour schemes, etc.). When we added mask layers, we introduced vertex shaders for layers with masks and for layers with masks and a 3D transform. We doubled the number of fragment shaders, adding a version of each for layers with masks (plus a couple of extra for 3D). For masks, the vertex shader calculates the location on the mask at which to sample (in the texture's coordinate space), and the fragment shader pretty much just does a multiply with the alpha value at that sample from the mask and the result of rendering the rest of the layer at that point.

The vertex shader operates on each corner of the rendered quad (which is a rectangular window onto a layer), it figures out the location of the corner it is working in the current coordinate space (which is that of the containing layer) and uses that to calculate a position in the coordinate space of the texture being rendered (an image of the layer). For mask layers, we do likewise to calculate a position on the mask's texture, the graphics hardware interpolates these positions to find texture coords for each pixel.

For layers with a 3D transform (with perspective) this interpolation could lead to distortion because the graphics hardware does perspective correct interpolation, but our mask layer was rendered to the texture, after being transformed, so interpolation should be linear. We therefore have to do a little dance to 'undo' the perspective correct interpolation in any vertex and fragment shaders which could be used on layers with 3D transforms.

DirectX 9 works pretty similarly to the DirectX 10 backend, the shaders and textures work differently, but the concepts are the same. The management of the mask layers changes to reflect the way the LayerManager is written too, but again the changes are minor. On the DirectX 9 backend, we support shadow layers for OMTC, but I'll discuss this stuff when I talk about the OpenGL backend, because that is where more of the effort for OMTC went.

Thursday, June 07, 2012

More on my new phone

So I am still pretty much in geek-love with my phone (HTC One X). But there has been something about it that has been annoying me and I have only just worked out what it is. I'm not sure quite how to phrase this, but using the phone for passive computing tasks is amazing, better (mostly) than using a proper computer. Reading stuff on the internet, listening to music, watching videos, playing games, and so forth, all an awesome experience. But, anything that involves active interaction is frustrating to the point of impossible - writing an email, doing some coding, in fact, getting any work done at all. This is kind of obvious given the input methods.

To put it another way, the device is perfect for consumption, but terrible for creation. This bother me because, for me, using a computer has always been a creative experience; the whole point is to make things. For me, the phone is much more like a TV then a computer in terms of interaction, and that makes me sad.

It is an amazingly powerful device, and seems to me that I ought to be productive with it, but the only useful thing I've done with it is testing Firefox. I suppose in a way, it's good, it means I don't need to get too attached to another expensive gadget.

Tuesday, June 05, 2012

An ill-thought out theory about what makes programming interesting

This one is going to be a bit random. Bear with me. I've been thinking a bit about what makes some programming tasks interesting and some not so interesting (to me). As a result of this thinking, I present my theory of different kinds of programming (TM). I'm sure someone has come up with this before, but I've not come across it, I think.

So, let us define normal programs as 1st order programming, they operate on data and produce an output. Most programming tasks are of this nature. Let us call data itself 0th order programming, that is, there is no real programming. 2nd order programs are those that operate on 1st order programs (for e.g., compilers, modern web browsers); 3rd order programs operate on 2nd order ones (e.g., compiler-compilers, parser generators, tools for the analysis of compilers or compiler theories, such as Coq); and so forth, although I can't think of any 4th order programs, and as with most nth-order type theories, the higher orders blur together and probably stop being interesting.

Furthermore, we can have half orders, where the programming task is meant to be extensively reused. So standard libraries and frameworks are 1.5th order and compiler libraries and so forth (e.g., llvm) are 2.5th order. Possibly there are 0.5th order programs, but I don't think of libraries here so much as hybrid data/programs such as a TeX document.

I think that from a programming perspective, the problems get more interesting as we go up the scale (for some definition of interesting, which might just be complicated). So, for a programmer, making a web page (0th order) is boring as hell, making web-based software is much more interesting, and making a compiler for that software is more interesting still. In fact, I don't think it is just about complicated, I think that as we go up the scale the problems are qualitatively different (and, to me at least, more interesting).

BUT, there are advantages to lower-order programming, and disadvantages to higher-order programming: working with data, you can be creative and solve hard problems about all kinds of non-progamming things. Likewise, 1st order programs tend to allow you to work closer to the 'people' problems, which, apparently, are the really interesting ones. The lower order a program is (in general), the easy it is to test and to specify the problem (the lack of a clear specification being one of the most annoying things, for me, about programming). As we go up the orders, the tasks can actually get repetitive, and certainly the risk of failure is higher, the problem more abstract and has a less direct impact on the real world. And at some point the complexity stops being fun - I have dabbled in a few 3rd order tasks, and progress was very slow, and the output very small, and a lot of the problems encountered were tedious edge cases.

So thinking about myself, 2nd order programming seems to be my Goldilocks kind of programming, not too abstract and complex, but operating on code, not data. All the projects I have really enjoyed fall into this category - compilers, interpreters, Firefox.

I wonder where programming language theory falls into my scheme. I think of it as just programming really, but with maths, rather than a compiler. It doesn't feel like any of my categories though (but has elements of all of them), maybe I need a different scheme.

Tuesday, May 29, 2012

Building a mask layer

Mask layers are built in layout/base/FrameLayerBuilder.cpp. Most of the work constructing the mask layer is done in SetupMaskLayer (which I guess should actually be called SetupMaskLayerForRoundedRectClip or something). But before that, we have to figure out if making a mask layer makes sense or not.

This is easy for image, colour, and container layers. But for Thebes layers it is more complicated, there can be multiple display items in a single Thebes layer, and we can only create a mask for the rounded rect clips that all items have in common. In practice, this is usually one clip, and often when there is only one item in the layer, but we cope with more. As items are added to a Thebes layer we keep track of the number of rounded rect clips in common using ThebesLayerData::mCommonClipCount, mostly in ThebesLayerData::UpdateCommonClipCount. We keep the clip for some item in the layer in UpdateCommonClipCount::mItemClip, the first mCommonClipCount clips in this clip are common to all items in the layer (we only find common clips if they are the first for each item). As we add items to the layer, their clips are compared to mItemClip, and mCommonClipCount is updated as necessary.

When Thebes layers are popped from the stack (i.e., we are done building), then mCommonClipCount is moved to ThebesLayerEntry::mCommonClipCount (it must be initialised, i.e., greater than zero by now). We create a mask layer for the first mCommonClipCount clipping rects. When the thebes layer is drawn, we only clip for clipping rounded rects other than the common ones.

FrameLayerBuilder has a neat way of recycling layers, this is a combination of caching and memory management: we recycle and create layers in order and if we get back the same layer as we had last time round (common case) we don't bother to rebuild it, but leave the layer as is. We do a similar thing with mask layers. This is a bit complicated because we only guarantee that layers of the same type are recycled in order, so we need to keep lists of mask layers for each type of layer. We store information about the clips we are masking for in the user data on the mask layer, this is checked in SetupMaskLayer to see if we have to build the mask fresh, or if we can get away with re-using the old version. Thus, we avoid building the mask too often.

When the mask is built in FrameLayerBuilder.cpp, we create a mask surface which is the same size as the masked layer's visible region. We paint the clip into this surface, but only the relevant bits of the clip will be inside the surface. So, the mask is in it's own coordinate system, which is relative to the masked layer's visible region. We create a transform for the mask which transforms the mask's coordinate system to the masked layer's coordinate system. (Note, I'm currently changing this stuff, so it might not be true for long). After the layer tree is built, it is walked to compute the effective transform of each layer (this means different things for the different backends). We updated this walk to also calculate an effective transform for mask layers. So, when we come to render the layer tree, the effective transform for the mask layer is the one that we need for drawing the mask layer 'on top of' the masked layer's visible region, however the masked layer is rendered (I'll describe the transform in more detail in later posts).

Sunday, May 27, 2012

Assembly - strings

I want to find out a little about how strings work in assembly, and as a side effect call some functions that I haven't defined. So I will try to encode the following rather contrived example:

  char* hello = "hello";
  char* world = "world";
  char* helloWorld = (char*)malloc(64);
  strcpy(helloWorld, hello);
  helloWorld[5] = ' ';
  strcpy(helloWorld+6, world);
  printf("%s!\n", helloWorld);

The interesting bits are stack (I'm going to allocate the format string on the stack) and heap allocation (the latter done by malloc), copying strings and doing some character manipulation and calling a function which takes a variable number of arguments. I will make the hello and world strings global variables. Here is the assembly:

  // allocate on the heap for helloWorld
  push 64
  call malloc
  add esp, 4
  // save address of helloWorld on the stack
  push eax
  // copy hello to helloWorld
  mov ebx, 0
  mov ecx, hello
copyh:
  mov dl, BYTE PTR [ebx+ecx]
  mov BYTE PTR [eax], dl
  inc eax
  inc ebx
  cmp ebx, 5
  jl copyh
  // add a space
  mov [eax], ' '
  inc eax
  // copy world to helloWorld
  mov ebx, 0
  mov ecx, world
copyw:
  mov dl, BYTE PTR [ecx+ebx]
  mov BYTE PTR [eax], dl
  inc eax
  inc ebx
  cmp ebx, 5
  jl copyw
  // null terminate helloWorld
  mov [eax], 0
  // temporarily move address of helloWorld to eax
  pop eax
  // allocate format on the stack and save address in ebx
  sub esp, 8
  mov ebx, esp
  // create format
  mov byte ptr [ebx], '%'
  mov byte ptr [ebx+1], 's'
  mov byte ptr [ebx+2], '!'
  mov byte ptr [ebx+3], '\n'
  mov byte ptr [ebx+4], 0
  // call printf
  push eax
  push ebx
  call printf
  // tidy up printf call and format
  add esp, 16
  // should really call delete, but meh

One thing that surprised me is that you don't need to pass the number of vararg arguments - the callee just has to work it out! Getting the call to malloc working was a pain, I had to link the standard library statically rather than as a dll, and a whole bunch of other compilation options which I don't know whether they affected it or not (compile as C, not C++, turn off randomised addressing, etc.).

Saturday, May 26, 2012

Mask Layers

The bulk of my work at Mozilla so far has been on mask layers. These just landed (bug 716439) and should be in Aurora at the next uplift. I'm going to try and explain the work in some detail over a series of blog posts. Here I'll try to give an overview.

The motivation for mask layers is to allow fast (hardware accelerated) clipping. Before mask layers, clipping to a rectangle is basically free, but clipping to any other shape is extremely expensive, because it was not done on graphics hardware. A particularly example is rectangles with rounded corners, so a video clipped to a rectangle was very smooth, but as soon as you use CSS border-radius, it would degrade horribly.

Mask layers solve this by adding a mask layer to each layer in the layer tree (which is just null if no masking is needed), the alpha of the mask is used when compositing layers, which clips anything where the mask alpha is zero.

At present there is only support for image layers as mask layers, so the mask is simply a bitmap in memory, but there is no reason other layer types cannot be supported. Mask layers are currently only used to support rounded corner rectangles, but the mechanism is fully general.

So, overview: when building an active layer (no masks on inactive layers) in FrameLayerBuilder, if an image, colour, or container layer has a rounded rect clip, or a all items in a Thebes layer have the same rounded rect clip, then we create a mask layer for that layer. When rendering the layers, the mask is used whilst compositing. This means that in OMTC, masking is done on the compositor thread, and the mask layer is passed across as part of the PLayers protocol.

In the hardware backends, the masking is done using shaders. For each shader that used to exist, we create an extra one which does the masking as well. In fact, for some shaders we need a third shader for masks with a 3D transform, because we must take account of perspective-correct (which is incorrect, in our case) interpolation done by the GPU.

On the basic layers backend, masking is done using Cairo masking.

Inactive layers are done the old way, that is, when the layer is drawn, and rounded rect clips on content are clipped using Thebes. There is new work (Bug 755078) to enable turning mask layers on or off at compile time using the MOZ_ENABLE_MASK_LAYERS flag. It might be necesssary to do this on mobile because the mask layers are not playing nice with tiled textures (although hopefully, this is fixed now).

Assembly - calling conventions - callees

The callee has to do the other side of the handshake I discussed last time. But it also has to do a little more housekeeping. There are two registers which point into the stack, esp points to the top of the stack and is kept up to date by push/pop. ebp is the base pointer and points to the bottom of the current stack frame. It must be set when we create a new stack frame, such as when we enter a function. Since ebp is a callee-preserved register we must save the old value of ebp to the stack and later restore it.

cdecl

void __declspec(naked) copy(...)
{
  __asm {
    //prologue
    push ebp
    mov ebp, esp
    push esi
    //copy
    mov eax, [ebp+12]
    mov esi, [ebp+8]
    cmp eax, esi
    je finish
    mov ecx, [ebp+16]
start:
    mov dl, BYTE PTR [esi+ecx]
    mov BYTE PTR [eax+ecx], dl
    loop start
    mov dl, BYTE PTR [esi]
    mov BYTE PTR [eax], dl
finish:
    //epilogue
    pop esi
    pop ebp
    ret
  }
}

I've elided the arguments to the function (since we are doing the calling now, they are irrelevant). The __declspec(naked) declaration means that the compiler will not generate prologue and epilogue for the function. Our epilogue saves ebp to the stack, moves esp to ebp and saves esi to the stack. In the epilogue, esi is restored (it is the only register we need to preserve), then restore ebp. Because there are no push/pops in the body of the function, the stack pointer is correct. The body of the function is pretty much unchanged, the only difference is that we access the arguments by offset from ebp (i.e., in the stack), rather than by name.

Normally, we would store the arguments on the stack, somewhere between ebp and esp, but here we never take them out of the registers, so no need. We would often allocate memory on the stack (also between ebp and esp) for local variables, but we don't need to do that either.

fastcall

Here, we get the arguments in registers, we could copy them to the stack, but we don't need to, so we'll just use them. Because we don't pass anything on the stack, there is nothing to tidy up. We could avoid a couple of moves by using different registers, but I haven't.

  //prologue
  push ebp
  mov ebp, esp
  push edi
  //fillWithCount
  mov edi, ecx
  mov ecx, edx
  dec ecx
start:
  mov BYTE PTR [edi+ecx], cl
  loop start
  mov BYTE PTR [edi], cl
  //epilogue
  pop edi
  pop ebp
  ret

stdcall

Here the arguments are on the stack, and we must restore it on exit.

  //prologue
  push ebp
  mov ebp, esp
  push edi
  push ebx
  //fill
  mov edi, [ebp+8]
  mov ebx, [ebp+12] 
  mov ecx, [ebp+16]
  dec ecx
start:
  mov BYTE PTR [edi+ecx], bl
  loop start
  mov BYTE PTR [edi], bl
  //epilogue
  pop ebx
  pop edi
  pop ebp
  ret 12

We use the arguments in the same way as for the cdecl function. The difference is that the ret instruction takes an argument, the number of bytes to pop from the stack in order to tidy up. We could also do this manually:

pop edx
add esp, 12
jmp edx

Monday, May 21, 2012

Assembly - calling conventions - callers

Next up calling conventions, what does C do when it calls a function? Actually there are a few ways of calling functions, called calling conventions, there are choices about whether the caller or callee must restore registers, how argument and the return value are passed, how this is passed if it's a class method (not to mention virtual dispatch, but that's beyond the scope of this blog post). I will look at the following common conventions: cdecl (standard convention for C on x86), stdcall (used for the Win32 API), and fastcall (there are actually many varieties, but I'll look at the Microsoft version). There are many other conventions on different platforms and for different languages. In particular, there is thiscall, which is used by Microsoft for calling member functions, and a bunch of conventions for 64 bit processors, which favour registers, because there are more of them (cite: http://msdn.microsoft.com/en-us/library/984x0h58.aspx).

We'll use the call and ret instructions. In all cases, call saves the return address to the stack and ret pops it, so the return address is always on top of the stack when a call is made.

I'm going to continue to use the functions from last time, but now I'm going to add a calling convention to the signature and call them from assembly, rather than C.

Here are the new signatures:

void __cdecl copy(unsigned char* aSrc, unsigned char* aDest, unsigned int aLength);
void __stdcall fill(unsigned char* aDest, unsigned char aVal, unsigned int aLength);
void __fastcall fillWithCount(unsigned char* aDest, unsigned int aLength);

And here is the current calling code in C (there is some allocation before hand, and some printing after, but that is not important):

   fillWithCount(src, 256);
  fill(dest, 5, 256);
  copy(src, dest, 200);

cdecl
Arguments are passed on the stack in right to left order. Return value is in eax. eax, ecx, and edx must be preserved by the caller, other registers are preserved by the callee. The caller must clear the stack on return. So, the call to copy looks like:

//copy(src, dest, 200);
__asm {
  push 200
  push dest
  push src
  call copy
  add esp, 12
}

Which is actually not very interesting because there is no return value, nor registers to restore. The only thing of note is that we pop the three params off the stack by manually fiddling with the stack pointer.

stdcall

stdcall is similar to cdecl, but the callee tidies up the stack. So we don't need to pop anything from the stack.

//fill(dest, 5, 256);
__asm {
  push 256
  push 5
  push dest
  call fill
}

fastcall

fastcall is similar to stdcall in that the callee tidies up the stack, but two parameters are passed in ecx and edx, the rest on the stack like the other conventions. The return value is left in eax.

//fillWithCount(src, 256);
__asm {
  mov ecx, src
  mov edx, 256
  call fillWithCount
}