Xplain

Explaining X11 for the rest of us

A brief aside

This article will be a bit different from all the others so far. While the others have mostly been high-level descriptions of how the X server works, together with interactive demos, today we're going in depth into one of the cool data structures of the X server.

In the first Xplain article, I mentioned a data structure known as the "region", which is a list of rectangles. When I first built Xplain, I took the region code from pixman and compiled it with emscripten, simply because I didn't understand the algorithms. Frankly, I'm terrible at algorithms. I must have looked at tens of different explanations of sorting algorithms over the course of my career, and I still don't understand how they actually work.

For my own fun, I recently decided to investigate the data structures and algorithms behind regions, and I finally understood it. What I found I'm going to share with you.

What are regions?

As mentioned previously, regions are lists of rectangles. You can also think of them as a 1-bit bitmap, where every pixel is either black or white. However, actually storing them as a 1-bit bitmap is expensive, so we use a different representation, which optimizes for common cases, and is more convenient and faster to operate on than large bitmaps.

The X server uses regions all over the place when computing the amount of coverage each window has. So, given the desktop below:

Then the text editor window has as its "visible region" the following L shape.

The X server has to compute these regions efficiently whenever windows are created, destroyed or moved to figure out which window should be painted at which location, and which input should receive mouse events.

The X server has four basic operations it can run on regions:

  • When we add two regions together, we produce a union of them. It's as if you took two pieces of cardboard, laid them on top of each other, and then traced that as the resulting region. Perhaps surprisingly, the X server doesn't use this operation very often.
  • The area that a window can paint to doesn't include its child windows, since the child windows paint there instead. So, when calculating a window's region, we first intersect with the parent, then subtract out the regions of any children the window has.
  • Windows are clipped to their parents. When we want to calculate the region, in absolute space, that a window occupies, we intersect the window's region its parent's.
  • When a window changes its shape, we need to paint two areas: space the window occupies that it didn't before, and space the window no longer occupies. Calculating the areas that are different between two regions is known as xor, after the bitwise operator, which can be thought of as calculating which bits differ between two integers.

To get that "L" shaped region of the text editor above, we take the two rectangles of the two windows, subtract the bottom one from the top one, and we have our "L".

What is a region made of?

Before we fully go into these algorithms, let's first go over what a region is. I've previously mentioned it's a "list of rectangles", so let's start with that. For the example region below, there are of course many different lists of rectangles we could make that result in that image.

Are all of these regions? Do our algorithms have to handle all of these cases? It turns out that the answer is "no". A valid region is not just any list of rectangles, but an ordered, minimal set of horizontal bands. What does that mean? Let's work through it. First, let me show you the correct construction of our example region.

OK. The graphics programmers in the room are now smirking. I'll explain why later. But, let's start chewing apart our definition. First up, let's talk about "horizontal bands". A "band" in this case looks like this:

A band is simply a list of rectangles that all start and end at the same Y position. We only have one other rule about the rectangles in a band: they must not touch or overlap. If two rectangles in a band touch or overlap, they can be coalesced into one rectangle. Using these rules, we can "formally" define a band as an "ordered, minimal set of rectangles with the same top and bottom".

A region, it might follow, is simply a bunch of these bands stacked on top of each other. We also make the same rule here: it is not possible for bands to overlap or touch in their top and bottom positions. This can be unintuitive for the case of what look like visibly separate bands. You might imagine that the region below has two bands, but our "overlapping" rule means that it must have three.

Somewhat intuitively, these two rules mean that for one given "bitmap", there is actually only one way to construct it as a region.

Merging bands

With these definitions, let's start constructing our algorithm. First, let's start with the simpler case: two bands with the same tops and bottoms together. For our example, we'll start with union, but we can start generalizing it afterwards.

Since all of the rectangles in a band have the same top and bottom, we only care about the starts and ends of these rectangles on the X axis. I'm going to call these two values "walls". Given that rectangles in our bands do not overlap or touch, it can be intuitive to see that we will always have an even number of these walls, and that the first wall "enters" the region, while the second wall "exits" the region. The third enters, the fourth exits — it's a toggle.

Remember that these two bands have the same tops and bottoms. I'm only putting them on two different Y axes in the demonstration below simply so that you can see them both at the same time, since they do overlap.

Visually, it's very easy to see what the correct union for these two bands would be: 20 and 50. But algorithmically, it can be difficult to figure out what to do.

For future reference, whenever you see a data structure construction like this, with a big focus on slicing up simple geometry into horizontal segments, it's a pretty big hint that we're going to perform a "sweep-line algorithm". These are also sometimes called "scanline" algorithms in the graphics and game development space, and they are so common there, that a lot of game developers I know of first start imagining ways to reduce problems to a scanline algorithm. That's why I said they'd be there, smirking in the background. They've known this trick for decades.

What we're going to do is walk over the walls of both bands at the same time, and then keep track of whether we're inside or outside of the corresponding band. When we transition from being outside both bands to being inside either band, or vice versa, we output a wall.

function unionBands(bandA, bandB) {
    // The resulting band from the union.
    var outWalls = [];

    // The index into the walls of both.
    var iA = 0, iB = 0;

    // Whether we're inside either band.
    var insideA = false, insideB = false;

    while (iA < bandA.numWalls || iB < bandB.numWalls) {
        var wallA = bandA.walls[iA];
        var wallB = bandB.walls[iB];

        // The next wall we want to process is the one on the left,
        // that is, the minimum of the two.
        var wall = Math.min(wallA, wallB);

        // Whether we're outside or inside the bands.
        var wasInside = (insideA || insideB);

        // Update our state for both bands, and advance to the next wall.
        if (wall === wallA) {
            iA++;
            insideA = !insideA;
        }
        if (wall === wallB) {
            iB++;
            insideB = !insideB;
        }

        // Are we now inside?
        var isInside = (insideA || insideB);

        // If we're transitioning from inside to outside or outside to
        // inside, then output a wall.
        if (isInside != wasInside)
            outWalls.push(wall);
    }

    return outWalls;
}

Yes, I know this code is actually broken. In the case where we run out of one set of walls, then we'll retrieve undefined from the array, and Math.min will return NaN. This is a complexity that I'm choosing not to deal with in the pseudo-code above.

Let's run through this, visually, in our heads.

  1. Pick the leftmost wall we haven't encountered. That's the wall on the left, 20. Now we're inside band A. Since we weren't inside either band beforehand, we output a new wall, 20.
  2. Pick the next wall. That's 30. We're now inside B, but since we were inside A before, we don't emit a wall.
  3. The next wall is 40. We exit A. Since we're still inside B, we don't emit a wall.
  4. The next wall is 50. We finally exit B. Since we're no longer inside either band, we emit a wall, at 50.

So it does work!

Implementing other operations

Now we've implemented the basic "union bands" algorithm. What about intersection, subtraction, and xor? Let's look at intersection first. The core of the algorithm is when we output a wall. For union, we output a wall when we transition from not being inside either band to being inside either band, or vice versa. For intersection, we should do it when we change from not being inside both bands to being inside both bands, and vice versa. So the only code that changes are the values of wasInside and isInside, which become insideA && insideB. In fact, for all four operations, there's a boolean test for every one.

In my actual implementation, not being content with ugly closures and, you know, actual code, I implemented each operation as a bitwise trick. But it doesn't have to be that way — all I'm doing is implementing the above four primitive operations a bit more concisely.

Putting it back together

Now we know how to merge bands together. How can we use this to our advantage to merge regions? While you might just start iterating over the bands of each region, and merging them together blindly, we have to be a tad more careful than that. For instance, given the "text editor and kitten" region example we started out with, we might first take the first band of the top region, that being the kitten window, and try to merge it with... well, what, exactly? The other region doesn't have a band there! So now we have to keep track of which regions have bands where, and the code quickly becomes littered with special cases. While we could do this, like the X server does, let's take a different approach.

The issue we're trying to work around is that we sometimes don't have bands to merge in certain cases, which complicates the state tracking of where we are in which region. But the power of computer science is that we can define our own rules. If there's no band where we want, why not make one up? It would be a band with no walls, but if you look closely at the algorithm I've defined above, assuming we fix the issue with running out of walls, it should work for a band with no walls.

This means that a region, in some sense, is defined everywhere. Point to a location anywhere in a region, and I will tell you which band contains it, even if I made the band up.

With that trick, we can simply iterate through all the bands in a region in a similar way. Iterate through bands looking for the one that starts lower, and the one that ends higher, and then merge the walls between them.

function combineBands(bandA, bandB, bandOperation) {
    // As above in unionBands, except we take a bandOperation
    // which is a function like:
    //   function(insideA, insideB) { return insideA || insideB; }

    // Implementation left as an exercise for the reader. :)
}

function combineRegion(regionA, regionB, bandOperation) {
    // A region's bands always starts with:
    //   { top: -Infinity, bottom: topOfNextBand, walls: [] }
    // and it always ends with:
    //   { top: bottomOfPrevBand, bottom: Infinity, walls: [] }

    var newBands = [];

    var iA = 0, iB = 0;
    while (iA < regionA.bands.length || iB < regionB.bands.length) {
        var bandA = regionA.bands[iA];
        var bandB = regionB.bands[iB];

        var top = Math.max(bandA.top, bandB.top);
        var bottom = Math.min(bandA.bottom, bandB.bottom);

        var walls = combineBands(bandA, band, bandOperation);
        newBands.push({ top: top, bottom: bottom, walls: walls });

        if (bottom === bandA.bottom)
            iA++;
        if (bottom === bandB.bottom)
            iB++;
    }

    return newBands;
}

And this is it! The only tiny thing here left is that sometimes this algorithm might not properly coalesce bands. If we have a union of two regions where one completely engulfs the other, we aren't yet smart enough to take the three resulting bands and merge them into one. That can be fixed by simply checking whether we're about to push a band that has the same properties as the last one; nothing too difficult.

Some Useful Regions...

And that's it! That's all you need to know about regions — how they're constructed, how they're operated with, and how they're used in the X server!

But you might be wondering, where might have I encountered regions before? Regions don't tend to be used very often, but you might notice their tell-tale signs. The most useful region you've encountered would be the 1-bit rounded border region. You know, it looks like this:

Before the desktop world had the Composite extension for true transparency, we used regions like this to simulate having rounded corners. But, of course, it was terribly sampled. Some applications, like Chromium, actually still do this in their own window decorations.

I have also mentioned that regions are an optimization of a 1-bit bitmap with different semantics. While bitmaps scale linearly with their size, regions however scale with the number of bands and walls: that is, how many times we toggle the region on and off. You can already imagine what the worst case for a region data might be.

It's a checkerboard. Each pixel in here has two walls. Each "on" pixel is two walls, and each row is its own band, so assuming that we use 32-bit numbers for the positions, 8 bytes per pixel, and 8 extra bytes per band to store the top and bottom. Since is checkerboard is 16x10, that means we have (16*4+8) bytes = 72 bytes per band, and we have ten of them, for 720 bytes. In a bitmap, each row could be packed into two bytes, for a total of 20 bytes. Yeesh.

Enlightenment used to use checkerboard regions like these to fake transparency, along with other tricks like copying the front buffer to its own windows. The Xorg of the day tended to crumble under the weight of such regions and tricks, but it was an important part of history that let us figure out how to move forward. As the COMPOSITE extension specification puts it, "clever hacks are an awfully close substitute for changes in the underlying system".

Window shapes, both on the output and input sides, are clearly the most useful case for complex regions like this. You likely won't need to write a window system like this any time soon. But I think the simplicity of the data structures and algorithms in play here are something we can all take lessons from. This isn't a constructed algorithms class example, this is a real example from real code to solve a real problem.

Next time...

I know, I know. I've been particularly bad at writing new entries to this series. One might assume it's because of a lack of time, or because I'm just "too busy with real work", but the reality is that I have plenty of time. I simply, have, until recently, lacked the motivation to sit down and finish an article. It's especially hard since I've chosen such a niche audience, that I'm disappointed when it doesn't get more recognition than it does.

This is a labor of love project for me. I get an incredible joy out of explaining things and turning them inside out; turning the esoteric and archaic into something accessible and fun. I'm not an incredible programmer by any means. If you've enjoyed this series thus far and want me to continue, the best way to do it would be to tell me you appreciated it. Send me questions, perhaps fix a few of my silly typos in a pull request. Nothing inspires me more than knowing I helped somebody, and that people are enjoying my work.

Recently, a handful of you have reached out, and because of that, I've picked this series back up again. I had a lot of fun writing the text and code for this article, and I've already started the next. It's "Expert Window Techniques", which covers the practical ways of combining X11's toolbox of windows to construct a working desktop environment. We'll go over window managers, taskbars, the ICCCM, the Extended Window Manager Hints, and all of that fun jazz. I can't promise a deadline for it, and I won't even make vague statements like "by the end of the year". It'll come out when I'm happy with it, when I'm motivated enough to complete it.

Thank you.