Explaining X11 for the rest of us


Last year, I wrote an article describing the free and open-source graphics stack, explaining all of the interconnected pieces: X11, graphics drivers, DRM, and DRI, and their roles in getting pixels placed on the screen. In the year since, I've answered a lot of really good questions from the Linux community about my post, about X11, and about Wayland. I've learned a lot. Several new technologies have appeared, and I have started to take a more active role in this myself, as part of our efforts to port GNOME to Wayland.

However, I still field plenty of questions from lots of people about this, and a lot of the time, it's extremely simple stuff: "What is X?" "How does it interact with my graphics card and mouse/keyboard?" "What do apps use X for?" "What is Wayland, and how does it fit into the picture?" "What problems did X have that made us want to write new display server technologies?"

These sorts of questions were what inspired me to write "The Linux Graphics Stack" in the first place, but there's really never been a comprehensive, historical writeup of our display server technologies in general. So, I chose to spend my free time at Red Hat writing it.

Now, compared to some of my colleagues, I'm relatively new to the Linux development scene, with only five or so years of experience. I've also arrived late, way after we've added lots of hacks to keep the X Window System fresh. Most of my knowledge comes from talking to my more experienced coworkers in person, digging through mailing list and newsgroup archives, and IRC discussions with the great sports in #xorg-devel. So, there's a good possibility that I'll get something wrong, guy-who-worked-at-DEC-in-the-80s-waiting-to-correct-me! I'm genuinely sorry if I misrepresent something. If you want to discuss this at all, shoot me an email at jstpierre@mecheye.net, and let's try to work something out!

I also really don't want to inject my bias into this too much. While I do personally believe Wayland is going to become The Linux Display Server Of The Future®, and that is a good thing, I'm not trying to rip X apart! My intention here is twofold: I want to explain a lot of the more undocumented or harder-to-grasp features built into X11 (the permutations of input grabs, for instance), and I also want to provide an accurate, historical context on why a certain feature was added (COMPOSITE, for instance), and all the clever hacks we've invented along the way (WM_TAKE_FOCUS, for instance). I really want this to be a resource for somebody looking to learn. If I've inspired anybody out there to write any code, I've accomplished my goal.

But before we begin...

Since this article contains a lot of interactive demos relying on fairly modern browser technology, let's make sure that everything is OK before continuing. I've been testing this on modern versions of Firefox Nightly and Chrome Canary.

If you can see the stipple pattern above, that means that your browser is modern enough to see the interactive demos.

You might have noticed that when you ran your mouse over the stipple, your cursor changed. That's because this isn't just any old stipple image; that stipple is actually the background of a full X server session running in your browser using HTML5 canvas. All of the interactive demos will use this framework to explain what's going on under the hood.

Basic Architecture

Although it may sound a bit stilted, notice how I keep saying "the X Window System" instead of the more traditional shorthands "X", "X11", or "Xorg"? I want to be very careful to separate the ideas and design of the system from its component parts.

The X Window System is a networked display system. A server component, the X server, is responsible for coordinating between all of the connected clients, taking input from the mouse and keyboard, and pushing pixels to the output. The most popular X server implementation is the Xorg X server, developed by the X.Org Foundation and community. There are other X server implementations: you might remember that Xorg was forked from XFree86 a decade ago, and that Sun Microsystems had several X server implementations of its own, including Xsun and XNeWS. Today, Xorg is the dominant X server implementation and gets most of the development effort. But back in the day, multiple competing implementations existed.
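To make the "networked" part a bit more concrete, here is a minimal sketch of how a client conventionally finds its X server from a display name such as `:0` or `example.org:1.2`. None of this comes from the article itself: the `parse_display` helper and the `DisplayTarget` type are hypothetical names, and the sketch assumes the usual conventions of a Unix socket at `/tmp/.X11-unix/X<n>` for local displays and TCP port `6000 + n` for remote ones. It deliberately ignores IPv6 hosts and other less common display name forms.

```python
import re
from typing import NamedTuple, Optional


class DisplayTarget(NamedTuple):
    """Where a client would connect for a given display name (hypothetical type)."""
    host: Optional[str]         # None means "local machine"
    display: int                # display number, the N in ":N"
    screen: int                 # default screen, the S in ":N.S"
    unix_socket: Optional[str]  # conventional local socket path, if local
    tcp_port: int               # conventional TCP port for this display


# host is optional; display number is required; screen defaults to 0.
_DISPLAY_RE = re.compile(r"^(?P<host>[^:]*):(?P<display>\d+)(?:\.(?P<screen>\d+))?$")


def parse_display(name: str) -> DisplayTarget:
    """Split a display name like ':0' or 'example.org:1.2' into its parts."""
    match = _DISPLAY_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized display name: {name!r}")
    host = match.group("host") or None
    display = int(match.group("display"))
    screen = int(match.group("screen") or 0)
    # Assumption: local displays use the conventional /tmp/.X11-unix socket.
    unix_socket = f"/tmp/.X11-unix/X{display}" if host is None else None
    return DisplayTarget(host, display, screen, unix_socket, 6000 + display)


if __name__ == "__main__":
    print(parse_display(":0"))
    print(parse_display("example.org:1.2"))
```

The point is only that the same client code can talk to a server on the same machine or on another one; everything after the connection is made looks identical to the client.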

X servers and X clients all talk a standardized network protocol to each other, known as X11. This protocol is well-specified, from the wire format to the semantics of every request. The protocol documentation is an invaluable resource for any hacker who wants to learn more about this stuff.

Applications and toolkits don't write the wire format onto the socket directly, however. They often use client libraries that implement the protocols, like the traditional Xlib library, or the somewhat newer xcb.
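To give a feel for what those client libraries hide, here is a rough sketch of the very first thing a client sends after connecting: the connection setup request. As I understand the core protocol, it is a 12-byte header (a byte-order marker, the protocol version 11.0, and the lengths of the authorization name and data) followed by the two authorization strings, each padded to a multiple of four bytes. The function name `build_setup_request` is made up for this illustration, and the sketch only builds the bytes; it does not open a socket or parse the server's reply.

```python
import struct


def _pad4(data: bytes) -> bytes:
    """Pad with zero bytes up to a multiple of 4, as X11 requests require."""
    return data + b"\x00" * (-len(data) % 4)


def build_setup_request(auth_name: bytes = b"",
                        auth_data: bytes = b"",
                        little_endian: bool = True) -> bytes:
    """Encode an X11 connection setup request (illustrative helper)."""
    order = "<" if little_endian else ">"
    # 'l' announces least-significant-byte-first, 'B' most-significant-first.
    byte_order = b"l" if little_endian else b"B"
    header = struct.pack(
        order + "cxHHHHxx",
        byte_order,      # byte order the client will use from here on
        11,              # protocol major version
        0,               # protocol minor version
        len(auth_name),  # length of authorization protocol name
        len(auth_data),  # length of authorization protocol data
    )
    return header + _pad4(auth_name) + _pad4(auth_data)


if __name__ == "__main__":
    # With no authorization, the whole request is just the 12-byte header.
    print(build_setup_request().hex())
```

Xlib and xcb do this for you, along with reading authorization data and parsing the (much larger) reply that describes the server's screens, visuals, and pixmap formats.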

I'm going to try and be precise in my nomenclature in this article.

When I talk about features or design decisions of the overall system, I will try to call it the X Window System, even if it sounds a bit verbose. e.g. The X Window System provides us Pixmaps, which are images in the server's memory.

When I talk about features or details of the network protocol, I will talk about the X11 protocol. e.g. The X11 protocol provides for a generic extension mechanism, which allows for a forward-compatible way to implement new features without having to redesign older parts of X11.

When I talk about the behavior of a client or a server, I'll say X client or X server, e.g. Using the MIT-SHM extension, X clients can pass memory buffers to the X server using shared memory (traditionally System V shared memory segments), which avoids pushing the data through the socket and the large copies that come with it.

When I talk about features or the architecture in the Xorg X server implementation, I'll mention it explicitly as Xorg or the Xorg X server, e.g. In order to make drawing calls accelerated, Xorg video drivers can provide hardware-accelerated versions of certain drawing primitives through EXA.

If I ever say "such-and-such is a feature of X", it's a bug.
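As a small illustration of the extension mechanism mentioned above: a client discovers an extension by name with the core QueryExtension request, and the server's reply says whether the extension is present and which major opcode, first event, and first error numbers were assigned to it. Below is a sketch of how that request could be encoded, assuming a little-endian connection. The helper name `build_query_extension` is hypothetical, and the layout (opcode 98, a request length counted in 4-byte units, the name length, then the padded name) is my reading of the core protocol rather than anything taken from this article.

```python
import struct

QUERY_EXTENSION_OPCODE = 98  # core protocol opcode for QueryExtension


def build_query_extension(name: str) -> bytes:
    """Encode a little-endian QueryExtension request (illustrative helper)."""
    encoded = name.encode("latin-1")
    # The name is padded with zero bytes to a multiple of 4.
    padded = encoded + b"\x00" * (-len(encoded) % 4)
    # Request length is in 4-byte units: 2 units of header plus the name.
    length_units = 2 + len(padded) // 4
    header = struct.pack(
        "<BxHHxx",
        QUERY_EXTENSION_OPCODE,
        length_units,
        len(encoded),  # unpadded length of the extension name
    )
    return header + padded


if __name__ == "__main__":
    print(build_query_extension("MIT-SHM").hex())
```

Because extensions are looked up by name at runtime and given their opcodes dynamically, older clients and servers simply never ask for, or never advertise, the features they don't know about.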

Credit where credit's due

The source code to this article and all demos is freely available under the MIT/X11 license on GitHub for inspection. I've tried to keep the source code well-commented so that it can be a useful learning resource in its own right: for a third dimension of learning, you can read through my source code for the server and all the demos and figure out how things work that way.

Some of the more tricky code in there has been ported from the Xorg X server itself along with its helper library, pixman.

Some of the images in this article have been adapted from other images on the web. The two kitten images in the Expose event demo were adapted from photos taken by quatre mains and fazen, shared under a CC-BY-2.0 license. Thanks to [placekitten] for finding these images for me.

The cursors and icons used in the demos have been adapted from GNOME's default theme, Adwaita. Some of the icons in the inspector have been adapted from the GNOME Art Libre Symbolic Icon Theme, shared under the LGPL and CC-BY-SA-3.0. All rights reserved.

Special thanks to Keith Packard, Alan Coopersmith, Adam Jackson, Peter Hutterer, and Owen Taylor for digging into the source code, protocols, mailing lists and other archives to help me figure out some of the odder and hairier X11 semantics when I was working on this. Sorry for driving you guys crazy with this project.

If you have any questions or comments, or just want to thank me, feel free to email me at jstpierre@mecheye.net, contact me on IRC on Freenode as Jasper, or open a GitHub issue here for this repo. I read every piece of mail I get, and I try to reply to everything as well.