Explanations

Play, don't show

Last time...

Last time, we talked about some basic principles of 2D graphics and how one might implement a basic 2D graphics rasterizer. Today, we'll take a step back and look a bit more carefully at what we did. Along the way, we'll develop a more generalized and abstract model for 2D graphics and rasterization, and from that new understanding, start building the standard toolbox of techniques commonly seen in vector graphics: affine transformations, filled polygons, Bezier curves. Finally, we'll contrast this with contemporary 3D graphics and GPUs, and see if we can find some commonality between the approaches.

It's worth pointing out that while we do have a "standard toolbox" consisting of strokes, fills, line caps, miter joins and Bezier curves, seen consistently from HTML5's <canvas> to Microsoft's GDI, it wasn't always this way. Though we rarely hear of it today, a little language known as PostScript remains quite possibly the largest influence on the programmers and artists who interact with 2D vector graphics. Since it's important (and fun!) to learn, let's take a trip back in time. Back before pixel displays. Back, all the way, to the 1960s.

The rise of PostScript

The early days of computer graphics were dominated by "plotter printers", which operated in a pen-down, pen-up fashion, usually driven by some form of basic ASCII command language. HP, manufacturer of some of the earliest plotter printers, used a variant of BASIC called "AGL" which sent commands in a proprietary language, HP-GL, to the printer. During the 1970s, we saw the rise of affordable graphics terminals, starting with the Tektronix 4010. It had a CRT for its display, but don't be fooled: it wasn't a pixel display. Tektronix came from the analog oscilloscope industry, and these machines worked by driving the electron beam dynamically, not in a grid-like order. As such, the Tektronix 4010 didn't have pixel output. Instead, you sent commands to it in a simple graphing mode that drew lines, again, in a pen-up, pen-down fashion.

Like a lot of other inventions, this all changed at Xerox PARC. Researchers there were starting to develop a new kind of printer, one that was more computationally expressive than what was seen in plotters. This new printer was driven by a small, stack-based, Turing-complete language based on Forth, and they named it... Interpress! PARC, obviously, was unable to sell it, so the inventors jumped ship and founded a small, scrappy startup named "Adobe". They took Interpress with them and tweaked it until it was no longer recognizable as Interpress, and renamed it PostScript. Besides the cute, Turing-complete stack language it comes with to calculate its shapes, the original PostScript Language Reference lays out an Imaging Model in Chapter 4 that is near-identical to the APIs we widely see today. Example 4.1 of the manual has a code example which can be translated to HTML5 <canvas> nearly line-by-line:

/box {
    newpath
    0 0 moveto
    0 1 lineto
    1 1 lineto
    1 0 lineto
    closepath
} def

gsave
72 72 scale
box fill
2 2 translate
box fill
grestore
The same, translated to HTML5 <canvas>:

function box() {
    ctx.beginPath();
    ctx.moveTo(0, 0);
    ctx.lineTo(0, 1);
    ctx.lineTo(1, 1);
    ctx.lineTo(1, 0);
    ctx.closePath();
}

ctx.save();
ctx.scale(72, 72);
box(); ctx.fill();
ctx.translate(2, 2);
box(); ctx.fill();
ctx.restore();

This is not a coincidence.

Apple's Steve Jobs had met the Interpress engineers on his visit to PARC. Jobs thought that the printing business would be lucrative, and tried to simply buy Adobe outright. Adobe countered, and instead sold Apple a five-year license for PostScript. At the same time, Jobs and Apple were funding a small startup, Aldus, which was making a WYSIWYG app to create PostScript documents, "PageMaker". In early 1985, Apple released the first PostScript-compliant printer, the Apple LaserWriter. The combination of the point-and-click Macintosh, PageMaker, and the LaserWriter single-handedly turned the printing industry on its head, giving way to "desktop publishing" and solidifying PostScript's place in history. The main competition, Hewlett-Packard, would eventually license PostScript for its competing LaserJet series of printers in 1991, after consumer pressure.

PostScript slowly moved from being a printer control language to a file format in and of itself. Programmers eventually noticed the underlying PostScript being sent to their printers, and started writing PostScript documents by hand, introducing charts and graphs and art into their documents. Demand sprang up for PostScript outside of the printer. Adobe noticed, and quickly rushed out the Encapsulated PostScript format, which used specially formatted PostScript comments to give metadata about the size of the image, its preferred orientation, and so on, and restricted the use of the more "printer-y" commands, like "page feed". That same year, 1985, Adobe started development on "Illustrator", an application for artists to create Encapsulated PostScript files through a point-and-click interface. These Encapsulated PostScript files could then be placed into word processors, which then created... PostScript documents, which were sent to PostScript printers. The whole world was PostScript, and Adobe couldn't have been happier. Microsoft, while working on Windows, wanted to create its own graphics API for developers, and a primary goal was making it PostScript-compatible so that graphics could be sent to a printer as easily as to a screen. This API was eventually released as GDI.

The only major problem with PostScript was its Turing-completeness: viewing page 86 of a document means first computing pages 1 through 85. And that could be slow. Adobe caught wind of this user complaint, and decided to create a new document format that didn't have these restrictions, called the "Portable Document Format", or "PDF" for short. A quote from the PDF specification, Section 2.1, "Imaging Model":

At the heart of PDF is its ability to describe the appearance of sophisticated graphics and typography. This ability is achieved through the use of the Adobe imaging model, the same high-level, device-independent representation used in the PostScript page description language.

By the time the W3C wanted to develop a 2D graphics markup language for the web, Adobe championed the XML-based PGML, which had the PostScript graphics model front and center:

PGML should encompass the PDF/PostScript imaging model to guarantee a 2D scalable graphics capability that satisfies the needs of both casual users and graphics professionals.

Microsoft's competing format, VML, was based on GDI, which was based on PostScript. Both proposals were eventually combined to make up W3C's "Scalable Vector Graphics" ("SVG") technology.

Even though it's old, let's not pretend that the innovations PostScript brought to the world are anything less than a technological marvel. Apple's PostScript printer, the LaserWriter, had a CPU twice as powerful as the Macintosh that was controlling it, just to interpret the PostScript and rasterize the vector paths to points on paper. That might seem excessive, but if you were already buying a fancy printer with a laser in it, the expensive CPU on the side doesn't seem so expensive in comparison. In its first incarnation, PostScript invented a fairly sophisticated imaging model, with all the features that we take for granted today. But the most powerful, wowing feature? Fonts. Fonts were, at the time, drawn by hand with rule and protractor, and cast onto film, to be printed photochemically. In 1977, Donald Knuth showed the world what could be with his METAFONT system, introduced together with his typesetting application TeX, but it didn't catch on. It required the user to describe fonts analytically, using brushes and curves, which wasn't a skill that most typographers really wanted to learn. And such fancy curves turned into mush at small sizes: the printers of the time did not have dots small enough, so they tended to bleed and blur into each other. Adobe's PostScript proposed a novel solution to this: an algorithm to "snap" these paths to the coarser grids that printers had. This is known as "grid-fitting". To prevent the geometry from getting too distorted, they allowed fonts to specify "hints" about what parts of the geometry were the most important, and how much should be preserved.

Adobe's original business model was to sell this font technology to the people who make printers, and to sell special recreations of fonts, with added hints, to publishers, which is why Adobe, to this day, sells their versions of Times and Futura. Adobe can do this, by the way, because fonts, or, more formally, "typefaces", are one of five things explicitly excluded by US Copyright Law, since they were originally designated as "too plain or utilitarian to be creative works". What is sold and copyrighted instead is the digital program that reproduces the font on the screen. So, to prevent people from copying Adobe's fonts and adding their own hints, the Type 1 Font format was originally proprietary to Adobe and contained "font encryption" code. Only Adobe's PostScript could interpret a Type 1 Font, and only Adobe's Type 1 Fonts had the custom hinting technology allowing them to be legible at small sizes. Once Adobe pivoted towards the prosumer software market, it standardized and released the specification for Type 1 Fonts.

Grid fitting and font hinting, by the way, were so universally popular that when Microsoft and Apple got tired of paying licensing fees to Adobe, they invented an alternate method for their competing font format, TrueType. Instead of specifying declarative "hints", TrueType gives the font author a full Turing-complete stack language so that the author can control every part of grid-fitting (and, coincidentally, avoid Adobe's patents on declarative "hints"). For years, the wars between the Adobe-backed Type 1 and TrueType raged on, with font foundries stuck in the middle, having to provide both formats to their users. Eventually, the industry reached a compromise: OpenType. But rather than actually decide a winner, they simply plopped both specifications into one file format: Adobe took Type 1 fonts, removed the encryption bits, gave them a small amount of spit polish, and released the result as CFF / Type 2 fonts, which were grafted into OpenType wholesale as the cff table. TrueType, on the other hand, got shoved in as the glyf and other tables. OpenType, while ugly, seemed to get the job done for users, mostly by war of endurance: all software simply has to support both kinds of fonts, because OpenType requires you to support both kinds of fonts.

Of course, we're forced to ask: if PostScript hadn't become popular, what might have happened instead? It's worth looking at some other alternatives. The previously mentioned METAFONT didn't use filled paths. Instead, Knuth, in typical Knuth fashion, rigorously defines in his paper Mathematical Typography the concept of a curve that is "most pleasing". You specify a number of points, and some algorithm finds the one correct "most pleasing" curve through them. You can stack these paths on top of each other: define such a path as a "pen", and then "drag the pen" through some other path. Knuth introduced componentization and recursion to path definition. Knuth's PhD student, John Hobby, designed and implemented algorithms for calculating the "most pleasing curve", for "flattening" the nested recursion of paths, and for rasterizing such curves. For more on METAFONT, curves, and the history of font technology in general, I highly recommend the detailed reference Fonts & Encodings, and the papers of John D. Hobby.

... Apologies if that was too long. I hope that was as fascinating to you as it is to me. Let's get back to our toy rasterizer.

Establishing Coverage

In the first article, we established the idea of "coverage", and the process of "rasterization", which takes our ideal shape and lossily maps it onto the pixel grid. So, for now, let's deepen our understanding of shapes and coverage by tossing color out the window. The fancy fill styles, the colors, the gradients, they weren't really that important: they were sort of the final shiny effect applied after we rasterized our shape. But for now, let's make bigger shapes. Better shapes. Polygons and polylines and fancy splines under the influence of affine transformations. Where we're going, we don't need color; we need coverage.

As a refresher, let's give you some basic coverage. Basic, geometric, functional methods of coverage. You give me a function that tells me whether a given point is inside or outside the shape, and I plot it on the grid. You ready?

You'll be seeing this sort of code editor show up across this article. Try to both read and understand the reference code. To help you out, you can modify and play with it! Don't be afraid, it won't freeze your browser, even if you stick a while(true); in there. You can also resize the final output canvas to make it taller if you want more room to play.

Also, just so you're aware, graphics code tends to have short and sweet variable names. "x" and "y" should hopefully be pretty obvious, but conventions like "dx", "cx", "sy" are a bit less so. In my code, "dx" means "distance x" or "delta x", a difference between two X coordinates. The "c" in "cx" tends to stand for "center", and the "s" in "sy" tends to stand for "shape", as in, the shape's coordinates. "w" is width, "h" is height, "hw" is "half-width". You'll also see "x2", which tends to mean "the right side of a rectangle", aka "x+w". The more you play with it, the more you start to recognize and be able to read graphics code. I tell you this now because when you go look things up, you'll see this sort of stuff everywhere, and it's a really good habit to figure out how to read this more... terse sort of graphics lingo. Everyone has their own style, and I certainly am not the best or cleanest programmer, but my code matches the level of terseness you'll see across the industry. It can be challenging, but if I can figure it out, so can you. I promise.
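To make the idea concrete, here's a minimal sketch of what that reference code might look like. It isn't the exact demo code from the editor; the helper names (insideCircle, rasterize) and the assumption of an HTML5 <canvas> 2D context named ctx are mine.

// Coverage test: is this point inside the shape?
function insideCircle(x, y) {
    // A circle centered at (32, 32) with a radius of 20, in pixel units.
    var dx = x - 32, dy = y - 32;
    return dx * dx + dy * dy <= 20 * 20;
}

// Rasterization: sample the coverage function at every pixel center,
// and fill the pixels that are covered.
function rasterize(ctx, w, h, inside) {
    for (var y = 0; y < h; y++)
        for (var x = 0; x < w; x++)
            if (inside(x + 0.5, y + 0.5))
                ctx.fillRect(x, y, 1, 1);
}

rasterize(ctx, 64, 64, insideCircle);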

Combining Shapes

Now that you're familiar with the basics, let's start off with a puzzle. Let's say we wanted to build a long, "capsule" type shape. It has circles for caps at the ends, connected by a giant rectangle in the middle. Using the tools we have right now, can we combine them to make this new shape?
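Here's one possible sketch, reusing the hypothetical rasterize() helper and the ctx context from the earlier sketch (the coordinates and sizes are arbitrary):

// A capsule: two circular caps joined by a rectangle.
function insideCapsule(x, y) {
    var r = 12, x1 = 20, x2 = 60, cy = 32;
    var inLeftCap = (x - x1) * (x - x1) + (y - cy) * (y - cy) <= r * r;
    var inRightCap = (x - x2) * (x - x2) + (y - cy) * (y - cy) <= r * r;
    var inBody = x >= x1 && x <= x2 && y >= cy - r && y <= cy + r;
    // Inside the capsule if inside *any* of the three pieces.
    return inLeftCap || inRightCap || inBody;
}

rasterize(ctx, 80, 64, insideCapsule);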

We can, of course, combine shapes by using boolean operators. In the above example, we're considered inside the capsule as long as we're inside any of the shapes: the two circles or the connecting rectangle. This technique goes back to the earliest days of computer graphics, but modern variations on it have brought it back from the realm of the forgotten.

Of course, we can also use all the boolean operators available to us, in the same way we calculated our bands in Regional Geometry, to union, intersect, subtract, and xor shapes. What kinds of shapes can you come up with?
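As a sketch of how those operators might look when expressed over inside/outside functions (the combinator names here are my own, not from any particular library):

// Boolean operators over coverage functions: each one takes two
// inside/outside tests and returns a new one.
function union(a, b)     { return function(x, y) { return a(x, y) || b(x, y); }; }
function intersect(a, b) { return function(x, y) { return a(x, y) && b(x, y); }; }
function subtract(a, b)  { return function(x, y) { return a(x, y) && !b(x, y); }; }
function xor(a, b)       { return function(x, y) { return a(x, y) !== b(x, y); }; }

// For example, a ring is a big circle with a smaller circle subtracted:
// rasterize(ctx, 64, 64, subtract(bigCircle, smallCircle));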

Squash and Stretch

We've mostly played with circles and rectangles, and it's easy to see why: they're extremely basic and simple primitives. But sometimes we want things that move with weight, things that have elasticity, things that morph and rotate and skew. It's important for animation, after all. We can easily adjust the coordinates of a rectangle to make it wider than tall, but it's a bit unclear how to do that with a circle. And what about other transformations besides simple scaling, like skewing or rotating? It's unrealistic to modify all of our primitives to support this. How can we easily, generically implement transformations?

It becomes easy by changing our perspective. Instead of changing our shapes, we change how we sample them. To make an ellipse that's twice as long, we put our samples closer together along the X axis. You can also think of this as "shrinking" our pixel grid. When we unshrink the pixel grid back to normal again, we find an elongated circle.
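As a small sketch of that idea, again using the hypothetical insideCircle and rasterize helpers from earlier:

// To stretch the circle 2x along X, we don't touch the circle at all.
// We squeeze the sample coordinates instead: each sample asks the
// original circle about a point with its X coordinate halved.
function insideStretchedCircle(x, y) {
    return insideCircle(x / 2, y);
}

rasterize(ctx, 128, 64, insideStretchedCircle);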

We can actually implement a lot of different kinds of transformations this way. By morphing our grid in weird ways, we can apply something along the lines of Photoshop filters to our shape. This functionality isn't something usually provided in typical 2D graphics libraries; they restrict our ability to squash and stretch to something a bit simpler to render, called a "linear transformation".

A linear transformation is one that only scales its inputs and adds them together, without using anything like sines or divisions. Any such equation can be normalized to take a form like x' = 1*x + 3*y, where we line up the variables in order and write out the multiplicative coefficients. As a notational convenience, we can drop the "template" part of the expression and just jot down the coefficients in a sort of box shape. The resulting box is known as a matrix. Matrices have lots of uses in math and computer science and graphics, but it's easy to forget that a matrix means the same thing as the above set of linear expressions. You might have seen these matrices before. If you're having trouble with matrices, try converting them back to expression form; I often find the expression form easier to read and understand.
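As a rough sketch of that equivalence (the flat [a, b, c, d] layout and the function names here are mine, not any particular library's convention):

// The matrix [a, b, c, d] is shorthand for the pair of expressions:
//   x' = a*x + b*y
//   y' = c*x + d*y
function applyMatrix(m, x, y) {
    return [m[0] * x + m[1] * y,
            m[2] * x + m[3] * y];
}

// To draw a transformed shape, we run each sample point through a
// matrix before testing coverage, just like the stretch above
// (strictly, through the inverse of the transform we want to see).
function insideTransformed(inside, m) {
    return function(x, y) {
        var p = applyMatrix(m, x, y);
        return inside(p[0], p[1]);
    };
}

// A 2x stretch along X, expressed as a matrix: sample at x' = 0.5*x.
rasterize(ctx, 128, 64, insideTransformed(insideCircle, [0.5, 0, 0, 1]));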