Programming data for display: the PDF story

At the 2017 Papers We Love Conf, a previously scheduled speaker fell ill. With just 90 minutes to go before the vacant slot, the organizers asked me if I could fill in. I didn't have anything suitable prepared, but given my long history working with PDF documents, I thought I could put together a reasonably entertaining presentation on the history, heritage, and design decisions that led to the PDF file format and specification, one that would also live up to the high standards and expectations of the Papers We Love community.

I was so relieved that the result was well received!

The buildup

I said at the top that I had just 90 minutes to prepare the talk. How that went down is a good story…

Since I was volunteer staff at the Papers We Love Conf, I heard pretty early that one of the scheduled speakers was ill. At first there was hope that she would recuperate enough to present, but by noon that hope had evaporated. Darren Newton, one of the lead organizers for the event, asked me if I'd be able to fill in. I said "yes", but it wasn't a done deal yet: Zeeshan Lakhani, the other lead organizer, was trying to convince a Strange Loop keynote speaker to take the spot. While that process churned along, I told Darren I'd be back in a little while, as I had a video hangout with my family planned for lunchtime.

At some point, Darren thought I'd simply gotten cold feet and disappeared, which set off a period of high anxiety for him and Zeeshan, especially once the Strange Loop keynote speaker declined to fill in. After my family call, I made my way back down to the conference space and got some lunch. A flurry of Twitter DMs followed, and then a conversation with Darren to confirm the topic I had in mind.

I then rushed back to my hotel room to snag my laptop and sat down at the conference swag table (of all places) just before 2:00pm to figure out exactly what I'd be talking about. To settle my nerves, I asked no one in particular whether someone could procure a beverage; David Ashby (another conference volunteer) heard the call and showed up 5 minutes later with an Old Fashioned in a coffee cup. I plugged in my headphones and settled in for the most intense ~90 minutes of outlining and brainstorming and googling and slide preparation of my life. The result was never going to be "done", or exactly what I wanted, but around 3:25pm, I sidled up to the podium and gave what I could.

The Papers We Love Conf site has a page for the talk that includes the video, along with its abstract and references.

The slides for the talk (such as they are, given how little time I had to prepare them!) are also available.

Papers We Love talks are generally motivated by one or more academic papers that are influential in their field or that the speaker personally finds illuminating or inspiring, but that was not true in this case. The most common page description languages, PostScript and PDF (both of which I discuss in the talk), were developed and refined within commercial organizations at a time when it was rare for such organizations to publish their findings, so sadly there aren't any published papers to love. As a result, their history is far less well established than that of other common and important technologies.

Though a handful of internal corporate memos provide a window into the motivations of the engineers at the time, our best sources of information on the development of page description languages, and PostScript and PDF in particular, are narrative histories that have been passed along over the years. Those were the sources I relied upon most in forming the talk, and I've simply internalized much of their content over many years of working with PDF documents.

Reception and revision

After I delivered the talk, and throughout the rest of the week at Strange Loop (with which PWLConf was co-located), I had numerous conversations with people who had seen the talk or heard about it later (perhaps because the last-minute preparation makes for a pretty good story). I came away surprised on a few fronts:

  1. People really enjoyed the talk! Perhaps because I've been working with PDF documents for so long, I didn't expect such an enthusiastic reaction from a "general" audience (even among software professionals).
  2. There is a deep desire among many to better understand the things we use and rely upon every day. In computing, PDF is absolutely one of those things, since it is used so pervasively not only for publishing (i.e., as electronically distributable paper), but also for data interchange in the most important and sensitive domains. The latter is obviously of great importance to me, to all of us here at PDFDATA.io, and to our customers who, for better or worse, rely upon PDFs as a data source…but again, I underestimated the broader level of interest.

With this in mind, I am newly resolved to dig deeper into the history and heritage of PDF, and to publish what I find in future posts here and disseminate it as widely as I can through future events. As exciting as the future of computing might be, it is incredibly important that we have a solid grounding in the why and how of its present state, and PDF is both a big part and a reflection of that.

Have thoughts on this post? Let us know via Twitter @pdfdataio.
