
Teaching Thursday: A Post-PDF World

  • May 8, 2025

Next year, my institution will implement a new series of accessibility guidelines for classes. Most of these are reasonable (e.g. alt text on images and so on), but among them is a policy discouraging us from using PDF files to share texts in our classes. Needless to say, this is a massive sea change, especially since PDFs remain the backbone of the publishing business, not only in production but also in the distribution and archiving of published work.

That said, PDFs do have accessibility challenges. They are not easily scalable or reflowable, and this makes it difficult to view them on different devices or to enlarge them, for example, for the visually impaired. Shifting away from the PDF seems like a good thing to me, even if it will represent a significant inconvenience for fields that have long relied on the PDF as the gold standard for portable documents. (And I shudder to think of the accessibility challenges that my colleagues in fields that are heavily dependent on, say, data or specialized applications will face!)

As I reviewed my syllabi, I quickly became comforted by the realization that recently published articles often appear in multiple formats (or at least HTML and PDF). At the same time, I realized that many of the older articles and book excerpts will need to be converted. This led me down a rabbit hole.

First, I decided, I would be cutting edge and use ChatGPT to convert my scanned book chapters into markdown. I instructed ChatGPT to remove the page headings and put the page numbers in brackets. The initial results were promising, but there were no breaks between paragraphs. So I asked ChatGPT to add breaks between paragraphs. This is where the madness started. First, it put breaks between every sentence. This is not helpful. I asked ChatGPT not to do that and instead put breaks before every indented line. This caused ChatGPT to put every word on a separate line. This was not what I wanted it to do. At this point, I got moderately frustrated and decided that this would not be the best way to convert these documents. It boggles the mind that ChatGPT can perform so consistently on handwritten field notebook pages and yet stumble over something as simple as paragraph breaks.

I then turned to ABBYY FineReader and used its solid OCR engine to convert the text. The upside is that it does much better at preserving formatting (to the point of mirroring the formatting on left and right pages), but the downside is that converting a text using this method still requires a good many manual interventions to make it comfortable and attractive to read.
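For what it's worth, the cleanup I kept asking for (drop the page headings, put the page number in brackets, break paragraphs at indented lines) is simple enough to script once the OCR text exists. What follows is only a rough sketch in Python, not what I actually used. The function name `ocr_page_to_markdown` and the `running_heads` parameter are made up for illustration, and the sketch assumes that each page arrives as plain text, that the page number sits on a line by itself, and that the OCR output preserves paragraph indents as leading spaces or tabs.

```python
from __future__ import annotations


def ocr_page_to_markdown(page_text: str, running_heads: set[str]) -> str:
    """Turn one page of plain OCR text into Markdown paragraphs.

    Assumptions (hypothetical, adjust to the actual scans):
    - running heads match one of the strings in `running_heads` exactly;
    - a line consisting only of digits is the page number;
    - a new paragraph starts on a line indented by two spaces or a tab.
    """
    paragraphs: list[str] = []
    current: list[str] = []
    page_marker = None

    for raw_line in page_text.splitlines():
        stripped = raw_line.strip()
        if not stripped:
            continue  # ignore blank lines left by the OCR
        if stripped in running_heads:
            continue  # drop the page heading
        if stripped.isdigit():
            page_marker = f"[{stripped}]"  # page number goes in brackets
            continue
        # An indented line closes the paragraph collected so far.
        if raw_line.startswith(("  ", "\t")) and current:
            paragraphs.append(" ".join(current))
            current = []
        current.append(stripped)

    if current:
        paragraphs.append(" ".join(current))
    if page_marker:
        paragraphs.insert(0, page_marker)  # page marker leads the page
    return "\n\n".join(paragraphs)


sample_page = (
    "12\n"
    "THE RUNNING HEAD\n"
    "    First paragraph starts here\n"
    "and continues.\n"
    "    Second paragraph.\n"
)
print(ocr_page_to_markdown(sample_page, {"THE RUNNING HEAD"}))
```

This does nothing about words hyphenated across line breaks, paragraphs that run over a page break, or footnotes, so it would reduce the manual interventions rather than eliminate them.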

Doing this did introduce some new practical concerns. First, I realized that I need to find a way to integrate endnotes or footnotes into my OCR-ed text. It may be that this has to be done manually. As someone who is a bit amphibious, working between humanities disciplines committed to footnoting and endnoting and the social sciences' use of in-text citation, I wonder whether accessibility alone will warrant a shift to in-text citation over footnotes or endnotes.

Second, I started to wonder whether this would shape the content that I assign in class. At present, I try to syncopate my use of older scholarship with newer content. This not only serves to give my courses a bit of historiographic depth, but also demonstrates continuity in the discipline. Since JSTOR remains the most widely accessible archive of scholarly publishing and it is deeply dependent on the PDF as the medium for scanned content, it will be interesting to see how it (and we) adapt.

Third, there seems to be a curious carve-out for books. On the one hand, paper books seem to be exempted from these policies even though they are not generally the most accessible medium for students. On the other hand, one can’t help but wonder whether paper books are at risk in the near future. The push toward Open Access publishing on the one side and increasingly rigorous standards for accessibility on the other make books a complicated option for courses. This comes at a time when convincing students to spend time in the library has become more and more challenging and libraries face a growing number of budgetary challenges.

Finally, one also wonders how this is all tangled up in copyright law. One can easily see how the moral, ethical, and institutional requirements for accessibility bump up against the copyright and fair-use rules. More than that, I suspect that the growing interest in accessibility is driven in some part by the pivot in the publishing industry from printing books (and journals) to managing content and mediating access. It would not be beyond the realm of possibility that we’re witnessing an effort from commercial publishers to assert themselves in a different way in content management. 
