Grab Annotations from a PDF with PyPDF2
There are some situations where I don’t have access to my speaker notes. Usually this is for a good reason, such as having mirrored my displays so I can demo or play a video without fiddling with my display settings in the middle of a talk. Sometimes it’s because something bad happened and I’m presenting from someone else’s machine, or from a laptop that’s completely off stage, and I only have the comfort monitor. For those situations I use a printed set of backup speaker notes, so I thought I’d share the script that creates them.
First, a complete aside
If you’d like to be ready to support a conference speaker through a tech fail, read the post about presenting from PDF and install the tools on your laptop. A bunch of my colleagues past and present have done this and I’ve hugely appreciated that support! Almost every presentation tool can export to PDF, so if the borrowed laptop has the PDF presentation tools installed, you at least get the view that has the timer and the next slide.
Script to drag titles and annotations out of a PDF
Speaker notes are usually attached to PDF slides as annotations that sit outside the visible area of the page (somewhere off the top left, or thereabouts). You can use this approach for other PDF annotations too.
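If you're not sure which kinds of annotation a file contains, a quick tally can help before extracting anything. This is only a sketch and not part of the original script: `count_annotation_subtypes` is a hypothetical helper, and it assumes page objects that behave like those in current pypdf releases (dictionary-style access to `/Annots`, and `get_object()` on each annotation reference) rather than the older `getObject()` spelling.

```python
from collections import Counter


def count_annotation_subtypes(pages):
    """Tally annotation subtypes (e.g. /Text, /Link) across an iterable of pages.

    Assumes each page supports `"/Annots" in page` and `page["/Annots"]`, and
    that every annotation reference has a get_object() method (pypdf-style).
    """
    counts = Counter()
    for page in pages:
        if "/Annots" not in page:
            continue  # no annotations on this page
        for annot in page["/Annots"]:
            # "(none)" is a placeholder label for annotations without a subtype
            subtype = annot.get_object().get("/Subtype", "(none)")
            counts[str(subtype)] += 1
    return counts
```

With pypdf installed, passing it something like `PdfReader("slides.pdf").pages` (the file name is just an example) should show whether the notes are stored as `/Text` annotations or as something else.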
With the following Python code in notes.py:
# Note: this uses the pre-3.0 PyPDF2 method names (PdfFileReader, getPage, ...)
import sys
import PyPDF2

try:
    src = sys.argv[1]
except IndexError:
    # no file name given on the command line
    src = r'/path/to/my/file.pdf'

# put the role into the rst file
print('.. role:: slide-title')
print('')

input1 = PyPDF2.PdfFileReader(open(src, "rb"))
nPages = input1.getNumPages()

for i in range(nPages):
    # get the data from this PDF page (first line of text, plus annotations)
    page = input1.getPage(i)
    text = page.extractText()
    print(':slide-title:`' + text.splitlines()[0] + '`')
    print('')

    try:
        for annot in page['/Annots']:
            # Other subtypes, such as /Link, cause errors
            subtype = annot.getObject()['/Subtype']
            if subtype == "/Text":
                print(annot.getObject()['/Contents'])
                print('')
    except KeyError:
        # there are no annotations on this page
        pass

    print('')
To use this, run python notes.py [pdf file name].
The script will output rst content (that I then use with rst2pdf) with the first line/title of each slide and the annotations associated with it. Even if there are no annotations, the title is still added, which can be useful for just keeping track of which slides are coming up when you don’t have any other support information.
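One caveat: the method names used in notes.py (`PdfFileReader`, `getPage`, `extractText`, `getObject`) were removed in PyPDF2 3.0. As a rough sketch of the same output logic against the newer snake_case naming, something like the following might work. `slide_notes_rst` is a hypothetical function name, the `(untitled slide)` fallback is an addition that the original script doesn't have, and the pages are assumed to behave like pypdf page objects (`extract_text()`, dictionary access to `/Annots`, `get_object()` on each annotation).

```python
def slide_notes_rst(pages):
    """Build rst text: a title line per slide, followed by its /Text notes.

    Assumes pypdf-style pages: extract_text(), dictionary access to
    "/Annots", and get_object() on each annotation reference.
    """
    lines = [".. role:: slide-title", ""]
    for page in pages:
        text_lines = (page.extract_text() or "").splitlines()
        # Fall back to a placeholder when no text could be extracted
        title = text_lines[0] if text_lines else "(untitled slide)"
        lines += [":slide-title:`" + title + "`", ""]

        annots = page["/Annots"] if "/Annots" in page else []
        for annot in annots:
            obj = annot.get_object()
            # Only /Text annotations carry notes; /Link and others are skipped
            if obj.get("/Subtype") == "/Text" and "/Contents" in obj:
                lines += [str(obj["/Contents"]), ""]
        lines.append("")
    return "\n".join(lines)
```

With pypdf installed, `print(slide_notes_rst(PdfReader("slides.pdf").pages))` (again, the file name is only an example) should give output in the same shape as notes.py. Returning a string rather than printing directly also makes the formatting easy to check without a real PDF.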
It’s a simple script but I found it handy (and will probably find it here another day and use it for something else, which is exactly the point of a blog). I hope it’s useful to you too!
Also published on Medium.
Thanks a lot for this Lorna Jane.
I think your Python code lost indentation after posting it in this page.
Do you have that notes.py file shared on github gist or alike?