Combining PDF Files With Pdftk
For years I’ve used PDF Shuffler for this sort of thing, but this time I wondered if there was an easy way of doing it from the command line, since I literally wanted to glue together a bunch of files one after another. Predictably, there is, and it’s called pdftk – the PDF Toolkit. It does exactly what I need, but I initially stumbled at supplying all the PDFs in order. My files are called 1-intro.pdf, 2-basics.pdf, and so on. Which is fine, but there are 14 of them and when sorted alphabetically, 9-oop.pdf is the last entry :) I untangled this with:
ls *-*.pdf | sort -n > files.txt
One vim macro later and I had them all on one line (yes, I realise there must be a better way to do this, leave me a comment and tell me what it is!), and so I could pass them into pdftk:
pdftk 1-intro.pdf 2-basics.pdf 3-strings.pdf 4-arrays.pdf 5-functions.pdf 6-files.pdf 7-config.pdf 8-qstyles.pdf 9-oop.pdf 10-http.pdf 11-api-data.pdf 12-security.pdf 13-databases.pdf 14-tips.pdf cat output all.pdf
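For what it’s worth, the sort-then-paste steps above can be rolled into a small shell function. This is only a rough sketch: the function name merge_numbered_pdfs and the PDFTK override variable are made up for illustration, and it assumes the filenames start with a number and contain no newlines.

```shell
# Merge every *-*.pdf in the current directory, ordered by leading number.
# PDFTK is a hypothetical override for the command name; it defaults to pdftk.
# The body runs in a subshell "( ... )" so the IFS and glob settings stay local.
merge_numbered_pdfs() (
    output=$1
    # Collect the glob matches as positional parameters.
    set -- *-*.pdf
    [ -e "$1" ] || { echo "no files match *-*.pdf" >&2; exit 1; }
    # sort -n compares the leading number, so 9-oop.pdf comes before 10-http.pdf.
    sorted=$(printf '%s\n' "$@" | sort -n)
    # Split the sorted list on newlines only, with globbing off,
    # so names containing spaces stay in one piece.
    IFS='
'
    set -f
    set -- $sorted
    "${PDFTK:-pdftk}" "$@" cat output "$output"
)
```

Running merge_numbered_pdfs all.pdf in the directory holding the numbered files should then be equivalent to the long command above.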
So there you have it, a great little tool that I will immediately forget the name of so hopefully I’ll remember to come and read my blog to remember what to do …
[geshi lang=bash]ls *-*.pdf | sort -n | xargs[/geshi]
will combine the output into one line. You could also:
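[geshi lang=bash]ls *-*.pdf | sort -n | xargs sh -c 'pdftk "$@" cat output all.pdf' sh[/geshi]
to do everything all in one step. (Plain -I won’t do here: GNU xargs then runs pdftk once per file, each run overwriting all.pdf. On BSD xargs, -J % pdftk % cat output all.pdf does put every name into a single command.)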
Wonderful, thanks John!!
pdftk is also rather handy for SEO when you’re building sites which contain PDFs supplied by other people. You can use it to change the metadata of an existing PDF without having to rebuild the layout (which I’ve found is frequently a lossy process, depending on how the PDF was authored). Google uses the metadata title in the search results, so you can make your listings consistent (‘dump_data’ and ‘update_info’ are the operations you need: the first writes the current metadata to a text file, the second reads an edited copy back in).
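As a rough illustration of the metadata change described above, here is a sketch that sets only the Title field. The function name set_pdf_title and the PDFTK override variable are hypothetical. It also assumes a pdftk version whose dump_data output starts each record with an InfoBegin line (older releases may not print it), and that update_info leaves any keys missing from the data file untouched.

```shell
# Write a new Title into a copy of a PDF without touching its layout.
# PDFTK is a hypothetical override for the command name; it defaults to pdftk.
# The body runs in a subshell "( ... )" so the variables and trap stay local.
set_pdf_title() (
    input=$1
    title=$2
    output=$3
    meta=$(mktemp) || exit 1
    # Remove the temporary data file when the subshell exits.
    trap 'rm -f "$meta"' EXIT
    # update_info reads the same key/value layout that dump_data writes.
    printf 'InfoBegin\nInfoKey: Title\nInfoValue: %s\n' "$title" > "$meta"
    "${PDFTK:-pdftk}" "$input" update_info "$meta" output "$output"
)
```

To see what a file currently carries before changing it, pdftk in.pdf dump_data output meta.txt writes the existing metadata to meta.txt (both filenames here are placeholders).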