Working with PDFs From the Command Line
July 22, 2018, updated November 11, 2022
Install Some Tools
sudo snap install pdftk
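The later sections also use pdftotext and pdfimages (part of Poppler) and convert (part of ImageMagick), which the snap above does not provide. Here is a minimal sketch for checking what is already installed. The function name missing_tools is made up for illustration, and the package names in the comment are the Debian/Ubuntu ones, so treat them as an assumption on other distributions.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# missing_tools is a hypothetical helper name, not part of any package.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage:
#   missing_tools pdftk pdftotext pdfimages convert
# On Debian/Ubuntu (assumed package names) the missing pieces usually come from:
#   sudo apt install poppler-utils imagemagick
```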
Create a PDF from a Single Page or a Range of Pages of Another PDF
pdftk input.pdf cat 2-2 output page2.pdf
pdftk input.pdf cat 1-2 output pages1-2.pdf
pdftk A=in1.pdf B=in2.pdf cat A B output out1.pdf
pdftk A=in1.pdf cat A1-12 A14-end output out1.pdf
Split a PDF up into Individual Page PDF Files
pdftk input.pdf burst
Note that this generates a doc_data.txt file plus one PDF file per page of the document, named pg_0001.pdf, pg_0002.pdf, and so on.
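If you would rather choose the file names, or keep the pieces out of the current directory, burst accepts a printf-style pattern after output. A small sketch follows. The wrapper name burst_into and the page_%03d.pdf pattern are my own choices, not pdftk defaults.

```shell
# Split a PDF into one file per page inside a target directory.
# burst_into is a hypothetical wrapper; the pattern page_%03d.pdf is arbitrary.
burst_into() {
    input=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # pdftk fills in %03d with the page number: page_001.pdf, page_002.pdf, ...
    pdftk "$input" burst output "$outdir/page_%03d.pdf"
}

# Usage:
#   burst_into input.pdf pages
```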
Join Multiple PDF Files Into One
pdftk 1.pdf 2.pdf 3.pdf 4.pdf cat output merged.pdf
or
convert 1.pdf 2.pdf 3.pdf 4.pdf merged.pdf
Note that, in my experience, convert produces low-quality output with its default options. pdftk gives a better result.
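The quality loss is most likely because convert rasterizes every page, and ImageMagick's default density is 72 DPI. Setting a higher density before the inputs are read usually helps, at the cost of larger files and slower conversion. A sketch under that assumption follows. The name merge_raster and the DENSITY variable are made up here, and 300 is just a common starting value.

```shell
# Merge PDFs through ImageMagick at a chosen rendering density.
# Usage: merge_raster OUTPUT.pdf INPUT.pdf [INPUT.pdf ...]
# merge_raster and DENSITY are hypothetical names used for illustration.
merge_raster() {
    out=$1
    shift
    # -density must come before the inputs so it applies when they are rendered.
    convert -density "${DENSITY:-300}" "$@" "$out"
}

# Usage:
#   merge_raster merged.pdf 1.pdf 2.pdf 3.pdf 4.pdf
```

Even at a higher density the pages are still re-rendered as images, so pdftk remains the better choice when the text needs to stay selectable.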
Extract Text from a PDF
pdftotext input.pdf output.txt
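pdftotext also has a few options worth knowing: -layout tries to preserve the physical layout of the page, and -f and -l set the first and last page to convert. A short sketch, with text_of_pages as a hypothetical helper name:

```shell
# Extract text from a page range while keeping the page layout.
# Usage: text_of_pages INPUT.pdf FIRST LAST OUTPUT.txt
# text_of_pages is a made-up wrapper name.
text_of_pages() {
    pdftotext -layout -f "$2" -l "$3" "$1" "$4"
}

# Usage:
#   text_of_pages input.pdf 2 5 pages2-5.txt
# Passing - as the output file makes pdftotext write to stdout instead.
```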
Extract Images from a PDF
pdfimages input.pdf prefix
Note that, for me, all of the extracted ppm images came out with inverted colors.
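Recent Poppler versions of pdfimages can list the embedded images without extracting them (-list) and can write PNG files instead of ppm (-png). Whether that avoids the inversion depends on how the images are stored in the PDF, so treat this as something to try rather than a guaranteed fix. The name extract_png is made up for this sketch.

```shell
# List the embedded images, then extract them as PNG files (PREFIX-000.png, ...).
# Usage: extract_png INPUT.pdf PREFIX
# extract_png is a hypothetical wrapper name.
extract_png() {
    pdfimages -list "$1" || return 1
    pdfimages -png "$1" "$2"
}

# Usage:
#   extract_png input.pdf prefix
```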
Create a New PDF from Extracted Images
convert *.ppm -negate output.pdf