
July 19, 2012

Filling PDF forms with PHP – Part 1

by Jeff

At work, we have oodles of databases containing untold treasures of information. We also have boatloads of fillable PDFs for just about every possible task. (It’s government work, after all.) As I walk around, I see these poor souls listlessly going back and forth between their database apps and their PDF forms, copying and pasting. I finally decided that surely I could help the disenfranchised masses eliminate ten or twenty Ctrl-C/Ctrl-V operations from every form they fill out. Nobody wants to spend their day making “copypasta”. So I’m secretly (never tip your hand until you succeed) undertaking this as a side project. (Because apparently I have something against sleeping.)

My initial tests look promising, but I haven’t written a lick of PHP yet, so don’t get excited. Hopefully this will give you an idea of what I’m doing.

Here’s the gist of it:

The Swiss Army Knife of PDF tools, pdftk, can generate an FDF file from a PDF form like so:

pdftk a_pdf_form.pdf generate_fdf output a_pdf_form.fdf

The generated FDF file had a few odd characters (like ^@ and þÿ) that I had to scrub out to make it useful. That could be my environment (CentOS 5.8, pdftk 1.44, BTW), so your mileage may vary. If you see the same thing, I used the following two commands in vim to scrub them out:

:%s/^@//g   #Note: you get ^@ by typing Ctrl-V then Ctrl-@
:%s/þÿ//g
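If you don’t want to repeat the vim cleanup by hand every time, you can script it. The Python sketch below assumes that þÿ is the UTF-16 big-endian byte-order mark (bytes 0xFE 0xFF displayed as Latin-1) and that ^@ is a NUL byte. In other words, it assumes pdftk appears to have written the strings as UTF-16. It also assumes every field name and value is plain ASCII, because stripping the NUL bytes mangles anything else. The function names `scrub_fdf` and `scrub_file` are made up for this example.

```python
def scrub_fdf(raw: bytes) -> bytes:
    """Strip byte-order marks and NUL bytes from raw FDF bytes.

    Assumes ASCII-only field names and values, so dropping the 0x00 high
    byte of each UTF-16BE character leaves plain, readable text.
    """
    # "þÿ" is how the bytes 0xFE 0xFF (a UTF-16BE byte-order mark) look in Latin-1.
    without_bom = raw.replace(b"\xfe\xff", b"")
    # "^@" is how vim displays a NUL byte (0x00).
    return without_bom.replace(b"\x00", b"")


def scrub_file(src_path: str, dst_path: str) -> None:
    """Read an FDF from src_path and write the scrubbed copy to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(scrub_fdf(data))
```

For example, `scrub_file("a_pdf_form.fdf", "a_pdf_form_clean.fdf")` writes a cleaned copy. Both file names are placeholders.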

Now I had a raw, blank FDF to work with, but all my fields were out of order and had names like “TextField[1]” or “CheckBox5[0]”. Worse, the names repeat for each row of fields nested inside the PDF’s table structure. Icky. My first thought was to enter a value in every text field, “textField1” through “textFieldn”, but when I laid the data back over the form there was no semblance of order, and mapping it was going to be a nightmare. So I went back, filled out the original PDF with descriptive names, regenerated my FDF, and cleaned it up as above. Now I had an FDF where I could tell up from down, mostly. The pertinent parts of the FDF look like this:

/V /1
/T (CheckBox5[0])
/V /
/T (CheckBox5[1])
/V (SSN4)
/T (TextField[1])

I still can’t tell the checkboxes apart, so some trial and error will be necessary there. The basics are these: the value is on the “/V” line, and the field name is on the “/T” line after it. Text values go inside parentheses. For checkboxes, a lone slash means unchecked, and the checked value depends on your form. The PDF spec recommends “Yes” as the name of the checked state, which would appear in the FDF as “/Yes”. The form I was working with uses “/1”. Trial and error. Good luck with that.
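The pairing rule (value on the /V line, field name on the /T line after it) is easy to turn into a small parser. That helps when you want a list of every field to map against a database. This is a Python sketch that assumes the scrubbed, one-key-per-line layout from the excerpt above. It is not a general FDF parser, and `parse_fdf_fields` is a hypothetical name.

```python
import re
from typing import List, Tuple

# "/V /1", "/V /", "/V (SSN4)": everything after /V is the raw value token.
VALUE_LINE = re.compile(r"^/V\s*(.*)$")
# "/T (TextField[1])": the field name sits inside the parentheses.
NAME_LINE = re.compile(r"^/T\s*\((.*)\)$")


def parse_fdf_fields(fdf_text: str) -> List[Tuple[str, str]]:
    """Return (field name, raw value) pairs in the order they appear.

    Text values keep their parentheses, e.g. "(SSN4)"; checkbox values stay
    as names, e.g. "/1" or "/" (unchecked).
    """
    fields = []
    pending_value = None
    for line in fdf_text.splitlines():
        line = line.strip()
        value_match = VALUE_LINE.match(line)
        if value_match:
            # Remember the value until the /T line that names its field.
            pending_value = value_match.group(1)
            continue
        name_match = NAME_LINE.match(line)
        if name_match and pending_value is not None:
            fields.append((name_match.group(1), pending_value))
            pending_value = None
    return fields
```

Run against the excerpt, this yields `("CheckBox5[0]", "/1")`, `("CheckBox5[1]", "/")` and `("TextField[1]", "(SSN4)")`.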

If you change some values in the FDF and want to try generating a filled form, you use the following syntax:

pdftk original_pdf.pdf fill_form generated_and_modified_fdf.fdf output new_filled_pdf.pdf
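The end goal is to run this merge from a script rather than by hand, so here is a minimal Python sketch of the same fill_form call. It uses only the pdftk arguments in the command above and assumes pdftk is on the PATH. The helper names are hypothetical. Passing the arguments as a list instead of one shell string means file names with spaces need no quoting.

```python
import subprocess
from typing import List


def build_fill_command(pdf_path: str, fdf_path: str, output_path: str) -> List[str]:
    """Build the argument list for: pdftk PDF fill_form FDF output OUT."""
    return ["pdftk", pdf_path, "fill_form", fdf_path, "output", output_path]


def fill_pdf(pdf_path: str, fdf_path: str, output_path: str) -> None:
    """Run pdftk to write a filled copy of pdf_path to output_path.

    Raises FileNotFoundError if pdftk is not installed and
    subprocess.CalledProcessError if pdftk exits with an error.
    """
    command = build_fill_command(pdf_path, fdf_path, output_path)
    # No shell is involved, so each path reaches pdftk exactly as given.
    subprocess.run(command, check=True)
```

For example, `fill_pdf("original_pdf.pdf", "generated_and_modified_fdf.fdf", "new_filled_pdf.pdf")` runs the same command as above.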

If all goes well, you’ll have a neat new filled PDF.

That’s great, you say, but how does this work with PHP? As I see it, you use the FDF file as a template by inserting placeholders in each of the values. You parse the template and replace the placeholders with real data, then merge the result with the original PDF (using shell_exec() or similar) to get your filled form. I’ll work that part out and write Part 2.
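The template idea can be sketched in a few lines. This version is Python rather than PHP; in PHP, preg_replace_callback() would play the same role. Its placeholder convention is an assumption: a placeholder is a text value such as “(SSN4)”, i.e. the descriptive name typed into the blank form before the FDF was generated. Values are escaped because backslashes and parentheses are special inside PDF literal strings. The sketch also assumes ASCII values. Other characters would need a proper PDF text-string encoding, such as UTF-16BE with a byte-order mark.

```python
import re
from typing import Dict

# Hypothetical convention: a line like "/V (SSN4)" marks the placeholder "SSN4".
PLACEHOLDER_LINE = re.compile(r"^(/V[ \t]*)\((\w+)\)[ \t]*$", re.MULTILINE)


def escape_pdf_string(text: str) -> str:
    """Backslash-escape the characters that are special in a PDF literal string."""
    # Escape backslashes first so the escapes added afterwards are not doubled.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def fill_fdf_template(template: str, values: Dict[str, str]) -> str:
    """Swap each known placeholder for its escaped value; leave the rest alone."""

    def substitute(match) -> str:
        prefix, key = match.group(1), match.group(2)
        if key not in values:
            return match.group(0)  # unknown placeholder: keep the line unchanged
        return f"{prefix}({escape_pdf_string(values[key])})"

    return PLACEHOLDER_LINE.sub(substitute, template)
```

For example, filling `"/V (SSN4)\n/T (TextField[1])\n"` with `{"SSN4": "1234"}` gives `"/V (1234)\n/T (TextField[1])\n"`. You would then write that text to a file and merge it with pdftk’s fill_form.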

If you work it out first, or already have and really like me, post it in the comments.
