Monday, May 31, 2010

PDF::API2 is Underrated

I feel like there has not been enough talk about PDF::API2 in the Perl community. Searching for code examples and tutorials does not yield many results; however, it does lead to this tutorial, which is the only useful one I have been able to find. It is worth reading. Oh, and it's written by my boss ;) Go and read it if you plan on using PDF::API2 in the future.

At work, we have spent the past year using PDF::API2 to create this nice-looking report and many others like it. Along the way I have learnt a number of lessons and picked up a few tips for using this powerful library, so here are some of them. This is by no means a tutorial, or even an introduction to PDF::API2; if you are looking for something like that, refer to the tutorial mentioned in the previous paragraph. This is more of a core dump of the part of my brain that knows PDF, so to speak.

The coordinate system is that of a Cartesian plane

Every toolkit I have used in the past (which, admittedly, is not a long list) has placed the origin (0, 0) in the top-left corner. I assume, without having researched it, that this is because it is easier for developers to think about placing the top-left corner of a widget or window on the screen. After all, we (mostly) read articles, books and posters from the top-left to the bottom-right.

However, coordinates in PDF::API2 work differently: the origin sits at the bottom-left corner of the page, with x increasing to the right and y increasing upward, just like the first quadrant of a Cartesian plane. Distances are measured in points (1/72 of an inch). Once you think of it this way, it becomes a lot easier to visualise what you are doing and where your content will end up.
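If you are used to a top-left origin, a couple of small helpers can translate positions measured down from the top of the page into PDF's bottom-left system. This is a minimal sketch: the helper names are hypothetical (they are not part of PDF::API2), and the A4 height of 842 points is just an assumed page size.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of PDF::API2) for converting positions
# measured from the top of the page into PDF coordinates, where the
# origin is the bottom-left corner and y grows upward.

# Convert a distance measured down from the top edge into a PDF y value.
sub y_from_top {
    my ($page_height, $distance_from_top) = @_;
    return $page_height - $distance_from_top;
}

# Bottom edge of a box whose top edge sits $distance_from_top below the
# top of the page. Drawing calls such as the graphics object's rect
# expect this bottom value, not the top.
sub box_bottom {
    my ($page_height, $distance_from_top, $box_height) = @_;
    return $page_height - $distance_from_top - $box_height;
}

my $a4_height = 842;    # A4 portrait height in points (assumed page size)

# A 50pt-tall box whose top edge is 100pt below the top of an A4 page.
my $bottom = box_bottom($a4_height, 100, 50);
print "bottom = $bottom\n";    # bottom = 692
```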

Work from the bottom-left

This is linked to the previous point. If you are factoring out specific data or widgets into separate classes or functions, specify each one's position by its bottom-left coordinate rather than its top-left. The text and graphics object functions also work from the bottom-left (text is positioned by its baseline, and a rectangle by its lower-left corner), so this saves a lot of recalculating.

I prefer to refer to the coordinates as 'bottom' and 'left', which clears up any confusion that using 'x' and 'y' might introduce. I recommend doing the same.

Occasionally you may need to work from the top of your widget, e.g. for tabular data, in which case you can simply add the height to the bottom to get the top-most y-coordinate.
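A sketch of that idea for a simple table: the widget is positioned by its bottom edge, its top is derived as bottom + height, and rows are then placed downward from there. The function name, the fixed row height and the returned list of row bottoms are assumptions made for this example, not anything PDF::API2 provides.

```perl
use strict;
use warnings;

# Hypothetical layout helper: given a widget positioned by its bottom edge
# and its height, return the bottom y of each table row, filling the
# widget from the top down. Rows that would fall below the widget's
# bottom edge are not returned.
sub row_bottoms {
    my (%args) = @_;
    my ($bottom, $height, $row_height, $rows) =
        @args{qw(bottom height row_height rows)};

    my $top = $bottom + $height;    # top-most y of the widget
    my @bottoms;
    for my $i (1 .. $rows) {
        my $row_bottom = $top - $i * $row_height;
        last if $row_bottom < $bottom;    # no room left in the widget
        push @bottoms, $row_bottom;
    }
    return @bottoms;
}

# A 200pt-tall table whose bottom edge is at y = 100.
my @rows = row_bottoms(bottom => 100, height => 200, row_height => 20, rows => 3);
print "@rows\n";    # 280 260 240
```

Each returned value could then be passed, together with the widget's left coordinate, to the text object's translate before writing that row.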

Useful CPAN documentation

PDF::API2's documentation is shocking. It's horrible. It's almost non-existent. However, PDF::API2::Content contains 95% of the functions you will probably want. Specifically, it contains all of the member functions for the text and graphics objects, which are the objects you will use most of the time.
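To make that concrete, here is a minimal sketch of the two objects: $page->text returns a text object and $page->gfx a graphics object, and the methods called on them here (translate, text, rect, stroke) are documented in PDF::API2::Content. It assumes PDF::API2 is installed from CPAN; the page size, coordinates and temporary output file are illustrative.

```perl
use strict;
use warnings;
use PDF::API2;
use File::Temp qw(tempdir);
use File::Spec;

my $pdf  = PDF::API2->new();
my $page = $pdf->page();
$page->mediabox(595, 842);    # A4 portrait, in points

my $font = $pdf->corefont('Helvetica');

# Graphics object: draw a box whose bottom-left corner is at (50, 700).
my $gfx = $page->gfx();
$gfx->rect(50, 700, 200, 40);    # left, bottom, width, height
$gfx->stroke();

# Text object: set the font state, move to the baseline start, write.
my $text = $page->text();
$text->font($font, 12);
$text->translate(60, 715);
$text->text('Hello from PDF::API2');

# Write the document into a temporary directory (illustrative location).
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'example.pdf');
$pdf->saveas($file);
print "wrote $file\n";
```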

Useful function that you might skim past

$text->advancewidth('foo bar', %options)

Given the current state of your text object (e.g. its font and font size), this returns the width, in points, needed to display the text 'foo bar' on the page. It's great for figuring out how much space you need for content such as people's names. If you don't want to use the current state of your text object, you can override parts of it by passing text-state settings as key/value pairs after the string (for example, fontsize => 10); anything you leave out is taken from the text object. So far I have never needed these optional settings.
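A common use is right-aligning a value against a margin: measure it with advancewidth, then start the text that many points to the left of the margin. The sketch also shows the override form, asking for the width at a different font size without changing the text object. It assumes PDF::API2 from CPAN; the sample name and the right edge of 545 points are made up for illustration.

```perl
use strict;
use warnings;
use PDF::API2;

my $pdf  = PDF::API2->new();
my $page = $pdf->page();
$page->mediabox(595, 842);    # A4 portrait, in points

my $text = $page->text();
$text->font($pdf->corefont('Helvetica'), 12);

my $name       = 'Jane Citizen';    # sample content
my $right_edge = 545;               # hypothetical right margin, in points

# Width in points using the text object's current state (Helvetica 12pt).
my $width = $text->advancewidth($name);

# Right-align: start the text so that it ends exactly at $right_edge.
$text->translate($right_edge - $width, 700);
$text->text($name);

# Override part of the state (here only the size) for a what-if measurement;
# settings not given (font, spacing, scaling) come from the text object.
my $width_at_24 = $text->advancewidth($name, fontsize => 24);

printf "12pt: %.2f  24pt: %.2f\n", $width, $width_at_24;
```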

That's it for now. In the future I would like to write some more about this library - preferably with more structure and a nice example rather than random ramblings - but that will have to wait until I can spare the time.


  1. Hi Luke. I came across this blog post because it's pretty high in the Google results when you are looking for help with API2.

    Have you had any problems using this module with PDFs of version 1.3 or earlier? I am having a hard time with this. My program receives a PDF document uploaded by a user, then opens it with API2 and stamps a few words at the top of the first page. However, for certain documents which happen to be PDF Version 1.3, but not ALL 1.3 PDFs, the words get stamped in TINY text (about 1/5th the normal size) and in the middle of the page instead of the top.

    Do you have any idea why this would be happening? As you mentioned in your post, there is a serious lack of documentation or resources regarding this module, so I am just reaching out to anyone with experience. Thanks very much.

  2. Hmm... I haven't come across anything like that before.

    The obvious places to start would be:

    1. Are you explicitly setting the coordinates and font size right before writing the text to the page? Is there anything between those operations that might change the state of the text object?

    2. Use a brand new text object. Does it still happen?

    3. Use Data::Dumper to dump the text object you're using. Beware though, you'll get a lot of output. Use this as a last resort. It doesn't always pay off.

    If you figure it out, feel free to post a solution here!

    Good luck!
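    Suggestions 1 and 2 above can be sketched as follows: open the uploaded file, take a brand-new text object on the first page, and set the font, size and position explicitly immediately before writing. This is a minimal sketch assuming PDF::API2 from CPAN; the helper name, file names, stamp text and coordinates are hypothetical, and it only rules out stale text-object state rather than explaining the behaviour of those particular 1.3 files.

```perl
use strict;
use warnings;
use PDF::API2;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: stamp $message on page 1 of $in_file, writing $out_file.
sub stamp_first_page {
    my ($in_file, $out_file, $message, $left, $bottom) = @_;

    my $pdf  = PDF::API2->open($in_file);
    my $page = $pdf->openpage(1);

    # A brand-new text object, with font, size and position set explicitly
    # right before the text is written.
    my $text = $page->text();
    $text->font($pdf->corefont('Helvetica-Bold'), 14);
    $text->translate($left, $bottom);
    $text->text($message);

    $pdf->saveas($out_file);
}

# Build a throwaway one-page input file so the sketch runs on its own.
my $dir = tempdir(CLEANUP => 1);
my $in  = File::Spec->catfile($dir, 'upload.pdf');
my $out = File::Spec->catfile($dir, 'stamped.pdf');
{
    my $blank = PDF::API2->new();
    $blank->page()->mediabox(595, 842);
    $blank->saveas($in);
}

stamp_first_page($in, $out, 'RECEIVED', 50, 800);
print "wrote $out\n";
```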