Note: You are looking at a static snapshot of documentation related to Robot Framework automations. The most recent documentation is at https://robocorp.com/docs

Handling PDF files

The Portable Document Format (PDF) has become ubiquitous in our daily lives, and countless business processes rely on PDF files for reports, invoices, and many other documents. As a result, knowing how to manipulate PDF files is an important skill for software robot developers.

Which automation library should I use?

With the Robocorp stack, PDF operations are performed using the RPA.PDF library, part of RPA Framework.

⚠️ Keep in mind that this library works only with text-based PDFs: it can't extract information from an image-based (scanned) PDF file. For scanned documents, use the specialized external services wrapped by the RPA.DocumentAI library.
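
One rough way to spot an image-based file before processing it is to extract its text and check whether any page returns meaningful content. The Python sketch below assumes that `get_text_from_pdf` (the Python form of the `Get Text From PDF` keyword) returns a dictionary mapping page numbers to page text. The helpers `looks_scanned` and `check_pdf` and the 20-character threshold are hypothetical choices for illustration, not part of the library.

```python
from typing import Dict


def looks_scanned(pages: Dict[int, str], min_chars: int = 20) -> bool:
    """Heuristic: treat the PDF as image-based if no page has meaningful text.

    `min_chars` is an arbitrary threshold chosen for illustration.
    An empty dictionary (no pages at all) is also reported as scanned.
    """
    return all(len(text.strip()) < min_chars for text in pages.values())


def check_pdf(path: str) -> bool:
    """Extract text with RPA.PDF and apply the heuristic (hypothetical helper)."""
    # Imported lazily so looks_scanned() can be used without rpaframework.
    from RPA.PDF import PDF

    pages = PDF().get_text_from_pdf(path)  # assumed: {page_number: text}
    return looks_scanned(pages)
```

A `True` result only suggests that the file may need RPA.DocumentAI; a PDF with very little text would trigger it too.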

Creating PDF files

Using the keywords provided by the RPA.PDF library, you can create PDF files in multiple ways:

  • Creating PDF files starting from an HTML template: This method lets you create PDF files from an HTML template and a set of data. For an example, check out the PDF invites creator robot example.
  • Converting HTML content into a PDF file: This is a special case of the above. To see this approach in action, check the PDF creation chapter of the Beginners' course.
  • Creating the PDF file from scratch: The RPA.PDF library also includes the fpdf2 Python library, which gives you more advanced, fine-grained control over PDF creation. Refer to the fpdf2 documentation for details on its usage and available options.
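
As a rough sketch of the first two approaches, the Python code below fills an HTML template with data and hands the result to the library's `html_to_pdf` method (the Python form of the `HTML To PDF` keyword), assuming it takes the HTML content and an output path. The invite template, `render_invite`, and `create_invite_pdf` are hypothetical. Here the template is filled with Python's standard `string.Template`; RPA.PDF also offers a `Template HTML To PDF` keyword that works from a template file instead.

```python
import html
from string import Template

# Hypothetical invite template using string.Template placeholders ($name, $date).
INVITE_TEMPLATE = Template(
    "<h1>You are invited, $name!</h1>"
    "<p>The party starts on $date.</p>"
)


def render_invite(name: str, date: str) -> str:
    """Fill the template, HTML-escaping the data so it cannot break the markup."""
    return INVITE_TEMPLATE.substitute(
        name=html.escape(name),
        date=html.escape(date),
    )


def create_invite_pdf(name: str, date: str, output_path: str) -> None:
    """Render the HTML and convert it to a PDF file with RPA.PDF."""
    # Imported lazily so render_invite() works without rpaframework installed.
    from RPA.PDF import PDF

    PDF().html_to_pdf(render_invite(name, date), output_path)
```

For example, `create_invite_pdf("Ada", "Friday", "invite.pdf")` would write the rendered invite to `invite.pdf`. Keeping the rendering step separate makes it easy to check the generated HTML before any PDF is produced.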

Filling PDF forms

PDF files can contain forms that users fill in with a desktop program such as Acrobat Reader or Preview on macOS. The RPA.PDF library lets you automate this operation; the how to fill PDF forms article shows how.
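
A minimal Python sketch of the idea, assuming that the `save_field_values` method (the Python form of the `Save Field Values` keyword) accepts `source_path`, `output_path`, and a `newvals` dictionary keyed by form field name, as in recent rpaframework releases. The field names `full_name` and `email`, the customer data shape, and both helpers are hypothetical; a real form defines its own field names, which you need to look up first.

```python
from typing import Dict


def build_field_values(customer: Dict[str, str]) -> Dict[str, str]:
    """Map customer data onto the form's field names (both hypothetical)."""
    return {
        "full_name": f"{customer['first_name']} {customer['last_name']}",
        "email": customer["email"],
    }


def fill_form(source_path: str, output_path: str, customer: Dict[str, str]) -> None:
    """Write the values into the form fields and save a filled copy."""
    # Imported lazily so build_field_values() works without rpaframework.
    from RPA.PDF import PDF

    PDF().save_field_values(
        source_path=source_path,
        output_path=output_path,
        newvals=build_field_values(customer),
    )
```

Writing to a separate `output_path` leaves the blank form untouched, so the same template can be filled again for the next record.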

Reading data from PDF files

Extracting text and data from PDF files is not a simple operation, mostly because the PDF format was not designed for this use case. If possible, avoid using PDF files as a data source. If you absolutely must (😀), see the how to read PDF files article for one possible approach.
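
A minimal Python sketch of one such approach: extract the text of every page, then search it with regular expressions. It assumes that `get_text_from_pdf` returns a dictionary mapping page numbers to page text. The invoice layout, the patterns, and the helpers `parse_invoice` and `read_invoice` are hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical invoice layout, e.g. "Invoice Number: INV-7" and "Total: 1,250.00".
INVOICE_NUMBER = re.compile(r"Invoice Number:\s*(\S+)")
TOTAL = re.compile(r"Total:\s*([\d.,]+)")


def parse_invoice(text: str) -> Dict[str, Optional[str]]:
    """Pull the invoice number and total out of extracted text (None if absent)."""
    number = INVOICE_NUMBER.search(text)
    total = TOTAL.search(text)
    return {
        "number": number.group(1) if number else None,
        "total": total.group(1) if total else None,
    }


def read_invoice(path: str) -> Dict[str, Optional[str]]:
    """Extract all page text with RPA.PDF, then parse it."""
    # Imported lazily so parse_invoice() works without rpaframework.
    from RPA.PDF import PDF

    pages = PDF().get_text_from_pdf(path)  # assumed: {page_number: text}
    text = "\n".join(pages[page] for page in sorted(pages))
    return parse_invoice(text)
```

Patterns like these break as soon as the document layout changes, which is one more reason to prefer a structured data source when one is available.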

Last edit: September 14, 2021