Handling PDF files
The Portable Document Format (PDF) is ubiquitous in our daily lives, and countless business processes rely on PDF files for reports, invoices, and a variety of other documents. As a result, knowing how to manipulate PDF files is an important skill for Software Robot Developers.
Which automation library should you use?
With the Robocorp stack, PDF operations are performed using the RPA.PDF library, part of RPA Framework.
Creating PDF files
Using the keywords provided by the RPA.PDF library, you can create PDF files in multiple ways:
- Creating PDF files starting from an HTML template: This method lets you create PDF files based on an HTML template and a set of data. For an example, check out the PDF invites creator robot example.
- Converting HTML content into a PDF file: This can be achieved as a special case of the above. To see this approach at work, you can check the PDF creation chapter of the Beginners' course.
- Creating the PDF file from scratch: The RPA.PDF library also includes the pyFPDF Python library to enable more advanced and fine-tuned creation of PDF files. Refer to the pyFPDF documentation for more information about the usage and the options available.
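The first two approaches can be sketched in Python as follows. This is a minimal illustration, not the library's own implementation: `render_template`, `html_to_pdf_file`, and `INVITE_TEMPLATE` are hypothetical names, and the `{{key}}` substitution is hand-rolled here only to show the idea (the library's `template_html_to_pdf` keyword performs the substitution itself). The sketch assumes the `rpaframework` package is installed when `html_to_pdf_file` is called.

```python
import re


def render_template(template: str, variables: dict) -> str:
    """Replace each {{key}} placeholder with its value from `variables`.

    Placeholders without a matching key are left untouched.
    """
    def substitute(match):
        key = match.group(1)
        return str(variables.get(key, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def html_to_pdf_file(html: str, output_path: str) -> None:
    """Convert an HTML string into a PDF file using RPA.PDF."""
    # Imported lazily so that render_template works without rpaframework.
    from RPA.PDF import PDF

    PDF().html_to_pdf(html, output_path)


# Hypothetical template used only for this example.
INVITE_TEMPLATE = (
    "<h1>Invitation</h1>"
    "<p>Dear {{name}}, please join us on {{date}}.</p>"
)
```

Calling `html_to_pdf_file(render_template(INVITE_TEMPLATE, {"name": "Ada", "date": "1 May"}), "invite.pdf")` would then write one invitation; looping over a list of guests produces one PDF per guest.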
Filling PDF forms
PDF files can contain forms that users can fill using a desktop program like Acrobat Reader or Preview on macOS. Using the RPA.PDF library, you can automate this operation. See how in the how to fill PDF forms article.
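As a rough sketch of what this automation could look like from Python: the helper below reads the form's field names, sets the requested values, and saves a filled copy. `fill_form` is a hypothetical helper, and the field name in the usage note is made up; the calls `get_input_fields`, `set_field_value`, and `save_field_values` are assumed to behave as described in the RPA.PDF documentation, so check them against the version you have installed.

```python
def fill_form(pdf, source_path: str, output_path: str, values: dict) -> None:
    """Fill form fields in `source_path` and save the result to `output_path`.

    `pdf` is expected to be an RPA.PDF.PDF instance (assumption: reading
    the input fields also makes `source_path` the active document).
    """
    # Read the existing fields first so typos in field names fail early.
    available = pdf.get_input_fields(source_path=source_path) or {}
    unknown = sorted(set(values) - set(available))
    if unknown:
        raise KeyError(f"Fields not found in form: {unknown}")

    for name, value in values.items():
        pdf.set_field_value(name, value)

    # Write the filled form to a new file, leaving the original untouched.
    pdf.save_field_values(output_path=output_path)
```

With `rpaframework` installed, usage might look like `fill_form(PDF(), "form.pdf", "filled.pdf", {"phone_nr": "077123123"})` after `from RPA.PDF import PDF`. Passing the `pdf` object in keeps the helper easy to test with a stand-in object.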
Reading data from PDF files
Extracting text and data from PDF files is not a simple operation, mostly because the PDF format was designed for displaying and printing documents, not for exchanging data. If possible, avoid using PDF files as a source of data. If you absolutely must (😀), you can see a possible approach in the how to read PDF files article.
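To give a feel for why this is fragile, here is a minimal sketch of one common pattern: extract the raw text of each page, then search it with a regular expression. `extract_pages` and `find_first` are hypothetical helpers, the invoice pattern in the usage note is invented for illustration, and `get_text_from_pdf` is assumed to return a dictionary mapping page numbers to page text, as described in the RPA.PDF documentation.

```python
import re


def extract_pages(path: str) -> dict:
    """Return a {page_number: text} mapping for the PDF at `path`."""
    # Imported lazily so find_first can be used without rpaframework.
    from RPA.PDF import PDF

    return PDF().get_text_from_pdf(path)


def find_first(pages: dict, pattern: str):
    """Return (page_number, captured_text) for the first match, or None.

    `pattern` must contain one capturing group for the value of interest.
    """
    regex = re.compile(pattern)
    for page_number in sorted(pages):
        match = regex.search(pages[page_number])
        if match:
            return page_number, match.group(1)
    return None
```

For example, `find_first(extract_pages("invoice.pdf"), r"Invoice number:\s*(\S+)")` would look for an invoice number anywhere in the document. Any change to the layout or wording of the PDF can break such a pattern, which is one reason to prefer a structured data source when one exists.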