PDF is a library for managing PDF documents.

It can be used to extract text from PDFs, add watermarks to pages, and decrypt/encrypt documents.

There is also limited support for updating form field values.

An input PDF file can be passed as an argument to the keywords, or omitted if you first call Open PDF. The library instance stores a reference to the currently active PDF, which you can change by calling Switch To PDF with another PDF file path, so you can work with multiple PDFs in the same run.
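
The "active PDF" behaviour described above can be pictured with a small stand-alone model. This is only a sketch of the idea, not RPA.PDF's actual implementation: the class `ActiveDocumentRegistry` and its `resolve` method are hypothetical names, and it is an assumption here that passing an explicit path also makes that document the active one.

```python
from typing import Dict, Optional


class ActiveDocumentRegistry:
    """Toy model of the "active PDF" idea (hypothetical, not the library's code)."""

    def __init__(self) -> None:
        self.opened: Dict[str, str] = {}    # path -> stand-in for a parsed document
        self.active: Optional[str] = None   # path of the currently active document

    def open_pdf(self, path: str) -> None:
        # Opening a document registers it and makes it the active one.
        self.opened.setdefault(path, f"<document {path}>")
        self.active = path

    def switch_to_pdf(self, path: str) -> None:
        # Assumption: switching to a path that is not open yet opens it first.
        self.open_pdf(path)

    def resolve(self, source_path: Optional[str] = None) -> str:
        # An explicit path wins; otherwise fall back to the active document.
        if source_path is not None:
            self.switch_to_pdf(source_path)
        if self.active is None:
            raise ValueError("No active PDF: pass a path or call open_pdf first")
        return self.active


registry = ActiveDocumentRegistry()
registry.open_pdf("invoice.pdf")
print(registry.resolve())           # invoice.pdf
registry.switch_to_pdf("form.pdf")
print(registry.resolve())           # form.pdf
```

In this model a keyword would call `resolve(source_path)` first, which is why the path argument can be left out once a document has been opened.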

Examples

Robot Framework

*** Settings ***
Library    RPA.PDF
Library    String

*** Tasks ***
Extract Data From First Page
    ${text} =    Get Text From PDF    report.pdf
    ${lines} =     Get Lines Matching Regexp    ${text}[${1}]    .+pain.+
    Log    ${lines}

Get Invoice Number
    Open Pdf    invoice.pdf
    ${matches} =  Find Text    Invoice Number
    Log List      ${matches}

Fill Form Fields
    Switch To Pdf    form.pdf
    ${fields} =     Get Input Fields   encoding=utf-16
    Log Dictionary    ${fields}
    Set Field Value    Given Name Text Box    Mark
    Save Field Values    output_path=${OUTPUT_DIR}${/}completed-form.pdf
    ...                  use_appearances_writer=${True}

Python

from RPA.PDF import PDF
from robot.libraries.String import String

pdf = PDF()
string = String()

def extract_data_from_first_page():
    # The result is indexed by page number; key 1 is the first page.
    text = pdf.get_text_from_pdf("report.pdf")
    lines = string.get_lines_matching_regexp(text[1], ".+pain.+")
    print(lines)

def get_invoice_number():
    pdf.open_pdf("invoice.pdf")
    matches = pdf.find_text("Invoice Number")
    for match in matches:
        print(match)

def fill_form_fields():
    pdf.switch_to_pdf("form.pdf")
    fields = pdf.get_input_fields(encoding="utf-16")
    for key, value in fields.items():
        print(f"{key}: {value}")
    pdf.set_field_value("Given Name Text Box", "Mark")
    pdf.save_field_values(
        output_path="completed-form.pdf",
        use_appearances_writer=True
    )