Add Files To Pdf

Add images and/or pdfs to new PDF document

Arguments

Argument Type Default value Description
fileslistNone
target_documentstrNone

Image formats supported are JPEG, PNG and GIF.

The file can be added with extra properties by denoting : at the end of the filename. Each property should be separated by comma.

Supported extra properties for PDFs are:

  • page and/or page ranges
  • no extras means that all source PDF pages are added into new PDF

Supported extra properties for images are:

  • format, the PDF page format, for example. Letter or A4
  • rotate, how many degrees image is rotated counter-clockwise
  • align, only possible value at the moment is center
  • orientation, the PDF page orientation for the image, possible values P (portrait) or L (landscape)
  • x/y, coordinates for adjusting image position on the page

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Add files to pdf
    ${files}=    Create List
    ...    ${TESTDATA_DIR}${/}invoice.pdf
    ...    ${TESTDATA_DIR}${/}approved.png:align=center
    ...    ${TESTDATA_DIR}${/}robot.pdf:1
    ...    ${TESTDATA_DIR}${/}approved.png:x=0,y=0
    ...    ${TESTDATA_DIR}${/}robot.pdf:2-10,15
    ...    ${TESTDATA_DIR}${/}approved.png
    ...    ${TESTDATA_DIR}${/}landscape_image.png:rotate=-90,orientation=L
    ...    ${TESTDATA_DIR}${/}landscape_image.png:format=Letter
    Add Files To PDF    ${files}    newdoc.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

list_of_files = [
    'invoice.pdf',
    'approved.png:align=center',
    'robot.pdf:1',
    'approved.png:x=0,y=0',
]
def example_keyword():
    pdf.add_files_to_pdf(
        files=list_of_files,
        target_document="output/output.pdf"
    )
param files:list of filepaths to add into PDF (can be either images or PDFs)
param target_document:
 filepath of target PDF

Add Watermark Image To Pdf

Add image to PDF which can be new or existing PDF.

Arguments

Argument Type Default value Description
image_pathstrnull
output_pathstrnull
source_pathstrNone
coveragefloat0.2

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    Add Watermark Image To PDF
    ...             image_path=approved.png
    ...             source_path=/tmp/sample.pdf
    ...             output_path=output/output.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.add_watermark_image_to_pdf(
        image_path="approved.png"
        source_path="/tmp/sample.pdf"
        output_path="output/output.pdf"
    )
param image_path:
 filepath to image file to add into PDF
param source:filepath to source, if not given add image to currently active PDF
param output_path:
 filepath of target PDF
param coverage:how the watermark image should be scaled on page, defaults to 0.2

Close All Pdfs

Close all opened PDF file descriptors.

Close Pdf

Close PDF file descriptor for certain file.

Arguments

Argument Type Default value Description
source_pdfstrNone
param source_pdf:
 filepath to the source pdf.
raises ValueError:
 if file descriptor for the file is not found.

Convert

Parse source PDF into entities which can be used for text searches, for example.

Arguments

Argument Type Default value Description
source_pathstrNone

Parse source PDF into entities which can be used for text searches, for example.

This is also used inside other PDF keywords.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    Convert    /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.convert("/tmp/sample.pdf")
param source_path:
 source PDF filepath.

Decrypt Pdf

Decrypt PDF with password.

Arguments

Argument Type Default value Description
source_pathstrnull
output_pathstrnull
passwordstrnull

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    ${success}=  Decrypt PDF    /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    success = pdf.decrypt_pdf("/tmp/sample.pdf")
param source_path:
 filepath to the source pdf.
param output_path:
 filepath to the decrypted pdf.
param password:password as a string.
return:True if decrypt was successful, else False or Exception.
raises ValueError:
 on decryption errors.

Dump Pdf As Xml

Get PDFMiner format XML dump of the PDF

Arguments

Argument Type Default value Description
source_pathstrNone

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    ${xml}=  Dump PDF as XML    /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    xml = pdf.dump_pdf_as_xml("/tmp/sample.pdf")
param source_path:
 filepath to the source PDF
return:XML content as a string.

Encrypt Pdf

Encrypt a PDF document.

Arguments

Argument Type Default value Description
source_pathstrNone
output_pathstrNone
user_pwdstr
owner_pwdstrNone
use_128bitboolTrue

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    Encrypt PDF    /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.encrypt_pdf("/tmp/sample.pdf")
param source_path:
 filepath to the source pdf.
param output_path:
 filepath to the target pdf, stored by default to output_directory.
param user_pwd:allows opening and reading PDF with restrictions.
param owner_pwd:
 allows opening PDF without any restrictions, by default same user_pwd.
param use_128bit:
 whether to 128bit encryption, when false 40bit encryption is used, default True.

Extract Pages From Pdf

Extract pages from source PDF and save to a new PDF document.

Arguments

Argument Type Default value Description
source_pathstrNone
output_pathstrNone
pagesList[int], List[str], str, NoneNone

Page numbers start from 1.

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    ${pages}=    Extract Pages From PDF
    ...          source_path=/tmp/sample.pdf
    ...          output_path=/tmp/output.pdf
    ...          pages=5

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pages = pdf.extract_pages_from_pdf(
        source_path="/tmp/sample.pdf",
        output_path="/tmp/output.pdf",
        pages=5
    )
param source_path:
 filepath to the source pdf.
param output_path:
 filepath to the target pdf, stored by default in output_directory.
param pages:page numbers to extract from PDF (numbers start from 0) if None then extracts all pages.

Find Text

Get closest text (value) to the anchor element.

Arguments

Argument Type Default value Description
locatorstrnull
pagenumint1
directionstrright
strictboolFalse
regexpstrNone
only_closestboolTrue

PDF needs to be parsed before elements can be found.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    ${value}=  Find Text    text:Invoice Number

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    value = pdf.find_text("text:Invoice Number")
param locator:element to set anchor to. This can be prefixed with either text: or coords: to find the anchor by text or coordinates. Default is text.
param pagenum:page number where search if performed on, default 1 (first).
param direction:
 in which direction to search for text, directions 'top'/'up', 'bottom'/'down', 'left' or 'right', defaults to 'right'.
param strict:if element margins should be used for matching points, used when direction is 'top' or 'bottom', default False.
param regexp:expected format of value to match, defaults to None.
param only_closest:
 return all possible values or only the closest.
return:all possible values, only the closest value, or an empty list.

Get All Figures

Return all figures in the PDF document.

Arguments

Argument Type Default value Description
source_pathstrNone

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    ${figures}=  Get All Figures    /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    figures = pdf.get_all_figures("/tmp/sample.pdf")
param source_path:
 filepath to the source pdf.
return:dictionary of figures divided into pages.

Get Input Fields

Get input fields in the PDF.

Arguments

Argument Type Default value Description
source_pathstrNone
replace_none_valueboolFalse

Stores input fields internally so that they can be used without parsing the PDF again.

Parameter replace_none_value is for convience to visualize fields.

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    ${fields}=  Get Input Fields    /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    fields = pdf.get_input_fields("/tmp/sample.pdf")
param source_path:
 source filepath, defaults to None.
param replace_none_value:
 if value is None replace it with key name, defaults to False.
return:dictionary of input key values or None.

Get Number Of Pages

Get number of pages in the document.

Arguments

Argument Type Default value Description
source_pathstrNone

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    ${page_count}=    Get Number Of Pages    /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    page_count = pdf.get_number_of_pages("/tmp/sample.pdf")
param source_path:
 filepath to the source pdf
raises PdfReadError:
 if file is encrypted or other restrictions are in place

Get Pdf Info

Get metadata from a PDF document.

Arguments

Argument Type Default value Description
source_pathstrNone

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    ${metadata}=    Get PDF Info    /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    metadata = pdf.get_pdf_info("/tmp/sample.pdf")
param source_path:
 filepath to the source PDF.
return:dictionary of PDF information.

Get Text From Pdf

Get text from set of pages in source PDF document.

Arguments

Argument Type Default value Description
source_pathstrNone
pagesList[int], List[str], str, NoneNone
detailsboolFalse

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    ${text}=    Get Text From PDF    /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    text = pdf.get_text_from_pdf("/tmp/sample.pdf")
param source_path:
 filepath to the source pdf.
param pages:page numbers to get text (numbers start from 0).
param details:set to True to return textboxes, default False.
return:dictionary of pages and their texts.

Html To Pdf

Generate a PDF file from HTML content.

Arguments

Argument Type Default value Description
contentstrnull
output_pathstrnull

Note that input must be well-formed and valid HTML.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    HTML to PDF    ${html_content_as_string}  /tmp/output.pdf
*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.html_to_pdf(html_content_as_string, "/tmp/output.pdf")
param content:HTML content.
param output_path:
 filepath where to save the PDF document.

Is Pdf Encrypted

Check if PDF is encrypted.

Arguments

Argument Type Default value Description
source_pathstrNone

Returns True even if PDF was decrypted.

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    ${is_encrypted}=    Is PDF Encrypted    /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    is_encrypted = pdf.is_pdf_encrypted("/tmp/sample.pdf")
param source_path:
 filepath to the source pdf.
return:True if file is encrypted.

Open Pdf

Open a PDF document for reading.

Arguments

Argument Type Default value Description
source_pathstrNone

This is called automatically in the other PDF keywords when a path to the PDF file is given as an argument.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    Open PDF    /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    metadata = pdf.open_pdf("/tmp/sample.pdf")
param source_path:
 filepath to the source pdf.
raises ValueError:
 if PDF is already open.

Rotate Page

Rotate pages in source PDF document and save to target PDF document.

Arguments

Argument Type Default value Description
pagesList[int], List[str], str, Nonenull
source_pathstrNone
output_pathstrNone
clockwiseboolTrue
angleint90

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    Rotate Page
    ...          source_path=/tmp/sample.pdf
    ...          output_path=/tmp/output.pdf
    ...          pages=5

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def rotate_page():
    pages = pdf.rotate_page(
        source_path="/tmp/sample.pdf",
        output_path="/tmp/output.pdf",
        pages=5
    )
param pages:page numbers to extract from PDF (numbers start from 0).
param source_path:
 filepath to the source pdf.
param output_path:
 filepath to the target pdf, stored by default to output_directory.
param clockwise:
 directorion that page will be rotated to, default True.
param angle:number of degrees to rotate, default 90.

Save Field Values

Save field values in PDF if it has fields.

Arguments

Argument Type Default value Description
source_pathstrNone
output_pathstrNone
newvalsdictNone
use_appearances_writerboolFalse

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    Open PDF    ./tmp/sample.pdf
    Set Field Value    phone_nr    077123123
    Save Field Values    output_path=./tmp/output.pdf

Multiple operations
    &{new_fields}=       Create Dictionary
    ...                  phone_nr=077123123
    ...                  title=dr
    Save Field Values    source_path=./tmp/sample.pdf
    ...                  output_path=./tmp/output.pdf
    ...                  newvals=${new_fields}

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.open_pdf("./tmp/sample.pdf")
    pdf.set_field_value("phone_nr", "077123123")
    pdf.save_field_values(output_path="./tmp/output.pdf")

def multiple_operations():
    new_fields = {"phone_nr": "077123123", "title": "dr"}
    pdf.save_field_values(
        source_path="./tmp/sample.pdf",
        output_path="./tmp/output.pdf",
        newvals=new_fields
    )
param source_path:
 source PDF with fields to update.
param output_path:
 updated target PDF.
param newvals:new values when updating many at once.
param use_appearances_writer:
 for some PDF documents the updated fields won't show visible. Try to set this to True if you encounter problems.

Save Figure As Image

Try to save the image data from Figure object, and return the file name, if successful.

Arguments

Argument Type Default value Description
figureFigurenull
images_folderstr.
file_prefixstr

Try to save the image data from Figure object, and return the file name, if successful.

Figure needs to have byte stream and that needs to be recognized as image format for successful save.

param figure:PDF Figure object which will be saved as an image
param images_folder:
 directory where image files will be created
param file_prefix:
 image filename prefix
return:image filepath or None

Save Figures As Images

Save figures from given PDF document as image files.

Arguments

Argument Type Default value Description
source_pathstrNone
images_folderstr.
pagesstrNone
file_prefixstr

If no source path given, assumes a PDF is already opened.

param source_path:
 filepath to PDF document
param images_folder:
 directory where image files will be created
param pages:target figures in the pages, can be single page or range, default None means that all pages are scanned for figures to save
param file_prefix:
 image filename prefix
return:list of image filenames created

Save Pdf

Save the contents of a PyPDF2 reader to a new file.

Arguments

Argument Type Default value Description
output_pathstrnull
readerPdfFileReadernull
param output_path:
 filepath to target PDF
param reader:a PyPDF2 reader.

Set Anchor To Element

Sets anchor point in the document for further searches.

Arguments

Argument Type Default value Description
locatorstrnull

This is used internally in the library.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
     ${success}=  Set Anchor To Element    text:Invoice Number

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    success = pdf.set_anchor_to_element("text:Invoice Number")
param locator:element to search for.
return:True if element was found.

Set Convert Settings

Change settings for PDFMiner document conversion.

Arguments

Argument Type Default value Description
line_marginfloatNone
word_marginfloatNone
char_marginfloatNone

line_margin controls how textboxes are grouped - if conversion results in texts grouped into one group then set this to lower value

word_margin controls how spaces are inserted between words - if conversion results in text without spaces then set this to lower value

char_margin controls how characters are grouped into words - if conversion results in individual characters instead of then set this to higher value

param line_margin:
 relative margin between bounding lines, default 0.5
param word_margin:
 relative margin between words, default 0.1
param char_margin:
 relative margin between characters, default 2.0

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    Set Convert Settings  line_margin=0.00000001
    ${texts}=  Get Text From PDF  /tmp/sample.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.set_convert_settings(line_margin=)
    texts = pdf.get_text_from_pdf("/tmp/sample.pdf")

Set Field Value

Set value for field with given name on the active document.

Arguments

Argument Type Default value Description
field_namestrnull
valueAnynull
source_pathstrNone

Tries to match on field identifier and its label.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    Open PDF    ./tmp/sample.pdf
    Set Field Value    phone_nr    077123123
    Save Field Values    output_path=./tmp/output.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.open_pdf("./tmp/sample.pdf")
    pdf.set_field_value("phone_nr", "077123123")
    pdf.save_field_values(output_path="./tmp/output.pdf")
param field_name:
 field to update.
param value:new value for the field.
param source_path:
 source PDF filepath.
raises ValueError:
 when field can't be found or more than 1 field matches the given field_name.

Switch To Pdf

Switch library's current fileobject to already open file or open file if not opened.

Arguments

Argument Type Default value Description
source_pathstrNone

Switch library's current fileobject to already open file or open file if not opened.

This is done automatically in the PDF library keywords.

Examples

Robot Framework

*** Tasks ***
***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    Switch to PDF    /tmp/another.pdf

Python

*** Tasks ***
from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.switch_to_pdf("/tmp/sample.pdf")
param source_path:
 filepath to the source pdf.
raises ValueError:
 if PDF filepath is not given and there are no active file to activate.

Template Html To Pdf

Use HTML template file to generate PDF file.

Arguments

Argument Type Default value Description
templatestrnull
output_pathstrnull
variablesdictNone

It provides an easy method of generating a PDF document from an HTML formatted template file.

Examples

Robot Framework

*** Tasks ***
*** Settings ***
Library    RPA.PDF

*** Variables ***
${TEMPLATE}    order.template
${PDF}         result.pdf
&{DATA}        name=Robot Generated
...            email=robot@domain.com
...            zip=00100
...            items=Item 1, Item 2

*** Tasks ***
Create PDF from HTML template
    Template HTML to PDF   ${TEMPLATE}  ${PDF}  ${DATA}

Python

*** Tasks ***
from RPA.PDF import PDF

p = PDF()
orders = ["item 1", "item 2", "item 3"]
data = {
    "name": "Robot Process",
    "email": "robot@domain.com",
    "zip": "00100",
    "items": "<br/>".join(orders),
}
p.template_html_to_pdf("order.template", "order.pdf", data)
param template:filepath to the HTML template.
param output_path:
 filepath where to save PDF document.
param variables:
 dictionary of variables to fill into template, defaults to {}.