Add Files To Pdf

Add images and/or PDFs to a new PDF document.

Arguments

Argument | Type | Default value | Description
files | list, None | None | list of filepaths to add into PDF (can be either images or PDFs)
target_document | str, None | None | filepath of target PDF
append | bool | False | appends files to existing document if append is True

Image formats supported are JPEG, PNG and GIF.

A file can be given extra properties by appending a colon (:) followed by the properties to the end of the filename. Separate multiple properties with commas.

Supported extra properties for PDFs are:

  • page and/or page ranges
  • no extras means that all source PDF pages are added into new PDF

Supported extra properties for images are:

  • format, the PDF page format, for example Letter or A4
  • rotate, how many degrees the image is rotated counter-clockwise
  • align, only possible value at the moment is center
  • orientation, the PDF page orientation for the image, possible values P (portrait) or L (landscape)
  • x/y, coordinates for adjusting image position on the page
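
The file strings described above can be assembled programmatically. The following is a minimal sketch, assuming only the `path:property,property` convention shown in this section; `file_spec` is a hypothetical helper and not part of RPA.PDF.

```python
def file_spec(path, pages=None, **properties):
    """Build a 'path:extras' string (hypothetical helper, not part of RPA.PDF)."""
    extras = []
    if pages is not None:
        # Page or page-range extras for PDFs, for example "1" or "2-10,15"
        extras.append(str(pages))
    # key=value extras for images, for example align=center or x=0
    extras.extend(f"{key}={value}" for key, value in properties.items())
    return f"{path}:{','.join(extras)}" if extras else path


list_of_files = [
    file_spec("invoice.pdf"),
    file_spec("robot.pdf", pages="2-10,15"),
    file_spec("approved.png", align="center"),
    file_spec("landscape_image.png", rotate=-90, orientation="L"),
]
print(list_of_files)
```

The resulting list can be passed as the files argument in the same way as the hand-written list in the examples.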

Examples

Robot Framework

*** Keywords ***
Add files to pdf
    ${files}=    Create List
    ...    ${TESTDATA_DIR}${/}invoice.pdf
    ...    ${TESTDATA_DIR}${/}approved.png:align=center
    ...    ${TESTDATA_DIR}${/}robot.pdf:1
    ...    ${TESTDATA_DIR}${/}approved.png:x=0,y=0
    ...    ${TESTDATA_DIR}${/}robot.pdf:2-10,15
    ...    ${TESTDATA_DIR}${/}approved.png
    ...    ${TESTDATA_DIR}${/}landscape_image.png:rotate=-90,orientation=L
    ...    ${TESTDATA_DIR}${/}landscape_image.png:format=Letter
    Add Files To PDF    ${files}    newdoc.pdf

Python

from RPA.PDF import PDF

pdf = PDF()

list_of_files = [
    'invoice.pdf',
    'approved.png:align=center',
    'robot.pdf:1',
    'approved.png:x=0,y=0',
]
def example_keyword():
    pdf.add_files_to_pdf(
        files=list_of_files,
        target_document="output/output.pdf"
    )

Add Watermark Image To Pdf

Add an image into an existing or new PDF.

Arguments

Argument | Type | Default value | Description
image_path | str, Path | null | filepath to image file to add into PDF
output_path | str, Path | null | filepath of target PDF
source_path | str, Path, None | None | filepath to source, if not given add image to currently active PDF
coverage | float | 0.2 | how the watermark image should be scaled on page, defaults to 0.2

If no source path is given, assume a PDF is already opened.

Examples

Robot Framework

*** Keywords ***
Indicate approved with watermark
    Add Watermark Image To PDF
    ...             image_path=approved.png
    ...             source_path=/tmp/sample.pdf
    ...             output_path=output/output.pdf

Python

from RPA.PDF import PDF

pdf = PDF()

def indicate_approved_with_watermark():
    pdf.add_watermark_image_to_pdf(
        image_path="approved.png",
        source_path="/tmp/sample.pdf",
        output_path="output/output.pdf",
    )

Close All Pdfs

Close all opened PDF file descriptors.

Examples

Robot Framework

*** Keywords ***
Close Multiple PDFs
    Close all pdfs
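
No Python example is given for this keyword. As a minimal sketch, assuming the method is exposed as `close_all_pdfs` (the usual keyword-to-method naming in this library), a small context manager can make sure all descriptors are released even if processing fails. The `closing_all_pdfs` helper is hypothetical and takes the library instance as an argument.

```python
from contextlib import contextmanager


@contextmanager
def closing_all_pdfs(pdf):
    """Yield the given PDF library instance and close all of its files afterwards."""
    try:
        yield pdf
    finally:
        # Runs on normal exit and when an exception is raised inside the block
        pdf.close_all_pdfs()


# Usage sketch (requires rpaframework to be installed):
# from RPA.PDF import PDF
# with closing_all_pdfs(PDF()) as pdf:
#     pdf.open_pdf("/tmp/sample.pdf")
```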

Close Pdf

Close PDF file descriptor for a certain file.

Arguments

Argument | Type | Default value | Description
source_pdf | str, None | None | filepath to the source pdf.

Examples

Robot Framework

*** Keywords ***
Close just one pdf
    Close pdf   path/to/the/pdf/file.pdf

Raises ValueError if the file descriptor for the file is not found.

Convert

Parse source PDF into entities.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | source PDF filepath
trim | bool | True | trim whitespace from the text if set to True (default)
pagenum | int, str, None | None | Page number where the search is performed, defaults to None (meaning all pages get converted; numbers start from 1)

These entities can be used for text searches or XML dumping for example. The conversion will be done automatically when using the dependent keywords directly.

Examples

Robot Framework

***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    Convert    /tmp/sample.pdf

Python

from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.convert("/tmp/sample.pdf")

Decrypt Pdf

Decrypt PDF with password.

Arguments

Argument | Type | Default value | Description
source_path | str | null | filepath to the source pdf.
output_path | str | null | filepath to the decrypted pdf.
password | str | null | password as a string.

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Keywords ***
Make PDF human readable
    ${success}=  Decrypt PDF    /tmp/sample.pdf    /tmp/decrypted.pdf    password_here

Python

from RPA.PDF import PDF

pdf = PDF()

def make_pdf_human_readable():
    success = pdf.decrypt_pdf("/tmp/sample.pdf", "/tmp/decrypted.pdf", "password_here")

Returns True if decryption was successful, otherwise False.
Raises ValueError on decryption errors.
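
Because the keyword can return True, return False, or raise ValueError, callers have three outcomes to handle. The sketch below shows one way to collapse them into a status string. `decrypt_or_explain` is a hypothetical helper that receives the library instance as an argument, and the status texts are illustrative only.

```python
def decrypt_or_explain(pdf, source_path, output_path, password):
    """Map the three possible outcomes of decrypt_pdf to a short status string."""
    try:
        success = pdf.decrypt_pdf(source_path, output_path, password)
    except ValueError as err:
        # Documented failure mode: decryption errors raise ValueError
        return f"decryption error: {err}"
    return "decrypted" if success else "not decrypted"


# Usage sketch (requires rpaframework to be installed):
# from RPA.PDF import PDF
# print(decrypt_or_explain(PDF(), "/tmp/sample.pdf", "/tmp/decrypted.pdf", "password_here"))
```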

Dump Pdf As Xml

Get a PDFMiner-format XML dump of the PDF.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | filepath to the source PDF

Examples

Robot Framework

***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    ${xml}=  Dump PDF as XML    /tmp/sample.pdf

Python

from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    xml = pdf.dump_pdf_as_xml("/tmp/sample.pdf")

Returns the XML content as a string.

Encrypt Pdf

Encrypt a PDF document.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | filepath to the source pdf.
output_path | str, None | None | filepath to the target pdf, stored by default in the robot output directory as output.pdf
user_pwd | str | "" | allows opening and reading PDF with restrictions.
owner_pwd | str, None | None | allows opening PDF without any restrictions, by default same as user_pwd.
use_128bit | bool | True | whether to use 128-bit encryption; when False, 40-bit encryption is used. Default True.

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Keywords ***
Secure this PDF
    Encrypt PDF    /tmp/sample.pdf

Secure this PDF and set passwords
    Encrypt PDF
    ...    source_path=/tmp/sample.pdf
    ...    output_path=/tmp/new/sample_encrypted.pdf
    ...    user_pwd=complex_password_here
    ...    owner_pwd=different_complex_password_here
    ...    use_128bit=${TRUE}

Python

from RPA.PDF import PDF

pdf = PDF()

def secure_this_pdf():
    pdf.encrypt_pdf("/tmp/sample.pdf")

Extract Pages From Pdf

Extract pages from source PDF and save to a new PDF document.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | filepath to the source pdf.
output_path | str, None | None | filepath to the target pdf, stored by default in the robot output directory as output.pdf
pages | int, str, List[int], List[str], None | None | page numbers to extract from PDF (numbers start from 1); if None, all pages are extracted.

Page numbers start from 1.

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Keywords ***
Save PDF pages to a new document
    ${pages}=    Extract Pages From PDF
    ...          source_path=/tmp/sample.pdf
    ...          output_path=/tmp/output.pdf
    ...          pages=5

Save PDF pages from open PDF to a new document
    ${pages}=    Extract Pages From PDF
    ...          output_path=/tmp/output.pdf
    ...          pages=5

Python

from RPA.PDF import PDF

pdf = PDF()

def save_pdf_pages_to_a_new_document():
    pages = pdf.extract_pages_from_pdf(
        source_path="/tmp/sample.pdf",
        output_path="/tmp/output.pdf",
        pages=5
    )

Find Text

Find the closest text elements near the set anchor(s) through locator.

Arguments

Argument | Type | Default value | Description
locator | str | null | Element to set anchor to. This can be prefixed with either text:, regex: or coords: to find the anchor by text or coordinates. text is assumed if no such prefix is specified. (text search is case insensitive)
pagenum | int, str | 1 | Page number where search is performed on, defaults to 1 (first page).
direction | str | right | In which direction to search for text elements. This can be any of 'top'/'up', 'bottom'/'down', 'left' or 'right'. (defaults to 'right')
closest_neighbours | int, str, None | 1 | How many neighbours to return at most, sorted by the distance from the current anchor.
strict | bool | False | If element's margins should be used for matching those which are aligned to the anchor. (turned off by default)
regexp | str, None | None | Expected format of the searched text value. By default all the candidates in range are considered valid neighbours.
trim | bool | True | Automatically trim leading/trailing whitespace from the text elements. (switched on by default)

The PDF will be parsed automatically before elements can be searched.

Returns a list of Match objects where every match has the following attributes: .anchor - the matched text with the locator; .neighbours - a list of adjacent texts found in the specified direction.
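
The returned matches can be reduced to a single value with a small helper. The sketch below uses a stand-in `Match` dataclass carrying only the documented attributes so that it runs without a PDF; real objects come from Find Text. `first_neighbour` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Match:
    """Stand-in with the documented attributes; real objects come from Find Text."""
    anchor: str
    direction: str = "right"
    neighbours: List[str] = field(default_factory=list)


def first_neighbour(matches) -> Optional[str]:
    """Return the first neighbour of the first match that has any, else None."""
    for match in matches:
        if match.neighbours:
            return match.neighbours[0]
    return None


matches = [Match(anchor="Invoice Number", neighbours=["INV-3337"])]
print(first_neighbour(matches))  # INV-3337
```

Since neighbours are described as sorted by distance from the anchor, the first entry should be the closest one.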

Examples

Robot Framework

PDF Invoice Parsing
    Open Pdf    invoice.pdf
    ${matches} =  Find Text    Invoice Number
    Log List      ${matches}
List has one item:
Match(anchor='Invoice Number', direction='right', neighbours=['INV-3337'])

Python

from RPA.PDF import PDF

pdf = PDF()

def pdf_invoice_parsing():
    pdf.open_pdf("invoice.pdf")
    matches = pdf.find_text("Invoice Number")
    for match in matches:
        print(match)

pdf_invoice_parsing()
Match(anchor='Invoice Number', direction='right', neighbours=['INV-3337'])

Get All Figures

Return all figures in the PDF document.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | filepath to the source pdf.

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Keywords ***
Image fetch
    &{figures}=  Get All Figures    /tmp/sample.pdf

Image fetch from open PDF
    &{figures}=  Get All Figures

Python

from RPA.PDF import PDF

pdf = PDF()

def image_fetch():
    figures = pdf.get_all_figures("/tmp/sample.pdf")

Returns a dictionary of figures divided into pages.

Get Input Fields

Get input fields in the PDF.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | Filepath to source, if not given use the currently active PDF.
replace_none_value | bool | False | Enable this to conveniently visualize the fields. (replaces the null value with field's name)
encoding | str | iso-8859-1 | Use an explicit encoding for field name/value parsing. (defaults to "iso-8859-1" but "utf-8/16" might be the one working for you)

Stores input fields internally so that they can be used without parsing the PDF again.

Returns a dictionary with all the found fields. Use their key names when setting values into them.
Raises KeyError if no input fields are enabled in the PDF.

Examples

Robot Framework

Example Keyword
    ${fields} =     Get Input Fields    form.pdf
    Log Dictionary    ${fields}

Python

from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    fields = pdf.get_input_fields("form.pdf")
    print(fields)

example_keyword()

Get Number Of Pages

Get number of pages in the document.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | filepath to the source pdf

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Keywords ***
Number of pages in PDF
    ${page_count}=    Get Number Of Pages    /tmp/sample.pdf

Number of pages in opened PDF
    ${page_count}=    Get Number Of Pages

Python

from RPA.PDF import PDF

pdf = PDF()

def number_of_pages_in_pdf():
    page_count = pdf.get_number_of_pages("/tmp/sample.pdf")

Raises PdfReadError if the file is encrypted or other restrictions are in place.

Get Pdf Info

Get metadata from a PDF document.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | filepath to the source PDF.

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Keywords ***
Get PDF metadata
    ${metadata}=    Get PDF Info    /tmp/sample.pdf

*** Keywords ***
Get metadata from an already opened PDF
    ${metadata}=    Get PDF Info

Python

from RPA.PDF import PDF

pdf = PDF()

def get_pdf_metadata():
    metadata = pdf.get_pdf_info("/tmp/sample.pdf")

Returns a dictionary of PDF information.

Get Text From Pdf

Get text from a set of pages in the source PDF document.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | filepath to the source pdf.
pages | int, str, List[int], List[str], None | None | page numbers to get text (numbers start from 1).
details | bool | False | set to True to return textboxes, default False.
trim | bool | True | set to False to return raw texts, default True means whitespace is trimmed from the text

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Keywords ***
Text extraction from PDF
    ${text}=    Get Text From PDF    /tmp/sample.pdf

Text extraction from open PDF
    ${text}=    Get Text From PDF

Python

from RPA.PDF import PDF

pdf = PDF()

def text_extraction_from_pdf():
    text = pdf.get_text_from_pdf("/tmp/sample.pdf")

Returns a dictionary of pages and their texts.
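
Once the text is available as a dictionary of pages, it can be searched with plain Python. The sketch below assumes the keys are page numbers and the values are plain strings (that is, details is left at False). `pages_containing` is a hypothetical helper, and the sample dictionary stands in for a real result.

```python
def pages_containing(text_by_page, needle):
    """Return the page numbers whose text contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return sorted(
        page for page, text in text_by_page.items() if needle in text.lower()
    )


# Sample data standing in for the result of get_text_from_pdf()
text_by_page = {
    1: "Invoice Number INV-3337",
    2: "Terms and conditions",
    3: "Invoice total 120.00",
}
print(pages_containing(text_by_page, "invoice"))  # [1, 3]
```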

Html To Pdf

Generate a PDF file from HTML content.

Arguments

Argument | Type | Default value | Description
content | str | null | HTML content.
output_path | str | null | Filepath where to save the PDF document.
encoding | str | utf-8 | Codec used for text I/O.

Note that input must be well-formed and valid HTML.
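
One common way to end up with malformed HTML is interpolating raw text that contains characters such as & or <. As a minimal sketch using only the standard library, dynamic values can be escaped before they are placed into the markup. `build_html` is a hypothetical helper, not part of RPA.PDF.

```python
from html import escape


def build_html(title, rows):
    """Build a small, well-formed HTML document; text values are escaped."""
    items = "".join(f"<li>{escape(str(row))}</li>" for row in rows)
    return (
        "<html><body>"
        f"<h1>{escape(title)}</h1>"
        f"<ul>{items}</ul>"
        "</body></html>"
    )


html_content_as_string = build_html("Orders & returns", ["Item <1>", "Item 2"])
print(html_content_as_string)
```

The resulting string can then be passed as the content argument.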

Examples

Robot Framework

*** Keywords ***
Create PDF from HTML
    HTML to PDF    ${html_content_as_string}  /tmp/output.pdf

Python

from RPA.PDF import PDF

pdf = PDF()

def create_pdf_from_html():
    pdf.html_to_pdf(html_content_as_string, "/tmp/output.pdf")

Is Pdf Encrypted

Check if PDF is encrypted.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | filepath to the source pdf.

If no source path given, assumes a PDF is already opened.

Returns True if the file is encrypted.

Examples

Robot Framework

*** Keywords ***
Is PDF encrypted
    ${is_encrypted}=    Is PDF Encrypted    /tmp/sample.pdf

*** Keywords ***
Is open PDF encrypted
    ${is_encrypted}=    Is PDF Encrypted

Python

from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    is_encrypted = pdf.is_pdf_encrypted("/tmp/sample.pdf")

Open Pdf

Open a PDF document for reading.

Arguments

Argument | Type | Default value | Description
source_path | str, Path | null | filepath to the source pdf.

This is called automatically in the other PDF keywords when a path to the PDF file is given as an argument.

Examples

Robot Framework

*** Keywords ***
Open my pdf file
    Open PDF    /tmp/sample.pdf

Python

from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.open_pdf("/tmp/sample.pdf")

Raises ValueError if the PDF is already open.

Rotate Page

Rotate pages in source PDF document and save to target PDF document.

Arguments

Argument | Type | Default value | Description
pages | int, str, List[int], List[str], None | null | page numbers to rotate (numbers start from 1).
source_path | str, None | None | filepath to the source pdf.
output_path | str, None | None | filepath to the target pdf, stored by default in the robot output directory as output.pdf
clockwise | bool | True | direction in which the page will be rotated, default True.
angle | int | 90 | number of degrees to rotate, default 90.

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Keywords ***
PDF page rotation
    Rotate Page
    ...          source_path=/tmp/sample.pdf
    ...          output_path=/tmp/output.pdf
    ...          pages=5

Python

from RPA.PDF import PDF

pdf = PDF()

def pdf_page_rotation():
    pages = pdf.rotate_page(
        source_path="/tmp/sample.pdf",
        output_path="/tmp/output.pdf",
        pages=5
    )

Save Field Values

Save field values in PDF if it has fields.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | Source PDF with fields to update.
output_path | str, None | None | Updated target PDF.
newvals | dict, None | None | New values when updating many at once.
use_appearances_writer | bool | False | For some PDF documents the updated fields won't be visible, try to set this to True if you encounter problems. (viewing the output PDF in browser might display the field values then)

Examples

Robot Framework

Example Keyword
    Open PDF    ./tmp/sample.pdf
    Set Field Value    phone_nr    077123123
    Save Field Values    output_path=./tmp/output.pdf

Multiple operations
    &{new_fields}=       Create Dictionary
    ...                  phone_nr=077123123
    ...                  title=dr
    Save Field Values    source_path=./tmp/sample.pdf
    ...                  output_path=./tmp/output.pdf
    ...                  newvals=${new_fields}

Python

from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.open_pdf("./tmp/sample.pdf")
    pdf.set_field_value("phone_nr", "077123123")
    pdf.save_field_values(output_path="./tmp/output.pdf")

def multiple_operations():
    new_fields = {"phone_nr": "077123123", "title": "dr"}
    pdf.save_field_values(
        source_path="./tmp/sample.pdf",
        output_path="./tmp/output.pdf",
        newvals=new_fields
    )

Save Figure As Image

Try to save the image data from Figure object, and return the file name, if successful.

Arguments

Argument | Type | Default value | Description
figure | Figure | null | PDF Figure object which will be saved as an image. The PDF Figure object can be determined from the Get All Figures keyword
images_folder | str | . | directory where image files will be created
file_prefix | str | "" | image filename prefix

The figure needs to have a byte stream, and that stream needs to be recognized as an image format for the save to succeed.

Examples

Robot Framework

*** Keywords ***
Figure to Image
    ${image_file_path} =     Save figure as image
    ...             figure=${pdf_figure_object}
    ...             images_folder=/tmp/images
    ...             file_prefix=file_name_here

Python

from RPA.PDF import PDF

pdf = PDF()

def figure_to_image(pdf_figure_object):
    # pdf_figure_object is a Figure obtained from get_all_figures()
    image_file_path = pdf.save_figure_as_image(
        figure=pdf_figure_object,
        images_folder="/tmp/images",
        file_prefix="file_name_here",
    )

Returns the image filepath, or None.

Save Figures As Images

Save figures from given PDF document as image files.

Arguments

Argument | Type | Default value | Description
source_path | str, None | None | filepath to PDF document
images_folder | str | . | directory where image files will be created
pages | str, None | None | target figures in the pages, can be single page or range, default None means that all pages are scanned for figures to save (numbers start from 1)
file_prefix | str | "" | image filename prefix

If no source path given, assumes a PDF is already opened.

Examples

Robot Framework

*** Keywords ***
Figures to Images
    ${image_filenames} =    Save figures as images
    ...             source_path=/tmp/sample.pdf
    ...             images_folder=/tmp/images
    ...             pages=${4}
    ...             file_prefix=file_name_here

Python

from RPA.PDF import PDF

pdf = PDF()

def figures_to_images():
    image_filenames = pdf.save_figures_as_images(
        source_path="/tmp/sample.pdf",
        images_folder="/tmp/images",
        pages=4,
        file_prefix="file_name_here",
    )

Returns a list of the image filenames created.

Save Pdf

Save the contents of a PyPDF2 reader to a new file.

Arguments

Argument | Type | Default value | Description
output_path | str | null | filepath to target PDF
reader | PdfFileReader | null | a PyPDF2 reader

Examples

Robot Framework

*** Keywords ***
Save changes to PDF
    Save PDF    /tmp/output.pdf

Python

from RPA.PDF import PDF

pdf = PDF()

def save_changes_to_pdf():
    pdf.save_pdf(output_path="output/output.pdf")

Set Anchor To Element

Sets main anchor point in the document for further searches.

Arguments

Argument | Type | Default value | Description
locator | str | null | Element to set anchor to. This can be prefixed with either text:, regex: or coords: to find the anchor by text or coordinates. text is assumed if no such prefix is specified. (text search is case insensitive)
trim | bool | True | Automatically trim leading/trailing whitespace from the text elements. (switched on by default)
pagenum | int, str | 1 | Page number where search is performed on, defaults to 1 (first page).

This is used internally in the library and can work with multiple anchors at the same time if such are found.

Returns True if at least one anchor was found.

Examples

Robot Framework

Example Keyword
     ${success} =  Set Anchor To Element    Invoice Number

Python

from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    success = pdf.set_anchor_to_element("Invoice Number")

Set Convert Settings

Change settings for PDFMiner document conversion.

Arguments

Argument | Type | Default value | Description
line_margin | float, None | None | relative margin between bounding lines, default 0.5
word_margin | float, None | None | relative margin between words, default 0.1
char_margin | float, None | None | relative margin between characters, default 2.0

line_margin controls how textboxes are grouped. If conversion results in texts merged into one group, set this to a lower value.

word_margin controls how spaces are inserted between words. If conversion results in text without spaces, set this to a lower value.

char_margin controls how characters are grouped into words. If conversion results in individual characters instead of words, set this to a higher value.

Examples

Robot Framework

***Settings***
Library    RPA.PDF

***Tasks***
Example Keyword
    Set Convert Settings  line_margin=0.00000001
    ${texts}=  Get Text From PDF  /tmp/sample.pdf

Python

from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.set_convert_settings(line_margin=0.00000001)
    texts = pdf.get_text_from_pdf("/tmp/sample.pdf")

Set Field Value

Set value for field with given name on the active document.

Arguments

Argument | Type | Default value | Description
field_name | str | null | Field to update.
value | Any | null | New value for the field.
source_path | str, None | None | Source PDF file path.

Tries to match with field's identifier directly or its label.

Raises ValueError when the field can't be found or more than one field matches the given field_name.

Examples

Robot Framework

Example Keyword
    Open PDF    ./tmp/sample.pdf
    Set Field Value    phone_nr    077123123
    Save Field Values    output_path=./tmp/output.pdf

Python

from RPA.PDF import PDF

pdf = PDF()

def example_keyword():
    pdf.open_pdf("./tmp/sample.pdf")
    pdf.set_field_value("phone_nr", "077123123")
    pdf.save_field_values(output_path="./tmp/output.pdf")

Switch To Pdf

Switch the library's current file object to an already opened file, or open the file if it is not open yet.

Arguments

Argument | Type | Default value | Description
source_path | str, Path, None | None | filepath to the source pdf.

This is done automatically in the PDF library keywords.

Examples

Robot Framework

*** Keywords ***
Jump to another PDF
    Switch to PDF    /tmp/another.pdf

Python

from RPA.PDF import PDF

pdf = PDF()

def jump_to_another_pdf():
    pdf.switch_to_pdf("/tmp/sample.pdf")

Raises ValueError if no PDF filepath is given and there is no active file to activate.

Template Html To Pdf

Use HTML template file to generate PDF file.

Arguments

Argument | Type | Default value | Description
template | str | null | Filepath to the HTML template.
output_path | str | null | Filepath where to save PDF document.
variables | dict, None | None | Dictionary of variables to fill into template, defaults to {}.
encoding | str | utf-8 | Codec used for text I/O.

It provides an easy method of generating a PDF document from an HTML formatted template file.
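
To illustrate what filling a template with variables means, here is a minimal sketch of plain placeholder substitution. It assumes placeholders are written as `{{name}}`, which is an assumption about the template syntax rather than something stated in this section. `fill_template` is a hypothetical helper and is not the library's implementation.

```python
def fill_template(template_text, variables):
    """Replace each {{name}} placeholder with the matching value (assumed syntax)."""
    for key, value in variables.items():
        template_text = template_text.replace("{{" + key + "}}", str(value))
    return template_text


template_text = "<p>{{name}} ({{email}})</p><p>{{items}}</p>"
data = {
    "name": "Robot Process",
    "email": "robot@domain.com",
    "items": "<br/>".join(["item 1", "item 2"]),
}
print(fill_template(template_text, data))
# <p>Robot Process (robot@domain.com)</p><p>item 1<br/>item 2</p>
```

Values are inserted as-is, so a value that contains markup (such as the joined items above) becomes part of the HTML.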

Examples

Robot Framework

*** Keywords ***
Create PDF from HTML template
    ${TEMPLATE}=    Set Variable    order.template
    ${PDF}=         Set Variable    result.pdf
    &{DATA}=        Create Dictionary
    ...             name=Robot Generated
    ...             email=robot@domain.com
    ...             zip=00100
    ...             items=Item 1, Item 2
    Template HTML to PDF
    ...    template=${TEMPLATE}
    ...    output_path=${PDF}
    ...    variables=${DATA}

Python

from RPA.PDF import PDF

p = PDF()
orders = ["item 1", "item 2", "item 3"]
data = {
    "name": "Robot Process",
    "email": "robot@domain.com",
    "zip": "00100",
    "items": "<br/>".join(orders),
}
p.template_html_to_pdf("order.template", "order.pdf", data)