Creating a PDF

Maria creates a PDF from the sales data table in the intranet and sends it out as a company newsletter to ensure that her colleagues look at it.

After all, it would be a shame if no one saw the result of all that copy-pasting!

Maria copies the table into Microsoft Word and exports it to PDF with some additional software. Our robot, by contrast, will do all of this automatically.

We want to turn the table on the left into the PDF on the right:

Image: HTML table and PDF file side by side

As always, let's start by adding a new step in our @task function, before the log_out() step:

@task
def robot_spare_bin_python():
    """Insert the sales data for the week and export it as a PDF"""
    browser.configure(
        slowmo=100,
    )
    open_the_intranet_website()
    log_in()
    download_excel_file()
    fill_form_with_excel_data()
    collect_results()
    export_as_pdf()
    log_out()

Then we add a new function:

def export_as_pdf():
    """Export the data to a pdf file"""

As always, we plan to do this in steps. (Remember the poor elephant we are eating? 🐘)

Our plan for this function is:

  • we isolate the part of the page that contains the sales table
  • we assign the content (HTML markup) of that part of the page to a variable
  • we create a PDF with the HTML content of the table.

Getting the HTML table element out of the page

The HTML markup of the table area on the page looks like this:

...
<div id="sales-results">
  <table class="table table-dark table-striped">
    ...
  </table>
</div>
...

Look at that beautiful code. It's almost like that page was created for this course!

The table is wrapped in a <div> element with an id attribute of sales-results. Our selector will then be #sales-results.

Next, we want to put the HTML markup of that element into a variable, sales_results_html. We can do this with the inner_html() function, so we modify our function like this:

def export_as_pdf():
    """Export the data to a pdf file"""
    page = browser.page()
    sales_results_html = page.locator("#sales-results").inner_html()

Ok, we admit this was not too easy to guess. 😅 But don't panic! Let's see what's going on in this new line.

We create a variable (sales_results_html) and store in it whatever the inner_html() function returns: the HTML markup inside the located element (its innerHTML). Here, that is the markup of the sales table, without the wrapping <div> itself.
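To make "inner HTML" concrete, here is a small standalone sketch that uses only Python's standard library instead of a browser. The InnerHTMLExtractor class, the inner_html() helper, and the sample markup are illustrative assumptions; they are not part of the robot and say nothing about how the browser library implements inner_html(). The sketch only shows that the wrapper element itself is left out of the result.

```python
from html.parser import HTMLParser


class InnerHTMLExtractor(HTMLParser):
    """Collects the markup nested inside the element with a given id.

    Hypothetical helper for illustration only. It assumes every tag is
    explicitly closed and does not re-escape character references.
    """

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0  # 0 means we are outside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Already inside the target: keep the tag and go one level deeper.
            self.parts.append(self.get_starttag_text())
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            # Found the wrapper: start collecting, but skip the wrapper tag.
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 1:
            # Closing tag of a nested element: keep it.
            self.parts.append(f"</{tag}>")
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def inner_html(markup, element_id):
    """Return the markup inside the element with the given id, or ""."""
    parser = InnerHTMLExtractor(element_id)
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


# Made-up sample page, loosely shaped like the intranet markup above.
sample_page = (
    '<body><h1>Sales</h1>'
    '<div id="sales-results">'
    '<table class="table"><tr><td>42</td></tr></table>'
    '</div></body>'
)
sales_results_html = inner_html(sample_page, "sales-results")
print(sales_results_html)
```

The printed markup starts at the <table> tag: the <div id="sales-results"> wrapper is used to find the content but is not included in it.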

Alright! Let's rerun our robot.

The log shows the robot has now grabbed the HTML markup for the table:

Image: log containing the markup for the HTML table

Creating the PDF file out of the HTML contents variable

Only one more step to go!

Fantastic libraries and where to find them ⚡

We have ambitious plans for our robot, and along the way, we will add more libraries to make it smarter and allow it to do more things.

But how do we know which libraries are available and how they work?

The best place to start is the Libraries page, where we have compiled a list of useful libraries for Robotic Process Automation. Take the RPA.PDF library, for example, which looks like exactly what we need. It has its own documentation page where you can see examples of how to use it, along with the list of all the keywords and functions it provides. This library works with both Python and Robot Framework.

Now that we have the HTML contents of the table in a variable, we need to create a PDF file out of it. To do it, we will add the RPA.PDF library!

Add a new library, get new functions... Wax on, wax off... 🥋 Practice will make us perfect! 💪

from robocorp.tasks import task
from robocorp import browser

from RPA.HTTP import HTTP
from RPA.Excel.Files import Files
from RPA.PDF import PDF

Now we can add the final lines to our function.

def export_as_pdf():
    """Export the data to a pdf file"""
    page = browser.page()
    sales_results_html = page.locator("#sales-results").inner_html()

    pdf = PDF()
    pdf.html_to_pdf(sales_results_html, "output/sales_results.pdf")

We use the html_to_pdf() function provided by the RPA.PDF library to create a sales_results.pdf file out of our sales_results_html variable's contents, and place it again into the output folder (output/sales_results.pdf).
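Since the export writes a file into the output folder, a quick sanity check afterwards can catch a missing or empty file early. Here is a minimal sketch, assuming we only want to confirm that the file exists and begins with the usual %PDF- signature. The helper name looks_like_pdf is hypothetical and is not part of RPA.PDF.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file exists and starts with the PDF signature.

    Hypothetical helper: it checks only the first five bytes, not whether
    the rest of the document is valid.
    """
    pdf_path = Path(path)
    if not pdf_path.is_file():
        return False
    with pdf_path.open("rb") as pdf_file:
        # A well-formed PDF begins with "%PDF-" followed by a version number.
        return pdf_file.read(5) == b"%PDF-"
```

One way to use it would be to call looks_like_pdf("output/sales_results.pdf") at the end of export_as_pdf() and raise an error if it returns False.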

And that's it!

Here's what our robot code looks like now:

from robocorp.tasks import task
from robocorp import browser

from RPA.HTTP import HTTP
from RPA.Excel.Files import Files
from RPA.PDF import PDF


@task
def robot_spare_bin_python():
    """Insert the sales data for the week and export it as a PDF"""
    browser.configure(
        slowmo=100,
    )
    open_the_intranet_website()
    log_in()
    download_excel_file()
    fill_form_with_excel_data()
    collect_results()
    export_as_pdf()
    log_out()


def open_the_intranet_website():
    """Navigates to the given URL"""
    browser.goto("https://robotsparebinindustries.com/")


def log_in():
    """Fills in the login form and clicks the 'Log in' button"""
    page = browser.page()
    page.fill("#username", "maria")
    page.fill("#password", "thoushallnotpass")
    page.click("button:text('Log in')")


def fill_and_submit_sales_form(sales_rep):
    """Fills in the sales data and click the 'Submit' button"""
    page = browser.page()

    page.fill("#firstname", sales_rep["First Name"])
    page.fill("#lastname", sales_rep["Last Name"])
    page.select_option("#salestarget", str(sales_rep["Sales Target"]))
    page.fill("#salesresult", str(sales_rep["Sales"]))
    page.click("text=Submit")


def download_excel_file():
    """Downloads excel file from the given URL"""
    http = HTTP()
    http.download(url="https://robotsparebinindustries.com/SalesData.xlsx", overwrite=True)


def fill_form_with_excel_data():
    """Read data from excel and fill in the sales form"""
    excel = Files()
    excel.open_workbook("SalesData.xlsx")
    worksheet = excel.read_worksheet_as_table("data", header=True)
    excel.close_workbook()

    for row in worksheet:
        fill_and_submit_sales_form(row)


def collect_results():
    """Take a screenshot of the page"""
    page = browser.page()
    page.screenshot(path="output/sales_summary.png")


def export_as_pdf():
    """Export the data to a pdf file"""
    page = browser.page()
    sales_results_html = page.locator("#sales-results").inner_html()

    pdf = PDF()
    pdf.html_to_pdf(sales_results_html, "output/sales_results.pdf")


def log_out():
    """Presses the 'Log out' button"""
    page = browser.page()
    page.click("text=Log out")

Let's run the robot one final time.

A new sales_results.pdf file appears in the output directory, containing the sales data! 🎉🎉🎉

If you click on the PDF document, VS Code might suggest finding a suitable extension for viewing PDF documents. You can accept and install an extension!

What we learned

  • You can create a PDF file from HTML content using the RPA.PDF library and the html_to_pdf() function.