Scrape price data from a Shopify shop
Scrape product and price data from a Shopify-based store using Crul
We found a really cool new tool, Crul, and wanted to put it to use! While it can also handle data from APIs, in this case we scrape product names and prices from an e-commerce site that runs on Shopify and store them in an Excel file with a timestamp.
This might not work out of the box with every Shopify store, but go ahead and try. The query is easy to edit.
What you will learn with this example
- How to invoke Crul queries from a robot
- How to extend a Robot Framework robot with a Python class
- How to write data to a locally stored Excel file
Prerequisites
- Get a hosted account and credentials from Crul. Try your luck in their Slack.
- Set up a Vault in your Robocorp Control Room with the name Crul, and give it one key called apikey that holds your Crul API key in this format: crul [KEYHERE-IT-IS-LONG].
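Because the key has to follow the `crul [KEY]` format, it can help to check it before calling Crul. The helper below is a minimal sketch, not part of the robot: the name `normalize_crul_key` is hypothetical, and the comment about reading the Vault assumes the `RPA.Robocorp.Vault` library is used.

```python
def normalize_crul_key(raw: str) -> str:
    """Return the key as 'crul <key>' or raise if the format is wrong.

    Hypothetical helper. In the robot the raw value would come from the
    Vault, for example (assumption) via RPA.Robocorp.Vault:
        Vault().get_secret("Crul")["apikey"]
    """
    key = raw.strip()
    # Split into the expected "crul" prefix and the key itself
    prefix, _, token = key.partition(" ")
    if prefix != "crul" or not token.strip():
        raise ValueError('expected the Crul key in the form "crul <key>"')
    return f"crul {token.strip()}"
```

A value pasted with stray whitespace is cleaned up, while a value missing the `crul` prefix fails early with a readable error instead of an authentication failure later.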
Crul query explained
To get a better idea of how a Crul query works in general, check out the documentation and quickstart!
Below is the query that is included as an example in the crul-query.txt file of this robot. It has been broken up by stage and documented. The explanation is verbose, as this could be your first time seeing a Crul query, but reach out to Crul any time and we would love to answer any questions or help you write your own queries!
- Opens the provided URL, renders the page, and transforms it into a tabular structure that includes the HTML and hashes of the HTML for future grouping.
- Filters the page data to only include elements matching the filter expression.
- Adds a _sequence column to each row containing the row number.
- Processes the element HTML into a row for each of its children.
- Filters the page data to only include elements matching the filter expression.
- Includes only relevant columns.
- Groups page elements by the parent hash.
- Renames column
- Renames column
- Sorts according to the previously added sequence number to preserve the order of elements as they appear on the page.
- Adds a timestamp to each row.
- Includes only relevant columns in the final set of results.
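To make the data flow of these stages easier to follow, here is a rough Python sketch of the same idea: filter elements, group them by parent hash, rename, sort by sequence, and add a timestamp. It is not Crul syntax and not how Crul works internally. The sample data, the field names (`parent_hash`, `class`, `text`), and the function name are all made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical flattened page elements. Each child element carries the hash
# of its parent product card. Field names are illustrative only.
elements = [
    {"parent_hash": "a1", "class": "product-title", "text": "Blue Mug"},
    {"parent_hash": "a1", "class": "product-price", "text": "$12.00"},
    {"parent_hash": "b2", "class": "product-title", "text": "Red Mug"},
    {"parent_hash": "b2", "class": "product-price", "text": "$14.50"},
    {"parent_hash": "b2", "class": "badge", "text": "New"},
]

def scrape_pipeline(elements, now=None):
    now = now or datetime.now(timezone.utc)
    # Filter: keep only the elements we care about, and map them to
    # the final column names (the "rename" stages).
    wanted = {"product-title": "name", "product-price": "price"}
    kept = [e for e in elements if e["class"] in wanted]
    # Group by parent hash; _sequence records the order of first appearance.
    products = {}
    for e in kept:
        row = products.setdefault(e["parent_hash"], {"_sequence": len(products)})
        row[wanted[e["class"]]] = e["text"]
    # Sort by sequence to preserve page order, then add a timestamp
    # and keep only the final columns.
    ordered = sorted(products.values(), key=lambda r: r["_sequence"])
    stamp = now.isoformat()
    return [
        {"name": r.get("name"), "price": r.get("price"), "timestamp": stamp}
        for r in ordered
    ]
```

Calling `scrape_pipeline(elements)` yields one row per product with `name`, `price`, and `timestamp` columns, in the order the products appeared in the input.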
Robot explained
The robot itself is straightforward and showcases how to use Python to extend its capabilities. Communication with Crul is wrapped in a multipurpose class that can be reused in other robots. With that in place, there are two key pieces.
This part gets a query string from a text file, calls the Crul wrapper and creates a table from the results.
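As a rough idea of what such a wrapper could look like, here is a minimal sketch using only the Python standard library. It is not the robot's actual class. The names `CrulClient` and `load_query` are hypothetical, the endpoint is left as a parameter, and the request body shape (`{"query": ...}`), the use of the key as the `Authorization` header value, and the `results` key in the response are assumptions to verify against the Crul API documentation.

```python
import json
import urllib.request
from pathlib import Path

def load_query(path):
    """Read the Crul query string from a text file such as crul-query.txt."""
    return Path(path).read_text(encoding="utf-8").strip()

class CrulClient:
    """Hypothetical minimal wrapper around a Crul query endpoint."""

    def __init__(self, endpoint, apikey):
        self.endpoint = endpoint  # assumption: full URL of the query endpoint
        self.apikey = apikey      # expected in the form "crul <key>"

    def build_request(self, query):
        # Assumption: the API accepts a JSON body with a "query" field
        body = json.dumps({"query": query}).encode("utf-8")
        return urllib.request.Request(
            self.endpoint,
            data=body,
            method="POST",
            headers={
                "Authorization": self.apikey,
                "Content-Type": "application/json",
            },
        )

    def run(self, query, timeout=120):
        # Sends the request and returns the result rows as a list of dicts.
        # Assumption: rows are found under a "results" key in the response.
        with urllib.request.urlopen(self.build_request(query), timeout=timeout) as resp:
            payload = json.load(resp)
        return payload.get("results", [])
```

Keeping request construction separate from sending makes the wrapper easy to check without network access, and the list of dicts returned by `run` can be turned into a table directly.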
This for loop iterates through the data and appends the rows to an Excel file.
Running it
You can edit the Crul query in the crul-query.txt file, but apart from that, running it is straightforward. In VS Code, hit the magic command and choose "Run Robot" (make sure you have our extensions installed), or set up a Process in the Control Room and run it there.
Technical information
Last updated
16 December 2022
License
Apache License 2.0