Desktop automation and RPA
Despite the ever-growing popularity of web-based applications in the workplace, many business processes still involve desktop applications for reasons of legacy, security, or hardware needs. Being able to control these types of applications programmatically opens up a world of automation possibilities.
Desktop automation allows your robot to accomplish tasks acting like a human operator, directly controlling a desktop interface. This includes operations like opening and closing applications, simulating mouse movements and clicks, triggering keyboard keys and shortcuts, and taking screenshots.
Compared to browser automation, desktop automation is a more varied and complex field. However, the main idea stays the same across operating systems and access methods: you need a way to point the robot to specific parts of the desktop screen, and once a target is identified, the robot can be instructed to interact with it by clicking on it, typing into it, dragging it, and so on.
The Robocorp stack supports different desktop automation approaches, which differ mostly in how the "targets" for the robots are identified and managed.
In image template-based desktop automation, you provide the robot with screenshots of the parts of the interface that it needs to interact with, like a button or an input field. The images are saved together with your automation code. The robot compares each image to what is currently displayed on the screen to find its target.
Using this same technique, you can also find a specific part of the interface on the screen and then add an offset in pixels, telling the robot, for example, to "click 200 pixels to the right" of the image that you are providing.
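As a sketch of how this could look with the RPA.Desktop library, the `image:` locator can be combined with an `offset:` locator to click relative to a matched template. The file name `label.png` is a hypothetical image saved alongside the robot:

```robot
*** Settings ***
Library    RPA.Desktop

*** Tasks ***
Click Next To A Known Element
    # Find the stored screenshot on the current screen, then click
    # 200 pixels to the right of the matched region's center
    Click    image:label.png + offset:200,0
```

This kind of relative targeting is useful when the element you want to click has no distinctive appearance of its own, such as an empty input field next to a label.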
This technique enables automating environments like Citrix and other remote terminals where you don't have access to the target machine itself, but effectively only to a "video stream" of the desktop.
Our VS Code extension provides a set of UI tools to take, manage, and store image-based locators in your automation projects.
When using this approach, these are some of the challenges you should be aware of:
- System settings can impact the recognition of the images: How the interface elements look on a screen depends on system settings like color schemes, transparency, and system fonts. Images taken on one system might look different on the target system, and the robot might not recognize them, stopping the process.
- Screen resolution is a factor: A different screen resolution might cause elements on the screen to move around or change in size.
- Different versions of the same operating system can differ visually: Operating systems provide the general guidelines of how the interface elements are drawn on the screen. If the operating system is updated, image templates might stop being recognized.
To mitigate these types of issues and make your automation less fragile, we recommend:
- sticking to default settings for fonts and colors
- using accessibility options to reduce visual effects like shadows and transparencies
- if possible, using the target machine to take the locator images to ensure that all settings are the same.
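A minimal image template-based task with the RPA.Desktop library could look like the following sketch, assuming a screenshot named `search_field.png` has been saved in the project:

```robot
*** Settings ***
Library    RPA.Desktop

*** Tasks ***
Fill In The Search Field
    # Match the stored screenshot against the current screen and click it
    Click    image:search_field.png
    # Type into the now-focused field and submit
    Type Text    monthly report
    Press Keys    enter
```

If the template cannot be matched, the keyword fails, which is where the system-settings and resolution caveats above come into play.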
If you are automating a Microsoft Windows application, instead of using images, you can try to target the actual UI elements within it, referring to them by their identifiers.
To get to the identifiers, you can inspect the running application using Accessibility Insights.
If it is available to you, this approach will make your automation less fragile. The identifiers will stay the same across operating system versions and are not impacted by screen resolution or other visual settings.
Unfortunately, not all Microsoft Windows applications can be inspected, and your results might vary depending on the framework used to develop the application. Also, if you are accessing the system remotely using Citrix or a similar protocol, this option will not be available to you, and you will have to fall back on image-based automation.
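As an illustrative sketch, identifier-based automation with the RPA.Desktop.Windows library could look like this. The element names here are hypothetical; the actual values are the ones Accessibility Insights shows for your application and may differ between application versions:

```robot
*** Settings ***
Library    RPA.Desktop.Windows

*** Tasks ***
Add Numbers With Calculator
    # Start the application and attach to its window by title
    Open Executable    calc.exe    Calculator
    # Target UI elements by the identifiers reported by the
    # accessibility framework instead of by their appearance
    Mouse Click    name:One
    Mouse Click    name:Plus
    Mouse Click    name:Two
    Mouse Click    name:Equals
```

Because the locators reference the accessibility tree rather than pixels, this robot is unaffected by themes, fonts, or screen resolution.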
Another available option is to create locators using OCR (Optical Character Recognition). Using this approach, you can find elements on the screen by their textual content. For example, you could find the "Send" button by telling the robot to click wherever on the screen the "Send" text appears.
This approach is similar to the image template-based one and shares some of the same weaknesses:
- You have to make sure that the text appears only once on the screen, so choose your targets wisely.
- Test that the OCR engine can find and correctly recognize the text: it might be "defeated" by system settings like opacity, shadows, low contrast, etc.
In general, OCR locators are quite fragile and should be used as a last resort. The primary use case for OCR capabilities in desktop automation is reading text information from specified screen regions.
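A short sketch of both OCR use cases with the RPA.Desktop library follows. The `ocr:` locator clicks on recognized text, and `Read Text` extracts text from a locator such as a screen region; the region coordinates here are hypothetical:

```robot
*** Settings ***
Library    RPA.Desktop

*** Tasks ***
Use Text-Based Locators
    # Click wherever the text "Send" is recognized on the screen;
    # this assumes the text appears exactly once
    Click    ocr:Send
    # Read the text content of a fixed screen region
    # (left, top, right, bottom in pixels)
    ${status}=    Read Text    region:0,0,400,200
    Log    ${status}
```

The reading use case is generally more robust than the clicking one, since a slightly imprecise match still yields usable text.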
In most desktop applications, a lot can be accomplished by using keyboard shortcuts, and since we can control the keyboard, we can use them directly. Check the documentation for the application you are automating to see what is available.
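Driving an application through its keyboard shortcuts could look like this sketch using the RPA.Desktop library, assuming the application under automation supports the common Ctrl+S shortcut:

```robot
*** Settings ***
Library    RPA.Desktop

*** Tasks ***
Save Via Keyboard Shortcut
    # Trigger the application's Ctrl+S shortcut instead of
    # hunting for the Save button on the screen
    Press Keys    ctrl    s
    # Navigate the resulting dialog with keys alone
    Press Keys    tab
    Press Keys    enter
```

Keyboard-driven automation sidesteps most locator fragility entirely, which is why it is worth checking the application's shortcut documentation first.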
Check this Windows desktop application robot article for an example.
Here's a video of automating a video game using keyboard shortcuts:
RPA Framework, the set of open-source libraries that Robocorp develops, supports all of the approaches described above.
For image-based desktop automation, you should use the cross-platform RPA.Desktop library, which provides:
- Mouse and keyboard input emulation
- Starting and stopping applications
- Finding elements through image template matching
- Scraping text from given regions
- Taking screenshots
- Clipboard management
- OCR selector support
For desktop automation using textual locators that target UI element identifiers in Microsoft Windows, you can use the RPA.Desktop.Windows library, which also provides Windows-specific keywords to open and close applications.
If you are automating macOS applications, in addition to the functionality provided by the RPA.Desktop library, you can take advantage of the automation capabilities included with the operating system itself. For example, using the Run keyword from the OperatingSystem Robot Framework library, which allows you to run arbitrary commands on your Mac, you can run AppleScript instructions via the osascript command or trigger Automator workflows.
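As a sketch, running an AppleScript snippet from a robot could look like this. The "Maps" application name is just an example target:

```robot
*** Settings ***
Library    OperatingSystem

*** Tasks ***
Open Maps With AppleScript
    # Run an AppleScript instruction through the macOS osascript command
    Run    osascript -e 'tell application "Maps" to activate'
```

The same Run keyword can invoke an Automator workflow via the `automator` command-line tool.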
Once you have your desktop automation working locally and in the target system, you are ready to benefit from the orchestration and hosting features of Control Room.
Control Room provides a centralized place to manage your robot code. You can then trigger robots, get reports and traceability, set up access control, and more.
Once you have identified the machine that the automation will run on (it could be the same computer you used for development, or, more likely, another physical or virtual machine), select the best setup for running it by following the guidance in Find the correct setup for you. Check out Setup for Windows Desktop automation as well to make sure that your Worker will run correctly.
Also, using the Assistant feature of Control Room, you can set up attended desktop automation workflows where robots and human operators work together to accomplish a task.
Here are complete robot examples demonstrating desktop automation with the Robocorp stack:
Simple Windows Calculator Robot
A very simple robot that just interacts with Windows 10 Calculator in different ways.
Windows Desktop App Robot
This software robot opens the Spotify desktop application, searches for the given song, and plays the song. The robot demonstrates the basic Windows-automation capabilities of the RPA Framework, using keyboard navigation.
Travel Directions Desktop Automation Robot On Mac OS
This example robot demonstrates the use of image templates and keyboard shortcuts to find travel directions between two random locations on Earth using the macOS Maps application. Also, it demonstrates the use of desktop automation and browser automation combined in one robot.
Desktop Automation With Image Recognition And OCR
This robot demonstrates automating a desktop application with image recognition and OCR, interacting with the open-source accounting software GnuCash.