Environment validation and hardening

The Problem

A common concern with any software with external dependencies is security:
How do you verify that the dependencies do not bring in malware or unwanted content?

This concern tends to get associated with open-source libraries. In fact, it does not matter whether the dependency is a proprietary Windows DLL, a C++ library, or an open-source Python library: implementing any software requires basic vigilance about the security of the sources used. With open source, the concerns are at least easier to investigate, because anyone can read the code and analyze it at whatever depth they need.

However, checking these things manually takes so much effort that automation is needed. The Robocorp toolchain and RCC provide some crucial help in putting the different methods and tools to use.

Manage your dependencies

To begin with, you need to know what dependencies you are using. Versioning is a crucial part of this, because pinned versions let you control when things update.

👉 pip install something gives you ANY version of the package named "something", so the result can differ from day to day or even hour to hour. Please do not do that 😉

The fact that the dependencies of your robot are listed as code in conda.yaml next to your robot is already significantly more secure than having them listed in some setup guide that relies on multiple people to run them in the same way. So even just using the templates and our basic robot structure, you are in good shape.
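
As a small illustration of why pinning matters, the sketch below flags dependency entries that do not pin one exact version. It is a hypothetical helper, not part of RCC: the function name unpinned and the sample entries are assumptions, and it only understands the simple name=version (conda) and name==version (pip) forms.

```python
import re

# Matches "name=1.2.3" (conda style) or "name==1.2.3" (pip style).
# Ranges (>=, <, ~=), wildcards, build strings, and entries without a
# version are all treated as unpinned in this simplified check.
_PINNED = re.compile(r"^[A-Za-z0-9_.\-]+={1,2}[0-9][A-Za-z0-9_.\-+!]*$")


def unpinned(specs):
    """Return the dependency specs that do not pin one exact version."""
    return [spec for spec in specs if not _PINNED.match(spec.strip())]


if __name__ == "__main__":
    # Hypothetical entries, as they might appear in a conda.yaml
    example = ["python=3.9.13", "rpaframework==19.4.2", "requests", "numpy>=1.20"]
    for spec in unpinned(example):
        print(f"not pinned: {spec}")
```

A check like this could run in a review step before a robot is published, so that an unpinned entry is caught before it ever reaches a production machine.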

Locking down all dependencies using the freeze files

The next step in locking down your dependencies is provided out of the box by RCC. If you just run your robot, the output folder will get a file with a name like environment_windows_amd64_freeze.yaml. This is your environment freeze file: a fully resolved list of the libraries loaded into the environment, with fully locked-down version numbers. If you build your environment from this YAML file, you are as close as you can get to reproducing the exact environment, with even the sub-dependencies locked down.

The following steps are needed to "freeze" the dependencies:

  1. Copy the freeze file to the root of your robot (or the folder where you keep the conda.yaml)
  2. Edit/check that your robot.yaml has the following section pointing to your environment YAML files:
    environmentConfigs:
      - environment_windows_amd64_freeze.yaml
      - environment_linux_amd64_freeze.yaml
      - environment_darwin_amd64_freeze.yaml
      - conda.yaml
    • RCC handles the configs above in order.
    • If a freeze file that matches your operating system exists, it is used first.

That's it. The next run of your bot will be running on an environment where you see exactly what libraries and versions were used.
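
The "in order, first match wins" rule can be pictured with a short sketch. This is only an illustration of the behavior described above, not RCC's actual implementation: the function pick_config is hypothetical, and it assumes freeze files follow the environment_<os>_<arch>_freeze.yaml naming seen in the output folder, while any other file (such as conda.yaml) applies to every platform.

```python
import re
from pathlib import Path

# Assumed naming pattern for freeze files: environment_<os>_<arch>_freeze.yaml
_FREEZE = re.compile(r"^environment_([a-z0-9]+)_([a-z0-9]+)_freeze\.yaml$")

# The list from the robot.yaml section above, in the same order
CONFIGS = [
    "environment_windows_amd64_freeze.yaml",
    "environment_linux_amd64_freeze.yaml",
    "environment_darwin_amd64_freeze.yaml",
    "conda.yaml",
]


def pick_config(configs, robot_dir, os_name, arch):
    """Return the first config that exists and fits the given platform."""
    for name in configs:
        match = _FREEZE.match(Path(name).name)
        if match and match.groups() != (os_name, arch):
            continue  # freeze file for a different platform, skip it
        if (Path(robot_dir) / name).is_file():
            return name
    return None  # nothing usable was found


if __name__ == "__main__":
    # Shows which file would be chosen for a Windows machine in this folder
    print(pick_config(CONFIGS, ".", "windows", "amd64"))
```

With this ordering, keeping conda.yaml as the last entry gives a fallback for platforms that have no freeze file yet.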

If you need to update libraries, update the conda.yaml, delete the freeze files, rerun the bot, and take the updated freeze file from the output folder. See the RCC documentation for details.
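
The "delete the freeze files" part of an update is easy to get wrong by hand when a robot has freeze files for several platforms. Below is a minimal sketch of a helper for that step. It is not an RCC feature; the function name remove_freeze_files is hypothetical, it assumes the freeze files sit directly in the robot folder, and it defaults to a dry run so nothing is deleted by accident.

```python
from pathlib import Path


def remove_freeze_files(robot_dir, dry_run=True):
    """Return the freeze file names found in robot_dir; delete them unless dry_run."""
    found = sorted(Path(robot_dir).glob("environment_*_freeze.yaml"))
    if not dry_run:
        for path in found:
            path.unlink()
    return [path.name for path in found]


if __name__ == "__main__":
    # Dry run: only lists what would be removed from the current folder
    print(remove_freeze_files(".", dry_run=True))
```

After the stale files are removed, the next run resolves the environment from conda.yaml again and writes a fresh freeze file to the output folder.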

Pre-build Environments

Pre-building your environments is pretty much the ultimate level you can get to when making sure the environment for the robot is exactly what you want.

RCC has supported the Shared Holotree since version v11.14.3. It lets you pre-build environments and move them as zip files to the executing machines, so that all users on the target machine get their environments from a single cache source. RCC provides environment import and export commands to handle these actions.

This feature enables some massive things:

  • Your production machines do not need to spend time building the environment if you deliver it to them beforehand
  • If your environment has things from private PyPIs etc., you can build the environment in a separate machine that has access to those and deploy the resulting environment.
  • ...and probably the most significant feature for security: you can run any available virus scanning or validation tools on the environment zip.

Using the pre-built environment means that only the exact files in the environment are used when running the robot. The limitations are that both the building machine and the executing machine must have the Shared Holotree enabled, and the operating system must be the same.

You have to pre-build the environments on the same operating system as the target system.
Windows for Windows, Mac for Mac, Linux for Linux. This is due to platform-specific dependencies.
If you only target one platform, like Windows, you do not need to pre-build for all platforms.

To export your environment:

  1. Build your environment once on the source machine
    • Running your bot always builds the environment, but you can also build only the environment by calling:
      rcc ht vars <your conda.yaml>
  2. Export the environment into a .zip file:
    • RCC v11.20.0 or later can export the environment just by pointing to your robot's robot.yaml:
      rcc ht export --robot robot.yaml
    • You can also export environments based on the hash of your conda.yaml:
      rcc ht vars <your conda.yaml>
      rcc ht export <hash from the previous command>
    • The result will be a hololib.zip file in the folder where you ran the command
  3. Move the hololib.zip to the target machine and import it:
    • rcc ht import hololib.zip
    • As an example: On Windows, a Playwright environment that contains browsers is ~950MB on disk, but when exported, the zip is ~400MB
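
Because the exported environment is a single zip file, it is straightforward to confirm that the file arriving on the target machine is byte-for-byte the one that was built and scanned. The sketch below is one way to do that with a SHA-256 checksum. It is not an RCC feature; the function names sha256_of and verify are hypothetical, and how the expected checksum travels to the target machine is left to your own process.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

On the build machine you would record sha256_of("hololib.zip") after scanning the file, and on the target machine call verify("hololib.zip", recorded_value) before running the import command.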

🚀 You are done.

If you now run the bot on the target machine, it should jump to execution without the build phases.

RCC has a lot of extra features, like exporting multiple environments at the same time, and more features are coming to make this easier to use, so it is always good to follow the documentation in the RCC repository directly.

This is quite an epic feature and has been a bit of an epic quest to get done, so feedback is welcome, and we always welcome stars on the RCC repository if you like it 😉

Last edit: October 14, 2022