Retry logic

Sometimes a process, or a part of a process, fails. There can be many reasons for failure.

Possible reasons for failure

The system under automation might be temporarily unresponsive when the software robot tries to log in, press a button to proceed to the next step in the process, or scrape data from a web element that has not loaded yet.

Unrecoverable failures

Some failures require human intervention before the process can proceed. In those cases, the software robot can do little other than report the failure and continue with the other tasks in the work item queue.
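As a minimal sketch of "report and continue", the Python below processes a queue and records failures that need a human instead of stopping. The names NeedsHumanError, process_queue, and handle_invoice, and the invoice IDs, are hypothetical and chosen only for illustration. They do not come from any particular automation library.

```python
class NeedsHumanError(Exception):
    """Hypothetical error type for failures the robot cannot fix itself."""


def process_queue(items, handle_item):
    """Run handle_item on every item; record unrecoverable failures and move on."""
    failures = []
    for item in items:
        try:
            handle_item(item)
        except NeedsHumanError as error:
            # Report the failure, then continue with the next work item.
            failures.append((item, str(error)))
    return failures


def handle_invoice(invoice_id):
    # Illustrative step: one invoice has data only a human can correct.
    if invoice_id == "INV-2":
        raise NeedsHumanError("customer record is missing")


failed = process_queue(["INV-1", "INV-2", "INV-3"], handle_invoice)
print(failed)
```

Only the item that needs attention ends up in the failure list. The other items are processed normally.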

Recoverable (temporary) failures

Some failures are temporary. In those cases, it is better to retry the failed step or steps than to give up immediately. Giving up means that a human needs to get involved, which reduces the benefits of process automation.

Identify potential points of failure

Identify the parts of the process flow where the system under automation might fail.

Document the points of failure

Document the identified potential failure points. For each one, determine whether retrying the task or step could enable the software robot to proceed with the task execution.

Implement retry logic case-by-case

Implement retry logic for every action that might fail and for which a retry might remedy the situation.

Typical retry logic includes deciding how many times to retry and how long to wait before each retry. Both values vary case by case.

See the documentation of the Robot Framework keyword Wait Until Keyword Succeeds for an example of retrying.
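The two decisions, how many times to retry and how long to wait, can also be sketched in plain Python. This is an illustrative helper, not the implementation of any Robot Framework keyword. The function name retry, its parameters, and the flaky_click step are assumptions made for the example. The sleep parameter exists only so the example can run without real waiting.

```python
import time


def retry(action, attempts=3, wait_seconds=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call action until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the last error.
            sleep(wait_seconds)  # Give the system time to recover.


calls = []
waits = []


def flaky_click():
    # Illustrative step that fails twice before succeeding.
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError("element not loaded yet")
    return "clicked"


# waits.append stands in for time.sleep so the example finishes instantly.
result = retry(flaky_click, attempts=5, wait_seconds=0.5,
               retry_on=(TimeoutError,), sleep=waits.append)
print(result, len(calls), waits)
```

Restricting retry_on to the errors known to be temporary keeps the helper from retrying failures that only a human can resolve.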

Avoid failures in the first place

The system under automation might have known outages due to production deployments. If the deployment times are scheduled, the best solution is to schedule the software robot to execute its tasks outside those outages.
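When the scheduler itself cannot exclude the outage, a robot could check the clock before starting. The sketch below assumes a hypothetical daily deployment window from 02:00 to 03:30. The constants and the function name in_deployment_window are illustrative, and real windows would come from the deployment schedule of the system under automation.

```python
from datetime import datetime, time

# Hypothetical deployment window: every day from 02:00 to 03:30 local time.
DEPLOY_START = time(2, 0)
DEPLOY_END = time(3, 30)


def in_deployment_window(now, start=DEPLOY_START, end=DEPLOY_END):
    """Return True if now falls inside the window, including windows past midnight."""
    current = now.time()
    if start <= end:
        return start <= current < end
    # The window wraps around midnight, for example 23:00 to 01:00.
    return current >= start or current < end


if in_deployment_window(datetime.now()):
    print("Deployment in progress, skipping this run.")
else:
    print("Outside the deployment window, safe to start.")
```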

Report failures that could not be resolved

Sometimes retry attempts fail, too. In these cases, the software robot should report the failures in a way that lets a human step in and remedy the situation.
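One way to make such a report useful is to keep the error from every attempt, so that the person stepping in can see what was tried. The following sketch uses Python's standard logging module. The function run_step_with_report, the report layout, and the always_down step are assumptions for illustration, and waiting between attempts is left out for brevity.

```python
import logging

logger = logging.getLogger("robot")


def run_step_with_report(step_name, action, attempts=3):
    """Try action up to attempts times; return (True, result) or (False, report)."""
    errors = []
    for attempt in range(1, attempts + 1):
        try:
            return True, action()
        except Exception as error:  # Broad on purpose: every failure is recorded.
            errors.append(f"attempt {attempt}: {type(error).__name__}: {error}")
    # All attempts failed: collect what a human needs to investigate.
    report = {"step": step_name, "attempts": attempts, "errors": errors}
    logger.error("Step %r failed after %d attempts: %s", step_name, attempts, errors)
    return False, report


def always_down():
    # Illustrative step that never succeeds.
    raise ConnectionError("login page unavailable")


ok, report = run_step_with_report("log in", always_down, attempts=2)
print(ok, report["errors"])
```

The returned report could then be attached to the failed work item or sent through whatever notification channel the team already uses.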

The point of retrying

The point of process automation is to reduce the need for a human to do manual tasks. The software robot should try its best to recover from temporary failures to minimize the need for human intervention. Many of the typical failures can be overcome by waiting a bit and trying again.