RPA.DocumentAI.Base64AI

module RPA.Base64AI

class RPA.DocumentAI.Base64AI.Base64AI

Library to support the Base64.ai service for intelligent document processing (IDP).

The library requires rpaframework version 19.0.0 or later.

The service supports identifying fields in documents, which can be provided to it in multiple file formats or via URL.

Robot Framework example usage

Python example usage

from RPA.DocumentAI.Base64AI import Base64AI
from RPA.Robocorp.Vault import Vault

secrets = Vault().get_secret("base64ai-auth")
baselib = Base64AI()
baselib.set_authorization(secrets["email-address"], secrets["apikey"])
result = baselib.scan_document_file(
    "invoice.pdf",
    model_types="finance/invoice,finance/check/usa",
)
for r in result:
    print(f"Model: {r['model']}")
    for key, props in r["fields"].items():
        print(f"FIELD {key}: {props['value']}")
    print(f"Text (OCR): {r['ocr']}")

Portal example: https://github.com/robocorp/example-idp-base64


variable BASE_URL

BASE_URL = 'https://base64.ai'

variable ROBOT_LIBRARY_DOC_FORMAT

ROBOT_LIBRARY_DOC_FORMAT = 'REST'

variable ROBOT_LIBRARY_SCOPE

ROBOT_LIBRARY_SCOPE = 'GLOBAL'

method filter_matching_signatures

filter_matching_signatures(match_response: Optional[Union[Dict[Hashable, Optional[Union[str, int, float, bool, list, dict]]], List[Optional[Union[str, int, float, bool, list, dict]]], str, int, float, bool, list, dict]], confidence_threshold: float = 0.8, similarity_threshold: float = 0.8)

Goes through all the recognized signatures in the queried image and returns only those passing the confidence and similarity thresholds.

Additionally, this keyword simplifies the original match_response input structure and returns a dictionary with all the detected and accepted reference signatures as keys, and lists of sufficiently similar query signatures as values.

  • Each reference signature (key) is a tuple of (index, coordinates).
  • Each query signature (sub-value) is a dictionary of {index, coords, similarity}.
  • The coordinates describe the bounding box enclosing the detected signature portion of the original image, given as (left, top, right, bottom).
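
The following sketch walks a dictionary with the structure described above and derives each reference signature's box size from its (left, top, right, bottom) coordinates. The sample values are hypothetical and use plain tuples for coordinates. The keyword's real output may differ in detail, such as the numeric types.

```python
from typing import Dict, List, Tuple

# (left, top, right, bottom), as documented for Filter Matching Signatures
Box = Tuple[float, float, float, float]


def box_size(coords: Box) -> Tuple[float, float]:
    """Return (width, height) of a (left, top, right, bottom) bounding box."""
    left, top, right, bottom = coords
    return right - left, bottom - top


# Hypothetical sample shaped like the documented output:
# {(ref_index, ref_coords): [{"index": ..., "coords": ..., "similarity": ...}]}
matches: Dict[Tuple[int, Box], List[dict]] = {
    (0, (10, 20, 110, 60)): [
        {"index": 1, "coords": (5, 5, 85, 45), "similarity": 0.91},
    ],
}

for (ref_index, ref_coords), similar in matches.items():
    width, height = box_size(ref_coords)
    print(f"Reference #{ref_index}: {width}x{height}")
    for sig in similar:
        print(f"  query #{sig['index']} similarity={sig['similarity']}")
```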

Use the original match_response object together with the indexes from this output if you need extra details that are not included here (e.g. the confidence score). Use the Get Signature Image keyword to save and preview the image crop belonging to the signature of your choice.

Parameters
  • match_response – The raw JSON-like response retrieved with the Get Matching Signatures keyword.
  • confidence_threshold – The minimum accepted confidence score (0.0-1.0) for a candidate to be considered a signature (to avoid false positives).
  • similarity_threshold – The minimum accepted similarity score (0.0-1.0) for a query signature to be considered alike (to discard different or fraudulent signatures).
  • Returns: A dictionary of accepted reference signatures and their similar ones found in the queried image.
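
The sketch below shows the idea behind the two thresholds. A candidate is kept only if it is both confident enough to be a signature and similar enough to the reference. It runs on a hypothetical, simplified list of candidates, not the raw API layout, and it assumes that both comparisons are inclusive (>=).

```python
from typing import List


def passes_thresholds(
    candidates: List[dict],
    confidence_threshold: float = 0.8,
    similarity_threshold: float = 0.8,
) -> List[dict]:
    """Keep candidates that clear both thresholds (inclusive, an assumption)."""
    return [
        c
        for c in candidates
        if c["confidence"] >= confidence_threshold
        and c["similarity"] >= similarity_threshold
    ]


# Hypothetical, simplified candidates (NOT the raw API response layout).
candidates = [
    {"index": 0, "confidence": 0.95, "similarity": 0.90},  # kept
    {"index": 1, "confidence": 0.40, "similarity": 0.99},  # probably not a signature
    {"index": 2, "confidence": 0.97, "similarity": 0.30},  # likely a different signer
]
kept = passes_thresholds(candidates)
print([c["index"] for c in kept])
```

Lowering the thresholds trades more false positives for fewer missed matches.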

Example: Robot Framework

Example: Python

# `sigs` is the response returned by get_matching_signatures
matches = lib.filter_matching_signatures(sigs)
print(matches)
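
Once the filtered dictionary is available, a common next step is to pick the most similar query signature for each reference signature. The helper `best_matches` below is hypothetical and not part of the library. It runs on sample data shaped like the documented output. It also assumes a reference signature may end up with an empty list, and maps such entries to None.

```python
from typing import Dict, List, Optional, Tuple


def best_matches(matches: Dict[Tuple, List[dict]]) -> Dict[int, Optional[dict]]:
    """Map each reference index to its most similar query signature, if any."""
    best: Dict[int, Optional[dict]] = {}
    for (ref_index, _coords), similar in matches.items():
        best[ref_index] = max(similar, key=lambda s: s["similarity"], default=None)
    return best


# Hypothetical sample shaped like the documented filter output.
matches = {
    (0, (10, 20, 110, 60)): [
        {"index": 0, "coords": (5, 5, 85, 45), "similarity": 0.84},
        {"index": 2, "coords": (40, 90, 130, 140), "similarity": 0.93},
    ],
    (1, (200, 20, 300, 70)): [],
}
print(best_matches(matches))
```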

method get_fields_from_prediction_result

get_fields_from_prediction_result(prediction: Optional[Union[Dict[Hashable, Optional[Union[str, int, float, bool, list, dict]]], List[Optional[Union[str, int, float, bool, list, dict]]], str, int, float, bool, list, dict]])

Helper keyword to get the found fields from a prediction result. For an example, see the Scan Document File or Scan Document URL keyword.

  • Parameters: prediction – prediction result dictionary
  • Returns: list of found fields
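
The exact format of the returned list is not specified here. The hypothetical helper below does not reproduce the keyword. It only shows how a flat key/value list can be built from the prediction shape used in the scan examples, where each entry of `prediction["fields"]` carries a "value". The field names in the sample are made up.

```python
from typing import Any, Dict, List


def fields_as_pairs(prediction: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten prediction["fields"] into [{"key": ..., "value": ...}, ...]."""
    return [
        {"key": key, "value": props.get("value")}
        for key, props in prediction.get("fields", {}).items()
    ]


# Hypothetical prediction, shaped like one item of a scan result.
prediction = {
    "fields": {
        "invoiceNumber": {"value": "1120"},
        "total": {"value": 250.0},
    },
}
print(fields_as_pairs(prediction))
```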

method get_matching_signatures

get_matching_signatures(reference_image: Union[Path, str], query_image: Union[Path, str])

Returns the signatures from the reference image that match ones found in the queried image.

The input images can be paths to the files or URLs.
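
Because an input can be either a local path or a URL, the caller or the library has to tell the two apart. The hypothetical helper below sketches one way to do that with the standard library. It treats `http`/`https` strings as URLs and anything else, including `Path` objects, as a local file. This is an illustration, not the library's internal logic.

```python
from pathlib import Path
from typing import Union
from urllib.parse import urlparse


def is_url(image: Union[Path, str]) -> bool:
    """Treat http(s) strings as URLs; everything else as a local file path."""
    if isinstance(image, Path):
        return False
    return urlparse(image).scheme in ("http", "https")


print(is_url("https://example.com/check.png"))  # True
print(is_url("driving-license.jpg"))  # False
print(is_url(Path("signed-check.png")))  # False
```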

The output JSON-like dictionary contains all the details from the API, such as the detected signatures in both the reference and the query image and, for each signature, its bounding-box geometry, confidence and similarity score. Use the Filter Matching Signatures keyword on this value to get a simpler structure.

Parameters
  • reference_image – The reference image (jpg/png) to check query signatures against (e.g. driving license, ID card).
  • query_image – The query image containing signatures similar to the ones in the reference image (e.g. signed contract, bank check).
  • Returns: A JSON-like dictionary describing the recognized signatures and how closely they resemble each other.

Example: Robot Framework

Example: Python

from RPA.DocumentAI.Base64AI import Base64AI

lib = Base64AI()
# Authorize first with lib.set_authorization(api_email, api_key).
sigs = lib.get_matching_signatures(
    "driving-license.jpg", "signed-check.png"
)

Portal example: https://github.com/robocorp/example-signature-match-assistant


method get_signature_image

get_signature_image(match_response: Optional[Union[Dict[Hashable, Optional[Union[str, int, float, bool, list, dict]]], List[Optional[Union[str, int, float, bool, list, dict]]], str, int, float, bool, list, dict]], *, index: int, reference: bool = False, path: Optional[Union[Path, str]] = None)

Retrieves the image crop belonging to the provided index and saves it locally.

The image data itself is included in the original match_response object as base64-encoded content. This utility keyword retrieves and decodes it, then saves it to the local disk at the location given by the path parameter. By default, the searched index is treated as a query image. Enable the reference parameter to look it up among the reference images instead.

Parameters
  • match_response – The raw JSON-like response retrieved with the Get Matching Signatures keyword.
  • index – The numeric image ID found alongside the coordinates in the output of the Filter Matching Signatures keyword (the list order is stable).
  • reference – Set this to True if you are looking for a reference (not query) image instead (off by default).
  • path – Set an explicit output path (including the file name) for the locally saved image (defaults to the output directory).
  • Returns: The image path of the locally saved file.
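
The decode-and-save step described above can be sketched with the standard `base64` and `pathlib` modules. The helper `save_base64_image` is hypothetical. It also assumes the encoded string might carry a `data:image/...;base64,` prefix and strips it if present. Whether the API response actually includes such a prefix is not stated here.

```python
import base64
import tempfile
from pathlib import Path
from typing import Union


def save_base64_image(data: str, path: Union[Path, str]) -> Path:
    """Decode base64 image content and write it to `path`, returning the path."""
    # Strip an optional data-URI prefix such as "data:image/png;base64,".
    # Whether the API response carries such a prefix is an assumption.
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    out.write_bytes(base64.b64decode(data))
    return out


# Usage with fake image bytes, written to a throwaway directory.
fake = "data:image/png;base64," + base64.b64encode(b"not-a-real-png").decode("ascii")
with tempfile.TemporaryDirectory() as tmp:
    saved = save_base64_image(fake, Path(tmp) / "signatures" / "query-0.png")
    print(saved.name, saved.read_bytes())
```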

Example: Robot Framework

Example: Python

# First query signature similar to the first accepted reference signature
qry_sig = list(matches.values())[0][0]
path = lib.get_signature_image(sigs, index=qry_sig["index"])
print("Preview query signature image crop:", path)

method get_user_data

get_user_data()

Get user data, including details on credits used and credits remaining for the Base64.ai service.

The returned user data contains the following keys:

  • givenName
  • familyName
  • email
  • hasWorkEmail
  • companyName
  • numberOfCredits
  • numberOfPages
  • numberOfUploads
  • numberOfCreditsSpentOnDocuments (visible if used)
  • numberOfCreditsSpentOnFaceDetection (visible if used)
  • numberOfCreditsSpentOnFaceRecognition (visible if used)
  • hasActiveAwsContract
  • subscriptionType
  • subscriptionPeriod
  • tags
  • ccEmails
  • status
  • remainingCredits (calculated by the keyword)
  • Returns: object containing details on the API user
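
The list above only says that remainingCredits is calculated by the keyword. The sketch below shows one plausible calculation. It assumes the value equals numberOfCredits minus the three "spent" counters, with absent counters ("visible if used") counting as zero. The formula is an assumption, not confirmed behavior, and the user data is hypothetical.

```python
from typing import Any, Dict

# Counters that only appear in the user data once credits were spent on them.
SPENT_KEYS = (
    "numberOfCreditsSpentOnDocuments",
    "numberOfCreditsSpentOnFaceDetection",
    "numberOfCreditsSpentOnFaceRecognition",
)


def remaining_credits(user_data: Dict[str, Any]) -> int:
    """Total credits minus every 'spent' counter; absent counters count as 0."""
    spent = sum(user_data.get(key, 0) for key in SPENT_KEYS)
    return user_data["numberOfCredits"] - spent


# Hypothetical user data: face recognition was never used, so its key is absent.
user_data = {
    "numberOfCredits": 1000,
    "numberOfCreditsSpentOnDocuments": 120,
    "numberOfCreditsSpentOnFaceDetection": 30,
}
print(remaining_credits(user_data))  # 850
```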

Robot Framework example:

Python example:

userdata = baselib.get_user_data()
print(f"I still have {userdata['remainingCredits']} credits left")

method scan_document_file

scan_document_file(file_path: str, model_types: Optional[Union[str, List[str]]] = None, mock: bool = False)

Scan a document file. Optionally, pass model_types to target specific models.

Parameters
  • file_path – path to the file
  • model_types – single model type or list of model types
  • mock – set to True to use the /mock/scan endpoint instead of /scan
  • Returns: result of the document scan
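
How the file is transmitted is not described here. A common pattern for JSON document APIs is to send the file as a base64 data URI. The sketch below assumes that pattern: the hypothetical helper `file_to_data_uri` reads a local file and encodes it with its guessed MIME type. Treat this upload format as an assumption, not the library's confirmed request body.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def file_to_data_uri(file_path: str) -> str:
    """Read a local file and return its content as a base64 data URI."""
    mime, _encoding = mimetypes.guess_type(file_path)
    mime = mime or "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(Path(file_path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Usage with a tiny stand-in file (not a real PDF).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "invoice.pdf"
    sample.write_bytes(b"hi")
    print(file_to_data_uri(str(sample)))
```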

Robot Framework example:

Python example:

result = baselib.scan_document_file(
    "./files/Invoice-1120.pdf",
    model_types="finance/invoice,finance/check/usa",
)
for r in result:
    print(f"Model: {r['model']}")
    for key, val in r["fields"].items():
        print(f"{key}: {val['value']}")
    print(f"Text (OCR): {r['ocr']}")

method scan_document_url

scan_document_url(url: str, model_types: Optional[Union[str, List[str]]] = None, mock: bool = False)

Scan a document URL. Optionally, pass model_types to target specific models.

Parameters
  • url – valid URL to a file
  • model_types – single model type or list of model types
  • mock – set to True to use the /mock/scan endpoint instead of /scan
  • Returns: result of the document scan
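
The model_types parameter accepts either a single value or a list, and the examples pass several types as one comma-separated string. The sketch below normalizes all of these forms into a list and picks the endpoint from the mock flag, using the /scan and /mock/scan paths named above. The helpers and the payload field names ("url", "modelTypes") are hypothetical. The full request URL built from BASE_URL is not shown because its exact path is not documented here.

```python
from typing import Any, Dict, List, Optional, Union


def normalize_model_types(model_types: Optional[Union[str, List[str]]]) -> List[str]:
    """Accept None, one type, a comma-separated string, or a list of types."""
    if model_types is None:
        return []
    if isinstance(model_types, str):
        model_types = model_types.split(",")
    return [m.strip() for m in model_types if m.strip()]


def build_url_request(
    url: str,
    model_types: Optional[Union[str, List[str]]] = None,
    mock: bool = False,
) -> Dict[str, Any]:
    """Pick the endpoint from `mock` and assemble a hypothetical JSON payload."""
    endpoint = "/mock/scan" if mock else "/scan"
    payload: Dict[str, Any] = {"url": url}  # field name is an assumption
    types = normalize_model_types(model_types)
    if types:
        payload["modelTypes"] = types  # field name is an assumption
    return {"endpoint": endpoint, "json": payload}


print(build_url_request("https://example.com/doc.png", "finance/invoice,finance/check/usa", mock=True))
```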

Robot Framework example:

Python example:

result = baselib.scan_document_url(
    "https://base64.ai/static/content/features/data-extraction/models//2.png"
)
for r in result:
    print(f"Model: {r['model']}")
    for key, props in r["fields"].items():
        print(f"FIELD {key}: {props['value']}")
    print(f"Text (OCR): {r['ocr']}")

method set_authorization

set_authorization(api_email: str, api_key: str)

Set the Base64.ai request headers using the email address and key associated with the API account.

Parameters
  • api_email – email address associated with the API account
  • api_key – API key for that account
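
As a rough illustration of what "setting the request headers" can mean, the sketch below builds a header dictionary in the `ApiKey <email>:<key>` form described in Base64.ai's public API documentation. This header format and the JSON content type are assumptions to verify against the service docs. The helper `build_auth_headers` is hypothetical, not a library function.

```python
from typing import Dict


def build_auth_headers(api_email: str, api_key: str) -> Dict[str, str]:
    """Return request headers in the assumed 'ApiKey email:key' form."""
    return {
        "Content-Type": "application/json",  # assumed JSON request bodies
        "Authorization": f"ApiKey {api_email}:{api_key}",
    }


print(build_auth_headers("robot@example.com", "my-api-key"))
```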

Robot Framework example:

Python example:

from RPA.DocumentAI.Base64AI import Base64AI
from RPA.Robocorp.Vault import Vault

secrets = Vault().get_secret("base64ai-auth")
baselib = Base64AI()
baselib.set_authorization(secrets["email-address"], secrets["apikey"])