Analyze Document

Analyzes an input document for relationships between detected items

Arguments

Argument     Type  Default value  Description
image_file   str   None           filepath (or object name) of image file
json_file    str   None           filepath to resulting JSON file
bucket_name  str   None           if given, image_file is read from this bucket

return: analysis response in JSON

Create Bucket

Create S3 bucket with name

Arguments

Argument     Type  Default value  Description
bucket_name  str   None           name for the bucket

return: boolean indicating status of operation

Create Queue

Create queue with name

Arguments

Argument    Type  Default value  Description
queue_name  str   None           name for the queue, defaults to None

return: create queue response as dict

Delete Bucket

Delete S3 bucket with name

Arguments

Argument     Type  Default value  Description
bucket_name  str   None           name for the bucket

return: boolean indicating status of operation

Delete Files

Delete files in the bucket

Arguments

Argument     Type  Default value  Description
bucket_name  str   None           name for the bucket
files        list  None           list of files to delete

return: number of files deleted or False

Delete Message

Delete message in the queue

Arguments

Argument        Type  Default value  Description
receipt_handle  str   None           message handle to delete

return: delete message response as dict

Delete Queue

Delete queue with name

Arguments

Argument    Type  Default value  Description
queue_name  str   None           name for the queue, defaults to None

return: delete queue response as dict

Detect Document Text

Detects text in the input document.

Arguments

Argument     Type  Default value  Description
image_file   str   None           filepath (or object name) of image file
json_file    str   None           filepath to resulting JSON file
bucket_name  str   None           if given, image_file is read from this bucket

return: analysis response in JSON

Detect Entities

Inspects text for named entities, and returns information about them

Arguments

Argument  Type  Default value  Description
text      str   None           a UTF-8 text string; each string must contain fewer than 5,000 bytes of UTF-8 encoded characters
lang            en             language code of the text, defaults to "en"
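
The 5,000 limit on text is counted in bytes, not characters, so text with non-ASCII characters reaches it sooner than its length suggests. A minimal sketch of a pre-check follows; the helper names utf8_size and fits_text_limit are hypothetical and not part of the library.

```python
def utf8_size(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_text_limit(text: str, limit: int = 5000) -> bool:
    """Return True when the UTF-8 encoding of text is strictly under `limit` bytes.

    Hypothetical helper: the default of 5000 is taken from the argument
    description above, which asks for fewer than 5,000 bytes.
    """
    return utf8_size(text) < limit
```

For example, a string of 2,500 two-byte characters such as "é" already occupies 5,000 bytes and would not pass this check.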

Detect Sentiment

Inspects text and returns an inference of the prevailing sentiment

Arguments

Argument  Type  Default value  Description
text      str   None           a UTF-8 text string; each string must contain fewer than 5,000 bytes of UTF-8 encoded characters
lang            en             language code of the text, defaults to "en"

Download Files

Download files from bucket to local filesystem

Arguments

Argument          Type  Default value  Description
bucket_name       str   None           name for the bucket
files             list  None           list of S3 object names
target_directory  str   None           location for the downloaded files, default current directory

return: number of files downloaded

Get Cells

[summary]

return:[description]

Get Document Analysis

Get the results of Textract asynchronous Document Analysis operation

Arguments

Argument     Type  Default value  Description
job_id       str   None           job identifier, defaults to None
max_results  int   1000           number of blocks to get at a time, defaults to 1000
next_token   str   None           pagination token for getting next set of results, defaults to None

return: dictionary

Response dictionary has key JobStatus with value SUCCEEDED when analysis has been completed.

Examples

*** Tasks ***
# The task name below is illustrative; keywords must be indented under a task name to run.
Wait For Document Analysis
    Init Textract Client    %{AWS_KEY_ID}    %{AWS_KEY_SECRET}    %{AWS_REGION}
    ${jobid}=    Start Document Analysis    s3bucket_name    invoice.pdf
    FOR    ${i}    IN RANGE    50
        ${response}=    Get Document Analysis    ${jobid}
        Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
        Sleep    1s
    END
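
The same polling pattern can be sketched in Python. This is only an illustration under stated assumptions: wait_for_job is a hypothetical helper, and fetch stands in for whatever callable returns the response dictionary (for instance a wrapper around Get Document Analysis). Unlike the loop above, it returns None when the job never reports SUCCEEDED, so the caller can tell a timeout from a result.

```python
import time
from typing import Callable, Optional


def wait_for_job(fetch: Callable[[], dict], attempts: int = 50,
                 delay: float = 1.0) -> Optional[dict]:
    """Call fetch() until its JobStatus is SUCCEEDED or attempts run out.

    `fetch` is a hypothetical zero-argument callable returning the
    response dictionary; it is not part of the documented library.
    """
    for attempt in range(attempts):
        response = fetch()
        if response.get("JobStatus") == "SUCCEEDED":
            return response
        # Skip the pause after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```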

Get Document Text Detection

Get the results of Textract asynchronous Document Text Detection operation

Arguments

Argument     Type  Default value  Description
job_id       str   None           job identifier, defaults to None
max_results  int   1000           number of blocks to get at a time, defaults to 1000
next_token   str   None           pagination token for getting next set of results, defaults to None

return: dictionary

Response dictionary has key JobStatus with value SUCCEEDED when analysis has been completed.

Examples

*** Tasks ***
# The task name below is illustrative; keywords must be indented under a task name to run.
Wait For Document Text Detection
    Init Textract Client    %{AWS_KEY_ID}    %{AWS_KEY_SECRET}    %{AWS_REGION}
    ${jobid}=    Start Document Text Detection    s3bucket_name    invoice.pdf
    FOR    ${i}    IN RANGE    50
        ${response}=    Get Document Text Detection    ${jobid}
        Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
        Sleep    1s
    END
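
Because results arrive at most max_results blocks at a time, a large document may need several calls chained through next_token. A rough sketch of that loop follows. It assumes the response dictionary mirrors the Amazon Textract API response, with a Blocks list and an optional NextToken key; collect_blocks and fetch_page are hypothetical names, with fetch_page standing in for a call that passes the token as next_token.

```python
from typing import Callable, List, Optional


def collect_blocks(fetch_page: Callable[[Optional[str]], dict]) -> List[dict]:
    """Gather blocks from every result page by following NextToken.

    `fetch_page` is a hypothetical callable taking the pagination token
    (None for the first page) and returning one response dictionary.
    """
    blocks: List[dict] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        blocks.extend(page.get("Blocks", []))
        # Assumption: a missing or empty NextToken marks the last page.
        token = page.get("NextToken")
        if not token:
            return blocks
```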

Get Pages And Text

Get pages and text out of Textract response json

Arguments

Argument           Type  Default value  Description
textract_response  dict                 JSON from Textract

return: dictionary with page numbers as keys and a list of text lines as each value
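
To illustrate the shape of the returned dictionary, here is a minimal sketch of how such a grouping could be done. It is not the library's implementation. It assumes the Textract response layout in which each entry of Blocks carries BlockType, Text and Page, and it treats a missing Page as page 1; pages_and_text is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def pages_and_text(textract_response: dict) -> Dict[int, List[str]]:
    """Group the text of LINE blocks by page number."""
    pages: Dict[int, List[str]] = defaultdict(list)
    for block in textract_response.get("Blocks", []):
        # Only LINE blocks are kept; PAGE and WORD blocks are skipped.
        if block.get("BlockType") == "LINE":
            # Assumption: a block without a Page key belongs to page 1.
            pages[block.get("Page", 1)].append(block.get("Text", ""))
    return dict(pages)
```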

Get Tables

[summary]

return:[description]

Get Words

[summary]

return:[description]

Init Comprehend Client

Initialize AWS Comprehend client

Arguments

Argument             Type  Default value  Description
aws_key_id           str   None           access key ID
aws_key              str   None           secret access key
region               str   None           AWS region
use_robocloud_vault  bool  False          use secret stored in Robocloud Vault

Init S3 Client

Initialize AWS S3 client

Arguments

Argument             Type  Default value  Description
aws_key_id           str   None           access key ID
aws_key              str   None           secret access key
region               str   None           AWS region
use_robocloud_vault  bool  False          use secret stored in Robocloud Vault

Init Sqs Client

Initialize AWS SQS client

Arguments

Argument             Type  Default value  Description
aws_key_id           str   None           access key ID
aws_key              str   None           secret access key
region               str   None           AWS region
queue_url            str   None           SQS queue URL
use_robocloud_vault  bool  False          use secret stored in Robocloud Vault

Init Textract Client

Initialize AWS Textract client

Arguments

Argument             Type  Default value  Description
aws_key_id           str   None           access key ID
aws_key              str   None           secret access key
region               str   None           AWS region
use_robocloud_vault  bool  False          use secret stored in Robocloud Vault

List Buckets

List all buckets for this account

return: list of buckets

List Files

List files in the bucket

Arguments

Argument     Type  Default value  Description
bucket_name                       name for the bucket

return: list of files

Receive Message

Receive message from queue

return: message as dict

Send Message

Send message to the queue

Arguments

Argument            Type  Default value  Description
message             str   None           body of the message
message_attributes  dict  None           attributes of the message

return: send message response as dict

Set Robocloud Vault

Set Robocloud Vault name

Arguments

Argument    Type  Default value  Description
vault_name                       Robocloud Vault name

Start Document Analysis

Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements.

Arguments

Argument           Type  Default value    Description
bucket_name_in     str   None             name of the S3 bucket for the input object, defaults to None
object_name_in     str   None             name of the input object, defaults to None
object_version_in  str   None             version of the input object, defaults to None
bucket_name_out    str   None             name of the S3 bucket where the analysis result object is saved, defaults to None
prefix_object_out  str   textract_output  prefix for the analysis result object, defaults to textract_output

return: job identifier

The input object can be in JPEG, PNG or PDF format. Documents should be located in an Amazon S3 bucket.

By default Amazon Textract will save the analysis result internally to be accessed by keyword Get Document Analysis. This can be overridden by giving parameter bucket_name_out.
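
How these arguments might translate into a Textract request can be sketched as below. This is an assumption about the mapping, not the library's confirmed behaviour: it uses the parameter names of the Amazon Textract StartDocumentAnalysis API (DocumentLocation, FeatureTypes, OutputConfig), the FeatureTypes value is a guess, and build_analysis_request is a hypothetical helper.

```python
from typing import Optional


def build_analysis_request(bucket_name_in: str, object_name_in: str,
                           object_version_in: Optional[str] = None,
                           bucket_name_out: Optional[str] = None,
                           prefix_object_out: str = "textract_output") -> dict:
    """Build keyword arguments for a StartDocumentAnalysis call."""
    s3_object = {"Bucket": bucket_name_in, "Name": object_name_in}
    if object_version_in:
        s3_object["Version"] = object_version_in
    request = {
        "DocumentLocation": {"S3Object": s3_object},
        # Assumption: which feature types the keyword requests is not documented here.
        "FeatureTypes": ["TABLES", "FORMS"],
    }
    # Results stay inside Textract unless an output bucket is named.
    if bucket_name_out:
        request["OutputConfig"] = {
            "S3Bucket": bucket_name_out,
            "S3Prefix": prefix_object_out,
        }
    return request
```

With only the input bucket and object name given, the request carries no OutputConfig, which corresponds to the default of letting Textract keep the result internally.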

Start Document Text Detection

Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.

Arguments

Argument           Type  Default value    Description
bucket_name_in     str   None             name of the S3 bucket for the input object, defaults to None
object_name_in     str   None             name of the input object, defaults to None
object_version_in  str   None             version of the input object, defaults to None
bucket_name_out    str   None             name of the S3 bucket where the analysis result object is saved, defaults to None
prefix_object_out  str   textract_output  prefix for the analysis result object, defaults to textract_output

return: job identifier

The input object can be in JPEG, PNG or PDF format. Documents should be located in an Amazon S3 bucket.

By default Amazon Textract will save the analysis result internally to be accessed by keyword Get Document Text Detection. This can be overridden by giving parameter bucket_name_out.

Upload File

Upload single file into bucket

Arguments

Argument     Type  Default value  Description
bucket_name  str   None           name for the bucket
filename     str   None           filepath for the file to be uploaded
object_name  str   None           name of the object in the bucket, defaults to None

return: tuple of upload status and error

If object_name is not given, the basename of the file is used as object_name.
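
The fallback described above can be sketched in a few lines. The helper name resolve_object_name is hypothetical and only illustrates the rule.

```python
import os
from typing import Optional


def resolve_object_name(filename: str, object_name: Optional[str] = None) -> str:
    """Return object_name when given, otherwise the basename of filename."""
    return object_name if object_name else os.path.basename(filename)
```

So uploading /path/to/file1.txt without an object_name would store it under the name file1.txt.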

Upload Files

Upload multiple files into bucket

Arguments

Argument     Type  Default value  Description
bucket_name  str   None           name for the bucket
files        list  None           list of files (2 possible ways, see below)

return: number of files uploaded

Giving files as list of filepaths:

['/path/to/file1.txt', '/path/to/file2.txt']

Giving files as list of dictionaries (including filepath and object name):

[{'filepath': '/path/to/file1.txt', 'object_name': 'file1.txt'}, {'filepath': '/path/to/file2.txt', 'object_name': 'file2.txt'}]
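
One way to handle both forms uniformly is to convert every entry to the dictionary form first. The sketch below is illustrative only: normalize_files is a hypothetical helper, and using the file's basename as the object name for plain filepaths is an assumption borrowed from the Upload File keyword.

```python
import os
from typing import Dict, List, Union

FileSpec = Union[str, Dict[str, str]]


def normalize_files(files: List[FileSpec]) -> List[Dict[str, str]]:
    """Convert filepaths and dictionaries to filepath/object_name dictionaries."""
    normalized = []
    for item in files:
        if isinstance(item, dict):
            filepath = item["filepath"]
            # Assumption: a missing object_name falls back to the basename.
            object_name = item.get("object_name") or os.path.basename(filepath)
        else:
            filepath = item
            object_name = os.path.basename(filepath)
        normalized.append({"filepath": filepath, "object_name": object_name})
    return normalized
```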