Fetching Data in UiPath
It All Begins with Data
I started my career in IT developing database solutions using Microsoft Access. I quickly learned that under the hood, every program is just a high-speed data processor. Every program takes data as an input, does something to it, and produces output. No matter how it is presented to the user, whether a spreadsheet, a form, paragraphs, reports, images, emails, or something else, there is data behind it.
I loved organizing data. When I talked to clients about what they wanted the system to do, I’d be talking in English, but thinking in data. I’d have dreams about tables and relationships. I learned to always start with data and I still think that way.
Let’s look at UiPath from a data layer perspective. UiPath has so many built-in Activities to fetch, manipulate, and output data that it’s easy to lose the forest through the trees. In this post, let’s focus on the input side of a process. Let’s categorize the most common types of data sources that UiPath can use as input, and some of the common UiPath Activities available to fetch each type.
The Data Journey
No matter the type of data source, the journey of a piece of data is the same for every process. First and foremost, a connection to the responding system is established, if it’s not local. Second, a command is given to fetch the data. The request may take many forms: an HTTP request, a read file command, or a form presented to a human, to name a few. Every system has a way to receive these requests and reply to them, (even humans). The asking system then waits for a response. Assuming it did not timeout, the responding system bundles up the response and sends it back to the requestor in some pre-defined format. Once the data is received from the source, it is temporarily stored in memory, to pass along to the rest of the process, where the data will be transformed in some way to produce the output.
Types of Data Sources in UiPath
User Input
There’s just no way around it, sometimes we need humans to answer questions. The most straightforward way to get an answer from a human is to present them with some kind of a form, and wait for them to hit submit. Humans can handle very simple forms, but are unfortunately very error-prone, which must be anticipated and handled. Input screens must be presented in the simplest way possible.
When fetching | Activity | Returned As | Tips |
---|---|---|---|
Single Input | Input Dialog | String | |
Multiple Inputs | Show Form, Get Form Values [Form.Activities] | Dictionary (Of String, String) | use global variable to store form values |
System Info
Another source of data is the computer itself. Computers are good at knowing the current date, time, timestamps, starting a timer etc. You can fetch system information such as the current user, file information, and the current environment variables. This is normally done through code, which you can assign to a variable. Here are a few VB examples.
When Fetching | Code or Activity | Returned As |
---|---|---|
Current Date | Date.Today | System.Date |
Timestamp | DateTime.Now | System.DateTime |
Current User | Environment.UserName | String |
Domain Name | Environment.UserDomainName | String |
File Info | Get File Info Activity | System.IO.FileInfo (Object) |
Current Directory | Environment.CurrentDirectory | String |
Folder Info | Get Folder Info Activity | System.IO.DirectoryInfo (Object) |
Spreadsheets
There are many advantages to using spreadsheets to store records of data. Basically a stand-alone table, a spreadsheet file is easy to read, understand, update, and share. Ubiquitous and user-friendly, most users today already have the skills to work with spreadsheets. Spreadsheets are ideal for smaller tables, and for projects where the user needs to see the data. Spreadsheets do have their drawbacks, however. There are data limits with spreadsheets, and they are not a good choice for big data (over a million rows). There are several methods in UiPath to fetch data from spreadsheets.
Source | Activity | Returned as | Tips |
---|---|---|---|
Excel | Excel Process Scope > Use Excel File > Read Range [Excel.Activities] | DataTable | Excel must be installed; File must not be open |
Excel | Read Range Workbook [Excel.Activities] | DataTable | Excel installation not needed |
Excel (Office 365) | Read Range [MicrosoftOffice365.Activites] | DataTable | slower than using local files |
Excel (ODBC) | Run Query [Database.Activites] | DataTable | Provider: “System.Data.Odbc” |
Google Sheets | Read Range [GSuite.Activities] | DataTable |
When fetching data from a spreadsheet, UiPath Activities return the data tucked inside a DataTable Object variable, which has its own syntax. Learn more. Care must be taken to validate the data, to ensure that data is not missing and is in the correct format and type. Sometimes the data must be manipulated to get it into the correct type. The data type of each column is automatically set when data is returned (usually a string or general), but you can loop through the DataTable variable and convert the data using a conversion function. Use the Invoke Code Activity and pass in the DataTable variable as an argument. Here’s an example:
'to open data table and loop through
dtMyTable.AsEnumerable().ToList().ForEach(
Sub(rs)
rs("Date")=CDate(rs("Date"))
End Sub
)
Instead of the above code, however, I highly recommend a package called BalaReva.EasyDataTable.Activities. It contains several Activities to make manipulating data in a DataTable variable much easier. It gives the developer the ability to view the table after it’s imported, and convert the data type of a column if necessary. Learn how to use it here.
Emails
Email has been around a long time, and there are now a myriad of protocols, applications, and methods used to deal with it. It can all get a bit confusing. For now, let’s focus on the most common email applications, and the latest, easiest ways to fetch a bunch of them.
Source | Activity | Returned As | Tips |
---|---|---|---|
GMail | Get IMAP Mail Messages [Mail.Activities] | List<MailMessage> | server: “imap.gmail.com” pwd: set up app password for UiPath port: 993 IMAP must be enabled in GMail settings |
Gmail | Get Email List [GSuite.Activities] | List<GMailMessage> | |
Outlook | Get Outlook Mail Messages [Mail.Activities] | List<MailMessage> | |
Outlook 365 | Use above now |
Once you get a list, you can loop through it and fetch the details of each email, such as the Subject, Body, Sender, etc. You can also see if there are attachments and download them as well.
Word Processing
UiPath can extract text from various word-processing documents, such as Microsoft Word, Google Docs, and PDFs. Usually only the text is imported, not the formatting. When using OCR to extract text from images you must select an OCR Engine.
Source | Activity | Returned As | Tips |
---|---|---|---|
MS Word | Word Application Scope > Read Text [Word.Activities] | String | Must have Word installed |
Google Docs | Get Text [GSuite.Activities] | String | To extract a section, it must be defined in Doc as a section break |
PDFs (browser) | Get Text [UIAutomation.Activities] | String | For UI Automation project as background automation if PDF is digital |
PDFs (native) | Read PDF Text [PDF.Activities] | String | Ignores images |
PDFs (scanned) | Read PDF Using OCR > [Select OCR Engine] [PDF.Activities] | String | Both text & images Slower |
Websites
Extracting data from websites, also called data-scraping, web-scraping, or screen-scraping, is kind of UiPath’s thing. You can extract text, attributes (including image urls), and tablular data from any website using UiPath’s UIAutomation.Activities package. Start with the Use Application/Browser Activity to set the URL. Save it to a UiElement variable (like uiOutput) for use later.
Type | Activity | Returned As | Tips |
---|---|---|---|
Text | Get Text | String | |
Attribute | Get Attribute | String | to see list of attributes use UiExplorer, Inspector, or loop through uiOutput.Attributes VB Code: uiOutput.Get(“attributeName”) |
Table Data | Extract Table Data | DataTable | use Table Extraction wizard in Design |
Image | Get Attribute | String (of URL) | find img tag in Inpector to see if url is in “src” or “srcset” attribute |
Databases
Databases are specifically designed systems used to store, organize, and provide access to data. As the Internet has evolved to offer more and more information, databases have also evolved to store and retrieve massive amounts of data using different structures and methods. Databases range from Access or Foxpro sitting on your computer, to BigData tables that power Google. There are relational databases and unstructured, or NoSQL, databases. Most databases are secured, and read access must be granted by the database administrator in order to retrieve data. Each type of database uses a specific driver. A driver, or provider, is a translation software that sets a communication protocol between the requestor and the database. Sometimes it’s necessary to download the driver. Then, a connection string is created that holds details such as the driver to be used, the name of the database, username and password.
In UiPath, database Activities can be found in the UiPath.Database.Activites package. Some experimentation may be required to connect to your specific database. UiPath offers a connection string wizard to help with building and testing the connection string, and includes many standard drivers. Once connection is established, the Run Query Activity is used to tell the system exactly what data you want, using standard SQL such as “SELECT CustomerId, CustomerName from Customers”. All data is returned as a DataTable variable.
API Calls
API stands for Application Programming Interface, and is a way to get just the data out of a website, without the presentation layer you see on your browser. Think of it as a back door to the data. API programmers create several ways in, called endpoints, which receive calls to their server directly, and return just the data. Programmers use APIs all the time to refresh their content, to update a map, to save something to the database, etc. Some APIs are private, and allow access only if you have a special authorization key of some kind. Others are open, and allow anyone to make the call.
There are two approaches used for APIs: SOAP or REST. The main difference is the SOAP (Simple Object Access Protocol) returns data in XML format, while REST (Representational State Transfer) can return the data in JSON, HTML, XML, or plain text. Of these JSON is the most common. Websites almost always have API instructions published on their site, which gives other programmers the details of their endpoints, formats, and data returned. It is important to review these instructions to set up your call correctly. An API call is basically just a URL string. For example, if you copy and paste this into your browser, you’ll see data returned.
https://reqres.in/API/users?page=2
To make an API call in UiPath, you need the package UiPath.WebAPI.Activities. The HTTP Request Activity has a configure button that launches a form where you can enter the endpoint, the authorization key if needed, and different parameters, and test the returned data. Returned output is split into the status code (an integer – 200 means success) and the returned content (a string in whatever format).
Once received, the content string must be parsed into something useful. This is called deserialization. You can then drill down into that object to pick out the data you need. For example, the above API call returns this JSON-formatted string, lets save it a variable strOutput.
{
"page": 2,
"per_page": 6,
"total": 12,
"total_pages": 2,
"data": [
{
"id": 7,
"email": "michael.lawson@reqres.in",
"first_name": "Michael",
"last_name": "Lawson",
"avatar": "https://reqres.in/img/faces/7-image.jpg"
},
{
"id": 8,
"email": "lindsay.ferguson@reqres.in",
"first_name": "Lindsay",
"last_name": "Ferguson",
"avatar": "https://reqres.in/img/faces/8-image.jpg"
},
{
"id": 9,
"email": "tobias.funke@reqres.in",
"first_name": "Tobias",
"last_name": "Funke",
"avatar": "https://reqres.in/img/faces/9-image.jpg"
},
{
"id": 10,
"email": "byron.fields@reqres.in",
"first_name": "Byron",
"last_name": "Fields",
"avatar": "https://reqres.in/img/faces/10-image.jpg"
},
{
"id": 11,
"email": "george.edwards@reqres.in",
"first_name": "George",
"last_name": "Edwards",
"avatar": "https://reqres.in/img/faces/11-image.jpg"
},
{
"id": 12,
"email": "rachel.howell@reqres.in",
"first_name": "Rachel",
"last_name": "Howell",
"avatar": "https://reqres.in/img/faces/12-image.jpg"
}
],
"support": {
"url": "https://reqres.in/#support-heading",
"text": "To keep ReqRes free, contributions towards server costs are appreciated!"
}
The UiPath Activity Deserialize JSON accepts as input a JSON-formatted string, and converts it into a JSON object. So we enter our previous variable, strOutput, and save the converted output to a new variable. Lets call it objJSON.
Now, we can actually get our data:
Single value
objJSON("total")
Single value embedded
objJSON("support")("url")
Assign Records into a DataTable Variable
In Javascript, arrays are enclosed in square brackets, and objects are enclosed in curly brackets. We can see from the result above there is an array called “data”, which has a list of objects. So let’s take a shortcut and save just save these records into a DataTable variable using an assign activity. The method we will use is found in the newtonsoft.json namespace, so make sure this is imported into your project.
newtonsoft.json.JsonConvert.DeserializeObject(Of Datatable)(objJSON("data").tostring)
Integration Service
We can see from the API Calls section above that a bit of programming is required to get the data into a format that we can use. It is necessary to read and understand how each API has been exposed, and how the call must be crafted. Plus, the API can change, which might break any program that calls it.
To try to make it easier, UiPath has created its Integration Service. Here, the most common 3rd-party APIs are pre-programmed into objects called connectors. As of this writing, there are 144 connectors in UiPath’s Integration Service and more being added all the time. There are connectors for sites such as Microsoft One Drive, Microsoft 365, Salesforce, GMail, Google Drive, AWS, PayPal, Slack and many others. It is up to the programmer of the connector to make sure it is kept up-to-date with the latest API. This should solve any breaking issues, but keep in mind that when using the Integration Service, you are outsourcing its maintenance.
Integration Service handles several things for you: the connection credentials, the API call, and the return format of the data. Another cool feature is triggers. Triggers open up a two-way communication between the site and your automations. With triggers, the 3rd-party site itself notifies UiPath that some event has happened, say an email has been received, which you can subscribe to and then launch an automation. The trigger passes the notification to your process in the form of arguments. In this example it would send the objectID of the incoming email, not the email itself. In your process, you could then fetch the email by its objectID.
To use Integration Service to fetch data in Studio, the activity package corresponding to the IS connector must be installed into the project. Most of the latest UiPath official packages are written to correspond with the latest IS connector. Details can be found in each individual connector documentation. Keep in mind that connections are folder-specific. So you could theoretically have different credentials for different folders. So make sure that your Studio instance is pointing to the right folder when you use Integration Service activities.
Storage Buckets, Assets & Queues
In Orchestrator, UiPath has folder-specific storage areas reserved for holding data for use in processes. Unless limited, users and robots with access to the folder will be able to access this data. There are 3 types of data storage available: Assets, Storage Buckets, and Queues.
Assets are key-value pairs stored at the folder level. Each asset has a name (a key) and a value that can be text, boolean, or integer. Another special type of asset is the Credential Asset, which stores both the UserID and Password. All assets are encrypted. Assets are set up in Orchestrator with their values, although can be set from within the process as well.
Storage Buckets are basically just an online, shared folder used to organize files that many processes can use as input. Any files, such as datasets for Machine Learning, Excel files, PDFs, even testing data files can be stored here. Storage Buckets are set up on a per-folder basis, and usually have the same permissions as the folder. Files are uploaded to them from within Orchestrator. Txt files can be read into a process, but most files must be downloaded for further processing depending on its type.
Queues are a special type of storage used for processing records of data. Queues are set up in Orchestrator with a specific schema. Once a queue is set up, a separate process called a Dispatcher is in charge of loading up the queue with the records. Then, another process called a Performer actually fetches these records from the queue to process them. One advantage of using Queues is built-in validation and error handling. Each record is tracked whether it has been processed successfully or not. Thus Queues are useful for large data sets, where something could go wrong, and can failures can be handled and reviewed. Queues can also be accessed by more than one machine, which can distribute the load of processing the items in the queue. NOTE: Queues are not to be used like databases, with updateable fields. They are intended as read-only, process-once then delete transactional processing.
For this post, let’s assume that the resources already exist in Orchestrator and let’s focus on the fetching activities used for each type. All these activities exist in the Core package.
Type | Activity | Returned As | Tips |
---|---|---|---|
Asset | Get Asset | String, Boolean, or Int as set in Orch | Easy place to store URLs and path strings |
Asset Credential | Get Credential | UID: String PWD: Secure String | |
Storage Bucket | Read Storage Text | String | for txt files |
Storage Bucket | Download Storage File | ILocalResource | In project create folder to hold downloads Put “foldername/filename.ext” in File name and location property Will overwrite Save result into variable varOut for later use varOut.LocalPath points to file location |
Queue | Get Queue Items | IEnumerable<QueueItem> | Filter by status (usually New) Returns first 100 records Does not change status automatically Use For Each to loop through result To get values: QItem.SpecificContent(“fieldname”) |
Queue | Get Transaction Item | QueueItem | Automatically sets status to “In Progress” To get specific record, queue must have unique Reference field To get values: QItem.SpecificContent(“fieldname”) After processing Set Transaction Status to “Successful” or “Failed” and put in the reason |
Document Understanding (AI)
UiPath.IntelligentOCR.Activities package
App Integration > Document Understanding