Image-Based Automation and Strategies for Implementing It


What is image-based automation, and how does it differ from object-based automation?

When automation identifies fields, whether in functional/regression tests or Robotic Process Automation (RPA), it usually recognizes them by one or more programmatic properties. For example, a Login button might have a “type” property value of “button” and a “name” property value of “Login”. Together, these properties identify a field that the automation can interact with. This method is often called object-based automation.
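As a rough illustration of property-based lookup, the Python sketch below searches a hypothetical list of element properties. The element data and the `find_element` helper are assumptions made for illustration. They are not any particular tool's API.

```python
from typing import Dict, List, Optional

# Hypothetical snapshot of the properties an automation tool might read from a screen.
ELEMENTS: List[Dict[str, str]] = [
    {"type": "textbox", "name": "Username"},
    {"type": "textbox", "name": "Password"},
    {"type": "button", "name": "Cancel"},
    {"type": "button", "name": "Login"},
]


def find_element(elements: List[Dict[str, str]], **props: str) -> Optional[Dict[str, str]]:
    """Return the first element whose properties match every given key/value pair."""
    for element in elements:
        if all(element.get(key) == value for key, value in props.items()):
            return element
    return None


# "type" alone matches two buttons; adding "name" narrows it to the Login button.
login = find_element(ELEMENTS, type="button", name="Login")
print(login)  # {'type': 'button', 'name': 'Login'}
```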


In image-based automation, a field is recognized primarily by its image. In this case, the automation tool would recognize the Login button by comparing a captured image of the button against what is on the screen, and it would then interact with the button.
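At its simplest, image-based recognition slides the captured image across a screenshot until the pixels line up. The sketch below uses tiny grayscale grids in place of real screenshots, and `find_exact` is a hypothetical helper. Real tools use more tolerant matching, as the challenges below explain.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels: rows of values from 0 to 255


def find_exact(screen: Image, template: Image) -> Optional[Tuple[int, int]]:
    """Slide the captured template over the screen and return the (row, col)
    of the first position where every pixel matches exactly."""
    th, tw = len(template), len(template[0])
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            if all(
                screen[r + i][c + j] == template[i][j]
                for i in range(th)
                for j in range(tw)
            ):
                return (r, c)
    return None


screen = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 5, 9, 0],
    [0, 0, 0, 0, 0],
]
login_capture = [[9, 9, 9], [9, 5, 9]]  # hypothetical capture of the Login button
row, col = find_exact(screen, login_capture)
# A tool would typically click near the center of the match.
click_point = (row + len(login_capture) // 2, col + len(login_capture[0]) // 2)
print((row, col), click_point)  # (1, 1) (2, 2)
```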


Why would image-based automation be needed?

There are situations in which object-based automation cannot effectively identify the fields that need to be interacted with. Examples include:

  • The application is built on technology that the automation tool does not support.
  • The application contains fields that are not easily identifiable or not accessible.
  • The application contains fields whose properties change frequently. This can happen when development tools generate code automatically in a way that does not preserve field properties.
  • A region of the application contains multiple functional sub-regions represented by a single field, such as a map component.
  • The automation tool cannot access the application programmatically (for example, because the tool does not run in the same environment as the application).

An image-based approach can overcome these issues during automation development.


An automation solution does not have to be purely object-based or purely image-based. When necessary, and depending on the automation tool's capabilities, the overall solution often combines both methods.
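One way to combine the two methods is a fallback: use a field's properties when they work and its image when they don't. Below is a minimal Python sketch. Both locator functions are hypothetical stand-ins for whatever your tool provides.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]
Locator = Callable[[], Optional[Point]]


def locate(object_locator: Locator, image_locator: Locator) -> Tuple[str, Point]:
    """Try object-based recognition first and fall back to image-based recognition."""
    position = object_locator()
    if position is not None:
        return ("object", position)
    position = image_locator()
    if position is not None:
        return ("image", position)
    raise LookupError("field not found by its properties or by its image")


# Hypothetical locators: a sidebar button is found by its properties,
# while a map pin has no usable properties and is found only by its image.
print(locate(lambda: (10, 20), lambda: (120, 340)))  # ('object', (10, 20))
print(locate(lambda: None, lambda: (120, 340)))      # ('image', (120, 340))
```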


Tools with image-based automation capabilities

Thankfully, the industry has responded to the need for image-based automation, and many existing tools already include this capability. Some tools, such as Eggplant, are built mainly around image-based automation. For open source tools such as Selenium, libraries like SikuliX make image-based automation possible.


Below are just some of the tools in which image-based automation is an existing feature:

  • Unified Functional Testing (UFT)
  • TestComplete
  • Ranorex
  • Eggplant Functional
  • Automation Anywhere
  • UiPath
  • Tosca
  • Katalon


Challenges when implementing image-based automation

Image-based automation's biggest strength is that it requires only an image to recognize a field, rather than a series of properties in the application code. It also brings its own set of challenges, because it can be sensitive to the visual environment in which it executes. Some common challenges to consider are:


Different web browsers and operating systems can render the same field differently, which can produce different images in different environments. You can reduce the impact by targeting the specific environments that are most critical to run against.
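One way to apply this is to keep a separate capture of each field for each targeted environment and fail fast on anything else. The folder layout, the `TARGETED_ENVIRONMENTS` set, and the `image_path` helper below are hypothetical choices made for illustration.

```python
from pathlib import Path

# Hypothetical list of environments the team has decided to capture images for.
TARGETED_ENVIRONMENTS = {("chrome", "windows"), ("edge", "windows")}


def image_path(field: str, browser: str, os_name: str, root: Path = Path("images")) -> Path:
    """Return the capture of a field for one targeted browser/OS combination,
    e.g. images/chrome-windows/login.png."""
    key = (browser.lower(), os_name.lower())
    if key not in TARGETED_ENVIRONMENTS:
        raise ValueError(f"no images captured for {browser} on {os_name}")
    return root / f"{key[0]}-{key[1]}" / f"{field}.png"


print(image_path("login", "Chrome", "Windows").as_posix())  # images/chrome-windows/login.png
```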


Differences in the displays the automation runs against can change the image. The two factors with the most impact are screen resolution and color settings. For PCs, you can limit this by using consistent display settings across all machines. For mobile devices, it is good practice to target the devices that end users use most.
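Resolution matters because a field captured at one display scale covers a different number of pixels at another scale, so a pixel-for-pixel comparison fails. Some tools compensate by resizing the capture before matching. The sketch below shows nearest-neighbor resizing on a small grayscale grid. `rescale` is a hypothetical helper, and production tools may use other interpolation methods.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels: rows of values from 0 to 255


def rescale(template: Image, factor: float) -> Image:
    """Nearest-neighbor resize of a captured template by the given scale factor."""
    h, w = len(template), len(template[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [
        [template[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
         for c in range(new_w)]
        for r in range(new_h)
    ]


icon = [[1, 2], [3, 4]]  # hypothetical capture taken at 100% display scaling
print(rescale(icon, 2.0))  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(rescale(rescale(icon, 2.0), 0.5))  # [[1, 2], [3, 4]]
```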


The same field appears multiple times on a single application screen (e.g., multiple Submit buttons). You may be able to work around this by performing an action that makes the field the only image match, such as zooming in, or by further tuning the image to identify a specific occurrence.
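If the duplicate cannot be made unique, a fallback is to collect every match and pick one by its position on the screen. This breaks easily when the layout changes, so treat it as a last resort. Below is a minimal sketch using grayscale grids. `find_all` is a hypothetical helper.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels: rows of values from 0 to 255


def find_all(screen: Image, template: Image) -> List[Tuple[int, int]]:
    """Return (row, col) of every exact match, scanning top-to-bottom, left-to-right."""
    th, tw = len(template), len(template[0])
    return [
        (r, c)
        for r in range(len(screen) - th + 1)
        for c in range(len(screen[0]) - tw + 1)
        if all(screen[r + i][c + j] == template[i][j] for i in range(th) for j in range(tw))
    ]


# Hypothetical screen with three identical "Submit" buttons (the 7s).
screen = [
    [7, 7, 0, 7, 7],
    [0, 0, 0, 0, 0],
    [7, 7, 0, 0, 0],
]
submit = [[7, 7]]
matches = find_all(screen, submit)
print(matches)  # [(0, 0), (0, 3), (2, 0)]
second_submit = matches[1]  # choose one occurrence by its scan order
```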


The field's image in the application changes. Changes to the field's code will not affect the automation. However, any change to how the field looks will likely require the associated image to be updated.


The image does not initially appear on the screen. For automation to interact with a field based on its image, the field must be visible on the screen. This may require developing ways to scroll or perform other actions to make it visible.
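The scroll-and-retry idea can be sketched as a simple loop. `find` and `scroll` are hypothetical callables that stand in for your tool's image search and scroll action. The default retry limit is an arbitrary assumption.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def scroll_until_visible(
    find: Callable[[], Optional[Point]],
    scroll: Callable[[], None],
    max_scrolls: int = 10,
) -> Point:
    """Look for the image; if it is not on screen, scroll and look again."""
    for attempt in range(max_scrolls + 1):
        position = find()
        if position is not None:
            return position
        if attempt < max_scrolls:
            scroll()
    raise TimeoutError(f"image not visible after {max_scrolls} scrolls")


# Simulated page: the target only comes into view after three scrolls.
state = {"offset": 0}


def fake_find() -> Optional[Point]:
    return (50, 200) if state["offset"] >= 3 else None


def fake_scroll() -> None:
    state["offset"] += 1


result = scroll_until_visible(fake_find, fake_scroll)
print(result, state["offset"])  # (50, 200) 3
```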


The image is not a square or rectangle. Captured images are generally rectangular, as you can see when capturing an image with the Snipping Tool. Consider the search result pin in Google Maps, whose shape is not rectangular. Suppose I snip around the whole pin for Chicago Union Station (indicated by the black square in the image below):

Chicago Union Station Map Image


If we tried to use the same image to verify a different location, such as the Ogilvie/Union Ferry terminal, the verification would fail because the imagery inside the capture area would not match (see image below):

Ogilvie Station Maps Image
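One technique for non-rectangular fields is a mask. Pixels outside the field's outline are ignored during comparison, so the map background no longer matters. For example, OpenCV's `matchTemplate` accepts an optional mask argument. The pure-Python sketch below shows the idea on small grayscale grids. `find_masked` and the pixel values are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels: rows of values from 0 to 255
Mask = List[List[bool]]  # True = part of the field, False = background to ignore


def find_masked(screen: Image, template: Image, mask: Mask) -> Optional[Tuple[int, int]]:
    """Exact match that compares only the pixels where the mask is True."""
    th, tw = len(template), len(template[0])
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            if all(
                screen[r + i][c + j] == template[i][j]
                for i in range(th)
                for j in range(tw)
                if mask[i][j]
            ):
                return (r, c)
    return None


# Hypothetical pin (the 8s) captured over map background (the 1s in the corners).
pin = [[1, 8, 1], [8, 8, 8], [1, 8, 1]]
pin_mask = [[False, True, False], [True, True, True], [False, True, False]]

# The same pin at another location, where the surrounding map looks different (4s).
screen = [
    [4, 4, 4, 4],
    [4, 4, 8, 4],
    [4, 8, 8, 8],
    [4, 4, 8, 4],
]
print(find_masked(screen, pin, pin_mask))  # (1, 1)
full_mask = [[True] * 3 for _ in range(3)]
print(find_masked(screen, pin, full_mask))  # None
```

With the mask, the pin matches on a different background. Comparing every pixel of the rectangular capture finds nothing.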


Automation tools have developed features that reduce the impact of some of these challenges. Image processing techniques such as thresholding, similarity matching, edge detection, downsizing, and locating a field relative to other fields can be applied to allow similar images to match. Each automation tool offers different image recognition features, so be sure to review the documentation with a subject matter expert (SME) when evaluating a tool for image-based automation.
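The similarity idea can be sketched like this: score every position and accept the best one only if it clears a threshold. The scoring formula below (one minus the normalized mean absolute pixel difference) is an assumption chosen for clarity. Tools use their own metrics, such as normalized cross-correlation.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels: rows of values from 0 to 255


def similarity(screen: Image, template: Image, r: int, c: int) -> float:
    """1.0 for identical pixels, 0.0 for the largest possible difference."""
    th, tw = len(template), len(template[0])
    total = sum(
        abs(screen[r + i][c + j] - template[i][j])
        for i in range(th)
        for j in range(tw)
    )
    return 1.0 - total / (255 * th * tw)


def find_similar(
    screen: Image, template: Image, threshold: float = 0.9
) -> Optional[Tuple[int, int, float]]:
    """Return (row, col, score) of the best-scoring position if it meets the threshold."""
    th, tw = len(template), len(template[0])
    best: Optional[Tuple[int, int, float]] = None
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            score = similarity(screen, template, r, c)
            if best is None or score > best[2]:
                best = (r, c, score)
    return best if best is not None and best[2] >= threshold else None


button = [[200, 200], [200, 200]]  # hypothetical capture of a button
screen = [
    [0, 0, 0, 0],
    [0, 195, 205, 0],
    [0, 198, 202, 0],
]  # the same button, rendered with slightly different shading
match = find_similar(screen, button, threshold=0.95)
print(match[:2], round(match[2], 3))  # (1, 1) 0.986
print(find_similar(screen, button, threshold=1.0))  # None: no exact match exists
```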


Given this complexity, cross-platform automation is generally considered harder to develop and maintain with image-based solutions than with object-based ones. It is usually best to develop image-based automation for specific target environments.


Basic example of image-based automation

In a recent project for an electric utility client, one application handled outage management, and a map component was a major part of the user interactions. We combined object-based automation (for items outside the map) with image-based automation (for items on the map), which let us build the needed task so that any script could reuse it. Essentially, we searched for the device, centered it on the map, zoomed in, and highlighted it using the objects on the application ribbon. This gave us a unique image on the map, which we used to verify that the device was found and to perform the necessary actions on the map. For new data scenarios, only the specific data and device images need to be added. No additional code is needed to handle the different data. We also stored the images in a shared location outside the automation tool (the tool allowed external images as long as they were a supported file type, in this case PNG).
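Keeping images outside the tool can be as simple as reading a shared folder and indexing its files by name. Below is a minimal Python sketch that assumes PNG is the only supported type, as in the project above. The folder contents and device names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict

# Assumption: the tool accepts external images only in PNG format.
SUPPORTED_SUFFIXES = {".png"}


def load_image_repository(folder: Path) -> Dict[str, Path]:
    """Index the shared folder: image name (file stem) -> path, skipping unsupported files."""
    return {
        path.stem: path
        for path in sorted(folder.iterdir())
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    }


# Demonstration with a temporary folder standing in for the shared location.
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp)
    for file_name in ("transformer_T1042.png", "switch_S88.png", "readme.txt"):
        (shared / file_name).touch()
    repo = load_image_repository(shared)
    print(sorted(repo))  # ['switch_S88', 'transformer_T1042']
```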


To illustrate how we achieved this and to mimic the image repository, we'll use a basic example with Google Maps. Like the client application we were testing, Google Maps has a section where users can interact using object-based recognition (the sidebar). To bring the map's focus where we want it, we will perform the following:


  1. Enter “Union Station, South Canal Street, Chicago, IL” in the Search field (object-based recognition)
  2. Click the Search button (object-based recognition)
  3. The Maps page will refresh, and the search result for the address will be displayed with the red pin (image-based recognition)

Image Based Automation Map 1


Using the image outlined with the black square at point 3 in the image above could cause a problem. Part of that image contains additional map imagery, and this imagery could differ when we use the same image to verify search results for other locations. To avoid this, we can restrict the image to exclude any areas that could change with the map, as shown below:

Image Based Automation Map 2
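Restricting the capture amounts to cropping it down to the part that stays the same from search to search. Below is a minimal sketch with a hypothetical `crop` helper on a small grayscale grid. The outer pixels stand in for map imagery that changes.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels: rows of values from 0 to 255


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width region whose top-left corner is (top, left)."""
    if top < 0 or left < 0 or top + height > len(image) or left + width > len(image[0]):
        raise ValueError("crop box falls outside the image")
    return [row[left:left + width] for row in image[top:top + height]]


# Hypothetical capture: the 9s are the pin; the outer pixels are map imagery.
capture = [
    [3, 6, 2, 5],
    [4, 9, 9, 1],
    [2, 9, 9, 7],
]
pin_only = crop(capture, top=1, left=1, height=2, width=2)
print(pin_only)  # [[9, 9], [9, 9]]
```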


Besides verifying that the search result location is found, we can also check whether one or more of the icons displayed on the map are present. Below is an example of the image repository after capturing several generic images that could be used as part of the automation:

Image Based Automation Example
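Checking which of the repository's images are on the map can be sketched as a loop over the repository. The image names and pixel values below are hypothetical. The exact-match `contains` helper stands in for whatever matching your tool performs.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale pixels: rows of values from 0 to 255


def contains(screen: Image, template: Image) -> bool:
    """True if the template appears anywhere on the screen (exact pixel match)."""
    th, tw = len(template), len(template[0])
    return any(
        all(screen[r + i][c + j] == template[i][j] for i in range(th) for j in range(tw))
        for r in range(len(screen) - th + 1)
        for c in range(len(screen[0]) - tw + 1)
    )


def icons_present(screen: Image, repository: Dict[str, Image]) -> List[str]:
    """Names of the repository images currently visible on the screen, sorted."""
    return sorted(name for name, image in repository.items() if contains(screen, image))


# Hypothetical repository: image name -> captured pixels.
repository = {
    "Search Found": [[9, 9]],
    "Restaurant": [[3, 4]],
    "Hotel": [[6]],
}
screen = [
    [0, 9, 9],
    [0, 6, 0],
]
print(icons_present(screen, repository))  # ['Hotel', 'Search Found']
```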


Adding data scenarios to the example

Now that our automation can perform a search task, let's expand it to search for, and verify that it found, these additional locations:

  • Brookfield Zoo
  • Lincoln Park Zoo
  • Navy Pier
  • United Center

If we use the “Search Found” image for every search, the automation will run against the added locations and also against any other search that returns a single found location.


What if the verification needs to check each search result against its own unique image? We would then need to capture an image for each search result. The code would also need to take the search data and retrieve the matching image from the image repository for verification. For our 5 locations, the repository could look like the image examples below:

Image Based Automation


Remember that the goal is a unique image (including part of the pin icon and the name of the location). Avoid capturing external imagery that could cause the automation to fail, such as:

  • Other icons that could change, such as points of interest
  • Profile-specific details, such as travel time from work and from home
  • Other information that could change, such as a location showing as temporarily closed

The images above all contain the issues described in the bullet points. Now let's recapture them to exclude imagery that could change while still capturing a unique image. See the results in the image below:

Image Based Automation


To add more locations, you only need to add an image to the repository for each new search location. If the data and image names follow a specified pattern, there should be no need to update the code each time.
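For example, a naming pattern can turn each search string into the expected image file name. Adding a location then only means dropping in a correctly named image. The lowercase-and-underscore pattern, the repository folder, and both helpers below are hypothetical choices.

```python
import re
from pathlib import Path


def image_for_location(location: str, repository: Path) -> Path:
    """Map search data to its image, e.g. 'Navy Pier' -> <repository>/navy_pier.png."""
    slug = re.sub(r"[^a-z0-9]+", "_", location.lower()).strip("_")
    return repository / f"{slug}.png"


def require_image(location: str, repository: Path) -> Path:
    """Fail with a clear message if no image has been captured for this location yet."""
    path = image_for_location(location, repository)
    if not path.is_file():
        raise FileNotFoundError(f"no image captured for {location!r}; expected {path}")
    return path


repository = Path("images")  # hypothetical shared image folder
for place in ["Brookfield Zoo", "Lincoln Park Zoo", "Navy Pier", "United Center"]:
    print(image_for_location(place, repository).name)
# brookfield_zoo.png
# lincoln_park_zoo.png
# navy_pier.png
# united_center.png
```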


To summarize, this overview explained what image-based automation is and how it differs from object-based automation. It also covered the challenges it brings and gave an example of how these methods can be implemented.

Get in touch with Olenick to learn how we can support your Automation efforts.