Test Data Management in Test Automation

Data is vital to Test Automation. It is what drives the automation and allows it to do its work.

Without data, an application would sit there useless. You cannot log into an application without a username and a password. A search cannot be performed without knowing the parameters of the search, and what to look out for in the results.


Data can be ever changing and growing. While the number of possible logins can never exceed the number of registered users, the number of potential searches (depending on the application) could be infinite.


When automating login tests for a given application, several factors help determine how much data is needed:

  1.  The different types of users who perform functions within the application (e.g. Agent, Customer)
  2.  Whether a unique user login is needed (e.g. a user can only be logged into one machine at a time)
  3.  Whether data access is limited for a given user (e.g. a customer will only be able to access data related to them)


For searching, the automation would not cover every possible combination, but a reasonable subset of data for the specific tasks. That subset should be determined by the testing team working with the business stakeholders (more on that later).


The idea is that while not every possible combination of data will be tested, the automation framework should be able to accommodate any modifications or additions to the data as needed.


Since data is an important part of Test Automation, having a good Test Data Management process established will help ensure automation is used to the maximum effect.


So, what is Test Data Management?

It is the administration of the data that the automated test processes need in order to fulfill the needs of the business. It should ensure the quality of the data and that the needed data is available at the correct time.


Where would one start to ensure proper Test Data Management for a given application under Test Automation? It starts with the development of the automation framework. When incorporating data into the framework, follow one simple rule:

Keep data separated from code.


This separation creates a front end (the data) and a back end (all the code elements) for the automation framework. Its benefits include:

  1.  Ability to hand off data to a manual tester, business user, etc. for modifications when they do not have access to the automation code
  2.  Ability to create/modify test cases quickly for regression testing once the automation solution is deployed
  3.  Ability to leverage tools that assist in managing and maintaining the system, such as databases, and to recover previous versions as needed without impacting code
  4.  A possible reference to show non-technical people what the test cases do
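
As a minimal sketch of this rule, the Python example below keeps login data in a CSV-style source and leaves the code responsible only for reading and acting on it. The column names, the sample users, and the `login` stand-in are assumptions made for illustration; a real framework would drive the actual application and read from a real file.

```python
import csv
import io

# Hypothetical login data. In a real project this would live in its own
# file (CSV, Excel, database) that testers and business users can edit.
LOGIN_DATA = """\
test_case,user_type,username,password,expect_success
TC01,Agent,agent01,agent-pass,yes
TC02,Customer,cust01,cust-pass,yes
TC03,Customer,cust01,wrong-pass,no
"""

# Stand-in for the application under test (assumption for this sketch).
REGISTERED_USERS = {"agent01": "agent-pass", "cust01": "cust-pass"}


def login(username, password):
    """Pretend login step; a real framework would drive the UI or an API."""
    return REGISTERED_USERS.get(username) == password


def run_login_tests(data_source):
    """Run one login check per data row and return {test_case: passed}."""
    results = {}
    for row in csv.DictReader(data_source):
        expected = row["expect_success"] == "yes"
        actual = login(row["username"], row["password"])
        results[row["test_case"]] = actual == expected
    return results


results = run_login_tests(io.StringIO(LOGIN_DATA))
print(results)
```

Adding a fourth login scenario here means adding one data row; `run_login_tests` does not change.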


During the design of test automation for an application, it is important to work with the business stakeholders to determine how data will be handled within the scenarios.

The data can be stored in whatever format is best suited for the project, be it Excel, CSV, a database, etc. Just be sure that the following are covered:

  1.  The data is backed up regularly or version controlled, so that changes can be rolled back as needed in a timely fashion
  2.  Everyone who will interact with the data has proper access and knows how to work with the data format involved
  3.  The code can interact with the data to bring it in during execution


When working as a team to determine how your data file should be structured, some things to consider are:

  1.  Have a well-defined naming convention so it will be easy to map column names to objects within the code
  2.  Possibly break data up by Page or Screen
  3.  If needed, break the data down further into logical sections within each Page/Screen
  4.  Determine if there is any data that can be shared across test cases (Profile data)
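
The considerations above could be sketched in Python as follows. The `Page.Field` column convention, the `Profile` column, and all sample values are hypothetical choices for this illustration, not a prescribed layout.

```python
# Shared profile data, reusable across test cases (hypothetical values).
PROFILES = {
    "P1": {"Login.Username": "cust01", "Login.Password": "cust-pass"},
}

# Test case rows. Columns use a "Page.Field" naming convention so each
# column maps directly to a page and a field within the code.
TEST_CASES = [
    {"TestCase": "TC01", "Profile": "P1",
     "Search.Keyword": "dental", "Search.MaxResults": "10"},
]


def split_by_page(row):
    """Group "Page.Field" columns into {page: {field: value}}."""
    pages = {}
    for column, value in row.items():
        if "." not in column:
            continue  # bookkeeping columns such as TestCase or Profile
        page, field = column.split(".", 1)
        pages.setdefault(page, {})[field] = value
    return pages


def resolve(test_case):
    """Merge the shared profile into the test case, then split by page."""
    merged = dict(PROFILES[test_case["Profile"]])
    merged.update(test_case)  # test-case values override profile values
    return split_by_page(merged)


page_data = resolve(TEST_CASES[0])
print(page_data)
```

With a convention like this, a new field on an existing page is just a new column, and the same profile can feed any number of test cases.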


Let’s look at two examples of test automation developed for a client, and at how the team worked with the business users to determine the data structure and apply the Test Data Management each project needed.

Both applications are customer-facing sites that serve around 50 different companies active in production. The first application is an enrollment system with the following characteristics:

  1.  Each company can be customized with the products applicable to that company, which display 80 different plan types
  2.  A complex back-end system has rules and rates that can vary for each company
  3.  An upload process enters employee data, existing coverage, and dependents into the application for use in enrollment (a prerequisite to the Test Cases)
  4.  The initial scope of automation was to test the end-to-end enrollment flow for an initial set of companies
  5.  The automation needs to be able to support all active companies
  6.  Rate validation is needed

The second application is a Pre-Enrollment system that serves the same 50 companies with:

  1.  Only 4-8 data entry points to be entered in the application for each company
  2.  Displays the rates for Medical, Dental, and Vision for each company as applicable (the number of rate validations can range from 100 to 300 per test case)
  3.  Initial scope of automation is to verify rates based on any combination of data for the initial companies, as well as to verify specific site content (text, links, PDFs, etc.)

Both applications followed the same pattern for the structure of the data. The similarities include:

  1.  The data structure for both applications was built using the existing structure of data elements.
    • When creating the automation data, first look at what already exists for the business. Note: data creation should be something that can be handed over to manual testers or business/operations, so that Engineers can focus on code creation/modification.
  2.  Excel was used to store the data. Each company had its own data file holding all the associated Test Case data for each application.
  3.  Each file had a separate sheet for each logical section within the application (input data, validation data, etc.).
  4.  Test Case flow is included in the data, indicating the sequence of reusable modules to execute.
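
To illustrate the idea of keeping Test Case flow in the data, here is a small Python sketch. The module names, the `Flow` column, and the `>` separator are assumptions made for this example; the original projects stored this information in Excel sheets, which are represented here as plain dictionaries.

```python
# Reusable modules. Each one appends to a log instead of driving a real
# application, which keeps the sketch self-contained.
def login(log, data):
    log.append(f"login as {data['Username']}")


def select_plan(log, data):
    log.append(f"select plan {data['Plan']}")


def confirm(log, data):
    log.append("confirm enrollment")


# Lookup from the module name used in the data to the code that runs it.
MODULES = {"Login": login, "SelectPlan": select_plan, "Confirm": confirm}

# One hypothetical test case: the flow plus the data for each module.
TEST_CASE = {
    "Flow": "Login>SelectPlan>Confirm",
    "Login": {"Username": "emp01"},
    "SelectPlan": {"Plan": "Dental"},
}


def run_flow(test_case):
    """Execute the modules named in the Flow column, in order."""
    log = []
    for step in test_case["Flow"].split(">"):
        name = step.strip()
        MODULES[name](log, test_case.get(name, {}))
    return log


steps = run_flow(TEST_CASE)
print(steps)
```

Reordering or dropping steps for a new scenario then only requires editing the `Flow` value in the data.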


Below is an example of how the data breakdown could look for the enrollment system:

[Figure: test structure graphic showing the data breakdown for the enrollment system]

An important factor for both projects is the validation of rates, since rates can vary with changes to the data, both in the data file and in the rates already present in the system.

So, how do you determine which values to enter into the data sheet for rates? Should you just enter the scenario manually into the application and capture the rates the application generates? Absolutely not.


You want to make sure that the rates are correct. Even if you tested against baselined code, it is best to get rates from a source external to the application that has been verified to be correct. If possible, getting the rates from the actuaries would be great, but keep in mind that the test data can change and more data scenarios can be added. It might be necessary to get access to the existing rate engines, or to develop a mechanized rater that consolidates the rate requirements so that data can simply be entered and the correct rates generated. Just remember that if an external method is created, a process for updating its requirements will need to be implemented, to ensure the rates stay correct and the data for test automation gets updated.
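
A mechanized rater can be pictured as a small, independently verified calculation that turns scenario data into expected rates. The Python sketch below is only an illustration of that idea: the base rates, age bands, factors, and the `expected_rate` function are all invented for this example and do not reflect the rate rules of the projects described here.

```python
from decimal import Decimal

# Hypothetical rate requirements, consolidated in one place.
BASE_RATES = {
    ("Medical", "Employee"): Decimal("200.00"),
    ("Medical", "Family"): Decimal("500.00"),
    ("Dental", "Employee"): Decimal("25.00"),
}

# Hypothetical age bands: (lowest age, highest age, multiplier).
AGE_FACTORS = [
    (0, 29, Decimal("1.00")),
    (30, 49, Decimal("1.25")),
    (50, 120, Decimal("1.50")),
]


def age_factor(age):
    """Return the multiplier for the band that contains the given age."""
    for low, high, factor in AGE_FACTORS:
        if low <= age <= high:
            return factor
    raise ValueError(f"no age band covers age {age}")


def expected_rate(coverage, tier, age):
    """Compute the expected rate for one scenario, rounded to cents."""
    rate = BASE_RATES[(coverage, tier)] * age_factor(age)
    return rate.quantize(Decimal("0.01"))


medical = expected_rate("Medical", "Employee", 35)
dental = expected_rate("Dental", "Employee", 55)
print(medical, dental)
```

The values such a rater produces would be written into the validation data and compared against what the application displays, so the expected result never comes from the application itself.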


So, what were the end results of the initial implementation of our projects?

The first project (enrollment system): by focusing on 10% of the available companies, the automation and the data structure can support 60% of the active companies. Because of the large amount of data involved with the application, a process was developed to assist in building data scenarios. The structure of the automation data will not need to be changed, and many scenarios can be generated as needed without changes to the data file. Rate data was obtained from the rating database and uploaded into the application database.


For the second project (pre-enrollment system): a mechanized rater was created for each company, so that the rates to be displayed could be generated from any valid data criteria. Its output was used as input to the automation scripts. It was determined that 88 combinations of data per company would fill the requirements for automation. Within the first month, one company was automated with 88 test cases running for the current year. By the end of the third month, test cases had been created for six companies on the current year’s sites (528 test cases in total) and for three companies on the next year’s sites (264 test cases).


In both projects, the benefits of spending the up-front effort to build well-structured data for automation included:

  1.  Increased confidence in data quality of automation test cases; ability to have business stakeholders create data to be fed into automation
  2.  Centralized location that can be queried as needed to capture data scenario coverage metrics
  3.  Ability to build added coverage as needed with little effort for data scenarios
  4.  Much faster addition of configurations after the initial rollout: only the structures that differ need to be handled, and the reuse can be 60% or greater depending on the changes. This can lead to an even faster ROI for automation.
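
As a rough illustration of the coverage metrics mentioned in the list above, the Python sketch below counts scenarios per value of a chosen column. The scenario rows and column names are hypothetical; in practice the rows would be read from the centralized data files.

```python
from collections import Counter

# Hypothetical scenario rows gathered from the centralized test data.
SCENARIOS = [
    {"company": "CompanyA", "coverage": "Medical", "tier": "Employee"},
    {"company": "CompanyA", "coverage": "Dental", "tier": "Family"},
    {"company": "CompanyB", "coverage": "Medical", "tier": "Family"},
]


def coverage_by(field, rows):
    """Count how many scenarios exist for each value of one column."""
    return Counter(row[field] for row in rows)


by_coverage = coverage_by("coverage", SCENARIOS)
by_company = coverage_by("company", SCENARIOS)
print(by_coverage)
print(by_company)
```

A value that never appears in the counts, such as Vision in this sample, points to a gap in data scenario coverage.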


Reach out to Olenick to start a conversation on how Olenick’s Test Data Management services can improve your project.


Paul Patterson    
