Big Data - Under the Hood

Hacking Data Migration with Selenium

By Steven Brew

Data migration is a laborious yet essential task when moving between discrete systems. It involves transferring data between storage types or file formats and can vary in difficulty depending on how the data is stored. Data that is well organized and easily parsed can typically be transferred without issue. However, data is often in a format suited for an end user and intended for a human to read. Such information typically isn’t structured for parsing and may be too ambiguous for traditional forms of data transfer. A technique known as data scraping is often relied upon in these cases to extract data from a program via its human-readable output. Several data scraping programs can help in this endeavor, but finding the perfect solution for your data migration problem can be difficult.

In recent years many have looked to Robotic Process Automation (RPA) to assist with data migration. A form of clerical process automation, RPA enables a user to configure computer software to capture data automatically. RPA centers on the idea of a software “robot” operating within a virtual workstation that can mimic the actions of a human user. The primary difference between an RPA tool and a traditional data scraping tool lies in how they are set up. Traditional automation scripts are typically programmed using some form of code-based instructions, requiring a degree of technical knowledge. The “robots” within RPA are configured by recording and “learning” from a user’s actions – allowing a non-technical user to set up a task. In almost every respect an RPA program is a strong solution to data migration except for one: cost. Acquiring even a single RPA “robot” on its own server can easily cost a couple of thousand dollars. Because of this, RPA can be prohibitively expensive and perhaps overkill for some data migration projects.

While working on my own project I stumbled across an unlikely tool – Selenium IDE. Data scraping with Selenium may initially seem unorthodox, but under certain circumstances Selenium has an edge over other established data scraping programs. Selenium is a great fit if you are looking for a simple yet cost-effective application to take information from one UI and plug it in elsewhere via another system’s UI.

Well, what is Selenium?

An open source project started in 2006, Selenium is an automated testing suite consisting of four primary components – Selenium IDE, Selenium WebDriver, Selenium RC and Selenium Grid. All four components are typically used to create and administer automated tests for quality assurance purposes. Selenium IDE is the simplest automation tool in the suite, and this simplicity is what most readily allows it to be repurposed for data scraping.

But why use Selenium IDE over a traditional data scraping tool?

First things first, Selenium IDE is a Firefox plugin, so setting it up requires no technical experience, and much like an RPA it doesn’t require high-level technical skills to use. If you know basic HTML and how to access the browser’s developer console, you can easily learn to work with Selenium IDE. First you record a test script using Selenium IDE’s GUI to simulate the data migration actions. The test script is then saved as an HTML file which can be easily updated in any text editor, allowing for a flexible script that can accommodate changes to the targeted systems’ UIs. Selenium IDE is also adept at automatically navigating through an elaborate UI to gather data – and it excels at taking this scraped data and inserting it into a separate web application via that system’s UI. Many traditional data scraping tools simply cannot match the ease of use or the automation capabilities offered by Selenium IDE.
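To give a feel for what a recorded script looks like, Selenium IDE’s classic HTML format is just a table of command/target/value rows. The sketch below (the URLs and element locators are hypothetical, standing in for your source and target systems) stores a value scraped from one page and types it into a form on another:

```html
<table>
  <!-- Read a value from the source system's UI -->
  <tr><td>open</td><td>http://legacy.example.com/record/42</td><td></td></tr>
  <tr><td>storeText</td><td>css=#customer-name</td><td>customerName</td></tr>
  <!-- Insert the stored value into the target system's UI -->
  <tr><td>open</td><td>http://new.example.com/customers/new</td><td></td></tr>
  <tr><td>type</td><td>id=name</td><td>${customerName}</td></tr>
  <tr><td>clickAndWait</td><td>id=save</td><td></td></tr>
</table>
```

Because the script is plain HTML, a locator that breaks after a UI change can be fixed with a one-line edit in any text editor – no re-recording required.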

Selenium IDE is ideal for quickly and easily scraping human-oriented data from one web application’s UI and inserting it into another’s, especially when working with a legacy system that is difficult to interface with or has a clunky API. So next time data migration is on the agenda and you need a lightweight, easy solution for a data scraping task, give Selenium IDE a try!

Having issues setting up and running Selenium? Tune in to my next blog post, where we will delve into setting up and creating scripts with Selenium!

Steven Brew is a Solutions Architect at Microshare, Inc.

