Today in the vRad technology quest we set forth to understand Dev (or test) Environments.
Let’s jump right in.
vRad’s agility in the market is fueled by our ability to develop and test code – so naturally, all of that developing and testing needs environments to run in. We maintain more than 30 different dev and test environments at vRad, and it’s no small feat.
As usual, let’s start with a definition to frame our discussion for those unfamiliar:
Dev or Test Environment: A set of servers and infrastructure that is used by development and test teams that is separated from production.
In production, we have servers and databases running in our datacenter. When images and patient demographics come in for processing, they run through this production environment. In order for us to test changes to the platform, we create copies of this production environment – these copies of production serve as our development and test environments. We use these environments for testing of all kinds, from automated unit tests to full end-to-end testing (pushing images into the system, assigning orders, reading orders, and delivering reports).
Several years ago, we found ourselves in a quandary: a single environment took a week to build. Environments were constantly different, out of sync, broken or simply unavailable. Engineers would spend hours troubleshooting an application on a server, only to find out that someone else had been tinkering with settings months ago – and forgot to set them back. This difficulty in managing our environments was not only expensive, but added risk to our releases, because we couldn’t be sure that our software changes would act the same in our test environment as they would in production.
We’ve spent several years improving our tools and processes in environment management. There are two key aspects to vRad test environments we’ll cover: provisioning the virtual servers, and restoring the databases.
When a team begins a project, they request one or more environments. Each environment consists of a handful of application servers and a handful of database servers (all virtualized). All of these environments run on hardware and networks specific to development – that is, they don’t impact the production build in any way.
We maintain two templates to handle these requests – one for application servers, and the other for database servers. These templates are used to generate individual servers and prepare them for usage.
The provisioning process itself is broken into two phases:
The first run takes about 20 minutes due to initial installations, but the second (idempotent) run takes less than 2 minutes. We run this second phase, the configuration phase, during each software deployment. This ensures that if a developer (or other user) goes out to an environment and tweaks settings, we reset them to our known baseline for each deployment.
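To make “idempotent” concrete, here is a minimal sketch of the idea. It is written in Python purely for illustration (our real scripts are PowerShell), and the setting names and values are hypothetical: the first run corrects whatever has drifted from the baseline, and an immediate second run finds nothing left to change.

```python
from typing import Dict, List, Tuple

# Hypothetical baseline settings, invented for this sketch.
BASELINE: Dict[str, str] = {
    "app_pool_identity": "svc-app",
    "max_worker_processes": "4",
    "log_level": "Warning",
}


def apply_baseline(
    server: Dict[str, str], baseline: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Reset drifted settings in place; return (key, old, new) per change."""
    changes = []
    for key, wanted in baseline.items():
        found = server.get(key, "<missing>")
        if found != wanted:
            server[key] = wanted
            changes.append((key, found, wanted))
    return changes


# A server where someone tweaked two settings by hand.
server = {
    "app_pool_identity": "svc-app",
    "max_worker_processes": "16",
    "log_level": "Debug",
}

first_run = apply_baseline(server, BASELINE)   # corrects the two drifted values
second_run = apply_baseline(server, BASELINE)  # nothing left to change

print(first_run)
print(second_run)
```

Because a repeat run changes nothing when the server already matches the baseline, it is safe to run the configuration phase on every deployment.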
Most of this provisioning process is straightforward – we rely primarily on PowerShell scripts that interact with the servers and the virtualization interface to complete the setup. One aspect that is unique to vRad is how we deal with our application container, Internet Information Services (IIS), on each server.
If you aren’t familiar with IIS, it’s similar to Tomcat or Apache. If you aren’t familiar with Tomcat or Apache:
IIS is an application built into Windows Server that hosts websites and web applications. It allows us to host web services, ClickOnce applications, and our web portal applications.
There are a number of ways to manage IIS – search “manage IIS” on Google and the first page will most likely have 3 – 4 different tools for managing IIS. To name a few, you can use VBScript, PowerShell and its modules, AppCmd, REST APIs, XML tools, Desired State Configuration and several others. At vRad, consistency and reproducibility are incredibly important, so we chose to manage IIS in a way that ensures the configuration remains consistent – the key to managing thousands of configurations.
We store a copy of each server type’s base applicationhost.xml file with our source code. This file is an XML file that contains the complete configuration for IIS. During the provisioning and follow-up server configuration scripts, we deploy this file to the server, which resets the entire configuration to our desired, known state. Then, we change anything that is environment specific.
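The “reset, then customize” pattern can be sketched as follows. This is an illustration in Python rather than our PowerShell tooling, the XML is a heavily simplified fragment, and the site name and host names are hypothetical. The key point is that the output is always built from the base file in source control and never from whatever currently sits on the server.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the base file kept in source control (hypothetical values).
BASE_CONFIG = """<configuration>
  <system.applicationHost>
    <sites>
      <site name="Portal" id="1">
        <bindings>
          <binding protocol="http" bindingInformation="*:80:portal.example.test" />
        </bindings>
      </site>
    </sites>
  </system.applicationHost>
</configuration>"""


def render_config(base_xml: str, host_names: dict) -> str:
    """Start from the known base, then apply only environment-specific values."""
    root = ET.fromstring(base_xml)
    for site in root.iter("site"):
        host = host_names.get(site.get("name"))
        if host is None:
            continue  # no override for this site; keep the base value
        for binding in site.iter("binding"):
            # bindingInformation is "ip:port:host"; this simple split assumes no IPv6.
            ip, port, _old_host = binding.get("bindingInformation").split(":")
            binding.set("bindingInformation", f"{ip}:{port}:{host}")
    return ET.tostring(root, encoding="unicode")


# Hypothetical environment-specific override for one dev environment.
rendered = render_config(BASE_CONFIG, {"Portal": "portal.dev07.example.test"})
print(rendered)
```

Any manual change made on the server simply disappears at the next deployment, because the rendered file replaces it wholesale.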
This management of IIS is custom built to ensure that we have granular control of our environments. It is essential for us to know that when we release a software revision to production, we’re aware of any changes to the environments we may need to consider.
The other key aspect of maintaining our development environments is database restores.
Database Restores: This is the process of taking a backup of a database in production and restoring it to a development or test environment.
First, we start with a copy of production data. This data is replicated via SAN technology from production to our development environments.
A SAN is a “Storage Area Network” – basically, it’s a whole bunch of hard drives that are put together into a device and represented as one really big hard drive.
There are hundreds of SAN vendors and different SANs provide different functionality. We happen to use Nimble SANs.
Once the data is replicated to the development SAN, the fun begins. The data is replicated and stored as “volumes”. A volume is like a virtual disk – think of it like the “C:\” drive on your computer. You probably have other drives or volumes – perhaps a D:\ drive that represents your DVDs or an “H” drive that represents a spot on the network. These volumes contain the copies of the databases from production that we need to use in order to create a development environment.
We take these replicated volumes and perform a zero copy clone (ZCC) of each of them.
“What’s a zero copy clone?”
Great question. This is a SAN technology that enables us to make a copy without actually copying anything. If you have a movie on your computer and you make 10 copies of it and put them in different folders, they take up a considerable amount of space. And we all hate running out of hard drive space on our computers.
vRad’s databases are pretty large – larger than the hard drive on the device you’re using to read this – and we have about 20 different ones, with over 30 environments. So that’s a lot of development databases, and that would take up a lot of space.
A ZCC of a volume allows us to “share” space. So instead of copying that movie 10 times, the hard drive knows it is exactly the same movie and so it only takes a little bit extra space each time it is copied. Maybe it is a home movie and you want to edit it – that’s okay too. The ZCC technology will save changes to a separate location. This technology enables us to very quickly “copy” our entire set of databases for usage by a development environment.
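For the curious, the sharing idea can be sketched in a few lines. This is a toy copy-on-write model in Python, not how any particular SAN implements it; the class names and block counts are made up. Each clone reads from the shared parent until it writes a block, and only the written blocks take up new space.

```python
from typing import Dict


class Volume:
    """A fully stored volume: every block takes real space."""

    def __init__(self, blocks: Dict[int, str]) -> None:
        self.blocks = dict(blocks)

    def read(self, n: int) -> str:
        return self.blocks[n]

    def stored_blocks(self) -> int:
        return len(self.blocks)


class Clone:
    """A clone that shares its parent's blocks and stores only its own changes."""

    def __init__(self, parent: Volume) -> None:
        self.parent = parent
        self.delta: Dict[int, str] = {}

    def read(self, n: int) -> str:
        if n in self.delta:
            return self.delta[n]
        return self.parent.read(n)  # unchanged block: served from the parent

    def write(self, n: int, data: str) -> None:
        self.delta[n] = data  # the change is saved separately; parent is untouched

    def stored_blocks(self) -> int:
        return len(self.delta)


production = Volume({i: f"prod-{i}" for i in range(1000)})
clones = [Clone(production) for _ in range(30)]
clones[0].write(7, "scrubbed")  # one environment edits one block

shared_total = production.stored_blocks() + sum(c.stored_blocks() for c in clones)
full_copy_total = production.stored_blocks() * (1 + len(clones))
print(shared_total, full_copy_total)  # prints "1001 31000"
```

In this toy model, thirty clones plus one edit cost 1,001 blocks instead of 31,000, and creating a clone takes no copying at all, which is why the “copy” step is so quick.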
Once these databases are created, we attach them to a running Microsoft SQL Server instance. Then, we cleanse them of any unnecessary data and add limited permissions so the project teams can access the data. This process takes about 20 minutes – many of our environments are restored nightly or multiple times a day, so that we are always in sync with what’s happening in the production environment.
These two processes – VM provisioning and database restoration using ZCC technology – take less than an hour.
Compare that to our old process: getting a ‘database restore’ to an environment to ensure consistency with production was an hours-long, complicated process that took huge amounts of space on our SANs and often failed near the end, needing to be completely restarted. As I said before, the process was completely unmanageable.
My job title has the word “DevOps” in it, so I am a bit biased, but this problem of environment stability was huge and was costing us a lot of lost time, confusion and inefficiency. Not only were we spending valuable engineering time tracking down environment issues, but the lack of environment consistency impacted our ability to drive timely updates. We believed strongly that getting a handle on our environments would help us code more efficiently.
Imagine if every day you came into your office, sat down to use your computer, and someone else had been using your office and computer all night. Nothing is quite the way you left it. You would still be able to adjust everything each morning and get your work done of course, but you would lose valuable time. This is similar to how environment management at vRad was prior to implementing more robust DevOps.
Today, the ability to quickly create a test environment, fill it with data, and then deploy applications to it has enabled vRad to develop faster and with a significant improvement in quality.
I hope you enjoyed this look into the vRad approach to managing dev environments; stay tuned for our next article on vRad Test Automation (#4). And remember, we’ll be tracking all 7 keys to unlocking a DevOps Culture, as they’re released, in the vRad Technology Quest Log.
Until our next adventure,
Brian (Bobby) Baker