Welcome back to the vRad Technology Quest Series. We’ve shared how vRad builds and deploys code (vRad Development Pipeline (#2)). This article is an in-depth look at our build automation strategy: how we build the system that builds the code. This is one of my favorite topics and I’ll be explaining some of the more technical aspects of the development pipeline.
vRad’s current build automation system is a toolkit called “BuildGuru”. I’ll explain how it works, why it works that way, and how we leverage it. But first, let’s start by exploring where vRad was 5 years ago and the values we hold that led us to our current solutions.
Prior to creating BuildGuru around five years ago, vRad had several different toolchains for building, packaging, and deploying our software. Two of our subsystems, Biz and RIS, utilized a series of hand-crafted scripts based on NAnt and PowerShell; meanwhile, our PACS also utilized NAnt, but through a completely separate system.
PowerShell: A scripting tool primarily used for interacting with Microsoft Windows.
NAnt: A “build” language and engine for Microsoft .NET that was based on a Java build engine called “Ant”.
I don’t want to disparage any of these systems – they were lovingly built by great software engineers. Yet, software is iterative, so we strive to always look at how we can do things better. For the sake of simplicity, I’ll refer to the previous version of our tool chain as “NAnt scripts”.
The NAnt scripts took approximately two and a half hours to create a build package; an hour and a half if we ran both the Biz and RIS scripts simultaneously. We have more than 100 different components or modules that we build and package, so an hour and a half isn’t all that bad. (It’s challenging to put compile times into perspective because there are so many factors, but as an example, a compilation of a Linux system averages a few hours.)
The duration of a vRad System build is particularly important because we value packaging everything, every time (I’ll share more about why below). When a full package takes over an hour, it’s difficult to build packages continually and, in turn, difficult to fix broken ones. Packages break for various reasons: perhaps a developer forgot to check in a file, or introduced a syntax error. As a result, it frequently took many attempts to create an initial Release Candidate.
A Release Candidate is a package of software that is ready for final testing prior to releasing to production. As testing occurs, any software bugs that are found are documented and fixed. Once the bugs are fixed, a new package is built, and that package becomes the new Release Candidate.
We performed major releases only 3 – 4 times a year; when the time came to build a release candidate, the team would begin the build process to create the package and it would fail. The team would make updates to fix the source code and begin the process again. This would generally take quite a few iterations, resulting in an average of 3 days to create the first package for testing.
On top of the packaging time, deployments took another hour and a half, for a grand total of three hours to package and deploy.
In comparison to the NAnt scripts, BuildGuru can package and deploy our entire platform in 40 minutes (packaging takes 20 minutes and deployment takes 20 minutes)…that’s it. It’s also a single system, based on metadata that is simpler to maintain than the hand-crafted NAnt scripts.
We’ve come a long way.
In order to understand an organization’s build automation tools, understanding what the organization values in a development toolchain is fundamental. Let’s examine vRad’s values around development.
We believe that there is inherently less risk and less code cruft to always re-compile everything with each build.
Cruft is jargon for anything that is left over, redundant and getting in the way.
This means that we do not rely on internal package feeds or a compiled library. For instance, Application “App1” might require an in-house library called “Lib1”. In turn, Lib1 might require multiple parts of our framework (which are also libraries) like LibF1, LibF2, LibF5 and LibF6. At package time, the source code for each module is recompiled; this means that we always have the newest libraries from the top tiers of our code down to the bottom tiers of our code. It also means we do not have much need for dependency management. For example, we do not have situations where App1 has LibF5 v2.1 and App2 has LibF5 v4.5; that simply never happens for us.
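To make the build-everything idea concrete, here is a minimal sketch of how a full build order could be derived from source-level references. It is not how BuildGuru is implemented; the dependency edges and the `build_order` helper are assumptions for illustration, using the module names from the example above.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: module -> in-house modules it references.
# The module names come from the example in the text; the edges are assumed.
DEPENDENCIES = {
    "App1": {"Lib1"},
    "Lib1": {"LibF1", "LibF2", "LibF5", "LibF6"},
    "LibF1": set(),
    "LibF2": set(),
    "LibF5": set(),
    "LibF6": set(),
}


def build_order(dependencies):
    """Return every module, ordered so that dependencies compile first."""
    return list(TopologicalSorter(dependencies).static_order())


order = build_order(DEPENDENCIES)
print(order)
```

Because every module is compiled from source on each run, there is only ever one version of LibF5 in a package, which is why the version-mismatch scenario never arises.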
The major downside to this approach is that it increases testing needs. If we change LibF5 it might impact 100 applications. We mitigate this through test automation (covered in my last post: Smarter Test Automation (4)) and some internal tools that ensure traceability from changes to applications so we can verify that test scenarios are covered.
Another common question I get for this value is “doesn’t that make the build times a lot longer?” The answer was surprising to me when I first investigated – not really. Analysis shows that our 20-minute build (using BuildGuru) of our entire system could be shortened by only about 20 – 40 seconds if we converted from a build-everything model to a dependency model.
If you’ve read the articles about our environments and general processes, we have a fairly robust set of environments that we manage. We break things up into sub-systems (Biz Apps, PACS and RIS). During development, we deploy our full suite – all three sub-systems – to each stack; during releases, we deploy our three sub-systems on three separate days spread across a week and a half. We also test like that – a Biz Apps test environment will have Biz Apps updated, but the other sub-systems might not be if we aren’t deploying them before Biz Apps. Biz Apps might be at version v12.1 while RIS and PACS are at version v12.0 for instance.
For each sub-system, we deploy everything for a release (we do perform limited patches that are excluded from this rule). The reasons for this are similar to why we value building everything, every time.
It’s just simpler.
We don’t deal with a matrix of versions and we know everything is tested each time and that it works. And on the off-chance our engineers get called at 2 a.m. on Wednesday after a company party and need to investigate something, they aren’t stuck tracking down what version of software each component is and looking up that version of the code.
Most of our builds happen on a set of servers dedicated to building our software (we do about 6,000 full package builds each year; our development branch alone often sees 12 full package and deployment cycles each day). We go to great lengths to keep engineer PCs doing engineering and not compiling parts of the system they aren’t currently concerned with. That being said, we value the ability for developers to perform full builds if needed (or if they want to just for fun). Each engineer has the power of the package and deployment system on their laptops. Our engineering culture strongly values enabling our developers – we want them to feel comfortable changing all parts of our platform. This helps us drive innovation and make important large scale changes that keep our platform up-to-date.
A build system is typically comprised of three components: a build server, a trigger system, and the build tooling that does the actual work.
At vRad, the build server and the trigger system aren’t particularly special. For the server itself, we use a single physical server with 64 cores, 65 GB and some solid-state drives (SSD) on the Peripheral Component Interconnect Express (PCIe) bus. Basically, a big hunk of hardware.
We use Atlassian Bamboo for a trigger mechanism. Like other such systems, Bamboo allows us to do scheduled triggers, repository triggers (based on watching our code repository for changes) and manual (UI button) triggers.
SSDs use internal flash chips to house files, while hard disk drives (HDDs) use a physical, spinning disk to keep everything contained. The benefits of SSDs over their older HDD counterparts are numerous, including a more compact size, lower power requirements, and much faster speeds across the board. This means a computer will boot and launch programs faster. PCIe SSDs take it a step further, by using one of the highest bandwidth channels available for blindingly fast speeds. (Definition from How-To Geek)
The third component of our build system, BuildGuru, is a custom software application that generates MSBuild scripts based on application metadata.
For each module that we build, BuildGuru identifies about 20 different attributes. These attributes include which server to deploy to, what type of application the module is (web app, ClickOnce Windows app, SQL, etc.), what subsystem(s) the module belongs to and how to configure (or reconfigure) the module. These attributes help create groupings of applications. For instance, one group is all of our ClickOnce applications; another group is all of the applications that are deployed during a PACS release.
BuildGuru automatically creates roll-up targets based on these various groupings. So, if I need to deploy all web applications in our environment I could make a call to do so – BuildGuru automatically knows which modules to deploy based on the metadata.
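The grouping step can be pictured with a short sketch. This is an illustration under assumptions, not BuildGuru code: the `Module` fields, the sample module names and the `rollup_groups` helper are all hypothetical, and only two of the roughly 20 attributes are modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical slice of the per-module metadata."""
    name: str
    app_type: str   # e.g. "web", "clickonce", "sql"
    subsystem: str  # e.g. "Biz", "RIS", "PACS"


# Made-up modules, for illustration only.
MODULES = [
    Module("Portal", "web", "Biz"),
    Module("Viewer", "clickonce", "PACS"),
    Module("Worklist", "web", "RIS"),
    Module("ArchiveDb", "sql", "PACS"),
]


def rollup_groups(modules):
    """Group module names by each attribute value they carry."""
    groups = defaultdict(list)
    for module in modules:
        groups[f"type:{module.app_type}"].append(module.name)
        groups[f"subsystem:{module.subsystem}"].append(module.name)
    return dict(groups)


groups = rollup_groups(MODULES)
print(groups["type:web"])        # ['Portal', 'Worklist']
print(groups["subsystem:PACS"])  # ['Viewer', 'ArchiveDb']
```

Adding a new module to the metadata places it in every matching group automatically, which is the property the text relies on.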
This is really important at vRad for two reasons.
The first is the size of our environments. Our production environment consists of nearly a hundred servers made up of five different application server types and four different database server types. Our development environments, which are smaller versions of production, are made up of nearly 300 different servers in total. And across those servers we deploy over 100 modules to each of our 30 environments – some modules to single servers, some to multiple servers of the same type and some to servers of several different types. Our management of these environments and our applications spread across them relies on being able to use BuildGuru to say “deploy web apps to the PACS server” without necessarily knowing what each individual web application is.
Second, the module list is constantly changing. We build new services, remove old web applications, or change configuration continually as we improve our platform and build new features. Keeping track of what application is what type and what server (or servers) it needs to belong to and the configuration associated with that can be really difficult. Since BuildGuru automatically builds the roll-up targets, we do not have to worry about remembering to add modules to the right place for them to get deployed, configured, and otherwise managed. This includes configuration. One of our most common usages of BuildGuru is to simply reconfigure our environment. Perhaps a developer wants to turn a configuration-based feature on: he or she simply needs to update our configuration master file and call “Reconfigure” and BuildGuru will reach out to the environment specified and update configurations for all modules required.
To discuss build systems in a Microsoft environment, it’s important to have a basic understanding of how Microsoft projects are structured. Each application has a file (often called a project file) that contains information about individual code files, other projects being referenced, 3rd party libraries and other details. A developer, working in Visual Studio, doesn’t generally load up a project file, however – instead she will load a solution file. A solution file is a collection of projects with some metadata describing the order to build the projects and a little bit of configuration information.
Quite a few Microsoft companies are able to leverage built-in Visual Studio tools to do builds and deploys without needing much, if any, additional tooling. In addition, many Microsoft shops utilize a product called Team Foundation Server (TFS) or the cloud-based version, Visual Studio Team Services (VSTS). These products come with additional tools, including source code repositories and build engines.
Visual Studio and TFS come with built-in functionality similar to what BuildGuru provides. Often, this is solution-based. You might have a solution that has your “Store Front” application – maybe you called it StoreFront.sln. You can tell TFS to build, package, and deploy the web application (project) in that solution and TFS will compile the solution, package up (usually zip) the output of the compilation and then copy the files over to a web server.
We chose to build projects instead of solutions. This was not a trivial decision – most .NET/Microsoft build systems utilize solutions. We feel that solutions are fairly arbitrary – we have over 100 solutions in our code base at any given time and are quite often adding new solutions and deleting others to match the work we are doing. The build system itself does build solutions – but it doesn’t use them for packaging.
We chose MSBuild for the primary language of the build system. MSBuild is a declarative, XML-based language similar to Ant or NAnt with quite a bit of extensibility (we use the MSBuild Extension Pack v4.0 to supplement). MSBuild is a Microsoft product and underlies quite a few systems like TFS. Our previous incarnations of build systems were primarily NAnt-based with a bit of batch and PowerShell thrown in.
We rebuild our build system before every usage of it. Yes. That’s right. We build our build system before we use our build system to build our code. BuildGuru builds itself, essentially. You might cringe at this – that’s ok, we did too. BuildGuru is based on metadata that describes each of our applications. The metadata includes the name of the project, the friendly name of the application, the username associated, the type of application, the configuration file of the application and about 15 other pieces of information. BuildGuru takes this information in and produces MSBuild files that are then actually utilized as the “build system.”
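As a rough illustration of metadata-driven generation, the sketch below turns a small metadata list into an MSBuild file with one build target per application. BuildGuru’s real metadata schema and output are not shown in this article, so the `METADATA` fields, the project paths, the `Build_<name>` naming and the `generate_msbuild` helper are all assumptions. The only MSBuild features relied on are the standard `Target` element and the built-in `MSBuild` task.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata; the real set has about 20 fields per application.
METADATA = [
    {"name": "Portal", "project": "src/Portal/Portal.csproj"},
    {"name": "Worklist", "project": "src/Worklist/Worklist.csproj"},
]


def generate_msbuild(metadata):
    """Emit MSBuild XML containing one build target per application."""
    project = ET.Element("Project")
    for app in metadata:
        target = ET.SubElement(project, "Target", {"Name": f"Build_{app['name']}"})
        # The MSBuild task compiles another project file from within a target.
        ET.SubElement(target, "MSBuild", {"Projects": app["project"], "Targets": "Build"})
    ET.indent(project)  # pretty-print; requires Python 3.9+
    return ET.tostring(project, encoding="unicode")


msbuild_xml = generate_msbuild(METADATA)
print(msbuild_xml)
```

Regenerating this file before every run is cheap, which is part of why rebuilding the build system each time is less alarming than it sounds.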
At first, using BuildGuru to build itself was just a development technique – we found ourselves constantly iterating on our design decisions to handle new scenarios. Since MSBuild is declarative, there was frequently a lot of rework. We considered trying to build something outside of MSBuild instead of generating MSBuild – but it just seemed so 90s of a solution. Finally, we settled on BuildGuru building itself while we were developing it and then scrapping the code generation as a backup plan. But in the end, it worked so well and made itself so flexible, we simply kept it as is; so BuildGuru builds itself prior to each usage.
BuildGuru is a DevOps Swiss Army Knife for vRad. It builds solutions, builds projects, builds SQL, builds, builds, builds. And it also deploys; it deploys web applications, services, SQL, Windows applications, deploys, deploys, deploys. It also runs tests; it runs NUnit, FxCop, StyleCop, Ranorex Tests, all sorts of things.
I’ve shared that BuildGuru starts off with metadata about each application. From there, BuildGuru performs a whole bunch of code base searching. We have quite a large code base as you might imagine, so we optimized this to be pretty quick. The searching is divided up into different types of searches. We search for solution files, project files, NUnit files, SQL files and a few other things.
Let’s look at solutions first. I mentioned we don’t use solutions to package our code. We do, however, use them with Visual Studio to develop our software – they are the workspace a developer uses. BuildGuru fires up and searches the whole branch for all files with the extension “.sln”. It then generates MSBuild targets to compile every solution. This is important because solutions are arbitrary for us – at any given time, a developer might create a solution specific to a project, with a handful of applications and their dependent projects. We encourage this. When the developer checks that solution in, it’s important to us that it remains current, so we compile it on our continuous integration plan. This quick compilation check validates that all solutions still work, so developers can keep working. This is an opt-out mechanism, meaning that by default, all solutions are compiled and a developer must add exclusions to BuildGuru configuration in order to exclude a solution from being compiled.
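The opt-out search can be sketched in a few lines. This is an assumption-laden illustration rather than BuildGuru’s code: the `find_solutions` helper, the exclusion-by-file-name rule and the sample directory layout are all hypothetical.

```python
import tempfile
from pathlib import Path


def find_solutions(branch_root, excluded_names=frozenset()):
    """Return every .sln file under branch_root, minus explicit opt-outs."""
    return sorted(
        path
        for path in Path(branch_root).rglob("*.sln")
        if path.name not in excluded_names
    )


# Demo against a throwaway directory tree with made-up solution names.
with tempfile.TemporaryDirectory() as root:
    for relative in ("Biz/Portal.sln", "RIS/Worklist.sln", "Scratch/Spike.sln"):
        solution = Path(root, relative)
        solution.parent.mkdir(parents=True)
        solution.touch()
    found = find_solutions(root, excluded_names={"Spike.sln"})
    solution_names = [path.name for path in found]

print(solution_names)  # ['Portal.sln', 'Worklist.sln']
```

Since the default is to include everything, a newly checked-in solution is compiled on the next run without anyone having to register it.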
The next search we have is for projects. We develop mostly in C#, so BuildGuru searches for all files with a “.csproj” file extension; it then compares those files with the applications configured in the metadata and if there is a match will generate targets based on that file.
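A minimal sketch of the matching step follows. It assumes that a project file is matched to the metadata by its file name without the extension; the article does not say how BuildGuru actually matches, and the `match_projects` helper and sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def match_projects(project_paths, configured_apps):
    """Keep only the .csproj files whose name matches a configured application."""
    matches = {}
    for raw_path in project_paths:
        path = PurePosixPath(raw_path)
        if path.suffix == ".csproj" and path.stem in configured_apps:
            matches[path.stem] = raw_path
    return matches


# Made-up search results and metadata names, for illustration only.
found_files = [
    "src/Portal/Portal.csproj",
    "src/Experiments/Spike.csproj",
    "src/Worklist/Worklist.csproj",
]
matched = match_projects(found_files, configured_apps={"Portal", "Worklist"})
print(matched)
```

Projects with no metadata entry, like the `Spike` project here, are found by the search but produce no targets.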
Each searcher we have returns a list of “Target Providers”; these target providers do the actual MSBuild generation. So a searcher looking for project files will return about a dozen different handles for target providers for each project – a build provider, package provider, deployment provider, code analysis provider, etc.
Most other searches work similarly – we search for SQL, NUnit, File Repositories, etc. and return a list of target providers.
The target providers are nothing special – they generate the MSBuild based on the metadata of the project (or SQL, file repository, etc.).
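The searcher and provider relationship might look something like the sketch below. The class name, the four actions and the target naming scheme are assumptions made for illustration; the article mentions about a dozen provider types per project, of which only a few are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetProvider:
    """Hypothetical provider: knows how to name one target for one project."""
    action: str   # e.g. "Build", "Package", "Deploy"
    project: str

    def target_name(self):
        return f"{self.action}_{self.project}"


# Assumed subset of the provider types returned for each project.
PROJECT_ACTIONS = ("Build", "Package", "Deploy", "CodeAnalysis")


def project_searcher(project_names):
    """Return one provider per (project, action) pair."""
    return [
        TargetProvider(action, project)
        for project in project_names
        for action in PROJECT_ACTIONS
    ]


providers = project_searcher(["Portal"])
print([provider.target_name() for provider in providers])
# ['Build_Portal', 'Package_Portal', 'Deploy_Portal', 'CodeAnalysis_Portal']
```

Keeping the searchers separate from the providers means a new kind of file only needs a new searcher, while a new kind of action only needs a new provider.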
When all the searching is done, BuildGuru writes out the MSBuild files using the target providers. Again, the targets themselves aren’t particularly important. The real power of BuildGuru is in the roll-ups. Remember, vRad has over a hundred different modules split across over 10 different server types (application servers, database servers, file repository servers, etc.) and there are nearly a dozen different types of modules involved.
BuildGuru takes all the metadata, including application type, deployment type and sub-system, and builds roll-up targets – targets that basically call the other targets. Let’s say a developer wants to deploy all Portal Web Applications – no problem, there’s a target for that. How about all web applications across the system? No problem, there’s a target for that. A full subsystem? Target for that. All applications that start with the letter C and have a file named “HelloWorld”? There’s not a target for that, but I can make you one in a few minutes.
vRad builds and deploys over 100 modules across hundreds of servers in development and production on a daily basis. Our toolkit to do this, BuildGuru, is designed to make managing all of these modules and deployments simpler and more efficient. This toolkit has enabled us to create the development pipeline we described in vRad Development Pipeline (#2). The flexibility and adaptability of BuildGuru allow vRad to constantly build, deploy and test our software.
I hope you learned something from this look under the hood of our build system – thanks for joining me on my favorite leg of the journey! As always, stay tuned for the next post in the series on vRad’s Software Security (6). And remember, we’ll be tracking all 7 keys to unlocking a DevOps Culture, as they’re released, in the vRad Technology Quest Log.
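One way to picture a roll-up target is as an MSBuild target with no work of its own, only a list of other targets to run. The sketch below expresses that with MSBuild’s standard `DependsOnTargets` attribute; whether BuildGuru uses that attribute or another mechanism is not stated here, and the module list, target names and `rollup_target` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up module metadata, for illustration only.
MODULES = [
    {"name": "Portal", "type": "web", "subsystem": "Biz"},
    {"name": "Console", "type": "web", "subsystem": "PACS"},
    {"name": "ArchiveDb", "type": "sql", "subsystem": "PACS"},
]


def rollup_target(name, modules, predicate, action="Deploy"):
    """Build a target that depends on the per-module targets selected by predicate."""
    members = [f"{action}_{m['name']}" for m in modules if predicate(m)]
    return ET.Element("Target", {"Name": name, "DependsOnTargets": ";".join(members)})


all_web = rollup_target("Deploy_AllWeb", MODULES, lambda m: m["type"] == "web")
pacs = rollup_target("Deploy_PACS", MODULES, lambda m: m["subsystem"] == "PACS")
# An ad hoc roll-up is just another predicate over the same metadata.
starts_with_c = rollup_target(
    "Deploy_StartsWithC", MODULES, lambda m: m["name"].startswith("C")
)

for target in (all_web, pacs, starts_with_c):
    print(ET.tostring(target, encoding="unicode"))
```

Because each roll-up is only a filter over the metadata, a one-off grouping costs a single extra predicate rather than a hand-maintained list.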