Automating a multi-platform build
This is the second post in the series describing our development infrastructure. We started by describing the external goals related to multi-platform support. In the first post we also explained why we need to test on different platforms separately. We concluded that post by observing that instead of “support[ing] as many configurations as possible”, a far more feasible goal would be to “support as many users as possible”. From this customer-oriented goal we have now derived the development goals we use internally:
- Automate builds. A Plumbr release has to be built without any manual intervention.
- Automate tests. A Plumbr release has to be verified automatically using different testing techniques ranging from unit to acceptance tests.
- Automate infrastructure. The build infrastructure has to be able to launch and destroy the server instances used to build and test Plumbr automatically.
- Provide transparency. Whether it is a feature implemented or a bug fixed, we need to know the versions in which this change is present. If an exception is thrown, we need to be able to map the obfuscated stack trace back to the actual source code using the correct version of the obfuscation map.
Fulfilling those goals builds the foundation for a solid and verified release. So let’s dig in and see what we have done in order to achieve all this.
To give you some background: each Plumbr release consists of close to 20 different deliverables. These include 10 native binaries for different platforms alongside the Java artifacts, such as the javaagent itself, internally used dashboards and the demo application. As you can imagine, building such a release is neither a simple nor a quick task. Due to the number of dependencies and the overall complexity, we are required to automate a lot. We have applied several build automation techniques to reach the desired one-click build nirvana.
Let’s start by introducing the platforms used. If you recall, we had to support five different OS families with different processor architectures underneath:
- We have a completely normal Amazon EC2 instance running a recent Ubuntu distribution. All the Linux builds happily run on this single box. So far so good. The rest of the gang is not so mainstream and boring though.
- We have a Mac Mini sitting in the office. This Mac brings a lot of bang for the 599 bucks we spent on it, running:
- A museum-grade SPARC machine running the Solaris builds for the SPARC architecture. As the noise during the build resembles a jet fighter, we had to move this one to a separate room in the office to make the sound bearable.
This gang of two physical and five virtual machines is orchestrated by a Jenkins node responsible for starting and stopping the builds. Let’s look into what a build consists of.
The first step of the build is acquiring the source code from version control. We are using a Bitbucket repository for source code management. The model in the repository is truly simple: all the development and stabilization for new releases is done in the default branch. This has been possible due to the small team size we have had so far. From the build perspective, getting updates is as easy as monitoring a single branch and pulling the changes upon discovery.
The next step is to build the native agent. We need this part of the platform to hook into the low-level JVM internals. As this is impossible to achieve in Java, we had to write this part of the code in C. The native part of Plumbr is built with the help of a good old makefile, containing several conditional branches supplying suitable flags for different compilers and linkers. For example, we use gcc on Linux and Mac. On Microsoft platforms we use ‘cl’, which takes 29, I kid you not, command line parameters to compile a DLL. Then we have the bitness issue, meaning that we have to build 32-bit and 64-bit versions of our native code. So each of those five virtual machines is responsible for building two native libraries.
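To make the "conditional branches per compiler and bitness" idea concrete, here is a minimal sketch in Python rather than make. It is not our actual makefile: the function names, the file names (`agent.c`, `agent`) and the reduced flag set are assumptions for illustration only, and the real `cl` invocation is far longer.

```python
def native_build_command(os_family, bits, source="agent.c", output="agent"):
    """Return the compiler command (as an argument list) for one native library.

    File names and the minimal flag set are illustrative assumptions.
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    name = output + str(bits)
    if os_family == "windows":
        # cl has no bitness switch: 32 vs 64 bit depends on which toolchain
        # environment the command is run in. /LD asks for a DLL.
        return ["cl", "/LD", source, "/Fe" + name + ".dll"]
    if os_family == "linux":
        # -m32 / -m64 select the bitness, -shared -fPIC build a shared object.
        return ["gcc", "-m" + str(bits), "-shared", "-fPIC", source,
                "-o", "lib" + name + ".so"]
    if os_family == "mac":
        # Apple's gcc builds shared libraries with -dynamiclib.
        return ["gcc", "-m" + str(bits), "-dynamiclib", source,
                "-o", "lib" + name + ".dylib"]
    raise ValueError("no compiler branch for " + os_family)


def build_matrix(os_families):
    """One command per (OS family, bitness) pair: two libraries per platform."""
    return [native_build_command(os_family, bits)
            for os_family in os_families
            for bits in (32, 64)]
```

Calling `build_matrix(["linux", "mac", "windows"])` yields six commands, two per platform, which is the same multiplication that turns five platforms into ten native binaries.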
Now the build is ready for the Java modules to be assembled. The Java section of Plumbr consists of several modules, such as the agent itself, the graphical user interface and the demo application shipped along with the distribution. All of these are built from the same repository with the help of a multi-project Gradle script. We used Ant in our early days, but the complexity of evolving that XML mess forced us to switch to the richer alternative. Pity that we did not do this a year earlier.
After all this, we have finally compiled everything we need to run Plumbr. In this phase both the unit and integration tests are run. These tests are written using TestNG and verify the correctness of the build.
Now is the time to obfuscate the generated code to prevent reverse engineering of our super-algorithms. This is done using ProGuard. Obfuscation is in itself a simple process, becoming complex only when you care about preserving the original stack traces. Throw in the need to support multiple versions, all with their own obfuscation maps, and you have another dimension in your build process to worry about. And you start feeling sympathetic for the guy nurturing the build.
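Why the version of the map matters is easiest to see in code. The sketch below is a simplified stand-in for ProGuard's own ReTrace tool, not what our build actually runs: it restores class names only (method names stay obfuscated), and the `maps_by_version` dictionary and both function names are hypothetical. It relies on the ProGuard mapping file layout, where class lines are unindented and look like `original.Name -> obfuscated.name:`.

```python
import re


def parse_class_mappings(mapping_text):
    """Return {obfuscated_class: original_class} from ProGuard mapping text."""
    classes = {}
    for line in mapping_text.splitlines():
        if not line or line.startswith("#") or line[0].isspace():
            continue  # skip blanks, comments and indented member lines
        if line.endswith(":") and " -> " in line:
            original, obfuscated = line[:-1].split(" -> ", 1)
            classes[obfuscated.strip()] = original.strip()
    return classes


# Matches the "at some.Class.method(" part of a stack frame.
_FRAME = re.compile(r"(\bat\s+)([\w.$]+)\.([\w$<>]+)\(")


def deobfuscate(stack_trace, maps_by_version, version):
    """Restore class names using the map that belongs to the given release."""
    classes = parse_class_mappings(maps_by_version[version])

    def restore(match):
        prefix, cls, method = match.groups()
        # Unknown classes (e.g. JDK frames) are left untouched.
        return prefix + classes.get(cls, cls) + "." + method + "("

    return _FRAME.sub(restore, stack_trace)
```

Feeding a stack trace through the map of a different release either leaves the frames obfuscated or, worse, resolves them to the wrong classes, which is why every published build has to keep its own map.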
All the build results are finally packaged into a ZIP file published to the Artifactory repository. The final ZIP consists of the Java agent we will later ship to our clients along with the two platform-specific binaries for each OS (a 32-bit and a 64-bit version).
Now is the time for Jenkins to start orchestrating acceptance tests. In short, acceptance tests are a set of applications that are either deployed to an application server or run in standalone mode. Plumbr is then attached to the application and users are emulated to verify that we are indeed able to find all known leaks in those applications.
With the acceptance tests in place, we need to run them in different environments. Namely, in more than 200 of them. The process involves launching a specific virtual machine, starting a pre-configured application server in it, deploying a test to the server and launching the simulation. All this thrown into a single sentence sounds simple. In reality we have sunk endless hours into both extracting the test cases into the test applications and configuring the machines to include different JDKs and application servers. And we are still a long way from the goal of supporting our 200 required configurations: in the current form we cover just the 50 most popular combinations.
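The four steps of a single acceptance run can be sketched as follows. This is an illustration under assumptions, not our Jenkins setup: the `driver` object, its method names and the configuration keys are all hypothetical, and `RecordingDriver` is only a stand-in that logs calls instead of touching real machines.

```python
def run_acceptance_test(config, driver):
    """Launch a VM, start the server, deploy the test app, simulate users.

    The VM is destroyed even when a step fails, so nothing is left running.
    """
    vm = driver.launch_vm(config["os"], config["jdk"])
    try:
        driver.start_server(vm, config["server"])
        driver.deploy(vm, config["test_app"])
        return driver.simulate_users(vm)
    finally:
        driver.destroy_vm(vm)


def run_all(configs, driver):
    """Run every configuration and collect results by configuration name."""
    return {c["name"]: run_acceptance_test(c, driver) for c in configs}


class RecordingDriver:
    """Hypothetical stand-in driver that only records the calls it receives."""

    def __init__(self, failing_servers=()):
        self.calls = []
        self.failing_servers = set(failing_servers)

    def launch_vm(self, os_name, jdk):
        self.calls.append("launch")
        return {"os": os_name, "jdk": jdk}

    def start_server(self, vm, server):
        if server in self.failing_servers:
            raise RuntimeError("server did not start: " + server)
        self.calls.append("start_server")

    def deploy(self, vm, test_app):
        self.calls.append("deploy")

    def simulate_users(self, vm):
        self.calls.append("simulate")
        return True  # True stands for "all known leaks were found"

    def destroy_vm(self, vm):
        self.calls.append("destroy")
```

The `try`/`finally` is the important design choice here: with hundreds of environment combinations, a failed server start must not leave a virtual machine behind.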
When all the tests have succeeded, the distribution is made public in the form of the latest nightly build on the Plumbr download page.
Creating an official release goes through the exact same process. The only extra steps added are tagging the release in Bitbucket and publishing the built artifacts into the Artifactory production repository.
As simple as that. It has taken only about a man-year to create the aforementioned process. Moral of the story? Java applications are definitely a huge leap ahead in terms of cross-platform compatibility. But instead of the Write Once, Run Anywhere concept you are better off with the Write Once, Test Everywhere approach. Otherwise you will end up shipping code that makes your end users’ lives miserable.
The post might seem familiar to those of you who participated in JavaOne Moscow this year. Indeed, you had a chance to hear me on stage with the same presentation. But for the rest of the ~8,310,700 who did not have a chance to be present, I hope I was able to introduce some interesting concepts. If so, subscribe to our Twitter feed to be notified about future posts.