As one of our previous blog articles already explained, our new stack is quite modern and based on several cutting-edge technologies: a JanusGraph graph database, RxJava 2, a Vert.x-based backend and an Angular 4 frontend.
In addition, we always question whether we should implement a service ourselves or integrate an existing one, either because it's not our core business or because others already offer a solution that meets our requirements. Consequently, our stack also contains services from third-party vendors, such as identity providers, message brokers, relational databases, and accounting and billing services. On top of that, we design for the cloud, and one of the core principles there is to be scalable, with no single infrastructure component or service that breaks the system because it is overloaded or simply unavailable. Below, you can see a simplified system architecture for our Goto Cloud initiative that we are currently working on:
Thanks to these principles and ideas we are ready for the cloud from an architectural and design perspective, but what does this mean for development, quality assurance, continuous integration, operations and production? All these areas, which are crucial for the success of an agile IT company, must deal with a lot more complexity because of our new heterogeneous microservice stack. Our developers need a good number of these infrastructure services for proper development, quality assurance requires the same set for testing, and so do our continuous integration, performance and functional testing machines. Sure, you can optimize the setups for all these areas. You may be able to skip some of the infrastructure components for particular scenarios, such as the LDAP import for continuous performance tests. Nevertheless, all these new heterogeneous services sound like a lot of work to manage throughout all stages and environments, right?
So the solution that we, and luckily many others before us, came up with is to code our infrastructure, and at Celum this idea takes the form of Docker. Docker is a software container platform that helps to provide a consistent experience from development machines, through all continuous delivery pipelines and hosts, to production environments on mainframes and cloud platforms. It is a powerful way of isolating not only the requirements of the software application from the actual operating system, but also software applications and their dependencies from each other on the same operating system. Applications can run side by side, which improves resource utilization on host operating systems and machines.
At Celum, we use Docker everywhere, and although we had a more or less tough start with it (more on this later), its usage is undisputed nowadays and we appreciate the huge benefits of this technology more every day. I'll present a couple of ideas on how Docker makes our life easier here at Celum, with examples of how we use it in development, quality assurance, continuous integration and testing, and finally production.
Our developers naturally have a strong focus on coding, and infrastructure, to be honest, is a necessity but not (yet) the goal. So ideally infrastructure should work hassle-free and out of the box. With Docker, we get a cross-platform container environment which provisions servers and runs integrations locally in the same way as they are run on production platforms. Docker therefore creates a consistent development experience across the Windows, Mac and Linux operating systems our teams are using, and helps to reduce the technology and process investments needed to support all platforms in development. A very popular example is the isolation of conversion tools for development: here we went as far as isolating invocations of individual tools such as ImageMagick, FFmpeg and LibreOffice with Docker, freeing the developers from error-prone cross-platform installations of these applications.
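To make the conversion tool example a bit more concrete, here is a minimal sketch of what wrapping a single tool invocation in a throwaway container could look like. It is not our actual tooling: the image name `example/imagemagick`, the `/work` mount point and the helper names are assumptions made for illustration only.

```python
import os
import subprocess


def dockerized_tool_command(image, tool_args, workdir):
    """Build a `docker run` command that executes a tool in a throwaway container."""
    host_dir = os.path.abspath(workdir)
    return [
        "docker", "run",
        "--rm",                     # remove the container once the tool exits
        "-v", f"{host_dir}:/work",  # expose the host directory as /work (assumed mount point)
        "-w", "/work",              # run the tool from inside that directory
        image,
        *tool_args,
    ]


def convert_image(source, target, workdir="."):
    """Convert an image with ImageMagick without installing it on the host."""
    # "example/imagemagick" is a hypothetical image that has `convert` on its PATH.
    command = dockerized_tool_command(
        "example/imagemagick", ["convert", source, target], workdir
    )
    return subprocess.run(command, check=True)
```

Calling `convert_image("in.png", "out.jpg")` would then only require Docker on the developer machine, whatever the host operating system is.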
The quality assurance team benefits as well from the technology abstraction, the unified configuration and the streamlined deployment processes introduced with Docker. The team doesn't have to care that much about the underlying technology stack anymore, because it knows the simplified lifecycle of starting, stopping and restarting containers. So no matter whether the development team decided to use Spring, Vert.x, Node, .NET or any other web application or microservice framework, the way these services are operated is largely the same. The same is true for deploying these applications, which means pulling the image, starting the container and monitoring the status of the container instead of following a proprietary and often cumbersome installation procedure. As for configuration, we established best practices for configuring applications, and if they are followed consistently, configuration is simplified as well, because all configuration values can be overridden at the container level.
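The unified lifecycle described above boils down to a handful of Docker CLI calls that look the same for every service. The following sketch only builds those command lines; the image name, container name and configuration keys are placeholders, and passing configuration as environment variables via `-e` is one possible convention, not necessarily the one every service uses.

```python
def pull(image):
    """Fetch the image from the registry."""
    return ["docker", "pull", image]


def run_detached(image, name, env=None):
    """Start a named container in the background, overriding config values via -e."""
    command = ["docker", "run", "-d", "--name", name]
    for key, value in sorted((env or {}).items()):
        command += ["-e", f"{key}={value}"]
    command.append(image)
    return command


def stop(name):
    return ["docker", "stop", name]


def restart(name):
    return ["docker", "restart", name]


def status(name):
    """Ask Docker for the container state, e.g. for monitoring."""
    return ["docker", "inspect", "--format", "{{.State.Status}}", name]


# Hypothetical service: the same calls apply whether it is built on Spring, Vert.x or Node.
print(" ".join(run_detached("example/service:1.0", "svc", {"LOG_LEVEL": "debug"})))
```

Each list can be passed to `subprocess.run` unchanged, so the operational steps stay identical across technology stacks.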
Our continuous integration and testing environments require test execution coordinators, test slaves and provisioned servers. Thanks to the simple Docker lifecycle and the unified interface for interacting with these servers (starting, stopping, resetting), provisioning the servers needed for integration testing has been greatly simplified. Currently we more or less hardcode the services required for a set of integration tests, but in the future we plan to move to handy annotations that specify the provisioning requirements of individual tests.
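Since the annotation-based approach is still only a plan, the following is purely a thought experiment about how declaring provisioning requirements per test might look. A Python decorator stands in for an annotation here, and the decorator name, the test names and the collection helper are all hypothetical.

```python
def requires_services(*services):
    """Attach the names of the required infrastructure services to a test function."""
    def decorate(test_function):
        test_function.required_services = tuple(services)
        return test_function
    return decorate


def services_to_provision(tests):
    """Collect the distinct services a set of tests needs, in first-seen order."""
    needed = []
    for test in tests:
        for service in getattr(test, "required_services", ()):
            if service not in needed:
                needed.append(service)
    return needed


@requires_services("cassandra", "rabbitmq")
def test_asset_import():
    pass  # hypothetical integration test


@requires_services("cassandra", "keycloak")
def test_login():
    pass  # hypothetical integration test


def test_pure_logic():
    pass  # needs no infrastructure at all
```

A test runner could call `services_to_provision` before a test set starts and bring up only the containers that are actually required.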
Last but not least, production benefits from well-tested and proven deployment procedures and technologies, resulting in fewer feedback rounds to development about something that broke in production although it already worked in development or QA. In addition, feedback from production reaches development much faster, because the same setup already exists there and doesn't have to be requested from the customer site to reproduce bugs. In the past, we had trouble reproducing bugs in development because they were rooted in infrastructure components that often simply didn't exist in development. Nowadays every developer can add a load balancer, an LDAP server, a database or other infrastructure components to their stack without making an extra round trip to our IT department to ask for software and potentially hardware resources.
Finally, thanks to the abstractions, mapping and integration capabilities introduced with Docker, we could also reduce the time required to understand the nitty-gritty details of the technology stacks of the applications that we operate at customer sites or in-house.
There is one other thing that I want to mention here explicitly and that cannot be stressed enough:
Invest into (de facto) standards
It usually pays off because of the scaling effects that can be achieved. And in the case of Docker and its achievement of mainstreaming container technologies, it does. During the last couple of years many of the big players in the industry adopted the Docker container technology and built integrations for cloud services, orchestration and clustering tools which are free to use and often work out-of-the-box, similar to the thousands of available Docker images for existing services that can be added to your own stack easily.
So to be clear on this one: Docker is awesome. But what were the things that gave us a hard time adopting it? I guess the very same reasons that make change hard in general, and in our case we learnt that we can improve on communicating and teaching new technologies right from the beginning.
Whenever change happens, some people are reluctant to appreciate or adopt it. And that's okay, because change is not always good. When we introduced Docker there was also some resistance within our company, and people immediately argued against it. We improved communication around the new technology and arranged presentations, including very simple prototypes solving our problems in operations, to better convey its benefits. Over time, people increasingly grasped and appreciated the idea that Docker helps us to automate and simplify mundane manual tasks and ultimately frees up time to work on much more fulfilling and intellectual tasks. The important lesson here is that change doesn't simply happen from one day to the next, no matter how convinced somebody is of a new technology. It requires steadiness and persistence in communicating the benefits and goals again and again.
Even Docker, which tries to be as simple as possible, has to be understood to a certain degree to be useful in the development process. When we started with Docker, a few of our early adopters created the docker-compose.yml for our development team, consisting of the infrastructure services needed for developing our new product line. The start was simple as well: people installed Docker Toolbox (and later Docker for Windows), ran 'docker-compose up -d' and started developing awesome new microservices. Shortly after, however, problems arose. Containers stopped unexpectedly and had to be restarted, and data was lost. People started to find their own solutions or recipes to deal with the problems that occurred: restarting the containers, restarting the Docker services and even restarting the development machines. Of course, it didn't take long until the term 'Docker Blocker' was coined for these kinds of impediments. With hindsight, all these problems are explainable and could have been prevented if a basic set of Docker knowledge and principles had been taught and coached early: state lives in the container unless it is mapped to the host, and the memory settings for Docker for Windows need to be increased, especially if you want to use Cassandra or other memory-intensive services.
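The first of those principles, that state is lost with the container unless it is mapped to the host, is easy to check for mechanically. As a rough sketch, assume the content of a docker-compose.yml has already been loaded into a dictionary; the two service definitions below are made up for illustration and are not our actual compose file.

```python
def services_without_volumes(compose):
    """Return the names of services that declare no volume mapping."""
    return sorted(
        name
        for name, definition in compose.get("services", {}).items()
        if not definition.get("volumes")
    )


# Illustrative structure mirroring the `services` section of a compose file.
compose = {
    "services": {
        "cassandra": {
            "image": "cassandra",
            # data directory mapped to the host, so it survives container removal
            "volumes": ["./data/cassandra:/var/lib/cassandra"],
        },
        "rabbitmq": {
            "image": "rabbitmq",  # no mapping: its state disappears with the container
        },
    }
}

print(services_without_volumes(compose))
```

Not every service without a volume is a problem, of course, but a list like this is a useful prompt to ask whether a service holds data that anyone would miss.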
If you decide to use third-party infrastructure or services instead of implementing them on your own, don't treat them as black boxes; study them appropriately. We decided to go with Cassandra, Keycloak, RabbitMQ, etc. and started with simple docker-compose files for creating containers for all these services. Whenever something didn't work out with these services, the first thing that happened was that Docker was blamed in the daily standup. With hindsight, most of these problems were not caused by Docker but by misconfigurations in the containerized services. The key lesson here is that just because it's very convenient to introduce and integrate new services via Docker, this doesn't relieve the team, ideally the whole team, of understanding and studying the new technology.
To sum up, Celum benefits a lot from the introduction of Docker. Although there were some challenges that we had to master when it came to communicating, adopting, teaching and understanding this new technology, it now helps us throughout all our departments to unify our procedures and technologies, get bugfixes into production quicker, reduce infrastructure costs and increase efficiency. Or as our head of QA stated this year, after being a little reluctant to adopt Docker in the beginning:
We should do everything with Docker in the future!
It feels good when you have to introduce a necessary change and eventually receive feedback like this :-)
Although we have already gone to great lengths to containerize our applications and use container technologies throughout the whole stack, we still have many challenges ahead of us. I guess automation is one of them: while Docker already helps to simplify provisioning test slaves, setting up test environments for functional tests and running performance tests, there is still a gap between a fully automated software development infrastructure and the manual tasks that still have to be executed. So the plan is to push automation consistently until adding new hardware is the only manual step, and it's already foreseeable (or reality, but not yet for everything at Celum) that all hardware is simply provisioned by cloud services... but more on that in a later blog article :-)