Google, Facebook, and Netflix have already done it: they rely on reactive, non-blocking software architectures to support highly efficient and scalable infrastructures. We at CELUM are among the first to adopt the same principles for enterprise software applications – combining domain-driven design with our very own strongly typed, graph-based data model. This is how we did it:
Nowadays software is confronted with more of everything: users, data, files, connections – you name it. Classical enterprise architectures are typically blocking ones, with all the usual problems, and adding more threads is the usual solution. Unfortunately, this model doesn't scale, because resources are wasted: for example, thread context switches, hidden deep inside the operating system, cost valuable processing time.
For the last 1.5 years, we at CELUM have been working on a completely new software architecture. The result is intriguing:
⦁ A common, statically typed data model, stored in a NoSQL graph database and directly accessible on both frontend and backend
⦁ Our very own fluent query and modification language for this data model, available in both TypeScript and Java
⦁ A CQRS based data flow throughout the architecture
⦁ A layered frontend architecture, combining Angular 2 with the Flux pattern
⦁ Loosely coupled reactive services and components, making heavy use of ReactiveX
⦁ A dockerized (automatic!) deployment for all parts
Thanks to this new architecture we achieve much higher scalability, performance, and resilience than with the classical Spring/Hibernate-based architecture we had before. From now on, it will serve as the basis for all our products.
Let us take a closer look at the most prominent parts of our new architecture.
The Entity Data Model is the main cornerstone of our new architecture. Our products manage and maintain a highly structured and interconnected data model that needs to adapt dynamically to the specific needs of our customers. Partitioning this data model upfront and distributing it among several microservices would therefore have a severe performance impact.
The basic idea is to query exactly as much data from the model as the use case requires. For example, to display the contents of a folder we need to get:
⦁ the subfolders,
⦁ the files in the folder,
⦁ the users who last modified the files,
⦁ the tags associated with the folders and files,
⦁ the number of versions of each file.
We store this data in a JanusGraph database with an Apache Cassandra backend. The different parts in the list above (we call them entities) are connected to each other by relations. E.g., a file and a user are connected by a “modifiedBy” relation, and files are connected to their parent folder by a “filesInFolder” relation. Using graph queries, we can get all that data in one single step, navigating the relations as needed for the specific use case.
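To make the idea concrete, here is a tiny TypeScript sketch of entities connected by labeled relations, traversed the way a graph query would. The relation names (“modifiedBy”, “filesInFolder”) come from the example above; everything else – the `Graph` class and its methods – is invented for illustration and is not our actual JanusGraph-backed implementation.

```typescript
// Minimal in-memory graph: entities are nodes, relations are labeled edges.
// Illustration only, not the real JanusGraph/Cassandra implementation.
type Entity = { id: string; label: string; props: Record<string, unknown> };
type Relation = { label: string; from: string; to: string };

class Graph {
  entities = new Map<string, Entity>();
  relations: Relation[] = [];

  addEntity(e: Entity): this { this.entities.set(e.id, e); return this; }
  addRelation(label: string, from: string, to: string): this {
    this.relations.push({ label, from, to }); return this;
  }
  // Navigate one relation hop, like a single step in a graph traversal.
  out(fromId: string, label: string): Entity[] {
    return this.relations
      .filter(r => r.label === label && r.from === fromId)
      .map(r => this.entities.get(r.to)!);
  }
}

const g = new Graph()
  .addEntity({ id: 'folder1', label: 'Folder', props: { name: 'Assets' } })
  .addEntity({ id: 'file1', label: 'File', props: { name: 'logo.png' } })
  .addEntity({ id: 'alice', label: 'User', props: { name: 'Alice' } })
  .addRelation('filesInFolder', 'folder1', 'file1')
  .addRelation('modifiedBy', 'file1', 'alice');

// One logical query: folder -> its files -> the users who last modified them.
const files = g.out('folder1', 'filesInFolder');
const modifiers = files.flatMap(f => g.out(f.id, 'modifiedBy'));
console.log(modifiers.map(u => u.props.name)); // [ 'Alice' ]
```

In the real system the whole chain of hops runs as one traversal inside the database, so the data for a use case arrives in a single round trip instead of one query per relation.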
There are already several mature graph databases out there. Thanks to Apache TinkerPop, they come with a common query language called Gremlin. Still, we decided to put our own data abstraction on top of it: it gives us clearly defined entity and relation classes, which serve as documentation, allow us to generate code, and, last but not least, provide strong typing (we are big fans of strongly typed code, by the way).
The big benefit of our solution is the fluent query and modification API of our data model, available in Java and TypeScript.
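The following TypeScript sketch shows what a fluent, strongly typed query over the Entity Data Model could look like. All the names here (`EntityQuery`, `from`, `follow`, `where`) are hypothetical, invented to illustrate the fluent style – they are not CELUM's actual API.

```typescript
// Hypothetical sketch of a fluent query builder; the method names are
// invented for illustration and do not reflect the real CELUM API.
class EntityQuery {
  private steps: string[] = [];
  private constructor(root: string) { this.steps.push(`root(${root})`); }

  static from(entityType: string): EntityQuery {
    return new EntityQuery(entityType);
  }

  follow(relation: string): this {
    this.steps.push(`follow(${relation})`); return this;
  }
  where(predicate: string): this {
    this.steps.push(`where(${predicate})`); return this;
  }

  // In the real system this would be translated into a Gremlin traversal;
  // here we just render the accumulated steps.
  toString(): string { return this.steps.join('.'); }
}

const query = EntityQuery.from('Folder')
  .where("name == 'Assets'")
  .follow('filesInFolder')
  .follow('modifiedBy');

console.log(query.toString());
// root(Folder).where(name == 'Assets').follow(filesInFolder).follow(modifiedBy)
```

Because the builder methods return typed objects, an IDE can autocomplete the available relations at each step – which is exactly where strong typing pays off.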
The other big cornerstone of our new architecture is clearly its reactive and non-blocking design. We make heavy use of the ReactiveX standard, in the form of RxJava (with big thanks to Netflix!) and RxJS. That allows us to keep a consistent design across our system.
On the backend we rely on Vert.x, a superb reactive framework (think NodeJS on speed). It gives us:
⦁ reactive streams,
⦁ an event bus bridging the different components,
⦁ scaling via threads on a single machine,
⦁ scaling via multiple instances on different servers.
The event bus even reaches into the frontend, allowing us to easily send messages back and forth. Thus a service doesn't have to distinguish whether a request comes from another backend service or from a frontend client!
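A toy version of such an address-based event bus can be sketched in a few lines of TypeScript. This is only an illustration of the publish/consume idea – the class and method names mirror the spirit of the Vert.x event bus but are not its real API or bridge implementation.

```typescript
// Toy address-based event bus in the spirit of the Vert.x event bus.
// Illustration only, not the real clustered/bridged implementation.
type Handler = (message: unknown) => void;

class EventBus {
  private handlers = new Map<string, Handler[]>();

  // Register a consumer for an address.
  consumer(address: string, handler: Handler): void {
    const list = this.handlers.get(address) ?? [];
    list.push(handler);
    this.handlers.set(address, list);
  }

  // Deliver a message to every consumer registered on the address.
  publish(address: string, message: unknown): void {
    for (const h of this.handlers.get(address) ?? []) h(message);
  }
}

const bus = new EventBus();
const received: unknown[] = [];

// A "backend service" and a "frontend client" both just register consumers;
// the sender never cares on which side the consumer lives.
bus.consumer('folder.updated', msg => received.push(msg));
bus.publish('folder.updated', { folderId: 'folder1' });

console.log(received.length); // 1
```

The key property is that senders address messages by name, not by recipient, which is what makes backend services and frontend clients interchangeable consumers.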
On the frontend side we adopted Angular 2; however, ideas from the React world appealed to us too. Thus we ended up combining both and implemented the infamous Flux dataflow pattern ourselves. TypeScript comes naturally with Angular 2, and as we are big fans of static typing anyway (yes, I already mentioned that), it was very welcome and has already proven its power several times.
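The core of the Flux pattern – actions flowing through a central dispatcher into stores, with views subscribing to store changes – fits into a short TypeScript sketch. The class and action names below are illustrative, not our actual implementation.

```typescript
// Minimal Flux sketch: dispatcher -> store -> subscribed view.
// Names are illustrative; this is not CELUM's actual implementation.
type Action = { type: string; payload?: unknown };

class Dispatcher {
  private callbacks: Array<(a: Action) => void> = [];
  register(cb: (a: Action) => void): void { this.callbacks.push(cb); }
  dispatch(action: Action): void { this.callbacks.forEach(cb => cb(action)); }
}

class FolderStore {
  folders: string[] = [];
  private listeners: Array<() => void> = [];

  constructor(dispatcher: Dispatcher) {
    // The store reacts to actions; it is never mutated directly by a view.
    dispatcher.register(action => {
      if (action.type === 'ADD_FOLDER') {
        this.folders.push(action.payload as string);
        this.listeners.forEach(l => l());
      }
    });
  }
  subscribe(listener: () => void): void { this.listeners.push(listener); }
}

const dispatcher = new Dispatcher();
const store = new FolderStore(dispatcher);

// A view subscribes and re-renders whenever the store changes.
store.subscribe(() => console.log('render:', store.folders));

dispatcher.dispatch({ type: 'ADD_FOLDER', payload: 'Assets' });
```

The unidirectional flow (view dispatches an action, the store updates, the view re-renders) is what keeps state changes predictable even in a large Angular application.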
ReactiveX and Angular also go together very well. Thanks to the reactive architecture, our Entity Data Model and query mechanism are flexible: a complete subgraph can be queried in one single step and – here it comes – the first results are already displayed on the screen while the rest of the data is still being fetched from the database. This gives the application a really natural and responsive feel.
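The idea of rendering partial results while the query is still running can be sketched with a simple generator that delivers results in batches. This is a deliberately simplified stand-in for a reactive stream; the function and batch logic are invented for illustration.

```typescript
// Sketch of progressive result delivery: the query yields results in
// batches, and the view renders each batch immediately instead of
// waiting for the complete result set. Purely illustrative.
function* queryInBatches(all: string[], batchSize: number): Generator<string[]> {
  for (let i = 0; i < all.length; i += batchSize) {
    // In reality each batch would be emitted asynchronously by the driver.
    yield all.slice(i, i + batchSize);
  }
}

const rendered: string[] = [];
for (const batch of queryInBatches(['a.png', 'b.png', 'c.png'], 2)) {
  rendered.push(...batch); // render partial results right away
}
console.log(rendered); // [ 'a.png', 'b.png', 'c.png' ]
```

With a real ReactiveX stream the batches would arrive asynchronously, but the consumer-side pattern is the same: react to each emission instead of awaiting the whole result.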
All of this is now bundled into a set of Docker containers, which allows for a lot of flexibility in the system architecture and already helps us in our daily work. All automatic deployments are done with Docker, and starting a nightly deployment on a local machine is just a matter of executing a docker-compose command.
Combining Vert.x verticles, Maven modules, and Docker containers was a key element. In our setup there is at most one verticle per Maven module. We can then put one or more verticles, including their non-verticle Maven modules, into a Docker container. Thanks to the Vert.x cluster, the verticles will find each other even if they reside in different containers.
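As a rough illustration, such a setup could be described in a docker-compose file like the one below. The service and image names are invented for this example and do not reflect our actual deployment; the point is simply that each container hosts one or more verticles and that clustering lets them find each other.

```yaml
version: "3"
services:
  # Hypothetical services; each container hosts one or more verticles.
  entity-service:
    image: example/entity-service:nightly
    environment:
      - CLUSTER_ENABLED=true   # lets Vert.x verticles discover each other
  query-service:
    image: example/query-service:nightly
    environment:
      - CLUSTER_ENABLED=true
  janusgraph:
    image: janusgraph/janusgraph
  cassandra:
    image: cassandra
```

With a file like this, `docker-compose up` brings up the whole nightly stack on a local machine.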
Unfortunately, a single blog post is far too short for all the nitty-gritty details as well as the cool features of our new architecture. Therefore, we want to turn this into a series in which our experts will give insights into the results of their work.
If you would like to contribute as well, just get in contact and tell us about it!
Or, if you are as excited about this work as we are, send an application to firstname.lastname@example.org