Last time, we gave an overview of our persistence stack. You learned that we use a statically typed data model that can be accessed via a statically typed query language on top of Apache TinkerPop. The article also covered the main steps involved in querying data from the front-end:

  1. Specify the query in the environment specific language (TypeScript or Java) in a type-safe way
  2. Translate the query into a language-agnostic and serializable Statement
  3. Send the Statement to the back-end
  4. Transpile the Statement to the Gremlin language
  5. Execute the Gremlin traversal and push the results to the front-end in a reactive manner

This time, we focus only on the first step and look at how to express type-safe queries in our back-end code (Java). Thus, we will trade cute Gremlins for creepy Generics. Nevertheless, bear with me for one reason:

Type-safe queries pay off. While implementing real features, we learned that the compile-time checks initially slow us down when writing code, but later speed us up by reducing debugging effort. In particular, these checks uncover detrimental data model changes before we run any code. Finally, our fluent interface enables the IDE to provide code completion and to highlight type errors:

[Figure: IDE typeahead for queries]

Specifically this article allows you to:

  • Get to know our Entity Data Model
  • Understand how Java Code interacts with our Entity Data Model
  • See how the Java Compiler verifies the correctness of a Query at compile-time
  • Learn about the limitations of our type-safety approach

Let’s start.

The Entity Data Model (EDM)

Our data model consists of Entities that are connected via Relations. Both have a unique ID and may have several attributes. They have a specific type that defines the available attributes on the Entity or the Relation:

[Figure: entity model]

For the time being, Entities and Relations are mapped directly to Vertices and Edges within a JanusGraph database. However, among other features, we support abstract types and multi-inheritance. Our Marketing Product Management Solution features the well-known paradigm of folders and files. As folders and files share attributes like name or creation date and support common use cases like moving or renaming, we reflected these facts in our data model.

For example, the name and creation date attributes are defined on the abstract type ContentItem from which both the concrete types File and Folder inherit. Additionally, Folder inherits from the abstract type ContainerElement to indicate that it may contain other items as well as have a parent container.

[Figure: content and container elements]

We are now able to put a File into a Folder by creating a Relation of type ContainerHasElementsRelation that points from the Folder to the File. As this relation type mandates that its source must be some sort of container, the compiler would complain if we tried to establish such a relation between two files. In fact, depending on how sophisticated the data model is, this allows us to detect semantic errors at compile time.
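To make the idea concrete, here is a minimal sketch of how a relation type can pin down its source and target types with generics. The classes `RelationType`, `Entity`, and the method `connect` are hypothetical and much simpler than our real descriptors; only the EDM type names are taken from the model above.

```java
// Hypothetical, simplified stand-ins for the EDM types named in the article.
interface AnyEntityType {}
interface ContentItemType extends AnyEntityType {}
interface ContainerElementType extends AnyEntityType {}
interface FileType extends ContentItemType {}
interface FolderType extends ContentItemType, ContainerElementType {}

// A relation type fixes its source and target entity types as type parameters.
final class RelationType<SOURCE extends AnyEntityType, TARGET extends AnyEntityType> {
    final String name;
    RelationType(String name) { this.name = name; }
}

final class Entity<T extends AnyEntityType> {
    final String id;
    Entity(String id) { this.id = id; }
}

final class RelationSketch {
    static final RelationType<ContainerElementType, ContentItemType> CONTAINER_HAS_ELEMENTS =
            new RelationType<>("ContainerHasElementsRelation");

    // Accepts only sources and targets that are sub types of what the relation type demands.
    static <S extends AnyEntityType, T extends AnyEntityType> String connect(
            RelationType<S, T> type, Entity<? extends S> source, Entity<? extends T> target) {
        return source.id + " -[" + type.name + "]-> " + target.id;
    }

    public static void main(String[] args) {
        Entity<FolderType> photos = new Entity<>("photos");
        Entity<FileType> beach = new Entity<>("beach.jpg");
        System.out.println(connect(CONTAINER_HAS_ELEMENTS, photos, beach));
        // connect(CONTAINER_HAS_ELEMENTS, beach, beach);
        // does not compile: FileType is not a ContainerElementType
    }
}
```

The commented-out call is the "relation between two files" case: the compiler rejects it because no type argument for the source satisfies both the relation type and the file entity.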

Model becomes Code

As our query language needs to leverage the model’s type information, we need to represent our Entity and Relation Types as Java Types. We call them EDM Type Descriptors. Their code is generated automatically via a code generator that utilizes the Eclipse Modeling Framework and Xpand. This allows developers to design data models graphically within Eclipse:

[Figure: graphical data model with entity, content, and container types]

That being said, we do not look further into this code generation and assume our EDM Type Descriptors as given throughout this article.

Compile-time Verification

Let’s start by looking at how the query language works with the types. Loading the items within a folder called Photos would look like this (simplified, as we skip information about the executing user):


Query<ContentElementType> query = Query.fromEntitiesOfType(FolderType.instance())
    .restrict(R.attributeValue(FolderType.NAME, AttributeMatcher.equalTo("Photos")))
    .followRelationsOfType(ContainerHasElementsRelationType.instance())
    .build();

In English, this reads:

  1. Start with all folders.
  2. Only proceed with folders that have an attribute called Name with the value “Photos”.
  3. Navigate to the folders’ content via the ContainerHasElementsRelation type.

The compiler needs to verify two aspects:

First, whether the query is syntactically correct. The query language uses chained method calls on Builder classes to implement a fluent interface. As each part of the query defines the available successor methods, a defined grammar emerges. This allows the compiler to detect queries that do not make sense structurally, such as navigating directly from a Relation to another Relation without visiting the source or target Entity.
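The following sketch illustrates how such a grammar can emerge from builder classes alone. The builders `EntityStep` and `RelationStep` and their methods are hypothetical and far simpler than our real builders; the point is only that each builder exposes just the steps that may legally follow.

```java
// Hypothetical builders: each one only offers the steps that may legally follow.
final class EntityStep {
    private final String path;
    EntityStep(String path) { this.path = path; }

    // From an entity we may step onto a relation ...
    RelationStep toRelationsOfType(String relationType) {
        return new RelationStep(path + " -[" + relationType + "]");
    }

    // ... or finish the query.
    String build() { return path; }
}

final class RelationStep {
    private final String path;
    RelationStep(String path) { this.path = path; }

    // From a relation we can only continue to its target entity.
    // There is deliberately no toRelationsOfType(...) and no build() here.
    EntityStep toTarget() {
        return new EntityStep(path + "-> target");
    }
}

final class FluentGrammarSketch {
    static EntityStep fromEntitiesOfType(String entityType) {
        return new EntityStep(entityType);
    }

    public static void main(String[] args) {
        String query = fromEntitiesOfType("Folder")
                .toRelationsOfType("ContainerHasElements")
                .toTarget()
                .build();
        System.out.println(query); // Folder -[ContainerHasElements]-> target
        // fromEntitiesOfType("Folder").toRelationsOfType("A").toRelationsOfType("B");
        // does not compile: RelationStep has no toRelationsOfType method
    }
}
```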

Second, whether the used EDM types fit together, namely that:

  1. a Folder actually can have a Name attribute (for Step 2)
  2. it is sensible to navigate away from a Folder via a ContainerHasElementsRelation which actually specifies ContentContainerType, a super type of Folder, as source
  3. the results of the query are actually instances of ContentElement

The following diagram visualizes these aspects:

[Figure: builder classes, static type information, and graph data of the example query]

On the bottom are the builder classes with their fluent interface methods that govern the structure of the query. Except for the static entry method on Query, our simple query only utilizes the EntityQueryBuilder.

In the middle (green) you see the static type information that is inferred for the fluent method chain at compile time: Each method takes generic parameters that must fulfil certain constraints (green arrows downwards) and passes on static type information to its successor (green arrows upwards). The diagram does not visualize the constraints for conciseness. We will look at them in detail in the following sections.

Finally, on the top is the actual data that resides within the graph database. At execution time, our runtime infers the static type information from the Steps that the builders generated. This ensures that the actual data matches the type constraints as well.

Now let’s decompose the individual steps of our query:

1.     Starting with all Folders


Query.fromEntitiesOfType(FolderType.instance())

The signature of the method is:


<ENTITYTYPE extends AnyEntityType, ENTITYTYPEDEFINITION extends EntityTypeDefinition<ENTITYTYPEDEFINITION, ?, ENTITYTYPE>> EntityQueryBuilder<ENTITYTYPE> fromEntitiesOfType(ENTITYTYPEDEFINITION entityType);

We pass something we call an EDM Type Descriptor to the method. It allows us to refer to EDM types within our Java application. Also, the method mandates that several generic parameters need to match the EDM Type Descriptor for Folder. The relevant parts of the latter are:


public interface FolderType extends ContainerElementType, TaskAttachableType { […]
    public static FolderTypeDefinition instance() { […] }

    class FolderTypeDefinition extends EntityTypeDefinition<FolderTypeDefinition, FolderDto, FolderType> implements FolderType { […] }
}

The following figure illustrates the inheritance hierarchy around these definitions:

[Figure: Type Definition]

On the one hand, FolderType reveals that the EDM type Folder inherits from its super types via plain Java type inheritance: It inherits directly from ContainerElementType. In turn, as we follow the hierarchy, ContentItem inherits from AnyEntity. AnyEntity is the top level type of all entity types. Thus, FolderType matches the generic parameter ENTITYTYPE of the method signature.

On the other hand, there is an instance accessor method that returns an instance of FolderTypeDefinition. You may be puzzled why we introduce an interface and an inner class rather than just one class. This is easily explained when we contrast the demands of our EDM with the capabilities of Java:

First, our EDM supports multi-inheritance: Folder is a ContentItem as well as a TaskAttachable. In Java, a class can only inherit from a single class, but from multiple interfaces. Therefore, we model the inheritance aspect via interfaces altogether.

Second, FolderTypeDefinition inherits from an abstract class with generic parameters:


abstract class EntityTypeDefinition<T extends EntityTypeDefinition, DEFAULTDTO, TYPE extends AnyEntityType>

In Java, due to type erasure, classes or interfaces can only inherit once from a specific type that has generic parameters. This prevents us from letting our EDM Type Descriptors (e.g. FolderType) directly inherit from such a type, as in combination with our inheritance hierarchy, it would need to inherit from EntityTypeDefinition multiple times with different generic parameters. The generic parameters themselves entangle the EDM Type Descriptor with the type definition class and its data transfer object type (DEFAULTDTO) that queries can return as a result.
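A minimal sketch of this split, under simplifying assumptions: the DTO parameter is omitted, the `name()` method is added only so the example prints something, and the `Definition` interface in the comment is hypothetical.

```java
// Simplified, hypothetical versions of the descriptor types (DTO parameter omitted).
interface AnyEntityType {}
interface ContentItemType extends AnyEntityType {}
interface TaskAttachableType extends AnyEntityType {}

// Self-typed base class that ties a definition to its descriptor interface.
abstract class EntityTypeDefinition<T extends EntityTypeDefinition<T, TYPE>, TYPE extends AnyEntityType> {
    abstract String name();
}

// If the descriptor interfaces themselves carried the generic super type, e.g.
//   interface ContentItemType extends Definition<ContentItemType> {}
//   interface FolderType extends ContentItemType, Definition<FolderType> {}
// javac would reject FolderType, because Definition would be inherited
// with two different type arguments.

// Multi-inheritance is expressed through plain interfaces ...
interface FolderType extends ContentItemType, TaskAttachableType {
    static FolderTypeDefinition instance() { return FolderTypeDefinition.INSTANCE; }

    // ... and the generic super class is inherited exactly once, by the nested definition class.
    final class FolderTypeDefinition extends EntityTypeDefinition<FolderTypeDefinition, FolderType>
            implements FolderType {
        static final FolderTypeDefinition INSTANCE = new FolderTypeDefinition();
        @Override String name() { return "Folder"; }
    }
}

final class DescriptorSketch {
    public static void main(String[] args) {
        FolderType folder = FolderType.instance();
        System.out.println(FolderType.instance().name());          // Folder
        System.out.println(folder instanceof ContentItemType);     // true
        System.out.println(folder instanceof TaskAttachableType);  // true
    }
}
```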

Looking back at the method signature of fromEntitiesOfType, we see that the compiler can now take advantage of this entanglement and infer that we work with FolderType when we pass FolderType.instance().

Thus, the next method operates on an EntityQueryBuilder that is typed to FolderType.
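The inference can be reproduced with a stripped-down version of the types. This is only a sketch: the DTO parameter is left out, and the `name()` and `description` members are assumptions added so that the example has observable output.

```java
// Simplified, hypothetical versions of the article's types (DTO parameter omitted).
interface AnyEntityType {}

abstract class EntityTypeDefinition<T extends EntityTypeDefinition<T, TYPE>, TYPE extends AnyEntityType> {
    abstract String name();
}

interface FolderType extends AnyEntityType {
    static FolderTypeDefinition instance() { return FolderTypeDefinition.INSTANCE; }

    final class FolderTypeDefinition extends EntityTypeDefinition<FolderTypeDefinition, FolderType>
            implements FolderType {
        static final FolderTypeDefinition INSTANCE = new FolderTypeDefinition();
        @Override String name() { return "Folder"; }
    }
}

interface FileType extends AnyEntityType {}

final class EntityQueryBuilder<ENTITYTYPE extends AnyEntityType> {
    final String description;
    EntityQueryBuilder(String description) { this.description = description; }
}

final class Query {
    // ENTITYTYPE is never passed explicitly; it is inferred from the definition's super class.
    static <ENTITYTYPE extends AnyEntityType,
            ENTITYTYPEDEFINITION extends EntityTypeDefinition<ENTITYTYPEDEFINITION, ENTITYTYPE>>
            EntityQueryBuilder<ENTITYTYPE> fromEntitiesOfType(ENTITYTYPEDEFINITION entityType) {
        return new EntityQueryBuilder<>("entities of type " + entityType.name());
    }
}

final class InferenceSketch {
    public static void main(String[] args) {
        EntityQueryBuilder<FolderType> folders = Query.fromEntitiesOfType(FolderType.instance());
        System.out.println(folders.description); // entities of type Folder
        // EntityQueryBuilder<FileType> files = Query.fromEntitiesOfType(FolderType.instance());
        // does not compile: ENTITYTYPE is inferred as FolderType, not FileType
    }
}
```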

2.     Filtering by name

Now, let’s decompose the name matching part of the query:


Query.fromEntitiesOfType(FolderType.instance())
    .restrict(R.attributeValue(FolderType.NAME, AttributeMatcher.equalTo("Photos")))

First, it must be ensured that the attribute Name exists on the entity type Folder. Our code generation framework takes care of this, as the attributes of an EDM Element Type become fields within the EDM Type Descriptor. These fields point to the corresponding EDM Attribute Type Descriptor. Besides that, the generics for the Name Attribute Type Descriptor class define that the attribute’s value must be of type String:


public interface ContentItemType extends AnyEntityType {

    public static final Name NAME = new Name();
    […]
    public static class Name implements AttributeType<ContentItemType, String> { […] }

The signature of the restrict method looks like this (simplified):


public class EntityQueryBuilder<ENTITYTYPE extends AnyEntityType> […] {
    public <OUTTYPE extends AnyEntityType> EntityQueryBuilder<OUTTYPE> restrict(Restriction<? super ENTITYTYPE, OUTTYPE> restriction) { […] }

It accepts any restriction that takes an instance of the EDM Type Descriptor that the surrounding builder is typed to (namely ENTITYTYPE) or a super type of ENTITYTYPE. It returns a Builder typed to whatever static type the concrete restriction specifies (namely OUTTYPE). As determined by the earlier call of fromEntitiesOfType, in our case, ENTITYTYPE resolves to FolderType.

The signature of our attribute value restriction is as follows:


class R {
    static <T extends AnyElementType, DATATYPE> Restriction<T, T> attributeValue(AttributeType<? super T, DATATYPE> attributeType, AttributeMatcher<DATATYPE> matcher) { […] }

As its first argument, it accepts only an EDM Attribute Type Descriptor that connects the incoming element type with the given attribute data type. As its second argument, it accepts any attribute matcher that works with the given attribute data type. The output type of the restriction is the same as the input type. The actual type inference looks like this and shows that the restriction accepts as well as returns ContentItemType or one of its sub types:

[Figure: type inference for attributeValue]

Together with the surrounding restrict method, through type inference the compiler will only allow String attributes on FolderType or one of its super types:

[Figure: type inference for restrict]
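The interplay of restrict and attributeValue can be tried out with a reduced set of types. The sketch below is an approximation under assumptions: `AttributeType` is a plain class instead of a generated descriptor, the `description` fields exist only for output, and the first parameter of `attributeValue` is written with a `? super T` wildcard, which is what the inference described above requires.

```java
// Simplified, hypothetical versions of the article's types.
interface AnyElementType {}
interface AnyEntityType extends AnyElementType {}

final class AttributeType<OWNER extends AnyElementType, DATATYPE> {
    final String name;
    AttributeType(String name) { this.name = name; }
}

interface ContentItemType extends AnyEntityType {
    // The attribute is declared on ContentItemType and holds String values.
    AttributeType<ContentItemType, String> NAME = new AttributeType<>("name");
}
interface FolderType extends ContentItemType {}

final class AttributeMatcher<DATATYPE> {
    final DATATYPE expected;
    private AttributeMatcher(DATATYPE expected) { this.expected = expected; }
    static <DATATYPE> AttributeMatcher<DATATYPE> equalTo(DATATYPE expected) {
        return new AttributeMatcher<>(expected);
    }
}

final class Restriction<IN extends AnyElementType, OUT extends AnyElementType> {
    final String description;
    Restriction(String description) { this.description = description; }
}

final class R {
    // The attribute must be declared on T or a super type of T, and the matcher
    // must work on the attribute's data type.
    static <T extends AnyElementType, DATATYPE> Restriction<T, T> attributeValue(
            AttributeType<? super T, DATATYPE> attributeType, AttributeMatcher<DATATYPE> matcher) {
        return new Restriction<>(attributeType.name + " == " + matcher.expected);
    }
}

final class EntityQueryBuilder<ENTITYTYPE extends AnyEntityType> {
    final String description;
    EntityQueryBuilder(String description) { this.description = description; }

    <OUTTYPE extends AnyEntityType> EntityQueryBuilder<OUTTYPE> restrict(
            Restriction<? super ENTITYTYPE, OUTTYPE> restriction) {
        return new EntityQueryBuilder<>(description + " where " + restriction.description);
    }
}

final class RestrictSketch {
    public static void main(String[] args) {
        EntityQueryBuilder<FolderType> folders = new EntityQueryBuilder<>("folders");
        EntityQueryBuilder<FolderType> photos =
                folders.restrict(R.attributeValue(FolderType.NAME, AttributeMatcher.equalTo("Photos")));
        System.out.println(photos.description); // folders where name == Photos
        // folders.restrict(R.attributeValue(FolderType.NAME, AttributeMatcher.equalTo(42)));
        // does not compile: NAME holds String values, the matcher works on Integer
    }
}
```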

3.     Navigating to the child elements

The last step of the query moves from the selected Folder to the folder’s contents:


Query.fromEntitiesOfType(FolderType.instance()).restrict(R.attributeValue(FolderType.NAME,AttributeMatcher.equalTo(“Photos”))).followRelationsOfType(ContainerHasElementsRelationType.instance())

The signature of this method is:


class EntityQueryBuilder<ENTITYTYPE extends AnyEntityType> […] {
    <RELATIONTYPEDEFINITION extends RelationTypeDefinition<RELATIONTYPEDEFINITION, ? super ENTITYTYPE, TARGETTYPE, ?, ?>, TARGETTYPE extends AnyEntityType> EntityQueryBuilder<TARGETTYPE> followRelationsOfType(
        RELATIONTYPEDEFINITION relationType);
}

The type RelationTypeDefinition may remind you of the query’s initial method, which uses the type EntityTypeDefinition:



<ENTITYTYPE extends AnyEntityType, ENTITYTYPEDEFINITION extends EntityTypeDefinition<ENTITYTYPEDEFINITION, ?, ENTITYTYPE>> EntityQueryBuilder<ENTITYTYPE> fromEntitiesOfType(ENTITYTYPEDEFINITION entityType);

Indeed, followRelationsOfType is similar but more complex: On the one hand, it also needs to consider the return type of the previous method. Thus, it will bind FolderType to the generic parameter ENTITYTYPE:


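<RELATIONTYPEDEFINITION extends RelationTypeDefinition<RELATIONTYPEDEFINITION, ? super FolderType, TARGETTYPE, ?, ?>, TARGETTYPE extends AnyEntityType> EntityQueryBuilder<TARGETTYPE> followRelationsOfType(
    RELATIONTYPEDEFINITION relationType);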

On the other hand, it needs to check that the relation type ContainerHasElements accepts Folder as source and to infer the relation’s target entity type. It does so by utilizing the generic parameters of the RelationTypeDefinition which entangles the relation’s EDM Type Descriptor with the source and target entity types:


abstract class RelationTypeDefinition<T extends RelationTypeDefinition<T,SOURCETYPE,TARGETTYPE,DEFAULTDTO,TYPE>,SOURCETYPE extends AnyEntityType, TARGETTYPE extends AnyEntityType, DEFAULTDTO, TYPE extends AnyRelationType>

Analogous to an Entity Type definition interface, the instance method of our ContainerHasElementsRelation EDM Type Descriptor will return a definition instance:


public static ContainerHasElementsRelationTypeDefinition instance() {
    return ContainerHasElementsRelationTypeDefinition.instance;
}

class ContainerHasElementsRelationTypeDefinition extends RelationTypeDefinition<ContainerHasElementsRelationTypeDefinition, ContentContainerType, ContentElementType, ContainerHasElementsRelationDto, ContainerHasElementsRelationType> implements ContainerHasElementsRelationType { […]
}

Considering the relation type’s source type, we can verify that ContentContainerType is a super type of FolderType. Likewise, the relation type’s target type resolves to ContentElementType. In combination, type inference identifies ContentElementType as the result type:

[Figure: type inference for followRelationsOfType]
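A reduced version of this step could look as follows. It is a sketch under assumptions: the DTO and relation type parameters of RelationTypeDefinition are omitted, the `name()` and `description` members are added for output only, and FolderType is assumed to extend both ContentContainerType and ContentElementType.

```java
// Simplified, hypothetical versions of the article's types (DTO and relation type parameters omitted).
interface AnyEntityType {}
interface ContentContainerType extends AnyEntityType {}
interface ContentElementType extends AnyEntityType {}
// Assumption for this sketch: a folder is both a container and an element, a file only an element.
interface FolderType extends ContentContainerType, ContentElementType {}
interface FileType extends ContentElementType {}

abstract class RelationTypeDefinition<
        T extends RelationTypeDefinition<T, SOURCETYPE, TARGETTYPE>,
        SOURCETYPE extends AnyEntityType,
        TARGETTYPE extends AnyEntityType> {
    abstract String name();
}

interface ContainerHasElementsRelationType {
    static ContainerHasElementsRelationTypeDefinition instance() {
        return ContainerHasElementsRelationTypeDefinition.INSTANCE;
    }

    final class ContainerHasElementsRelationTypeDefinition
            extends RelationTypeDefinition<ContainerHasElementsRelationTypeDefinition,
                    ContentContainerType, ContentElementType>
            implements ContainerHasElementsRelationType {
        static final ContainerHasElementsRelationTypeDefinition INSTANCE =
                new ContainerHasElementsRelationTypeDefinition();
        @Override String name() { return "ContainerHasElements"; }
    }
}

final class EntityQueryBuilder<ENTITYTYPE extends AnyEntityType> {
    final String description;
    EntityQueryBuilder(String description) { this.description = description; }

    // The relation's source type must be ENTITYTYPE or one of its super types;
    // the relation's target type becomes the type of the returned builder.
    <RELATIONTYPEDEFINITION extends RelationTypeDefinition<RELATIONTYPEDEFINITION, ? super ENTITYTYPE, TARGETTYPE>,
     TARGETTYPE extends AnyEntityType>
    EntityQueryBuilder<TARGETTYPE> followRelationsOfType(RELATIONTYPEDEFINITION relationType) {
        return new EntityQueryBuilder<>(description + " -> " + relationType.name());
    }
}

final class FollowRelationSketch {
    public static void main(String[] args) {
        EntityQueryBuilder<FolderType> folders = new EntityQueryBuilder<>("folders");
        EntityQueryBuilder<ContentElementType> content =
                folders.followRelationsOfType(ContainerHasElementsRelationType.instance());
        System.out.println(content.description); // folders -> ContainerHasElements
        // new EntityQueryBuilder<FileType>("files")
        //         .followRelationsOfType(ContainerHasElementsRelationType.instance());
        // does not compile: ContentContainerType is not a super type of FileType
    }
}
```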

4.     Verifying the result type

Finally, it’s time to build the query:

public Query<ENTITYTYPE> build() { […] }

As we already know from the previous builder method that ENTITYTYPE resolves to ContentElementType, the compiler can safely verify the final variable assignment:


Query<ContentElementType> query = Query.fromEntitiesOfType(FolderType.instance())
    .restrict(R.attributeValue(FolderType.NAME, AttributeMatcher.equalTo("Photos")))
    .followRelationsOfType(ContainerHasElementsRelationType.instance())
    .build();

Limitations

I hope this article has convinced you that the presented kind of type-safety is a good thing. However, our approach also has limitations:

One does not simply talk about Generic Mismatches

When compile errors occur, at first glance, we often think about bugs within our generic builder structure. Yet, in nearly all cases, it turned out that the compiler was right and our query was flawed. The problem is that it’s hard to understand exactly what needs to be changed if the compiler complains:

In the above example, the actual issue is that we traverse the relation in the wrong direction. As javac gets very secretive once it starts complaining, we mostly solve such compile errors by thinking through our query again from a semantic perspective. Nevertheless, we clearly prefer its veto over running code that does not yield the desired result.

Super is not superior

When using our language for more elaborate purposes (e.g. control constructs or writing into the database), we encounter situations where we would like an element to be a super type of another generic parameter and then pass on the element’s type to another method in the chain. In other words, changing the well-known construct in line 1 into the one in line 2:

 // works

// compile error

// works
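For illustration, the three cases can be sketched with a small generic container. The `Box` class and the method names are hypothetical and unrelated to our query language; they only show where Java permits an upper bound, rejects a lower bound, and accepts a lower-bounded wildcard.

```java
// Hypothetical one-slot container, only here to have a generic type to talk about.
final class Box<T> {
    private T value;
    void set(T value) { this.value = value; }
    T get() { return value; }
}

final class SuperBoundSketch {
    // works: a named type parameter with an upper bound can be passed on as the return type
    static <T extends Number> T read(Box<T> box) { return box.get(); }

    // compile error: a named type parameter cannot declare a lower bound
    // static <T super Integer> Box<T> fill(Box<T> box) { box.set(1); return box; }

    // works: a lower bound is allowed on a wildcard, but the wildcard has no name to pass on
    static void fill(Box<? super Integer> box) { box.set(1); }

    public static void main(String[] args) {
        Box<Number> box = new Box<>();
        fill(box);                      // Number is a super type of Integer
        System.out.println(read(box));  // prints 1
    }
}
```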

As you can see, super cannot be used to capture a type parameter; it only accepts wildcards. The only workaround we found is to express this inheritance relationship via a separate generic type that must be instantiated for each method call. The simplest application without additional parameters is our upcast:


// ENTITYTYPE is defined by the surrounding builder
<SUPERTYPE extends AnyEntityType> EntityQueryBuilder<SUPERTYPE> upcast(Upcast<ENTITYTYPE, SUPERTYPE> upcast);

public final class Upcast<FROM, TO> {

    public static <SUB extends SUPER, SUPER extends AnyElementType> Upcast<SUB, SUPER> of() {
        return new Upcast<>();
    }
}

The notable fact is that the compiler itself can infer the inheritance relationship once we instantiate our Upcast object and pass it to a method, e.g.



EntityQueryBuilder b = Query.fromEntitiesOfType(FileType.instance()).upcast(Upcast.of());
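A self-contained variant of the upcast shows the inference at work. It is a sketch under assumptions: the builder is reduced to a `description` field, the type hierarchy is cut down to three entity types, and the target type of the assignment (ContentItemType) is chosen for illustration.

```java
// Simplified, hypothetical versions of the article's types.
interface AnyElementType {}
interface AnyEntityType extends AnyElementType {}
interface ContentItemType extends AnyEntityType {}
interface FileType extends ContentItemType {}
interface FolderType extends ContentItemType {}

final class Upcast<FROM, TO> {
    private Upcast() {}

    // The bound SUB extends SUPER is the only place where the inheritance relationship is stated.
    static <SUB extends SUPER, SUPER extends AnyElementType> Upcast<SUB, SUPER> of() {
        return new Upcast<>();
    }
}

final class EntityQueryBuilder<ENTITYTYPE extends AnyEntityType> {
    final String description;
    EntityQueryBuilder(String description) { this.description = description; }

    // SUPERTYPE is a named parameter, so later methods in the chain can refer to it again.
    <SUPERTYPE extends AnyEntityType> EntityQueryBuilder<SUPERTYPE> upcast(Upcast<ENTITYTYPE, SUPERTYPE> upcast) {
        return new EntityQueryBuilder<>(description);
    }
}

final class UpcastSketch {
    public static void main(String[] args) {
        EntityQueryBuilder<FileType> files = new EntityQueryBuilder<>("files");
        // The compiler infers Upcast<FileType, ContentItemType> and checks FileType extends ContentItemType.
        EntityQueryBuilder<ContentItemType> items = files.upcast(Upcast.of());
        System.out.println(items.description); // files
        // EntityQueryBuilder<FolderType> folders = files.upcast(Upcast.of());
        // does not compile: FileType is not a sub type of FolderType
    }
}
```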

It is hard to explain our use cases without diving into more details. In the end, it boils down to two interrelated facts:

  • Some later fluent interface method calls need to refer to the specific super type again and thus, the super type must be captured
  • Methods that work on relations must not only consider the relation type hierarchy, but also the entity type hierarchy of the relation type’s source and target entity types

 

However, I mainly included this as a solution idea for developers who have similar problems. If you are interested in further details or have a different solution idea, feedback is always appreciated.

 

Conclusion

 

This article explained the benefits of a query language that ensures compile-time safety. It also gave an idea on how to design one and the limitations that come with it. Stay tuned for further news on our Entity Data Model.