At the beginning are the triples
An introduction to Dgraph core concepts through hands-on examples.
The hands-on examples are a way to better understand each concept by experimenting directly with Dgraph. They are not a substitute for the product documentation.
Prerequisite
Get a Dgraph instance up and running
You can start in seconds by provisioning a Dgraph Cloud instance:
- In the Dgraph Cloud console, click Launch new backend.
- Select a plan, cloud provider, and region that meet your requirements.
- Type a name for your Dgraph Cloud instance.
- Click Launch.
For this blog post, we will work without a schema (more on this in the next episode): in the Dgraph Cloud console, click Settings and set Schema mode to flexible.
Access Ratel UI
Ratel is a graphical data visualization tool.
On your cloud instance, open Ratel from the left menu.
Self-managed cluster
You can also perform all the steps using a local learning environment, with a Dgraph instance and the Ratel UI running in Docker containers.
Episode 1 – At the beginning are the triples
Dgraph is all about interconnected data. The W3C uses the term “Semantic Web” to refer to the Web of linked data, and RDF is one of the Semantic Web standards for data interchange. RDF 1.1 defines a very simple yet powerful representation that allows structured and semi-structured data to be mixed, exposed, and shared across different applications. Its main focus is to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). Dgraph uses a simplified version of the standard. It is so simple that it takes the form of a line of text with up to 4 elements and a final dot, separated by spaces.
It is important to understand how powerful and transformative this simple approach is as it is one of the underlying principles of Dgraph.
Let’s look at an example.
<_:jedi1> <character_name> "Luke Skywalker" .
<_:leia> <character_name> "Leia" .
<_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .
<_:sith1> <has_for_child> <_:jedi1> .
<_:sith1> <has_for_child> <_:leia> .
The 4 elements of the notation are
- an identifier of a ‘thing’ we are talking about : the subject,
- a predicate,
- a literal value, or an identifier of another ‘thing’ : the object
- an optional list of characteristics associated with the predicate: the facets
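To make the four elements concrete, here is a minimal Python sketch that splits one such line into subject, predicate, object, and facets. It is a toy parser written for this post, not Dgraph's own parser: the `parse_triple` function and the `TRIPLE` pattern are hypothetical names, and the pattern assumes identifiers wrapped in `<...>`, double-quoted literals without escaped quotes, and facet values that contain no commas.

```python
import re

# Toy pattern for one line: <subject> <predicate> object (facets) .
# Assumption: no escaped quotes in literals, no commas inside facet values.
TRIPLE = re.compile(
    r'^\s*<(?P<subject>[^>]+)>\s+'
    r'<(?P<predicate>[^>]+)>\s+'
    r'(?:<(?P<node>[^>]+)>|"(?P<literal>[^"]*)")'
    r'\s*(?:\((?P<facets>[^)]*)\))?'
    r'\s*\.\s*$'
)


def parse_triple(line):
    """Split one line into its subject, predicate, object, and facets."""
    match = TRIPLE.match(line)
    if match is None:
        raise ValueError(f"not a recognised triple: {line!r}")

    # Facets are optional; keep their values as plain strings.
    facets = {}
    if match.group("facets"):
        for pair in match.group("facets").split(","):
            key, value = pair.split("=", 1)
            facets[key.strip()] = value.strip().strip('"')

    # The object is either another node (<...>) or a literal value ("...").
    object_is_node = match.group("node") is not None
    return {
        "subject": match.group("subject"),
        "predicate": match.group("predicate"),
        "object": match.group("node") if object_is_node else match.group("literal"),
        "object_is_node": object_is_node,
        "facets": facets,
    }


attribute = parse_triple(
    '<_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .'
)
relation = parse_triple('<_:sith1> <has_for_child> <_:jedi1> .')
```

Here `attribute` carries a literal object plus two facets, while `relation` carries a node object and no facets, which is the distinction between an attribute and a relationship.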
Those lines could be read as:
- There is a first thing, that we refer to as ‘jedi1’, having a character_name “Luke Skywalker”.
- There is a second thing, that we refer to as ‘leia’, having a character_name “Leia”.
- There is a thing that we refer to as ‘sith1’, having a character_name “Anakin”. The character_name of ‘sith1’ has a characteristic ‘aka’ equal to “Darth Vador” and a characteristic ‘villain’ equal to true.
- The thing referred to as ‘sith1’ has a relation has_for_child with the thing referred to as ‘jedi1’.
- The thing referred to as ‘sith1’ has a relation has_for_child with the thing referred to as ‘leia’.
Comments
- We can see those simple lines as a list of facts. They represent certain information and knowledge (at one point in time it was even a revelation). We will save those facts directly in Dgraph. As you can store facts, aka knowledge, in Dgraph as a graph, the term “knowledge graph” is sometimes used.
- We have used the term thing for the subject because nothing enforces a specific semantic for the subject. As a generic term, we prefer node or entity rather than thing.
- The notation _:jedi1 is called a blank node in the RDF specification. It is a temporary identifier of the node. It means that we don’t have a better way to reference the node we are talking about but, as we need to reference the same node in the next lines, as subject or object, we just refer to it as <_:jedi1> in this group of lines.
- The object part may be an entity, as in <_:sith1> <has_for_child> <_:jedi1>. In that case it’s natural to see the predicate as a relationship.
- The object part may be a literal value, as in <_:jedi1> <character_name> "Luke Skywalker". In that case, we understand the predicate as an attribute of the subject node.
Let’s play with Dgraph
In the Ratel console, select the Mutate tab and enter:
{
set {
<_:jedi1> <character_name> "Luke Skywalker" .
<_:leia> <character_name> "Leia" .
<_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true).
<_:sith1> <has_for_child> <_:jedi1> .
<_:sith1> <has_for_child> <_:leia> .
}
}
and hit RUN.
Check the JSON response tab:
{
"data": {
"code": "Success",
"message": "Done",
"queries": null,
"uids": {
"jedi1": "0x1",
"leia": "0x2",
"sith1": "0x3"
}
},
...
Dgraph has successfully saved the facts. It also tells us which unique identifiers it has assigned to the blank nodes we provided. We can use those identifiers to add or change facts about the entities.
Just copy the jedi1 identifier (0x1 in this case) and run another mutation:
{
set {
<0x1> <eye_color> "blue" .
}
}
It’s time to retrieve information from Dgraph using a query.
{
characters(func:has(character_name)) {
character_name @facets
eye_color
has_for_child { character_name }
}
}
This query can be understood as:
- Build a list called ‘characters’ with all the entities having a predicate character_name.
- Tell me the character_name of the found entities with all the attached characteristics (facets).
- I know that such entities may have information about eye_color, so give me that info too.
- I’m also interested in the has_for_child predicate. If it exists, it links to another entity and I want to know the character_name of that entity.
Select the Query tab, paste the query, and hit RUN.
Select the Graph tab to display the result … Et voilà.
Your first graph shows three entities and two relations.
If needed, move the nodes in the visualization to better see the relation name.
Select Luke to display the panel with the attributes for this node.
If you are curious, click on the JSON tab: it displays the query response in JSON format:
{
"data": {
"characters": [
{
"character_name": "Luke Skywalker",
"eye_color": "blue"
},
{
"character_name": "Leia"
},
{
"character_name|aka": "Darth Vador",
"character_name|villain": true,
"character_name": "Anakin",
"has_for_child": [
{
"character_name": "Luke Skywalker"
},
{
"character_name": "Leia"
}
]
}
]
}
,...
We will dig into that later, but the most remarkable point here is that the response has exactly the structure of the query. This makes it a very powerful tool for client applications, as they always know the structure of the response, even with dynamically created queries. This capability is referred to as being “declarative”: we declare what we are interested in.
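As a rough illustration of what this means for a client application, the Python sketch below loads the response shown above and reads it using only the names that appear in the query. The `describe` helper is a hypothetical name, and the code assumes that a predicate with no value for a node is simply absent from its JSON object and that facets come back under `predicate|facet` keys, as in the response above.

```python
import json

# The "data" part of the response, as returned for the query above.
response = json.loads("""
{
  "data": {
    "characters": [
      {"character_name": "Luke Skywalker", "eye_color": "blue"},
      {"character_name": "Leia"},
      {
        "character_name|aka": "Darth Vador",
        "character_name|villain": true,
        "character_name": "Anakin",
        "has_for_child": [
          {"character_name": "Luke Skywalker"},
          {"character_name": "Leia"}
        ]
      }
    ]
  }
}
""")


def describe(character):
    """Summarise one entry of the 'characters' list from the query."""
    # Optional predicates are read with a default because they may be absent.
    children = [c["character_name"] for c in character.get("has_for_child", [])]
    # Facet keys have the form "<predicate>|<facet>".
    facets = {k.split("|", 1)[1]: v for k, v in character.items() if "|" in k}
    return {
        "name": character["character_name"],
        "eye_color": character.get("eye_color", "unknown"),
        "children": children,
        "facets": facets,
    }


# 'characters' is the list name we chose in the query itself.
summaries = [describe(c) for c in response["data"]["characters"]]
for summary in summaries:
    print(summary)
```

The keys used in the code (`characters`, `character_name`, `eye_color`, `has_for_child`) are exactly the names written in the query, so the client never has to discover the shape of the response.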
Questions
What happened to my identifier _:jedi1?
<_:jedi1> was a temporary identifier. It is valid in the context of the transaction: all RDF lines in the same transaction referencing <_:jedi1> are referencing the same entity. Dgraph has generated a unique id for it and it was returned when we submitted the mutation. The ‘jedi1’ identifier is not saved by Dgraph.
You can easily add a triple to the transaction to save the “fact” that jedi1 is the identifier you use for this entity. Simply add:
<_:jedi1> <identifier> "jedi1" .
Note: the convention is to use “xid” (external id) as the predicate.
What if I run the mutation again?
If you submit the mutation
{
set {
<_:sith1> <character_name> "Darth Vador".
<_:jedi1> <character_name> "Luke Skywalker" .
<_:sith1> <has_for_child> <_:jedi1> .
}
}
again, Dgraph will see the temporary identifiers and will generate new entities with new internal ids for them. To avoid creating duplicate information, you have to check whether the entities already exist, using the external id or any other criteria, before adding the information. This is done with an upsert mutation.
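The behaviour can be pictured with a small in-memory model. The Python sketch below is only a toy stand-in written for this post: `ToyStore`, `mutate`, `find`, and `add_character_once` are hypothetical names, the sequential ids are an assumption, and none of this is Dgraph's implementation or its upsert syntax. Each `mutate` call stands for one transaction, and any string starting with `_:` is treated as a blank node.

```python
import itertools


class ToyStore:
    """Toy in-memory model of blank node handling; not Dgraph itself."""

    def __init__(self):
        self._next_uid = itertools.count(1)
        self.triples = []  # list of (subject_uid, predicate, object)

    def mutate(self, triples):
        """Apply one mutation; blank node labels are scoped to this call."""
        blank_to_uid = {}

        def resolve(term):
            # Assumption: any term starting with "_:" is a blank node label.
            if term.startswith("_:"):
                label = term[2:]
                if label not in blank_to_uid:
                    blank_to_uid[label] = hex(next(self._next_uid))
                return blank_to_uid[label]
            return term

        for subject, predicate, obj in triples:
            self.triples.append((resolve(subject), predicate, resolve(obj)))
        return blank_to_uid

    def find(self, predicate, value):
        """Return the uids of the subjects carrying predicate == value."""
        return [s for s, p, o in self.triples if p == predicate and o == value]


def add_character_once(store, name):
    """Check for an existing entity before adding it (the upsert idea)."""
    existing = store.find("character_name", name)
    if existing:
        return existing[0]
    return store.mutate([("_:new", "character_name", name)])["new"]


store = ToyStore()
mutation = [
    ("_:sith1", "character_name", "Darth Vador"),
    ("_:jedi1", "character_name", "Luke Skywalker"),
    ("_:sith1", "has_for_child", "_:jedi1"),
]
first = store.mutate(mutation)   # blank nodes get fresh ids
second = store.mutate(mutation)  # the same labels get new ids again
lukes = store.find("character_name", "Luke Skywalker")  # two entities now

leia = add_character_once(store, "Leia")
leia_again = add_character_once(store, "Leia")  # reuses the existing id
```

Running the same mutation twice leaves two entities named “Luke Skywalker” in the toy store, whereas the check-then-add helper returns the same id on the second call.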
What we have learned
- Dgraph handles data as a network of objects with materialized links between them. This makes Dgraph a natural choice for managing highly interconnected data.
- One way to inject information into Dgraph is to simply describe facts in the form of RDF triples and to send a mutation request. Dgraph extends the triples with the notion of ‘facets’, which are characteristics attached to a predicate.
- Facts can be added without declaring a schema, making Dgraph storage extremely flexible. We will see why one would eventually need a schema in the next episode.
- The concepts used in the RDF model, subject – predicate – object, translate naturally for humans into entities having attributes and relations between the entities.
- An intuitive visualization is a graph with nodes and edges: the kind of drawing we do when we sketch relations between things.
- Dgraph offers a query language to retrieve the knowledge stored in the graph DB, with a predictable response in JSON format.
References
https://www.w3.org/TR/n-quads/