RDF Nuts & Bolts

Introduction

In RDF 101 we presented a conceptual introduction to RDF. Now we need to get down to brass tacks. What does "real" RDF look like? How do you create it? What vocabulary do you use?

In this lesson, we will continue where RDF 101 left off by providing the details surrounding RDF creation and usage.

Objectives

In this lesson you will learn:

  • The different RDF serializations
  • How to read and write real RDF
  • Some common RDF vocabularies to use when getting started
  • What to look for in a triple store

Prerequisites

Today's Lesson

In RDF 101 we presented the simple RDF graph on the right and used it to explain what RDF is on a conceptual level. Let's keep with it but take it a step down.

Instead of representing it visually, the two triples that make up that graph can be represented as:

But that's still conceptual. You couldn't send that text to a database or another program that understands RDF.

Real RDF™

RDF is an abstract data model. What this means is that there is no one concrete representation of RDF.

To understand what this means, consider XML. Everyone knows what XML looks like. You can open up a text editor, write some XML and feed it to any program that knows how to parse XML and you're golden. (Note: XML does have an abstract data model called the XML Infoset, but unless you're into XQuery you never use it.)

With RDF, there is not a single way to represent it. Instead there are several valid serializations for RDF data, including:

  • RDF/XML. This is simply RDF represented as valid XML. It was originally proposed and used because of the plethora of existing tools that could parse and store XML. RDF/XML is verbose and somewhat difficult for a human to read and write, though it can be read and written by just about any RDF tool, so you'll see it around. It's usually not the best serialization to use.
  • N-Triples. N-Triples is a very basic RDF serialization. Its key feature is that only one triple exists per line so that it's very quick to parse and so that Unix command-line tools can easily manipulate it. It's also highly compressible, so large, public RDF sources like DBpedia often publish data in N-Triples form.
  • Turtle. If you're writing RDF today, you're probably writing it in Turtle. Turtle is significantly more compact than RDF/XML, more readable than N-Triples, and lacks the first-order logic extensions from Notation3. Furthermore, the SPARQL query language expresses RDF queries in almost exactly the same way.
  • TriG. TriG is Turtle but with support for named graphs. It's the de facto standard for serializing RDF with named graphs.
  • RDFa (RDF embedded in HTML). You can embed RDF data within normal web pages by using RDFa. This is a very powerful technique that has been used by major companies such as Best Buy. We'll cover RDFa in a future lesson.
  • Notation3 (N3 for short). This is a largely legacy serialization that was originally proposed by Tim Berners-Lee in 1998. It extended RDF with a form of first-order logic that was never very popular. I'm including it here for completeness.

We're not going to cover RDF/XML or Notation3 in any depth; you can find examples or references of both elsewhere if necessary. They're included here because you are bound to see them in your educational journey learning Semantic Web technologies.

I'll give a quick example of N-Triples here before moving on to Turtle, which is what we'll use throughout the rest of our lessons. Here are the two triples in N-Triples:

<http://www.cambridgesemantics.com/people/about/rob>
       <http://xmlns.com/foaf/0.1/name>
       "Rob Gonzalez" .
<http://www.cambridgesemantics.com/people/about/rob>
       <http://xmlns.com/foaf/0.1/member>
       <http://www.cambridgesemantics.com/> .

Each line contains a single triple (Note: due to your browser width, we included newlines within the two triples above, which ironically makes our N-Triples example not valid N-Triples!), and each line ends with a period. Not much more to it.
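To make the "one triple per line" property concrete, here is a minimal Python sketch of a line-by-line reader. It is an illustration under stated assumptions, not a conforming N-Triples parser: the pattern only handles IRIs and plain quoted literals, and the names `TRIPLE_LINE` and `parse_ntriples` are made up for this example.

```python
import re

# Simplified pattern for one N-Triples line: subject IRI, predicate IRI,
# then an object that is either an IRI or a plain quoted literal.
# Assumption: no blank nodes, language tags, datatypes, or escape sequences.
TRIPLE_LINE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$'
)


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) tuples, one per line."""
    triples = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_LINE.match(line)
        if match is None:
            raise ValueError(f"not a simple N-Triples line: {line!r}")
        subject, predicate, iri_object, literal_object = match.groups()
        # This sketch does not record whether the object was an IRI or a
        # literal; a real parser would keep that distinction.
        obj = iri_object if iri_object is not None else literal_object
        triples.append((subject, predicate, obj))
    return triples


ROB = "http://www.cambridgesemantics.com/people/about/rob"
DOCUMENT = (
    f'<{ROB}> <http://xmlns.com/foaf/0.1/name> "Rob Gonzalez" .\n'
    f'<{ROB}> <http://xmlns.com/foaf/0.1/member> <http://www.cambridgesemantics.com/> .\n'
)

triples = parse_ntriples(DOCUMENT)
```

Because every triple sits on its own line, the same data can be split, filtered, or counted with ordinary line-oriented tools without parsing the file as a whole.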

Turtle feels like a very natural representation of RDF. The best way to get into it is by example, so let's translate our little RDF graph into Turtle:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix csi: <http://www.cambridgesemantics.com/> .
@prefix csipeople: <http://www.cambridgesemantics.com/people/about/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

csipeople:rob foaf:name "Rob Gonzalez"^^xsd:string .
csipeople:rob foaf:member csi: .

Let's break this down a little.

The first lines simply declare prefixes to use as a shorthand throughout the rest of the document so that you don't have to write full URIs everywhere. For example, you'll notice csi:, csipeople:, and foaf: are used in the triples themselves. If we didn't use prefixes, the lines would look exactly like the N-Triples lines above with URIs surrounded by <> (with the caveat that Turtle allows for a single triple to wrap across lines).
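Prefix expansion is plain string concatenation, which the following Python sketch tries to show. The `PREFIXES` table mirrors the declarations in the Turtle example; the `expand` helper is a hypothetical name used only for illustration.

```python
# Prefix table mirroring the @prefix lines of the Turtle example.
PREFIXES = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "csi": "http://www.cambridgesemantics.com/",
    "csipeople": "http://www.cambridgesemantics.com/people/about/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    # The full URI is the namespace followed by the local part.
    return prefixes[prefix] + local


name_uri = expand("foaf:name")      # http://xmlns.com/foaf/0.1/name
rob_uri = expand("csipeople:rob")   # http://www.cambridgesemantics.com/people/about/rob
csi_uri = expand("csi:")            # empty local part: just the namespace URI
```

Note that `csi:` with nothing after the colon is a legal name: it expands to the namespace URI itself, which is how the example refers to the company.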

It's typical for a Turtle document to start with many declarations, and for people who write a lot of RDF to have a huge block that they simply copy & paste into every single document for convenience. For example, one set of prefixes I have is this:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dbp: <http://dbpedia.org/> .
@prefix dbpr: <http://dbpedia.org/resource/> .
@prefix dbpp: <http://dbpedia.org/property/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

A great resource for looking up popular namespaces and their prefixes is prefix.cc, a site for RDF developers.

One more thing about namespaces. It's very typical to use a URL that starts with your company or organization's URL, as I have in this example, for any resources you create. You can break down a namespace into subspaces as well. In this example <http://www.cambridgesemantics.com/people/about/> is a namespace we use for people. How you organize your resources is really up to you; there are no hard and fast rules. That said, the Linked Open Data movement has some guidelines that I recommend that you follow, including that you should serve up useful data from URIs.

After the prefixes are the triples themselves.

Trivially, each line can contain a single triple, as we've done. Subject, predicate, and object are separated by whitespace. Literals' values are between quotes, optionally given a datatype (more on that in the next section). The triple ends with a period, and you're done.

Now, you can get a little fancier. If you're writing a set of statements about a single resource, you can combine them. For example:

csipeople:rob
      foaf:name "Rob Gonzalez"^^xsd:string ;
      foaf:member csi: .

The only difference is that only the last statement is ended by a period. The intermediate statements are ended by semicolons.
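The grouping rule can be sketched in a few lines of Python. This is only an illustration: the terms are assumed to be already written as Turtle strings, and `to_turtle` is a hypothetical helper, not part of any RDF library.

```python
def to_turtle(triples):
    """Group triples by subject and join their statements with semicolons."""
    grouped = {}
    for subject, predicate, obj in triples:
        grouped.setdefault(subject, []).append((predicate, obj))

    blocks = []
    for subject, pairs in grouped.items():
        lines = [subject]
        for index, (predicate, obj) in enumerate(pairs):
            # Only the last statement about a subject ends with a period.
            terminator = " ." if index == len(pairs) - 1 else " ;"
            lines.append(f"      {predicate} {obj}{terminator}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


triples = [
    ("csipeople:rob", "foaf:name", '"Rob Gonzalez"^^xsd:string'),
    ("csipeople:rob", "foaf:member", "csi:"),
]

turtle = to_turtle(triples)
```

For the two triples about csipeople:rob, `turtle` holds the same three-line grouped form shown above.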

Last is the ^^xsd:string that comes after "Rob Gonzalez". This defines the datatype of the literal. RDF reuses the XSD datatypes that XML uses, including xsd:string, xsd:float, xsd:double, xsd:integer, and xsd:date. RDF can also contain custom datatypes that (you guessed it!) are simply named with a URI.

If you omit a datatype declaration, the value will be considered a plain literal by many RDF tools, which is not the same thing as a string. However, as of RDF 1.1 (still in development at the time of writing) this distinction is going away, so going forward you should be able to treat "Rob Gonzalez" and "Rob Gonzalez"^^xsd:string as equivalent, and many tools already do.
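One way to picture the plain-versus-typed distinction is with a tiny Python model of a literal. The `Literal` class and `normalize` function below are hypothetical and ignore language tags; they only sketch the idea that an untyped literal can be treated as an xsd:string.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    lexical: str                    # the text between the quotes
    datatype: Optional[str] = None  # None models a plain literal


def normalize(literal):
    """Treat a plain literal as an xsd:string, in the spirit of RDF 1.1."""
    if literal.datatype is None:
        return Literal(literal.lexical, XSD_STRING)
    return literal


plain = Literal("Rob Gonzalez")
typed = Literal("Rob Gonzalez", XSD_STRING)

differ_as_written = plain != typed                         # True
equal_after_normalizing = normalize(plain) == normalize(typed)  # True
```

A tool that compares literals without this kind of normalization will see two different values, which is why mixing the two spellings in one dataset can lead to surprising query results.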

Lastly, Turtle documents have the extension .ttl.

TriG and Named Graphs

When using named graphs, TriG is the de facto serialization. It's the same as Turtle except that statements in a single graph are grouped with {}. For example:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix csi: <http://www.cambridgesemantics.com/> .
@prefix csipeople: <http://www.cambridgesemantics.com/people/about/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://www.cambridgesemantics.com/semantic-university/> .

:RobGraph {
      csipeople:rob foaf:name "Rob Gonzalez"^^xsd:string .
      csipeople:rob foaf:member csi: .
}

This trivial example puts all the statements in the document into a single named graph, egotistically called :RobGraph. Again, like all things in RDF, :RobGraph is a URI.

TriG documents have the extension .trig.
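Conceptually, a named graph adds a fourth element to each triple, giving a "quad". The Python sketch below groups quads by graph name and writes them in the braces layout used above. The helper names `group_by_graph` and `to_trig` are assumptions for this example, and the terms are kept as ready-made Turtle strings.

```python
from collections import defaultdict

# Each quad is (graph, subject, predicate, object).
quads = [
    (":RobGraph", "csipeople:rob", "foaf:name", '"Rob Gonzalez"^^xsd:string'),
    (":RobGraph", "csipeople:rob", "foaf:member", "csi:"),
]


def group_by_graph(quads):
    """Map each graph name to the list of triples it contains."""
    graphs = defaultdict(list)
    for graph, subject, predicate, obj in quads:
        graphs[graph].append((subject, predicate, obj))
    return dict(graphs)


def to_trig(quads):
    """Write each named graph as 'name { triples }'."""
    blocks = []
    for graph, triples in group_by_graph(quads).items():
        body = "\n".join(f"      {s} {p} {o} ." for s, p, o in triples)
        blocks.append(f"{graph} {{\n{body}\n}}")
    return "\n\n".join(blocks)


graphs = group_by_graph(quads)
trig = to_trig(quads)
```

The prefix declarations are left out of the output here; a complete TriG document would still need them at the top.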

RDF Vocabulary

Now that you know how to create namespaces, resources, and write RDF in Turtle and TriG, the last major remaining topic is what to use for predicates.

One thing you can do is to simply make up your own predicates, just as you made up your own resources. In fact, as is presented in Semantic Web Misconceptions, there are valid arguments for creating new vocabularies even when there exist ones that you could use.

Vocabularies are defined using RDFS (RDF Schema) or OWL (Web Ontology Language), both of which we'll cover in upcoming lessons.

To get started, a few simple, common vocabularies include:

  • DC (Dublin Core)—standard, general-purpose vocabulary for describing resource metadata such as titles and creators.
  • FOAF (Friend of a Friend)—lots of vocabulary to relate people to each other and to organizations.
  • GoodRelations—ecommerce vocabulary.

We'll cover many more in lessons to come on RDFS and OWL.

Triple Stores

Now that you can create RDF, no doubt you want to store it somewhere. That's where a triple store comes in.

There are a variety of triple stores available that differ in performance, scalability, platform support, APIs and features. For example:

  • Some do reasoning out-of-the-box, others do not.
  • Most support transactions, but granularity varies.
  • Most support access control, but granularity varies.
  • Some focus on Big Data and scale over clusters of machines, others do not.
  • Some have analytic capabilities for doing aggregations and mathematical calculations to create charts and graphs. Most do not.

So there is no "best" store, and no clear "Oracle of RDF" for now. In fact, both Oracle and IBM DB2 have basic support for RDF today in their flagship products.

So which to choose?

Wikipedia lists a variety of stores, including Apache Jena, Sesame, Virtuoso, AllegroGraph, BigData, OWLIM, Stardog, and more. For getting started just look around and pick the one that you're most excited about. Make sure to look out for good getting started guides, but otherwise don't stress too much about your choice. When you get further along and have a better idea what features matter for your purposes, you can always switch to another store. That's the beauty of RDF; it's standard!

Triple Stores or Semantic Web Platforms?

I'd like to put in one last thought before concluding this lesson.

Working with raw RDF can be difficult for real-world applications given the relative lack of design patterns in existence for working with RDF. For this reason, for enterprise applications I would recommend going with a larger platform that provides more tooling and high-level data management capabilities than pure triple stores.

Let me illustrate why.

Think about named graphs for a minute. Let's say that you have an RDF database with a billion triples. Questions arise:

  • How do you manage transactions? What do you lock when adding or removing a single triple? (Think about a block of triples representing a profile using FOAF, for example.)
  • Access control (who can see what triples)?
  • Versioning?

Named graphs are a good first step towards this, but they complicate queries. It's expensive to have your SPARQL queries go over an entire database, so you need some kind of heuristic determining which triples go into which named graphs. How do you do this? The big enterprise platforms (full disclosure: Cambridge Semantics, where I am an employee, is one such vendor) all have some answer to these questions, and, in general, for real applications you don't want to be reinventing the wheel here.
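To see why the choice of named graphs affects query cost, consider a toy in-memory store in Python. Everything here is hypothetical: the `store` layout, the `match` function, the :OtherGraph name, and the csipeople:alice resource are invented for illustration, and a real triple store uses indexes rather than a linear scan.

```python
# Toy store: graph name -> list of (subject, predicate, object) triples.
store = {
    ":RobGraph": [
        ("csipeople:rob", "foaf:name", '"Rob Gonzalez"'),
        ("csipeople:rob", "foaf:member", "csi:"),
    ],
    ":OtherGraph": [  # hypothetical second graph
        ("csipeople:alice", "foaf:name", '"Alice Example"'),
    ],
}


def match(store, pattern, graph=None):
    """Find triples matching a pattern; None in the pattern is a wildcard.

    Returns the matches and the number of triples that had to be examined.
    """
    graph_names = [graph] if graph is not None else list(store)
    results, scanned = [], 0
    for name in graph_names:
        for triple in store.get(name, []):
            scanned += 1
            if all(want is None or want == have
                   for want, have in zip(pattern, triple)):
                results.append((name,) + triple)
    return results, scanned


# Query every graph, then the same pattern restricted to one graph.
all_names, scanned_all = match(store, (None, "foaf:name", None))
rob_names, scanned_rob = match(store, (None, "foaf:name", None), graph=":RobGraph")
```

Restricting the pattern to one graph examines fewer triples, but only helps if the triples you need were put into that graph in the first place, which is exactly the partitioning question raised above.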

That all said, when you're learning the technical space, for smaller projects, for standing up Linked Open Data on web servers, or for embedding RDF technologies into a larger application, a standalone triple store might be exactly what you need.

Conclusion

This is a basic primer on RDF, including its common serializations, but there is much more to learn. Upcoming lessons will teach you how to query RDF using SPARQL, and how to define data models in RDF using RDFS and OWL.

Next lesson: XSD Datatype Cheat Sheet