RDF vs. XML
Say the words “semi-structured data” and almost every technologist you talk to will think “XML” and not “RDF.” Tell one about RDF vs. XML and he’ll reply, “Why do I need something else? I already have XML for that!”
However, as a growing Semantic Web technology expert, you will patiently smirk and know that XML is simply not a substitute for RDF, and that it is very poorly suited for the distributed data linking that RDF is meant to do.
This lesson will compare and contrast XML and RDF and make recommendations on when to use one vs. the other.
RDF and XML both attempt to address the problem of enabling different programs and different computers to communicate effectively with each other. In its own way, each takes an important step towards a universal lingua franca for data.
This similar goal of creating a means for any system to communicate with any other is the basis for the confusion. However, there is more to it than that.
A Serialization Format vs. a Data Model
There are, broadly speaking, two problems when parsing a file or data sent over a network. The first is simply being able to read the data in: to translate the series of bytes on disk into logical data. The second is to do something intelligent with that data, such as display it on the screen. XML solves the first of those problems, and RDF solves the second of them.
This brings us to the major foundational difference between XML and RDF. XML is primarily a serialization format (we’ll define this in a little more detail in a minute), while RDF is primarily a data model. From the beginning they are meant to serve two distinct purposes.
RDF vs. XML: An Analogy
Consider the book A Christmas Carol (which, by the way, really is excellent and absolutely deserves to have 100 different film adaptations).
You can purchase it in paperback or in hardcover. You can purchase it as part of a Dickens collection or on its own. I read it by having chunks emailed to me every single day, and you might read it on your Kindle or iPad.
Yet every one of those formats is still somehow the same book. The fact that it can be paper or electricity doesn’t fundamentally change the book itself.
For example, let’s say that I have two copies of A Christmas Carol: one in braille and one in regular print. Are they the same book?
From the point of view of RDF they absolutely are the same book. The book’s meaning is what matters in RDF. The information represented by RDF retains the same meaning regardless of its underlying format. If you save an RDF file in Turtle or RDF/XML, it’s still the same information. Braille or print: it’s the same book.
From the point of view of XML they are not the same book. A person who cannot read braille cannot consume one of the two. The representation is what matters in the XML world.
In this analogy, RDF represents the informational content of the book; XML is a choice of delivery mechanism (Braille or print). Both parts matter, for sure, but they are two different things.
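To make the analogy concrete, here is a minimal Python sketch. The two XML layouts, the element and attribute names, and the helper functions are all made up for illustration; the "triples" are plain tuples, not real RDF. It only shows that two documents can differ as XML while carrying the same statement.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML layouts for the same fact: one uses a child element,
# the other an attribute. Think of them as the "print" and "braille" editions.
PRINT_EDITION = "<book id='carol'><title>A Christmas Carol</title></book>"
BRAILLE_EDITION = "<book id='carol' title='A Christmas Carol'/>"


def xml_shape(element):
    """Describe an element by its tag, attributes, text, and children."""
    return (
        element.tag,
        tuple(sorted(element.attrib.items())),
        (element.text or "").strip(),
        tuple(xml_shape(child) for child in element),
    )


def to_triples(element):
    """Reduce either layout to (subject, predicate, object) statements."""
    subject = element.get("id")
    triples = set()
    # Attributes other than the identifier become statements.
    for name, value in element.attrib.items():
        if name != "id":
            triples.add((subject, name, value))
    # Child elements become statements too.
    for child in element:
        triples.add((subject, child.tag, (child.text or "").strip()))
    return triples


print_root = ET.fromstring(PRINT_EDITION)
braille_root = ET.fromstring(BRAILLE_EDITION)

# Compared as XML structures, the two editions differ...
same_representation = xml_shape(print_root) == xml_shape(braille_root)
# ...but they make the same statement about the book.
same_meaning = to_triples(print_root) == to_triples(braille_root)

print(same_representation, same_meaning)
```

The XML-level comparison cares about where the title lives in the document; the statement-level comparison only cares about what is being said.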
XML: Meant for Serialization
A serialization format is a way to encode information so that when it’s passed between machines it can be parsed. In fact, the popularity of XML is due to its addressing the problem of too many file formats. For years, the first thing any programmer would do when creating a new program (for image editing, word processing, data storage…anything at all!) would be to create a way to save its data to disk.
The challenge was that any other program that wanted to read the file would have to include special code for reading just that file format. Remember back before Word was the absolute dominant word processor? There were dozens of programs, and not all of them could read each other’s files. Even different versions of Microsoft Word couldn’t read each other’s files!
Aside: Now, the technically astute know that XML is not always stored as plain text. Microsoft Word, for example, stores its XML inside a compressed (ZIP) package, whereas most XML is stored as plain text. For our purposes we’re ignoring this nuance since it doesn’t affect the overall point.
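As a rough illustration of what serialization buys you, the sketch below writes a small record out as XML and reads it back with Python's standard XML parser. The `<record>` layout and the function names are assumptions made for this example, not any standard format.

```python
import xml.etree.ElementTree as ET


def serialize(record):
    """Encode a flat dict of strings as XML bytes (hypothetical <record> layout)."""
    root = ET.Element("record")
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="utf-8")


def deserialize(payload):
    """Decode the XML bytes back into a dict."""
    root = ET.fromstring(payload)
    return {child.tag: child.text for child in root}


book = {"title": "A Christmas Carol", "author": "Charles Dickens"}
payload = serialize(book)        # bytes that could be saved to disk or sent over a network
restored = deserialize(payload)  # any program with an XML parser can get this far

print(restored == book)
```

Notice what the parser does not tell you: that `title` names a book, or that `author` refers to a person. Reading the bytes back in is the part XML handles.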
RDF: A Data Model
RDF, in contrast, is a data model, which is an abstract set of rules for representing information. That, unfortunately, is not a great definition, so let’s make it clearer by making some more analogies!
- As in the book example, the serialization is the physical format of data, while the data model is the way to represent the book’s inherent meaning.
- The serialization is like the grammar of a language, while the data model is the informational content behind words.
- The serialization is the word “green” spoken aloud in English, while the data model is a way to define the concept of “Green” such that it is unambiguous whether you say “Green” or “Verde” or think about the color of a leaf.
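The last analogy can be sketched in a few lines of Python. This is only a toy: the identifier, the `label` property, and the `concept_for` helper are hypothetical, and a plain set of tuples stands in for a real RDF graph.

```python
# Hypothetical identifier and property names, for illustration only.
GREEN = "http://example.org/color/green"
LABEL = "http://example.org/vocab/label"

# The data model: a set of (subject, predicate, object) statements.
# Each label is a (text, language) pair attached to the same concept.
graph = {
    (GREEN, LABEL, ("Green", "en")),
    (GREEN, LABEL, ("Verde", "es")),
}


def concept_for(word, triples):
    """Return the identifiers of the concept(s) that a word labels, in any language."""
    return {
        subject
        for (subject, predicate, obj) in triples
        if predicate == LABEL and obj[0].lower() == word.lower()
    }


print(concept_for("green", graph) == concept_for("Verde", graph))
```

Whichever word you start from, you land on the same identifier. The words are the surface form; the identifier is the concept.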
In RDF 101 you saw how RDF is used to define objects and concepts and relationships between them, so hopefully this makes sense. If not, it might be worth re-skimming that lesson.
Simply put, in the RDF world, it doesn’t matter how you send the data over the wire. Popular RDF serializations include:
- Turtle
- RDFa (RDF embedded in HTML)
- RDF/XML
See the last one? There is a way to represent RDF in XML! That is, if you have a parser that can read XML, it can read RDF/XML! There is no better proof that the two are not competing ideas than the existence of RDF/XML in the first place.
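Here is a hedged sketch of that point, using only Python's standard library. The same two statements are written once as RDF/XML and once as N-Triples (another RDF serialization), then reduced to plain tuples. The book URI is a made-up `example.org` address, and both helper functions handle only the tiny subset of each syntax used here; a real project would use a dedicated RDF library instead.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/book/christmas-carol">
    <dc:title>A Christmas Carol</dc:title>
    <dc:creator>Charles Dickens</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

N_TRIPLES = """\
<http://example.org/book/christmas-carol> <http://purl.org/dc/elements/1.1/title> "A Christmas Carol" .
<http://example.org/book/christmas-carol> <http://purl.org/dc/elements/1.1/creator> "Charles Dickens" .
"""


def triples_from_rdf_xml(text):
    """Read simple rdf:Description blocks with an ordinary XML parser."""
    root = ET.fromstring(text)
    triples = set()
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes qualified names as "{namespace}local".
            namespace, local = prop.tag[1:].split("}")
            triples.add((subject, namespace + local, prop.text))
    return triples


# Matches only lines of the form: <subject> <predicate> "plain literal" .
LINE = re.compile(r'<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"\s*\.')


def triples_from_n_triples(text):
    """Read N-Triples lines whose object is a plain string literal."""
    return {match.groups() for match in LINE.finditer(text)}


print(triples_from_rdf_xml(RDF_XML) == triples_from_n_triples(N_TRIPLES))
```

The XML parser gets the RDF/XML into memory, and the extra step of turning elements into statements is where the RDF data model comes in. Two very different files end up as the same set of statements.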
Comparing the XML Technology Stack to the Semantic Web Technology Stack
There is more to a data model than the model itself. How you interact with it (the query language) and how to describe it (the schema language) are incredibly important aspects in terms of practical usage.
The Semantic Web is a set of technologies for representing, storing, and querying data. XML too has a family of related technologies for representing, storing, and querying data.
This lesson focuses on RDF vs. XML because that comparison comes up so often. Other lessons will compare SPARQL to XQuery, and OWL to XML Schema.
To sum up:
- XML is concerned with serialization
- RDF is concerned with informational content
Thus the two technologies, though related, address two distinct problems. The existence of RDF/XML itself proves that they are not meant to compete.