The Semantic Web has been talked about for more than a decade. Over those years, several mistaken or misleading ideas about the Semantic Web have repeatedly popped up. This lesson looks at some of the most pervasive of these misconceptions and discusses both the confusion and the reality of the situation.
After completing this lesson, you will know:
- Whether ontologies should always be reused.
- How the Semantic Web relates to query federation.
- Whether you need up-front agreement on ontologies or vocabulary to be successful.
- Whether Semantic Web tools necessarily replace existing systems.
- The real relationship between the Semantic Web and Artificial Intelligence.
- How the Semantic Web relates to natural-language processing.
As semantic technologies have begun to move more and more into the public sphere, questions have naturally arisen about what exactly Semantic Web technologies are, how they work, and how they might interact with existing technologies. Some of the suggested answers to these questions are more accurate and helpful than others. Separating fact from fiction may help clarify our understanding of the topic. With that in mind, here are the facts behind some common Semantic Web misconceptions:
Ontology reuse is a double-edged sword. Reusing others' careful work to model and define the concepts and relationships for a particular topic can have great value in certain circumstances. However, unless the scope and granularity of the information with which you are working line up almost precisely with the existing ontology, you will have to work to translate your world view to that of the existing ontology. If you are reusing a large ontology, you will most likely find that you have to wade through hundreds of classes and properties in which you are not interested in order to reuse even a small fraction of the ontology.
On the other hand, creating a new ontology from scratch is not necessarily a bad thing. The resulting ontology will be well-tailored to your specific use cases and will align well with the ways in which you wish to present your data. Your application will involve fewer layers of translation from your source data, and the new ontology is more likely to be a true model of the information. The biggest cost of not reusing an ontology is the cost of developing a new one; however, many tools are now available that will do much of the "heavy lifting" for you, particularly if you are starting with existing information in a database, spreadsheet, XML file, or some other structured source.
Keep in mind that at least one situation exists where you should definitely try to reuse an existing ontology: if an ecosystem of third-party tools already knows how to access and display information for a particular ontology, then it would be best for you to reuse that ontology if at all possible. By doing so, you will be able to apply these tools to your data without any additional work.
Generally speaking, two approaches to ontology development are available: top-down and bottom-up.
In top-down development, you begin by getting agreement on the core concepts in your domain and then build out a single model, one likely to be agreed upon by most people who might use the ontology. Eventually, individual communities of users can specialize those top-down ontologies by extending the concepts within them to meet their particular needs. Top-down ontology development is appealing because if everyone agrees from the beginning, then everyone will be able to reuse the same concepts, and the resulting data and software will all work well together.
Unfortunately, top-down ontology development is usually not practical. For one thing, it requires that you get all of the people who will initially need to buy in to the ontology to the same table. Additionally, it means negotiating the usually delicate balance amongst the many vested and entrenched interests of various people and organizations, which have often invested significant time or money in their various conflicting world views. By the time all is said and done, top-down ontology development can be an expensive proposition that takes months or even years to complete.
Fortunately, Semantic Web ontology standards (such as RDFS and OWL) are designed to also be used in a bottom-up approach. Here, individual users or communities of users can each develop their own small ontologies that suit their current needs. Later, ontology developers can use mechanisms provided by the Semantic Web technology stack to bridge and relate various elements of competing ontologies whenever an application comes along that needs to integrate information that has been modeled using two different ontologies. In this way, bottom-up ontology development effectively amortizes the cost of smoothing over different world views and in doing so allows everyone's ontologies to be developed and used in a quicker and much more agile way.
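As a rough illustration of this kind of bridging, the sketch below represents triples as plain Python tuples. The two vocabularies (`hr:` and `crm:`), their properties, and the sample data are all hypothetical; only `owl:equivalentProperty` is a real OWL term, and the `apply_bridge` helper is a deliberately simplified stand-in for what an OWL reasoner would do (for example, it does not chain equivalences).

```python
# Triples are (subject, predicate, object) tuples.
# All hr:/crm:/ex: names below are hypothetical examples.
OWL_EQUIVALENT_PROPERTY = "owl:equivalentProperty"

# Two communities model "a person's name" independently.
hr_data = {
    ("ex:alice", "hr:fullName", "Alice Smith"),
}
crm_data = {
    ("ex:bob", "crm:contactName", "Bob Jones"),
}

# The bridge is written later, only when an application needs both datasets.
bridge = {
    ("crm:contactName", OWL_EQUIVALENT_PROPERTY, "hr:fullName"),
}


def apply_bridge(triples, bridge):
    """Add a copy of each triple under every property declared equivalent to its own."""
    equivalents = {}
    for s, p, o in bridge:
        if p == OWL_EQUIVALENT_PROPERTY:
            equivalents.setdefault(s, set()).add(o)
            equivalents.setdefault(o, set()).add(s)  # equivalence is symmetric
    result = set(triples)
    for s, p, o in triples:
        for q in equivalents.get(p, ()):
            result.add((s, q, o))
    return result


merged = apply_bridge(hr_data | crm_data, bridge)
names = sorted(o for s, p, o in merged if p == "hr:fullName")
print(names)  # ['Alice Smith', 'Bob Jones']
```

Neither source dataset had to change: the one bridging statement is enough for an application written against `hr:fullName` to see names from both.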
The origins of this misconception are fairly easy to understand. A major focus of Semantic Web technologies is the attempt to make it possible to integrate heterogeneous data across many sources. Furthermore, information in the Semantic Web is identified by means of URIs. In addition, SPARQL, the query language of the Semantic Web, lets developers pick and choose which sources of information should be searched for the answers to a query. Therefore, it is somewhat natural to conclude that a fundamental characteristic of Semantic Web applications is that they access data via federated (or distributed) queries.
In reality, the choice of data technology (i.e., Semantic Web vs. relational vs. something else) and the choice of integration paradigm (i.e., federation/EII vs. warehouse/ETL vs. something in between) are independent. People can (and do) perform federated data access using relational technology. Moreover, people can (and do) build ETL pipelines that populate Semantic Web warehouses.
Generally speaking, a warehouse/ETL approach provides better interactive query performance, eliminates runtime complexity, and guarantees consistency between information from different data sources. A federated query approach, on the other hand, avoids copying any data prematurely and can preserve source data security contexts. In both cases, choosing a Semantic Web data model gives additional flexibility that simplifies the process of extending and refining the integrated data model.
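To make the independence of data model and integration paradigm concrete, here is a minimal sketch that answers the same question over the same triple-shaped data in both styles. The two `fetch_*` functions are hypothetical stand-ins for live source systems, and the `ex:` names are invented for illustration.

```python
# Hypothetical stand-ins for two source systems; each call represents a live fetch.
def fetch_orders():
    return {("ex:order1", "ex:customer", "ex:alice")}


def fetch_customers():
    return {("ex:alice", "ex:name", "Alice Smith")}


SOURCES = [fetch_orders, fetch_customers]


def match(triples, predicate):
    """Return the triples that use the given predicate."""
    return {t for t in triples if t[1] == predicate}


def build_warehouse():
    """Warehouse/ETL style: copy everything once, ahead of query time."""
    warehouse = set()
    for fetch in SOURCES:
        warehouse |= fetch()
    return warehouse


def federated_query(predicate):
    """Federated style: ask every source at query time and combine the answers."""
    results = set()
    for fetch in SOURCES:
        results |= match(fetch(), predicate)
    return results


warehouse = build_warehouse()
warehouse_answer = match(warehouse, "ex:name")
federated_answer = federated_query("ex:name")
print(warehouse_answer == federated_answer)  # True
```

Both paths return the same answer from the same graph-shaped data. What differs is when the data is copied and which system does the work at query time.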
The association between artificial intelligence and the Semantic Web has a long history. The scenarios put forth in the 2001 Scientific American article that introduced the Semantic Web to the world involved a level of automated decision-making that seemed straight out of an AI textbook. Discussions of ontologies, inference, and description logics merely added to the confusion.
However, to equate the Semantic Web with AI is to focus on the semantic aspects while ignoring the Web. In reality, Semantic Web technologies are as much (if not more) about the data as they are about reasoning and logic. RDF, the foundational technology in the Semantic Web stack, is a flexible graph data model that does not involve logic or reasoning in any way. In fact, for many people and applications, RDF is all they need (one example of this scenario is the Linked Data community). Even the parts of the Semantic Web technology stack that deal with reasoning and inference are grounded in well-understood formal semantics and can usually be expressed via straightforward sets of rules. As such, they lack both the complexity and the opacity of artificial intelligence approaches that are based on machine learning and neural models.
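The following sketch shows how little machinery such rules need. It stores a graph as a set of tuples and applies two rules, modeled on the RDFS entailment rules for `rdfs:subClassOf`, until nothing new can be derived. The `ex:` classes and the `infer` function are illustrative assumptions, not part of any standard library.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

# A tiny graph; the ex: names are hypothetical.
graph = {
    ("ex:Dog", RDFS_SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", RDFS_SUBCLASS_OF, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}


def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    closure = set(triples)
    while True:
        new = set()
        subclass = [(s, o) for s, p, o in closure if p == RDFS_SUBCLASS_OF]
        for sub, sup in subclass:
            for s, p, o in closure:
                # Rule 1: an instance of a subclass is an instance of the superclass.
                if p == RDF_TYPE and o == sub:
                    new.add((s, RDF_TYPE, sup))
                # Rule 2: subClassOf is transitive.
                if p == RDFS_SUBCLASS_OF and o == sub:
                    new.add((s, RDFS_SUBCLASS_OF, sup))
        if new <= closure:
            return closure
        closure |= new


inferred = infer(graph)
types = sorted(o for s, p, o in inferred if s == "ex:rex" and p == RDF_TYPE)
print(types)  # ['ex:Animal', 'ex:Dog', 'ex:Mammal']
```

Every derived triple can be traced back to a specific rule and specific input triples, which is the transparency the paragraph above contrasts with machine-learning approaches.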
For more on this topic, see Applying the Semantic Web: Two Camps.
The Semantic Web technology stack is designed to be non-disruptive. This family of technologies provides the flexibility and expressiveness required to integrate a variety of data from a number of different sources; it is not designed to replace existing transactional databases, CRM systems, or XML Web Services. Instead, Semantic Web solutions take an overlay approach that virtualizes information from existing (non-semantic) source systems, imports that information into the Semantic Web data model, and then links together information between various connected systems.
To this end, the Semantic Web technology stack includes standards explicitly developed to help map data in legacy systems to RDF:
- R2RML is a mapping language that allows you to specify how data in a relational database schema maps to RDF.
- GRDDL is a standard for associating XML documents with transformations that can be automatically run to convert XML into RDF.
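The idea behind a relational-to-RDF mapping can be sketched in a few lines. The example below is not R2RML syntax and does not use an R2RML processor; the `MAPPING` dictionary, the `employee` table, and the `example.com` URIs are assumptions invented to show the general shape: one subject URI per row, one triple per mapped column.

```python
import sqlite3

# Hypothetical mapping: which table to read, how to mint subject URIs,
# and which column feeds which predicate. This mimics the idea behind
# an R2RML mapping but is not R2RML syntax.
MAPPING = {
    "table": "employee",
    "subject_template": "http://example.com/employee/{id}",
    "class": "http://example.com/ns#Employee",
    "columns": {
        "name": "http://example.com/ns#name",
        "dept": "http://example.com/ns#department",
    },
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(conn, mapping):
    """Turn each row of the mapped table into a set of triples."""
    conn.row_factory = sqlite3.Row
    triples = set()
    # The table name comes from our own trusted mapping, not from user input.
    for row in conn.execute(f"SELECT * FROM {mapping['table']}"):
        subject = mapping["subject_template"].format(id=row["id"])
        triples.add((subject, RDF_TYPE, mapping["class"]))
        for column, predicate in mapping["columns"].items():
            if row[column] is not None:  # NULL values produce no triple
                triples.add((subject, predicate, row[column]))
    return triples


# An in-memory database stands in for the legacy system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Alice Smith', 'Sales')")
conn.execute("INSERT INTO employee VALUES (2, 'Bob Jones', NULL)")

triples = rows_to_triples(conn, MAPPING)
for t in sorted(triples):
    print(t)
```

The source database is only read, never modified, which is the non-disruptive overlay approach described above.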
Just as some people mistakenly equate Semantic Web technologies with artificial intelligence, others expect that Semantic Web technologies are all about using text analytics to understand natural language. While a great number of reasons may exist for choosing Semantic Web technologies as a vehicle for implementing NLP solutions, the Semantic Web itself does not deal with unstructured content; instead, it is about representing not only structured data and links but also the meaning of the underlying concepts and relationships. More about the relationship between Semantic Web and natural language can be found in these two Semantic University lessons:
- Semantic Web vs. Semantic Technologies.
- Semantic Web and NLP.
Note: These two articles will be published soon. Stay tuned!
We hope this clears up many of the common misconceptions surrounding Semantic Web technologies.
If you feel that we're missing any, let us know.