What Makes a Good Semantic Web Application?

Introduction

Semantic Web technologies can be used to solve almost any information problem, so a critical question to consider is when they should be used.

Objectives

After completing this lesson, you will know:

  • Several key characteristics of a Semantic Web application.
  • Some of the drawbacks to using Semantic Web technologies.
  • How to determine whether a specific application is a good candidate for being a Semantic Web application.
  • A set of questions to ask to help determine whether an application is a good target for Semantic Web technologies.

Prerequisites

Introduction to the Semantic Web

Semantic Web Misconceptions

Today's Lesson

As discussed in the Introduction to the Semantic Web, the three primary technology standards of the Semantic Web are RDF, SPARQL, and OWL. Let's first quickly restate what makes each of those standards unique as compared to other technologies.

RDF: The graph nature of this data model means that it is by nature open-ended, so new data and new relationships can always be added.

SPARQL: Because queries can span multiple, distributed data sources, SPARQL provides extremely flexible and powerful JOIN-like and dynamic translation capabilities.

OWL: This language is descriptive rather than prescriptive, so ontologies are independent of the data they describe. A traditional database schema, by contrast, directly determines what data can be stored.
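To make the RDF and SPARQL points concrete, here is a minimal sketch in plain Python (no RDF library). It assumes facts are stored as (subject, predicate, object) tuples, uses invented "ex:" names, and implements a toy pattern matcher that joins on shared variables the way a SPARQL basic graph pattern does. Merging the two sets stands in for querying two sources; a real engine would federate the queries instead.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical sources describing the same customer ("ex:" names are invented).
crm: Set[Triple] = {
    ("ex:alice", "ex:name", "Alice"),
    ("ex:order1", "ex:placedBy", "ex:alice"),
}
marketing: Set[Triple] = {
    ("ex:alice", "ex:respondedTo", "ex:springCampaign"),
}

# Open-ended data model: a new kind of relationship is just one more triple;
# nothing has to be altered first.
crm.add(("ex:order1", "ex:shippedTo", "ex:berlin"))


def match(patterns: List[Triple], data: Set[Triple]) -> List[Dict[str, str]]:
    """Toy basic-graph-pattern matcher: terms starting with '?' are variables,
    and a variable shared by two patterns acts as a join condition."""
    solutions: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        extended: List[Dict[str, str]] = []
        for binding in solutions:
            for triple in data:
                candidate = dict(binding)
                if all(
                    candidate.setdefault(term, value) == value
                    if term.startswith("?")
                    else term == value
                    for term, value in zip(pattern, triple)
                ):
                    extended.append(candidate)
        solutions = extended
    return solutions


# Join across both sources, in the spirit of a SPARQL query over two graphs:
# orders whose customer also responded to a campaign.
results = match(
    [("?order", "ex:placedBy", "?customer"),
     ("?customer", "ex:respondedTo", "?campaign")],
    crm | marketing,
)
for r in results:
    print(r["?customer"], r["?order"], r["?campaign"])  # ex:alice ex:order1 ex:springCampaign
```

Neither source had to know about the other in advance; the join happens at query time on the shared ex:alice identifier.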

By looking at these technologies in this light, a couple of important observations can be made. The first is that all three of these technologies are open-ended. RDF can accept new data. SPARQL can flexibly join and translate data on the fly, including data from new sources. Even the data model itself can be modified after the fact!
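The claim that the data model itself can change after the fact can be sketched the same way. This is a simplified illustration under stated assumptions: schema statements are stored as ordinary triples next to the data, and only one RDFS rule (subclass entailment) is applied, whereas real OWL reasoners support far more. The class names are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical instance data, loaded before any class hierarchy exists.
graph: Set[Triple] = {
    ("ex:acme", RDF_TYPE, "ex:EnterpriseCustomer"),
    ("ex:bob", RDF_TYPE, "ex:Prospect"),
}


def infer_types(g: Set[Triple]) -> Set[Triple]:
    """Apply one RDFS rule until nothing new is derived:
    if ?x rdf:type ?c and ?c rdfs:subClassOf ?d, then ?x rdf:type ?d."""
    inferred = set(g)
    while True:
        new = {
            (x, RDF_TYPE, d)
            for (x, p1, c) in inferred if p1 == RDF_TYPE
            for (c2, p2, d) in inferred if p2 == SUBCLASS_OF and c2 == c
        } - inferred
        if not new:
            return inferred
        inferred |= new


def instances_of(g: Set[Triple], cls: str) -> List[str]:
    return sorted(x for (x, p, c) in infer_types(g) if p == RDF_TYPE and c == cls)


print(instances_of(graph, "ex:Customer"))  # []

# Later the model evolves: describe how an existing class relates to a new,
# broader one. The instance data is not touched, and ex:bob is not rejected.
graph.add(("ex:EnterpriseCustomer", SUBCLASS_OF, "ex:Customer"))
print(instances_of(graph, "ex:Customer"))  # ['ex:acme']
```

The added statement only enables new inferences; it never invalidates existing data, which is the descriptive (rather than prescriptive) behavior described for OWL.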

This open-endedness makes Semantic Web technologies a natural fit for agile development. More significantly, it makes Semantic Web technologies a natural fit for solving open-ended problems.

Open-ended Problems

A specific problem can be open-ended at many different points and in many different ways (e.g., volume of transactions, size of data, types of information, views on data, usages of data, types of users…). We therefore need to be explicit about the specific kinds of openness for which Semantic Web technologies are particularly well suited:

Complete Data Model Unknown: You cannot be certain that you won't need more data in the future than you think you need today. For example, you may know now that you need to track a couple of specific elements (e.g., Customers and Orders), but you also know that the marketing department wants to add Campaign reporting to the project.

Complete Usage Model Unknown: You are not completely sure that you are aware of every possible view or report that all your users might need. Today, you know that you need to group by regions and sum total sales, but are you sure that you will never need to group by product family? And are you sure that the definition of "product family" or "region" won't change at some point in the future?

Complete User Base Unknown: You may think that you're building something for corporate marketing, but in reality something very close to it would also be useful to product management, and even development. Each new group of users brings new kinds of data and new usages of data.
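The "Complete Usage Model Unknown" case can be illustrated with a small sketch: sales totals grouped by region today, and by a product family property that is only added later. The data, property names, and the total_by helper are hypothetical; the helper loosely mirrors a SPARQL GROUP BY with SUM.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple, Union

Value = Union[str, int]
Triple = Tuple[str, str, Value]

# Hypothetical sales facts: one triple per attribute of each sale.
graph: Set[Triple] = {
    ("ex:sale1", "ex:region", "EMEA"), ("ex:sale1", "ex:amount", 100),
    ("ex:sale2", "ex:region", "EMEA"), ("ex:sale2", "ex:amount", 50),
    ("ex:sale3", "ex:region", "APAC"), ("ex:sale3", "ex:amount", 75),
}


def total_by(g: Set[Triple], group_property: str) -> Dict[Value, int]:
    """Sum ex:amount per value of group_property, roughly like a SPARQL
    GROUP BY with SUM over two triple patterns."""
    amounts = {s: o for (s, p, o) in g if p == "ex:amount"}
    totals: Dict[Value, int] = defaultdict(int)
    for (s, p, o) in g:
        if p == group_property and s in amounts:
            totals[o] += amounts[s]
    return dict(totals)


print(total_by(graph, "ex:region"))  # today's view

# A new view is requested later. Product family did not exist before;
# it is simply added as new triples, and the same function groups on it.
graph |= {
    ("ex:sale1", "ex:productFamily", "Widgets"),
    ("ex:sale2", "ex:productFamily", "Gadgets"),
    ("ex:sale3", "ex:productFamily", "Widgets"),
}
print(total_by(graph, "ex:productFamily"))
```

If the definition of "product family" later changes, only the triples carrying that property need to be revised; the sales facts themselves stay as they are.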

Semantic Web applications thrive in these circumstances. A future lesson will make this distinction clear with a detailed technical comparison of Semantic Web and relational database technologies.

To sum this up:

The more open-ended a problem is, the more you will benefit from Semantic Web technologies.

Conversely, the more static and well-defined a problem is, the less important Semantic Web technologies become. For example, consider a billing system that only records customer credit card data. Merchants are currently required to retain a certain amount of fundamental information about credit cards, and the chances of these requirements changing are about as close to zero as you can imagine. Consequently, these kinds of data are best kept in a traditional relational database.

On the other hand, if you ever need to combine those credit card data with other information, you might want to expose that same relational database through a SPARQL endpoint to gain significantly greater flexibility in consuming those data.

Unstructured Data: A Perfect Fit

One entire class of problems is open-ended by its very nature: any problem that includes unstructured information (documents such as Word, PDF, and Excel files, web pages, scientific publications, news articles, and the like). If your application could benefit from including data from sources such as these, you will almost certainly not be able to anticipate from the outset everything you might want to extract from them.

Be sure to check out the lessons on Semantic Web and NLP and Semantic Web and Semantic Search for a deeper understanding of how Semantic Web technologies are a good fit for unstructured data problems.

Contraindications

Just because you have a great hammer does not mean that every problem in the world is a nail. The flexibility inherent in Semantic Web applications comes with some drawbacks, and sometimes a problem can be more efficiently solved using other tools.

The following are just a few potential technical characteristics of your application which would hamper the effectiveness of today's Semantic Web tools:

Data Scale: Although Franz broke the Trillion Triple mark in 2011, a single Semantic Web solution cannot yet store as much data as a relational data warehouse can. Some workarounds can extend the effective scale (just-in-time datamarts and query federation are the two most common); however, these workarounds inevitably increase the complexity of any project, so in many cases they are not good solutions. The exact cutoff is hard to determine for any specific application, but a decent rule of thumb is that if your application will have more than the equivalent of 100 million rows of data (#triples = #rows * #columns), you should be concerned about using Semantic Web technologies.
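The rule of thumb above is simple arithmetic, sketched here for hypothetical table sizes. The sketch assumes that "the equivalent of 100 million rows" means the total row count summed across all tables; that is one reading of the text, not a precise definition.

```python
from typing import Dict, Tuple

ROW_THRESHOLD = 100_000_000  # the rule of thumb from the text


def estimate(tables: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Return (total_rows, approx_triples) for {table: (rows, columns)},
    using #triples = #rows * #columns per table."""
    total_rows = sum(rows for rows, _ in tables.values())
    approx_triples = sum(rows * cols for rows, cols in tables.values())
    return total_rows, approx_triples


# Hypothetical table sizes, for illustration only.
tables = {"customers": (2_000_000, 12), "orders": (40_000_000, 8)}
rows, triples = estimate(tables)
print(f"{rows:,} rows -> about {triples:,} triples")  # 42,000,000 rows -> about 344,000,000 triples
if rows > ROW_THRESHOLD:
    print("Above the rule of thumb: be cautious about a pure Semantic Web store.")
else:
    print("Within the rule of thumb.")
```

Even well under the threshold, 42 million rows already correspond to roughly 344 million triples, which shows how quickly triple counts grow with the number of columns.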

Update Transaction Volume: You would not want to use today's Semantic Web technologies for high-volume transactional applications. By "high volume," we mean many thousands of reads or writes per second to a single server. Semantic Web servers are not optimized for high-volume writes. The optimizations that let the traditional technology stack handle high transaction volumes could, in theory, be applied to Semantic Web tools, but no simple, out-of-the-box integrations for such high-volume workloads have yet been developed. That said, it is easy to use Semantic Web technologies such as SPARQL to consume the data and combine them with information from other systems.

Computational Scale: Semantic Web servers are not optimized for high-scale numeric computations on huge amounts of numeric data. If you are performing statistical calculations or aggregates on terabytes of numeric data, today's Semantic Web tools will not perform as well as the current best-of-breed, highly tuned alternatives on the market. That said, it is easy to pull data from Semantic Web systems into traditional BI tools for calculation and visualization.

These caveats aside, one of the great benefits of Semantic Web solutions is that they are storage-agnostic. For example, if you use a relational database for the high-volume transactional server, there's nothing wrong with exposing that database through SPARQL endpoints in order to integrate it with a broader Semantic Web application or strategy.
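Exposing relational data to Semantic Web tools usually starts with mapping rows to triples; the W3C defines the Direct Mapping and R2RML standards for this. The sketch below is a much-simplified, hypothetical illustration of the idea using Python's built-in sqlite3 module. It produces triples in memory and is not a SPARQL endpoint; the table, columns, and "ex:" IRI scheme are invented for the example.

```python
import sqlite3
from typing import Iterator, Tuple

Triple = Tuple[str, str, object]


def table_to_triples(conn: sqlite3.Connection, table: str, key: str) -> Iterator[Triple]:
    """Expose each row as triples: the key column forms the subject and every
    other non-NULL column becomes one predicate/object pair. A simplified take
    on the W3C Direct Mapping idea, not an implementation of it."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"ex:{table}/{record[key]}"
        for col, value in record.items():
            if col != key and value is not None:
                yield (subject, f"ex:{table}#{col}", value)


# The relational database remains the system of record (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [(1, "Alice", "DE"), (2, "Bob", "US")])

triples = sorted(table_to_triples(conn, "customer", "id"))
for t in triples:
    print(t)
```

The data never leave the relational store as the system of record; the mapping is just a view that lets the rows be joined with triples from other sources.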

Similarly, if you have an existing data warehouse containing petabytes of data, keep it there! Just define OWL ontologies for the subsets of warehoused data that you would like to consume in your Semantic Web application.

Examples of Semantic Web Applications

This is a rich topic in its own right, so it has a dedicated lesson, along with several real-world case studies.

Cheat Sheet Questionnaire: When to Consider Using Semantic Web Technologies

The more "yes" answers, the better! There is no strict, definitive rule for when Semantic Web tools are the right solution to a specific problem, but if you can answer Yes to all (or even most) of the following questions, you're much more likely to be happy with a Semantic Web solution than if you answer No!

Does your use case involve documents and other forms of unstructured data?

Do you expect to add more kinds of data in the future?

Do you expect to add more views on the data in the future?

Do you expect to expand your application to require more kinds of users in the future?

Is the data scale less than petabytes?

Is the transaction volume modest (e.g., hundreds versus tens of thousands of users)?

Does your application require only modest numeric calculations?

Conclusion

Finally, remember that the range from relational databases to Semantic Web technologies is a continuum. A Semantic Web application can, and almost always does, incorporate data from relational databases, content management systems (CMS), and other complementary technologies. For that reason, treat this advice as a set of guidelines rather than hard rules.

Next lesson: Applying the Semantic Web - Two Camps