Semantic Web on the Web

Introduction

The Semantic Web on the World Wide Web is all about data. On the Web, all of the other pieces of the Semantic Web technology stack—ontologies, query, reasoning—take a back seat to publishing and consuming structured data. This lesson examines some of the more prominent examples of people and organizations publishing and consuming structured data on the Web.

Objectives

After completing this lesson, you will know:

  • At least five domains where Semantic Web data are currently being published on the World Wide Web.
  • How Semantic Web data are being used on the Web in several different fields such as science, eCommerce, and government.
  • How the Semantic Web is improving Web search results today.

Prerequisites

Today's Lesson

In the well-known 2001 Scientific American article that introduced much of the world to the Semantic Web, Tim Berners-Lee, Jim Hendler, and Ora Lassila put forth a vision of interlinked data on the Web that would usher in a world of personalized, automated agents that could order prescriptions for us, book dinner reservations for us, arrange for school pick-ups, and more, all using semantic data available on the Web.

In the years following that groundbreaking treatise, thousands of people from industry, academia, government, and the W3C have come together to publish standards, develop best practices, and build tools all working towards bringing this vision to life.

Let's consider some of the more prominent examples of people and organizations publishing and consuming structured data on the Web.

Publishing Data on the Web

In 2007, the W3C Semantic Web Education and Outreach group started the Linking Open Data project to promote the publication of data on the Web. The project started with Tim Berners-Lee's proposal of linked data principles, and the effort quickly gained momentum as an advocacy and education effort aimed at encouraging Web publishers to make their content available as RDF data.

In a 2009 TED talk, Tim Berners-Lee gave an impassioned plea for people to publish structured data on the Web.

Today, Semantic Web data on the Web spans a large and growing number of disciplines.

Social data. The Friend of a Friend (FOAF) project allows people to publish simple biographical information about themselves and their friends on personal websites. Social network sites such as LiveJournal and hi5 have picked up these data in the past. Drupal, a popular Web content management package that runs more than 1% of the world's websites, automatically publishes Semantic Web data related to anyone who uses it. Since 2011, Facebook has used its Open Graph Protocol to encourage people to publish Semantic Web data as the primary way of integrating any third-party website with Facebook.
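As a rough illustration of what a FOAF description looks like, the following Python sketch builds a small profile in Turtle syntax using only string formatting. The people and the example.org URIs are hypothetical, and the helper names are invented for this example; foaf:Person, foaf:name, foaf:homepage, and foaf:knows are terms from the FOAF vocabulary.

```python
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def _literal(value: str) -> str:
    """Quote a Python string as a Turtle string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def foaf_profile(me_uri: str, name: str, homepage: str, friends: dict) -> str:
    """Return a Turtle document describing one person and the people they know.

    `friends` maps each friend's URI to that friend's name (hypothetical data).
    """
    lines = [f"@prefix foaf: <{FOAF_NS}> .", ""]
    lines.append(f"<{me_uri}> a foaf:Person ;")
    lines.append(f"    foaf:name {_literal(name)} ;")
    lines.append(f"    foaf:homepage <{homepage}>")
    if friends:
        # Continue the same subject with one foaf:knows statement per friend.
        lines[-1] += " ;"
        knows = " ,\n        ".join(f"<{uri}>" for uri in friends)
        lines.append(f"    foaf:knows {knows}")
    lines[-1] += " ."
    # Describe each friend minimally so the links have something to point at.
    for uri, friend_name in friends.items():
        lines.append("")
        lines.append(f"<{uri}> a foaf:Person ;")
        lines.append(f"    foaf:name {_literal(friend_name)} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(foaf_profile(
        "http://example.org/alice#me",
        "Alice Example",
        "http://example.org/alice",
        {"http://example.org/bob#me": "Bob Example"},
    ))
```

Because each friend is identified by a URI, a profile published on one personal website can point at a profile published on another, which is what makes these data interlinked rather than isolated.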

Scientific data. Scientific data, particularly in the life sciences, make up a significant portion of the Semantic Web data on the Web. Data on proteins and genes, pathways and sequences, chemistry and genetics, and much more already exist and are actively being put to use. Two major initiatives are currently promoting the availability of scientific data on the Semantic Web:

  • Bio2RDF: The Bio2RDF project began in 2005 and is today one of the largest providers of interlinked biological Semantic Web data. The project includes over fifty life sciences data sets covering chemicals, gene expression, protein interactions, pathways, drugs, diseases, genomes, sequences, and the professional literature.
  • W3C Semantic Web in Healthcare and Life Sciences Interest Group (HCLS IG): Through two task forces, BioRDF and Linking Open Drug Data (LODD), both of which welcome industry and academic participants, this W3C group has already published significant amounts of life sciences data on the Semantic Web. The LODD group catalogs the results of its efforts to publish disease, drug, and clinical trials data on a wiki page. The BioRDF group has put together a Semantic Web life sciences knowledge base, which contains biomedical data assembled as part of the NeuroCommons project and includes data from MeSH, MEDLINE, and the NCBI.

eCommerce data. Online retailers are eagerly adopting structured data on the Web in order to make details of their product inventories more readily accessible to search engines. The most common way to do this is to use the GoodRelations vocabulary to add Semantic Web data to existing product Web pages. Retailers publishing eCommerce Semantic Web data include Best Buy, Sears, Kmart, O'Reilly, and Overstock.com.
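To make the idea of adding GoodRelations data to an existing product page more concrete, here is a minimal Python sketch that renders one offer as an HTML fragment annotated with RDFa attributes. The lesson does not say which embedding syntax retailers use, so RDFa 1.1 is an assumption here, as are the function name and the sample product. The gr:Offering, gr:name, gr:hasPriceSpecification, gr:UnitPriceSpecification, gr:hasCurrency, and gr:hasCurrencyValue terms come from the GoodRelations vocabulary.

```python
from html import escape

# Prefix declarations for the RDFa 1.1 `prefix` attribute.
RDFA_PREFIXES = (
    "gr: http://purl.org/goodrelations/v1# "
    "xsd: http://www.w3.org/2001/XMLSchema#"
)


def offer_rdfa(product_name: str, price: float, currency: str) -> str:
    """Return an HTML fragment for one product offer, annotated with RDFa.

    The visible text is what a shopper sees; the attributes carry the
    machine-readable GoodRelations statements. (Hypothetical helper.)
    """
    name = escape(product_name)
    code = escape(currency)
    return (
        f'<div prefix="{RDFA_PREFIXES}" typeof="gr:Offering">\n'
        f'  <span property="gr:name">{name}</span>\n'
        f'  <div rel="gr:hasPriceSpecification">\n'
        f'    <div typeof="gr:UnitPriceSpecification">\n'
        f'      <span property="gr:hasCurrency" content="{code}">{code}</span>\n'
        f'      <span property="gr:hasCurrencyValue" datatype="xsd:float">{price:.2f}</span>\n'
        f'    </div>\n'
        f'  </div>\n'
        f'</div>'
    )


if __name__ == "__main__":
    print(offer_rdfa("Example Camera", 199.0, "USD"))
```

The appeal of this approach for a retailer is that the product page itself stays the single source of truth: the same markup serves human visitors and data consumers.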

Government data. In 2010, the national government of the United Kingdom of Great Britain and Northern Ireland became the first of its kind to publish significant amounts of public information on the Web as Semantic Web data. Since January 2010, the UK has published 2,500 large Semantic Web data sets. Furthermore, the United States government publishes its own structured public data at data.gov and has recently begun including a significant number of Semantic Web data sets in this collection.

DBpedia. As one of the major hubs of structured data on the Web, DBpedia deserves its own entry in this list. DBpedia is a repository of Semantic Web data extracted from Wikipedia. It includes information from Wikipedia infoboxes (i.e., the boxes with summary information about a topic that appear in the top-right corner of a Wikipedia article) as well as article titles, abstracts, and classification data based on Wikipedia's many categories. Just as Wikipedia has emerged as a popular central location on the World Wide Web for finding general information on notable topics, DBpedia is a core resource on the Semantic Web for identifying and providing structured information about any topic already covered in Wikipedia. Many other sets of Semantic Web data link to data from DBpedia, making it a key link in connecting previously unrelated information on the Web.
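One way to look at the abstracts that DBpedia extracts from Wikipedia is to send a SPARQL query to its public endpoint. The Python sketch below builds such a request with the standard library only. The function names are hypothetical, the resource name is just an example, and the sketch assumes the endpoint at https://dbpedia.org/sparql is reachable and returns the standard SPARQL JSON results format when asked for it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def abstract_query(resource_name: str, lang: str = "en") -> str:
    """Build a SPARQL query for the abstract of one DBpedia resource.

    `resource_name` is the last part of the resource URI, typically the
    Wikipedia article title with underscores. It is inserted as-is, so this
    simple sketch only suits names that are already valid inside an IRI.
    """
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource_name}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def query_url(sparql: str) -> str:
    """Encode the query as a GET request URL, per the SPARQL protocol."""
    return f"{DBPEDIA_ENDPOINT}?{urlencode({'query': sparql})}"


def fetch_abstracts(resource_name: str) -> list:
    """Run the query over HTTP and return the abstract strings.

    Requires network access; not called when this module is imported.
    """
    request = Request(
        query_url(abstract_query(resource_name)),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=30) as response:
        results = json.load(response)
    return [row["abstract"]["value"] for row in results["results"]["bindings"]]


if __name__ == "__main__":
    print(query_url(abstract_query("Tim_Berners-Lee")))
```

Note that the query identifies its subject by URI. Other data sets that link to the same DBpedia URI are talking about the same thing, which is how DBpedia ends up connecting otherwise unrelated information.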

Consuming Data on the Web

Broadly speaking, people are consuming this wealth of structured data on the Web today in two ways. First, they are building specific applications that consume and combine Semantic Web data on the Web for specific purposes. Second, major search engines are harvesting Semantic Web data from the Web in order to augment and enrich search results.

The quantity of Semantic Web data on the Web represents a significant amount of discoverable, interlinked data that are being incorporated into point solutions and applications, both on the Web and inside enterprises. In 2010, Tim Berners-Lee returned to TED to discuss some of these uses of structured Web data.

Perhaps the most visible use of Semantic Web data on the Web, however, is seen in three popular Web search engines: Google, Yahoo!, and Bing. These sites are reading the Semantic Web data embedded within Web pages and using those data to provide richer search results for their users. On Google, for example, this means that if a product's Web page happens to be marked up with Semantic Web data, then its search result listing may very well include information on reviews, ratings, pricing, and inventory.

The schema.org project—a joint effort of Google, Yahoo!, and Microsoft—documents the concepts and attributes that these three search engines will look for when Web publishers include Semantic Web data on their sites.

While including this sort of rich data does not affect a Web page's ranking within search results, it can significantly affect how many people click a particular item in the search results. A few months after publishing Semantic Web data for its product inventory, Best Buy noted a 30% increase in overall organic search traffic and a 15% increase in how many people were clicking its product pages within search results.
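The kind of product information that search engines read from a page (reviews, ratings, pricing, and inventory) can be sketched using schema.org terms. The Python example below emits a JSON-LD script element for one product. JSON-LD is only one of the syntaxes that can carry schema.org data, and the lesson does not say which one a given publisher uses, so treat the choice of syntax, the function name, and the sample values as assumptions. Product, Offer, AggregateRating, and the property names are schema.org terms.

```python
import json


def product_jsonld(name: str, price: float, currency: str,
                   in_stock: bool, rating: float, review_count: int) -> str:
    """Return a <script> element describing one product with schema.org terms."""
    availability = "InStock" if in_stock else "OutOfStock"
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": f"https://schema.org/{availability}",
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(product_jsonld("Example Camera", 199.0, "USD", True, 4.5, 27))
```

Publishing such markup does not guarantee an enriched listing; each search engine decides what it displays.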

Conclusion

Today, Semantic Web technologies are used for a wide range of applications, both on the Web and internally within various organizations. While Semantic Web usage on the Web is mostly about publishing and consuming data, Semantic Web technologies are also being applied within enterprises for a much broader set of applications.

About the Author

Lee Feigenbaum Bio
Co-Founder & VP, Cambridge Semantics & Co-chair, W3C SPARQL Working Group