Semantic Web technologies:

Introduction and Survey


Lee Feigenbaum
Cambridge Semantics Inc.
W3C SPARQL Working Group Co-chair.

Adapted from a presentation by Lee Feigenbaum and Eric Prud'hommeaux at C-SHALS 2008. This work is licensed under a Creative Commons Attribution 3.0 License, with attribution to W3C.

Program

A Motivating Example: Drug discovery

The W3C HCLS interest group set out to use Semantic Web technologies to receive precise answers to a complex question:

Find me genes involved in signal transduction that are related to pyramidal neurons.

General search: 223,000 hits, 0 results

223K responses for previous query

Domain-limited search: 2,580 results

2.5K responses for previous query

Specific database: Too many silos!

A Semantic Web Approach

Integrate disparate databases ...

A Semantic Web Approach (cont'd)

... so that one query ...

unrendered image of HCLS demo #2 query

A Semantic Web Approach (cont'd)

... (trivially) spans several DBs ...

SenseLab Database headings and representation graph

A Semantic Web Approach (cont'd)

... to yield cross-specialty information
Alzgene query results

What was the trick?

The Resource Description Framework (RDF)

The arrow's tail, body, and head are the subject, property, and value.

Patient data

<?xml version="1.0"?>
<ClinicalDocument transformation="hl7-rim-to-pomr.xslt">
  <recordTarget>
    <patientRole>
      <patientPatient>
	<name>
	  <given>Henry</given>
	  <family>Levin</family>
	</name>
	<administrativeGenderCode code="M"/>
	<birthTime value="19320924"/>
      </patientPatient>
    </patientRole>
  </recordTarget>
  <component>
    <StructuredBody>
      <Observation>
	<code displayName="Cuff blood pressure"/>
	<effectiveTime value="200004071430"/>
	<targetSiteCode displayName="Left arm"/>
	<entryRelationship typeCode="COMP">
	  <Observation>
	    <effectiveTime value="200004071530"/>
	    <value value="132" unit="mm[Hg]"/>
	  </Observation>
	</entryRelationship>
      </Observation>
      <Observation>
	<code displayName="Cuff blood pressure"/>
	<effectiveTime value="200004071530"/>
	<targetSiteCode displayName="Left arm"/>
	<entryRelationship typeCode="COMP">
	  <Observation>
	    <code displayName="Systolic BP"/>
	    <effectiveTime value="200004071530"/>
	    <value value="135" unit="mm[Hg]"/>
	  </Observation>
	</entryRelationship>
	<entryRelationship typeCode="COMP">
	  <Observation>
	    <code displayName="Diastolic BP"/>
	    <effectiveTime value="200004071530"/>
	    <value value="88" unit="mm[Hg]"/>
	  </Observation>
	</entryRelationship>
      </Observation>
    </StructuredBody>
  </component>
</ClinicalDocument>

RDF is good for modeling all this data...

[unrendered SVG image of HL7 data in RDF]

...regardless of its source.

[unrendered SVG image of HL7 data in a relational database]

Simple statement

_:p   r:type galen:Patient .

nodes and arcs drawing of RDF graph of Henry Levin's blood pressure

Add a property ...

_:p   r:type galen:Patient ;
      foaf:familyName "Levin" .

nodes and arcs drawing of RDF graph of Henry Levin's blood pressure

... and another.

_:p   r:type galen:Patient ;
      foaf:familyName "Levin" ;
      foaf:givenName "Henry" .

nodes and arcs drawing of RDF graph of Henry Levin's blood pressure

Statements about ...

_:p   r:type galen:Patient ;
      foaf:familyName "Levin" ;
      foaf:givenName "Henry" .
_:scr dc:date "2006-03-18T18:23"^^xsd:dateTime .

nodes and arcs drawing of RDF graph of Henry Levin's blood pressure

... other objects ...

_:p   r:type galen:Patient ;
      foaf:familyName "Levin" ;
      foaf:givenName "Henry" .
_:scr dc:date "2006-03-18T18:23"^^xsd:dateTime ;
      edns:systolic "132"^^edns:mmHg .

nodes and arcs drawing of RDF graph of Henry Levin's blood pressure

... connect in the fabric.

_:p   r:type galen:Patient ;
      foaf:familyName "Levin" ;
      foaf:givenName "Henry" .
_:c   edns:patient _:p ;
      edns:screeningBP _:scr .
_:scr dc:date "2006-03-18T18:23"^^xsd:dateTime ;
      edns:systolic "132"^^edns:mmHg ;
      edns:diastolic "86"^^edns:mmHg ;
      edns:posture snomed:_163035008 .

nodes and arcs drawing of RDF graph of Henry Levin's blood pressure

A First Look at Turtle

<http://thefigtrees.net/lee/id#lee> 
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://thefigtrees.net/lee/id#lee> 
  <http://xmlns.com/foaf/0.1/name> "Lee Feigenbaum" .
<http://thefigtrees.net/lee/id#lee>
  <http://xmlns.com/foaf/0.1/homepage> <http://thefigtrees.net/lee/> .

... is more succinctly represented as:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://thefigtrees.net/lee/id#lee> rdf:type      foaf:Person ;
                                    foaf:name     "Lee Feigenbaum" ;
                                    foaf:homepage <http://thefigtrees.net/lee/> .
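
As a rough illustration of what the prefixed form abbreviates, the sketch below expands prefixed names back into full IRIs. It is not a Turtle parser: the `expand` helper and the tuple representation of triples are assumptions made for this example only.

```python
# Minimal sketch of prefixed-name expansion (illustrative, not a Turtle parser).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name such as foaf:name to a full <IRI>.

    Terms already written as <IRI> or as quoted literals are returned unchanged.
    """
    if term.startswith("<") or term.startswith('"'):
        return term
    prefix, _, local = term.partition(":")
    return "<" + prefixes[prefix] + local + ">"

LEE = "<http://thefigtrees.net/lee/id#lee>"
short_form = [
    (LEE, "rdf:type", "foaf:Person"),
    (LEE, "foaf:name", '"Lee Feigenbaum"'),
    (LEE, "foaf:homepage", "<http://thefigtrees.net/lee/>"),
]

# Each prefixed name becomes namespace + local part, giving the long form above.
long_form = [tuple(expand(t) for t in triple) for triple in short_form]
for s, p, o in long_form:
    print(s, p, o, ".")
```

Because expansion is plain concatenation, the namespace IRI in a prefix declaration has to end exactly where the local name begins (here, after the `#` or the final `/`).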

Patient Data in RDF

_:p1  a galen:Patient ;
      foaf:family_name "Levin" ;
      foaf:firstName "Henry" .

_:c1a edns:patient _:p1 ;
      edns:screeningBP [
          a cpr:clinical-examination ;
          dc:date "2000-04-07T15:30:00" ;
          edns:systolic [
              a galen:AbsoluteMeasurement ;
              ex:unit "mm[Hg]" ;
              r:value "132" ;
              skos:prefLabel "Systolic BP"
          ] ;
          edns:diastolic [
              a galen:AbsoluteMeasurement ;
              ex:unit "mm[Hg]" ;
              r:value "86" ;
              skos:prefLabel "Diastolic BP"
          ] ;
          edns:location snomed:_66480008 ; # SNOMED:left arm
          edns:posture snomed:_163035008   # SNOMED:sitting
      ] .

There is a blood-pressure examination of a patient named Henry Levin. The examination was on 7 April 2000 at 3:30 pm and was conducted on the patient's left arm while he was sitting. The examination resulted in a systolic blood pressure measurement of 132 and a diastolic measurement of 86.

Introduction to SPARQL

Why SPARQL?

SPARQL is the query language of the Semantic Web. It lets us:

SELECTing variables

?artist          ?album           ?times_platinum
Michael Jackson  Thriller         27
Led Zeppelin     Led Zeppelin IV  22
Pink Floyd       The Wall         22

Triple patterns

A triple pattern is an RDF triple that can have variables in any of the subject, predicate, or object positions.

Examples:
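
A minimal sketch of how a single triple pattern could be matched against data, written in Python rather than SPARQL. The `match` and `query` helpers, the tuple representation, and the `ex:` names are all hypothetical; a real SPARQL engine is far more involved.

```python
# Terms starting with "?" are variables; every other term must match exactly.
def match(pattern, triple, bindings=None):
    """Return variable bindings if triple matches pattern, otherwise None."""
    bindings = dict(bindings or {})
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if bindings.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return bindings

def query(pattern, graph):
    """Return one bindings dict per triple in graph that matches pattern."""
    results = []
    for triple in graph:
        bindings = match(pattern, triple)
        if bindings is not None:
            results.append(bindings)
    return results

# Hypothetical data; the ex: names are invented for this sketch.
graph = [
    ("ex:thriller", "ex:artist", "Michael Jackson"),
    ("ex:thriller", "ex:timesPlatinum", "27"),
    ("ex:theWall", "ex:artist", "Pink Floyd"),
]

# Variables in subject and object position: which album has which artist?
artists = query(("?album", "ex:artist", "?artist"), graph)
# Variables in predicate and object position: everything known about one album.
about_thriller = query(("ex:thriller", "?property", "?value"), graph)
print(artists)
print(about_thriller)
```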

Simple query pattern

We can combine more than one triple pattern to retrieve multiple values and easily traverse an RDF graph:
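
The traversal can be sketched as a join: each pattern is matched in turn, and variables shared between patterns carry their bindings forward. The Python below is an illustrative toy, not how any particular engine works; `match` and `solve` are hypothetical helpers, and the data is a simplified copy of the earlier patient statements with datatypes dropped.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    out = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if out.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return out

def solve(patterns, graph):
    """Match every pattern in turn, carrying variable bindings forward."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

graph = [
    ("_:p", "rdf:type", "galen:Patient"),
    ("_:p", "foaf:familyName", "Levin"),
    ("_:c", "edns:patient", "_:p"),
    ("_:c", "edns:screeningBP", "_:scr"),
    ("_:scr", "edns:systolic", "132"),
    ("_:scr", "edns:diastolic", "86"),
]

# ?p links the first two patterns, ?c the next two, ?scr the last two.
patterns = [
    ("?p", "foaf:familyName", "Levin"),
    ("?c", "edns:patient", "?p"),
    ("?c", "edns:screeningBP", "?scr"),
    ("?scr", "edns:systolic", "?sys"),
]
results = solve(patterns, graph)
print([r["?sys"] for r in results])
```

The shared variables are what let the query walk from the patient to the examination to the measurement without naming any intermediate node.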

GRAPH constraints

SPARQL lets us query different RDF graphs in a single query. Consider movie reviews:
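
One way to picture a GRAPH constraint is to treat every statement as a quad whose first element names the graph it came from. The Python sketch below does that with invented data: the review graphs, movie names, and ratings are hypothetical, and the SPARQL shown in the comments is only an approximate equivalent.

```python
# Each statement is a quad: (graph name, subject, predicate, object).
quads = [
    ("ex:reviewsA", "ex:movie1", "ex:rating", "4"),
    ("ex:reviewsB", "ex:movie1", "ex:rating", "2"),
    ("ex:reviewsB", "ex:movie2", "ex:rating", "5"),
]

def match_in_graphs(pattern, quads):
    """Match a (graph, s, p, o) pattern; terms starting with '?' are variables."""
    results = []
    for quad in quads:
        bindings = {}
        for p_term, q_term in zip(pattern, quad):
            if p_term.startswith("?"):
                if bindings.setdefault(p_term, q_term) != q_term:
                    break
            elif p_term != q_term:
                break
        else:
            # No term mismatched, so this quad is a solution.
            results.append(bindings)
    return results

# Roughly: SELECT ?g ?rating { GRAPH ?g { ex:movie1 ex:rating ?rating } }
by_source = match_in_graphs(("?g", "ex:movie1", "ex:rating", "?rating"), quads)

# Roughly: SELECT ?movie ?r { GRAPH ex:reviewsB { ?movie ex:rating ?r } }
only_b = match_in_graphs(("ex:reviewsB", "?movie", "ex:rating", "?r"), quads)
print(by_source)
print(only_b)
```

With a variable in the graph position the query reports which source made each statement; with a fixed graph name it restricts matching to that one source.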

Example Query: Henry Levin's Blood Pressure

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edns: <http://www.loa-cnr.it/ontologies/ExtendedDnS.owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX galen: <http://www.co-ode.org/ontologies/galen#>
PREFIX r: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX snomed: <http://termhost.example/SNOMED/>

SELECT ?date ?sys ?dias ?position {
?p    r:type galen:Patient ;
      foaf:family_name "Levin" ;
      foaf:firstName "Henry" .
?c    edns:patient ?p ;
      edns:screeningBP ?scr .
?scr  dc:date ?date ;
      edns:systolic [ r:value ?sys ] ;
      edns:diastolic [ r:value ?dias ] ;
      edns:posture ?position .
} ORDER BY ?date

The sample query can be run against this sample data.

Or try sparql.org with this query against this data.

Using GRDDL to get RDF from XML, XHTML

GRDDL (Gleaning Resource Descriptions from Dialects of Languages) is a way to bootstrap RDF out of XML, and in particular XHTML, data by explicitly indicating transformations from XML to RDF. GRDDL relies on:

  1. Source Document: an XHTML or XML document which references at least one GRDDL transformation and hence licenses a GRDDL-aware agent to extract RDF.
  2. GRDDL-aware agent: a software agent able to identify GRDDL transformations and run them to extract RDF.
  3. GRDDL Transformation: an algorithm--usually expressed in XSLT--for getting RDF from a source document.

GRDDLing HTML

GRDDL can extract RDF from both XML and (X)HTML.

Patient      Systolic BP  Diastolic BP
Henry Levin  132          86
...          ...          ...
<html>
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Clinical Study 8B1a: Patient BP</title>
    <link rel="transformation" href="bp-html-to-pomr.xslt" />
  </head>
  ...
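
A minimal sketch of the first step a GRDDL-aware agent might take with such a page: check the head for the GRDDL profile and collect the linked transformations. It uses only the Python standard library and stops short of running the XSLT. The `TransformationFinder` class and the base URL are hypothetical, and the elided body of the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

class TransformationFinder(HTMLParser):
    """Collect rel="transformation" links and note whether the profile is set."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.has_profile = False
        self.transformations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            profiles = (attrs.get("profile") or "").split()
            self.has_profile = GRDDL_PROFILE in profiles
        elif tag == "link":
            rels = (attrs.get("rel") or "").split()
            if "transformation" in rels and attrs.get("href"):
                # Resolve the relative href against the document's own URL.
                self.transformations.append(urljoin(self.base_url, attrs["href"]))

SOURCE = """<html>
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Clinical Study 8B1a: Patient BP</title>
    <link rel="transformation" href="bp-html-to-pomr.xslt" />
  </head>
</html>"""

# The base URL is an assumption; the slides do not say where the page lives.
finder = TransformationFinder("http://example.org/study/bp.html")
finder.feed(SOURCE)

# Only honour the links when the page declares the GRDDL profile.
found = finder.transformations if finder.has_profile else []
print(found)
```

An agent would then fetch each transformation and apply it to the source document to obtain RDF; that step needs an XSLT processor, which the standard library does not provide.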

GRDDL-aware Querying

Some SPARQL engines can directly query GRDDL source documents.

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edns: <http://www.loa-cnr.it/ontologies/ExtendedDnS.owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX galen: <http://www.co-ode.org/ontologies/galen#>
PREFIX r: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?date ?sys ?dias ?location ?position {
?p    r:type galen:Patient ;
      foaf:family_name "Levin" ;
      foaf:firstName "Henry" .
?c    edns:patient ?p ;
      edns:screeningBP ?scr .
?scr  dc:date ?date ;
      edns:systolic [ r:value ?sys ] ;
      edns:diastolic [ r:value ?dias ] ;
      edns:location ?location ;
      edns:posture ?position .
}

The actual query against the actual XML source is more complex. You can try it live against Virtuoso with this live query.

UltraLink in Action

annotated screenshot of Novartis's UltraLink

Why RDFa?

RDFa Example: Chemicals

InChI is a textual identifier for chemical substances. Consider inchi.html:

<table>
<tr>
  <th>Familiar name</th><th>InChI</th>
</tr><tr>
  <td>Methane</td>
  <td about="http://example.org/methane" property="chem:inchi"
      xmlns:chem="http://www.blueobelisk.org/chemistryblogs/">
    InChI=1S/CH4/h1H4
  </td>
...

This RDFa encodes the single RDF triple:

<http://example.org/methane> chem:inchi "InChI=1S/CH4/h1H4" .
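
To make the mapping from markup to triple concrete, here is a deliberately tiny extractor in Python. It is a sketch under strong assumptions, not an RDFa processor: it only handles elements that carry both `about` and `property`, and it only looks for the prefix declaration on that same element. The `TinyRDFa` class is hypothetical, and the table is closed off here so the fragment is complete.

```python
from html.parser import HTMLParser

class TinyRDFa(HTMLParser):
    """Handle only elements with both @about and @property (a tiny RDFa subset)."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._open = None  # (tag, subject, predicate, text chunks) being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs and "property" in attrs:
            prefix, _, local = attrs["property"].partition(":")
            # Assumption: the prefix is declared on this very element.
            namespace = attrs.get("xmlns:" + prefix, "")
            self._open = (tag, attrs["about"], namespace + local, [])

    def handle_data(self, data):
        if self._open:
            self._open[3].append(data)

    def handle_endtag(self, tag):
        if self._open and tag == self._open[0]:
            _, subject, predicate, chunks = self._open
            # The element's text content becomes the literal value.
            self.triples.append((subject, predicate, "".join(chunks).strip()))
            self._open = None

PAGE = """<table>
<tr>
  <th>Familiar name</th><th>InChI</th>
</tr><tr>
  <td>Methane</td>
  <td about="http://example.org/methane" property="chem:inchi"
      xmlns:chem="http://www.blueobelisk.org/chemistryblogs/">
    InChI=1S/CH4/h1H4
  </td>
</tr>
</table>"""

parser = TinyRDFa()
parser.feed(PAGE)
triples = parser.triples
print(triples)
```

The human-readable cell text and the machine-readable literal are the same characters on the page, which is the point of embedding the data in the markup.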

Using RDFa: In context

operator on inchi data in RDFa

See inchi.html.

Using RDFa: Query

There are various ways to query Web pages marked up with RDFa:

# Find propane's InChI string
PREFIX chem: <http://www.blueobelisk.org/chemistryblogs/>
PREFIX ex:   <http://example.org/>
SELECT ?inchi 
FROM <http://www.w3.org/2007/08/pyRdfa/extract?uri=http://www.w3.org/2008/Talks/0305-C-SHALS/inchi.html>
WHERE {
  ex:propane chem:inchi ?inchi .
}

You can try it against sparql.org.

RDF Schema (RDFS) modeling

Groups/Sets/Classes

[Named] collections of things with similar attributes.

OWL Expressivity

Allows you to define classes.

(a,b,c) set enumeration

union

disjunction

algebraics

  

intersection

complement

restriction

cardinality

equivalence

OWL: Identity

OWL: Types within context

:DrJones :specialty :Cancer .
⇒ :DrJones rdf:type :Oncologist .
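
One way to read this entailment is as a class defined by a property value: anything whose :specialty is :Cancer is an :Oncologist. The Python sketch below hard-codes that single rule to show the effect. It is an assumption about how the class might be defined, not an OWL reasoner, and `:DrSmith` and `:Cardiology` are invented for contrast.

```python
def infer_types(triples, class_name, on_property, has_value):
    """Add rdf:type class_name for every subject with (on_property, has_value)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == on_property and o == has_value:
            inferred.add((s, "rdf:type", class_name))
    return inferred

facts = {
    (":DrJones", ":specialty", ":Cancer"),
    (":DrSmith", ":specialty", ":Cardiology"),  # hypothetical, for contrast
}

closure = infer_types(facts, ":Oncologist", ":specialty", ":Cancer")
for triple in sorted(closure):
    print(*triple, ".")
```

Nothing in the asserted data says that :DrJones is an oncologist; the type follows from the class definition, so it applies to every future doctor with the same specialty as well.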

And there's much more...