Introducing SKOS

June 22, 2005

SKOS (Simple Knowledge Organization System), recently introduced by the W3C, is a model for expressing knowledge organization systems in a machine-understandable way, within the framework of the Semantic Web. The SKOS Core Vocabulary is an RDF (Resource Description Framework) application. Using RDF allows data to be linked and merged with other RDF data by Semantic Web applications. SKOS Core provides a model for expressing the basic structure and content of concept schemes, including thesauri, classification schemes, subject heading lists, taxonomies, terminologies, glossaries, and other types of controlled vocabulary. This article will provide some examples for using SKOS and discuss the general principles of building such knowledge bases.

Introduction

The Semantic Web is a vision for the future of the Web in which information is given explicit meaning, making it easier for machines to process and integrate information available on the Web. The Semantic Web relies on XML's ability to define schemes and RDF's flexible approach to representing data. The next element required for the Semantic Web is OWL, the Web Ontology Language, which can formally describe—using, most commonly, a logical formalism known as Description Logic—the semantics of classes and properties used in Web documents.

OWL adds a layer of expressive power to RDF and provides powerful tools for defining complex conceptual structures, which can be used to generate, among other things, rich metadata. However, the class-oriented, logically precise modeling required to construct useful web ontologies is demanding in terms of expertise, effort, and therefore cost. In many cases this type of modeling may be unnecessary or unsuited to requirements. So there is a need for a language to express vocabularies of concepts for use in semantically rich metadata, which is powerful enough to support semantically enhanced search, but simple enough to be undemanding in terms of the cost and expertise required to use it.

The SKOS Core Vocabulary is a set of RDF properties and RDFS classes that can be used to express the content and structure of a concept scheme as an RDF graph.

As an example of the kind of structure SKOS was designed to represent, let's look at the entry for the term "canals" in the Alexandria Digital Library Thesaurus:

canals » A feature type category for places such as the Erie Canal.
Used for: » The category canals is used instead of any of the following.

canal bends
canalized streams
ditch mouths
ditches
drainage canals
drainage ditches
… more …

Broader Terms: hydrographic structures » Canals is a sub-type of "hydrographic structures."
Related Terms: » The following is a list of other categories related to canals (non-hierarchical relationships)

channels
locks
transportation features
tunnels

Scope Note: Manmade waterway used by watercraft or for drainage, irrigation, mining, or water power.

Now let's represent this complex structure using the SKOS Core Vocabulary:

Figure 1. RDF graph representation of the SKOS vocabulary entry.

The corresponding machine-readable representation in RDF/XML:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <skos:Concept rdf:about="http://www.my.com/#canals">
    <skos:definition>A feature type category for places such as the Erie Canal</skos:definition>
    <skos:prefLabel>canals</skos:prefLabel>
    <skos:altLabel>canal bends</skos:altLabel>
    <skos:altLabel>canalized streams</skos:altLabel>
    <skos:altLabel>ditch mouths</skos:altLabel>
    <skos:altLabel>ditches</skos:altLabel>
    <skos:altLabel>drainage canals</skos:altLabel>
    <skos:altLabel>drainage ditches</skos:altLabel>
    <skos:broader rdf:resource="http://www.my.com/#hydrographic%20structures"/>
    <skos:related rdf:resource="http://www.my.com/#channels"/>
    <skos:related rdf:resource="http://www.my.com/#locks"/>
    <skos:related rdf:resource="http://www.my.com/#transportation%20features"/>
    <skos:related rdf:resource="http://www.my.com/#tunnels"/>
    <skos:scopeNote>Manmade waterway used by watercraft or for drainage, irrigation, mining, or water power</skos:scopeNote>
  </skos:Concept>

</rdf:RDF>

SKOS Concept Modeling and Labeling

The current edition of the SKOS Core Vocabulary replaces the earlier SKOS Core 1.0 Guide published by the SWAD-Europe Thesaurus Activity. The origins and background of the technologies that preceded SKOS are well documented in an XTech 2005 Proceedings report on SKOS, so let's skip the history and go directly to the language definition.

Let's look at the RDF/XML more closely. The skos:Concept class asserts that a resource is a conceptual resource. This sounds vague, but the RDF Semantics standard supplies the terms: an assertion is any expression that is claimed to be true; a class is a general concept, category, or classification; and a resource is an entity, that is, anything in the universe. In practice, skos:Concept is used to define an atomic conceptual resource. In the example above, the SKOS document defines a thesaurus entry for the concept "canals".

skos:Concept is not the only class available in SKOS. There are also other top-level classes:

skos:Collection is a meaningful collection of concepts. Labelled collections can be used with collectable semantic relation properties (skos:narrower), where you would like a set of concepts to be displayed under a node label in the hierarchy;
skos:CollectableProperty is a property which can be used with a skos:Collection;
skos:ConceptScheme is a set of concepts, optionally including statements about semantic relationships between those concepts. Thesauri, classification schemes, subject-heading lists, taxonomies, terminologies, glossaries and other types of controlled vocabulary are all examples of concept schemes;
skos:OrderedCollection is an ordered collection of concepts, where both the grouping and the ordering are meaningful.

SKOS Core uses labeling properties to assign tokens to a resource, where the token is intended to denote the resource in natural language or other representations intended for human consumption. The skos:prefLabel and skos:altLabel properties allow you to assign preferred and alternative lexical labels to a resource. Under normal circumstances prefLabel and altLabel values can be considered synonyms. However, when labeling resources of type skos:Concept, it is not necessary to restrict preferred and alternative lexical labels to precise synonyms. For example, the following is valid:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <skos:Concept rdf:about="http://www.my.com/#good">
    <skos:prefLabel>good</skos:prefLabel>
    <skos:altLabel>bad</skos:altLabel>
  </skos:Concept>
</rdf:RDF>

Abbreviations and acronyms may also be used to label concepts, and the choice of whether to use them as preferred or alternative terms is unconstrained. Misspelled words, however, normally belong among the hidden labels. A hidden lexical label is a character string that you would like to be accessible to applications performing text-based indexing and search operations, but that should not be visible otherwise. To assign a hidden lexical label to a resource, use the skos:hiddenLabel property. The most common use of hidden labels is to include misspelled variants of other lexical labels.

The value of the skos:prefLabel and skos:altLabel properties should be a plain literal. A plain literal is a character string with an optional language tag, which may be used to restrict the scope of a lexical label to a particular language. The values permissible as language tags are given by RFC 3066. Here's an example:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <skos:Concept rdf:about="http://www.my.com/#good">
    <skos:prefLabel xml:lang="en">good</skos:prefLabel>
    <skos:altLabel xml:lang="en">bad</skos:altLabel>
    <skos:prefLabel xml:lang="fr">bon</skos:prefLabel>
    <skos:altLabel xml:lang="fr">mauvais</skos:altLabel>
  </skos:Concept>
</rdf:RDF>
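
The hidden-label pattern described earlier can be sketched in the same way. The misspelled variant "cannals" is invented for illustration and does not come from the thesaurus; an application would be expected to match it during text search without ever displaying it.

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <skos:Concept rdf:about="http://www.my.com/#canals">
    <skos:prefLabel xml:lang="en">canals</skos:prefLabel>
    <!-- Hypothetical misspelling: available to search, not meant for display -->
    <skos:hiddenLabel xml:lang="en">cannals</skos:hiddenLabel>
  </skos:Concept>
</rdf:RDF>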

Symbolic labeling means labeling a concept with an image. To assign preferred and alternative symbolic labels to a concept, use the skos:prefSymbol and skos:altSymbol properties.
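
A symbolic label might be attached as in the following sketch. The image URIs are placeholders invented for illustration; the only assumption about the properties is that their values are image resources referenced by URI.

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <skos:Concept rdf:about="http://www.my.com/#canals">
    <skos:prefLabel>canals</skos:prefLabel>
    <!-- Placeholder image URIs, not real resources -->
    <skos:prefSymbol rdf:resource="http://www.my.com/symbols/canals.png"/>
    <skos:altSymbol rdf:resource="http://www.my.com/symbols/canals-map-icon.png"/>
  </skos:Concept>
</rdf:RDF>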

Adding Description of a Concept

There are eight properties you can use to add human-readable documentation to the description of a concept: skos:publicNote, skos:privateNote, skos:definition, skos:scopeNote, skos:example, skos:historyNote, skos:editorialNote, and skos:changeNote. Descriptive notes for a concept are either public or private (skos:publicNote, skos:privateNote). Only skos:editorialNote and skos:changeNote are private notes; the others are public. Thus a skos:definition is also a skos:publicNote, a skos:editorialNote is also a skos:privateNote, and so on.

The difference between skos:definition and skos:scopeNote is that a definition should attempt to explain the meaning of a concept completely, whereas a scope note may consist of partial information about what is or is not included within the meaning (or scope) of a concept.

Similarly, a skos:historyNote is intended for users of the scheme and documents significant changes to the meaning, form, or state of a concept, whereas a skos:changeNote documents fine-grained changes to a concept for the purposes of administration and management.
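
To make these distinctions concrete, here is a sketch that attaches several kinds of note to the canals concept. The definition and scope note are taken from the thesaurus entry shown earlier; the example, history note, editorial note, and change note texts are invented for illustration.

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <skos:Concept rdf:about="http://www.my.com/#canals">
    <!-- Public notes, from the thesaurus entry -->
    <skos:definition>A feature type category for places such as the Erie Canal</skos:definition>
    <skos:scopeNote>Manmade waterway used by watercraft or for drainage, irrigation, mining, or water power</skos:scopeNote>
    <!-- Public notes, hypothetical text -->
    <skos:example>Suez Canal</skos:example>
    <skos:historyNote>Scope widened in an earlier edition to include drainage ditches.</skos:historyNote>
    <!-- Private notes, hypothetical text -->
    <skos:editorialNote>Check whether drainage ditches should become a separate concept.</skos:editorialNote>
    <skos:changeNote>Corrected a typo in the scope note.</skos:changeNote>
  </skos:Concept>
</rdf:RDF>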

There are three recommended usage patterns for the SKOS Core documentation properties:

Documentation as an RDF Literal
Documentation as a Related Resource Description
Documentation as a Document Reference

Documentation as an RDF Literal is the simplest pattern for using the SKOS Core documentation properties: the property value (i.e., the object of the triple) is an RDF literal. This is the pattern we used in our example SKOS document:


<skos:scopeNote>Manmade waterway used by watercraft or for drainage, irrigation, mining, or water power</skos:scopeNote>

This is the simplest form. The Related Resource Description pattern lets you structure documentation as the description of a related resource; it looks a bit more complicated in RDF, with the note text carried in an rdf:value element. The Document Reference pattern lets you refer to documentation that is itself a document, via the URI of that document. For example:


<skos:scopeNote rdf:resource="http://www.my.com/note.txt"/>
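
For comparison, the Related Resource Description pattern mentioned above might look like the following sketch. It assumes the Dublin Core elements namespace for dc:creator and dc:date, which this article does not otherwise use, and the creator name and date are invented.

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">

  <skos:Concept rdf:about="http://www.my.com/#canals">
    <!-- parseType="Resource" turns the note into a blank node with its own properties -->
    <skos:scopeNote rdf:parseType="Resource">
      <rdf:value>Manmade waterway used by watercraft or for drainage, irrigation, mining, or water power</rdf:value>
      <!-- Hypothetical metadata about the note -->
      <dc:creator>Jane Editor</dc:creator>
      <dc:date>2005-06-22</dc:date>
    </skos:scopeNote>
  </skos:Concept>
</rdf:RDF>

The trade-off is verbosity: the literal pattern is easier to write and read, while the structured form leaves room to say who wrote a note and when.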

Adding Relationships

The SKOS Core Vocabulary includes the following properties for asserting semantic relationships between concepts: skos:semanticRelation, skos:broader, skos:narrower, and skos:related. In the property hierarchy, skos:semanticRelation is the top-level property and the others are its sub-properties. To assert that one concept is broader in meaning (i.e., more general) than another, where the scope (meaning) of one falls completely within the scope of the other, use the skos:broader property. To assert the inverse, that one concept is narrower in meaning (i.e., more specific) than another, use the skos:narrower property. This is how we used skos:broader in our example document:


<skos:Concept rdf:about="http://www.my.com/#canals">
    <skos:broader rdf:resource="http://www.my.com/#hydrographic%20structures"/>
</skos:Concept>

The properties skos:broader and skos:narrower are inverses of each other, and both are transitive. To assert an associative relationship between two concepts, use the skos:related property:


<skos:Concept rdf:about="http://www.my.com/#canals">
    <skos:related rdf:resource="http://www.my.com/#channels"/>
    <skos:related rdf:resource="http://www.my.com/#locks"/>
</skos:Concept>
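
Because skos:broader and skos:narrower are inverses, the hierarchical link between canals and hydrographic structures could equally be stated from the other end. A sketch, reusing the URIs from the example document:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <skos:Concept rdf:about="http://www.my.com/#hydrographic%20structures">
    <!-- Says the same thing as the skos:broader statement on #canals -->
    <skos:narrower rdf:resource="http://www.my.com/#canals"/>
  </skos:Concept>
</rdf:RDF>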

Collecting Concepts Together

You can create and define a meaningful group of concepts. Note, however, that the SKOS Core support for meaningful collections of concepts is still unstable and may change in the future. SKOS Core has its own vocabulary for handling collections, even though RDF already has generic vocabulary (rdf:Bag and rdf:Seq) for unordered and ordered groups of resources. During preparation of the W3C Working Draft there was extended discussion on the mailing lists as to whether the RDF containers should be used, and the provisional decision was not to use rdf:Bag and rdf:Seq for this purpose. (See the explanation if you're curious.)

To define a meaningful collection of concepts, use the skos:Collection class and the skos:member property. To assign a lexical label to a collection, use the rdfs:label property. The most common use of a labelled collection is to enhance a hierarchical display. You can describe narrower and broader relationships between a concept and a collection; the class skos:CollectableProperty supports a generic mechanism by which collections can be involved in semantic relationships (and other sorts of statements).

To define an ordered collection of concepts, use the skos:OrderedCollection class with the skos:memberList property. An ordered collection may also have a label (again, use rdfs:label). Because skos:OrderedCollection is a subclass of skos:Collection, ordered collections can be used with semantic relation properties in the same way as unordered collections.
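
The two collection classes could be used as in the sketch below. It assumes the RDF Schema namespace for rdfs:label. The collection URIs, the labels, and the #bargeCanals and #shipCanals concepts are invented for illustration; the other member URIs reuse concepts from the canals entry.

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <!-- Unordered collection: only the grouping is meaningful -->
  <skos:Collection rdf:about="http://www.my.com/#waterwayFeatures">
    <rdfs:label>waterway features</rdfs:label>
    <skos:member rdf:resource="http://www.my.com/#canals"/>
    <skos:member rdf:resource="http://www.my.com/#channels"/>
    <skos:member rdf:resource="http://www.my.com/#locks"/>
  </skos:Collection>

  <!-- Ordered collection: parseType="Collection" builds an RDF list,
       so the order of the members is preserved -->
  <skos:OrderedCollection rdf:about="http://www.my.com/#canalsBySize">
    <rdfs:label>canals by size</rdfs:label>
    <skos:memberList rdf:parseType="Collection">
      <skos:Concept rdf:about="http://www.my.com/#bargeCanals"/>
      <skos:Concept rdf:about="http://www.my.com/#shipCanals"/>
    </skos:memberList>
  </skos:OrderedCollection>
</rdf:RDF>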

Usually concepts are defined in relation to other concepts, as part of an internally coherent concept scheme. As mentioned in the introduction, a concept scheme is defined here as a set of concepts, optionally including statements about semantic relationships between those concepts. The skos:ConceptScheme class allows you to assert that a resource is a concept scheme.
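
A concept scheme for the canals example might be declared as in this sketch. The scheme URI is invented, and the skos:inScheme property used to tie each concept to the scheme is part of SKOS Core but is not otherwise covered in this article.

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <!-- Hypothetical URI identifying the scheme as a whole -->
  <skos:ConceptScheme rdf:about="http://www.my.com/thesaurus"/>

  <skos:Concept rdf:about="http://www.my.com/#hydrographic%20structures">
    <skos:prefLabel>hydrographic structures</skos:prefLabel>
    <skos:inScheme rdf:resource="http://www.my.com/thesaurus"/>
  </skos:Concept>

  <skos:Concept rdf:about="http://www.my.com/#canals">
    <skos:prefLabel>canals</skos:prefLabel>
    <skos:inScheme rdf:resource="http://www.my.com/thesaurus"/>
    <skos:broader rdf:resource="http://www.my.com/#hydrographic%20structures"/>
  </skos:Concept>
</rdf:RDF>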