4.1.4 RDF: A Three Minute Summary

By mchoate
Last modified: 2006-09-04 12:59:37

RDF (Resource Description Framework) is a W3C specification for documenting metadata, and it is most commonly written in XML. In a nutshell, an RDF file is a collection of statements about some resource (a resource is anything with a URI), and these statements can be formally expressed in XML. Each statement comprises three elements - a resource name, a property name and a property value, also known as the subject, predicate and object respectively. For example, if I want to say that this article was written by Mark Choate, the three terms would be something like this:

  1. Subject: This article

  2. Predicate: isWrittenBy

  3. Object: Mark Choate

This three-part statement is often called a "triple" for obvious reasons, and this is the basic unit of RDF. (Technically, RDF/XML is the XML serialization of an RDF graph.) The subject can be anything that has a URI on the Internet. Predicates are drawn from RDF Schema, a set of XML terms that allows us to group resources into classes and properties. The reason predicates must be drawn from a Schema is that it ensures we all understand the same thing by the predicate. The subject is unambiguous, since it consists of a URI that points uniquely at that particular resource. The predicate is much more ambiguous, so defining it clearly in Schema helps to make it as unambiguous as the subject. The object can be either another resource or some literal value, such as a number. There are times when we need to restrict the terms used in the object, and for this we use a controlled vocabulary, which is nothing more than a list of terms that we are allowed to use (strictly speaking, this isn't RDF, but it plays a role in it, and PRISM has defined an RDF Schema for controlled vocabularies).
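As a rough illustration, the statement about this article can be written down as a three-part record. This is only a sketch in plain Python, not an RDF library: the `Triple` type, the `describe` helper and both `example.org` URIs are made up for the example, and the printed form is loosely modelled on the N-Triples notation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property being stated
    obj: str        # the property value (a literal, in this example)


# Hypothetical URIs, invented for illustration only.
ARTICLE = "http://example.org/articles/rdf-summary"
IS_WRITTEN_BY = "http://example.org/terms/isWrittenBy"

statement = Triple(ARTICLE, IS_WRITTEN_BY, "Mark Choate")


def describe(t: Triple) -> str:
    """Render a triple with a literal object as one line of text."""
    return f'<{t.subject}> <{t.predicate}> "{t.obj}" .'


print(describe(statement))
```

The point of the sketch is only that the subject and predicate are identified by URIs, while the object here is a plain string.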

The Resource Description Framework (RDF) is a framework for describing things that can be found on the Internet. More specifically, it's a way of describing things that have specific addresses on the Internet, such as individual web pages, sites, movies, audio files and anything else you can think of. RDF was developed to make it easier to find things online - the thinking was that if we get better at describing what these things (called "resources" in RDF) are, then we can more readily find them when we are looking for them. Everything that has an address on the Internet is considered a "resource". A "resource" is analogous to a noun - think of it simply as a thing. Since nouns and things always have properties, resources have properties, too.

As an example, let's start with a simple thing - a ball. A ball's shape is round. It's filled with air. It's usually made of plastic, but it can also be leather. Balls can be any color. Balls can belong to one person and not to another (and that person can take their ball home with them whenever they like). All of these ideas are properties of "balls". If we look closely at these examples, we see that there are basically three units to any description:

  1. The thing being described.

  2. The name of the property of the thing we are interested in.

  3. The value of the property.

So, when I say that a ball's shape is round, here are the three components of that description:

  1. The Thing: Ball

  2. Property: The shape of the ball.

  3. Value: round
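The same three-part pattern can be sketched as a small list of (thing, property, value) entries. This is a plain-Python illustration rather than real RDF: the property names `filling` and `material` and the `values_of` helper are invented here to cover the ball properties mentioned earlier.

```python
# Each entry is (thing, property, value). The property names are
# hypothetical labels chosen for this example.
descriptions = [
    ("ball", "shape", "round"),
    ("ball", "filling", "air"),
    ("ball", "material", "plastic"),
]


def values_of(thing, prop):
    """Return every value recorded for the given property of a thing."""
    return [v for (t, p, v) in descriptions if t == thing and p == prop]


print(values_of("ball", "shape"))  # ['round']
```

Asking for a property that was never described, such as `values_of("ball", "color")`, simply returns an empty list.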

RDF uses special terminology for these three components. The first, as we have already mentioned, is a "resource". Anything with an address on the Internet is a resource, and describing resources is what we do with RDF. The second component is the "property", and it is more formally known as the "predicate" in RDF. The final component, the value of the property, is known as the "object". So, in RDF, we have three basic units of information - resource, predicate and object - and these three units, when considered as a whole, are called a statement (which is why they use terms like "predicate" and "object" to describe them). In addition to being called a "statement", they are also sometimes referred to simply as a "triple" or an "RDF triple". Get it? There are three of them, so they call it a "triple".

Very clever, no?

This is the point where many people begin to get confused. The confusion stems, in part, from how rich our language is when compared with RDF. There are a lot of different ways to describe a ball's shape in English, and in many of them the property being described is implicit and understood through context. Compare these three sentences:

  1. Balls are spherical.

  2. The shape of balls is spherical.

  3. Balls are sphere shaped.

When we say "balls are spherical", we are referring to the shape of a ball, but we are doing so implicitly, because we know that the word "spherical" typically only applies to the shape of things and not to something else, like the color of things or the odor of things. In the case of RDF, we need to make this obvious fact explicit. All three of these sentences mean the same thing - they describe a property of a ball. The property itself is the ball's shape, and the value of the property is "spherical."

Now, let's muddy things up and say: "Balls are planet-shaped." A more precise way to say this is that "balls have the same shape that planets have." Therefore, if we know what shape planets are, then we know what shape balls are. Both balls and planets are things with spherical shapes. (Remember that "resources" are similar to "things". RDF helps us to say similar things about resources; in this case, balls and planets both share the property of a spherical shape. If we know the shape of planets, but not the shape of balls, we can now infer from this statement that balls are spherical.)
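The inference described above can be sketched in a few lines of plain Python. This is an illustration of the reasoning only, not how any particular RDF tool works: `sameShapeAs` is a made-up property name (it is not an RDF or RDF Schema term), and `infer_shapes` is a hypothetical helper.

```python
SAME_SHAPE_AS = "sameShapeAs"  # hypothetical property, invented for this sketch

facts = {
    ("planet", "shape", "spherical"),
    ("ball", SAME_SHAPE_AS, "planet"),
}


def infer_shapes(facts):
    """Return the facts plus any 'shape' statements that follow from them.

    If X has the same shape as Y, and Y's shape is known, then X is
    given that shape too. Repeats until nothing new can be added.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for subject, predicate, other in list(known):
            if predicate != SAME_SHAPE_AS:
                continue
            for s, p, shape in list(known):
                if s == other and p == "shape":
                    new_fact = (subject, "shape", shape)
                    if new_fact not in known:
                        known.add(new_fact)
                        changed = True
    return known


print(("ball", "shape", "spherical") in infer_shapes(facts))  # True
```

Nothing in the starting facts says directly that balls are spherical. That statement only appears after combining the two facts we were given.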

At the simplest level, then, RDF is simply a way to describe attributes of documents (or, more broadly, media). In some cases, additional attributes can be inferred based upon the attributes explicitly described. The key is to ensure that we all mean the same thing by the words "shape" and "spherical". In order for RDF to do that, properties need to be explicitly defined. And since we're dealing with the world of the Internet, these properties will be defined on the Internet, which means that they will have an address and, since we know that resources are things which are addressable on the Internet, we come to the grand conclusion that properties must be *a kind of resource*, too! That is very true, but they are very special resources, because they can only be defined in RDF Schema.

Schemas are the dictionaries of RDF (actually, they're more commonly called "vocabularies"). RDF Schema is (are?) a specification for describing properties. Now, if a property is a resource, what is the value of a property? It can be either a resource or a literal. Since some properties might have a quantitative value, we can't require property values to be resources - there's no URI for the number "1" on the Internet - so we allow literal values in addition to resources as objects of predicates. Now that we have identified the basic terminology for RDF, and have a basic understanding of the terms, we need to take a closer look at how we describe things, but I will save that for another time.
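To make the resource-versus-literal distinction concrete, here is a small sketch in plain Python. The class names, the `object_kind` helper, the `example.org` URIs and the `diameterInCm` property are all assumptions made up for the example. They do not come from RDF Schema or from any RDF library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    uri: str  # anything with an address


@dataclass(frozen=True)
class Literal:
    value: Union[str, int, float]  # a plain value with no URI of its own


@dataclass(frozen=True)
class Statement:
    subject: Resource
    predicate: Resource            # properties are resources, too
    obj: Union[Resource, Literal]  # the object may be either kind


EX = "http://example.org/"  # hypothetical namespace
ball = Resource(EX + "things/ball")
planet = Resource(EX + "things/planet")

statements = [
    Statement(ball, Resource(EX + "terms/sameShapeAs"), planet),
    Statement(ball, Resource(EX + "terms/diameterInCm"), Literal(22)),
]


def object_kind(statement: Statement) -> str:
    """Say whether a statement's object is a resource or a literal."""
    return "resource" if isinstance(statement.obj, Resource) else "literal"


print([object_kind(s) for s in statements])  # ['resource', 'literal']
```

In this sketch the predicate is always a `Resource`, which mirrors the conclusion above that properties are themselves a kind of resource.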