Semantic Web

The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. -- "Semantic Web Road map" Tim Berners-Lee --


From XML Schema to RDF

XML Schema

Standard Generalized Markup Language (SGML) provides a meta-language for defining vocabularies that describe the content of documents. An SGML DTD (Document Type Definition) is good for expressing which elements are optional, which are mandatory, which may occur more than once, and in what order the components must appear. The XML DTD is a simplification of SGML's. When XML is used for vocabularies that express data relationships, however, a DTD is not useful for content models in the following ways:

Replacing the DTD syntax with XML syntax resulted in XML Schema, also known as the XML Schema Definition language (XSD). XML Schema provides 44 built-in datatypes, including about 20 variants of numeric types, plus several time-oriented types, boolean, URI, and binary types, as well as support for the original 10 types defined in XML DTD (i.e. CDATA, NMTOKEN, ID, IDREF, ENTITY, etc.). Apart from the predefined datatypes, XSD also allows developers to derive application-specific or industry-specific datatypes by combining or constraining the values of built-in types. Some schema-related applications have influenced the development of XSD:

There are some situations in which DTDs are preferable to XML Schema:

Resource Description Framework

In the 1998 note "Semantic Web Road map", Tim Berners-Lee wrote: "The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the web. Leaving aside the artificial intelligence problem of training machines to behave like people, the Semantic Web approach instead develops languages for expressing information in a machine processable form."

RDF (Resource Description Framework) represents metadata (data about data) about Web resources. RDF is an application of XML that allows Web-based systems to share "machine-understandable" descriptions of Web resources.

The key influences on the design of RDF came from the Web development community itself, in the form of HTML metadata and the Platform for Internet Content Selection (PICS). Attempts to turn PICS into a general metadata mechanism led the W3C to work on "PICS-NG", RDF's predecessor. There are various ways in which metadata can be associated with Web resources:

Other influences came from the library community, the structured document community, and the knowledge representation community. Framework design contributions also came from object-oriented programming, modeling languages, and databases. Examples include Netscape's Meta Content Framework (MCF) and Microsoft's XML-Data. The relationship to the knowledge representation community has been realized with the introduction of the DAML+OIL (DARPA Agent Markup Language + Ontology Inference Layer) ontology language.

Two specifications currently define RDF. The separation reflects workflow issues of the standards process and is therefore largely artificial. For example, the original PICS application of content rating contributed to the charter of the RDF Model and Syntax Working Group and the requirements specifications of RDF.

RDF is serialized using XML. More important from the standpoint of the model are Uniform Resource Identifiers (URIs), because they are used to name everything in the model, and the basic mechanics of XML Namespaces, because they provide a shorthand representation of URIs. At its core, RDF is a model for making statements about objects. These objects can be Web resources such as documents and other Web pages, or more generally, they can be anything one can name using a URI, such as an application or a service. The core RDF data model consists of several types of entities: resources, properties, literals, and statements.
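The shorthand role of XML Namespaces can be sketched in a few lines of Python. The prefix table and the helper name `expand` are illustrative assumptions, not part of any RDF library; the two namespace URIs are the published RDF syntax and Dublin Core element namespaces.

```python
# Illustrative prefix table: each prefix abbreviates a namespace URI.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'dc:creator' into a full URI."""
    prefix, _, local = qname.partition(":")
    if prefix not in NAMESPACES or not local:
        raise ValueError(f"unknown prefix or malformed name: {qname!r}")
    # The full URI is the namespace URI followed by the local name.
    return NAMESPACES[prefix] + local


print(expand("dc:creator"))
print(expand("rdf:type"))
```

The prefixed form is only a convenience for writing; the model itself always works with the expanded URIs.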

An RDF statement is a "triple" that consists of a subject, a predicate, and an object.

Because the value of a property can be another resource (which, in turn, can have properties with values, etc.), it is simple to construct arbitrary structures using RDF.

For example, the statement "the author of Eric Tang's homepage is Eric Tang" breaks down as follows:

subject: "Eric Tang's homepage"
predicate: "author"
object: "Eric Tang"
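As a minimal sketch, the same statement can be written as a three-field record in Python. The `Statement` type and the `author` property URI under example.org are assumptions made for illustration; only the homepage URL comes from this article.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement: a (subject, predicate, object) triple."""
    subject: str
    predicate: str
    object: str


HOMEPAGE = "http://www.iohk.com/UserPages/erictang/"
AUTHOR = "http://example.org/terms/author"  # hypothetical property URI

# Subject and predicate are URIs; the object here is a string literal.
statement = Statement(subject=HOMEPAGE, predicate=AUTHOR, object="Eric Tang")

subject, predicate, obj = statement
print(f"<{subject}> <{predicate}> \"{obj}\" .")
```

The printed line loosely follows the one-triple-per-line notation often used to list statements; it is meant as a reading aid, not as an exact serialization.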

We use a URI to denote the subject; a natural choice is the URL of the homepage (http://www.iohk.com/UserPages/erictang/). Objects can be other things named by URIs (e.g. an email address), or they can be string literals. If you think of the subjects and objects as nodes in a directed graph, and the predicates as the labels on the directed arcs of that graph, you get a natural and convenient way of thinking about the RDF model. We say the model is "graph-based" and uses directed, labeled graphs (DLGs). Some nodes of an RDF graph are not given a URI. These nodes are called "anonymous" and have an important role in the RDF model (the W3C RDF Core WG calls them "blank" nodes or "b-nodes"). Individual anonymous nodes in a graph are distinct, and this distinctness is preserved by the RDF serialization syntax.
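The graph view, including anonymous nodes, can be sketched with plain Python objects. Everything here other than the homepage URL is an assumption for illustration: the `BlankNode` class, the example.org vocabulary, and the mailbox address are hypothetical. The point of the sketch is that two anonymous nodes stay distinct even when they carry identical properties.

```python
import itertools


class BlankNode:
    """An anonymous node: it has no URI, only its own identity."""
    _ids = itertools.count(1)

    def __init__(self):
        # The label is local to this graph and is not a URI.
        self.label = f"_:b{next(BlankNode._ids)}"

    def __repr__(self):
        return self.label


HOMEPAGE = "http://www.iohk.com/UserPages/erictang/"
EX = "http://example.org/terms/"  # hypothetical vocabulary

author = BlankNode()
other = BlankNode()

# A graph is a set of (subject, predicate, object) arcs.
graph = {
    (HOMEPAGE, EX + "author", author),
    (author, EX + "name", "Eric Tang"),
    (author, EX + "mailbox", "mailto:eric@example.org"),  # hypothetical address
    (other, EX + "name", "Eric Tang"),
}


def arcs_from(node):
    """Return the labeled arcs leaving a node, sorted for stable output."""
    return sorted((p, str(o)) for s, p, o in graph if s == node)


print(arcs_from(HOMEPAGE))
print(arcs_from(author))
print(author == other)
```

Because `BlankNode` does not define equality, Python compares the two nodes by identity, which mirrors the rule that individual anonymous nodes are separate.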

Another way of thinking about the model would be to make it "node-centric" and consider predicates to be instance variables: this gives the object-oriented interpretation of the RDF model.
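A rough sketch of that node-centric reading: the triples are regrouped by subject so that each node looks like an object whose fields are its predicates. The function name `node_centric`, the example.org property URIs, and the title value are hypothetical.

```python
from collections import defaultdict

HOMEPAGE = "http://www.iohk.com/UserPages/erictang/"
EX = "http://example.org/terms/"  # hypothetical vocabulary

triples = [
    (HOMEPAGE, EX + "author", "Eric Tang"),
    (HOMEPAGE, EX + "title", "Eric Tang's homepage"),  # hypothetical value
]


def node_centric(triples):
    """Group triples by subject: each node maps predicate -> list of values."""
    nodes = defaultdict(dict)
    for subject, predicate, obj in triples:
        # A property may occur more than once, so values are kept in a list.
        nodes[subject].setdefault(predicate, []).append(obj)
    return dict(nodes)


view = node_centric(triples)
print(view[HOMEPAGE][EX + "author"])
```

Both views hold the same information; the node-centric form simply makes "all properties of one resource" the unit of access.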

XML Topic Maps (XTM) is a specification from TopicMaps.org, based on the ISO Topic Maps standard, that may eventually be endorsed by W3C. Topic Maps facilitate quick and accurate retrieval of information. They build on the semantics of RDF and the syntax of XLink, and can be used in a variety of ways, one of which is to foster sophisticated navigation within a given site and across a much wider web of "topic space". Topic Maps are XML documents that define topics and state how individual documents relate to these topics. They act as navigation maps across information sets. The same set of documents can be represented by very different Topic Maps. The XTM 1.0 specification provides a grammar for representing the structure of information resources used to define topics, as well as the associations (relationships) between topics.
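The idea of topics, documents, and associations can be sketched as a small in-memory structure. This is not XTM syntax; the class names, the association kind, and the example.org document URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    documents: list = field(default_factory=list)  # URLs of related documents


@dataclass(frozen=True)
class Association:
    kind: str
    source: str  # topic id
    target: str  # topic id


# Hypothetical topics and documents.
topics = {
    "rdf": Topic("RDF", ["http://example.org/docs/rdf-intro.html"]),
    "xml": Topic("XML", ["http://example.org/docs/xml-basics.html"]),
}
associations = [Association("is-serialized-in", "rdf", "xml")]


def related_documents(topic_id):
    """Documents for a topic, followed by those of directly associated topics."""
    docs = list(topics[topic_id].documents)
    for assoc in associations:
        if assoc.source == topic_id:
            docs.extend(topics[assoc.target].documents)
        elif assoc.target == topic_id:
            docs.extend(topics[assoc.source].documents)
    return docs


print(related_documents("rdf"))
```

Swapping in a different `topics` and `associations` table over the same document URLs would give a different navigation map for the same set of documents.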


Reference:
1. Kenneth B. Sall, XML Family of Specifications - A Practical Guide (2002), Addison-Wesley.