http://www.sajim.co.za/websites.main.5nr2.asp?print=1

Web sites
Vol.5(2) June 2003

Semantic Web: the devil is in the details

Cindy Meltzer
cmeltzer@hixnet.co.za

The vision of the semantic Web (SW) is to facilitate machine-readable data and to extend relational data to the Web through metadata that describe Web page data. The SW also makes metadata available to people and computers through a cycle of creation, discovery, integration and re-use. While the Web owes a lot of its success to the simplicity of design and application, the SW, although appearing simple at first, is considerably more complex. The final success of the SW will depend on the successful resolution of a number of issues. In this issue, some key concepts of the SW are briefly discussed. A gateway to sites dedicated to various aspects of the SW is also provided.

The SW is built as layers or levels of functionality in an architecture that bears similarity to the World-Wide Web architecture. In short, the SW assigns a uniform resource identifier (like a URL, but for data) to each relevant item and describes the relationship between the subjects and the properties in a resource description framework (RDF), using extensible markup language (XML) syntax. Tim Berners-Lee's articles on the SW design can be found at http://www.w3.org/DesignIssues/Overview.html. Figure 1 is Berners-Lee's 'layer cake' diagram of the SW architecture.

Figure 1 Diagram of the SW architecture (Berners-Lee)

Unicode and uniform resource identifier (URI) layers

Unicode and uniform resource identifier (URI) layers define international character sets and ensure unique addresses. The URI is the principal method for unique identification on the Web and is the means of identifying resources (or subjects) of the SW with their objects.

Extensible markup language (XML) layers: XML, namespace (NS) and XML schemas

Extensible markup language (XML) is the universal format for structured documents and data interchange on the Web.
The W3C XML recommendation was first published on 10 February 1998 (http://www.w3.org/TR/2000/REC-xml-20001006) and, since then, XML has had widespread success with storing and exchanging data (http://xml.coverpages.org/xmlApplications.html). XML is publicly available, platform independent, simple, flexible and well supported by major commercial vendors. These vendors have supported the development of XML as well as software that enables the use of XML.

There are excellent resources on XML, including:

Cover Pages (http://coverpages.org). This is a comprehensive Web-accessible reference collection that supports the Standard Generalized Markup Language (SGML) and XML family of (meta) markup language standards and their applications. The objectives of this public knowledgebase are to promote and enable the use of open, interoperable standards-based solutions that protect digital information and enhance the integrity of communication. The knowledgebase also aims to provide reference material on enabling technologies compatible with descriptive markup language standards and applications.

XML.com (http://www.xml.com/). This is a comprehensive site that features a rich mixture of information and services for the XML community. The site is designed to serve both people who are already working with XML and hypertext markup language (HTML), as well as users who want to graduate to the power and complexity of XML.

W3C XML homepage (http://www.w3.org/XML/). This site lists and links to the XML working groups. Formal technical specifications can be found and downloaded here, and the site links to W3C recommendations, proposed recommendations, working drafts, conformance test suites and other documents.

An overview of XML, an article by Margaret van Steenderen. The article can be found at http://general.rau.ac.za/infosci/raujournal/vol2.nr1.01_07_2000/default.asp?to=news.
To summarize, XML provides a syntax that allows a user-defined structure of a document in a standard, machine-readable form where the markup describes the structure of a document. XML documents can have any required elements. However, to define the set of element and attribute type names available to a type of document, users must either create a document type definition (DTD) that formally identifies the relationships between the various elements and attributes that form a document, or they must use a community standard DTD, for example, DocBook (http://www.docbook.org/) or an XML schema.

An XML schema (http://www.w3.org/XML/Schema) describes a model for a whole group of documents. The model describes the possible arrangement of tags and text in a valid document. A schema might also be viewed as an agreement on a common vocabulary for a particular application that involves exchanging documents. It is therefore a collection (vocabulary) of type definitions and element declarations whose names belong to a particular XML namespace called a target namespace. A namespace associates element names with a URI, typically built on a domain name. This URI uniquely identifies the elements, which are central to both XML and RDF schemas. Target namespaces enable us to define groups of terms and to distinguish between definitions and declarations from different vocabularies.

XML schema languages such as XSD (XML Schema Part 0: Primer) enable anything from simple to very sophisticated validation of the structure of an XML tree, as well as datatype validation of tag content. In combination with application profiles, this provides a certain amount of interoperability among a limited number of schemas, but it is not scalable to the entire World-Wide Web. The RDF layer provides the metadata interoperability required for data sharing.

RDF and RDF schemas

RDF is the metadata language that encodes metadata about resources on the Web.
A resource is anything with an associated uniform resource identifier (URI). The RDF model consists of simple statements about resources. A URI is assigned to each relevant item (subject or property), and the relationship between subjects and properties is defined in a resource description framework (RDF). The RDF is encoded in XML. RDF statements consist of three elements, namely: subjects or resources (anything that can be identified by a URI); predicates or properties (attributes that describe the resource); and objects (values that may be either a literal or a second resource).

RDF Schema, the vocabulary description language of RDF, is an extension of RDF. It provides mechanisms for describing groups of related resources and the relationships between these resources.

The starting site to visit is the W3C RDF home site, which contains reports, articles, developer tools, RDF validators and more. Go to http://www.w3.org/RDF/. Dave Beckett's Resource Description Framework (RDF) Resource Guide is also an excellent site that contains RDF publications, documents, tools, tutorials and more.

Ontology layer

The ontology inference layer (OIL) is a proposal for a Web-based representation and inference layer for ontologies. On the SW, ontologies represent the semantics of documents and enable the semantics to be used by Web applications and intelligent agents. An ontology is a consensual, shared and formal description of the concepts that are important in a given domain. Typically, an ontology identifies classes of objects that are important in a domain and organizes these classes in a subclass hierarchy. Each class is characterized by properties that are shared by all elements in that class. Important relations between classes or between the elements of these classes are also part of an ontology. OIL synthesizes work from three different communities to achieve the aim of providing a general purpose markup language for the SW.
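The three-part RDF statement described above (subject, predicate, object) can be illustrated with a small sketch. This is not RDF/XML and not the API of any RDF toolkit; it is a plain Python data model written only to show the shape of a statement. The `example.org` URIs, the author name and the `match` helper are hypothetical; the two Dublin Core element URIs are used as sample predicates.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Resource:
    """Anything that can be identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain value, such as a title or a name."""
    value: str


@dataclass(frozen=True)
class Statement:
    """One RDF statement: subject, predicate, object."""
    subject: Resource
    predicate: Resource
    obj: Union[Resource, Literal]  # the object may be a literal or a second resource


# Hypothetical identifiers, used for illustration only.
EX = "http://example.org/"
DC_TITLE = Resource("http://purl.org/dc/elements/1.1/title")
DC_CREATOR = Resource("http://purl.org/dc/elements/1.1/creator")

article = Resource(EX + "articles/semantic-web")
author = Resource(EX + "people/author-1")

statements: List[Statement] = [
    # The object is a literal.
    Statement(article, DC_TITLE, Literal("Semantic Web: the devil is in the details")),
    # The object is a second resource, which can itself be described.
    Statement(article, DC_CREATOR, author),
    Statement(author, Resource(EX + "terms/name"), Literal("Example Author")),
]


def match(store: List[Statement],
          subject: Optional[Resource] = None,
          predicate: Optional[Resource] = None,
          obj: Optional[Union[Resource, Literal]] = None) -> List[Statement]:
    """Return the statements that fit the given parts; None acts as a wildcard."""
    return [
        s for s in store
        if (subject is None or s.subject == subject)
        and (predicate is None or s.predicate == predicate)
        and (obj is None or s.obj == obj)
    ]


about_article = match(statements, subject=article)
creators = [s.obj for s in match(statements, predicate=DC_CREATOR)]
```

Because the object of one statement can be the subject of another, statements link into a graph: here the article points to its creator, and the creator resource carries its own description.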
OIL defines a number of classes and organizes them in a subclass hierarchy. The DARPA Agent Markup Language (DAML+OIL) is currently being developed into the Web Ontology Language (OWL) standard. WebOnt articulated the following design goals of a Web ontology, namely to: provide for ontology sharing; support evolution and revision within ontologies; support interoperability; detect inconsistencies in vocabularies; balance knowledge representation with scalability; provide ease of use; be serializable in XML; and pay due attention to the global character of the Web (http://www.w3.org/TR/Webont-req/).

The SW assumes that each ontology is consensual and congruent with the other shared ontologies to create a common domain of discourse that can be further interpreted by rules of inference and application logic. The current range of logic languages and information models has created a range of ontology forms, and the challenge is to create tools and ontologies capable of integration and interoperability.

On-to-knowledge project

The on-to-knowledge project aims to develop tools and methods for supporting knowledge management by relying on shareable and re-usable knowledge ontologies. The site provides information on OIL, papers, tool support and related initiatives. Go to http://www.ontoknowledge.org/ for more information.

DAML ontology library

This library provides the URI of ontologies by keyword, URI and more. Go to http://www.daml.org/ontologies/.

Logic, proof and trust layers

Schemas and ontologies provide vocabularies and relationships between terms in vocabularies. They enable inference in the logical layer. The next step in the development of the SW is a logical language that makes inferences. The trust layer will include XML encryption and signatures. These are areas under development. See Berners-Lee's paper on design issues at http://www.w3.org/DesignIssues/Logic.html.

To process the knowledge available on the SW, an inference engine is necessary.
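To make the idea of an inference engine concrete, the following is a minimal sketch of naive forward chaining over a subclass hierarchy. It does not represent how any particular SW inference engine is built. The two rules are hand-coded assumptions modelled on the usual reading of `rdfs:subClassOf` and `rdf:type`, and the `ex:` class and instance names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer(triples: Iterable[Triple]) -> Set[Triple]:
    """Apply two rules repeatedly until no new triples appear (a fixed point)."""
    known = set(triples)
    while True:
        new = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                if b != c:
                    continue  # the two statements do not chain together
                # Rule 1: the subclass relation is transitive.
                if p1 == SUBCLASS_OF and p2 == SUBCLASS_OF:
                    new.add((a, SUBCLASS_OF, d))
                # Rule 2: an instance of a class is also an instance of its superclasses.
                if p1 == RDF_TYPE and p2 == SUBCLASS_OF:
                    new.add((a, RDF_TYPE, d))
        if new <= known:
            return known  # nothing new was derived, so stop
        known |= new


# Hypothetical facts, stated explicitly.
facts = [
    ("ex:Journal", SUBCLASS_OF, "ex:Periodical"),
    ("ex:Periodical", SUBCLASS_OF, "ex:Publication"),
    ("ex:SAJIM", RDF_TYPE, "ex:Journal"),
]

derived = infer(facts)
newly_derived = derived - set(facts)
```

The statement that `ex:SAJIM` is an `ex:Publication` is never given as a fact; it appears in `derived` only because the two rules were applied to the stated facts. The nested loop compares every pair of triples on each pass, which is acceptable for a sketch but would not scale to Web-sized data.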
Inference engines deduce new knowledge from already specified knowledge. There are two different approaches, namely general logic-based inference engines and specialized algorithms (problem-solving methods).

The W3C XML query language is designed to provide a general purpose tool for querying XML documents. It will be defined in two parts, namely an XML Query 1.0 and XPath 2.0 Data Model and an XQuery 1.0 Formal Semantics. Issues to be resolved include browser support when moving queries from clients to servers and the ability of the XML linking language to return queries made into RDF resources.

Digital signature layer

Digital signatures are a way of securely proving that someone wrote a specific document. They form an important layer in the 'Web of trust', without which information derived from the Web is not authenticated. The XML signature syntax and processing W3C recommendation of 2002 specifies XML syntax and processing rules for creating and representing digital signatures. The recommendation points out that the XML signature is a method of associating a key with referenced data. It does not normatively specify how keys are associated with persons or institutions, nor does it specify the meaning of the data being referenced and signed. Consequently, while this specification is an important component of secure XML applications, it is not sufficient to address all application security and trust concerns, particularly with respect to using signed XML as a basis of human-to-human communication and agreement. Further security considerations must be taken into account. Go to http://www.w3.org/TR/xmldsig-core for more information.

Development

The SW is still in a developmental phase. Although rapid strides have been made in certain areas, other areas remain in a stage of possible change or have yet to be taken up. The SW community portal (http://www.semanticWeb.org) is the central resource for up-to-date information on all SW development and issues.
Problems that must be confronted include providing incentives for page creators to improve interoperability, as XML schemas with application profiles will provide a basis for internal organization, as well as data exchange between cooperating groups. Kendall Grant Clarke (http://www.xml.com/pub/a/2001/01/31/politics.html) has pointed out the political nature of the SW. According to Clarke, schemas, one of the foundational parts of the SW, are essentially political because they reflect the interests of the institutions that produce them. Other problems discussed include defining mappings between different ontologies for interoperability, the development of inference engines and security issues in the proof, trust and digital signatures layers.

The successful implementation of the SW will depend on a snowball effect that must begin with the development of tools, SW sites and standardization. In 2002, the specification of the expression of simple Dublin Core Metadata in RDF and XML became a Dublin Core Metadata Initiative (DCMI) recommendation. Visit the Dublin Core Metadata Initiative site for a detailed discussion of this and other metadata issues at http://dublincore.org/documents/2002/05/15/dcq-rdf-xml/.

Interesting sites on the development of the SW include:

The Maryland information and network dynamics lab SW agents project. This is Mindswap's first site on the SW and can be found at http://www.mindswap.org. This site harnesses many Web technologies (for example HTML and XML) and couples them with SW languages (RDF, RDFS, DAML+OIL, OWL) and tools that were built both here (see the download page) and elsewhere. Many of the pages contain links that allow you to either see the SW markup directly (the machine-readable markup) or to go to pages that describe how the pages are created and how the tools power them.
This site has papers, demonstrations of Web service composers, DAML ontology search facilities, downloads of semantic markups, an ontology and RDF editor, an RDF instance creator and more.

SWAD-Europe workplan. This site describes a number of reports, tools, projects and works in progress. Go to http://www.w3.org/2001/sw/Europe/reports/intro.html for more information.

RDFLib is a Python library for working with RDF. RDFLib lets you generate, store and query RDF triples without requiring you to deal directly with RDF or XML syntax. It parses and serializes RDF and XML and stores RDF triples. Go to http://rdflib.net/ for more information.

The Open directory project uses RDF technologies. Go to http://www.dmoz.org/.

Musicbrainz uses RDF and XML to facilitate the exchange of audio-video related metadata. Go to http://www.musicbrainz.org/ for more information.

RdfDB is a simple, scalable, open source database for RDF. Go to http://www.guha.com/rdfdb/ for more information.

SquishQL is a demonstrator: an open source RDF query engine written in Java. It can take structured query language (SQL)-like query strings. Go to http://swordfish.rdfWeb.org/rdfquery/ for more information.

Disclaimer

Articles published in SAJIM are the opinions of the authors and do not necessarily reflect the opinion of the Editor, Board, Publisher, Webmaster or the Rand Afrikaans University. The user hereby waives any claim he/she/they may have or acquire against the publisher, its suppliers, licensees and sublicensees and indemnifies all said persons from any claims, lawsuits, proceedings, costs, special, incidental, consequential or indirect damages, including damages for loss of profits, loss of business or downtime arising out of or relating to the user’s use of the Website.

ISSN 1560-683X

Published by InterWord Communications for the Centre for Research in Web-based Applications, Rand Afrikaans University