http://www.sajim.co.za/peer164.11nr3.asp?print=1
Peer Reviewed Article Vol.11(3) September 2009
Bootstrapping an XML Schema of syntactic functions into a skeleton ontology *
JH Kroeze
North-West University (Vaal Triangle Campus)
Vanderbijlpark South Africa
Kroeze.Jan@nwu.ac.za
Information Systems is regarded as an interdisciplinary science. Therefore, insights from the humanities are important for this
discipline, although this is not always recognised or valued. One outstanding example is the current upsurge in the study and use
of 'ontologies' in information systems, bridging the disciplines of philosophy and computing. The article begins with an overview of
an XML Schema that was used as a thesaurus to ensure consistency in the syntactic tagging of the Hebrew text in Genesis 1. The
broader syntactic taxonomy, on which the XML Schema is based, and which may be used to analyse the syntax of Biblical Hebrew
texts, is discussed in detail. The research also investigates how the concept of 'ontologies' is used in computational linguistic
projects. These concepts form the building blocks for suggesting an ontology of syntactic functions for Biblical Hebrew, which may
be implemented and used by linguistic information systems to ensure their quality and reliability (i.e. an ontology for information
systems). Some possible uses of such an ontology are also proposed.
Key words: XML Schema, thesaurus, taxonomy, ontology, Biblical Hebrew syntax
Received: 26 August 2009
Contents
1. Introduction
2. Taxonomy and XML Schema of Hebrew syntax
3. Ontologies in ICT
4. Links between XML schemas and ontologies
5. Use of IT ontologies in (computational) linguistics
6. Suggesting an ontology of syntactic functions in Biblical Hebrew
7. Possible uses of such an ontology
8. Conclusion
9. References
1 Introduction
Information Systems (IS) is regarded as an interdisciplinary science. Although it mainly focuses on social aspects regarding the
development and use of software in organisations, it also deals with programming and algorithms and, therefore, contains
elements of mathematical and physical sciences. In addition, insights from the humanities are important for this discipline,
although this is not always recognised or valued. Many papers, books and articles have been written on humanities computing,
that is, the computer-based study of various humanities disciplines. However, not much is available on what the author
would like to call 'Humanities-enriched Information Systems', meaning a humanities approach to and exploration of various aspects
of computing subjects (Kroeze 2009).
Some Information and Communication Technology (ICT) disciplines, such as Information Science, may be regarded as the
humanities branch of ICT, because they developed out of humanities disciplines associated with systems development for
knowledge representation such as taxonomies and classification systems. However, humanities approaches are also present and
embedded in other branches of ICT. One outstanding example is the current upsurge in the study and use of 'ontologies' in
information systems. Ontology has traditionally been (and still is) a philosophical discipline that studies the nature of existence. In
IS, however, ontologies refer to subsets of reality and how knowledge about these entities may be represented electronically.
'Ontologies are used to capture knowledge about some domains of interest. An ontology describes the concepts in the domain and
also the relationships that hold between those concepts' (Horridge 2009:10).
This article gives an example from Biblical Hebrew (BH) grammar to illustrate a typical IS ontology. The idea for this originated in
a study regarding an XML Schema of Hebrew syntax used to ensure consistent tagging of the Hebrew Bible text (Kroeze 2006).
The next section gives an overview of this schema and its underlying taxonomy. Links between XML schemas and ontologies are
discussed, and a brief overview of the use of ontologies in ICT and computational linguistics is then presented before suggesting
an ontology of syntactic functions in BH and its possible uses. Consistency checking and visualisation of a skeleton ontology are
implemented as examples of the benefits of an ontological approach.
The research is an interpretive, qualitative study. An interpretivist approach is more suitable for describing both the phenomena of
syntax and ontologies because both are cultural products of society. IS ontologies capture and formalise subjective realities, while
syntactical systems reflect the theoretical assumptions of various linguistic schools. The research strategy is 'design and creation'
since the suggested BH taxonomy and ontology may be regarded as constructs or artifacts which form the main focus and
contribution of the research (Oates 2006:291-296, 108-109).
2 Taxonomy and XML Schema of Hebrew syntax
This section discusses an XML Schema used in the syntactic tagging of the Hebrew text in Genesis 1 and a more detailed
underlying syntactic taxonomy that may be used to analyse the syntax of BH texts. This marked-up text of Genesis 1 was used as
a databank in a thesis on the text data mining of linguistic data (Kroeze 2008). Syntactic functions refer to the formal,
grammatical roles and relations in clauses. 'Syntax describes the form of clauses and sentences. The syntactic function of an
element in a clause is the formal relation of that element to the other elements' (Kroeze 2000a:99). The concept of syntactic
function is used as an umbrella term for elements usually called 'subject, direct object, indirect object, adjunct, copulative
predicate, etc.' (Van der Merwe, Naudé and Kroeze 1999:239). According to Dik (1997a:26), syntactic functions 'specify the
perspective from which a State of Affairs is presented in a linguistic expression'.
The XML Schema was created using the built-in functionality of Visual Studio.Net 2003 (VS.Net 2003). The structure of an XML
document is represented by its schema. VS.Net 2003 was used because the XML functionality is not available in Visual Basic 6.
VS.Net 2005 allows one to automatically create an XML Schema, but not to use it directly to validate XML databases. VS.Net
2003, however, facilitates both automatic creation and direct validation (using an option on the XML menu). Although the basic
schema for the XML database of linguistic data was created automatically, a simple type and enumeration of syntactic tags were
coded manually and added to the schema. A 'simple type' is a user-defined type, which enables the programmer to create
custom-made types that reflect his/her exact requirements (Deitel and Deitel 2006:919-921). One may create a type to define a
list (enumeration) of all possible values of syntactic functions. An enumeration is 'a set of values that a data item can select
from' (Holzner 2004:213). The essential section of the schema is shown in Table 1 (for complete details, see Kroeze 2008:113-
119). The schema was then used to test the XML database of Genesis 1:1-2:3, and this procedure revealed some inconsistencies
in the tagging, for example with regard to the use of square brackets to indicate embedded clauses. After correcting these tagging
errors the validation was successful.
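The kind of check that an enumeration performs can be sketched independently of Visual Studio. The following Python fragment is only an illustration under stated assumptions, not the author's schema: the element name `synf`, the spelling of the tag values (loosely based on the asterisked entries in Table 2) and the sample clause are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag spellings, loosely based on the asterisked entries in Table 2.
ALLOWED_FUNCTIONS = {
    "Subject", "MainVerb", "CopulativeVerb", "CopulaPredicate", "Complement",
    "Object", "ObjectClause", "ObjectCluster", "IndObj", "Adjunct",
    "Attribute", "Disjunct", "Conj",
}


def find_invalid_tags(xml_text, element_name="synf"):
    """Return the text of every tag element whose value is not in the enumeration."""
    root = ET.fromstring(xml_text)
    invalid = []
    for elem in root.iter(element_name):
        value = (elem.text or "").strip()
        if value not in ALLOWED_FUNCTIONS:
            invalid.append(value)
    return invalid


# Hypothetical sample: the last tag carries a stray square bracket.
SAMPLE = """
<clause>
  <phrase><synf>Adjunct</synf></phrase>
  <phrase><synf>MainVerb</synf></phrase>
  <phrase><synf>Subject</synf></phrase>
  <phrase><synf>[Object</synf></phrase>
</clause>
""".strip()

if __name__ == "__main__":
    print(find_invalid_tags(SAMPLE))
```

A value that is not in the list, such as a tag with a stray bracket, is reported instead of being silently accepted, which is the behaviour the enumeration was added to the schema to provide.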
This schema, which is essentially a list of syntactic functions occurring in the tagged text, is based on a more detailed
taxonomy of Hebrew syntax, shown in Table 2. The definitions contained in this taxonomy should be regarded as a reference
system, built by the author on nuggets of syntactic information mined from various sources. This taxonomy may be regarded as
the creation of a reality (a classification system) that occurred within the author himself, 'primarily through the use of past
experience, personal knowledge, and thinking', which is typical of an agile approach in scientific endeavours (Brown, Nerur and
Slinkman 2004:4141). It was used in an introductory study manual on Hebrew grammar and syntax (Kroeze 2000a:330-334), as
well as in various applications, for example in Kroeze 2000b, Kroeze 2002 and Kroeze 2008. In Table 2 below, those syntactic
functions that were actually used in the Genesis 1:1-2:3 XML databank are marked with an asterisk.
Table 1 Syntactic sections from an XML Schema used to validate an XML database of Genesis 1:1-2:3
...
...
Using simple type synfenum to validate syntactic function elements
...
Table 2 Taxonomy of syntactic functions in Biblical Hebrew (Kroeze 2000a, 2000b, 2002, 2008)
Name Definition
1. Subject* The subject is that element in a clause which
determines the person, gender and number of the
main verb, or the gender and number of the
copula-predicate if this is a noun, adjective or
participle.1 It is that part of the clause about which the
predicate proclaims something.2 A noun, noun phrase,
verb phrase or a clause can serve as subject. Rarely,
even a preposition phrase may serve as subject.3
2. Predicate The predicate is that element in a clause which is
governed by the subject (in terms of person, gender
and number) and which tells something about the
subject.4 It consists of a main verb with or without
complements and adjuncts.5 It can also consist of a
copula and copula-predicate. In BH the copula is often
not expressed.6
2.1 Main verb* The main verb is a verb which functions as the main
element of the predicate.7 The following kinds of
verbs can function as main verbs:8
2.1.1 Transitive verb A transitive verb takes or supposes a noun or noun
phrase as direct object.9
2.1.2 Intransitive verb An intransitive verb does not (and cannot) take a
direct object.10
2.1.3 Preposition verb A preposition verb takes a preposition phrase as
complement.11
2.1.4 Copulative verb* A copulative verb (haya: is, was, were, etc.) takes a
noun, noun phrase, adjective, adjective phrase, adverb,
adverb phrase or preposition phrase as complement.
(haya is often omitted in BH.)12
2.2 Copula The copula is that element in a clause that connects
the subject and copula-predicate. In BH the
particles yesh, ayin/eyn, the independent personal
pronouns and the copulative verb haya can serve as
copula, but it is often omitted.13
2.3 Copula-predicate* The copula-predicate is the complement of the
copula. (When the copula is omitted, the copula-
predicate forms the whole predicate.) A noun, noun
phrase, adjective, adjective phrase, participle, participle
phrase,14 adverb, adverb phrase or preposition phrase
can serve as copula-predicate.15
2.4 Complement* A complement is an obligatory, non-verbal element
in the predicate which is selected by the verb.16
2.4.1 Direct object (Object*)
Object clause*
Object cluster*
The direct object is the complement of an active,
transitive verb. In a passive transformation the object
of the active clause becomes the subject of the passive
clause. A noun, noun phrase, verb, preposition phrase
or even a clause can serve as object.
An object clause* is a clause that functions as a direct
object.
An object cluster* is a group of clauses that functions
as a direct object.
2.4.2 Indirect object (IndObj*) The indirect object is the third argument or second
complement of the main verb. It can occur only if
there is also a direct object. In BH the indirect object
cannot be transformed into the subject of a passive
clause - compare, however, this phenomenon in
English: 'We were given the book'. A pronominal suffix,
noun, noun phrase or preposition phrase can serve as
indirect object. The indirect object follows the direct
object in unmarked word order, except when the
indirect object is a preposition with suffix.17
2.4.3 Other complements These are other obligatory non-verbal elements in
the predicate, for example the complements of
certain intransitive verbs such as verbs of being full,
living, moving. These complements cannot become
the subject of a passive clause. A noun, noun phrase or
preposition phrase can serve as complement.18
2.4.4 Copula-predicate* See 2.3.
2.5 Adjunct* An adjunct is an optional, non-verbal element in
the predicate. It is added to, but not selected by the
verb. It gives optional, additional information. It can be
omitted without changing the classification of the verb,
without making the clause ungrammatical or senseless,
and without changing the meaning of the clause
radically. An adverb, adverb phrase, noun, noun
phrase, preposition phrase, as well as an adverbial
clause, can serve as adjunct.19
3. Attribute* An attribute (or adjectival modifier) is a word, phrase
or clause that qualifies a noun. An adjective,
adjective phrase, noun, noun phrase, adverb, adverb
phrase, numeral, preposition phrase or relative clause
can serve as attribute.20
4. Disjunct* These are elements which are not connected with
any of the elements in the clause, but which are
loosely connected to the whole clause. They may
also be called clause border adjuncts (Afrikaans:
'sinsrandadjunkte'), or sentence adjuncts or extra-
clausal constituents.21
4.1 Interjection/Modal word An interjection is an isolated element which is not
connected to any of the other elements in the clause,
for example an exclamation particle.22
4.2 Discourse marker A discourse marker is an isolated element which gives
prominence to the sentence in the wider context.23
4.3 Dislocative Any element of a clause may be marked as the topic of
the clause by moving it from its usual place to precede
the rest of the clause. It may even be separated from
the rest of the clause by the vav conjunction. Its empty
place in the clause may be filled by a pronoun or
adverb referring to the dislocated element.24
4.4 Addressee The addressee is the person to whom a clause is
addressed. It is usually a common noun or proper
noun. In BH the addressee is often marked by the
article ha- if it is a common noun.25
5. Conjunction (Conj*) A conjunction is a particle which connects clauses or
elements in clauses.26
5.1 Co-ordinate conjunction A co-ordinate conjunction connects two main
clauses. Co-ordinate conjunctions are also used to
connect elements in a phrase or clause.27
5.2 Subordinate conjunction A subordinate conjunction connects a main clause
with a subordinate clause.28
5.3 Relative particle The relative particle in BH is basically only a
conjunction, but it can also fulfil a syntactic function in
the relative clause. It connects the whole relative
clause as an attribute to one of the nouns in the main
clause. (All the other conjunctions are unconnected
with regard to the subordinate clause, and they connect
this whole clause as one of the syntactic elements in
the main clause, for example subject, object, adjunct.)
The relative particle can also be used independently,
that is, without an antecedent, and such a relative
clause fulfils one of the syntactic functions in the main
clause, for example subject, object, etc.29
1 Cf. Gesenius et al. (1976:462-467).
2 Cf. Waltke and O'Connor (1990:71).
3 Cf. Waltke and O'Connor (1990:69-70).
4 Cf. Waltke and O'Connor (1990:71).
5 Cf. Waltke and O'Connor (1990:69, 169); Van der Merwe et al. (1999:60-62).
6 Cf. Joüon and Muraoka (1991:564-577); Waltke and O'Connor (1990:71).
7 Cf. Du Plessis (1982:66-69); Van der Merwe et al. (1999:159-160, 165, 168).
8 Cf. Du Plessis (1982:78).
9 Cf. Van der Merwe et al. (1999:242-243, 246, 367); Waltke and O'Connor (1990:694): 'a
verb that (usually) governs a (direct) object'.
10 Cf. Van der Merwe et al. (1999:360), Du Plessis (1982:80), Waltke and O'Connor
(1990:691): 'a verb that (usually) does not govern an object'.
11 Cf. Waltke and O'Connor (1990:163, 165, 169-170, 221-222, 240, 275, 606, 690); Du
Plessis (1982:82-84); Gesenius et al. (1976:378-384).
12 Cf. Du Plessis (1982:85-86); Waltke and O'Connor (1990:131, 690).
13 Cf. Waltke and O'Connor (1990:71-72, 131, 228, 297, 690).
14 The participle is regarded as a verbal adjective (Gesenius et al., 1976:355-362; Waltke
and O'Connor, 1990:612, 624). Morphologically and syntactically, it behaves primarily like
an adjective.
15 Cf. Du Plessis (1982: 86); Joüon and Muraoka (1991:562); Van der Merwe et al.
(1999:234, 356); Waltke and O'Connor (1990:71).
This taxonomy is one of the main building blocks for the suggested IS ontology of Hebrew syntax. The next section gives more
information on the characteristics of IS ontologies, which form another fundamental part of this study.
3 Ontologies in ICT
It is important to differentiate clearly between three closely related concepts, some of which have already been used in the
discussion above, namely thesaurus, taxonomy and ontology (Gilchrist 2003). A thesaurus is a mere list or vocabulary of
associated concepts that may be categorised according to their meanings. The XML Schema of syntactic functions, referred to
above, may be regarded as a computerised version of an uncategorised thesaurus.
A taxonomy is a thesaurus that has been classified according to a scientific theory (Gilchrist 2003:7, 10). It may be viewed as a
controlled vocabulary that involves relationships between the concepts contained in it (Lambe 2007:6). The system of BH syntax
underlying the XML Schema, referred to above, may be considered as a taxonomy since it also defines the relationships between
the syntagms, for example that a subject governs a verb, that a direct object is the complement of a transitive verb, etc.
An ontology refers to the knowledge of a subset of reality. It is a knowledge representation of 'a particular domain of
knowledge' (Zúñiga 2001:187) that is shared and agreed upon by a certain community of scientists (Gilchrist 2003:7, 13). Since
an ontology is a representation of the knowledge of its builder it is also limited to his/her particular view of the subset of reality
(Buchholz 2006:694-695). As such, there is not much of a difference between a taxonomy and an ontology. However, a fourth
concept is that of formal ontology, which is an ontology that has been enriched with inference rules and axioms, using description
logics and artificial intelligence. It is machine readable and can be used to reason about the ontology. Ontologies in the IS realm
usually refer to formal ontologies. Zúñiga (2001:187) defines an information systems ontology as a 'formal language designed to
represent a particular domain of knowledge'.
The ontology suggested below is indeed a formal ontology; the concepts and relationships have been extracted from standard BH
grammars (such as Gesenius, Kautzsch and Cowley 1976; Joüon and Muraoka 1991; Van der Merwe et al. 1999; Waltke and
O'Connor 1990), implying that a large community of BH grammarians should agree on the taxonomy (if not, adjustments could of
course be made in another cycle of this research endeavour). References to these textbooks are provided in order to facilitate
confirmability of the interpretivist work, but it still is necessary that members of the BH linguistic community check the proposed
system to ensure a jointly constructed reality (Oates 2006:294-295). The rules and axioms are deduced from the taxonomy and
can be built into the ontology by the use of Protégé 4, a software tool that allows semi-automatic creation of IS ontologies. The
ontology is machine readable and should allow computerised reasoning about itself and areas of application. The ontology is also
the creation of an artifact that formalises a subset of reality: 'In information science, an ontology refers to an engineering artifact,
constituted by a specific vocabulary used to describe a certain reality' (Fonseca 2007).
16 Cf. Van der Merwe et al. (1999:241-244, 351, 355); Waltke and O'Connor (1990:163).
17 Cf. Gesenius et al. (1976:369, 370); Joüon and Muraoka (1991:442, 487, 490); Van der
Merwe et al. (1999:173, 174, 240, 254, 255, 275, 359, 368); Waltke and O'Connor
(1990:169, 193, 206).
18 Cf. Gesenius et al. (1976:369-372); Joüon and Muraoka (1991:455-461); Van der
Merwe et al. (1999:244); Waltke and O'Connor (1990:173-177).
19 Cf. Du Plessis (1982:97-103); Van der Merwe et al. (1999:241, 244-245, 351); Waltke
and O'Connor (1990:163, 169-173; 689).
20 Cf. Dik (1997a:151); Du Plessis (1982:48-58); Gesenius et al. (1976:414-419, 423-437);
Van der Merwe et al. (1999:57, 229, 232-233, 266-270, 354); Waltke and O'Connor
(1990:255-260, 689).
21 Cf. Du Plessis (1982:100); Dik (1997b:379-407). 'In linguistics, a disjunct is a type of
adjunct that expresses information that is not considered essential to the sentence it appears in, but
which is considered to be the speaker's or writer's attitude towards, or descriptive statement of, the
propositional content of the sentence.... More generally, the term disjunct can be used to refer to any
sentence element that is not fully integrated into the clausal structure of the sentence. Such elements
usually appear peripherally (at the beginning or end of the sentence) and are set off from the rest of
the sentence by a comma (in writing) and a pause (in speech). A specific type of disjunct is the
sentence adverb (or sentence adverbial), which modifies a sentence, or a clause within a sentence,
to convey the mood, attitude or sentiments of the speaker, rather than an adverb modifying a verb,
an adjective or another adverb within a sentence' [Disjunct (linguistics), 2007].
22 Cf. Van der Merwe et al. (1999:309-311, 334-335); Waltke and O'Connor (1990:674-685).
23 Cf. Van der Merwe et al. (1999:328-333).
24 Cf. Gesenius et al. (1976:457-458); Joüon and Muraoka (1991:586-588); Van der
Merwe et al. (1999:339); Waltke and O'Connor (1990:128-129).
25 Cf. Van der Merwe et al. (1999:189, 249).
26 Cf. Van der Merwe et al. (1999:294-305).
27 Cf. Van der Merwe et al. (1999:294); Waltke and O'Connor (1990:648-655).
28 Cf. Waltke and O'Connor (1990:623-646).
29 Cf. Waltke and O'Connor (1990:330-340).
Fonseca (2007) differentiates between ontologies for and of information systems. While ontologies for information systems are
part and parcel of the systems themselves, ontologies of information systems are used on a higher conceptual level to 'support
the creation of modeling tools'. Ontologies of IS are 'the creation of ontologies that study the information system as an object per
se with the objective of creating better modeling tools'. The ontology of BH syntax, suggested in this article, is, however, an
ontology for information systems. The possible uses of this ontology will be discussed later on. The next section will discuss the
links between XML schemas and IS ontologies in more depth.
4 Links between XML schemas and ontologies
IS ontologies may be used to validate conceptual schemas. Ontologies for IS 'are useful to ensure that the conceptual schemas
(conceptual-modeling scripts) we create using the grammar are correct' (Fonseca 2007). Therefore, they may also be used to
check the consistency of an existing XML Schema. While conceptual schemas focus on a specific application, an ontology is more
general and could be shared by various applications; schemas ensure correct data, ontologies check logical consistency of
underlying theories (Fonseca and Martin 2007:137).
The Web Ontology Language (OWL) is an ontology language for defining a hierarchy of concepts and the relations between these
concepts. An ontology editor such as Protégé 4 facilitates the semi-automatic creation of such a hierarchy and the checking of the
ontology's consistency (Horridge 2009:10). OWL also allows logical reasoning, since it is based on description logics.
OWL builds on the Resource Description Framework (RDF) to express sets of rules, referred to as ontologies, regarding domains of
knowledge, to ensure the validity and correctness of that knowledge. RDF is a semantic Web technology, the main strengths of which are 'its
simplicity, rigour and the use of URIs' (uniform resource identifiers) (Tummarello, Morbidoni, Puliti and Piazza 2008:469). Every
element (class, property and instance) in the ontology is regarded as a resource which is identified by its unique URI and data
value (literal). The resources are regarded as nodes connected by labelled edges to form a semantic network. RDF offers improved
knowledge representation over hierarchical XML schemas, because 'RDF is a graph-based data model, ... a super-case of the tree-
based XML model' (Tummarello et al. 2008:469). Ontologies are a further improvement in terms of information and knowledge
management, since they contain additional semantics. 'While the graph structure of RDF provides a very suitable data model for
annotations, it is the use of annotation vocabularies with well specified semantic meanings (ontologies) that possibly enables the
most interesting results' (Tummarello et al. 2008:475). The additional built-in semantics allows more advanced reasoning and
automated processing.
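The difference between a tree and a graph can be made concrete with a few subject-predicate-object statements. The sketch below uses plain Python tuples rather than an RDF library; the namespace `http://example.org/bh-syntax#` and all resource and property names are assumptions made for illustration only.

```python
from collections import defaultdict

BASE = "http://example.org/bh-syntax#"  # hypothetical namespace

# Each statement is a (subject, predicate, object) triple; resources are URIs,
# and the last object is a plain literal value.
TRIPLES = [
    (BASE + "clause1", BASE + "hasMainVerb", BASE + "word2"),
    (BASE + "clause1", BASE + "hasSubject", BASE + "word3"),
    (BASE + "word3", BASE + "governs", BASE + "word2"),
    (BASE + "word2", BASE + "hasSyntacticFunction", "MainVerb"),
]


def incoming_edges(triples):
    """Index the labelled edges by the node they point to."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[obj].append((subject, predicate))
    return dict(index)


INCOMING = incoming_edges(TRIPLES)

if __name__ == "__main__":
    for source, label in INCOMING[BASE + "word2"]:
        print(source, "--", label, "-->", BASE + "word2")
```

The node `word2` is reached by two labelled edges, one from the clause and one from the subject. A strictly hierarchical XML tree gives each element a single parent, so the second relation could not be expressed by nesting alone.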
According to Ferdinand, Zirpins and Trastour (2004:354), existing XML schemas may be used as a basis for formal ontologies,
which are needed to create the semantic Web, allowing 'software agents to understand, share and reason about data'. The idea
behind the semantic Web is to enhance the current Web into a form that can be processed intelligently by machines (Antoniou and
Van Harmelen 2004:3-4). This article is such an attempt to semi-automatically upgrade or 'bootstrap' an existing schema into a
skeleton information systems ontology of BH syntax (compare Ferdinand et al. 2004, who propose an automated process of
bootstrapping). This may be regarded as a small step towards the integration of and improved access to linguistic information
systems.
5 Use of IT ontologies in (computational) linguistics
Ontologies are used in knowledge management endeavours to enhance knowledge representation, and its storage, search and
communication (Buchholz 2006:694). Although linguistic concepts are often used in the building of IS ontologies, not much
has been done on the creation of ontologies for the purpose of linguistics itself, especially when one starts searching for ontology
literature on syntax and semantics. According to Buchholz (2006:700), WordNet is a lexicon or terminological ontology and not a
formal ontology because it is not axiomatised.
Tummarello et al. (2008), however, do propose a new textual encoding system using ontologies in order to overcome the
limitations of XML tagging, such as the unsatisfactory handling of overlapping hierarchies and embedded text. Elements of various
levels of language, such as structure and grammar, are encoded as instances of classes. This approach allows improved
cooperative encoding, consistency checks, and enhanced searching facilities.
The following benefits of an ontological approach are indicated by Tummarello et al. (2008:467-468, 474-476):
Enabling 'collaborative and distributive textual encoding'
Allowing 'ontology-based reasoning in text processing and querying'
Solving the encoding problem of overlapping hierarchies and 'cross concern annotations'
Facilitating enhanced searching and querying, even across divergent levels of annotation
Enhancing filtering and formatting
'Lowering the complexity of advanced textual document encoding'
'Increasing interoperability and reusability'
Merging or joint processing of distributed documents
Validity checking and inference of new information
Reuse of previous encoded results.
Tummarello et al. (2008:470-472) provide an example of an RDF model of text. Words, punctuation and other printable elements
constitute the nodes or resources in the network. Using further annotations the natural word order is encoded using a linked list -
each element has a property that points to the next element. In addition, clauses and sentences are encoded by pointing to their
first and last symbols. Even non-contiguous and interleaving elements, such as dependent and independent clauses, can be
annotated using the relevant URIs. The flexibility of the model is shown by a complex annotation bundle in which 'different
overlapping hierarchies and cross hierarchy (concern) annotations coexist and interrelate'. A formal ontology is used to regulate
and validate the interconnection of resources.
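The linked-list encoding described above may be sketched as follows. This is a simplified Python illustration, not the encoding of Tummarello et al.: the identifiers `w1` to `w5`, the span names and the helper `elements_of` are hypothetical, and URIs are abbreviated to short strings.

```python
# Word order as a linked list: each element points to the one that follows it.
NEXT_ELEMENT = {"w1": "w2", "w2": "w3", "w3": "w4", "w4": "w5"}

# Larger units are annotated only by their first and last element.
SPANS = {
    "sentence1": ("w1", "w5"),
    "clause1": ("w2", "w4"),
}


def elements_of(span_name, spans=SPANS, next_element=NEXT_ELEMENT):
    """Walk the linked list from the first to the last element of a span."""
    first, last = spans[span_name]
    elements = [first]
    while elements[-1] != last:
        following = next_element.get(elements[-1])
        if following is None:
            raise ValueError(f"{last} is not reachable from {first}")
        elements.append(following)
    return elements


if __name__ == "__main__":
    print(elements_of("clause1"))
```

Because a span is stored as a pair of pointers into the same chain, two spans may overlap or nest without either having to be a child element of the other.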
Some other examples of the application of ontologies in linguistics are Caracciolo (2006) who proposes the use of an ontology to
access the content of a textbook; Oltramari and Vetere (2008) on an Italian machine-readable dictionary; Oltramari and Stellato
(2008) discussing sharing and integration of vocabularies from different computational ontologies; Guarino's (1998) discussion of
ISA overloading in upper-level lexical ontologies; Dahlgren (1995) who uses linguistic constraints for syntactic disambiguation, but
does not provide a taxonomy of syntactic functions; and Farrar (2005) who differentiates between the various layers of linguistic
analysis but does not describe syntactic functions of clauses in detail, and invites communities of practice extensions (COPEs) to
complement their proposed ontology with information in language-specific domains. This article could be a step in the direction of
creating a sub-ontology for BH syntax.
6 Suggesting an ontology of syntactic functions in Biblical Hebrew
Protégé 4 is free and open source software that facilitates the semi-automatic building of IS ontologies. This software may be used
to semi-automatically create an ontology of BH syntax. Such an ontology is a form of knowledge representation since it captures
and organises existing information in a machine-readable, logical and coherent system. It describes the syntactic entities of BH
and the structural relationships between them. Although the concepts are organised hierarchically, other connections are allowed
using OWL's description logics. The BH syntax ontology will, therefore, allow consistency checking and logical reasoning about the
system itself, and may be implemented in linguistic information systems, for example to ensure correct tagging of syntagms. An
ontology of syntax will consist, like any other ontology, of individuals, properties and classes. Individuals may eventually be used
to identify specific instances of syntactic classes in text. A class (also called a concept) is a set of individuals, for example,
DirectObject is the collection of all instances of direct objects. Properties describe the relations between two individuals, linking
them together, for example 'Subject governs Verb'. The inverse property would be 'Verb isGovernedBy Subject'. A property that
has a single value is functional. In description logics properties are referred to as roles. They are also sometimes called attributes.
If the classes are organised into a hierarchy of super- and subclasses (e.g. Predicate - MainVerb - TransitiveVerb), it becomes a
taxonomy. Subclasses are finer specialisations of a superclass. The built-in reasoner of Protégé 4 can compile these subsumption
relationships automatically. Descriptions 'specify the conditions that must be satisfied by an individual for it to be a member of a
class' (Horridge 2009:9-12).
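The notions of property, inverse property and functional property can be illustrated with a small Python sketch. It is not Protégé's data model: the class `ObjectProperty`, the individuals and the property `hasFunction` are hypothetical. The check for functional properties is also a closed-world simplification, as noted in the comment.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectProperty:
    """A named relation between two individuals, with a named inverse."""
    name: str
    inverse_name: str
    functional: bool = False
    pairs: set = field(default_factory=set)

    def add(self, subject, obj):
        # Closed-world simplification: a functional property rejects a second,
        # different value. An OWL reasoner would instead infer that the two
        # values denote the same individual unless they are declared different.
        if self.functional and any(s == subject and o != obj for s, o in self.pairs):
            raise ValueError(f"{self.name} is functional; {subject} already has a value")
        self.pairs.add((subject, obj))

    def inverse_pairs(self):
        """Read every asserted pair in the opposite direction."""
        return {(o, s) for s, o in self.pairs}


# Hypothetical individuals: specific occurrences of words in a tagged clause.
governs = ObjectProperty("governs", "isGovernedBy")
governs.add("subject_occurrence_1", "verb_occurrence_1")

# Hypothetical functional property: one occurrence receives one syntactic class.
has_function = ObjectProperty("hasFunction", "isFunctionOf", functional=True)
has_function.add("verb_occurrence_1", "MainVerb")

if __name__ == "__main__":
    print(governs.inverse_name, governs.inverse_pairs())
```

Asserting 'Subject governs Verb' once is enough; the inverse reading 'Verb isGovernedBy Subject' is derived rather than stored separately.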
Building the BH syntax ontology starts by defining classes of syntagms (classes are indeed the main building blocks of OWL
ontologies; see Figure 1). The class hierarchy cannot yet be considered as an ontology - it is still merely a taxonomy or skeleton
ontology. All the sub-classes on the same level of the hierarchy are declared as disjoint, meaning that, for example, a subject
cannot be a predicate, neither can a predicate be a subject. An individual of one of these classes can, therefore, not also be an
instance of a sibling class. In language, of course, a word may often be used in various ways, but a specific occurrence of a word
(e.g. the third word in the fourth sentence of the first paragraph of a specific book) can only be an instance of one specific class.
The explicit declaration is needed because OWL and Protégé assume an open world: unless two classes have been declared as
disjoint, one cannot assume that an individual is not a member of class B merely because it has been declared as a member of
class A. All instances of a subclass are by implication also instances of the superclass, for example, all verbs that are classified
as transitive verbs are by implication main verbs, and predicates, and syntagms (Horridge 2009:15-23).
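The effect of disjointness and of inherited class membership on individuals can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a description logic reasoner: the hierarchy fragment, the single disjointness axiom and the function names are hypothetical.

```python
from itertools import combinations

# Assumed fragment of the hierarchy: class -> direct superclass.
PARENT = {
    "Subject": "Syntagm",
    "Predicate": "Syntagm",
    "MainVerb": "Predicate",
    "TransitiveVerb": "MainVerb",
}

# Disjointness must be stated explicitly. Under the open world assumption,
# pairs that are not listed here are simply not known to be disjoint.
DISJOINT = {frozenset({"Subject", "Predicate"})}


def inferred_types(asserted):
    """All classes an individual belongs to, including inherited superclasses."""
    types = set()
    for cls in asserted:
        while cls is not None:
            types.add(cls)
            cls = PARENT.get(cls)
    return types


def disjointness_violations(asserted):
    """Pairs of declared-disjoint classes that the individual would belong to."""
    types = sorted(inferred_types(asserted))
    return [pair for pair in combinations(types, 2) if frozenset(pair) in DISJOINT]


if __name__ == "__main__":
    # An occurrence tagged as a transitive verb is also a main verb,
    # a predicate and a syntagm.
    print(inferred_types({"TransitiveVerb"}))
    # Tagging the same occurrence as a subject as well clashes with Predicate.
    print(disjointness_violations({"TransitiveVerb", "Subject"}))
```

If the entry in `DISJOINT` were removed, the second call would report no violation, which mirrors the open world behaviour: membership of one class says nothing about non-membership of another unless disjointness has been declared.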
Figure 1 Taxonomy of BH syntax implemented as a skeleton ontology using Protégé 4
Although more advanced functions are available in Protégé 4, their implementation falls outside the scope of this article. A more
complete form of the ontology should be created and discussed in follow-up work. The next section explores possible uses of such
an ontology.
7 Possible uses of such an ontology
According to Buchholz (2006:695) ontologies are used by computerised agents to simulate human thinking to some extent. In
information systems, ontologies replace the function of the human brain by creating the illusion of storing, analysing and
understanding information. 'While never pretending to duplicate exactly the workings of the human imagination or experience,
ontologies attempt to capture conceptually the rational building blocks of the mind by modeling our knowledge of
reality' (Buchholz 2006:695). Domain-specific ontologies are used by software solutions in various ways, for example in efficient
information retrieval, to do automatic translations, to model enterprises and to tag information on the semantic Web (Gilchrist
2003:13-14). A BH syntax ontology could be used to intelligently retrieve formally related constituents in a text. This information
could be used to study the structures of typical syntactic frameworks. An intelligent tagging program could use its knowledge that
a subject governs the person, number and gender of a finite verb to suggest all possible subjects of a main verb in a clause. If
enough knowledge could be built into the ontology using description logics, it could even be used to assist translators by
suggesting syntactically correct alternative renderings which they might not have thought about.
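The suggestion of possible subjects by a tagging program could, for example, be sketched as a simple agreement filter. The following Python fragment is a hypothetical illustration only: the `Constituent` record, its field names and the placeholder constituents are invented for the example and do not represent the tagged Genesis 1 data or any real exceptions to agreement in Biblical Hebrew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constituent:
    """A tagged constituent with the features relevant to agreement."""
    label: str
    person: int
    number: str
    gender: str


def candidate_subjects(verb, nominals):
    """Return the nominals that agree with the verb in person, number and gender."""
    wanted = (verb.person, verb.number, verb.gender)
    return [n for n in nominals if (n.person, n.number, n.gender) == wanted]


if __name__ == "__main__":
    # Placeholder data, not taken from a real clause.
    verb = Constituent("verb-1", 3, "singular", "masculine")
    nominals = [
        Constituent("nominal-A", 3, "singular", "masculine"),
        Constituent("nominal-B", 3, "plural", "masculine"),
        Constituent("nominal-C", 3, "singular", "feminine"),
    ]
    for candidate in candidate_subjects(verb, nominals):
        print(candidate.label)
```

In an ontology-driven system the agreement rule would be expressed as a constraint on the 'governs' property rather than hard-coded in the program, so that the same knowledge could be reused by other applications.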
Another functionality of a formal ontology is that one could use it to check 'whether or not one class is a subclass of another
class' (subsumption testing) (Ferdinand et al. 2004:357; Horridge 2009:49-54). This is done by using a reasoner or classifier that
computes the inferred ontology class hierarchy. The reasoner can also check the consistency of the ontology (if a class cannot
have any instances, it is inconsistent). This facility has been tested on the skeleton BH syntax ontology. Classifying the ontology
that has been asserted so far results in the inferred hierarchy, implying that all definitions so far have been consistent. Any
inconsistent classes would have been marked in red (for example, if one had tried to declare Copula-Predicate as a subclass of
both Copula and Complement). This is demonstrated in the tool by ProbeInconsistentCopulaPredicate (see Figure 2 below).
However, making ProbeInconsistentCopulaPredicate a subclass of both Predicate and Complement (as is indeed suggested by the
original taxonomy underlying the XML Schema and OWL ontology) does not seem to cause a problem, because Predicate and
Complement are not marked as disjoint. Yet, in the inferred class hierarchy ProbeInconsistentCopulaPredicate is shown in only
one place (as a subclass of Complement).
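The consistency check described above may be approximated by the following Python sketch, which flags a class as inconsistent when its ancestors include two classes that have been declared disjoint. It is a strong simplification of what a description logic reasoner does and covers only named superclasses and pairwise disjointness. The function names and the assumption that Copula and Complement form the only disjoint pair are made for illustration.

```python
def ancestors(cls, parents):
    """Collect every direct and indirect superclass of cls."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_unsatisfiable(cls, parents, disjoint_pairs):
    """A class that falls under two disjoint classes cannot have any instances."""
    lineage = ancestors(cls, parents) | {cls}
    return any(pair <= lineage for pair in disjoint_pairs)


# Assumed disjointness axiom for the example.
DISJOINT_PAIRS = [frozenset({"Copula", "Complement"})]

if __name__ == "__main__":
    probe = "ProbeInconsistentCopulaPredicate"
    # Probe declared under two disjoint classes: flagged as inconsistent.
    print(is_unsatisfiable(probe, {probe: ["Copula", "Complement"]}, DISJOINT_PAIRS))
    # Probe declared under two classes not marked as disjoint: no problem found.
    print(is_unsatisfiable(probe, {probe: ["Predicate", "Complement"]}, DISJOINT_PAIRS))
```

The second case corresponds to the situation in which Predicate and Complement are not marked as disjoint, so that the multiple inheritance is accepted without complaint.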
Figure 2 Inconsistent classes discovered and highlighted by Protégé 4
Protégé 4 also has a built-in visualisation tool called OWL Viz, which may be used to render a graphic overview of an
implemented ontology. This facility has been tested on the proposed skeleton ontology, and the resulting visualisation provides
a compact, graphical synopsis of the five hierarchical layers of syntagms (Figure 3).
Figure 3 Skeleton ontology of BH syntax visualised using Protégé 4's built-in OWL Viz tool
8 Conclusion
Since an XML Schema does not facilitate the checking of the logic and consistency of a syntactic taxonomy of BH, the use of an IS
ontology to fulfil this need was suggested. Some of the basic constructs of a formal ontology were implemented in Protégé 4 to
serve as an example of how this could be done. This may be regarded as a 'skeleton ontology' which should be extended using
description logics and OWL expressions (Ferdinand et al. 2004:357), some examples of which have been provided. The research
made a contribution by differentiating the roles of IS ontologies, taxonomies and XML schemas in the information and knowledge
management subject field. It also indicated the value of ontologies in terms of ensuring the quality and reliability of linguistic
information systems. The suggested ontology may, therefore, be regarded as an example of a domain-specific ontology (Buchholz
2006:699), the domain being linguistics and, more specifically, BH syntax. Once the syntax ontology is completed it may be used
to ensure the quality and reliability of the system itself. It may then be implemented in a myriad of applications, for example to
ensure correct tagging of Hebrew texts or to integrate various existing marked-up products. The author trusts that this research
has been a small step towards Buchholz's (2006:700) dream: 'Ontologies will continue to play an important role in the
development of large-scale, computer mediated, and global knowledge-management projects. Communicating knowledge within
an organization, and among organizations worldwide, will be facilitated by ontologies, as they create a knowledge layer critical to
the automated sharing and reuse of essential explicit knowledge.'
9 References
Antoniou, G. and Van Harmelen, F. 2004. A semantic Web primer. Cambridge, MA: MIT Press.
Brown, R., Nerur, S. and Slinkman, C. 2004. Philosophical shifts in software development. In: Proceedings of the Tenth Americas
Conference on Information Systems (AMCIS), August 2004, New York, NY:4136-4143. [Online]. Available WWW:
http://aisel.aisnet.org/amcis2004/516 (Accessed 23 August 2009).
Buchholz, W. 2006. Ontology. In: Schwartz, D. (ed). Encyclopedia of Knowledge Management. Hershey, PA: IGI (Idea
Group):694-702.
Caracciolo, C. 2006. Designing and implementing an ontology for logic and linguistics. Literary and Linguistic Computing 21:29-39.
Dahlgren, K. 1995. A linguistic ontology. International Journal of Human-Computer Studies 43:809-818.
Deitel, H.M. and Deitel, P.J. 2006. Visual Basic® 2005: how to program, 3rd edition. Upper Saddle River, NJ: Pearson.
Dik, S.C. 1997a. The theory of functional grammar. Part 1. The structure of the clause (edited by Kees Hengeveld), 2nd edition.
Berlin: Mouton de Gruyter.
Dik, S.C. 1997b. The theory of functional grammar. Part 2. Complex and derived constructions (edited by Kees Hengeveld). Berlin:
Mouton de Gruyter.
Disjunct (linguistics). 2007. Wikipedia. [Online]. Available WWW: http://en.wikipedia.org/wiki/Disjunct_%28linguistics%29
(Accessed 24 November 2007).
Du Plessis, H. 1982. Sintaksis vir eerstejaars. Pretoria: Academica.
Farrar, S. 2005. Modularizing GOLD. [Online]. Available WWW:
http://linguistlist.org/gold/linguistics-ontology.org/documents/gold-modules.pdf (Accessed 26 November 2009).
Ferdinand, M., Zirpins, C. and Trastour, D. 2004. Lifting XML Schema to OWL. In: Proceedings of the 4th International Web
Engineering Conference, 26-30 July 2004, Munich, Germany: Springer. (Lecture Notes in Computer Science, 3140:354-358.)
Fonseca, F. 2007. The double role of ontologies in Information Science research. Journal of the American Society for Information
Science and Technology 58(6):786-793. (Preprint.) [Online]. Available WWW:
http://www.personal.psu.edu/faculty/f/u/fuf1/publications/Fonseca_Ontologies_double_role_JASIST_2006.pdf
(Accessed 23 March 2009).
Fonseca, F. and Martin, J. 2007. Learning the differences between ontologies and conceptual schemas through ontology-driven
information systems. Journal of the Association for Information Systems, 8(2):129-142 (Article 3). [Online]. Available WWW:
http://aisel.aisnet.org/jais/vol8/iss2/4 (Accessed 24 November 2009).
Gesenius, F.H.W., Kautzsch, E. and Cowley, A.E. 1976. Gesenius' Hebrew Grammar, 2nd English edition. Oxford: Clarendon.
Gilchrist, A. 2003. Thesauri, taxonomies and ontologies - an etymological note. Journal of Documentation 59(1):7-18.
Guarino, N. 1998. Some ontological principles for designing upper level lexical resources. In: Proceedings of First International
Conference on Language Resources and Evaluation, ELRA - European Language Resources Association, May 28-30, 1998, Granada,
Spain:527-534.
Holzner, S. 2004. SAMS teach yourself XML in 21 days, 3rd edition. Indianapolis, Indiana: SAMS.
Horridge, M. (ed.) 2009. A practical guide to building OWL ontologies using Protégé 4 and CO-ODE tools, edition 1.2. The
University of Manchester. [Online]. Available WWW:
http://owl.cs.manchester.ac.uk/tutorials/protegeowltutorial/resources/ProtegeOWLTutorialP4_v1_2.pdf (Accessed
23 August 2009).
Joüon, P. and Muraoka, T. 1991. A grammar of Biblical Hebrew. Vol. II, Part Three: syntax; paradigms and indices. Roma: Editrice
Pontificio Istituto Biblico. (Subsidia biblica 14/II.)
Kroeze, J.H. 2000a. Semitic Languages 115: Study guide for SET115. Potchefstroom: PU for CHE. (Study manual - Telematic
Learning Systems: BA Theol.)
Kroeze, J.H. 2000b. Semitic Languages 125: Study guide for SET125. Potchefstroom: PU for CHE. (Study manual - Telematic
Learning Systems: BA Theol.)
Kroeze, J.H. 2002. Developing a multi-level analysis of Jonah using html. In: Cooke, J. (ed). Bible and computer: The Stellenbosch
AIBI-6 conference. Proceedings of the Association Internationale Bible et Informatique 'From alpha to byte', 17-21 July 2000,
University of Stellenbosch. Leiden: Brill:653-662.
Kroeze, J.H. 2006. Building and displaying a Biblical Hebrew linguistics data cube using XML, Israeli Seminar on Computational
Linguistics (ISCOL), Haifa, Israel, 29 June 2006. [Online]. Available WWW (seminar presentations):
http://mila.cs.technion.ac.il/english/events/ISCOL/2006/index.html (Accessed 27 November 2009).
Kroeze, J.H. 2008. Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3. PhD(IT) thesis.
University of Pretoria. (Unpublished.) [Online]. Available WWW: http://upetd.up.ac.za/thesis/available/etd-07282008-121520/
(Accessed 23 August 2009).
Kroeze, J.H. 2009. Information Systems and the humanities: a symbiotic relationship? (Inaugural lecture, 13 Nov. 2009.)
Vanderbijlpark: North-West University, Vaal Triangle Campus. (Vaal Triangle Occasional Papers: Inaugural lecture 5/2009. ISBN
978-1-86822-583-5.) (Scientific Contribution Series H.)
Lambe, P. 2007. Organising knowledge: taxonomies, knowledge and organisational effectiveness. Oxford: Chandos.
Oates, B.J. 2006. Researching information systems and computing. Los Angeles: Sage.
Oltramari, A. and Stellato, A. 2008. Enriching ontologies with linguistic content: an evaluation framework. In: Proceedings of
OntoLex 2008 (hosted by Sixth international conference on Language Resources and Evaluation), Marrakech (Morocco).
[Online]. Available WWW: http://www.loa-cnr.it/Papers/enriching_onto_oltramari-stellato.pdf (Accessed 25 November
2009).
Oltramari, A. and Vetere, G. 2008. Lexicon and ontology interplay in Senso Comune. In: Proceedings of OntoLex 2008 (hosted by
Sixth international conference on Language Resources and Evaluation), Marrakech (Morocco). [Online]. Available WWW:
http://www.loa-cnr.it/Papers/lexicon_oltramari-vetere.pdf (Accessed 25 November 2009).
Tummarello, G., Morbidoni, C., Puliti, P. and Piazza, F. 2008. A proposal for textual encoding based on semantic Web tools. Online
Information Review 32(4):467-477. [Online]. Available WWW: http://www.emeraldinsight.com/1468-4527.htm (Accessed
23 August 2009).
Van der Merwe, C.H.J., Naudé, J.A. and Kroeze, J.H. 1999. A Biblical Hebrew reference grammar. Sheffield: Sheffield Academic
Press.
Waltke, B.K. and O'Connor, M. 1990. An introduction to Biblical Hebrew syntax. Winona Lake, Indiana: Eisenbrauns.
Zúñiga, G.L. 2001. Ontology: its transformation from philosophy to information systems. In: Proceedings of the International
Conference on Formal Ontology in Information Systems, Ogunquit, Maine, USA, Volume 2001:187-197.
ISSN 1560-683X
Published by InterWord Communications for Department of Information and Knowledge Management,
University of Johannesburg