Generating Collaborative Systems for Digital Libraries  |  Hilera et al.   195

José R. Hilera, Carmen Pagés, J. Javier Martínez, J. Antonio Gutiérrez, and Luis de-Marcos

An Evolutive Process to 
Convert Glossaries into 
Ontologies

dictionary,	 the	 outcome	 will	 be	 limited	 by	 the	 richness	
of	 the	 definition	 of	 terms	 included	 in	 that	 dictionary.	 It	
would	 be	 what	 is	 normally	 called	 a	 “lightweight”	 ontol-
ogy,6	which	could	later	be	converted	into	a	“heavyweight”	
ontology	by	implementing,	in	the	form	of	axioms,	know-
ledge	not	contained	in	the	dictionary.	This	paper	describes	
the	 process	 of	 creating	 a	 lightweight	 ontology	 of	 the	
domain	 of	 software	 engineering,	 starting	 from	 the	 IEEE 
Standard Glossary of Software Engineering Terminology.7

■■ Ontologies, the Semantic Web, and Libraries
Within	 the	 field	 of	 librarianship,	 ontologies	 are	 already	
being	 used	 as	 alternative	 tools	 to	 traditional	 controlled	
vocabularies.	 This	 may	 be	 observed	 particularly	 within	
the	realm	of	digital	libraries,	although,	as	Krause	asserts,	
objections	 to	 their	 use	 have	 often	 been	 raised	 by	 the	
digital	 library	 community.8	 One	 of	 the	 core	 objections	 is	
the	difficulty	of	creating	ontologies	as	compared	to	other	
vocabularies	such	as	taxonomies	or	thesauri.	Nonetheless,	
the	semantic	richness	of	an	ontology	offers	a	wide	range	
of	 possibilities	 concerning	 indexing	 and	 searching	 of	
library	documents.

The	 term	 ontology	 (used	 in	 philosophy	 to	 refer	 to	
the	 “theory	 about	 existence”)	 has	 been	 adopted	 by	 the	
artificial	intelligence	research	community	to	define	a	cate-
gorization	of	a	knowledge	domain	in	a	shared	and	agreed	
form,	based	on	concepts	and	relationships,	which	may	be	
formally	 represented	 in	 a	 computer	 readable	 and	 usable	
format.	 The	 term	 has	 been	 widely	 employed	 since	 2001,	
when	 Berners-Lee	 et	 al.	 envisaged	 the	 Semantic	 Web,	
which	aims	to	turn	the	information	stored	on	the	Web	into	
knowledge	by	transforming	data	stored	in	every	webpage	
into	a	common	scheme	accepted	in	a	specific	domain.9	To	
accomplish	 that	 task,	 knowledge	 must	 be	 represented	 in	
an	 agreed-upon	 and	 reusable	 computer-readable	 format.	
To	 do	 this,	 machines	 will	 require	 access	 to	 structured	
collections	 of	 information	 and	 to	 formalisms	 which	 are	
based	on	mathematical	logic	that	permits	higher	levels	of	
automatic	processing.	

Technologies	 for	 the	 Semantic	 Web	 have	 been	 devel-
oped	 by	 the	 World	 Wide	 Web	 Consortium	 (W3C).	 The	
most	relevant	technologies	are	RDF	(Resource	Description	

This paper describes a method to generate ontologies from 
glossaries of terms. The proposed method presupposes an 
evolutionary life cycle based on successive transforma-
tions of the original glossary that lead to products of 
intermediate knowledge representation (dictionary, tax-
onomy, and thesaurus). These products are characterized 
by an increase in semantic expressiveness in comparison 
to the product obtained in the previous transformation, 
with the ontology as the end product. Although this 
method has been applied to produce an ontology from 
the “IEEE Standard Glossary of Software Engineering 
Terminology,” it could be applied to any glossary of any 
knowledge domain to generate an ontology that may be 
used to index or search for information resources and 
documents stored in libraries or on the Semantic Web.

From the point of view of their expressiveness or
semantic	 richness,	 knowledge	 representation	 tools	
can	 be	 classified	 at	 four	 levels:	 at	 the	 basic	 level	
(level	0),	to	which	dictionaries belong,	tools	include	defini-
tions	 of	 concepts	 without	 formal	 semantic	 primitives;	 at	
the	 taxonomies	 level	 (level	 1),	 tools	 include	 a	 vocabulary,	
implicit	 or	 explicit,	 as	 well	 as	 descriptions	 of	 specialized	
relationships	between	concepts;	at	the	thesauri	level	(level	
2),	 tools	 further	 include	 lexical	 (synonymy,	 hyperonymy,	
etc.)	 and	 equivalence	 relationships;	 and	 at	 the	 reference 
models	level	(level	3),	tools	combine	the	previous	relation-
ships	 with	 other	 more	 complex	 relationships	 between	
concepts	 to	 completely	 represent	 a	 certain	 knowledge	
domain.1	Ontologies belong	at	this	last	level.

According	 to	 the	 hierarchic	 classification	 above,	
knowledge	 representation	 tools	 of	 a	 particular	 level	 add	
semantic expressiveness to those in the lower levels in
such	 a	 way	 that	 a	 dictionary	 or	 glossary	 of	 terms	 might	
develop	into	a	taxonomy	or	a	thesaurus,	and	later	into	an	
ontology.	 There	 are	 a	 variety	 of	 comparative	 studies	 of	
these	tools,2	as	well	as	varying	proposals	for	systematically	
generating	ontologies	from	lower-level	knowledge	repre-
sentation	systems,	especially	from	descriptor	thesauri.3

This	paper	proposes	a	process	for	generating	a	termino-
logical ontology from	a	dictionary	of	a	specific	knowledge	
domain.4	 Given	 the	 definition	 offered	 by	 Neches	 et	 al.	
(“an	 ontology	 is	 an	 instrument	 that	 defines	 the	 basic	
terms	and	relations	comprising	the	vocabulary	of	a	topic	
area	as	well	as	the	rules	for	combining	terms	and	relations	
to	define	extensions	to	the	vocabulary”)5	it	is	evident	that	
the	 ontology	 creation	 process	 will	 be	 easier	 if	 there	 is	 a	
vocabulary	 to	 be	 extended	 than	 if	 it	 is	 developed	 from	
scratch.

If	the	developed	ontology	is	based	exclusively	on	the	

José R. Hilera (jose.hilera@uah.es) is Professor, Carmen Pagés
(carmina.pages@uah.es) is Assistant Professor, J. Javier Martínez
(josej.martinez@uah.es) is Professor, J. Antonio Gutiérrez
(jantonio.gutierrez@uah.es) is Assistant Professor, and Luis
de-Marcos (luis.demarcos@uah.es) is Professor, Department of
Computer Science, Faculty of Librarianship and Documentation,
University of Alcalá, Madrid, Spain.



196   Information Technology and Libraries  |  December 2010

configuration	 management;	 data	 types;	 errors,	 faults,	
and	 failures;	 evaluation	 techniques;	 instruction	 types;	
language	 types;	 libraries;	 microprogramming;	 operating	
systems;	quality	attributes;	software	documentation;	soft-
ware	 and	 system	 testing;	 software	 architecture;	 software	
development	 process;	 software	 development	 techniques;	
and	software	tools.15

In	the	glossary,	entries	are	arranged	alphabetically.	An	
entry	may	consist	of	a	single	word,	such	as	“software,”	a	
phrase,	such	as	“test	case,”	or	an	acronym,	such	as	“CM.”	
If	a	term	has	more	than	one	definition,	the	definitions	are	
numbered.	In	most	cases,	noun	definitions	are	given	first,	
followed	 by	 verb	 and	 adjective	 definitions	 as	 applicable.	
Examples,	 notes,	 and	 illustrations	 have	 been	 added	 to	
clarify	selected	definitions.

Cross-references	 are	 used	 to	 show	 a	 term’s	 relations	
with	other	terms	in	the	dictionary:	“contrast	with”	refers	
to	a	term	with	an	opposite	or	substantially	different	mean-
ing;	“syn”	refers	to	a	synonymous	term;	“see	also”	refers	
to	a	related	term;	and	“see”	refers	to	a	preferred	term	or	to	
a	term	where	the	desired	definition	can	be	found.

Figure	2	shows	an	example	of	one	of	the	definitions	of	
the	glossary	terms.	Note	that	definitions	can	also	include	

Framework),10	 which	 defines	 a	 common	 data	 model	 to	
specify metadata, and OWL (Web Ontology Language),11
which	 is	 a	 new	 markup	 language	 for	 publishing	 and	
sharing	 data	 using	 Web	 ontologies.	 More	 recently,	 the	
W3C	 has	 presented	 a	 proposal	 for	 a	 new	 RDF-based	
markup	 system	 that	 will	 be	 especially	 useful	 in	 the	 con-
text	 of	 libraries.	 It	 is	 called	 SKOS	 (Simple	 Knowledge	
Organization	 System),	 and	 it	 provides	 a	 model	 for	
expressing	 the	 basic	 structure	 and	 content	 of	 concept	
schemes,	such	as	thesauri,	classification	schemes,	subject	
heading	 lists,	 taxonomies,	 folksonomies,	 and	 other	 simi-
lar	types	of	controlled	vocabularies.12

The	 emergence	 of	 the	 Semantic	 Web	 has	 created	 great	
interest	 within	 librarianship	 because	 of	 the	 new	 possibili-
ties	 it	 offers	 in	 the	 areas	 of	 publication	 of	 bibliographical	
data	and	development	of	better	indexes	and	better	displays	
than	those	that	we	have	now	in	ILS	OPACs.13	For	that	rea-
son,	 it	 is	 important	 to	 strive	 for	 semantic	 interoperability	
between	 the	 different	 vocabularies	 that	 may	 be	 used	 in	
libraries’	 indexing	 and	 search	 systems,	 and	 to	 have	 com-
patible	 vocabularies	 (dictionaries,	 taxonomies,	 thesauri,	
ontologies,	etc.)	based	on	a	shared	standard	like	RDF.

There	 are,	 at	 the	 present	 time,	 several	 proposals	 for	
using	 knowledge	 organization	 systems	 as	 alternatives	 to	
controlled	vocabularies.	For	example,	folksonomies,	though	
originating	within	the	Web	context,	have	been	proposed	by	
different	 authors	 for	 use	 within	 libraries	 “as	 a	 powerful,	
flexible	 tool	 for	 increasing	 the	 user-friendliness	 and	 inter-
activity	of	public	library	catalogs.”14	Authors	argue	that	the	
best	approach	would	be	to	create	interoperable	controlled	
vocabularies	using	shared	and	agreed-upon	glossaries	and	
dictionaries	 from	 different	 domains	 as	 a	 departure	 point,	
and	then	to	complete	evolutive	processes	aimed	at	semantic	
extension	 to	 create	 ontologies,	 which	 could	 then	 be	 com-
bined	 with	 other	 ontologies	 used	 in	 information	 systems	
running	in	both	conventional	and	digital	libraries	for	index-
ing	as	well	as	for	supporting	document	searches.	There	are	
examples	 of	 glossaries	 that	 have	 been	 transformed	 into	
ontologies,	 such	 as	 the	 Cambridge	 Healthtech	 Institute’s	
“Pharmaceutical	 Ontologies	 Glossary	 and	 Taxonomy”	
(http://www.genomicglossaries.com/content/ontolo	
gies.asp),	which	is	an	“evolving	terminology	for	emerging	
technologies.”

■■ IEEE Standard Glossary of Software Engineering Terminology
To	 demonstrate	 our	 proposed	 method,	 we	 will	 use	 a	
real	 glossary	 belonging	 to	 the	 computer	 science	 field,	
although	 it	 is	 possible	 to	 use	 any	 other.	 The	 glossary,	
available	 in	 electronic	 format	 (PDF),	 defines	 approxi-
mately	1,300	terms	in	the	domain	of	software	engineering	
(figure 1). Topics include addressing; assembling, compiling,
linking, loading; computer performance evaluation;

Figure 1. Cover of the Glossary document




4.	 Define	the	classes	and	the	class	hierarchy
5.	 Define	the	properties	of	classes	(slots)
6.	 Define	the	facets	of	the	slots
7.	 Create	instances

As	 outlined	 in	 the	 Introduction,	 the	 ontology	 devel-
oped	using	our	method	is	a	terminological	one.	Therefore	
we can ignore the first two steps in Noy and McGuinness's
process	as	the	concepts	of	the	ontology	coincide	with	the	
terms	of	the	glossary	used.

Any	 ontology	 development	 process	 must	 take	 into	
account	 the	 basic	 stages	 of	 the	 life	 cycle,	 but	 the	 way	 of	
organizing	 the	 stages	 can	 be	 different	 in	 different	 meth-
ods.	 In	 our	 case,	 since	 the	 ontology	 has	 a	 terminological	
character,	 we	 have	 established	 an	 incremental	 develop-
ment	 process	 that	 supposes	 the	 natural	 evolution	 of	 the	
glossary	 from	 its	 original	 format	 (dictionary	 or	 vocabu-
lary	 format)	 into	 an	 ontology.	 The	 proposed	 life	 cycle	
establishes	 a	 series	 of	 steps	 or	 phases	 that	 will	 result	 in	
intermediate	 knowledge	 representation	 tools,	 with	 the	
final	 product,	 the	 ontology,	 being	 the	 most	 semantically	
rich	(figure	4).	

Therefore	 this	 is	 a	 product-driven	 process,	 in	 which	
the	aim	of	every	step	is	to	obtain	an	intermediate	product	
useful	on	its	own.	The	intermediate	products	and	the	final	

examples	 associated	 with	 the	 described	 concept.	
In	 the	 resulting	 ontology,	 the	 examples	 were	
included	 as	 instances	 of	 the	 corresponding	 class.	
In	figure	2,	it	can	be	seen	that	the	definition	refers	
to	 another	 glossary	 on	 programming	 languages	
(Std	 610.13),	 which	 is	 a	 part	 of	 the	 series	 of	 dic-
tionaries	 related	 to	 computer	 science	 (“IEEE	 Std	
610,”	 figure	 3).	 Other	 glossaries	 which	 are	 men-
tioned	 in	 relation	 to	 some	 references	 about	 term	
definitions	are	610.1,	610.5,	610.7,	610.8,	and	610.9.	

To	 avoid	 redundant	 definitions	 and	 pos-
sible	inconsistencies,	links	must	be	implemented	
between	ontologies	developed	from	those	glossa-
ries	that	include	common	concepts.	The	ontology	
generation	 process	 presented	 in	 this	 paper	 is	
meant	to	allow	for	integration	with	other	ontolo-
gies	that	will	be	developed	in	the	future	from	the	
other	glossaries.

In	 addition	 to	 the	 explicit	 references	 to	 other	
terms	 within	 the	 glossary	 and	 to	 terms	 from	 other	 glos-
saries,	the	textual	definition	of	a	concept	also	has	implicit	
references	 to	 other	 terms.	 For	 example,	 from	 the	 phrase	
“provides	 features	 designed	 to	 facilitate	 expression	 of	
data	 structures”	 included	 in	 the	 definition	 of	 the	 term	
high order language	 (figure	 2),	 it	 is	 possible	 to	 determine	
that	 there	 is	 an	 implicit	 relationship	 between	 this	 term	
and	the	term	data structure,	also	included	in	the	glossary.	
These	relationships	have	been	considered	in	establishing	
the	properties	of	the	concepts	in	the	developed	ontology.	

■■ Ontology Development Process
Many	 ontology	 development	 methods	 presuppose	 a	 life	
cycle	 and	 suggest	 technologies	 to	 apply	 during	 the	 pro-
cess	 of	 developing	 an	 ontology.16	 The	 method	 described	
by	 Noy	 and	 McGuinness	 is	 helpful	 when	 beginning	 this	
process	 for	 the	 first	 time.17	 They	 establish	 a	 seven-step	
process:	

1.	 Determine	the	domain	and	scope	of	the	ontology
2.	 Consider	reusing	existing	ontologies
3.	 Enumerate	important	terms	in	the	ontology

Figure 2. Example of term definition in the IEEE Glossary

Figure 3. IEEE Computer Science Glossaries

610—Standard Dictionary of Computer Terminology
610.1—Standard Glossary of Mathematics of Computing Terminology
610.2—Standard Glossary of Computer Applications Terminology
610.3—Standard Glossary of Modeling and Simulation Terminology
610.4—Standard Glossary of Image Processing Terminology
610.5—Standard Glossary of Data Management Terminology
610.6—Standard Glossary of Computer Graphics Terminology
610.7—Standard Glossary of Computer Networking Terminology
610.8—Standard Glossary of Artificial Intelligence Terminology
610.9—Standard Glossary of Computer Security and Privacy Terminology
610.10—Standard Glossary of Computer Hardware Terminology
610.11—Standard Glossary of Theory of Computation Terminology
610.12—Standard Glossary of Software Engineering Terminology
610.13—Standard Glossary of Computer Languages Terminology

high order language (HOL). A programming language that requires little knowledge of the computer on which a program will run, can be
translated into several different machine languages, allows symbolic naming of operations and addresses, provides features designed
to facilitate expression of data structures and program logic, and usually results in several machine instructions for each program statement.
Examples include Ada, COBOL, FORTRAN, ALGOL, PASCAL. Syn: high level language; higher order language; third generation language.
Contrast with: assembly language; fifth generation language; fourth generation language; machine language.
Note: Specific languages are defined in P610.13




Since	 there	 are	 terms	 with	 different	 meanings	 (up	
to	 five	 in	 some	 cases)	 in	 the	 IEEE Glossary of Software 
Engineering Terminology,	 during	 dictionary	 development	
we	 decided	 to	 create	 different	 concepts	 (classes)	 for	 the	
same	 term,	 associating	 a	 number	 to	 these	 concepts	 to	
differentiate	 them.	 For	 example,	 there	 are	 five	 different	
definitions	 for	 the	 term	 test,	 which	 is	 why	 there	 are	 five	
concepts	(Test1–Test5), corresponding	to	the	five	meanings	
of	the	term:	(1)	An	activity	in	which	a	system	or	compo-
nent	 is	 executed	 under	 specified	 conditions,	 the	 results	
are	 observed	 or	 recorded,	 and	 an	 evaluation	 is	 made	 of	
some	 aspect	 of	 the	 system	 or	 component;	 (2)	 To	 conduct	
an	activity	as	in	(1);	(3)	A	set	of	one	or	more	test	cases;	(4)	
A	 set	 of	 one	 or	 more	 test	 procedures;	 (5)	A	 set	 of	 one	 or	
more	test	cases	and	procedures.	
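The numbering scheme for multi-sense terms can be sketched in a few lines. This is a minimal illustration, assuming each glossary term arrives as a string with a list of its definitions; the function name `number_senses` and the CamelCase naming rule are our own assumptions, chosen to match names such as Test1 and HighOrderLanguage used in this paper.

```python
def number_senses(term, definitions):
    """Return (concept_name, definition) pairs, one per sense of a term.

    Hypothetical helper: a term with one definition keeps a plain
    CamelCase name; a term with several gets a numeric suffix per sense.
    """
    # Build a CamelCase name without lowercasing acronyms such as "CM".
    base = "".join(word[:1].upper() + word[1:] for word in term.split())
    if len(definitions) == 1:
        return [(base, definitions[0])]
    return [(f"{base}{i}", text) for i, text in enumerate(definitions, start=1)]


# The term "test" has five definitions in the glossary, so five concepts result.
test_concepts = number_senses("test", [
    "An activity in which a system or component is executed ...",
    "To conduct an activity as in (1).",
    "A set of one or more test cases.",
    "A set of one or more test procedures.",
    "A set of one or more test cases and procedures.",
])
```

Keeping the suffix only for terms with more than one sense leaves the majority of class names identical to the glossary terms.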

Taxonomy

The	 proposed	 lifecycle	 establishes	 a	 stage	 for	 the	 con-
version	 of	 a	 dictionary	 into	 a	 taxonomy,	 understanding	
taxonomy	 as	 an	 instrument	 of	 concepts	 categorization,	

product are a dictionary, which has a formal, computer-
processable structure, with the terms and their definitions in
XML	format;	a	taxonomy,	which	reflects	the	hierarchic	rela-
tionships	 between	 the	 terms;	 a	 thesaurus,	 which	 includes	
other	 relationships	 between	 the	 terms	 (for	 example,	 the	
synonymy	 relationship);	 and,	 finally,	 the	 ontology,	 which	
will	include	the	hierarchy,	the	basic	relationships	of	the	the-
saurus,	new	and	more	complex	semantic	relationships,	and	
restrictions in the form of axioms expressed using description
logics.18	 The	 following	 paragraphs	 describe	 the	 way	 each	
of	these	products	is	obtained.

Dictionary

The	 first	 step	 of	 the	 proposed	 development	 process	 con-
sists	 of	 the	 creation	 of	 a	 dictionary	 in	 XML	 format	 with	
all	 the	 terms	 included	 in	 the	 IEEE Standard Glossary of 
Software Engineering Terminology	 and	 their	 related	 defini-
tions.	 This	 activity	 is	 particularly	 mechanical	 and	 does	
not	need	human	intervention	as	it	is	basically	a	transfor-
mation	of	the	glossary	from	its	original	format	(PDF)	into	
a	format	better	suited	to	the	development	process.

All	 formats	 considered	 for	 the	 dictionary	 are	 based	
on	 XML,	 and	 specifically	 on	 RDF	 and	 RDF	 schema.	
In	 the	 end,	 we	 decided	 to	 work	 with	 the	 standards	
DAML+OIL	 and	 OWL,19	 though	 we	 are	 not	 opposed	 to	
working	 with	 other	 languages,	 such	 as	 SKOS	 or	 XMI,20	
in	 the	 future.	 (In	 the	 latter	 case,	 it	 would	 be	 possible	
to	 model	 the	 intermediate	 products	 and	 the	 ontology	
in UML graphic models stored in XML files.)21 In our
project,	 the	 design	 and	 implementation	 of	 all	 products	
has	 been	 made	 using	 an	 ontology	 editor.	 We	 have	 used	
OilEd	(with	OilViz	Plugin)	as	editor,	both	because	of	its	
simplicity	and	because	it	allows	the	exportation	to	OWL	
and	DAML	formats.	However,	with	future	maintenance	
and	 testing	 in	 mind,	 we	 decided	 to	 use	 Protégé	 (with	
OWL	plugin)	in	the	last	step	of	the	process,	because	this	
is	 a	 more	 flexible	 environment	 with	 extensible	 mod-
ules	 that	 integrate	 more	 functionality	 such	 as	 ontology	
annotation,	 evaluation,	 middleware	 service,	 query	 and	
inference,	etc.

Figure	 5	 shows	 the	 dictionary	 entry	 for	 “high	 order	
language,” which appears in figure 2. Note that the dictionary
includes only owl:Class (or daml:Class) to mark the
term; rdfs:label to indicate the term name; and rdfs:comment
to provide the definition included in the original glossary.

Figure 4. Ontology development process

<owl:Class rdf:about="#HighOrderLanguage">

 <rdfs:label>HighOrderLanguage</rdfs:label>

 <rdfs:comment><![CDATA[A programming language 
that requires little knowledge of the 
computer on which a program will 
run, can be translated into several 
different machine languages, allows 
symbolic naming of operations 
and addresses, provides features 
designed to facilitate expression of 
data structures and program logic, 
and usually results in several machine 
instructions for each program 
statement.]]>

 </rdfs:comment>

</owl:Class>

Figure 5. Example of dictionary entry
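An entry like the one in figure 5 can be produced mechanically from a term name and its definition. The sketch below uses Python's standard `xml.etree.ElementTree`; it is not the tooling used in this project (the products were built with an ontology editor), and the function name `dictionary_entry` is a hypothetical one introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, RDF Schema, and OWL.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Register prefixes so the serialized XML uses rdf:, rdfs:, and owl:.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def dictionary_entry(concept_name, definition):
    """Build an owl:Class element carrying a label and a comment."""
    cls = ET.Element(f"{{{OWL}}}Class", {f"{{{RDF}}}about": f"#{concept_name}"})
    ET.SubElement(cls, f"{{{RDFS}}}label").text = concept_name
    ET.SubElement(cls, f"{{{RDFS}}}comment").text = definition
    return cls


entry = dictionary_entry(
    "HighOrderLanguage",
    "A programming language that requires little knowledge of the computer ...",
)
xml_text = ET.tostring(entry, encoding="unicode")
```

ElementTree escapes special characters in the definition text instead of wrapping it in a CDATA section as figure 5 does; the two forms are equivalent to an XML parser.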




example,	when	analyzing	the	definition	of	the	term	com-
piler:	“(Is)	A	computer	program	that	translates	programs	
expressed	 in	 a	 high	 order	 language	 into	 their	 machine	
language	 equivalent,”	 it	 is	 possible	 to	 deduce	 that	 com-
piler	is	a	subconcept	of	computer	program,	which	is	also	
included	in	the	glossary.)	In	addition	to	the	lexical	or	syn-
tactic	analysis,	it	is	necessary	for	an	expert	in	the	domain	
to	 perform	 a	 semantic	 analysis	 to	 complete	 the	 develop-
ment	of	the	taxonomy.
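The two lexical cues described above (a term name that contains another term name, and a definition that opens with another glossary term) can be approximated with simple string matching. The following is a rough sketch under stated assumptions, not the analysis software used in this project: the function names are hypothetical, name containment is narrowed to the case where the shorter name is the trailing part of the longer one, and the "is a" cue is reduced to the first noun phrase of the definition. Its output is a list of candidates for the domain expert to review.

```python
import re


def name_based_parents(term, glossary_terms):
    """Glossary terms whose name is the trailing part of `term`."""
    return [t for t in glossary_terms if t != term and term.endswith(" " + t)]


def definition_based_parent(definition, glossary_terms):
    """Longest glossary term that opens the definition, if any.

    A leading parenthesized word such as "(Is)" and a leading article
    are stripped before matching.
    """
    text = re.sub(r"^\s*(\(\w+\)\s*)?(an?|the)\s+", "", definition.lower())
    matches = [t for t in glossary_terms if re.match(re.escape(t) + r"\b", text)]
    return max(matches, key=len) if matches else None


terms = ["language", "programming language", "hardware design language",
         "computer program", "compiler"]

compiler_definition = ("(Is) A computer program that translates programs "
                       "expressed in a high order language into their "
                       "machine language equivalent")
compiler_parent = definition_based_parent(compiler_definition, terms)
```

Here `compiler_parent` is "computer program", and `name_based_parents("programming language", terms)` yields "language". Each confirmed pair would then be written as an rdfs:subClassOf statement.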

The	 implementation	 of	 the	 hierarchical	 relation-
ships	among	the	concepts	is	made	using	rdfs:subClassOf,	
regardless	 of	 whether	 the	 taxonomy	 is	 implemented	 in	
OWL	or	DAML	format,	since	both	languages	specify	this	
type	 of	 relationship	 in	 the	 same	 way.	 Figure	 6	 shows	 an	
example	 of	 a	 hierarchical	 relationship	 included	 in	 the	
definition	of	the	concept	pictured	in	figure	5.	

Thesaurus

According	 to	 the	 International	 Organization	 for	
Standardization	(ISO),	a	thesaurus	is	“the	vocabulary	of	a	
controlled	indexing	language,	formally	organized	in	order	
to	 make	 explicit	 the	 a	 priori	 relations	 between	 concepts	
(for	example	‘broader’	and	‘narrower’).”25	This	definition	
establishes	the	lexical	units	and	the	semantic	relationships	
between	these	units	as	the	elements	that	constitute	a	the-
saurus.	The	following	is	a	sample	of	the	lexical	units:

■■ Descriptors	 (also	 called	 “preferred	 terms”):	 the	 terms	
used consistently when indexing to represent a concept
that may appear in documents or in queries to these
documents.	 The	 ISO	 standard	 introduces	 the	 option	
of	adding	a	definition	or	an	application	note	to	every	
term	to	establish	explicitly	the	chosen	meaning.	This	
note	 is	 identified	 by	 the	 abbreviation	 SN	 (Scope	
Note),	as	shown	in	figure	7.	

■■ Nondescriptors (“nonpreferred terms”): the synonyms
or quasi-synonyms of a preferred term. A
nonpreferred	 term	 is	 not	 assigned	 to	 documents	
submitted	to	an	indexing	process,	but	is	provided	as	
an	entry	point	in	a	thesaurus	to	point	to	the	appropri-
ate	 descriptor.	 Usually	 the	 descriptors	 are	 written	 in	
capital	letters	and	the	nondescriptors	in	small	letters.

■■ Compound descriptors:	 the	 terms	 used	 to	 represent	
complex	 concepts	 and	 groups	 of	 descriptors,	 which	
allow	for	the	structuring	of	large	numbers	of	thesau-
rus	descriptors	into	subsets	called	micro-thesauri.

In	 addition	 to	 lexical	 units,	 other	 fundamental	
elements	 of	 a	 thesaurus	 are	 semantic	 relationships	
between	 these	 units.	 The	 more	 common	 relationships	
between	lexical	units	are	the	following:

■■ Equivalence:	 the	 relationship	 between	 the	 descrip-
tors	 and	 the	 nondescriptors	 (synonymous	 and	

that is, as a systematic classification in the traditional sense.
As	Gilchrist	states,	there	is	no	consensus	on	the	meaning	
of	 terms	 like	 taxonomy,	 thesaurus,	 or	 ontology.22	 In	 addi-
tion,	much	work	in	the	field	of	ontologies	has	been	done	
without	 taking	 advantage	 of	 similar	 work	 performed	 in	
the	 fields	 of	 linguistics	 and	 library	 science.23	 This	 situa-
tion	 is	 changing	 because	 of	 the	 increasing	 publication	 of	
works	 that	 relate	 the	 development	 of	 ontologies	 to	 the	
development	of	“classic”	terminological	tools	(vocabular-
ies,	taxonomies,	and	thesauri).

This	 paper	 emphasizes	 the	 importance	 and	 useful-
ness	 of	 the	 intermediate	 products	 created	 at	 each	 stage	
of	 the	 evolutive	 process	 from	 glossary	 to	 ontology.	 The	
end	 product	 of	 the	 initial	 stage	 is	 a	 dictionary	 expressed	
as	 XML.	 The	 next	 stage	 in	 the	 evolutive	 process	 (figure	
4)	 is	 the	 transformation	 of	 that	 dictionary	 into	 a	 tax-
onomy	through	the	addition	of	hierarchical	relationships	
between	concepts.

To	 do	 this,	 it	 is	 necessary	 to	 undertake	 a	 lexical-
semantic	 analysis	 of	 the	 original	 glossary.	 This	 can	
be	 done	 in	 a	 semiautomatic	 way	 by	 applying	 natural	
language	 processing	 (NLP)	 techniques,	 such	 as	 those	
recommended	 by	 Morales-del-Castillo	 et	 al.,24	 for	 creat-
ing	thesauri.	The	basic	processing	sequence	in	linguistic	
engineering	 comprises	 the	 following	 steps:	 (1)	 incorpo-
rate	 the	 original	 documents	 (in	 our	 case	 the	 dictionary	
obtained	in	the	previous	stage)	into	the	information	sys-
tem;	(2)	identify	the	language	in	which	they	are	written,	
distinguishing	independent	words;	(3)	“understand”	the	
processed	 material	 at	 the	 appropriate	 level;	 (4)	 use	 this	
understanding to transform, search, or translate data; (5)
produce	the	new	media	required	to	present	the	produced	
outcomes;	 and	 finally,	 (6)	 present	 the	 final	 outcome	 to	
human	 users	 by	 means	 of	 the	 most	 appropriate	 periph-
eral	device—screen,	speakers,	printer,	etc.

An	 important	 aspect	 of	 this	 process	 is	 natural	 lan-
guage	 comprehension.	 For	 that	 reason,	 several	 different	
kinds	 of	 programs	 are	 employed,	 including	 lemmatizers	
(which	 implement	 stemming	 algorithms	 to	 extract	 the	
lexeme or root of a word), morphological analyzers (which
glean information about a sentence from its constituent elements:
morphemes, words, and parts of speech), syntactic
analyzers	 (which	 group	 sentence	 constituents	 to	 extract	
elements	larger	than	words),	and	semantic	models	(which	
represent	 language	 semantics	 in	 terms	 of	 concepts	 and	
their	relations,	using	abstraction,	logical	reasoning,	orga-
nization	and	data	structuring	capabilities).

From	 the	 information	 in	 the	 software	 engineering	
dictionary	 and	 from	 a	 lexical	 analysis	 of	 it,	 it	 is	 possible	
to	 determine	 a	 hierarchical	 relationship	 when	 the	 name	
of	a	term	contains	the	name	of	another	one	(for	example,	
the	 term	 language	 and	 the	 terms	 programming language 
and	 hardware design language),	 or	 when	 expressions	 such	
as	 “is	 a”	 linked	 to	 the	 name	 of	 another	 term	 included	 in	
the	glossary	appear	in	the	text	of	the	term	definition.	(For	




indicating	 that	 high	 order	 language	 relates	 to	 both	
assembly	and	machine	languages.

The	life	cycle	proposed	in	this	paper	(figure	4)	includes	
a	 third	 step	 or	 phase	 that	 transforms	 the	 taxonomy	
obtained	 in	 the	 previous	 phase	 into	 a	 thesaurus	 through	
the	 incorporation	 of	 relationships	 between	 the	 concepts	
that	complement	the	hierarchical	relations	included	in	the	
taxonomy.	Basically,	we	have	to	add	two	types	of	relation-
ships—equivalence	 and	 associative,	 represented	 in	 the	
standard	thesauri	with	UF	(and	USE)	and	RT	respectively.

We	 will	 continue	 using	 XML	 to	 implement	 this	 new	
product.	 There	 are	 different	 ways	 of	 implementing	 a	
thesaurus	using	a	language	based	on	XML.	For	example,	
Matthews	 et	 al.	 proposed	 a	 standard	 RDF	 format,26	
whereas Hall created an ontology in DAML.27 In both
cases,	 the	 authors	 modeled	 the	 general	 structure	 of	

quasi-synonymous).	ISO	establishes	that	the	abbrevia-
tion	UF	(Used	For)	precedes	the	nondescriptors	linked	
to	 a	 descriptor;	 and	 the	 abbreviation	 USE	 is	 used	 in	
the	opposite	case.	For	example,	a	thesaurus	developed	
from	 the	 IEEE	 glossary	 might	 include	 a	 descriptor	
“high	order	language”	and	an	equivalence	relationship	
with	a	nondescriptor	“high	level	language”	(figure	7).

■■ Hierarchical:	 a	 relationship	 between	 two	 descrip-
tors.	 In	 the	 thesaurus	 one	 of	 these	 descriptors	 has	
been	 defined	 as	 superior	 to	 the	 other	 one.	 There	 are	
no	 hierarchical	 relationships	 between	 nondescrip-
tors,	 nor	 between	 nondescriptors	 and	 descriptors.	A	
descriptor	can	have	no	lower	descriptors	or	several	of	
them,	 and	 no	 higher	 descriptors	 or	 several	 of	 them.	
According	to	the	ISO	standard,	hierarchy	is	expressed	
by	means	of	the	abbreviations	BT	(Broader	Term),	to	
indicate	 the	 generic	 or	 higher	 descriptors,	 and	 NT	
(Narrower	 Term),	 to	 indicate	 the	 specific	 or	 lower	
descriptors.	 The	 term	 at	 the	 head	 of	 the	 hierarchy	
to	 which	 a	 term	 belongs	 can	 be	 included,	 using	 the	
abbreviation	 TT	 (Top	 Term).	 Figure	 7	 presents	 these	
hierarchical	relationships.

■■ Associative:	 a	 reciprocal	 relationship	 that	 is	 estab-
lished	between	terms	that	are	neither	equivalent	nor	
hierarchical,	 but	 are	 semantically	 or	 conceptually	
associated	 to	 such	 an	 extent	 that	 the	 link	 between	
them	 should	 be	 made	 explicit	 in	 the	 controlled	
vocabulary	 on	 the	 grounds	 that	 it	 may	 suggest	
additional	 terms	 for	 use	 in	 indexing	 or	 retrieval.	
It	 is	 generally	 indicated	 by	 the	 abbreviation	 RT	
(Related	Term).	There	are	no	associative	relationships	
between	nondescriptors	and	descriptors,	or	between	
descriptors	 already	 linked	 by	 a	 hierarchical	 relation.	
It	 is	 possible	 to	 establish	 associative	 relationships	
between descriptors belonging to the same category or to different
categories. The associative relationships can be of
very	different	types.	For	example,	they	can	represent	
causality,	instrumentation,	location,	similarity,	origin,	
action,	 etc.	 Figure	 7	 shows	 two	 associative	 relations,	

..

HIGH ORDER LANGUAGE (descriptor)

 SN A programming language that...

 UF High level language (nondescriptor)

 UF Third generation language (nondescriptor)

 TT LANGUAGE

 BT PROGRAMMING LANGUAGE

 NT OBJECT ORIENTED LANGUAGE

 NT DECLARATIVE LANGUAGE

 RT ASSEMBLY LANGUAGE (contrast with)

 RT MACHINE LANGUAGE (contrast with) 

..

High level language

 USE HIGH ORDER LANGUAGE 
..

Third generation language

 USE HIGH ORDER LANGUAGE

..

Figure 7. Fragment of a thesaurus entry
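The structure of the entry in figure 7 maps naturally onto a small record type. The sketch below is an illustration only: the `Descriptor` class, its field names, and the two helper functions are hypothetical and are not taken from the ISO standard or from our implementation. It renders a descriptor in the notation of figure 7 and derives the reciprocal USE entries from the UF list.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    name: str
    scope_note: str = ""                            # SN
    used_for: list = field(default_factory=list)    # UF (nondescriptors)
    top_term: str = ""                              # TT
    broader: list = field(default_factory=list)     # BT
    narrower: list = field(default_factory=list)    # NT
    related: list = field(default_factory=list)     # RT


def render(d):
    """Lines of a thesaurus entry: descriptors in capitals, nondescriptors not."""
    lines = [d.name.upper()]
    if d.scope_note:
        lines.append(f" SN {d.scope_note}")
    lines += [f" UF {n.capitalize()}" for n in d.used_for]
    if d.top_term:
        lines.append(f" TT {d.top_term.upper()}")
    lines += [f" BT {t.upper()}" for t in d.broader]
    lines += [f" NT {t.upper()}" for t in d.narrower]
    lines += [f" RT {t.upper()}" for t in d.related]
    return lines


def use_entries(d):
    """Map each nondescriptor to the descriptor it points to (USE)."""
    return {n.capitalize(): d.name.upper() for n in d.used_for}


hol = Descriptor(
    name="high order language",
    scope_note="A programming language that...",
    used_for=["high level language", "third generation language"],
    top_term="language",
    broader=["programming language"],
    narrower=["object oriented language", "declarative language"],
    related=["assembly language", "machine language"],
)
```

Deriving USE from UF, instead of storing both, keeps the two directions of the equivalence relationship consistent by construction.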

Figure 6. Example of taxonomy entry

<owl:Class rdf:about="#HighOrderLanguage">

 ...

 <rdfs:subClassOf>

 <owl:Class rdf:about="#ProgrammingLanguage"/>

 </rdfs:subClassOf>

</owl:Class>




terms.	For	example:

<owl:ObjectProperty rdf:about="#UF">
 <owl:inverseOf rdf:resource="#USE"/>
</owl:ObjectProperty>

Or	using	the	glossary	notation:

<owl:ObjectProperty rdf:about="#Syn">
 <owl:inverseOf rdf:resource="#See"/>
</owl:ObjectProperty>

■■ The	 rest	 of	 the	 associative	 relationships	 (RT)	 that	
were	 included	 in	 the	 thesaurus	 correspond	 to	 the	
cross-references	of	the	type	“Contrast	with”	and	“See	
also”	that	appear	explicitly	in	the	IEEE	glossary.

■■ Neither	compound	descriptors	nor	groups	of	descrip-
tors	have	been	implemented	because	there	is	no	such	
structure	in	the	glossary.
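The explicit cross-references of a glossary entry can be extracted with a regular expression and mapped to thesaurus relationship codes. This is a minimal sketch, assuming the "Label: target; target." layout seen in figure 2; the names `RELATION_BY_LABEL`, `CROSS_REF`, and `cross_references` are hypothetical, and reading "Syn" as UF and "See" as USE is our interpretation of the property definitions shown above.

```python
import re

# Glossary cross-reference labels and the thesaurus codes assumed for them.
RELATION_BY_LABEL = {
    "syn": "UF",
    "see": "USE",
    "contrast with": "RT",
    "see also": "RT",
}

# "See also" is listed before "See" so the longer label is tried first.
CROSS_REF = re.compile(r"\b(Syn|See also|See|Contrast with):\s*([^.]*)\.",
                       re.IGNORECASE)


def cross_references(entry_text):
    """Return (relation_code, target_term) pairs found in a glossary entry."""
    relations = []
    for label, targets in CROSS_REF.findall(entry_text):
        code = RELATION_BY_LABEL[label.lower()]
        for target in targets.split(";"):
            target = " ".join(target.split())  # normalize internal whitespace
            if target:
                relations.append((code, target))
    return relations


sample = ("Examples include Ada, COBOL. Syn: high level language; "
          "third generation language. Contrast with: assembly language; "
          "machine language.")
sample_relations = cross_references(sample)
```

For the sample text this yields two UF pairs followed by two RT pairs. A term name that itself contains a period would be cut short by this pattern, so real entries may need a more careful tokenizer.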

Ontology

Ding	 and	 Foo	 state	 that	 “ontology	 promotes	 standard-
ization	 and	 reusability	 of	 information	 representation	
through	 identifying	 common	 and	 shared	 knowledge.	
Ontology	 adds	 values	 to	 traditional	 thesauri	 through	
deeper	 semantics	 in	 digital	 objects,	 both	 conceptually,	
relationally	and	machine	understandably.”29	This	seman-
tic	 richness	 may	 imply	 deeper	 hierarchical	 levels,	 richer	
relationships	 between	 concepts,	 the	 definition	 of	 axioms	
or	inference	rules,	etc.

The	final	stage	of	the	evolutive	process	is	the	transfor-
mation	of	the	thesaurus	created	in	the	previous	stage	into	
an	ontology.	This	is	achieved	through	the	addition	of	one	
or	more	of	the	basic	elements	of	semantic	complexity	that	
differentiates	ontologies	from	other	knowledge	represen-
tation	 standards	 (such	 as	 dictionaries,	 taxonomies,	 and	
thesauri).	For	example:

■■ Semantic	relationships	between	the	concepts	(classes)	
of	 the	 thesaurus	 have	 been	 added	 as	 properties	 or	
ontology	slots.	

■■ Axioms of classes and axioms of properties have been
defined. These are restriction rules that the elements of
the ontology are declared to satisfy. For example,
disjoint classes have been established
(<owl:Class rdf:about="#HighOrderLanguage"> <owl:disjointWith>
<owl:Class rdf:about="#MachineLanguage"/> </owl:disjointWith>
</owl:Class>), and quantification restrictions (existential
or universal) and cardinality restrictions on the
relationships have been implemented as properties.
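As an illustration of how a disjointness axiom of this kind could be consumed by a program, the following Python sketch reads the axiom from an RDF/XML fragment and checks whether a set of asserted classes violates it. It is not the authors' software: the fragment, the function names `disjoint_pairs` and `violates`, and the idea of checking asserted classes are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical fragment modeled on the disjointness example in the text.
FRAGMENT = """
<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <owl:Class rdf:about="#HighOrderLanguage">
    <owl:disjointWith>
      <owl:Class rdf:about="#MachineLanguage"/>
    </owl:disjointWith>
  </owl:Class>
</rdf:RDF>
"""


def disjoint_pairs(xml_text):
    """Collect every pair of classes linked by owl:disjointWith."""
    root = ET.fromstring(xml_text)
    pairs = set()
    for cls in root.findall("owl:Class", NS):
        name = cls.get(RDF_ABOUT)
        for axiom in cls.findall("owl:disjointWith", NS):
            # The target may be given as rdf:resource or as a nested class.
            target = axiom.get(RDF_RESOURCE)
            if target is None:
                nested = axiom.find("owl:Class", NS)
                target = nested.get(RDF_ABOUT) if nested is not None else None
            if name and target:
                pairs.add(frozenset((name, target)))
    return pairs


def violates(asserted_classes, pairs):
    """True if the asserted classes include both members of a disjoint pair."""
    classes = set(asserted_classes)
    return any(pair <= classes for pair in pairs)


pairs = disjoint_pairs(FRAGMENT)
print(violates(["#HighOrderLanguage", "#MachineLanguage"], pairs))
print(violates(["#HighOrderLanguage", "#DataStructure"], pairs))
```

Storing each pair as a `frozenset` reflects the fact that disjointness is symmetric, so the order in which the two classes are declared does not matter.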

Software	 based	 on	 techniques	 of	 linguistic	 analysis	
has	 been	 developed	 to	 facilitate	 the	 establishment	 of	 the	
properties	 and	 restrictions.	 This	 software	 analyzes	 the	
definition	 text	 for	 each	 of	 the	 more	 than	 1,500	 glossary	
terms	 (in	 thesaurus	 format),	 isolating	 those	 words	 that	

a thesaurus from classes (rdfs:Class or daml:Class) and
properties (rdf:Property or daml:ObjectProperty). In the
first case they proposed five classes: ThesaurusObject,
Concept, TopConcept, Term, and ScopeNote; and several
properties to implement the relations, such as
hasScopeNote (SN), IsIndicatedBy, PreferredTerm, UsedFor (UF),
ConceptRelation, BroaderConcept (BT), NarrowerConcept
(NT), TopOfHierarchy (TT), and isRelatedTo (RT).

Recently the W3C has developed the SKOS specification,
created to define knowledge organization schemes.
In the case of thesauri, SKOS includes specific tags,
such as skos:Concept, skos:scopeNote (SN), skos:broader
(BT), skos:narrower (NT), skos:related (RT), etc., that are
equivalent to those listed in the previous paragraph.
The SKOS specification does not make any statement about
the formal relationship between the class of SKOS concept
schemes and the class of OWL ontologies, which allows
different design patterns to be explored for using
SKOS in combination with OWL.

Although	 any	 of	 the	 above-mentioned	 formats	 could	
be	 used	 to	 implement	 the	 thesaurus,	 given	 that	 the	 end-
product	of	our	process	is	to	be	an	ontology,	our	proposal	
is	 that	 the	 product	 to	 be	 generated	 during	 this	 phase	
should	have	a	format	compatible	with	the	final	ontology	
and	 with	 the	 previous	 taxonomy.	 Therefore	 a	 minimal	
number	 of	 changes	 will	 be	 carried	 out	 on	 the	 product	
created	 in	 the	 previous	 step,	 resulting	 in	 a	 knowledge	
representation	tool	similar	to	a	thesaurus.	That	tool	does	
not	need	to	be	modified	during	the	following	(final)	phase	
of	 transformation	 into	 an	 ontology.	 Nevertheless,	 if	 for	
some	 reason	 it	 is	 necessary	 to	 have	 the	 thesaurus	 in	 one	
of	the	other	formats	(such	as	SKOS),	it	is	possible	to	apply	
a	 simple	 XSLT	 transformation	 to	 the	 product.	 Another	
option	would	be	to	integrate	a	thesaurus	ontology,	such	as	
the	one	proposed	by	Hall,28	with	the	ontology	represent-
ing	the	IEEE	glossary.
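To give an idea of how small such a conversion can be, the sketch below expresses the mapping in Python rather than XSLT (the Python standard library has no XSLT processor). It assumes, hypothetically, that the thesaurus is stored as owl:Class elements whose parents are given by rdfs:subClassOf with an rdf:resource attribute; each class becomes a skos:Concept and each parent a skos:broader link. The sample input and the function name `owl_to_skos` are illustrative only.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)

# Hypothetical two-class taxonomy used only to exercise the function.
SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#Language"/>
  <owl:Class rdf:about="#HighOrderLanguage">
    <rdfs:subClassOf rdf:resource="#Language"/>
  </owl:Class>
</rdf:RDF>
"""


def owl_to_skos(owl_xml):
    """Map owl:Class to skos:Concept and rdfs:subClassOf to skos:broader."""
    source = ET.fromstring(owl_xml)
    target = ET.Element(f"{{{RDF}}}RDF")
    for cls in source.findall(f"{{{OWL}}}Class"):
        about = cls.get(f"{{{RDF}}}about")
        if about is None:
            continue  # anonymous classes have no thesaurus counterpart here
        concept = ET.SubElement(
            target, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": about}
        )
        for parent in cls.findall(f"{{{RDFS}}}subClassOf"):
            resource = parent.get(f"{{{RDF}}}resource")
            if resource is not None:  # restrictions and nested classes are skipped
                ET.SubElement(
                    concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": resource}
                )
    return target


skos = owl_to_skos(SAMPLE)
print(ET.tostring(skos, encoding="unicode"))
```

Only the hierarchical relationship is mapped here; synonymy and associative relationships would need further rules of the same shape.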

In	 the	 thesaurus	 implementation	 carried	 out	 in	 our	
project,	the	following	limitations	have	been	considered:

■■ Only the hierarchical relationships implemented in
the taxonomy have been considered. These include
relationships of type “is-a,” that is, generalization
relationships or type–subset relationships. Other
relationships that can be marked in a thesaurus with
TT, BT, and NT, such as relations of type “part of” (that
is, partitive relationships), have not been considered.
Instead of considering them as hierarchical
relationships, the final ontology includes the possibility
of describing classes as a union of classes.

■■ The relationships of synonymy (UF and USE) used to
model the cross-references in the IEEE glossary (“Syn”
and “See,” respectively) were implemented as
equivalent terms, that is, as equivalence axioms between
classes (owl:equivalentClass or daml:sameClassAs),
with inverse properties to reflect the preference of the



202   Information Technology and Libraries  |  December 2010

match the name of other glossary terms (or a word in
the definition text of other glossary terms). The isolated
words then become candidates for a relationship between
the two terms. (Figure 8 shows the candidate properties
obtained from the Software Engineering glossary.) The
user then has the option of creating relationships with
the identified candidate words. For every relationship to
be created, the user must indicate the restriction type
that it represents: existential or universal
quantification, or cardinality (minimum or maximum). After
confirming this information, the program updates the
file containing the ontology (OWL or DAML), adding the
property to the class that represents the processed term.
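The matching step can be pictured with a short Python sketch. It is only an approximation of the idea, not the software described here: the three glossary entries are abbreviated, hypothetical definitions (not the IEEE wording), and the function name `find_candidates` and the whole-phrase matching rule are assumptions.

```python
import re


def find_candidates(glossary):
    """Return (term, other_term) pairs where other_term's name occurs
    in the definition text of term."""
    candidates = []
    for term, definition in glossary.items():
        text = definition.lower()
        for other in glossary:
            if other == term:
                continue
            # Whole-word (or whole-phrase) match, case-insensitive.
            pattern = r"\b" + re.escape(other.lower()) + r"\b"
            if re.search(pattern, text):
                candidates.append((term, other))
    return candidates


# Hypothetical, abbreviated entries used only for illustration.
glossary = {
    "compiler": "A program that translates a high order language "
                "into machine language.",
    "high order language": "A programming language that requires little "
                           "knowledge of the computer.",
    "machine language": "A language that can be recognized by the "
                        "processing unit of a computer.",
}

pairs = find_candidates(glossary)
for source, target in pairs:
    print(f"{source} -> {target}")
```

Each pair is only a candidate: as in the process described above, a person still has to decide whether a relationship should be created and which restriction type it represents.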

Figure 9 shows an example of the definition of two
properties and their application to the class
HighOrderLanguage: a property Express with existential
quantification over the class DataStructure to indicate
that a language must represent at least one data structure;
and a property TranslateTo of universal type to indicate
that any high-level language is translated into machine
language (MachineLanguage).
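The update that the program performs on the ontology file can be sketched as follows. This is an assumed, simplified reconstruction using Python's standard `xml.etree.ElementTree`, not the authors' implementation; the function name `add_restriction` is hypothetical, and the generated markup mirrors the nesting used in Figure 9.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
for prefix, uri in (("owl", OWL), ("rdf", RDF), ("rdfs", RDFS)):
    ET.register_namespace(prefix, uri)


def add_restriction(cls, prop, quantifier, filler):
    """Append an rdfs:subClassOf / owl:Restriction block to a class element.

    quantifier is "someValuesFrom" (existential) or
    "allValuesFrom" (universal).
    """
    if quantifier not in ("someValuesFrom", "allValuesFrom"):
        raise ValueError(f"unsupported quantifier: {quantifier}")
    sub = ET.SubElement(cls, f"{{{RDFS}}}subClassOf")
    restriction = ET.SubElement(sub, f"{{{OWL}}}Restriction")
    ET.SubElement(
        restriction, f"{{{OWL}}}onProperty", {f"{{{RDF}}}resource": "#" + prop}
    )
    values = ET.SubElement(restriction, f"{{{OWL}}}{quantifier}")
    ET.SubElement(values, f"{{{OWL}}}Class", {f"{{{RDF}}}about": "#" + filler})
    return restriction


cls = ET.Element(f"{{{OWL}}}Class", {f"{{{RDF}}}about": "#HighOrderLanguage"})
add_restriction(cls, "Express", "someValuesFrom", "DataStructure")
add_restriction(cls, "TranslateTo", "allValuesFrom", "MachineLanguage")

xml_text = ET.tostring(cls, encoding="unicode")
print(xml_text)
```

Cardinality restrictions are left out of the sketch; they would follow the same pattern with a different child element inside the owl:Restriction.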

■■ Results, Conclusions,  and Future Work
The existence of ontologies of specific knowledge domains
(software engineering in this case) facilitates the process
of finding resources about this discipline on the Semantic
Web and in digital libraries, as well as the reuse of
learning objects of the same domain stored in repositories
available on the Web.30 When a new resource is indexed
in a library catalog, a new record that conforms to the
ontology conceptual data model may be included. Its
properties must be assigned according to the concept
definition included in the ontology. The user may later
execute semantic queries: the search system traverses the
ontology to identify the concept in which the user is
interested and then launches a wider query that includes
the resources indexed under that concept. Ontologies like
the one that has been “evolved” may also be used in an
open way to index and search for resources on the Web.
In that case, however, semantic search engines, such as
Swoogle (http://swoogle.umbc.edu/), are required in place
of traditional syntactic search engines, such as Google.
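The query widening described in this paragraph can be illustrated with a small Python sketch. The class hierarchy, the resource identifiers, and the function names `descendants` and `semantic_search` are all hypothetical; a real system would read the hierarchy from the ontology file rather than from a dictionary.

```python
def descendants(concept, narrower):
    """Return the concept plus every concept reachable through its subclasses."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(narrower.get(current, []))
    return seen


def semantic_search(concept, narrower, index):
    """Collect the resources indexed under the concept or any of its subclasses."""
    results = set()
    for c in descendants(concept, narrower):
        results.update(index.get(c, []))
    return sorted(results)


# Hypothetical subclass links and catalog index.
narrower = {
    "Language": ["HighOrderLanguage", "MachineLanguage"],
    "HighOrderLanguage": [],
}
index = {
    "Language": ["doc-1"],
    "HighOrderLanguage": ["doc-2", "doc-3"],
    "MachineLanguage": ["doc-4"],
    "DataStructure": ["doc-5"],
}

print(semantic_search("Language", narrower, index))
print(semantic_search("MachineLanguage", narrower, index))
```

A purely syntactic search for "Language" would return only the resource indexed under that exact label, whereas the traversal also returns those indexed under its subclasses.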

The	 creation	 of	 a	 complete	 ontology	 of	 a	 knowledge	
domain	 is	 a	 complex	 task.	 In	 the	 case	 of	 the	 domain	
presented	 in	 this	 paper,	 that	 of	 software	 engineering,	
although	there	have	been	initiatives	toward	ontology	cre-
ation	that	have	yielded	publications	by	renowned	authors	
in	 the	 field,31	 a	 complete	 ontology	 has	 yet	 to	 be	 created	
and	published.

This	 paper	 has	 described	 a	 process	 for	 developing	
a	 modest	 but	 complete	 ontology	 from	 a	 glossary	 of	 ter-
minology,	 both	 in	 OWL	 format	 and	 DAML+OIL	 format,	

accept
access
accomplish
account
achieve
adapt
add
adjust
advance
affect
aggregate
aid
allocate
allow
allow symbolic naming
alter
analyze
apply
approach
approve
arrangement
arrive
assign
assigned by
assume
avoid
await
begin
break
bring
broke down
builds
call
called by
can be
can be input
can be used as
can operate in
cannot be usedas
carry out
cause
change
characterize
combine
communicate
compare
comply
comprise
conduct
conform
consist
constrain
construct
contain
contains no
contribute
control
convert
copy
correct
correspond
count
create
debugs
decompiles
decomposedinto
decrease
define
degree
delineate
denote
depend
depict
describe
design
designate
detect
determine
develop
development

direct
disable
disassembles
display
distribute
divide
document
employ
enable
encapsulate
encounter
ensure
enter
establish
estimate
establish
evaluate
examine
exchange
execute after
execute in
executes
expand
express
express as
extract
facilitate
fetch
fill
follow
fulfil
generate
give
give partial
given constrain
govern
have
have associated
have met
have no
hold
identify
identify request
ignore
implement
imply
improve
incapacitate
include
incorporate
increase
indicate
inform
initiate
insert
install
intend
interact with
interprets
interrelate
investigate
invokes
is
is a defect in
is a form of
is a method of
is a mode of
is a part
is a part of
is a sequence
is a sequenceof
is a technique
is a techniqueof
is a type
is a type of
is ability
is activated by
is adjusted by
is applied to
is based
is called by

is composed
is contained
is contained in
is establish
is established
is executed after
is executed by
is incorrect
is independent of
is manifest
is measured in
is not
is not subdivided in
is part
is part of
is performed by
is performed on
is portion
is process by
is produce by
is produce in
is ratio
is represented by
is the output
is the result of
is translated by
is type
is used
is used in
isolate
know
link
list
load
locate
maintain
make
make up
may be
measure
meet
mix
modify
monitors
move
no contain
no execute
no relate
no use
not be connected
not erase
not fill
not have
not involve
not involving
not translate
not use
occur
occur in
occur in a
operate
operatewith
optimize
order
output
parses
pas
pass test
perform
permit
permitexecute
permit the execution
pertaining
place
preclude
predict
prepare
prescribe
present

present for
prevent
preventaccessto
process
produce
produce no
propose
provide
rank
reads
realize
receive
reconstruct
records
recovery
refine
reflect
reformat
relate
relation
release
relocates
remove
repair
replace
represent
request
require
reserve
reside
restore
restructure
result
resume
retain
retest
returncontrolto
reviews
satisfy
schedule
send
server
set
share
show
shutdown
specify
store
store in
structure
submission of
supervise
supports
suppress
suspend
swap
synchronize
take
terminate
test
there are no
through
throughout
transfer
transform
translate
transmit
treat
through
understand
update
use
use in
use to
utilize
value
verify
work in
writes

Figure 8. Candidate properties obtained from the linguistic 
analysis of the Software Engineering glossary




to	each	term.)	We	defined	324	properties	or	relationships	
between	 these	 classes.	 These	 are	 based	 on	 a	 semiauto-
mated	 linguistic	 analysis	 of	 the	 glossary	 content	 (for	
example,	 Allow,	 Convert,	 Execute,	 OperateWith,	 Produces,	
Translate,	 Transform,	 Utilize,	 WorkIn,	 etc.),	 which	 will	 be	
refined	in	future	versions.

Our aim is to use this ontology, which we have called
OntoGLOSE (Ontology GLossary Software Engineering), to
unify the vocabulary. OntoGLOSE will be used in a more
ambitious project, whose purpose is the development of
a complete ontology of software engineering from the
SWEBOK Guide.32

Although this paper has focused on this ontology, the
method that has been described may be used to generate
an ontology from any dictionary. The flexibility that OWL
permits for ontology description, along with its
compatibility with other RDF-based metadata languages, makes
interoperability possible, both among ontologies and
between ontologies and other controlled vocabularies, and
allows merged representations of multiple knowledge
domains to be built. These representations may eventually
be used in libraries and repositories to index and
search for any kind of resource, not only those related to
the original field.

■■ Acknowledgments
This	 research	 is	 co-funded	 by	 the	 Spanish	 Ministry	
of	 Industry,	 Tourism	 and	 Commerce	 PROFIT	 program	
(grant	 TSI-020100-2008-23).	 The	 authors	 also	 want	 to	
acknowledge	 support	 from	 the	 TIFyC	 research	 group	 at	
the	University	of	Alcala.

References and Notes

1. M.	Dörr	et	al.,	State of the Art in Content Standards (Amster-
dam:	OntoWeb	Consortium,	2001).

2. D.	 Soergel,	 “The	 Rise	 of	 Ontologies	 or	 the	 Reinvention	
of	 Classification,”	 Journal of the American Society for Information 
Science	 50,	 no.	 12	 (1999):	 1119–20;	 A.	 Gilchrist,	 “Thesauri,	 Tax-
onomies	 and	 Ontologies—An	 Etymological	 Note,”	 Journal of 
Documentation	59,	no.	1	(2003):	7–18.

3. B. J. Wielinga et al., “From Thesaurus to Ontology,”
Proceedings of the 1st International Conference on Knowledge
Capture (New York: ACM, 2001): 194–201; J. Qin and S. Paling,
“Converting a Controlled Vocabulary into an Ontology: The
Case of GEM,” Information Research 6 (2001): 2.

4. According to Van Heijst, Schreiber, and Wielinga,
ontologies can be classified as terminological ontologies,
information ontologies, and knowledge modeling ontologies;
terminological ontologies specify the terms that are used to
represent knowledge in the domain of discourse, and they are
in use principally to unify vocabulary in a certain domain.
G. Van Heijst, A. T.

which	is	ready	to	use	in	the	Semantic	Web.	As	described	
at	 the	 opening	 of	 this	 article,	 our	 aim	 has	 been	 to	 create	
a	lightweight	ontology	as	a	first	version,	which	will	later	
be	improved	by	including	more	axioms	and	relationships	
that	increase	its	semantic	expressiveness.	We	have	tried	to	
make	this	first	version	as	tailored	as	possible	to	the	initial	
glossary,	knowing	that	later	versions	will	be	improved	by	
others	who	might	take	on	the	work.	Such	improvements	
will	increase	the	ontology’s	utility,	but	will	make	it	a	less-
faithful	representation	of	the	IEEE	glossary	from	which	it	
was	derived.	

The	 ontology	 we	 have	 developed	 includes	 1,521	
classes	 that	 correspond	 to	 the	 same	 number	 of	 concepts	
represented	 in	 the	 IEEE	 glossary.	 (Included	 in	 this	 num-
ber	 are	 the	 different	 meanings	 that	 the	 glossary	 assigns	

<owl:Class rdf:about="#HighOrderLanguage">
  ...
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#Express"/>
      <owl:someValuesFrom>
        <owl:Class rdf:about="#DataStructure"/>
      </owl:someValuesFrom>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#TranslateTo"/>
      <owl:allValuesFrom>
        <owl:Class rdf:about="#MachineLanguage"/>
      </owl:allValuesFrom>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

Figure 9. Example of ontology entry




20. W3C, SKOS; Object Management Group, XML Metadata
Interchange (XMI), 2003,
http://www.omg.org/technology/documents/formal/xmi.htm
(accessed Oct. 5, 2009).

21. UML (Unified Modeling Language) is a standardized
general-purpose modeling language (http://www.uml.org).
Several UML plugins for ontology editors now exist; these
plugins allow working with UML graphic models. It is also
possible to create the UML models with a CASE tool, export
them to XML format, and transform them to the ontology
format (for example, OWL) using an XSLT stylesheet, such as
the one published in D. Gasevic, “UMLtoOWL: Converter from
UML to OWL,” http://www.sfu.ca/~dgasevic/projects/UMLtoOWL/
(accessed Oct. 5, 2009).

22. Gilchrist, “Thesauri, Taxonomies and Ontologies.”

23. Soergel, “The Rise of Ontologies or the Reinvention of
Classification.”

24. J. M. Morales-del-Castillo et al., “A Semantic Model of
Selective Dissemination of Information for Digital Libraries,”
Information Technology & Libraries 28, no. 1 (2009): 22–31.

25. International Organization for Standardization, ISO
2788:1986 Documentation—Guidelines for the Establishment and
Development of Monolingual Thesauri (Geneva: International
Organization for Standardization, 1986).

26. B. M. Matthews, K. Miller, and M. D. Wilson, “A
Thesaurus Interchange Format in RDF,” 2002,
http://www.w3c.rl.ac.uk/SWAD/thes_links.htm (accessed Feb. 10,
2009).

27. M. Hall, “CALL Thesaurus Ontology in DAML,” Dynamics
Research Corporation, 2001,
http://orlando.drc.com/daml/ontology/CALL-Thesaurus
(accessed Oct. 5, 2009).

28. Ibid.

29. Y. Ding and S. Foo, “Ontology Research and Development.
Part 1—A Review of Ontology Generation,” Journal of
Information Science 28, no. 2 (2002): 123–36. See also
B. H. Kwasnik, “The Role of Classification in Knowledge
Representation and Discovery,” Library Trends 48 (1999): 22–47.

30. S.	Otón	et	al.,	“Service	Oriented	Architecture	for	the	Imple-
mentation	 of	 Distributed	 Repositories	 of	 Learning	 Objects,”	
International Journal of Innovative Computing, Information & Con-
trol	(2010),	forthcoming.

31. O.	 Mendes	 and	 A.	 Abran,	 “Software	 Engineering	 Ontol-
ogy:	 A	 Development	 Methodology,”	 Metrics News	 9	 (2004):	
68–76;	C.	Calero,	F.	Ruiz,	and	M.	Piattini,	Ontologies for Software 
Engineering and Software Technology	(Berlin:	Springer,	2006).

32. IEEE, Guide to the Software Engineering Body of Knowledge
(SWEBOK) (Los Alamitos, Calif.: IEEE Computer Society, 2004),
http://www.swebok.org (accessed Oct. 5, 2009).

Schreiber, and B. J. Wielinga, “Using Explicit Ontologies in KBS
Development,” International Journal of Human & Computer Studies
46, no. 2/3 (1996): 183–292.

5. R.	 Neches	 et	 al.,	 “Enabling	 Technology	 for	 Knowledge	
Sharing,”	AI Magazine	12,	no.	3	(1991):	36–56.

6. O. Corcho, F. Fernández-López, and A. Gómez-Pérez,
“Methodologies, Tools and Languages for Building Ontologies.
Where Is Their Meeting Point?” Data & Knowledge Engineering 46,
no. 1 (2003): 41–64.

7. Institute of Electrical and Electronics Engineers (IEEE),
IEEE Std 610.12-1990(R2002): IEEE Standard Glossary of Software
Engineering Terminology (Reaffirmed 2002) (New York: IEEE,
2002).

8. J.	 Krause,	 “Semantic	 Heterogeneity:	 Comparing	 New	
Semantic	 Web	 Approaches	 with	 those	 of	 Digital	 Libraries,”	
Library Review	57,	no.	3	(2008):	235–48.

9. T.	 Berners-Lee,	 J.	 Hendler,	 and	 O.	 Lassila,	 “The	 Semantic	
Web,”	Scientific American	284,	no.	5	(2001):	34–43.

10. World	Wide	Web	Consortium	(W3C),	Resource Description 
Framework (RDF): Concepts and Abstract Syntax, W3C Recommen-
dation 10 February 2004,	 http://www.w3.org/TR/rdf-concepts/	
(accessed	Oct.	5,	2009).

11. World	Wide	Web	Consortium	(W3C),	Web	Ontology	Lan-
guage	 (OWL),	 2004,	 http://www.w3.org/2004/OWL	 (accessed	
Oct.	5,	2009).	

12. World Wide Web Consortium (W3C), SKOS Simple
Knowledge Organization System, 2009,
http://www.w3.org/TR/2009/REC-skos-reference-20090818/
(accessed Oct. 5, 2009).

13. M. M. Yee, “Can Bibliographic Data be Put Directly onto
the Semantic Web?” Information Technology & Libraries 28, no. 2
(2009): 55–80.

14. L.	F.	Spiteri,	“The	Structure	and	Form	of	Folksonomy	Tags:	
The	Road	to	the	Public	Library	Catalog,”	Information Technology 
& Libraries	26,	no.	3	(2007):	13–25.

15. Corcho, Fernández-López, and Gómez-Pérez,
“Methodologies, Tools and Languages for Building Ontologies.”

16. IEEE, IEEE Std 610.12-1990(R2002).

17. N. F. Noy and D. L. McGuinness, “Ontology Development
101: A Guide to Creating Your First Ontology,” 2001, Stanford
University,
http://www-ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness.pdf
(accessed Sept. 10, 2010).

18. F. Baader et al., The Description Logic Handbook
(Cambridge: Cambridge Univ. Pr., 2003).

19. World	 Wide	 Web	 Consortium,	 DAML+OIL	 Reference	
Description,	 2001,	 http://www.w3.org/TR/daml+oil-reference	
(accessed	Oct.	5,	2009);	W3C,	OWL.