Original article Biomath 1 (2012), 1211117, 1–6 B f Volume ░, Number ░, 20░░ BIOMATH ISSN 1314-684X Editor–in–Chief: Roumen Anguelov B f BIOMATH h t t p : / / w w w . b i o m a t h f o r u m . o r g / b i o m a t h / i n d e x . p h p / b i o m a t h / Biomath Forum Predicting and Scoring Links in Anatomical Ontology Mapping Peter Petrov∗, Milko Krachounov∗, Ognyan Kulev∗, Maria Nisheva∗, Dimitar Vassilev† ∗Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski 5 James Bourchier Blvd., 1164 Sofia, Bulgaria †Bioinformatics group, AgroBioInstitute 8 Dragan Tsankov Blvd., 1164 Sofia, Bulgaria Email: jim6329@gmail.com Received: 12 July 2012, accepted: 11 November 2012, published: 22 December 2012 Abstract—The paper presents a work performed in the area of automatic and semi-automatic ontology mapping. A method for inferring additional cross-ontology links while mapping anatomical ontologies is described and the results of some experiments performed with various external knowledge sources and scoring schemes are discussed as well. Keywords-ontology; graph; directed acyclic graph; on- tology mediation; ontology mapping; ontology merging; scoring scheme; probability; knowledge sharing; knowl- edge reuse; interoperability I. INTRODUCTION The term ontology comes from Philosophy and has been applied in Information Systems, Information Retrieval etc. to represent the formalization of a body of knowledge describing a given domain. Ontologies have become increasingly popular because they help to realize many of the most challenging problems in the IT field like interoperability, information/knowledge sharing and knowledge reuse. Information sources (and ontologies in particular), even from the same problem domain, are usually hetero- geneous. In order to enable interoperation between such information sources (ontologies) and to integrate the information/knowledge from multiple sources, one needs to build mappings between ontologies. These mappings establish the semantic correspondence between concepts and relations in different ontologies. As we have noted in [10] there are some terminological differences pertaining to the integration of ontologies within the ontology map- ping/merging/matching (OM) community. Those termi- nological differences are mostly between the terminology adopted in [1] on one side, and in [11] on the other. In our works, we adopt the terminology of [1]. In the sense of [1], ontology mapping is the process of taking two input ontologies and generating semantic links between their concepts/terms. The generated links are not part of the two input ontologies; they are stored separately from them. Two other terms are related to ontology mapping: ontology aligning and ontology merging. Ontology align- ing [1] can be viewed as an automatic or semi-automatic ontology mapping; it denotes the process of discovery of cross-ontology links by a computer program. Again, these links are stored separately from the two input on- tologies. Ontology merging [1] is the ultimate goal when integrating/mediating two input ontologies; it comes down to taking two input ontologies and generating an output ontology that unifies the knowledge contained in them. It is usually a process which follows the processes of mapping/aligning and which utilizes the intermediate results produced by them; during this process, some pairs of terms (one from each of the two input ontologies) are merged into single nodes of the output ontology, while other input terms are not paired but are just copied unchanged to the output ontology. This paper discusses some issues in automatic map- Citation: P Petrov, M Krachounov, O Kulev, M Nisheva, D Vassilev, Predicting and Scoring Links in Anatomical Ontology Mapping, Biomath 1 (2012), 1211117, http://dx.doi.org/10.11145/j.biomath.2012.11.117 Page 1 of 6 http://www.biomathforum.org/biomath/index.php/biomath http://dx.doi.org/10.11145/j.biomath.2012.11.117 P Petrov at al., Predicting and Scoring Links in Anatomical Ontology Mapping ping or aligning of species-specific anatomical ontolo- gies by utilization of various knowledge sources. II. PROBLEM FORMULATION Given the anatomical ontologies of two different species (model organisms) e.g. mouse and zebrafish, our goal is to establish semantic links between the terms of the two ontologies such that: (i) these links are of one of the following types: R1 = synonymy, R2 = hypernymy, R3 = hyponymy, R4 = holonymy, R5 = meronymy, and (ii) each of these links has some degree of certainty or degree of confidence or confidence score which is a real number in the interval [0, 1]. The semantic relation types Rk that we refer to here are well-known and are widely utilized in the areas of linguistics, knowledge representation and ontology engineering. That is why we don’t provide any formal or informal definitions for them here. The two input ontologies are represented in the form of OBO files. OBO stands for “Open Biomedical Ontol- ogy“ and denotes an ontology language and an ontology file format [2] for defining ontologies. It has been used mostly for defining ontologies in the biomedical domain. Nowadays OBO is adopted by the GO project [2], [3], the OBO Foundry initiative [4], and other communities. III. FORMALIZATION OF THE PROBLEM In mathematical terms, each of the two input anatom- ical ontologies can be considered as a directed acyclic graph together with a function colouring the graph’s edges. The colours model the relations defined within the input ontologies (like is a and part of ) which we call inner-ontology relations. Typically, there are other inner- ontology relations except those two. These additional relations usually pertain to the development of the par- ticular organism and not just to its adult/gross anatomy. Such relations are for example start stage, end stage, develops from but practically we don’t deal with them as we are mainly concerned with the organism’s adult/gross anatomy, not with the organism’s growth and develop- ment. We shall use further the following notation: O1 = OM : DAG1 = (V1,E1), F1 : E1 →C = {c1,c2, ...,cn}; O2 = OZ : DAG2 = (V2,E2), F2 : E2 →C = {c1,c2, ...,cn}. Here O1 and O2 are the two input anatomical ontolo- gies; DAG1, DAG2 are their corresponding directed acyclic graphs; V1 and V2 are the sets of terms of the two input ontologies (each term has an identifier and a name); E1 and E2 are the relations defined within the two input ontologies; F1 and F2 are the edge-colouring functions. Two terms u1 and u2 are connected with an edge e if and only if the pair of terms (u1,u2) belongs to the relation represented by e. The relations is a (specialization/generalization) and part of (membership/aggregation) are the two typical examples of inner-ontology relations defined within the ontologies O1 and O2. In our notation, we map relations to colours (through F1 and F2), and we deal only with two relations (is a, part of). So it can be assumed that n = 2, c1 = is a, c2 = part of . Thus, if for example, u1 =“brain”, u2 =“central nervous system”, u1,u2∈V1, then there usually exists an edge e between u1 and u2 such that F1(e) = part of (because the brain is part of the central nervous system and anatomical ontologies of most organisms usually declare this fact explicitly). Also given are several (typically large) external knowl- edge sources which might be either biomedical ones or general-purpose ones. They contain anatomical terms and relations (is a, part of , others) between their own terms. Three concrete external knowledge sources have been used for the purposes of this work. These are T1 = UMLS, T2 = FMA, T3 = WordNet. UMLS [5], [14] and FMA [6], [15] are biomedical knowledge sources, and WordNet [7], [8], [16] is a general purpose knowledge source. Formally stated, each of these knowl- edge sources Ts, s = 1, 2, 3, contains the following information: • Terms. Ms = {ts1,ts2, ...,tsms} is the set of terms in the knowledge source Ts. Here tsk = (idsk;namesk); idsk is the identifier within Ts of the term tsk; namesk is the textual name within Ts of the term tsk; ms (usually 106 ≤ ms ≤ 107) is the number of terms in the knowledge source Ts. • Relations. These are the is a and part of relations defined within the external knowledge source Ts: R ′ Ts = Ris aTs ⊆Ms ×Ms, R ′′ Ts = R part of Ts ⊆Ms ×Ms. Typically other relations are also defined within the external knowledge source Ts but only these two are relevant to our work. Each knowledge source src = Ts, s = 1, 2, 3, is up-front assigned a score f(src) which is based on its preciseness in predicting synonymy and parent-child (is a, part of ) relations between terms of the two in- put ontologies. Details on this evaluation (of the three Biomath 1 (2012), 1211117, http://dx.doi.org/10.11145/j.biomath.2012.11.117 Page 2 of 6 http://dx.doi.org/10.11145/j.biomath.2012.11.117 P Petrov at al., Predicting and Scoring Links in Anatomical Ontology Mapping knowledge sources that we use) can be found in [9]. Having the notation introduced above, we now seek to find a set of predictions (a set of 4-tuples): D = {(v1k,v2k,rk,sk) |k = 1,2, ..., |D|}, such that v1k∈V1, v2k∈V2, rk∈{R1,R2,R3,R4,R5} and sk∈(0,1]. Here, for each k, v1k is a term from the input ontology O1, v2k is a term from the input ontology O2, rk are automatically (i.e. in silico) predicted cross- ontology links from one of the five types defined in the previous section, and sk is a real number denoting the confidence score of the prediction that the terms v1k and v2k are related/linked by a cross-ontology link of the type rk. Requiring that sk ∈ (0,1], we basically imply that the set D which we seek, is in fact a set of cross-ontology predictions or a set of predicted cross- ontology links between O1 and O2 where each score is probabilistic-based (modeling, given the information we have in the input ontologies and also in the available knowledge sources, the probability that the correspond- ing prediction is actually true). IV. ALGORITHMIC PROCEDURES Three algorithmic procedures are applied to the graph structures that were described formally in the previous section. Each of them adds more links to the set D that is being sought. These three procedures are detailed in [12], here we mention them only briefly. Within the first procedure, the two input ontologies are scanned for identity matches between the names of their terms. If t1 ∈ V1 and t2 ∈ V2 have the same names, they are marked as synonyms predicted by what we call the direct matching (DM) procedure. The cross-ontology links discovered/predicted this way are assigned the highest possible scores of 1.0 as these predictions come from information contained entirely in the two input ontologies. During the second procedure, using the information (the terms and the relations) in the external knowledge sources, and identity matches between term names of the two input ontologies and term names of the three external knowledge sources, we build a graph model/structure which aligns each of the two input ontologies to each of the three external knowledge sources. This model contains a set of semantic links (of the types Rk, k = 1, 2, ..., 5, that were defined above) between the two input ontologies on the one side, and the three external knowledge sources on the other side. Then a set of logical rules is applied, and conclusions are drawn for the semantic relations that exist between terms t1 ∈V1 and t2 ∈V2 of the two input ontologies. The following rules are applied at this stage: • Rule (A). If two terms t1 ∈V1 and t2 ∈V2 have been detected as synonyms of the same term t∈Ts, then t1 and t2 are marked as predicted cross-ontology synonyms of each other; • Rule (B). If tj∈Vj has been detected as a synonym of t∈Ts (s=1, 2, 3), and if the term t3−j∈V3−j has been detected as an (is a/part of) child/parent of t, then tj is marked as predicted cross-ontology (is a/part of) parent/child of t3−j (here j =1 or j =2). The application of these rules is what we call the source matching predictions (SMP) procedure. Rule (A), when applied, finds the synonymy relations (i.e. the relations of type R1) between terms from the two input ontologies. Rule (B) is a composite (generalized) version of four separate rules (two options for is a/part of by two options for child/parent makes four options in total). These four rules which originate from rule (B), when applied, find the hypernymy, hyponymy, holonymy and meronymy relations (i.e. the relations of types R2, R3, R4, R5) between terms of the two input ontologies. All links predicted through SMP are given the score f(src), where src is the knowledge source confirming/implying these predictions. Finally, we run a procedure that we denote as the child matching predictions (CMP) procedure. This one tries to find R1, R2, R3, R4 and R5 links between terms of the two input ontologies, t1∈V1 and t2∈V2, for which no links have been predicted either by DM or by SMP. The approach CMP takes is to consider patterns of cross- ontology connectivity (found by DM and SMP) between t1 ∈V1 (parent term 1), t2 ∈V2 (parent term 2), and the child terms of the two parent terms t1 and t2. Three separate patterns of connectivity are considered by CMP: (i) t1∈V1←−tch1∈V1←→tch2∈V2−→t2∈V2 (we call this an U Pattern); (ii) t1∈V1←−tch2∈V2←→tch1∈V1−→t2∈V2 (we call this an X Pattern); (iii) t1 ∈V1 ←− tch1 ∈V1 −→ t2 ∈V2 or t1 ∈V1 ←− tch2 ∈V2 −→ t2 ∈V2 (we call these two patterns V Patterns). In this notation, the −→ and ←− arrows denote sets of non-CMP parent-child links (the arrows always point from child to parent). These are asymmetrical links. The ←→ arrows denote sets of non-CMP synonymy links These are symmetrical links. The tch1 and tch2 are child terms from the two input ontologies. Each occurrence of Biomath 1 (2012), 1211117, http://dx.doi.org/10.11145/j.biomath.2012.11.117 Page 3 of 6 http://dx.doi.org/10.11145/j.biomath.2012.11.117 P Petrov at al., Predicting and Scoring Links in Anatomical Ontology Mapping any of these patterns between t1 and t2 (the two parent terms) we call a pattern instance. All arrows within one pattern instance represent either is a or part of links (we don’t allow mixing these two within a single pattern instance). Based on these patterns of connectivity, new cross- ontology links (CMP links) are introduced (one CMP link per pattern instance) between t1 and t2. We call these links individual CMP links. To assign scores to the individual CMP links, the concepts score of a set of non- CMP links between two terms and score of a pattern instance (or score of an individual CMP link) are defined below. Also, we introduce two functions, Conj and Disj, with N ≥ 2 parameters each, which, provided that the probabilities p1,p2, ...,pN of N events are given, define the probabilities of (i) all these events occurring at the same time (Conj), and (ii) at least one of these events occurring (Disj). We call the Conj and Disj functions accumulation functions as they accumulate scores of non-CMP links to produce a score of an individual CMP link. Finally, all individual CMP links between t1 and t2 are aggregated through what we call an aggregation function (which can be e.g. the max of N ≥ 1 numbers). Next, we define in some more detail the concepts which we just introduced in relation to CMP. Definition 1 (Conj): Conj is a function which takes N arguments (each of them in [0, 1]) and returns a result in [0, 1]. We discuss a possible implementation for it below. Definition 2 (Disj): Disj is a function which takes N arguments (each of them in [0, 1]) and returns a result in [0, 1]. We discuss possible implementations for it below. Definition 3 (score of a non-CMP link): The score of a non-CMP link between any two terms (which could be from the same ontology or not) is defined as follows: score(sij)= I if sij is an IO link, D if sij is a DM link, f(src) if sij is an SMP link which came from the source src ∈ {UMLS, FMA, WordNet}. Here IO stands for inner-ontology, DM stands for direct matching and SMP stands for source matching predictions; sij is one single non-CMP link (i.e. one single evidence); the I and D are constants (typically having the values of 1.0). Definition 4 (score of a set of non-CMP links): The score of a set of non-CMP links (score of an evidence set) is defined as follows: score(Si) = Disjmk=1(score(sik)), where Disj is the function from Definition 2, sik are non-CMP (i.e. either IO or DM or SMP) links, and the Disj is taken over all non-CMP links taking part in the evidence set Si. Definition 5 (score of an individual CMP link): The score of an individual CMP link e is defined as: score(e) = p · Conjni=1(score(Si)), where p ∈ [0, 1] is a CMP penalty constant, Conj is the function from Definition 1, and the Conj is taken over all evidence sets Si that take part in the pattern instance, which the link e originates from (note that n = 2 for the V patterns and n = 3 for the X and U patterns). Definition 6 (aggregation function): Let K be the number of all individual CMP links drawn between t1 ∈ V1 and t2 ∈ V2. An aggregation function is a known function Fagg which takes the scores of all these K individual CMP links and produces a single number PCMP (t1,t2) ∈ [0,1], which we call score of the aggregated (final) CMP link drawn between t1 and t2. As a final result from the CMP procedure, this aggre- gated CMP link is drawn between any two terms t1 and t2 for which at least one pattern (of any of the three types X, U, V) is found. The score of this link is calculated in the way shown above. V. COMPARISON OF ALTERNATIVE SCORING SCHEMES We have produced several distinct scoring schemes by varying the functions Conj, Disj and Fagg which were defined above. • Scheme #1: (1a) Conj(s1,s2) = s1s2; Conj(s1,s2,...,sN)=Conj(Conj(s1,s2,...,sN−1),sN) (1b) Disj(s1,s2) = s1 + s2 −s1s2; Disj(s1,s2,...,sN)=Disj(Disj(s1,s2,...,sN−1),sN) (1c)Fagg(s1,s2, ...,sN) = max(s1,s2, ...,sN) • Scheme #2: (2a) Conj(s1,s2) = s1s2; Conj(s1,s2,...,sN)=Conj(Conj(s1,s2,...,sN−1),sN) (2b) Disj(s1,s2) = s1 + s2 −s1s2; Disj(s1,s2,...,sN)=Disj(Disj(s1,s2,...,sN−1),sN) (2c)Fagg(s1,s2, ...,sN) = Disj(s1,s2, ...,sN) Biomath 1 (2012), 1211117, http://dx.doi.org/10.11145/j.biomath.2012.11.117 Page 4 of 6 http://dx.doi.org/10.11145/j.biomath.2012.11.117 P Petrov at al., Predicting and Scoring Links in Anatomical Ontology Mapping • Scheme #3: (3a) Conj(s1,s2) = s1s2; Conj(s1,s2,...,sN)=Conj(Conj(s1,s2,...,sN−1),sN) (3b)Disj(s1,s2)=α(s1+s2−s1s2)+(1−α)max(s1,s2); Disj(s1,s2,...,sN)=Disj(Disj(s1,s2,...,sN−1),sN) (3c)Fagg(s1,s2, ...,sN) = Disj(s1,s2, ...,sN) The Disj functions from schemes #1 and #2 are identical to the formula for calculating the probability of the union of two (and respectively N) independent events. Fagg from scoring scheme #1 corresponds to the probability of the union of two events such that one is completely dependent on the other. Fagg from scoring scheme #2 coincides with Disj from the same scoring scheme, which equals the probability of the union of two independent events. Therefore in a probabilistic model the expression s1 +s2−s1s2 is a good choice for combining two in- dependent scores, while max(s1,s2) is a good choice for combining scores when one score is completely dependent on the other. In scoring scheme #3 we design a scoring function whose values are between the values of the first two scoring functions (#3 is a linear combination of #1 and #2). The main objective behind the use of this third scor- ing function is to account for the dependencies between the knowledge sources (UMLS, FMA, WordNet) without completely ignoring the fact that, if more than one of them confirm certain prediction, that usually improves the odds that this prediction is correct. In scheme #3, α ∈ [0, 1] is a parameter of the linear combination defined in (3b). It varies depending on the knowledge source or the combination of knowledge sources, which confirm the predictions whose scores we accumulate in (3b). The α parameter acts as a buffer to prevent the score from growing too fast when adding up cumulative predictions (i.e. when the predictions being accumulated are confirmed by several knowledge sources): when α equals 0.0, the value is growing the quickest (as it should for independent scores); when α equals 1.0, the value is limited by the maximum score of the scores being accumulated. To experimentally show that the choice of Disj from (3b) is a reasonable one, we have generated a set of observations on two dependent random variables x1, x2 with Boolean (1/0 i.e. true/false) truth values, and we have confirmed that if we substitute the scores s1 and s2 in (3b) with the probabilities P(xi = true) (i = 1, 2), and α with the modulus of the correlation coefficient between the two random variables, we get a very good approximation for the probability P(z = true) of their Boolean disjunction z = (x1 or x2). VI. RESULTS AND DISCUSSION Let us consider the following two figures which il- lustrate how the scores generated by the three scoring schemes are related to each other and demonstrate the advantages of scheme #3. Figure 1. Scatter plot: scheme #1 vs schemes #2 and #3 Figure 2. Scatter plot: scheme #3 vs schemes #1 and #2 Biomath 1 (2012), 1211117, http://dx.doi.org/10.11145/j.biomath.2012.11.117 Page 5 of 6 http://dx.doi.org/10.11145/j.biomath.2012.11.117 P Petrov at al., Predicting and Scoring Links in Anatomical Ontology Mapping It can be seen on Fig.1 that the data in scheme #1 appear clustered around the configured values for knowledge source scores (and combinations of these), because there isn’t anything to account for the amount of available evidence gathered from each source (e.g. the number of patterns confirming a prediction). Compared to scheme #1, both schemes #2 and #3 scatter the clusters because the Fagg values are growing when more patterns are confirming a given prediction. As Fagg in scheme #3 is limited through the α parameter, it causes a more moderate scattering as seen on Fig. 1, while scheme #2 causes a very rapid increase. The main advantage of scheme #3 is that it allows us to control the speed at which additional patterns increase the score, while scheme #2 gives control only over the initial value of that score. Within scheme #2, when having one pattern confirming the prediction, the scores start somewhere around the configured CMP score value (defined by the penalty constant), and grow with the same speed up to 1.0. Within scheme #3 this growth can be slowed down and controlled through the α parameter. The difference between the schemes #2 and #3 can be seen on Fig. 2 in red, and it clearly shows how easily some scores approach the value 1.0 when scheme #2 is used. The Disj function from Scheme #3 also causes a softening effect on the score when there are multiple knowledge sources and algorithmic procedures (DM, SMP, CMP) confirming the prediction, because it allows us to control the speed at which the score grows and even to use the actual correlation coefficient between the distinct knowledge sources. This is not directly visible on the figures at this scale, because it largely produces local shifts in the position of the clusters and has the biggest effect on data predicted by the knowledge sources (SMP) which constitute the cluster around score=1.0. VII. CONCLUSION We presented in this paper an original algorithmic approach to inferring (predicting and scoring) cross- ontology links within automatic mapping of distinct species-specific anatomical ontologies. The full mapping procedure assumes that the auto-generated set of predic- tions will be carefully checked by a curator (a human, an anatomy expert) and his/her input will be utilized to accurately calculate the correlation coefficients between certain pairs of knowledge sources. These correlation coefficients could be used as values for the α parameters of the scoring scheme. The procedures described briefly here and detailed in [12], and the scoring schemes introduced here, are utilized in the software program AnatOM [10], [13] developed as part of our work on semi-automatic mapping and merging of anatomical ontologies. REFERENCES [1] J. de Bruijn et al., Ontology Mediation, Merging, and Aligning. In: J. Davies, R. Studer, P. Warren (Eds.), Semantic Web Technologies. Wiley, 2006, pp. 95–113. [2] J. Day-Richter, OBO Flat File Format Speci- fication, version 1.2, 2006, Available online at http://www.geneontology.org/GO.format.obo-1 2.shtml, Last accessed: 20 October 2012. [3] M. Ashburner et al., Gene ontology: tool for the unification of biology. Nature Genetics, Vol. 25(1), 2000, pp. 25–29. http://dx.doi.org/10.1038/75556 [4] B. Smith et al., The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, Vol. 25, 2007, pp. 1251–1255. http://dx.doi.org/10.1038/nbt1346 [5] O. Bodenreider, The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research, Vol. 32, 2004, pp. 267–270. http://dx.doi.org/10.1093/nar/gkh061 [6] C. Rosse, J. Mejino, A reference ontology for biomedical informatics: the Foundational Model of Anatomy. J. Biomed. Inform., Vol. 36(6), 2003, pp. 478–500. http://dx.doi.org/10.1016/j.jbi.2003.11.007 [7] G. Miller, WordNet: A Lexical Database for English. Commu- nications of the ACM, Vol. 38(11), 1995, pp. 39–41. http://dx.doi.org/10.1145/219717.219748 [8] Ch. Fellbaum, WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998. [9] Ernest A.A. van Ophuizen, Jack A.M. Leunissen, An evaluation of the performance of three semantic background knowledge sources in comparative anatomy. J. Integrative Bioinformatics, Vol. 7, 2010, pp. 124–130. [10] Peter Petrov, Nikolay Natchev, Dimitar Vassilev, Milko Kra- chounov, Maria Nisheva, Ognyan Kulev, AnatOM An intelligent software program for semi-automatic mapping and merging of anatomy ontologies. To appear in: Proceedings of the 6th International Conference on Information Systems & Grid Tech- nologies (ISGT, Sofia, 1–3 June 2012), Sofia, St. Kliment Ohridski University Press, 2012. [11] J. Euzenat, P. Shvaiko, Ontology Matching. Springer, Heidel- berg, 2007. [12] Peter Petrov, Milko Krachunov, Ernest A.A van Ophuizen, Dimitar Vassilev, An algorithmic approach to inferring cross- ontology links while mapping anatomical ontologies. To appear in Serdica Journal of Computing, ISSN 1312-6555, Vol. 6, 2012. [13] Peter Petrov, Milko Krachunov, Elena Todorovska, Dimitar Vassilev, An intelligent system approach for integrating anatom- ical ontologies. Biotechnology and Biotechnological Equipment 26(4):3173–3181, 2012 [14] http://www.nlm.nih.gov/research/umls/ – Web site of the Uni- fied Medical Language System (UMLS) by the U.S. National Library of Medicine (NLM), Last accessed: 20 October 2012. [15] http://sig.biostr.washington.edu/projects/fm/ – Web site of the Foundational Model of Anatomy (FMA) by the University of Washington, Last accessed: 20 October 2012. [16] http://wordnet.princeton.edu/ – Web site of the WordNet project by the Princeton University, Last accessed: 20 October 2012. Biomath 1 (2012), 1211117, http://dx.doi.org/10.11145/j.biomath.2012.11.117 Page 6 of 6 http://dx.doi.org/10.1038/75556 http://dx.doi.org/10.1038/nbt1346 http://dx.doi.org/10.1093/nar/gkh061 http://dx.doi.org/10.1016/j.jbi.2003.11.007 http://dx.doi.org/10.1145/219717.219748 http://dx.doi.org/10.11145/j.biomath.2012.11.117 Introduction Problem formulation Formalization of the problem Algorithmic procedures Comparison of alternative scoring schemes Results and discussion Conclusion References