INT J COMPUT COMMUN, ISSN 1841-9836
9(1):113-124, February, 2014.

A Novel Entity Type Filtering Model for Related Entity Finding

J. Zhang, Y. Qu, S. Tian

Junsan Zhang
1. College of Computer & Communication Engineering, China University of Petroleum, Qingdao 266580, P.R. China
2. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, P.R. China
zhangjunsan@upc.edu.cn

Youli Qu*, Shengfeng Tian
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, P.R. China
*Corresponding author: ylqu@bjtu.edu.cn
sftian@bjtu.edu.cn

Abstract: Entities are important information carriers in Web pages. Searchers often want a ranked list of relevant entities directly rather than a list of documents, so related entity finding (REF) is a meaningful research problem. In this paper we investigate the most important task of REF: entity ranking. To address the problem of wrong entity types in entity ranking, where some retrieved entities do not belong to the target entity type, we propose a novel entity type filtering model. In this model the target types are composed of the originally assigned type and a new type that is automatically acquired from the topic's narrative, and both are used to filter wrong-type entities. For the query, we propose a method that processes the original narrative to obtain a new query composed of noun and verb phrases. The experimental results show that our novel type filtering model gets better results than the traditional filtering model in both precision and recall. The experiments also show that our method of acquiring a new query is feasible.

Keywords: related entity finding, entity, entity ranking, type filtering.

1 Introduction

With the rapid development of the Internet, the number of web pages keeps growing and a mass of information is being produced. Search engines have become an important tool for querying information from the web in people's lives.
If a user is looking for entities that have a specific relationship to some entity, he has to scan the documents retrieved by a search engine to look for those entities. For instance, when searchers submit the query "Michael's teammates while he was racing in formula 1" [1], what they actually want is a list of related entities. The related entity finding task can meet this requirement. According to the definition of the TREC 2009 Entity track, related entity finding (REF) is: given a source entity, a relation and a target type, identify homepages of target entities that enjoy the specified relation with the source entity and that satisfy the target type constraint [1]. REF provides a new way of searching for information through entities.

Entity ranking is an important issue in REF. Two elements affect the result of entity ranking: the target entity type and the relation between the source entity and the target entity. In this paper we focus on the effect of the target entity type on entity ranking. Because wrong-type entities pollute the result of entity ranking, we try to filter them out. However, the common type filtering method is too coarse to filter wrong entities exactly. Therefore, we propose a novel type filtering model to filter wrong entities. We use Wikipedia category information as the source of entity types. We also examine the experimental results carefully and find that the novel type filtering model gets better results than the traditional filtering model in both recall and precision. For query extraction, we propose an approach that parses the narrative's syntactic structure and rewrites the query.

Copyright © 2006-2014 by CCC Publications
The paper is organized as follows: Section 2 provides an overview of related work, Section 3 describes the basic architecture of REF, Section 4 gives a detailed description of the proposed method, Section 5 applies the proposed method to a data set and analyzes the experimental results, and Section 6 draws conclusions and proposes future work.

2 Related Work

Entity retrieval originates from natural language processing, specifically information extraction (IE). The goal of IE is to find all entities of a certain class, i.e., to extract entities based on patterns that are either learned from examples or created manually [2]. Question answering (QA) lies at the intersection of natural language processing and information retrieval (IR) and combines IE and IR. It resembles REF, yet it differs from REF in two ways: (i) a QA query does not always contain an entity [3]; (ii) the REF task adds a specific relation between the target entities and the source entity [1].

As an important issue of the REF task, entity ranking began with ranking entities of a specific type, e.g. persons in expert search [4]. The task of expert finding is to find experts either by modeling an expert's knowledge through its associated documents or by collecting topic-related documents first and then modeling experts [5]. Entity ranking has since developed to rank more general types, e.g. persons, products, locations and organizations. The goal of entity ranking is to retrieve entities as answers to a query. It is primarily focused on returning a ranked list of relevant entities [6].

Our concerns are precision and recall. Type filtering can demote wrong-type entities and improve recall and precision. The novelty of our approach is twofold: we use a co-occurrence model, which is widely used to estimate the strength of association between terms, to estimate the association between the source entity and target entities, and we use a novel type filtering model to filter wrong entities.
We use Wikipedia category information as the source of entity types. We also carefully analyze the effect of using the novel filtering model and the traditional filtering model for entity ranking.

TREC ran an entity ranking track in 2009 aimed at performing entity-oriented search tasks on the web [1]. The definition of the entity track is: given a source entity, a relation and a target type, find the relevant entities. It makes use of 20 topics and looks for three types of entities (persons, products, organizations). A query topic is defined as follows [1]:

    <query>
    <num>1</num>
    <entity_name>Blackberry</entity_name>
    <entity_url>clueweb09-en0004-50-39593</entity_url>
    <target_entity>organization</target_entity>
    <narrative>Carriers that Blackberry makes phones for.</narrative>
    </query>

A general approach to the REF task is: (i) collecting text snippets from relevant documents; (ii) obtaining entities by performing named entity recognition; (iii) ranking relevant entities; (iv) finding homepages [1]. Researchers have proposed several approaches to perform the REF task. Some use different language modeling approaches in which the entity model is constructed from text snippets and the relation is used as a query [7], [8]. Y. Wu et al. [9] develop an effective approach that ranks entities by measuring the "similarities" between the supporting snippets of entities and the input query. Y. Fang et al. [10] propose a hierarchical relevance retrieval model for entity ranking, in which three levels of relevance are examined: document, passage and entity. R. Kaptein et al. [11] propose an approach that uses Wikipedia as a pivot for finding entities on the web, reducing the hard web entity ranking problem to the easier problem of Wikipedia entity ranking.

3 The Basic Architecture of REF

The basic architecture of related entity finding is shown in Figure 1.
The REF task can be divided into three main parts: (i) relevant document retrieval; (ii) candidate entity extraction and entity ranking; (iii) homepage finding. We describe the three parts below.

Figure 1: The Basic Architecture of REF (components: Query, Text Retrieval System, Relevant Documents, Entity Extraction (Using the Anchor Text of Wikipedia Pages), Retrieving Candidate Entities, Entity Ranking (Type Filtering & Relation Filtering), Ranking Entities & Documents, Homepage Finding, Relevant Entities & Homepages)

(1) Relevant document retrieval. Retrieving relevant documents is the basic component of the REF task. The first step is to build a full-text retrieval system over a corpus (here, we use ClueWeb09 Category B as the document repository). Because our computing resources are limited, we make use of "The Lemur Project" [12], which provides an online service for ClueWeb09 Category B, as our source. Secondly, we send a query to the retrieval system and generate a preliminary set of candidate answers. As the query, we choose the noun phrases and verb phrases of the narrative. We are interested in noun groups and verb groups because the noun groups often qualify the target entity and source entity in more detail and can be seen as a kind of selection criterion. By extracting the part-of-speech elements from the narrative, we get the noun phrases and verb phrases of each topic. For example, the narrative of topic19 (Entity Track09) is "Companies that John Hennessy serves on the board of". After parsing the syntactic structure, the following query is obtained: "Companies John Hennessy serves serves". To test the feasibility of acquiring the query through this approach, we use both the "pure narrative" and the "noun and verb phrases" as queries to retrieve documents in the experimental part.

(2) Entity ranking. Entity ranking is the focus of this paper.
After the relevant documents have been retrieved, the traditional next step is named entity recognition (NER), yet NER is not our emphasis in this paper. We handle NER by considering only anchor texts in Wikipedia pages as entity occurrences [13], [14]. Once we have candidate entities, we want to find the most relevant ones, so we need to rank the candidates. Two factors affect entity ranking: entity type and entity relation. Wrong-type entities and entities that do not conform to the relation between source entity and target entity pollute the ranking result. In this paper, we focus only on the issue of "filtering wrong type entities".

(3) Homepage finding. According to the definition of REF, an entity is uniquely identified by its homepage. At most three homepages and a Wikipedia page can be returned for each entity in the 2009 Entity Track. Homepage finding can be seen as a document retrieval problem that employs standard language modeling [15] to rank homepages according to the query likelihood $p(q = e \mid d)$, using the entity's name as the query. This issue is also not the emphasis of this paper.

4 Entity Ranking Model

According to the definition of REF, given a query $Q(E_s, T, R)$, the system returns a ranked list of relevant entities. In this paper, $E_s$ denotes the source entity, $E_t$ the target entity, $T$ the target type, and $R$ a relation between $E_s$ and $E_t$. We model the REF task with the conditional probability $P(E_t|Q)$. The condition in $P(E_t|Q)$ is complex, which makes the probability difficult to estimate directly.
Next we rewrite $P(E_t|Q)$ as:

\[ P(E_t|Q) = \frac{P(E_t,Q)}{P(Q)} \tag{1} \]

Since the denominator $P(Q)$ does not influence the ranking of entities, we derive the ranking formula as follows:

\begin{align}
P(E_t,Q) &= P(Q|E_t)\cdot P(E_t) \tag{2}\\
&= P(E_s,T,R|E_t)\cdot P(E_t) \notag\\
&\propto P(E_s,R|E_t)\cdot P(T|E_t)\cdot P(E_t) \tag{3}\\
&= P(E_s,R,E_t)\cdot P(T|E_t) \notag\\
&= P(R|E_s,E_t)\cdot P(E_s,E_t)\cdot P(T|E_t) \tag{4}\\
&= P(R|E_s,E_t)\cdot P(E_t|E_s)\cdot P(E_s)\cdot P(T|E_t) \tag{5}\\
&\propto P(R|E_s,E_t)\cdot P(E_t|E_s)\cdot P(T|E_t) \tag{6}
\end{align}

In (3) we assume that the type $T$ is independent of the source entity $E_s$ and the relation $R$. In (5) we assume that $P(E_s)$ is uniform, so it can be dropped without changing the ranking. The ranking task is thus converted into estimating three conditional probabilities: $P(R|E_s,E_t)$, $P(E_t|E_s)$ and $P(T|E_t)$. Since our goal is to address the problem of wrong types polluting entity ranking, we only discuss $P(E_t|E_s)\cdot P(T|E_t)$ in this paper.

(1) Co-occurrence model

We treat $P(E_t|E_s)$ as a co-occurrence problem that expresses the association between the source entity $E_s$ and the target entity $E_t$. We estimate $P(E_t|E_s)$ as follows:

\[ P(E_t|E_s) = \frac{Co(E_t,E_s)}{\sum_{E_t'} Co(E_t',E_s)} \tag{7} \]

Here $E_t'$ denotes an entity that co-occurs with the source entity $E_s$ in documents. We use two approaches to estimate $Co(E_t,E_s)$: (1) the maximum likelihood estimate and (2) the $\chi^2$ hypothesis test [13], [16].

Maximum likelihood estimate (MLE):

\[ Co_{MLE}(E_t,E_s) = \frac{C(E_t,E_s)}{C(E_s)} \tag{8} \]

where $C(E_t,E_s)$ is the number of documents in which $E_t$ and $E_s$ co-occur, and $C(E_s)$ is the number of documents in which $E_s$ occurs.

$\chi^2$ hypothesis test:

\[ Co_{\chi^2}(E_t,E_s) = \frac{N\cdot\left(C(E_t,E_s)\cdot C(\bar{E_t},\bar{E_s}) - C(E_t,\bar{E_s})\cdot C(\bar{E_t},E_s)\right)^2}{C(E_s)\cdot C(E_t)\cdot (N - C(E_t))\cdot (N - C(E_s))} \tag{9} \]

where $\bar{E_t}$ and $\bar{E_s}$ indicate that $E_t$ and $E_s$, respectively, do not appear, and $N$ is the total number of documents. For example, $C(\bar{E_t},\bar{E_s})$ is the number of documents in which neither $E_t$ nor $E_s$ appears.

(2) Entity type filtering model

The co-occurrence model produces a preliminary ranking of entities, but it cannot resolve the problem of wrong-type entities polluting the ranking result.
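The preliminary co-occurrence ranking of Eqs. (7)-(9) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: each document is modelled as a set of entity names, and the names `co_mle`, `co_chi2`, `p_target_given_source` and the toy documents are hypothetical, not taken from the authors' implementation.

```python
from typing import Callable, Dict, List, Set

# Assumption: a document is represented by the set of entity names it mentions.
Docs = List[Set[str]]


def co_mle(docs: Docs, e_t: str, e_s: str) -> float:
    """Eq. (8): C(Et, Es) / C(Es), with C(.) counted over documents."""
    c_es = sum(1 for d in docs if e_s in d)
    c_both = sum(1 for d in docs if e_s in d and e_t in d)
    return c_both / c_es if c_es else 0.0


def co_chi2(docs: Docs, e_t: str, e_s: str) -> float:
    """Eq. (9): chi-square statistic of the 2x2 document contingency table."""
    n = len(docs)
    both = sum(1 for d in docs if e_t in d and e_s in d)
    only_t = sum(1 for d in docs if e_t in d and e_s not in d)
    only_s = sum(1 for d in docs if e_t not in d and e_s in d)
    neither = n - both - only_t - only_s
    c_t, c_s = both + only_t, both + only_s
    denom = c_s * c_t * (n - c_t) * (n - c_s)
    if denom == 0:
        return 0.0  # degenerate table: treat the association as zero
    return n * (both * neither - only_t * only_s) ** 2 / denom


def p_target_given_source(
    docs: Docs, e_s: str, co: Callable[[Docs, str, str], float]
) -> Dict[str, float]:
    """Eq. (7): normalise Co(Et, Es) over all entities co-occurring with Es."""
    candidates: Set[str] = set()
    for d in docs:
        if e_s in d:
            candidates |= d - {e_s}
    scores = {e: co(docs, e, e_s) for e in candidates}
    total = sum(scores.values())
    return {e: (s / total if total else 0.0) for e, s in scores.items()}


# Toy corpus (hypothetical): "A" is the source entity.
docs = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"D"}]
ranking = p_target_given_source(docs, "A", co_mle)
```

In this toy corpus, entity "B" shares two of the three documents that mention "A", so it receives a higher $P(E_t|E_s)$ than "C" under either estimate.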
To address the problem of wrong-type entities polluting the ranking result, the traditional type filtering model estimates $P(T|E_t)$, the relation between the target entity type and the candidate entity type, as follows:

\[ P(T|E_t) = \begin{cases} 1 & \text{if } C(E_t)\cap C(T) \neq \emptyset \\ 0 & \text{otherwise} \end{cases} \tag{10} \]

Here, $C(T)$ denotes the expected target entity type and $C(E_t)$ denotes the type of the candidate entity. The former is defined in advance, while the latter is acquired from the Wikipedia category information of the candidate entity. If the two have a non-empty intersection, the probability is taken to be 1; otherwise it is 0.

Although the traditional entity filtering model can filter some wrong-type entities, it is sometimes not accurate enough. According to the definition of REF, target entities are divided into several types that are, to a certain extent, too broad. For example, in Entity Track 2009 only three target entity types are assigned to the 20 topics: person, organization and product. Yet the topics' narratives show that the exact target type of each topic is different. For example, two topics have the same target type (person) but completely different narratives: "Authors awarded an Anthony Award at Bouchercon in 2007" and "Chefs with a show on the Food Network". According to the narratives, the exact type wanted by the former is Authors, while the latter wants Chefs. Authors and chefs are both persons, yet they are two different kinds of persons. So if we can refine the target type according to the topic's narrative, wrong-type entities may be filtered more accurately. We propose a novel entity filtering model to estimate $P(T|E_t)$ as follows:

\[ Score(T) = Score(T_t) + Score(T_n) = \frac{F_{c,t}}{F_t} + \frac{F_{c,n}}{F_n} \tag{11} \]

where:
$Score(T)$ - the overall type score of a candidate entity;
$Score(T_n)$ - the type score that a candidate entity gets through the type acquired from the topic's narrative;
$Score(T_t)$ - the type score that a candidate entity gets through the topic's assigned target type;
$F_{c,t}$ - the number of category features that the candidate entity type and the topic's assigned target entity type have in common;
$F_t$ - the number of category features of the topic's assigned target entity type;
$F_{c,n}$ - the number of category features that the candidate entity type and the type acquired from the topic's narrative have in common;
$F_n$ - the number of category features of the type acquired from the topic's narrative.

To calculate $Score(T_t)$, we first get Cat(t), the category information of the target entity type assigned to a topic together with its sub-categories (one level down), and Cat(c), the category information of a candidate entity. $F_{c,t}$ and $F_t$ can then be obtained from these two sets of category information. In this paper, we use Wikipedia category information as the criterion of judgment. However, the Wikipedia category structure is not a strict hierarchy and the category assignments are imperfect [17], so the category information must be processed further. Cat(t) contains too many features to use directly. We find that many categories in Cat(t) share a common structure (starting with "something" or ending with "something"). We select the top five categories or structures that appear in Cat(t) as the selected features. For example, the selected features of the assigned target entity type "Person" are:

• 'Living People'
• Ending with 'births'
• Ending with 'deaths'
• Starting with 'People'
• Ending with 'People'

For $Score(T_n)$, we first get Cat(n), the category information of the target entity type acquired from the topic's narrative together with its sub-categories, and Cat(c), the category information of a candidate entity.
$F_{c,n}$ and $F_n$ are then obtained from the category information in Cat(n) and Cat(c). To get Cat(n), the category name must first be extracted from the narrative. We process the topics' narratives using Brill's part-of-speech tagger [18] and a noun-phrase chunker [19]. The first noun phrase in the narrative is the category that we want. For example, the narrative "Chefs with a show on the Food Network" is processed as follows:

    (ROOT
      (NP
        (NP (NNS Chefs))
        (PP (IN with) (NP (DT a) (NN show)))
        (PP (IN on) (NP (DT the) (NNP Food) (NNP Network)))))

The noun Chefs is the type extracted from the narrative. We then use the Wikipedia category information to get Cat(n). Next, the category features of Cat(n) are selected by the same process as described for Cat(t). For example, the selected features of "Chefs" are:

• 'Chefs'
• Ending with 'Chefs'
• Ending with 'Characters'
• Ending with 'births'
• 'Living People'

5 Experiment

5.1 Dataset and Evaluation Measures

In this paper, we use the ClueWeb09 Category B subset as our corpus, which includes about 50 million documents. The TREC 2009 Entity Track has three basic types: Person, Product and Organization [1]. We restrict the scope of entities to Wikipedia pages, so we drop 5 topics for which no Wikipedia pages are retrieved, leaving 15 topics. Wikipedia is an excellent structured resource of entities: the title of a page is the name of the entity, the content of the page is the representation of the entity, and each Wikipedia page is assigned to a number of categories. In our experiment, we get about 100 thousand relevant entities in total. After removing duplicates we get about 70 thousand relevant entities. We also use the DBpedia category information to get each entity's category information.
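Given an entity's category list, the type score of Eq. (11) can be illustrated with a short Python sketch. It is a sketch under assumptions, not the authors' code: features are encoded as hypothetical `(kind, text)` pairs, matching is case-insensitive by assumption, the two feature lists are copied from the "Person" and "Chefs" examples in Section 4, and the candidate's categories are made up for illustration.

```python
from typing import List, Tuple

# Assumption: a selected feature is a (kind, text) pair, where kind is
# "exact", "starts" or "ends", mirroring the feature lists in Section 4.
Feature = Tuple[str, str]

PERSON_FEATURES: List[Feature] = [
    ("exact", "Living People"),
    ("ends", "births"),
    ("ends", "deaths"),
    ("starts", "People"),
    ("ends", "People"),
]

CHEFS_FEATURES: List[Feature] = [
    ("exact", "Chefs"),
    ("ends", "Chefs"),
    ("ends", "Characters"),
    ("ends", "births"),
    ("exact", "Living People"),
]


def matches(feature: Feature, category: str) -> bool:
    """Check one category name against one feature (case-insensitive by assumption)."""
    kind, text = feature
    c, t = category.lower(), text.lower()
    if kind == "exact":
        return c == t
    if kind == "starts":
        return c.startswith(t)
    return c.endswith(t)


def shared_features(features: List[Feature], categories: List[str]) -> int:
    """F_{c,x}: number of selected features matched by at least one candidate category."""
    return sum(1 for f in features if any(matches(f, c) for c in categories))


def type_score(
    categories: List[str],
    target_features: List[Feature],
    narrative_features: List[Feature],
) -> float:
    """Eq. (11): Score(T) = F_{c,t} / F_t + F_{c,n} / F_n."""
    score_t = shared_features(target_features, categories) / len(target_features)
    score_n = shared_features(narrative_features, categories) / len(narrative_features)
    return score_t + score_n


# Hypothetical category lists for two candidate entities.
chef_like = ["Living people", "1966 births", "American television chefs"]
company_like = ["Publishing companies"]

chef_score = type_score(chef_like, PERSON_FEATURES, CHEFS_FEATURES)
company_score = type_score(company_like, PERSON_FEATURES, CHEFS_FEATURES)
```

With these made-up categories the chef-like candidate matches three of the five "Person" features and three of the five "Chefs" features, while the company-like candidate matches none, so a type filter based on Score(T) would demote the latter.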
DBpedia is a project that aims to extract structured content from the information created as part of the Wikipedia project. This structured information is then made available on the World Wide Web. DBpedia allows users to query relationships and properties associated with Wikipedia resources, including links to other related datasets.

We use precision and recall as our evaluation measures. Precision is expressed as P@10 and recall as R@N, where N is taken to be 100 and 2000.

To evaluate the effect of the different models on entity ranking, and the feasibility of acquiring the query from the original topic narrative, we divide the experiment into five steps:
(1) Retrieve relevant documents using the original topic narrative as the query.
(2) Retrieve relevant documents using the noun and verb phrases chosen from the original topic narrative as the query.
(3) Rank candidate entities using only the pure co-occurrence model.
(4) Rank candidate entities using the traditional entity type filtering method, based on the result of the pure co-occurrence model.
(5) Rank candidate entities using our novel entity type filtering model, based on the result of the pure co-occurrence model.

5.2 Experimental Result

As before, precision is reported as P@10 over the top 10 retrieved entities and recall as R@N with N = 100 and 2000. We use charts and tables to present the experimental data and show the effects of the different entity ranking methods.

(1) To test the feasibility of using the noun and verb phrases chosen from the narrative as the query, we use both the narrative and the chosen phrases as queries to retrieve candidate entities. The number of correct entities retrieved from the documents is shown in Figure 2.
(2) We use the pure co-occurrence model (MLE), the traditional type filtering model and our proposed novel type filtering model (both based on MLE) to evaluate the effect on entity ranking. The result (P@10) is shown in Figure 3.

(3) We use the pure co-occurrence model (χ2), the traditional type filtering model and our proposed novel type filtering model (both based on χ2) to evaluate the effect on entity ranking. The result (P@10) is shown in Figure 4.

(4) To observe more intuitively how the top ten ranked entities vary across methods, we take topic14 as an example and list its top ten entities in Table 1.

Figure 2: Number of correct entities using different query (x-axis: Topic; y-axis: Number of Correct Entities; series: Using Narrative, Using Noun Phrase and Verb Phrase)

Figure 3: P@10: using MLE co-occurrence model, traditional type filtering model and novel type filtering model filters entities respectively (x-axis: Topic; y-axis: P@10, MLE; series: Novel Type Filtering Model, Traditional Type Filtering Model, Pure Co-occurrence Model)

Figure 4: P@10: using χ2 co-occurrence model, traditional type filtering model and novel type filtering model filters entities respectively (x-axis: Topic; y-axis: P@10, χ2; series: Novel Type Filtering Model, Traditional Type Filtering Model, Pure Co-occurrence Model)

(5) To evaluate the experimental effect as a whole, we list the results of the different models (precision and recall averaged over all topics) in Table 2.
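The P@10 and R@N measures used throughout this section can be written down as a minimal Python sketch. The function names and the toy data are hypothetical, and the sketch assumes the common convention that P@k divides by k even when fewer than k entities are returned; the paper does not state which convention was used.

```python
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """P@k: fraction of the top-k ranked entities that are relevant (divides by k)."""
    return sum(1 for e in ranked[:k] if e in relevant) / k


def recall_at_n(ranked: List[str], relevant: Set[str], n: int) -> float:
    """R@N: fraction of all relevant entities that appear in the top-N results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:n]) & relevant) / len(relevant)


# Toy example (hypothetical): four ranked entities, three relevant ones.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "z"}
p_at_2 = precision_at_k(ranked, relevant, k=2)
r_at_4 = recall_at_n(ranked, relevant, n=4)
```

Averaging these per-topic values over all topics gives figures of the kind reported in Table 2.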
5.3 Analysis

Figure 2 shows the number of correct entities retrieved from the documents when the original narratives and the extracted phrases are used as queries. Using noun and verb phrases as queries retrieves more correct entities (an increase of 2.07%) than using the original narratives. This indicates that extracting noun and verb phrases as the query is feasible.

Figure 3 shows that, based on MLE, type filtering gives a better P@10 than the pure MLE co-occurrence model, whether the traditional or our novel type filtering model is used. Figure 4 shows that, based on χ2, type filtering likewise gives a better P@10 than the pure χ2 co-occurrence model with either filtering model. The figures show that type filtering improves P@10, and that our proposed novel type filtering model filters wrong-type entities better than the traditional type filtering model.

Table 1 shows intuitively how the top ten ranked entities of topic14 vary. Our novel type filtering model effectively removes wrong-type entities from the ranking.

As to recall, Table 2 shows the R@100 and R@2000 of the pure co-occurrence model, the traditional type filtering model and our novel type filtering model. In R@100, the novel type filtering model (based on MLE) improves on the traditional type filtering model (based on MLE) by 27.35% and on the pure MLE co-occurrence model by 54.43%. The novel type filtering model (based on χ2) improves on the traditional type filtering model (based on χ2) by 50% in R@100. The result illustrates that our novel type filtering model filters wrong-type entities better than the traditional type filtering model in R@100.

Pure Co-occurrence (MLE) | Pure Co-occurrence (χ2) | Traditional Type Filtering (MLE) | Traditional Type Filtering (χ2) | Novel Type Filtering (MLE) | Novel Type Filtering (χ2)
Bouchercon | Music | Sue Grafton | Sherlock Holmes | Sue Grafton | Lawrence Block
Book | Publisher | Agatha Christie | Sue Grafton | Ross Macdonald | Sue Grafton
Navigation | Husband | Edgar Allan Poe | Edgar Allan Poe | Edgar Allan Poe | Bill Pronzini
History | Television | Sherlock Holmes | Danny Boyle | George Washington | Lee Child
US | Crime | Robert Crais | Hercule Poirot | Agatha Christie | Marcia Muller
Son | Books | George Washington | George Washington | Robert Crais | Ross Macdonald
Organization | Performance | Hercule Poirot | Agatha Christie | Dennis Lehane | George Washington
Ant | Homicide | Stieg Larsson | Laurence Olivier | Michael Connelly | Max Allan Collins
Cher | Detective | Marcia Muller | Lawrence Block | Lee Child | Laura Lippman
Pub | Big | Ross Macdonald | Bill Pronzini | Val McDermid | Val McDermid

Table 1: The top ten ranked entities of topic14. The correct entities are indicated in bold.

Model | Co-occurrence | P@10 | R@100 | R@2000
Pure Co-occurrence Model | MLE | 0.1533 | 0.1646 | 0.4287
Pure Co-occurrence Model | χ2 | 0.0267 | 0.0438 | 0.3968
Traditional Type Filtering Model | MLE | 0.3533 | 0.1996 | 0.2674
Traditional Type Filtering Model | χ2 | 0.2600 | 0.1492 | 0.2674
Novel Type Filtering Model | MLE | 0.4667 | 0.2542 | 0.3632
Novel Type Filtering Model | χ2 | 0.3933 | 0.2238 | 0.3627

Table 2: Recall and Precision of using different type filtering model filters entities respectively.

However, the data tell a different story for R@2000. Based on MLE, the novel type filtering model is 15.28% lower than the pure MLE co-occurrence model, and the traditional type filtering model is 38.25% lower than the pure MLE co-occurrence model. Based on χ2, the novel type filtering model is 8.47% lower than the pure χ2 co-occurrence model, and the traditional type filtering model is 32.61% lower than the pure χ2 co-occurrence model. The result illustrates that the type filtering model is not accurate enough. In other words, the model may incorrectly remove some correct entities.
But it is encouraging to see that the novel type filtering model still gets a better R@2000 than the traditional type filtering model (the gap between the two corresponds to 26.38% of the novel model's R@2000 with MLE and 26.27% with χ2).

6 Conclusions and Future Works

For related entity finding, entity ranking is still an important issue. Because some entities do not conform to the required entity type and affect the ranking result, filtering wrong-type entities is essential. First, we parse the original narrative and take the noun and verb phrases as the new query. Then we use a novel type filtering model and the traditional type filtering model to filter entities. In the experiment section, we choose 15 topics whose target entity types are Person, Product and Organization as our test topics. We compare the experimental results and find that: (i) our approach to acquiring a new query is feasible; (ii) our novel type filtering model gets better results than the traditional type filtering model in both precision and recall. We also see a problem with our type filtering model: compared to the pure co-occurrence model, it reduces recall at R@2000. This indicates that some correct entities are removed incorrectly and that the model needs further improvement. In this paper we only discuss the problem of wrong entity type filtering, while wrong relations also affect the entity ranking result. As future work, we plan to investigate two issues: (i) how to optimize our type filtering model to further improve recall and precision; (ii) how to use relation filtering to further optimize the result of entity ranking.

Acknowledgement

This work is supported by the Fundamental Research Funds for the Central Universities of China (Program No.2011JBM231).

Bibliography

[1] K. Balog, A. P. de Vries, P. Serdyukov, P. Thomas, and T. Westerveld, Overview of the TREC 2009 entity track, TREC 09, 2009.
[2] E. Riloff, Automatically generating extraction patterns from untagged text, AAAI, 2:1044-1049, 1996.
[3] E. M. Voorhees, Overview of the TREC 2002 Question Answering Track, TREC 02, 115-123, 2009.
[4] K. Balog, People Search in the Enterprise, PhD thesis, University of Amsterdam, 2008.
[5] K. Balog, L. Azzopardi, and M. de Rijke, A language modeling framework for expert finding, Inf. Proc. and Man., 45(1):1-19, 2009.
[6] J. Pehcevski, J. A. Thom, A.-M. Vercoustre, and V. Naumovski, Entity ranking in Wikipedia: utilising categories, links and topic difficulty prediction, Information Retrieval, 13(5):568-600, 2010.
[7] W. Zheng, S. Gottipati, J. Jiang, and H. Fang, UDEL/SMU at TREC 2009 Entity Track, TREC 09, 2009.
[8] Q. Yang, P. Jiang, C. Zhang, and Z. Niu, Experiments on related entity finding track at TREC 2009, TREC 09, 2009.
[9] Y. Wu and H. Kashioka, NiCT at TREC 2009: Employing three models for Entity Ranking Track, TREC 09, 2009.
[10] Y. Fang et al, Entity retrieval with hierarchical relevance model, TREC 09, 2009.
[11] R. Kaptein, P. Serdyukov, A. de Vries, and J. Kamps, Entity ranking using Wikipedia as a pivot, CIKM, 2010.
[12] http://lemurproject.org.
[13] M. Bron, K. Balog, and M. de Rijke, Ranking related entities: components and analyses, CIKM, 2010.
[14] P. Serdyukov and A. de Vries, Delft university at the TREC 2009 Entity Track: Ranking wikipedia entities, TREC 09, 2009.
[15] F. Song and W. B. Croft, A general language model for information retrieval, CIKM 99, 77-82, 1999.
[16] C. D. Manning and H. Schuetze, Foundations of Statistical Natural Language Processing, The MIT Press, 1999.
[17] A. de Vries et al, Overview of the INEX 2007 Entity Ranking Track, 245-251, 2007.
[18] E. Brill, Transformation-based error-driven learning and natural language processing: a case study in part of speech tagging, Computational Linguistics, 21(4):543-565, 1995.
[19] L. Ramshaw and M. Marcus, Text Chunking Using Transformation-Based Learning, Proc. of the Third ACL Workshop on Very Large Corpora, MIT, 1995.