Evidence Based Library and Information Practice 2011, 6.4

Classic

Salton and Buckley's Landmark Research in Experimental Text Information Retrieval

A Review of:
Salton, G., & Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41(4), 288–297.

Reviewed by:
Christine F. Marton
Adjunct Instructor
Faculty of Information, University of Toronto
Toronto, Ontario, Canada
Email: christine.marton@utoronto.ca

Received: 08 May 2011    Accepted: 02 Nov. 2011

© 2011 Marton. This is an Open Access article distributed under the terms of the Creative Commons-Attribution-Noncommercial-Share Alike License 2.5 Canada (http://creativecommons.org/licenses/by-nc-sa/2.5/ca/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly attributed, not used for commercial purposes, and, if transformed, the resulting work is redistributed under the same or similar license to this one.

Abstract

Objectives – To compare the performance of the vector space model and the probabilistic weighting model of relevance feedback, for the overall purpose of determining the most useful relevance feedback procedures. The amount of improvement that can be obtained from searching several test document collections with only one feedback iteration of each relevance feedback model was measured.

Design – The experimental design consisted of 72 different tests: 2 different relevance feedback methods, each with 6 permutations, on 6 test document collections of various sizes. A residual collection method was utilized to ascertain the "true advantage provided by the relevance feedback process" (Salton & Buckley, 1990, p. 293).

Setting – Department of Computer Science at Cornell University.

Subjects – Six test document collections.
Methods – Relevance feedback is an effective technique for query modification that provides significant improvement in search performance. Relevance feedback entails both "term reweighting," the modification of term weights based on term use in retrieved relevant and non-relevant documents, and "query expansion," the addition of new terms from the relevant documents retrieved (Harman, 1992). Salton and Buckley (1990) evaluated two established relevance feedback models based on the vector space model (a spatial model) and the probabilistic model, respectively.

Harman (1992) describes the two key differences between these competing models of relevance feedback:

[The vector space model merges] document vectors and original query vectors. This automatically reweights query terms by adding the weights from the actual occurrence of those query terms in the relevant documents, and subtracting the weights of those terms occurring in the non-relevant documents. Queries are automatically expanded by adding all the terms not in the original query that are in the relevant documents and non-relevant documents. They are expanded using both positive and negative weights based on whether the terms are coming from relevant or non-relevant documents. Yet, no new terms are actually added with negative weights; the contribution of non-relevant document terms is to modify the weighting of new terms coming from relevant documents. . . . The probabilistic model . . . is based on the distribution of query terms in relevant and non-relevant documents. This is expressed as a term weight, with the rank of each retrieved document then being the sum of the term weights for terms contained in the document that match query terms. (pp. 1-2)

Second, while the vector space model "has an inherent relationship between term reweighting and query expansion" (p. 2), the probabilistic model does not.
Thus, query expansion is optional, but given its usefulness, various schemes have been proposed for expanding queries using terms from retrieved relevant documents. In the Salton and Buckley study, 3 versions of each of the two relevance feedback methods were utilized, each with two different levels of query expansion, and run on 6 different test collections, for 72 experimental runs in total. More specifically, they queried test collections that ranged in size from small to large and that represented different domains of knowledge, including medicine and engineering. Salton and Buckley examined 3 variants of the vector space model, the second and third of which were based on the first. The first model was the classic Rocchio algorithm (1971), which uses reduced document weights to modify the queries. The second model was the "Ide regular" algorithm, which reweights both relevant and non-relevant query terms (Ide, 1971). And the third model was the "Ide dec-hi" algorithm, which reweights all identified relevant items but only one retrieved non-relevant item, the one retrieved first in the initial set of search results (Ide & Salton, 1971). As well, 3 variants of the probabilistic model developed by S. E. Robertson (Robertson, 1986; Robertson & Sparck Jones, 1976; Robertson, van Rijsbergen, & Porter, 1981; Yu, Buckley, Lam, & Salton, 1983) were examined: the conventional probabilistic approach with a 0.5 adjustment factor, the adjusted probabilistic derivation with a different adjustment factor, and finally an adjusted derivation with enhanced query term weights. The 6 vector space model and probabilistic model relevance feedback techniques are described in Table 3 (p. 293). The performance of the first-iteration feedback searches was compared solely with the results of the initial searches performed with the original query statements.
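The mechanics of the three vector space variants can be sketched in a few lines. The following Python sketch is illustrative only, not Salton and Buckley's SMART implementation: queries and documents are represented as dicts of term weights, and the Rocchio coefficients (alpha, beta, gamma) are conventional textbook defaults rather than the paper's exact parameters.

```python
def add_scaled(query, doc, factor):
    """Return a copy of query with factor * doc's term weights folded in."""
    out = dict(query)
    for term, weight in doc.items():
        out[term] = out.get(term, 0.0) + factor * weight
    return out

def rocchio(query, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.25):
    """Classic Rocchio: original query plus the averaged (reduced-weight)
    relevant vectors, minus the averaged non-relevant vectors."""
    q = {t: alpha * w for t, w in query.items()}
    for d in rel_docs:
        q = add_scaled(q, d, beta / len(rel_docs))
    for d in nonrel_docs:
        q = add_scaled(q, d, -gamma / len(nonrel_docs))
    return q

def ide_regular(query, rel_docs, nonrel_docs):
    """Ide regular: add every relevant vector and subtract every
    non-relevant vector at full weight (no averaging)."""
    q = dict(query)
    for d in rel_docs:
        q = add_scaled(q, d, 1.0)
    for d in nonrel_docs:
        q = add_scaled(q, d, -1.0)
    return q

def ide_dec_hi(query, rel_docs, nonrel_docs):
    """Ide dec-hi: like Ide regular, but subtract only the single
    highest-ranked (first-retrieved) non-relevant document."""
    q = dict(query)
    for d in rel_docs:
        q = add_scaled(q, d, 1.0)
    if nonrel_docs:
        q = add_scaled(q, nonrel_docs[0], -1.0)
    return q
```

Terms absent from the original query but present in the judged documents enter the revised query automatically, which is the "inherent relationship" between reweighting and expansion noted above; in dec-hi, restricting subtraction to one non-relevant item keeps the emphasis on relevant terms.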
The first 15 documents retrieved from the initial searches were judged for relevance by the researchers, and the terms contained in these relevant and non-relevant retrieved items were used to construct the feedback queries. The authors utilized the residual collection system, which entails the removal of all items previously seen by the searcher (whether relevant or not) and the evaluation of both the initial and any subsequent queries against the reduced collection only. Both multi-valued (partial) and binary weights (1=relevant, 0=non-relevant) were used on the document terms (Table 6, p. 296). Also, two types of query expansion method were applied: expansion by the most common terms and expansion by all terms (Table 4, p. 294). While not using any query expansion and relying solely on reweighting relevant and non-relevant query terms is possible, this option was not examined. Three measures were calculated to assess relative relevance feedback performance: the rank order (recall-precision value); search precision (the average precision at the 3 particular recall points of 0.75, 0.50, and 0.25); and the percentage improvement in the 3-point precision between the feedback and original searches.

Main Results – The best results were produced by the same relevance feedback models for all test collections examined, and conversely, the poorest results were produced by the same relevance feedback models (Tables 4, 5, and 6, pp. 294-296). In other words, all 3 relevance feedback algorithms based on the vector space retrieval model outperformed the 3 relevance feedback algorithms based on the probabilistic retrieval model, with the best relevance feedback results obtained for the "Ide dec-hi" model. This finding suggests that improvements in relevance from term reweighting are attributable primarily to reweighting relevant terms.
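The 3-point precision measure can be reproduced from a ranked result list and a set of relevance judgments. A minimal sketch follows; the function and variable names are mine, and interpolated precision (the maximum precision at any recall level at or above each recall point, a standard convention) is assumed rather than taken from the paper.

```python
def three_point_precision(ranking, relevant, recall_points=(0.75, 0.50, 0.25)):
    """Average interpolated precision at fixed recall points.
    ranking:  document ids in retrieval order (best first)
    relevant: set of ids judged relevant for the query"""
    total_rel = len(relevant)
    hits, recall_precision = 0, []
    # record (recall, precision) after each retrieved document
    for i, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
        recall_precision.append((hits / total_rel, hits / i))
    points = []
    for rp in recall_points:
        # interpolated precision: best precision achieved at recall >= rp
        candidates = [p for r, p in recall_precision if r >= rp]
        points.append(max(candidates) if candidates else 0.0)
    return sum(points) / len(points)
```

The paper's percentage-improvement figure would then be `100 * (feedback_score - initial_score) / initial_score`, computed on the residual collection for both runs.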
However, the probabilistic method with adjusted derivation, specifically considering the extra weight assignments for query terms, was almost as effective as the vector space model relevance feedback algorithms. Paired comparisons between full query expansion (all terms from the initial search are utilized in the feedback query) and partial query expansion by the most common terms from the relevant items demonstrate that full expansion is better; however, the difference between expansion methods is small.

Conclusions – Relevance feedback methods that reformulate the initial query by reweighting existing query terms and adding new terms (query expansion) can greatly improve the relevance of search results after only one feedback iteration. The amount of improvement achieved was highly variable across the 6 test collections, from 50% to 150% in the 3-point precision. Other variables thought to influence relevance feedback performance were initial query length and characteristics of the collection, including the specificity of the terms in the collection, the size of the collection (number of documents), and average term frequency in documents. The authors recommend that the relevance feedback process be incorporated into operational text retrieval systems.

Commentary

Although not widely stated, it is implicitly understood that information retrieval is the foundation of evidence based practice. Evidence is obtained by searching one or more text collections of peer-reviewed journal literature and obtaining relevant articles. The goal of conducting a search is to retrieve relevant documents from a text collection. A searcher enters a query, also commonly referred to as a search statement, into the search interface of an information retrieval system (search engine), and a ranked list of search results is retrieved and presented to the searcher. The search results should meet the user's specific information need. Thus, relevance is a key concept in information retrieval.
Relevance here refers to the match, based on topicality, between the query terms entered into the search interface of the information retrieval system and the items retrieved. There must be a match between the terms in the query and the terms in the documents retrieved, with documents of highest relevance retrieved first. Techniques that improve the effectiveness of the search process are those that increase relevance (Croft, Metzler, & Strohman, 2010; Manning, Raghavan, & Schütze, 2008; Meadow, 1992; Salton & Buckley, 1990; Salton & McGill, 1983). Relevance feedback is a query reformulation technique invented in the 1960s that has demonstrated effectiveness in improving search performance by improving the correspondence between query terms and document terms. Relevance feedback algorithms or formulas are associated with retrieval models. Today, 3 retrieval models dominate: the vector space model, probabilistic models, and the widely used Boolean model. Each retrieval model is characterized by a unique ranking algorithm that produces a list of documents scored in order of highest to lowest relevance. Relevance feedback was originally developed from the vector space model created by Gerard Salton in the 1960s. All relevance feedback algorithms, irrespective of the underlying retrieval model, make use of query term reweighting and query expansion (the addition of new terms to the revised query). Both relevance feedback processes can be manual (user-driven) or automated (computer-based), the latter sometimes referred to as "pseudo-relevance" or "partially automated" feedback. Also, all relevance feedback algorithms rely on text statistics, generally the frequency of query term occurrences in individual documents in a document collection and the frequency of query terms in the document collection overall (Salton, 1968; Salton, 1971; Salton & Buckley, 1990; Croft, Metzler, & Strohman, 2010).
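The ranking behaviour described above, in which each document is scored by its term overlap with the query and results are returned best first, is commonly implemented in the vector space model as cosine similarity between term-weight vectors. The following is a generic sketch of that idea, not the specific SMART weighting schemes used in the study:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, docs):
    """Return doc ids ordered from highest to lowest relevance score."""
    scores = {doc_id: cosine(query, vec) for doc_id, vec in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

In practice the term weights would come from text statistics such as term frequency in the document and inverse document frequency across the collection; here they are taken as given.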
In their seminal article, "Improving retrieval performance by relevance feedback," Salton and Buckley (1990) conducted empirical research on the relative performance of two relevance feedback processes based on the vector space model and the probabilistic model, respectively. Since its publication in JASIS twenty years ago, their work has been cited over 400 times in the Web of Science's citation databases, which include the following citation indexes: Science Citation Index Expanded (SCI-EXPANDED); Social Sciences Citation Index (SSCI); Arts & Humanities Citation Index (A&HCI); Conference Proceedings Citation Index – Science (CPCI-S); and Conference Proceedings Citation Index – Social Science & Humanities (CPCI-SSH). There are many recent citations in the academic literature for Salton and Buckley's article, which demonstrates the ongoing importance of their research in several specialized areas of Information Retrieval (IR) research and practice. For medical librarians, novel ranking tools for PubMed, such as RankSVM (Yu et al., 2010) and MiSearch (States, Ade, Wright, Bookvich, & Athey, 2009), which reference Salton and Buckley's work, hold promise for improving the relevance of PubMed search results.
Other areas of IR research indebted to Salton and Buckley's research include: image IR (Rahman, Antani, & Thoma, 2011; Su, Huang, Yu, & Tseng, 2011; Arevalillo-Herráez, Ferri, & Moreno-Picot, 2011; Setchi, Tang, & Stankov, 2011; Setchi & Bouchard, 2010; Kwan, Gao, Guo, & Kameyama, 2010; Setchi, Tang, & Bouchard, 2009); video IR (Vallet, Hopfgartner, Jose, & Castells, 2011; Yadav & Aygün, 2009); Web IR (Kaptein & Kamps, 2011; Xu, Luo, Yu, & Xu, 2011; Hamdi, 2011; Li, Otsuka, & Kitsuregawa, 2010; Fu, 2010; Gabrilovich et al., 2009; Nauer & Toussaint, 2009; Yumoto, Mori, & Sumiya, 2009; Kuppusamy & Aghila, 2009); Web commerce (Verma, Tiwari, & Mishra, 2011); Web 2.0 RSS feed content (Teng, Liu, & Ren, 2010); and multilingual IR (He & Wu, 2011; He, Tu, Luo, & Li, 2009; Tu, He, & Luo, 2009). More broadly, Salton's vector space model continues to influence current IR research, including patent IR (Chen & Chiu, 2011); image IR (Martinet, Chiaramella, & Mulhern, 2011; Berber & Alpkrocak, 2010); TV content IR (Yu & Zhou, 2009); multilingual IR (Chew, Bader, Helmreich, Abdelali, & Verzi, 2011; Rajan, Ramalingam, Ganesan, Palanivel, & Palaniappan, 2009); Web search (Wang & Bai, 2009); and Web 2.0 blog posts, of fiction reviews in particular (Chen, Lee, Huang, & Kuo, 2010). Overall, their experimental information retrieval research paper represents a classic both because of its findings and because of its rigorous study design, which utilized the residual collection system (Salton, 1968; Salton, 1971; Salton & McGill, 1983; Salton & Buckley, 1990). While Salton and Buckley's empirical study of relevance feedback processes is regarded as a classic in the field of experimental information retrieval, several issues are evident.
First, although the authors assert that the initial search statement should be a tentative query or trial run, conducted solely for the purpose of retrieving several relevant documents from a document collection (p. 288), their study design utilizes "high-quality initial searches . . . for experimental purposes" (p. 291). Second, the focus is on topical relevance (the match between query terms and terms in documents), whereas the authors do not examine user relevance, which includes a consideration of socio-cognitive factors. Third, test collections are utilized instead of actual information retrieval systems. All of these characteristics of the study design point to the controlled environment of experimental information retrieval research, which limits its generalizability to pragmatic, real-life searches.

The study design utilized by Salton and Buckley (1990) has several shortcomings, foremost of which is a bias in favor of relevance feedback processes derived from the vector space model. This bias is manifested in several ways. First, although 6 relevance feedback processes are compared, 3 based on the vector space model and 3 based on the probabilistic model (a seemingly balanced approach to the evaluation of relevance feedback performance), results from the experimental runs for the probabilistic adjusted derivation are not presented in Tables 4 and 5 – a curious omission. Of greater importance is the supremacy of all 3 vector space model-based relevance feedback processes over all 3 probabilistic models of relevance feedback examined. The authors attribute the poorer performance of the probabilistic model-based relevance feedback processes to the indirect method of reweighting terms and the greater emphasis on non-relevant terms in the probabilistic methods.
Yet another plausible explanation points to the methods used by the authors to revise or adjust the derivation in the probabilistic model relevance feedback algorithms. It is plausible that other methods of derivation for term reweighting could result in greater success in relevance feedback performance for the probabilistic relevance feedback processes. Another contentious issue concerns the methods used for query term expansion and the use of only one feedback iteration. Harman (1992) examined different query term expansion methods with probabilistic relevance feedback processes and many feedback iterations to determine optimal relevance feedback. Her rigorous approach to experimental text retrieval using the large-scale NIST collection demonstrated that multiple feedback iterations and different query term expansion methods with relevance feedback processes based on the probabilistic model can lead to substantial query improvement, thus refuting to some extent the findings reported in Salton and Buckley's paper.

References

Arevalillo-Herráez, M., Ferri, F. J., & Moreno-Picot, S. (2011). Distance-based relevance feedback using a hybrid interactive genetic algorithm for image retrieval. Applied Soft Computing Journal, 11(2), 1782-1791. doi:10.1016/j.asoc.2010.05.022

Berber, T., & Alpkrocak, A. (2010). An extended vector space model for content-based image retrieval. In C. Peters, B. Caputo, J. Gonzalo, G. H. F. Jones, J. Kalpathy-Cramer, H. Müller, & T. Tsikrika (Eds.), Multilingual Information Access Evaluation II: Multimedia Experiments. Lecture Notes in Computer Science, 6242, 219-222.

Chen, H. W., Lee, K. R., Huang, H. H., & Kuo, Y. H. (2010). Unsupervised subjectivity-lexicon generation based on vector space model for multi-dimensional opinion analysis in blogosphere. In D.-S. Huang, Z. Zhao, V. Bevilacqua, & J. C. Figueroa (Eds.), Advanced and Intelligent Computing Theories and Applications. Lecture Notes in Computer Science, 6215, 372-379.

Chen, Y.
L., & Chiu, Y. T. (2011). An IPC-based vector space model for patent retrieval. Information Processing & Management, 47(3), 309-322. doi:10.1016/j.ipm.2010.06.001

Chew, P. A., Bader, B. W., Helmreich, S., Abdelali, A., & Verzi, S. J. (2011). An information-theoretic, vector-space model approach to cross-language information retrieval. Natural Language Engineering, 17(1), 37-70. doi:10.1017/S1351324910000185

Croft, W. B., Metzler, D., & Strohman, T. (2010). Search engines: Information retrieval in practice. Boston: Addison-Wesley.

Fu, X. (2010). Towards a model of implicit feedback for web search. Journal of the American Society for Information Science and Technology, 61(1), 30-49. doi:10.1002/asi.21198

Gabrilovich, E., Broder, A., Fontoura, M., Joshi, A., Josifovski, V., Riedel, L., & Zhang, T. (2009). Classifying search queries using the Web as a source of knowledge. ACM Transactions on the Web, 3(2), 1-28. doi:10.1145/1513876.1513877

Hamdi, M. S. (2011). SOMSE: A semantic map based meta-search engine for the purpose of web information customization. Applied Soft Computing Journal, 11(1), 1310-1321. doi:10.1016/j.asoc.2010.04.004

Harman, D. K. (1992). Relevance feedback revisited. In N. J. Belkin, P. Ingwersen, & A. M. Pejtersen (Eds.), Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM.

He, D., & Wu, D. (2011). Enhancing query translation with relevance feedback in translingual information retrieval. Information Processing and Management, 47(1), 1-17. doi:10.1016/j.ipm.2009.09.008

He, T. T., Tu, X. H., Luo, J., & Li, F. (2009). Chinese query expansion based on topic-relevant terms. Information: An International Interdisciplinary Journal, 12(2), 369-376.

Ide, E. (1971). New experiments in relevance feedback. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 337-354).
Englewood Cliffs, NJ: Prentice Hall, Inc.

Ide, E., & Salton, G. (1971). Interactive search strategies and dynamic file organization in information retrieval. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 373-393). Englewood Cliffs, NJ: Prentice Hall, Inc.

Kaptein, R., & Kamps, J. (2011). Explicit extraction of topical context. Journal of the American Society for Information Science and Technology, 62(8), 1548-1563. doi:10.1002/asi.21563

Kuppusamy, K. S., & Aghila, G. (2009). FEAST – A multistep, feedback centric, freshness oriented search engine. In 2009 IEEE International Advance Computing Conference (pp. 997-1001). doi:10.1109/IADCC.2009.4809151

Kwan, P. W., Gao, J., Guo, Y., & Kameyama, K. (2010). A learning framework for adaptive fingerprint identification using relevance feedback. International Journal of Pattern Recognition and Artificial Intelligence, 24(1), 15-38. doi:10.1142/S0218001410007841

Li, L., Otsuka, S., & Kitsuregawa, M. (2010). Finding related search engine queries by web community based query enrichment. World Wide Web, 13(1-2), 121-142. doi:10.1007/s11280-009-0077-1

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New York: Cambridge University Press.

Martinet, J., Chiaramella, Y., & Mulhern, P. (2011). A relational vector space model using an advanced weighting scheme for image retrieval. Information Processing & Management, 47(3), 391-414. doi:10.1016/j.ipm.2010.10.003

Meadow, C. T. (1992). Text information retrieval systems. San Diego: Academic Press.

Nauer, E., & Toussaint, Y. (2009). CreChainDo: An iterative and interactive web information retrieval system based on lattices. International Journal of General Systems, 38(4), 363-378. doi:10.1080/03081070902857613

Rajan, K., Ramalingam, V., Ganesan, M., Palanivel, S., & Palaniappan, B. (2009).
Automatic classification of Tamil documents using vector space model and artificial neural network. Expert Systems with Applications, 36(8), 10914-10918. doi:10.1016/j.eswa.2009.02.010

Rahman, M. M., Antani, S. K., & Thoma, G. R. (2011). A query expansion framework in image retrieval domain based on local and global analysis. Information Processing and Management, 47(5), 676-691. doi:10.1016/j.ipm.2010.12.001

Robertson, S. E. (1986). On relevance weight estimation and query expansion. Journal of Documentation, 42(3), 182-188. doi:10.1108/eb026793

Robertson, S. E., & Sparck Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3), 129-146. doi:10.1002/asi.4630270302

Robertson, S. E., van Rijsbergen, C. J., & Porter, M. F. (1981). Probabilistic models of indexing and searching. In R. N. Oddy, S. E. Robertson, C. J. van Rijsbergen, & P. W. Williams (Eds.), Information retrieval research (pp. 35-56). London: Butterworths.

Rocchio, J. J., Jr. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 313-323). Englewood Cliffs, NJ: Prentice Hall, Inc.

Salton, G. (1968). Automatic information organization and retrieval. New York: McGraw-Hill.

Salton, G. (1971). Relevance feedback and the optimization of retrieval effectiveness. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 324-336). Englewood Cliffs, NJ: Prentice Hall, Inc.

Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. New York: McGraw-Hill.

Salton, G., & Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41(4), 288-297.

Setchi, R., Tang, Q., & Stankov, I. (2011). Semantic-based information retrieval in support of concept design. Advanced Engineering Informatics, 25(2), 131-146. doi:10.1016/j.aei.2010.07.006

Setchi, R., & Bouchard, C.
(2010). In search of design inspiration: A semantic-based approach. Journal of Computing and Information Science in Engineering, 10(3), 031006. Retrieved 22 Nov. 2011 from http://scitation.aip.org/getpdf/servlet/GetPDFServlet?filetype=pdf&id=JCISB6000010000003031006000001&idtype=cvips&prog=normal

Setchi, R., Tang, Q., & Bouchard, C. (2009). Ontology-based concept indexing of images. In J. D. Velásquez & S. A. Ríos (Eds.), Knowledge-Based and Intelligent Information and Engineering Systems. Lecture Notes in Artificial Intelligence, 5711, 293-300.

States, D. J., Ade, A. S., Wright, Z. C., Bookvich, A. V., & Athey, B. D. (2009). MiSearch adaptive PubMed search tool. Bioinformatics, 25(7), 974-976. doi:10.1093/bioinformatics/btn033

Su, J.-H., Huang, W.-J., Yu, P. S., & Tseng, V. S. (2011). Efficient relevance feedback for content-based image retrieval by mining user navigation patterns. IEEE Transactions on Knowledge and Data Engineering, 23(3), 360-372. doi:10.1109/TKDE.2010.124

Teng, Z., Liu, Y., & Ren, F. (2010). Create special domain news collections through summarization and classification. IEEJ Transactions on Electrical and Electronic Engineering, 5(1), 56-61. doi:10.1002/tee.20493

Tu, X., He, T., & Luo, J. (2009). Term relevance estimation for Chinese query expansion. In IEEE 2009 International Conference on Natural Language Processing and Knowledge Engineering (pp. 139-145). Piscataway, NJ: IEEE.

Vallet, D., Hopfgartner, F., Jose, J. M., & Castells, P. (2011). Effects of usage-based feedback on video retrieval: A simulation-based study. ACM Transactions on Information Systems, 29(2). doi:10.1145/1961209.1961214

Verma, A., Tiwari, M. K., & Mishra, N. (2011). Minimizing time risk in on-line bidding: An adaptive information retrieval based approach. Expert Systems with Applications, 38(4), 3679-3689. doi:10.1016/j.eswa.2010.09.025

Wang, F. J., & Bai, Z. Y. (2009).
The Web software mining based on vector space model. In H. Zhang (Ed.), ICFIN 2009: 2009 First International Conference on Future Information Networks (pp. 275-279). Piscataway, NJ: IEEE.

Xu, Z., Luo, X., Yu, J., & Xu, W. (2011). Mining web search engines for query suggestion. Concurrency Computation: Practice and Experience, 23(10), 1101-1113. doi:10.1002/cpe.1689

Yadav, T., & Aygün, R. S. (2009). I-quest: An intelligent query structuring based on user browsing feedback for semantic retrieval of video data. Multimedia Tools and Applications, 43(2), 145-178. doi:10.1007/s11042-009-0262-3

Yu, C. T., Buckley, C., Lam, K., & Salton, G. (1983). A generalized term dependence model in information retrieval. Information Technology: Research and Development, 2, 129-154.

Yu, H., Kim, T., Oh, J., Ko, I., Kim, S., & Han, W.-S. (2010). Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS. BMC Bioinformatics, 11(Suppl. 2), 1-10. doi:10.1186/1471-2105-11-S2-S6

Yu, Z. W., & Zhou, X. S. (2009). Combining vector space model and category hierarchy model for TV content similarity measure. In W. Jiang (Ed.), MUE 2009: Third International Conference on Multimedia and Ubiquitous Engineering (pp. 130-136). Los Alamitos, CA: IEEE Computer Society.

Yumoto, T., Mori, Y., & Sumiya, K. (2009). Converting topics of user query sequences for cooperative web search. In K. Rose (Ed.), Proceedings: The Seventh International Conference on Creating, Connecting and Collaborating through Computing (pp. 121-127). Los Alamitos, CA: IEEE Computer Society.