Evidence Based Library and Information Practice 2010, 5.4

Evidence Summary

Music Information Seeking Behaviour Poses Unique Challenges for the Design of Information Retrieval Systems

A Review of:
Lee, J. H. (2010). Analysis of user needs and information features in natural language queries seeking music information. Journal of the American Society for Information Science and Technology, 61, 1025-1045.

Reviewed by:
Cari Merkley
Librarian
Mount Royal University
Calgary, Alberta, Canada
Email: cmerkley@mtroyal.ca

Received: 1 Sept. 2010 Accepted: 25 Oct. 2010

© 2010 Merkley. This is an Open Access article distributed under the terms of the Creative Commons Attribution-Noncommercial-Share Alike License 2.5 Canada (http://creativecommons.org/licenses/by-nc-sa/2.5/ca/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly attributed, not used for commercial purposes, and, if transformed, the resulting work is redistributed under the same or similar license to this one.

Abstract

Objective – To better understand music information seeking behaviour in a real-life situation and to create a taxonomy of this behaviour that will facilitate comparison of future music information retrieval studies.

Design – Content analysis of natural language queries.

Setting – Google Answers, a fee-based online service.

Subjects – 1,705 queries, with their related answers and comments, posted in the music category of the Google Answers website before April 27, 2005.

Methods – A total of 2,208 queries were retrieved from the music category of the Google Answers service. Google Answers was a fee-based service in which users posted questions and indicated how much they were willing to pay to have them answered.
The queries selected for this study were posted prior to April 27, 2005, over a year before the service was discontinued completely. Of the 2,208 queries taken from the site, the researcher classified only 1,705 as relevant to music information seeking; the off-topic queries were excluded from the study.

Each of the 1,705 queries was coded according to the needs expressed by the user and the information the user provided to help researchers answer the question. The researcher's initial coding framework was informed by previous studies of music information retrieval to facilitate comparison, but it was expanded and revised to reflect the evidence itself. Only the questions were subjected to this iterative coding process. The author examined the answers provided by Google Answers researchers and the comments posted by other users, but did not code them for inclusion in the study.

User needs in the questions were coded for form and topic, and each question was assigned at least one form and one topic. Form refers to the type of question being asked and comprised 10 categories: identification, location, verification, recommendation, evaluation, ready reference, reproduction, description, research, and other. Reproduction in this context is defined as "questions asking for text" and most often referred to requests for song lyrics, while evaluation typically meant the user was seeking reviews of works (p. 1029). The coding framework outlined sixteen question topics: lyrics, translation, meaning (i.e., of lyrics), score, work, version, recording (e.g., where an album is available for purchase), related work, genre, artist, publisher, instrument, statistics, background (e.g., definitions), resource (i.e., sources of music information), and other.
The questions were also coded for their features, that is, the information provided by the user. The final coding framework outlined 57 features, some of which were further subdivided by attributes. Title, for example, was a feature with attributes: the researcher indicated whether the user mentioned the title of a musical work, a recording, printed material, or a related work. A single query could contain more than one feature.

Main Results – Overall, the most common music-related questions posted on the Google Answers service involved identifying works or artists, finding recordings, or retrieving lyrics. The most popular query forms were identification (43.8%), location (33.3%), and reproduction (10.9%). The most common topics were work (49.1%), artist (36.4%), recording (16.7%), and lyrics (10.4%). The features users most often provided in their questions were person name (53%), title (50.9%), date (45.6%), genre (37.2%), role (33.8%), and lyric (27.6%). Person name usually referred to an artist's name (in 95.6% of cases), and title most often referred to the title of a musical work. Another feature, place reference, appeared in 25.6% of queries; almost half of these references concerned the place where the user had encountered the music in question. Although the coding framework eventually encompassed 57 different features, a small number dominated: seven features appeared in over 25% of queries, while 33 features appeared in fewer than 10%. The seven most common features were person name, title, date, genre, role, lyric, and place reference. Lee categorized most of the queries as "known-item searches," even though users sometimes provided incorrect information and many were seeking information about a musical item rather than the item itself (p. 1035).
The author also identified "dormant searches": long-standing questions a user had about a musical item, sometimes for years, that were reawakened by hearing the song again or by other events (p. 1037). Multiple versions of musical works and users' reliance on information obtained third-hand were also identified as factors that complicate meeting music information needs correctly.

Conclusion – Although certain types of questions dominated the music queries posted on the Google Answers service, users expressed a wide variety of music information needs. In some cases, the features users provided as clues were very personal, relating to the context in which they encountered the work or the mood a particular work or artist evoked. Existing bibliographic record standards, which focus on qualities inherent in the music itself, do not adequately cover such information. The author suggests that user context should play a greater role in the testing and development of music information retrieval systems, while acknowledging the instability and variability of this type of information. In some cases, this context could include other works (film, television, etc.) in which a musical work is featured. Another potential implication for system development is the need to re-evaluate the terminology used in testing to ensure that it reflects the language users most often employ. For example, the 128 different terms used in this study to describe how a musical item made users feel did not significantly overlap with the terms used in a 2007 mood classification task conducted through MIREX, the Music Information Retrieval Evaluation Exchange.
The author also argues that although most current music information retrieval testing is task-specific (e.g., finding a particular work by humming a few bars, or finding works in a given genre), in real life users come to a search with information that is not neatly parsed into separate tasks. The study affirms the need for systems that can combine tasks or consolidate the results of separate tasks for users.

Commentary

This study reaffirms the value of evaluating information retrieval systems with data gleaned from empirical studies of users in their natural habitat. As the author rightly points out, what is particularly valuable here is that the queries were not shaped by users' interaction with a particular database or with existing bibliographic records; instead, they contained the information users thought would be most helpful in tracking down an answer. The types of questions logged may suggest that in many cases users were trying to satisfy a personal rather than an academic information need, which may limit the applicability of the results in some contexts. However, the high level at which the data are presented and the potential overlap between these spheres make this difficult to determine clearly. The Google Answers service may have attracted a particular type of music seeker, but because users expressed their questions in free form, it is a particularly rich source of data on how users articulate their information needs. The field of music information retrieval research is complex and involves experts from a variety of fields, of which information science is one (Downie, 2008).
Throughout the study, the author draws on existing work on information retrieval while clearly making the case for the unique challenges faced by those working to facilitate user access to the rich body of existing music information objects. Another source of research on this issue is the proceedings of the annual conference of the International Society for Music Information Retrieval (2010). Non-music specialists may also find the methodology valuable for answering other types of research questions. The author provides considerable detail on the coding framework created to support content analysis of the Google Answers questions, and addresses some of the advantages and challenges of using web resources as artifacts of information seeking. The author highlights the advantages of content analysis as a methodology, such as the ability to express results both numerically and qualitatively. The author also clearly addresses the representativeness of the data sample and refrains from making sweeping generalizations based on the data. Finally, the author's call for more empirical studies of user behaviour, and for less reliance on anecdotal evidence when creating information systems, will strike a chord with information professionals generally, not just those working with music.

References

Downie, J. S. (2008). The music information retrieval evaluation exchange (2005-2007): A window into music information retrieval research. Acoustical Science and Technology, 29(4), 247-255.

International Society for Music Information Retrieval. (2010). ISMIR - The International Society for Music Information Retrieval. Retrieved November 22, 2010, from http://www.ismir.net/