Learning Mobile App Design From User Review Analysis
doi:10.3991/ijim.v5i3.1673
E. Platzer¹ and O. Petrovic²
¹ evolaris next level, Graz, Austria
² Karl-Franzens-University, Graz, Austria

Abstract—This paper presents a new learning environment for developers of mobile apps that merges two quite different views of the same topic. Creative design and systems engineering are core issues in the development process, yet they rest on diverging principles. The new learning environment aims to address both points of view, not by suppressing one of them but by trying to benefit from both. Content analysis of user reviews is introduced as a tool to generate information that is useful for both aspects.

Index Terms—application design; creativity tool; innovation support; user motive analysis.

I. INTRODUCTION

Engineers are normally used to having clear specifications for developing software. Whole academic disciplines such as systems engineering follow a structured process based on engineering thinking. If you look at mobile apps such as those in Apple’s AppStore or Google’s Android Market, or at the founder of Facebook, Mark Zuckerberg, you find little of that engineering thinking and feeling. The challenge is: how can we build a learning environment that gives engineers experience with real-world, highly emotional mobile apps that consumers love?

The basic consideration behind the learning environment is that app markets give us access to a broad range of apps in an easy, fast and inexpensive way. Thus, we can learn mobile app development by browsing through and experimenting with different apps. Additionally, we can use user-generated content in the form of reviews and ratings, together with download numbers, to assess user acceptance and to deduce trends.
The main aim of the learning environment currently under development is to enable engineers to explore existing mobile apps and related user-generated content in a semi-structured way, and to experience the critical success factors and current trends that lead to high user acceptance. The goal is not to build a specification robot but a learning environment for human beings.

The first part of this paper presents a conceptual framework that serves as the foundation for the technical implementation of the system, which is described subsequently. The system is then evaluated with regard to its usability for developers of mobile applications. The results of this evaluation and a brief outlook on future research activities are provided at the end of the paper.

II. CONCEPTUAL FRAMEWORK

A. State of the Art

Three potential sources of information about user requirements and ideas for new mobile apps form the basis of the conceptual framework for the proposed learning environment.

Innovation support tools are often treated as synonymous with creativity techniques, which provide more or less systematic instruments for idea generation. Broadly, there are intuitive-creative methods such as brainstorming, brainwriting or synectics, and systematic-logical approaches such as mind mapping or morphological analysis [10].

Technology acceptance research focuses on the adoption and continued use of technology. Its main concepts are the Technology Acceptance Model [6], in which “ease of use” and “usefulness” are the key constructs, and the Task Technology Fit Model [7], which suggests that the fit between the task at hand and the technology’s ability to support the user with it strongly influences the behavioral intention to use the technology. The flow construct [4] is also a well-tested factor of technology acceptance. In addition, compound models such as the Unified Theory of Acceptance and Use of Technology [12] combine parts of existing models to form a new one.
User-generated content is a phenomenon that gained importance with the fast development of Web 2.0. Users voluntarily publish their opinions on various aspects of life on the internet, in forms ranging from product reviews to blogs. The rise of social networks such as Facebook or MySpace added completely new possibilities of interaction between users who generate content on the web. The enormous amount of available content has led to initiatives such as folksonomies, which aim to provide a user-generated taxonomy of previously unstructured information.

As shown in Fig. 1, there are several approaches that combine innovation support tools, acceptance research and user-generated content, but none of them addresses all three sources. Dynamic models in technology acceptance research, such as the Dynamic Approach for Re-evaluating Technologies and the Compass Model [1], include cyclic phases of technology design followed by acceptance research and redesign of the technology. One approach to integrating end users in the innovation process is the Lead User concept [8], which is based on the assumption that certain people show pronounced needs that will become general phenomena in the future.

iJIM – Volume 5, Issue 3, July 2011 http://dx.doi.org/10.3991/ijim.v5i3.1673

Figure 1. Interaction and integration of innovation support tools, acceptance research and user-generated content

A further development of this concept is the Customer as Innovator approach [11], which enables customers to create their own products by means of a toolkit-based offer. Besides these rather market-oriented approaches, there are also more culture- or society-based concepts such as Participatory Design or Design Anthropology [3].

B. Potentials for Improvement

The three sources of information and ideas are applied at different stages of the traditional design process.
Innovation support tools assist the very first phase of idea generation, whereas technology acceptance research takes place at the earliest once a prototype has been built that users can test. User-generated content is analyzed at the end of the innovation process in order to find out what people think about the launched product or service and which changes they suggest. This procedure leaves considerable potential for improvement.

Creativity techniques do not take acceptance factors into account but rely on the designers’ ability to anticipate what users need. Acceptance research takes place when investment in infrastructure and product development has already been made, which often impedes fundamental changes to the product. Sometimes the tested products or services are already in use, and acceptance research is conducted only to understand why people use them or not, without further consequences. In the opposite case, when the product or service is not yet available to the survey respondents, another problem occurs: the interviewees’ answers are based on mere imagination instead of real experience.

The most commonly applied method of data gathering in technology acceptance research is the survey with standardized questionnaires. Standardization limits the resulting reasons for acceptance to previously defined acceptance factors, which need not cover or even include the real acceptance drivers. Moreover, acceptance factors are commonly highly aggregated constructs chosen to achieve a “good fit” of the tested model. This level of aggregation produces fuzzy constructs that are not intersubjectively comprehensible; “ease of use”, for example, the most frequently tested construct in technology acceptance research, means different things to different people.
Moreover, product life cycles in the mobile service market are quite short, and surveys on technology acceptance take time when results are needed quickly. A further limitation is that analyzing user-generated content after market launch leads to incremental improvements of the existing product or service rather than to radical innovations.

To sum up, the potentials for improvement in the traditional process are:
• There is a need to keep up with the dynamics of development in the mobile service market.
• There is a need to enhance the design relevance of the information provided.
• There is a need to provide an environment that enables radical innovations.

C. Reshaped Process

The potentials presented above can be realized if acceptance research is carried out by means of user-generated content analysis and its results are transferred into an innovation support tool that is integrated into the idea generation phase of the innovation process. This integration is possible if certain preconditions are fulfilled. Firstly, the analysis of user-generated content has to be automated in order to reduce the time and money needed until results are available. Being able to use information on acceptance factors immediately enables the designer to keep up with the dynamics of the market. Moreover, automation allows continuous monitoring of acceptance factors and thus prevents the information from becoming obsolete.

Design relevance is enhanced by providing information on the basic motivations for using mobile applications and linking them to best practice examples. Since in this framework acceptance research is done before the product or service is developed, its goals need to be redefined. Traditionally, acceptance research aims to find out why people adopt and use a certain service.
In this case, it should find out what makes people adopt and use successful mobile applications in general, and then provide examples of mobile applications for which users emphasized these causes. It is very important to ensure that the analyzed user-generated content was produced by people who actually used the mobile applications that serve as best practice examples. This enables a shift from behavioral intention to actual behavior, which makes the results more valid with respect to economic reality.

Radical innovations become possible because the information is available before the investments that would prevent fundamental changes have been made. The best practice examples can serve as a focused creativity tool. The system suggested in this paper is not a design tool that acts as a robot, but a design support tool that acts as a learning environment. Creative design is not replaced by automatically processed parameters of successful mobile applications; it is encouraged by providing basic information on acceptance factors and examples of mobile applications that were successful in practice.

III. TECHNICAL IMPLEMENTATION

A. Data Source

Apple’s AppStore is used as the data source for the prototypical implementation for several reasons. First of all, it is the most widely used platform for distributing mobile applications: in October 2010, more than 300,000 apps were available and more than 7 billion downloads had been performed. These usage numbers ensure reasonable amounts of available customer reviews for successful apps. Secondly, the AppStore allows reviews only if the reviewer has downloaded the app in question, which ensures that each review is based on real experience with the app.

The US AppStore is chosen for analysis because it is large and heavily used. Moreover, most of its customer reviews are written in English, which facilitates language processing.
Analysis is limited to the 100 most downloaded apps in each category (free, paid and grossing). This matters because only successful services should be examined, while the number of apps should still be sufficiently high. The data used for analysis are the app name; the app ID, which identifies all other information; the download rank, which measures the economic success of the app; and all customer reviews related to the app, which are mined for usage motives. The data are scraped automatically via a proxy, the relevant data are filtered, and the result is saved in XML format to facilitate further processing.

B. Data Classification

There are several options for classifying the customer reviews. Unsupervised clustering methods would provide a list of salient topics addressed in the reviews. This method is out of the question: it would not lead to a learning system that automatically matches customer reviews with acceptance factors represented by usage motives, and therefore acts as a forecasting tool of economic success, nor would it ensure the design relevance of the information provided. Another option is supervised learning, which could provide the most accurate annotation of reviews with usage motives but is too time-consuming here because the number of reviews is very high. A third approach is semi-supervised learning, which enables automated annotation after training with manually labeled data. This method is the most useful for the purpose of this research.

The first step is the manual annotation of a training data set of customer reviews with usage motives. The Reiss model [9] is a very useful model of motivation that aims to cover all possible areas of motives for any human activity. The 16 basic desires listed in Table I form a canonical list that does not need adaptation or extension as technology develops but remains valid.
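The per-app data listed in the Data Source subsection (app name, app ID, download rank and all customer reviews, saved in XML format) can be illustrated with a minimal Python sketch. The paper does not publish its XML schema, so the element and attribute names, the AppRecord class and the to_xml/from_xml helpers are hypothetical; the scraping step itself is not shown.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class AppRecord:
    """Hypothetical per-app record with the fields named in the text."""
    name: str
    app_id: str
    category: str  # "free", "paid" or "grossing"
    rank: int      # download rank within the category (1 = most downloaded)
    reviews: list = field(default_factory=list)  # raw review texts


def to_xml(record: AppRecord) -> str:
    """Serialize one app record to an XML string (schema is assumed)."""
    app = ET.Element("app", id=record.app_id, category=record.category,
                     rank=str(record.rank))
    ET.SubElement(app, "name").text = record.name
    reviews = ET.SubElement(app, "reviews")
    for text in record.reviews:
        ET.SubElement(reviews, "review").text = text
    return ET.tostring(app, encoding="unicode")


def from_xml(xml_text: str) -> AppRecord:
    """Parse a string produced by to_xml back into an AppRecord."""
    app = ET.fromstring(xml_text)
    return AppRecord(
        name=app.findtext("name", default=""),
        app_id=app.get("id"),
        category=app.get("category"),
        rank=int(app.get("rank")),
        reviews=[r.text or "" for r in app.iter("review")],
    )


record = AppRecord("Example App", "123", "free", 1, ["Great to play with friends"])
print(to_xml(record))
```

Storing category and rank as attributes lets later steps filter records without touching the review texts; this is a choice of the sketch, not a documented property of the original system.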
The training set is annotated by two independent annotators in order to ensure the intersubjectivity of the data. Only reviews that both annotators labeled identically are used for the machine learning process. These data are then annotated in GATE [5], and the precision of the machine-based annotation is evaluated on the training set. This is done by splitting the training set and comparing the annotation results of the support vector machine [2] with the manually labeled data. Support vector machines learn a classification hyperplane in the feature space from the training data, placing it so that its distance (margin) to the nearest training examples is maximal. The generalization capabilities of support vector machines are usually good and outperform those of other distance- or similarity-based learning algorithms [2]. The machine learning model is applied to all data as soon as evaluation results such as F-measures are satisfactory.

Once all reviews are annotated, the next step is to calculate the frequencies of the usage motives. These frequencies represent the proportional importance of the usage motives as addressed in the reviews.

TABLE I. 16 BASIC DESIRES OF REISS MODEL OF MOTIVATION

Motive name | Motive | Intrinsic feeling
Power | Desire to influence (including leadership; related to mastery) | Efficacy
Curiosity | Desire for knowledge | Wonder
Independence | Desire to be autonomous | Freedom
Status | Desire for social standing (including desire for attention) | Self-importance
Social Contact | Desire for peer companionship (desire to play) | Fun
Vengeance | Desire to get even (including desire to compete, to win) | Vindication
Honor | Desire to obey a traditional moral code | Loyalty
Idealism | Desire to improve society (including altruism, justice) | Compassion
Physical exercise | Desire to exercise muscles | Vitality
Romance | Desire for sex (including courting) | Lust
Family | Desire to raise own children | Love
Order | Desire to organize (including desire for ritual) | Stability
Eating | Desire to eat | Satiation (avoidance of hunger)
Acceptance | Desire for approval | Self-confidence
Tranquility | Desire to avoid anxiety, fear | Safe, relaxed
Saving | Desire to collect, value of frugality | Ownership

C. Data Interpretation

Developers of mobile apps are provided with several forms of data interpretation. Firstly, they get a ranking of the usage motives that are currently important. The motives are ordered by their frequency within the analyzed reviews, and their proportional importance relative to the other motives is displayed. As the system is planned to serve as a continuous learning environment, it is also possible to compute changes in the motive structure over time.

Best practice apps are available for each motive. Best practice means that these apps address the motive best, as indicated by a disproportionately high frequency of the motive in question within the reviews related to the app.

Another functionality of the system is that selected apps can be monitored and analyzed in comparison to the most successful apps.
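The ranking of usage motives, the selection of best practice apps and the comparison of a selected app with the top apps could be computed roughly as in the following Python sketch. It assumes every review already carries one motive label. The paper does not define how a “disproportionately high frequency” is measured, so the lift ratio used here (a motive’s share within one app’s reviews divided by its share across all apps) and all function names are assumptions.

```python
from collections import Counter


def motive_shares(labels):
    """Relative frequency of each motive among annotated reviews.

    `labels` holds one motive name per annotated review; reviews without
    a motive are assumed to have been dropped beforehand.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {motive: n / total for motive, n in counts.items()} if total else {}


def ranking(shares):
    """Motives ordered from most to least frequent (ties broken by name)."""
    return sorted(shares, key=lambda m: (-shares[m], m))


def best_practice(app_labels, motive, top_n=5):
    """Apps whose reviews address `motive` disproportionately often.

    `app_labels` maps an app ID to the motive labels of its reviews.
    Lift = share of the motive within one app / share across all apps;
    only apps with lift > 1 qualify (an assumed criterion).
    """
    all_labels = [label for labels in app_labels.values() for label in labels]
    overall = motive_shares(all_labels).get(motive, 0.0)
    if overall == 0.0:
        return []
    lift = {app: motive_shares(labels).get(motive, 0.0) / overall
            for app, labels in app_labels.items()}
    candidates = [app for app in lift if lift[app] > 1.0]
    return sorted(candidates, key=lambda a: (-lift[a], a))[:top_n]


def compare(app_shares, top_shares):
    """Per-motive difference: selected app share minus top-app share."""
    motives = set(app_shares) | set(top_shares)
    return {m: app_shares.get(m, 0.0) - top_shares.get(m, 0.0) for m in motives}
```

For example, if two of an app’s three labeled reviews mention Social Contact while that motive accounts for a quarter of all labels, the app’s lift is (2/3) / 0.25, or about 2.7. Positive values from compare mark over-represented motives and negative values mark shortcomings, matching the kind of comparison shown on the planned comparison screen.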
The motives addressed in reviews of the selected app and those addressed in all successful apps are juxtaposed, and the differences are calculated.

In addition to annotating usage motives, the system will learn a machine learning model that matches customer reviews with the download ranks contained in the XML files extracted from the AppStore. This second model allows forecasting the economic success of new apps by means of download rank prognoses, which are computed by means of probabilistic heuristics.

Figure 2. Procedure of the feasibility study including results of each step.

D. Feasibility Study

A prototypical exemplary application of the system was developed in order to test the general feasibility of automated motive-based content analysis of user reviews. This was done in the six-step process depicted in Fig. 2.

The first step was scraping the reviews of the top apps, taken as a snapshot at a given moment. The 277,345 reviews of the top 100 free apps, the top 100 paid apps and the top 100 grossing apps were then transformed into XML files including the required metadata. To ensure balance in further processing, each file contained at most 200 reviews. This process resulted in 1,588 XML files, which were then reduced by removing the duplicates that occurred because one app can be a top app in more than one category.

After that, the remaining 1,132 XML files were preprocessed for the machine learning tasks; finite state transducers were used to tokenize the text. Sixty files containing 9,510 reviews were randomly chosen from the 1,132 files and served as the training data set. After discussing the meaning of the 16 motives in the context of mobile apps, two annotators tried to annotate each of the 9,510 reviews manually with the one motive that was salient in the text.
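The file preparation in the feasibility study (splitting each app’s reviews into XML files of at most 200 reviews, removing duplicates of apps that rank in several categories, and drawing 60 files at random as training data) can be sketched in Python as below. The function names and the fixed random seed are illustrative assumptions. Unlike the described process, which first created 1,588 files and then removed the duplicates, this sketch skips duplicate apps before chunking, which should yield the same set of files.

```python
import random


def chunk_reviews(app_id, reviews, max_per_file=200):
    """Split one app's reviews into consecutive chunks of at most max_per_file."""
    return [(app_id, reviews[i:i + max_per_file])
            for i in range(0, len(reviews), max_per_file)]


def build_files(top_lists, reviews_by_app, max_per_file=200):
    """Create review chunks for all top apps, counting each app only once.

    `top_lists` maps a category ("free", "paid", "grossing") to a ranked
    list of app IDs; `reviews_by_app` maps an app ID to its review texts.
    """
    seen = set()
    files = []
    for app_ids in top_lists.values():
        for app_id in app_ids:
            if app_id in seen:
                continue  # doublet: the app is a top app in another category too
            seen.add(app_id)
            files.extend(chunk_reviews(app_id, reviews_by_app.get(app_id, []),
                                       max_per_file))
    return files


def sample_training_files(files, k=60, seed=0):
    """Draw k files at random, without replacement, as the training set."""
    return random.Random(seed).sample(files, min(k, len(files)))
```

Capping each file at 200 reviews keeps very popular apps from dominating a randomly drawn training sample, which appears to be the balance the text refers to.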
During manual annotation, there was also the option of assigning none of the motives, either because no motive was identifiable or because several motives were mentioned and it was impossible to tell which one was dominant. The manual annotations were then compared, and 3,431 matching annotations were found. This represents about one third of the total sample (3,431 of 9,510 reviews, roughly 36%). As the training data set was randomly chosen, this suggests that intersubjectively comparable motives can be identified in about one third of all reviews.

The manually annotated reviews were then used to train the learning model. Several engines were tested in order to find the most powerful one: besides a support vector machine, a Naïve Bayes classifier, a C4.5 decision tree and a k-nearest neighbor classifier were computed for comparison. As expected given the characteristics of the data, the support vector machine provided results superior to those of the other standard algorithms. Unigrams were used as features for computing the kernels. For the review classification task, the multiclass problem of 15 motive classes (“idealism” was not present in the sample) was transformed into several binary problems that the system could compute. The threshold probability for classification was set to 0.4. This level was considered sufficiently high to keep classification results meaningful and sufficiently low to obtain a satisfactory number of classified instances. The motive was the classification target for each review instance.

A hold-out test, in which the training data set is split into two parts, was carried out to evaluate the machine learning model. A new model was learned from only two thirds of the training data and then applied to the remaining third, and the results of the automated annotation were compared with those of the earlier manual annotation. The overall performance of the learning model, measured as the F1 measure, was 0.67.
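The evaluation steps described above (keeping only reviews on which both annotators agree, the two-thirds/one-third hold-out split, the one-vs-rest decision with a 0.4 probability threshold, and the F1 measure) can be sketched in plain Python as follows. The classifiers themselves (SVMs in GATE in the paper) are not reproduced; the sketch assumes each binary motive-vs-rest classifier already returns a probability. The paper does not state how “none” agreements or unclassified reviews enter the computation; here agreements on “none” are dropped, and unclassified reviews lower recall but not precision. Both are assumptions.

```python
import random


def agreed_examples(texts, labels_a, labels_b):
    """Keep reviews where both annotators chose the same motive (None = no motive)."""
    return [(t, a) for t, a, b in zip(texts, labels_a, labels_b)
            if a is not None and a == b]


def holdout_split(examples, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of the examples and split it into training and test parts."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def one_vs_rest_decision(class_probs, threshold=0.4):
    """Return the motive whose binary classifier is most confident.

    `class_probs` maps each motive to the probability from its own
    motive-vs-rest classifier; below the threshold the review stays
    unclassified (None).
    """
    motive, p = max(class_probs.items(), key=lambda kv: kv[1])
    return motive if p >= threshold else None


def f1_score(gold, predicted):
    """F1 = 2PR / (P + R); None predictions count against recall only."""
    assigned = [(g, p) for g, p in zip(gold, predicted) if p is not None]
    correct = sum(1 for g, p in assigned if g == p)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With this reading, a lower threshold classifies more reviews (raising recall) at the risk of more wrong labels (lowering precision), which is the trade-off the choice of 0.4 is meant to balance.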
This is considered sufficient to conclude that meaningful classification results concerning motives can be obtained when analyzing the content of customer reviews in the AppStore. To double-check the meaningfulness of the resulting annotations, the learning model was applied to some of the remaining XML files that had not been annotated by hand, and the annotations suggested by the system were then checked manually. In general, the tested annotations proved meaningful and comprehensible. The concept of automated motive-based user review content analysis is therefore considered generally feasible.

IV. EVALUATION

A. Methodology

The evaluation of the presented system is carried out in cycles. This first evaluation of design relevance is intended to provide information for the further development of the system itself and of its technical implementation. In a later evaluation cycle, the design relevance and usability of the system will be tested in a field study with more experts. An expert-based qualitative approach was chosen because it yields more in-depth information. As the system is not yet fully implemented, “scribbles” were used for the evaluation: draft-like virtual screens of the results the system will provide. The system was presented to three app developers from different areas of development (creative system design, technical implementation and user interface design) in the form of the drafted screens depicted in Figs. 3, 4, 5 and 6. They were then interviewed separately about their perceptions of design relevance and usefulness, and about their suggestions for further improvement.

Fig. 3 shows a fictitious pie chart of the usage motives addressed in the customer reviews. Fig. 4 depicts the variation of these relative usage motives over time. Fig. 5 presents the planned functionality of the system to compare a certain app with the top apps; the app used as an example shows shortcomings concerning major motives, whereas minor motives are over-represented. Finally, Fig. 6 gives an example of the “best practice” section: five apps are presented that addressed the motive in question best. A link to the AppStore enables immediate download of the app, which is meant to initiate a creative learning process.

Figure 3. Screen 1: Relative usage motives in top apps (top 100 paid, top 100 free and top 100 grossing)

B. Results

The review analysis approach is regarded as more useful and more design-relevant than questioning, because the reviews are “closer to reality”, reflect what “the user really experiences”, and ensure that the respondent is “really interested in the product”. The results that acceptance research could provide (the Technology Acceptance Model [6] and the Task Technology Fit Model [7] were presented as the most frequently used models) would also be helpful if they were available when idea generation takes place. Moreover, the information should be provided at a more detailed level (e.g., how to achieve “ease of use”).

Screen 1 was regarded as useful for idea generation and the optimization of existing apps. The information offered gives “a direction for one’s design objectives”. It was not considered relevant for user interface design, but rather for developing one’s own ideas. One expert emphasized that “one can see at a glance what the world doesn’t need”.

The advantages of screen 2 lie in its trend depiction, as a designer can derive the future importance of motives from their past development. It is expected to be very useful for idea generation, where there “has always been a lack of such data”.

The experts did not consider screen 3 to be as useful as the previous two screens.
The comparison with all top apps is not design-relevant if one’s own app is intended to be a niche product. It would be more useful if successful apps with a similar usage motive structure were provided. One expedient use case of the comparison is evaluating target achievement, i.e., comparing the motives an app was intended to address with those it actually addressed.

Screen 4 was regarded as most useful for graphic design and feature design. One expert named this screen the most useful functionality of the presented system, as it really allows learning from the best practice examples. The experts reported that they usually try to find apps similar to the one they want to design, and would be more than happy to get a thorough report on them without further research. The criticism raised against this feature was that all apps are built more or less the same way.

C. Discussion

As the results of this first evaluation cycle show that each presented feature was regarded as useful and design-relevant for at least one aspect of mobile application development, it is reasonable to retain the presented forms of data interpretation and representation. All four functionalities will be implemented in the technical solution. The presentation will be strongly content-oriented, in line with the experts’ requirements: rather than focusing on the graphical interface, the information will be provided in a purist design that does not influence the creative app design process too much.

In the course of the interviews, two experts mentioned a strong need for a kind of “price finding support tool” that could be implemented in the final system as an additional functionality. Such a tool could be “worth its weight in gold”, as app developers often experience a good app failing because of wrong pricing.
The technical implementation could take the form of an additional machine learning model similar to the rank prognosis model, in which rank is forecast from the reviews and realized ranks of the top apps. If the prices of the top apps are added as additional information, it would be possible to train a machine learning model that connects customer reviews and prices of top apps and then suggests a price for a new app based on its reviews. A difficulty with this plan is that it will be hard to obtain customer reviews for the actual app before a price is set. It could harm the success of the app to set the price to zero until enough customer reviews exist to compute an optimized price, and then to raise the price without added value for users.

V. OUTLOOK

The next steps in the research process include the technical implementation of the system at a ready-to-use level. Once this is done, it will be possible to evaluate the usefulness of the system in practical use.

To develop the system further, it will be necessary to evaluate its accuracy over time. The functional test of the system in the feasibility study was executed in a single run. To find out whether the system can keep up with the dynamic changes of the data base, it will be useful to evaluate it over the long term: the learning model is applied to updated AppStore data at regular intervals, and the results of automated annotation are compared with additional manually annotated reviews. This comparison could uncover a decline in accuracy over time, in which case it might be useful to implement active learning elements. Accuracy can decrease when the text characteristics that indicate classes change suddenly rather than gradually.

Figure 4. Screen 2: Usage motive trends over a period of one year

Figure 5. Screen 3: Comparison of usage motives between top apps and a certain selected app

Figure 6. Screen 4: Example for presentation of best practice apps

So far, the system’s usefulness has been evaluated by means of descriptive methods. It is necessary to continue the evaluation and include experimental and observational methods, which will be possible as soon as the system is ready for practical use in app development processes.

At the moment, the system is implemented as a semi-automated prototype and trained to classify reviews according to the motives of the motivational model by Reiss. The system can be trained on other models at any time by means of manual annotation.

Moreover, it will be interesting to observe, in a long-term study, the economic success of mobile applications developed with the support of the learning environment presented in this paper. This further evaluation can provide deeper insights into the system’s usefulness in practice. An accompanying usability study with developers of mobile applications could support the further development of the learning environment.

Another focus of future research will be the applicability of the system to data sources other than the AppStore, or even to other fields of products or services. The functionalities of the presented system are not bound to the mobile service market, and the generalizability of the system will be tested in selected areas.

REFERENCES

[1] M. Amberg, M. Hirschmeier, J. Wehrmann, “The compass acceptance model for the analysis and evaluation of mobile services”, International Journal of Mobile Communications 2(3), pp. 248-259, 2004.
[2] Y. Li, K. Bontcheva, H. Cunningham, “Adapting SVM for data sparseness and imbalance: a case study in information extraction”, Natural Language Engineering 15(2), pp. 241-271, 2008. doi:10.1017/S1351324908004968
[3] J. Buur and B. Matthews, “Participatory innovation”, International Journal of Innovation Management 12(3), pp. 255-273, 2008. doi:10.1142/S1363919608001996
[4] M. Csikszentmihalyi, Das Flow-Erlebnis: Jenseits von Angst und Langeweile im Tun aufgehen (The Flow Experience: Being carried away with action beyond fear and boredom), Stuttgart: Klett-Cotta, 1987.
[5] H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, “GATE: A framework and graphical development environment for robust NLP tools and applications”, in Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02), Philadelphia, July 2002.
[6] F. Davis, A Technology Acceptance Model for Empirically Testing New End-User Information Systems, Massachusetts Institute of Technology, Sloan School of Management Thesis, 1985.
[7] D. L. Goodhue, R. L. Thompson, “Task-technology fit and individual performance”, MIS Quarterly 19, pp. 213-236, 1995. doi:10.2307/249689
[8] E. v. Hippel, “Lead Users: A Source of Novel Product Concepts”, Management Science 32(7), pp. 791-805, 1986. doi:10.1287/mnsc.32.7.791
[9] S. Reiss, “Multifaceted Nature of Intrinsic Motivation: The Theory of 16 Basic Desires”, Review of General Psychology 8(3), pp. 179-193, 2004. doi:10.1037/1089-2680.8.3.179
[10] G. Steiner, “Kreativitätsmanagement: Durch Kreativität zur Innovation (Creativity Management: To Innovation via Creativity)”, in H. Strebel, Ed., Innovations- und Technologiemanagement (Innovation and Technology Management), Vienna: WUV Universitätsverlag, pp. 265-325, 2003.
[11] S. Thomke and E. v. Hippel, “Customers as innovators: a new way to create value”, Harvard Business Review 80(4), pp. 74-81, 2002.
[12] V. Venkatesh, “User acceptance of information technology: toward a unified view”, MIS Quarterly 27, pp. 425-478, 2003.

AUTHORS

E. Platzer was with the Department of Information Science and Information Systems at Karl-Franzens-University, Graz, Austria. She is now with evolaris next level (e-mail: elisabeth.platzer@evolaris.net).

O. Petrovic is with the Department of Information Science and Information Systems at Karl-Franzens-University, Graz, Austria (e-mail: otto.petrovic@uni-graz.at).

Manuscript received May 13th, 2011. Published as submitted by the authors June 9th, 2011.