PAPER AFFORDANCES IN MOBILE AUGMENTED REALITY APPLICATIONS

Affordances in Mobile Augmented Reality Applications
http://dx.doi.org/10.3991/ijim.v8i4.4051
Tor Gjøsæter
University of Bergen, Norway

Abstract: This paper explores the affordances of augmented reality content in a mobile augmented reality application. A user study was conducted by performing a multi-camera video recording of seven think aloud sessions. The think aloud sessions consisted of individual users performing tasks, exploring and experiencing ARad, a mobile augmented reality (MAR) application we developed for the iOS platform. We discuss the instrumental affordances we observed when users interacted with augmented reality content, as well as more complex affordances arising from the conventions of media content, AR and the traditional WIMP paradigm. We find that remediation of traditional newspaper content through the MAR medium can provide engaging, pleasing and exciting user experiences. However, some of the content still suffers from being shoveled onto the MAR platform without being properly adapted. Finally, we discuss what content was most successfully mediated to the user and how the content impacts the user experience.

Index Terms: augmented reality, mobile computing, human computer interaction

I. INTRODUCTION

Mobile augmented reality has become increasingly viable through less expensive (and recently free) software development kits (SDKs) and the ever-increasing technical capabilities of mobile devices. Wagner et al [1] pioneered AR on handheld devices in the early 2000s with “…the first completely stand-alone Augmented Reality client on a PDA” (p. 2). Since then, stand-alone AR has gradually become more common as the computing power of PDAs and smartphones has increased.
A mobile augmented reality (MAR) or handheld mobile augmented reality (HMAR) application consists, in its most basic form, of a mobile device with a display to show an augmentation, together with registration technology and software that "...have the following three characteristics: 1) Combines real and virtual 2) Interactive in real time 3) Registered in 3-D" [2]. With the smartphone revolution as a backdrop, we decided to start developing a MAR application for distributing and presenting AR content in Verdens Gang, a newspaper with a large circulation in Norway. The software we developed in collaboration with the newspaper iterated from concept and prototype to the stable release now available on the App Store. Through the iterations, we saw increasing requests for reuse of traditional media content the newspaper already had in its possession. These requests resulted in a final version of the application that supports remediation of content on markers in a newspaper. The term remediation was coined in Bolter et al's [3] book “Remediation: Understanding New Media”. Remediation can be understood as taking existing content (pictures, videos, etc.) adapted to a specific media outlet, for instance a printed newspaper, and presenting it through a different medium, in our case an augmented reality application. Through a video-based user study we have uncovered how remediation of traditional newspaper content has shaped our MAR application and how different affordances of the content itself affect the user experience of the application.

II. RESEARCHING MAR

Augmented reality as a term is broadly interpreted and to some degree diluted across media and academia alike. In this paper, we use Azuma’s [2] definition of AR mentioned in the previous section to position this work. We can with confidence claim that the ARad application augments reality, as it incorporates Azuma’s three characterizing properties.
In this paper, we focus on the interplay between the augmentation (the content) and how the user experiences the remediated content in our application. This paper contributes to furthering the understanding of AR as a tool in a mobile multimedia context. Additionally, it provides a detailed overview of the design of our application and provides design guidelines theoretically aligned with affordances in human computer interaction (HCI).

III. MAR AS A MEDIUM

Bolter and Grusin [3] see augmented reality as a hypermediated visual space and describe "...the insistence that everything that technology can present must be presented at one time - this is the logic of hypermediacy." This definition is fitting since augmented reality seeks to enable us to see the world how we want it, with just the right amount of information. Macintyre et al [4] argue strongly for AR in general being a new media experience, mainly because its "...fluid blend of the physical and the virtual, and the inevitable tension between them, offers rich dramatic possibilities that are impossible in any other medium." In Interactions [5], Bolter et al argue that the field of media studies is not responsible for providing design guidelines for the HCI community. However, as content and interaction are becoming increasingly intertwined, one might look to polyaesthetics [6] to shed light on aspects of new media. Polyaesthetics suggests that media studies should be concerned with contributing knowledge about the aesthetics of content, since the interaction between content and interface creates the user experience. We believe that content in relation to the interface provides the complete user experience. Poor content in an application with an excellent user interface delivers a poor user experience, and vice versa. This is paramount in the ARad application, as the content to some extent becomes the user interface itself.

iJIM ‒ Volume 8, Issue 4, 2014
AR technology promotes the ideal of a pure and fully transparent user experience, where the interaction with the medium is transparent to the user and just the experience of the content remains. Although some transparency is achievable on a handheld device, Rosenblum et al [7] believe “…if AR is to realize its full potential, hand-held form factors, despite much of the hype they are receiving now, simply are not adequate.” (p. 445) We agree with Rosenblum to some degree. The handheld form factor can never create the same immersion that a totally transparent AR solution can. On the other hand, even with all its hype, MAR has utility until this paradigm shifts. Pavlik et al [8] give an overview of the emergence of mobile AR in journalism in general. Content creators for the applications they identified (Aurasma, Layar, Wikitude World Browser, Junaio and Blippar) may benefit from the findings in this paper. In Virtual Realities, Schmalstieg et al [9] discuss the first mobile augmented reality ad, developed in 2007 by HIT Lab NZ for the Wellington Zoo. The application delivered a single static 3D model of a zoo animal augmented on a printed marker in the newspaper. They report that for mobile AR advertisement the "…most challenging aspects have been the content creation and application distribution, not the application programming." In addition, Schmalstieg et al point out, "Most of the published AR research has been on enabling technologies (tracking or displays, etc.), or on experimental prototype applications, but there has been little user evaluation of AR interfaces." [9] In this paper we hope to contribute further to a field where there has “…to date been very little user-based experimentation in augmented reality.” [10, p. 3] Similarly, Zhou et al [11] call for more practical applications.
In a more recent and extensive literature review, De Sá [12] concludes that “…despite the appeal and the growing number of services and applications, very few guidelines, design techniques and evaluation methods have been presented in the existing literature.” The motivation for this paper comes from a wish to understand the nature of MAR applications, specifically the affordances in these applications. Accordingly, we aim to describe affordances and specify a set of design guidelines one may apply when remediating content on the mobile augmented reality platform.

IV. AFFORDANCES IN AR APPLICATIONS

Our application augments traditional media with virtual content. It enables us to create a new user experience for readers consuming traditional print media. By identifying how our MAR application affords interaction with this content, we seek to contribute prescriptive knowledge in the form of design guidelines intended for practitioners and academics alike in the field of interactive augmented reality mobile technologies. To provide meaningful prescriptive knowledge about affordances we have adopted the sociocultural perspective of Kaptelinin et al [13]. This view, and its borrowing of the “web of mediators” from Bødker [14], allows us to look at affordances in AR with respect to the cultural aspect. MAR is different from other media channels, and the content chosen for remediation needs to be adapted to the AR interaction paradigm. Previous experience with different content types needs to be taken into account when designing user experiences for MAR. This requires an understanding of the combination of known usability conventions and conventions of aesthetics directly related to the content we are presenting. In this paper, we aim to identify how we can successfully present the content in an engaging manner on MAR. The framework proposed by Kaptelinin et al [13] allows us to look at affordances from an individual perspective with regard to culture.
In their literature review, Kaptelinin et al find that "A growing number of studies in HCI and related areas call for re-defining the notion of affordances to include social and cultural aspects of human interaction with the world." [13] Their approach is named the mediated action perspective and "is concerned with how humans act in their cultural environments, rather than with how animals act in their natural habitats." [13] If we allow the discussion about affordances in HCI to include these aspects, we can study IT-artifacts with the cultural aspect in mind. This is fruitful in this study since we are creating an artifact that relies on existing conventions from a wide range of interaction and content paradigms. These include the WIMP paradigm, the AR interaction technique, existing media content and smartphone conventions. Drawing on Bødker [14], they identify auxiliary affordances that take into account "complex relations within webs of mediation". That is, there is often a need to perform indirectly or directly related actions to achieve an outcome. They identify maintenance affordances, aggregation affordances and learning affordances as examples of affordances technology may employ to achieve its intended purpose. In this study, we will describe the instrumental affordances of our application. Further, we look at the complex affordances that arise when bringing culture into our understanding of applications:
• Learning affordances: What steps do the users go through to learn how to interact with an AR application, and how might we improve these affordances?
• Maintenance affordances: What needs to be taken care of to allow applications to function as effective mediators?
• Aggregation affordances: How is the application intertwined with analog and digital artifacts to achieve its outcome? Aggregation affordances illustrate the fact that some applications must be combined with other artifacts to achieve their intended purpose.
AR's character is to use the environment to provide augmentations, and in this regard we aim to identify the essential aggregation affordances of a MAR application.

In addition, we propose a domain-specific affordance:
• Remediation affordances: This affordance encapsulates the affordances associated with the content, specifically how we re-communicate the existing affordances of content through an interface. In our case, it deals with bringing the content from a known interface to a different medium. The act of remediation creates a number of boundaries as well as new possibilities. The aim is to identify successful remediation affordances for a given medium or interface. This is at the heart of mediated action: taking the culturally bound into account when designing.

http://www.i-jim.org

V. DEVELOPMENT OF ARAD

To address the problem formulated previously, we developed several proof-of-concept prototypes on a variety of mobile platforms (Symbian, Windows Mobile and iOS), starting in fall 2010. An iterative process with several prototypes led to the application now freely available in the App Store. In the following sections, we describe the rationale behind the design choices we transferred to the interface, structure and content of ARad.

A. The structure and interface of ARad

ARad is intended to be a MAR application that enables users to seamlessly experience AR content in print media. The interface should be intuitive and enable the users to move rapidly between the different content types. ARad achieves this through a considerable amount of prefetching of data in load screens. Rather than loading content for each marker, the user can download a collection of content from one content provider and experience it in a newspaper or any other printed medium by turning the page.
ARad is developed as a content browser; this means that we provide a range of content providers or campaigns represented as icons on the first screen. The application architecture is straightforward. When ARad starts up, it checks for new content and content providers. It loads new content providers and links touchable icons to content. When the user touches an icon, ARad downloads the corresponding content and loads the media assets into memory while the user sees a loading screen. When the user has touched the play icon, a notification screen instructs the user to direct the phone towards a marker. When the user points at a marker associated with the content, the augmentation appears. The user can leave this view by aiming the device at something untrackable and pressing an X icon. They then return to the screen representing the different types of content and may choose content from a different content provider (Figure 1). We believe this structure supports the perception of ARad as a content browser for different types of print media, categorized as either campaigns or content providers. Campaigns are not restricted to a template format, and commonly have some custom code implemented. Content providers are media outlets that generate new content rapidly based on templates, in contrast to campaigns, which remove their custom content when its life cycle is complete.

VI. CONTENT IN ARAD

The specifications for the interface and content in ARad came from a newspaper in Norway (Verdens Gang). The newspaper wanted a way of using existing images, videos, illustrations and 3D material already in their possession. Marshall McLuhan [15] remarked that the content of a new medium is always the content of another medium. In keeping with this prophecy, we inadvertently started the process of remediating old media through new media (MAR).
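As an illustration of the application flow described in Section V.A, the screens and transitions can be sketched as a small state machine. This is a minimal Python sketch and not the ARad implementation: the state names, the event names, and the separate READY and TRACKING_LOST states are assumptions introduced here for illustration only.

```python
from enum import Enum, auto


class Screen(Enum):
    """Hypothetical screens, named after the steps described in the text."""
    STARTUP = auto()          # checking for new content and content providers
    BROWSER = auto()          # icons for campaigns and content providers
    LOADING = auto()          # downloading content, loading assets into memory
    READY = auto()            # assumed: assets loaded, play icon available
    POINT_AT_MARKER = auto()  # notification: direct the phone towards a marker
    AUGMENTING = auto()       # augmentation shown on a tracked marker
    TRACKING_LOST = auto()    # assumed: nothing trackable in view, X icon usable


# (current screen, event) -> next screen. All event names are hypothetical.
TRANSITIONS = {
    (Screen.STARTUP, "content_checked"): Screen.BROWSER,
    (Screen.BROWSER, "icon_touched"): Screen.LOADING,
    (Screen.LOADING, "assets_loaded"): Screen.READY,
    (Screen.READY, "play_touched"): Screen.POINT_AT_MARKER,
    (Screen.POINT_AT_MARKER, "marker_found"): Screen.AUGMENTING,
    (Screen.AUGMENTING, "marker_lost"): Screen.TRACKING_LOST,
    (Screen.TRACKING_LOST, "marker_found"): Screen.AUGMENTING,
    (Screen.TRACKING_LOST, "x_touched"): Screen.BROWSER,
}


def step(screen, event):
    """Return the next screen; unknown events leave the screen unchanged."""
    return TRANSITIONS.get((screen, event), screen)


def run(events, start=Screen.STARTUP):
    """Replay a sequence of events and return every screen visited."""
    visited = [start]
    for event in events:
        visited.append(step(visited[-1], event))
    return visited
```

Replaying the events of one complete visit (startup, icon, load, play, marker found, marker lost, X) ends back on the browser screen, which mirrors the described return to the content overview. In this sketch the X icon has no effect while a marker is tracked, reflecting the description that the user first aims the device at something untrackable.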
Images, videos, illustrations and 3D material related to news or advertisement already in their possession were adapted to a format ARad could display and remediate.

A. Content types

ARad can handle several different types of content using templates. The templates can be updated dynamically using an XML-like specification language, which supports the quick turnaround of a newspaper. When new content is available we flag it in an index file, and when the user reloads the content, the new content is downloaded to the application. ARad supports the following content types:
• Image slideshows with or without audio
• Videos with audio
• Dynamic 3D content with predefined points of interaction
• Custom content

The specification relays paths to marker, texture, image, audio, video and mesh data for the application to load, and describes how to organize and orient the data in relation to markers. Custom content needs to be submitted to the App Store for review, as it contains executable code.

1) Image slideshows with or without audio

We configure this template with a list of images and a single ambient track for background music. Other soundtracks can be tied to an image to provide a narrative or sound effects that amplify the user experience. The user can use arrow icons to maneuver in the image carousel.

2) Videos with audio

The video template points to an MPEG-1 file with an attached audio file. The user can play or pause the video when it is playing on a marker or in full screen. The content is a short interview with a performing artist.

Figure 1. Different screen elements in ARad.

3) Dynamic 3D content with predefined points of interaction

The 3D content template loads a set of 3D objects onto a marker and allows tap interaction with the 3D model. The 3D objects have a predefined idle animation with audio. Animations for interaction can be accessed using tap interaction.
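To make the template idea concrete, the following sketch shows how a slideshow template of the kind described above might be expressed and parsed. The specification language is described only as XML-like, so the element names, attribute names and file paths below are hypothetical, and Python stands in for the application's actual implementation language.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical template: one marker, one ambient track, and a list of images,
# one of which carries its own narration track.
EXAMPLE = """
<content type="slideshow" marker="markers/example_marker.dat">
  <audio role="ambient" path="audio/background.mp3"/>
  <image path="images/01.jpg"/>
  <image path="images/02.jpg" audio="audio/02_narration.mp3"/>
</content>
"""


@dataclass
class Slide:
    image: str
    audio: Optional[str] = None  # optional narration or sound effect


@dataclass
class Slideshow:
    marker: str
    ambient_audio: Optional[str]
    slides: List[Slide] = field(default_factory=list)


def parse_slideshow(xml_text):
    """Parse the hypothetical slideshow template into a Slideshow object."""
    root = ET.fromstring(xml_text.strip())
    if root.get("type") != "slideshow":
        raise ValueError("not a slideshow template")
    ambient = root.find("audio[@role='ambient']")
    return Slideshow(
        marker=root.get("marker"),
        ambient_audio=ambient.get("path") if ambient is not None else None,
        slides=[Slide(img.get("path"), img.get("audio"))
                for img in root.findall("image")],
    )
```

Calling `parse_slideshow(EXAMPLE)` yields a slideshow with two slides, the second of which has its own soundtrack. Keeping the template declarative in this way is one plausible reason such content can be updated without an App Store review, whereas custom content with executable code cannot.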
All the different content types can be viewed in what we call full-screen mode. This means that the user can hit a button to view the images, movies and 3D content without the marker in the camera's view, and keep interacting by tapping and dragging the content. This gives the user freedom similar to the Freeze-set-go interaction method proposed by Lee et al [16].

VII. METHOD

The aim of the evaluations is to provide descriptive knowledge about how users use the mobile AR application, and prescriptive knowledge about the design of this IT-artifact (in this case a reflection on what works and how to improve what already works), in accordance with Iivari's [17] types of knowledge attainable from design science. This study utilizes the framework for studying and evaluating augmented reality games described by Gjøsæter et al [18]. The framework details a step-by-step approach to conducting a study of AR games. This approach “…works particularly well with AR games since they combine virtual information with the physical environment: while video recordings in addition to screen captures allows the researcher to observe both what happens on screen as well as what happens in the physical space, Think aloud provides insights into the players’ problem solving processes.” (p. 78). While this study is not concerned with the gaming aspect of AR, we are concerned with what happens in virtual as well as physical space, and with insight into the users' experience of the AR system.

A. Data collection methods

Think Aloud (THA) [19] is an evaluation technique that takes place during testing; users are instructed to verbalize their actions and thoughts throughout the evaluation. It is a more informal evaluation technique than other common evaluation techniques used in HCI, such as cognitive walkthrough and heuristic evaluation [20].
Rather than trying to uncover piecemeal design errors, as these pragmatic methods excel at, this approach allows us to observe and investigate an uninterrupted user experience through the captured data. Some previous studies have used think aloud as an evaluation tool and observational evaluation technique for AR systems. Dünser et al [21] employ think aloud to assess how young children interact with augmented reality books. Liarokapis et al [22] use think aloud to evaluate and discuss the implementation of different interaction methods for AR games. We chose to do a concurrent Think Aloud, where the participants talked while interacting with the application. In addition, a retrospective Think Aloud and interview session was performed after each session. This session served as a debriefing and allowed users to discuss their overall experience of the content.

B. Capturing video of mobile augmented reality

Video-based qualitative research has in recent years increased in popularity within the HCI field. A key advantage of this method is that it can help capture “...aspects of social activities in real-time: talk, visible conduct, and the use of tools, technologies, objects and artifacts” [23]. We used guidelines provided by Heath et al [23] and Gjøsæter et al [18] to prepare for and undertake the video recording. Care should be taken to ensure proper audio capture to facilitate transcription of interaction, clear video of the interaction in proper lighting and at correct angles, and other technical aspects of video recording. We digitized and synchronized the data in high resolution to allow us to see the participant’s interaction with the context as well as the content (Figure 2). Some of the figures in this paper are formatted as cartoon strips. This format supports communication [18] of the activities, as well as the speech occurring during the sessions. The sessions were captured using a small camera attached to a rig at an angle to the handheld device screen.
A portable tripod camera was used to film the user's interaction with the tangible printed markers. We believe the combination of THA and video recordings provides a good foundation for revealing affordances and qualities of the interfaces. The video recordings provide detail about the worldly context and the user interface on the mobile device. In contrast, an audio recording of the think aloud alone would lose essential data about, for instance, how the users are pointing and orienting the device. In some cases it may also be difficult to ascertain which user interface elements the users are referring to just from audio recordings. Additionally, one can use still images, and still images in sequence, from the video recordings to directly illustrate the system and underpin interesting findings.

Figure 2. Tripod with custom camera rig

C. Structure of evaluations

A pilot study with a female primary school teacher aged 25 preceded the final evaluations. The pilot study was conducted to rehearse the evaluations and uncover problems with their structure. We gathered data about gender, age, profession and familiarity with information technology, smartphones and tablets. The final evaluations were conducted with seven participants, five male and two female. With an average age of 29 years (min: 26, max: 34), the participants were faculty members (an accountant, three PhD students, a post doc student) and others (a high-school teacher and an engineer). When asked “How would you characterize your experience with information technology?” on a Likert scale from “1 – Very little” to “8 – Very much”, the participants averaged 4.8 (min: 4, max: 6). With regard to “How would you characterize your experience with tablets and smartphones?” the participants averaged 5.0 (min: 3, max: 7). We label the evaluators from R1 to R7.
The author's assistant is labeled A, and the author is labeled T in the transcripts. The sessions ranged from 25 minutes to 45 minutes of dual camera video. A free-form retrospective think aloud followed each session, lasting between 10 and 30 minutes. The corpus for this study totals 51 transcribed dual camera video clips. During the sessions, the participants experienced six remediated content types:
• A video (Figure 4)
• An interactive image slideshow (Figure 6)
• A 3D interactive castle (Figure 3, 7)
• A 3D interactive troll (Figure 5, 7)
• A game related to the Troll (not described in this paper)
• A game related to a different campaign (not described in this paper)

VIII. FINDINGS

ARad is remediating in nature, and we choose to look at it from this perspective in this study. By examining sequences of interaction that relate to the overall experience, we may reveal how remediation of content can potentially provide new, better, more fun or more engaging ways of interacting with print media. A variety of other topics on the details of usability are readily available in the data. However, this study focuses on the user experience and the different affordances of content remediated through a handheld augmented reality application. We use design research principles [24] in combination with established evaluation methods within the field of HCI to help us communicate and analyze the user experience of the content and interface of ARad through illustrations and quotations in context. The findings below are represented as direct quotes in italic and illustrations from the concurrent think aloud. Quotes are referenced in the text as Q followed by the quote number.

A. Video content

During the interaction with the video content, we made observations regarding how the users experienced this content. The novelty factor of experiencing a digital video augmented on paper cannot be overstated (Q1, Q2).
These utterances were spoken when the participants viewed the movie on the marker for the first time.

Q1 - R2: “Ah! Now it works. Cool!”
Q2 - R4: “It is attached to the paper there. That was pretty cool.”

Some of the evaluators found it unfamiliar to watch movies in this manner and tried to align the movie naturally to the screen (Q3).

Q3 - R6: “The only thing here is that we need to aim to show the movie in the right size.”

Other participants did not utter this directly, but from the video material we can observe the participants aligning the video naturally after some time experimenting, and eventually bringing the movie to full screen, as illustrated in Figure 4. Some participants did not mind watching the video at odd angles. Others did not align the video to the display perfectly, but kept the movie in view for long sequences of interaction, and avoided getting the video out of view or at odd angles. It may be beneficial to bring content into alignment when we detect that the user is trying to align any flat content to the screen. Users want to see flat content in full screen mode. It may be due to some subtle property of AR video we have yet to determine, but the evaluators seemed to be more excited about AR video represented in this way than about images.

Figure 4. While interacting with 2D content, the users tried to align the content appearing on the marker to the iPhone display.
Figure 3. A user exploring 3D content

B. Image content

We observed the participants recognize the image content as images. The concept of a slideshow we created was immediately recognized and articulated (Q5).
Q5 - R3: “*Sigh* now I’ve been through the entire slideshow.”
Q6 - R2: “I don’t know that it improves it, that the images are lying on top of the paper.”

The participant does understand the concept of the application, but he questions the value of AR mode for displaying images, and clearly expresses boredom through body language and sighs (Q5). Q6 shows a user not seeing the point, doubting that this feature adds anything to the experience of the image content.

Q7 - R6: “There is nothing wrong with experiencing it in this manner, but you are used to it being attached to the screen.”

This evaluator is decidedly neutral in discussing this way of viewing images (Q7). She walks through the motions of the task, but there is nothing suggesting fun or enjoyment when interacting with the image content. Viewing images in this manner does not add anything significant to the experience of seeing image content. We find that remediating flat still images in this form serves little purpose, as users express reservations about its merit. It serves its purpose of shoveling images to the users; they all understand the concept and diligently consume the images. We did not observe any immersive moments when users interacted with this content, in contrast to the 3D content and video content, where they conducted themselves differently.

C. Experiencing 3D content

When users interacted with augmented 3D content, we observed different reactions. We could also observe users perceiving the 3D content as significantly different from the 2D content. Technically, the image content is more or less the same, a mesh with a texture on it, yet users got excited about 3D content to a greater extent than 2D content, one user in particular (Q8, Q13, Q14, Q15).

Q8 - R7: “Wow! This is something completely different. This looks like game element perhaps? What is it?”

The users spent significantly more time exploring the 3D content than the flat image and video content.
Users engaged with the model and looked at it from different angles, eager to discover features and details. This may suggest that the exploration of the 3D model itself is a captivating and fun activity (Figure 3).

Q9 - T: “Is there an angle you prefer?”
R3: “This had depth, definitely.”
R3: “It's more fun than standing over, it makes it flat. The fun part is that it is 3D.”
R3: “It's tempting to touch it. I want to play with the windmill.”

The evaluator notes that watching the model from above gives the model less depth (Q9). This is a problem when developing AR content for print media. In this application, we achieve optimal tracking when the user has the entire marker in view and a fair amount of high contrast pixels available for tracking. However, the 3D content itself achieves the most depth in conjunction with the print media at more oblique viewing angles. This is one of the main issues with freeform interaction with MAR content. It is difficult and time-consuming for a designer to develop content that is equally appealing from all angles and distances. Remediated content is, perhaps to an even greater degree, created for other purposes, like effect shots in movies or informational content, rather than to be viewed from above.

Q10 - R1: “It’s a 3D model of a troll.”
R1: “Stands still breathing.”
R1: “Waiting for something, a Norwegian Christian man’s blood.”

The users notice subtle movement (Q10) in the content and express more emotions and attention when interacting with the Raglefant troll.

Q11 - R1: “Being that it is an animal, it is tempting to look at him, in eye height.”
Q12 - R6: “I’m thinking I want to see him from a different perspective. I want to view him from the front.”

Some users want a deeper interaction with this content than with the 2D content; they seem to require more interaction (Q11, Q12). We suspect that, because it is a humanoid creature, the experience may be degraded by the fact that the content does not try to interact with the users.
It appears inanimate and behaves unnaturally.

Q13 - R5: “I’m trying to press it, since that was the theme from before.”
R5: “YARRR!” (Participant mimics the noise of the troll)
R5: *Laughing*
Q14 - R7: “Oh oi! This is a Raglefant yes!” The user exclaimed when the troll appeared in sight.
Q15 - R7: “Hello hello!”
R7: “It got a bit annoyed when scratching it on the belly.”
R7: “Nice!”

Figure 5. Observation of a user using the piece of paper with the marker on it to manipulate the 3D content.

These users experienced fun: R5 (Q13) mimicking noises, almost forgetting the experimental setting, and R7 (Q14, Q15) exclaiming and talking to the model. The 3D content, not surprisingly, lends itself to greater immersion than the 2D content. Despite a full screen icon being present throughout the interaction with the two content types, most of the users did not use it after first trying it out on the moving castle (Q16). One participant uttered this about going into full screen mode:

Q16 - R4: “Let's go to full screen mode.”
R4: “Now it's in full screen, but that is not an improvement.”
R4: “It's a lot cooler to not see it in full screen.”

Those that used it found the interaction in this mode to be less compelling and to some degree pointless (Q17).

Q17 - A: “In contrast to what you mentioned before about that you want to see images filling the screen. How about this? Does this feel more natural?”
R6: “This is more like an object you should explore. You get the feeling that this object is sitting on the paper on the table, and that you are filming it in a way.”

The users may feel this way because it is fun to "film" intriguing objects. If they think they are filming in 3D mode, this adds to the user experience, while viewing the object without its real world context lessens the immersion.

D.
Physical interaction beyond the device with 3D content

How the users relate to the augmentations in physical space sheds light on how they perceive the content in combination with the markers.

1) Interacting with the troll
We observed a different approach to interaction when the users engaged with the troll than with other content types. Two of the users tried to touch the augmentation itself (Fig. 7): they swiped their fingers across the area where the augmentation appeared and expected some reaction. We did not observe this interaction with the 3D castle. This leads us to believe that natural interaction for content on markers should also let the users see some response from the interface when they perform this type of interaction.

2) Exploring 2D content
During the evaluations of the flat content, all the users except one used their bodies and arms to align markers to the view of the mobile device. The users found the angle where the content was right side up and looked at it from that angle. None of the users wanted to see 2D content upside down (Fig. 4, 6).

3) Exploring 3D content using markers
During the exploration of 3D content, only three participants actively engaged with the markers (Fig. 5).
• R7 rotates the troll marker and the castle marker.
• R5 rotates the castle marker.
• R2 rotates the castle marker.

Figure 6. Image slideshow on marker (top), in full screen (bottom)
Figure 7. 3D content in ARad

This observation leads us to believe that many users may not engage with the marker at all, and that the most natural way for novice users of the technology to interact with 3D content is by using the device. The affordance of markers and the content surrounding the markers should invite interaction to enhance the user experience of 3D mobile AR content.

E.
Mixing paradigms

Late in the development cycle, the approach to experiencing "flat" content was changed. Instead of first showing images and videos on the marker tracked in 3D, the flat content (images and video) appears flat in full screen mode first. This mimics the known paradigm of QR codes and came as a request from the media outlet. The intention was that users would find QR code interaction more familiar than the AR paradigm, where content first appears tracked on the marker. Users could then touch a small icon, and the 2D content would appear augmented on the marker. The evaluators had different views of this functionality. By analyzing the videos we can observe how the participants chose to experience the video and image content. The task presented to the users before experiencing the image and video data was as follows: they should point the phone at the marker and maneuver between full screen and marker mode (Fig. 6), tap through the slideshow and play/pause the video. When we analyze one participant thinking aloud, we observe that starting the movie in full screen mode is confusing (Q18, Q19).

Q18 - R4: “A film just appeared here. Did I do it correctly then do you think?”

Q19 - R4: “I have... I got it right there somehow. That was very cool.” A: “Do you know how? Can you reverse the process?” R4: “I’m not sure. I got it right. Something happened afterwards. It did not happen before when I wanted to do it. Now it happened, now it's fastened to the sheet of paper. And when I press play it plays” A: “Ok” R4: “That was pretty neat. If I press, then I get it in full screen there. Now I understand what I am doing. I didn’t understand it previously. No, I have to find the marker there yes. And then I can click play, and take it to full screen from there. And it goes elegantly to full screen. It took some effort to understand.”

The user does not understand how the film appeared in full screen. The connection between the marker and the content goes unnoticed by the user.
After some trial and error, R4 manages to point the phone towards the marker when exiting full screen mode. Since he does not need to point the phone at the marker while in full screen mode, he has lost his point of reference.

During the retrospective THA, we asked the participants about the full screen-first approach: “What do you think about the movie and image slideshows starting in full screen mode?” Some users responded negatively (Q20, Q21, Q22).

Q20 - R3: “It steals from the AR experience because the first thing that shows do not show the potential it has. If you didn’t expect it you maybe hadn’t... Why should I do anything with it?”

Q21 - R4: “I think it should have started down on the screen, not in full screen.”

Q22 - R5: “I would have started it in world-modus because ... when a movie plays, or a slideshow is running you are experiencing the content already. I don’t need to make more of itself, there and then.”

Some participants were indifferent to it, but acknowledged its potential to confuse (Q23).

Q23 - R6: “It may be a little confusing in the start if you have to do something to make it be on the marker.” R6: “But I did not think about it really. It is perfectly fine.”

However, two of the seven participants liked that the content started in full screen mode, and disliked the idea of having to point towards a marker (Q24, Q25).

Q24 - R1: “I thought it loaded very quickly, and if you are not needed to hold the camera towards the symbol, was very all right. So I liked that very much.”

Q25 - T: “If you could choose, would you like it to start in full screen or on the marker?” R7: “Full screen then.”

The idea of having content pop up in full screen mode first seems not to be favored by the evaluators. However, the option to exit from AR mode to full screen seems to be welcomed (Q26).
Q26 - R2: “I really like the functionality of being able to go in and out of full screen.” T: “What do you like about that?” R2: “That you are independent of the marker when you have started it.” R2: “In addition to that the content is presented in a better way.”

It may be tempting to align AR closely with a familiar paradigm such as QR codes, where the user points a phone at a marker and content fills the screen. However, in this case we find that it creates inconsistencies and confuses the users. Some experienced difficulties when initially pointing the camera towards a marker. Eventually, they made the cognitive leap from the content to the marker. The idea of a marker did not become obvious until the content appeared on the marker itself. Having content appear in full screen right away may therefore give the user little chance of making the connection between the marker and the content.

F. Preferred content
During the retrospective think aloud session, users elaborated on what content they found most enjoyable. Most respondents favored the 3D content, and some favored the video content. This is in line with our observations of how they acted when experiencing the different types of content. Most users found the image content uninteresting. However, we find it intriguing that the video content was also represented among the favored content.

IX. IMPLICATIONS
By analyzing the findings we can summarize some design implications for future attempts to remediate content in this manner. There is a need to develop a metaphor for interacting with content beyond the touch screen, as people intuitively seek to touch an augmentation. Hürst et al. [25] also point out the need for gesture interaction in AR via finger tracking. The data reveal that interaction does not need to be complex.
Simple tracking of a swipe gesture in front of the camera may be enough to support greater immersion with AR content.

We can see that the users attempt to engage more directly with humanoid and animal creatures and find it awkward that they appear lifelike but do not interact naturally with the user. This is similar to the findings of Wagner et al. [26], who found that a virtual character not relating to the users made them uncomfortable and could even be perceived as offensive.

When creating 3D content to be displayed on markers in print media, we find that users will try to optimize their viewing angle to the content. However, when dealing with content directly remediated from another medium, care should be taken, if possible, to reorient the content so that it looks good when viewed from a downward angle.

When users interact with 2D content, we find that they prefer to start viewing it on a marker. In some cases, they bring it to full screen mode to take a closer look. However, when the user is trying to align the content to the display area, some automation should take place to optimize its viewing properties so that the user can experience it in its native format on screen.

As it stands now, markers have not acquired the affordance for rotating. People recognize markers, but they do not yet use them to their full potential. Adding instructions to the markers themselves to afford rotation and manipulation may improve their affordance and encourage users to interact with content beyond maneuvering the mobile device itself.

We find that image content suffers most from being remediated in this manner; no users preferred this content over video or 3D content. Most of the users preferred the 3D content, and some users genuinely enjoyed watching and interacting with the video.
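The implication that simple tracking of a swipe gesture in front of the camera may be enough can be illustrated with a minimal sketch. It is not part of the ARad application: it assumes a hypothetical finger tracker that delivers timestamped fingertip positions normalized to the camera frame, and the names `FingerSample` and `detect_swipe` as well as all thresholds are illustrative assumptions rather than values from the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FingerSample:
    """One fingertip observation from a hypothetical finger tracker."""
    t: float  # timestamp in seconds
    x: float  # horizontal position, normalized to [0, 1] of the camera frame
    y: float  # vertical position, normalized to [0, 1] of the camera frame


def detect_swipe(samples: List[FingerSample],
                 min_distance: float = 0.25,
                 max_duration: float = 0.6,
                 max_drift: float = 0.15) -> Optional[str]:
    """Return "left" or "right" if the samples form a horizontal swipe, else None.

    The thresholds are assumed values for illustration only.
    """
    if len(samples) < 2:
        return None
    first, last = samples[0], samples[-1]
    duration = last.t - first.t
    dx = last.x - first.x          # signed horizontal travel
    dy = abs(last.y - first.y)     # vertical drift
    # Reject gestures that are too slow or have non-increasing timestamps.
    if duration <= 0 or duration > max_duration:
        return None
    # Reject gestures that are too short or not mostly horizontal.
    if abs(dx) < min_distance or dy > max_drift:
        return None
    return "right" if dx > 0 else "left"
```

In such a design, a detected swipe could trigger a simple reaction from the augmented character, for example an animation, which is the kind of response the participants appeared to expect when they swiped across the troll.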
It has become increasingly clear that the user experience of remediated AR content is closely related to the affordances of the content itself. Text, images and video have clear and (perhaps culturally) deeply set affordances; 3D content, on the other hand, affords a truly new media experience. This may be self-evident to practitioners in AR or MAR, but not necessarily to anyone else.

X. AFFORDANCES
Based on analysis of the findings, we have identified affordances related to Kaptelinin's instrumental affordances, in addition to the learning, maintenance and aggregation affordances and our own niche affordance of remediation. First, it is beneficial to clarify the basic instrumental handling and effecter affordances of our application. This gives an impression of the possibilities for interacting with this technology. The application supports handling through tangible markers, GUI widgets on the display, and maneuvering of the mobile phone itself. These handling affordances let us affect the augmentation through interaction. We may handle the device (move the device), the GUI (use buttons to move to the next image) and the markers (in the world) to affect the augmentation layer, as we observe the users eventually do. The effecter affordances come through the device, GUI and marker. The users affect the augmentation primarily through maneuvering the device. The GUI allows users to play and pause videos, move between images and view animations inherent in the 3D content. We find that users rarely handle the markers to affect the augmentations. Ideally, the handling and effecter affordances should be picked up easily and intuitively; in some cases this is how it transpires. Most users intuitively point the phone at the marker and start interacting, while some users require directions to start the interaction. Hence we recognize the need to identify the "web of mediating" affordances that make MAR viable.

A.
Maintenance affordance:
Maintenance affordance is related to tasks one must perform to operate the IT artifact. As we use CV in the tracking algorithm, both the markers and the environment the markers operate in require maintenance. This comes in the form of maintaining sufficient lighting and the integrity of the markers: they cannot be crumpled too much, and the lighting must be sufficient to support tracking. This is an affordance that is not particularly easy to communicate. Room lighting is not something one immediately thinks of as a problem when interacting with MAR. Human eyes adapt easily to changing lighting conditions, in sharp contrast to CV tracking algorithms.

B. Aggregation affordances:
Markers have yet to become a convention for point of interaction. Even though the participants interacted with printed surfaces with markers on them, not everyone understood that markers reveal AR content. Markers in themselves poorly afford the desired interaction, and the participants seldom read on-screen instructions. This is an obstacle for AR content in print media at this stage. Newspapers and sheets of paper do not afford engaging manipulation.

C. Learning affordances:
Even with on-screen instructions telling the users to point the camera at the marker, this simple instruction was often not absorbed. This is similar to a finding in a study of AR games [18], which makes a point of users having trouble understanding which object in the world the AR application is referring to. It must be made exceedingly clear which real-world objects the application acts upon. When the content appeared on the marker, affordances related to translating and rotating the objects were immediately recognizable using the phone to rotate the view. Very few of the participants wanted to rotate the printed media to experience the content from different angles. We seldom find that other AR applications try to shed light on how they work.
According to Bødker [14], some applications need to mediate how they work, in contrast to invisible computers. AR relies heavily on computer vision (CV); for a user to be able to correct errors in CV, it is useful to understand the limitations of the CV algorithm. To enjoy an AR application we may need a clearer representation of its internal status. We do provide users with instructions on where to point the camera initially, but we decided against giving the users information about how well they are tracking the markers. This was a conscious decision. We felt that users might focus on achieving optimal tracking, rather than consuming content, if we provided them with a visualization of the current performance of the tracking.

D. Remediation affordances:
The findings emphasize the need for users to bring the content into a known interaction paradigm. Users often tried to align 2D content to the display so that it resembled a familiar way of consuming this type of content. Users liked the idea of being independent of the marker: entering full screen mode enabled them to move the device freely while still seeing the content. In the trials we had the content appear in full screen directly; some users found this unintuitive, as they expected the content to appear first in 3D.

E. Design guidelines for content remediated through handheld AR:
It is necessary to note that these guidelines are intended for handheld augmented reality. They may not have much merit in other AR applications. However, we word them to be usable in general AR applications.

1) Learning affordances
• Clearly afford what objects the application is referring to in the physical world to support learning. This can be achieved by using visual clues that make it clear how the software understands the world.
• Represent the internal state so that users may learn when the CV algorithm performs satisfactorily. Signal the inner workings of the tracking algorithm.

2) Maintenance affordances
• Afford actions needed to adequately maintain an environment suitable for augmenting. We believe this may be achieved by introducing functions to the hardware: external sensors for sensing light and providing information about the environment to enable users to take steps to improve the performance of the application.

3) Aggregation affordances
• Afford the relationship the device has to the trackable. Visual clues can be used to make the users direct the device at the marker. Bear in mind that textual clues may not be perceived.

4) Remediation affordances
• Afford easy transition between the known interaction and the augmented interaction of remediated content. Users will try to view 2D content as they are used to in other media; we support this by allowing users to enter a mode where they are not dependent on the marker.
• 3D content should afford its value from natural angles. While 2D content can easily be viewed from a bird’s eye view, this is not the case for 3D content. 3D content needs to be designed explicitly to support viewing from a bird’s eye view.
• Afford the same entry point for interaction even though the content is different. We experienced that our effort to align the interaction closer to a QR paradigm confused the users rather than improving the user experience.
• Humanoid characters that afford interaction should allow interaction with the user. Users may become uncomfortable or provoked if humanoid characters that afford interaction are unresponsive.

XI. CONCLUSIONS
This study describes an application using AR to augment print media with remediated content. We find that remediation of video content in an AR context seems promising.
Users find it at times awkward to view flat (2D) content in a marker context and like the functionality that allows them to bring flat content into full screen mode. Image slideshows in the AR context have little merit among the users and contribute little to an overall positive user experience of the application. 3D content provides users with the greatest level of immersion, and users engaged more directly with this type of content.

Because MAR is a new medium, it may be tempting to leverage existing maladapted content to help saturate the platform with content. Cases in point are the QR-code metaphor and 2D images, used because they are familiar and easy. We believe this degrades the user experience, because it makes the MAR medium to some degree pointless and adds little to the user experience.

Affordances related to aggregation, learning, maintenance and remediation are described in this paper. These affordances can be used in combination with the design guidelines to assist the design of remediating MAR applications and to create a better user experience of content for the MAR platform. Overall we conclude that the ARad application gives a fun user experience, and we argue that the true potential in remediating content to MAR lies in the user experience of 3D content.

XII. FUTURE RESEARCH
In this study we have looked at content remediated from a newspaper to the MAR domain. Future research could look into the affordances of MAR games. We expect the maintenance and learning affordances to remain similar. However, with regard to aggregation and the remediation of game content and interaction, we expect some differences. Game developers adapting games for the MAR domain may seek to remediate concepts and content from game development to the MAR application.
It would be interesting to investigate whether this is the case and, if so, what shape and form the content and conventions take, and to identify design guidelines for game interfaces in MAR applications.

REFERENCES
[1] D. Wagner and D. Schmalstieg, “First steps towards handheld augmented reality,” presented at the ISWC '03 Proceedings of the 7th IEEE International Symposium on Wearable Computers, 2003, p. 127.
[2] R. Azuma, “A survey of augmented reality,” Presence: Teleoperators and Virtual Environments, vol. 6, pp. 355–385, 1997.
[3] J. D. Bolter and R. Grusin, Remediation - Understanding New Media, Paperback edition. The MIT Press, 2000, pp. 1–290.
[4] B. Macintyre, J. D. Bolter, E. Moreno, and B. Hannigan, “Augmented reality as a new media experience,” presented at the Augmented Reality, 2001. Proceedings. IEEE and ACM International Symposium on, 2001, pp. 197–206.
[5] J. D. Bolter, M. Engberg, and B. Macintyre, “Media Studies, Mobile Augmented Reality, and Interaction Design,” interactions, ACM, Jan-2013.
[6] M. Engberg, “Writing on the world: augmented reading environments,” Sprache und Literatur, pp. 67–78, Feb. 2012.
[7] L. J. Rosenblum, S. K. Feiner, S. J. Julier, and J. E. Swan, “The Development of Mobile Augmented Reality,” in Expanding the Frontiers of Visual Analytics and Visualization, no. 24, London: Springer London, 2012, pp. 431–448. http://dx.doi.org/10.1007/978-1-4471-2804-5_24
[8] J. V. Pavlik and F. Bridges, “The Emergence of Augmented Reality (AR) as a Storytelling Medium in Journalism,” Journalism & Communication Monographs, 2013.
[9] D. Schmalstieg, T. Langlotz, and M. Billinghurst, “Augmented Reality 2.0,” in VIRTUAL REALITIES, no. 2, Wien: Virtual Realities, 2011.
[10] J. Swan and J. Gabbard, “Survey of user-based experimentation in augmented reality,” presented at the Proceedings of 1st International Conference on Virtual Reality, 2005.
[11] F. Zhou, H. B.-L.
Duh, and M. Billinghurst, “Trends in augmented reality tracking, interaction and display: A review of ten years of ISMAR,” presented at the ISMAR '08: Proceedings of the 7th IEEE/ACM International Symposium on Mixed and Augmented Reality, 2008, pp. 193–202.
[12] M. De Sà and E. F. Churchill, “Mobile Augmented Reality: A Design Perspective,” in Human Factors in Augmented Reality Environments, no. 6, W. Huang, L. Alem, and M. A. Livingston, Eds. New York: Springer, 2013, pp. 139–164.
[13] V. Kaptelinin and B. Nardi, “Affordances in HCI: toward a mediated action perspective,” presented at the CHI '12 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2012, pp. 967–976.
[14] S. Bødker and P. B. Andersen, “Complex mediation,” Human-Computer Interaction, vol. 20, no. 4, Dec. 2005. http://dx.doi.org/10.1207/s15327051hci2004_1
[15] M. McLuhan, Understanding Media - The extensions of man. McGraw-Hill, 1964.
[16] G. A. Lee, U. Yang, Y. Kim, D. Jo, K.-H. Kim, J. H. Kim, and J. S. Choi, “Freeze-Set-Go interaction method for handheld mobile augmented reality environments,” presented at the VRST '09: Proceedings of the 16th ACM Symposium on Virtual Reality Software and Technology, 2009. http://dx.doi.org/10.1145/1643928.1643961
[17] J. Iivari, “A paradigmatic analysis of information systems as a design science,” Scandinavian Journal of Information Systems, vol. 19, no. 2, 2007.
[18] T. Gjøsæter and K. Jørgensen, “Combining Think Aloud and Comic Strip Illustration in the Study of Augmented Reality Games,” presented at the NOKOBIT 2012, 2012, pp. 1–21.
[19] K. A. Ericsson and H. A. Simon, “Verbal reports as data,” Psychological Review, vol. 87, no. 3, pp. 215–251, 1980. http://dx.doi.org/10.1037/0033-295X.87.3.215
[20] A. Dix, J. E. Finlay, G. D. Abowd, and R. Beale, Human-Computer Interaction (3rd Edition), 3rd ed. Prentice Hall, 2003.
[21] A. Dünser and E.
Hornecker, “An Observational Study of Children Interacting with an Augmented Story Book,” presented at the Edutainment'07 Proceedings of the 2nd international conference on Technologies for e-learning and digital entertainment, Berlin, 2007, pp. 305–315.
[22] F. Liarokapis, L. Macan, G. Malone, G. Rebolledo-Mendez, and S. de Freitas, “A Pervasive Augmented Reality Serious Game,” presented at the Games and Virtual Worlds for Serious Applications, 2009. VS-GAMES '09. Conference in, 2009, pp. 148–155.
[23] “Video in Qualitative Research,” 2010.
[24] A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design science in information systems research,” MIS Quarterly, vol. 28, no. 1, pp. 75–105, 2004.
[25] W. Hürst and C. Wezel, “Gesture-based interaction via finger tracking for mobile augmented reality,” Multimedia Tools and Applications, pp. 1–26, Jan. 2012.
[26] D. Wagner, M. Billinghurst, and D. Schmalstieg, “How real should virtual characters be?,” presented at the ACE '06 Proceedings of the 2006 ACM SIGCHI international conference on Advances in computer entertainment technology, 2006.

AUTHOR
Tor Gjøsæter is with University of Bergen, Norway.

Submitted 17 July 2014. Published as resubmitted by the authors 14 October 2014.

iJIM ‒ Volume 8, Issue 4, 2014