International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol. 13, No. 3, 2019

Augmented Reality User Interface Evaluation
Performance Measurement of Hololens, Moverio and Mouse Input

https://doi.org/10.3991/ijim.v13i03.10226

Alaric Hamacher(*), Jahanzeb Hafeez, Roland Csizmazia
Kwangwoon University, Seoul, Republic of Korea
stereo3d@kw.ac.kr

Taeg Keun Whangbo
Gachon University, Seongnam, Republic of Korea

Abstract—Recent innovation in the fields of Augmented Reality (AR) and Virtual Reality (VR) has brought new devices to the market. The price of consumer products has dropped significantly, and many industries see a big future in AR business and applications. The present research focuses on the user input performance of these AR devices. This paper proposes an evaluation procedure using a server-based input interface with built-in assessment control. The evaluation is performed by test persons exposed to two AR devices: Microsoft HoloLens and Epson Moverio BT-200. A conventional mouse input is used as a benchmark. The assessment reveals a trend of strengths and weaknesses for each device and can help developers create more optimized AR experiences and improve the user experience.

Keywords—Augmented Reality, Input, Performance

1 Introduction

New generations of AR devices emerge at every major trade show. The market seems to be advancing to maturity, and as the base of users increases, many industries forecast a successful future for AR services, businesses and applications [1]. While the field of application is constantly expanding, a general problem in AR remains: human interaction. Research has been conducted to find appropriate, generic and new input methods for AR. These methods include gestures, voice input, trackers, markers and other haptic devices. Although many methods have been proposed in this sector, not all interfaces offered by AR devices are as easy and efficient to use as a mouse.
The following can be observed: gestures are imprecise; input is slow; cloud-processed voice input raises privacy concerns; input methods require training similar to acquiring game skills [2]; and input methods require custom content. Manufacturers generally rely on their own Software Development Kits (SDKs) to provide interaction. Sensors and trackers are often so specific that content needs to be developed with a target device in mind. Most of the time, the experience is tied to a certain manufacturer's technology. Specific aspects of the performance or operation of AR devices have been evaluated before, for example in the first experiences with the Google Glass project in lectures described by Ebner et al. [3].

The observation of these general problems and the underlying considerations have motivated the present research. The purpose of this paper is to present an approach to a homogenized evaluation method in order to create a scientific assessment of the performance of existing input methods. Two of the most popular AR devices, Microsoft HoloLens and Epson Moverio, have been selected for this research.

2 Previous Research

Most input devices are not specifically designed for AR interaction. However, some dedicated devices do exist for interaction with AR systems [4]. This section introduces some examples of so-called generic AR interfaces.

2.1 Tangible interfaces with tiles

This method was developed by Poupyrev et al. in the early 2000s, when AR technology was in its infancy. Although the performance of devices was very limited, many applications could already be foreseen [4]. This AR interface relied on a set of tiles. Acting like graphical boards, they could be overlaid by AR with symbols and custom designs. Since AR platforms imply awareness of the environment in the form of video scanning, the idea behind this method is to use the mentioned tiles as optical markers.
The computer system attached to the AR environment then performs two tasks: first, it tracks the markers in the real world and maps predefined objects onto them to give them a design and signification inside the computer application; second, it follows the interaction with these semi-virtual objects. The tiles can be used to trigger actions such as copy, paste or delete. These tiles could not only serve to interact with one single application, but perform the same task in different applications [5].

2.2 Two handed interface for AR

Szalavári and Gervautz developed a specific AR interface called the Two Handed Interface for AR [6]. Similar to the previous example, this interface is tangible, meaning it can be interacted with by touch as if it were a real-world object. Like the tile interface, it is also of versatile appearance: the AR system tracks the object and overlays it with the texture and interface elements desired for the interaction. This interface consists of a track pad with a pen pointer. Both are held in front of the view of the AR device, so they can easily be tracked and registered by the computer system. Multiple functions can be assigned to this pair of track pad and pen. Figure 1 shows a simulation in the form of a stylus and a tablet. The shape of the interface can be changed as desired to give the user the impression of interacting with simple buttons or, if necessary, with sliders that allow inputting more precise values.

Fig. 1. Reconstruction of two hand interface for AR, Szalavári and Gervautz

The two handed interface for AR is an approach based on tracking and mapping. It resembles, in VR, the input sticks and trackers provided for the HTC Vive. Another application using AR devices for tracking markers and hands is described by Menezes in his U-Academy learning modules, which show how markers and hands can be used for advanced interaction in AR [7].
2.3 Palm Type

The human hand offers different possibilities for AR applications. Recent research shows how it can be used as an input device or to visualize human hand anatomy, as described by Boonbrahm et al. [8]. Wang et al. [9] developed a keyboard projected onto the user's hand to create a virtual input device, as shown in figure 2. A similar approach had already been conceived by Dezfuli et al. for a palm-based television remote control [10].

Palm Type was originally developed as an enhancement for the Google Glass project, which is similar to the Epson Moverio BT-200. Analogous to the previously described methods, Palm Type provides a tangible user interface: in this case the user's palm, divided into small segments reminiscent of a typewriter keyboard, using the body as a virtual input surface [11]. With some training, users can learn to map this mental keyboard to the lines and knuckles of the palm.

As opposed to the previously shown methods, the authors also performed a series of assessments to evaluate the performance of the new input method. The results are presented on the one hand as a numerical performance value, showing how many words per minute a user is able to input on such a Palm Type keyboard. On the other hand, the test persons were asked to rate the experience after the assessment. The evaluation is measured on a scale from zero to ten, as shown in the results displayed in figure 3.

Fig. 2. Palm Type schema

Fig. 3. PalmType subjective evaluation

The numerical results are published using the benchmark Words Per Minute (WPM), ranging between 9.19 and 10.1. When word input is counted for AR devices, it is important to remember that writing performance is usually evaluated in Characters Per Minute (CPM) [12]. This is usually a requirement for typists.
Real-world values are 200 to 400 characters per minute entered on a keyboard, which corresponds approximately to 40 to 80 WPM under the common five-characters-per-word convention.

The work on Palm Type contains two important features that have inspired this research: first, the tested devices are compared to an everyday device to set an independent benchmark; in the present case, this is a mouse attached to a laptop. Second, the performance is not only measured numerically, but is also followed by a subjective evaluation to reflect overall satisfaction.

3 Method

Bach and Scapin state in their research [13] that a single assessment method for measuring Mixed Reality Systems (MRS) does not yet exist. According to the authors, this is due to the following factors: the field of AR is large and specialized, so it is not easy to find experts who are competent in all systems, and many limitations lie in the technology itself, which is not easily measurable and traceable. The overall aim of the present assessment is to require as little instruction as possible and to give as much introduction as necessary. The test persons should not be biased by the operators or the technology. Therefore, the following measures were taken:

• Random order of experiments
• Instructions integrated in the assessment
• Test persons can run the assessment alone
• Test persons are chosen outside the lab environment

These precautions aim to eliminate most limitations in order to obtain significant evaluations. The following sections describe the different tests and methods that have been developed to perform the assessment.

3.1 Comparing and benchmarking

The purpose of this research is an evaluation of the objective and subjective performance of AR input devices. For this, a series of assessments using three different input methods, as shown in figure 4, was conducted.

Fig. 4.
Overview of input methods for performance evaluation

3.2 Performance measurement

The following sections describe the assessment user interface and the underlying server technology driving the assessment and collecting the results.

3.3 Assessment interface

The left part of figure 5 shows the interface visible to the user on the different AR devices. The top line (A) contains a small space for the instruction. For tasks with a timeout, the background of the instruction space can optionally display a progress bar indicating the remaining time. Element (B) is a vertical slider that allows selecting values between 0 and 10. The lower part of the interface displays two large buttons: one labeled start (C) to begin the assessment, and another labeled ok (D), which can be triggered when the user has accomplished a task.

The assessment manager sees a different interface. It contains a text field with the XML assessment (E), the controls (F) for loading and starting the assessment and selecting the assessment devices, and, on the lower side, a real-time log viewer (G) and a text field displaying the XML results (H).

Fig. 5. Assessment interface: for user (left) and for manager (right)

3.4 Assessment server

In order to provide a reusable test environment for a large number of AR devices, the previously described user interfaces are generated by a server equipped with a wireless access point. The AR devices connect to the access point and display the assessment interface using a web view element or web browser. In the present research, server and access point were implemented on a Raspberry Pi. This approach has two advantages: first, it assures a unified user experience; second, it simplifies assessment creation by avoiding platform-specific development environments.
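The paper does not publish the server implementation or its XML assessment schema, so the following is only a minimal sketch of how such a server-driven assessment could be organized: a hypothetical XML task list is parsed and per-task response times are logged. All element names and field names here are assumptions made for illustration, not the authors' actual format.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical assessment definition; the actual schema used on the
# Raspberry Pi server is not published, so these names are invented.
ASSESSMENT_XML = """
<assessment device="hololens">
    <task id="1" type="button" timeout="0">Press the ok button</task>
    <task id="2" type="slider" timeout="15000">Set the slider to 7</task>
</assessment>
"""

class AssessmentLog:
    """Collects per-task response times, as the assessment server does."""

    def __init__(self):
        self.results = {}    # task id -> response time in ms
        self._started = {}   # task id -> start timestamp

    def start_task(self, task_id):
        self._started[task_id] = time.monotonic()

    def complete_task(self, task_id):
        # Called when the user presses the ok button in the web interface.
        elapsed = time.monotonic() - self._started.pop(task_id)
        self.results[task_id] = round(elapsed * 1000)

def load_tasks(xml_text):
    """Parse (id, type, timeout_ms, instruction) tuples from the XML."""
    root = ET.fromstring(xml_text)
    return [(t.get("id"), t.get("type"), int(t.get("timeout")), t.text)
            for t in root.findall("task")]

tasks = load_tasks(ASSESSMENT_XML)
log = AssessmentLog()
for task_id, _, _, _ in tasks:
    log.start_task(task_id)
    log.complete_task(task_id)  # in the real assessment, the user triggers this
```

Serving the interface from one server to every device through a plain web view is what keeps the procedure device-independent: only the browser capabilities of each headset matter, not its SDK.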
3.5 Survey for subjective evaluation

The second part of the assessment consists of a subjective evaluation. Each test person is asked to answer questions and to rank their experience after having completed the tasks on the assessment server.

Many methods exist for such evaluations, such as the Likert scale [14] with a range from one to five or one to seven. These systems aim at identifying agreement with a certain hypothesis in the form of statements such as "I strongly agree" or "I strongly disagree". While this method has many advantages for identifying opinions and reflecting a test person's attitude, it is generally difficult to compute a mean or to sum up a certain statement, which might even be contradictory [15].

For this reason, the test persons are asked to evaluate certain factors on a linear decimal scale ranging from zero to ten for these categories:

• Overall comfort (easiness to wear)
• Learning (effort necessary to learn operation)
• Efficiency (evaluation after own testing)
• Precision (evaluation after own testing)
• Frustration (description of level during testing)
• Mental demand (description of level during testing)
• Physical demand (description of level during testing)

A lower number describes a negative or uncomfortable experience; a higher number describes a positive or comfortable experience. The test persons are given unlimited time after the experiment to fill out a questionnaire for the survey.

Table 1. Assessment period

Phase           Start        End          Persons
Pretest         2018-05-18   2018-05-18   7
First Session   2018-05-24   2018-05-25   17
Second Session  2018-06-07   2018-06-08   10

4 Results

This section presents the results of the survey and the assessment. The first part describes the composition and structure of the samples. The second part presents the measured results and the subjective evaluation.
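The seven categories and the zero-to-ten scale described above can be captured in a small data structure. The sketch below is my own illustration of how such questionnaire responses might be validated and aggregated; the field names and helper functions are assumptions, not part of the paper's tooling.

```python
# The seven categories of the subjective evaluation, each rated 0-10
# (0 = negative/uncomfortable, 10 = positive/comfortable).
CATEGORIES = [
    "overall_comfort", "learning", "efficiency", "precision",
    "frustration", "mental_demand", "physical_demand",
]

def validate_rating(category, value):
    """Reject ratings outside the linear decimal scale."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    if not 0 <= value <= 10:
        raise ValueError(f"rating out of range: {value}")
    return value

def mean_rating(responses, category):
    """Mean score of one category over all test persons."""
    values = [r[category] for r in responses]
    return sum(values) / len(values)

# Two hypothetical questionnaire responses (invented example values)
responses = [
    {"overall_comfort": 4, "learning": 6},
    {"overall_comfort": 6, "learning": 8},
]
print(mean_rating(responses, "overall_comfort"))  # → 5.0
```

Averaging per category in this way is what makes the decimal scale preferable here to a Likert scale, whose ordinal agreement statements do not lend themselves to a meaningful mean.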
4.1 Overview and demography

The head of the questionnaire for the subjective evaluation includes general demographic information and an identification number to relate the subjective evaluation to the results measured by the assessment server. Most of the test persons are students in Seoul, South Korea. Other participants include teaching staff, researchers and students majoring in media, converged software or information contents. Table 1 shows the time ranges and sample size of each assessment.

The assessment period lasted from the end of May to the beginning of June 2018. A series of seven pretests was conducted on May 18, 2018. These pretests had the purpose of identifying ambiguities in the questionnaire and optimizing the survey. Tests of the user interface helped to identify problems and to improve the assessment process.

The principal test session took place on May 24 and 25 in the VR Media Lab of Kwangwoon University. The duration of the assessment was approximately 30 minutes for each device. Assessment assistants verified the proper functioning of the equipment. The test persons received a brief introduction on how to operate each device: HoloLens, Moverio BT-200 and a laptop with an ordinary office mouse, the latter serving as the benchmark configuration.

The first series of tests was run with 17 persons without any timeout. They had all the time necessary to perform the required tasks until they judged them completed. A second series of tests was performed on an additional group of ten test persons on June 7 and 8. These test persons received the same questionnaire, but had to perform the tasks with a specific timeout for each device. The order in which the participants assessed each device was chosen randomly in order to exclude this factor's influence on the participants' performance.
Participants were not selected by any criterion but accepted as a random group. Their participation was voluntary. The total number of participants is 27: six females and 21 males. Table 2 shows the gender composition.

Table 2. Assessment participant gender

Gender  Amount
Female  6
Male    21
Total   27

4.2 Responses without timeout

The first set of results, as displayed in table 3, shows the mean interaction time for each sample and device in milliseconds. A resemblance in the pattern can be observed for all the samples: HoloLens has on average the longest response time, while the mouse is in most cases the fastest input device. The original aim of accounting for errors in the input methods was undermined by the fact that the test persons generally take as much time as needed to complete a task without any error. Table 4 reveals the mean and median response time for each device.

Table 3. Mean Response Time per Sample and Device in ms

Nr  Hololens  Moverio  Mouse
1   15375     8040     4690
2   10285     9858     4179
3   8544      8486     4100
4   17826     9121     3079
5   10050     12088    3894
6   32658     9424     4061
7   17670     7811     4532
8   25788     9711     2931
9   12368     15953    4319
10  15634     9866     3769
11  17925     24214    5096
12  17925     13252    3700
13  16364     8472     5010
14  7900      12180    3024
15  37937     9300     2621
16  24061     9545     3482
17  10756     21802    5300

4.3 Responses with timeout

The second set of samples was obtained by using the mean response time of each device as a timeout for the assessment. This second set shows a much smaller deviation compared to the first one. Table 5 shows the mean response time and standard deviation of the first session without timeout and the second session with timeout. The deviation of the response time remains much closer to the mean on all devices. Some people are still faster than the average limited by the timeout.
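The aggregate figures reported for the first session can be reproduced directly from the per-participant samples of Table 3. As a short verification in Python (the raw data below is copied from Table 3):

```python
from statistics import mean, median, stdev

# Per-participant mean response times in ms (Table 3, first session)
hololens = [15375, 10285, 8544, 17826, 10050, 32658, 17670, 25788, 12368,
            15634, 17925, 17925, 16364, 7900, 37937, 24061, 10756]
moverio = [8040, 9858, 8486, 9121, 12088, 9424, 7811, 9711, 15953,
           9866, 24214, 13252, 8472, 12180, 9300, 9545, 21802]
mouse = [4690, 4179, 4100, 3079, 3894, 4061, 4532, 2931, 4319,
         3769, 5096, 3700, 5010, 3024, 2621, 3482, 5300]

for name, data in [("Hololens", hololens), ("Moverio", moverio),
                   ("Mouse", mouse)]:
    # Mean and median reproduce Table 4: 17592/16364, 11713/9711, 3987/4061.
    # The sample standard deviation approximates the SDV column of Table 5.
    print(name, round(mean(data)), median(data), round(stdev(data)))
```

Running this confirms the ordering discussed in the text: the spread between fast and slow participants is by far the largest on HoloLens and smallest on the mouse.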
Even if it is much smaller, the variance between test persons with fast and slow reaction times is the largest on HoloLens and the smallest when using the mouse.

Table 4. Mean and Median Response Time per Device in ms

Data                     Hololens  Moverio  Mouse
Response Time (median)   16364     9711     4061
Response Time (average)  17592     11713    3987

Table 5. Mean Response Time per Device and Standard Deviation in ms

Devices   Mean without Timeout  SDV   Mean with Timeout  SDV
Hololens  17592                 8368  11867              2595
Moverio   11713                 4754  10361              1473
Mouse     3987                  793   3851               212

4.4 Subjective evaluation results

While most of the assessment performance could be measured in interaction and response time by the assessment server built into the user interface, the subjective evaluation occurred with no time constraint and asked the test persons to rank their experiences.

Figure 6 shows the result of the subjective evaluation among all test subjects after the assessment. The results show some clear effects. First, there seems to be a general order across all the categories, attributing the best properties to the mouse as input interface, followed by Moverio and finally HoloLens. Second, while some device input methods are evaluated below the average, no input method is evaluated with zero points. In all cases, the mouse as input device represents the ideal case, ranking at the top with eight or nine points on average in almost all categories. Table 6 shows the subjective evaluation overview data.

Fig. 6. Subjective evaluation results, all devices cumulative

Regarding overall comfort, the mouse ranks at the top of the evaluation. HoloLens, with a score of 3.5, is evaluated as the most uncomfortable among the tested devices. The results regarding learnability show HoloLens ranking last in this category, meaning that most test persons considered it the hardest to learn how to correctly interact with the device.
Efficiency is evaluated below average on HoloLens with a score of 4.0. While the mouse is evaluated as the most efficient input device with a score of 8.4, Moverio ranks above the average. Regarding precision, all AR devices are evaluated as less precise than the mouse. Among the AR devices, HoloLens ranks the lowest.

The level of frustration is comparable to the previous scores. The test persons evaluate HoloLens with a score of 4.7, which ranks in a neutral region of frustration. All other devices seem less frustrating to use. The scores regarding the mental and physical demand required to operate the devices show again that the mouse ranks the highest. With a score of 3.4, HoloLens has the lowest ranking in physical demand, which means that the test persons judged it to require the most effort to operate properly.

Table 6. Mean subjective rating overview, samples N = 27

Question              Hololens  Moverio  Mouse
Overall Comfort       3.5       6.2      8.5
Learning              5.4       7.6      8.7
Efficiency            4.0       6.8      8.4
Precision             4.2       6.0      8.4
Frustrating           4.7       5.8      8.5
Mentally Demanding    5.4       6.6      8.7
Physically Demanding  3.4       6.7      8.7

4.5 Future research

The present research shows a trend in the performance of the tested AR input devices. Future assessments should increase the number of samples in order to gain higher significance regarding operation and subjective results. The subjective evaluation was conducted after the practical assessment. Future research should include an additional questionnaire to measure the expectations of the users before the assessment. This would allow drawing additional conclusions about expected versus actual performance.

5 Conclusion

The present research has proposed a method for objective and subjective performance evaluation using an assessment server in combination with user surveys.
The present research indicates that the performance and satisfaction of contemporary AR devices are far from satisfactory. Although the technology makes big steps forward, the assessment metrics indicate a need for further improved human input devices. New input methods, such as gestures or touch devices, are emerging in AR; however, most of them rank far behind traditional input methods such as the mouse. This research shows that the performance of human input interfaces in AR still has large room for improvement in overall performance, satisfaction and user comfort.

6 Acknowledgement

The present research has been conducted under the Research Grant of Kwangwoon University in 2018.

7 References

[1] Francesca Bonetti, Gary Warnaby, and Lee Quinn. Augmented reality and virtual reality in physical and online retailing: A review, synthesis and research agenda. In Augmented Reality and Virtual Reality, pages 119–132. Springer, 2018. https://doi.org/10.1007/978-3-319-64027-3_9
[2] Valerie J Shute and Fengfeng Ke. Games, learning, and assessment. In Assessment in game-based learning, pages 43–58. Springer, 2012. https://doi.org/10.1007/978-1-4614-3546-4_4
[3] Markus Ebner, Herbert Mühlburger, and Martin Ebner. Google glass in face-to-face lectures: prototype and first experiences. International Journal of Interactive Mobile Technologies (iJIM), 10(1):27–34, 2016. https://doi.org/10.3991/ijim.v10i1.4834
[4] Ivan Poupyrev, Desney S Tan, Mark Billinghurst, Hirokazu Kato, Holger Regenbrecht, and Nobuji Tetsutani. Developing a generic augmented-reality interface. Computer, 35(3):44–50, 2002. https://doi.org/10.1109/2.989929
[5] Ivan Poupyrev, Desney S Tan, Mark Billinghurst, Hirokazu Kato, Holger Regenbrecht, and Nobuji Tetsutani. Tiles: A mixed reality authoring interface. In Interact, volume 1, pages 334–341, 2001.
[6] Zsolt Szalavári and Michael Gervautz.
The personal interaction panel: a two-handed interface for augmented reality. In Computer Graphics Forum, volume 16. Wiley Online Library, 1997.
[7] Paulo Menezes. An augmented reality u-academy module: From basic principles to connected subjects. International Journal of Interactive Mobile Technologies (iJIM), 11(5):105–117, 2017. https://doi.org/10.3991/ijim.v11i5.7074
[8] Poonpong Boonbrahm, Charlee Kaewrat, Presert Pengkaew, Salin Boonbrahm, and Vincent Meni. Study of the hand anatomy using real hand and augmented reality. International Journal of Interactive Mobile Technologies (iJIM), 12(7):181–190, 2018. https://doi.org/10.3991/ijim.v12i7.9645
[9] Cheng-Yao Wang, Wei-Chen Chu, Po-Tsung Chiu, Min-Chieh Hsiu, Yih-Harn Chiang, and Mike Y. Chen. PalmType. Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services - MobileHCI '15, 2015. https://doi.org/10.1145/2785830.2785886
[10] Niloofar Dezfuli, Mohammadreza Khalilbeigi, Jochen Huber, Murat Özkorkmaz, and Max Mühlhäuser. PalmRC: leveraging the palm surface as an imaginary eyes-free television remote control. Behaviour & Information Technology, 33(8):829–843, 2014. https://doi.org/10.1080/0144929X.2013.810781
[11] Joanna Bergstrom-Lehtovirta, Sebastian Boring, and Kasper Hornbæk. Placing and recalling virtual items on the skin. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 1497–1507. ACM, 2017. https://doi.org/10.1145/3025453.3026030
[12] I Scott MacKenzie. KSPC (keystrokes per character) as a characteristic of text entry techniques. In International Conference on Mobile Human-Computer Interaction, pages 195–210. Springer, 2002.
[13] Cédric Bach and Dominique L Scapin. Obstacles and perspectives for evaluating mixed reality systems usability. In Acte du Workshop MIXER, IUI-CADUI, volume 4. Citeseer, 2004.
[14] Elaine Allen and Christopher Seaman. Likert scales and data analyses.
Quality Progress, 40(7):64, 2007.
[15] Ron Garland. The mid-point on a rating scale: Is it desirable? Marketing Bulletin, 2(1):66–70, 1991.

8 Authors

Alaric Hamacher is professor for 3D Contents and Virtual Reality in the Graduate School of Smart Convergence, Kwangwoon University. He graduated in directing and producing from the Academy for Television and Cinema Munich and holds an MA in Film Sciences from Paris VII. He directed stereo 3D on many 3D commercials and corporate movies. His present research focuses on Augmented and Virtual Reality and 360VR.

Roland Attila Csizmazia is professor for statistical programming and office automation at Kwangwoon University. Currently, he is working on his dissertation at Korea University in Industrial Management Engineering.

Jahanzeb Hafeez currently works at the Graduate School of Smart Convergence, Kwangwoon University. He does research in close-range photogrammetry, structure-from-motion, engineering and medicine, computer graphics and algorithms.

Taeg Keun Whangbo received the M.S. degree from City University of New York in 1988 and the Ph.D. degree from Stevens Institute of Technology in 1995, both in Computer Science. Currently, he is a professor in the Department of Computer Science, Gachon University. He was also a researcher at Samsung Electronics from 2005 to 2007. His research areas include Computer Graphics, Deep Learning and AR/VR.

Article submitted 2019-01-29. Resubmitted 2019-02-24. Final acceptance 2019-02-24. Final version published as submitted by the authors.