A 3D head pointer: a manipulation method that enables the spatial position and posture for supernumerary robotic limbs

ACTA IMEKO, ISSN: 2221-870X, September 2021, Volume 10, Number 3, pp. 81-90

Joi Oh1, Fumihiro Kato2, Yukiko Iwasaki1, Hiroyasu Iwata3

1 Waseda University, Graduate School of Creative Science and Engineering, Tokyo, Japan
2 Waseda University, Global Robot Academic Institute, Tokyo, Japan
3 Waseda University, Faculty of Science and Engineering, Tokyo, Japan

Section: RESEARCH PAPER

Keywords: VR/AR; hands-free interface; polar coordinate system; teleoperation; SRL

Citation: Joi Oh, Fumihiro Kato, Yukiko Iwasaki, Hiroyasu Iwata, A 3D head pointer: a manipulation method that enables the spatial position and posture for supernumerary robotic limbs, Acta IMEKO, vol. 10, no. 3, article 13, September 2021, identifier: IMEKO-ACTA-10 (2021)-03-13

Editor: Bálint Kiss, Budapest University of Technology and Economics, Hungary

Received March 31, 2021; In final form September 6, 2021; Published September 2021

Copyright: This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Corresponding author: Joi Oh, e-mail: joy-oh0924@akane.waseda.jp

ABSTRACT

This paper introduces a novel interface, the '3D head pointer', for the operation of a wearable robotic arm in 3D space. The developed system is intended to assist its user in the execution of routine tasks while operating a robotic arm. Previous studies have demonstrated the difficulty a user faces in simultaneously controlling a robotic arm and their own hands. The proposed method combines a head-based pointing device and voice recognition to manipulate the position and orientation as well as to switch between these two modes. In a virtual reality environment, the position instructions of the proposed system and its usefulness were evaluated by measuring the accuracy of the instructions and the time required, using a fully immersive head-mounted display (HMD). In addition, the entire system, including posture instructions with two switching methods (voice recognition and head gestures), was evaluated using an optical transparent HMD. The obtained results showed an accuracy of 1.25 cm and 3.56° with a time span of approximately 20 s necessary for communicating an instruction. These results demonstrate that voice recognition is a more effective switching method than head gestures.

1. INTRODUCTION

In recent years, there has been a considerable amount of research and development on the use of supernumerary robotic limbs (SRLs) for 'body augmentation'. In previous studies, robotic technology, especially wearable robots, has been developed for use as prostheses for rehabilitation purposes. An SRL, by contrast, aims to provide its users with additional capabilities, enabling them to accomplish tasks that they would otherwise be incapable of performing. In this respect, an SRL differs from other existing wearable robots; the lightweight, high-torque and highly manoeuvrable SRL developed by Veronneau et al. [1] is a classic example. These robots can be used in any context, from helping individuals perform household chores to improving industrial productivity.

To effectively assist in routine tasks (e.g., opening an umbrella or stirring a pot), users require an interface that indicates the target point location to the end effector of the SRL without requiring them to interrupt their own actions. However, such a method has not yet been established. Parietti et al. [2],[3] developed a manipulation technique in which the operator's movements were monitored by a robot, following which the robotic arm performed the corresponding movements. Iwasaki et al. [4] proposed an interface that allowed the operator to actively control the SRL using the orientation of the face, while Sasaki et al. [5] developed a manipulation method that enabled more complicated operations of the robotic arm with the user's feet as the controllers. Previous studies have overlooked the balance between ensuring that the operator's limbs move freely and providing detailed instructions to the SRL, and there are further challenges with respect to multitasking in the context of daily life. Therefore, in this study, a method for manipulating SRLs so that two parallel tasks do not interfere with each other is proposed and then evaluated for its usefulness. In the present study, a two-stage experiment was conducted.
This section describes the hypothesis of the method, and Section 2 presents the method for position instruction along with the experimental results. In Section 3, a manipulation method that includes posture instructions is proposed and the experimental results are presented, and the two experiments are then discussed. Section 4 presents comparisons with other similar methods and discusses the limitations, and finally, Section 5 presents the conclusions.

The following two elements are considered essential for achieving daily support for parallel tasks: 1) undisturbed movement of the operator's limbs and 2) an indication of spatial position and posture. To date, several hands-free interfaces have been proposed to satisfy requirement 1), with some operated by the tongue [6], eye movement [7] or voice [8] and used for screen control, robot manipulation or both. Methods to control robotic limbs with brain waves [9] are also being investigated. This study, however, focuses on requirement 2) and the construction of a more intuitive instructional method. When the operator provides directions related to a location in 3D space, they must accurately indicate the target point. The field of view within which a person can perceive the shape and position of an object is as narrow as 15° from the gazing point [10]; hence, to compensate, it is necessary to direct the face and gaze towards the instructional space when providing spatial position instructions. The interface proposed in this study takes advantage of this compensatory action and uses it as an instruction method.

Methods for using the head as a joystick have already been proposed. One method uses head motion for instruction in a 2D plane, such as on-screen operations [11]. Another switches between the vertical and horizontal planes by nodding towards the plane to be manipulated, supplementing the planar manipulation by the head so that the 3D space is managed with the head alone [12]. However, these methods do not use the compensatory head motion as a manipulation technique.

2. PROPOSAL FOR A POSITIONING METHOD USING HEAD BOBBING

Turning one's head can be used to instruct the radial direction of the target point in polar coordinates.
In this section, we propose a pointing interface that combines head bobbing with head orientation in a polar coordinate system. Head bobbing is a small back-and-forth motion of the head that does not interfere with the operator's movements. This research was based on the standard morphology of a Japanese man, as recorded by Kouchi et al. [13]. According to these data, the head-bobbing range was determined to be approximately 9.29 cm, which allows the operator to keep the zero-moment point within the torso and operate a robotic arm without losing balance. A doughnut-shaped area around the operator with an innermost radius of 30 cm and an outermost radius of 100 cm was defined as an example of an SRL operating range [14]. Mapping this 70-cm depth range onto the 9.29-cm head-bobbing range therefore requires a depth-change factor of at least 70 / 9.29 ≈ 7.53.

The range of motion available through head bobbing is considerably smaller than that of the arms. Preliminary experiments demonstrated that at such high magnification, the instructional accuracy of head bobbing was lower than that of other comparable methods and the instruction time was longer. An increase/decrease factor (IDF) that gradually changes the depth gain of head bobbing based on head velocity was therefore introduced. The IDF allows precise instructions while maintaining a high magnification. In this study, the IDF was constructed based on the mouse-cursor change factor set by Microsoft Windows [15], shown in Figure 1.

Figure 1. Microsoft's mouse-cursor speed-change settings [15].
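To make the mapping concrete, the following minimal sketch shows how a pointer position could be derived from head orientation (radial direction) and head bobbing (depth) with a velocity-dependent IDF gain. The gain curve is a hypothetical stand-in for the Windows pointer-ballistics table [15]; the break points (0.05 and 0.30 m/s) and gain ratios are assumptions for illustration, not values from this paper.

```python
import numpy as np

R_MIN, R_MAX = 0.30, 1.00        # SRL operating range in metres [14]
BOB_RANGE = 0.0929               # usable head-bobbing travel in metres [13]
BASE_GAIN = (R_MAX - R_MIN) / BOB_RANGE   # ~7.53, the minimum depth-change factor

def idf_gain(head_speed):
    """Velocity-dependent increase/decrease factor (IDF).

    Slow, careful head motion gets a gain below BASE_GAIN for precision;
    fast motion gets a gain above it for coarse travel. The break points
    and multipliers below are assumed, not taken from the paper.
    """
    return BASE_GAIN * np.interp(head_speed, [0.0, 0.05, 0.30], [0.3, 1.0, 2.0])

def update_pointer(radius, head_dir, bob_velocity, dt):
    """Advance the pointer one control step.

    radius       current pointer depth (m) along the gaze ray
    head_dir     unit vector of the face orientation (read from the HMD)
    bob_velocity signed head-bobbing speed (m/s), + forward / - backward
    """
    radius += idf_gain(abs(bob_velocity)) * bob_velocity * dt
    radius = np.clip(radius, R_MIN, R_MAX)
    return radius, radius * head_dir   # pointer position in head-centred coordinates

# Example: a slow 2 cm/s bob moves the cursor only slightly in depth, while
# the same controller still covers the full 30-100 cm range when moved fast.
r, p = update_pointer(0.65, np.array([0.0, 0.0, 1.0]), 0.02, 1 / 60)
print(r, p)
```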
2.1. Evaluation test with a fully immersive head-mounted display

This section examines the usefulness of the IDF and of the 3D head pointer as a whole. The study was based on the previously developed robotic arm proposed by Nakabayashi et al. [14] and Amano et al. [16], shown in Figure 2. The arm has a reach of up to 1 m, and its jamming hand, shown in Figure 3, can be used as an end effector to grasp an object with a target-point error of up to 3 cm [16]. The allowable indication error for the interface in this experiment was therefore set to 3 cm.

Figure 2. External view of the robotic arm proposed by Nakabayashi et al. [14] and Amano et al. [16].
Figure 3. External view of the jamming hand.

The validation was performed in a virtual reality (VR) environment using an HMD (HTC VIVE [17]). The indication of the radial direction by head orientation was measured from the front of the HMD. The depth indicator was implemented by setting up a sphere centred on the operator, as shown in Figure 4, and by changing the radius of the sphere through head bobbing.

Figure 4. 3D image of the head pointer operation.

The experimental procedure is as follows:
1) The participant wears the VIVE headset and grasps a VIVE controller in each hand, holding them up in front of their chest, as shown on the right in Figure 5. This is defined as the 'rest position'. The participant's avatar is displayed in the VR space, as shown on the left in Figure 5.
2) The 3D head pointer's control cursor (the red ball in the centre of Figure 6) appears 65 cm in front of the eyes. Simultaneously, a target sphere with a 10-cm diameter (the blue transparent sphere in the upper-right corner of Figure 6) appears at one of eight locations, displaced from the cursor by ±30 cm in height, ±20 cm in width and ±20 cm in depth.
3) The participant aligns the cursor with the centre of the target sphere using the 3D head pointer.
4) When the participant perceives that they have reached the centre of the target sphere, they verbalise the completion of the instruction. As shown in Figure 7, the target sphere has a reference frame with its origin at the centre of the sphere, and the participant adjusts the position of the cursor accordingly.
5) Steps 1)-4) are performed for all eight target-sphere positions.

Figure 5. The experimental interface operation. Left: instructional target spheres and the participant within the VR; right: participant wearing the HMD and holding the controllers.
Figure 6. Subjective view of the user's experience.
Figure 7. Target sphere and cursor visibility.

This procedure was performed by two groups of six participants each, once per group under different conditions. Table 1 shows the experimental conditions and group distribution. Group 1 performed the tasks described above with a predefined time limit for instruction execution, while group 2 performed the experiment either with or without the IDF. Figure 8 shows the relationship between head-bobbing speed and magnification; 'IDF not available' is a condition in which the rate of change in depth due to head bobbing is fixed at 10 times, without using the IDF.

Table 1. The experimental conditions and group distribution.

Condition   Requirement                                                Group
(a)         No requirements                                            1, 2
(b)         2-s time limit for instruction                             1
(c)         3-s time limit for instruction                             1
(d)         4-s time limit for instruction                             1
(e)         6-s time limit for instruction                             1
(f)         8-s time limit for instruction                             1
(g)         Rate of change in depth due to head bobbing fixed at 10×   2

Figure 8. Change in head-bobbing magnification with and without IDF.

Based on these experiments, the usefulness of the 3D head pointer was evaluated using the average indication error under condition (a) in Table 1, the relationship between indication error and operation time under conditions (a)-(f) and the maximum arm sway of the participants measured by the VIVE controllers under condition (a). At the same time, the usefulness of the IDF was tested by comparing the instructional error between conditions (a) and (g).

2.2. Results and discussion on the fully immersive HMD

In this study, the Wilcoxon signed-rank test was used to verify significant differences between two conditions. This is a nonparametric test used when the population cannot be assumed to follow a normal distribution. The difference Z_i = Y_i − X_i between the experimental values X_i and Y_i of the two conditions for the i-th participant was obtained. Next, the Z_i were arranged in order of increasing absolute value, and rank R_i was assigned, with the smallest absolute value receiving the smallest rank. The Wilcoxon signed-rank test statistic W was then calculated as

W = Σ_{i=1}^{n} φ_i R_i ,   (1)

where

φ_i = 1 if Z_i > 0, and φ_i = 0 if Z_i < 0 .   (2)

Significance was determined by comparing the test statistic W with the Wilcoxon signed-rank table [18]. In this experiment, instead of the table, the Excel statistics function (Microsoft Inc.) was used to calculate significance.
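As a minimal sketch of equations (1) and (2), the following code computes W for one pair of conditions, assuming no zero differences and no tied ranks, as in the formulation above. The paired values are illustrative only, and the scipy routine at the end is shown as a practical stand-in for the Excel function the paper used; it additionally handles ties, zeros and the p-value.

```python
import numpy as np
from scipy.stats import wilcoxon

def signed_rank_W(x, y):
    """Test statistic W of equations (1)-(2): rank |Z_i| in ascending order
    and sum the ranks of the positive differences. Assumes no zero
    differences and no ties."""
    z = np.asarray(y, float) - np.asarray(x, float)
    ranks = np.argsort(np.argsort(np.abs(z))) + 1   # rank 1 = smallest |Z_i|
    return ranks[z > 0].sum()

# Hypothetical paired measurements for two conditions (illustrative values):
x = [1.2, 2.5, 1.5, 2.2, 2.4, 1.1]
y = [0.9, 1.1, 1.3, 1.0, 1.2, 1.0]
print(signed_rank_W(x, y))   # all differences negative here, so W = 0
print(wilcoxon(x, y))        # library equivalent with the p-value
```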
2.2.1. Indication error

The instructional error, i.e., the distance from the centre of the target sphere to the control cursor, was measured upon completion of the instruction. The measurement was performed in VR using the IDF-based 3D head pointer with 12 participants, divided equally into groups 1 and 2. The results are presented in Table 2.

Table 2. Average instruction error.

Participant   Instructional error (cm)
1             1.20
2             2.50
3             1.54
4             2.19
5             2.41
6             1.06
7             1.06
8             0.882
9             0.757
10            0.905
11            0.695
12            0.668
Average       1.32

In this study, a jamming hand [16] capable of grasping an object with a target-point indication error of up to 3 cm was used as the reference end effector. The average instruction error in this experiment was approximately 1.32 cm, with the highest individual error being 2.50 cm. These results suggest that the indication error of the 3D head pointer is within the range of error that can be absorbed when grasping and manipulating an object with this end effector. The standard deviation of the indication error was 0.65 cm, and the error varied widely from person to person. This result may be related to each individual's familiarity with VR spaces, so the results should be validated with VR experience taken into account.
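As a quick check of the statistics quoted above, the following snippet recomputes the mean and (population) standard deviation from the per-participant values in Table 2.

```python
import numpy as np

# Instruction errors per participant from Table 2 (cm)
errors = np.array([1.20, 2.50, 1.54, 2.19, 2.41, 1.06,
                   1.06, 0.882, 0.757, 0.905, 0.695, 0.668])

print(f"mean = {errors.mean():.2f} cm")   # 1.32 cm, as reported
print(f"std  = {errors.std():.2f} cm")    # 0.65 cm (population SD)
```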
2.2.2. Change in indication error at each indication time

The experiment was conducted under conditions (a)-(f) with the six members of group 1. The relationship between instruction error and instruction time is shown in Figure 9. The average operation time under condition (a), with no time limit, was 6.2 s. When the operation time was limited, the indication error decreased rapidly as the time limit increased from 2 to 3 s, and beyond 4 s the error remained almost constant regardless of the time allowed. This suggests that the operation with the 3D head pointer itself had already been completed within 4 s.

Figure 9. Instruction error per operating time in the evaluation test.

2.2.3. Maximum arm sway

The maximum arm sway of the six participants in group 1, measured from the movement of the VIVE controllers while standing upright, was compared with the maximum arm sway while manipulating the 3D head pointer under condition (a). The results are presented in Figure 10. The maximum arm sway was greater with the 3D head pointer; however, the Wilcoxon signed-rank test did not show a significant difference between the two conditions (N = 6, p < 0.1), suggesting that the proposed method allows a user to continue performing regular arm movements while giving instructions.

Figure 10. Maximum arm sway when standing upright and operating the 3D head pointer.

Because the proposed method requires visibility of the target space for performing tasks with the SRL, multitasking is sometimes impossible, and interruption of the task being performed by the user is unavoidable. However, if the operator's hand position can be maintained while using the 3D head pointer, the interrupted task can be resumed quickly after the instructions are given to the SRL; this is significantly more efficient than performing the two tasks separately.

2.2.4. Differences in indication error with and without IDF

The experiment was conducted under conditions (a) and (g) with the six members of group 2, measuring the instruction errors of the 3D head pointer as a whole and the depth-only instruction errors of head bobbing. The results are shown in Figure 11 and Figure 12, respectively. The use of the IDF reduced the average instruction error by approximately 77.6 % for the depth instruction by head bobbing and by approximately 67.0 % for the total error across the three axes (x, y, z). Moreover, a significant difference was observed between the conditions with and without the IDF in the Wilcoxon signed-rank test (N = 6, p < 0.05). It was therefore confirmed that the introduction of the IDF greatly improves accuracy, demonstrating its usefulness. Nevertheless, it remains to be verified whether the accuracy can be improved further by fine-tuning the parameters of the magnification change ratio.

Figure 11. Depth error based on head bobbing with and without IDF.
Figure 12. Total error in the three axes due to the 3D head pointer with and without IDF.

3. PROPOSAL FOR COMBINING THE POSITION AND POSTURE INDICATION METHOD

The previous section showed the effectiveness of position indications for an SRL. However, without posture instructions, the SRL cannot perform complex routine tasks (e.g., holding an umbrella at an angle against strong winds or pouring the contents of a bottle into a cup), and some objects can only be grasped from certain directions. In this study, a method is proposed that uses the head to provide posture indications to the SRL. Because it is difficult to provide position and posture instructions simultaneously with the head, a 'switching indication' function that switches between position and posture indications was also proposed.

3.1. Proposal for a posture-indication method using isometric input

Figure 13 shows that the human head can rotate about three axes, using Unity-chan (a humanoid model created by Unity Technologies Japan [19]) as the model. Using the head-rotation axes (yaw, pitch and roll) for SRL posture indication facilitates intuitive instructions. However, the head has limited ranges of yaw, pitch and roll: −60° to +60°, −50° to +60° and −50° to +50°, respectively [20]. If the displacement of the head were used directly as the input, the SRL could not be instructed to adopt a posture beyond these angle limits. In addition, according to requirement 2) in Section 1, if the head moves more than 15°, the operation target falls outside the operator's effective field of view.

Figure 13. The three different rotation axes of the head.

In this study, the three-axis rotation of the head was therefore used as an isometric input that determines the rotational velocity of the pointer according to the rotational angle of the head [21]. The maximum input angle of the head was set to 15°, the limit of the effective field of view. To avoid incorrect input, head rotations of ≤ 3° were not registered as input. The changes in rotational velocity were spherically interpolated using trigonometric functions. Figure 14 shows the relationship between the head-rotation angle and the rotation speed of the posture indicator. The reference angle for head rotation is the direction the user is facing at the moment of switching to posture indication.

Figure 14. Relationship between head rotation angle and posture rotation speed.
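The following minimal sketch illustrates this isometric mapping: head rotation within a 3°-15° band drives the cursor's rotational velocity, with a sinusoidal easing standing in for the paper's trigonometric interpolation. The maximum cursor rotation speed (MAX_SPEED) is an assumed value, not one reported in the paper.

```python
import numpy as np

DEADZONE = 3.0     # degrees; smaller head rotations are ignored
MAX_INPUT = 15.0   # degrees; limit of the effective field of view [10]
MAX_SPEED = 30.0   # deg/s of cursor rotation at full deflection (assumption)

def posture_rate(head_angle_deg):
    """Map a head-rotation angle on one axis (relative to the reference
    direction captured when posture mode was entered) to a cursor
    rotation speed in deg/s."""
    a = np.clip(abs(head_angle_deg), 0.0, MAX_INPUT)
    if a <= DEADZONE:
        return 0.0                                   # suppress unintentional input
    t = (a - DEADZONE) / (MAX_INPUT - DEADZONE)      # 0..1 within the band
    speed = MAX_SPEED * np.sin(t * np.pi / 2)        # smooth trigonometric ramp
    return float(np.copysign(speed, head_angle_deg))

# Example: integrate yaw while the head is held 10 degrees to the right
yaw = 0.0
for _ in range(60):                                  # one second at 60 Hz
    yaw += posture_rate(10.0) / 60.0
print(f"cursor yaw after 1 s: {yaw:.1f} deg")
```

Because the input is isometric, the cursor keeps rotating for as long as the head is held away from the reference direction; this property is what makes the timing of the mode switch matter in the experiments below.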
3.2. Proposal for a mode-switching method using voice recognition

An increase in the number of body parts used for manipulation is undesirable because it increases the load on the body; the switching method was therefore constructed using the head or the voice. In this study, two types of switching instruction methods were proposed and then compared in an evaluation test.

3.2.1. Voice-recognition-based switching indication method

A switching method based on voice recognition is less physically demanding and has less impact on the operator's limbs than physical operations. Table 3 lists the commands used for voice indications.

Table 3. Voice command list.

Voice command         Function
'Indicate position'   Switch from posture indication to position indication
'Indicate posture'    Switch from position indication to posture indication
'Finish'              Signals that the indication has been completed (used for the evaluation tests)

3.2.2. Head-gesture-based switching indication method

A method for switching between posture and position instructions using head gestures was also proposed. In this method, a 'head tilt' motion switches from position to posture instructions (top of Figure 15), while a 'head bobbing' motion switches from posture to position instructions (bottom of Figure 15). Because the user only has to indicate the required operation mode, the head-gesture-based switching method imposes little cognitive load, and switching can be done intuitively.

Figure 15. Top: Switch to posture instruction; Bottom: Switch to position instruction.

3.3. Evaluation test with an optical transparent HMD

This section presents an evaluation of the usefulness of the posture and switching instructions in the 3D head pointer, as well as of the 3D head pointer in real space. To operate the SRL as a real machine, the tip of the SRL and the target object must be visible. There are two ways to see the tip of a real SRL: a video transparent HMD or an optical transparent HMD [22]. The video transparent system may not be able to cope when the SRL malfunctions because of the delay in viewing the actual device. In this experiment, the proposed method was therefore implemented on an optical transparent HMD (HoloLens 2 [23]) to evaluate the usefulness of the entire 3D head pointer.

To support posture instructions, the pointing cursor was changed from a red sphere to a blue-green bipyramid, as shown in Figure 16. The indication of the radial direction based on head orientation was measured from the front of the HMD, and the depth indicator was implemented by changing the radius of the sphere through head bobbing, as described in Section 2.1. The amount of head rotation for the posture indication was determined by measuring the posture of the HMD.

Figure 16. Pointer cursor corresponding to posture indication.

Compared with position indication, it is difficult for the operator to gauge the amount of input they are providing during posture indication. To display the user's head rotation visually, a user interface (UI) is shown during posture instruction, as in Figure 17. A white point on the UI starts at the centre and moves up, down, left and right according to the amount of yaw and pitch input, while the roll-angle input is displayed as a white circle that rotates according to the amount of roll input. This UI allows the operator to see at a glance how much input they are providing. For speech recognition, Microsoft's Mixed Reality Toolkit was used [24].

Figure 17. Auxiliary user interface for posture instruction.
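The sketch below shows how the two indication modes and the switching commands of Table 3 could fit together. It is illustrative only: `on_speech` would in practice be wired to the HMD's speech recogniser (the paper uses Microsoft's Mixed Reality Toolkit [24]), and the gesture strings are placeholders for the tilt/bobbing detectors of Section 3.2.2.

```python
from enum import Enum

class Mode(Enum):
    POSITION = "position"
    POSTURE = "posture"

class HeadPointerModes:
    """Mode-switching logic around the Table 3 voice commands and the
    head-gesture alternative (sketch under assumed recogniser callbacks)."""

    def __init__(self):
        self.mode = Mode.POSITION    # assume the interface starts in position mode
        self.finished = False

    def on_speech(self, command: str):
        """Handle a recognised voice command from Table 3."""
        command = command.strip().lower()
        if command == "indicate position":
            self.mode = Mode.POSITION
        elif command == "indicate posture":
            self.mode = Mode.POSTURE
        elif command == "finish":
            self.finished = True     # used to timestamp task completion

    def on_head_gesture(self, gesture: str):
        """Head-gesture switching: tilt enters posture mode, bobbing
        returns to position mode."""
        if gesture == "tilt":
            self.mode = Mode.POSTURE
        elif gesture == "bob":
            self.mode = Mode.POSITION

ctrl = HeadPointerModes()
ctrl.on_speech("Indicate posture")
print(ctrl.mode)          # Mode.POSTURE
ctrl.on_head_gesture("bob")
print(ctrl.mode)          # Mode.POSITION
```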
In this experiment, a pointing task was set up in which a target appears in the air. The experimental procedure was as follows:
1) The participant stood upright in a room with white walls while wearing the HMD and a Bluetooth headset.
2) The 3D head pointer cursor (the blue-green bipyramid in Figure 18) and the target (the purple bipyramid in Figure 18) were displayed in front of the participant. The target appeared at a random position within 15° to the left and right of the participant's direction of gaze and at a depth of between 30 and 100 cm, as shown in Figure 19 (a sketch of this placement logic is given at the end of this subsection). The orientation of the target was chosen randomly from six possible directions: up, down, left, right, front and back.
3) The participant moved the cursor to the same position and posture as the target using the 3D head pointer. When the participant perceived that the operation had been completed, they said 'instruction complete' into the Bluetooth headset. Markers were displayed at the centre of the cursor and at the target position and rotation, as shown in Figure 18. These markers were always visible to the participant regardless of the position and posture of the cursor and target, and the operator relied on them for the position and posture indications.
4) Steps 1)-3) were performed 12 times in succession in one experiment.

Figure 18. Cursor and target in the experiment.
Figure 19. Area where the target appears (blue area in the figure).

The evaluation experiment was conducted under two conditions: A) switching indications by voice recognition and B) switching indications by head gesture. A verbal questionnaire was administered after the operation was complete. The experiment was conducted with a total of six men and six women in their 20s and 30s, with the order of conditions A) and B) randomised. Procedures 1)-4) were performed at least once as a practice run before the experiment, and additional practice was conducted until the participant judged that they were proficient.

Based on the above experiments, the usefulness of the posture indication was verified according to the posture error and operation time. The usefulness of the switching instruction was verified by comparing the position error, posture error and operation time under each condition. Finally, the usefulness of the 3D head pointer as a whole was verified based on the position error, posture error and operation time. Section 3.4 describes these results.
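As a concrete illustration of the target placement in step 2), the following sketch draws a random position within ±15° of the gaze direction at a depth of 30-100 cm and one of the six axis-aligned orientations. The vertical placement rule is not specified in the paper, so the elevation range used here is an assumption.

```python
import random
import math

DIRECTIONS = ["up", "down", "left", "right", "front", "back"]

def spawn_target():
    """Generate one target pose in head-centred coordinates
    (x right, y up, z along the gaze direction)."""
    azimuth = math.radians(random.uniform(-15.0, 15.0))    # left/right of gaze
    elevation = math.radians(random.uniform(-15.0, 15.0))  # assumed range
    depth = random.uniform(0.30, 1.00)                     # metres
    x = depth * math.cos(elevation) * math.sin(azimuth)
    y = depth * math.sin(elevation)
    z = depth * math.cos(elevation) * math.cos(azimuth)
    orientation = random.choice(DIRECTIONS)                # one of six directions
    return (x, y, z), orientation

for _ in range(3):
    print(spawn_target())
```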
3.4. Results and discussion on the optical transparent HMD

3.4.1. Position error and posture error

The average values of the position and angle errors under each condition for the six participants are shown in Figure 20. In this experiment, the tolerance was set assuming the same use of the SRL as in the experiment discussed in Section 2.2.1, so the tolerance of the position indication was 3 cm. For the jamming hand of the SRL, when reaching vertically towards a cylindrical or spherical object, the success rate for grasping does not decrease as long as the angular error is within 30° [16].

Figure 20. Top: Error in position indication; Bottom: Error in posture indication.

The average position error in this experiment was approximately 1.25 cm for the voice switching method and approximately 2.82 cm for the head-gesture switching method, and a significant difference was observed between the two conditions in the Wilcoxon signed-rank test. This demonstrates that the voice-recognition method is more accurate for indicating position. Since the error for position instruction alone in Section 2.2.1 was 1.32 cm, this result shows that the head-gesture switching method has a negative effect on the accuracy of the position instruction. The increased error with head gestures can be attributed to a shift in the position indication: when the head is tilted to switch from position to posture instructions, the direction of the face moves accordingly. In addition, several participants commented in the questionnaire that it was difficult to tilt the head without changing the direction of the face while switching by head gesture.

The average error for the posture instruction was approximately 3.56° for the voice switching method and approximately 1.78° for the head-gesture switching method, and a significant difference was observed between the two conditions in the Wilcoxon signed-rank test. This shows that the accuracy of posture indication is higher when using head gestures. This can be attributed to the fact that posture instruction is an isometric input: as long as the head is rotated away from the origin, the posture of the cursor continues to rotate. With head gestures, the operator can rapidly switch back to position instructions, so the cursor posture can be fixed at the moment the continuously rotating cursor reaches the target posture. In the voice-based switching method, there is a delay between the time the voice command is uttered and the time it is recognised as a command; the cursor may therefore continue to rotate after the user wants to switch, and the delayed switch to position instruction results in a posture error.

These results show that voice-based switching is effective for position indication, while head-gesture-based switching is effective for posture indication. Furthermore, although the posture error increases when switching by voice, even the participant with the largest error had an average error of 5.56°, which is within the acceptable range of 30°. In contrast, the participant with the largest error under head-gesture-based switching had an average position error of 6.74 cm, far beyond the acceptable position error. It can thus be concluded that the voice-based switching method is more useful in terms of instructional accuracy, as all of its values are within the acceptable error range for the SRL assumed in this experiment.

3.4.2. Operation time

The mean values of the operation time under each condition for the six participants are shown in Figure 21. The average operating time was approximately 20.3 s for the voice switching method and approximately 20.8 s for the head-gesture switching method, with no significant difference between the two conditions in the Wilcoxon signed-rank test. This indicates that the two switching methods do not differ substantially in operation time; combined with the results on instructional accuracy, this suggests that voice switching is more practical. Moreover, the average operation time for position instructions alone, as discussed in Section 2.2.2, was 6.2 s; in this experiment, the operation time was roughly three times longer owing to the addition of posture and switching indications.
In addition, the participant with the longest average operation time took roughly three times as long as the participant with the shortest. When asked in the questionnaire about the causes of the increased operation time, some participants explained that the operation took longer when the posture indication did not go well. The causes of delay in posture indication were as follows: 1) when giving posture instructions, an incorrect rotation was sometimes fed as input by mistake; 2) compared with position instructions, it was difficult to correct errors once they occurred; 3) it was difficult to perceive the posture of the cursor or target during rotation instructions.

Regarding cause 1), posture manipulation by intentionally rotating the neck about three axes is not a movement performed in daily life. Regarding cause 2), correcting an error took a long time because the error had to be corrected by indicating a displacement in the posture indication; this contrasts with position indication, where the correct position can be specified directly when an error occurs. Cause 3) relates to depth and size perception in peripheral vision. The permissible eccentricity for recognising the position and shape of an object in peripheral vision is 15° [10], but the perceptible eccentricity for depth is less than 12.5° and that for size is less than 5° [25], and the accuracy of both depth and size perception decreases with eccentricity from the gazing point. Because posture indication requires recognising the posture of an object from changes in the size and depth of each face of the cursor or target, it demands more visual information than position indication. These factors made it difficult to recognise the posture of the object when the face was turned away by up to 15° during posture manipulation.

Figure 21. The mean values of the operation time.

3.4.3. Evaluation of the usefulness of the 3D head pointer as a whole

With the voice switching method, the error in both position and posture indications was within the acceptable range, suggesting that the accuracy of the 3D head pointer is also sufficient for indications in real space through an optical transparent HMD. In terms of operation time, there was large variation and the indication time was not stable, indicating room for improvement. Improving the posture instruction, the most significant factor in the increased operation time, is considered the most effective step, and from the results of the questionnaire the improvements to be made are as follows: 1) construct the manipulation method using routine head movements, 2) use isotonic input, 3) avoid moving the operator's gaze point.

Of these, 1) and 2) can be addressed by using face orientation for posture indication, although there remains the problem of how to provide posture instructions that rotate beyond the head's movable angle limit. Regarding 3), when the operator moves their gaze away from the cursor and target object in the posture indication state, usability can be improved by continuing to display the target object and cursor in front of the operator in augmented reality (AR). However, displaying real objects in AR in real time is a demanding process for AR devices.
To display AR in real time, it is necessary to reduce the required processing power, for example by detecting the mesh of objects in real space and rendering only that mesh.

4. DISCUSSION ON THE PRACTICAL APPLICATION OF A 3D HEAD POINTER

In this section, the practical application of the proposed method is discussed. The advantages of the 3D head pointer can be clarified by comparing it with other manipulation methods; following the comparison, concerns about using this interface in real life are discussed.

4.1. Comparison with other similar methods

Based on the results of the previous section, the proposed method was compared with other similar methods.

a. Physical controller
Some SRLs, such as that of Veronneau et al. [1], use a physical controller similar to a gamepad, with an analogue stick and buttons. The advantage of the 3D head pointer is that its operation is more intuitive and easier to understand than that of a physical controller, and it can be operated hands free.

b. SRL manipulation using the feet
The proposed method can operate the SRL from any standing or seated position, unlike methods operated by the feet [5]. However, manipulation with the feet can indicate the position and attitude of the SRL simultaneously, and a short operation time is the main advantage of foot operation.

c. Head joystick with nodding to switch between the vertical and horizontal planes
Because the 3D head pointer uses the compensatory motion of the head, it imposes a lower operational burden than methods that use the head as a joystick [11],[12]. In contrast, the nodding method [12] allows digital input from the head alone and could be used in conjunction with the 3D head pointer.

4.2. Limitations

In this study, voice recognition was used to give instructions such as switching, but voice recognition has the disadvantage of not working in a noisy environment or while the operator is having a conversation. Some prior examples of command-type instruction use gaze to provide instructions [26],[27]; combining pointing instructions by the head with gaze-based instructions could provide a more flexible environment for SRL indications.

If there is a need to use an SRL for complex or long movements in daily life, the movements must be registered and played back. Registering and replaying behaviours requires many commands, but the number of command-type instructions that can be intuitively memorised and selected is as few as six [28]. When building a system with seven or more commands, it is necessary to devise a way to help the operator remember commands, such as displaying a menu screen in the HMD.

5. CONCLUSIONS

In this study, a spatial position and posture indication interface for SRLs was proposed to improve functional efficiency in the execution of routine tasks. The functions required for indicating spatial position and posture were described, and a position indication method, the 3D head pointer, was proposed, which combines head-bobbing depth indication with polar direction indication by face orientation. Evaluation tests of the 3D head pointer and the IDF were conducted in a VR environment. The results showed that the 3D head pointer had sufficient accuracy without requiring the operator to interrupt their actions.
In addition, to provide not only position but also posture guidance with the 3D head pointer, a posture guidance method using head rotation as isometric input and two switching methods using voice recognition and head gestures were proposed. A comparative study of the two switching methods using an optical transparent HMD and a test evaluating the usefulness of the 3D head pointer as a whole were then conducted. The results showed that the switching method based on voice recognition was effective for the assumed SRL, and it was confirmed that the 3D head pointer was sufficiently accurate for operating robotic arms through an optical transparent HMD. These results provide useful knowledge for improving SRL interfaces.

In the future, an intuitive posture instruction method will be developed that is not affected by compensatory head movements and that incorporates a command instruction method to replace voice recognition. In addition, the SRL will be considered as a third arm for situations, such as banquets and construction sites, where an individual's own hands are not sufficient.

ACKNOWLEDGEMENT

This research is supported by the Waseda University Global Robot Academic Institute, the Waseda University Green Computing Systems Research Organization and by JST ERATO Grant Number JPMJER1701, Japan.

REFERENCES

[1] C. Veronneau, J. Denis, L. Lebel, M. Denninger, V. Blanchard, A. Girard, J. Plante, Multifunctional 3-DOF wearable supernumerary robotic arm based on magnetorheological clutches, IEEE Robotics and Automation Letters 5 (2020), pp. 2546-2553. DOI: 10.1109/LRA.2020.2967327
[2] C. Davenport, F. Parietti, H. H. Asada, Design and biomechanical analysis of supernumerary robotic limbs, Proc. of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Fort Lauderdale, Florida, United States, 17-19 October 2012, pp. 787-793. DOI: 10.1115/DSCC2012-MOVIC2012-8790
[3] H. H. Asada, F. Parietti, Supernumerary robotic limbs for aircraft fuselage assembly: body stabilization and guidance by bracing, Proc. of the IEEE International Conference on Robotics and Automation, Hong Kong, China, 2014, pp. 119-125. DOI: 10.1109/ICRA.2014.6907002
[4] Y. Iwasaki, H. Iwata, A face vector - the point instruction-type interface for manipulation of an extended body in dual-task situations, Proc. of the IEEE International Conference on Cyborg and Bionic Systems, Shenzhen, China, 25-27 October 2018, pp. 662-666. DOI: 10.1109/CBS.2018.8612275
[5] T. Sasaki, M. Saraiji, K. Minamizawa, M. Inami, MetaArms: body remapping using feet-controlled artificial arms, Proc. of the 31st Annual ACM Symposium on User Interface Software and Technology, New York, United States, 14 October 2018, pp. 65-74. DOI: 10.1145/3242587.3242665
[6] S. G. Terashima, J. Sakai, T. Ohira, H. Murakami, E. Satho, C. Matsuzawa, S. Sasaki, K. Ueki, Development of a tongue operative joystick - for proposal of development of an integrated tongue operation assistive system (I-to-AS) for seriously disabled people, The Society of Life Support Engineering 24 (2012), pp. 201-207. DOI: 10.5136/lifesupport.24.201
[7] R. Barea, L. Boquete, M. Mazo, E. Lopez, System for assisted mobility using eye movements based on electrooculography, IEEE Transactions on Neural Systems and Rehabilitation Engineering 10 (2002), pp. 209-218. DOI: 10.1109/TNSRE.2002.806829
[8] R. C. Simpson, S. P. Levine, Voice control of a powered wheelchair, IEEE Transactions on Neural Systems and Rehabilitation Engineering 10 (2002), pp. 122-125. DOI: 10.1109/TNSRE.2002.1031981
[9] S. Nishio, C. I. Penaloza, BMI control of a third arm for multitasking, Science Robotics 3 (2018) 20. DOI: 10.1126/scirobotics.aat1228
[10] T. Miura, Behavioral and Visual Attention, Kazama Shobo, Chiyoda, Japan, 1996, ISBN 978-4-7599-1936-3.
[11] R. Hasegawa, Device for input via head motions, Patent WO 2010/110411 A1, Japan, 30 September 2010.
[12] A. Jackowski, M. Gebhard, A. Gräser, A novel head gesture based interface for hands-free control of a robot, Proc. of the IEEE International Symposium on Medical Measurements and Applications, Benevento, Italy, 15-18 May 2016, pp. 1-6. DOI: 10.1109/MeMeA.2016.7533744
[13] M. Kouchi, M. Mochimaru, AIST Anthropometric Database, National Institute of Advanced Industrial Science and Technology, Japan, January 2005. Online [Accessed 4 September 2021] https://www.airc.aist.go.jp/dhrt/91-92/fig/91-92_anthrop_manual.pdf
[14] L. Drohne, K. Nakabayashi, Y. Iwasaki, H. Iwata, Design consideration for arm mechanics and attachment positions of a wearable robot arm, Proc. of the IEEE/SICE International Symposium on System Integration, Paris, France, 14-16 January 2019, pp. 645-650. DOI: 10.1109/SII.2019.8700355
[15] Windows Dev Center - Hardware, Pointer Ballistics for Windows XP, 2002. Online [Accessed 4 September 2021] http://archive.is/20120907165307/msdn.microsoft.com/en-us/windows/hardware/gg463319.aspx#selection-165.0-165.33
[16] K. Amano, Y. Iwasaki, K. Nakabayashi, H. Iwata, Development of a three-fingered jamming gripper for corresponding to the position error and shape difference, Proc. of the IEEE International Conference on Soft Robotics (RoboSoft), Seoul, Korea (South), 14-18 April 2019, pp. 137-142. DOI: 10.1109/ROBOSOFT.2019.8722768
[17] HTC VIVE, 2011. Online [Accessed 4 September 2021] https://www.vive.com/eu/product/vive/
[18] C. Zaiontz, Wilcoxon Signed-Ranks Table, 2020. Online [Accessed 4 September 2021] http://www.real-statistics.com/statistics-tables/wilcoxon-signed-ranks-table/
[19] Unity Technologies Japan/UCL, Unity-chan!, 2014. Online [Accessed 4 September 2021] https://unity-chan.com/
[20] Committee on Physical Disability, Japanese Orthopaedic Association, Joint range of motion display and measurement methods, Japanese Journal of Rehabilitation Medicine 11 (1974), pp. 127-132. DOI: 10.2490/jjrm1963.11.127
[21] S. A. Douglas, A. K. Mithal, The Ergonomics of Computer Pointing Devices, Springer, London, 1997.
[22] J. P. Rolland, R. L. Holloway, H. Fuchs, Comparison of optical and video see-through, head-mounted displays, Proc. of The International Society for Optical Engineering, 21 December 1995, pp. 292-307. DOI: 10.1117/12.197322
[23] Microsoft HoloLens 2. Online [Accessed 4 September 2021] https://www.microsoft.com/en-us/hololens/buy
[24] Mixed Reality Toolkit. Online [Accessed 4 September 2021] https://hololabinc.github.io/MixedRealityToolkit-Unity/README.html
[25] A. Yasuoka, M. Okura, Binocular depth and size perception in the peripheral field, Journal of the Vision Society of Japan 23 (2011), pp. 103-114. DOI: 10.24636/vision.23.2_103
[26] M. Yamato, A. Monden, Y. Takada, K. Matsumoto, K. Tori, Scrolling the text windows by looking, Transactions of the Information Processing Society of Japan 40 (1999), pp. 613-622. Online [Accessed 4 September 2021] https://ipsj.ixsq.nii.ac.jp/ej/?action=pages_view_main&active_action=repository_view_main_item_detail&item_id=12841&item_no=1&page_id=13&block_id=8
[27] T. Ohno, Quick menu selection task with eye mark, Transactions of the Information Processing Society of Japan 40 (1999), pp. 602-612. Online [Accessed 4 September 2021] https://ipsj.ixsq.nii.ac.jp/ej/?action=pages_view_main&active_action=repository_view_main_item_detail&item_id=12840&item_no=1&page_id=13&block_id=8
[28] Y. Iwasaki, H. Iwata, Research on a third arm: analysis of the cognitive load required to match the on-board movement functions, Poster presented at: The Japanese Society for Wellbeing Science and Assistive Technology, 6-8 September 2018, Tokyo, Japan, Session No. 2-4-1-2.