INT J COMPUT COMMUN, ISSN 1841-9836
Vol.7 (2012), No. 4 (November), pp. 721-732

Specification and Validation of a Formative Index to Evaluate the Ergonomic Quality of an AR-based Educational Platform

C. Pribeanu

Costin Pribeanu
National Institute for Research and Development in Informatics - ICI Bucharest
Romania, 011455 Bucuresti, Bd. Maresal Averescu, 8-10
E-mail: pribeanu@ici.ro

Abstract: The ergonomic quality of educational systems is a key feature influencing both the usefulness and motivation for the learner. Desktop Augmented Reality (AR) systems feature specific interaction techniques that may create additional usability issues affecting the perceived ease of use. Measuring key usability aspects and understanding the causal relationships between them is a challenge that requires the specification and validation of formative measurement models. In this paper we present an evaluation instrument based on two main formative indexes that capture specific usability measures for two AR-based applications. The formative indexes form a second order formative construct that acts as a predictor for both the general ease of use and the ease of learning how to operate the application.

Keywords: formative measurement model, formative index, augmented reality, usability, ergonomic quality.

1 Introduction

Educational systems based on desktop AR technologies create an appealing user experience for the learner by integrating real life objects into computer environments. Touching and holding real objects increases the students’ motivation to learn and could better support active and collaborative learning [18], [22]. As AR technologies become more widespread, there is an increasing interest in their ergonomic quality. Designing for usability is not easy in emerging technologies, like AR systems, which feature novel interaction techniques [5], [15].
The ISO standard 9126-1 defined usability as the capability of a software system to be easy to understand, easy to learn how to operate, easy to operate, and attractive, when used under specified conditions [19]. By ergonomic quality we refer to the first three usability aspects: ease of understanding, ease of learning how to operate, and ease of operating a software system.

How to measure and improve the usability of interactive systems is a key research topic in HCI. A research challenge is to better understand the relationships between different usability measures as well as between usability and other factors of interest [17].

In a previous work we developed a measurement model grounded in the technology acceptance model (TAM) theory [9] in order to explain the causal relations between various factors influencing the intention to use an AR-based educational platform [3], [4]. Although the structural model was useful for testing some typical TAM hypotheses, the variance explained was small and several items targeting specific usability aspects were eliminated in order to achieve the unidimensionality required by a reflective measurement model. Moreover, reflective measurement assumes the same antecedents for all reflective indicators (as manifest variables), so causal relations are estimated at construct level [8], [24]. These shortcomings suggest looking for an alternative modeling approach.

In this paper we present a measurement model for the evaluation of the ergonomic quality of applications developed on an AR-based educational platform. The Augmented Reality Teaching Platform (ARTP) was developed in the framework of the ARiSE (Augmented Reality for School Environments) European project. Two AR applications implementing learning scenarios for Biology and Chemistry were developed and tested on ARTP.

Copyright © 2006-2012 by CCC Publications
The measurement model consists of two sets of formative indicators that measure two dimensions of the ergonomic quality of desktop AR applications: the quality of visual and auditory perception and the ease of operating and collaborating in a constrained space. The two indexes form a second order formative construct. In order to meet identification requirements we used as outcome variables a reflective construct measuring the perceived ease of learning how to use ARTP and a general reflective item measuring the overall ease of use. The formative measurement model was estimated on the Biology scenario data. Then we cross validated the models on the Chemistry scenario data.

The rest of this paper is organized as follows. In the following section we describe the formative measurement models and discuss some methodological aspects related to specification, identification and validity. In section 3 we present and discuss the estimation results with the Biology scenario data. In section 4 we present the results of a confirmatory assessment of the formative measurement model using the Chemistry scenario data and we comparatively discuss the results for each scenario. The paper ends with conclusions and future research directions.

2 The formative measurement model

2.1 Reflective vs. formative measurement models

A measurement model describes the relationships between a construct (latent variable) and its measures (indicators, items) while a structural model describes the relationships between different constructs [12], [13]. The causal relation between a construct and its measures could be from construct to measures (reflective model) or from measures to construct (formative model). The distinct characteristics of each measurement model were systematically presented and discussed in detail in [6], [10], [12], [20].

In the reflective measurement model the indicators are manifest variables of the latent variable.
A change in the construct is reflected in simultaneous changes in all indicators. As such, the items are interchangeable and eliminating one of them does not change the construct domain. Measures should be positively correlated and the measurement model should have convergent and discriminant validity.

In the formative measurement model the measures define the conceptual meaning of the construct. Indicators are not interchangeable since each captures a distinct cause. Since the measures define the construct, a census of indicators is recommended [6]. There are no assumptions on unidimensionality or on correlations between indicators. However, collinearity should be avoided. Indicators do not have an error term and may be intercorrelated. Although there is an error term at construct level, this is not a measurement error but a disturbance accounting for other causes not specified by the model [11]. The nomological net may differ from one formative indicator to another, which is a distinct feature of formative measurement [20].

A formative measurement model taken in isolation is under-identified and cannot be estimated. Jarvis et al. and Diamantopoulos et al. recommend achieving identification based on a 2+ rule: specifying effects (outcomes) of the formative construct on at least two other variables that are reflectively measured [12], [20]. The outcome variables could be: two reflective indicators (MIMIC model), two reflective constructs, or a reflective construct and a reflective variable. The selection of the outcome variables is just as important as the selection of indicators [11], [14]. According to Wilcox et al., the selected effect variables determine the empirical meaning of the formative construct and the set of indicators [26].
The proper specification of the measurement model is a precondition for analyzing and assigning a meaning to the structural model [1]. According to Jarvis et al., many studies in the literature are based on an inappropriate specification of the measurement models [20]. In recent years, there has been an ongoing debate regarding the formative versus reflective specification of various constructs and the appropriateness of measurement scales that are frequently used in different domains [12].

Taking the appropriate measurement perspective is not a simple issue. As pointed out by Jarvis and colleagues, based on the analysis of 178 papers published in four top journals in marketing research, about 29% of the 1192 constructs examined were misspecified [20]. Moreover, the authors themselves experienced difficulties in classifying 14% of the constructs, which featured both reflective and formative characteristics. Wilcox and colleagues argued that a construct is not inherently formative or reflective, so the researcher can choose one perspective or the other [26]. In this respect, the specification of alternative models is useful since it provides more insights into the field of study.

2.2 Experiment, samples and data analysis

ARTP is a "seated" AR environment: users look at a see-through screen where virtual images are superimposed over the perceived image of a real object placed on the table [27]. Two AR-based applications were developed on this platform (see Figure 1).

Figure 1: Students testing the ARTP learning scenarios: Biology (left) and Chemistry (right)

The first application implemented a Biology learning scenario for secondary schools.
The implemented paradigm was "3D process visualization of hidden processes" and was targeted at enhancing the students’ understanding of and motivation to learn about the human digestive system. The real object is a flat torso of the human body. A pointing device, consisting of a colored ball on the end of a stick with a Nintendo Wii remote controller as handle, was used as an interaction tool that serves three types of interaction: pointing at a real object, selecting a virtual object and selecting a menu item.

The second application implemented a Chemistry scenario. The implemented paradigm was "building with guidance" and was targeted at enhancing the students’ understanding of and motivation to learn the periodic table of chemical elements, the structure of atoms / molecules, and chemical reactions. The real objects were the periodic table of chemical elements and four sets of colored balls symbolizing atoms. The Nintendo Wii remote controller was only used as an interaction tool for confirming a selection.

The test was conducted in 2008, on the ICI platform, which is equipped with 4 ARTP modules. A total of 139 students (13-14 years old), of whom 65 were boys and 74 girls, tested the platform. All were 8th grade students enrolled in 3 general schools in Bucharest. None of them was familiar with the AR technology. The students came in groups of 7-8, accompanied by a teacher. Each student tested the platform twice: once for the Biology scenario and a second time for the Chemistry scenario. Each scenario consists of a demo lesson and a number of exercises.

After testing, the students were asked to answer a usability questionnaire by rating the items on a 5-point Likert scale (1-strongly disagree, 2-disagree, 3-neutral, 4-agree, and 5-strongly agree). The questionnaire has 28 closed items and 2 open questions, asking users to describe the 3 most positive and the 3 most negative aspects.
The first 24 closed items target various dimensions of the ARTP such as ergonomics and usability (items 1-14), perceived utility (items 15-17), perceived enjoyment (items 18-21) and intention to use (items 22-24). The last four items assess how the students perceived the platform overall as being easy to use, useful for learning, enjoyable to learn with, and exciting.

In order to estimate the new measurement model we used the Biology scenario data. We analyzed the initial sample of 139 observations for normality (skewness and kurtosis) and for univariate and multivariate outliers. We transformed the data (square root extraction), repeated the analysis and successively removed 9 observations. The final sample has 130 observations that present moderate deviations from normality. In order to cross validate the model on another sample, we used the Chemistry scenario data. We performed the same data analysis procedure on the initial sample and successively removed 11 observations. The final sample has 128 observations with moderate deviations from normality.

2.3 Model specification and identification

To our knowledge, there are few approaches to formative index construction for usability and / or ease of use [21]. Although the perceived ease of use and perceived usability are frequently used in information systems research, in almost all studies they are specified as reflectively measured constructs. As such, their indicators make a limited contribution (as manifest variables) to explaining the effect of usability problems.

Since the objective of this study is to analyze the relationships between different aspects related to the ergonomic quality of the ARTP, 15 items in the usability questionnaire are of interest, of which 11 are formative measures and 4 are reflective measures.
The 15 items (presented in Annex 1) are grouped into four constructs and a single item measure:

• The quality of visual and auditory perception (ERG-P): clear observation and superposition, easy to read the information on the screen, and easy to understand the vocal explanations.
• The ease of interaction and collaboration (ERG-O): comfortable work place, easy to select a menu item with the remote control, easy to correct errors, and easy to collaborate with colleagues.
• The ease of adjusting the devices and accessories (ERG-A), i.e. the see-through screen, stereo glasses and headphones.
• The ease of learning (PEOL): easy to understand, easy to learn and easy to remember how to use ARTP.
• The general item measuring the overall ease of use (PEOU1).

The first three constructs are composite indexes measuring distinct usability aspects that are specific to an AR-based learning application. As such, the indicators are not interchangeable and eliminating any of them will alter the conceptual domain of the construct. For example, if we analyze the three items measuring the quality of the visual perception, each targets a different usability aspect. The clarity of observation through the screen is a hardware issue while the clarity of superposition between the augmentation and the real object is a software issue. Reading the information on the screen relates to augmentation, messages to the user and menu items.

Note that apart from the specific AR devices and accessories there are also several usability aspects which are specific to a given application. For example, in the Biology scenario the user selects an organ by pointing at the flat torso of the digestive system, which is a real object shared by two students sitting face-to-face.
In the Chemistry scenario, the students create a molecule by bringing together several colored balls symbolizing atoms. In this respect, the interaction with the remote control, the correction of mistakes (selection errors) and the collaboration between students depend on the real objects registered with the application. Therefore a formative model is an appropriate measurement perspective.

The ergonomic quality of ARTP is a multidimensional construct conceptualized as a composite of formative indexes. Each dimension is a formative index measuring a set of specific usability aspects. Each index is assumed to have a significant positive influence on two general usability aspects: the perceived ease of learning how to use ARTP (the construct PEOL) and the overall ease of use (the general item PEOU1).

2.4 Validity of the formative indexes

According to recent studies, there are several criteria to assess the validity of formative indexes [6], [10], [12], [14]: adequate coverage of the construct’s domain, absence of multicollinearity, indicator validity, significant γ-coefficients, complete mediation of effects, significant influence (β-coefficients) on outcome variables, and acceptable fit with the data.

Although a census of indicators is ideal to cover the scope of a formative index, this is rarely possible. In our model, each index addresses a distinct aspect of the ergonomic quality of ARTP. Since the formative indicators also capture critical usability aspects identified in previous studies (e.g. [23]), the coverage of the domain is acceptable.

The collinearity of formative indicators was analyzed with the VIF (Variance Inflation Factor) statistic for each index. VIF values were in the range 1.183-1.946 for the Biology scenario and 1.085-1.715 for the Chemistry scenario, below the 3.3 cut-off value [12].

The general item PEOU1 is an overall measure of the ergonomic quality of ARTP, which qualifies it for use as a criterion in assessing validity.
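The collinearity check reported above can be illustrated with a minimal sketch. It assumes that the VIF of each indicator is read off the diagonal of the inverse of the indicators' correlation matrix, which is one standard way to obtain it; the paper does not say how its VIF values were computed. The correlation values are hypothetical (not the study's data) and the function names are illustrative; only the 3.3 cut-off comes from the text [12].

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity matrix: [A | I] is reduced to [I | A^-1].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: indicators are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def vif(corr):
    """Return one VIF per indicator: the diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


# Hypothetical correlations among four formative indicators (NOT the study's data).
corr = [
    [1.00, 0.30, 0.25, 0.20],
    [0.30, 1.00, 0.30, 0.25],
    [0.25, 0.30, 1.00, 0.30],
    [0.20, 0.25, 0.30, 1.00],
]
vifs = vif(corr)
CUTOFF = 3.3  # cut-off value quoted in the text [12]
for name, value in zip(["x1", "x2", "x3", "x4"], vifs):
    print(f"{name}: VIF = {value:.3f}, below cut-off: {value < CUTOFF}")
```

For two indicators with correlation r this reduces to 1 / (1 - r^2), so uncorrelated indicators give a VIF of exactly 1.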
An analysis using Pearson’s correlation coefficient indicated that there are significant positive linear relationships between PEOU1 and the formative indicators of ERG-P and ERG-O but no significant correlations with the formative indicators of ERG-A. Nevertheless, in both samples ERG-A indicators are positively correlated with the formative item ERGO1. This suggests that ERG-A is not a distinct dimension of the ergonomic quality of ARTP but only an antecedent of a formative indicator measuring the comfort of the workplace. A regression analysis on the Biology data sample showed that ERGA1 and ERGA2 are two antecedents of ERGO1 (standardized coefficients βERGA1=0.191, sig=0.046 and βERGA2=0.185, sig=0.039). The regression analysis on the Chemistry data sample largely confirmed this finding (βERGA1=0.156, sig=0.083 and βERGA2=0.255, sig=0.005), with ERGA1 significant only at the p<0.10 level.

In order to estimate the formative indexes we used a MIMIC model and a structural model, presented in Figure 2. The models were estimated using AMOS 17.0 [2]. Each index has n formative indicators, more specifically n=4 for ERG-P and ERG-O, and n=3 for ERG-A. There are four outcome variables in the MIMIC model. Three of these reflective indicators are further grouped in the structural model, which features 2 outcome variables: the general item PEOU1 (overall ease of use) and the reflective construct PEOL (ease of learning how to operate). All outcome variables are closely related to the focal construct as they measure general aspects of the perceived ergonomic quality.

Figure 2: Estimation of formative indexes with MIMIC (left) and structural models (right)

There are three general hypotheses assessed with these models:

1. There is a significant contribution of the formative indicators to the composite index (xi→η, i=1...n).
2. There is a significant positive influence of the composite index on the perceived ease of learning how to use ARTP (η→PEOL1, η→PEOL2, η→PEOL3 in the MIMIC model, respectively η→PEOL in the structural model).
3.
There is a significant positive influence of the composite index on the overall ease of use (η→PEOU1).

Since the structural model includes a reflectively measured construct, the internal consistency and convergent validity should be assessed. The scale reliability and unidimensionality were analyzed with SPSS 16.0 and Amos 17.0. The consistency of the scale (Cronbach’s alpha) was 0.701 for the Biology scenario and 0.704 for the Chemistry scenario, which is acceptable. Convergent validity was assessed by examining the standardized factor loadings, composite reliability, and average variance extracted for PEOL in each scenario [16]. Almost all factor loadings are over the minimum recommended level of 0.60. The composite reliability was 0.711 for the Biology scenario and 0.704 for the Chemistry scenario, above the minimum recommended value of 0.70 in each scenario. The average variance extracted was 0.456 for the Biology scenario and 0.438 for the Chemistry scenario. Overall, the PEOL construct has an acceptable convergent validity.

3 Estimation results on the Biology scenario data

3.1 First order formative indexes

The results of the MIMIC and structural model estimations for ERG-P and ERG-O are presented in Table 1. All γ-coefficients are significant at the p<0.05 level, thus supporting the first hypothesis. There are small differences between the magnitudes of the γ-coefficients in the two models. The variance of the error term associated with the formative index is small in each model, so the formative index is sound and each formative item has a distinct contribution to the explained variance [11]. Fit indices are acceptable, within the recommended values [16]: χ2=21.115, DF=13, χ2/DF=1.624, GFI=0.962, CFI=0.974, SRMR=0.036 (ERG-P, structural model), and χ2=22.963, DF=13, χ2/DF=1.766, GFI=0.960, CFI=0.958, SRMR=0.042 (ERG-O, structural model).

In both models all β-coefficients are significant (p<0.001), which supports the last two hypotheses.
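The composite reliability and average variance extracted used in Section 2.4 to assess PEOL can be sketched as follows. This assumes the formulas commonly applied to standardized loadings, where each indicator's error variance is taken as 1 minus its squared loading; the paper cites [16] without printing the formulas. The loadings are illustrative values rather than the study's estimates, and the function names are hypothetical.

```python
def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1 - l * l for l in loadings)  # error variance of each indicator
    return total * total / (total * total + error)


def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


# Illustrative standardized loadings for a three-item reflective construct
# (assumed values, not taken from the study).
loadings = [0.70, 0.65, 0.60]
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(f"composite reliability = {cr:.3f}")
print(f"average variance extracted = {ave:.3f}")
```

With three loadings in the 0.60-0.70 range the composite reliability sits close to 0.70 while the average variance extracted stays below 0.50, the same pattern as the values reported for PEOL.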
The influence of the formative indexes is stronger on the perceived ease of learning how to use ARTP than on the general ease of use. This means that once the user understands and learns how to use the system he finds it easy to use.

The highest contributions to ERG-P come from the first two items (clarity of observation through the see-through screen and accuracy of superposition). The most important contribution to ERG-O comes from the last item, related to the ease of collaboration with colleagues. The ease of correcting mistakes also proved to be an important measure for the Biology scenario.

Table 1: Estimation results for ERG-P and ERG-O - Biology scenario

ERG-P               MIMIC model         Structural model
                    γ/β     sig.(p)     γ/β     sig.(p)
Contribution
  ERGP1             .33     <.001       .36     <.001
  ERGP2             .31     0.001       .30     0.002
  ERGP3             .20     0.010       .21     0.010
  ERGP4             .27     0.002       .29     0.002
Effect variables
  PEOU1             .63     <.001       .63     <.001
  PEOL              -       -           .91     <.001
  PEOL1             .64     <.001       -       -
  PEOL2             .73     <.001       -       -
  PEOL3             .61     <.001       -       -
Variance expl.
  ERG-P             71%                 78%
  PEOL              -                   83%

ERG-O               MIMIC model         Structural model
                    γ/β     sig.(p)     γ/β     sig.(p)
Contribution
  ERGO1             .22     0.018       .27     0.006
  ERGO2             .22     0.017       .21     0.030
  ERGO3             .29     0.003       .30     0.003
  ERGO4             .33     <.001       .33     0.001
Effect variables
  PEOU1             .62     <.001       .66     <.001
  PEOL              -       -           .87     <.001
  PEOL1             .63     <.001       -       -
  PEOL2             .71     <.001       -       -
  PEOL3             .63     <.001       -       -
Variance expl.
  ERG-O             54%                 62%
  PEOL              -                   75%

3.2 Second order formative index

ERG-P and ERG-O are two distinct dimensions of the ergonomic quality of ARTP that form a second order formative construct (ERG). We used the scores of the first order constructs (the predicted values of the multiple regression) as formative indicators in the second order construct. Similar approaches are described in [7], [8]. The estimation results are presented in Table 2.
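One way to picture how first order scores can be obtained is the sketch below. It assumes the score of an index is the predicted value of a standardized regression, that is, a weighted sum of standardized indicator ratings. The paper does not specify the regression or the software step used, so this illustrates the idea and is not the author's procedure. The weights are the structural-model γ-coefficients of ERG-P from Table 1; the ratings and function names are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert raw ratings to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def index_scores(indicator_columns, weights):
    """Weighted sum of standardized indicators, one score per respondent."""
    z_columns = [standardize(col) for col in indicator_columns]
    n = len(z_columns[0])
    return [sum(w * z[i] for w, z in zip(weights, z_columns)) for i in range(n)]


# Hypothetical 5-point Likert ratings of five students on ERGP1..ERGP4
# (one list per item; NOT the study's data).
ratings = [
    [4, 5, 3, 4, 2],  # ERGP1
    [3, 5, 4, 4, 2],  # ERGP2
    [5, 4, 3, 5, 3],  # ERGP3
    [4, 4, 2, 5, 3],  # ERGP4
]
gamma = [0.36, 0.30, 0.21, 0.29]  # ERG-P, structural model, Table 1
scores = index_scores(ratings, gamma)
for student, score in enumerate(scores, start=1):
    print(f"student {student}: ERG-P score = {score:+.3f}")
```

Because every indicator is standardized first, the resulting scores are centered on zero; the same function applied to ERG-O would give the second formative indicator of the second order construct.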
Table 2: Estimation results for second order construct - Biology scenario

ERG                 MIMIC model         Structural model
                    γ/β     sig.(p)     γ/β     sig.(p)
Contribution
  ERG-P             .65     <.001       .68     <.001
  ERG-O             .27     0.006       .30     0.004
Effect variables
  PEOU1             .63     <.001       .63     <.001
  PEOL              -       -           .90     <.001
  PEOL1             .64     <.001       -       -
  PEOL2             .71     <.001       -       -
  PEOL3             .61     <.001       -       -
Variance expl.
  ERG               75%                 84%
  PEOL              -                   81%

The γ-coefficients are significant in each model. The contribution of the first dimension is much higher, showing that the quality of visual perception is a critical requirement for desktop AR systems. The analysis of modification indices showed that the formative index completely mediates the effects of its items.

Both β-coefficients are significant (p<0.001), which supports the last two hypotheses. The variance of the error term associated with the formative index is 0.009 (medium effect). The magnitude of the error term suggests that some other aspects are not covered by the indicators.

Fit indices are acceptable, within the recommended values: χ2=11.759, DF=7, χ2/DF=1.679, GFI=0.972, CFI=0.984, SRMR=0.032 (structural model).

4 Cross validation of the formative indexes on the Chemistry scenario

4.1 First order formative indexes

The results of estimation are presented in Table 3. Almost all γ-coefficients are significant at the p<0.05 level, thus supporting the first hypothesis. There is only one exception: ERGP4 in the MIMIC model, where the γ-coefficient is significant at the p<0.10 level. There are relatively small differences between the contributions of each item in each model.
Table 3: Estimation results for ERG-P and ERG-O - Chemistry scenario

ERG-P               MIMIC model         Structural model
                    γ/β     sig.(p)     γ/β     sig.(p)
Contribution
  ERGP1             .23     0.016       .29     0.010
  ERGP2             .28     0.009       .31     0.012
  ERGP3             .24     0.010       .32     0.004
  ERGP4             .20     0.053       .24     0.047
Effect variables
  PEOU1             .55     <.001       .61     <.001
  PEOL              -       -           .75     <.001
  PEOL1             .70     <.001       -       -
  PEOL2             .67     <.001       -       -
  PEOL3             .52     <.001       -       -
Variance expl.
  ERG-P             47%                 67%
  PEOL              -                   56%

ERG-O               MIMIC model         Structural model
                    γ/β     sig.(p)     γ/β     sig.(p)
Contribution
  ERGO1             .25     0.009       .24     0.018
  ERGO2             .27     0.003       .30     0.002
  ERGO3             .22     0.021       .24     0.016
  ERGO4             .36     <.001       .38     <.001
Effect variables
  PEOU1             .48     <.001       .49     <.001
  PEOL              -       -           .93     <.001
  PEOL1             .71     <.001       -       -
  PEOL2             .70     <.001       -       -
  PEOL3             .55     <.001       -       -
Variance expl.
  ERG-O             49%                 55%
  PEOL              -                   86%

In both models the β-coefficients are significant (p<0.001), which supports the last two hypotheses. The variance of the error term associated with the formative index is 0.022 (0.008) for ERG-P and 0.021 (0.016) for ERG-O. Since the magnitude of the error term is small and all indicator coefficients are significant, the formative index is sound and each formative item has a distinct contribution to the explained variance. Fit indices are acceptable, within the recommended values [16]: χ2=15.154, DF=13, χ2/DF=1.166, GFI=0.973, CFI=0.990, SRMR=0.038 (ERG-P, structural model), and χ2=22.392, DF=13, χ2/DF=1.722, GFI=0.958, CFI=0.947, SRMR=0.048 (ERG-O, structural model).

The influence of the formative indexes is stronger on the perceived ease of learning how to use ARTP than on the general ease of use.

The highest contributions to ERG-P come from the items ERGP2 (accuracy of superposition) and ERGP3 (understanding the vocal explanation). The contribution of ERGP3 shows the importance of vocal explanations for students. The most important contribution to ERG-O comes from the last item, related to the ease of collaboration with colleagues. The ease of selecting a menu item also proved to be an important measure for the Chemistry scenario.
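The fit assessment reported above can be reproduced with a short check. The cut-off values in the sketch are commonly cited conventions and are an assumption here, since the paper refers to [16] without listing the thresholds it applied. The input values are those reported for the ERG-O structural model in the Chemistry scenario; the function and dictionary names are hypothetical.

```python
# Assumed, commonly cited cut-offs (the paper does not list its thresholds).
CUTOFFS = {"chi2_df_max": 3.0, "gfi_min": 0.90, "cfi_min": 0.90, "srmr_max": 0.08}


def check_fit(chi2, df, gfi, cfi, srmr, cutoffs=CUTOFFS):
    """Return the normed chi-square and a pass/fail flag for each fit index."""
    ratio = chi2 / df
    return {
        "chi2/df": ratio,
        "chi2/df ok": ratio <= cutoffs["chi2_df_max"],
        "GFI ok": gfi >= cutoffs["gfi_min"],
        "CFI ok": cfi >= cutoffs["cfi_min"],
        "SRMR ok": srmr <= cutoffs["srmr_max"],
    }


# Values reported for ERG-O (structural model, Chemistry scenario).
result = check_fit(chi2=22.392, df=13, gfi=0.958, cfi=0.947, srmr=0.048)
print(f"chi2/df = {result['chi2/df']:.3f}")
for name in ("chi2/df ok", "GFI ok", "CFI ok", "SRMR ok"):
    print(f"{name}: {result[name]}")
```

Under a stricter convention that requires a CFI of at least 0.95, the same model would fall marginally short, so the conclusion depends on which cut-offs are adopted.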
4.2 Second order formative index

The results of the structural model estimation are presented in Table 4. Both γ-coefficients are significant. The contribution of each dimension is similar for the Chemistry scenario. The analysis of modification indices showed that the index completely mediates the effects of its items.

Table 4: Estimation results for second order construct - Chemistry scenario

ERG                 MIMIC model         Structural model
                    γ/β     sig.(p)     γ/β     sig.(p)
Contribution
  ERG-P             .45     <.001       .55     <.001
  ERG-O             .47     <.001       .49     0.004
Effect variables
  PEOU1             .53     <.001       .55     <.001
  PEOL              -       -           .83     <.001
  PEOL1             .70     <.001       -       -
  PEOL2             .68     <.001       -       -
  PEOL3             .53     <.001       -       -
Variance expl.
  ERG               63%                 80%
  PEOL              -                   69%

Both β-coefficients are significant (p<0.001), which supports the last two hypotheses. The variance of the error term associated with the formative index is 0.016 (0.222), which means a medium to large effect. The magnitude of the error term suggests that some other aspects are not covered by the indicators.

Fit indices are acceptable, within the recommended values: χ2=14.932, DF=7, χ2/DF=2.133, GFI=0.964, CFI=0.960, SRMR=0.049 (structural model).

4.3 Comparison of results and discussion

The estimation of the formative indexes on the Chemistry scenario data cross validated the measurement model and enables a comparison between the two implemented scenarios. The variances explained by the structural models for the formative indexes are higher for the Biology scenario than for the Chemistry scenario. The variance explained by the model for the second order construct is slightly higher for the Biology scenario. In the structural models, the contribution of ERG-P to the superordinate index is higher than the contribution of ERG-O in both scenarios, but its relative importance is much higher for the Biology scenario. The variance explained by the model for the outcome variable PEOL is also higher for the Biology scenario (81% vs. 69%).
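The PEOL percentages quoted above are consistent with the β-coefficients in Tables 2 and 4: when an outcome has a single standardized predictor, the explained variance equals β squared. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def variance_explained(beta):
    """R^2 of an outcome with a single standardized predictor is beta squared."""
    return beta ** 2


# Standardized ERG -> PEOL coefficients from the structural models (Tables 2 and 4).
for scenario, beta in [("Biology", 0.90), ("Chemistry", 0.83)]:
    share = variance_explained(beta)
    print(f"{scenario}: beta = {beta:.2f}, variance explained = {share:.0%}")
```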
Regarding the ERG-P index, the comparison reveals that understanding the vocal explanations (ERGP3) is the most important item for the Chemistry scenario and the least important for the Biology scenario. This is explained by the fact that the Chemistry demo lesson and exercises were more difficult for students, so a clear understanding of the lesson and of how to perform the exercises was critical. The accuracy of superposition between the projection and the real object (ERGP2) has a higher importance for Biology.

Regarding ERG-O, the comparison reveals that the ease of collaboration with colleagues (ERGO4) is the most important item for both scenarios. Selecting a menu item (ERGO2) was easy for the Biology scenario (lowest γ-coefficient) and difficult for the Chemistry scenario. This is explained by the fact that the students had to use both hands to manipulate the colored balls (symbolizing atoms), so handling the remote control as well became more difficult. Correcting mistakes (ERGO3) was more difficult for the Biology scenario because of frequent selection errors when students tried to select a small organ.

In both scenarios, ERG-A had a significant positive influence on the formative indicator ERGO1, showing that the ease of adjusting the see-through screen and stereo glasses influences the comfort of the work place.

5 Conclusion and future work

The main contribution of this study is a measurement model for the perceived ease of use of the ARTP featuring a second order formative index with two dimensions: the quality of visual and auditory perception and the ease of interaction and collaboration. These indexes are antecedents of a reflective construct measuring the perceived ease of learning how to use ARTP. The latter could then be integrated in structural models that are based solely on reflective scales.

There are several strengths and limitations of this study.
First, an outcome of this research is the integration of almost all items related to the perceived ease of use that were eliminated in a previous work [4] for unidimensionality and convergent validity reasons. The new measurement model includes 12 of the 15 items related to the ergonomic quality. As such, it provides a wider perspective on the ergonomic quality and enables the analysis of specific usability aspects. Second, the estimation of a formative measurement model provides more detailed information (at indicator level), shedding light on usability aspects that are critical for ARTP and for a given learning scenario. Third, the formative indexes were specified and validated with a structural model that addressed all general aspects related to the perceived ergonomic quality: ease of understanding, ease of use and ease of operating a software system. Since all variables are strongly related to the focal construct, the structural model supports external validity well. Up to now, no similar model has been developed for the ergonomic quality of a software system. Fourth, the model was estimated and cross validated on two different samples, which enables a comparison between scenarios and makes it possible to further integrate and discuss in more detail the answers to the open questions (qualitative data).

Regarding the limitations, the sample used in this study was collected from only 6 classes (3 Romanian schools) and therefore has limited representativeness. Second, both samples are small, at the limit of SEM (Structural Equation Modeling) requirements. Third, the convergent validity of the reflectively measured construct is at the limit (acceptable for an exploratory study). Fourth, the breadth of formative indicators is inherently limited since the evaluation questionnaire was intended to capture the main usability aspects. Fifth, there are inherent limitations since the methodology regarding the estimation and validation of formative indexes is not mature yet.
The usability questionnaire used to collect the data was conceptualized in 2007, while the main recommendations for formative index development were published only in 2008.

Based on this work we intend to develop a new evaluation questionnaire having both formative and reflective items. The questionnaire will be used for the evaluation of a new version of the Chemistry application, which is currently under development.

Acknowledgements

This work was supported by the research projects TEHSIN 503/2009 and ARiSE FP6-027039.

Bibliography

[1] Anderson, J.C., Gerbing, D.W. Structural Equation Modelling in Practice: A Review and Recommended Two-Step Approach. Psychological Bulletin 103(3), 411-423, 1988.
[2] Arbuckle, J.L. AMOS 16.0 User's Guide. Amos Development Corporation, 2007.
[3] Balog, A., Pribeanu, C. Developing a measurement model for the evaluation of AR-based educational systems. Studies in Informatics and Control 18(2), 137-148, 2009.
[4] Balog, A., Pribeanu, C. The Role of Perceived Enjoyment in the Students' Acceptance of an Augmented Reality Teaching Platform: a Structural Equation Modelling Approach. Studies in Informatics and Control 19(3), 319-330, 2010.
[5] Bach, C., Scapin, D. Obstacles and perspectives for Evaluating mixed Reality Systems Usability. Proceedings of IUI-CADUI Conference 2004, 72-79, 2004.
[6] Bollen, K., Lennox, R. Conventional wisdom on measurement: a structural perspective. Psychological Bulletin 110(2), 305-314, 1991.
[7] Bruhn, M., Georgi, D., Hadwich, K. Customer equity management as formative second order construct. Journal of Business Research 61, 1292-1301, 2008.
[8] Cadogan, J., Souchon, A., Procter, D. The quality of market-oriented behaviors: Formative index construction. Journal of Business Research 61, 1263-1277, 2008.
[9] Davis, F.D.
Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly 13, 319-340, 1989.
[10] Diamantopoulos, A., Winklhofer, H. Index construction with formative indicators: an alternative to scale development. Journal of Marketing Research 28, 269-277, 2001.
[11] Diamantopoulos, A. The error term in formative measurement models: interpretation and modeling implications. Journal of Modeling in Management 1(1), 7-17, 2006.
[12] Diamantopoulos, A., Riefler, P., Roth, K. Advancing formative measurement models. Journal of Business Research 61, 1203-1218, 2008.
[13] Edwards, J., Bagozzi, R. On the nature and direction of relationship between constructs and measures. Psychological Methods 5(2), 155-174, 2000.
[14] Franke, G., Preacher, K., Rigdon, E. Proportional structural effects of formative indicators. Journal of Business Research 61, 1229-1237, 2008.
[15] Gabbard, J., Swann, E. Usability engineering for augmented reality: Employing user-based studies to inform design. IEEE Transactions on Visualization and Computer Graphics 14(3), 513-525, 2008.
[16] Hair, J.F., Black, W.C., Babin, B.J., Anderson, R.E., Tatham, R.L. Multivariate Data Analysis. Prentice Hall, 2006.
[17] Hornbaek, K. Current practice in measuring usability: Challenges to usability studies and research. Int. J. Human-Computer Studies 64, 79-102, 2006.
[18] Huang, H.M., Rauch, U., Liaw, S.S. Investigating learners' attitude towards virtual reality learning environments: based on a constructivist approach. Computers & Education 55, 1171-1182, 2010.
[19] ISO 9126-1:2001 Software Engineering - Software product quality. Part 1: Quality Model.
[20] Jarvis, C.B., MacKenzie, S., Podsakoff, M. A critical review of construct indicators and measurement models misspecification in marketing and consumer research. Journal of Consumer Research 30, 199-218, 2003.
[21] Konradt, U., Christophersen, T., Schaefer-Kuelz, U.
Predicting user satisfaction, strain and system usage of employee self-services. Int. J. of Human-Computer Studies 64, 1141-1153, 2006.
[22] Krauss, M., Riege, K., Winter, M., Pemberton, L. Remote Hands-On Experience: Distributed Collaboration with Augmented Reality. Proceedings EC-TEL 2009, LNCS 5794, Springer, 226-239, 2009.
[23] Pribeanu, C., Balog, A., Iordache, D.D. Measuring the usability of augmented reality e-learning systems: a user-centered evaluation approach. Chapter 14: Software and Data Technologies, CCIS 47, Cordeiro, H., Shishkov, B., Ranchordas, A., Helfert, M. (Eds.), Springer, 175-186, 2009.
[24] Ruiz, D.M., Gremler, D., Washburn, J., Carrion, G.C. Service value revisited: specifying a high order formative measure. Journal of Business Research 61, 1278-1291, 2008.
[25] Tabachnick, B.G., Fidell, L.S. Using Multivariate Statistics, 5th ed. Boston: Allyn and Bacon, 2007.
[26] Wilcox, J., Howell, R., Breivik, E. Questions about formative measurement. Journal of Business Research 61, 1219-1228, 2008.
[27] Wind, J., Riege, K., Bogen, M. Spinnstube®: A Seated Augmented Reality Display System. Virtual Environments: Proc. IPT-EGVE - EG/ACM Symposium, 17-23, 2007.
Annex 1 Constructs and items

Construct                                  MM  Item   Statement
Quality of visual and auditory             F   ERGP1  Observing through the screen is clear
perception (ERG-P)                         F   ERGP2  The superposition between projection and the real object is clear
                                           F   ERGP3  Understanding the vocal explanations is easy
                                           F   ERGP4  Reading the information on the screen is easy
Ease of interaction and                    F   ERGO1  The work place is comfortable
collaboration (ERG-O)                      F   ERGO2  Selecting a menu item is easy
                                           F   ERGO3  Correcting the mistakes is easy
                                           F   ERGO4  Collaborating with colleagues is easy
Ease of adjusting devices (ERG-A)          F   ERGA1  Adjusting the "see-through" screen is easy
                                           F   ERGA2  Adjusting the stereo glasses is easy
                                           F   ERGA3  Adjusting the head phones is easy
Perceived ease of learning to              R   PEOL1  Understanding how to operate with ARTP is easy
operate (PEOL)                             R   PEOL2  Learning how to operate with ARTP is easy
                                           R   PEOL3  Remembering how to operate with ARTP is easy
General item                               R   PEOU1  Overall, I find the system easy to use

Note: MM (Measurement Model): F (Formative) / R (Reflective)
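
For readers who wish to work with the instrument programmatically, the annex can be encoded as a small data structure. The sketch below is only illustrative and is not the procedure used in this study, where the γ-coefficients were estimated with SEM. The names `Item`, `ITEMS`, `items_by_construct` and `formative_index`, the equal weights and the example responses are all hypothetical assumptions. The function `formative_index` only shows the general form of a formative composite, a weighted sum of indicator scores, without the error term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    code: str       # item code as listed in Annex 1
    construct: str  # construct label as listed in Annex 1
    model: str      # "F" (formative) or "R" (reflective)
    text: str       # questionnaire statement


# Items transcribed from Annex 1.
ITEMS = [
    Item("ERGP1", "ERG-P", "F", "Observing through the screen is clear"),
    Item("ERGP2", "ERG-P", "F", "The superposition between projection and the real object is clear"),
    Item("ERGP3", "ERG-P", "F", "Understanding the vocal explanations is easy"),
    Item("ERGP4", "ERG-P", "F", "Reading the information on the screen is easy"),
    Item("ERGO1", "ERG-O", "F", "The work place is comfortable"),
    Item("ERGO2", "ERG-O", "F", "Selecting a menu item is easy"),
    Item("ERGO3", "ERG-O", "F", "Correcting the mistakes is easy"),
    Item("ERGO4", "ERG-O", "F", "Collaborating with colleagues is easy"),
    Item("ERGA1", "ERG-A", "F", 'Adjusting the "see-through" screen is easy'),
    Item("ERGA2", "ERG-A", "F", "Adjusting the stereo glasses is easy"),
    Item("ERGA3", "ERG-A", "F", "Adjusting the head phones is easy"),
    Item("PEOL1", "PEOL", "R", "Understanding how to operate with ARTP is easy"),
    Item("PEOL2", "PEOL", "R", "Learning how to operate with ARTP is easy"),
    Item("PEOL3", "PEOL", "R", "Remembering how to operate with ARTP is easy"),
    Item("PEOU1", "General item", "R", "Overall, I find the system easy to use"),
]


def items_by_construct(items):
    """Group item codes by construct, preserving the annex order."""
    grouped = {}
    for item in items:
        grouped.setdefault(item.construct, []).append(item.code)
    return grouped


def formative_index(responses, weights):
    """Weighted sum of indicator scores (formative composite, no error term)."""
    missing = set(weights) - set(responses)
    if missing:
        raise KeyError(f"missing responses for: {sorted(missing)}")
    return sum(weights[code] * responses[code] for code in weights)


# Hypothetical equal weights and hypothetical responses of one student;
# these are NOT the coefficients or data reported in the paper.
example_weights = {"ERGP1": 0.25, "ERGP2": 0.25, "ERGP3": 0.25, "ERGP4": 0.25}
example_responses = {"ERGP1": 4, "ERGP2": 5, "ERGP3": 3, "ERGP4": 4}
score = formative_index(example_responses, example_weights)
```

Replacing the hypothetical weights with estimated γ-coefficients for a given scenario would make the relative importance of each indicator explicit in the composite.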