Australasian Journal of Educational Technology, 2020, 36(6).

Deep neural networks for collaborative learning analytics: Evaluating team collaborations using student gaze point prediction

Zhang Guo
Computer and Information Sciences, University of Delaware

Roghayeh Barmaki
Computer and Information Sciences, University of Delaware

Automatic assessment and evaluation of team performance during collaborative tasks is key to the research on learning analytics and computer-supported cooperative work. There is growing interest in the use of gaze-oriented cues for evaluating the collaboration and cooperativeness of teams. However, collecting gaze data using eye-trackers is not always feasible due to time and cost constraints. In this paper, we introduce an automated team assessment tool based on gaze points and joint visual attention (JVA) information drawn from computer vision solutions. We evaluated team collaborations in an undergraduate anatomy learning activity (N = 60, 30 teams) as a test user study. The results indicate that higher JVA was positively associated with student learning outcomes (r(30) = 0.50, p < 0.005). Moreover, teams in the two experimental groups, who used interactive 3D anatomy models, had higher JVA (F(1,28) = 6.65, p < 0.05) and better knowledge retention (F(1,28) = 7.56, p < 0.05) than those in the control group. Also, no significant difference was observed based on JVA for different gender compositions of teams. The findings from this work have implications in learning sciences and collaborative computing by providing a novel joint attention-based measure to objectively evaluate team collaboration dynamics.

Implications for practice or policy:
• Student learning outcomes can be improved by receiving constructive feedback about team performances using our gaze-based collaborative learning method.
• Underrepresented and underserved minorities in science, technology, engineering and mathematics disciplines can be engaged in more collaborative problem-solving and team-based learning activities since our method offers a broader reach by automating the collaboration assessment process.
• Course leaders can assess the quality of attention and engagement among students and can monitor or assist larger numbers of students simultaneously.

Keywords: collaborative learning analytics, co-located team-based learning, gaze following, joint visual attention, deep learning, experimental design

Introduction

Collaborative learning is an essential educational instrument for teaching and learning. As a team-based and student-centred educational practice, it promotes student motivation and enhances knowledge retention via teamwork and cooperation (Sung & Hwang, 2013). While collaborative learning has been introduced and practiced in co-located settings (Barmaki et al., 2019; Huang et al., 2019; Prinsen et al., 2007; Schneider et al., 2018; Sung & Hwang, 2013), as well as distributed settings (de Freitas & Griffiths, 2007; Li et al., 2008; Schaf et al., 2009; Tao et al., 2019), measuring and evaluating collaboration remains a challenge. Fairness of group work distribution (Ng et al., 2019), rationality of collaborative conditions (Innes & Booher, 2016) and automation of process analytics (Rosé et al., 2008) are some of the core issues that need to be considered during collaborative learning analytics, especially in relatively large teams (Bertsimas & Gupta, 2016). Understanding gender effects in collaboration dynamics and investigating best learning practices in teams are also crucial aspects of collaborative learning analytics, which is the focus of our paper.
Gaze-oriented cues can be used as a means of obtaining information about the cognitive activities of a collaborator, and there is evidence that students look at and point to the same object during collaborative co-located learning activities (Schneider & Pea, 2013; Schneider et al., 2018). This gaze alignment is called joint visual attention (JVA; Van Rheden et al., 2017) – see Figure 1, for example. JVA is a strong predictor of successful collaboration among students (Pietinen et al., 2010; van der Meulen et al., 2016). Compared with collecting traditional self-reported survey data or collaboration system data for one-time performance evaluation (Wendler, 2006), capturing gaze alignment during the entire collaborative process with the JVA features can reveal more valuable information about the quality of interactions among teams (Wahn et al., 2016).

New technologies provide innovative methods to extract and measure JVA features. A growing number of researchers have taken advantage of sensor-based eye-trackers to objectively measure gaze features during various social interactions, especially for co-located collaborative tasks (Huang et al., 2019; Schneider et al., 2018). However, although eye-tracking devices provide highly accurate data, these sensors are usually expensive and may introduce limitations for educational studies, specifically those that need to be carried out in classrooms rather than in research labs. For example, calibration can be a time-consuming process, or the study activity may need to be completed within the limited tracking range of the sensors.

Figure 1. Examples of the gaze-following method in our study: (a) with JVA feature: students’ gaze points converge on the tablet; (b) without JVA feature: students look at their own notes.

With the emergence of different deep learning techniques, the gaze tracking problem can be approached differently.
Using deep neural networks to track the gaze features from a sequence of 2D images or videos is practical and robust in understanding and interpreting student behaviours in human-human and human-object interaction (Lian et al., 2018; Recasens et al., 2015). For example, when two students are looking for a path from the library to the gym on a single university map, by following their gaze direction, we can easily find out if they are sharing the same information and we can predict whether they will pick the same path. Compared with eye tracking and gaze estimation, the gaze following method (Hansen & Ji, 2009; Lian et al., 2018; Santini et al., 2017) not only estimates the gaze direction but also predicts the gaze point from the image without the need for specialised hardware (e.g., head-mounted camera, infrared light source) or an obtrusive gaze calibration procedure. Figure 1 shows the gaze following method applied to two images captured from our test user study.

In this paper, we introduce a computer vision-based solution for team performance evaluation using mutual gaze point predictions, along with a collaborative anatomy learning activity as a test user study for our approach. We recorded collaborative activity sessions as a sequence of images with a colour camera. For collaboration analysis, we first tracked team members’ gaze directions and the focus objects during the activity using the gaze following method (Lian et al., 2018) with a deep neural network framework. We then extracted the JVA features of the teams and analysed them with other collected data, including post-test scores and demographics information related to team gender composition. This study hypothesised that students who share mutual gazes during the activity (that is, teams with higher JVA values) also obtain higher scores in their post-activity knowledge tests, since they engage more in collaborative tasks.
We were also interested in understanding whether JVA values vary significantly across different study conditions and different gender compositions of teams for collaborative learning.

This paper is organised as follows: First, we review the literature on the related work in anatomy education, gender effects, collaborative learning analytics, JVA applications and gaze following approaches. We then introduce details of the intervention, data sets, the gaze following method and our proposed assessment measures for collaboration. Finally, we present the findings from our test user study and discuss further implications.

Literature review

Educational technology for anatomy education

In the domain of human anatomy learning, different education technologies have recently replaced traditional teaching methods such as lectures, cadavers and textbooks. With modern computer-assisted technologies, 3D visualisation methods improve students’ performance by allowing them to explore 3D anatomical models on 2D mobile device screens (Yammine & Violato, 2015). Mobile-based applications and web-based 3D games have been used as learning tools for the study of human skeletal, muscular and cardiovascular systems, to name a few (Golenhofen et al., 2020; Lemos et al., 2019). Virtual reality (VR) and augmented reality (AR) techniques have been adopted into medical education and surgical training fields in recent years (Maresky et al., 2019; Silva et al., 2018). As powerful learning tools, VR and AR engage students in an immersive environment with audio and visual interactions, and stereoscopic 3D models to enhance their learning experience (Barmaki et al., 2019; Hackett & Proctor, 2018; Luursema et al., 2006, 2008). We evaluated team performance in a controlled study that leveraged modern anatomical content visualisation in 3D with handheld tablet devices and large-scale AR displays.
Gender effects and collaboration

As a result of different socialisation processes, gender differences at the team level have been discussed in recent studies. The relationship between gender and collaboration is not uniform, and it varies based on different disciplines and tasks (Fernandez-Sanz & Misra, 2012; Wegge et al., 2008). Researchers have shown that females have better information-processing skills than males during cognitive tests (Rabbitt et al., 1995; Schaie & Willis, 1993). Females’ higher management ability in collaborative tasks has also been highlighted (Bear & Woolley, 2011; De Paola et al., 2018; Eagly & Carli, 2003). Several studies (Bear & Woolley, 2011; De Paola et al., 2018; Eagly & Carli, 2003; Rabbitt et al., 1995; Schaie & Willis, 1993) concluded that collaboration performance would be improved with females involved. Other research showed that women often had negative experiences on teams due to gender biases at the technical level, especially in science, technology, engineering and mathematics (Meadows et al., 2015; Meiksins et al., 2015). Conversely, some reported no significant gender effect. Andersson (2001) argued that, although females have better performance on individual memory tasks, no gender effect was found in collaborative tasks. Prinsen et al. (2007) noted that females were more likely to collaborate, and males were more assertive in computer-mediated communication (Herring, 1996) and computer-supported collaborative learning (CSCL; Lipponen, 1999) settings. In the same 2007 study, Prinsen et al. also acknowledged that different distributions of roles in collaborative learning might change gender contributions. We explored potential gender differences in our anatomy learning study in association with learning outcomes and joint attention measures.

Collaborative learning analytics

The importance of social interactions during the learning process has been emphasised (Okita et al., 2008).
Collaborative learning not only helps students to improve teamwork skills but also promotes learning motivation, increases learning experience, enhances brainstorming skills (Guo & Barmaki, 2019; Webb et al., 1995) and facilitates their learning performance during team interactions (Barmaki et al., 2019; Sung & Hwang, 2013). Consequently, researchers have highlighted that instead of using new learning formats, more attention should be paid to the measurements of and access to collaboration performance (Huang et al., 2019). In early attempts to analyse collaborative learning, Soller and Lesgold (1999) provided a practical collaborative learning framework and evaluated active learning skills using conversational interaction data collected from surveys. With advancements in CSCL research, machine learning techniques have been used to predict student grades, using support vector machines (Baker & Yacef, 2009), decision trees (Rosé et al., 2008) and regression (Kotsiantis, 2012), to name a few. Those solutions either established effective collaborative learning models or built reasonable standards for evaluating collaboration performance based on single-time solicitation techniques. However, the data used in those models were collected from class attendance, quiz scores or reports, which may represent only students’ one-time or episodic performance during the learning activities.

Collaborative learning analytics research has received much attention, mainly in flipped classrooms with collaborative problem-solving activities, primarily in science, technology, engineering and mathematics, mediated by computers in either co-located or distributed learning settings. The majority of research on distributed settings lies in CSCL studies; for example, Subburaj et al. (2020) presented a collaborative problem-solving model for an educational physics game with 101 teams of undergraduate students.
Facial expressions, acoustic-prosodics, eye gaze and task context information were captured in the last minute of the intervention and used as measures for predicting success at solving the game. The combined predictive model of non-verbal cues with language-based features outperformed other predictive models. Also, behavioural cues such as eye gaze (Vrzakova et al., 2019), head pose (Otsuka et al., 2018), prosody and acoustics (Miura & Okada, 2019; Murray & Oertel, 2018), as well as language (Flor et al., 2016) have been investigated in collaborative learning analytics for group outcomes including task performance. In co-located learning scenarios, despite their similarity of approaches to distributed settings, physical proximity and movement dynamics in teams were a key factor in the collaboration. In our previous work (Guo & Barmaki, 2019), we used an object detection approach atop image data from a collaborative anatomy learning activity to extract useful proximity features, such as the locations of students and objects in the scene. Research also used multimodal learning analytics techniques and high-level features from dissimilar sources such as video and sketchpads to discriminate between experts and non-experts in groups of students (Ochoa et al., 2013), and understand team performance from physical engagement, satisfaction and individual accountability perspectives (Spikol et al., 2017).

JVA applications

JVA features have been introduced to a broad range of applications, including collaborative search (Brennan et al., 2008), mediated interaction (Bente et al., 2007), infant-caregiver interaction (Markus et al., 2000) and training for children with autism (Whalen & Schreibman, 2003). Interest has grown in the use of synchronised eye-trackers to quantitatively measure gaze alignment in various collaborative situations (Bryant et al., 2019; Huang et al., 2019; Kim et al., 2020; Van Rheden et al., 2017).
However, there are challenges in using eye-tracking sensors, including the high cost of the devices, and restricted environmental and calibration settings (e.g., the camera should be precisely in front of the student within a close distance and on top of a specific panel; Huang et al., 2019). Image-based computer vision methods, as a more affordable alternative approach, have also been used for extracting gaze features in previous studies. Using a colour camera, Yücel et al. (2013) presented an image-based head pose estimation method for establishing joint attention between an experimenter and a robot. Harari et al. (2018) used image segmentation to identify the common gaze target by combining the estimated 3D gaze direction.

Gaze following using deep learning

There has been an expanding interest in the estimation and reconstruction of human gaze direction from 2D images to identify their activities in the scene using various deep learning frameworks. Gaze following is the task of following people’s gaze in a scene and inferring what they are looking at. Compared with eye-tracking and gaze estimation, gaze following not only estimates the gaze direction but also detects the focus point from the image (Lian et al., 2018). Patacchiola and Cangelosi (2017) proposed a face detector to extract face landmarks and estimate head poses using convolutional neural networks. Marín-Jiménez et al. (2014) used head pose detection with implicit pose information to detect human-human interaction in videos. However, those works were limited by the complexity of inputs (massive eye-tracking data; Yücel et al., 2013), restricted situations (resolution of the image; Marín-Jiménez et al., 2014) and field of view (the distance between the camera and students; Zhu & Ramanan, 2012). In the work of Recasens et al. (2015), the gaze point of multiple observers in daily scenarios was predicted using deep neural networks and saliency models of attention.
Mukherjee and Robertson (2015) combined RGB-depth images and multimodal data to reconstruct 3D head poses and follow gaze direction in images and videos. These studies motivated the work reported here, which uses a deep learning approach to target gaze alignment features for the novel application of collaborative learning analytics.

We were interested in understanding how two students are interacting with one another, or with objects, and following the gazes of multiple observers in a scene. During the preparation stage of our study, we tested various algorithms for gaze feature extraction, including facial landmark detection (Patacchiola & Cangelosi, 2017) and head pose detection (Marín-Jiménez et al., 2014) to predict gaze direction. However, neither facial landmarks nor head pose can be detected completely when participants have their backs to the camera or face downwards, which made these methods impractical for our study (see Figure 2(a) and (b)). Hence, we used the gaze following method (Lian et al., 2018) to estimate both the gaze direction and the gaze points to collect human-human interaction information. Further details about our approach are presented in the following section.

Figure 2. Example results from the application of multiple methods on our data set: (a) head pose detection (Marín-Jiménez et al., 2014), (b) facial landmark detection (Patacchiola & Cangelosi, 2017) and (c) gaze following method (Lian et al., 2018).

Method and materials

Intervention

We conducted a between-subjects study of a collaborative muscle learning intervention in a laboratory course (General Biology) as part of an undergraduate premedical program at Johns Hopkins University. A total of 301 students in 138 teams participated in the original study; we selected the data from a subset of teams with two members (N = 60, 30 teams) as our test data set.
Students worked in teams to complete a muscle painting activity (Barmaki et al., 2019; Marieb, 2015) as part of their required laboratory activities. They were expected to identify and paint the major muscles of their body using one of the learning instruments (textbook, tablet or AR) and washable painting supplies. The first student played the role of a model, while their teammate, as a painter, located the major upper-limb muscles with the aid of their laboratory manual (Marieb, 2015) or other digital devices and painted the model’s upper limb. Afterward, students switched roles, and the upper-limb painter became a model for the lower limb. The goal of the learning activity was to ensure all students could gain knowledge of anatomy in a collaborative effort. See Figure 3 to learn more about the intervention details.

As briefly mentioned, our study had three different settings based on instrumental tools. Students in the control group used textbooks as their learning tools. In experimental group I, instead of a textbook, students used our in-house interactive app on the tablet as a 3D musculoskeletal visualising system. Experimental group II used a screen-based AR system, also developed internally, where students could see themselves with augmented anatomy visualisations on a large display (Barmaki et al., 2019). The knowledge base information presented in all instrumental tools was identical to mitigate potential confounding factors related to student workload and learning. There was also a mobile workstation inside the laboratory room to capture snapshots from students during learning activities. Figure 3 shows the three study conditions of the learning activity.

Figure 3. Study conditions for students in pairs to complete anatomy painting intervention using (a) a textbook, (b) an interactive app on the tablet, or (c) a screen-based AR system.
The study was approved by the Institutional Review Board (Protocol # HIRB00005021) in May 2018, and oral informed consent was obtained from each participant student before the study commenced. After consent, students entered the activity room with their teammates and completed the task. All students completed both pre- and post-activity questionnaires and knowledge tests.

Data sets

Surveys

Using the Qualtrics application, survey data was collected from all students individually after completing the activity. The survey consisted of demographics information, usability questions and a post-test about the human muscle system.

Image training data

For training, we adopted GazeFollow, the large-scale gaze-following data set from Recasens et al.’s (2015) study. This benchmark data set had 130,339 people and 122,143 images in total with gaze points inside the image.

Image test data

The test data set consisted of 4,646 images collected from 30 teams (pairs) of students during the collaborative learning activity in the three conditions (10 teams from each of the textbook, tablet and AR conditions). Images were captured every 10 seconds, and each image file was timestamped. The resolution of each test image was 2560 x 1440 pixels. Images with camera difficulties or additional individuals in the scene were discarded.

Gaze following framework

To extract shared gaze features from the images, we needed to estimate the students’ gaze direction and focus in the scene. Thus, we applied a two-stage gaze following approach (Lian et al., 2018) on our test data set. This method was well suited to our project since it was capable of detecting the gaze direction from the head image and predicting the potential gaze point along the gaze direction via deep neural networks. The gaze following approach and its underlying network architecture are shown in Figure 4.
The gaze following framework was inspired by the human behaviour of gaze following (Lian et al., 2018). First, a gaze direction was estimated from the gaze direction pathway. In the gaze direction pathway, the resized head image (224 x 224; image sizes are listed in pixels hereafter) was fed into the convolutional neural network ResNet-50 (Rezende et al., 2017) for feature extraction. Then, the head features were concatenated with head position features encoded by one fully connected layer for gaze direction estimation. A coarse gaze direction was predicted as the vector output and then encoded as multi-scale gaze direction fields. The gaze point was assumed to be in the gaze direction or line of sight. Next, the multi-scale gaze direction fields were combined with the scene contents (224 x 224) and fed into the heat map pathway for heat map regression using a feature pyramid network (Zhao et al., 2019). The heat map (56 x 56) represented the probability distribution of the gaze point, and the point with the maximum value of the heat map represented the probable gaze point of the scene.

Figure 4. The network architecture for the gaze following method (Lian et al., 2018) atop our collaborative study image frames. Using the heat map, we can predict the gaze point convergence (focus point) of students in the collaborative activity.

Figure 5. Gaze following results for three sample frames: (a) gaze directions with blue lines; (b) output without the JVA feature (Euclidean distance between the gaze points of students is greater than 100 pixels); (c) output with the JVA feature (Euclidean distance between the gaze points is smaller than 100 pixels); and (d–f) heat maps associated with the gaze points. Note.
While two distinct sheets of paper are predicted as gaze points for the team members in (e), panels (d) and (f) are examples of the convergence of visual attention on the tablet device; thus, the Is_JVA variable is true in (d) and (f).

Lian et al. (2018) claimed that their gaze following approach outperformed other existing methods in gaze point prediction. Compared with the previous state of the art (Recasens et al., 2015), Lian et al.’s method reduced the Euclidean distance error for gaze point prediction on the GazeFollow data set by 23.68%. We chose Lian et al.’s gaze following model because it managed to simulate the gaze following behaviour of a third person view. Furthermore, Lian et al. trained this model robustly on a large data set by using the heat maps for focus point prediction. Figure 2 highlights its strengths over other existing solutions.

The gaze following output shown in Figure 5(a) draws a blue gaze line on the original image for each individual in the scene. The blue line starts at the eye location and ends at the predicted final gaze point. The highlighted region in the corresponding heat map, shown in Figure 5(d–f), represents the predicted gaze point where the students are looking. The output also marks the coordinates of each gaze point, which are used in our approach as a collaboration metric. We were interested in the automatic recognition of joint or mutual visual attention among students in every image sequence during the collaborative task. Further details about the JVA feature analysis as a collaboration measure are presented in the following section.

Evaluation measures

We analysed team performance and collaboration based on objective measures related to joint attention, knowledge retention, study conditions and gender composition of the teams. These evaluation measures of collaboration are described as follows.
JVA ratio

JVA represents the shared focus of two or more individuals and plays a key role in collaboration prediction (Bruinsma et al., 2004). In this work, we defined the JVA ratio for each team as the number of image frames in which the two students shared gazes during the collaborative activity, divided by the total number of image frames captured from the team; that is, a measure of JVA normalised by the total activity frames of the team. Since there was a lot of cooperation between painters and models during the learning activity, they needed to maintain joint attention most of the time: while painting, discussing and looking at the learning materials. For example, when the painter was painting, both painter and model may have looked at the same location, the active painting region. When students needed to find the muscle’s correct location, the painter and the model may have shared the screen of the interactive app on the tablet or the AR view to zoom in on the 3D musculoskeletal system.

We used the Euclidean distance between the gaze points detected by the gaze following method for automatic identification of the JVA in each image frame (Is_JVA was a Boolean variable for each frame; it was set to false by default). Based on image size and resolution, we recognised the JVA, our mutual gaze feature, and set Is_JVA to true if that distance was smaller than 100 pixels. For each team, the JVA ratio was computed as the number of frames in which the Is_JVA variable was true, divided by the total number of frames. Some examples with and without JVA recognition are shown in Figures 1 and 5. We were interested to learn if a higher JVA ratio is associated with better learning outcomes.

Team post-test score

The key objective of this collaborative learning activity was to enhance the anatomy knowledge retention of students.
All participants needed to independently (not with the assistance of their peers) locate and label five muscle names in a diagram of the human musculature in the post-test; thus, individual test scores ranged between 0 and 5 with discrete values. Since we used the average of post-test scores per team and named it team post-test score, the team post-test score was still in the same range, but non-discrete values were also observed in the data set.

Study conditions

As mentioned earlier, there were three different conditions or settings for our muscle painting study. Students in the control group used textbooks as their learning tools. The experimental groups used either a tablet or the screen-based AR system to complete the task. We wanted to investigate the differences in team performance based on these three conditions and the two groups.

Gender composition

Students were preassigned randomly to teams to complete the muscle painting activity. There were three possible gender compositions per pair of students: male pair, female pair and mixed pair. We were interested in evaluating the gender effects in collaborative learning and investigating if any significant variability of JVA ratios and knowledge tests was present in female-female, male-male and male-female (mixed) pairs of students. As females had a higher enrolment rate in the General Biology lab, which was a common pattern in premedical programs across the nation (Barzansky, 1997; Brooks, 2017), our study participants were also mostly females, with a total of 38 out of 60 study participants (of the 30 teams, half were female-only, eight were mixed and the remaining seven were male-only). This does not indicate a gender imbalance in the data since the sample population is a representative subset of the target population in premedical programs (Barzansky, 1997; Brooks, 2017).
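The measures defined in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`heatmap_to_gaze_point`, `is_jva`, `jva_ratio`, `team_post_test_score`) are hypothetical, the gaze points in the example are made up, and the linear mapping from a heat map cell to image coordinates is an assumption. Only the 100-pixel threshold, the frame-ratio definition and the per-team averaging come from the text.

```python
import math

JVA_THRESHOLD_PX = 100  # distance threshold stated in the text


def heatmap_to_gaze_point(heatmap, image_width, image_height):
    """Map the heat map maximum to image coordinates (x, y).

    Assumption (not from the paper): heat map cell centres are scaled
    linearly onto the full image.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    best_row, best_col = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    x = (best_col + 0.5) * image_width / cols
    y = (best_row + 0.5) * image_height / rows
    return x, y


def is_jva(point_a, point_b, threshold=JVA_THRESHOLD_PX):
    """True when the two gaze points are closer than the threshold."""
    return math.dist(point_a, point_b) < threshold


def jva_ratio(frames, threshold=JVA_THRESHOLD_PX):
    """Percentage of frames with joint visual attention.

    `frames` holds one (gaze_point_a, gaze_point_b) pair per image frame.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    shared = sum(is_jva(a, b, threshold) for a, b in frames)
    return 100.0 * shared / len(frames)


def team_post_test_score(score_a, score_b):
    """Average of the two individual post-test scores (each 0 to 5)."""
    return (score_a + score_b) / 2


# Toy example with made-up gaze points for four frames of one team.
frames = [
    ((1200, 700), (1250, 730)),  # about 58 px apart: JVA
    ((300, 400), (2000, 900)),   # far apart: no JVA
    ((100, 100), (100, 199)),    # 99 px apart: JVA
    ((0, 0), (100, 0)),          # exactly 100 px: not smaller, no JVA
]
print(jva_ratio(frames))           # 50.0
print(team_post_test_score(2, 3))  # 2.5
```

Because the comparison is strict, a frame whose gaze points are exactly 100 pixels apart is not counted as JVA, which follows the "smaller than 100 pixels" wording above.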
Results

In the following, we report descriptive and inferential results from our test user study. In particular, we looked at the JVA ratio, an automatically generated measure based on our proposed framework using deep neural networks, in association with our evaluation measures. Table 1 summarises the descriptive statistics for pairs of students in each study condition, including the number of teams, mean values and standard deviations for JVA ratios and team post-test scores.

Table 1
Summary of JVA ratio and team post-test score with different instrumental tools

Group        Condition (instrumental tool)   Observations (teams), n   JVA ratio (%), M ± SD   Team post-test score, M ± SD
Control      Textbook                        10                        31.30 ± 9.73            1.15 ± 0.95
Experiment   Tablet                          10                        46.50 ± 15.43           2.35 ± 1.03
Experiment   AR                              10                        44.60 ± 17.28           2.35 ± 1.42
Experiment   Combined (tablet & AR)          20                        45.55 ± 15.97           2.35 ± 1.20
Total        Textbook, tablet, AR            30                        40.80 ± 15.59           1.95 ± 1.25

Note. JVA = joint visual attention; M = mean; n = number of observations; SD = standard deviation; Team post-test score is in range from 0 to 5; JVA ratio (%) is in [0–100] range.

Participants

We analysed data from 60 participants (38 females) in 30 teams. All of these students were enrolled in the undergraduate premedical program at Johns Hopkins University. There were 10 teams for each condition: textbook, tablet and AR. Since the tablet and AR conditions were part of the experimental group, we had 20 teams in the experimental group and 10 in the control group. Data from teams with a larger size, those in different activity rooms, those with students under 18 years of age and those with incomplete data were excluded from this study.

JVA ratio

JVA ratio was the percentage of the time teams had shared mutual or joint attention during the learning activity.
Although the JVA ratio did not differ significantly across the three study conditions, the p value was very close to the significance level α (F(2,27) = 3.26, p = 0.054, ns, where ns stands for statistically non-significant). Interestingly, the JVA ratio of the two experimental groups of tablet and AR combined (n = 20, M = 45.55, SD = 15.97) was higher than that of the control group, which used the textbook (n = 10, M = 31.30, SD = 9.73), and this difference was statistically significant with a large effect size (F(1,28) = 6.65, p < 0.05, Cohen’s d = 1.00). Table 1 and Figure 6 provide additional information about the JVA ratio distribution across all study conditions and groups.

Figure 6. The boxplot with observed data points for JVA ratio across (a) different study conditions (instrumental tools) of textbook, tablet and AR, (b) two groups of control (textbook) and experiment (tablet and AR). JVA ratio was significantly different between control and experimental groups.

Team post-test score

Team post-test scores differed significantly among the study conditions of textbook (M = 1.15, SD = 0.95), tablet (M = 2.35, SD = 1.03) and AR (M = 2.35, SD = 1.42) (F(2,27) = 3.64, p < 0.05, r² = 0.16, medium effect size). Post-hoc comparisons indicate that the textbook condition differed from both the tablet condition and the AR condition in mean score. Similarly, the team post-test score of the two experimental groups of tablet and AR combined (M = 2.35, SD = 1.20) was significantly higher than that of the control group (M = 1.15, SD = 0.95), with a large effect size (F(1,28) = 7.56, p < 0.05, Cohen’s d = 1.06). See Figure 7 for the distributions.

Figure 7.
The boxplot with observed data points for team post-test score across (a) different study conditions (instrumental tools) of textbook, tablet and AR, (b) different groups of control and experiment.

JVA ratio and team post-test score

We also measured the association between JVA and team post-test scores using the Pearson correlation coefficient. The Pearson correlation measure indicates a significant positive linear association (Zou et al., 2003) with a strong relationship between the JVA ratio and team post-test scores (r(30) = 0.50, F(1,28) = 9.33, p < 0.005, r² = 0.25, large effect size). The scatter plot drawn from the data is shown in Figure 8. This finding shows that JVA features are strongly associated with learning outcomes, such as post-test scores. The points on the scatter plot follow a positive linear trend, which shows that post-test scores increase with higher JVA ratios. Therefore, teams that share gaze more frequently are more likely to achieve better outcomes in the post-test.

Figure 8. The scatter plot of JVA ratio with team post-test scores. The Pearson correlation and its underlying regression model indicate a significant positive correlation between JVA ratio and team post-test score.

Gender composition

We recorded the gender composition for each of the 30 teams from the survey data and investigated the gender effects on collaborative learning during the activity (see Table 2 and Figure 9). Overall, mixed pairs (eight teams) achieved the highest JVA ratio (M = 47.0, SD = 15.46) and the best learning outcomes in terms of post-test scores (M = 2.50, SD = 1.28), but the difference in post-test scores across gender compositions was not statistically significant (F(2,27) = 1.29, p = 0.29, ns). Likewise, no significant difference was observed across gender compositions for JVA ratios (F(2,27) = 1.10, p = 0.35, ns).
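Several of the test statistics reported in this section can be re-derived from the group sizes, means and standard deviations alone. The Python sketch below shows the standard textbook formulas for Cohen’s d with a pooled standard deviation, for a one-way ANOVA F statistic computed from summary statistics, and for the F statistic implied by a Pearson correlation. The function names are hypothetical, and the calculation assumes the reported statistics were obtained with these standard formulas; because the inputs are rounded, the results can differ slightly from the reported values.

```python
from math import sqrt
from typing import Sequence


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


def anova_f(ns: Sequence[int], means: Sequence[float], sds: Sequence[float]) -> float:
    """One-way ANOVA F statistic computed from per-group n, mean and SD."""
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = len(ns) - 1
    df_within = n_total - len(ns)
    return (ss_between / df_between) / (ss_within / df_within)


def f_from_r(r: float, n: int) -> float:
    """F(1, n - 2) statistic of a simple linear regression with Pearson r."""
    return r ** 2 * (n - 2) / (1 - r ** 2)


# JVA ratio: experimental group (tablet and AR combined) versus control group.
d_jva = cohens_d(45.55, 15.97, 20, 31.30, 9.73, 10)            # about 1.00
f_groups = anova_f([20, 10], [45.55, 31.30], [15.97, 9.73])    # about 6.65

# JVA ratio across the three study conditions (textbook, tablet, AR).
f_conditions = anova_f(
    [10, 10, 10], [31.30, 46.50, 44.60], [9.73, 15.43, 17.28]
)                                                               # about 3.26

# JVA ratio across gender compositions (female, male, mixed teams).
f_gender = anova_f(
    [15, 7, 8], [37.00, 41.86, 47.00], [15.05, 16.72, 15.46]
)                                                               # about 1.10

# Pearson correlation between JVA ratio and team post-test score, 30 teams.
f_correlation = f_from_r(0.50, 30)                              # about 9.33

print(round(d_jva, 2), round(f_groups, 2), round(f_conditions, 2))
print(round(f_gender, 2), round(f_correlation, 2))
```

With two groups, the ANOVA F statistic equals the square of the independent-samples t statistic, which is why the same function covers both the three-condition and the two-group comparisons.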
Table 2
Summary of JVA ratio and team post-test score with different gender compositions

Gender composition   Teams (n)   JVA ratio (%), M ± SD   Team post-test score, M ± SD
Females              15          37.00 ± 15.05           1.63 ± 1.29
Males                7           41.86 ± 16.72           2.00 ± 1.04
Mixed                8           47.00 ± 15.46           2.50 ± 1.28
Total                30          40.80 ± 15.59           1.95 ± 1.25

Note. JVA = joint visual attention; M = mean; n = number of observations; SD = standard deviation; Team post-test score is in range from 0 to 5; JVA ratio (%) is in [0–100] range.

Figure 9. The boxplot with observed data points across teams with different gender compositions: (a) JVA ratios, (b) team post-test scores. No significant difference was observed in the study for either JVA ratios or post-test scores across the different pairs of students. Note. Among these 30 pairs or teams of participants, there were 15 female-female, seven male-male and eight mixed pairs.

Discussion

JVA ratio

Capturing gaze alignment during the collaborative process with the JVA criterion can reveal valuable information about the quality of interaction among teams (Bruinsma et al., 2004; Bryant et al., 2019; Markus et al., 2000; van der Meulen et al., 2016; Wahn et al., 2016); however, few studies have investigated computer vision–based approaches to better measure and capture it in co-located team-based learning activities. In this paper, we introduced a novel assessment tool for automatic team performance evaluation that extracts mutual gaze information with the gaze following method (Lian et al., 2018). Compared with other methods using traditional one-time performance evaluation (Wendler, 2006) or high-cost eye-tracking devices (Bryant et al., 2019), our method was able to automatically extract JVA features during the whole learning process with a simple colour camera. We also investigated the effectiveness of our JVA method in a test user study.
Results show that the JVA ratios of the two experimental groups of tablet and AR were significantly higher than those in the control group, who used the textbook. Our findings are supported by research that used gaze information from student users to examine e-textbooks as a potential alternative learning tool (Gelderblom et al., 2019), although that research was limited to individual learners and not teams.

Team post-test score

Post-test scores indicate student achievement from the learning activity (Morris, 2008). In this study, we set up three different study conditions by using different instrumental tools for an anatomy learning activity. Immediately after the activity, post-test scores were collected from students using a survey completed individually, and the team post-test score was calculated as the average of team members’ individual test scores. Team post-test scores of the two experimental groups of tablet and AR were significantly higher than those in the control group. Furthermore, our earlier research on collaborative learning analytics, conducted with 288 students in May 2017 (Barmaki et al., 2019), also showed that experimental groups who used the AR system achieved higher test scores. These findings are in agreement with previous studies in anatomy education, which highlighted the potential of using evolving technologies such as mixed and augmented realities for enhancing student learning and outcomes in anatomical science education (Maresky et al., 2019; Nicholson et al., 2006; Silva et al., 2018).

JVA ratio and team post-test score

There was a significant positive linear association with a strong relationship between the JVA ratio and the team post-test score. This is in agreement with the hypothesis of our study: students who shared mutual gaze with their teammates for a longer time on the learning task were more likely to obtain higher scores in their post-activity knowledge tests.
This finding also agrees with previous research on the positive effects of sharing a gaze on learning outcomes, including imitation and socio-cognitive performance (Carpenter & Tomasello, 1995; Hirotani et al., 2009). Features like hand distance, speed and face count have been used as high-level features for collaborative learning analytics (Spikol et al., 2017), and they seem to have potential in practice-based learning. Our findings from the JVA and team performance outcomes also show the potential of proximity-based, behavioural measurements in co-located or practice-based collaborative learning analytics.

Gender composition

There were three possible gender compositions per pair of students: females, males and mixed pairs. About half of the participants were in female-only teams, which is also a common gender enrolment rate in life sciences and premedical programs (Barzansky, 1997; Brooks, 2017). Even though mixed teams had slightly higher JVA ratios and better learning outcomes, no significant difference was observed based on the gender composition of the teams for either the JVA ratio or the team post-test score. These findings are consistent with previous studies that noted no gender effect in life sciences studies (Andersson, 2001; Prinsen et al., 2007).

In addition, we found that compared with control groups using text and 2D anatomy models from the textbook, the students in both experimental groups had higher JVA ratios and better knowledge retention by interacting with 3D models on the tablet screen or AR system. Specifically, in teams with the screen-based AR, students could easily collaborate and locate specific muscles, which were projected with high accuracy on top of their own bodies. This outcome is in line with a recent meta-analysis, which identified collaborative learning as the most beneficial approach in AR interventions (Garzón et al., 2020).
Our study also provides further evidence that 3D visualisation technologies increase students’ engagement and improve their knowledge retention in human anatomy learning (Hackett & Proctor, 2018; Luursema et al., 2008; Nicholson et al., 2006; Yammine & Violato, 2015).

Conclusion

In this paper, we introduced an automated team assessment tool based on gaze points and JVA information extracted by computer vision solutions. The results from a pilot study indicate that experimental teams who interacted with 3D digital learning tools had a higher frequency of JVA and better knowledge retention outcomes than those in the control group. We also investigated the effects of team gender composition on JVA ratios and team test scores. We found no significant difference in JVA ratios or post-test scores among teams with different gender compositions.

This work was a preliminary study to automatically assess team collaboration with computer vision techniques, and there is room for improvement. The focus was to understand collaboration in co-located situations, and shared gaze features were identified during the post-analysis process. We analysed only a subset of the collected data: because the main objective of this paper was to first understand the dyadic interactions of students, we excluded larger teams. In further planned research, a data set with team sizes larger than two would better illustrate our idea and validate our findings. In future work, it would be ideal to evaluate the effectiveness of adapting our method to other collaborative learning scenarios. Furthermore, we plan to combine multiple image-based features, such as facial expression recognition, emotion recognition and head and body pose estimation, with joint attention estimation to interpret collaboration dynamics more comprehensively.
The findings from this work have implications in educational technology and collaborative computing by offering a novel assessment tool for team collaborations based on gaze information.

Acknowledgements

We wish to express our sincere gratitude to our research collaborators at Johns Hopkins University, especially the General Biology course instructors Drs Rebecca Pearlman and Richard Shingles; research partners Drs Nassir Navab and Greg Osgood; our developers Kevin Yu and Felix Bork; and laboratory technicians, teaching assistants and study participants for their invaluable contributions to making this study possible.

Statement on open data, ethics and conflict of interest

The study was approved by the Institutional Review Board (Protocol # HIRB00005021). Because participants are identifiable from the image data, the ethical requirements do not allow us to share participants’ data from this study. The authors declare no conflicts of interest.

References

Andersson, J. (2001). Net effect of memory collaboration: How is collaboration affected by factors such as friendship, gender and age? Scandinavian Journal of Psychology, 42(4), 367–375. https://doi.org/10.1111/1467-9450.00248
Baker, R. S., & Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining, 1(1), 3–17. https://doi.org/10.5281/zenodo.3554657
Barmaki, R., Yu, K., Pearlman, R., Shingles, R., Bork, F., Osgood, G. M., & Navab, N. (2019). Enhancement of anatomical education using augmented reality: An empirical study of body painting. Anatomical Sciences Education, 12(6), 599–609. https://doi.org/10.1002/ase.1858
Barzansky, B. (1997). Educational programs in US medical schools, 1996–1997. JAMA: The Journal of the American Medical Association, 278(9), 744–749. https://doi.org/10.1001/jama.1997.03550090068035
Bear, J. B., & Woolley, A. W. (2011).
The role of gender in team collaboration and performance. Interdisciplinary Science Reviews, 36(2), 146–153. https://doi.org/10.1179/030801811X13013181961473
Bente, G., Eschenburg, F., & Krämer, N. C. (2007). Virtual gaze. A pilot study on the effects of computer simulated gaze in avatar-based conversations. In R. Shumaker (Ed.), Lecture notes in computer science: Vol. 4563. Virtual reality (pp. 185–194). Springer. https://doi.org/10.1007/978-3-540-73335-5_21
Bertsimas, D., & Gupta, S. (2016). Fairness and collaboration in network air traffic flow management: An optimization approach. Transportation Science, 50(1), 57–76. https://doi.org/10.1287/trsc.2014.0567
Brennan, S. E., Chen, X., Dickinson, C. A., Neider, M. B., & Zelinsky, G. J. (2008). Coordinating cognition: The costs and benefits of shared gaze during collaborative search. Cognition, 106(3), 1465–1477. https://doi.org/10.1016/j.cognition.2007.05.012
Brooks, M. (2017, December 19). More women than men enrolled in US med schools for first time. Medscape. https://www.medscape.com/viewarticle/890321
Bruinsma, Y., Koegel, R. L., & Koegel, L. K. (2004). Joint attention and children with autism: A review of the literature. Mental Retardation and Developmental Disabilities Research Reviews, 10(3), 169–175. https://doi.org/10.1002/mrdd.20036
Bryant, T., Radu, I., & Schneider, B. (2019). A qualitative analysis of joint visual attention and collaboration with high- and low-achieving groups in computer-mediated learning. In K. Lund, G. P. Nicollai, E. Lavoué, C. Hmelo-Silver, G. Gweon, & M. Baker (Eds.), Proceedings of the 13th International Conference on Computer Supported Collaborative Learning (Vol. 2, pp. 923–924). International Society of the Learning Sciences. https://repository.isls.org/bitstream/1/1731/1/923-924.pdf
Carpenter, M., & Tomasello, M. (1995).
Joint attention and imitative learning in children, chimpanzees, and enculturated chimpanzees. Social Development, 4(3), 217–237. https://doi.org/10.1111/j.1467-9507.1995.tb00063.x
de Freitas, S., & Griffiths, M. (2007). Online gaming as an educational tool in learning and training. British Journal of Educational Technology, 38(3), 535–537. https://doi.org/10.1111/j.1467-8535.2007.00720.x
De Paola, M., Gioia, F., & Scoppa, V. (2018). Teamwork, leadership and gender (IZA Discussion Papers, No. 11861). Institute of Labor Economics. https://www.econstor.eu/bitstream/10419/185321/1/dp11861.pdf
Eagly, A. H., & Carli, L. L. (2003). The female leadership advantage: An evaluation of the evidence. The Leadership Quarterly, 14(6), 807–834. https://doi.org/10.1016/j.leaqua.2003.09.004
Fernandez-Sanz, L., & Misra, S. (2012). Analysis of cultural and gender influences on teamwork performance for software requirements analysis in multinational environments. IET Software, 6(3), 167–175. https://doi.org/10.1049/iet-sen.2011.0070
Flor, M., Yoon, S.-Y., Hao, J., Liu, L., & von Davier, A. (2016). Automated classification of collaborative problem solving interactions in simulated science tasks. In J. Tetreault, J. Burstein, C. Leacock, & H. Yannakoudakis (Eds.), Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 31–41). Association for Computational Linguistics. https://doi.org/10.18653/v1/W16-0504
Garzón, J., Baldiris, S., Gutiérrez, J., & Pavón, J. (2020). How do pedagogical approaches affect the impact of augmented reality on education? A meta-analysis and research synthesis. Educational Research Review, 31, Article 100334. https://doi.org/10.1016/j.edurev.2020.100334
Gelderblom, H., Matthee, M., Hattingh, M., & Weilbach, L. (2019). High school learners’ continuance intention to use electronic textbooks: A usability study. Education and Information Technologies, 24(2), 1753–1776.
https://doi.org/10.1007/s10639-018-9850-z
Golenhofen, N., Heindl, F., Grab-Kroll, C., Messerer, D. A., Böckers, T. M., & Böckers, A. (2020). The use of a mobile learning tool by medical students in undergraduate anatomy and its effects on assessment outcomes. Anatomical Sciences Education, 13(1), 8–18. https://doi.org/10.1002/ase.1878
Guo, Z., & Barmaki, R. (2019). Collaboration analysis using object detection. In C. F. Lynch, A. Merceron, M. Desmarais, & R. Nkambou (Eds.), Proceedings of the 12th International Conference on Educational Data Mining (pp. 695–698). Educational Data Mining. https://drive.google.com/file/d/1yznkJQl-bkP1y5sIjRjm8RwaInXNqp-k/view
Hackett, M., & Proctor, M. (2018). The effect of autostereoscopic holograms on anatomical knowledge: A randomised trial. Medical Education, 52(11), 1147–1155. https://doi.org/10.1111/medu.13729
Hansen, D. W., & Ji, Q. (2009). In the eye of the beholder: A survey of models for eyes and gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 478–500. https://doi.org/10.1109/TPAMI.2009.30
Harari, D., Tenenbaum, J. B., & Ullman, S. (2018, April 30). Discovery and usage of joint attention in images. ArXiv Preprint ArXiv:1804.04604. Cornell University. https://arxiv.org/abs/1804.04604
Herring, S. C. (Ed.). (1996). Computer-mediated communication: Linguistic, social, and cross-cultural perspectives. John Benjamins Publishing. https://doi.org/10.1075/pbns.39
Hirotani, M., Stets, M., Striano, T., & Friederici, A. D. (2009). Joint attention helps infants learn new words: Event-related potential evidence. Neuroreport, 20(6), 600–605. https://doi.org/10.1097/WNR.0b013e32832a0a7c
Huang, K., Bryant, T., & Schneider, B. (2019). Identifying collaborative learning states using unsupervised machine learning on eye-tracking, physiological and motion sensor data. In C. F. Lynch, A. Merceron, M. Desmarais, & R. Nkambou (Eds.), Proceedings of the 12th International Conference on Educational Data Mining (pp.
318–323). Educational Data Mining. https://drive.google.com/file/d/1i_Ga8pmL_3R8aQOQBYKc6R4kvKN4FUMC/view
Innes, J. E., & Booher, D. E. (2016). Collaborative rationality as a strategy for working with wicked problems. Landscape and Urban Planning, 154, 8–10. https://doi.org/10.1016/j.landurbplan.2016.03.016
Kim, Y., D'Angelo, C., Cafaro, F., Ochoa, X., Espino, D., Kline, A., Hamilton, E., Lee, S., Butail, S., Liu, L., Trajkova, M., Tscholl, M., Hwang, J., Lee, S., & Kwon, K. (2020). Multimodal data analytics for assessing collaborative interactions. In M. Gresalfi, & I. S. Horn (Eds.), Proceedings of the 14th International Conference on Learning Sciences (Vol. 5, pp. 2547–2554). International Society of the Learning Sciences. https://repository.isls.org/bitstream/1/6619/1/2547-2554.pdf
Kotsiantis, S. B. (2012). Use of machine learning techniques for educational proposes: A decision support system for forecasting students’ grades. Artificial Intelligence Review, 37(4), 331–344. https://doi.org/10.1007/s10462-011-9234-x
Lemos, R. R., Rudolph, C. M., Batista, A. V., Conceição, K. R., Pereira, P. F., Bueno, B. S., Fiuza, P. J., & Mansur, S. S. (2019). Design of a Web3D serious game for human anatomy education: A Web3D game for human anatomy education. In A. L. Krassmann, É. Amaral, F. B. Nunes, G. B. Voss, & M. C. Zunguze (Eds.), Handbook of research on immersive digital games in educational environments (pp. 586–611). IGI Global. https://doi.org/10.4018/978-1-5225-5790-6.ch020
Li, Q., Lau, R. W. H., Shih, T. K., & Li, F. W. B. (2008). Technology supports for distributed and collaborative learning over the internet. ACM Transactions on Internet Technology, 8(2), 1–24. https://doi.org/10.1145/1323651.1323656
Lian, D., Yu, Z., & Gao, S. (2018). Believe it or not, we know what you are looking at! In C. Jawahar, H. Li, G. Mori, & K. Schindler (Eds.), Lecture notes in computer science: Vol. 11363.
Asian conference on computer vision (pp. 35–50). Springer. https://doi.org/10.1007/978-3-030-20893-6_3
Lipponen, L. (1999). The challenges for computer supported collaborative learning in elementary and secondary level: Finnish perspectives. In C. M. Hoadley & J. Roschelle (Eds.), Proceedings of the 1999 Conference on Computer Support for Collaborative Learning (pp. 519–528). Association for Computing Machinery. https://dl.acm.org/doi/10.5555/1150240.1150286
Luursema, J.-M., Verwey, W. B., Kommers, P. A., & Annema, J.-H. (2008). The role of stereopsis in virtual anatomical learning. Interacting with Computers, 20(4–5), 455–460. https://doi.org/10.1016/j.intcom.2008.04.003
Luursema, J.-M., Verwey, W. B., Kommers, P. A., Geelkerken, R. H., & Vos, H. J. (2006). Optimizing conditions for computer-assisted anatomical learning. Interacting with Computers, 18(5), 1123–1138. https://doi.org/10.1016/j.intcom.2006.01.005
Maresky, H., Oikonomou, A., Ali, I., Ditkofsky, N., Pakkal, M., & Ballyk, B. (2019). Virtual reality and cardiac anatomy: Exploring immersive three-dimensional cardiac imaging, a pilot study in undergraduate medical anatomy education. Clinical Anatomy, 32(2), 238–243. https://doi.org/10.1002/ca.23292
Marieb, E. N. (2015). Essentials of human anatomy & physiology laboratory manual (6th ed.). Pearson.
Marín-Jiménez, M. J., Zisserman, A., Eichner, M., & Ferrari, V. (2014). Detecting people looking at each other in videos. International Journal of Computer Vision, 106(3), 282–296. https://doi.org/10.1007/s11263-013-0655-7
Markus, J., Mundy, P., Morales, M., Delgado, C. E., & Yale, M. (2000). Individual differences in infant skills as predictors of child-caregiver joint attention and language. Social Development, 9(3), 302–315. https://doi.org/10.1111/1467-9507.00127
Meadows, L. A., Sekaquaptewa, D., & Paretti, M. C. (2015). Interactive panel: Improving the experiences of marginalized students on engineering design teams. In B. M.
Holloway (Ed.), Proceedings of the ASEE Annual Conference & Exposition (Vol. 26, pp. 1–23). American Society for Engineering Education. https://doi.org/10.18260/p.24344
Meiksins, P., Beddoes, K., Layne, P., McCusker, M., & Camargo, E. (2015, November 30). Women in engineering: A review of the 2014 literature. Consulting – Specifying Engineer. https://www.csemag.com/articles/women-in-engineering-a-review-of-the-2014-literature/
Miura, G., & Okada, S. (2019). Task-independent multimodal prediction of group performance based on product dimensions. In W. Gao, M. Meng, M. Turk, S. R. Fussell, B. Schuller, Y. Song, & K. Yu (Eds.), Proceedings of the 2019 International Conference on Multimodal Interaction (pp. 264–273). Association for Computing Machinery. https://doi.org/10.1145/3340555.3353729
Morris, S. B. (2008). Estimating effect sizes from pretest-posttest-control group designs. Organizational Research Methods, 11(2), 364–386. https://doi.org/10.1177/1094428106291059
Mukherjee, S. S., & Robertson, N. M. (2015). Deep head pose: Gaze-direction estimation in multimodal video. IEEE Transactions on Multimedia, 17(11), 2094–2107. https://doi.org/10.1109/TMM.2015.2482819
Murray, G., & Oertel, C. (2018). Predicting group performance in task-based interaction. In S. K. D’mello, P. Georgiou, S. Scherer, E. M. Provost, M. Soleymani, & M. Worsley (Eds.), Proceedings of the 20th ACM International Conference on Multimodal Interaction (pp. 14–20). Association for Computing Machinery. https://doi.org/10.1145/3242969.3243027
Ng, J., Hu, X., Luo, M., & Chu, S. K. W. (2019). Relations among participation, fairness and performance in collaborative learning with Wiki-based analytics. Proceedings of the Association for Information Science and Technology, 56(1), 463–467. https://doi.org/10.1002/pra2.48
Nicholson, D. T., Chalk, C., Funnell, W. R. J., & Daniel, S. J. (2006). Can virtual reality improve anatomy education?
A randomised controlled study of a computer-generated three-dimensional anatomical ear model. Medical Education, 40(11), 1081–1087. https://doi.org/10.1111/j.1365-2929.2006.02611.x
Ochoa, X., Chiluiza, K., Méndez, G., Luzardo, G., Guamán, B., & Castells, J. (2013). Expertise estimation based on simple multimodal features. In J. Epps, F. Chen, S. Oviatt, K. Mase, A. Sears, K. Jokinen, & B. Schuller (Eds.), Proceedings of the 15th ACM on International Conference on Multimodal Interaction (pp. 583–590). Association for Computing Machinery. https://doi.org/10.1145/2522848.2533789
Okita, S., Bailenson, J., & Schwartz, D. (2008). Mere belief of social action improves complex learning. In P. A. Kirschner, J. J. van Merriënboer, & T. de Jong (Eds.), Proceedings of the 8th International Conference for The Learning Sciences (Vol. 2, pp. 132–139). International Society of the Learning Sciences (ISLS). https://repository.isls.org/bitstream/1/3144/1/132-139.pdf
Otsuka, K., Kasuga, K., & Köhler, M. (2018). Estimating visual focus of attention in multiparty meetings using deep convolutional neural networks. In S. K. D’mello, P. Georgiou, S. Scherer, E. M. Provost, M. Soleymani, & M. Worsley (Eds.), Proceedings of the 20th ACM International Conference on Multimodal Interaction (pp. 191–199). Association for Computing Machinery. https://doi.org/10.1145/3242969.3242973
Patacchiola, M., & Cangelosi, A. (2017). Head pose estimation in the wild using convolutional neural networks and adaptive gradient methods. Pattern Recognition, 71, 132–143. https://doi.org/10.1016/j.patcog.2017.06.009
Pietinen, S., Bednarik, R., & Tukiainen, M. (2010, May). Shared visual attention in collaborative programming: A descriptive analysis. In Y. Dittrich, C. de Souza, M. Korpela, H. Sharp, J. Singer, & H. W. Theophilus (Eds.), Proceedings of the 2010 ICSE Workshop on Cooperative and Human Aspects of Software Engineering (pp. 21–24). Association for Computing Machinery.
https://doi.org/10.1145/1833310.1833314
Prinsen, F. R., Volman, M. L., & Terwel, J. (2007). Gender-related differences in computer-mediated communication and computer-supported collaborative learning. Journal of Computer Assisted Learning, 23(5), 393–409. https://doi.org/10.1111/j.1365-2729.2007.00224.x
Rabbitt, P., Donlan, C., Watson, P., McInnes, L., & Bent, N. (1995). Unique and interactive effects of depression, age, socioeconomic advantage, and gender on cognitive performance of normal healthy older people. Psychology and Aging, 10(3), 307–313. https://doi.org/10.1037/0882-7974.10.3.307
Recasens, A., Khosla, A., Vondrick, C., & Torralba, A. (2015). Where are they looking? In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in Neural Information Processing Systems (pp. 199–207). Neural Information Processing Systems Foundation. http://papers.nips.cc/paper/5848-where-are-they-looking
Rezende, E., Ruppert, G., Carvalho, T., Ramos, F., & De Geus, P. (2017). Malicious software classification using transfer learning of resnet-50 deep neural network. In X. Chen (Ed.), Proceedings of the 16th IEEE International Conference on Machine Learning and Applications (pp. 1011–1014). IEEE. https://doi.org/10.1109/ICMLA.2017.00-19
Rosé, C., Wang, Y.-C., Cui, Y., Arguello, J., Stegmann, K., Weinberger, A., & Fischer, F. (2008). Analyzing collaborative learning processes automatically: Exploiting the advances of computational linguistics in computer-supported collaborative learning. International Journal of Computer-Supported Collaborative Learning, 3(3), 237–271. https://doi.org/10.1007/s11412-007-9034-0
Santini, T., Fuhl, W., & Kasneci, E. (2017, May). CalibMe: Fast and unsupervised eye tracker calibration for gaze-based pervasive human-computer interaction. In G. Mark, S. Fussell, C. Lampe, M. C. Schraefel, J. P. Hourcade, C. Appert, & D. Wigdor (Eds.), Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 2594–2605).
Association for Computing Machinery. https://doi.org/10.1145/3025453.3025950
Schaf, F. M., Müller, D., Bruns, F. W., Pereira, C. E., & Erbe, H.-H. (2009). Collaborative learning and engineering workspaces. Annual Reviews in Control, 33(2), 246–252. https://doi.org/10.1016/j.arcontrol.2009.05.002
Schaie, K. W., & Willis, S. L. (1993). Age difference patterns of psychometric intelligence in adulthood: Generalizability within and across ability domains. Psychology and Aging, 8(1), 44. https://doi.org/10.1037/0882-7974.8.1.44
Schneider, B., & Pea, R. (2013). Real-time mutual gaze perception enhances collaborative learning and collaboration quality. International Journal of Computer-Supported Collaborative Learning, 8(4), 375–397. https://doi.org/10.1007/s11412-013-9181-4
Schneider, B., Sharma, K., Cuendet, S., Zufferey, G., Dillenbourg, P., & Pea, R. (2018). Leveraging mobile eye-trackers to capture joint visual attention in co-located collaborative learning groups. International Journal of Computer-Supported Collaborative Learning, 13(3), 241–261. https://doi.org/10.1007/s11412-018-9281-2
Silva, J. N., Southworth, M., Raptis, C., & Silva, J. (2018). Emerging applications of virtual reality in cardiovascular medicine. JACC: Basic to Translational Science, 3(3), 420–430. https://doi.org/10.1016/j.jacbts.2017.11.009
Soller, A., Lesgold, A., Linton, F., & Goodman, B. (1999). What makes peer interaction effective? Modeling effective communication in an intelligent CSCL. In S. E. Brennan, A. Giboin, & D. Traum (Eds.), Proceedings of 1999 AAAI Fall Symposium: Psychological Models of Communication in Collaborative Systems (pp. 116–123). AAAI Press. https://www.aaai.org/Papers/Symposia/Fall/1999/FS-99-03/FS99-03-017.pdf
Spikol, D., Ruffaldi, E., & Cukurova, M. (2017). Using multimodal learning analytics to identify aspects of collaboration in project-based learning. In B. K. Smith, M. Borge, E. Mercier, & K. Y.
Lim (Eds.), Proceedings of the 2017 Conference on Computer Support for Collaborative Learning (pp. 263–270). International Society of the Learning Sciences. https://repository.isls.org/bitstream/1/240/1/37.pdf
Subburaj, S. K., Stewart, A. E. B., Ramesh Rao, A., & D’Mello, S. K. (2020). Multimodal, multiparty modeling of collaborative problem solving performance. In K. Truong, D. Heylen, M. Czerwinski, N. Berthouze, M. Chetouani, & M. Nakano (Eds.), Proceedings of the 2020 International Conference on Multimodal Interaction (pp. 423–432). Association for Computing Machinery. https://doi.org/10.1145/3382507.3418877
Sung, H.-Y., & Hwang, G.-J. (2013). A collaborative game-based learning approach to improving students’ learning performance in science courses. Computers & Education, 63, 43–51. https://doi.org/10.1016/j.compedu.2012.11.019
Tao, C., Zhang, Q., & Zhou, Y. (2019). Collaborative learning with limited interaction: Tight bounds for distributed exploration in multi-armed bandits. In L. O’Conner (Ed.), Proceedings of the 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (pp. 126–146). IEEE. https://doi.org/10.1109/FOCS.2019.00017
van der Meulen, H., Varsanyi, P., Westendorf, L., Kun, A. L., & Shaer, O. (2016). Towards understanding collaboration around interactive surfaces: Exploring joint visual attention. In J. Rekimoto, T. Igarashi, J. O. Wobbrock, & D. Avrahami (Eds.), Proceedings of the 29th Annual Symposium on User Interface Software and Technology (pp. 219–220). Association for Computing Machinery. https://doi.org/10.1145/2984751.2984778
Van Rheden, V., Maurer, B., Smit, D., Murer, M., & Tscheligi, M. (2017). LaserViz: Shared gaze in the co-located physical world. In M. Inakage, H. Ishii, E. Y. Do, J. Steimle, O. Shaer, K. Kunze, & R. Peiris (Eds.), Proceedings of the Eleventh International Conference on Tangible, Embedded, and Embodied Interaction (pp. 191–196). Association for Computing Machinery.
https://doi.org/10.1145/3024969.3025010 Vrzakova, H., Amon, M. J., Stewart, A. E. B., & D’Mello, S. K. (2019). Dynamics of visual attention in multiparty collaborative problem solving using multidimensional recurrence quantification analysis. In S. Brewster, G. Fitzpatrick, A. Cox, & V. Kostakos (Eds.), Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1–14). Association for Computing Machinery. https://doi.org/10.1145/3290605.3300572 Wahn, B., Schwandt, J., Krüger, M., Crafa, D., Nunnendorf, V., & König, P. (2016). Multisensory teamwork: Using a tactile or an auditory display to exchange gaze information improves performance in joint visual search. Ergonomics, 59(6), 781–795. https://doi.org/10.1080/00140139.2015.1099742’ Webb, N. M., Troper, J. D., & Fall, R. (1995). Constructive activity and learning in collaborative small groups. Journal of Educational Psychology, 87(3), 406–423. https://doi.org/10.1037/0022- 0663.87.3.406 Australasian Journal of Educational Technology, 2020, 36(6). 71 Wegge, J., Roth, C., Neubach, B., Schmidt, K.-H., & Kanfer, R. (2008). Age and gender diversity as determinants of performance and health in a public organization: The role of task complexity and group size. Journal of Applied Psychology, 93(6), 1301–1313. https://doi.org/10.1037/a0012680 Wendler, D. (2006). One-time general consent for research on biological samples. BMJ: British Medical Journal, 332(7540), 544–547. https://doi.org/10.1136/bmj.332.7540.544 Whalen, C., & Schreibman, L. (2003). Joint attention training for children with autism using behavior modification procedures. Journal of Child Psychology and Psychiatry, 44(3), 456–468. https://doi.org/10.1111/1469-7610.00135 Yammine, K., & Violato, C. (2015). A meta-analysis of the educational effectiveness of three- dimensional visualization technologies in teaching anatomy. Anatomical Sciences Education, 8(6), 525–538. https://doi.org/10.1002/ase.1510 Yücel, Z., Salah, A. 
A., Meriçli, Ç., Meriçli, T., Valenti, R., & Gevers, T. (2013). Joint attention by gaze interpolation and saliency. IEEE Transactions on Cybernetics, 43(3), 829–842. https://doi.org/10.1109/TSMCB.2012.2216979 Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Chen, Y., Cai, L., & Ling, H. (2019). M2det: A single-shot object detector based on multi-level feature pyramid network. In P. Stone, P. V. Hentenryck, & Z. H. Zhou (Eds.), Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 9259–9266). AAAI Press. https://doi.org/10.1609/aaai.v33i01.33019259 Zhu, X., & Ramanan, D. (2012). Face detection, pose estimation, and landmark localization in the wild. In R. Chellappa, B. Kimia, S. C. Zhu, S. Belongie, A. Blake, J. Luo, & A. Yuille (Eds.), Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (pp. 2879–2886). IEEE. https://doi.org/10.1109/CVPR.2012.6248014 Zou, K. H., Tuncali, K., & Silverman, S. G. (2003). Correlation and simple linear regression. Radiology, 227(3), 617–628. https://doi.org/10.1148/radiol.2273011499 Corresponding author: Roghayeh Barmaki, rlb@udel.edu Copyright: Articles published in the Australasian Journal of Educational Technology (AJET) are available under Creative Commons Attribution Non-Commercial No Derivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant AJET right of first publication under CC BY-NC-ND 4.0. Please cite as: Guo, Z., & Barmaki, R. (2020). Deep neural networks for collaborative learning analytics: Evaluating team collaborations using student gaze point prediction. Australasian Journal of Educational Technology, 36(6), 53-71. https://doi.org/10.14742/ajet.6436