Acta Polytechnica CTU Proceedings 12:83–93, 2017, doi:10.14311/APP.2017.12.0083 © Czech Technical University in Prague, 2017, available online at http://ojs.cvut.cz/ojs/index.php/app

AN INTELLIGENT CO-DRIVER SURVEILLANCE SYSTEM

Mădălina Toma, Mirela Popa, Leon Rothkrantz a,b,∗

a Intelligent Interaction, Delft University of Technology, Mekelweg 4, Delft, The Netherlands
b Faculty of Transportation Sciences, Czech Technical University in Prague, Konviktská 20, Prague 1, Czech Republic
∗ corresponding author: L.J.M.Rothkrantz@tudelft.nl

Abstract. In recent years many car manufacturers have developed digital co-drivers, which are able to monitor the driving behaviour of a car. Sensors in the car measure whether the car exceeds speed limits, leaves its lane, or violates other traffic rules. A new generation of co-drivers is based on sensors in the car which are able to monitor the driver's behaviour. Driving a car is a sequence of actions. In case a driver omits one of these actions, the co-driver generates a warning signal. Experiments in the car simulator TORCS were performed to extract the actions of a car driver. These actions were used to develop probabilistic models of driving behaviour. A prototype of a warning system has been developed and tested in the car simulator. The experiments and test results are reported in this paper.

Keywords: behaviour analysis, car simulator, surveillance, Bayesian reasoning.

1. Introduction

In recent years we can observe an increase in investments by car producers and researchers in the development of self-driving cars. The first prototypes of such cars have been tested. However, there is a long way to go before such cars appear on the highways and in the cities. It can be expected that at least for the coming 10 years cars with a human driver will prevail. Also in the regular driver-car interaction a lot of automation will take place. Navigation systems support the driver in finding the shortest route from the starting point to the destination. Smart interfaces support drivers in using their phones, radio and other digital devices in their cars. This paper focuses on the development of systems increasing the safety of the car driver and other road users.

At Delft University of Technology there is a project running on the development of surveillance angels and guardian angels [1], [2]. One of the applications is a digital automated car driver assistant. Such an assistant is able to supervise the driving behaviour of a car driver. The system permanently receives the position of the steering wheel, the position of the pedals and the gear lever, but also information on the body movements of the driver, his facial expression and speech, and information from the environment. This information is provided by sensors in the car and sensors attached to car parts and (manual) instruments. The goal of the system is to make assessments of the possible disposition/situation of the car driver, that is to say a semantic interpretation of the driving behaviour. In case the driver overlooks important information or violates traffic or driving rules, the system will generate an alert or support the driver. The next step is that the car takes over the driving of the car; however, this is postponed to future work. The full model is based on information about the car driver, but also on the environment and the state of the car. From the movements of the car driver we make assessments of the state of the car driver, his goals and intentions.
Information on the state of the driver can also be assessed by physiological measures and EEG measurements [3], [4]. This study focuses on the assessment of the visual behaviour of the car driver. A car driver can show many body movements. We can order them in specific categories:

• Core driving movements. These movements are related to driving a car.
• Peripheral movements. These movements are related to interaction with peripheral devices in the car, such as the radio, phone, and navigation devices.
• Signal movements. These movements express some message from the driver about his physical or emotional state (stretching the arms, showing a fist) or messages to other drivers on the road.
• Random movements. These movements have no specific semantic meaning related to car driving, such as scratching the nose.

Core driving movements can be composed of one single action or a sequence of actions. Take, for example, actions of the car driver such as starting the car, parking, speeding up, slowing down, or taking a turn. All these actions are characterized by a sequence of information from the car sensors. After an action is observed, the next step is to classify it. It is necessary to realise that the classification is context sensitive. The classification is usually an ambiguous, probabilistic process. In case of an ongoing sequence of actions we have to predict the future and compute the probability of possible next steps. It can also happen that a driver skips one or more actions on purpose or by mistake. Take, for example, the case when a driver wants to overtake but does not check the mirror to see if the next lane is free. This can result in a dangerous situation and an alert has to be generated. In this paper a rule-based system and a Bayesian reasoning system will be used to compute the probability of missing or future steps in an action string. By means of experiments using a driving simulator, all possible sequences of actions were computed, as well as the corresponding probabilities of those actions.

In the next section we discuss related work. In section 3 a model of a digital co-driver is presented. In section 4, experiments with the car driving simulator TORCS are discussed. Then we present an analysis of the recorded data in section 5. This paper is concluded with a final discussion.

2. Related work

According to statistics published by the National Highway Traffic Safety Administration in 2005, a large proportion of collisions (78%) and near-collisions (65%) are associated with driver inattention. Distractions caused by secondary tasks, such as manipulation of the navigation system, radio, or phone, seem to be the main source of the inattention [5]. In [6] Merat and Jamson present their findings regarding the driver's ability to respond to sudden and unexpected events while driving under normal conditions and also while performing other tasks. Their conclusions highlight that the reaction time increases by 200 ms when the driver is using in-car systems. Head pose estimation reveals important information about the driver's focus of attention. In [7] Doshi and Trivedi present their findings regarding head dynamics and eye gaze as important cues in predicting the driver's intent to change lanes.
Regarding human pose estimation, there are currently various methods of representing a human in an image, such as silhouettes [8], bounding boxes, or sticks representing the limbs of a person [9], [10]. One of the most popular approaches is the pictorial structures model, which represents the human body as a collection of body parts [11].

Building intelligent systems has become a trend in current research. In the driving domain, such systems are called advanced driver assistance systems [12]. According to Adarsha [13], these systems support the driver's sensing ability and are able to track and detect the errors/lapses of drivers. Assistive intelligent systems are usually a combination of several tools. For example, Benoit [14] developed an assistive driving system for assessing the driver's fatigue. His system uses two platforms, namely OpenInterface, developed in C++ for signal processing, and ICARE. ICARE is a conceptual component model for multimodal input/output interaction. Thus, Benoit's system combines multimodal signal processing analysis with multimodal interaction. Our system can be built on a similar system architecture, and it tries to enhance the novice driver's skills.

Many other intelligent systems have been developed [15]. The cognitive model of the driver has been investigated in order to implement the driver's mental activity in such a system. The driver's mental activity is analyzed while the driver interacts dynamically with the driving environment. Therefore, the driver's mental representation of a situation can be illustrated as an instance in the working memory. We also worked on building a cognitive model of drivers. We analyzed the driver's mental activity as a result of the following activities: gaze and head activities related to the traffic situation, the car controls (steering wheel, pedals, and gear shifter) related to body part movements, and the states of the car on the road.

Other intelligent systems have been implemented based on a stochastic model. In these cases, driver behavior has been modeled using data collected from a driving simulator. Liang used this model in his work [16] to detect driver distraction in real time. He modeled driver behavior based on parameters extracted from the driver's eye movements, the state of the steering wheel and the car position on the lane. In a similar way, Giusti [17] detected driver sleep-attacks by using data acquired from the steering wheel. Wang and Gao estimated in [18] the state of a car on the road with a rule-based expert system. The expert system typically provides the intelligence of the system, and it fits very well with the requirements of our system. According to research from the last decade, expert systems can be classified into the following six methodologies: rule-based systems, knowledge-based systems, intelligent agents (IA), database methodology, inference engines, and system-user interaction. A basic rule-based expert system has the following entities: a knowledge base, which contains data from experts, and an inference engine, which contains the decision-making components of the system. We estimated the driver's intent with a rule-based expert system. Moreover, our framework, built on a rule-based expert system, reasons like a driver, providing full assistance to novice drivers in terms of driving skills enhancement.

Fletcher proposed an assistive driver system [19], which monitors driver activity using vision sensors.
The set of sensors used in Fletcher's system tracks the eye movements and the body parts of the driver. However, it is difficult to estimate driver intentions from the data of the visual sensors alone. Therefore, our system estimates the driver's intention by correlating data from visual sensors with the states of the car and information about the driving environment. The gaming industry provides driving simulations of 3D traffic environments. If a simulation of a 3D traffic environment is used, the data about a driving situation can be read easily. In our work, we used the TORCS simulator because it can be extended and adapted to the proposed system. This simulator is an excellent open source tool, and it gave us the opportunity to configure the tracking sensors and the car controller.

In prior research, we tried to determine the driver's interaction intent by analysing only single body parts, e.g. pose estimation [20], gaze detection, or facial expression. Compared with these, we take into account more aspects and actions, such as body postures, gaze direction, and head orientation during interaction with the car and the surrounding environment. A number of intelligent systems that provide assistance to drivers were presented above. Some of them present good solutions. Still, an intelligent system with complex reasoning that trains the novice driver's skills needs to be developed. Thus, we present the implementation of an intelligent system that has the capability to assist a novice driver and tries to enhance his skills at the same time.

3. Model

In figure 1 we display a model/architecture of our digital co-driver system. The behaviour of the driver is assessed by parallel sensor systems: KINECT, a gaze tracker, and the car-driver interaction tools. The sensory input system has been configured and synchronized with the clock cycle of the TORCS tool in order to track the driver's activities. The TORCS simulator is extended on three modular levels of abstraction. On the first level, the system detects sequences of postures. On the second level, the driver's action can be recognized from a sequence of postures. On the last level, the driver's intention is estimated based on the driver's actions related to the traffic situation. The set of possible intentions is stored in a database. When a driver's intention is wrong, the proposed framework sends feedback alarms. A GUI has been built in order to allow monitoring of the functionality of the whole system and also of the driver's activity.

The main reasoning module implements the intelligence of the system, and it is responsible for predicting the driver's intent. In one of the developed prototypes a rule-based system was used to assess the most probable scenario. Every rule has a tally attached to it to indicate the probability or importance of the rule. Some scenarios are composed of one single action. An observed action triggers the most probable rule. This activated rule is allowed to fire and the right-hand side of the if-then rule is executed. If such a single action is a dangerous driving action, an alert is generated by a rule; an example is a driver who doesn't look at the road in front of him. In other cases the observed action(s) trigger the most probable scenario and then specific rules check whether important actions are missing or dangerous actions can be expected.
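To make the rule firing mechanism more concrete, the following minimal sketch in Python illustrates how rules with attached tallies could select the most probable rule for an observed action and raise an alert when a required preceding action is missing. The rule and action names are hypothetical and are not taken from the implemented prototype; the sketch only mirrors the mechanism described above.

from dataclasses import dataclass, field

@dataclass
class Rule:
    # A rule fires on a triggering action; it may require preceding actions
    # and carries a tally (probability/importance of the rule).
    name: str
    trigger: str
    required_before: list = field(default_factory=list)
    tally: float = 1.0
    alert: str = ""

RULES = [
    Rule("lane_change", trigger="signal_light_on",
         required_before=["mirror_check"], tally=0.8,
         alert="Lane change without mirror check"),
    Rule("eyes_off_road", trigger="look_away_long", tally=0.9,
         alert="Driver not watching the road"),
]

def evaluate(observed_actions):
    """Fire the most probable matching rule for the latest observed action
    and return an alert text when a required preceding action is missing."""
    latest = observed_actions[-1]
    candidates = [r for r in RULES if r.trigger == latest]
    if not candidates:
        return None
    rule = max(candidates, key=lambda r: r.tally)   # most probable rule fires
    missing = [a for a in rule.required_before if a not in observed_actions]
    if missing or not rule.required_before:
        # single dangerous action, or a scenario with a missing preceding action
        return rule.alert
    return None

print(evaluate(["gear_shift", "signal_light_on"]))  # -> lane change alert

In a real implementation the tallies would be estimated from the recorded experiments rather than set by hand, but the control flow stays the same.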
A typical example is a car driver who switches on the indicator lights to change lanes but does not check whether the next lane is free to merge into. The problem with rule-based systems is that all the rules have to be defined in advance, so the system is not prepared for new, unexpected situations. Another problem is that the sensors are not able to assess all the ongoing actions. These errors in the sensor observations can be caused by failing technology, by an occluded field of view of the sensors, by ambiguous actions, or by bad lighting conditions. A gaze tracker, for example, usually misses a significant number of actions, and after sudden, large movements of the head the gaze tracker has to be calibrated again. Nevertheless, the biggest problem to solve is that actions are usually distributed in time and that some car drivers start several actions simultaneously. It is necessary to realize that the proposed system will be used as a decision support system; ultimately, car drivers themselves remain responsible for their actions.

To improve the performance of the assessment of the actions a Bayesian probabilistic system will be used. To compute the values of the entries in the conditional probability tables, either experts are needed to set these values, or a huge amount of data generated in experiments testing the system is needed to compute them. The robustness of the system is increased with a fusion method based on multiple sensors compared with a method based on a single sensor. Multi-sensor data fusion allows the combination of information with different physical characteristics to enhance the understanding of the driver's actions. The information read from the set of sensors is fundamental for the decision-making module of the system. Our fusion method seeks to combine information from multiple sensors.

Table 1 presents a hierarchy of driver activity in a bottom-up way, and it represents the foundation of our system architecture. The first level needs to detect the driver's postures; sequences of postures then define the driver's actions. On the third level, one action or a set of actions predicts the driver's intention. A human intention can be defined as an anticipated outcome that guides planned actions. It is the goal or purpose behind an action or a set of actions that a person is pursuing. The driver's intention can be predicted from the driving actions. Driving actions in a scenario can be recognized based on a set of postures performed over time. Thus, the system architecture depicted in Figure 1 defines an assistive car driving system based on multiple processing steps: tracking the driver's behavior, predicting the driver's future behavior, deciding whether the driver's intention is correct or not, and issuing corrective feedback.

4. Simulation environment

The main tool in our system is the TORCS simulator. It is one of the most popular 3D open source car simulators, written in C++ and available under the GPL license. It can be used as an ordinary car racing game or as an AI racing game. The main advantage of the software is that we could extend the platform by configuring inside it all the sensors used for tracking the driver's behavior. We selected TORCS in our research for the following reasons.

Figure 1. Architecture of the digital co-driver system.
Levels of driver activity: driver interaction with the car controls and traffic situations
Driving intent: a mental state of a driver who executes an action or a set of actions in a driving situation, based on the driving environment and the states of the car.
Driver's actions: made up of multiple body-part gestures, such as eye, head, arm and leg motion; each body-part gesture is an elementary event of motion and can be composed of a sequence of instantaneous poses at each moment in time.
Driver's postures: instantaneous configurations of the body parts in the space of interaction.
Table 1. Hierarchy of driver activity.

TORCS is an advanced, fully customizable simulator, and it can be adapted for our application; it features a sophisticated physics engine as well as a 3D graphics engine for the visualization of the virtual environment; and it has a modular software architecture, so new controlling and sensing devices can be integrated in a straightforward way. According to the features of the proposed system, we designed tracks which simulate certain driving scenarios. The creation of a customized track involves the use of additional tools. In the first step, we used the Trackeditor tool to create a track. It can design tracks by adding straight or curved segments, and parameters such as length, radius or banking can be configured. All of this information is stored in an XML file. In addition, the Trackeditor tool generates a file with the extension .AC, which stores the 3D description of the track. The 3D description of the track was edited with the Blender tool. Blender allows us to add elements such as traffic signs and to insert textures. We added billboards to one of the tracks.

Looking at the driver and understanding his actions is the first task of our system. We used a marker-less system to track upper body activity; the driver's upper limb motion can be tracked by the Kinect device. This device has been developed by Microsoft as a game console input device. Compared with other video cameras [21] used for recognizing human movement activities, it tracks the motion of the subject through a combination of hardware and software technologies and achieves highly accurate tracking of the body at a rate of 30 FPS (frames per second).

For eye movements and head orientation we used the EyeLink II device, produced by the Canadian company SR Research, which uses a pupil and cornea reflection tracking mode. This system provides comprehensive tracking based on smart vision sensors and consists of a head-mounted camera system and two PCs for processing data and running experiments. On the head-mounted device, both the left and right eye pupil positions and the head orientation relative to the computer monitor can be tracked. Combining the position of the head with the pupil movement relative to the screen enables recording of the gaze direction.

Sensors for physical driver's interaction. Our proposed system (see figure 2, figure 3 and figure 5) presents the architecture of a driver-computer interaction system. The driver can interact naturally with the simulated traffic environments through physical input/output interaction interfaces, bridging the gap between the digital and the physical world.

Figure 2. Car driving simulator.

Figure 3. Typical scene from the curved race track with billboards.

Figure 4. The curved race track driven by the participant. Markers indicate locations of the billboards, if enabled.
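As a simplified illustration of the gaze recording described above, the sketch below estimates a gaze point on the screen by combining the head orientation with the pupil offset. The names, the pinhole-style geometry and the numbers are illustrative assumptions; they are not taken from the EyeLink II software.

import numpy as np

def gaze_point_on_screen(head_pos, head_yaw_pitch, pupil_offset,
                         screen_distance_mm=700.0):
    """Rough gaze estimate: the head orientation gives the coarse viewing
    direction, the pupil offset (radians, relative to the head) refines it."""
    yaw, pitch = head_yaw_pitch
    yaw += pupil_offset[0]          # horizontal eye-in-head rotation
    pitch += pupil_offset[1]        # vertical eye-in-head rotation
    # intersect the viewing ray with a screen plane at screen_distance_mm
    x = head_pos[0] + screen_distance_mm * np.tan(yaw)
    y = head_pos[1] + screen_distance_mm * np.tan(pitch)
    return np.array([x, y])

# example: head slightly turned right, eyes looking a bit further right
print(gaze_point_on_screen(head_pos=(0.0, 0.0),
                           head_yaw_pitch=(0.10, 0.02),
                           pupil_offset=(0.05, -0.01)))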
As an input interface we used a Logitech G27 controller, consisting of a steering wheel, a gear shifter and three pedals (clutch, brake and throttle). It is similar to the usual basic control system in a car cockpit, and the driver's actions can be performed in the same way as in a real car. The driving scenes are displayed on an output interface, a large screen resembling a real car windshield. We used a TV screen with a diagonal size of 56 inches as the large screen. The four optical sensors for the head camera of the EyeLink II device were placed in the four corners of the screen.

5. Experiments

In this section we discuss two experiments testing respondents in our driving simulator TORCS, as displayed in figure 5. Several devices were used to assess the actions of the driver. In section 5.2 we discuss the assessment procedures using the KINECT device, but first we discuss the experimental results of the assessment of driving actions using the car sensors, as discussed in section 3. Data captured with the video cameras and the gaze tracker has been recorded with different sampling rates; the same holds for the data from the KINECT sensor. The recorded data is redundant, full of errors and has missing values. The different data streams have to be fused. Different types of data fusion using multiple, multimodal streams have been discussed in [22].

Figure 5. Test person in action in the driving simulator.

5.1. Experiment 1: Detection of behaviour actions

In total, 23 students were invited to take part in the driving experiment. They were supposed to drive for two hours, one hour on a simple trajectory and one hour on a curved trajectory. They had to drive at different speeds, at normal speed and as fast as possible. Along the simple trajectory we placed some billboards. There were other car drivers on the road driving both at low and at high speed. This forced our test persons to overtake at regular times. We were interested in isolated driving actions or scenarios composed of single driving actions. In figure 6 we show a state diagram of all possible states of actions and the transitions between states. During driving, the students got phone calls or visual messages about routing. The billboards along the road caught the car drivers' attention.

Figure 6. The state diagram describing the driver intention and the rules used on each transition.

5.1.1. Analysis of experiment 1

The goal of our system is to generate alerts in case of dangerous driver behaviour. We analyse single actions and scenarios composed of actions. A car driver is supposed to watch the road all the time. In case he/she is looking left or right for some time, looking at the billboards or traffic signs for a long time, or looking at a telephone or routing device for a long time, a dangerous situation can occur. Assessing the gaze direction via the gaze tracker is rather complicated; fast movements over large view angles were difficult to track. Sensors in the car detected if the car left its lane. In the next section we discuss our experiments using KINECT to assess the position of the head and the gaze direction. In parallel with the single action analysis we investigated possible scenarios. As soon as an action has been detected we find its position in the state diagram and check whether this state is the starting point of a scenario or a point within a scenario. If an important preceding action is missing, an alert can be generated.
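A minimal sketch of this scenario lookup is given below. The state names, transition probabilities and required predecessors are hypothetical and serve only to illustrate how a detected action can be located in the state diagram and checked against its expected preceding actions.

# Illustrative state diagram: each scenario is a path of states; edges carry
# transition probabilities estimated from recorded frequencies (made up here).
TRANSITIONS = {
    "signal_light_on": {"mirror_check": 0.7, "steer_to_next_lane": 0.3},
    "mirror_check":    {"steer_to_next_lane": 0.9},
}
REQUIRED_PREDECESSOR = {
    # state -> action that should normally precede it
    "steer_to_next_lane": "mirror_check",
}

def check_scenario(observed):
    """Locate the latest action in the state diagram and raise an alert
    when an important preceding action is missing."""
    latest = observed[-1]
    predecessor = REQUIRED_PREDECESSOR.get(latest)
    if predecessor and predecessor not in observed:
        # probability of the transition that skipped the required action
        p_skip = TRANSITIONS.get(observed[-2], {}).get(latest, 0.0) if len(observed) > 1 else 0.0
        return f"ALERT: '{predecessor}' missing before '{latest}' (p={p_skip:.2f})"
    return "ok"

print(check_scenario(["signal_light_on", "steer_to_next_lane"]))
# -> ALERT: 'mirror_check' missing before 'steer_to_next_lane' (p=0.30)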
In the case of the take-over scenario it can happen that the driver switches on the signal lights, but the monitoring of the side mirror is not detected. To reduce the number of false alarms we used a probabilistic approach. States in the state diagram and transitions between states have probability numbers attached, depending on their frequencies in all recordings made during the experiment. We chose the most probable states and transition path to compute the probability of an alert. We tested 23 drivers for 2 hours each (high and low speed). On average, every minute a single action or a scenario of on average 3 actions was required from or generated by the driver. In total we observed 16,500 single actions and 48,230 single actions within scenarios (see table 2). We focused on the actions with the most frequent errors.

Actions/Scenarios | High speed: frequency of errors, correct alerts | Low speed: frequency, correct alerts
Looking around | 121, 72% | 276, 81%
Switching lanes | 322, 68% | 158, 77%
Giving priority | 220, 75% | 118, 87%
Speed limit | 296, 92% | 88, 94%
Table 2. Frequency of detected actions and generated alerts.

5.2. Experiment using KINECT

This section presents one of the first digital co-driver experiments [23]. We present adapted behavioral models, which are based on our driving experience and also on information gathered from experts in the field. We assess both head pose and body pose in terms of orientation, and hand positions in relation to the car environment. We defined normal vs. dangerous driver behavior on several levels of importance. Normal driver behavior implies both hands on the steering wheel and a frontal head orientation. There are variations from these rules: for instance, the head orientation can be to the left or to the right for a small amount of time in case of lane changing or observation of the surroundings. Dangerous behavior appears in case the driver keeps looking in any direction other than frontal for a long period of time. Regarding the hand positions, the driver is supposed to have at least one hand on the steering wheel, while the other one can rest, be used for changing the gears, or for manipulating the radio or the navigation system. If the driver doesn't keep his hands on the steering wheel for a long period of time, it is considered to be a dangerous action. Furthermore, if besides having no hands on the steering wheel the driver is also looking in another direction, we consider this behavior very dangerous and the system will generate an alert to notify the driver.

A very important aspect taken into consideration while defining the behavioral models is the temporal evolution of an action. Under normal circumstances, stretching the arms (no hands on the steering wheel) or looking in another direction are activities performed by a driver without any serious consequences. Still, in case any of these actions is performed for a longer period of time, it might affect the level of concentration and attention of the driver. What is more, in case of an unexpected or sudden event, the driver's speed of reaction might not be high enough, which could lead to an accident.

Figure 7. Architecture of a digital car driver system.

The KINECT software module was used for pose estimation. The output of the pose estimation module, consisting of the location and orientation of each body part (see figure 8), was used to assess the relation between the different body parts (e.g. arms relative to the torso, lower arms relative to upper arms).
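As a simple illustration of how such relations can be quantified, the sketch below computes the angle between two body segments from 3D joint positions. The joint names and coordinates are hypothetical examples and are not Kinect SDK identifiers.

import numpy as np

def segment_angle(joint_a, joint_b, joint_c):
    """Angle (in degrees) at joint_b between the segments b->a and b->c,
    e.g. the elbow angle between upper and lower arm."""
    v1 = np.asarray(joint_a, dtype=float) - np.asarray(joint_b, dtype=float)
    v2 = np.asarray(joint_c, dtype=float) - np.asarray(joint_b, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# hypothetical 3D joint positions (metres) for one frame
shoulder, elbow, wrist = (0.20, 1.40, 0.0), (0.35, 1.15, 0.10), (0.30, 0.95, 0.35)
print(segment_angle(shoulder, elbow, wrist))   # elbow flexion angle for this frame

Applying such a computation per frame, and tracking how the angles evolve over an image sequence, gives the motion patterns used in the analysis below.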
We computed the angles between the different body parts on a frame basis and also for an image sequence. The angles between the head, the torso and the vertical axis, together with the angle between head and torso, provide relevant information regarding the head and body position. We considered three basic orientations of the head relative to the body: straight, left, and right.

The goal of our research is to test the possibility of training the basic car driver scenarios using a driving simulator and serious gaming. In table 4 we display some of the basic scenarios. Every scenario is composed of a sequence of basic actions. We selected the following basic actions:

• Browsing (moving the position of the head, looking into the mirror, inspecting the routing device or radio)
• Tuning (radio, mobile phone, etc.)
• Activating (touching eyes or mouth, pulling ears, etc.)
• Selecting (putting the gear by hand into the requested position)
• Waving (putting the left/right hand up)
• Driving (turning the steering wheel).

Figure 8. Model of the upper body used for pose estimation.

Figure 9. Hierarchy of driver activity.

Figure 10. Incorrect detection of the left arm.

Figure 11. Motion angle patterns for two actions: (a) hands on the steering wheel, (b) hands raised.

We evaluated the performance of the body pose estimation module on the recorded driving data by visually inspecting the accuracy of each detection (see figure 9 and figure 10), and we achieved 89% correct upper body detections and 58% correct upper body part estimations. Kinect proved to be very successful at limb detection and person tracking. Another important aspect of Kinect's limb tracking is that Kinect is aware of occlusion: if, for example, one of the arms is missing, the stick configuration shows that a limb is missing. In short, the two main problems faced with Kinect and the video tool are undetected limbs (see table 5) and frames with no detection at all (see table 6), respectively. The detection heavily depends on the performed action. In table 3, we show the average missing rate.

The car environment can be divided into a number of regions of interest for the driver. The most important and most used ones are the 'steering wheel' (1) and the 'gears' (2) regions, followed closely by the 'contact' (3) and 'brake' (4) regions. Other, secondary regions are the 'navigation system' (5), the 'radio' (6) and the 'drawer' (7) regions. A visual representation of the defined regions can be found in Figure 12.

Figure 12. Regions of interest inside a car.

We included a graphical division of the in-car environment into regions of interest. The driver's interaction with each object inside the car can be assessed by determining the hands' position inside a region. This information is used in combination with the motion pattern analysis in order to extract a first semantic interpretation of the possible action of the driver. Movements characterized by a certain speed and amplitude inside a specific region of interest are very likely to depict a certain type of action. For example, movements with a low speed and a small amplitude associated with the 'steering wheel', 'radio' or 'navigation system' regions of interest are very likely to correspond to driving, manipulating the radio, or manipulating the navigation system.
On the other hand, movements with a high speed and a large amplitude inside the 'drawer' or the 'gears' regions are associated with picking an item from the drawer or changing the gears. The ROIs, the transitions between the different ROIs, the associated types of movements, and the probable behavioural interpretation are presented in Figure 13.

The reasoning step was implemented using a rule-based system which received as input features from the previously described modalities (body pose estimation, face detection, and regions of interest assessment). The extracted features were observed over time, and using the rules contained in the state-based model we were able to distinguish normal from potentially dangerous and dangerous driving behavior. A conclusion regarding the driving behavior was then generated. We employed the state-based model depicted in Figure 13.

Action | Browsing | Tuning | Activating | Selecting | Waving | Driving
Browsing | 23 | 22 | 55 | 0 | 0 | 0
Tuning | 0 | 95 | 5 | 0 | 0 | 0
Activating | 5 | 10 | 77 | 3 | 0 | 5
Selecting | 0 | 17 | 0 | 83 | 0 | 0
Waving | 0 | 0 | 0 | 14 | 85 | 0
Driving | 25 | 0 | 25 | 0 | 0 | 50
Table 3. Confusion matrix of selected driving actions.

Driver action | Behavior detected by KINECT, video and car sensors
S1 Starting the car
1. Check if the gear is in the neutral position | Hand movements, movements of the head (turn down)
2. Start the car | Hand motion
3. Press the throttle a little bit | Movements of the right leg
S2 Driving away
1. Start the car | See S1
2. Press the clutch to the floor | Movement of the left leg
3. Put the gear shift in position 1 | Hand movement (movement of the head)
4. Press the throttle pedal | Movement of the right leg
5. Release the clutch | Movement of the left leg
S3 Driving away from a parked position
1. Start the car | See S1
2. Look in the inner and outer mirror to see if the next lane is free | Turning the head right/left
3. Switch on the left signalling light | Movement of the hand
4. Press the clutch to the floor | Movement of the left leg
5. Put the gear shift in position 1 | Movement of the hand, movement of the leg
6. Press the throttle pedal slowly | Movement of the right leg
7. Turn the steering wheel to the left or right | Hand movement
8. Turn the steering wheel to the neutral position after reaching the next lane | Hand movement
Table 4. Scenarios and behavioural cues.

 | Rate | Total number of frames
KINECT | 16% | 574
Video-tool | 32% | 574
Table 5. Rate of frames without detected limbs.

Action | Upper right arm | Upper left arm | Lower right arm | Lower left arm
Browsing | 12 | 74 | 20 | 74
Tuning | 10 | 2 | 12 | 2
Activating | 2 | 0 | 2 | 0
Selecting | 12 | 7 | 12 | 7
Waving | 16 | 75 | 21 | 75
Driving | 17 | 20 | 17 | 20
Table 6. Kinect rate of frames without a detected limb, relative to all frames in the dataset.

Figure 13. State-based model.

We used 3 seconds as a threshold for the critical duration of a secondary action or body part orientation, such as the head orientation, the lower arm position, or the body pose orientation.

6. Conclusions

In the framework of designing a digital co-driver system, we performed a study to assess the actions of a driver. Based on the experiments in a car simulator it was shown that the actions of the car driver can be assessed correctly in more than 80% of the cases. But we stress the fact that these results are based on experiments in a car simulator with students as test persons doing their best to drive according to the rules. In real life situations the results could be quite different.
We expect that in real life situations the recognition rate could be considerably lower because of bad lighting conditions in the car, occlusions, and the position of the car driver. That is the reason why we used multiple multimodal sensors, which implies sensor fusion. To assess the driver actions we used a probabilistic rule-based system. The limitation of such a system is that all possible (strings of) actions have to be defined in advance; unexpected, rare actions will not be detected. This is also caused by the fact that our system selects the most probable actions. In case our system has to detect certain actions with a high priority in order to send an alert, such actions have to be labelled with a higher priority in our system. The designed system can be used as a decision support system in current cars. The car has to be equipped with some sensors and a microprocessor to process the data and generate alerts. The results can also be used in the design of self-driving cars in the near future by modelling a digital car driver according to a human model. The basic positions of individual body parts, along with the temporal motion patterns and the corresponding regions of interest, were fused in order to draw a conclusion regarding the driver's possible type of activity.

References

[1] L. Rothkrantz. Surveillance angels. Neural Network World 24:1–25, 2014.
[2] L. Rothkrantz. Smart surveillance systems, network topology. In Command and Control: Organization, Operation, and Evolution, pp. 270–290. 2014.
[3] P. Bouchner, M. Hajný, S. Novotný, et al. Car simulation and virtual environments for investigation of driver behavior. In Proceedings of the 7th WSEAS International Conference on Automatic Control, Modeling and Simulation, ACMOS'05, pp. 523–530. World Scientific and Engineering Academy and Society (WSEAS), Stevens Point, Wisconsin, USA, 2005.
[4] M. Haak, S. Bos, S. Panic, L. Rothkrantz. Detecting stress using eye blinking and brain activity from EEG. In Proceedings of the 1st Driver Car Interaction and Interface, DCII'08, pp. 35–60. 2008.
[5] V. Neale, T. Dingus, S. Klauer, et al. An overview of the 100-car naturalistic study and findings. National Highway Traffic Safety Administration.
[6] N. Merat, A. H. Jamson. Multisensory signal detection: How does driving and IVIS management affect performance? In Proceedings of the 4th International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, pp. 351–357. 2007.
[7] A. Doshi, M. M. Trivedi. On the roles of eye gaze and head dynamics in predicting driver's intent to change lanes. IEEE Transactions on Intelligent Transportation Systems 10(3):453–462, 2009. doi:10.1109/TITS.2009.2026675.
[8] V. Ferrari, M. Martin-Jimenez, A. Zisserman. 2D human pose estimation in TV shows. In Proceedings of the Dagstuhl Seminar on Statistical and Geometrical Approaches to Visual Motion Analysis. 2009.
[9] P. Bouchner. Car simulation and virtual environments for investigation of driver behavior. Neural Network World 15(2):149–163, 2005.
[10] M. Andriluka, S. Roth, B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2009.
[11] I. Haritaoglu, D. Harwood, L. S. Davis. Ghost: a human body part labeling system using silhouettes. In Proceedings of the 14th International Conference on Pattern Recognition. 1998.
[12] H. Winner, S.
Hakuli, F. Lotz, C. Singer (eds.). Handbook of Driver Assistance Systems: Basic Information, Components and Systems for Active Safety and Comfort. Springer, Cham, 2016. doi:10.1007/978-3-319-12352-3.
[13] R. Adarsha, V. Kumar, K. Ganesan. Low cost driving trainer assistance system. Journal of Transportation Technologies 2(1):63–66, 2012. doi:10.4236/jtts.2012.21007.
[14] A. Benoit, L. Bonnaud, A. Caplier, et al. Multimodal signal processing and interaction for a driving simulator: Component-based architecture. Journal on Multimodal User Interfaces 1(1):49–58, 2007. doi:10.1007/BF02884432.
[15] T. Bellet, B. Bailly-Asuni, P. Mayenobe, A. Banet. A theoretical and methodological framework for studying and modelling drivers' mental representations. Safety Science 47(9):1205–1221, 2009. Research in Ergonomic Psychology in the Transportation Field in France. doi:10.1016/j.ssci.2009.03.014.
[16] Y. Liang, M. L. Reyes, J. D. Lee. Real-time detection of driver cognitive distraction using support vector machines. IEEE Transactions on Intelligent Transportation Systems 8(2):340–350, 2007. doi:10.1109/TITS.2007.895298.
[17] A. Giusti, C. Zocchi, A. Rovetta. A noninvasive system for evaluating driver vigilance level examining both physiological and mechanical data. IEEE Transactions on Intelligent Transportation Systems 10(1):127–134, 2009. doi:10.1109/TITS.2008.2011707.
[18] J.-H. Wang, Y. Gao. Multi-sensor data fusion for land vehicle attitude estimation using a fuzzy expert system. Data Science Journal 4(1):127–139, 2005.
[19] L. Fletcher, L. Petersson, A. Zelinsky. Driver assistance systems based on vision in and out of vehicles. In IEEE Intelligent Vehicles Symposium, 2003, pp. 322–327. IEEE, 2003.
[20] M. M. Trivedi, S. Y. Cheng, E. M. C. Childers, S. J. Krotosky. Occupant posture analysis with stereo and thermal infrared video: algorithms and experimental evaluation. IEEE Transactions on Vehicular Technology 53(6):1698–1712, 2004.
[21] J. Shotton, A. Fitzgibbon, M. Cook, A. Blake. Real-time human pose recognition in parts from single depth images. In Proceedings of CVPR. 2011.
[22] I. Lefter, L. J. M. Rothkrantz, G. J. Burghouts. A comparative study on automatic audio-visual fusion for aggression detection using meta-information. Pattern Recognition Letters 34(15):1953–1963, 2013. doi:10.1016/j.patrec.2013.01.002.
[23] M. Popa, L. Rothkrantz. Assessment of behaviour in serious games of driving simulators. International Journal of Intelligent Games & Simulation 6(2), 2011.