INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
ISSN 1841-9836, 13(6), 988-1006, December 2018.

Indoor Localisation through Probabilistic Ontologies

I. Mocanu, G. Scarlat, L. Rusu, I. Pandelica, B. Cramariuc

Irina Mocanu*
Computer Science Department, University Politehnica of Bucharest
Romania, RO-060042 Bucharest, Splaiul Independentei, 313
*Corresponding author: irina.mocanu@cs.pub.ro

Georgiana Scarlat
Computer Science Department, University Politehnica of Bucharest
Romania, RO-060042 Bucharest, Splaiul Independentei, 313
georgiana.scarlat@cti.pub.ro

Lucia Rusu
Faculty of Economics and Business, Babes Bolyai University of Cluj-Napoca
Romania, 400591, Cluj-Napoca, Teodor Mihali, 58-60
lucia.rusu@econ.ubbcluj.ro

Ionut Pandelica
Agora University of Oradea
Romania, 410526, Oradea, Piata Tineretului, 8
ionut.pandelica@univagora.ro

Bogdan Cramariuc
IT Center for Science and Technology
Romania, 11702, Bucharest, Av. Radu Beller, 25
bogdan.cramariuc@citst.ro

Abstract: Elderly people who live alone in their homes need to be monitored permanently. One aspect of this monitoring consists in knowing their indoor position and motion behavioural status in real time. One possibility for indoor positioning of a user consists in understanding the images provided by supervising cameras. In this case the main task is the recognition of objects in these images. Thus, object recognition plays an essential part in understanding the environment and adding meaning to it. This paper presents a method for indoor localisation based on identifying the user’s context. The user’s context is computed based on object recognition and using a probabilistic ontology. The key element is the probabilistic ontology, which describes objects, scenes and the relations between them. This ontology contains probabilistic relations that are learned using a large database.
Results show that, given a set of object detectors with a high detection rate and a low false positive rate, the system can recognize the user’s context with high accuracy.

Keywords: object recognition, detection rate, probabilistic ontology, context identification.

Copyright ©2018 CC BY-NC

1 Introduction

The number of elderly people who live alone, or who spend much of the day without supervision by specialized staff, is increasing exponentially. With the development of Active and Assisted Living (AAL) technologies, the localisation of persons has become easier. Indoor positioning/localisation systems (IPS) are similar to GPS and can be used to locate people or objects inside buildings via mobile devices (smartphones or tablets). IPS relies on cameras mounted on walls or ceilings that work together to recognize the user’s context or objects. The result is a highly accurate position. Like GPS, IPS systems can detect the direction of movement and can predict the path based on this information, so that the position remains accurate as the user moves through the space [20].

Hospitals and medical centers can benefit from indoor localisation systems for staff, patients and managerial purposes. For staff, the benefits include: rapid location of colleagues in the building, finding records and mobile devices, and notifications of when and where patients are checked. For patients who are sick, IPS can offer the following benefits: automatic check-in at the building’s entrance, turn-by-turn dynamic way finding to meetings/appointments with relevant information based on location, and way finding back to the parked car. For people with Alzheimer’s, eHealth and mHealth solutions include complex IPS technologies, which are essential for tracking patients both indoors and outdoors [1].
Acceptance and Commitment Therapy (ACT) encourages seniors (and patients in general) in two directions: (1) acceptance of difficult and undesirable thoughts and emotions, and (2) simultaneous adoption and promotion, in daily practice, of actions and behaviors that are consistent with individual values. ACT includes mindfulness exercises that promote contact with the present [1].

The goal of the system proposed in this paper is to offer reliable detection of the context of a user in his home. The user’s context is represented by the surrounding objects. Thus the main problem that must be solved is object recognition. Object recognition has evolved considerably in the past decade, but the current state-of-the-art solutions are still far from what the human brain can do. Moreover, training object detectors requires a large amount of resources, such as computational power and a large image training set. Hence, object recognition is the main part of the proposed system, especially extracting the meaning from the object configurations found and detecting the type of scene they are in. For this purpose, we propose a probabilistic ontology that describes objects, scenes and the relations between them.

The system is described by a generic probabilistic ontology which can be instantiated according to the set of objects and scenes that need to be recognized. The ontology contains probabilistic relations that are learned using a large database containing thousands of images annotated with object and scene information (the LabelMe image database [15]). The results of analyzing the co-occurrence and spatial relations between objects are then used to improve the object recognition process.

For implementing a solution for scene recognition based on semantic information, a large number of object detectors are needed because each scene can have so many different object configurations.
Training so many object detectors is one problem, but it can be solved given the necessary resources. Another important problem is that using so many object detectors to scan through an image can be time consuming. The solution proposed in this paper tries to reduce the number of image scans to a minimum. This can be achieved if the algorithm somehow "knows" what objects to search for. This is possible in scenarios in which some objects are already detected and the next searched objects are the ones that are relevant to the object context found so far. By doing this, the algorithm can recognize the scene after only a small number of object scans.

The rest of the paper is organized as follows: Section 2 presents some existing methods for object recognition. Section 3 contains the proposed method for improving scene recognition by using semantic information organized as a probabilistic ontology model. Section 4 presents the evaluation of the proposed system. Conclusions and future work are given in Section 5.

2 Related work

IPS systems have made significant progress in recent years. The utility of such systems on GPS tracking devices is appreciated by seniors as well as by health care professionals (HCPs) and caregivers. Below are described some of the most popular systems and applications for HCPs and caregivers [11].

Balance is an application geared specifically towards Alzheimer’s caregivers, which works on iPhone and iPad. Its features are: Alzheimer’s disease references and information, Alzheimer’s caregiving advice, advanced medication management features (refill date, start date and dosage), native scheduling features, categories relevant to caregivers, a "Doctor diary" for logging symptoms and taking notes, and news about Alzheimer’s.

Mobicare is a simple, straightforward and free iPhone and Web application.
Its features are: a profile of loved ones containing personal information, including birth date, gender, basic insurance information and the contact information for one physician; basic symptom tracking based on 15 preset choices (e.g. insomnia, wandering); and basic medication tracking, with some limitations.

Dementia Caregiver Solutions is an informational application for dementia caregivers. It offers advice for addressing the difficult behaviors associated with Alzheimer’s and other types of dementia. It also lets users bookmark or "star" articles they wish to read in the future.

Object recognition is a widely studied topic for which many systems have been developed; however, none of them comes close to the performance with which the human brain can recognize objects, even under large variations in light, shape and color. The process through which the human brain accomplishes object recognition with very high speed and accuracy has been intensely studied by neurologists. According to [2]: "The ability to rapidly recognize objects despite substantial appearance variation, is solved in the brain via a cascade of reflexive, largely feedforward computations that culminate in a powerful neuronal representation in the inferior temporal cortex. However, the algorithm that produces this solution remains poorly understood". The authors of [3] suggest that "object recognition does not only involve physical properties (such as shape, color and texture) of the objects but also semantic information which includes the understanding of its use, previous experience with the object and how it relates to others".

Some approaches to object recognition are based purely on the visual properties of the objects. For example, [5] tries to detect general classes of objects, not just specific ones, by using part-based modelling and recognition of objects. The pictorial structure models were first introduced in [4].
They describe how a set of object parts arranged in a flexible configuration is used to model an object. The object parts represent their localized visual properties. The flexible configuration of object parts is represented by pair-wise object connections. This approach is suitable for generic recognition problems because of the complex description of object visual features.

Another machine learning approach is presented in [18]. This solution is capable of processing images extremely fast and achieves high detection rates. Integral images are used to speed up computations. Another important aspect is that extremely efficient classifiers are trained using an algorithm based on AdaBoost, which uses only a small number of visual features selected as critical. Many background regions of the image are discarded in the early stages of the algorithm. This is accomplished by combining increasingly complex classifiers in a cascade. A major advantage of this solution is that it can run in real-time applications at 15 frames per second.

Both of the previously described systems are very promising, but they lack semantic information about objects, which is crucial in recognizing objects with large variations in shape, such as furniture.

A collaboration between Google and a University of Toronto research team produced a remarkable result called MultiModel, a single model that handles multiple translation tasks, image captioning on the COCO dataset, a speech recognition corpus, and an English parsing task. This model can caption images, categorize them, translate to French and German and construct parse trees, spanning multiple domains. MultiModel uses an encoder-decoder architecture of the kind applied to neural machine translation. Extended Neural GPU is another model, which uses a recurrent stack of gated convolutional layers, while ByteNet uses left-padded convolutions in the decoder. Compared to Extended Neural GPU and ByteNet, the MultiModel idea improves efficiency and obtains good results in image classification [7].

The Inception deep convolutional architecture was introduced as GoogLeNet (Inception-v1), then refined by the introduction of batch normalization in Inception-v2 and by additional factorization ideas in the third iteration, Inception-v3. Inception-v4 followed, together with similarly expensive hybrid Inception-ResNet versions, covering both residual and non-residual Inception networks [17].

Another result was offered by CapsNet, which uses a three-layer architecture; like convolutional neural networks (CNNs), it relies on translated replicas of learned feature detectors. The primary capsules are the lowest level of multi-dimensional entities and correspond to inverting the rendering process. The second layer (PrimaryCapsules) is a convolutional capsule layer with 32 channels of convolutional 8D capsules. The third layer (DigitCaps) has one 16D capsule per digit class, and each of these capsules receives input from all the capsules in the layer below. The implementation is in TensorFlow using the Adam optimizer [16].

There are, however, approaches that use semantic information concerning objects. One such approach is presented in [12,13]. From a psychological point of view, object recognition and scene understanding are strongly influenced by semantic and context-based information. Therefore, the authors use context information to improve object recognition based on visual properties. Their approach shows how to extract context probability maps from images. Based on these maps, they also learn specific configurations for a set of object classes. The final goal is to filter out false positives.

There are new methods based on deep learning for object detection and recognition.
In the context of object detection, several network architectures have been proposed [8], [9] and found to outperform methods based on traditional hand-crafted features. Most of these models were trained on RGB datasets such as PASCAL VOC to predict the bounding boxes of objects in images.

The main difference between the solutions presented in [12] and the solution presented in this paper is that here the context is used not only to remove objects that don’t fit the context, but also to infer what other objects can be found in the same context. For example, if a keyboard was detected at a previous step, then there is a high probability that a mouse is also present in the same scene. This reasoning is used to make the system converge faster towards recognizing the scene by running the most relevant object detectors for the current context. For example, if a TV was detected, the system shouldn’t run an object detector for finding a car, but should run an object detector for finding a couch. Also, this system is designed in a generic way, thus allowing the use of any objects and scenes. This means that the entities modeled by the ontology are abstract and can be mapped to any set of objects and scenes as long as these have the needed correlations between them (the objects are relevant for the chosen set of scenes).

3 System description

The system uses as stored data both a pre-trained probabilistic ontology model, containing object and scene entities and the relations among them, and a set of object recognizers mapped on the object entities contained in the probabilistic ontology model, as given in Figure 1.

The object recognition module can run, on request, a specific object recognizer and provide the result as a pair of object confidence and object bounding box. The inference module implements the main reasoning algorithm and, at each iteration, sends a request to the object recognition module for finding a certain object. Based on the found/not found object results, the inference module uses the probabilistic ontology to filter out false positives, to determine what object to inquire about next and to determine if a scene can be recognized.

Figure 1: System architecture

In order to implement a reliable system for scene recognition, we consider the following steps:
• Selecting the ontology’s structure: finding the set of relevant scenes and objects and mapping the meaningful relations among them into ontology relation types with associated attributes that reveal their semantic understanding.
• Choosing a large training data set that contains diverse context configurations for the chosen set of objects.
• Implementing a component that scans through the training dataset to compute object probabilities, scene probabilities and relation probabilities and aggregates all this information into a probabilistic ontology model.
• Implementing an object recognition method that allows testing the system and analyzing the influence that the object detection rate has on its performance.
• Doing ontology-based reasoning using already found objects to determine whether a newly found object belongs to the current context, thus filtering out object recognition false positives.
• Doing ontology-based reasoning using previous object search results to determine what other objects can appear in the current scene, in order to greatly reduce the number of object recognition scans needed to recognize a scene.
• Doing ontology-based reasoning to recognize a scene based on the previously found objects.

The system can run in two modes. The first one is the mode in which the system is used for running test batches and computing system statistics.
The tests are run on a large test database (approximately 3000 images) and the computed statistics consist of the scene recognition accuracy, the mean number of iterations needed to converge to a recognised scene and the total number of false positives removed over the entire test set. This data is used to evaluate the system and to analyze how different changes and improvements affect its performance.

The second mode is used for viewing the analyzed image and obtaining the label associated with the image. An object is searched for in the image; if the object is found, then the name of the most probable scene appears. After that, objects correlated with those already found are searched for in the image. The rectangles of the found objects are displayed on the input image and the name of the recognized scene is provided.

3.1 Object recognition

The scene recognition problem can also be approached using only image processing algorithms that analyze low-level information about color, shapes and texture. This approach has the advantage that the classification model is easier to learn and design, but the disadvantage that it cannot distinguish very well between different indoor scenes such as living room, bedroom and so on. This is because indoor scenes usually have similar colors, shapes and textures. Scenes with similar visual features can only be distinguished using higher-level information about the image, such as objects and the correlations between them. Moreover, neurological studies [20] show that even the human brain cannot distinguish objects and scenes well unless there is some semantic meaning attached to them. As a result, scene recognition is best approached using semantic information regarding the scene.

Using object recognition for solving a scene recognition problem has the advantage that once the objects are found, recognizing the scene becomes a much simpler problem.
However, there is a major disadvantage to this approach. Object detectors can sometimes generate false positives. An object that is "alien" to the current object context can strongly affect the final recognized scene, especially if there are only a few objects in the scene. The approach presented in this paper shows how contextual information can also be used to filter out false positives, thus increasing overall scene recognition accuracy.

Both ideas presented above, reducing the number of object scans and filtering out false positives, are based on the relations that exist between objects. Some objects are strongly related to each other (for example: keyboard and mouse, table and chair, etc.) and others are hardly related and are rarely seen together in the same scene (for example: tree and TV, refrigerator and car, etc.). However, these relations do not always hold (for example, a keyboard can appear in a scene without a mouse). Therefore, they should be modeled stochastically. In this paper, the chosen model for representing the relations between objects is a probabilistic ontology.

The ontology used in this paper contains objects and scenes as entities and, as relations, the stochastic relations among objects and between scenes and objects. Every ontology entity has an a priori probability and every relation is also described by the probability that it holds in a scene. The object-to-object relations in the ontology are described by an attribute called Interpretation that helps distinguish between positive and negative relations. Positive relations are those for which the objects are semantically connected to each other, which makes it more probable for them to appear in the same scene. Negative relations are exactly the opposite: they mean that the two objects are less likely to be found together.
For filtering out false positives obtained as a result of object detection, the current approach uses the previously found objects in the scene and the relations that connect them to the newly found object. If these relations are positive, the detection confidence of the checked object increases, but if the relations have a negative interpretation, the detection confidence decreases. If the newly found object’s detection confidence has decreased considerably after this update, then it is considered to be a false positive and it is eliminated from the list of found objects. If the first detected object is a false positive, no subsequent correction is applied. However, a prevention method is used in order to decrease the chances of this happening. This measure is to use the false positive rate of each object detector as an influencing factor in choosing what object to search for when no objects have been found so far. This means that objects whose detectors have lower false positive rates are preferred in the initial steps of the algorithm.

Choosing what objects to search for, so that a scene can be recognised faster, uses the positive and negative relations between previously searched objects and the current objects that are candidates for search. Candidate objects that are in positive relations with previously found objects with high detection confidence are more suitable to be searched for next than candidate objects that are in negative relations with previously found objects. This reasoning is applied not only to the searched and found objects, but also to the searched and not found objects. If an object was not found after scanning the current scene, then candidate objects that are in a negative relation with it are more suitable for the next search than the ones that are in a positive relation with it.
This reasoning is applied so that the criteria for choosing the next object can be more informative even if no objects have been found yet. The weight of the influence that the not found objects have on choosing a new candidate is smaller than the weight for the found objects. This is justified by the fact that even though an object is not found in a scene, it does not necessarily mean that the current context is not suitable for it. For example, in a kitchen scene a stove might not be found, but this does not mean that other objects related to it cannot exist in that scene. However, if there are two equally suitable candidate objects according to their relations with the already found objects in the scene, the same criteria applied to the not found objects can serve as a tie breaker between them. This reasoning is useful if the algorithm is in a state in which none of the previously searched objects were found; it should then try to search for objects from contexts different from the ones searched before, so that the chances of finding a new object increase.

All the criteria described earlier for choosing the next object to scan the image for are combined into a fitness formula, and the candidate object with the highest fitness value is chosen. The fitness value can also take into account the false positive rate of the object’s associated detector when no objects have been found yet, in order to avoid the situation in which the first found object is a false positive.

In order to recognize a scene after some objects were found, the proposed solution uses the semantic information stored as relations between scenes and objects inside the probabilistic ontology. The relation inScene is described by the probability of an object being in a certain scene. At each step of the algorithm, the set of objects already found in the scene is used together with the relations between them and the entire set of scenes to compute a fitness value for each scene.
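The candidate-selection criteria described above (the step that precedes scene scoring) can be illustrated with a small Python sketch. The text does not give the fitness formula, so everything below is an assumption made for illustration: the additive form of the score, the `NOT_FOUND_WEIGHT` constant, the `Relation` record and the function names are all hypothetical. The sketch only mirrors the stated behaviour: positive relations with confidently found objects raise a candidate's fitness, relations with not found objects act in the opposite direction with a smaller weight, and the detector's false positive rate is penalised only while nothing has been found.

```python
from dataclasses import dataclass

POSITIVE, NEGATIVE = "POSITIVE", "NEGATIVE"

# Hypothetical constant: how much a "searched but not found" object counts
# relative to a found one (the text only states that it counts less).
NOT_FOUND_WEIGHT = 0.3


@dataclass(frozen=True)
class Relation:
    interpretation: str  # POSITIVE or NEGATIVE
    next_weight: float   # NextWeight attribute, in [0, 1]
    probability: float   # learned probability that the relation holds


def lookup(relations, a, b):
    """Relations are stored once per unordered pair of objects."""
    return relations.get((a, b)) or relations.get((b, a))


def candidate_fitness(candidate, found, not_found, relations, fp_rate):
    """Assumed additive fitness of one candidate object.

    found:     {object name: detection confidence}
    not_found: set of object names that were searched for without success
    fp_rate:   {object name: false positive rate of its detector}
    """
    score = 0.0
    for obj, confidence in found.items():
        rel = lookup(relations, candidate, obj)
        if rel is None:
            continue
        sign = 1.0 if rel.interpretation == POSITIVE else -1.0
        score += sign * rel.next_weight * rel.probability * confidence
    for obj in not_found:
        rel = lookup(relations, candidate, obj)
        if rel is None:
            continue
        # A missing object favours candidates negatively related to it.
        sign = -1.0 if rel.interpretation == POSITIVE else 1.0
        score += NOT_FOUND_WEIGHT * sign * rel.next_weight * rel.probability
    if not found:
        # Nothing found yet: prefer detectors with few false positives.
        score -= fp_rate.get(candidate, 0.0)
    return score


def choose_next(candidates, found, not_found, relations, fp_rate):
    """Return the candidate with the highest fitness value."""
    return max(
        candidates,
        key=lambda c: candidate_fitness(c, found, not_found, relations, fp_rate),
    )
```

For instance, with a found keyboard and a positive keyboard-mouse relation, `choose_next(["mouse", "car"], ...)` selects the mouse; with nothing found, the candidate whose detector has the lowest false positive rate wins.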
The number of objects needed to recognize a scene can vary considerably depending on the objects and how correlated they are with a certain scene. In some cases, only a few objects are enough to determine the scene, as long as their detection confidence is high enough. For example, if a stove was found with a very high detection confidence, then the most probable scene by far is kitchen. Every time a new object is found, the fitness value is computed for each scene, and if the scene with the highest fitness value is well above the average then it is returned as the recognized scene and the object recognition process ends.

The probabilities contained in the ontology are computed based on a large annotated image dataset, so that the ontology is applicable in general cases. The entities and the topology of the ontology can vary according to the use case it is designed for, but it is important that they reflect meaningful semantic information about objects, how they relate to one another and to scenes, and what objects are most likely to be contained in each scene.

Probabilistic ontology

The probabilistic ontology can be modeled using any topology structure and any entity set as long as these are relevant to the current use case. However, there are some restrictions on how to build the probabilistic ontology. More precisely, the entity set must contain both scenes and objects. The object set has to be identical to the object set used by the object recognition module, and the objects must be relevant for the scene set. The general structure of the probabilistic ontology is given in Figure 2. The possible relations are: inScene, is-a, Object-Object relation, hasVisualFeature.

Figure 2: Ontology structure

There is no restriction regarding the stochastic relations between objects, except that the relations have to be either positive or negative and that weights have to be provided for each relation.
The relation between objects and scenes is restricted to be of the type inScene, which reflects how probable it is for an object to be in a scene. Also, this relation should fully connect all the objects and scenes. Another restriction regarding the ontology building is that there should be no probabilities with value 0 (if there are, they should be replaced with a very small value, close to 0). An example of a part of the probabilistic ontology is given in Figure 3.

Figure 3: Ontology example

Each object-to-object relation contained in the probabilistic ontology has a fixed format. We consider the following relation properties:
• Name: A string that uniquely identifies each relation.
• Interpretation: Represents how the relation should be interpreted during reasoning (POSITIVE, NEGATIVE).
• Check Weight: Represents how important this relation is for checking newly detected objects and filtering out false positives.
• Next Weight: Represents how important this relation is for determining the next object to query the Object Recognition Module about.
• Check Rule: Some relations are based on spatial requirements. This rule is used to check if the spatial requirements are met. For the relations that have no spatial meaning attached to them, this rule should be NONE.

This ontology contains as entities objects from a house and the following scenes: kitchen, living room, street, office. The relations between objects are described as follows:
• usedTogether: Two objects are often used together. The object pairs that are in this relation are given by the user according to common sense information about the objects in the object set.
• inSameScene: Two objects often appear together in the same scene. The object pairs are determined by the object recognition algorithm according to the probability of the co-occurrence of each pair of objects.
• cannotCoexist: Two objects can’t both be present in the same scene. The object pairs are determined by the object recognition algorithm according to the probability of the co-occurrence of each pair of objects.
• areOddTogether: Two objects may appear together in the same scene, but their combination seems unnatural. The object pairs that are in this relation are given by the user according to common sense information about the objects in the object set.
• onTop: In most of the cases when the two objects appear together, one of them is placed on top of the other. The object pairs that are in this relation are given by the user according to common sense information about the objects in the object set.
• canOverlap: In most of the cases when the two objects appear together, their areas overlap. The object pairs that are in this relation are given by the user according to common sense information about the objects in the object set.
• isNeighbour: In most of the cases when the two objects appear together, they are very close to one another. The object pairs that are in this relation are given by the user according to common sense information about the objects in the object set.

The properties of these relations between objects are summarized in Table 1.
The rules are the following:

OnTopRule: Center(object1).y > Center(object2).y

IsNeighbourRule: Distance(object1, object2) ≤ (Diagonal(object1) + Diagonal(object2)) / 2

CanOverlapRule: Left(object1).x < Left(object2).x < Right(object1).x AND Left(object1).y < Left(object2).y < Right(object1).y

Table 1: Properties of relationship between objects

Relation Name    Interpretation  CheckWeight  NextWeight  CheckRule
usedTogether     POSITIVE        0.8          0.95        NONE
inSameScene      POSITIVE        0.6          0.9         NONE
cannotCoexist    NEGATIVE        0.98         0.98        NONE
areOddTogether   NEGATIVE        0.75         0.85        NONE
onTop            POSITIVE        0.8          0.7         OnTopRule
canOverlap       POSITIVE        0.8          0.65        CanOverlapRule
isNeighbour      POSITIVE        0.8          0.7         IsNeighbourRule
inScene          BELONGING       0            0           NONE

The values of the CheckWeight and NextWeight attributes are chosen manually so that they reflect how reliable a relation is for checking for false positives (CheckWeight) and how reliable a relation is for predicting the presence of one of its referred objects when the other one is already found in the scene (NextWeight). For example, the relation usedTogether describes a much stronger bond between objects than the relation inSameScene, therefore its CheckWeight is much higher (0.8 vs. 0.6). However, both usedTogether and inSameScene are relations between objects that are frequently found together, therefore their NextWeight is very high (0.95 and 0.9). These weights are used so that not all relations influence false positive filtering and next object selection in the same way, because each of them has a different semantic interpretation and should affect the reasoning process in its own way. The CheckWeight and NextWeight attributes have the role of quantifying semantic relation attributes as numbers in the [0, 1] interval that can be used as components in the reasoning formulas of the inference module.

The current probabilistic ontology contains one relation between objects and scenes: inScene describes how probable it is for an object to appear in a scene.
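The three spatial check rules listed with Table 1 can be written down directly as predicates over bounding boxes. The following Python sketch is illustrative only and rests on assumptions that the text does not state: a box is taken to be given by its top-left and bottom-right corners (so that Left and Right in CanOverlapRule are read as those two corners), Distance is taken to be the Euclidean distance between box centres, and the denominator 2 in IsNeighbourRule is read as dividing the sum of the two diagonals. The `Box` type and all function names are hypothetical.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box (assumed corner representation)."""
    left: float
    top: float
    right: float
    bottom: float


def center(box):
    """Centre point (x, y) of a box."""
    return ((box.left + box.right) / 2, (box.top + box.bottom) / 2)


def diagonal(box):
    """Length of the box diagonal."""
    return math.hypot(box.right - box.left, box.bottom - box.top)


def on_top_rule(b1, b2):
    # Center(object1).y > Center(object2).y, taken literally from the text.
    return center(b1)[1] > center(b2)[1]


def is_neighbour_rule(b1, b2):
    # Distance between centres (assumption) compared with the mean diagonal.
    (x1, y1), (x2, y2) = center(b1), center(b2)
    return math.hypot(x1 - x2, y1 - y2) <= (diagonal(b1) + diagonal(b2)) / 2


def can_overlap_rule(b1, b2):
    # The top-left corner of object2 lies strictly inside object1.
    return b1.left < b2.left < b1.right and b1.top < b2.top < b1.bottom


# CheckRule column of Table 1; NONE means there is no spatial requirement.
CHECK_RULES = {
    "NONE": lambda b1, b2: True,
    "OnTopRule": on_top_rule,
    "IsNeighbourRule": is_neighbour_rule,
    "CanOverlapRule": can_overlap_rule,
}
```

Looking a rule up by the name stored in the CheckRule attribute, for example `CHECK_RULES["IsNeighbourRule"](box_a, box_b)`, keeps relations without a spatial meaning on the same code path as the spatial ones.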
The probabilities of each object, each scene and each relation were computed based on the LabelMe database [15], which currently contains 78840 annotated images.

3.2 Object recognition

The object recognition module has access to a set of predefined object detectors, which are run at the request of the inference module on the input image or on a region of the input image. The result supplied by this module is a tuple of the form (FoundObject, BoundingBox, Confidence). The bounding box represents the area in the image where the found object is placed. This information is used by the inference module to check ontology rules that have spatial requirements. The detection confidence is also used by the inference module as a measure of how much the detected object influences reasoning about other related objects.

Not all the object detectors are necessarily run in order to recognize a scene. The inference module inquires about the object that it believes is most probable to appear in the image based on previous detections, and when it has enough information to infer the scene with a high probability, the system returns and no more detectors are run. This means that, given a powerful inference module that obtains reliable information from all the other components, the final result can be obtained very fast, avoiding the costly image scans that the object detectors apply.

The object recognition module is responsible for running object detectors on request. This component also has access to information about each object detector that is relevant to the inference module. More precisely, the object recognition module can provide information about each object detector’s detection rate and false positive rate.
This information is meaningful for the inference module because it influences what object to inquire about.

3.3 Inference module

The inference module is the system component that implements the main algorithm for scene recognition. A scene can be inferred based on the objects recognized by the object recognition module. The information regarding found or not found objects is used to interrogate the probabilistic ontology. This ontology contains information about the stochastic relations between scenes and objects and between objects.

The algorithm starts by choosing a first object to interrogate the object recognition module about. The first interrogated object is the one whose detector has the lowest false positive rate. This is done because at the initialization of the system there is no information available about the input image, so it is important not to start with invalid information that can compromise future reasoning. After the initialization step, the algorithm enters a loop. At each iteration, the object recognition module is queried about the existence of a certain object inside the input image. If the object is found, then its confidence and bounding box are available. Next, a set of inference rules is applied.

The first inference rule has the role of filtering out false positives. The information used to update an object’s detection confidence is represented by the positive and negative relations between the current object and the previously found objects. If the newly found object is in a positive relation with a previously found object, then its detection confidence will increase. On the other hand, if the current object is in a negative relation with a previously found object, then its detection confidence will decrease. A visual description of the first inference rule is given in Figure 4.
Figure 4: Description of the first inference rule

The first inference rule consists of the following steps (R1)-(R6):
• (R1) Compute the list of previously found objects rejected by the current found object, $\mathit{Objects}_{\text{REJECTED}}$.
• (R2) Compute the rejection factor of the found object, $\mathit{REJ}_{\text{FOUND}}$, according to each rejected object relation.
• (R3) Compute the list of previously found objects attracted by the current found object, $\mathit{Objects}_{\text{ATTRACTED}}$.
• (R4) Compute the attraction factor of the found object, $\mathit{ATTR}_{\text{FOUND}}$, according to each attracted object relation.
• (R5) Update the detection confidence $C(O)$ of the current found object according to the rejection and attraction factors already computed:

$$C(O) = C(O) \cdot \left(1 - \frac{\mathit{REJ}_{\text{FOUND}}}{|\mathit{Objects}_{\text{REJECTED}}|} + \frac{\mathit{ATTR}_{\text{FOUND}}}{|\mathit{Objects}_{\text{ATTRACTED}}|}\right)$$

• (R6) If the detection confidence $C(O)$ of the found object has decreased by more than 20%, then the object is filtered out.

The second inference rule has the role of determining what the next searched object should be. This rule is important because it helps the algorithm converge faster to the recognized scene and it avoids running all the object detectors on the input image. In order to determine what object to search for next, information about the positive and negative relations from the probabilistic ontology is evaluated against the sets of previously found and not found objects. The next object to inquire the object recognition module about is the one that best fits the current context. More precisely, not yet searched objects that are in a positive relation with previously found objects have a bigger chance of being next, and not yet searched objects that are in a negative relation with previously found objects have a smaller chance of being next. A visual description of the second inference rule is given in Figure 5.
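Steps (R1)-(R6) of the first inference rule can be sketched in Python as follows. The source only states that the rejection and attraction factors are computed "according to each relation", so the aggregation used here (the sum of CheckWeight times relation probability) is an assumption, as are the `Relation` type, the function names, the convention of one relation per previously found object, and the treatment of an empty list as contributing zero.

```python
from dataclasses import dataclass

POSITIVE, NEGATIVE = "POSITIVE", "NEGATIVE"


@dataclass(frozen=True)
class Relation:
    """Relation between the new object and one previously found object."""
    interpretation: str   # POSITIVE or NEGATIVE, as in Table 1
    check_weight: float   # CheckWeight from Table 1
    probability: float    # relation probability stored in the ontology


def _mean_factor(relations):
    # Assumed aggregation: sum of CheckWeight * probability divided by the
    # number of related objects; an empty list contributes nothing.
    if not relations:
        return 0.0
    total = sum(r.check_weight * r.probability for r in relations)
    return total / len(relations)


def check_found_object(confidence, relations, max_drop=0.20):
    """Return (updated_confidence, keep) for a newly detected object."""
    rejected = [r for r in relations if r.interpretation == NEGATIVE]   # (R1)
    attracted = [r for r in relations if r.interpretation == POSITIVE]  # (R3)
    # (R2), (R4), (R5): C(O) = C(O) * (1 - REJ/|rejected| + ATTR/|attracted|)
    factor = 1 - _mean_factor(rejected) + _mean_factor(attracted)
    updated = confidence * factor
    # (R6): drop the detection if its confidence fell by more than 20 %
    keep = (confidence - updated) <= max_drop * confidence
    return updated, keep
```

As in the formula of (R5), the updated value is not clamped, so a strongly attracted object can end up with a value above 1; whether the original system clamps it is not stated in the source.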
Similar reasoning is also used for previously not found objects (objects that were searched for in the input image but were not found), but in this case the positive relations have a negative impact on the new object’s chance of being next and the negative relations have a positive impact. However, the reasoning based on previously found objects has a bigger weight than the reasoning based on previously not found objects. Treating these two cases differently is justified by the fact that, even though an object is not found in the input image, this does not necessarily mean that the object cannot belong to the current context.

Figure 5: Description of the second inference rule

The reasoning of the scene recognition algorithm can be compromised if the first found object is a false positive. This can lead to subsequent correct detections being filtered out as false positives, so that the final result becomes meaningless. Therefore, when choosing the next object to search for, the false positive rate of the object detector is also taken into consideration if no objects have been found so far. This reduces the risk of starting the algorithm with a false positive detection.

The second inference rule consists of the following steps:
• (R1) Obtain the list of previously found objects rejected by the current candidate object, $\mathit{Objects}_{\text{REJECTED}}$.
• (R2) Compute the found object rejection factor $\mathit{REJ}_{\text{FOUND}}$ according to each rejected object relation.
• (R3) Obtain the list of previously found objects attracted by the current candidate object, $\mathit{Objects}_{\text{ATTRACTED}}$.
• (R4) Compute the found object attraction factor $\mathit{ATTR}_{\text{FOUND}}$ according to each attracted object relation.
• (R5) Repeat the steps above for previously not found objects and compute the $\mathit{REJ}_{\text{NOT-FOUND}}$ and $\mathit{ATTR}_{\text{NOT-FOUND}}$ factors.
• (R6) Compute the fitness of the current candidate object:

$$F(O_{\text{CAND}}) = (\mathit{ATTR}_{\text{FOUND}} - \mathit{REJ}_{\text{FOUND}}) + \alpha \cdot (\mathit{REJ}_{\text{NOT-FOUND}} - \mathit{ATTR}_{\text{NOT-FOUND}})$$

where $\alpha$ is a sub-unit weight that decreases the influence of the not found objects. If no object has been found so far, the fitness value is also scaled by the false positive rate $FP(\mathit{Detector}_{O_{\text{CAND}}})$ of the candidate object’s detector:

$$F(O_{\text{CAND}}) = \left(1 - FP(\mathit{Detector}_{O_{\text{CAND}}})\right) \cdot F(O_{\text{CAND}})$$

• (R7) Choose the next object to search for from the list of candidate objects by finding the one with the biggest fitness value:

$$O_{\text{NEXT}} = \arg\max_{O_{\text{CAND}}} F(O_{\text{CAND}})$$

The third inference rule has the role of determining the fitness value of each scene for the current image. If the highest scene fitness value is bigger than a threshold proportional to the average scene fitness, then the algorithm returns that scene and the object searching process ends. For computing the fitness value of each scene, the inference module uses the set of objects found so far to interrogate the probabilistic ontology about the probability of having each found object in the current candidate scene. The ontology relation that contains information about the probability of an object appearing in a scene is called inScene. This is a mandatory relation that should exist in any instance of the probabilistic ontology, no matter what object set and scene set are used or what other relations are defined between objects. A visual description of the third inference rule is given in Figure 6.

Figure 6: Description of the third inference rule

The fitness value for a scene $S$ is computed as:

$$\mathit{Fitness}(S) = \sum_{O \in \mathit{Objects}_{\text{FOUND}}} \log \frac{P(\mathit{inScene}(O, S))}{P(O)}$$

where $\mathit{Objects}_{\text{FOUND}}$ represents the set of previously found objects, $P(\mathit{inScene}(O, S))$ represents the probability of the inScene relation between object $O$ and scene $S$, and $P(O)$ represents the a priori probability of object $O$ (obtained from the probabilistic ontology).
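The candidate fitness of the second inference rule, steps (R6) and (R7), and the scene fitness of the third inference rule can be sketched together in Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the rejection and attraction factors are taken as precomputed inputs, the default values of `alpha` and `delta` are illustrative only, and probabilities are passed as plain dictionaries assumed to be strictly positive.

```python
from math import log


def candidate_fitness(attr_found, rej_found, attr_not_found, rej_not_found,
                      alpha=0.5, fp_rate=None):
    """(R6): F = (ATTR_FOUND - REJ_FOUND)
                 + alpha * (REJ_NOT-FOUND - ATTR_NOT-FOUND).

    alpha < 1 damps the influence of objects that were searched for but not
    found. Pass fp_rate only while no object has been found yet; the fitness
    is then scaled by (1 - FP) of the candidate's detector.
    """
    fitness = (attr_found - rej_found) + alpha * (rej_not_found - attr_not_found)
    if fp_rate is not None:
        fitness *= 1 - fp_rate
    return fitness


def choose_next(candidates):
    """(R7): candidates maps object name -> fitness; return the argmax."""
    return max(candidates, key=candidates.get)


def scene_fitness(found, p_in_scene, p_object, scene):
    """Fitness(S) = sum over found objects of log(P(inScene(O, S)) / P(O))."""
    return sum(log(p_in_scene[(o, scene)] / p_object[o]) for o in found)


def recognise(found, scenes, p_in_scene, p_object, delta=1.5):
    """Return (best_scene, accepted).

    accepted is True when the best fitness exceeds delta times the mean
    scene fitness, i.e. a threshold proportional to the average fitness.
    """
    fitness = {s: scene_fitness(found, p_in_scene, p_object, s) for s in scenes}
    best = max(fitness, key=fitness.get)
    mean = sum(fitness.values()) / len(fitness)
    return best, fitness[best] > delta * mean
```

The log-ratio in `scene_fitness` is positive when an object is more likely in the scene than in general and negative otherwise, so scenes whose typical objects were found accumulate a higher score.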
The best candidate scene is chosen by:

$$\mathit{Scene}_{\text{BEST}} = \arg\max_{S} \mathit{Fitness}(S)$$

The best candidate scene is returned as the final recognized scene if its fitness is bigger than a threshold proportional to the average scene fitness, where $\delta$ is the proportionality factor:

$$\mathit{Fitness}(\mathit{Scene}_{\text{BEST}}) > \delta \cdot \frac{1}{|\mathit{Scenes}|} \sum_{S \in \mathit{Scenes}} \mathit{Fitness}(S) \qquad (1)$$

If no scene is recognized at the current iteration, then the object searching process continues and the three inference rules are applied again at each iteration until a scene can be inferred or all the objects have been searched for. In the latter situation, the returned scene is the one with the biggest fitness value, without taking equation (1) into consideration.

4 System evaluation

The system is designed so that the implementation of any component can be replaced. Every module communicates with the others through an interface, which makes the components loosely coupled. Therefore, in order to replace the current implementation of a module, all that is needed is to implement the module interface. The scene recognition application is written in Java, using Eclipse as IDE. For object recognition we use the YOLO network [19].

The scene recognition application was tested using images from the LabelMe database [15]. This database includes images for many indoor and outdoor scene types. For testing the system, only images with relevant scene types were used: kitchen, office, living room and street. The images are annotated with scene information under the attribute scenedescription. After the probabilities of the probabilistic ontology are computed based on object and scene co-occurrences inside the training image set, the model is stored inside an XML file. The XML ontology is then parsed by the application and mapped into Java objects that are used by the inference module.
Algorithm 1 gives the pseudocode of the reasoning algorithm implemented in the inference module:

Algorithm 1 Reasoning algorithm

procedure reasoningAlgorithm(Image image)
    searchObject ← getObjectWithLowestFP()
    foundObjectsSoFar ← []
    notFoundObjectsSoFar ← []
    while searchObject ≠ NONE do
        (confidence, boundBox) ← findObject(searchObject, image)
        updatedConfidence ← checkFoundObject(searchObject, confidence, foundObjectsSoFar)
        if updatedConfidence − confidence > threshold then
            foundObjectsSoFar ← foundObjectsSoFar ∪ {searchObject}
        else
            notFoundObjectsSoFar ← notFoundObjectsSoFar ∪ {searchObject}
        end if
        if foundObjectsSoFar ≠ [] then
            sceneFitnessList ← computeAllSceneProbabilities()
            bestScene ← maxFitnessScene(sceneFitnessList)
            meanFitness ← meanFitnessValue(sceneFitnessList)
            if bestScene.fitness > δ ∗ meanFitness then
                return bestScene
            end if
        end if
        searchObject ← getNextMostProbableObj(foundObjectsSoFar, notFoundObjectsSoFar)
    end while
    return maxFitnessScene(computeAllSceneProbabilities())
end procedure

The loop ends when a scene passes the fitness threshold or when no object is left to search for; in the latter case the scene with the biggest fitness value is returned.

The object recognition module has access to a set of object detectors and to information regarding their performance: detection rate and false positive rate. We use the YOLO (You Only Look Once) network [19]. It is a very robust method, which is almost invariant to position and lighting. It simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance. YOLO reasons globally about the image when making predictions. Unlike sliding window and region proposal-based techniques, YOLO sees the entire image during training and test time, so it implicitly encodes contextual information about classes as well as their appearance. YOLO also learns generalizable representations of objects.
The database contains many object types that appear in many combinations. The images in the database are annotated with object information. For the current system, the relevant annotated object information is represented by the object outline polygon and the "verified" flag. The object polygon is used to obtain the object’s bounding box for the object recognition stubs and the "verified" flag is used to select the object’s confidence. If an object is verified, then it was annotated correctly and the object recognition stub gives it a high confidence; otherwise a lower confidence value is assigned. Some examples of LabelMe annotated images can be seen in Figure 7.

Figure 7: LabelMe annotated images example

Figure 8 shows some examples of recognising kitchen, office and living room scenes. The current system was evaluated on a test set containing a total of 3381 annotated images from the LabelMe database [15]. The aggregated evaluation results obtained after running the application on the test database can be seen in Table 2.

Table 2: System evaluation results using stubs

Scene Name  Accuracy  Mean Iterations  Removed FP  Test Images
kitchen     89%       7                16          651
office      90%       9                35          914
living      91%       8                30          775

Figure 8: Example of scene recognition

5 Conclusions and future work

This paper presents a method for detecting the context of a user in an indoor space. The proposed method is based on the results of an object recognition process. In order to be able to recognize a wide range of scenes, the number of objects that need to be recognized can become very big. Running a large number of object detectors can be time consuming; therefore the current approach uses semantic information about objects and scenes to speed up the scene recognition process and to eliminate false positives that can have a negative impact on the final result.
This semantic information is organized as a probabilistic ontology model of objects and scenes. Reasoning is done using the stochastic relations between objects and between objects and scenes, and its outcome can influence the order in which objects are searched for and whether a newly detected object is eliminated as a false positive. Scene recognition is influenced by the semantic relations between scenes and objects, and the object recognition process ends as soon as enough objects have been found to determine the scene. Results show that, given a set of object detectors with high detection rate and low false positive rate, the system can recognize a scene with high accuracy and in a small number of iterations. As future work, the ontology model can be extended to meet domain-specific requirements because it is easily adaptable to different domains.

Acknowledgment

This research and paper were co-funded by the Executive Unit for Financing Higher Education, Research and Development and Innovation through Partnership Program, project "Mobility pattern assistant for elderly people", project number PN-II-PT-PCCA-2013-4-2241; by University Politehnica of Bucharest, through the "Excellence Research Grants" Program, UPB - GEX, Identifier: UPB-EXCELENTA-2016 "Optimizarea Activitatilor Zilnice Folosind Deep Learning Implementata pe Sisteme Reconfigurabile" / Daily Activities Using Deep Learning Implemented on FPGA, Contract number 8 / 26.09.2016 (code 341); and by the grant of the Romanian National Authority for Scientific Research and Innovation, CCCDI - UEFISCDI and of the AAL Programme with co-funding from the European Union’s Horizon 2020 research and innovation programme, project "IONIS - Improving the quality of life of people with dementia and disabled persons", project number AAL2017-AAL-2016-074-IONIS (Contracts 52/20017 and 53/2017).

Bibliography

[1] Burm, C. (2015).
Dementia and Elderly GPS Tracking Devices, http://www.aplaceformom.com/blog/4-29-15-dementia-and-elderly-gps-tracking-devices/, last accessed October 2018.
[2] DiCarlo, J.; Zoccolan, D.; Rust, N. C. (2012). How does the brain solve visual object recognition?, Neuron, 73(3), 415–434, 2012.
[3] Enns, J. T. (2004). The Thinking Eye, The Seeing Brain: Explorations in Visual Cognition, W. W. Norton Company, ISBN: 0393977218, 2004.
[4] Fischler, M. A.; Elschlager, R. A. (1973). The Representation and Matching of Pictorial Structures, IEEE Transactions on Computers, 22(1), 67–92, 1973.
[5] Felzenszwalb, P. F.; Huttenlocher, D. P. (2005). Pictorial Structures for Object Recognition, International Journal of Computer Vision, 61(1), 55–79, 2005.
[6] Gupta, S.; Girshick, R.; Arbelaez, P.; Malik, J. (2014). Learning Rich Features from RGB-D Images for Object Detection and Segmentation, ECCV, 345–360, 2014.
[7] Kaiser, L.; Gomez, A. N.; Shazeer, N.; Vaswani, A.; Parmar, N.; Jones, L.; Uszkoreit, J. (2017). One Model To Learn Them All, http://arxiv.org/abs/1706.05137, last accessed October 2018.
[8] Li, B.; Wu, T.; Shuai, S.; Zhang, L.; Chu, R. (2017). Object Detection via Aspect Ratio and Context Aware Region-based Convolutional Networks, arXiv:1612.00534v2, https://arxiv.org.
[9] Leal-Taixe, L. (2016). Multiple Object Tracking with Context Awareness, http://arxiv.org/abs/1411.7935, last accessed October 2018.
[10] Maturana, D.; Scherer, S. (2015). Voxnet: A 3d Convolutional Neural Network for Realtime Object Recognition, IEEE/RSJ International Conference on Intelligent Robots and Systems, 922–928, 2015.
[11] Napoletan, A. (2015). 10 Best (and Worst) Apps for Caregivers, https://www.aplaceformom.com/blog/best-and-worst-apps-for-caregivers-07-03-2013/, last accessed October 2018.
[12] Perko, R.; Leonardis, A. (2010). Context Awareness for Object Detection, Computer Vision and Image Understanding, 114(6), 700–711, 2010.
[13] Rehman, Z.; Kifor, C. K. (2016). An Ontology to Support Semantic Management of FMEA Knowledge, International Journal of Computers Communications & Control, 11(4), 507–521, 2016.
[14] Ren, S.; He, K.; Girshick, R. B.; Sun, J. (2015). Faster R-CNN: towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems, 91–99, 2015.
[15] Russell, B. C.; Torralba, A.; Murphy, K. P.; Freeman, W. T. (2008). LabelMe: a Database and Web-Based Tool for Image Annotation, International Journal of Computer Vision, 77(1-3), 157–173, 2008.
[16] Sabour, S.; Frosst, N.; Hinton, G. E. (2017). Dynamic Routing Between Capsules, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, 3859–3869, https://arxiv.org/pdf/1710.09829.pdf, last accessed October 2018.
[17] Szegedy, C.; Ioffe, S.; Vanhoucke, V. (2016). Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, http://arxiv.org/abs/1602.07261, last accessed October 2018.
[18] Viola, P. A.; Jones, M. J. (2001). Rapid Object Detection using a Boosted Cascade of Simple Features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1-9, 2001.
[19] YOLO network, https://pjreddie.com/darknet/yolo/, last accessed October 2018.
[20] https://senion.com/indoor-positioning-system/, last accessed October 2018.