Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844
Vol. V (2010), No. 5, pp. 625-633

Generic Multimodal Ontologies for Human-Agent Interaction

A. Braşoveanu, A. Manolescu, M.N. Spînu

Adrian Braşoveanu
Lucian Blaga University of Sibiu, Romania
E-mail: adrian.brasoveanu@gmail.com

Adriana Manolescu
Agora University, Oradea and R&D Agora Ltd. Cercetare Dezvoltare Agora Oradea, Romania
E-mail: adrianamanolescu@gmail.com

Marian Nicu Spînu
Aurel Vlaicu University of Arad, Faculty of Exact Sciences
Department of Mathematics-Informatics
Romania, 310330 Arad, 2 Elena Drăgoi

Abstract: Following the evolution of the Semantic Web (SW) from its inception to the present day, it is easy to observe that the main task its developers face is encoding human knowledge into ontologies and human reasoning into dedicated reasoning engines. The SW now needs efficient mechanisms through which both humans and artificial agents can access information, and the most important tools in this context are ontologies. Recent years have been dedicated to solving the infrastructure problems related to ontologies: ontology management, ontology matching, and ontology adoption. As these problems become better understood, research interest in this area will surely shift towards the way agents use ontologies to communicate with each other and with humans. Although interface agents could be bilingual, it would be more efficient, safe, and swift for them to use the same language to communicate with humans and with their peers. Since anthropocentric systems nowadays entail multimodal interfaces, it seems suitable to build multimodal ontologies; generic ontologies, in turn, are needed when dealing with uncertainty. Multimodal ontologies should be designed taking into account our way of thinking (mind maps, visual thinking, feedback, logic, emotions, etc.) and also the processes in which they would be involved (multimodal fusion and integration, error reduction, natural language processing, multimodal fission, etc.). This would make ontologies easier (and more enjoyable) for us to use, while at the same time enhancing communication with agents (and agent-to-agent talk). This is just one of our conclusions on why building generic multimodal ontologies is so important for future Semantic Web applications.

Keywords: multimodal ontology, ontology matching, interface agents, Semantic Web, human-agent interaction

1 Introduction

The Knowledge Society (KS) is a society in which information is the primary resource, consumed by both humans and machines. Building such a society properly requires several kinds of infrastructure: hardware, software, organizational, and so on. The SW and agents represent only a small part of the larger infrastructure needed to build a true KS.

SW ([1], [2], [3], and [4]) is one of those disruptive technologies that tend to be talked about years before their coming of age. One of the visions presented in [1] was that of agents replacing humans for simple everyday tasks such as buying concert tickets or making doctor's appointments.
The main reason this vision has not yet come to life is now well understood, and it is explained in the article's revision [2]: encoding human knowledge into ontologies and human reasoning into dedicated reasoning engines is not an easy task. The process requires transdisciplinary knowledge, dedicated tools and repositories, and advanced techniques from mathematics, logic, and software engineering. It is, in fact, an extremely difficult undertaking that relies entirely on cooperation between hundreds or thousands of organizations and on different standards. Since standardization processes take a long time even today, and the adoption time for new technologies is often at least 2-3 years, we should not be surprised that it will take a while until the SW reaches critical mass.

Ontologies, if done right, are the key to successful communication between humans and agents. We are only beginning to understand the implications of using ontologies for the great tasks we have assigned to them, but some problems, such as ontology management (versioning, change, tools, and standards), ontology matching (finding correspondences between different ontologies), and the large-scale adoption of ontologies by developers and users, have proved quite challenging. Ontology dynamics is definitely a field on which we should keep an eye. According to [30] there is still no clear winner in ontology matching, in other words no standard or methodology with clear rules for matching almost everything automatically or semi-automatically (sometimes humans still need to check the results). We should therefore not be surprised that, when reading a journal or conference proceedings, most articles address these infrastructure tasks rather than the desired use of ontologies, which is to give agents a way of understanding our world and reasoning about it. This is how things should be: to build a functional system we first need its parts figured out. We should not, however, lose sight of the system we need to build, and this is one of the purposes of this paper: to look at the current state of the art in several fields of study and see whether we are heading in the right direction. In this context we will especially examine some problems related to multimodal communication between humans and agents, and try to see how they can be solved by using ontologies.

2 Rationale and Approach: Why Complicate Things and Use Generic Multimodal Ontologies?

First we need to settle one question: what is an ontology? Some answers (and some examples of how to use ontologies) can be found in [12], [15], [16], [17], [22], [23], and [31]. The classic definition proposed by Gruber tells us that an ontology is an "explicit specification of a conceptualization" [12]. This definition has been examined and extended by many papers, most recently by Guarino, Oberle, and Staab in [16], which also stresses the importance of "shared explicit specifications": without committing to shared ontologies, every agent would understand something different (the authors also take the opportunity to revise the semiotic triangle). Ontologies are us, Mika's thesis [23], is a simple yet powerful statement: since we are the ones who design ontologies, they will only express what we want them to express, and will sometimes be useless outside the context in which they were created.
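To make Gruber's definition concrete, the sketch below encodes a toy conceptualization as RDF/OWL triples using Python's rdflib library. The ex: vocabulary (professors teaching courses) is a hypothetical illustration of our own, not one of the ontologies discussed in this paper.

```python
# A minimal sketch, assuming the rdflib library; the ex: vocabulary is
# a hypothetical toy example, not an ontology cited in this paper.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/university#")

g = Graph()
g.bind("ex", EX)

# Concepts (classes) of the conceptualization...
g.add((EX.Professor, RDF.type, OWL.Class))
g.add((EX.Course, RDF.type, OWL.Class))

# ...and an explicit relationship between them.
g.add((EX.teaches, RDF.type, OWL.ObjectProperty))
g.add((EX.teaches, RDFS.domain, EX.Professor))
g.add((EX.teaches, RDFS.range, EX.Course))

# Serializing the graph makes the specification explicit and shareable.
print(g.serialize(format="turtle"))
```

Serializing the graph yields a document that other agents can load and commit to, which is precisely what the "shared explicit specification" reading of [16] asks for.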
The main problem when designing ontologies is to carefully choose the concepts within a domain, and the relationships between them, in such a way that the ontology is well founded, because "any ontology will always be less complete and less formal than it would be desirable in theory" [16]. In the light of this statement it should become quite clear why we sometimes need to use generic ontologies: there is simply no other way to address the problem of uncertainty when developing ontologies than genericity.

Figure 1: One of the most popular programs for ontology matching: COMA++, developed at the University of Leipzig. The screenshot shows how correspondences can be established between two ontologies representing a Computer Science Department.

Nowadays there are probably thousands of ontologies in use, but if the SW ever comes to resemble Berners-Lee's visions, ontologies will be commonplace for every designer, developer, and user. An ontology usually addresses only the problems of a narrow field of knowledge (a domain ontology), so it is not uncommon for applications to use many ontologies for different purposes. In some of these cases it is also useful to use upper-level ontologies: general ontologies representing concepts that are the same across all domains. A single upper-level ontology encompassing all human knowledge is not feasible and will never be built, for practical reasons (each society has its own concepts, every field of knowledge has a certain language with which it protects itself, etc.), but upper-level ontologies are used for mediation, mainly on the assumption that universal agreement between different ontologies is, or will be, possible. In other cases, applications that need several ontologies will use ontology matching schemes like those discussed in [10]. Since ontologies are the building blocks of the SW, any application in this area must use them, even if that means adding layers of complexity because of the matching process, APIs, and uncertainty. For anyone working in the IT industry these days it should be clear that our working medium is becoming more and more of an OHDUE (Open Heterogeneous Dynamic Uncertain Environment) [8], and ontologies are part of this medium. These issues are addressed in articles and books such as [10], [19], [30] (ontology matching), [26] (automatic generation of ontology APIs), and [8] (OHDUE, agents). As ontology engineering becomes more popular, we should not be surprised to hear a lot about ontology-driven software engineering as well; Ontology Driven Information Systems (ODIS) [36] is just one recent example in this category.
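To give a feel for what tools like COMA++ (Figure 1) or Falcon-AO [19] automate at a much larger scale, the sketch below implements the simplest possible element-level matcher: it proposes correspondences between two vocabularies based on label similarity alone. This is a deliberately naive illustration under our own assumptions; real matchers combine lexical, structural, and instance-based evidence [10], [30], and, as noted above, humans still need to check the results.

```python
# A deliberately naive, element-level label matcher, for illustration only.
from difflib import SequenceMatcher

def label_similarity(a: str, b: str) -> float:
    """Normalized edit-based similarity between two concept labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def propose_correspondences(labels_a, labels_b, threshold=0.6):
    """Return candidate matches scoring at or above the threshold, best first."""
    candidates = []
    for a in labels_a:
        for b in labels_b:
            score = label_similarity(a, b)
            if score >= threshold:
                candidates.append((a, b, round(score, 2)))
    return sorted(candidates, key=lambda c: -c[2])

# Two toy "Computer Science Department" vocabularies, echoing Figure 1.
dept_a = ["Professor", "Course", "Lecture", "PhDStudent"]
dept_b = ["Prof", "Courses", "Lectures", "GraduateStudent"]

# The output is only a list of candidates; a human still has to validate it.
print(propose_correspondences(dept_a, dept_b))
```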
Given all these complications that appear when designing and working with ontologies, it is interesting to ask a new question: why would we want to complicate our lives even more by using multimodal ontologies? Is it not enough that ontology management and ontology matching still pose so many challenges? Is this new breed of ontologies even feasible?

Figure 2: The multimodal communication dream: using all five senses (smell, sight, touch, taste, sound) during the process of communication.

Certainly, from a user's perspective, multiple modalities for entering input into a system (touch, voice, mouse, pen, etc.) can only mean increased usability (need we recall how touch screens became the norm in the mobile phone industry after the iPhone was launched?), while from a developer's perspective they mean that software gets even more complicated than it already is. This is nevertheless the right moment for such a development, since the multiple streams of data that come with multimodal communication require distributed systems. With multi-core processors now fortunately the norm in desktop computing, we should have no problem (at least not a hardware one) dealing with the resulting flux of data.

Over the past 40 years scientists have developed various mechanisms for capturing audio, video, and touch input, but the integration of all five senses in the communication between human and machine remains a dream. It is enough, however, to use one sense in different ways (for sight, for example, we have images, text, and video) to be able to speak about multimodal communication. In this respect several research groups (most notably [29]) have also started to develop multimodal ontologies, but most of them took the approach of developing separate ontologies for text, images, video, or voice and then using ontology alignment to match them (multimodal integration through ontology matching [29]). A single multimodal ontology gives us all the benefits of having such separate ontologies. Like all things in life, multimodal ontologies do not come without downsides (they are even harder to design, maintain, and match), but they are definitely closer to our way of thinking. Is this a sufficient reason to try them? Perhaps not on its own, but it is not the only one. Multimodal ontologies will allow us to give communication between agents and humans a more natural, even realistic feel, to enhance usability, and to model mechanisms that are closer to the way we understand the world (diagrams, mind maps, feedback, brainstorming, slides, visual thinking, and others). It should be clear that this is not just art for art's sake, but rather art for a better life in the future.
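To contrast with the alignment approach of [29], the sketch below shows what a node in a single multimodal ontology might look like: one concept carrying representations for several modalities at once. The mm: vocabulary is hypothetical, invented purely for this illustration.

```python
# A minimal sketch of a single multimodal ontology node, assuming rdflib;
# the mm: vocabulary is hypothetical, not a published multimodal ontology.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

MM = Namespace("http://example.org/multimodal#")

g = Graph()
g.bind("mm", MM)

# One concept carries representations in several modalities, instead of
# living in separate text/image/audio ontologies that must be aligned later.
dog = MM.Dog
g.add((dog, RDF.type, MM.Concept))
g.add((dog, MM.hasTextLabel, Literal("dog")))
g.add((dog, MM.hasImagePrototype, URIRef("http://example.org/media/dog.jpg")))
g.add((dog, MM.hasAudioSample, URIRef("http://example.org/media/bark.wav")))

# An interface agent can answer both "show me" and "tell me" requests
# from the same node, in whichever modality the user prefers.
for _, predicate, value in g.triples((dog, None, None)):
    print(predicate, "->", value)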
3 Generic Multimodal Ontologies for Human-Agent Interaction

The process of multimodal ontology modeling is still open to exploratory research, because ontologies are not yet everywhere. Without ontologies for all possible fields, and tools to match them, it is debatable whether we will achieve an efficient semantic web, rather than the illusion of one maintained by a few successful applications in certain areas (such as social networking, language translation, or medicine). Since multimodal communication is difficult to process, it is clear that in the first phase of any research on this subject the communication between agents and humans will not be efficient. The question we must then ask ourselves is: if it is not efficient, why should we bother to try something like this at all? The answer is simple and typical for exploratory research: it takes time to find the best way to integrate multiple streams of data efficiently, and it also takes time to develop efficient ontology matching processes for such tasks. The role of exploratory research is to discover niches; the task of creating efficient mechanisms is best suited to incremental research. Since this area of research is relatively new, there is enough room both for exploratory research and for breakthroughs.

Generic ontologies are rarely used by developers. Most articles present various ontologies and explicitly state that they do not use generic ones because the problem's domain was well understood. Generic ontologies are best suited for modelling, as we can see from [17] and [13]: when modelling, it is easier to say you have an ontology with a few concepts without defining all of them. The task of defining all the concepts, and the relationships between them, then remains with the ontology engineer or the developer. When dealing with models related to multimodal communication, it makes sense to use generic multimodal ontologies. It also makes sense to use a generic ontology whenever dealing with uncertainty, as suggested by [8] and [28].

The agents of tomorrow will be built taking into account recent findings such as requirements-driven self-reconfiguration [6]; multi-party, multi-issue, multi-strategy negotiation [35]; natural language [18]; and controlled natural language [32]. If we are to follow Berners-Lee's vision from [1], we absolutely need to integrate such findings into our work. In fact, according to [18], ontologies are the "common ground for virtual humans". Their architecture suggests using multimodal communication, but this is not clearly stated in the article, since their ontology is not multimodal. Looking at [6] and [35], we can envision agents that dynamically change their strategies according to the environment and the context of their conversations. This requires designing flexible ontologies, which is another reason to make them generic.

Agents must use ontologies if they are to understand anything of this world. They also need to share them, and commit to them, if we want agents to be able to talk to one another. A multimodal ontology helps in several phases of multimodal communication: fusion and integration (getting the input from different channels), natural language processing, disambiguation, error reduction, and fission (preparing the output). When designing a multimodal ontology one must also take into account the problems of designing multimodal systems, as described in [25], as well as the medium in which the agents will evolve: an agent that must operate in an urban computing environment [34] will have different needs than an agent that just surfs the web. Research usually focuses on multimodal fusion, but a recent survey [9] shows that interest in multimodal fission is increasing. Designing a multimodal ontology thus requires taking all these findings into account, because the agent must be able to give us a response, not only understand our requirements.

Probably one of the big challenges ahead is annotating multimodal content in real time. This is particularly hard to do for video content, but not impossible, as [27] suggests. M3O (the Multimedia Metadata Ontology) allows us to annotate the multimedia content of a page in order to retrieve it more easily. If such ontologies are improved, the road to the visions of [1] will be shorter.
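The sketch below illustrates the general annotation pattern behind such work: a temporal segment of a video is linked to the concept it depicts, so that an agent can retrieve it later. The ann: terms are hypothetical stand-ins, not the actual M3O vocabulary of [27].

```python
# A minimal sketch of multimedia annotation in the spirit of M3O [27],
# assuming rdflib; the ann: terms are hypothetical, not real M3O terms.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

ANN = Namespace("http://example.org/annotation#")

g = Graph()
g.bind("ann", ANN)

video = URIRef("http://example.org/media/lecture.mp4")
segment = URIRef("http://example.org/media/lecture.mp4#t=120,150")

# Link a temporal segment of the video to the concept it depicts, so an
# agent can later retrieve "the part of the lecture about ontology matching".
g.add((segment, RDF.type, ANN.MediaSegment))
g.add((segment, ANN.partOf, video))
g.add((segment, ANN.startSeconds, Literal(120, datatype=XSD.integer)))
g.add((segment, ANN.endSeconds, Literal(150, datatype=XSD.integer)))
g.add((segment, ANN.depictsConcept,
       URIRef("http://example.org/concepts#OntologyMatching")))

print(g.serialize(format="turtle"))
```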
4 Related Work

The current state of the art in multimodal HCI is presented in [7] and [20]. One of the conclusions of [7] leaves plenty of space for improvement: "most researchers process each channel (visual, audio) independently, and multimodal fusion is still in its infancy". The same holds true for multimodal ontologies. Since [7] is the more recent survey, we will use it as a basis for further investigation in this field.

Since only a few interesting articles related to multimodal ontologies appear every year, we have selected some of them as a basis for future research. For definitions related to ontologies, and for trends in ontology development and matching, some of the best research groups in the world are those from Trento (LOA and the University of Trento) and Koblenz-Landau. Many of the articles cited in this paper come from members of the Trento group: [6], [10], [16], [17], [30], covering definitions of ontology, ontology matching, and modelling with ontologies. We have also used articles from the Koblenz-Landau group: [16], [26], [27], related to definitions, the automatic generation of ontology APIs, and M3O.

One interesting idea is that of multimodal context-aware interaction, presented by Cearreta and his team in [5]. If we have to model emotions, there may be no other solution than multimodal ontologies combined with special reasoners. Another article related to our subject is [29]; its approach of using different ontologies for text and images and then applying ontology matching can definitely be improved in the long term. The authors clearly state that, for the moment, multimodal ontologies do not offer fast communication, but that speed might improve in time. [24], [32], and [33] study the relationships between Natural Language Processing (NLP) and the SW; the work of these research groups deserves study. One of them [32] is from Southampton, one of the workplaces of Timothy Berners-Lee. When it comes to generic ontologies and tools for working with ontologies, one of the best research groups to follow is Stanford's [11], [28]; their work on biomedical ontologies and Protégé is fundamental.

5 Conclusions and Future Work

SW tools are now an important part of the IT industry, with the main clients coming from biomedicine, aeronautics, automotive, government and local administration, and the media. This sudden interest might be related to the success of social media [14], [21], and it means that developers are starting to tap into the field's potential. Even so, there is a lot of work to be done on multimodal ontologies, for a reason mentioned several times in this paper: designing such ontologies is still difficult. Just as we do not yet have universal methods for ontology matching, we do not have a clear methodology for designing multimodal ontologies (whether generic or not).

The main advantages of using generic multimodal ontologies should by now be clear: they offer us a way to design the process of communication with agents that is as close to our way of thinking as possible, and they play a very important role in several phases of multimodal communication (multimodal fusion and integration, disambiguation, NLP, error reduction, multimodal fission, etc.). The main disadvantage, for the next few years, will probably be efficiency, but given the exploratory nature of the research this is to be expected.

The future work of our group will consider implementing new mechanisms for linking generic multimodal ontologies and affective interfaces with recent research in the Semantic Web and HCI, over a three-year interval (during the PhD studies of the first author). The objectives are to be fulfilled together with European teams of researchers interested in this kind of project.
Acknowledgements

This work was partially supported by the strategic grant POSDRU/88/1.5/S/60370 (2009) on "Doctoral Scholarships" of the Ministry of Labour, Family and Social Protection, Romania, co-financed by the European Social Fund - Investing in People.

Bibliography

[1] T. Berners-Lee, J. Hendler, O. Lassila. The Semantic Web. Scientific American, 34-43, May 2001.

[2] N. Shadbolt, W. Hall, T. Berners-Lee. The Semantic Web revisited. IEEE Intelligent Systems, 96-101, May/June 2006.

[3] T. Berners-Lee, W. Hall, J.A. Hendler, K. O'Hara, N. Shadbolt, D.J. Weitzner. A Framework for Web Science. Foundations and Trends in Web Science, 1(1):1-130, 2006.

[4] C. Bizer, T. Heath, T. Berners-Lee. Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems, 5(3), 2009.

[5] I. Cearreta, J.M. Lopez, N. Garay-Vitoria. Modelling multimodal context-aware affective interaction. Proceedings of the Doctoral Consortium of the Second International Conference on ACII'07, Lisbon, Portugal, 57-64, 2007.

[6] F. Dalpiaz, P. Giorgini, J. Mylopoulos. An Architecture for Requirements-driven Self-Reconfiguration. Proc. of the 21st Int. Conf. on Advanced Information Systems Engineering, LNCS 5565, Springer, 246-260, http://www.disi.unitn.it/pgiorgio/papers/caise09-b.pdf, 2009.

[7] B. Dumas, D. Lalanne, S. Oviatt. Multimodal Interfaces: A Survey of Principles, Models and Frameworks. In D. Lalanne, J. Kohlas, editors, Human Machine Interaction: Research Results of the MMI Program, Springer, 3-27, 2009.

[8] I. Dzitac, B.E. Barbat. Artificial Intelligence + Distributed Systems = Agents. International Journal of Computers, Communications & Control, ISSN 1841-9836, 4(1):17-26, 2009.

[9] D.W. Embley, A. Zitzelberger. Theoretical Foundations for Enabling a Web of Knowledge. Retrieved from: http://dithers.cs.byu.edu/tango/papers/formalWoK.pdf, 2009.

[10] J. Euzenat, P. Shvaiko. Ontology Matching. Springer, 2007.

[11] A. Ghazvinian, N.F. Noy, C. Jonquet, N.H. Shah, M.A. Musen. What Four Million Mappings Can Tell You about Two Hundred Ontologies. International Semantic Web Conference 2009, 229-242, 2009.

[12] T.R. Gruber. A Translation Approach to Portable Ontologies. Knowledge Acquisition, 5(2):199-220, 1993.

[13] M. Gruninger. Designing and Evaluating Generic Ontologies. In Proceedings of the ECAI'96 Workshop on Ontological Engineering, 1996.

[14] T. Gruber. Collective knowledge systems: Where the social web meets the semantic web. Journal of Web Semantics, 6(1):4-13, 2008.

[15] N. Guarino. The Ontological Level: Revisiting 30 Years of Knowledge Representation. In A. Borgida, V. Chaudhri, P. Giorgini, E. Yu, editors, Conceptual Modelling: Foundations and Applications, Springer Verlag, 52-67, 2009.

[16] N. Guarino, D. Oberle, S. Staab. What is an Ontology? In S. Staab and R. Studer, editors, Handbook on Ontologies, Second Edition, International Handbooks on Information Systems, Springer Verlag, 1-17, 2009.

[17] G. Guizzardi, T. Halpin. Ontological foundations for conceptual modeling. Applied Ontology, 3:1-12, 2008.

[18] A. Hartholt, T. Russ, D. Traum, E. Hovy, S. Robinson. A common ground for virtual humans: Using an ontology in a natural language oriented virtual human architecture. In Language Resources and Evaluation Conference (LREC), May 2008.

[19] W. Hu, Y. Qu. Falcon-AO: A practical ontology matching system.
Web Semantics: Science, Services and Agents on the World Wide Web, 6:237-239, 2008.

[20] A. Jaimes, N. Sebe. Multimodal human-computer interaction: A survey. Computer Vision and Image Understanding, 108(1-2):116-134, Special Issue on Vision for Human-Computer Interaction, October-November 2007.

[21] F. Limpens, F. Gandon, M. Buffa. Linking folksonomies and ontologies for supporting knowledge sharing: a state of the art. Technical report, EU Project ISICIL, 2009.

[22] D. Lonsdale, D.W. Embley, Y. Ding, L. Xu, M. Hepp. Reusing Ontologies and Language Components for Ontology Generation. Accepted for publication in Data and Knowledge Engineering. Retrieved from: http://www.heppnetz.de/files/dke2008.pdf, 2010.

[23] P. Mika. Social Networks and the Semantic Web. Springer, 2007.

[24] J. Niekrasz, M. Purver. A multimodal discourse ontology for meeting understanding. In The 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms, 2005.

[25] L. Nigay, J. Coutaz. A design space for multimodal systems: Concurrent processing and data fusion. ACM Conf. Human Factors in Computing Systems (CHI), 1993.

[26] F.S. Parreiras, C. Saathoff, T. Walter, T. Franz, S. Staab. APIs a gogo: Automatic Generation of Ontology APIs. 2009 IEEE International Conference on Semantic Computing (ICSC), 342-348, 2009.

[27] C. Saathoff, A. Scherp. M3O: The Multimedia Metadata Ontology. Proceedings of the Workshop on Semantic Multimedia Database Technologies, 10th International Workshop of the Multimedia Metadata Community (SeMuDaTe 2009), Graz, Austria, 2009.

[28] A. Sebastian, N.F. Noy, T. Tudorache, M.A. Musen. A generic ontology for collaborative ontology-development workflows. In A. Gangemi and J. Euzenat, editors, EKAW, volume 5268 of Lecture Notes in Computer Science, 318-328, Springer, 2008.

[29] A.A.A. Shareha, M. Rajeswari, D. Ramachandram. Multimodal Integration (Image and Text) Using Ontology Alignment. American Journal of Applied Sciences, 6(6):1217-1224, 2009.

[30] P. Shvaiko, J. Euzenat. Ten challenges for ontology matching. In Proceedings of the 7th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE), 1164-1182, Monterrey (MX), 2008.

[31] W.V. Siricharoen. Ontology Modeling and Object Modeling in Software Engineering. International Journal of Software Engineering and Its Applications, 3(1):43-59, January 2009.

[32] P. Smart, J. Bao, D. Braines, N. Shadbolt. Development of a Controlled Natural Language Interface for Semantic MediaWiki. In Proceedings of the Workshop on Controlled Natural Language, Springer-Verlag, Heidelberg, Germany.

[33] D. Sonntag, M. Romanelli. A multimodal result ontology for integrated semantic web dialogue applications. In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC 2006), Genova, Italy, May 24-26, 2006.

[34] A. Tenschert, M. Assel, A. Cheptsov, G. Gallizo, E. Della Valle, I. Celino. Parallelization and Distribution Techniques for Ontology Matching in Urban Computing Environments. OM 2009.

[35] D. Traum, S. Marsella, J. Gratch, J. Lee, A. Hartholt. Multi-party, multi-issue, multi-strategy negotiation for multi-modal virtual agents. In Proc. of the Intelligent Virtual Agents Conference (IVA 2008), 2008.

[36] M. Uschold. Ontology-Driven Information Systems: Past, Present and Future.
In Proceedings of the 5th International Conference on Formal Ontology in Information Systems (FOIS 2008), Saarbrücken, Germany, October 31 - November 3, 2008.