Talking with pictures: exploring the possibilities of iconic communication

Colin Beardon*, Claire Dormann*, Stuart Mealing** and Masoud Yazdani**

* Faculty of Art, Design and Humanities, University of Brighton, Brighton BN2 2YJ
** Department of Computer Science, University of Exeter, Exeter EX4 QH

Abstract

As multimedia computing becomes the order of the day, so there is a greater need to understand and to come to terms with the problems of visual presentation. This paper deals with iconic languages as a means of communicating ideas and concepts without words. Two example systems, developed respectively at the universities of Exeter and Brighton, are described. Both embody basic principles of iconic communication which, though not unique to learning technology, is forming an increasingly important part of user-interfaces, including those in the area of computer-assisted learning.

Introduction

Koji Kobayashi looks forward to a time when the telephone system will automatically translate between the languages of users in real time (Kobayashi, 1986). It will then be possible to speak in your native tongue to a Frenchman, a Japanese and a Russian, and for each of them to hear your words spoken in their own language as you speak. In The Hitch-Hiker's Guide to the Galaxy (Adams, 1979) it becomes possible to understand communication in any language by plugging a Babel fish into your ear: a fish with the natural ability to receive speech in any language and translate it into the language of the person wearing it. While these visions are certainly futuristic, even fantastic, the need for people who speak different languages to communicate quickly and easily is increasing rapidly, at the same time as the technology for transmitting and receiving information is spreading and becoming more powerful.
ALT-J Volume 1 Number 1

Attempts to create international languages have not been very successful, partly because of the need for a significant number of people to know them, and partly because they have to be learned like any other new language. Understanding is, however, possible across language barriers in at least two ways. First, there are a number of internationally recognized signs, symbols and icons which one can find on the roadside and in airports, railway stations and similar places. They are not only used directly to denote the place where one might find trains or change money, but may also contain a number of 'meta-icons' such as the red diagonal and the arrow. Secondly, it is possible to make oneself understood, at least at some basic level, by means of gestures and mime. People who find themselves in a country where they know little or nothing of the local language can often communicate simple ideas by pointing or indicating their intentions through more performative signs, which may border on acting.

Iconic communication is the attempt to build cross-language communication systems (see Note 1) that completely avoid the use of words and rely solely on pictorial symbols. The system which manipulates these symbols should be as simple to use as possible, though it is clear that it has to be learned in some way. There are various forms of learning which can apply: users can receive some form of instruction (either from a tutor, or from some non-linguistic tutorial system); the elements of the system can contain an explanation of their meaning ('self-explaining icons'); the system can adopt a powerful metaphor; or the user can experiment and learn by trial and error.

Icons

It is important to say first of all what we mean by an icon. Figure 1 contains various ways of referring to 'men', and they vary from the arbitrary (Figure 1(a)) to the pictorially descriptive (Figure 1(d)). Icons tend to exist between these extremes (Figures 1(b) and 1(c)).
Figure 1: Different ways of referring to 'men'

Words are essentially arbitrary in the way they refer, so there is no alternative but to learn their meaning. Of course words may be morphologically complex (spaceship, for example), but the components themselves are ultimately atomic and have arbitrary referents. With icons, the relationship to the referent is not arbitrary, but neither is it as direct as in the case of pictures.

It would be quite wrong to see pictures as an ideal form of icon because there are a number of things they are not good at expressing. For example, it is difficult to draw a naturalistic picture of love or of general classes of things, such as the class of all mammals. For the purpose of pictorial language, pictures reveal too much information: the man in Figure 1(d) might stand for businessmen, middle-aged men, or any of a number of other interpretations. Quite apart from this, the amount of detail makes a picture harder to distinguish from other symbols, and therefore recognition can be considerably slower than with the more stylized forms.

While an icon suggests its referent, its form is often insufficient to describe it precisely. What advantage, then, does it give us over words? There are two ways in which an icon can be advantageous: it can be easier to learn and easier to remember. It can be easier to learn because its appearance at least suggests a set of possible referents and because it is often part of a consistent system of representation which itself will provide a context. For example, the symbol in Figure 2 is given its reference by being placed on a white background in a red triangle on a black and white striped pole beside a road. This tells us immediately that it is a sign intended for drivers and concerns some potential hazard.
Figure 2: A hump-backed bridge

It can be easier to remember because it is a well-known aid to memory to associate the thing to be remembered with a simple object to which it has a defined relationship. For example, to remember the French word for pig (cochon) I may think that it sounds like the English word cushion. I therefore associate the concept of the French for pig with an image of a cushion, and this serves as an aid to memory (see Note 2). This illustrates well the fact that an icon does not necessarily have to be directly representational but can indicate its reference by means of a convention, as in Figure 1(b). Such conventions are of course also applicable to user-interfaces, including those in CAL programs.

Iconic languages

It has been suggested that most modern written languages are derived from pictorial languages, but iconic languages seem to be a more recent invention. Some of the more interesting examples are Semantography (Bliss, 1965), Isotype (International System Of Typographic Picture Education) (Neurath, 1978) and Worldsign (Jones and Cregan, 1986), a language created for mentally handicapped children which allows dynamic representation.

In Semantography (or Bliss symbols), one symbol corresponds to one word in natural language. It was proposed as an auxiliary writing tool for communication between different nations, and as a device to specify relative and vague meanings. The aim of the language is to communicate through simple pictorial symbols, with those representing physical things using outline and those representing non-physical things using geometric symbols. The first 25 symbols are already internationally accepted; they include the digits 0 to 9 and symbols such as a question mark, a full stop and a plus sign.
Bliss based his grammar on the assumption that all languages are used to describe the phenomena of our physical world, that the main manifestations of our world can be classified into matter, energy and the mental, and that everything happens in space and time. There is a specific logic behind symbols constructed to represent words. A word like telephone takes its symbolization from the symbols for electric and language (a telephone is an electrical apparatus in which you can talk) and the language symbol is derived from the ear and mouth symbols (both used in conversation).

The Isotype picture language does not have a simple correspondence between signs and words. An example given by Neurath is that there is no sign for the word foot that is common to expressions such as the foot of a man, the foot of a mountain, and the foot of a table. These expressions are composed of simple signs of a very different sort. Furthermore, the final translation of the 'language picture' is a structured group of statements, and the system of connection between signs is far richer than in linear text. Several rows of connected signs are interpreted simultaneously, whereas one-dimensional text requires readers to bear in mind what they have read and to make connections between dispersed elements for themselves.

Neurath's writing suggests two central rules for generating the vocabulary of an international picture language: reduction, for determining the style of individual signs, and consistency, for giving a group of signs the appearance of a coherent system. Then there needs to be a set of conventions to allow the user to know how the information is structured. Neurath's work was directed towards making statistical charts.
He introduced two basic rules: the first of these related to the presentation of statistics by means of icons, and held that an icon represents a certain quantity or amount of things and that more signs represent a greater quantity or amount. The second was a general rule that perspective should not be used. His work includes a series of posters for an anti-tuberculosis campaign (non-statistical) and the publication of many books and charts.

The Bliss approach is similar to text-based language and susceptible to some of the same pitfalls, but it presents some interesting insights for an international human-human computer system. Isotype in its present form is best suited for statistics and is too restricted for a computer environment, but it could be seen as a precursor of an iconic computer language.

Computer-based icons

The computer provides a significant new environment for the use of icons. Previous systems of icons have been essentially textual: their messages are one-dimensional sequences of icons in which everything is explicit. Each icon has its reference, and icons are placed in sequences (occasionally there are sub- and super-scripts) with blank spaces to indicate grouping.

The typical windows environment on a modern computer provides a much richer environment. First, there are fully two dimensions to exploit, so that any grammar which exploits relative positioning of icons is not restricted in the way that text-based languages have been (though from a practical point of view there are limits on how much one can present at one time). There is also the possibility of using colour (or greyscale) and animation. Most importantly, an icon in a computer environment needs to be defined both representationally and operationally. In addition to asking what an icon represents (pictorially), one can also ask what happens when the user clicks on it, or double-clicks on it, or clicks-and-drags it to some other part of the screen.
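The idea that a computer-based icon is defined both representationally and operationally can be illustrated with a minimal Python sketch. It is not taken from either system described in this paper: the `Icon` class, the gesture names and the 'bed' example are all hypothetical, chosen only to show one picture carrying several uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Icon:
    """An icon with a pictorial side (image) and an operational side (actions)."""
    image: str  # what the icon depicts; a hypothetical image name
    actions: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def perform(self, gesture: str) -> str:
        """Return what happens for a gesture, or a no-response message."""
        handler = self.actions.get(gesture)
        if handler is None:
            return f"{self.image}: no response to {gesture}"
        return handler()


# Hypothetical 'bed' icon: one picture, three different uses.
bed = Icon(
    image="bed",
    actions={
        "click": lambda: "select bed type",
        "double-click": lambda: "show available bed types",
        "drag": lambda: "place bed in a room",
    },
)

print(bed.perform("click"))
print(bed.perform("hover"))
```

In this sketch the picture alone does not fix the meaning of the icon; the table of gestures does part of the work, which is the point made in the surrounding text.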
Here we are opening up a world in which the sentence 'Don't look for the meaning, look for the use' has a new potency. The meaning of an icon is no longer simply what it resembles, but also what happens when you do something with it. This realization gives a dramatic new lease of life to what may appear to have been a marginalized form of communication.

The remainder of this paper describes some of the work we have undertaken so far in this area. It is centred around two example systems: a hotel booking system (developed at the University of Exeter) and CD-Icon (developed at Brighton Polytechnic). While the hotel booking example does not relate specifically to learning technology, the principles on which it is based are relevant to any user-interface design, and CD-Icon has obvious similar relevance.

Example 1 - Hotel booking system

Hotel booking is a typical activity that requires communication across language barriers. It offers us the opportunity to apply iconic language in a simple dialogue between a potential guest and a hotel or city-wide hotel booking facility. In a final system we can envisage a touch-sensitive screen with plenty of interaction, but at this stage of development we are concerned with the initial formulation of a request by the customer that will be transmitted to a hotel for the compilation of a reply. A demonstration system has been built using HyperCard.

The compilation of the booking message is accomplished in stages, and at each stage the current context is cued by a picture resident in the background. In sequence these are: a 'typical' hotel front (Figure 3), a 'typical' hotel reception area (Figure 4), and a 'typical' hotel bedroom (Figure 5). Each new screen holds the background picture before the other information is faded in over it. The initial screen shows a hotel overlaid by an appropriate caption, and clicking anywhere on the image starts the booking sequence.
The first screen invites the user to indicate the intended destination (the name of a town or a hotel) and grade of hotel, by selecting from cyclable 'star' ratings (Figure 3). Movement to the next screen is initiated by clicking on the 'tick' icon, a convention used throughout the package.

The second screen (Figure 4) shows a hotel reception area and invites selection of the dates and times of arrival and departure. The number of nights implied by these dates and times is indicated by black bars which appear (and disappear) as each night is added (or removed). Once again, the 'tick' icon moves the user to the next screen.

The third screen (Figure 5) shows a room overlaid with icons permitting the selection of rooms and their required facilities. A room is shown as a white rectangle. One room is shown initially and the number of rooms can be altered by clicking on the '+' and '-' icons. Four icons at the top right of the screen each unlock further related icons to enable the selection, for each room, of: (a) the number and type of occupants, (b) the number and type of beds, (c) the type of bathroom facilities, and (d) the range of other facilities required. The various features are selected by clicking on an icon, which causes a clone to be produced beside it; the clone is then dragged into the relevant room.

The 'tick' icon moves the user to screen 4 (Figure 6), which displays the complete booking requirement. If this is satisfactory, a further 'tick' sends the message to the hotel.

Figure 3: Screen inviting input of destination
Figure 4: Screen inviting selection of dates and times of arrival and departure
Figure 5: Screen allowing selection of room types and facilities
Figure 6: Screen displaying complete booking requirements
Figure 7: Screen showing availability of 'star' ratings

The message is revealed to the hotel in stages.
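The booking request assembled over these screens can be pictured as a small data structure. The following Python sketch is an illustration only, not the HyperCard implementation: the class and field names, the example dates, and the rule that at least one room always remains are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Room:
    """One white rectangle on the third screen, filled by dragging icon clones."""
    occupants: List[str] = field(default_factory=list)
    beds: List[str] = field(default_factory=list)
    bathroom: str = ""
    facilities: List[str] = field(default_factory=list)


@dataclass
class BookingRequest:
    destination: str  # name of a town or a hotel
    stars: int        # selected 'star' rating
    arrival: date
    departure: date
    # One room is shown initially.
    rooms: List[Room] = field(default_factory=lambda: [Room()])

    def nights(self) -> int:
        """Number of nights implied by the arrival and departure dates."""
        return (self.departure - self.arrival).days

    def add_room(self) -> None:
        """Effect of the '+' icon."""
        self.rooms.append(Room())

    def remove_room(self) -> None:
        """Effect of the '-' icon (assumption: never drop below one room)."""
        if len(self.rooms) > 1:
            self.rooms.pop()


# Hypothetical request: three nights in two rooms.
request = BookingRequest("Exeter", 3, date(1993, 6, 1), date(1993, 6, 4))
request.add_room()
request.rooms[0].occupants.append("adult")
request.rooms[0].beds.append("double")

# One bar per night, echoing the black bars on the second screen.
print("nights:", "|" * request.nights())
```

Keeping the number of nights as a value derived from the two dates, rather than storing it separately, mirrors the way the bars on the second screen follow the dates the user selects.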
Confirmation of the acceptability of each part of the message (tick) moves on to the next part of the message, whilst unavailability (cross) brings up a range of possible alternatives. Figure 7 shows that the required dates of stay are acceptable but the requested hotel grade is unavailable. The hotel is therefore presented with four options to propose in reply. Once the entire message has been processed by the hotel in this way, the final message is sent back to the customer, who will be able to accept or reject the alternatives offered, continue the dialogue, and confirm a booking.

The application, as it stands, does not pretend to be either comprehensive or the most practical solution in real terms, but is an initial attempt to create a simple, interactive, iconic dialogue using hotel booking as a convenient theme. It does, however, offer much that could be used in a real system, and serves its purpose in starting to explore the possibility of communicating with icons.

Example 2 - the CD-Icon language

The CD-Icon system is an attempt to build an iconic communication system based on the principles that underlie natural-language processing systems. The standard way of specifying semantics in such systems is to assume some other system (a Meaning Representation Language, or MRL) for which the semantics are already known (Woods, 1978). The task then becomes that of expressing rules by means of which statements in natural language are transformed into statements in the MRL, and vice versa. Schank's Conceptual Dependency representation (Schank, 1973) is an MRL in this sense, and its own semantics are either taken to be intuitive, or are established by their success in various practical projects, for example MARGIE (Schank, 1975) and PAM (Wilensky, 1981).

CD-Icon is a means of testing the validity of Conceptual Dependency directly by making it the basis of a communication system that uses only icons and no words.
A message is composed by selecting options from a series of interconnected screens (in the spirit of systemic grammar). The message is then transmitted, also as a set of interconnected screens, but not showing options that have not been selected. We will illustrate the system by composing the message equivalent to 'The big man went home'. Figure 8 shows the message expressed in Schank's CD formalism.

Figure 8: Schank's CD representation of 'The big man went home'

In CD-Icon a message is composed in four stages. The first stage is concerned with what Schank calls 'conceptual relations' and 'conceptual tenses'. A screen is presented (Figure 9) which enables the user to select an assertion, a question or an imperative, to choose between simple and compound messages, and to decide upon negation. If the message is compound, the nature of the relationship between the two component conceptualizations is chosen (logical AND, logical OR, implication, temporal or spatial).

Clicking on an icon for a conceptualization transfers control to stage 2, which is typically concerned with an event. Events, according to Schank, are based around a primitive 'act', so the first selection is between icons representing various primitive 'acts' (Figure 10). To assist the user, a Help facility presents a short animated explanation, in the manner of Mealing and Yazdani (1990).

Having selected the appropriate 'act', the corresponding screen is presented. In this example it is the PTRANS screen which is shown in Figure 11. It contains a basic background for PTRANS with grey icons representing the object, origin, destination and instrument cases, as well as tense. (There is a divergence from Schank here in that we do not use the agent case.) Grey icons denote options, whereas black-and-white icons represent selections.
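The PTRANS screen just described can be thought of as a frame of case slots, each of which is either still an option (grey) or a selection (black-and-white). The Python sketch below is a hypothetical model of that idea and not the CD-Icon implementation; the `ActScreen` class, its method names and the string values used for the slots are assumptions made for illustration.

```python
from typing import Dict

# Cases offered on an 'act' screen; following the text, no agent case is used.
CASES = ("object", "origin", "destination", "instrument", "tense")


class ActScreen:
    """A stage 2 screen for one primitive act, e.g. PTRANS."""

    def __init__(self, act: str) -> None:
        self.act = act
        self.selected: Dict[str, str] = {}

    def select(self, case: str, value: str) -> None:
        """Turn a grey option into a selection."""
        if case not in CASES:
            raise ValueError(f"unknown case: {case}")
        self.selected[case] = value

    def shade(self, case: str) -> str:
        """How the icon for a case is drawn on the composing screen."""
        return "black-and-white" if case in self.selected else "grey"

    def transmitted(self) -> Dict[str, str]:
        """The screen as sent: options that were not selected are not shown."""
        return {"act": self.act, **self.selected}


# 'The big man went home', with hypothetical slot values.
screen = ActScreen("PTRANS")
screen.select("object", "big man")
screen.select("destination", "home")
screen.select("tense", "past")

print(screen.shade("origin"))
print(screen.transmitted())
```

The distinction between `shade` and `transmitted` reflects the two views of a message in the text: the composer sees every case, while the receiver sees only what was chosen.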
A grey house, for example, represents the class of places, the grey question mark represents the class of objects, the grey clock represents the class of times, and the grey spade represents the class of instrument cases. Clicking on any one of these icons (except the clock or spade) will result in transfer to stage 3, which is concerned with the production of what Schank refers to as a 'picture'.

Figure 9: Screen for selecting a simple assertive sentence
Figure 10: Selecting a primitive 'act'
Figure 11: The screen for PTRANS
Figure 12: Picture screen for 'big man'
Figure 13: A screen from the lexicon
Figure 14: The four screens representing the complete message

The 'picture' screen initially contains only the head icon (the icon that was clicked on at stage 2). This will eventually be defined along with any modifiers to produce the screen in Figure 12. Clicking on the 'head' icon will result in transfer to stage 4, which is the lexicon, or rather that part of the lexicon which contains objects of the type specified by the grey icon (see Figure 13). The user selects an appropriate icon, in this case the one for 'man', and control is returned to the 'picture' screen (stage 3). There are two changes, however: the selected icon replaces the grey icon, and those classes of object which normally modify the 'head' icon are shown in grey. The user can click on any of these modifying icons and be taken to the appropriate part of the lexicon to select the precise colour, size, location, etc. In our example, there is one modifier, 'big', and the outcome of stage 3 is shown in Figure 12. When the user is satisfied, the tick box is clicked and the grey icons are deleted.
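The interplay between the 'picture' screen (stage 3) and the lexicon (stage 4) can be sketched as follows. This is a hypothetical Python illustration, not the CD-Icon code: the contents of `LEXICON`, the choice of modifier classes and all names are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical lexicon: each class of icon lists the icons it contains.
LEXICON = {
    "object": ["man", "woman", "dog"],
    "size": ["big", "small"],
    "colour": ["red", "blue"],
}

# Assumed classes that can modify an object head.
MODIFIER_CLASSES = ("size", "colour")


@dataclass
class Picture:
    head_class: str                 # class of the grey icon clicked at stage 2
    head: Optional[str] = None      # None while the head icon is still grey
    modifiers: Dict[str, Optional[str]] = field(default_factory=dict)

    def choose_head(self, icon: str) -> None:
        """Stage 4: pick the head from the matching part of the lexicon."""
        if icon not in LEXICON[self.head_class]:
            raise ValueError(f"{icon} is not in the {self.head_class} lexicon")
        self.head = icon
        # Back at stage 3, the modifier classes now appear in grey (None).
        self.modifiers = {cls: None for cls in MODIFIER_CLASSES}

    def choose_modifier(self, cls: str, icon: str) -> None:
        """Click a grey modifier icon and select a value from the lexicon."""
        if cls not in self.modifiers or icon not in LEXICON[cls]:
            raise ValueError(f"cannot set {cls} to {icon}")
        self.modifiers[cls] = icon

    def confirm(self) -> None:
        """Tick box: modifiers that are still grey are deleted."""
        self.modifiers = {c: v for c, v in self.modifiers.items() if v is not None}


picture = Picture(head_class="object")
picture.choose_head("man")
picture.choose_modifier("size", "big")
picture.confirm()
print(picture.head, picture.modifiers)
```

After `confirm`, only the head and the chosen modifier remain, corresponding to the 'big man' picture of Figure 12.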
Control is passed back to the PTRANS screen (stage 2) with an icon composed of the head icon from the picture plus an asterisk in the top right corner. If this new icon is selected in Help mode, the full iconic representation will be displayed. The process is repeated for all icons which the user decides to specify.

Temporal reference is established with respect to an imported clock icon representing 'now' (set at 6 o'clock). A clock internal to the 'act' can be set at 'past', 'present' or 'future' (3, 6, or 9 o'clock). In our example, the past tense is used.

The instrument case is handled differently. In Schank's system the instrument case never points to an object, but always to a conceptualization. In CD-Icon, control will be passed recursively to stage 2 to devise a new conceptualization, which will be represented by a new icon (a black spade with an asterisk) by means of which the instrument case can be made explicit.

When the user is satisfied, clicking on the tick box will return control to the message-level screen (stage 1) with an icon for PTRANS plus an asterisk in the top right corner. The final message will be represented by the four interconnected screens shown in Figure 14.

At present the system is being used to explore the validity of MRLs and the possibility of unrestricted communication by icons. We hope soon to have a system that will allow users to try to compose and understand simple messages, at which point empirical testing will take place.

Future directions

These two projects have already raised some important issues. We need to distinguish different communicative tasks, for example to distinguish between an iconic system that serves as a front-end to existing computer software and an iconic system for person-to-person communication. This distinction seems to mirror the distinction between systems that have a known MRL and those that do not.

The systems also raise the possibility of a complete escape from linguistic forms.
At present there is a tendency to explain them with reference to linguistic examples: that is to say, one tries to compose a message that corresponds to a sentence which has already been formulated in English. The intention is, however, to escape from this and view the systems as communication channels in their own right. The output should not be verbalized, except perhaps in some indirect way in order to test the degree to which communication has taken place.

This raises the question of the form of the communication itself. In the hotel booking system it is a single screen, whereas in CD-Icon it is a set of connected screens. We are considering, particularly for larger texts, whether an animated screen might be more appropriate. In the case of our second example, the 'man' icon could appear moving to a house. This would be no ordinary animation, for the actors would be iconic and the animation could be stopped at any point and icons selected to reveal more information about themselves.

Notes

1. Cross-language communication means communication irrespective of the language spoken by the participants. It is distinguished from cross-cultural communication, which raises a number of specific problems; while we can see how cross-cultural issues might be addressed, to date little research has been carried out in the area.
2. This system of language learning has been commercially exploited (Gruneberg 1987-1992, Gruneberg and Jacobs, 1991).

References

Adams, D. (1979), The Hitch-Hiker's Guide to the Galaxy, London, Pan Books.
Bliss, C.K. (1965), Semantography, Australia, Semantography Publications.
Gruneberg, M. (1987-1992); Gruneberg, M. and Jacobs, G. (1992), Linkword Language System (various languages), London, Corgi Books.
Gruneberg, M. and Jacobs, G. (1991), 'In defence of Linkword', Language Learning Journal, 3, 25-29.
Kobayashi, K. (1986), Computers and Communications, Cambridge (Mass), MIT Press.
Mealing, S. (1992), 'Talking pictures', Intelligent Tutoring Media, 2, 2, 63-69.
Mealing, S. and Yazdani, M. (1992), 'A computer-based iconic language', Intelligent Tutoring Media, 1, 3, 133-36.
Neurath, O. (1978), International Picture Language, University of Reading, Department of Typography and Graphic Communication.
Schank, R.C. (1973), 'Identification of conceptualisations underlying natural language' in Schank, R.C. and Colby, M.C. (eds), Computer Models of Thought and Language, San Francisco, Freeman, 187-247.
Schank, R.C. (1975), Conceptual Information Processing, New York, North-Holland.
Wilensky, R. (1981), 'PAM' in Schank, R. and Riesbeck, C. (eds), Inside Computer Understanding, New Jersey, Lawrence Erlbaum, 136-179.
Woods, W. (1978), 'Semantics and quantification in natural language question answering' in Yovits, M. (ed), Advances in Computers (17, 2-64), New York, Academic Press.