ieee paper template in a4 (v1) international journal on advances in ict for emerging regions 2011 04 (03) :50 55 towards designing a routing protocol for opportunistic networks thabotharan kathiravelu, nalin ranasinghe and arnold pears  abstract — intermittently connected opportunistic networks experience frequent disconnections and shorter contact durations. therefore routing of messages towards their destinations needs to be handled from various points of view. predictability and connectedness are two features which can be determined by participating mobile nodes of an opportunistic content distribution network using their past contacts. epidemic or probabilistic routing protocols such as prophet do not fully utilize these features to route messages towards their destinations. in this paper we describe the design, implementation, experiment set up and the performance validation of a new, adaptive routing protocol which utilizes the predictability and connectedness features to route messages efficiently. simulation based comparative studies show that the proposed routing protocol outperforms existing epidemic and probabilistic routing protocols in delivering messages. index terms — opportunistic networks, adaptive routing protocol, prophet routing protocol, epidemic routing protocol i. introduction intermittently connected opportunistic networks operate in scenarios where wireless mobile devices establish pair wise contacts and exchange content of interest [3, 11, 12]. routing and forwarding of messages towards their destinations in this type of networking is a challenging task due to the uncertainty of node mobility and frequent disconnections between node pairs [4, 16]. there exist few routing protocols that try to maximize the utilization of contact opportunities and forward messages towards their destination nodes. these approaches vary from flooding the network with redundant messages [17] to forwarding and routing copies of messages through probabilistically determined paths [13]. all of these approaches have their own pros and cons and there is a real need to route packets towards destinations in an optimized way that utilizes the minimum resources available in the nodes and maximizes the chances of successful delivery of messages. our studies manuscript received march 01, 2011. recommended by dr. abhaya induruwa on october 13, 2011. thabotharan kathiravelu was with uppsala university, box 337,751 05 uppsala, sweden. (e-mail: thabo@it.uu.se) nalin ranasinghe is with university of colombo school of computing, colombo 07, sri lanka. (e-mail: dnr@ucsc.cmb.ac.lk) arnold pears is with uppsala university, box 337,751 05 uppsala, sweden. (e-mail: arnold.pears@it.uu.se) [10] indicate that there exists a need to address the shortcomings of existing protocols and to propose newer protocols that can better fit in this type of networking environments. in this paper, we describe our proposal for a new adaptive probabilistic routing protocol that utilizes each node’s predictability and connectedness [10] information to route messages towards their destinations. nodes determine their predictability and connectedness with their neighbors by applying probabilistic estimations on their past contact history with such nodes. 
nodes then propagate their predictability and connectedness of contacts information with their neighbors and this information is utilized by the neighboring nodes to choose the best forwarder node for their messages when they generate traffic or when they store, carry and forward messages for their neighbors. since our approach uses the predictability and connectedness of node contacts and uses this information to determine the best contact opportunities and the best message forwarders in order to forward messages towards their destinations, our approach is different from other approaches, especially from the probabilistic approach mentioned in [13]. our approach makes the best use of information available in the nodes and therefore maximizes the chances of message delivery. contact predictability, connectedness and contact quality information are maintained in each node in the form of contact history. based on the contact history, nodes make contact predictions about future meetings with their peer nodes with given levels of certainty. these simple predictions drive the opportunistic forwarding mechanism: they are used by nodes in selecting the best node to forward messages for others. nodes only forward messages when they opportunistically meet the best forwarder node for that message. for the validation of our adaptive probabilistic routing protocol we use connectivity models [9] to regenerate pair wise node connectivity with given levels of confidence of predictability and connectedness of contacts within the given time intervals using the extracted probabilistic properties of real field trace set [12]. we then use simulation based experimental studies to model node connectivity and traffic generation and then collect experimental results. our experimental evaluations show that our adaptive routing thabotharan kathiravelu, nalin ranasinghe and arnold pears 51 december 2011 international journal on advances in ict for emerging regions 04 protocol outperforms the other two protocols with various performance indicators. ii. background proposing and designing routing protocols has received enormous attention from the mobile ad hoc networks (manet) research community. traditional manet routing protocols try to find an end to end path to route packets form source node to destination node and use proactive and reactive routing techniques to establish paths before forwarding packets towards their destinations. the application of routing protocols such as ad-hoc on demand distance vector routing protocol [15] and dynamic source routing protocol [8] which assume the availability of an end to end path between the source node and the destination node do not perform well in the presence of intermittent connectivity. the epidemic routing protocol is one of the earliest protocols for routing in dtns [17]. the basic idea is very simple: source nodes of messages and intermediate nodes flood messages to all their neighbors to mitigate the effects of a single path failure, so that, eventually the message may arrive at the destination node. messages are quickly distributed through the neighborhood, but significant resources from network and nodes may be wasted in this process. this approach can achieve high delivery ratios, and does not need a previous knowledge of the network [17]. the prophet protocol [13] estimates the delivery probability based on the history of encounters. a metric called delivery predictability p(a,b) ϵ [0,1], is calculated for every node a for each known destination b. 
suppose that a node has a message m, for the destination d. when a contact occurs between a pair of nodes a and b, node a forwards the message m to node b only if b has a greater delivery predictability to the destination d, that is p(a,d) ≤ p(b,d). during the contact, in addition to the exchange of messages the delivery predictability for each node can be updated. the delivery predictability calculation is divided in to three parts. when two nodes meet each other, they immediately update the delivery predictability as shown below: p(a,b) = p(a,b)old + (1 − p(a,b)old) ∗ pinit (1) where pinit is an initialization constant. if, for a period of time, a pair of nodes does not encounter each other, then the delivery predictability metric is updated by the nodes as shown below: p(a,b) = p(a,b)old ∗ γ k (2) where γ is an aging constant and k is the number of the time slots elapsed since the metric was updated for the last time [13]. in pure probabilistic forwarding approaches, when a node has a packet to forward, it chooses a forwarder node independently based on some probabilistic measure and forwards the packet towards that node. this approach does not consider other factors that could influence the forwarding decision and the nodes too do not cooperate with each other in order for the nodes to choose the best forwarder node for any given packet towards its destination. this makes intermittently connected environment more prone to losses during routing and forwarding and increases the chances of network being uncertain of the forwarding possibilities of packets. with epidemic routing it is not possible to guarantee reliable delivery of all messages due to collisions of packets etc., even if most nodes try to forward packets as much as possible. in addition, epidemic routing protocol unnecessarily floods the network with redundant packets. recently few attempts, which try to reduce the problems caused by flooding by means of different forms of controlled flooding etc., to control the number of times a message can be forwarded have been proposed by researchers [7]. message ferries are introduced in a routing plan by [19], where the message ferries travel on a trajectory to provide communication services. in this routing plan two different approaches are employed to bring the nodes and the ferries closer to each other. on one hand the message ferries can choose a trajectory to contact nodes and on the other, nodes themselves can move near to a pre-defined trajectory at a certain time to exchange packets with message ferries. recent work in opportunistic routing environments use erasure coding based techniques to balance message replication. in erasure coding based techniques messages are decomposed into smaller units of packets with redundancy, so that the original messages can be reconstructed even with the reception of a subset of the smaller units [5, 18]. in one of our previous studies we modeled the opportunistic network to possess two high level properties of predictability and connectedness [10]. by the application of predictability and connectedness information based on their past history, nodes can predict their future contact opportunities with certain level of confidence. please refer [10] for the definition of predictability and connectedness information in opportunistic networks. 
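To make the PRoPHET update rules in equations (1) and (2) concrete, the following short Python sketch applies them to a per-node table of delivery predictabilities. This is an illustrative sketch only: the constants P_INIT and GAMMA and the class layout are our own assumptions, not values or code from [13] or from this paper.

# Illustrative sketch of the PRoPHET delivery-predictability updates
# (equations (1) and (2) above). P_INIT and GAMMA are example constants;
# the original protocol [13] treats them as tunable parameters.
P_INIT = 0.75   # initialization constant p_init
GAMMA = 0.98    # aging constant

class ProphetNode:
    def __init__(self):
        self.delivery_pred = {}   # destination id -> P(a, b)

    def on_encounter(self, peer_id):
        # Equation (1): reinforce predictability when two nodes meet.
        old = self.delivery_pred.get(peer_id, 0.0)
        self.delivery_pred[peer_id] = old + (1.0 - old) * P_INIT

    def age(self, elapsed_time_slots):
        # Equation (2): decay all predictabilities by gamma^k when
        # k time slots pass without an encounter.
        for dest in self.delivery_pred:
            self.delivery_pred[dest] *= GAMMA ** elapsed_time_slots

    def should_forward(self, peer, destination):
        # Forwarding rule: hand a message over only if the peer has a
        # higher delivery predictability for its destination.
        return self.delivery_pred.get(destination, 0.0) < \
               peer.delivery_pred.get(destination, 0.0)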
In order to measure and analyze the applicability of predictability and connectedness information on opportunistically connected nodes, we have chosen an industrial command and control networking system where nodes generate a heterogeneous mix of data traffic throughout the operational time, with different bandwidth and forwarding requirements [10]. The application of a traditional ad-hoc routing protocol in modeling the system with the two high level properties and measuring the performance metrics reveals that traditional ad-hoc network routing protocols operate in a memoryless fashion and do not utilize the underlying predictability and connectedness information. Another reason is that, because of frequent network partitions in opportunistic networking environments, many traditional routing techniques for MANETs will not work properly [2, 6]. Please refer to [10] for more descriptive details on the simulation setup, simulation parameters and an in-depth description of the system of this experimental study. Lessons learned from previous studies and other works in the field have led the authors to design more effective routing schemes for opportunistic networks. In this work we present an adaptive, proactive routing protocol for opportunistic networking which can use the history of past contacts and utilize that information to determine the predictability and connectedness of future contacts. Each node uses these two pieces of information to choose the best forwarder node for the data packets it possesses and forwards those packets to such selected nodes with high chances of delivery.

III. The Adaptive Protocol

The adaptive routing protocol makes use of the past history of opportunistic contacts to establish and exchange neighborhood information in order to forward packets towards their destinations. Our basic idea for this new protocol is founded on the following principle: since each node has the ability to determine its predictability and connectedness information with its neighbors using its past history of contacts, nodes do not have to depend on random exchanges and random probabilistic estimations to forward packets to their destinations. Instead, they can determine their expected future contacts with their neighbors with given levels of confidence, maintain this information in their connectivity tables and then propagate this information to whichever nodes they meet, including newer and older neighbors, in the form of connectivity summary vectors. A snapshot of the connectivity table maintained by each node is shown in Table I.

Table I: A snapshot of a connectivity table maintained by a node

Destination node number   First choice   Connection est. time   Connection end time
1                         4              t1s                    t1e
2                         3              t2s                    t2e
4                         4              t3s                    t3e
9                         9              t4s                    t4e
10                        2              t5s                    t5e

Each row of the connectivity table, arranged in increasing order of neighboring node numbers, contains the details of the best forwarder node towards that node, the next expected contact establishment time with the best forwarder node and the expected termination time of the contact with this best forwarder node. If a node has never met another node in the past or has no chance of meeting a particular node, then it maintains the information of the best node which might meet that particular node. By doing this, each node maintains a global view of the network connectivity even though it will not meet a given node in the future.
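As a concrete illustration of the connectivity table just described, the following minimal Python sketch shows one possible in-memory representation together with an update rule that keeps the earliest predicted contact for each destination. The field and class names are our own illustrative choices, not identifiers from the authors' implementation.

# Minimal sketch of a per-node connectivity table; names are illustrative only.
from dataclasses import dataclass

@dataclass
class ConnectivityEntry:
    destination: int     # neighboring / destination node number
    first_choice: int    # best forwarder node towards that destination
    est_time: float      # next expected contact establishment time
    end_time: float      # expected contact termination time

class ConnectivityTable:
    def __init__(self):
        self.entries = {}    # destination node number -> ConnectivityEntry

    def update(self, entry: ConnectivityEntry):
        # Keep the entry whose predicted contact occurs earliest; this mirrors
        # the comparison step of Algorithm 2 below.
        current = self.entries.get(entry.destination)
        if current is None or entry.est_time < current.est_time:
            self.entries[entry.destination] = entry

    def best_forwarder(self, destination: int):
        entry = self.entries.get(destination)
        return entry.first_choice if entry else None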
choosing the best node that can forward packets to its best neighboring nodes increases the chances of delivering packets towards their destination. during opportunistic contacts, nodes first exchange their predictability and connectedness information of their intended neighbor contacts from their connectivity table as a connectivity summary vector. upon receiving the connectivity summary vector from its neighbor, each node compares its own connectivity table against the received connectivity summary vector and updates its connectivity table information, so that its own connectivity table contains the latest information on best forwarders for future packet forwarding. a. neighborhood establishment and message forwarding algorithm when each node receives predictability information from its neighbors in the form of a connectivity summary vector, it first updates its own connectivity table. for each destination, the connectivity table can also contain three best nodes that have the higher predictability value of meeting or forwarding to that destination node. the basic adaptive message forwarding algorithms executed by each node are given in the algorithms described in algorithm 1, algorithm 2 and algorithm 3. algorithm 1 send (connectivity summary table) 1: arrange predictability information in the minimum data structure. 2: exchange connectivity summary table with met neighbours. 3: wait for the next time interval to send if there is an update to the connectivity summary table. algorithm 2 receive (connectivity summary table) 1: compare connectivity summary table with own connectivity table. 2: if any of the received predicted contacts will occur before any of the stored contacts then, 3: update that forwarding information with the newly received contact predictability. algorithm 3 receive (data packets) 1: check if the node itself is the intended destination of the packet. 2: if so then forward it to the upper layers. thabotharan kathiravelu, nalin ranasinghe and arnold pears 53 december 2011 international journal on advances in ict for emerging regions 04 3: if the packet is destined for another node, then do a look up at the connectivity table for the best node to forward, store it in the message queue and, 4: wait for a contact opportunity to occur with that node and forward the packet towards that node 5: if the node itself is the best node to forward the packet to the destination then store the packet in the message queue and wait till a contact occurs with the destination. b. message exchange once a node pair establishes an opportunistic contact session, they first update their connectivity tables by mutually exchanging their connectivity summary vector. the basic steps in this process are: (1). exchange of the connectivity summary vector, (2) updating of their connectivity tables according to the received connectivity summary vector from the other node, (3) decide on which messages to forward to the other node, (4) forward the messages to the other node and also receive messages from the other node. we illustrate these steps in figure 1. fig. 1. nodes exchange connectivity summary vectors periodically with each other. during an opportunistic contact session, when a node receives a packet from another node and if the packet is destined for this node itself, then it passes the packet to the upper layer for processing. if that is not the case, then it will check for the best forwarder node for that packet. 
if the node itself is the best forwarder node, then it will store the packet in its own queue and will wait for an opportunistic contact session with that destination node to happen. if some other node is the best forwarder node then this node will store the packet in its own queue and will wait for an opportunistic contact session to happen with that best forwarder node. fig. 2. the arrangement of 12 nodes into 3 clusters and their wireless contacts [10]. iv. experimental setup we use simulation based experimental studies to compare the performance variations between our adaptive protocol, the prophet routing protocol [13] and the epidemic routing protocol [17]. we use the jist/swans discrete event driven simulator [1] to model these protocols and carry out our experiments and collect statistics. for our simulation based studies, we choose a scenario set up similar to the industrial command and control system as shown in figure 2 and consider three clusters of nodes, where each cluster consists of four nodes. in this regard, we simulate the traffic generation according to the traffic requirements given in [10], and implement connectivity modeling, as described in [9], to model and simulate the intermittent connectivity in the network. since it has been observed that in typical rescue scenarios the rescue team members are expected to make regular contacts for every 3 to 5 minutes. in all our experimental studies we select the contact and inter-contact durations to vary in the time interval of 3 to 5 minutes and carryout our simulation based experiments. with the chosen time interval of 3 to 5 minutes, we considered two test cases for the adaptive protocol. in the first test case the predictability value is kept at 90% and the connectedness value is kept at 50%. in the second test case the predictability value is kept at 50% and the connectedness value is kept at 50%. for the prophet routing protocol, we consider the experiment set up parameters described in [14], and for the epidemic routing protocol we model the protocol as described in [17]. 54 thabotharan kathiravelu, nalin ranasinghe and arnold pears international journal on advances in ict for emerging regions 04 december 2011 a. queuing and forwarding policies an important resource in small mobile devices is the buffer space available. in the presence of intermittent connectivity and store, carry and forward paradigms, devices carry messages for their neighbors until they find a suitable forwarder or even until the ultimate destination is found. in addition to this, nodes themselves too generate traffic of data periodically with the hope that they will encounter a potential neighbor who can forward these packets to their destination. since the buffer space can easily add up due to the frequent disconnections and longer inter-contact time intervals, buffer space should be maintained by adapting suitable buffer maintenance policies. devices may purge messages that are destined for their neighbors which they do not expect to meet very soon. for our experimental studies we considered the following two queuing policies: • default (nopo) in this queuing policy, when the buffer is full, all future packets are simply dropped till there is any space to accommodate any arriving packets. • mofo (drop the most forwarded message first) in this queuing policy, the message that has been forwarded the most will be dropped when there is a congestion [14]. 
• shli (drop the shortest life time first) this policy tries to drop the packet that has the shortest life time which is specified in the message [14]. when nodes meet each other in an intermittently connected fashion, they need to maximize the chances of forwarding packets from their buffers. therefore they select an optimal node to forward the packet and have to decide which packets to forward towards the encountered node. if there are messages that are destined for the encountered node, then those messages are first forwarded to the encountered node irrespective of the forwarding policy. in our experiments we use the greater predictability forwarding policy and in this forwarding policy, a message is forwarded to the other node if the contact predictability of this node is lesser than the encountered node for the given destination and exchanging summary vectors at the beginning of an opportunistic contact session helps each node to determine this information [14]. b. performance indicators to measure system performance under different connectedness and predictability constraints and correctly identify system dependability for delivering content towards the intended recipients, we define and use the following performance indicators: • number of messages delivered: the number of messages that have been received at the destination. calculating this value leads to the estimation of the message delivery percentage. • queuing policies: the three different queuing policies described above have been chosen and the number of messages delivered by the protocols under these queuing policies is also considered. • larger queue size: extending the queue size to some larger value to observe the number of messages delivered. v. case studies a. study1 in this study for each of the queuing policies mentioned above we have considered queue sizes of 20, 40, 60, 80 and 100 messages in each of the simulation run. with the considered simulation set up parameters we ran each of our simulation tests for a time period of six hours and have collected packet delivery statistics. results and discussions: in the sub figures 3(a), 3(b) and 3(c) of figure 3 we plot the message delivery count for each of the queuing policy separately. first of all, a general observation can be made from these three graphs with different queuing policies. it is obvious to note that the adaptive protocol outperforms the epidemic and prophet routing protocols in message delivery count in all cases with increasing queue sizes. this confirms our hypotheses of using the past history information in order to select the best future forwarder and thus achieves highest message delivery. for the two cases of the adaptive protocol mentioned previously, the test case with the predictability value of 90% and the connectedness value of 50% performs better than the case with the predictability value of 50% and the connectedness value of 50% as we expected. since the confidence level is higher in the first case, it is obvious that it achieves a higher value of message delivery count. surprisingly under the same testing conditions epidemic routing protocol performs better than the prophet routing protocol. the number of messages delivered by the adaptive and the epidemic protocols are considerably high when compared to the prophet protocol. among the considered queuing policies, mofo policy shows the best performance for the queue sizes considered when compared to the other two queuing policies. 
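The drop rules behind these queuing policies can be sketched as follows; this is our own illustration of the described behavior, not the simulator code used in the experiments.

# Illustrative sketch of the NOPO, MOFO and SHLI drop policies for a
# bounded message buffer; not the simulator code used in the experiments.
class Message:
    def __init__(self, msg_id, ttl, forward_count=0):
        self.msg_id = msg_id
        self.ttl = ttl                       # remaining time to live
        self.forward_count = forward_count   # times this copy was forwarded

class MessageQueue:
    def __init__(self, capacity, policy="MOFO"):
        self.capacity = capacity
        self.policy = policy
        self.buffer = []

    def enqueue(self, msg):
        if len(self.buffer) >= self.capacity:
            if self.policy == "MOFO":
                # Drop the most-forwarded message first.
                victim = max(self.buffer, key=lambda m: m.forward_count)
            elif self.policy == "SHLI":
                # Drop the message with the shortest remaining lifetime.
                victim = min(self.buffer, key=lambda m: m.ttl)
            else:
                # Default (NOPO): simply drop the arriving message.
                return False
            self.buffer.remove(victim)
        self.buffer.append(msg)
        return True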
since mofo policy drops the most forwarded messages from its queue when there is congestion for buffer space, it ensures that the least forwarded messages get their chance to be forwarded and hence achieves the increasing number of messages delivered. when looked more closely, the just drop (nopo) queuing policy performs equally with the shli policy. this is interesting to observe that since shli considers the time to live values of messages and even then could not achieve a better performance when compared to the just drop policy. in order to analyze the effect of shli policy on different traffic types we considered the message delivery counts of type1, which is assumed to have the smallest time to live (ttl) value, and the type5, which is assumed to have the largest ttl value. figure 4 shows the results of this case thabotharan kathiravelu, nalin ranasinghe and arnold pears 55 december 2011 international journal on advances in ict for emerging regions 04 study. here we can observe that the larger ttl value of type5 enables the prophet protocol to show a performance increase with the increase of queue size, while the epidemic routing protocol and the two cases of the adaptive protocol show a similar behavior with the increase of ttl values. this clearly shows that our adaptive protocol not only achieves higher performance when compared to the other two protocols but also tries to deliver all different traffic types as quick as possible. b. study2 having looked at the impact of increasing queue sizes and the grater predictability forwarding policy in the presence of varying connectedness and the predictability combinations, we are now interested in finding the effect of increasing the queue size to a very large number for all three routing protocols. for this case study we have considered the queue size of 500 messages since it provides an extremely large buffer space for messages and will allow the routing protocols to cache as much massages as possible. results and discussion: the three dimensional figure 5 shows the message delivery count for this test with varying predictability and connectedness values along the x and the y axes respectively. in this figure we can clearly see that for all three protocols as the level of confidence increases along the x and the y axes, the number of successfully delivered messages also steadily increases. as it has already been observed in case study 1, the prophet protocol comparably shows the worst performance and the epidemic protocol shows the second worst performance. our adaptive protocol shows the best performance compared to the other two. of course it is a 56 thabotharan kathiravelu, nalin ranasinghe and arnold pears international journal on advances in ict for emerging regions 04 december 2011 good sign that there is a tremendous increase in message delivery by increasing the queue size to such a larger value. but having a queue size of 500 messages is too expensive for small devices used in opportunistic networking and is not practically feasible at the moment. since more and more extra storage space being added to such small devices because of the technological innovation and the price drop, allocating larger spaces for queues will soon be a reality. vi. conclusions and future work in this paper we have described the design principles of a new adaptive protocol for opportunistic networking that can utilize the high level properties of opportunistic networking. 
we have implemented the adaptive protocol in the jist/swans simulator and have measured its performance under different resource constrained scenarios. we have also implemented two well known protocols in the field and have measured the performance of them under the same testing environments. by considering various queuing policies for packet buffering and the greatest predictability forwarding policy we have shown that the new protocol outperforms the other two well known protocols currently used and the new protocol makes the best effort in delivering the maximum number of messages. we have also found that the mofo queuing policy combined with the greater predictability forwarding policy gives the maximum number of message delivery. our experiments indicate the need to identify and utilize system resources in a best way to maximize the system performance. we have also observed that increasing the queue size to some larger value also increases the message delivery ratio extensively. in this case too, our adaptive routing protocol outperforms the other two protocols. even though there are serious concerns about allocating such higher buffer spaces for queues, in the near future having such larger queue sizes will be a reality. one of our priorities in the list of future works is to investigate how nodes could use the adaptive protocol to avoid congestion at most centric nodes. we are also planning to put more effort in investigating how the protocol could be modified to consider messages with various priorities, sizes and other constraints. references [1] r. barr, z. j. haas, and r. van renesse. jist: an efficient approach to simulation using virtual machines. software practice & experience, 35(6):539–576, may 2005. [2] s. burleigh, a. hooke, l. torgerson, k. fall, v. cerf, b. durst, k. scott, and h. weiss. delay-tolerant networking: an approach to interplanetary internet. ieee communications magazine, july 2003. [3] a. chaintreau, p. hui, j. crowcroft, r. g. c. diot, and j. scott. pocket switched networks: real-world mobility and its consequences for opportunistic forwarding. technical report, ucam-cl-tr-617, university of cambridge, computer lab, february 2005. [4] a. chaintreau, p. hui, j. c. c. diot, r. gass, and j. scott. impact of human mobility on the design of opportunistic forwarding algorithms. in proceedings of the ieee infocom 2006, pages 1–13, barcelona, spain, april 23-29 2006. [5] l.-j. chen, c.-h. yu, t. sun, y.-c. chen, and h.-h. chu. a hybrid routing approach for opportunistic networks. in proceedings of the 2006 sigcomm workshop on challenged networks, chants ’06, pages 213–220, new york, ny, usa, 2006. [6] k. fall. a delay-tolerant network architecture for challenged internets. in proceedings of the sigcomm’03, august 2003. [7] k. harras and k. almeroth. controlled flooding in disconnected sparse mobile networks. wireless communications and mobile computing (wcmc) journal, 9(1):21–23, january 2009. [8] d. b. johnson and d. a. maltz. dynamic source routing in ad hoc wireless networks. in mobile computing, pages 153–181. kluwer academic publishers, 1996. [9] t. kathiravelu, a. pears, and n. ranasinghe. connectivity models: a new approach to modelling connectivity in opportunistic networks. in proceedings of the 8th international information technology conference iitc2006, colombo, sri lanka, october 12-13 2006. [10] t. kathiravelu, a. pears, and n. ranasinghe. evaluation of the impact of opportunistic networking on command and control system performance. 
in proceedings of the next generation wireless networks (ngws 2009) (http://www.jfn.ac.lk/faculties/science/depts/compsc/ngws09.pdf), melbourne, australia., october 12-13 2009. [11] t. kathiravelu and a. n. pears. what & when: distributing content in opportunistic networks. in proceedings of the international conference on wireless and mobile computing (icwmc 2006), bucharest,romania, july 2006. [12] j. leguay, a. lindgren, j. scott, t. friedman, and j. crowcroft. opportunistic content distribution in an urban setting. in proceedings of the acm sigcomm 2006 workshop on challenged networks (chants’06), pisa, italy, september 2006. [13] a. lindgren, a. doria, and o. schelen. probabilistic routing in intermittently connected networks. in proceedings of the fourth acm international symposium on mobile ad hoc networking and computing (mobihoc 2003), june 2003. [14] a. lindgren and k. s. phanse. evaluation of queuing policies and forwarding strategies for routing in intermittently connected networks. in proceedings of the first international conference on communication system software and middleware (comsware2006), january 2006. [15] c. e. perkins and e. royer. ad-hoc on-demand distance vector routing. in proceedings of the 2nd ieee workshop on mobile computing systems and applications (wmcsa), pages 90–100, 1999. [16] l. song and d. f. kotz. evaluating opportunistic routing protocols with large realistic contact traces. in proceedings of the chants ’07 workshop, montreal, quebec, canada., september 2007. [17] a. vahdad and d. becker. epidemic routing for partially connected ad hoc networks. technical report, duke university, april 2000. [18] y. wang, s. jain, m. martonosi, and k. fall. erasure coding based routing for opportunistic networks. in proceedings of the 2005 acm sigcomm workshop on delay-tolerant networking (wdtn ’05), 2005. [19] w. zhao, m. ammar, and e. zegura. a message ferrying approach for data delivery in sparse mobile ad-hoc networks. in proceedings of the 5th acm international symposium on mobile ad hoc networking and computing, mobihoc ’04, pages 187–198, new york, ny, usa, 2004. http://www.jfn.ac.lk/faculties/science/depts/compsc/ngws09.pdf ieee paper template in a4 (v1) international journal on advances in ict for emerging regions 2021 14 (3): international journal on advances in ict for emerging regions july 2021 neural machine translation approach for singlish to english translation dinidu sandaruwan#1, sagara sumathipala2, subha fernando3 abstract— comprehension of “singlish” (an alternative writing system for sinhala language) texts by a machine had been a requirement for a long period. it has been a choice of many sri lankan’s writing style in casual conversations such as small talks, chats and social media comments. finding a method to translate singlish to sinhala or english has been tried for a couple of years by the research community in sri lanka and many of the attempts were tried based on statistical language translation approaches due to the challenge of finding a large dataset to use deep learning approaches. this research addresses the challenge of preparing a data set to evaluate deep learning approach’s performance for the machine translation activity for singlish to english language translation and to evaluate seq2seq neural machine translation (nmt) model. the proposed seq2seq model is purely based on the attention mechanism, as it has been used to improve nmt by selectively focusing on parts of the source sentence during translation. 
the proposed approach can achieve 24.13 bleu score on singlishenglish by seeing ~0.26 m parallel sentence pairs with 50 k+ word vocabulary. keywords— singlish, nmt, language processing, seq2seq, lstm, attention model, word embedding i. introduction most of the languages use their own alphabet for writing, and we call it as a writing system for any language. we have seen latin (roman) script is being used to write many modern-day languages. it is the most popular writing system in the world today. it can be observed that this particular writing method has become an alternative writing system, especially in social media for some of the languages such as hindi, tamil, urdu, serbian and bosnian. many researchers are currently working on building models to analyze alternative writing systems as it has been trending in social media [1], [2]. in the sri lankan context, we have also seen that people tend to write sinhala in latin script (english alphabet) most of the times, and when they communicate with natives, and they call it “singlish”. the most significant issue of using singlish is the unavailability of a standard way to write singlish. in most of the occasions people use their own choice of ways to write singlish. but it can be observed that the singlish is a way of writing the sinhala pronunciation with english alphabet. the motivation for this research comes with the inability to interpret the texts written with alternative writing systems like singlish in certain circumstances. for example, many social media platforms give you an option to translate the texts written in different languages to english if you do not understand the original written language. currently, there is no option available to translate something written in an codemixed languages such as singlish, tanglish as those writing patterns are not recognized as standard languages. on the other hand, especially in the countries in which this type of writing systems is popular, struggle to analyze social media data as there are no language models implemented. ii. challenges • code mixed nature most significantly, when looking at a text written in singlish, it can be observed that a mixture of english and sinhala words are included in the text. furthermore, sometimes a singlish word could be an english word already existing in its vocabulary with a different meaning. one of the challenges is to differentiate the words by language. most of the times, it can be determined based on the context of the sentence and the position of it within the sentence. • diversity of writing the singlish writing pattern can be changed from person to person, based on how they spell sinhala pronunciation in english alphabet. that is also a challenge that needs attention to resolve. • lack of availability of resources there is no publicly available parallel dataset to be used in a deep learning approach as of now for singlish and english. it is an important requirement to develop a web crawler and additional supportive scripts to create a parallel dataset for training and testing. and also it is important to evaluate a language translation model for the translation activity for texts written in singlish. even though this has been tried for a couple of years, many of the attempts were based on statistical language translation approaches due to the challenge of finding a large dataset to use deep learning approaches [3]. iii. 
traditional machine translations machine translation is a subfield of computational linguistics, which studies how to use software to translate text correspondence: dinidu sandaruwan#1 (e-mail: iamdinidu@gmail.com) received: 03-05-2021 revised:26-07-2021 accepted: 28-07-2021 dinidu sandaruwan#1, sagara sumathipala2, subha fernando3 are form department of computational mathematics, university of moratuwa sri lanka. (iamdinidu@gmail.com, sagaras@uom.lk, subhaf@uom.lk) doi : http://doi.org/10.4038/icter.v14i3.7230 © 2021 international journal on advances in ict for emerging regions neural machine translation approach for singlish to english translation 2 july 2021 international journal on advances in ict for emerging regions from one natural language to another .the machine translation has been evolved and rapidly developed since 1950s to now, with different approaches and techniques [4]. the development of machine translation can be seen in three main branches of rule-based, statistic-based and neuro machine translation. in rule-based machine translations, we can see there are three main approaches as direct systems (dictionary based machine translation) map input to output with basic rules. the rbmt (rule-based machine translation) system uses morphological and syntactic analysis, while the bilingual rbmt system (interlingua) uses abstract meaning. but there are so many shortcomings in this approach. in earlier days, the difficulty of finding good dictionaries and development of a dictionary was also costly and yet certain linguistic information still needs to be processed manually. and also, the interaction of rules, ambiguities and idioms in large-scale systems are difficult to deal with. again, it fails to adapt to new domains. when compared with the rule-based approach, statistic-based approach has significant improvements as statistical mt performs better when large and qualified corpora are available. the translation is fluent, which means it reads well and therefore meets user expectations. however, the translation is neither predictable nor consistent. the training of high-quality corpus is automated, and the cost is low. however, the training on the universal language corpus (i.e., texts other than specified domains) is poor. in addition, statistical mt requires a lot of hardware to build and manage large-scale translation models. but statistical machine translation techniques are being used for many low resource languages [5]. however, the common issue of this type of traditional machine translation is that to build the model can be seen as the need of expertise knowledge of both the source and target language. iv. neural machine translation before getting into the details, it might be worth describing the terms “neural machine translation”. neural machine translation (nmt) is the latest method of machine translation and is said to produce much more accurate translations than statistical machine translation methods and it is a way to learn more sophisticated functions to improve the accuracy of our probability estimates with less feature engineering. [6]. nmt is based on the neural network model and sends information to different “layers” for processing before output. nmt uses deep learning techniques for self learning to translate text based on existing statistical models. it helps to build a translation model without having an expert knowledge about the languages. 
also, self-learning leads to a faster translation with a quality output compared to the statistical method of machine translation. nmt uses algorithms to learn language rules on its own. the most notable advantage of nmt is its speed and quality. many researchers say that the nmt is the way of the future, and there is no doubt that the process will continue to improve its capabilities. figure 1 shows the visualization of some famous nmt models and the various changes suggested by the researchers over time [7]–[11]. figure 1: classification of nmt models as shown in the figure 1, nmt can be categorized in to two main branches as “seq2seq” (sequence to sequence) and “transformers”. when selecting an appropriate nmt approach for a particular language transtlation, it is important to discuss capabilities of each approach before making a decision to select one over the other. a. seq2seq (sequence to sequence) sequence-to-sequence (seq2seq) models have been a great success for various nlp tasks such as machine translation, speech recognition, and text summarization. seq2seq is relatively a new paradigm, with its first published usage in 2014 [12]. at a high level, a seq2seq model consists of two recurrent neural networks, one as encoder and the other one as the decoder. the encoder is responsible for processing each item in the input sequence and converges the information it collects in to a separate vector called context vector. once the input sequence is fully processed by the encoder, it passes the context vector to the decoder. decoder starts producing the output sequence item by item. some of the seq2seq (nmt) models consist of a standard, two-layer, bidirectional lstm encoder with an attention layer and, two-layer unidirectional lstm decoders. in terms of performance, such models look better than the standard encoder-decoder architecture. figure 2 : highlevel encoder decoder architecture nmt model consists of two main recurrent neural networks: the encoder rnn only consumes the input source word sequence without any prediction. on the other hand, the decoder processes the target sentence while predicting the next https://github.com/thunlp-mt/mt-reading-list 3 d. sandaruwan#1, s. sumathipala2, s. fernando3 international journal on advances in ict for emerging regions july 2021 word. this simply means that the encoder converts the source sentence into a "meaning" vector, and then passes it through the decoder to generate a translation. b. transformer seq2seq models with recurrent neural networks (rnns) like long short term memory (lstm) networks were becoming more popular and securing their well-earned reputation for quite a while until they were recently challenged by a new commer on the field of machine translation something called a transformer [13]. it’s a very innovative concept and is addressing two major weaknesses in rnns: • rnns are not parallelizable as the output of the particular step mainly depends on the output of the previous step. • rnns struggle to maintain long-term language dependencies because it sees only the memory from the previous step. in order to get a better understanding of the transformer model, let’s look at the architecture of a transformer model which is built and trained to do a translation of a given sentence in a particular language to another language. note that here we will not be discussing all the bits and pieces of the transformer model, but just enough to understand how it differs from the seq2seq model. 
The transformer is also an encoder-decoder model. The encoder and decoder consist of several layers, and each layer is made up of two types of sub-layers: self-attention and fully connected layers. In addition, the decoder must contain a softmax layer, since it must generate probabilities over the vocabulary of the target language for each position. The self-attention layer is the revolutionary concept of the transformer model, as it allows the model to look at all other words while processing a single word in the sequence. There is nothing special about the fully connected layer: it simply takes the outputs of the self-attention layer and creates a hidden representation for each word. As we can see, none of the sub-layers involve sequential computations or recurrent units waiting for the output of the previous step, as is the case with an LSTM. This reduces the need for the model to maintain a memory state as in LSTMs; the transformer can thus calculate the outputs for all time steps at the same time. Since the self-attention sub-layer also sees all the other inputs, it becomes trivial to capture long-term dependencies over long chunks of text. Comparing the seq2seq model with the transformer model, it is very obvious that the transformer model addresses a couple of main drawbacks of seq2seq, and it has been consistently shown that transformer models almost always outperform sequential models. The question, however, is whether transformer models can be applied in the context of every machine translation environment, as they are known to be heavy models that are not well suited to low resource environments. The original transformer models are quite large, and they also require comparatively large data sets, which is again challenging for low resource languages. E.g. BERT (Bidirectional Encoder Representations from Transformers): in 2019, Google AI introduced a new language model for natural language processing with a revolutionary attention engine called BERT [14]. By design, the model can see the context from both the left and right sides of each word of a given sentence. This model deals with 300M parameters, which limits the ability to use it in low resource environments. Simple seq2seq models with LSTMs, on the other hand, can be used at a fraction of the memory these massive models occupy. Seq2seq models are also easy to prototype and understand; compared to a transformer model, setting up a seq2seq model is comparatively easier. If the focus is to do a feasibility study of machine translation for a language pair which has not previously been tested with any NMT approach, a seq2seq model is a much better approach to get started with.

C. Seq2seq model with attention mechanism

Further study of the seq2seq model revealed some advanced features that can be used along with seq2seq models to improve their performance. The "attention mechanism" was first introduced by Bahdanau [6] and later refined by Luong in 2015 and others [15]. The main idea of the attention mechanism is to create direct links between target and source by "paying attention" to relevant source content during translation. The attention alignment matrix is a byproduct of the attention mechanism and is an easy way of visualizing the alignment between the source and target sentences.
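As a rough numerical illustration of this idea, the NumPy sketch below computes the attention weights and context vector for a single decoder step using a simple dot-product (Luong-style) score; stacking the weight rows over all decoder steps gives the kind of alignment matrix visualized in Figure 3. The shapes and names here are our own assumptions, not the paper's implementation.

# Illustrative single-step dot-product attention; shapes and names are assumptions.
import numpy as np

def attention_step(decoder_state, encoder_states):
    # decoder_state: (hidden,), encoder_states: (src_len, hidden)
    scores = encoder_states @ decoder_state      # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax -> one alignment row
    context = weights @ encoder_states           # weighted average of source states
    return weights, context

rng = np.random.default_rng(0)
weights, context = attention_step(rng.standard_normal(4), rng.standard_normal((5, 4)))
print(weights.round(3), context.shape)           # alignment row and (4,) context vector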
figure 3 : alignments between source and target sentences (singlish to english) https://arxiv.org/abs/1409.0473 https://arxiv.org/abs/1508.04025 https://arxiv.org/abs/1508.04025 neural machine translation approach for singlish to english translation 4 july 2021 international journal on advances in ict for emerging regions it is important to look at some other popular machine transtalions other than the two main branches we discussed so far in this paper. one out of other approaches which known to be popular was the meta learning approach. metamt is one of the models that is popular among low resource language translations. d. metamt metamt is a meta-learning method proposed as a solution for low resource languages [16]. nmt model with a new word embedding transition technique for fast domain adaptation. splits parameters in the model into two groups: model parameters and meta parameters. domain adaptation of the machine translation model to low-resource domains using multiple translation tasks on different domains. it proposes a new training strategy based on meta-learning to update the model parameters and meta parameters alternately. tr-en translation experiment results bleu score 13.74 with a training set of 0.21 m sentence pairs while fi-en results in 20.20 with 2.63 m pairs. after going through all positives and negatives of the existing machine translation approaches, we decided to select seq2seq approach with attention mechaniusm to test this singlish to english translation activity. v. approach the seq2seq neural machine translation topology with attention mechanism can be used to construct a new language model for singlish (languages with very different morphology and syntax) to english translations. the inspiration behind this hypothesis is coming along with the outperforming results of related nmt models which associated with seq2seq topology and attention mechanism. it has shown successful results in low resource languages like vietnam, urdu, etc. [17]. preparing a dataset was one of the biggest challenges of this research since there are not many resources available for singlish. to prepare the required dataset, we selected iwslt'15 english-vietnamese parallel data set published by stanford university [17], obtained english sentences in the dataset to generate sinhala translation with google translator api and prepared the singlish data set parallel to the original english sentences with google pronunciation api. once the data is prepared, the additional script was developed to clean the data generated from the pronunciation api. even though this is a dataset generated synthetically, it is close enough to the way how people write in singlish. however, one can choose to write in this way if he thinks about how sinhala pronunciation can be written with english letters. following example sentence pair shows the process of generating the dataset. scraped: “there was a time in my life where we had a very troubled experience in our family” translated: “අෙප් පවුල තුළ අපට ෙබාෙහෝ කරදරකාරී අත්දැකීම් ඇති කාලයක් මෙගේ ජීවිතෙයේ තිබුණි” pronunciation: “apē pavula tuḷa apaṭa bohō karadarakārī atdækīm æti kālayak magē jīvitayē tibuṇi” processed: “ape paula tula apata boho karadarakare atdakem ati kalayak mage jewitaye tibuni” with this approach, a parallel corpus of 0.26 m language pairs (singlish-english), 65 k singlish and 49 k english word vocabulary was generated for training and ~1.5 k languages pairs were prepared for each testing and validation. vi. 
Design and Implementation

In this design, a deep multi-layer RNN is considered, which consists of bidirectional LSTMs as recurrent units. An example of such a model is depicted in Figure 4. The example shows how the source sentence "mama gedara yanawa" is translated into the target sentence "i go home". At a high level, the neural machine translation model consists of two recurrent neural networks: the encoder RNN consumes the input source words without making any predictions, while the decoder RNN processes the target sentence by predicting the next words.

Figure 4: Seq2seq NMT high level architecture

For this implementation, TensorFlow has been used as the development framework. TensorFlow NMT has been backed by the Google AI team since 2017 and makes room for researchers to build competitive translation models from scratch [18].

Embedding: Given the categorical nature of a word, the model must first look up the source and target embeddings to retrieve the corresponding word representations. In order for the embedding layer to work, a vocabulary is first selected for each language. Usually, a vocabulary size V is chosen, and only the V most frequent words are treated as unique. All other words are converted to an "unknown" token and get the same embedding. Embedding weights (one set per language) are usually learned during training.

# Embedding implementation
embedding_encoder = variable_scope.get_variable(
    "embedding_encoder", [src_vocab_size, embedding_size], ...)
encoder_emb_inp = embedding_ops.embedding_lookup(
    embedding_encoder, encoder_inputs)

Similarly, we can embed the decoder word sequences and construct the embedding layer of the network. Note that one can choose to use pre-trained word representations (such as word2vec or GloVe vectors) to initialize the embedding weights. Usually, if there is a lot of training data, these embeddings can be learned from scratch.

Encoder: The word embeddings are then fed as input to the main network, which is composed of two multi-layer RNNs: the source language encoder and the target language decoder. In principle, these two RNNs can share the same weights. In practice, however, two different sets of RNN parameters are very often used for this type of model, as it performs better when fitting large training data sets. The encoder RNN uses zero vectors as its starting states and is built as follows:

# Encoder RNN cell implementation
encoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
# Run dynamic RNN
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_emb_inp,
    sequence_length=source_sequence_length, time_major=True)

Decoder: The decoder also needs access to the source information. A simple way to achieve this is to initialize it with the last hidden state of the encoder, encoder_state. In Figure 4, the hidden state at the source word "yanawa" is passed to the decoder side.

# Decoder RNN cell implementation
decoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
# Helper
helper = tf.contrib.seq2seq.TrainingHelper(
    decoder_emb_inp, decoder_lengths, time_major=True)
# Decoder
decoder = tf.contrib.seq2seq.BasicDecoder(
    decoder_cell, helper, encoder_state,
    output_layer=projection_layer)
# Dynamic decoding
outputs, _ = tf.contrib.seq2seq.dynamic_decode(decoder, ...)
logits = outputs.rnn_output

However, the last hidden state would depend mostly on the last word, and no previous words are taken into consideration. Here the attention mechanism comes into play. Attention, to the human mind, means giving attention to a particular aspect. Conventional methods like TF-IDF give more importance to particular words (according to the TF-IDF value) but are not able to see the sequential information. The whole idea is to check whether we can combine the best of both worlds. Figure 4 described how the decoder takes only the last hidden state of the encoder; the results were not good. Therefore, to produce a better translation, all the hidden states have to be considered. Their importance is decided by the scores generated by the attention mechanism and aggregated into a context vector.

Figure 5: Attention mechanism

Note that the calculation occurs at each decoder time step. It includes the following steps:
1. Compare the current target hidden state with all source states to get the attention weights.
2. Based on the attention weights, compute the context vector as the weighted average of the source states.
3. Combine the context vector and the current target hidden state to produce the attention vector.
4. Feed the attention vector as input to the next time step (input feeding).

This process can be summarized by the following equations [13], [18]:

Attention weights: α_ts = exp(score(h_t, h̄_s)) / Σ_{s'=1..S} exp(score(h_t, h̄_s'))

Context vector: c_t = Σ_s α_ts h̄_s

Attention vector: a_t = f(c_t, h_t) = tanh(W_c [c_t ; h_t])

Score function: score(h_t, h̄_s) = h_tᵀ W h̄_s (multiplicative) or v_aᵀ tanh(W_1 h_t + W_2 h̄_s) (additive)

Here, the score function is used to compare the target hidden state h_t with each of the source hidden states h̄_s, and the result is normalized to produce the attention weights (a distribution over source positions). There are various choices of scoring function; popular choices include the multiplicative and additive forms given above. After the calculation, the attention vector a_t is used to derive the logits and the softmax loss. This is similar to the hidden state of the target in the top layer of the vanilla seq2seq model. The function f can also take other forms.

Beam search: Finally, the straightforward way to generate output sequences is to use a greedy algorithm: pick the token with the highest probability and move on to the next. The problem is that there is a high chance of this leading to sub-optimal output sequences, and it is also computationally inefficient. One recommended way to deal with this issue is to use beam search [18]. Beam search uses a breadth-first search algorithm to build its search graph, but only keeps the top n nodes (the beam size) at each level of the search tree. The next level is then expanded from these n nodes. It is still a greedy search algorithm, but a lot less greedy than the previous one, as its search space is larger. While greedy search can provide quite reasonable translation quality, beam search decoders can further improve performance. The idea of beam search is to better explore the search space of all possible translations by keeping a small group of the best candidates as we translate. The size of this group is called the beam width. A minimal beam width, for example a size of 10, is usually sufficient.
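As a rough illustration of this procedure, the short Python sketch below performs beam search over a generic next-token log-probability function. It is a simplified illustration only (no length normalization or coverage penalty) and is not the decoder used in the reported experiments; the function name and token ids are assumptions for the example.

# Simplified beam-search sketch over a generic scoring function; illustrative only.
def beam_search(next_log_probs, beam_size=10, max_len=20, bos_id=1, eos_id=2):
    # next_log_probs(prefix) -> dict {token_id: log_prob} for the next token.
    beams = [([bos_id], 0.0)]                       # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix[-1] == eos_id:                # finished hypotheses are kept as-is
                candidates.append((prefix, score))
                continue
            for token, logp in next_log_probs(prefix).items():
                candidates.append((prefix + [token], score + logp))
        # Keep only the top-n hypotheses (the beam width) at each level.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(prefix[-1] == eos_id for prefix, _ in beams):
            break
    return beams[0][0]                              # highest-scoring hypothesis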
in this research, we tested with n=10 as the beam size.

vii. evaluation

a bidirectional encoder (one bidirectional layer of the encoder) is used to train a 2-layer lstm with 512 units, and the embedding size is 512. luongattention (scale=true) is used with a dropout keep_prob of 0.8. all parameters are initialized uniformly. a learning rate of 1.0 has been used, training for 12k steps (~12 epochs); after 8k steps, the learning rate is halved every 1k steps. the summary below shows the averaged results of the 2 models; translation quality is measured in terms of bleu scores [19].

table i
bleu results

systems           test2020 (dev)   test2020 (test)
nmt (greedy)      19.24            21.27
nmt (beam=10)     21.80            24.13

training ran at 0.283 s step-time and 18.3k wps for 2.6 m sentences on a macbook i7 with a 2.2 ghz 6-core cpu and 8 gb memory. here, the step-time refers to the time it takes to run one minibatch (size 128); for wps, we count words on both the source and target sides.

figure 6 : test bleu score graph

viii. conclusion

it can be observed from the evaluation results that we have achieved significant success, with a 24.13 bleu score for singlish-english translation. finding a data set for this research has been one of the key pain points for many researchers in sri lanka, especially when attempting a deep learning approach for singlish-english translation; this research provides an alternative way to generate a decent dataset for this translation activity. the main problem with generating a synthetic data set is that the translation works only for singlish sentences written in a particular scheme. in real scenarios, however, people use different schemes to write words in singlish; in other words, people write the same word with different spellings. such a spelling can be considered an unseen scenario for a given word: the word has neither been seen in the training phase nor included in the vocabulary, which is similar to the case of a misspelled word. e.g. ඉදිරියට = idiriyata, idiriata. in this case the model treats these as two different words. if the model is trained with samples of both scenarios and both words are included in the vocabulary, the model will give the correct translation; otherwise it will give the correct translation for one scenario while predicting the word as unknown for the other.

scenario 1 (seen word) :
singlish : eyala idiriyata ena wada gana katha kara
english : they talked about the work ahead

scenario 2 (unseen word) :
singlish : eyala idiriata ena wada gana katha kara
english : they talked about the work

handling unseen or misspelled words in this translation model would potentially solve the above problem; the model would then consider "idiriyata" and "idiriata" as the same word. this is an initial stage in the research domain of analyzing and translating alternative writing systems used in sri lanka. the most significant achievement is taking a step towards using the latest deep learning approaches in the machine translation domain. this research also opens up several new research paths, by addressing the limitations of handling different schemes of writing styles and by further developing and improving the singlish-english machine translation model.

references

[1] r. singh, n. choudhary, and m.
shrivastava, “automatic normalization of word variations in code-mixed social media text,” p. 11. [2] k. sreelakshmi, b. premjith, and k. p. soman, “detection of hate speech text in hindi-english code-mixed data,” procedia comput. sci., vol. 171, pp. 737–744, 2020, doi: 10.1016/j.procs.2020.04.080. [3] a. m. silva and r. weerasinghe, “example based machine translation for english-sinhala translations,” p. 10. [4] j. hutchins, “machine translation: history,” in encyclopedia of language & linguistics, elsevier, 2006, pp. 375–383. doi: 10.1016/b0-08-044854-2/00937-8. [5] w. p. pa, y. k. thu, a. finch, and e. sumita, “a study of statistical machine translation methods for under resourced languages,” procedia comput. sci., vol. 81, pp. 250–257, 2016, doi: 10.1016/j.procs.2016.04.057. [6] d. bahdanau, k. cho, and y. bengio, “neural machine translation by jointly learning to align and translate,” arxiv14090473 cs stat, may 2016, accessed: nov. 01, 2020. [online]. available: http://arxiv.org/abs/1409.0473 [7] n. kalchbrenner, l. espeholt, k. simonyan, a. van den oord, a. graves, and k. kavukcuoglu, “neural machine translation in linear time,” arxiv161010099 cs, mar. 2017, accessed: jun. 05, 2020. [online]. available: http://arxiv.org/abs/1610.10099 [8] y. wu et al., “google’s neural machine translation system: bridging the gap between human and machine translation,” arxiv160908144 cs, oct. 2016, accessed: jun. 07, 2020. [online]. available: http://arxiv.org/abs/1609.08144 [9] j. gehring, m. auli, d. grangier, d. yarats, and y. n. dauphin, “convolutional sequence to sequence learning,” arxiv170503122 cs, jul. 2017, accessed: jun. 07, 2020. [online]. available: http://arxiv.org/abs/1705.03122 [10] j. zhou, y. cao, x. wang, p. li, and w. xu, “deep recurrent models with fast-forward connections for neural machine translation,” arxiv160604199 cs, jul. 2016, accessed: jun. 05, 2020. [online]. available: http://arxiv.org/abs/1606.04199 [11] n. shazeer et al., “outrageously large neural networks: the sparsely-gated mixture-of-experts layer,” arxiv170106538 cs stat, jan. 2017, accessed: jun. 07, 2020. [online]. available: http://arxiv.org/abs/1701.06538 [12] i. sutskever, o. vinyals, and q. v. le, “sequence to sequence learning with neural networks,” p. 9. [13] a. vaswani et al., “attention is all you need,” arxiv170603762 cs, dec. 2017, accessed: may 17, 2020. [online]. available: http://arxiv.org/abs/1706.03762 [14] j. devlin, m.-w. chang, k. lee, and k. toutanova, “bert: pre-training of deep bidirectional transformers for language understanding,” arxiv181004805 cs, may 2019, accessed: may 17, 2020. [online]. available: http://arxiv.org/abs/1810.04805 [15] t. luong, h. pham, and c. d. manning, “effective approaches to attention-based neural machine translation,” in proceedings of the 2015 conference on empirical methods in natural language processing, lisbon, portugal, 2015, pp. 1412–1421. doi: 10.18653/v1/d15-1166. [16] j. gu, y. wang, y. chen, k. cho, and v. o. k. li, “metalearning for low-resource neural machine translation,” arxiv180808437 cs, aug. 2018, accessed: jun. 21, 2020. [online]. available: http://arxiv.org/abs/1808.08437 [17] m.-t. luong and c. d. manning, “stanford neural machine translation systems for spoken language domains,” p. 4. [18] g. neubig, “neural machine translation and sequence-tosequence models: a tutorial,” arxiv170301619 cs stat, mar. 2017, accessed: nov. 01, 2020. [online]. available: http://arxiv.org/abs/1703.01619 [19] k. papineni, s. roukos, t. ward, and w.-j. 
zhu, “bleu: a method for automatic evaluation of machine translation,” in proceedings of the 40th annual meeting on association for computational linguistics acl ’02, philadelphia, pennsylvania, 2001, p. 311. doi: 10.3115/1073083.1073135. english : they talked about the work ahead english : they talked about the work international journal on advances in ict for emerging regions 2021 14(2): march 2021 international journal on advances in ict for emerging regions tool support for distributed workflow management with task clustering ayesh weerasinghe1*, kalana wijethunga2*, randika jayasekara#3*, indika perera4*, anuradha wickramarachchi5ϯ abstract— when in need for executing complex sets of interrelated calculations on high-performance computing (hpc) environments the obvious choice is to use scientific workflows. as workload management software do not support the execution of interrelated tasks, workflow management systems have been introduced to execute workflows on hpc environments. recently, a new distributed architectural model that offers dynamic workflow execution capabilities to workflow management systems is introduced. it executes workflows on a per-task basis. while this approach facilitates dynamic workflows, it adds a considerable overhead to workflows substantially increasing their makespans. as most workflows are static, task-wise execution of workflows degrades the performance of most workflows. in this paper, we introduce a distributed workflow management system, swarmform that introduces task clustering to the new architectural model. swarmform is open source and offers better performance than existing distributed workflow management systems by clustering workflow tasks to reduce overheads while allowing the users to choose between task-wise and cluster-wise execution of workflows depending on the workflow nature. the paper proves that swarmform enables the use of all the features introduced with the new architectural model while providing better makespans for scientific workflows. keywords— task clustering, workflow management systems, scientific workflows. i. introduction almost every scientific domain such as astrophysics, bio and health informatics, physics, and bio-sciences use workflows to express complex sets of tasks that are dependent on one another using scientific workflows. these workflows are executed in high-performance computing (hpc) environments as they need a lot of computing power to execute. workload management software like pbs pro [1], slurm [2], torque [3] are installed on these hpc environments to manage the computing resources of the environment. however, they do not support workflow scheduling but only support the execution of independent jobs. given the complexity of real-world workflows, the execution becomes cumbersome as the users have to manage a large number of individual job execution files. therefore, workflow management systems (wms) have been introduced to execute scientific workflows on hpc environments. a workflow management system is able to get a workflow consisting of a series of interrelated tasks as input and submit them as separate jobs to a workload management software while maintaining the dependencies between the tasks in order to be executed in an hpc environment. wmss executes workflows either by executing each task as a job and passing its results to other tasks (chained jobs) or by executing the whole workflow as a single job (pilot job). 
running a workflow as a pilot job results in better makespan with poor resource utilization of the execution environment whereas running a workflow as chained jobs results in better resource utilization with poor makespan. distributed wmss execute workflows as chained jobs with a separate job for each task whereas centralized wmss execute workflows as pilot jobs. therefore, centralized wmss have better makespan with poor resource utilization while distributed wmss have better resource utilization with poor makespan. although centralized wmss minimize this issue by clustering the tasks in the workflow and submitting them as few chained jobs, they fail to provide many features available in distributed wmss like dynamic workflows, concurrent execution of multiple workflows, failure detection and correction, etc. therefore, it is observed that distributed wmss offer much more important functions than centralized wmss. scheduling a job on an hpc environment consists of a considerable overhead [4]. thus, scheduling of jobs using a distributed wms causes a significant increase in the makespan of the workflow as they execute each task as a separate job. the advantages offered by distributed wmss can be retained while reducing the makespan of workflows by introducing task clustering to distributed wmss. this is clearly demonstrated in fig. 1(a) where the jobs are scheduled in the chained fashion resulting in a longer makespan and fig. 1(b) where the jobs are executed within a much shorter makespan with somewhat of a compromise on the resource utilization. however, there is a significant potential to improve the resource utilization while correspondence: r. jayasekara#3 (e-mail: rpjayasekara.16@cse.mrt.ac.lk) received: 20-12-2020 revised: 14-02-2021 accepted: 10-03-2021 this paper is an extended version of the paper “swarmform: a distributed workflow management system with task clustering” presented at the icter 2020 conference. a. weerasinghe1*, k. wijethunga2*, r. jayasekara#3*, i. perera4*are from the department of computer science & engineering, university of moratuwa, sri lanka. {ayeshweerasinghe.16, kalana.16, rpjayasekara.16, indika}@cse.mrt.ac.lk anuradha wickramarachchi5ϯ is from the australian national university, australia. (anuradha.wickramarachchi@anu.edu.au) doi: http://doi.org/10.4038/icter.v14i2.7223 © 2021 international journal on advances in ict for emerging regions tool support for distributed workflow management with task clustering 2 international journal on advances in ict for emerging regions march 2021 having pilot jobs in a chained fashion to make a near optimal balance of trades. the paper presents the following contributions to the domain of workflow scheduling: • the research introduces swarmform [5] a new open source distributed workflow management system with task clustering capabilities. • the research introduces an extension to the existing workflow and platform aware clustering algorithm [6] to improve its performance • the research implements the resource aware clustering (rac) algorithm [7] in swarmform to maximize the resource utilization of the clustered workflows that are executed through swarmform the rest of the paper is arranged as follows. section ii presents a review of the existing literature and background of the study. section iii presents the work proposed in the study. section iv evaluates the performance improvement introduced by the proposed work and section v concludes the paper with an overview on the future work in section vi. ii. 
related work liu et al. [8] show that the functional architecture of a wms consists of 5 layers workflow execution plan (wep) generation, wep execution, presentation, user services, and infrastructure. according to how these layers are managed, existing wmss can be categorized as centralized wmss and distributed wmss. in wmss like pegasus [9], taverna [10], etc. all these functional layers are managed by a single program and all the computing nodes in the hpc environment are managed by a single or a few instances of the wms which makes them centralized wmss. wmss like ehive [11], fireworks [12] consist of a set of programs that manages different functional layers and they have independent instances of the wms per each computing node of the hpc environment which makes them distributed wmss. each instance of the distributed wmss can be associated with a different database which eliminates the need to have a single central queue for all the jobs that are expecting to be run in the hpc environment. centralized wmss such as pegasus [9], taverna [10], kepler [13] have been used for over a decade for executing workflows in many high-performance computing environments all around the world. they include features such as workflow submission, special cli tools for workflow design and management, ability to store provenance data, etc. that are key requirements when executing a scientific workflow. taverna and kepler include a versatile workbench that allows fully graphical workflow design, which is extremely helpful in designing new scientific workflows. these systems are much more effective in executing scientific workflows than using workload management software for scientific workflow execution. the inability to execute dynamic workflows can be seen as the major drawback of the centralized wmss. all these systems need the workflows to be defined at the beginning of the workflow and they do not allow modifying the workflow while it is being executed. in addition to that, most of them run workflows as a single pilot job. as explained by rodrigo et al. [14] executing a workflow as a single pilot job causes a huge resource wastage as many of the resources of the hpc environment are idle most of the time. furthermore, they do not support concurrent execution of workflows as they are submitted as pilot jobs. the above issues have been addressed in ehive [11] and fireworks [12] using a new architectural model. they follow a blackboard-based architecture with 3 main components: a central database that holds details of each workflow submitted by the users, a set of clients that pull tasks from the database and execute them on the backend and a client manager that handles spawning, killing, and managing of the clients. all three components work as independent programs and that provides a distributed architectural model to these systems. the distributed architecture resolves the single point of failure in the existing centralized systems by having different programs control different layers in workflow management. in addition to that, these systems support concurrent execution of multiple workflows and the system manages the scheduling of tasks across workflows. distributed wmss submit workflows as chained jobs with each task packed as a single job. this allows the workflows to change its structure at the runtime while resulting in a substantial increase in resource utilization of the execution environment as only the required resources are obtained per each job. 
it also makes sure that the failure of one job does not affect the execution of other jobs. three major overheads are present when executing a job on an hpc environment: i.e., scheduling overhead (time taken to schedule the job on a specific node), queue delay (time a job must wait in the queue until it gets the opportunity to be executed) and communication overhead (time taken to transfer the results of parent job to its children). therefore, fig. 1(b) running a workflow as a pilot job fig. 1(a) running a workflow as a set of chained jobs 3 a. weerasinghe1*, k. wijethunga2*, r. jayasekara#3*, i. perera4*, a. wickramarachchi5ϯ march 2021 international journal on advances in ict for emerging regions executing each task of a workflow as separately chained jobs will cause a substantial increase in the makespan [14] as executing each job adds a considerable overhead to the total runtime of the workflow. while distributed wmss offer a lot of features that are not available in centralized wmss, the increased makespan of workflows due to the execution of each task as a chained job raises a major concern. even though this adds support for dynamic workflows, executing both static and dynamic workflows as individually chained jobs cause an unnecessary overhead. a better approach to this problem would be to introduce task clustering to distributed wmss and allow the user to decide whether he needs clustering or not depending upon the application. this will reduce the makespan of workflows in distributed wmss while ensuring that all the advantages offered by distributed wmss are preserved. to address this issue, we introduce a new distributed workflow management system swarmform which includes task clustering to reduce the makespan of workflows. using swarmform, we intend to deliver all the advantages of a distributed wms to users while maintaining the optimum balance between the makespan of workflows and the resource utilization of the environments. task clustering is already implemented in some of the centralized wmss like pegasus [9] and in some grid middleware management systems like xavantes [15]; we intend to use those techniques to provide better makespans for workflows executed using distributed wmss. different researches have introduced different clustering techniques. in the related literature, horizontal runtime balancing, horizontal impact factor balancing and horizontal distance balancing algorithms introduced by chen et al. [16] are being used as the baseline for workflow task clustering. kaur et al. [17] has introduced a new clustering technique called hybrid balanced task clustering algorithm that clusters tasks both vertically and horizontally. chen et al. [16] has introduced a balanced clustering technique for horizontal clustering and sahni et al. [6] has introduced the workflow and platform aware task clustering (wpa) algorithm. zhang et al. [18] has introduced a new metric called dependency correlation to cluster tasks in their dependency balance clustering algorithm. dependency balancing clustering algorithm cluster tasks based on the similarity of their dependencies. wpa algorithm uses the knowledge about the structure of the workflow and the execution environment to cluster tasks such that there is the least possible ineffective parallelism as possible. hybrid balanced task clustering algorithm combines all three baseline algorithms to present a novel approach to cluster tasks both vertically and horizontally. 
a novel approach to cluster tasks considering both execution time of tasks and resource requirements has been introduced by rac algorithm [7]. this algorithm tries to cluster the tasks that are most similar in the resource requirements while trying to ensure that all the created clusters have near similar runtimes. it makes sure that the resource wastage is minimized, and all the tasks in clustered jobs are released nearly at the same time when tasks in a workflow are clustered together to reduce the makespan. none of the existing task clustering algorithms except the rac algorithm take resource requirements of tasks into account when clustering. considering the pros and cons of each algorithm, we implemented an extended version of the wpa algorithm and rac algorithm to carry out task clustering in swarmform while giving the privilege to the user to choose which algorithm he wants to use. iii. methodology a distributed wms called swarmform has been developed, which offers task clustering, to address the drawbacks in existing wmss explained above. task clustering will play a key role in reducing the makespan of a workflow in the new wms. a. wpa algorithm a workflow is represented as a directed acyclic graph (dag) with nodes of the graph representing tasks and edges between the nodes representing the dependencies between the tasks. the wpa algorithm only clusters the tasks at the same level of the workflow dag where the level of a task is defined as the longest distance from root node task(s) to task node. while this improves the makespan of a workflow, it causes a dependency imbalance in the workflow. it also does not reduce the communication overhead caused when transferring the output of the parent task to the child tasks as children and parents are not clustered together. to address these issues, we introduce a new technique to cluster the workflows both horizontally and vertically as follows. under the proposed technique, first, the tasks with singlechild single-parent relationships are clustered together and then the resulting tasks are clustered horizontally using the wpa algorithm. the wpa algorithm takes the available number of computing nodes as an input. since in most of the hpc environments we cannot get the exact number of resources available at the time of execution, we have proposed a slight modification to the wpa algorithm along with the addition of our vertical clustering approach. the modified pseudocode of the wpa algorithm is given in algorithm 1. tool support for distributed workflow management with task clustering 4 international journal on advances in ict for emerging regions march 2021 fig. 2 illustrates the significance of this task clustering approach. fig. 2(a) depicts an example workflow with 5 levels and the number on each node states the execution time of the task. first, the tasks are being clustered vertically considering their single-parent single-child relationships (fig. 2(b)). fig. 2(c) shows the result of the proposed vertical clustering technique on the example workflow of fig. 2(a). then the resulting workflow tasks are clustered horizontally using the wpa algorithm (fig. 2(c)). fig. 2(d) shows the result of our proposed extended wpa clustering algorithm. the algorithm 2 explains the pseudocode of the proposed vertical clustering technique. the algorithm takes a workflow as the input. it begins with the first level of the workflow and iterates to the depth of the workflow (line 3). 
it selects the tasks at each level (line 4) and iterates the tasks, one by one (line 5). if the task only has a single child and that child task has no other parents (single-parent single-child relationship), both the task and its child task are grouped into a cluster. this process is repeated in a depth-first manner until there are no more single-parent single-child relationships for the selected task (line 7-10). finally, the workflow is updated if a selected task is clustered with its children (line 13). b. swarmform workflow management system 1) swarmform architecture: swarmform distributed wms is developed on top of fireworks [12] distributed wms which is the state-of-the-art system in the domain of distributed wmss. fireworks is used as an open source library in the implementation of swarmform. swarmform ensures that all the functionalities of fireworks are available to the user while offering additional functionalities for workflow management. swarmform is highly decoupled from fireworks and this approach provides the ability to develop fireworks and swarmform independently ensuring fast and easy adaptations to any update to fireworks. the architecture of swarmform bears a close resemblance to fireworks with some additional improvements. in swarmform, a workflow is referred to as a swarmflow. a swarmflow can be represented as a directly acyclic graph (dag) and these swarmflows can be defined by the python interface, command-line interface or by directly loading a json or yaml swarmflow definition. swarmform adapts the workflow definition format introduced by fireworks for defining swarmflows as this format helps to define workflows in a more easy and readable way in contrast to the existing dax format. a swarmflow consists of one or more individual tasks that are called fireworks (fws). these fws represent the nodes in the swarmflow definition dag whereas the edges of the dag represent dependencies between fws. a firework can have a sequence of one or more atomic tasks that are called firetasks. these firetasks are separate python functions that can call shell scripts, transfer files, read/write files or call other python functions. firetasks can return fwactions that can modify the swarmflow dynamically at runtime based on the computational conditions which give the dynamic behaviour to the system. swarmpad is another key part of the swarmform wms that is used to store all the details of swarmflows, fws, provenance data and other data related to execution of swarmflows. swarmpad is a nosql database which is built using mongodb. fireworkers are the clients who pull fws from the swarmpad and execute. it launches unique agents called rockets to pull and execute each fw. workflow management is handled by the swarmpad and workflow execution is handled by rockets and fireworkers which provides the distributed behaviour to the swarmform wms. fig. 3 shows the architecture of the swarmform wms. the flowparser takes the input workflow and passes it to the swarmformer. swarmformer clusters the swarmflow and adds it to the database. optionally, the flowparser can save the swarmflows directly to the database without clustering, based on the user requirement. the swarmformer takes a swarmflow as the input and clusters the tasks in the swarmflow and saves the clustered swarmflow in the database. later, the fireworkers can pull tasks using rockets and execute clustered fireworks in hpc environments as shown in fig. 3. fig. 2(a) initial workflow fig. 
2(b) cluster nodes with single-parent singlechild relationships fig. 2(c) horizontally cluster the resultant workflow fig. 2(d) clustered workflow 5 a. weerasinghe1*, k. wijethunga2*, r. jayasekara#3*, i. perera4*, a. wickramarachchi5ϯ march 2021 international journal on advances in ict for emerging regions 2) swarmform features: as we have described above, the overhead in executing a job is a critical factor which results in increasing the makespan of a workflow. even the state-of-the fig. 3 swarmform architecture art distributed workflow management system does not address this issue as it executes each task in a workflow as a separately chained job. as a solution to the aforementioned problem, we introduce task clustering to swarmform which reduces the makespan of the workflows by minimizing the overheads in the execution of a workflow. in section iv, we have proven that swarmform outperforms the state-of-the-art distributed wms fireworks [12] when task clustering is enabled. in swarmform, workflows which are referred to as swarmflows are treated as primary entities and fireworks are considered as secondary entities. this considerably eases the process of managing workflows when executing workflow operations like task clustering. in addition to that, swarmflow can accept and process multiple task parameters like cost, execution time, resource requirements of the task etc. these parameters can be used for making better scheduling decisions and workflow management decisions like how the tasks will be clustered which increases the performance of the system. the support to these parameters is added in such a way that a user can easily extend the parameter set by easily adding new parameters. the wms takes cost parameters like execution time, required number of cores per task as inputs through _queueadapter identifier in the workflow definition. therefore, users will be able to define new parameters like memory required, wall time etc. which can be used in further workflow management decisions. initially, swarmform was only equipped with the wpa clustering algorithm, which did not consider the resource requirements when making task clustering decisions. later, we implemented the rac algorithm [7] which takes both execution time and resource requirements into consideration when making task clustering decisions. we integrated the rac algorithm [7] to the system in such a way that swarmform wms could use any task clustering algorithm based on the user requirement, without limiting to a single task clustering algorithm. with this modification, a developer can easily implement new task clustering algorithms and use them without modifying the core components of the wms. with the integration of these task clustering algorithms, we introduce a new feature to express the estimated resource wastage due to task clustering. because of this feature, users can see the resource wastage that could be occurred due to clustering of the workflow before executing workflows in hpc environments. this can be used to make decisions for selecting suitable task clustering algorithms without executing workflows in resource intensive environments. resource wastage of a single cluster containing l tasks (wj) and total resource wastage of the workflow containing k clusters (wt) can be calculated as in (1) and (2) respectively. as the workflow definition format used in fireworks and swarmform is a novel format, the users have to put an extra effort to convert their existing workflows to the new format. 
to ease out this process we introduce a workflow generator which can be easily used to generate workflows by inputting the minimum parameters possible. in addition to that, it supports converting dax files directly to the new workflow definition format with no user intervention at all. c. rac algorithm rac algorithm [7] is chosen for this due to its ability to minimize resource wastage in the execution environment. it uses a novel metric called resource aware clustering coefficient to identify the most suitable tasks that should be assigned to the same cluster. although the rac algorithm does not always outperform the existing task clustering algorithms in makespan reduction of workflows, it outperforms all the existing task clustering algorithms in maximizing resource utilization while providing competitive makespan reductions in workflows. therefore, rac algorithm [7] is implemented in swarmform to maximize the resource utilization of the execution environment while minimizing the makespan of the workflow. since the scientific workflow is represented as a directed acyclic graph (dag), we have defined our data structure to model the workflow in swarmform which we referred to as dag model. that dag model is used to implement the wpa algorithm. the same approach is followed when implementing tool support for distributed workflow management with task clustering 6 international journal on advances in ict for emerging regions march 2021 the rac algorithm. the algorithm takes the workflow represented using the dag object and number of clusters per horizontal level(r) as the inputs and returns a dag object which represents the workflow with clustered tasks. the algorithm traverses the dag following a level-by-level approach, starting from level one. it takes the tasks at each level and clusters the tasks at level only if the number of tasks at level is greater than the number of clusters per level. in each level, first it creates r number of empty clusters and iterates the tasks in the level task by task. in each iteration in the inner loop the resource aware clustering factor is calculated for the task respective to the clusters created for that level. then it selects the cluster with the minimum factor value since resource-aware clustering factor gives the smallest value with the cluster that the considering task fits best and checks whether the cluster has not exceeded the number of tasks that it can hold. if it does not exceed, the task is put into that cluster. this process is repeated for each task in each level. after populating the clusters by tasks for each layer workflow dag is updated as it needs to preserve the dependencies. first, it removes the task in the considering level from the workflow dag and adds the new clusters to the workflow. then updates the parent-child relationships appropriately as the updated workflow dag needs to preserve the dependencies between tasks. iv. results in this section, we evaluate the performance of swarmform wms. as fireworks is the state-of-the-art in distributed workflow management systems, wpa task clustering enabled swarmform wms is compared and evaluated against the fireworks wms. to have the same evaluation setup for both systems, we have evaluated both wmss on standard benchmark workflows cybershake (fig. 4), ligo (fig. 5) and sipht (fig. 6) presented by bharathi et al. [19]. 
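before turning to the results, a simplified sketch of the level-by-level rac clustering loop described in section c is given below. it reflects our reading of the description above rather than the actual swarmform implementation: helper names such as tasks_at_level and replace_level_with_clusters are hypothetical, the per-cluster capacity rule is assumed, and the resource-aware clustering coefficient of [7] is left as a user-supplied function.

def rac_cluster(dag, clusters_per_level, rac_coefficient):
    # level-by-level traversal of the workflow dag, as described above
    for level in range(1, dag.depth() + 1):
        tasks = dag.tasks_at_level(level)            # hypothetical helper
        r = clusters_per_level
        if len(tasks) <= r:                          # cluster only crowded levels
            continue
        capacity = -(-len(tasks) // r)               # assumed max tasks per cluster
        clusters = [[] for _ in range(r)]            # r empty clusters for this level
        for task in tasks:
            # resource-aware clustering factor from [7]; smallest value = best fit
            open_clusters = [c for c in clusters if len(c) < capacity]
            best = min(open_clusters, key=lambda c: rac_coefficient(task, c))
            best.append(task)
        # replace the level's tasks with the clustered jobs and re-wire the
        # parent/child edges so that the dependencies are preserved
        dag.replace_level_with_clusters(level, clusters)   # hypothetical helper
    return dag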
the workflow definitions of cybershake 100 job workflow, ligo 100 job workflow, and sipht 97 job workflow provided by pegasus workflow generator are used for the evaluation [20]. we use a workflow simulation setup for evaluating the performance of the systems. this is a widely used approach since reserving an hpc environment for evaluation purposes is highly costly. the simulation setup consists of 5 rockets with each rocket acting as a computing node with a single core. each rocket pulls a job from the database and executes it. we have added a constant delay after completion of each job to represent the communication overhead incurred when transferring the output of a parent job to its children jobs. only the considered workflow is present in the database throughout the evaluation. initially, two sets of the same workflows in dax format are taken and converted into swarmform/firework readable format using the swarmform workflow generator. then, a set of workflows are clustered and executed using the swarmform wms and the other set of workflows are directly executed using the fireworks wms. makespan of each workflow is measured in both systems and the performance gain (3) is calculated. fig. 4 cybershake workflow structure fig. 5 ligo workflow structure fig. 6 sipht workflow structure from fig. 7, it can be observed that the makespan of each workflow has been reduced when executed using swarmform than with fireworks. this proves that executing workflows with task clustering enabled in swarmform reduces the makespan of each workflow considerably than executing it in fireworks. 7 a. weerasinghe1*, k. wijethunga2*, r. jayasekara#3*, i. perera4*, a. wickramarachchi5ϯ march 2021 international journal on advances in ict for emerging regions the performance gain shows the percentage improvement in the makespan of each workflow executed in swarmform compared to fireworks. from the results of the experiments fig. 7 comparison of the makespan of each workflow executed using fireworks and swarmflow with task clustering enabled. (fig. 8), it can be observed that swarmform shows a 10.19% improvement in the makespan of cybershake, 24.36% improvement in the makespan of ligo and 9.41% improvement in the makespan of sipht workflows. further, it should be noted that the performance gain of each workflow is positive which shows that swarmform outperforms fireworks when task clustering is enabled. fig. 8 comparison of the average performance gain in executing each workflow in swarmform and fireworks in this evaluation, we have considered only the communication delay between tasks and the queue delay among jobs in the same workflow as the overhead. clustering related tasks together eliminate the communication overhead between those tasks as they are executed in the same node under the same job. the improvement shown in the evaluation mainly results from the reduction of communication overhead between the tasks. however, in real environments, there are many more overheads like scheduling overhead and queue delays due to the competition for limited resources by a large number of jobs from multiple workflows. among them, queue delay can increase the makespan by a substantial amount as the delay increases considerably with the increase of the job submissions. these overheads are reduced when tasks are clustered. therefore, we expect that swarmform will perform even better when used with real workflows in hpc environments. v. 
discussions this paper presents swarmform, a new distributed workflow management system with task clustering capabilities. swarmform is built using fireworks which is an open source library and offers useful features such as support for dynamic workflows, concurrent workflow execution, and failure detection and correction that are not available in the existing centralized wmss. swarmform introduces task clustering to increase the performance of existing distributed wmss, a dax workflow importer and a workflow generator that can be used for workflow simulation purposes. as another contribution, the research has introduced an extension to the wpa algorithm which improves its performance. the extension of the wpa algorithm is to introduce a hybrid clustering approach, which clusters the tasks both vertically and horizontally. we implement the updated clustering algorithm in swarmform and evaluate swarmform with fireworks and prove that execution of workflows in swarmform yields better makespans than executing them in the existing state-of-the-art distributed wms due to the introduction of task clustering. finally, we implement the rac algorithm as the primary clustering algorithm in swarmform to introduce resource management capabilities to swarmform. none of the existing wmss consider minimizing resource wastage when clustering tasks. therefore, executing workflows using swarmform by clustering their tasks with rac algorithm significantly reduces the resource wastage of the execution environment while providing a considerable improvement in the makespan of the workflow. further, the users are given the opportunity to choose any of the task clustering algorithms depending on the requirement for clustering their workflows while providing the developers with the ability to implement any required task clustering algorithm and use them without having to change any core components of swarmform. the estimated resource wastage after clustering of workflows with each clustering algorithm is also shown to the users which allows them to choose the algorithm that gives them the best resource utilization and the makespan. vi. future work task clustering is done to achieve different objectives along with reducing makespan like minimizing resource wastage, minimizing dependency imbalance, achieving qos requirements etc. currently, swarmform contains only two task clustering algorithms which are capable of solving resource imbalance and runtime imbalance problems. we plan to implement a few more task clustering algorithms in swarmform thus allowing the user to choose the suitable tool support for distributed workflow management with task clustering 8 international journal on advances in ict for emerging regions march 2021 algorithm depending on the use case from a variety of task clustering algorithms. we plan to improve our workflow generator to generate actual workflows and to import actual workflows defined in dax format into swarmform workflow definition format. it will later be extended to support common workflow language [21] as well. further, we plan to introduce a gui to swarmform to easily define new workflows graphically as the existing distributed wmss consist of graphical user interfaces (gui) only for reporting. references [1] j. nabrzyski, j. m. schopf and j. and węglarz, “pbs pro: grid computing and scheduling attributes,” in grid resource management, boston, kluwer academic publishers, 2004, pp. 183-190. [2] a. b. yoo, m. a. jette and m. 
grondona, “slurm: simple linux utility for resource management,” in job scheduling strategies for parallel processing, berlin, springer, 2003, pp. 44-60. [3] d. klusáček, v. chlumský and h. rudová, "planning and optimization in torque resource manager", proceedings of the 24th international symposium on high-performance parallel and distributed computing, pp. 203-206, 2015. [4] w. chen and e. deelman, "workflow overhead analysis and optimizations", proceedings of the 6th workshop on workflows in support of large-scale science works '11, pp. 11-20, 2011. available: 10.1145/2110497.2110500. [5] "swarmform/swarmform", github, 2020. [online]. available: https://github.com/swarmform/swarmform. [6] j. sahni and d. p. vidyarthi, “workflow-and-platform aware task clustering for scientific workflow execution in cloud environment,” futur. gener. comput. syst., vol. 64, pp. 61–74, 2016, doi: 10.1016/j.future.2016.05.008 [7] a. weerasinghe, k. wijethunga, r. jayasekara, i. perera and a. wickramarachchi, "resource aware task clustering for scientific workflow execution in high performance computing environments", in 22nd ieee international conference on high performance computing and communication, fiji, 2020. [8] j. liu, e. pacitti, p. valduriez and m. mattoso, "a survey of dataintensive scientific workflow management", journal of grid computing, vol. 13, no. 4, pp. 457-493, 2015. available: 10.1007/s10723-015-9329-8. [9] e. deelman et al., "pegasus: a framework for mapping complex scientific workflows onto distributed systems", scientific programming, vol. 13, no. 3, pp. 219-237, 2005. available: 10.1155/2005/128026. [10] d. turi, p. missier, c. goble, d. d. roure and t. oinn, "taverna workflows: syntax and semantics," in third ieee international conference on e-science and grid computing (e-science 2007), bangalore, 2007, pp. 441-448. [11] j. severin et al., "ehive: an artificial intelligence workflow system for genomic analysis", bmc bioinformatics, vol. 11, no. 1, p. 240, 2010. available: 10.1186/1471-2105-11-240. [12] a. jain et al., "fireworks: a dynamic workflow system designed for high-throughput applications", concurrency and computation: practice and experience, vol. 27, no. 17, pp. 5037-5059, 2015. available: 10.1002/cpe.3505. [13] i. altintas, c. berkley, e. jaeger, m. jones, b. ludascher and s. mock, "kepler: an extensible system for design and execution of scientific workflows," proceedings. 16th international conference on scientific and statistical database management, 2004., santorini island, greece, 2004, pp. 423-424. [14] e. elmroth, p. östberg, l. ramakrishnan and g. p. rodrigo, "enabling workflow-aware scheduling on hpc systems", in hpdc '17: the 26th international symposium on high-performance parallel and distributed computing, washington dc usa, 2017, pp. 3 14. [15] l. bittencourt and e. madeira, "a dynamic approach for scheduling dependent tasks on the xavantes grid middleware", in middleware06: 7th international middleware conference, melbourne, australia, 2006. [16] w. chen, r. f. da silva, e. deelman, and r. sakellariou, “balanced task clustering in scientific workflows,” proc. ieee 9th int. conf. escience, e-science 2013, pp. 188–195, 2013, doi: 10.1109/escience.2013.40. [17] a. kaur, p. gupta, and m. singh, “hybrid balanced task clustering algorithm for scientific workflows in cloud computing,” scalable comput., vol. 20, no. 2, pp. 237–258, 2019. [18] l. zhang, d. yu, and h. zheng, “optimization of cloud workflow scheduling based on balanced clustering,” lect. 
notes comput. sci. (including subser. lect. notes artif. intell. lect. notes bioinformatics), vol. 10581 lncs, pp. 352–366, 2017. [19] s. bharathi, a. chervenak, e. deelman, g. mehta, m. su and k. vahi, "characterization of scientific workflows," 2008 third workshop on workflows in support of large-scale science, austin, tx, 2008, pp. 1-10. [20] "workflowgenerator pegasus pegasus workflow management system", confluence.pegasus.isi.edu, 2014. [online]. available: https://confluence.pegasus.isi.edu/display/pegasus/workflowhub. [21] b. chapman et al., common workflow language, v1.0. united states: figshare, 2016. https://github.com/swarmform/swarmform https://confluence.pegasus.isi.edu/display/pegasus/workflowhub ieee paper template in a4 (v1) international journal on advances in ict for emerging regions 2019 12 (1): september 2019 international journal on advances in ict for emerging regions a hybrid approach for aspect extraction from customer reviews yasas senarath#1, nadheesh jihan#2, surangika ranathunga#3 abstract— aspect extraction from consumer reviews has become an essential factor for successful aspect based sentiment analysis. typical user trends to mention his opinion against several aspects in a single review; therefore, aspect extraction has been tackled as a multi-label classification task. due to its complexity and the variety across different domains, yet, no single system has been able to achieve comparable accuracy levels to the human-accuracy. however, novel neural network architectures and hybrid approaches have shown promising results for aspect extraction. (support vector machines) svms and (convolutional neural networks) cnns pose a viable solution to the multi-label text classification task and has been successfully applied to identify aspects in reviews. in this paper, we first define an improved cnn architecture for aspect extraction which achieves comparable results against the current state-of-the-art systems. then we propose a mixture of classifiers for aspect extraction, combining the proposed improved cnn with an svm that uses the state-of-the-art manually engineered features. the combined system outperforms the results of individual systems while showing a significant improvement over the state-of-the-art aspect extraction systems that employ complex neural architectures such as mtna. keywords— aspect extraction, deep learning, sentiment analysis, text classification, natural language processing i. introduction ustomer reviews have become the means of expressing opinions and views of consumers towards different aspects of products and services. the information contained in such reviews can be leveraged by customers to identify the best available products/ services in the market and by the organizations to identify and satisfy customer needs. however, customer reviews are in unstructured textual form, which makes it difficult to be summarized by a computer. in addition, manual analysis of this huge amount of data for information extraction is nearly impossible. automatic sentiment analysis of customer reviews has, therefore, become a priority for the research community in recent years. conventional sentiment analysis of text focuses on the opinion of the entire text or the sentence. in the case of consumer reviews, it has been observed that customers often talk about multiple aspects of an entity and express an opinion on each aspect separately rather than expressing the opinion towards the entity. 
aspect based sentiment analysis (absa) has emerged to tackle this issue. manuscript received on 4 march 2019. recommended by prof. g.k.a. dias on 10 june 2019. this paper is an extended version of the paper “aspect extraction from customer reviews using convolutional neural networks” presented at the icter 2018. yasas senarath, nadheesh jihan and surangika ranathunga are from department of computer science and engineering, university of moratuwa, sri lanka. (wayasas.13@cse.mrt.ac.lk, nadheeshj.13@cse.mrt.ac.lk, surangika@cse.mrt.ac.lk) the goal of aspect based sentiment analysis is to identify aspects present in the text, and the opinions expressed for each aspect [1]. one of the most important tasks of absa is to extract aspects from the review text. however, there have been several challenges in extracting aspects such as support for multiple domains, detecting multiple aspects in a single sentence, and detecting implicit aspects [2]. state-of-the-art systems presented by kim et al. [3] and jihan et al. [4] try to address the above challenges, but those systems lack in terms of performance. moreover, neural network models have increasingly been used in text classification and aspect extraction [5, 6, 7]. among these neural network models, a common type is the convolutional neural networks (cnn) [3, 5]. however, existing state-of-the-art cnn architectures used in text classification for aspect extraction do not incorporate improvements [7, 8] (e.g. non-static cnn, multi-kernel convolution layers, and optimizing the number of hidden layers and hidden neurons) that have been identified as beneficial for general text classification tasks [3]. moreover, traditional cnn models lack the ability to capture context level features. there have been models based on cnns used to extract aspects from customer reviews [5, 6]. in the light of the above identified limitations of traditional cnn models for aspect extraction, this paper presents following contributions:  we present a modified cnn architecture for aspect extraction, which implements two improvements. to capture context level features, we incorporate multiple convolutional kernels with different filter sizes. we also introduce dropout regularization to prevent models from over-fitting to the training samples. although these improvements have been used in general text classification tasks [3], the effect of the same has not been explored for aspect extraction.  we implement an optimal dense layer architecture between the feature selection layer and the output layer of the cnn with the use of a feed-forward network with two hidden layers that was derived using the constructive method proposed by huang et al. [7]. this also helps to calculate the optimal number of hidden neurons for each layer that is sufficient to store the relationship between the training instances and the classes. the effect of such optimization techniques on hidden dense layers of the cnn models is not yet investigated for aspect extraction or text classification tasks.  we compare the effect of initiating the word embedding features for cnn models using skipgram and continuous bag of word (cbow) trained word2vec [8] models for aspect extraction. although c a hybrid approach for aspect extraction from customer reviews 2 international journal on advances in ict for emerging regions september 2019 related research reported the use of cbow models for aspect extraction, the optimal technique for the same has not been identified through a comparative study. 
 we show that the use of non-static cnn models (that update word vectors during training) perform better than static models (that do not update word vectors during training) for aspect extraction, in the absence of word2vec models trained with domain-specific corpora.  we incorporate prediction probabilities from svm aspect classification model [4] to improve the performance of our cnn with the expectation that manually constructed features could help to improve the overall performance. the semeval task 5 datasets [9] for restaurant and laptop domain have been used in this research for training and evaluation of the models. we were able to significantly outperform the current state-of-the-art techniques for multidomain aspect extraction using our mixture of classifier. the rest of the paper is organized as follows. in section 2, related work is discussed. section 3 explains the semeval2016 task 5 dataset. section 4 elaborates our aspect classifier models in detail. experimental results are discussed in section 5. finally, section 6 concludes the paper. ii. literature review in the recent literature, majority of work on aspect detection is performed using supervised and hybrid machine learning approaches. machacek [10] presented a supervised machine learning approach using bigram bag of words model. although this model was tuned with several different features extracted manually, it has not represented the sentence well as opposed to cnn models that capture features automatically during training. in contrast to the traditional supervised machine learning methods, toh et al. [5] presented a hybrid approach, which uses a cnn along with a binary classifier. this system was the top ranked system in the semeval 2016 task 5 competition. furthermore, khalil and el-beltagy [11] used an ensemble classifier that used a combination of a cnn initialized with pre-trained word vectors and a support vector machines (svm) classifier with a bag of words model as features. it has also been shown that cnn architecture performs well in multiple other text categorization tasks [3]. kim [3] has experimented with a cnn model with static and non-static channels of word vectors to represent a sentence. he has observed that non-static cnn has outperformed static cnn for a significant number of datasets. however, these experiments have not been carried out for aspect extraction. jihan et al. [4] use an svm to predict the aspect category with multiple features extracted from text. they have used a clever pre-processing pipeline to clean and normalize text data. this model has obtained a f1 score of 74.18 and 52.21 for datasets from restaurant and laptop domains (respectively) provided in semeval-2016 task 5. furthermore, mtna [12] obtained a f1 score of 76.42 on the restaurant dataset by training a set of one-vs-all deep neural network models consisting of an lstm layer followed by a cnn layer using both aspect category and aspect term information. we consider these two systems as our benchmark. iii. semeval-2016 task 5 dataset existence of a dataset such as the one provided by semeval2016 task 5 provides a standardized evaluation technique to publish our results, and they can be compared fairly with other systems, which are evaluated on the same dataset. previously many different researchers used various data sets in their publications, making it difficult to compare the results obtained. 
our proposed cnn classifier and the baseline cnn are trained using the official semeval-2016 task 5 dataset of reviews for the restaurant (training: 2000, testing: 676 sentences) and laptop (training: 2500, testing: 808 sentences) domains. training sentences are annotated for opinions with the respective aspect category, taking the context of the whole review into consideration. sentences are classified under 12 and 81 classes in the restaurant and laptop domains, respectively.

iv. methodology

this section describes the architectures of the mixture of classifiers that we propose for the task of aspect extraction. the convolutional neural network architecture is presented in section a, word2vec embedding is introduced in section b, the support vector machine classifier and the features used are introduced in section c, and the proposed mixture of classifiers in section d.

a. convolutional neural network

our cnn model is inspired by the text classification architecture proposed by kim [3] and the work done by toh et al. [5] for aspect extraction. in implementing the cnn, each sentence is represented by an $R^{n \times k}$ sentence feature matrix, where each row is the feature vector of the corresponding word; here n is the number of words in the sentence and k is the size of the feature vector. we only used the word vector of each word as features. even though the convolutional layer requires a sentence matrix of fixed size, customer reviews have different word counts; therefore, a padding tag was added to extend each sentence to a predefined length, so that all sentences have the same length.

1) baseline cnn: our baseline cnn is similar to the cnn presented by toh et al. [5]. in this model, a convolution layer with a window size of w is applied to the sentence feature matrix to generate new features. we use zero padding for the convolutional operations to generate feature maps with the same height as the sentence matrix. the max pooling layer is then applied to select the most important feature from each feature map, followed by a single hidden dense layer as proposed by toh et al. [5]. using the output of the last dense layer, the softmax layer computes the probability of each aspect being present in the sentence, and a predefined threshold value (th) is then used to assign each sentence to aspect categories according to these probability outputs. toh et al. [5] introduced another category for sentences with no aspects; however, we consider this redundant, since we can identify sentences without any aspects when all the probability values for the aspects are below the threshold.

2) improved cnn: the cnn model used by toh et al. [5] contains a convolution layer with a single kernel. since the convolutional kernel has a fixed window size, determining a value that captures most of the contextual information is a difficult task: with a small kernel, the convolutional layer may fail to capture contextual information and semantic relationships that span more than the selected kernel size, while a very large kernel can degrade feature quality by merging multiple pieces of contextual information into a single feature. therefore, the convolution layer of our improved cnn uses several convolutional kernels with different filter sizes and a single-step stride, thus generating a 1 × n feature map for each filter.
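as an illustration of this multi-kernel convolution, the keras-style sketch below (not the authors' code) builds two convolutional branches with window sizes 3 and 5 and 300 filters each, as later listed in table ii; the maximum sentence length of 100 words is taken from the training setup, while the 300-dimensional word vectors are an assumption.

from tensorflow import keras

# two kernels with window sizes 3 and 5, 300 filters each, stride 1 and
# zero padding, so every filter yields a length-n feature map
max_len, embed_dim = 100, 300                    # n = 100 words, 300-dim vectors (assumed)
sentence_matrix = keras.Input(shape=(max_len, embed_dim))
pooled = []
for window in (3, 5):
    fmap = keras.layers.Conv1D(300, window, strides=1, padding="same",
                               activation="tanh")(sentence_matrix)
    pooled.append(keras.layers.GlobalMaxPooling1D()(fmap))   # best feature per map
conv_features = keras.layers.Concatenate()(pooled)           # 600 pooled features in total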
use of a convolutional layer with multiple kernel sizes provides more flexibility for the cnn model to extract semantic relationships of various lengths as features. toh et al. [5] used only a single hidden dense layer with rectified linear unit (relu) activation. however, huang et al. [7] constructively proved that a two-hidden-layer feedforward network with $2\sqrt{(m+2)N}$ ($\ll N$) hidden units can learn $N$ distinct samples with an arbitrarily small error, where $m$ is the number of output neurons. if we consider the outputs from the convolutional layer as features and the softmax layer as the output layer with $m$ units, then we can implement the two-hidden-layer feedforward network in between those two layers, replacing the single hidden layer in the baseline cnn. therefore, we introduced two hidden layers $L_1$ and $L_2$ with $h_1$ and $h_2$ hidden units, respectively. the hidden units $h_1$ and $h_2$ are determined using equations (1) and (2) as proposed by huang et al. [7].
$h_1 = \sqrt{(m+2)N} + 2\sqrt{N/(m+2)}$ (1)
$h_2 = m\sqrt{N/(m+2)}$ (2)
kim [3] shows that using dropout to prevent co-adaptation of hidden units, by randomly dropping a proportion of hidden units, can significantly improve the cnn for general sentence classification tasks. therefore, we introduced a dropout layer instead of kernel regularization to our cnn implementation to perform dropout regularization [13] and prevent the model from over-fitting to the training data. figure 1 shows the network structure of our improved cnn. it presents the process of extracting convolutional features from the sentence matrix using two convolutional kernels. then the max pooling layer selects the best features from both convolutional feature matrices extracted by the two convolutional kernels. the output neurons from the max-pooling layers are transformed to class probability outputs using the two hidden layers and the softmax layer.
fig. 1 the architecture of our convolutional neural network
b. word2vec embedding
mikolov et al. [8] presented the cbow and skip-gram architectures to implement word2vec models. the cbow architecture predicts the current word based on the context (surrounding words), whereas the skip-gram architecture uses the current word to predict the surrounding words (context) [8]. kim [3] showed that in the absence of a large supervised training set, initializing the feature vectors using word2vec improves the performance of the cnn model for text classification tasks. even though toh et al. [5] and khalil and el-beltagy [11] have only used cbow-trained word2vec models to train cnn models for aspect extraction, a comparative study of the performance of cbow and skip-gram for initializing word embeddings to train cnn models for text classification is not available. thus, we tried both continuous bag of words (cbow) and skip-gram trained word2vec models to initialize the word embedding features for the improved cnn model. the word2vec models were trained using the yelp (https://www.yelp.com/dataset/_challenge) and amazon product review (http://jmcauley.ucsd.edu/data/amazon/) datasets.
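as an illustration of this step, the following sketch trains both cbow and skip-gram word2vec models with gensim (assuming gensim version 4 or later, where the dimensionality parameter is named vector_size). the tiny inline corpus and the chosen dimensions are placeholders, not the actual training configuration used for the yelp and amazon models.

```python
# illustrative sketch only: training cbow and skip-gram word2vec models with gensim.
from gensim.models import Word2Vec

corpus = [
    ["the", "pizza", "was", "great", "but", "the", "service", "was", "slow"],
    ["battery", "life", "of", "this", "laptop", "is", "poor"],
    ["great", "food", "and", "friendly", "service"],
]

cbow_model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=0)  # cbow
skip_model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)  # skip-gram

# word vectors of this kind are used to initialize the cnn embedding layer
print(skip_model.wv["service"].shape)   # (100,)
```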
in addition, we trained both cnn models with google's pre-trained word2vec (cbow-trained) model (https://code.google.com/archive/p/word2vec/), which was trained using over 3 million words and phrases. kim [3] presented the use of a non-static cnn instead of a static cnn to further fine-tune the word2vec embeddings during the training of the cnn model for text classification tasks. he found that the non-static cnn performs better for most of the tasks that he experimented on. however, toh et al. [5] and khalil and el-beltagy [11] followed only the static approach for aspect extraction, where the word2vec embeddings for each word are kept fixed during training. fine-tuning of word embedding features can be useful when using word2vec models that are trained on a corpus different from the dataset that is used to train the cnn model. especially for aspect extraction, if the two datasets are from different domains (e.g. restaurant reviews vs the laptop domain) and generated from different sources (e.g. online articles vs customer reviews), then the syntactic-semantic patterns and the vocabulary used may not be the same for both datasets. therefore, we experimented with both static and non-static model variations [3] of our improved cnn to test our hypothesis. toh et al. [5] used adadelta [14] as the update function. we used adam as the optimizer of both cnn models, which is shown to converge faster than most of the existing optimization techniques [15]. we used k-fold cross validation with k = 5 to determine the best neural network configuration and the values for the hyperparameters (except for $h_1$ and $h_2$). we set 100 as the maximum word count ($n$) for any sentence. table i shows the hyperparameters used with the baseline cnn, which are similar to the parameters selected by toh et al. [5]. table ii presents the hyperparameters of the improved cnn that are tuned for both domains using the cross-validation results and equations (1) and (2), which are used to determine the number of hidden units for each hidden layer.
table i: hyper-parameters of baseline cnn
layer | parameter | value
convolutional layer | window size | 5
convolutional layer | # of filters | 300
convolutional layer | activation | tanh
hidden layer | # of neurons | 100
hidden layer | activation | relu
training parameters | batch size | 50
training parameters | epochs | 50
training parameters | threshold | 0.2
table ii: hyper-parameters of improved cnn
layer | parameter | restaurant | laptop
convolutional layer | window size | 3, 5 | 3, 5
convolutional layer | # of filters | 300 each | 300 each
convolutional layer | activation | tanh | tanh
dropout layer | dropout rate | 0.7 | 0.7
hidden layer 1 | # of neurons | 191 | 467
hidden layer 1 | activation | relu | relu
hidden layer 2 | # of neurons | 143 | 445
hidden layer 2 | activation | relu | relu
training parameters | batch size | 50 | 50
training parameters | epochs | 60 | 60
training parameters | threshold | 0.2 | 0.2
c. support vector machine
we used the features used in jihan et al. [4] to create svms for aspect category classification. the multi-label classification required to assign aspect categories is performed with a one-vs-rest strategy, as the svm classifier itself is a binary classifier. therefore, following the one-vs-rest strategy, we used 12 and 82 svm classifiers for the restaurant and laptop domains, respectively. we used cross-validation for selecting the optimal parameters for the classifier.
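the following sketch shows the general shape of such a one-vs-rest svm with probability outputs using scikit-learn; the toy feature matrix and label matrix stand in for the hand-engineered features listed below and the per-category annotations, and are not the actual feature pipeline.

```python
# illustrative sketch only: a one-vs-rest svm aspect-category classifier with
# probability outputs. x and y below are toy stand-ins, not the real features.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 50))                   # toy feature vectors
y = (rng.random((200, 12)) > 0.9).astype(int)    # toy multi-label targets, 12 categories

clf = OneVsRestClassifier(SVC(kernel="linear", probability=True))
clf.fit(x, y)

proba = clf.predict_proba(x[:3])                 # per-category probabilities p_svm(c)
print(proba.shape)                               # (3, 12)
```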
following is the list of features we used in our research:
1) bag of words: the text represented as the multiset of its lemmatized words.
2) custom-built word lists: the count of words in a collection of food and drink names / laptop-related keywords.
3) frequent words: the count of frequent words per category, based on the tf-idf scores identified in the training dataset.
4) opinion targets: opinion targets extracted from the annotations in the training dataset; the count of words per required target is identified by opinion target.
5) symbols: the presence of price indicators and the presence of an exclamation mark.
6) ending words: a bag of the five words at the end of a sentence.
7) named entities: the presence of a person, organization, product or location in the text.
8) head nouns: the presence of head nouns extracted from the phrases of each sentence.
9) mean embedding: the mean embedding vector for each sentence, calculated using google's pre-trained word2vec model (https://code.google.com/archive/p/word2vec).
table iii: experimental results for static and non-static cnn with each word2vec model
word2vec | restaurant f1 (static) | restaurant f1 (non-static) | laptop f1 (static) | laptop f1 (non-static)
cbow trained | 0.6700 | 0.6849 | 0.4229 | 0.4422
skip-gram trained | 0.7405 | 0.7481 | 0.4694 | 0.4880
google word2vec | 0.7538 | 0.7596 | 0.4930 | 0.5174
fig. 2 f1-score against word2vec on the restaurant dataset
fig. 3 f1-score against word2vec on the laptop dataset
fig. 4 mixture of classifiers for aspect classification ($R$ indicates the input review sentence; svm and cnn refer to the pre-trained models discussed in previous sections; avg. represents the average function and $V_a$ is the output aspect vector)
d. mixture of classifiers
first, the cnn and the svms are trained individually following the procedure explained in section iv. each model can estimate the probability of each aspect being present in a given review. thus, in the mixture of classifiers, we consider the probability outputs from both models to determine the class labels of each prediction. let $p_k(c)$ denote the probability of class $c \in C$, where $k$ is either the cnn classifier or the one-vs-rest svm classifiers. the output probability of the mixture of classifiers $p_{mc}(c)$ is then defined as illustrated in equation (3). a visual illustration of the same is provided in fig. 4.
$p_{mc}(c) = \frac{p_{cnn}(c) + p_{svm}(c)}{2}$, for $c \in C$ (3)
in equation (3), the final probability of each class is computed by averaging the probability outputs of the two classifiers. the resulting probability is then considered the prediction of the mixture of classifiers. since the output is a probability value, we use a threshold to decide the actual classification, i.e. the predicted aspect labels. a suitable threshold is determined using k-fold cross validation (with settings similar to the hyperparameter tuning).
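a minimal sketch of this averaging and thresholding step is given below, assuming per-category probability vectors are already available from the trained cnn and svm models; the threshold of 0.3 matches the hybrid model reported in table iv.

```python
# illustrative sketch only: the averaging and thresholding step of equation (3).
import numpy as np

def mixture_predict(p_cnn, p_svm, threshold=0.3):
    """average the two probability vectors and threshold them into aspect labels."""
    p_mc = (np.asarray(p_cnn) + np.asarray(p_svm)) / 2.0   # equation (3)
    return (p_mc >= threshold).astype(int), p_mc

# toy example with three aspect categories
labels, p_mc = mixture_predict([0.8, 0.1, 0.4], [0.6, 0.2, 0.1])
print(labels, p_mc)   # [1 0 0] [0.7  0.15 0.25]
```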
table iv: result comparison with baseline and benchmark models
model | restaurant f1 | laptop f1
cnn (baseline) | 0.7356 | 0.4824
cnn (improved: l1 only) | 0.7492 | 0.5044
cnn (improved) | 0.7596 | 0.5174
svm [4] | 0.7418 | 0.5221
nlangp [5] | 0.7303 | 0.5194
mtna [12] | 0.7642 | -
hybrid model (t = 0.3) | 0.7717 | 0.5454
v. discussion
table iii presents the f1 scores of the improved cnn model for both the restaurant and laptop domains. results are shown for each word2vec type used to initialize the word vectors for training the cnn models. moreover, table iii shows the change in accuracy from static models to non-static models for each word2vec model used. fig. 2 and fig. 3 show the improvement of the models with different word2vec models for the static and non-static versions on the restaurant and laptop datasets, respectively. using skip-gram trained word2vec, we were able to increase the accuracy of the cnn model significantly compared to the cbow-trained word2vec model. this is not surprising, as it has been shown that skip-gram models are significantly better on semantic tasks than cbow models [8]. aspect extraction also mostly involves understanding the semantic word relationships rather than interpreting the syntactic relationships between words. however, the cnn model that used the pre-trained google word2vec model gave better accuracy than when using the other word2vec models that were trained using the yelp and amazon review datasets. this is because those review datasets are much smaller (in the number of documents and vocabulary) than the google news dataset that was used to train the pre-trained google word2vec. kim [3] shows that even though non-static cnn models are expected to perform better than static cnn models, this is not true for all cases. however, aspect extraction for the restaurant or laptop domain is a domain-specific task and it requires word vectors to be fine-tuned for that specific domain. therefore, non-static cnn models performed better than static cnn models with the fine-tuned word vectors for the considered task and domains. table iv shows the best f1 scores for both the baseline and improved cnn compared with the existing state-of-the-art systems. cnn (baseline) and cnn (improved) are the baseline cnn and improved cnn, respectively. we also added the results of the improved cnn before optimizing the number of hidden layers and hidden units; therefore, cnn (improved: l1 only) uses a single hidden layer with 100 hidden neurons, similar to the baseline model. the improved cnn has achieved a remarkable improvement compared to the baseline cnn model. this shows the significance of the modifications to the improved cnn model. if we compare cnn (baseline) and cnn (improved: l1 only), the modifications to the feature extraction and fine-tuning have shown a significant improvement of the cnn model. moreover, optimizing the number of hidden layers and hidden units using the two-hidden-layer feed-forward network that was proposed by huang et al. [7] has made a noticeable contribution to the overall improvement of the cnn models for both the restaurant and laptop domains. moreover, we can observe that the improved cnn shows a significant improvement for the restaurant domain. our cnn model outperforms the hybrid system presented by toh et al. [5] that combines both a cnn and a feedforward neural network (fnn), and the one-vs-rest svms presented by jihan et al. [4]. it is important to highlight that both of the above models use more features, including word embeddings, and they use strong classification models such as the fnn and svm. yet, we showed that even by adding a little flexibility to the cnn kernel with multiple kernels (e.g. cnn (improved: l1 only)), we can improve the feature selection to outperform the classification models that use both neural and traditional features. however, the cnn alone has failed to outperform mtna. in contrast to mtna, our cnn architecture is simpler.
hence, without compromising the simplicity or increasing the computational complexity of the cnn architecture, we have outperformed mtna using our mixture of classifiers, which utilizes both the automatically extracted features and the manually engineered features to extract aspects from customer reviews. even though our cnn model shows performance close to the laptop-domain results of both benchmark models [4, 5], it fails to outperform those models. we can explain this observation using the evaluation results of the static and non-static variations of the cnn model. we can observe a significant improvement for the non-static model when compared with the static version for the laptop domain, whereas for the restaurant review dataset that improvement is not as significant. therefore, we can assume that the google word2vec embeddings are semantically relevant to the restaurant domain, and less accurate for the laptop domain. the significant improvement from using the non-static cnn as opposed to the static cnn for the laptop domain provides evidence of the poor fit of the google word2vec embeddings for the laptop domain. the fine-tuning of the non-static model increased the results remarkably from 0.4930 to 0.5174, which brings us closer to the benchmark models. yet, this fine-tuning fails to improve the word embeddings beyond a certain level (otherwise we would eventually have observed the same accuracy with every word2vec model used). the benchmark models used additional features specially designed for each domain, whereas we used only the google pre-trained word2vec embeddings that are not optimized for the laptop domain, which explains the failure of our cnn model to outperform the benchmark models for the laptop domain. yet, our hybrid classifier has yielded a 4-5% accuracy gain compared to the state-of-the-art aspect extraction techniques in the laptop domain. the cnn model showed comparatively poor accuracy due to the insufficient domain-specific evidence available to the model. however, svms with manually engineered features have been shown to capture such domain-specific features remarkably well [4]. therefore, the use of the svm probabilities to strengthen the softmax outputs of the cnn classifier has allowed us to incorporate that domain-specific evidence into the final probability outcomes of the hybrid model. vi. conclusion this paper presents a mixture of classifiers for multi-domain aspect extraction, which can outperform the current state-of-the-art aspect extraction techniques by combining a cnn and one-vs-rest svm classifiers. first, we presented an improved cnn for aspect extraction, which can outperform the state-of-the-art systems when provided with well-trained word2vec embeddings. moreover, we showed that word embedding features generated using skip-gram trained models are better than the features from cbow-trained word2vec models for aspect extraction. we also demonstrated how the size and the domain of the corpus used can affect the accuracy of cnn models used for aspect extraction. our experiments show that non-static cnn models can be used to improve aspect extraction in the absence of word2vec models trained with domain-specific corpora. moreover, we have improved the cnn model by introducing a second hidden layer. we have shown that using the equations proposed by huang et al.
[7] to determine the number of hidden units of both layers can outperform the traditional cnn models with a single dense layer. we expect to further explore the effect of this modification on the cnn model for general text classification tasks. secondly, we showed that our improved cnn model can achieve comparable performance for both the restaurant and laptop domains, without any domain-specific hyperparameter optimizations. our experiments highlight an important observation: the same model can be used effectively in different domains with the same set of hyperparameters that was optimized for another domain. we are yet to determine the general applicability of this observation by experimenting with data sets from different domains. if the hyperparameter optimization of our improved cnn model proves to be domain independent, this will make the use of this cnn model on a new domain more straightforward, since no domain-specific parameter optimization is needed. finally, we derived a mixture of classifiers combining our improved cnn model with the svm classifiers based on state-of-the-art custom engineered features, without introducing additional complexity to the improved cnn architecture. we demonstrated that the combination of the cnn and svm classifiers can outperform the current best systems for both the restaurant and laptop domains. in the future, we expect to extend the cnn architecture and to experiment with new deep neural architectures for aspect extraction from multi-domain customer reviews. the attention technique can be a possible direction for further improving deep neural networks for the task of aspect extraction. moreover, exploring new ways of building embedding models that capture both general and domain-specific data can open a new avenue of research for both aspect extraction and text classification tasks. references [1] c. lin and y. he, “joint sentiment/topic model for sentiment analysis,” in proceedings of the 18th acm conference on information and knowledge management, 2009. [2] k. schouten and f. frasincar, “survey on aspect-level sentiment analysis,” ieee transactions on knowledge and data engineering, vol. 28, pp. 813-830, 2016. [3] y. kim, “convolutional neural networks for sentence classification,” arxiv preprint arxiv:1408.5882, 2014. [4] n. jihan, y. senarath, d. tennekoon, m. wickramarathne and s. ranathunga, “multi-domain aspect extraction using support vector machines,” in proceedings of the 29th conference on computational linguistics and speech processing (rocling 2017), taipei, 2017. [5] z. toh and j. su, “nlangp at semeval-2016 task 5: improving aspect based sentiment analysis using neural network features,” in semeval@naacl-hlt, 2016. [6] b. wang and m. liu, deep learning for aspect-based sentiment analysis, stanford university report, https://cs224d.stanford.edu/reports/wangbo.pdf, 2015. [7] g.-b. huang, “learning capability and storage capacity of two-hidden-layer feedforward networks,” ieee transactions on neural networks, vol. 14, pp. 274-281, 2003. [8] t. mikolov, k. chen, g. s. corrado and j. dean, “efficient estimation of word representations in vector space,” corr, vol. abs/1301.3781, 2013. [9] m. pontiki, d. galanis, h. papageorgiou, i. androutsopoulos, s. manandhar, m. al-smadi, m. al-ayyoub, y. zhao, b. qin, o. de clercq and others, “semeval-2016 task 5: aspect based sentiment analysis,” in proceedings of the 10th international workshop on semantic evaluation (semeval-2016), 2016. [10] j.
machácek, “butknot at semeval-2016 task 5: supervised machine learning with term substitution approach in aspect category detection,” in semeval@naacl-hlt, 2016. [11] t. khalil and s. r. el-beltagy, “niletmrg at semeval-2016 task 5: deep convolutional neural networks for aspect category and sentiment extraction,” in semeval@naacl-hlt, 2016. [12] w. xue, w. zhou, t. li and q. wang, “mtna: a neural multi-task model for aspect category classification and aspect term extraction on restaurant reviews,” in proceedings of the eighth international joint conference on natural language processing (volume 2: short papers), 2017. [13] n. srivastava, g. hinton, a. krizhevsky, i. sutskever and r. salakhutdinov, “dropout: a simple way to prevent neural networks from overfitting,” journal of machine learning research, vol. 15, pp. 1929-1958, 2014. [14] m. d. zeiler, “adadelta: an adaptive learning rate method,” arxiv preprint arxiv:1212.5701, 2012. [15] d. kingma and j. ba, “adam: a method for stochastic optimization,” arxiv preprint arxiv:1412.6980, 2014. [16] n. jihan, y. senarath and s. ranathunga, “aspect extraction from customer reviews using convolutional neural networks,” in 2018 18th international conference on advances in ict for emerging regions (icter), 2018.
artificial neural network ensembles in time series forecasting: an application of rainfall forecasting in sri lanka
harshani r. k. nagahamulla, uditha r. ratnayake, asanga ratnaweera
abstract— weather forecasting is a widely researched area in time series forecasting due to the necessity of accurate weather forecasts for various human activities. among the numerous weather forecasting techniques, the artificial neural network (ann) methodology is one of the most widely used. in this study the application of neural network ensembles in rainfall forecasting is investigated by using various types of ensemble neural networks (enn) to forecast the rainfall in colombo, sri lanka. ensembles are generated by changing the network architecture, changing the initial weights of the ann and changing the ann type. two ensembles are created: one consisting of a collection of networks with various architectures of the multi layer feed forward network with the back propagation algorithm (bpn), and the other consisting of a combination of bpn, radial basis function network (rbfn) and general regression neural network (grnn). the performance of the ensembles is compared with the performance of bpn, rbfn and grnn. the anns are trained, validated and tested using daily observed weather data for 41 years. the results of our experiment show that the performance of the ensemble models is better than that of the other models for this application, and that changing the network type gives better results than changing the architecture of the ann. index terms— rainfall forecasting, artificial neural networks (ann), multi layer feed forward network (mlffn), back propagation algorithm (bpn), radial basis function network (rbfn), general regression neural network (grnn), ensemble neural network (enn) i. introduction weather forecasting is predicting the state of the atmosphere for a certain location for a certain time period. data on previous and current atmospheric conditions and various scientific methods are used for this. rainfall is the liquid form of precipitation.
quantitative precipitation forecasts are very important for planning day to day human activities. also for an agricultural based country like sri lanka long term plans and management activities like flood management, water resource management and agricultural planning activities depend on accurate rainfall predictions. ancient people observed weather patterns and predicted weather according to past weather occurrences. nowadays there are many approaches to weather forecasting. mathematical modelling, statistical modelling and artificial intelligence techniques are some of them. using mathematical models of the atmosphere to predict future weather based on current weather conditions is called numerical weather prediction. this needs full knowledge of atmospheric dynamics and involves calculations with a large number of variables and huge datasets. although this process requires a lot of computational resources due to the advancement of modern computer hardware there have been many improvements in numerical weather prediction [1]. still there are difficulties in short term weather predictions because of sudden atmospheric changes. statistical weather forecasting methods mainly use time series analysis, time series forecasting and regression analysis using statistical models like autoregressive integrated moving average (arima). these methods give excellent results in forecasting pressure and temperature but faces difficulties in forecasting precipitation accurately because the distribution of precipitation is bounded and skewed [2]. nowadays a lot of researches are conducted on the use of intelligent techniques in rainfall forecasting. genetic algorithms, fuzzy systems and ann are some of them. ann is a forecasting tool that can handle complicated data efficiently. different types of ann exhibit different advantages. a collection of a finite number of ann trained for the same task is called an ensemble neural network. hansen and salamon [3] explains that the generalisation ability of an ann can be significantly improved through an enn and the advantages of separate ann can be combined to give a better result. in an ensemble separate ann are trained individually and then their outputs are combined. the objective of this study is to investigate the appropriateness of enn in rainfall forecasting and compare different methods of ensemble generating techniques. the performance of enn is compared with bpn, rbfn and grnn models. the rest of this paper is organised as follows. section ii reviews the usage of ann and ensembles in forecasting. section iii describes the ann methodology, bpn, rbfn, grnn and the ensemble techniques and a description of our methodology and the experimental setup. section iv presents the results of our study, section v contains an analysis of the obtained results as a discussion and section vi concludes the paper. ii. related work in recent years many researches were conducted on forecasting with ann. a brief analysis on a few of them are included here. kuligowski and barros [4] used an ann model and a linear regression model to forecast six hourly precipitation amounts on four locations in middle atlantic region of the united states and found that the ann model gives better results for heavy rainfall. they have used a dataset with 528 possible predictor variables and forward screening regression was used to select the best predictor variables. 
santhanam and subhajini [5] evaluated the performance of rbfn with bpn to identify which ann is the most effective on classification of rainfall prediction for kanya kumari district in india using 10 years meteorological data. they have classified the rainfall into rain and no rain. according to their study the rbfn was the most effective method with an accuracy of 88.5%. santhanam and subhajini [6] has extended the same study by including grnn in the performance analysis and evaluating the performance of grnn, rbfn and bpn to identify which ann is the most effective on classification of rainfall prediction. they have found that the grnn was the most effective method with an accuracy of 96.8% and have outperformed bpn and rbfn in identifying both rain and no rain situations. the grnn have a parallel structure which allow the network to train in a single iteration and as the size of the dataset increases the error approaches towards zero. these properties enable the grnn to outperform rbfn. leung et al [7] used a grnn to predict the monthly exchange rate of three currencies british pound, canadian dollar and the japanese yen. their results show that grnn is an effective method for financial forecasting problems. luk et al [8] used a mlffn to forecast short duration rainfall at specific locations within a catchment area. it had given reasonable prediction for the next 15 minutes but had difficulty in predicting the peak values of rainfall. they have found that the networks with simple structure yielded better performance and the networks with more hidden nodes tended to over learn the training data. lee and liu [9] developed an automatic system for weather information gathering, filtering and prediction. they have used an agent based approach where a mobile agent was used to gather data and a fuzzy-neuro system to predict rainfall. the neural network was trained using back propagation algorithm. they have only considered the rain depth and were not interested in the exact amount of rainfall. gheyas and smith [10] developed an ensemble with a collection of grnn for time series prediction. they have used synthetic and real datasets to test their model. they found that the ensemble method gives better predictions compared with bpn and grnn and other statistical forecasting methods. maqsood, khan and abraham [11] developed an ensemble neural network model for weather prediction in southern saskatchewan, canada. in this model weights for each network in the ensemble were determined dynamically from the respective certainties of the network outputs. 24 hour ahead forecasts were made for temperature, wind speed and relative humidity. the performance of the ensemble was compared with different types of single networks and statistical models and it was found that the ensemble model gives the best performance. the above studies indicate that various types of ann models yield reasonable results in forecasting applications and they outperform statistical models. even though there are many researches done on forecasting using ensemble techniques in weather forecasting no studies were found on predicting the actual rainfall amount using ensembles. in this study the applicability of ensembles in forecasting the rainfall amount in sri lanka is investigated by developing different enn models. iii. methodology a. study area the study area was selected to be colombo which is located on western coast of sri lanka on north latitude 6º 55' and east longitude 79º 52'. 
rainfall of sri lanka is influenced by the monsoon winds from the indian ocean. rainfall is categorised into four climate seasons: the first inter-monsoon season from march to april, the southwest monsoon season from may to september, the second inter-monsoon season from october to november and the northeast monsoon from december to february. rainfall occurs in three types: monsoonal, convectional and depressional. monsoon rain occurs during the two monsoon periods and is responsible for nearly 55% of the annual precipitation. the other types of rainfall occur in the inter-monsoon periods. colombo has a tropical monsoon climate. although colombo experiences rain in all four seasons, heavy rains occur from may to august from the southwest monsoon and from october to february. the annual rainfall of colombo is about 240 cm. b. data collection the data used for the predictor variables was the ncep_1961-2001 dataset. the dataset contains 41 years of daily observed data from 1961 to 2001, derived from the ncep reanalysis [12]. it contains 26 variables as described in table i. the data set was obtained from the canadian climate change scenarios website (http://www.cccsn.ca) for the grid box 22 x, 32 y, where the middle of the grid box corresponds to north latitude 7º 5' and east longitude 78º 75'. the daily ncep values are the average of 4 values taken at 0z, 6z, 12z and 18z. (z time is in reference to 0º longitude at greenwich, england.) daily rainfall data for colombo for 41 years (1961-2001) have been collected from the department of meteorology sri lanka (http://www.meteo.gov.lk/). this dataset was used as the output of the ann. the ann models were trained using the first 25 years (1961-1985) of data as training data. they were validated using the next eight years (1986-1993) of data as validation data and were tested using the remaining eight years (1994-2001) of data as test data.
table i: predictor variables
variable | description
mslpas | mean sea level pressure
p_fas | surface airflow strength
p_uas | surface zonal velocity
p_vas | surface meridional velocity
p_zas | surface velocity
p_thas | surface wind direction
p_zhas | surface divergence
p5_fas | 500 hpa airflow strength
p5_uas | 500 hpa zonal velocity
p5_vas | 500 hpa meridional velocity
p5_zas | 500 hpa velocity
p500as | 500 hpa geopotential height
p5thas | 500 hpa wind direction
p5zhas | 500 hpa divergence
p8_fas | 850 hpa airflow strength
p8_uas | 850 hpa zonal velocity
p8_vas | 850 hpa meridional velocity
p8_zas | 850 hpa velocity
p850as | 850 hpa geopotential height
p8thas | 850 hpa wind direction
p8zhas | 850 hpa divergence
r500as | relative humidity at 500 hpa
r850as | relative humidity at 850 hpa
rhumas | near surface relative humidity
shumas | surface specific humidity
tempas | mean temperature at 2m
c. predictor variable selection
the ncep_1961-2001 dataset was normalised over the complete period, i.e. the mean and standard deviation for the period were calculated and the mean was subtracted from each daily value before dividing by the standard deviation. the predictor variables were in different ranges. the ann can learn faster and find weights in a predictable range if all inputs are in similar ranges. to equalise the importance of the predictor variables, all predictor variables were scaled to a range from -1 to 1. to choose the predictor variables, a correlation analysis was first performed. the pearson correlation coefficient between rainfall and each variable in the dataset was calculated. as summarised in table ii, the correlation coefficients were very small. then principal component analysis was performed.
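the following sketch illustrates this preprocessing and the subsequent principal component analysis: z-score normalisation, scaling to [-1, 1], and checking how many components are needed to cover about 80% of the variance. the toy random matrix stands in for the 26 ncep predictor variables.

```python
# illustrative sketch only: z-score normalisation, scaling to [-1, 1], and a pca
# to see how many components cover ~80% of the variance. toy data, not ncep data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
predictors = rng.normal(size=(1000, 26))                  # toy daily predictor matrix

z = (predictors - predictors.mean(axis=0)) / predictors.std(axis=0)      # z-score
scaled = 2 * (z - z.min(axis=0)) / (z.max(axis=0) - z.min(axis=0)) - 1   # to [-1, 1]

pca = PCA().fit(scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.searchsorted(cumulative, 0.80) + 1)  # first k reaching 80%
print(n_components, cumulative[:n_components])
```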
according to the eigen analysis, it was decided to use 8 components, which represent 80.1% of the variance. the correlations between the variables and the components were small, with very few larger values, making it difficult to identify the principal components that represent the dataset well. finally, it was decided to use all 26 variables as predictor variables. the results of the principal component analysis are given in appendix a: table vii shows the eigen analysis of the correlation matrix and table viii shows the correlations between the variables and the components.
table ii: correlation coefficients of rainfall and the other variables
variable | correlation coefficient
mslpas | -0.078
p_fas | -0.094
p_uas | -0.002
p_vas | 0.061
p_zas | 0.068
p_thas | -0.028
p_zhas | -0.056
p5_fas | 0.048
p5_uas | -0.021
p5_vas | 0.060
p5_zas | 0.076
p500as | -0.040
p5thas | 0.032
p5zhas | -0.063
p8_fas | -0.041
p8_uas | -0.011
p8_vas | 0.040
p8_zas | 0.074
p850as | -0.072
p8thas | -0.027
p8zhas | -0.044
r500as | 0.105
r850as | 0.113
rhumas | 0.208
shumas | 0.158
tempas | -0.039
d. artificial neural network methodology
an ann is a computational model motivated by biological neural networks. anns consist of a large number of processing elements that can work in parallel. anns are mostly used in classification (pattern recognition) and prediction problems because they can derive meaning from data that are too complex to be handled by humans. anns can easily manipulate large volumes of data with noise and imprecise information. due to these characteristics, anns are ideal for handling complex and imprecise weather data and providing an accurate rainfall prediction. anns have to be trained, validated and tested before being used in an application. training is the process of adjusting the network parameters to represent the data set. once trained, the network can give the output for a given set of input data it has not seen previously. this is called generalisation. when training the network there is a chance of the network adjusting its parameters to match only the training data set. this is called over-fitting. validation data is used to avoid over-fitting and to stop the training at an appropriate time. testing is the process of checking the network to see whether it can match an unseen data set.
e. multi layer feed forward network with back propagation algorithm
the mlffn is a basic ann architecture with at least one hidden layer. the hidden layers provide nonlinearities to the network. the basic principle of an mlffn is that all the connections point in one direction so that the data flow from the input layer to the output layer. bpn is a kind of gradient descent technique with backward error propagation. the initial network output is compared with the expected output and the network parameters are adjusted until the error becomes minimal. the performance of the bpn increases with the size of the data set available. one major limitation of bpn is that it is prone to converge to local minima. mlffns with varying architectures were implemented by changing the number of hidden layers (one, two), the number of nodes per layer (5, 6, 7, 8, 9, 10, 11, 12 in the first hidden layer and 3, 4 in the second layer) and the activation functions for the hidden and the output layers (sigmoid (1) and gaussian (2)). the input layer had 26 nodes, one for each predictor variable, and the output layer had one node. to train the networks the bpn algorithm was used. fig 1 depicts the architecture of the bpn.
$f(x) = \frac{1}{1 + e^{-4x}}$ (1)
$f(x) = \exp(-x^2/4)$ (2)
the weights on the hidden and the output layers were calculated according to the following:
$w_{ij}(t+1) = w_{ij}(t) + \eta\,\delta_{pj}\,x_i$ (3)
where the learning rate $\eta$ = 0.7. the mlffn was trained until the root mean square error (rmse) of the training set was less than 0.1.
fig 1. architecture of the multi layer feed forward network with back propagation algorithm.
f. radial basis function network
an rbfn is a three-layer feed forward network whose output units form a linear combination of the basis functions computed by the hidden units. the basis function (activation function) in the hidden layer produces a localised response to the inputs. the most common basis function used is the gaussian function [13]. the network complexity and its generalisation capability depend on the number of neurons in the hidden layer. learning in the rbfn can be divided into two stages: learning in the hidden layer using an unsupervised learning method, followed by learning in the output layer using a supervised learning method [13]. the rbfn can be used for both classification and function approximation. an rbfn with three layers, 26 nodes in the input layer (one for each predictor variable) and one node in the output layer, was developed. the number of nodes in the hidden layer was varied methodically to find the number of nodes that gave the best prediction. the hidden layer was trained using the k-means clustering algorithm and the output layer was trained using the gradient descent algorithm. in the k-means clustering algorithm the centre of each cluster was initialised to a different randomly selected training pattern. then each training pattern was assigned to the nearest cluster by calculating the euclidean distances between the training patterns and the cluster centres. when all training patterns were assigned, the average position for each cluster centre was calculated; these then became the new cluster centres. this process was repeated until the cluster centres did not change during subsequent iterations. fig 2 depicts the architecture of the rbfn. the gaussian function (4) was used as the rbf in the hidden layer.
fig 2. architecture of the radial basis function network.
$o_j = \exp\left[-\frac{(x - w_j)^{\top}(x - w_j)}{2\sigma_j^2}\right]$ (4)
where $\sigma_j$ is the normalisation factor. equation (3) was used to calculate the weights on the output layer with the learning rate $\eta$ = 0.7.
g. general regression neural network
the grnn is a probabilistic neural network proposed by d. f. specht [14]. the grnn is popular in forecasting applications because there is no iterative learning due to its parallel structure. the main advantage of the grnn is that it can learn in a single iteration. another advantage is that it can converge to the underlying function with very few training samples compared with the bpn. as the data set grows, the error approaches zero. because of these characteristics grnns are widely used in forecasting applications. there are four layers in a grnn: the input layer, the hidden layer, the summation layer and the output layer. the input layer consists of one node per predictor variable. the hidden layer consists of one node per training sample; this layer computes the euclidean distance of the test case from the node's centre point. the summation layer has only two nodes: the denominator, which adds up the weight values from the hidden layer, and the numerator, which adds up the weight values multiplied by the actual target values of the hidden neurons. the output layer consists of one node, which divides the value of the numerator node by the value of the denominator node.
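the following is a minimal sketch of a grnn prediction in the spirit of equation (4) above and equations (5)-(7) given in the next paragraph: gaussian pattern-layer activations over the stored training patterns, followed by a weighted average of the training targets. the smoothing parameter and the toy data are assumed values, not those of the study.

```python
# illustrative sketch only: a minimal grnn prediction with a gaussian kernel and
# a weighted average of the training targets. toy data, not the ncep/rainfall data.
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma=0.5):
    d2 = ((x_train - x_query) ** 2).sum(axis=1)      # squared euclidean distances
    theta = np.exp(-d2 / (2.0 * sigma ** 2))         # pattern-layer outputs (eq. 4)
    numerator = (theta * y_train).sum()              # summation layer, numerator
    denominator = theta.sum()                        # summation layer, denominator
    return numerator / denominator                   # output node

rng = np.random.default_rng(0)
x_train = rng.normal(size=(50, 26))                  # toy normalised predictors
y_train = rng.gamma(shape=2.0, scale=3.0, size=50)   # toy rainfall amounts (mm)
print(grnn_predict(x_train, y_train, x_train[0]))
```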
a grnn with four layers, consisting of 26 nodes in the input layer (one for each predictor variable), a node for each input pattern in the hidden layer, two nodes in the summation layer and one node in the output layer (26, 9131, 2, 1), was developed (fig 3). the input layer receives the input vector and distributes the data to the pattern layer. the pattern layer calculates $o_j$ from the euclidean distance of each case to the node's centre point using (4). the numerator uses (5) to add the weight values multiplied by the actual target values of the hidden neurons, the denominator uses (6) to add the weight values from the hidden layer, and the output node uses (7) to divide the value of the numerator node by the value of the denominator node.
$s_j = \sum_i w_{ij}\,\theta_i$ (5)
$s_d = \sum_i \theta_i$ (6)
$y_j = s_j / s_d$ (7)
fig 3. architecture of the general regression neural network.
h. neural network ensembles
training a finite number of anns for the same task and combining their results is known as an ann ensemble. hansen and salamon's work [3] shows that the generalisation ability of an ann increases through an ensemble. due to this, ann ensemble techniques have become very popular in ann applications. creating an ensemble involves two phases: training the individual anns and combining them.
1) training the individual networks
there are many different ensemble techniques, as explained by sharkey [15]. these methods are based on varying the parameters related to the design and the training of the ann, such as:
· varying the initial random weights – the ensemble is created by networks trained with different initial random weights.
· varying the network architecture – the ensemble is created by networks with different network architectures (number of hidden layers, number of nodes in the hidden layers, activation function).
· varying the network type – the ensemble is created with different network types.
· varying the training data – the ensemble is created with networks trained on different training data sets (different data sets obtained by sampling the training data, data sets obtained from different sources, data sets obtained by different preprocessing phases).
past research [15], [16] has shown that combining these techniques can give better generalisation in the ensemble. after training the set of anns a decision has to be made about which anns are to be included in the ensemble. there are various techniques available for this, from trial and error methods to genetic optimisation techniques like addemup [17]. in this study two ensembles were created by combining some of the above techniques:
· enn1 – varying the network architecture and varying the initial weights.
· enn2 – varying the network type and varying the initial weights.
2) combining the networks
to combine the selected anns, the most frequently used methods are the average and weighted average methods. the simple average method gives the same priority to all anns; it does not consider that some anns are more accurate than others. in the weighted average method each ann is assigned a weight so as to minimise the mean squared error (mse) of the ensemble.
fig 4. flow chart for creating enn1.
enn1 was created with 12 bpns created with different network architectures. the network architecture was changed by changing the number of hidden layers (one, two), the number of nodes per layer (5, 6, 7, 8, 9, 10, 11, 12 in the first hidden layer and 3, 4 in the second layer) and the activation functions for the hidden and the output layers (sigmoid (1) and gaussian (2)). each different bpn was trained several times with different initial weights.
fig. 4 illustrates the creation of enn1 using a flow chart. the bpn, rbfn and grnn models each show different advantages and different generalisation capabilities. to incorporate all these advantages and to improve the generalisation of the ensemble, it was decided to use bpn, rbfn and grnn models to create enn2. the networks were trained using different random initial weights and using the same training and validation datasets. fig. 5 illustrates the creation of enn2 using a flow chart. the architecture of enn2 is illustrated in fig. 6.
fig 5. flow chart for creating enn2.
for both enn1 and enn2, a trial and error method was used to identify the anns that generate the best performing ensemble by trying out different combinations of the trained anns.
fig 6. architecture of enn2.
the anns were combined using the weighted average method. weights were assigned to minimise the mse of the ensemble. the validation dataset was provided to the ensemble and to each ann, and the mse was calculated separately. the weights were assigned according to each mse such that the summation of all weights is equal to one:
$\sum_i w_i = 1$ (8)
each previously trained bpn, rbfn and grnn provided a separate prediction and those predictions were combined using the weighted average method to get the final forecast of the ensemble.
i. measuring forecasting accuracy
the accuracy of a forecasting model is how well the forecasting model is able to reproduce data that is already known. in this study the root mean square error (rmse) (9), the mean absolute error (mae) (10) and the coefficient of determination r² are used as measurements of accuracy.
$rmse = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}$ (9)
$mae = \frac{1}{n}\sum_{i=1}^{n} |e_i|$ (10)
where $e_i$ is the difference between the actual value and the predicted value. the smaller the rmse and mae values, the more stable the model, but if there are a few larger errors the rmse value will magnify those errors and as a result will be larger. in such a case the mae is a better measurement.
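the sketch below illustrates these two steps under stated assumptions: member forecasts are combined with a weighted average whose weights sum to one, and the combined forecast is scored with rmse and mae as in equations (9) and (10). the inverse-mse weighting used here is an assumed choice consistent with the description above, not necessarily the exact weighting rule of the study.

```python
# illustrative sketch only: weighted-average ensemble combination and rmse/mae.
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def ensemble_forecast(member_preds, val_mses):
    w = 1.0 / np.asarray(val_mses)                    # assumed inverse-mse weighting
    w = w / w.sum()                                   # weights sum to one (eq. 8)
    return w, np.average(member_preds, axis=0, weights=w)

def rmse(y_true, y_pred):
    e = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(e ** 2)))            # eq. (9)

def mae(y_true, y_pred):
    e = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.abs(e)))                  # eq. (10)

# toy example: three member models predicting five rainfall values (mm)
preds = np.array([[2.0, 0.0, 11.0, 4.0, 0.5],
                  [1.5, 0.2, 9.0, 5.0, 0.0],
                  [3.0, 0.1, 12.0, 3.5, 1.0]])
actual = np.array([2.5, 0.0, 10.0, 4.5, 0.0])
w, combined = ensemble_forecast(preds, [mse(actual, p) for p in preds])
print(w, rmse(actual, combined), mae(actual, combined))
```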
iv. results
all ann models in the study were trained with daily data of 25 years (1961-1985), validated with the data of the next eight years (1986-1993), and the trained networks were tested using the remaining eight years (1994-2001).
a. training results of the bpn models
bpn models were created by varying their architecture, increasing the number of nodes in the hidden layers from (26,5,1) to (26,12,4,1) and changing the activation functions. each network was trained several times with different initial weights, and it was noted that the rmse value for testing decreases when the number of nodes is increased and then starts to increase again. the best performance was given by a bpn with a (26,10,1) architecture and the sigmoid function as the activation function. it was also noted that in many instances the bpns with the sigmoid function as the activation function performed better than the bpns with the gaussian function as the activation function. the average rmse values obtained by each bpn for testing are summarised in table iii.
table iii: average rmse values for testing of the bpn models
nodes in hidden layer 1 | nodes in hidden layer 2 | sigmoid | gaussian
5 | - | 9.87 | 9.85
5 | 3 | 9.92 | 9.97
5 | 4 | 9.95 | 9.93
6 | - | 9.84 | 9.78
6 | 3 | 9.88 | 9.91
6 | 4 | 9.91 | 9.86
7 | - | 9.76 | 9.77
7 | 3 | 9.82 | 9.84
7 | 4 | 9.80 | 9.88
8 | - | 9.70 | 9.73
8 | 3 | 9.77 | 9.81
8 | 4 | 9.73 | 9.84
9 | - | 9.64 | 9.68
9 | 3 | 9.78 | 9.74
9 | 4 | 9.69 | 9.71
10 | - | 9.51 | 9.62
10 | 3 | 9.66 | 9.77
10 | 4 | 9.82 | 9.86
11 | - | 9.60 | 9.57
11 | 3 | 9.81 | 9.93
11 | 4 | 9.79 | 9.82
12 | - | 9.72 | 9.74
12 | 3 | 9.83 | 9.85
12 | 4 | 9.87 | 9.91
b. training results of the rbfn models
the generalisation power of the rbfn depends on the number of hidden nodes. to find the number of nodes in the hidden layer of the rbfn that gives the minimum rmse, the number of hidden nodes in the rbfn was incremented gradually and the networks were trained and tested several times using different initial weights. similar to the performance of the bpn, it was noted that after some variations the rmse value for testing decreases when the number of nodes is increased and then starts to increase again. the best performance was given by an rbfn with a (26,73,1) architecture. the average rmse values obtained by each rbfn for testing are summarised in fig. 7.
fig 7. average rmse obtained by each rbfn for testing.
c. training results of the grnn models
a grnn with a (26,9131,2,1) architecture was created and trained several times with different initial weights. it was noted that, unlike the bpn and rbfn, the performance of the grnn does not change much with the initial weights.
d. training results of the enn1 model
enn1 was created by combining some of the trained bpn models using the weighted average method. the ensemble was started from two networks and networks were added one by one. the performance of enn1 as the networks are added is illustrated in fig. 8.
fig 8. rmse of enn1 against the number of networks in enn1.
the best performance was obtained by enn1 when the number of networks was six and seven. the rmse obtained in both cases was 8.21.
e. training results of the enn2 model
enn2 was created by combining some of the trained bpn, rbfn and grnn models using the weighted average method. the ensemble was started from two networks and networks were added to it one by one using a trial and error method. the best performance was obtained by enn2 when eight bpns, two rbfns and a grnn model were in the ensemble. the rmse obtained was 8.06. table iv summarises some of the combinations of networks used in the ensemble.
table iv: combinations of networks in enn2 and their rmse
bpn | rbfn | grnn | rmse
8 | 2 | 1 | 8.06
7 | 2 | 1 | 8.07
6 | 2 | 1 | 8.09
8 | 1 | 1 | 8.12
7 | 3 | 1 | 8.14
adding a bpn to the ensemble made only a very small difference in the rmse of the ensemble. the weights of each network in the best performing enn2 are given in table v.
table v: weights of the networks in enn2
ann type | individual rmse after training | weight in the enn2
bpn 1 | 9.44 | 0.03
bpn 2 | 9.48 | 0.03
bpn 3 | 9.49 | 0.02
bpn 4 | 9.52 | 0.02
bpn 5 | 9.57 | 0.01
bpn 6 | 9.60 | 0.01
bpn 7 | 9.61 | 0.01
bpn 8 | 9.63 | 0.01
rbfn 1 | 8.69 | 0.11
rbfn 2 | 8.86 | 0.14
grnn | 8.22 | 0.61
table vi summarises the performance of each model on the test data by the rmse value, the mae value and the coefficient of determination (r²).
table vi: performance of bpn, rbfn, grnn, enn1 and enn2
ann type | rmse | mae | r² | training time (s)
bpn | 9.44 | 5.36 | 0.47 | 14573
rbfn | 8.69 | 5.04 | 0.55 | 573
grnn | 8.22 | 4.76 | 0.60 | 10
enn1 | 8.21 | 4.74 | 0.60 | -
enn2 | 8.06 | 4.61 | 0.62 | -
the bpn gives the highest rmse and mae and the lowest r², and enn2 gives the lowest rmse and mae and the highest r². the graphs of expected output vs. network output for bpn, rbfn, grnn, enn1 and enn2 are given in appendix b, fig 8.
v. discussion
compared with the performances of the bpn, rbfn and grnn, the two ensemble models enn1 and enn2 showed better performance. of the two ensembles, enn2 showed the best performance. enn1 was created by varying the network architecture and the initial weights, and enn2 was created by varying the network type and the initial weights.
results obtained by the study indicates that ensembles of networks gives better performance than individual networks and that varying the network type is more suitable for creating ensembles than varying the network architecture. the ensemble models were able to predict almost all occurrences of zero rainfall accurately and were able to give reasonably accurate predictions for other rainfall occurrences. but for rainfall larger than 100 mm the predicted values by the ensembles were very small. the rmse value is larger than the mae value by a considerable amount for all ann models. this is because all models were unable to predict the higher rainfall accurately and these large errors were magnified by the rmse value. bpn was able to predict smaller rainfall values accurately but was unable to predict higher rainfall accurately. higher rainfall predictions from grnn were more accurate than that of rbfn. bpn, rbfn and grnn were all able to predict the occurrences of zero rainfall accurately with rbfn giving the best performance but all performed poorly in predicting rainfall larger than 100 mm. vi. conclusion and future work although the ensemble model predicts the rainfall with reasonable accuracy there are still deviations between the ensemble predictions and actual rainfall for higher rainfall occurrences. these high rainfall occurrences maybe the result of some random changes in the atmosphere. in this study the ensembles were created by changing the network architecture and initial network weight (enn1) and network type and initial network weights (enn2). the ann for the ensemble was selected using trial and error method and the weighted average method was used to combine the selected ann. further investigation on obtaining more accurate prediction for higher rainfall is to be done in future studies by experimenting with other ensemble techniques like changing the training data and network selection and combining methods. ann has become a widely used technique in weather prediction because of their ability to manipulate complex data and deal with noise efficiently. in this study the performance of two ensemble networks enn1 and enn2 in rainfall forecasting in colombo, sri lanka was studied and its performance was compared with bpn, rbfn and grnn models. daily observed data for 41 years was used to train, validate and test the models. the objective of this study was to investigate the performance of ensemble networks in rainfall forecasting. the results show that ensemble models predict rainfall more accurately than individual bpn, rbfn and grnn models. the major weakness of both ensemble models are that they were unable to predict higher rainfall accurately. appendix a result of the principle component analysis is given below. table vii shows the eigen analysis of the correlation matrix. 
table vii eigen analysis of the correlation matrix eigen value proportion cumulative 9.1131 0.338 0.338 3.2012 0.119 0.456 2.8366 0.105 0.561 1.8493 0.068 0.630 1.4728 0.055 0.684 1.1878 0.044 0.728 1.0192 0.038 0.766 0.9472 0.035 0.801 0.8275 0.031 0.832 0.7781 0.029 0.860 0.6545 0.024 0.885 0.5606 0.021 0.905 eigen value proportion cumulative 0.5049 0.019 0.924 0.4663 0.017 0.941 0.3431 0.013 0.954 0.2529 0.009 0.964 0.2351 0.009 0.972 0.2156 0.008 0.980 0.1564 0.006 0.986 0.1141 0.004 0.990 0.0784 0.003 0.993 0.0508 0.002 0.995 0.0478 0.002 0.997 0.0318 0.001 0.998 0.0262 0.001 0.999 0.0230 0.001 1.000 0.0058 0.000 1.000 according to eigen analysis 8 components were used to represent 80.1% in the principle component analysis. table vii shows the correlations between the variables and the components. table viii correlations between the variables and the components variable pc1 pc2 pc3 pc4 pc5 pc6 pc7 pc8 mslpas -0.214 -0.148 -0.225 0.249 -0.291 0.023 -0.223 0.151 p_fas 0.226 -0.115 -0.063 0.193 -0.008 -0.296 -0.147 -0.066 p_uas 0.281 0.076 0.224 0.096 -0.032 -0.018 0.004 -0.055 p_vas -0.287 0.164 -0.064 0.003 0.078 -0.032 0.135 0.106 p_zas -0.129 0.025 0.324 -0.321 0.150 0.093 -0.159 0.137 p_thas -0.230 -0.131 -0.260 -0.040 0.077 -0.046 -0.039 0.087 p_zhas 0.269 -0.225 0.045 -0.042 -0.072 0.029 -0.119 -0.089 p5_fas 0.004 -0.062 -0.065 -0.143 0.176 -0.736 0.142 0.116 p5_uas 0.220 0.096 -0.033 0.300 0.107 0.172 0.142 0.350 p5_vas -0.107 0.410 0.038 0.224 0.088 -0.170 -0.240 -0.121 p5_zas 0.108 0.109 -0.090 -0.240 0.171 0.054 -0.577 0.364 p500as -0.151 -0.128 0.277 0.250 -0.438 -0.164 -0.126 0.105 p5thas -0.185 -0.079 0.021 -0.326 -0.129 -0.208 -0.167 -0.415 p5zhas 0.104 -0.419 -0.039 -0.220 -0.074 0.172 0.248 0.120 p8_fas 0.236 -0.051 -0.133 0.181 0.069 -0.372 0.024 0.080 p8_uas 0.295 0.107 0.016 0.182 -0.011 -0.037 0.042 -0.013 p8_vas -0.243 0.288 0.063 0.097 0.070 0.018 0.131 -0.006 p8_zas 0.193 0.172 0.059 -0.145 -0.036 -0.011 -0.421 0.095 p850as -0.236 -0.177 0.003 0.237 -0.341 -0.012 -0.195 0.173 variable pc1 pc2 pc3 pc4 pc5 pc6 pc7 pc8 p8thas -0.250 -0.144 -0.067 -0.136 0.075 -0.106 -0.008 0.061 p8zhas 0.240 -0.301 -0.073 -0.115 -0.054 -0.003 -0.125 -0.006 r500as 0.136 0.182 -0.209 -0.126 -0.195 0.057 0.091 -0.120 r850as 0.045 0.197 -0.289 -0.139 -0.216 -0.077 0.082 0.039 rhumas 0.122 0.277 -0.148 -0.211 -0.442 0.006 0.061 -0.090 shumas 0.093 0.157 0.403 -0.164 -0.346 -0.113 0.105 0.014 tempas -0.038 -0.116 0.526 0.031 0.093 -0.093 0.062 0.087 appendix b fig 8. the graphs of expected output vs network output for bpn, rbfn, grnn, enn1 and enn2 with the corresponding linear regression lines. references [1] lynch p. (2008). the origins of computer weather prediction and climatic modelling. jounal on computational physics, 227: 3431–3444. [2] malone t. f. (1955). application of statistical methods in weather prediction. geophysics, 41: 806–815. [3] hansen l. k. and salamon p. (1990). neural network ensembles. ieee trans pattern anal, 12(10): 993-1001, [4] kuligowski r. j. and barros a. p. (1998). localized percipitation forecasts from a numerical weather prediction model using artificial neural networks. journal on weather and forecasting, 13: 1194-1204. [5] santhanam t. and subhajini a. c. (2011). an efficient weather forecasting system using radial basis neural network. journal of computer science, 7: 962-966. [6] santhanam t. and subhajini a. c. (2011). computational performance of grnn in weather frecasting. asian journal on information technology, 10: 165-169. [7] leung m. t. 
chen a. and daouk h. (2000). forecasting exchange rates using general regression neural networks. computers and operations research, 27: 1093-1110. [8] luk k. c., ball j. e. and sharma a. (2001). an application of artificial neural networks for rainfall forecasting. mathematical and computer modeling, 33: 683-693. [9] lee r. and liu j. (2004). ijade weatherman: a weather forecasting system using intelligent multiagent-based fuzzy neuro network. ieee transactions on systems, man and cybernetics – part c: applications and reviews, 34(3): 369-377. [10] gheyas i. a. and smith l. s. (2009). a neural network approach to time series forecasting. proceedings of the world congress on engineering, ii. [11] maqsood i., khan m. r. and abraham a. (2004). an ensemble of neural networks for weather forecasting. neural computing and applications, 13: 112-122. [12] kalnay e., et al. (1996). the ncep/ncar 40-year reanalysis project. bulletin of the american meteorological society, 77: 437-471. [13] fu limin. (2003). neural networks in computer intelligence, tata mcgraw-hill. [14] specht d. f. (1991). a general regression neural network. ieee transactions on neural networks, 2: 568-576. [15] sharkey a. j. c. (1999). combining artificial neural nets: ensemble and modular multi-net systems, springer-verlag. [16] partridge d. (1996). network generalization differences quantified. neural networks, 9(2): 263-271. [17] opitz d. w. and shavlik j. w. (1996). actively searching for an effective neural network ensemble. connection science, 8: harshani r. k. nagahamulla is with the dept of computing & information systems, faculty of applied sciences, wayamba university of sri lanka, kuliyapitiya, sri lanka (e-mail: harshaninag@yahoo.com) uditha r. ratnayake is with the dept of civil engineering, faculty of engineering, university of peradeniya, peradeniya, sri lanka (e-mail: udithar@pdn.ac.lk) asanga ratnaweera is with the dept of mechanical engineering, faculty of engineering, university of peradeniya, peradeniya, sri lanka (e-mail: asangar@pdn.ac.lk) international journal on advances in ict for emerging regions 2014 07 (02) 1 reconfigurable wireless sensor node with improved autonomous linearization anuradha c. ranasinghe, lahiru k. rasnayake, m. kalyanapala abstract— the usage of reconfigurable hardware in wireless sensor networks plays an important role in enabling multi parameter estimation in complex environments. broad sensor compatibility and sensor fusion will become crucial requirements as wireless sensor networks become autonomous and intelligent. due to the non linear behaviour of sensors and of the analogue circuitry in the front end, such applications often require a rigorous software model to perform sensor linearization for best accuracy. this paper presents an implementation of a reconfigurable wireless sensor node based on a cmos analogue multiplexer network supporting universal functions for different industrial sensors while being compact and energy efficient, which is conducive to wireless sensor networking. further, an accurate, optimized and autonomous linearization technique based on rational interpolation is also presented in order to compensate for the non idealities of the sensor node.
the concept, simulation results, prototype implementation with industrial components and the results of system integration are discussed in this paper to illustrate potential applications in mass scale data acquisition based on wireless sensor networks. keywords— wsn, non linear sensors, reconfigurable hardware, autonomous systems, rational interpolation, multiplexers i. introduction in the modern world, the combination of multi functioning data acquisition systems and wireless communication protocols provide creative means to develop energy efficient, intelligent wireless sensor nodes for industrial applications. the ability to interact with the physical environment in wireless domain provides more robust and versatile platform over traditional environmental monitoring scheme. reconfigurable hardware technologies have been recently introduced to wireless sensor networks (wsn) for robust, low power operation while enabling multi parameter data acquisition for a given node. a single sensor node may be designed integrating a sensor element, signal conditioner and network application processor (ncap) into one sensor node. single sensor mems (micro-electro-mechanical sensor) chips and motes are some examples for such a system [1]. if a node per sensor approach is used, many nodes are required in case different parameters of a certain area are to be monitored. maintaining such a wsn is quite expensive as well as increases the complexity of the network. the traditional approach to provide multi sensor capability for a wsn node is by employing several signal conditioning stages to the front end of the circuit where every sensor interface is dedicated to a certain sensor type. compared to traditional hardware design topologies, reconfigurable hardware designs have shown a certain improvement of designing more flexible devices [2]. field programmable gate array (fpga) based designs are more popular for digital sensors, though they require additional components to facilitate multi sensing operation in mixed signal domain [3]. in the analog domain, two major programmable hardware technologies are the field programmable analog arrays (fpaa) and programmable mixed signal system on chip technology (e.g. psoc). both technologies are capable of providing all the common components for supporting different mixed signal processing applications [4]. ii. related work the development of an autonomous wireless sensor node involves the embedding self-adjustment functionalities that should be able to fix non idealities of the system such as offset, variations in gain and non linearity for rigorous operation. such a system should utilize the least amount of time and power for readjusting process. an accepted practice utilized in the past by measurement system designers was to linearize the sensor signal in order to compensate for system non idealities. the subject of linearization of measurement systems has been considered on different forms and stages, basically in the design of circuits with mos and cmos technologies [5-6]. studied cases included the usage of auxiliary hardware and programmable software solutions to evaluate the linearization performance of measurement systems. one application suggests analogue to digital converters to solve nonlinearities at the same time the conversion is made [7-8]. further, rom memories have been used to save data tables and optimize lookup tables to solve linearization problems [9-10]. 
a simple resistor divider technique and a programmable hardware linearization method were also presented [11-12]. the self-calibration concept using artificial neural networks has been approached from different perspectives: simulation of auto calibration results [13] and works related to auto calibration of specific sensors [14-15]. some of these cannot be easily implemented on a low power microcontroller (μc). several important works related to recursive algorithms that can be applied to the self readjustment of intelligent sensors exist, based on different types of interpolation techniques. among the simplest algorithms, the three point calibration method [16-17] suggests a simple point by point calibration technique for single and multidimensional data sets. the progressive polynomial calibration method [18] and its improved version [19] have also been proven to be efficient methods in terms of computational burden and optimized accuracy. however, capturing a highly non linear transfer function becomes difficult due to the large set of calibration points required at the initial stage. manuscript received february 16, 2014. recommended by dr. nalin ranasinghe on july 28, 2014. anuradha c. ranasinghe, lahiru k. rasnayake and m. kalyanapala are with the department of electrical and computer engineering, sri lanka institute of information technology, new kandy road, malabe, sri lanka (e-mails: anuradha.r@sliit.lk, lahiru.r@sliit.lk, kalyanapala.m@sliit.lk). fig. 1 proposed wsn node design. recent studies [20] show that rational polynomial interpolation gives a better approximation of a given data set, with a smaller least square error, than polynomial approximation methods for transfer function modelling. compared with other techniques such as lagrangian interpolation or gauss newton approximation, the asymptotic behaviour and better interpolation ability of rational functions make them ideal for modelling highly non linear system characteristics, allowing the transfer function of the system to be expressed with the least number of terms. to counteract the previously mentioned issues, we propose a design topology for a reconfigurable wireless sensor node based on ultra low power analogue components. the software of the central processing unit incorporates a point by point linearization method which autonomously determines the best transfer curve for a sensor from the given calibration points. by using a programmable mixed signal hardware architecture, this new approach enables a single channel of the node to obtain signals from a vast diversity of sensors. the most significant advantage is the autonomous transfer function identification of non linear sensors. it also ensures homogeneity among wsn nodes. iii. concept of reconfigurable interface in the proposed system, the flexibility of the reconfigurable interface is essential since it decides the universal functionality and application scope of the data acquisition process of the wsn node. the interface circuit was designed to accommodate various types of sensor output signals and to provide sensor driving, offset adjustment, data linearization and automatic gain to match the signals and sensitivities of different sensor elements.
the trade-off between measurement accuracy and flexibility should be optimized so that the resulting system can be widely adapted to a multi parameter system. moreover, power management of the device is critically important since wsn nodes are deployed in energy constrained environments [21]. fig. 1 shows the hardware block diagram of the wsn node, where the dedicated signal conditioning stages of a traditional design are replaced by a universal transducer interface module (utim). as seen from fig. 1, the most distinct advantage of this design is that it can be further miniaturized through an integrated mixed signal circuit design, even though a modular approach was employed in the current prototype. a. mixed signal circuit design the utim is based on an analog multiplexer network which provides the functionality required to perform signal routing between sensor signals and amplifier stages. time-division multiplexing is used to take analog sensor signals from a distributed sensor array periodically, so that data can be processed on a common transmission line in a digitally encoded format. the multiplexers were chosen from the cmos logic family because of its high noise immunity and ultra low power profile, so that the design can be fabricated as a compact, low power and cost effective implementation. the utim also uses its architecture to synchronize the multiplexing and sampling, ensuring proper settling of the analog data signals while maintaining energy efficiency. the use of dynamic sensor power cycling with the utim provides an opening for reducing average power consumption in applications where energy use must be tightly managed and power hungry sensors are interfaced. the average power dissipation of a transducer is quantified by the following equation: pavg = d·pon + (1 − d)·poff (1), where d, pon and poff represent the duty cycle, the power in normal operation and the power in off-mode of the transducer respectively. by selecting an appropriate duty cycle in view of the settling time of the sensor, the average power of the entire system can be kept at a very low level. for instance, considering a load cell (vishay model 9010), pavg can be greatly minimized while maintaining the measurement accuracy. table 1 shows the technical data of the load cell. taking ton (power-on time) = 160 ms (settling time + 10 ms acquisition time) and ton+off (cycle time) = 1 s, the duty cycle is d = 0.16 and the average power becomes pavg = (0.16 × 240 mw) + (1 − 0.16) × 0.0068 mw ≈ 38.4 mw. the average power consumed by the sensor is therefore approximately 16% of its normal operation power. hence dynamic power cycling proves to be energy efficient with the utim circuit. the initial design shown in fig. 2 is configured for four analogue channels. each analogue channel consists of a 4-wire interface, where ex+, ex−, sen+ and sen− correspond to the terminals of the excitation supplies and of the differential sensor signal respectively. these differential signals are converted into a single ended voltage by routing the signal to an appropriate analog block. each analog block was designed for a specific sensor type and may include op-amps and discrete components to provide signal conditioning and filtering before the signal is fed into the adc of the processing unit (mcu). the basic utim design may require an optional multiplexer stage, so that only one adc channel is sufficient to digitize the sensor analog signals.
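as a quick illustration of equation (1) and the load cell example above, the following minimal python sketch (the function name and structure are ours, not part of the paper) evaluates the duty-cycled average power:

def average_power_mw(duty_cycle, p_on_mw, p_off_mw):
    # equation (1): time-weighted average of on-mode and off-mode power
    return duty_cycle * p_on_mw + (1.0 - duty_cycle) * p_off_mw

# load cell example above: t_on = 160 ms out of a 1 s cycle -> d = 0.16,
# p_on = 240 mw, p_off = 0.0068 mw (6.8 uw)
d = 0.160 / 1.0
print(average_power_mw(d, 240.0, 0.0068))   # about 38.4 mw, roughly 16% of p_on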
table 1 technical data of the load cell (vishay model 9010): settling time 150 ms (typical); operating power 240 mw; off-mode power 6.8 µw.
fig. 2 utim based on analogue components at the front end. fig. 3 bridge excitation (constant current). fig. 4 low cost programmable gain amplifier.
b. signal conditioning blocks the multi functioning capability of the utim is achieved by integrating different signal conditioning blocks at the back end of the multiplexer network. these intermediate blocks are made of analog integrated circuits such as operational amplifiers; each block converts the sensor signal into a voltage, matches impedances and scales the signal level prior to the adc input. the analog blocks were designed based on the operational principle and the type of central element of the sensor, so that each block provides general functions rather than being a dedicated interface for one sensor type. in addition, two independent excitation supplies are provided, and the source selection is optional. fig. 3 shows the utim operation for a typical bridge sensor. bridge sensors normally generate a weak signal which represents the change in resistance of the strain gauges. if the gauge is powered with voltage excitation, the voltage drops appearing across the switch-on resistance (ron) of the multiplexer will cause a significant error in the measurement. in such a situation it is preferable to drive the bridge with a known current excitation, so that the measurement does not depend on ron even though ron is not constant. excitation block (i) represents a basic current source built with very few components. for a single element varying bridge, the voltage output is given by equation (2), where ∆r is the resistance change of the bridge element due to the applied strain. a programmable gain amplifier (pga) is another important part of a data acquisition system. when dealing with a wide range of signal levels, different gain ratios may be required to scale signals for better accuracy. since most pgas available on the market are very expensive, we propose a low cost solution based on simple analog components. fig. 4 shows the design of the pga using a single op amp and an analog mux with a couple of discrete components for three different gain values. the control bus line of the multiplexer should be held at the appropriate logic level to provide the required gain. this design has the advantage of keeping ron out of the gain-setting path, so that variations of ron (caused by thermal drifts and multiplexer input voltage levels) do not affect the measurement. better performance can be obtained by using precision resistors for r1 to r3. system parameters relevant to this design, such as frequency response and gain errors, will be discussed in the device implementation stage. iv. autonomous linearization technique some sensors have the advantage of a very high sensitivity to changes in physical phenomena, and the disadvantage of an aggressively nonlinear characteristic. the latter introduces a non linear error into the measurement, which should be eliminated using a proper hardware design or software calibration.
some techniques based on numerical approximation methods have been discussed in the early literature [22-28], including lagrangian interpolation polynomials, cubic splines, the gauss newton algorithm and steepest descent techniques. generally, for power constrained reconfigurable embedded systems, the least amount of time and computational burden is always preferred when applying these algorithms. very little literature has been written with regard to rational functions, so this area remains largely open for exploration by researchers and mathematicians. rational approximation was found to be a successful method for linearizing highly non linear data due to its asymptotic nature. the method presented here is a simplified version of thiele's continued fraction algorithm, which derives the transfer function of a sensor using conventional recursive divided (inverted) differences for a given data set. the algorithm finds an appropriate rational function after each calibration point using a finite iterative technique. because of this simple point by point interpolation process, the algorithm does not require memory to hold large matrices or computationally heavy operations such as matrix inversion. in the first step, we consider calibrating a univariate response. the algorithm starts by taking the divided difference between two calibration data points. using the triangle rule, the divided difference coefficients are formed according to the format in table ii given below, where
φ(xn, xm) = (xn − xm) / (fn − fm) (3)
φ(x0, …, xk, xj) = (xk − xj) / (φ(x0, …, x(k−1), xk) − φ(x0, …, x(k−1), xj)) (4)
the corresponding rational expression for each point can be found recursively according to the following procedure:
rn(x) = a0 + b1/(a1 + b2/(a2 + … + bn/an)) (5)
where the ak and bk are given by the coefficients of the diagonal of the table and the numerator fractions respectively, that is a0 = f0, ak = φ(x0, x1, …, xk) and bk = x − x(k−1). the continued fraction given in (5) can be evaluated through the recursion
rj = pj/qj = (aj·p(j−1) + bj·p(j−2)) / (aj·q(j−1) + bj·q(j−2)) (6)
with p(−1) = 1, p0 = a0, q(−1) = 0 and q0 = 1, where j = 1 … n−1 and n is the number of calibration points; this enables finding the final rational function for the entire data set.
table ii calibration data set
i  xi  fi  divided differences
0  x0  f0
1  x1  f1  φ(x0, x1)
2  x2  f2  φ(x0, x2)  φ(x0, x1, x2)
3  x3  f3  φ(x0, x3)  φ(x0, x1, x3)  φ(x0, x1, x2, x3)
fi and xi represent the dependent and independent variables of the sensor response.
when using the proposed algorithm, it is necessary to define its performance against certain criteria such as the number of calibration points, the minimum non linearity error and the computational speed. in order to demonstrate the performance of the proposed method, in the first step a non linear data set was generated using an increasing exponential function and interpolated by the simplified rational interpolation (sri) algorithm. the evaluation was carried out by changing the linearity of the function while keeping the number of calibration points constant. the result shown in fig. 5 was obtained using 4 calibration points and clearly illustrates the behaviour of the non linear response. fig. 5 shows the interpolated result of the non linear data set and its corresponding error curves. points a, b, c and d represent the successive calibration points which have been used to determine the approximated rational function. the variable r determines the non linearity of the function. changing the non linearity by setting r = 0.2, 0.4 and 0.6, the simulated error plots give a good evaluation of the effectiveness of the algorithm.
fig. 5. the interpolated data and its error plot obtained with the proposed algorithm for a non linear function. according to the error plots given in fig. 5, it can be noticed that a maximum error of 20% is reduced to 1% when the non linearity factor decreases by 0.2. apart from that, linearization was attempted with the popular gauss newton algorithm (gna), the polynomial least square method (plsm) [29-33] and the progressive polynomial method (ppm). to obtain an acceptable error with the gna and plsm methods, a large data set, many iterations and suitable initial values were required. compared to these methods, ppm, which is a point by point calibration technique, shows an acceptable accuracy of 75% with 4 data points. when the non linearity is further reduced (r = 0.4, 0.6) the observed accuracies were 80% and 87% respectively. this shows that the sri algorithm provides the best accuracy with the least number of calibration data points and the least computational burden. increasing the number of calibration points to six, the non linear error of the generated data set can be greatly reduced, as shown in fig. 6. with r set to 0.2, the maximum error of the data set was minimized to 0.37%, which is a great benefit of the proposed algorithm. the result proves that even the highest non linearity can be greatly reduced by increasing the number of points in the algorithm. fig. 6. minimized error result with six calibration points. in the second step of the evaluation, the algorithm was applied to linearize a standard 3.3 kω thermistor using 4 calibration points. a thermistor is a good example of a non linear sensor with considerable non linearity: the change in the measurement is most rapid at low temperatures, giving great resolution for determining the corresponding temperature values there, while at the other end of the range the resistance changes relatively little with temperature and the measurement resolution is relatively poor. the sensor data set was taken from the datasheet recommended by the national institute of standards and technology (nist) [34]. points were selected to cover the entire temperature range. the resistance value of the thermistor and the corresponding temperature in kelvin were selected as the independent and dependent variables respectively. table iii shows the data set and its inverted differences. four calibration points were selected from the resistance range, representing temperature values ranging from 273 k to 360 k.
table iii thermistor example
i  r(ω)   t(k)    divided differences
0  341    359.26
1  3643   295.93  -52.14
2  7126   281.48  -87.23   -99.25
3  10481  273.71  -118.53  -103.00  -849.09
using eq. (3), (4) and (6) the rational function can be computed step by step for j = 1, 2, 3, giving the successive approximants r1(x), r2(x) and r3(x). the rational function obtained by the sri algorithm and the conventional steinhart model for the 3.3 kω thermistor were compared with the sensor data to evaluate the performance of the proposed method. the result of the simulation is shown in fig. 7. the error is significantly smaller in percentage terms when compared with the other algorithms described previously and, as seen in the figure, the sri algorithm, with its lower computation time, is a significant leap in deriving the sensor's characteristic function.
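to make the recursion in (3)-(6) concrete, the following minimal python sketch (our illustration, not the authors' implementation; the function names are ours) builds the continued fraction coefficients from the four thermistor calibration points of table iii and evaluates the resulting rational function. depending on rounding and the exact index convention, the printed coefficients may not match table iii digit for digit.

def sri_coefficients(x, f):
    # coefficients a0..a(n-1) of the thiele-type continued fraction,
    # built point by point from inverted differences (eq. (3) and (4))
    n = len(x)
    rho = list(f)            # stage 0: rho[i] = fi
    a = [rho[0]]             # a0 = f0
    for k in range(1, n):
        nxt = [None] * n
        for i in range(k, n):
            nxt[i] = (x[i] - x[k - 1]) / (rho[i] - rho[k - 1])
        rho = nxt
        a.append(rho[k])     # ak = phi(x0, ..., xk)
    return a

def sri_eval(x, a, t):
    # evaluate r(t) = a0 + (t - x0)/(a1 + (t - x1)/(a2 + ...)) from the back
    val = a[-1]
    for k in range(len(a) - 2, -1, -1):
        val = a[k] + (t - x[k]) / val
    return val

# four calibration points from table iii: resistance (ohm) -> temperature (k)
r_cal = [341.0, 3643.0, 7126.0, 10481.0]
t_cal = [359.26, 295.93, 281.48, 273.71]
a = sri_coefficients(r_cal, t_cal)
print(a)                              # first coefficients close to -52.1, -99.3 (cf. table iii)
print(sri_eval(r_cal, a, 7126.0))     # returns 281.48 at a calibration point

the point by point nature of the recursion is what keeps the memory footprint small, which is consistent with the microcontroller timing figures reported below.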
fig. 7. interpolated result compared with the steinhart equation. the simulation results show that approximately 98% accuracy (1.14% maximum error) can be achieved with the 4 calibration points of the sri algorithm. compared to the steinhart model, the sri result gives a moderate value. if the computational workload is considered, the sri algorithm uses simple recursive steps rather than the logarithmic functions of the steinhart model, which matters for time constrained applications. the proposed algorithm was tested on an arm cortex m3 microprocessor with its operating frequency set to 10 mhz, which is ideal for low power wsn applications. to find the coefficients (divided differences) at the initial stage, the microprocessor required 119 µs. only 84 µs of execution time was required to compute the rational function at each recursion cycle. in contrast, the steinhart equation required 3.32 ms for its logarithmic function, which makes the new sri algorithm efficient in saving computational burden. however, it is also important to examine the convergence of the proposed method with a minimum number of calibration points in order to evaluate its efficiency. the same thermistor example can be used for this. the next figure (fig. 8) shows the simulation result of the maximum error with respect to the number of calibration points used to interpolate the data set. increasing the number of calibration points to 5 shows a dramatic change in the maximum error of the interpolated data set. the maximum error of 1.14% for 4 point calibration is reduced to 0.083% by increasing the number of points to 5, which is a great benefit of the proposed method. v. overall performance & prototype design since the emphasis of this paper is on suggesting an autonomous linearization technique for a reconfigurable wsn system, the user may select and integrate other auxiliary peripherals for the wsn node depending on the application requirements and feasibility. in developing the experimental prototype for this research, the paramount concerns were a low power design, universal capability and optimized accuracy. the effectiveness of the algorithm will be proven later in terms of processing speed and convergence. the components used for the utim included dual 4:1 cmos analog multiplexers (max4618) for the signal routing stage and a micro power op-amp (lt1079) for optimum power efficiency in this prototype. the performance of this utim was tested with the lpc1769 arm cortex m3 microprocessor from nxp and the xbee zb module. the lpc1769 has ultra low power profiles for different sleep modes. the power consumption of the utim with respect to its switching frequency, with different sensor attachments, was recorded as shown in fig. 9. with the current consumption given in microamperes (µa) and the channel switching frequency in hertz (hz) on a log scale, the observations show that the utim is highly power efficient for typical industry grade applications, since the current consumption can be maintained as low as 478 µa at adequate switching frequencies.
the device performance at high frequencies is determined by the frequency responses of the individual components. to evaluate the frequency response of the measurement chain, we feed a sinusoidal signal to the input and record the signal at the output of the final stage with the pga gain set to 1, 10 and 100 respectively. the gain error of the final output with respect to the input signal frequency is shown in fig. 10. as shown in fig. 10, when the pga gain is set to 1 (1% tolerance resistors were used) the operational range of the device is limited to 200 khz. increasing the gain to 10 and 100 reduces the operating range to 50 khz and 8 khz respectively. this behaviour can be expected in micro power operational amplifiers, since the power optimization of the vlsi design leads to a reduction in the gain bandwidth product of the component. however, this frequency limit is more than adequate for low power industry grade applications, and within this limit the optimum performance is preserved. if required, the performance can be extended to higher frequencies by simply replacing the op-amp used.
fig. 8 convergent property. fig. 9 current consumption of the utim. fig. 10 frequency response of the prototype design.
once the data acquisition node was ready for operation along with the required calibration setup, the autonomous calibration could begin. this could be realized in two ways: the user may either provide points remotely to the node or let the node use an intelligent mechanism to find appropriate points for the automatic calibration stage. this improved linearization method provides the means to model the input output transfer function to match the effects of the environmental parameters, depending on how much the user knows about the environment being monitored. if the user does not know much about the environment, the device performs the calibration intelligently. a reference sensor is also used to correctly map the inputs to the outputs, so that the transfer function of the sensor is subsequently obtained by the sri algorithm after the required calibration points are derived; this can be done within the microprocessor, or externally if more computational accuracy is required. with the sri algorithm running in the microprocessor, the accuracy of the acquired data has been evaluated with different sensors attached to the prototype sensor node. the test bench included a k-type thermocouple (aem 30-2067), a 10k ntc thermistor (epcos b57867s0103, 10 kω @ 25 °c), a pt100 rtd (rs 611-8264) and a 20 kg load cell (vishay model 9010, 2 mv/v). initially all sensors were tested with both voltage and current excitation supplies for best accuracy. the results were compared with the output of the improved linearization method to determine the effectiveness of the proposed method. table iv shows the comparison. table iv accuracy comparison with the sri method: k type thermocouple: ±3 °c without sri, ±0.8 °c with sri; thermistor: ±1 °c without sri, ±0.5 °c with sri; pt100 rtd: ±3 °c without sri, ±0.4 °c with sri; load cell: ±250 g (1.3% of full scale) without sri, < 0.5% of full scale with sri. the proposed sri algorithm can thus be successfully used in the microprocessor software to compensate for signal errors present in the measurement. further, the fast execution time of the algorithm provides a feasible time schedule for wsn event management. the prototype implementation of the wsn node is shown in fig. 11.
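as a side note on the frequency response characterization at the start of this section, the gain error at each test frequency follows directly from the recorded input and output amplitudes; a minimal python sketch with made-up sample values (ours, not measured data from the prototype):

def gain_error_percent(v_in, v_out, nominal_gain):
    # relative deviation of the measured gain from the nominal pga gain
    measured_gain = v_out / v_in
    return 100.0 * (measured_gain - nominal_gain) / nominal_gain

# hypothetical reading: 10 mv input sinusoid, pga set to 100, 0.95 v at the output
print(gain_error_percent(0.010, 0.95, 100.0))   # -5.0, i.e. the gain is 5% low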
vi. conclusions and future exploration the proposed utim concept offers a highly flexible front end design for wireless sensor node technologies which reconfigures its hardware in an autonomous way to interface with a wide range of distributed sensors, solving the multi-sensing challenge encountered in many wsn applications where different types of sensors are to be supported. the proposed simplified rational polynomial interpolation technique provides robust linearization for sensors while keeping the execution time to the minimum required for reconfigurable hardware systems. power consumption tests prove that the utim is highly power efficient and can be integrated with an ultra low power processing unit (e.g. nanowatt class) in a wsn node. the circuit can be further miniaturized through surface mount devices. for future work, the development of adaptive reconfigurable hardware designs, autonomous sensor identification and sensor plug and play technologies will be evaluated to provide a better deployment for distributed wsn applications. new algorithms that follow the path of rational interpolation [35] will be explored to improve sensor linearization. design improvements and miniaturization through newer high end versions of programmable system-on-chip integrated mixed signal devices will be considered, providing the required reconfigurability and the necessary computational power. other signal processing techniques will also be integrated to optimize the final solution. acknowledgment the authors would like to thank the managing director of attotech system engineering (pvt) ltd and the senior staff of the sri lanka institute of information technology, malabe, for providing hardware accessories and laboratory facilities to accomplish this task. references [1] j. t. john and s. r. j. ramson, "energy-aware duty cycle scheduling for efficient data collection in wireless sensor networks", ijarcet, volume 2, issue 2, feb. 2013. [2] shenfang yuan, lei qiu, shang gao, "providing self-healing ability for wireless sensor node by using reconfigurable hardware", mdpi sensors journal, oct. 2012. [3] k. m. iswarya, s. usha, "adaptive runtime reconfigurable wireless sensor nodes based on dynamical data aggregation", ijreat, volume 1, issue 1, march 2013. [4] rentao wang, ken deevy, "reconfigurable adaptive wireless sensor node", issc 2012, nui maynooth, june 28, 2012. [5] n. i. khachab, m. ismail, "linearization techniques for n-th order sensor model in mos vlsi technology", ieee trans. circuits syst., 1991, 38, 1439-1449. [6] z. guangbin, s. sooping, l. jin, s. sterrantino, d. k. johnson, s. jung, "an accurate current source with on-chip self-calibration circuits for low-voltage current-mode differential drivers", ieee trans. circuits syst., 2006, 53, 40-47. [7] g. e. iglesias, e. a. iglesias, "linearization of transducer signal using an analog to digital converter", ieee trans. instrum. meas., 1988, 37, 53-57. [8] b. vargha, i. zoltán, "calibration algorithm for current-output r-2r ladders", ieee trans. instrum. meas., 2001, 50, 1216-1220. [9] p. malcovati, c. m. leme, p. o´leary, f. maloberti, h. baltes, "smart sensor interface with a/d conversion and programmable calibration", ieee j. solid-state circuits, 1994, 29, 963-966. [10] l. e. bengtsson, "lookup table optimization for sensor linearization in small embedded systems", journal of sensor technology, doi:10.4236/jst.2012.24025, dec. 2012. [11] anuradha c. ranasinghe, lahiru k. rasnayake, m.
kalyanapala, "reconfigurable universal sensor interface for distributed wireless sensor node", ieee icter 2013, colombo. [12] anuradha c. ranasinghe, lahiru k. rasnayake, m. kalyanapala, "development of intelligent reconfigurable ncap for smart transducer applications", in proc. ieee care 2013, jabalpur, india, 2013. [13] r. a. hooshmand, m. joorabian, "design and optimization of electromagnetic flow meter for conductive liquids and its calibration based on neural networks", iee proc. sci. meas. technol., 2006, 153, 139-146. [14] j. c. patra, a. e. luang, p. k. meher, "novel neural network-based linearization and auto compensation technique for sensors", presented at the ieee international symposium on circuits and systems, island of kos, greece, may 2006. [15] j. tao, p. qingle, l. xinyun, "an intelligent pressure sensor using rough set neural networks", in proceedings of the ieee international conference on information acquisition, shandong, china, july 2006. [16] c. a. zoric, d. martinovic, s. obradovic, "a simple 2d digital calibration routine for transducers", facta universitatis (nis) ser.: elec. energ., vol. 19, no. 2, august 2006, 197-207. [17] w. t. bolk, "general digital linearising method for transducers", j. phys. e: sci. instrum., vol. 18, 1985, great britain. [18] l. k. fouad, g. horn, j. h. huijsing, "a noniterative polynomial 2-d calibration method implemented in a microcontroller", ieee trans. instrum. meas., 1997, 46, 752-757. [19] j. rivera, m. carrillo, m. chacon, g. herrera, g. bojorquez, "improved progressive polynomial algorithm for self-adjustment and optimal response in intelligent sensors", sensors (issn 1424-8220; coden: sensc9), 2008, 8, 7410-7427. [20] y. hu, v. tao, "a comprehensive study of the rational function model for photogrammetric processing", photogrammetric engineering & remote sensing, vol. 67, no. 12, december 2001, pp. 1347-1357. [21] yuan tian, eylem ekici, "energy-constrained task mapping and scheduling in wireless sensor networks", ieee international conference on mobile adhoc and sensor systems, nov. 7, 2005. [22] j. stoer, r. bulirsch, "introduction to numerical analysis", second edition, springer-verlag new york, inc., 1980, 1993. [23] k. e. atkinson, "an introduction to numerical analysis", second edition, john wiley & sons inc., us, 1989. [24] j. fraden, "handbook of modern sensors: physics, designs and applications", fourth edition, springer, isbn 978-1-4419-6465-6. [25] a. pasic, j. dowling, "linearising calibration methods for a generic embedded sensor interface (gesi)", 1st international conference on sensing technology, november 21-23, 2005, palmerston north, new zealand. [26] h. yamasaki, "intelligent sensors", yokogawa research institute corporation, tokyo, japan. [27] g. walberg, "cubic spline interpolation: a review", department of computer science, columbia university, new york, ny 10027. [28] j. bartkovjak, m. karovičová, "approximation by rational functions", institute of measurement science, slovak academy of sciences, dúbravská cesta 9, 842 19 bratislava, slovakia. [29] c. t. kelley, "iterative methods for optimization", society for industrial and applied mathematics, 1999. [30] l. burden, j. faires, "numerical analysis", 7th edition. [31] n.
carlson, "nurbs surface fitting with gauss-newton", institute of measurement science, slovak academy of sciences, dúbravská cesta 9, 842 19 bratislava, slovakia. [32] c. l. karr, d. a. stanley, and b. j. scheiner, "genetic algorithm applied to least squares curve fitting". [33] william k. george, paul d. beuther, aamir shabbir, "calibrations for hot wires in thermally varying flows", turbulence research laboratory, department of mechanical and aerospace engineering, university at buffalo, suny, buffalo, new york. [34] nist/sematech, "e-handbook of statistical methods", http://www.itl.nist.gov/div898/handbook/, 2012. [35] chang wen li, xiao lin zhu, le zou, "modified thiele-werner rational interpolation", journal of mathematical research & exposition, jul. 2010. technology enhanced assessment for learning in a distance education it degree programme in sri lanka hakim a. usoof, brian hudson and gihan n. wikramanayake abstract— this paper reports on a research study to investigate the impact of assessment practices on the learning process, student learning outcomes and student attitudes to learning in a large-scale distance education it degree programme in sri lanka. the national context for higher education is described, especially the national challenges for widening access to, and participation in, higher education. the study is set within a developmental context which has involved a continual process of improvement through the use of technology enhanced learning. this process has addressed the need to improve retention and progression rates and the need to develop assessment practices in order to improve support for student learning. the paper includes a discussion of related work with regard to the development of online learning communities, promotion of collaborative learning and development of assessment for learning approaches. findings from the research study are presented which arise from an analysis of students' enrolment, retention and progression rates in relation to three developmental phases of the programme. the outcomes of a survey of students' perspectives on their experience of assessment practices within this developmental context are also reported. the discussion identifies successes, areas for further improvement and directions for future research and development in relation to technology enhanced assessment for learning. index terms— distance education, learning, assessment, technology, learning communities, it education i. introduction a. the university system in sri lanka – addressing issues of access and participation sri lanka has a state university system that is made up of fifteen universities, three campuses and seventeen other higher education institutes [25] with a total enrolment of 73,398 undergraduate students and employing 4,341 permanent and 643 temporary academic staff [26]. all higher education institutes are overseen by the university grants commission (ugc), created under the universities act of 1978. another institution awards degrees in it and computer science and, while it has no affiliation to the ugc, it is sanctioned by the government to award degrees. there are also many other private institutions which are not sanctioned by the government to award degrees. these institutions act as proxies of foreign universities that offer degrees in sri lanka. admission to these universities and institutes under the ugc scheme is based on the results of the g.c.e. advanced level [a/l] examination, a national examination conducted by the ministry of education. the subjects students are offered at the g.c.e. a/l examination dictate which study programmes students may take at university.
in recent years the number of students who sit for the examination has been approximately 210,000, of whom over 130,000 qualify with the minimum marks required for university admission. however, universities in sri lanka can currently accommodate only about 21,000 new admissions per year, representing around 10% of all students sitting the a/l examination and 16% of those who qualify for admission to universities. these students attend full-time courses on campus [internal] and, like primary and secondary education, tertiary education is free of charge in sri lanka. further, students may be entitled to a monthly scholarship and hostel facilities during their stay at university. in addition to these full-time internal degrees offered by universities and institutions, there are study programmes referred to as external degrees. on most occasions, the individuals participating in these are part-time students who take the courses off campus [external]. admission to these degree programmes may be based on the g.c.e. a/l results and/or a qualifying examination. b. the bit degree programme – widening participation in higher education the bachelor of information technology (bit) degree programme is offered by the university of colombo school of computing (ucsc) as an external degree that allows students with an interest in information technology [it] to study for a degree over a period of three years [1]. it has an annual enrolment of over 1,500 students who work at a distance with variable levels of support from local study centres. students have the option of continuing or leaving the programme with accreditation upon the completion of each year. if a student meets the minimum criteria in their examinations, he or she is awarded a diploma in it [dit] at the end of the first year, a higher diploma in it [hdit] at the end of the second year and, at the end of the final year, the degree of bachelor of it [bit]. fig. 1 illustrates the progression path of the bit degree programme. students attending the bit study programme have diverse education levels as well as diverse objectives. these include students who are aiming for a degree soon after their a/l examination; those pursuing parallel degrees or seeking a second degree in different subject areas or related subject areas; others from the it industry, or expecting to start a career in the it industry, who may already possess a degree in a different subject area or a diploma in it or computer science; some seeking a degree after completing their ordinary level examination; and those looking for a foundation in it leading only to a dit.
fig. 1: bit curriculum structure and learning path [10] the number of students in the bit programme varies greatly from year to year, although the student numbers progressing through the programme follow a similar pattern. in some years the withdrawal rate has been as high as 55%, and the failure and withdrawal rates have been identified as major issues for the bit study programme. the limited assistance for learning from the ucsc and issues associated with the assessment system have been highlighted as problem areas during the evaluation of the bit degree programme. many international students have dropped out because there is not much support for learning and, unlike in sri lanka, there are no supporting teaching institutions in their countries. further, the price of textbooks outside south asia has created problems leading to international students leaving the programme. c. learning and teaching in the bit degree programme students in the bit degree programme use various study methods, including attendance at private institutions that conduct classes for the bit study programme, self-study and group study. it is very common for students to adopt a combination of these methods, but it is important to point out that some students only engage in self-study. in most cases students have access to the student manuals, in the form of detailed texts about the subject matter created for each course, in addition to reference books, the bit virtual learning environment [vle] with online learning content, assignments, practice tests, cds with tv programmes and videos, along with notes and handouts given by the institutions they attend. the ucsc plays a very limited role in the actual learning process and acts more as an administrative body. the administration of students, together with the preparation, conducting and marking of examinations and student accreditation, are the ucsc's main activities.
the learning process is aided by the ucsc through the designation of a detailed syllabus with the learning outcomes, topics to be covered and specific references, as well as the development of student manuals and creation of assignments and practice tests and online learning content. the bit vle was added as a result of the ebit project funded by the swedish international development cooperation agency [sida] from 2005 to 2010 [24] with the aim of improving the support provided for students through technology enhanced learning. the vle consists of online learning content, documents, slideshows, videos and discussion forums. students do not have any direct contact with the academic staff who lead the courses. the lecturer in charge of a course has a distinct set of activities he or she must perform. this includes preparing the course syllabus, developing the student manual, moderating the content developed for vle, creating tv and video programmes, composing assignments and practice tests and creating and marking the final examination. in its latest attempt to aid the students’ learning process, the ucsc has allocated an e-facilitator who is a dedicated to each course in order to address student questions. the student to e-facilitator ratio is approximately 876. even though this ratio seem high it is important to note that the academic staff that lead the course supports the e-facilitators. furthermore, many students attend private institutions that conduct classes for the bit study programme and get assistance from the teachers at those institutions. e-facilitators respond to students’ queries both by email as well as on the online forum, hence they follow an asynchronous mode of assistance. their assistance is available for students throughout the semester. d. traditional approaches to assessment – issues and challenges at the end of each course, students sit an examination conducted by the ucsc at a designated examination centre. the final examination contributes 100% of the marks and determines a student’s final grade for a particular course. the examination centres are located in a few major cities in sri lanka as well as some other countries that have a substantial number of students. the examination consists of 40–60 3 multiple choice questions [mcqs]. these questions tend to test factual knowledge rather than higher order skills and were also found to encourage students to engage in rote learning. an associated issue to emerge related to guessing on the part of students which raised concerns about the validity of the mcq assessment system used in the bit study programme. at the outset of the programme, the bit examinations discouraged guessing by penalising wrong answers and carrying forward its effects, though this approach was changed in the subsequent revised syllabus in order to increase students’ confidence levels. a further issue to arise relates to the students’ problems with language as the bit study programme is conducted in english, which is a second language for most students. this also leads to some students finding it difficult to understand the questions, thus raising issues of effectiveness in terms of the assessment of learning and the issue of equity in terms of ensuring equal opportunities for all students. against this background, and as a result of ongoing evaluation processes from the outset of the programme, there has been a continuous process of quality improvement in place. 
the major aims of this process have been to improve progression and retention rates, to respond to assessmentrelated issues and challenges, and to respond to associated language issues. these aims have been addressed through developments which have utilised the affordances of technology enhanced learning, including enhanced opportunities for online discussion, dialogue and communication. the remaining sections of this paper firstly consider related work concerning the development on online learning communities, the promotion of collaborative learning and development of assessment for learning approaches. the third section outlines the methodology adopted which includes an overview of the developmental context, an outline of the approach taken to this research, including research questions and research methods. findings from the research study are presented in the following section on results which arise from an analysis of students’ enrolment, retention and progression rates in relation to three developmental phases of the programme. the outcomes of a survey of students’ perspectives on their experience of assessment practices within this developmental context are also reported. the final discussion section identifies successes, areas for further improvement and directions for future research and development in relation to technology enhanced assessment for learning. ii. related work a. learning communities and collaborative learning an inherent issue in the bit degree programme is the lack of a teacher. this creates an environment in which the student is isolated unless he or she attends an institute that supports the bit syllabus. one response to this scenario has been to create the conditions for an online learning community in which students collaborate and take responsibility for their own learning and for supporting that of their peers. such a learning community is defined by mioduser, nachmias et al. [17] and mioduser and nachmias [16] as a novel educational system based on a combination of three components: a virtual community [social dimension], hosted by an appropriate virtual environment [technological dimension], and embodying advanced pedagogical ideas [educational dimension]. based on related work, ludwig-hardman and dunlap [15] define a learning community as a group of people connected via technology-mediated communication who actively engage with one another in collaborative learnercentred activities to intentionally foster the creation of knowledge, while sharing a number of values and practices, including diversity, mutual appropriation and progressive discourse. what is clear from both definitions is that three elements are vital; a technological element, a social element and an educational element. if just one of these elements is missing, a learning community is unlikely to develop. as lowell and persichitte [14] state, “simply requiring learner interaction in asynchronous environments does not promote a sense of community”. the three factors mentioned by hiltz [11] can be clearly related to the above elements and can also be identified in fig. 2 which depicts a conditional matrix proposed by brown [4] in which the inner circles define greater levels of engagement in class and in dialogue, and feelings of belonging to a community. 
although the ebit programme development was designed to support online learning communities and collaborative learning through the use of forums on the bit vle and the elgg social learning environment, students were found not to be using them extensively. for such a learning community to be successful, the challenge is seen to be the creation of a "critical mass" of participants in order to sustain and replenish the community [6] [18] [13]. fig. 2: conditional matrix for community. b. assessment for learning the problems encountered with the use of mcqs are reflected within the relevant literature through the work of other researchers in the field of teaching and learning in higher education, such as scouller and prosser [20], scouller [21], gipps [9] and paxton [18]. whilst innovative mcq developments involving such techniques as "confidence measurement" [8] and "computer adaptive testing" [5] are regarded as having potential, such developments do not fully eliminate guessing or fully address the argument that mcqs feed the answers rather than having the student construct them. one response to supporting the development of a learning community is seen to be assigning a grade in the form of an extrinsic rewards scheme [23]. however, brindley et al. [3] argue for a wider approach and in particular for the development of instructional strategies which accomplish the same goal more effectively and with added benefits for the learner. such a strategy has been developed within the ebit programme by adopting an approach based on the idea of assessment for learning [2], which means that the first priority of assessment is to serve the purpose of promoting students' learning. using such a strategy has been seen as one way of combining the approaches proposed by swan et al. [23] and brindley et al. [3] to encourage participation. the ebit phase has involved the development of support in terms of assessment for learning by providing practice tests and a number of optional assignments. however, the assignments are mandatory if the students want to obtain the dit or the hdit at the end of the first year and second year, respectively. as these tasks were initially designed to be undertaken as individual activities, they did not support the building of a learning community. an approach to improve this situation has involved the use of a forum on the bit vle and the elgg social learning environment. coupled with assessment through the award of a grade component, this has been intended to help motivate students to participate in the community and to help create the critical mass needed to sustain it. for those students who are more intrinsically motivated, the learning opportunities within the community are seen to encourage their continued participation. c. the bit e-learning environment the free and open source software platform moodle is used as the e-learning environment of the bit. in order to meet the requirements of the bit degree programme, the e-learning centre of the ucsc has developed a customised version of moodle. since a majority of the students registering for the bit degree programme lack skills and proficiency in ict, four courses related to ict literacy and proficiency were developed for the 1st semester of year 1. a further four courses, related to the basic knowledge and skills required for learning the subject matter of year 2 and year 3, were developed for the 2nd semester of year 1 [10].
hence, the learning activities of year 1 courses were created with the intent of engaging the learner through the use of interactive multimedia objects. these courses were also supported with online forums for collaboration and interaction, formative assessment tasks through practice quizzes and assignments, and the support of learning facilitators. years 2 and 3 had no learning activities developed with interactive multimedia, but were supported through similar online forums, formative assessment tasks and learning facilitators. iii. methodology a. the developmental context – improving quality through technology enhanced learning since its outset in 2000 the bit degree programme has undergone a number of major revisions, which can be classified into three major phases. the first phase (2000–2003) can be described as the “pre-lms phase”, during which the learning support available to students was mainly based on a static website [www.bit.lk] providing information, detailed syllabi with recommended text books, model papers and model answers, a number of powerpoint slides, a few lessons through video, and private institutional support. the second phase (2003–2006), involving the introduction of a learning management system (lms), can be described as the “lms phase”, during which time more support for learning was offered to students in the form of a more dynamic web environment, detailed syllabi with learning objectives, recommended text books and reference page numbers, practice quizzes and assignments, and the introduction of a collaborative learning model. this phase had the drawback of a lack of constructive alignment between the syllabus, learning resources and assessment. also during this phase, the initiative to promote collaborative learning was not successful in the way that had been envisaged. a consequence of this was the creation of a negative impression that the bit lacked student learning support. further, there was no real connection between processes of continuous assessment and the end-of-semester examinations. the third phase or “ebit phase” (2007–11) reflects the impact of the sida funded ebit project [24], which involved enhanced e-learning content, quizzes and activities for year 1 and activities and quizzes for year 2. a radical change occurred in this phase with the development of much more support for students’ learning through rigorous adherence to a constructive alignment between the syllabus, learning content and assessment. the implementation of rich learning content and learning activities aimed at improving students’ learning was a vast improvement over the previous phase. another improvement was the introduction of formative assessment and the creation of a clear relationship between formative and summative assessment in the ebit phase. this phase also involved the use of forum and chat facilities within the bit virtual learning environment [vle] to enhance student interaction and collaborative learning. it also saw the introduction of “ucsc tv” for streaming related video content [29]. bit students previously had very few opportunities to have their issues and questions addressed, and the introduction of an e-facilitator for each course has helped address this issue in the ebit phase. a further development during this phase was the introduction of the use of social media as a result of the impact arising from participation in a european commission funded e-learning project [7]. 
this involved the creation of a social network web environment, “the ebit community”, using the elgg open source social networking engine as a further attempt to support students’ collaborative learning and interaction. b. approach to the research the research associated with this development has adopted a case study approach which, following stake [22], is seen as the study of the particularity and complexity of a single case, coming to understand its activity within important circumstances. further, it can be seen as a case of a design for technology enhanced learning that focuses on teaching-studying-learning processes and the design of teaching situations, pedagogical activities and learning environments, with a particular emphasis on constructively aligned pedagogical activities that support assessment for learning [12]. it is also part of a wider study that has been reported in usoof et al. [28] and usoof [27]. c. research questions the two key research questions addressed by this study are: 1. how have the introduction of ict to, and revisions of, the bit degree programme affected the progression and accreditation achievements of the students? 2. what is the impact of current assessment practices on the learning process, student learning outcomes and students’ attitudes? d. research methods in responding to these questions, two research methods have been adopted. firstly, an analysis has been conducted in relation to the three phases, which have been compared in terms of students’ enrolment, retention and progression rates. secondly, a survey, and associated analysis, was conducted in order to illuminate students’ perspectives on their experience of assessment practices within this developmental context. iv. results a. data analysis of enrolment, retention and progression rates enrolment refers to the number of registered students for a particular academic year of the bit degree programme. this can be looked at in two different ways: first, the number of new students who joined the bit degree programme and, second, the number of students remaining in the entire bit degree programme. the average first-year enrolment for the three phases is 3,630 for the pre-lms, 1,678 for the lms and 1,772 for the ebit. the drop in enrolment can mainly be attributed to the fact that, at the programme’s inception, most students thought they could “just do a degree”, and that a perception arose that the bit was difficult to follow and lacked student support. the average enrolment levels for the three years [including repeat candidates] during the three phases were 5,335 for the pre-lms, 4,225 for the lms and 4,379 for the ebit. with the introduction of the ebit phase in the 2007/2008 academic year, the bit degree programme saw 781 students progressing to year 2 for the first time and 284 students progressing to year 3 for the first time, with some of them having very few repeat papers. it is important to mention a cross-batch effect on the figures for repeat candidates; for example, unsuccessful first-year students of the lms phase of the 2005/2006 academic year would be in the ebit phase as repeat first-year students in the 2006/2007 academic year. progression can be considered in terms of how many students continue to progress through the degree programme by achieving at least the minimum results required to progress to the next year of study [e.g. a 1.5 gpa under the ebit phase]. it may also be considered as an indicator of students passing their examinations. 
table 1. student progression indicators and achieving accreditation
academic year | 1st-time success to proceed from year 1 to year 2 [%] | withdrawals from next year 1 [%] | obtained cit/dit [%] | 1st-time success to proceed from year 2 to year 3 [%] | withdrawals from next year 2 [%] | obtained acit/hdit [%]
2000/2001 | 12 | 46 | 4 | n/a | n/a | n/a
2001/2002 | 9 | 54 | 6 | 38 | 12 | 14
2002/2003 | 12 | 50 | 10 | 53 | 21 | 20
2003/2004 | 22 | 54 | 13 | 63 | 29 | 32
2004/2005 | 21 | 50 | 13 | 58 | 26 | 30
2005/2006 | 21 | 46 | 14 | 50 | 23 | 32
2006/2007 | 51 | 37 | 16 | 57 | 24 | 31
2007/2008 | 45 | 35 | 17 | 52 | 23 | 36
2008/2009 | 44 | 36 | 22 | 47 | 26 | 38
2009/2010 | 33 | 43 | 21 | 47 | 27 | 37
2010/2011 | 32 | * | 28 | 56 | * | 37
year 1 to year 2 the vle of the bit degree programme provides first-year students with student manuals for each course, e-learning content, practice quizzes, assignments, learning activities and a cd with video programmes. the first-year students have a great deal of learning support and can also call upon the assistance of the e-facilitator assigned to each course. in addition to the learning support, changes were also made to the evaluation criteria in the ebit phase. a fourfold improvement in the progression of first-year students to the second year can be seen when comparing the pre-lms phase to the ebit phase, and a twofold improvement when comparing the lms phase to the ebit phase of the bit degree programme. the average first-time successes were 11% for the pre-lms phase, 21% for the lms phase and 41% for the ebit phase. year 2 to year 3 unlike the first-year students, the second-year students do not receive as much support from the vle. they are provided with quizzes and learning activities. in comparing the second-year figures, it is important to note that repeat students benefited from the fact that the math ii and computer networks papers were made optional. there is no considerable difference in the first-time success rates between the three phases of the bit beyond 2001/2002. even though there has been an increase in the number of students achieving the hdit certification, it is not visible as a percentage since there has been a rise in the number of students following the second year. the average first-time successes were 51% for the pre-lms phase, 55% for the lms phase and 50% for the ebit phase. it is clear from the above figures that there has been a great improvement in the first year of the bit degree programme in student progression and retention, but there is no change in the figures for the second year beyond 2001/2002. the considerable support arising from the use of technology enhanced learning provided for students in the first year does appear to have had a positive impact. it seems clear that the extremely high withdrawal rate, which was a key issue of the bit, has been addressed to a certain extent by the advent of technology enhanced learning in the first year. further, the number of new students enrolling in the bit degree programme has stabilised at around 1,500–2,000 each year. 
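as a cross-check on the phase averages quoted above, the short sketch below recomputes them from table 1. the grouping of academic years into phases is our reading of the table (it reproduces the quoted figures), not something stated explicitly in the paper; note that the year-2 grouping lags the year-1 grouping by one academic year, consistent with cohorts reaching year 2 a year after enrolment.

```python
# first-time success rates (%) transcribed from table 1
year1 = {"2000/2001": 12, "2001/2002": 9, "2002/2003": 12, "2003/2004": 22,
         "2004/2005": 21, "2005/2006": 21, "2006/2007": 51, "2007/2008": 45,
         "2008/2009": 44, "2009/2010": 33, "2010/2011": 32}
year2 = {"2001/2002": 38, "2002/2003": 53, "2003/2004": 63, "2004/2005": 58,
         "2005/2006": 50, "2006/2007": 57, "2007/2008": 52, "2008/2009": 47,
         "2009/2010": 47, "2010/2011": 56}

# assumed phase groupings that reproduce the averages quoted in the text;
# the year-2 grouping is shifted by one academic year relative to year 1
phases_y1 = {"pre-lms": ["2000/2001", "2001/2002", "2002/2003"],
             "lms": ["2003/2004", "2004/2005", "2005/2006"],
             "ebit": ["2006/2007", "2007/2008", "2008/2009", "2009/2010", "2010/2011"]}
phases_y2 = {"pre-lms": ["2001/2002", "2002/2003", "2003/2004"],
             "lms": ["2004/2005", "2005/2006", "2006/2007"],
             "ebit": ["2007/2008", "2008/2009", "2009/2010", "2010/2011"]}

for label, rates, phases in [("year 1 to 2", year1, phases_y1),
                             ("year 2 to 3", year2, phases_y2)]:
    for phase, years in phases.items():
        avg = sum(rates[y] for y in years) / len(years)
        print(f"{label}, {phase}: {avg:.0f}%")
# prints 11%, 21%, 41% for year 1 to 2 and 51%, 55%, 50% for year 2 to 3 (rounded)
```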
b. students’ perspectives on assessment practices the survey designed to illuminate students’ perspectives on their experience of assessment practices within this developmental context was conducted during 2008–2009 as an online questionnaire which was posted on the lms of the bit degree programme. the questionnaire targeted first-year students of the bit degree taking the computer systems i course. the total number of respondents was 45, while 121 students viewed the questionnaire, giving a 37% response rate. 73% of the respondents were male, which closely reflects the overall gender balance of students in the bit programme. the majority of students who completed the questionnaire were taking the bit degree programme in order to pursue a career in the it industry or gain knowledge in ict. further, 76% were doing the study programme part-time and approximately 50% of the candidates were employed. while most of the students were taking the degree after their advanced level examination, about 10% were pursuing a parallel degree or had completed a bachelor’s degree. 64% of the students were either taking or hoping to take an additional study programme. c. survey findings with respect to the type of examination and grading methods, the students most preferred computer-based continuous online assignments from home, with a value of 1.9, followed by paper and pencil tests, with a value of 2.1, on a 5-point likert scale with 1 being the most preferred. the least preferred were printed reports/dissertations and digital media reports, each with a value of 2.4 on the same scale. 67% indicated end-of-course examinations coupled with continuous assignments as their most preferred grading method, while 22% preferred continuous assignments during the course to determine the final grade. all of the respondents thought that audio and video, in addition to text and graphics, would help them express their knowledge better, while 82% preferred a change in the way they are assessed in the bit: 29% most preferred a practical test, 24% preferred web portfolios/online examinations and 16% preferred open book/take home tests. as the preferred mode of expression, 64% indicated written and typed text, 62% audio-video and 67% graphics and images. on the issue of the language of study, 86% had at least a “c” grade for o/l english and 67% had at least a “c” grade for a/l english. in addition, 51% were hoping to take an additional english course, whilst 60% would rather use a language other than english as their preferred language of expression. in relation to time pressure during examinations, 80% thought that this adversely affected the quality of their answers, while 38% felt nervous and anxious during examinations. further, 82% believed they perform better in real life than in examinations. yet, ironically, 53% thought that examinations were fair in assessing their actual ability and knowledge. concerning preferences for assessment types, the most preferred were mcqs, open book/take home tests and practical tests. the least preferred were reports/dissertations and essay-type papers. as for study methods of the bit programme, 82% engaged in self-study, 13% in face-to-face group study, 20% in online group study and 58% attended group classes. in relation to technology use, 93% used computers often and 96% used the internet often for their education. with regard to studying, the main goal of 18% of the students was simply to pass the examinations and 13% believed that passing the examinations was the most important aspect. with regard to study methods, 58% of the students also stated that the method they use for multiple choice question examinations differs from the one they use when studying for non-mcq examinations. 
overall, 93% of the students had experienced mcq examinations, but only 13% had experienced reports/dissertations, 29% viva/interviews and 47% web portfolios/online examinations. d. summary of the findings in summary, the majority of the students prefer continuous assessment with an end-of-semester examination, they prefer the use of multimedia for examinations, they have an acceptable command of english but prefer their own mother tongue, and they lack confidence in the assessment system and methods. in addition, standardised examinations affect students in a negative way. the majority of students engage in self-study and about 42% have very little interaction with other students or teachers. almost all students were used to using computers and the internet for their education. finally, they have varying experiences with different types of assessment. v. discussion the analysis of the progression and retention rates for the bit degree programme across the different phases has revealed a clear improvement in the student withdrawal rates and first-time passing of examinations. this pattern can only be attributed to the careful design considerations of the ebit phase aimed at improving support for student learning. despite the clear success in the design for learning, the attempts to create a learning community and a collaborative learning environment were not as successful. the main reason for this seems to be the lack of activity modelling, encouragement of community building and creation of collaborative learning. the analysis of the survey findings shows that students prefer not to deviate from familiar assessment methods, but nevertheless understand the importance of formative assessment. the survey also reveals that many students target their approaches to learning based on the method of assessment, with the primary aim of passing examinations rather than focusing on the learning aspects. in addition, the students have identified that the use of ict and different media allows them to express their skills and abilities better. furthermore, the survey reveals that students recognise the importance of collaboration for learning. as a result of this study, a further project was established with a group of volunteer first-year students taking the computer systems i course. this has involved the use of community building and collaborative learning activities to encourage the creation of a learning community, along with the use of peer assessment as the building block for learning within the community. the outcomes of this study are the subject of a further paper and the findings will be used to design future learning communities and assessment activities aimed at promoting learning and improving the bit degree programme. vi. acknowledgements the research reported on in this paper has been conducted with support from the swedish program for ict in developing regions [spider] during the period 2007–2011. references [1] bit (2011). bachelor of information technology website. available at: http://www.bit.lk last accessed 20 dec 2011. [2] black, p., harrison, c., lee, c., marshall, b. and wiliam, d. (2003). assessment for learning: putting it into practice. open university press. [3] brindley, j., blaschke, l. m. and walti, c. (2009). creating effective collaborative learning groups in an online environment. international review of research in open and distance learning, 10(3). [4] brown, r.e. (2001). the process of community-building in distance learning classes. 
journal of asynchronous learning networks, 5(2): 18-35. [5] conole, g. and warburton, b. (2005). a review of computer-assisted assessment. association for learning technology journal, 13(1): 17. [6] cunha, m., raposo, a. b., and fuks, h. (2008). educational technology for collaborative virtual environments. proceedings of the 12th international conference on computer supported cooperative work in design. beijing: ieee, 2: 716-720. [7] e-jump 2.0 (2009). implementing e-learning 2.0 in everyday learning processes in higher and vocational education [ejump 2.0]. available at: http://portaal.e-uni.ee/ejump last accessed 20 dec 2011. [8] farrell, g. and leung, y.k. (2004). innovative online assessment using confidence measurement. education and information technologies, 9(1): 5. [9] gipps, c. (1994). beyond testing: towards a theory of educational assessment. pp.20. falmer press, london. [10] hewagamage, k. p. and wikramanayake, g. n. (2011). “vidupiyasa” ucsc virtual campus: an innovative approach to produce ict professionals through online education for the national development. 4th international conference of education, research and innovation, madrid, spain. [11] hiltz, s. r. (1998). collaborative learning in asynchronous learning networks: building learning communities. available at: http://eric.ed.gov/ericdocs/data/ericdocs2sql/content_storage_01/0000019b/80/17/5a/cc.pdf last accessed 8 mar 2009. [12] hudson, b. (2011). didactical design for technology enhanced learning. in beyond fragmentation: didactics, learning and teaching in europe (ed. b. hudson and m. meyer) pp.223-238. verlag barbara budrich, opladen and farmington hills. [13] , m. and koper, r. (2007). facilitating community building in learning networks through peer tutoring in ad hoc transient communities. international journal of web based communities, 3(2): 198-205. [14] lowell, n.o. and persichitte, k.a. (2000). a virtual ropes course: creating online community. available at: http://www.sloan-c.org/publications/magazine/v4n1/lowell.asp last accessed 13 aug 2009. [15] ludwig-hardman, s. and dunlap, j.c. (2003). learner support services for online students: scaffolding for success. international review of research in open and distance learning, 4(1): 197. [16] mioduser, d. and nachmias, r. (2002). www in education: an overview. in handbook on information technologies for education & training (ed. h. adelsberger, b. collis, and m. pawlowsky) pp.23-43. springer, berlin/heidelberg/new york. [17] mioduser, d., nachmias, r., oren, a. and lahav, o. (2000). web-supported emergent-collaboration in higher education courses. journal of educational technology & society, 35(3): 94. [18] paxton, m. (2000). a linguistic perspective on multiple choice questioning. assessment and evaluation in higher education, 25(2): 109. [19] preece, j., nonneke, b., and andrews, d. (2004). the top five reasons for lurking: improving community experiences for everyone. computers in human behavior, 20: 201-223. [20] scouller, k. and prosser, m. (1994). students' experiences in studying for multiple choice question examinations. studies in higher education, 19(3): 267-279. [21] scouller, k. (1998). 
the influence of assessment method on students' learning approaches: multiple choice question examination versus assignment essay. higher education, 35(4): 453-472. [22] stake, r. (1995). the art of case study research. sage publications, thousand oaks, ca. [23] swan, k., shen, j., and hiltz, s.r. (2006). assessment and collaboration in online learning. journal of asynchronous learning networks, 10(1): 45-62. [24] ucsc (2011). sida e-learning project 2005-2010. available at: http://ucsc.lk/node/320 last accessed 20 dec 2011. [25] ugc (2011a). educational indicators 1980-2009. available at: http://www.ugc.ac.lk/en/statistics/educational-indicators.html last accessed 20 dec 2011. [26] ugc (2011b). sri lanka university statistics 2010. pp.63-79. management information systems division, ugc, colombo. [27] usoof, h. (2011). use of online forums for authentic assessment in distance education and the students' perspective, paper presented at 17th annual sloan-c international conference on online learning 2011. orlando, fl. [28] usoof, h., hudson, b. & lindgren, e. (in press). plagiarism: catalysts and not so simple solutions. in summary of cases on professional distance education degree programs and practices: successes, challenges and issues (ed. k. sullivan, p. czigler & j. sullivan hellgren). igi global, hershey pa, usa. [29] wikramanayake, g. n., hewagamage, k. p., gamage, g. i. and weerasinghe, a. r. (2007). asia ebit@ucsc: implementing the paradigm shift from teaching to learning through e-learning framework, paper presented at computer society of sri lanka 25th national information technology conference, colombo, sri lanka. international journal on advances in ict for emerging regions 2021 14 (3) estimating the effects of text genre, image resolution and algorithmic complexity needed for sinhala optical character recognition isuri anuradha#1, chamila liyanage2, ruvan weerasinghe3 abstract— while optical character recognition for latin based scripts has seen near human quality performance, the accuracy for the rounded scripts of south asia still lags behind. work on sinhala ocr has mainly reported on performance on constrained classes of font faces and so been inconclusive. this paper provides a comprehensive series of experiments using conventional machine learning as well as deep learning on texts and font faces of diverse types and in diverse resolutions, in order to present a realistic estimation of the complexity of recognizing the rounded script of sinhala. while texts of both old and contemporary books can be recognized with over 87% accuracy, those in old newspapers are much harder to recognize owing to poor print quality and resolution. keywords— sinhala ocr, optical character recognition, tesseract, deep learning. i. introduction optical character recognition (ocr) technology is designed to convert printed text into machine-operable text. ocr is a collection of multiple steps such as scanning, preprocessing, segmentation, feature extraction, classification, recognition and post-processing. in recent literature, many ocr systems have been developed for recognizing latin characters [1]. with the advancement of natural language processing during the past few years, researchers have integrated machine learning/deep learning techniques for analysing textual representations in digital documents. 
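before detailing the approach, note that the recognition step of such a pipeline can be exercised with off-the-shelf tooling. the following is a minimal illustrative sketch, not the authors' system: it assumes a locally installed tesseract binary with a sinhala traineddata file (language code 'sin'), the pytesseract wrapper, and a placeholder file name page.png.

```python
from PIL import Image
import pytesseract

# run tesseract's sinhala model over a scanned page image and print the recognized text.
# assumes tesseract is installed and 'sin.traineddata' is available in its tessdata directory.
image = Image.open("page.png")  # placeholder path to a scanned sinhala page
text = pytesseract.image_to_string(image, lang="sin")
print(text)
```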
template matching, neural networks (nn) and recurrent neural networks (rnn) are popular and widely used approaches for character recognition. these techniques work well for other character sets, since large volumes of printed data are available for many languages. the proposed sinhala ocr is discussed in this paper with special focus on the text genre, image resolution and algorithmic complexity needed for training an ocr system for the sinhala character set. as the current state-of-the-art ocr technology, tesseract is used in the training of ocr systems for many character sets. further, tesseract has moved from machine learning to deep learning with an lstm architecture and provides relatively better recognition capability [2]. however, algorithmic sophistication alone is not enough for training an ocr model, as text genre and image quality also affect how accurate the trained model can be. since large volumes of the available data are in print media and were printed before the computer era, the documents have been produced using different techniques such as offset printing and screen printing. therefore, common typefaces used throughout the history of printing should also be included in training to make such text recognizable. further, the types and sizes of the fonts and the size of the training text are also significant. in this paper we discuss the ocr system developed for sinhala by estimating the effects of text genre, image resolution and algorithmic complexity. the rest of this paper is structured as follows: section ii gives a brief overview of the related work in this area. section iii discusses some properties and characteristics of the sinhala script, as it is important to review the complexities of this particular script. the algorithmic approach adopted for ocr is discussed in section iv. further, section v gives the motivation and rationale for the experimental setup and a systematic description of the training data, word lists, and training regime adopted to develop the sinhala ocr. section vi presents experimental results on the ocr methods, together with an analysis comparing their performance. finally, the paper concludes with a discussion of future work. ii. related works despite decades of research on the engineering aspects, the problem of sinhala character recognition remains a challenging issue in the ocr field. in the past few years, some studies have been conducted to identify widely used font types in sri lanka [3]. when considering ocr for the sinhala language, initially a k-nearest neighbour (knn) algorithm-based sinhala ocr was developed by the language technology research laboratory, university of colombo school of computing [3]. in that study, commercially used font types were employed with varying font sizes, obtaining an average accuracy of 94%. in the literature, neural network based sinhala ocr systems have also been developed in recent years [4], [5], [6]. in 2013, the sri lanka institute of information technology conducted research on applying neural networks to sinhala optical character recognition [4]. this study focused on only 36 characters of the alphabet. another sinhala ocr application integrating neural networks was developed by a local research group [5]. these studies mainly focused on character-level accuracy and not on word accuracy. correspondence: i. anuradha #1 (e-mail isa@ucsc.cmb.ac.lk) received: 29-12-2020 revised: 31-05-2021 accepted: 11-06-2021 
i. anuradha#1, c. liyanage2 and r. weerasinghe3 are from the university of colombo school of computing (isa@ucsc.cmb.ac.lk). this paper is an extended version of the paper “deep learning based sinhala optical character recognition (ocr)” presented at the icter conference (2020), doi: http://doi.org/10.4038/icter.v14i3.7231. © 2021 international journal on advances in ict for emerging regions. in addition, the software development unit of the university of colombo school of computing has trained a sinhala ocr model using tesseract 3 [7]. this system shows relatively good results only for high-resolution images. also, the language technology research laboratory at the university of colombo is experimenting with integrating machine learning concepts into sinhala ocr applications [8]. further, manisha et al. [9] have also tried to combine the tesseract ocr engine with sinhala characters and report 97% accuracy; however, the performance has not been well documented. it is well known that indic languages have many complexities and character variations, which makes ocr systems hard to develop. however, in the past few years, multiple studies have been conducted integrating the tesseract ocr engine for character recognition in different low-resource languages such as tamil [10], hindi [11], bengali [12] and urdu [13]. iii. sinhala script the sinhala script is an abugida or alphasyllabary script in which consonant-vowel sequences are written as units; it is therefore called a segmental writing system. the script has evolved from the brahmi script. the letters in sinhala are circular-shaped and are written from left to right [14]. the sinhala script is used primarily to write the sinhala language, which is one of the official languages of sri lanka, spoken by about 16 million people in the country. in addition, it is also used in sri lanka for writing pali, the canonical language of theravada buddhism, and sometimes sanskrit, the old indo-aryan language [15]. there are 20 vowels and 41 consonants in the sinhala script. since sinhala is a segmental writing system, vowels take two representations: independent vowels, which occur in the initial position of a word (and infrequently in the middle of a word, e.g. නුවරඑළිය, ජාඇල), and dependent vowels, also known as vowel modifiers, which occur after a consonant. figures 1 and 2 illustrate the vowels with their modifiers and the consonants of the sinhala script, respectively. vowels and modifiers included for the training data: ආ ා උ ා ඒ ේා ඇ ා ඌ ා ඓ ෛා ඈ ා ඍ ා ඔ ේා ඉ ා ඎ ා ඕ ේා ඊ ා එ ේා ඖ ේා ා ා vowels and modifiers not included for the training data: ඏ ා ඐ ා fig. 1 vowel characters and modifiers in sinhala script from among the vowel modifiers in figure 1, ං (anusvara) and ං (visarga) are two specific modifiers. they occur not only with consonants but also with vowels, e.g. අ , ඉ , උ , අ , ඕ . consonants included for the training data: ක ඛ ග ඝ ඟ ළ හ ච ඡ ජ ඣ ඤ ඥ ය ශ ට ඨ ඩ ඪ ණ ඬ ර ෂ ත ථ ද ධ න ඳ ල ස ප ඵ බ භ ම ඹ ව ෆ consonants not included for the training data: ඞ ඦ fig. 2 consonant characters in sinhala script 
two vowels, ඏ and ඐ, and their corresponding vowel modifiers in figure 1, and ඦ in figure 2, were not included in the training data as they do not occur in old or contemporary sinhala books. however, ඞ in figure 2 occurs in a limited number of words in old sinhala books. it was not included because the shape of this particular character would cause misrecognition with similar characters in the sinhala script. sinhala consonants imply the inherent vowel /a/ (අ) when they occur with no modifiers. absence of the inherent vowel is marked by adding a symbol called hal lakuna or halkirima to the top of the particular consonant, e.g. ක්, ව්. further, hal lakuna also occurs with two vowels and their modifiers. it has two shapes, as illustrated in figure 3. fig. 3 two different shapes of hal lakuna with vowels and consonants as a segmental writing system, vowel modifiers appear above, below or to the right or left of the basic consonant. among all the consonant-vowel sequences in the sinhala script, ළු is a special character as it appears as a separate symbol to represent the ළ+උ sequence. as an example, figure 4 illustrates all the consonant-vowel sequences for the consonant ‘ක’. ක ක ක කි කී කු කූ ක ක ේක ේේ ෛක ේක ේක ේක ක ක fig. 4 consonant ‘ක’ with all the vowel modifiers there are three consonant modifiers which occur in the sinhala script, known as rakaranshaya, yanshaya and rephaya. among them, rakaranshaya represents ‘ර’ (ra) and yanshaya represents ‘ය’ (ya) when they appear after a consonant (from which the inherent vowel has been removed). however, as symbols, rakaranshaya appears below (e.g. ක්‍රම, ආශ්‍රය, වක්‍ර) and yanshaya to the right (වයසන, සත්‍ය, සංඛ්‍යාව) of the basic consonant. further, rephaya is used to denote 'ර්' when it occurs before a consonant, and the symbol appears on top of the basic consonant (e.g. ධර්‍ම, සර්‍ව, ත්‍ර්‍ක). using rephaya is an alternative rule in the sinhala writing system, while rakaranshaya and yanshaya are essential. all the vowel modifiers surround the consonant-rakaranshaya (e.g. ක්‍ක්‍රෝ ), consonant-yanshaya (e.g. ක්‍කයෝ ) or rephaya-consonant (e.g. ක්‍ර්‍කෝ ) units. figure 5 illustrates how vowel modifiers occur with rakaranshaya. ක්‍ර ක්‍ර ක්‍ර ක්‍ර ක්‍රි ක්‍රී ක්‍ර ක්‍ර ේක්‍ර ේේ ෛක්‍ර ේක්‍ර ේක්‍ර ේක්‍ර ක්‍ර ක්‍ර fig. 5 consonant ‘ක’ with rakaranshaya and the vowel modifiers one other significant characteristic of the sinhala writing system is the use of compound consonants. these occur frequently in old sinhala books. however, in contemporary sinhala this writing convention is infrequent, and therefore only the first set of compound consonants in figure 6 (which occur rarely in contemporary sinhala books) was considered for the training data in this research. compound consonants rarely occurring in contemporary sinhala books: ක්‍ව ක + zwj + ව ක්‍ෂ ක + zwj + ෂ ග්‍ධ ග + zwj + ධ ත්‍ථ ත + zwj + ථ ත්‍ව ත + zwj + ව න්‍ථ න + zwj + ථ න්‍ද න + zwj + ද න්‍ධ න + zwj + ධ න්‍ව න + zwj + ව ද්‍ය ද + zwj + ය compound consonants occurring in old books: ඤ + zwj + ච ට්‍ඨ ට + zwj + ඨ ද්‍ධ ද + zwj + ධ ද්‍ව ද + zwj + ව fig. 6 compound consonants in sinhala script 
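the zwj sequences listed in figure 6 can be inspected directly in code. the sketch below is our own illustration (not from the paper); note that in the underlying unicode encoding the al-lakuna (U+0DCA) precedes the zero width joiner, a detail that the shorthand notation 'ක + zwj + ෂ' in figure 6 leaves implicit.

```python
import unicodedata

# the compound consonant ක්‍ෂ, encoded as ka + al-lakuna + zero width joiner + ssa
kssa = "\u0d9a\u0dca\u200d\u0dc2"

for ch in kssa:
    # print each code point together with its unicode character name
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch, 'UNNAMED')}")
```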
iv. system overview in our study, different sinhala text genres gave different accuracy results. from a variety of genres, explanatory and descriptive writing, narrative writing, and news reportage were selected for our purpose. when selecting documents, we considered a variety of documents and unicode font types from different printing eras. with regard to image resolution, low resolutions may degrade not only the quality but also the speed of overall ocr performance, since uncertainty in character images produces more recognition variants. in the tesseract engine too, high-resolution images gave high accuracy, correctly identifying punctuation, modifiers and complex letters. in the tesseract engine, image processing is a combination of several steps such as rescaling, binarization and dilation/erosion. for the training process, we adapted and experimented with both the tesseract 3.0 (legacy) and tesseract 4.0 (deep learning) ocr engines. tesseract has a standard level of accuracy in its engine. it is necessary to have a library file in the ocr engine, called ‘traineddata’, which works on sinhala input. this file is a concatenation of multiple files. depending on the accuracy and richness of this library file, the ocr engine can work to its full potential. the sinhala language is complicated and has various types of letters, including vowels, consonants, compound characters and other special types. therefore, for tesseract 3.0, we developed a large character set for sinhala. it is important to mention that, for tesseract 3.0, we need to uniquely identify each and every character. sometimes, due to the complexity of the character set, the ocr may not always detect a character correctly even if the character is included in the training files. v. training process the preparation of data and the training process adopted for developing the sinhala ocr model for both the tesseract 3.0 and 4.0 versions are described in the following subsections. ● preparation for tesseract version 3.0 a. setting up the ocr engine we installed tesseract version 3.0 on the windows operating system. since tesseract 3.0 has no user interface, we used several commands on the command line to launch the application. b. preparing training data the process followed in preparing training data is described below. 1) preparing the ocr alphabet for sinhala: initially, the ocr alphabet for sinhala was defined to collect text data. the ocr alphabet is distinct from the character alphabet and comprises the basic units of training data as follows. o all vowels: e.g. අ ඉ උ එ ඒ ඔ ඕ o all consonants: e.g. ක ව ඟ ඳ ඣ o consonants with touching modifiers: e.g. කි කු වි o consonants with hal lakuna: e.g. ක් ව් o compound consonants: e.g. ක්‍ර ව්‍ර ක්‍ෂ ත්‍ථ o compound consonants with touching modifiers: e.g. ක්‍රි ව්‍රි ක්‍ෂි ක්‍ෂු ත්‍ථි ත්‍ථු o non-touching modifiers: e.g. ෙං ංා ං ං o punctuation marks: e.g. ! ? ( ) / 2) text data collection: text data for preparing training images were collected from the ucsc 10m sinhala corpus and were extracted as per the ocr alphabet that we defined. we selected 1050 words from the extracted data for the preparation of training images. figure 7 shows a sample of the text data. fig. 7 sample of text data 
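the word-selection step just described can be sketched in a few lines of python. this is an illustrative simplification rather than the authors' code: it merely keeps distinct tokens whose characters fall in the sinhala unicode block (U+0D80 to U+0DFF) or are the zero width joiner, whereas the actual selection was driven by the ocr alphabet units listed above; the file name corpus.txt is a placeholder.

```python
def is_sinhala_word(token: str) -> bool:
    # accept tokens made only of sinhala code points or the zero width joiner (U+200D)
    return bool(token) and all(0x0D80 <= ord(ch) <= 0x0DFF or ch == "\u200d" for ch in token)

def select_training_words(corpus_path: str, target: int = 1050) -> list:
    """collect distinct sinhala tokens from a plain-text corpus until 'target' words are found."""
    selected, seen = [], set()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            for token in line.split():
                if token not in seen and is_sinhala_word(token):
                    seen.add(token)
                    selected.append(token)
                    if len(selected) >= target:
                        return selected
    return selected

words = select_training_words("corpus.txt")  # placeholder corpus file
```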
3) preparing training images: training images were prepared for fonts widely used in sinhala typing. the criteria followed to create the text images are as follows: colour: grayscale; font size of the text: 12, 14, 16; dpi: 300; fonts used: iskoola pota, fmabhaya, malithi web, bhashithascreen, dinaminauniweb. based on the above criteria, we prepared two sets of training images. the first set consisted of computer-generated images (screenshots). as an iterative process of training, the second set of training images was prepared from scanned images of the same text data. figure 8 shows a sample of such training images. fig. 8 sample of training images 4) character segmentation: in tesseract 3.0, character segmentation is performed through the process of creating box files. a ‘box file’ is a text file which contains the necessary information about a training image. the coordinate values of the characters in the training image, along with the corresponding unicode characters, are stored in these box files. the segmentation of the characters in the training images was done as per the ocr alphabet we defined. each training image should have a box file in which the number of boxes must tally with the number of training character segments in the image. figures 9 and 10 show an image with segmented characters and a sample of box information. fig. 9 character segmentation fig. 10 sample of box information c. training the model the training was performed as an iterative process until better results were obtained. firstly, models were trained on individual data sets of computer-generated images for given font types and sizes. secondly, we combined the training data sets for multiple fonts and multiple sizes and trained the models. thirdly, training was performed using the scanned images, and multiple models were trained for the given font types and sizes. finally, all the data sets of computer-generated and scanned images were combined in several ways to train multiple models. ● preparation for tesseract version 4.0 a. setting up the ocr engine for setting up tesseract 4.0 we selected the ubuntu environment, since tesseract 4.0 deals with deep learning techniques such as long short-term memory (lstm) and ubuntu provides full compatibility for the ocr engine. all tasks were carried out in the terminal and instructions were given as commands. b. preparation of training data sets training data plays an important role in tesseract version 4.0. with the integration of deep learning techniques, more training data results in better outcomes. 
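for reference, the per-character records in the box files described above can be read with a few lines of python. this sketch assumes the standard tesseract box format (symbol, left, bottom, right, top and page number on each line) and is independent of the authors' tooling.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BoxEntry:
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

def read_box_file(path: str) -> List[BoxEntry]:
    """parse a tesseract .box file: one 'symbol left bottom right top page' record per line."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 6:
                continue  # skip malformed lines; assumes the symbol itself contains no spaces
            symbol = parts[0]
            left, bottom, right, top, page = map(int, parts[1:6])
            entries.append(BoxEntry(symbol, left, bottom, right, top, page))
    return entries
```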
for our experiment, we employed 3 datasets which are available for the sinhala language. further details of the 1) ucsc 10 million word sinhala dataset, 2) common crawl sinhala dataset and 3) google dataset are discussed below. 1) ucsc 10 million word sinhala corpus: the ucsc 10m word sinhala corpus has been compiled by the language technology research laboratory of the university of colombo school of computing (ucsc) in sri lanka. this text corpus contains a huge variety of sinhala books, including novels, short stories, translations and critiques written by renowned sinhala writers, and sinhala newspapers: silumina, dinamina, lankadeepa and lakbima. the ucsc 10 million dataset includes texts which belong to different eras in sri lanka. it also contains texts from various sources, so the text is rich with different styles of writing. noisy data and text in other languages have been removed from this dataset in order to minimize errors. 2) 5 million+ sentences of sinhala common crawl: in 2019, guzmán et al. [16] presented two monolingual corpora for sinhala: 155k+ sentences of filtered sinhala wikipedia and 5,178k+ sentences of sinhala common crawl. since this study considered only textual data available online, the diversity of textual representation is considerably low. furthermore, this dataset has a high noise rate, along with other common issues such as the zero width joiner problem and the mixing of text in other languages with sinhala text, and these affect the overall accuracy of the system. 3) the google dataset for sinhala is built especially for tesseract. this dataset includes a variety of textual representations gathered in recent years. 4) ucsc 400k distinct word list: this list of monolingual vocabulary was developed from the ucsc 10 million word sinhala corpus by the language technology research laboratory of ucsc. the list includes 440,021 distinct entries and is available on the web. after comparing these 3 datasets, the ucsc 10 million sinhala dataset [17] was selected by the authors due to its rich textual combinations from different eras and lower noise. the ucsc 400k distinct word list [18] was also combined with the existing tesseract word list. as a special feature, tesseract version 4.0 generates the tiff file and box file automatically. additionally, the image and the corresponding utf-8 text transcription are stored in an lstmf file during font training. also, in tesseract 4.0 the clustering steps (mftraining, cntraining, shape clustering) are replaced by a single, slower lstm training step. fig. 11 sample of training data for tesseract 4.0 c. selection of font types and sizes since typefaces are significant in training an ocr system, we investigated the commonly used sinhala fonts to train the ocr model in tesseract 4.0. though there are hundreds of non-unicode fonts available for the sinhala script, they have no unique character code points for identification. owing to its 16-bit encoding, unicode is theoretically able to support over 65,000 unique character code points [19], and we selected 9 unicode fonts from the limited number of sinhala unicode fonts available. they include the unicode fonts most commonly used in printed and digital media [20]. the font types involved in the research are given below. 
○ noto-sans font
○ lklug font
○ malithi font
○ dinamina font
○ iskolapotha font
○ bhashitacomplex font
d. training the model as pre-processing steps, noise removal, adaptive thresholding, page layout analysis and connected component analysis were performed by the tesseract ocr engine. the following steps were followed to train the model. initially, the generated training data was provided as input to the engine and the resulting model was extracted. the model was then fine-tuned to decrease the error rate, and finally the fine-tuned model was combined with the initially trained model. we combined multiple fonts for model creation; single-font, double-font and triple-font models were used for analysis. vi. evaluation and results for the evaluation process, we considered both tesseract 3.0 and tesseract 4.0. in the first phase, tesseract version 3.0 was evaluated at character level, while tesseract 4.0 was evaluated at both character and word level. the developed ocr models were tested with 30 images selected for three different categories (10 for each category). when selecting images for testing we chose non-identical images with different typefaces and different image qualities. 1) old sinhala newspapers: the testing data for this category was selected from archived images of sinhala newspapers published in 1870–1890. the newspapers include: සෙතයෝදය (sathyodaya), සතයාල කාරය (sathyalankaraya) and දිනෙතා ප්‍රවෘත්ති (dinapatha prawruththi). all the images are in 200 dpi. 2) old sinhala books: testing images for this category were selected from old sinhala books which were printed using letterpress printing. the old books selected include: බුත්සරණ (buthsarana), පූජාවලිය (pujawaliya) and සද්ධෙණරත්නාවලිය (saddharmarathnawaliya). the images in this category are in 72 dpi. 3) contemporary sinhala books: books printed with computerized fonts were selected for this category. 10 images of randomly selected pages from 10 books were taken and scanned at 300 dpi. to calculate the accuracy of the systems we compare the common and different characters between the original and the ocr document: accuracy (%) = count of common characters / count of (common characters + different characters) x 100. a. evaluation of the models from tesseract 3.0 the evaluation of tesseract version 3.0 was conducted only for the third category of testing images, for two reasons: firstly, the results for the other two categories were not at a satisfactory level and, secondly, we gave our main priority to the evaluation of tesseract version 4.0. we therefore selected the most accurate model (the scanned-iskolapotha model) out of the 18 models created by varying font types and sizes. the original data of the testing samples consist of 2592 words and 16380 characters. testing results are illustrated in table i.
table i. tesseract 3.0 results of ocr documents
font type | recognized character count | recognized word count | accuracy of the system
iskolapotha | 16962 | 2507 | 36.89%
b. evaluation of the models from tesseract 4.0 the models generated by the tesseract 4.0 ocr engine were evaluated on the three categories of testing samples explained above. from the generated models, all the individual-font models and three selected combined models (cm) were evaluated. the same set of testing images was used in the evaluation process. 
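the character-level accuracy defined above can be computed, for example, with python's difflib. the paper does not spell out how 'common' and 'different' characters are counted, so the sketch below is one plausible reading: characters in the longest matching blocks are treated as common, and every unmatched character on either side is treated as different.

```python
import difflib

def character_accuracy(reference: str, ocr_output: str) -> float:
    """accuracy (%) = common / (common + different) * 100, where 'different' counts
    unmatched characters in both the reference text and the ocr output."""
    matcher = difflib.SequenceMatcher(None, reference, ocr_output)
    common = sum(block.size for block in matcher.get_matching_blocks())
    different = (len(reference) - common) + (len(ocr_output) - common)
    total = common + different
    return 100.0 * common / total if total else 100.0
```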
for the first category of evaluation, we selected 10 images from old newspapers, consisting of 1557 words and 9821 characters. some of the text in these images is hard even for a human to read. the results for the first category of images are shown in table ii.
table ii. category 01 results of ocr documents
font type | recognized character count | recognized word count | accuracy of the system
noto-sans | 10142 | 1462 | 61.43%
lk-lug | 10031 | 1441 | 61.66%
malithi | 10094 | 1516 | 65.51%
iskolapotha | 10067 | 1458 | 67.02%
dinamina | 9897 | 1426 | 59.83%
bashitha | 10056 | 1451 | 61.96%
noto-lklug (cm) | 10071 | 1458 | 61.51%
malithi-lug (cm) | 10003 | 1449 | 63.30%
noto-lug-malithi (cm) | 10035 | 1445 | 64.40%
there were 3032 words and 18074 characters in the images of category 2, the old books printed in the letterpress era. the results obtained are illustrated in table iii.
table iii. category 02 results of ocr documents
font type | recognized character count | recognized word count | accuracy of the system
noto-sans | 18623 | 3019 | 85.13%
lk-lug | 18584 | 3012 | 87.15%
malithi | 18774 | 3039 | 87.07%
iskolapotha | 18688 | 3022 | 85.28%
dinamina | 18428 | 2807 | 84.97%
bashita | 18387 | 2983 | 87.53%
noto-lklug (cm) | 18627 | 3023 | 86.06%
malithi-lug (cm) | 18704 | 3017 | 85.69%
noto-lug-malithi (cm) | 18461 | 3010 | 87.52%
the third category consisted of 10 images captured from contemporary sinhala books, containing 2592 words and 16151 characters. the results are given in table iv.
table iv. category 03 results of ocr documents
font type | recognized character count | recognized word count | accuracy of the system
noto-sans | 16476 | 2620 | 85.91%
lk-lug | 16306 | 2615 | 84.91%
malithi | 16530 | 2626 | 86.14%
iskolapotha | 16391 | 2635 | 87.63%
dinamina | 16104 | 2459 | 85.49%
bashita | 16178 | 2591 | 87.49%
noto-lklug (cm) | 16335 | 2627 | 84.83%
malithi-lug (cm) | 16338 | 2796 | 85.49%
noto-lug-malithi (cm) | 16259 | 2613 | 86.59%
considering the three categories of input images, old newspapers have a low accuracy rate in character recognition due to high noise and low image quality. in the old sinhala book category, the malithi and lklug models and the combined model of noto sans, lklug and malithi gave the best accuracy in recognizing characters. the model trained with the font iskolapotha obtained the highest accuracy rate in the third category of contemporary sinhala books. analysis of the ocr outputs shows that some identifiable improvements can still be made to the recognition process of the system. moreover, as another part of this research, we randomly selected 5 images from the contemporary sinhala books and converted them to a lower dpi (from 300 dpi to 96 dpi). thereafter, we chose our best 3 models in the contemporary sinhala books category and evaluated their performance. the selected 5 images contained a total of 1482 words and 8735 characters.
table v. comparison at the low dpi level
font type | recognized character count | recognized word count | accuracy of the system
iskolapotha | 8983 | 1481 | 87.88%
malithi | 9067 | 1479 | 86.09%
noto-lug-malithi (cm) | 8895 | 1473 | 87.40%
as a special note, after reducing the dpi, some character combinations which were not recognized correctly in the 300 dpi samples were recognized. for instance, yanshaya was not recognized well in previous efforts, but with this modification it was recognized well. 
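the dpi reduction used for table v can be reproduced in a few lines; the sketch below (resampling with pillow and writing the new dpi into the output file metadata) is our own illustration and is not necessarily how the authors' images were converted. the file names are placeholders.

```python
from PIL import Image

def reduce_dpi(src_path: str, dst_path: str, src_dpi: int = 300, dst_dpi: int = 96) -> None:
    """downsample a scanned page from src_dpi to dst_dpi and record the new dpi in the output file."""
    img = Image.open(src_path)
    scale = dst_dpi / src_dpi
    new_size = (max(1, round(img.width * scale)), max(1, round(img.height * scale)))
    img.resize(new_size, Image.LANCZOS).save(dst_path, dpi=(dst_dpi, dst_dpi))

reduce_dpi("page_300dpi.png", "page_96dpi.png")  # placeholder file names
```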
some clearly identifiable errors in the recognition process of our ocr system have been briefly noted below. o confusion between similar-shaped individual characters (e.g. හ භ්‍ ග ඟ ශ, ව ච ට, ඔ ඹ ෙ, එ ඵ ථ, ඩ ඬ, ද ඳ, ය ස ඝ, ජ ඡ, බ ඛ, ඨ ඪ, ත න). errors of this type frequently occur in the 1st and 2nd categories of testing images. when the images are not clear, the tiny variations between characters are difficult to capture. o inability to recognize hal lakuna (e.g. ක් ක, ෙක ෙක්, ව ව්, ෙව ෙව්). o confusion between vowel modifiers and hal lakuna in the same character or in different characters (e.g. වි වී ව් චි චී ච්, හි හී භි භී, මි මී ේ, මු මූ, සි සී). o misidentification of rakaranshaya as papilla, the vowel modifier for 'u' (e.g. ප්‍ර පු, ට්‍ර ටු, ම්‍ර මු). o inability to recognize touching letters (e.g. ස්ස, ද්ව, ට්ඨ, ණ්ණ, ේඛ). using touching letters was a writing style in old sinhala, and in pali it is a rule, as pali does not have a sign like hal lakuna to show the absence of the inherent vowel. this writing style occurs in some testing images of categories 1 and 2. since contemporary writing does not follow this style, the training data is not rich in these sequences. this has resulted in touching letters not being recognized. o inability to recognize compound consonants. the compound consonants given in figure 6 hardly occur in contemporary sinhala and are therefore not well recognized. some english characters were also present in the testing images of all three categories. as we focused on developing a better recognition model for sinhala characters, we did not include enough english text data in the training process; this resulted in some recognition errors and affected the overall accuracy of the system. however, the above limitations will be considered in the next stage as future enhancements. vii. conclusion and future works in this paper we presented the process of developing an optical character recognition system for sinhala. in this research we identified the characteristics of the sinhala script along with properties of the sinhala writing style. the training process of the ocr model was initiated with tesseract 3.0 and later moved to tesseract 4.0, which applies state-of-the-art deep learning. the evaluation was carried out by comparing results across different sinhala fonts and adapting the system to recognize a variety of test data gathered from different sources. although we tested some samples with the model built from the sinhala common crawl dataset, its overall accuracy was lower than the others and it was often unable to identify characters. according to the results, our model trained with the font iskolapotha gave an accuracy of 87.63% on contemporary sinhala books. in the old sinhala book category, models developed using the fonts malithi and lklug and the combined font model using noto sans, lklug and malithi gave accuracies of 87.07%, 87.15% and 87.52%, respectively. meanwhile, in the old sinhala newspaper category, 67.02% accuracy was obtained from the model developed with the font iskolapotha. developing ocr systems for low-resource languages needs a considerable amount of effort from both the linguistics and computer science domains. analysing linguistic rules and mapping them into computational form is quite challenging for low-resource languages like sinhala and tamil. 
in this stage of the research, we focused only on the recognition of the sinhala script. as mentioned in the sections above, the sinhala script is also used to write the pali and sanskrit languages in sri lanka. as a future enhancement we will work on identifying touching and conjoining letters, which occur frequently in pali. we also plan to apply some n-gram or word embedding based post-processing techniques to enhance the accuracy. in the real world, ocr can also be categorized as a sequence learning task; it is therefore necessary to predict a sequence of labels from noisy, unsegmented input data. as future work, we plan to combine connectionist temporal classification (ctc) with deep learning algorithms to train a recurrent neural network (rnn) to label unsegmented sequences directly. moreover, neural network compression and conventional neural machine translation for sinhala ocr will be studied in the future. acknowledgment this work was carried out as a part of a project funded by theekshana research and development company. we acknowledge mrs. dinuji godigamuwa for assisting the work and thank all the members of the language technology research laboratory of the university of colombo school of computing, sri lanka, who helped in various ways to make this work bear fruit. references [1] r. weerasinghe, a. wasala, d. herath, and v. welgama, “nlp applications of sinhala: tts & ocr,” in proceedings of the third international joint conference on natural language processing: volume-ii, 2008. [2] r. smith, “an overview of the tesseract ocr engine,” in ninth international conference on document analysis and recognition (icdar 2007), 2007, pp. 629–633. doi: 10.1109/icdar.2007.4376991. [3] a. r. weerasinghe, d. l. herath, and n. p. k. medagoda, “a nearest-neighbor based algorithm for printed sinhala character recognition,” innov. a knowl. econ., p. 11, 2006. [4] m. rimas, r. p. thilakumara, and p. koswatta, “optical character recognition for sinhala language,” in 2013 ieee global humanitarian technology conference: south asia satellite (ghtc-sas), 2013, pp. 149–153. [5] s. ajward, n. jayasundara, s. madushika, and r. ragel, “converting printed sinhala documents to formatted editable text,” in 2010 fifth international conference on information and automation for sustainability, 2010, pp. 138–143. [6] h. w. h. premachandra, c. premachandra, t. kimura, and h. kawanaka, “artificial neural network based sinhala character recognition,” in international conference on computer vision and graphics, 2016, pp. 594–603. [7] [online]. available: http://192.248.22.122/ocrsinhala/ [8] i. anuradha, c. liyanage, h. wijayawardhana, and r. weerasinghe, “deep learning based sinhala optical character recognition (ocr),” in 2020 20th international conference on advances in ict for emerging regions (icter), 2020, pp. 298–299. [9] u. manisha and s. r. liyanage, “sinhala character recognition using tesseract ocr,” 2018. [10] c. liyanage, t. nadungodage, and r. weerasinghe, “developing a commercial grade tamil ocr for recognizing font and size independent text,” in 2015 fifteenth international conference on advances in ict for emerging regions (icter), 2015, pp. 130–134. [11] n. mishra, c. patvardhan, c. v. lakshmi, and s. singh, “shirorekha chopping integrated tesseract ocr engine for enhanced hindi language recognition,” int. j. comput. appl., vol. 39, no. 6, pp. 19–23, 2012. [12] m. t. chowdhury, m. s. islam, b. h. bipul, and m. k. 
rhaman, “implementation of an optical character reader (ocr) for bengali language,” in 2015 international conference on data and software engineering (icodse), 2015, pp. 126–131. [13] s. hussain, a. niazi, u. anjum, f. irfan, and others, “adapting tesseract for complex scripts: an example for urdu nastalique,” in 2014 11th iapr international workshop on document analysis systems, 2014, pp. 191–195. [14] r. m. joshi and c. mcbride, handbook of literacy in akshara orthography, vol. 17. springer, 2019. [15] j. w. gair and w. s. karunatilaka, “literary sinhala inflected forms: a synopsis with a transliteration guide to sinhala script,” 1976. [16] f. guzmán, p. j. chen, m. ott, j. pino, g. lample, p. koehn, ..., and m. ranzato, “two new evaluation datasets for low-resource machine translation: nepali-english and sinhala-english,” arxiv preprint arxiv:1902.01382, 2019. [17] [online]. available: http://ltrl.ucsc.lk/tools-and-resourses/ [18] [online]. available: http://ltrl.ucsc.lk/tools-and-resourses/ [19] v. k. samaranayake, s. t. nandasara, j. b. disanayaka, a. r. weerasinghe, and h. wijayawardhana, “an introduction to unicode for sinhala characters,” univ. colombo sch. comput., 2003. [20] r. subasinghe, s. eramudugolla, s. samarawickrama and g. dias, “atomic vs anatomic features of sinhala fonts,” 10th typography meeting, 2019. appendix a the following are sample images of the three categories used for testing each ocr model. appendix b the interface of the developed ocr system is shown in figure b. fig a.1. a sample image of category 1 (old sinhala newspapers) fig a.2. a sample image of category 2 (old sinhala books) fig a.3. a sample image of category 3 (contemporary sinhala books) fig b. online application developed for the sinhala ocr automatic syllable segmentation of myanmar texts using finite state transducer tin htay hlaing and yoshiki mikami, nagaoka university of technology, japan.  abstract — automatic syllabification lies at the heart of script processing, especially for south east asian scripts like myanmar. myanmar syllabification algorithms implemented so far follow either a rule-based or a dictionary-based approach. this paper proposes a new method for myanmar syllabification which deploys formal grammar and un-weighted finite state transducers (fst). our proposed method focuses on the orthographic way of syllabification for input texts encoded in unicode. we tackle syllabification of myanmar words with standard syllable structure as well as words with irregular structures such as kinzi and consonant stacking, which have not been resolved by previous methods. our fst based syllabifier was tested on 11,732 distinct words contained in the myanmar orthography corpus. these words yielded 32,238 syllables, which were compared with correctly hand-syllabified words. our fst based syllabification method performs with 99.93% accuracy using the stuttgart fst (sfst) tools. index terms — automatic syllabification, finite state transducer, myanmar syllabification, formal description of syllable structure i. introduction syllabication is the task of breaking words into syllables.
this is called automatic syllabication when performed using computer algorithms instead of linguists. knowledge of the syllable boundaries in words is very useful in a number of areas because it is impossible to store syllable information for all words in a language (new vocabulary is continually being added), and thus automatic syllabication algorithms are necessary [3]. languages differ considerably in the syllable structures that they permit. for most languages, syllabification can be achieved either by writing a set of rules which explain the location of syllable boundaries of words step-by-step or by using an annotated corpus. syllabification algorithms have been proposed for different languages using different approaches. to the best of our knowledge, rule-based approaches have been used for asian scripts, for example lao [16] and sinhala [2], while corpus-based syllabification approaches have been applied to uyghur [12] and urdu [1]. moreover, many language-specific syllabification methods have been modeled using finite state machines or neural networks, and finite state transducers have been used for multilingual syllabification [9]. these algorithms are mainly used in text-to-speech (tts) systems for producing natural sounding speech, and in speech recognizers for dealing with out-of-vocabulary words. for the myanmar script, a syllabification algorithm for a myanmar text-to-speech (tts) system has been developed [20]. however, such phonetic syllabification cannot be used for some major myanmar language processing tasks such as lexicographic sorting, word breaking, line breaking and spelling checking. in other words, it is necessary to use orthographic syllabification for these tasks. although a few researchers have documented attempts at syllabifying myanmar words orthographically, this is the first known documented method for myanmar orthographic syllabification using finite state transducers (fst). the objectives of this study are to represent myanmar syllable structure in a formal description (e.g., a regular grammar) by taking advantage of its structuredness and unambiguity, and to achieve correct syllabification without applying step-by-step rules and without the need for a corpus. in other words, our proposed method is neither a heuristic approach nor an annotated corpus-based approach. thus, in this research, we explain myanmar syllabification in both orthographic and phonetic views. an un-weighted finite state transducer to divide myanmar words into syllables is proposed. the syllable structure model is represented in a chomsky regular grammar, and finite state transducers are deployed for automatic syllabification of myanmar unicode texts. our transducer accepts an input string (also known as a surface string) in unicode encoding and, in generation mode, produces syllabified text with boundary marker notation. the method was tested using a text corpus containing 11,732 distinct words yielding 32,238 syllables on the stuttgart finite state transducer tool (sfst), and its performance was then measured in terms of the percentage of correctly syllabified words. given the limited resources for myanmar language processing, the syllabified results were checked manually. our result reports 99.93% accuracy for the test set of 32,238 syllables.
the rest of this paper is organized as follows: section 2 introduces syllable segmentation of the myanmar script by highlighting its challenges. we cover related work in section 3, and an overview of myanmar syllable structure in both phonetic and orthographic views, together with unicode canonical order, is given in section 4. section 5 describes our proposed finite state transducer approach, and our experiments and results are in section 6. ii. challenges in myanmar syllabification the myanmar language, formerly known as burmese, is an official language of myanmar and a member of the burmese-lolo group of the sino-tibetan languages, spoken by about 32 million people as a first language and by 10 million as a second language. the burmese script, attested in stone inscriptions at least as far back as the early twelfth century c.e., is a phonologically based script, adapted from mon and ultimately based on an indian (brahmi) prototype [15]. in the myanmar language, the syllable is the smallest linguistic unit and one word consists of one or more syllables. generally, myanmar words can be classified into (1) standard words (i.e., words with standard syllable structure) and (2) irregular words (i.e., words with abbreviated characters or words written in special traditional writing forms, which are discussed in detail in section 4.3), and each word type needs a different orthographic way of syllabification, as follows. word type | example word | meaning | syllabified output: standard word |  | daughter |  # ; irregular word |  | university |  #  # . in this example, the word “daughter, ” has two syllables, where the former syllable has only one sub-syllabic element, a consonant, but the latter has three sub-syllabic elements: consonant, vowel and diacritic. for instance, syllables in an alphabetic script are composed of only two kinds of letters, consonants and vowels, whereas in the myanmar script each syllable is composed of at most five sub-syllabic groups and each group has its own members. we will address this in a later section. the word “university, ” has three syllables and is written in one of the special traditional writing formats known as consonant stacking. it is syllabified by the extra insertion of the myanmar sign asat, u+103a, between two stacked consonants to get the syllable boundary. for instance, the first character is combined with the upper character from the stack to form one syllable, and the lower character in the stack itself becomes a syllable. such language-specific features make the myanmar syllable segmentation task complicated. the myanmar language has five different forms of irregular words, which are explained in section 4.3. besides, many pali words and english loan words that refer to people, places, abbreviations of foreign words, currency units etc. can be found in myanmar texts, and we need to tackle segmentation of such words. in handling such irregular words in the under-resourced myanmar language, rule-based and corpus-based approaches have shown some failures. nevertheless, in our fst-based approach, both standard and irregular words are correctly syllabified.
the third category, abugida, “consists of consonant letters with specific vowels attached to them, and expresses other vowels (or a syllable with vowels) by modifying consonant letters in a consistent manner. the fourth category, syllabary, can be expressed as “letters with no graphical relationship or rules between letters with similar sounds.” a typical example is kana in japanese. the fifth category is logosyllabary. of course, a typical example is chinese characters. chinese characters are so classified because they are both logography and syllabary at the same time [22]. syllable segmentation of most alphabetic and arabic scripts is done phonetically, i.e, the input surface string is first converted into phonetic symbols and then syllabify by using suitable approach such as rule-based or statistical approach. for the indian-based script abugida, syllabification can be done either phonetically or orthographically. there are many approaches tackling the syllable segmentation task. generally, these can be divided into two broad categories namely rule-based and data-driven approaches. the rule-based method effectively embodies some theoretical position regarding the syllable, whereas the data-driven paradigm infers “new” syllabifications from examples assumed to be correctly-syllabified already. however, it is difficult to determine correct syllabification in all cases and so to establish the quality of the “gold-standard” corpus used either to quantitatively evaluated the output of an algorithm or as the example-set on which data-driven methods crucially depend [11]. besides these two categories, statistical methods and finite state methods are also applied for automatic syllabification and we will introduce previous approaches briefly. in syllabification of written uyghur [12], a rule-based approach that uses the principle of maximum onset is applied. experiment on a random sample shows that the syllabification algorithm achieves 98.7 percent word accuracy on word tokens, 99.2 percent on word types, and 99.1 percent syllable accuracy. in [17], the authors described rule based syllabification algorithm for sinhala after analyzing the syllable structure and linguistic rules for syllabification of sinhala words. rule-based syllabification algorithm for malay is proposed based on maximum onset principle for text to speech system in [21]. 4 in [11], one rule-based approach, and three data-driven approaches are evaluated (a look-up procedure, an exemplar-based generalization technique and the syllabification by analogy (sba)). the results on the three databases show consistent and robust patterns: the data-driven techniques outperform the rule-based system in word and juncture accuracies by a very significant margin and best results are obtained with sba. in [9], a weighted finite-state-based approach to syllabification is presented. their language-independent method builds an automaton for each of onsets, nuclei, and codas, by counting occurrences in training data. these automatons are then composed into a transducer accepting sequences of one or more syllables. they do not report quantitative results for their method. syllabification of middle dutch texts is done by the method which combines a rule-based finite-state component and data-driven error-correction rules. the authors adapt an existing method for hyphenating (modern) dutch words by modifying the definition of nucleus and onset, and by adding a number of rules for dealing with spelling variation [4]. 
regarding myanmar syllabification, two of the above-mentioned approaches have already been applied. in the corpus-based longest matching approach [6], the authors collected 4,550 syllables from different resources. the input texts are syllabified by using a longest matching algorithm over their syllable list. they observed that only 0.04% of the actual syllables were not detected and attributed their failures to three factors:
• differing combinations of writing sequences
• loan words borrowed from foreign languages
• rarely used syllables not listed in their syllable list
rule-based myanmar syllable segmentation is done in [23], in which input text strings are converted into an equivalent category sequence (e.g. cmcacv for the word “myanmar”) and the converted character sequence is compared with a syllable rule table to determine syllable boundaries. the authors tested 32,238 syllables in the myanmar orthography [14] and the experimental results show an accuracy rate of 99.96% for segmentation. however, their approach cannot handle the segmentation of irregular words with traditional writing forms, namely kinzi, consonant stacking, great sa and english loan words with irregular forms, as shown in table 1. in our approach, these kinds of failures in [6] and [23] are addressed and the syllabification of irregular words is managed correctly. table 1. syllable segmentation examples and results (this table is taken directly from the original manuscript of [23]). in the unicode technical note [13], the diacritic storage order of myanmar characters in unicode (which we refer to as sub-syllabic components) is explained in detail, and it is highlighted that diacritic storage order does not define a phonetic syllable. further, an automated syllable breaking approach for the myanmar script is mentioned. it is said that a syllable break may occur before any character cluster so long as the kinzi, asat and stacked slots remain empty in the cluster following the possible break point. it is mentioned that their algorithm does not require a dictionary but still needs more refinement; for example, sequences of digits should be kept together and the visible virama needs more complex analysis. the author also stated that the result of syllable breaking can be applied for line breaking and that the same syllable breaking rules can be applied for lexicographic sorting. finite state methods have been applied to the syllabification of alphabetic scripts but not yet to the myanmar script. therefore, in our study, we describe the syllable structure model in a regular grammar and write regular expressions which are used as input to a finite state transducer. in our experiment, we use 32,238 syllables covering all possible syllable structures in standard and irregular words from the myanmar orthography published by the myanmar language authority [14] and achieve an accuracy of 99.93%. we implemented the syllabification system using the programming language for finite state transducers [18]. iv. myanmar syllable structure according to the unicode standard version 6.2, myanmar characters range from u+1000 to u+109f. basically, the myanmar script, formerly known as burmese, has 33 consonants, 8 vowels (free standing and attached), 2 tone marks, 11 medial consonants, a vowel killer or asat, 10 digits, 5 abbreviated syllables and 2 punctuation marks.
among them, consonants, medial consonants and attached vowels, a vowel killer or asat and tone marks can be combined in different ways to form a syllable and thus we refer these groups as sub-syllabic elements. others, free standing vowels, digits and abbreviated syllables can be syllables by themselves. further, myanmar syllable structure can be represented in two ways namely phonetic and orthographic representations which are briefly explained in following section. 4.1 myanmar syllable structure in phonetic representation myanmar script is known as diacritically modified consonant syllabic scripts or alphasyllabaries as it is derived from brahmi and it inherited systemic features of brahmi. the brahmi system based on the unit of the graphic “syllable” or aksara, which by definition always ends with a vowel (type v, cv, cvv, etc). further, the basic consonantal character without any diacritic modification is understood to automatically denote the consonant with the inherent vowel [15]. 5 a myanmar sentence is composed of words, and words written in myanmar script are made of a series of distinct single syllables. in phonological representation, each syllable is made up of two parts: (a) a consonant or sometimes two consonant together and (b) a vowel or a vowel and a final consonant together. the first part of the syllable is known as head and the second part the rhyme. the rhyme may also contain tone. as an example, here is the same principle applied to some english words: (a) head + (b) rhyme = syllable [consonants or two consonants] [vowel or vowel and consonant] t + ee = tee t + ick = tick tr + ick = trick tr + ee = tree in burmese script the head of a syllable may be either (1) an “initial consonant”; for example, the consonants written: ပ လ န pronounced: p l n thor or (2) an initial consonant combined with a second consonant, referred to below as a “medial consonant”; e.g. written: ပ လ pronounced: py ly hn thw there are only four medial consonants in burmese script. the rhyme of a syllable may be written with either (1) an attached vowel symbol; e.g. written: ပ လ န pronounced: pi lu na tho or (2) a consonant marked as a final consonant by carrying the “killer” symbol ; e.g. written: ပန လန န pronounced: pan lan naq theq or (3) a combination of an attached vowel symbol and a final consonant;e.g. written: ပ န လ န န pronounced: poun lein naun thaiq in addition, tones are part of the rhyme and are mostly represented by the two tone marks and ; e.g. written: ပ န လ န န pronounced: pou´n leı´n na´ tho´ other ways of representing tone are used for certain rhymes. it is also good to note that vowels in the myanmar script are always attached to their heads (consonants) they are not letters in their own right, like the a, e, i, o, u of the roman alphabet, and they are not normally written independently. however, a myanmar letter a “အ” (u+1021) is used to write syllables that have no initial consonant, such as “i” written အ , “an” written အန , “oun” written အ န the myanmar letter a “အ” occupies the position of the initial consonant in the written syllable, but is read aloud as “no initial consonant” [8]. from the view of phonology, myanmar syllable has the phonemic shape of c(m)v(ck)d | v(ck)d where c refers an initial consonant, m for medial consonant, v for attached vowels, ck stands for a final consonant ( a combination of consonant c and vowel killer k), and d for the tones respectively. the symbol ( ) means optional. therefore, minimal syllable in phonetic view is cvd or vd. 
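as a small, hedged illustration of the phonemic shape just described, the following python sketch checks abstract category strings against the pattern c(m)v(ck)d | v(ck)d; it works purely on compact category symbols of our own choosing (c, m, v, ck, d), not on actual myanmar code points, and it is not part of the paper's method.

import re

# phonemic shape from section 4.1: c(m)v(ck)d | v(ck)d, where ( ) marks an optional part.
# compact category labels: C = initial consonant, M = medial consonant, V = attached vowel,
# CK = final consonant (consonant + vowel killer), D = tone.
PHONETIC_SHAPE = re.compile(r"^(?:CM?V(?:CK)?D|V(?:CK)?D)$")

def is_valid_phonetic_syllable(categories: str) -> bool:
    """check a compact category string such as 'CVD' or 'CMVCKD'."""
    return PHONETIC_SHAPE.match(categories) is not None

if __name__ == "__main__":
    for s in ["CVD", "VD", "CMVCKD", "CV", "CCKD"]:
        print(s, "->", is_valid_phonetic_syllable(s))
    # the first three are accepted (the minimal syllables cvd and vd, plus the fullest
    # shape); 'CV' lacks a tone and 'CCKD' lacks a vowel, so both are rejected.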
4.2 myanmar syllable structure in orthographic representation as described in the previous section, the myanmar script is derived from the brahmi script of ancient india, and there are other indian-based scripts such as sinhala, bengali, dzongkha, thai, khmer and so on. the myanmar script is based on “a-vowel accompanying consonant syllabics”, i.e., this syllabary consists of consonant letters accompanied by an inherent vowel. since, in this syllabary, a consonant associated with its inherent vowel can indicate a standalone syllable, it would be appropriate to call it a consonant syllable, but we simply call it a consonant letter for simplicity [15]. thus, only a consonant can be a syllable breaking point in orthographic syllabification, which means the minimal syllable in the orthographic view is c, which stands for consonant. basically, a myanmar syllable can be described as
s = i | n | p | x
where s = syllable, | = the logical operator or, i = free-standing vowel syllables, n = digits, p = abbreviated syllables, and x = a syllable formed by the combination of up to 5 sub-syllabic groups. an additional complexity of myanmar syllable structure is that there are syllables (x) containing up to 5 sub-syllabic groups, namely consonant (c), medial consonant (m), dependent vowel (v), asat or vowel killer (k) and tone (d), and these groups can appear in a syllable as one of the following combinations.
table 2. possible combinations within a syllable (x)
consonant only: c
consonant followed by vowel: cv, cvck, cvd, cvckd
consonant followed by consonant: cck, cckd
consonant followed by medial: cm, cmv, cmvd, cmvck, cmvckd, cmck, cmckd
these combinations can be described as the regular expression x = c m? v? (c k)? d?, where the symbol “?” stands for 0 or 1 occurrence of the preceding element. in this expression, the combination of a consonant and asat or vowel killer (ck) is called a final consonant, and syllables ending with this combination are known as closed syllables. further, as with the multiple-component vowels, the user reads the entire syllable as an entity. in the myanmar script, there are 3 vowels and 7 medials which are formed by the combination of other vowels or medials defined in the unicode character chart. if we use this characteristic of vowels and medials in writing the regular expression, the expression for the syllable structure becomes x = c m* v* (ck)? d?, where the notation ? means 0 or 1 occurrence and * means 0 or more occurrences [19]. 4.3 myanmar words in irregular forms there are also traditional as well as particular writing forms for some myanmar words which are commonly used in the literature. the burmese script is adapted from mon, and ultimately based on an indian (brahmi) prototype. burmese scribes have the convention of preserving the original spelling of (mostly) indian loan words and also follow the indian practice of stacking geminate and homorganic consonants [15]. therefore, we generally refer to the group of words with the above-mentioned particular forms and english loan words as irregular throughout the paper for readability and simplicity. the details of each irregular form and the correct syllabification for these forms are discussed as follows. 1) kinzi. as mentioned in the previous section, a final consonant (ck) can follow the main consonant and the resulting syllable is called a closed syllable (cck). in a few words (many of them loanwords), the combination of the consonant nga “ ” and the vowel killer “ ” is “ ”, which is not written on the line in the usual way, but is placed above the first consonant of the next syllable.
the reduced form is called kinzi, meaning “forehead rider” for obvious reasons [7]. some examples are shown in the table below.
table 3. syllabification of kinzi words with reduced form of “ ”
word | correct orthographic syllabification | meaning
အ လန | အ # #လန | england
 | # | ship
အ န | အ # # န | tuesday
(note: the symbol # is used to show the syllable boundary throughout this paper.) from the phonetic view, the kinzi is sometimes pronounced with a high tone; for example, the word (ship) is pronounced as # by adding the diacritic mark “ ”, but in sorting order it is sorted as without the diacritic mark. the spelling of the words gives no indication that they are pronounced with a high tone on the kinzi. 2) consonant stacking. there are some final consonants which are not written in a reduced form like kinzi. with all final consonants other than kinzi ( ), instead of the final consonant being forced up and over the next consonant, it is the next consonant that is forced down and under the final consonant. so, if we have a pair of syllables like န and and they appear in a word that requires them to be compressed, then the main consonant of the second syllable is forced under the န and written as . the same consonant can also be stacked with itself, which is known as consonant repetition. 3) loan words. loan words are words which adopt the english pronunciation directly, sometimes adding a myanmar pronunciation alongside the english pronunciation. for example, the english word “car” is written as “ ”, where the first syllable “ ” is the english pronunciation and the second syllable “ ” is the myanmar pronunciation. 4) great sa. the myanmar consonant great sa “ ” is commonly used, and words with great sa are syllabified as + + . for example, the word “problem” in the myanmar language is ပ න , which is syllabified as the three syllables ပ # # ⁠න . 5) contraction. there are usages of double-acting consonants in myanmar which, as the name suggests, act both as the final consonant of one syllable and as the initial consonant of the following syllable. for example, the word which means “man” is syllabified as # . there are also some words which can be written both in standard syllable structure and in contracted form; for example, the word “daughter” is written as and . 4.4 myanmar canonical ordering it is possible for a myanmar syllable to have a number of sub-syllabic elements (also known as diacritics in [13]) surrounding a base consonant, independent vowel or digit. since all these diacritics are non-spacing, it is important that there is a consistent way of storing strings so that applications can work consistently. further, our proposed method is for orthographic syllabification and thus the canonical order of the script should be taken into account. the myanmar canonical order in unicode encoding is described in [13]. there is a general order of: initial consonant, medial consonant, vowel, final and tone. the following is an example of how myanmar strings are stored according to unicode canonical order, together with their syllable boundaries.
before syllabification: input string အ ပ န (to the bed room door), stored order sequence 1021 102d 1015 103a 1001 1014 103a 1038 1010 1036 1001 102b 1038 1000 102d 102f
after syllabification: output string အ ပ # န # # #, stored order sequence with logical syllable break points 1021 102d 1015 103a # 1001 1014 103a 1038 # 1010 1036 # 1001 102b 1038 # 1000 102d 102f
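the standard orthographic structure and the canonical-order example above can be exercised with a small python sketch. this is only a simplified regex approximation, not the authors' sfst transducer: the character classes below are our assumptions based on the unicode myanmar block, and only the standard structure x = c m* v* (ck)? d? (plus free-standing vowels and digits) is handled, so kinzi, consonant stacking and the other irregular forms are out of scope here.

import re

# approximate character classes for the sub-syllabic groups of section 4.2; the exact
# class boundaries are assumptions based on the unicode myanmar block, not the alphabet
# definitions used in the paper.
CONS = "[\u1000-\u1021]"              # consonants, including letter a (u+1021)
MED  = "[\u103B-\u103E]"              # medial consonants
VOW  = "[\u102B-\u1036]"              # dependent vowel signs (anusvara u+1036 folded in)
KILL = "\u103A"                       # asat, the vowel killer
TONE = "[\u1037\u1038]"               # tone marks
IND  = "[\u1023-\u102A\u104C-\u104F]" # free-standing vowels and abbreviated syllables (approximate)
DIG  = "[\u1040-\u1049]"              # digits

# standard syllable structure x = c m* v* (ck)? d?, plus the s = i | n cases.
SYLLABLE = re.compile(f"(?:{CONS}{MED}*{VOW}*(?:{CONS}{KILL})?{TONE}?|{IND}|{DIG})")

def syllabify(text: str) -> str:
    """greedy longest-match segmentation with '#' as the boundary marker; irregular
    forms that use the invisible virama (u+1039) are not handled by this sketch."""
    return "#".join(m.group(0) for m in SYLLABLE.finditer(text))

if __name__ == "__main__":
    # the stored-order sequence from the canonical ordering example in section 4.4
    codepoints = [0x1021, 0x102D, 0x1015, 0x103A, 0x1001, 0x1014, 0x103A, 0x1038,
                  0x1010, 0x1036, 0x1001, 0x102B, 0x1038, 0x1000, 0x102D, 0x102F]
    result = syllabify("".join(chr(cp) for cp in codepoints))
    print(" ".join(ch if ch == "#" else f"{ord(ch):04x}" for ch in result))
    # expected, matching the example output above:
    # 1021 102d 1015 103a # 1001 1014 103a 1038 # 1010 1036 # 1001 102b 1038 # 1000 102d 102f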
v. our approach automatic syllabification of words is challenging, not least because the syllable is not easy to define precisely. consequently, no accepted standard algorithm for automatic syllabification exists [10]. however, syllabification can be achieved by writing a declarative grammar of the possible locations of syllable boundaries in polysyllabic words [8]. on the other hand, finite state machines are widely used in the field of natural language processing. finite state transducers (fsts) have been used ubiquitously in the domain of phonology as well as in morphology for all sorts of mappings between string descriptions. again, finite state automata and transducers have been used in natural language processing of asian languages, for example in the morphological analysis of urdu [5]; likewise, a formal grammar is used to express a sinhala computational grammar [2]. in our approach, we first write the syllable structure in a regular grammar and then convert it into regular expressions, as an fst program is a set of regular expressions. we adopt a two-level morphology approach in which a surface string, for example “boys”, is analyzed and produced together with lexical information as “boy”. in our case, our syllabification transducer accepts an input (surface) string and outputs the string together with syllable boundary information. grammar rules for independent vowels, digits and abbreviated syllables can be described as
s → ဣ | ဤ | ဥ | ဦ | ဧ | ဩ | ဪ
s → ၀ | ၁ | ၂ | ၃ | ၄ | ၅ | ၆ | ၇ | ၈ | ၉
s → ၌ | ၍ | ၎ | ၏
and the syllable with five sub-syllabic elements can be written as
s →  x | ….. | အ x   (33 rules for all consonants)
x →  a | …… |  a   (11 rules for all medials)
x →  b | …… |  b   (12 rules for all vowels)
x →  t | ….. | အ  t   (33 rules for ending consonants: consonant + vowel killer)
x →  a  b
a →  t | ….. | အ t   (33 rules for all consonants a)
b →  t | ….. | အ t   (33 rules for all consonants b)
b →  |    (2 rules for tones b)
t → d
d →  |    (2 rules for tones d)
thus, we developed a transducer which accepts input unicode strings and outputs the strings with correct syllable boundaries. firstly, based on the regular grammar mentioned above, we write the regular expression to recognize myanmar syllables and construct the orthographic automaton for a myanmar syllable, amm, as amm = c opt(m) opt(v) opt(ck) opt(d) | p | i | n, where opt is just an abbreviation for “optional”. the above automaton accepts one syllable at a time and can check the combinations of sub-syllabic elements orthographically. then, we construct the syllabification automaton, denoted by asyl, which accepts a sequence of syllables and finds the syllable boundaries correctly. this is achieved by the expression asyl = amm ( # amm)*. in this expression, a syllable structure represented by amm is followed by zero or more occurrences of the boundary marker (#) and a syllable of the form amm. the automaton asyl accepts a sequence of syllables, but we need to transform it into a transducer which inserts a boundary marker `#` after each syllable but not after the last syllable. this is simply achieved by computing the identity transducer for amm and replacing `#` with the mapping ` :#`. now, the syllabification transducer becomes tmmsyl = id(amm) ( :#) id(amm). the syllabification transducer for words with standard syllable structure is shown in figure 1. for irregular words, the stored character sequence is special. it usually uses the invisible myanmar sign virama (u+1039) in the input sequence encoding, but it is required to output a different character or characters according to the type of irregular word in section 4.3; a simplified sketch of this rewriting for the consonant stacking case is given below.
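as a simplified illustration of the rewriting needed for one irregular form, consonant stacking, the following python sketch rewrites the stored sequence upper consonant + invisible virama (u+1039) + lower consonant so that the upper consonant carries a visible asat (u+103a) and a syllable boundary is opened before the lower consonant, as described in section ii. this is our reading of that description, not the authors' actual per-type transducer, and the example word is hypothetical.

import re

# stored form of a stacked pair: <upper consonant> u+1039 (invisible virama) <lower consonant>.
# simplified rewriting: give the upper consonant a visible asat (u+103a) so it becomes the
# final of the preceding syllable, and start a new syllable at the lower consonant.
# kinzi (where asat u+103a already precedes the virama) is deliberately left untouched here;
# it and the other irregular forms would need their own rules, as in the paper's per-type
# transducers.
STACKING_VIRAMA = re.compile(r"(?<!\u103A)\u1039")

def unstack(text: str) -> str:
    """rewrite consonant stacking into the standard syllable structure with '#' boundaries."""
    return STACKING_VIRAMA.sub("\u103A#", text)

if __name__ == "__main__":
    # hypothetical stored sequence: ka (u+1000) with kha (u+1001) stacked under it
    stored = "\u1000\u1039\u1001"
    print(" ".join("#" if ch == "#" else f"{ord(ch):04x}" for ch in unstack(stored)))
    # prints: 1000 103a # 1001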
though irregular words are written in traditional forms and are complicated, the result of their syllabification turns into the standard word syllable structure. we construct finite state transducers (fst) for each type of irregular word syllabification respectively, and we also show the finite state transducer for the standard syllable structure in figure 1. fig. 1. finite state transducer for myanmar syllabification. vi. experimental results and conclusion 6.1 experiments and results we use the stuttgart finite state transducer (sfst) tools for our syllabification transducers, although sfst is primarily concerned with morphology, for example smor (a large german morphological analyzer). the specification of our syllabification transducer is written in sfst-pl, the programming language of the sfst tools, which is a set of regular expressions. as a technical note, the default myanmar keyboard layout in ubuntu is based on the unicode 4.1 based “myanmar 1” font, and the code point values of some characters differ from the unicode 6.1 myanmar character code table. it is therefore required to customize the keyboard layout file, namely /usr/share/m17n/my-kdb.mim, in order to get correct syllabification results. for the test data set in our experiment, we use the myanmar orthography published by the myanmar language commission, which is a standardized system for writing myanmar words, including rules of spelling [14]. based on the regular grammar for the syllable structure of myanmar, we can identify correct syllable boundaries in the given texts. we tested all 11,732 distinct words contained in the myanmar orthography corpus, yielding 32,238 syllables covering standard and irregular words. details of the results based on the type of word are as follows.
table 4. details of experimental results (no. | type of words | no. of words | correctly syllabified words | % of correct syllabification for each word type)
1. standard words | 11,092 | 11,092 | 100%
2. irregular words | 640
2.1 consonant stacking | 253 | 245 | 96.83%
2.2 consonant repetition | 266 | 266 | 100%
2.3 kinzi | 71 | 71 | 100%
2.4 great sa | 26 | 26 | 100%
2.5 contraction | 3 | 3 | 100%
2.6 loan words | 21 | 21 | 100%
total | 11,732
by checking manually, we found that the syllabification of only 9 stacked words out of 11,732 words was erroneous. therefore, we obtained an overall accuracy of 99.93% covering both types of word, standard and irregular. error analysis shows that the errors are caused by words in which the free-standing vowel ဣ (u+1023), consonant stacking and other sub-syllabic elements are mixed, and our fst cannot find correct syllable boundaries for these words. the free-standing vowel ဣ (u+1023) is the combination of the consonant letter အ (u+1021) and the vowel  (u+102d). however, this kind of error can be handled by improving our transducer, and we will solve this issue in a future experiment. we also analyze the syllabification results of our approach and other existing approaches on irregular words.
table 5. comparison of syllabification results on irregular words (irregular words | corpus-based method [6] | rule-based method [23] | finite state transducer method)
1. consonant stacking | no | no | yes
2. consonant repetition | no | no | yes
3. kinzi | no | no | yes
4. great sa | no | no | yes
5. contraction | no | no | yes
6. loan words | yes, but with some failures | yes, for regular words | yes
further, we summarize and compare the accuracy of the developed methods on myanmar syllabification as follows.
table 6. summary of myanmar syllabification methods (method | source data | total no. of syllables | syllabification of standard words | syllabification of irregular words | accuracy (%))
corpus-based longest matching method [6] | 11 short novels | 70,384 | yes | please refer to table 5 | 99.96%
rule-based method [23] | myanmar orthography | 32,238 | yes | please refer to table 5 | 99.96%
finite state transducer method | myanmar orthography | 32,238 | yes | yes | 99.93%
according to the above table, our approach promises an accuracy of 99.93% covering both standard and irregular words. 6.2 conclusion syllabification is an important component of many speech and language processing systems, and this fst-based approach is expected to be a significant contribution to the field, especially to researchers working on various aspects of the myanmar language and other asian scripts. for automatic syllabification of alphabetic languages, spelling does not have a well-defined structure. thus, the input texts are transliterated into phonetic symbols and syllabification is done on these transliterated texts, not on the input texts directly. moreover, it is necessary to apply additional information using a dictionary or an annotated corpus, and even an fsa-based approach could be applied in a statistical way. the myanmar script is indic in origin, like lao and thai. myanmar syllable structure is well-defined and unambiguous, and thus it can be represented in a finite state model. in general, finite state language processing has become popular because of its simple, elegant and efficient computational power. from a computational point of view, finite state based methods achieve good performance and are suitable for applications more interested in speed and memory footprint. many works have been done on myanmar syllabification, but the fsa approach had not previously been applied to myanmar. this paper showed that the fsa approach gives a significant solution for syllabification of the myanmar language, as we achieve correct syllabification without applying step-by-step rules and without the need for a corpus. in other words, our proposed method is neither a heuristic approach nor an annotated corpus-based approach. further, it can handle both regular and irregular syllable structures of myanmar with acceptable performance, and we hope that it could be applicable to the automatic syllabification of other syllabic writing systems. in our further study, we will test our syllable segmentation fst on real online texts to evaluate the accuracy of the proposed approach, and the evaluation process will be automated as a future improvement. references 1. bilal arram. analysis of urdu syllabification using maximum onset principle. http://crulp.org/publication/crulp_report/cr02_18e.pdf 2. chamila, l., randil, p., dulip, l. herath and ruvan w. (2012): a computational grammar of sinhala. in the proceedings of the 13th international conference on computational linguistics and intelligent text processing (cicling), mumbai, india. 3. connie r. adsett. (2008): automatic syllabification in european languages: a comparison of data-driven methods, master thesis dissertation, dalhousie university, halifax, nova scotia. 4. gosse bouma, ben hermans. syllabification of middle dutch. http://alfclul.clul.ul.pt/crpc/acrh2/acrh-2_papers/bouma-hermans.pdf 5. hassain, s. (2004): finite state morphological analyzer for urdu. master thesis, national university of computer and applied science, lahore, pakistan. 6. hla hla htay, murphy kavi narayana. (2008). myanmar word segmentation using syllable level longest matching.
in proceedings of the 6 th workshop on asian language resources, january 11-12, hyderabad, india. 7. john okell. (1994): burmese, an introduction to the script. northern illinois university press. 8. john okell. (2002): burmese by ear, audio-forum, sussex publications limited, microworld house, 4 foscote mews, london w9 2hh. downloaded at http://www.soas.ac.uk/bbe/ (2013, february 23) 9. kiraz, g.a., m¨obius, b. (1998): multilingual syllabification using weighted finite-state transducers. proceedings of the third international workshop on speech synthesis. jenolan caves, australia, pp. 71–76. 10. le hong p., nguyen thi m.h., azim r., ho tuong v. (2008): a hybrid approach to word segmentation of vietnamese texts. in proceeding of the 2 nd international conference on language, automata theory and application, lata 2008, 240-249. 11. marchand, y., connie a., damper r. (2007) : evaluation of automatic syllabication algorithms for english. in proceedings of the 6th international speech communication association (isca) workshop on speech synthesis. 12. maimaitimin saimaiti , zhiwei feng, a syllabification algorithm and syllablestatistics of written uyghur, ucrel.lancs.ac.uk/publications/cl2007/paper/153_paper.pdf 13. martin hosken. (2012): representing myanmar in unicode, details and example. available at http://www.unicode.org/notes/tn11/ (2013, february 23) 14. myanmar language commission. (2006).: myanmar orthography. third edition, university press, yangon, myanmar. 15. peter t. daniels, william b. (1996): the world writing systems, oxford university press. 16. phonpasit and et.al. syllabification of lao script for line breaking. http://www.panl10n.net/english/outputs/working%20papers/lao s/microsoft%20word%20-%206_e_n_296.pdf 17. ruvan w., asanka w., and kumudu g. (2005): a rule based syllabification algorithm for sinhala, proceedings of 2nd international joint conference on natural language processing (ijcnlp-05), p. 438-449, jeju island, korea. 18. schmid, h. (2005): a programming language for finite-state transducers. in yli-jyrä, a., karttunen, l., and karhumäki, j., editors, finite-state methods and natural language processing fsmnlp 2005. 19. tin htay hlaing. (2012): manually constructed context-free grammar for myanmar syllable structure. in the proceeding of the european chapter of the association of the computational linguistics(eacl), student research workshop. 20. win, kyawt yin. (2011): myanmar text-to-speech system with rule-based tone analysis. phd dissertation, university of ryukyus, okinawa, japan. 21. y.a. el-imam and z.m. don. (2000) text-to-speech conversion of standard malay.international journal of speech technology 3, kluwer academic publishers, pp. 129-146. 22. yoshiki mikami.: world of scripts in asia available at http://gii2.nagaokaut.ac.jp/ws/indic.html (2013, january 23) 23. zin maung maung, mikami yoshiki. (2008): rule-based syllable segmentation of myanmar texts. in proceedings of the 6 th workshop on asian language resources, january 11-12, hyderabad, india. 
scipro matching: ict support to start a quality thesis henrik hansson, jan moberg, ranil peiris department of computer and systems sciences, stockholm university, sweden abstract — this article focuses on how to empower the initiation stage of the thesis supervision process with information and communication technology. starting a large number of theses with available resources based on creative ideas is a challenging task. another challenge is to connect students’ theses with industry's interests. the scipro ict support system for thesis supervision was developed over a five-year period. the most important task in the thesis initiation stage is matching students, supervisors, and ideas. business partners and administrative staff are other stakeholders who should interact in this process. although the choice of an idea is the responsibility of students, and seems simple, it in fact requires a series of academic and administrative support processes. there is a lack of it systems that specifically address thesis supervision, especially the initiation stage. the scipro system was constructed to bridge this gap. the initiation stage is the foundation of a quality thesis and strongly affects its quality. the research approach was based on a design science method. prototyping, testing, demonstrations, and user evaluations were conducted throughout. data collection methods for user evaluations included interviews, observations, focus-group discussions, and log data. this paper presents the scipro it system, which was developed to support the start of quality theses. this system adds value, saves time and increases the quality of the thesis initiation stage. the process implemented rewards supervisors and students by providing a high degree of freedom, control, and selection of relevant topics. it enables both automatic processes for previously time-consuming work and qualified manual operations which can be controlled by administrators according to their needs. scipro can also be used to improve industry-university collaboration on thesis production. index terms — thesis initiation, supervision, matching, business, university, ideas, innovative, management, it i. introduction and aim from a small seed, a mighty trunk may grow (aeschylus, 526–456 bc). in research, ideas are seeds that can produce fruitful trees, and without good ideas all scientific work is fruitless. students starting to write their theses (bachelor’s, master’s, or phd) face the same challenge as other researchers: how can i find a good topic? where shall i start? the idea needs to be relevant, interesting, and manageable within the available time frame.
the initial part of a research project, creative idea generation, is not subject to any methodological procedure. ideas are created in many diverse ways. personality, experience, an open mindset, and curiosity play a role. structure and stimuli also help the process. there needs to be a balance between freedom and structure in order to facilitate good ideas. a student’s thesis is a sustainable way of promoting industry-university collaboration through real-world projects. it also offers a possibility of selecting real-world projects and real project ideas from industry, which is beneficial for both university and industrial stakeholders. a successful thesis project contains three steps: (1) project initiation, (2) supervision, and (3) utilization. in the initiation step, one or more students should select a research problem and then one or more supervisors should be assigned to the project. then the supervision process starts. supervisors, students, and other interested parties (reviewers and peers) communicate with each other until the completion of the project. finally, the thesis findings should be published for the benefit of society (utilization). there are a number of issues and opportunities in the project supervision process that should be addressed to enhance the quality of this process. this article focuses on the first step, project initiation. if the initiation is not carefully elaborated, it will adversely affect project completion. there are several problems and opportunities which are usually not considered in the project initiation phase. for a successful thesis, three components should be properly matched: the project idea, one or more students, and one or more supervisors. there is a risk otherwise that projects will not really benefit society. students may select any kind of research project that satisfies the basic course requirements. supervisors tend to accept students’ projects if they fit into their own knowledge area, and they do not have enough time to guide them in choosing a useful project. additionally, students and supervisors lack the resources and time to develop a dynamic information link to identify current industry issues. the duty of university administration is limited to registration and course administration. although industry and society can provide research questions constituting useful points of departure for thesis projects, there is no proper link between industry and its university partners. hence, the majority of project ideas will stay on desks as memos without ever reaching the appropriate community. similarly, valuable research findings remain in academic reports because of the lack of proper links between industry and university. aim: to describe and analyse the information and communication system support for the thesis initiation stage. henrik hansson (associate professor) is with the department of computer and systems sciences, stockholm university – sweden and coordinator of ict4d research and senior researcher in technology enhanced learning (tel). (e-mail: henrik.hansson@dsv.su.se) jan moberg (it manager) is with the department of computer and systems sciences stockholm university – sweden and it director.
(email: jan.moberg@dsv.su.se) colombage ranil peiris (phd student) is with the department of computer and systems sciences, colombo sri lanka and stockholm university, sweden, and senior lecturer in university of sri jayewardenepura, sri lanka (e-mail: ranil@dsv.su.se) mailto:ranil@dsv.su.se 2 ii. related work initiating a thesis is a time-consuming and complex process. students have to access many types of information sources and consider a number of influencing factors [1] such as the tasks handled by an administrator. according to isaak and hubert [2], ‘good’ thesis topic selection is a critical thinking and filtering process that should be done by the student with advice from a supervisor. the chosen topic will affect the success of the project, so its selection is very important [3][4] and should be managed in the same way as other steps in the process. almost all theses management information systems ignore this aspect or fail to provide enough support for it. there are only a few research studies of online thesis supervision systems, and a brief overview is presented in the next section. richard [5] described a research supervision system project implemented at makerere university in uganda. although the system diagram showed that supervisors' specializations were listed and students could submit concept papers, there was no proper matching system or ‘idea bank’ concept (see section iv) to support the project initiation step. according to the author, the system was an intranet system and was unable to satisfy essential requirements. colbran [6] implemented a supervision support system for phd supervision using collaborative supervision of doctoral theses. a supervision cell (website for supervision support) was implemented by means of an action research approach. the website had seven main elements; (1) project management, (2) reflective journal, (3) exercises, (4) discussion forum, (5) private correspondence files, (6) resource websites, and (7) course material database. in this project, ict was not used for the idea-matching process. the department of management at the durban university of technology (dut), south africa [7] implemented a web-based (webct) system for postgraduate research management as a blended approach in 2005 and 2006. the findings showed that it improved the supervision process, reduced the administrative workload of the supervisor, and created a dynamic record of the supervision process. the results to date imply that traditional supervision practice needs to be revisited and modified to include digital procedures. the research study was, however, focused mainly on communication and data recorded with ict and ignored the initial stage of the thesis process. another research study by mackeogh [8] conducted at dublin city university found that it was possible to use learning methodologies to provide a supportive environment for students embarking on undergraduate research. moodle was used as the technical system and a conference module for communication between students and supervisors. the paper outlined the approach to research supervision adopted in a distance education psychology module, which combined online supervision, face-to-face meetings, and peer supervision. this research did not consider the matching function and its complexity. additionally, learning management systems (lms) do not specifically support the thesis supervision process, especially the initial matching part of the process. 
although there are some well-developed lms in the e-learning industry, they do not manage thesis supervision as a special module, although there are a few tools that can be used with limited functionality [8]. the support offered by standard lms for the thesis supervision process is not adequately developed or specific enough. foster and gibbons [3] highlighted the importance of selecting a good title as a way to increase research interest. they noted that poor choice of a topic and problems with developing a topic were obstacles to production of a good research paper. lei argued that the selection of a thesis topic was a time-consuming and complex process, and stated that ‘students have to access many types of information sources and have to consider a number of influencing factors’ [1]. hansson and colleagues [9] originated discussion on the use of ict for thesis supervision. the important point of this study was that they identified four stakeholders and highlighted the importance of collaboration among the stakeholders for a quality thesis. aghaee and others [10] studied the issues that emerged during the thesis process and noted that one of the main issues was project initiation. with regard to the importance of topic selection, harrison and whalley [4] stated the following in the light of their survey results on undergraduate research: ‘from both the staff and student perspective, deciding on the right topic of study is fundamental. students recognize that the topic should be something that really interests them and will motivate them for sustained study. students valued the freedom to choose their topic of study yet also identified that a failure to get it right threatened further study’. the council of graduate schools (1990), cited in donald and colleagues [11, p. 74], suggested that there were two major factors in the supervision of graduate research students. the first and more important had to do with creativity and involved the ability to select problems, to stimulate and enthuse students, and to provide a steady stream of ideas. a. what is supervision? supervision is a subtype of pedagogy, and it basically focuses on one or a few students per supervisor. supervision refers here to processes that academics use to support students’ learning as defined by maxwell and smyth [12]. generally, ‘advisor’ is the term used in north america and ‘supervisor’ in those countries with a british higher education tradition. henceforth, the terms supervisor/supervising are used. connell [13] suggested that supervision was one of the most complex and problematic pedagogical methods and led to a high dropout rate [14]. she observed that both students and supervisors failed to identify supervision as a method of teaching. she argued that it was genuinely a complex teaching task and, like other forms, raised questions about curriculum, method, teacher/student interaction, and educational environment. it also required a substantial commitment of time and energy. 3 iii. methodology sherman and webb [14] observed that three concepts were at the core of qualitative methods: holism, context, and validity. in fact, the need to achieve and increase validity was the reason for selecting a qualitative method when researching certain topics where it was not possible to gain valid results by using a quantitative approach. figure 1 below outlines the differences between types of research questions and the connection to research approaches. 
if the issue is how people ‘think’, observations of behaviour are not enough. one needs to talk (interview, discuss) in order to understand attitudes. what people say they want is not always what they actually do, however, so actual observation of behaviour is also needed. fig. 1 overview of research approaches and connection to types of research questions (adapted from rohrer, 2008) [16]. the scipro system, which was developed at stockholm university, was selected as a case study. mccaslin and scott defined case study research as ‘an in-depth study of a bounded system with the focus being either the case or an issue illustrated by the case(s)’ [15]. creswell and colleagues [16] defined it as ‘a qualitative approach in which the investigator explores a bounded system (a case) or multiple bounded systems (cases) over time, through detailed, in-depth data collection involving multiple sources of information (e.g., observations, interviews, audiovisual material, and documents and reports), and reports a case description and case-based themes’. furthermore, they discussed three variations in terms of intent: the single instrumental case study, the collective or multiple case studies, and the intrinsic case study. a single instrumental case study was selected. in such a case study [17], the researcher focuses on an issue or concern, and then selects one bounded case to illustrate the issue. research can be very generally defined as an activity that contributes to the understanding of a phenomenon [18]. niiniluoto emphasized that sciences that explain and interpret the world were largely studied by philosophers [19]. he suggested paying attention to sciences which change the world and that design science was a methodology that was used to understand complex phenomena with the aid of an artefact. hevner and chatterjee [20, p. 5]] defined design science research (dsr) as follows: ‘design science research is a research paradigm in which a designer answers questions relevant to human problems via the creation of innovative artifacts, thereby contributing new knowledge to the body of scientific evidence. the designed artifacts are both useful and fundamental in understanding that problem’. this paper will answer questions related to how the thesis process can be improved. the theme of this research is how ict can be used for supporting the initiation stage. the scipro case was combined with design science to address the research question. peffers and colleagues [21] highlighted six steps in design science research methodology with reference to seven papers published in the field. this paper follows these steps with the case study approach to discuss the importance of an ict support system for the thesis initiation stage. • activity 1: problem identification and motivation • activity 2: defining the objectives for a solution • activity 3: design and development • activity 4: demonstration • activity 5: evaluation • activity 6. communication during a five-year period, the authors had many interactions with staff and students both informally (daily conversations, drop-in, problem-solving, etc.) and formally (development meetings, trials, workshops, evaluations, specific interviews, questionnaires, interaction with students as supervisors, etc.). this interaction accumulated information about stakeholder perceptions, needs, attitudes, and problems. based on these interactions, the system was developed and redesigned several times. 
there was not always consensus between students, supervisors, and administrators about how the process should be organized. furthermore, among these stakeholders there were several opposing views and interests. the authors needed to accommodate the expressed needs so that most people were satisfied, and selected the procedures believed to provide the best quality and efficiency. regarding context, the system was designed specifically for the department of computer and systems sciences at stockholm university, sweden. this meant that it was adapted to the swedish model of higher education (requirements according to the higher education act and other laws and regulations; see the swedish national agency for higher education) [22]. the authors adapted the system to:
• the student profile at the department
• the staff and organization at the department
• the it infrastructure already available at the department and the it systems provided by stockholm university at the central level, such as digital access to library services, etc.
in fact, the matching system was accessed through single sign-on and integrated with more than 30 different it systems. multiple data collection methods were used: interviews, observations, focus group discussions, workshops (with demonstrations, prototypes, design mock-ups, etc.) and log data. additionally, emails, drop-in sessions, and face-to-face support sessions raised critical issues and identified user experiences for consideration.
iv. results and discussion
as a result of continuous development, scipro version 3 provides wide-ranging facilities for the project initiation stage. hansson and colleagues [23] discussed the evolution of the matchmaking facility in scipro (versions 1 and 2). as illustrated in figure 2, this stage consisted of four interrelated systems: (1) register (daisy), (2) idea bank, (3) match, and (4) the supervision process. in version 1, the register system and the supervision process system were integrated and the others were separate systems. in version 2 all the systems were integrated except the idea bank. the preceding supervision support system was developed in parallel with another system containing resources with information, instructions, learning material, templates, grading criteria, and faqs for both thesis writing students and supervisors. a unification process and considerable system maturity took place during a five-year period, which significantly increased the efficiency, simplicity, transparency, and quality of the initial phase of the thesis work.
a. registry
it was very important to match the new system with the existing information system for several reasons. technically it was easy to implement and reuse data and resources. from a user's perspective, it was essential to get his/her support and to reduce resistance to the new system. scipro was designed to use the general information system for student and supervisor registration. this integration was also essential to reduce the workload of the thesis administration staff. figure 2 shows the registration system within scipro. registered students in daisy could access the system for thesis supervision. the process started when an administrator activated an application period in the system (see figure 2: number 1).
b. idea bank
the idea bank subsystem facilitated the storage of ideas from potential idea sources. figure 3 illustrates the concept with several idea sources.
the institution could decide which sources to link to the idea bank (see figure 2: number 2, where students uploaded their thesis project ideas). in the current practice, idea creation was limited to specific admission times in the academic calendar. a flexible thesis start could be introduced by opening admissions for the whole year; this would eliminate the waiting time to register a new thesis and help bring more industry projects into the system. the management of staff time would, however, need a new business model to incorporate such flexibility. table 1, column 3 shows the registered ideas for the last semesters.
fig. 2 matching procedures: functions 1-12 (involving students, supervisors, administrators, and reviewers across the register, idea bank, match, and supervision system) are explained in the article text.
students sent their ideas to the matching system in a short overview format. the template suggested by watson [24], referred to as a 'watson's box', was used (see table 2). this format provided a holistic picture of the general idea, methods, and practical aspects of a thesis project. students also labelled their project idea with keywords, a research area, the languages they could receive supervision in, and a preliminary title. students could choose to label their project idea with a confirmed supervisor (a supervisor who had agreed to supervise the project), a preferred supervisor (when a student wanted a certain supervisor but had not yet agreed it with the supervisor), or an external supervisor (from an external organization). these steps provided enough information at this stage and were used for the allocation of a project idea to a suitable supervisor who was active within a particular research area. students' project ideas were matched in the system with available supervisors, who had indicated their research activities with keywords in the match system (see figure 2: number 3 for supervisors' ideas for thesis topics). supervisors were of course more interested in supervising students within their research field than otherwise. students, however, did not know what the supervisors were researching or their particular foci and projects. from a quality perspective, it was also very important to connect research with education, and the thesis work was very suitable for this purpose. the student received a meaningful context for the thesis work and the supervisor received an additional collaborator for current research activities. to facilitate this effect, the authors modified the system to accommodate supervisors' ideas for theses. initially, as seen in table 1, the ideas from supervisors were few, but as soon as the users realized the potential and a new policy was introduced stating that 'each supervisor needs to create at least three supervisor ideas (thesis topics) for the idea bank', the idea bank began to be beneficial. since not all supervisors were on duty every semester and they had different targets (the number of theses a supervisor could supervise in an application period), the policy was changed to 'at least the same amount of supervisor ideas as the supervisor has targets in the current application period' (see figure 4). two hundred and twenty-seven supervisor ideas were available for students, and a total of 457 supervisor ideas were created between september 2012 and december 2013; 250 of the ideas were created for bachelor's and 207 for master's degrees. the aim was to create a pool of ideas not only from supervisors but also from other sources (see figure 3).
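to make the shape of an idea bank entry more concrete, the sketch below summarizes the attributes described above as a plain python record. the class and field names are illustrative assumptions, not scipro's internal data model, which the article does not disclose.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class SupervisorLabel(Enum):
    CONFIRMED = "confirmed"   # supervisor has already agreed to supervise
    PREFERRED = "preferred"   # student wants this supervisor, not yet agreed
    EXTERNAL = "external"     # supervisor from an external organization

@dataclass
class ProjectIdea:
    """a student's project idea, in the 'watson's box' overview format."""
    preliminary_title: str
    what: str                  # key research questions
    why: str                   # why the study is of interest / its contribution
    how_conceptually: str      # models, concepts, theories to draw upon
    how_practically: str       # research methods, techniques, access to sources
    keywords: List[str] = field(default_factory=list)
    research_area: str = ""
    supervision_languages: List[str] = field(default_factory=list)
    supervisor_label: Optional[SupervisorLabel] = None
    supervisor_name: Optional[str] = None

# usage: a (hypothetical) idea labelled with a preferred supervisor
idea = ProjectIdea(
    preliminary_title="peer feedback in online thesis supervision",
    what="how does structured peer feedback affect draft quality?",
    why="guidance for course designers",
    how_conceptually="feedback and self-regulation theory",
    how_practically="survey and log data analysis",
    keywords=["peer feedback", "e-learning"],
    research_area="technology enhanced learning",
    supervision_languages=["english", "swedish"],
    supervisor_label=SupervisorLabel.PREFERRED,
)
```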
table 1. number of supervisors, students and project ideas matched regarding thesis work 2010-14
2010-11: 81 supervisors; 311 (master) and 390 (bachelor) students; ideas in idea bank: autumn 2010: 0, spring 2011: 5, autumn 2011: 200; matched thesis project ideas*: 300 (master), 220 (bachelor)
2011-12: 46 supervisors; 175 (bachelor) students; ideas in idea bank: spring 2012: 224; matched thesis project ideas*: 95 (bachelor)
2012-13: 77 supervisors; 113 (master) and 405 (bachelor) students; ideas in idea bank: spring 2013: 221; matched thesis project ideas*: 113 (master), 217 (bachelor)
2013-14: 94 supervisors; 90 (master) and 406 (bachelor) students; ideas in idea bank: 290; matched thesis project ideas*: 90 (master), 215 (bachelor)
fig. 3 idea bank
fig. 4 supervisor must prepare ideas
an important new aspect was that students were informed one semester before the thesis course started that they needed to come up with a project idea in the form of a 'watson's box' before the deadline in order to find a supervisor (see figure 5). since time was needed to generate good ideas, this 'thinking time' significantly improved the suggested thesis topics. it also saved time when the thesis course actually started, because the students already had an idea and a prototype plan as well as an informed and prepared supervisor. also, students might have wanted to investigate opportunities to connect their thesis work with a business need, and this connection needed to be developed in advance. the actual project plan was developed in detail together with the supervisor when the course started. in some cases, the specific project plan was an elaboration of the project idea, and in other cases it was based on the supervisor's advice and knowledge, which could result in a radically different plan.
c. the matching system
after collecting ideas, the next step was matching supervisors, ideas, and students to form projects. the idea bank consisted of ideas, students, and supervisors, and these elements had to be matched to start a good thesis. at the bachelor's level an additional step was required for pairing students for a group thesis: students wrote together in pairs, but some students did not know other students willing to co-work on a thesis with a topic in which they were jointly interested. a matchmaking forum was needed, and it needed to be as automated as possible, because supervisors and administrators did not have the time or knowledge to match students with similar interests. see figure 2: number 4, where students used the project partner tool to find a partner for projects. the authors managed to find a quick and dirty solution by reusing a student forum built for social networking; it had a different layout but did the work (see figure 6). there were 112 messages in the project partner portal from september 2011 to january 2014.
fig. 5 screen shot ('watson's box'): structure for project idea
fig. 6 screen shot: 'find partner'
exemptions were needed for other reasons than not finding a partner with whom to write a joint thesis. for instance, circumstances might dictate writing alone, or a student might need an exemption because of a different educational background (see figure 2: number 5, exemptions). in version 3, the matching between students and supervisors was more direct and used several methods. in the first and second methods, students and supervisors selected ideas by themselves, and hence these could be considered direct selection methods. the other two methods were indirect. all methods are explained in table 3 and immediately below.
table 2. structure for a student's project idea (watson's box)
what? what puzzles and intrigues me? what do i want to know more about or understand better? what are my key research questions?
why? why is this of enough interest for the library shelves or my organization? is it a guide to practitioners or policymakers? is it a contribution to knowledge?
how conceptually? what models, concepts, and theories can i draw upon? how can i develop my own research questions and create a conceptual framework to guide my investigation?
how practically? what research methods and techniques shall i use to apply my conceptual framework (to both gather and analyse evidence)? how do i gain and maintain access to information sources?
1) student selected
students could be matched directly with a supervisor if they selected a supervisor's idea in the idea bank (see figure 2: number 6). students could select ideas from a list uploaded by supervisors.
2) supervisor selected
supervisors could select students directly based on their project ideas or by prior agreement (see figure 2: number 7). supervisors were able to select ideas from a list uploaded by students.
3) auto match
the system matched students and supervisors automatically. the automatic match considered more rules than the ones visible to administrators, and these could only be changed by a system developer. for example, master's theses were matched before bachelor's, and supervisors who had more available supervisions were matched first if the points were equal. auto match was triggered by administrators (figure 2: number 8) and generated a match that could be inspected manually before saving. the algorithm could be changed when necessary for a more appropriate matching result.
4) administrator selected
unit administrators allocated thesis projects to supervisors manually by allocating numbers and persons in the system (figure 2: number 9 illustrates this method), and only when direct contact between students and supervisors was insufficient. administrator matching in the system could be done quickly and easily when the administrator had good knowledge about the supervisors.
direct matching further facilitated the introduction of an instant notification system. notifications told supervisors when students added ideas that matched a supervisor's area of interest. also, when supervisors added ideas, the system generated messages to students who had similar research interests. this was an additional option for making ideas noticeable quickly and increasing the number of direct matches in the system. all four methods had inherent pros and cons, as summarized in table 3. it was assumed that the first and second methods contributed more than the other two methods from a motivational perspective. arguably, a supervisor or student selected an idea because of his or her motivation. the publisher of that idea was already motivated, and when he or she was matched with a motivated partner, it constituted the highest level of motivation that could be expected from the perspective of students and supervisors. there was no external force or intervention for selecting and matching ideas, and scipro provided the necessary infrastructure for direct selection. figure 7 illustrates this selection. although auto matching was fast and impersonal, the main problem was inappropriate keywords for ideas and research areas. students could have selected the wrong research area and/or keyword, making it hard to find good matching criteria that reflected their research interest.
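the article notes that the auto match applies more rules than those visible to administrators, and the exact rule set is not published. the following is only a minimal sketch, in python, of the kind of heuristic the text describes (keyword points, master's theses handled before bachelor's, ties broken by remaining supervision capacity); all names and weights are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

@dataclass
class Supervisor:
    name: str
    keywords: Set[str]
    research_areas: Set[str]
    target: int          # theses the supervisor can take in this application period
    allocated: int = 0

    @property
    def available(self) -> int:
        return self.target - self.allocated

@dataclass
class StudentIdea:
    title: str
    level: str           # "master" or "bachelor"
    keywords: Set[str]
    research_area: str

def points(idea: StudentIdea, sup: Supervisor) -> int:
    """crude matching points: keyword overlap plus a bonus for a shared research area."""
    score = len(idea.keywords & sup.keywords)
    if idea.research_area in sup.research_areas:
        score += 2       # assumed weighting
    return score

def auto_match(ideas: List[StudentIdea], sups: List[Supervisor]) -> Dict[str, Optional[str]]:
    """suggest an allocation for manual inspection before saving."""
    suggestion: Dict[str, Optional[str]] = {}
    for idea in sorted(ideas, key=lambda i: i.level != "master"):   # master's first
        candidates = [s for s in sups if s.available > 0]
        best = max(candidates, key=lambda s: (points(idea, s), s.available), default=None)
        if best is None or points(idea, best) == 0:
            suggestion[idea.title] = None      # left for an administrator to assign
        else:
            best.allocated += 1
            suggestion[idea.title] = best.name
    return suggestion
```

as in scipro, the result would only be a suggestion: an administrator still inspects the proposed matches before saving them.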
the administrator assign method was the last resort when the other methods were not applicable. figure 8 depicts the manual matching method. the issues with this method were middleman involvement, high workload, and also the need for an administrator with good knowledge of all supervisors and keywords. originally the supervisors had to accept or reject the suggested ideas. since many supervisors were slow to accept the project ideas, the settings were changed so that the supervisor received the project ideas directly, without the chance to accept or reject.
fig. 7 screen shot: 'select idea'
additionally, in version 3, the idea bank was more integrated, so that matching occurred immediately when students selected a supervisor's idea. at the same time, this used up part of the supervisor's supervision allocation, which was automatically registered in the system.
table 3. student-supervisor matching procedures in scipro and their pros and cons
1) student selects. pros: supervisor motivated (their idea); no middle man. cons: not enough supervisor ideas for all students; sometimes students cannot find an interesting idea.
2) supervisor selects. pros: supervisors motivated (they choose the idea); no middle man. cons: supervisors do not think they have time to choose student ideas.
3) auto match. pros: fast, impersonal, can be done by administrative staff with no special skills. cons: hard to find good matching criteria; students often choose the wrong research area and/or keyword.
4) administrator assigns. pros: handles all kinds of problems, such as too few supervisors in a research area and special prerequisites. cons: middle man involvement; high workload; needs an administrator with a good knowledge of all supervisors.
fig. 8 screen shot: 'manual match'
as shown in table 1, and in the match status view in figure 9, the system matched a considerable number of students, project ideas, and supervisors. with a manual mode of operations it would take several months to administer the documents and communicate with all actors. furthermore, the system was up to date in real time and transparent regarding who was allocated to supervise whom and what topics were suggested.
fig. 9 screen shot: 'match status'
the move from a student 'door knocking' mode (asking supervisors if they could supervise), which was unstructured, time-consuming, and frustrating for both students and supervisors, to a more flexible and informative it-support system was reflected in the number of ideas matched in the system, although the process still needed further attention. a number of activities, processes, and functions were identified based on needs expressed by students, supervisors, and administrators. the needs were prioritized in the following order: students' needs first, supervisors' needs second, and administrators' needs third. the department benefited from this system since it was easily accessible and maintainable.
fig. 10 screen shot: 'matched project details'
when the supervisor agreed to supervise a student, he/she could disclose his/her identity to the student before the course started. otherwise, it would be released in the system automatically and become visible to the student when the course started. it was a problem in versions 1 and 2 that a few supervisors did not contact their students in time, so the students did not know who their supervisors were. this caused a lot of student frustration and some internal staff conflicts: how long could a supervisor wait before responding? who was responsible for correcting it?
a re-allocation of supervisors was not possible because of a resource shortage regarding available supervisors and management time. also, a new feature in version 3 introduced a built-in template for a 'first meeting' at course start, in which the supervisor filled in the date and location of the meeting, since in some cases the first meeting had been significantly delayed (figure 2: number 10). now first meetings that had not yet been held were visible in the system, and actions could be taken in time to prevent delays (see figure 10). in version 3, the integration was more complete, including automatic registration of new projects in the administrative system (daisy), which in earlier versions had to be done separately by supervisors. this registration was necessary because otherwise: (1) the project would not be officially started, (2) grading could not take place, and (3) students could not get access to the subsequent support functions in scipro. the actual project plan was written with the help and advice of the allocated supervisor. students wrote a research plan between one and a half and three pages long (figure 2: number 11) with the following structure: (1) preliminary title, (2) background, (3) aim and research questions/problem statement, (4) methods and material, (5) expected results, (6) significance, (7) time and activity plan, and (8) references. once the supervisor approved this plan (figure 2: number 12), the student proceeded to the next step (a description of which is not within the scope of this article).
d. the supervision support system
once a new project was registered, both supervisors and students could start communicating in the thesis support system. the it support developed to facilitate the supervision process is not within the scope of this article; see hansson [25], hansson and moberg [26], hallberg and colleagues [27], larsson and hansson [28], and hansson and colleagues [9]. evaluations revealed that better communication between supervisors, reviewers, and students was needed. the matching of reviewers (senior academics acting as mentors and evaluators) has recently been implemented, but only as manual matching by an administrator so far. in addition to the features discussed above, scipro provides a unique space for organizing useful resources for starting a quality thesis. the information and resources section has a collection of information and tools that can be used by students to meet their requirements. figure 11 shows a summary of the resources available in the current system.
v. conclusions
in order to produce a good thesis, a good start is very important. student-idea-supervisor matching is the essence of starting a good thesis, and the administration of this process is very complex. although information technology is widely used in higher education, there is a lack of it support for thesis supervision. scipro is suggested as a model for discussion, and future developers should be able to enhance the features elaborated in this discussion. an internet-based idea bank, as a repository for ideas and as a management system to facilitate matchmaking between students and supervisors, is an important initial part in the process of creating quality theses. the idea bank can be linked with external sources to enhance the richness of ideas. matching is a complex process, and information technology can be used to manage this process and reduce the burden of administration. scipro is a working model for matchmaking and can be developed further to meet future needs.
in addition to the technical aspects, human behaviour is very important for the smooth functioning of the system. hence it is essential to consider the stakeholders' requirements and fine-tune the system to match the requirements as much as possible. letting students and supervisors select each other directly based on mutual interests (ideas), without a middle man/administrator, is the best procedure from a motivational perspective. auto match is fast but can be problematic because of inappropriate keyword selection by the students. although administrative selection is comparatively problematic, it is the best solution when other methods are not applicable. policy on procedures is also an important factor; it includes deciding on rules and regulations, roles and responsibilities, as well as incentives and the consequences of not complying. scipro creates an ict-enabled supporting environment for four different matching methods between students and supervisors, depending on policy. the analysis shows that scipro is an ict-enabled, flexible structure which supports the starting stage of many theses with quality and efficiency.
vi. acknowledgements
the authors would like to thank khalid bencherifa for the artistic design of the figures 'idea bank' and 'research approaches' and william jobe for language corrections. they also thank staff and students at the department of computer and systems sciences for their valuable feedback and participation.
fig. 11 screen shot: 'resources'
vii. references
[1] s. a. lei, "strategies for finding and selecting an ideal thesis or dissertation topic: a review of literature," coll. stud. j., vol. 43, pp. 1324–1332, 2009.
[2] d. j. isaak and w. a. hubert, "catalyzing the transition from student to scientist: a model for graduate research training," bioscience, vol. 49, no. 4, p. 321, apr. 1999.
[3] n. f. foster and s. l. gibbons, studying students: the undergraduate research project at the university of rochester. association of college & research libraries, 2007, p. 99.
[4] m. e. harrison and w. b. whalley, "undertaking a dissertation from start to finish: the process and product," j. geogr. high. educ., vol. 32, no. 3, pp. 401–418, sep. 2008.
[5] m. richard, "a research project supervision system, makerere university, uganda," 2008. [online]. available: http://dspace.mak.ac.ug/handle/123456789/627. [accessed: 18-jul-2012].
[6] s. colbran, "collaborative supervision of legal doctoral theses through e-learning," univ. new engl. law j., vol. 1, no. 1, 2004.
[7] m. de beer and r. b. mason, "using a blended approach to facilitate postgraduate supervision," innovations in education and teaching international, vol. 46, pp. 213–226, 2009.
[8] k. mackeogh, "supervising undergraduate research using online and peer supervision," 2006. [online]. available: http://doras.dcu.ie/82/. [accessed: 11-jul-2012].
[9] h. hansson, j. collin, k. larsson, and g. wettergren, "sci-pro improving universities core activity with ict supporting the scientific thesis writing process," in sixth eden research workshop, budapest, 2010.
[10] n. aghaee, u. larsson, and h. hansson, "improving the thesis process," su.diva-portal.org, 2008. [online]. available: http://iris.im.uu.se/wpuploads/2012/08/iris2012_submission_66.pdf. [accessed: 30-jan-2014].
[11] j. g. donald, a. saroyan, and d. b. denison, "graduate student supervision policies and procedures: a case study of issues and factors affecting graduate study," can. j. high. educ., vol. 25, no. 3, pp. 71–92, 1995.
[12] t. w. maxwell and r. smyth, "research supervision: the research management matrix," high. educ., vol. 59, no. 4, pp. 407–422, sep. 2009.
[13] r. w. connell, "how to supervise a phd," vol. 28, no. 2, pp. 38–42, 1985.
[14] r. sherman and r. webb, qualitative research in education: focus and methods. london; new york: falmer press, 1988, p. 217.
[15] m. mccaslin and k. scott, "the five-question method for framing a qualitative research study," qual. rep., vol. 8, no. 3, pp. 447–461, 2003.
[16] j. w. creswell, w. e. hanson, v. l. plano clark, and a. morales, "qualitative research designs: selection and implementation," couns. psychol., vol. 35, no. 2, pp. 236–264, mar. 2007.
[17] r. stake, the art of case study research, 1995.
[18] t. kuhn, the structure of scientific revolutions, 3rd ed. chicago, il, us: university of chicago press, 1996, p. 212.
[19] i. niiniluoto, "the aim and structure of applied research," erkenntnis, vol. 38, no. 1, pp. 1–21, 1993.
[20] a. hevner and s. chatterjee, design research in information systems: theory and practice. springer, 2010.
[21] k. peffers, t. tuunanen, m. a. rothenberger, and s. chatterjee, "a design science research methodology for information systems research," j. manag. inf. syst., vol. 24, no. 3, pp. 45–77, dec. 2007.
[22] "laws and regulations," swedish national agency for higher education.
[23] h. hansson, j. moberg, and r. peiris, "how to use scipro. an it-support system for scientific process: management of ideas to finished theses," in international conference on advances in ict for emerging regions, 2012, pp. 111–121.
[24] t. j. watson, "managing, crafting and researching: words, skill and imagination in shaping management research," br. j. manag., vol. 5, no. s1, pp. s77–s87, dec. 1994.
[25] h. hansson, "4-excellence: it system for theses," in going global: internationalising higher education, 2012, pp. 13–15.
[26] h. hansson and j. moberg, "quality processes in technology enhanced thesis work," in 24th icde world conference on open and distance learning, 2011, pp. 2–5.
[27] d. hallberg, h. hansson, j. moberg, and k. p. hewagamage, "scipro from a mobile perspective: technology enhanced supervision of thesis work in emerging regions," in aitec east africa ict summit, 2011, pp. 2–3.
[28] k. larsson and h. hansson, "the challenge for supervision: mass individualisation of the thesis writing process with less resources," in online educa berlin 2011: 17th international conference on technology supported learning & training, 2011.

developing a community-based knowledge system: a case study using sri lankan agriculture
anusha indika walisadeera*#1, athula ginige$2, gihan nilendra wikramanayake#3
*university of ruhuna, matara, sri lanka
1waindika@cc.ruh.ac.lk
$school of computing, engineering & mathematics, university of western sydney, parramatta campus, nsw, australia
2a.ginige@uws.edu.au
#university of colombo school of computing, colombo 07, sri lanka
3gnw@ucsc.cmb.ac.lk
abstract— the agriculture sector plays a vital role in sri lanka's economy. not having an agricultural knowledge repository that can be easily accessed by people in the agriculture community in sri lanka within their own context is a major problem. as a solution, a large user centred ontology for sri lankan farmers was developed to provide the required information/knowledge not only in a structured and complete way, but also in a context-specific manner. since this problem is not limited to farmers, we extend this work to everyone working in the agriculture domain.
we validate the ontology in terms of accuracy and quality. an online knowledge base based on the ontology, with a sparql endpoint, was created to share and reuse the domain knowledge, which can be queried based on user context. a mobile based application and a web based application were developed to provide information/knowledge by using this ontology. these applications are also used to evaluate the ontology by obtaining feedback from users on the knowledge in the ontology. it is very difficult to maintain a large, complex ontology. to maintain our ontology, we identified the various processes that are required to develop and maintain an ontology as a collaborative process. a semi-automatic end-to-end ontology management system was developed to manage the developed ontology and the knowledge base. it provides facilities to reuse, share, modify, extend and prune the ontology components as required. facilities to capture users' information needs and to search domain information in user context are also included. in this paper, we present a summary of the overall development process of the ontology, including the end-to-end ontology management system.
keywords— agricultural information/knowledge, contextual information, knowledge modeling, ontology, ontology management systems.
i. introduction
agriculture is an important sector in the sri lankan economy; 31.8% of the total population of sri lanka engages in agricultural activities [1]. people in the agriculture domain need agricultural information and relevant knowledge to make informed decisions and satisfy their information needs. for example, farmers need information on pests and diseases, control methods, seasonal weather, best varieties or cultivars, seeds, fertilizers and pesticides, etc. to manage their farming activities [2], [3]. other stakeholders of the domain, such as agricultural instructors, researchers, information specialists, policy makers, etc., need agricultural information to fulfill their information needs. for example, researchers are interested in information about how to solve pest problems, the symptoms of crop diseases, and the usage of fertilizers and pesticides for research purposes. agricultural instructors also need domain-specific information to help farmers in their region. thus, all the stakeholders in the agriculture community need agricultural information relevant to them to make better decisions, do further research, or analyze the information for future needs and predictions. they can get some of this information from multiple sources such as agricultural websites, agriculture department leaflets, mass media, etc. however, the information in the above sources is general, incomplete, heterogeneous, and not structured to meet their needs. they require information within the context of their specific needs in a structured and complete manner. such information could make a greater impact on their decision-making process [4]. not having an agricultural knowledge repository that is consistent, well-defined, and provides a representation of the agricultural information and knowledge needed by the farmers within their own context is a major problem. moreover, this problem is not limited to the farmers; it affects everyone working in the agriculture domain. social life networks for the middle of the pyramid (www.sln4mop.org) is an international collaborative research project aiming to develop a mobile based information system to support livelihood activities of people in developing countries [5].
the research work presented in this paper is part of the social life network project, aiming to provide agricultural information and knowledge to farmers in sri lanka, based on their own context, using a mobile based information system. this system has now been expanded to include everyone working in the agriculture domain in sri lanka through the development of an end-to-end ontology management system with a web based interface. to represent the information in a context-specific manner, we first need to identify the users' context (i.e. the users' context model). since the farmers are the main stakeholders in the agriculture community and other stakeholders are willing to help farmers in various ways, we have identified the users' context specific to the farmers in sri lanka, namely farm environment, types of farmers, farmers' preferences, and farming stages [6]. the farming stages that we have identified as relating to our application are crop selection, pre-sowing, growing, harvesting, post-harvesting, and selling [6]. next we identified an optimum way to organize the information and knowledge in user context using ontologies. an ontology provides a structured view of domain knowledge and acts as a repository of concepts in the domain [7]. the most quoted definition of ontology was proposed by thomas gruber: 'an ontology is an explicit specification of a conceptualization' [8]. mainly to handle the complex nature of the relationships among various concepts, to attenuate the incompleteness of the data, and also to add semantics and background knowledge about the domain, we have selected a logic based ontological approach to create our knowledge repository. we first developed an ontological approach to represent the necessary agricultural information and relevant knowledge within the user context [6]. using this approach, we designed the ontology to include the information needs identified for the first stage of the farming life cycle [9]. next we extended the ontology to include events associated with the farming life cycle such as fertilizers, growing problems, and their control methods [10]. a revised and enhanced version of the work, including the creation of an online knowledge base and an information retrieval interface, has been published in [11]. in this paper we present a summary of the overall development process of the user centered ontology and the end-to-end ontology management system with respect to the domain of agriculture in sri lanka. the user centered ontology was implemented using the protégé editor (based on owl 2 dl). a web-based ontology management system was developed based on the framework explained in [12]. the remainder of the paper is organized as follows. section 2 summarizes the development process of the ontology. a summary of the end-to-end ontology management system is given in section 3. finally, section 4 concludes the paper and describes future directions.
ii. ontology development process
to clearly identify the process of developing the ontology to represent the information in user context, this section (section ii) is organized into the following categories: users' information needs, users' information needs in context, representation of contextualized information, generalizing the design approach, and the validation and evaluation process. the framework we identified to maintain the ontology for our application is described separately in section iii.
a. users' information needs
first we extracted domain specific knowledge from reliable knowledge sources [2], [3], [13]-[17] and by interviewing farmers as well as other stakeholders in the agriculture community. by analyzing the information gathered from these various sources, we identified what information is required by users in the agriculture domain at various stages to support better decisions, problem solving, and other information needs. as a result of this analysis, information important to users was identified in the form of questions. some examples are given in table i.
table i. users' information needs
- what are the suitable crops to grow?
- what are the best varieties (or cultivars)?
- what are the best fertilizers for selected crops and in what quantities?
- when is the appropriate time to apply fertilizer?
- what are the types of pests or crop diseases?
- how to solve the problems of pests?
- what are the symptoms of crop diseases?
- how to solve crop diseases?
- which are the most suitable control methods for a particular disease?
- what are the problems of pesticides?
- what are the reasons for reduction of yield and/or quality of the specified crop?
- how to control diseases in an environmentally safe way?
- what are the best techniques for harvesting?
- what are the crops cultivated by other farmers and in what quantities?
in this study we identified that farm environment, types of farmers, farmers' preferences, and farming stages (considered as the user context model) are the important factors that need to be considered when delivering agricultural information and knowledge to farmers [6].
b. users' information needs in context
we identified areas of generic crop knowledge required to answer the users' information needs (see table i). we call these broad areas of knowledge 'knowledge modules'. the generic crop knowledge consists of modules such as nursery management, harvesting, post-harvesting, growing problems, control methods, fertilizer, environmental factors, crops and basic characteristics of crops, variety, etc. for example, the crop module has information about crops, and the fertilizer module has the fertilizer information and knowledge needed to handle the fertilizer knowledge required by domain users. next we identified the relationships among them. fig. 1 shows the generic crop knowledge module. this modularization also helps us to reduce the complexity of the real-world scenario in the application domain. it is very hard to maintain a large ontology; this modularization assists us in maintaining a large ontology by maintaining small blocks in the knowledge module.
fig. 1 generic crop knowledge module (crop, variety, basic characteristics, nursery management, environmental factors, growing practices, growing problems, symptoms, control methods, fertilizer, harvesting, post-harvesting)
we organized the users' list of information requirements according to the farming life cycle stages (six stages: crop selection, pre-sowing, growing, harvesting, post-harvesting, and selling). we begin our detailed design process with the first question in the list: 'what are the suitable crops to grow?' choosing the best crop for an individual situation is difficult, since one has to consider many factors such as environmental conditions, which can vary based on region and time period, the preferences of the user, and the resources available for cultivation. we therefore reviewed the existing literature on crop selection to identify a suitable criterion which can be used to make better decisions.
then we summarized the existing criteria and identified a suitable crop selection criterion for our application based on the requirements of the agriculture community in sri lanka [11]. it includes the environmental conditions, the special characteristics of a crop, user preferences, information about what other farmers grow in different regions and in what quantities, and the market information. in a similar way, we identified the criteria for each item in the list of user information requirements. for example, we defined the criteria for applying fertilizers, to deliver fertilizer knowledge, and for the growing problems and their control methods, related to the second and third stages of the farming life cycle respectively. when applying a fertilizer for a specific crop, the user needs to know the fertilizer quantity and its unit. the fertilizer quantity depends on many factors; in particular it depends on the location, water source, soil ph range, time of application, application method, and fertilizer type. in addition to this information, the cost, the land size required for a particular fertilizer, and other special information need to be considered. thus the fertilizer quantity needs to be specified in relation to all this information. to do that, we introduced a new information module, the fertilizer event, to represent this additional information, and new relationships to describe this event. more details about the criteria for applying fertilizers and selecting control methods are explained in [10]. a summary of these criterion factors is shown in table ii.
table ii. summary of the criterion factors for crop selection, fertilizer application and control method selection
crop selection: environment (soil, location, water supply, season); crop characteristics (hardiness, value added products, etc.; length, weight, color, shape, quality, size of the variety; etc.); user preferences (high yielding varieties; maturity time and disease resistance; other preferences); labor requirement; market information; other farmers' information.
fertilizer application: environment (soil, location, water supply); time of application (pre-sowing stage, growing stage, etc.); application methods (basal dressing, top dressing 1, top dressing 2, etc.); user preferences (fertilizer types such as chemical, organic, or biological and their specific sources; farm land size; budget).
control method selection: environment (soil, location, water supply); farming stage (application stage: before infestation (avoid and prevention) or after infestation (control)); user preferences (control method types such as chemical, cultural and biological control methods).
the next step is the formulation of a set of contextualized or personalized information based on the users' information needs. for this, we had to develop our own approach to formulate the contextualized information. with the help of the domain experts, we first identified the breadth of information required by users. next, based on the earlier identified user context, we identified the conditions we can use to obtain a subset of information that can satisfy a specific information need of users. based on this, we expanded the questions in the user information need list to include the user context. fig. 2 shows our basis for formulating contextualized information.
the formulation of contextualized information for crop selection depends on multiple criteria: the users' context, general crop knowledge, the crop selection criteria (a suitable task modeling criterion is selected specific to the question, for example the crop selection criterion, the fertilizer application criterion, the control method selection criterion, and so on), and the users' constraints (conditions). this serves as a basis for formulating information in a user context for our application. some examples of contextualized information related to each category of crop selection, fertilizer application, and control method selection are given in table iii.
fig. 2 basis for modeling contextualized information (the generic crop knowledge module, additional knowledge modules, task modelling criteria, and the user context model combine to produce contextualized or personalized information)
we identified the user constraints based on each criterion factor. we therefore need to select suitable information based on different locations, different seasons, different soil factors, different types of control methods, etc., or a combination of these constraints, to help make better decisions. we have identified these different constraints related to this application. for example, we identified the location as a zone, agro zone, elevation based location, province, district, and regional area (see fig. 3 (a)). the relationships among these are also complex, based on the meaning of these terms; for example, an agro zone is a zone, a zone is a location, a variety is a crop, and similarly for the representation of the environmental factor (see fig. 3 (a)). the definitions of the terms also need to be considered to attenuate the incompleteness of the data (see fig. 3 (b)). furthermore, we need to represent the semantic meaning of the terms; for example, if magalle (location) belongs to galle (location) and galle belongs to wetzone (location) then magalle belongs to wetzone (see fig. 3 (c)). through this process, we have formulated the contextualized questions covering all constraints relevant to each criterion. we also generalized these questions (see table iii).
table iii. users' information needs in context
stage 1: what are the suitable crops to grow?
in context:
- suitable crops based on the environment: what are the suitable vegetable crops for 'upcountry', applicable to 'well-drained loamy' soil, and average rainfall > 2000 mm?
- suitable crops based on preferences of users: what brinjal varieties are good for the 'bacterial wilt' disease?
- suitable crops based on environment, preferences and other information: what is the best brinjal variety which is suitable for the 'dryzone' and has high resistance to the 'bacterial wilt' disease?
generalized:
- what are the suitable types of crops for a specified location (elevation), applicable to the specified soil types/characteristics, and conditions (rainfall or temperature)?
- what crop varieties are good for the specified disease?
- what is the best crop variety which is suitable for a specified location (climatic zone) and resistance conditions to the specified disease?
stage 2: what are the suitable fertilizers for selected crops and in what quantities?
in context:
- suitable fertilizers based on the environment: what are the suitable fertilizers and in what quantities for farmers in badulla district who cultivate tomatoes?
- suitable fertilizers based on preferences of users: what are the suitable organic fertilizers which are used for basal dressing for tomato?
generalized:
- what are the suitable fertilizers and in what quantities for farmers in a specified location (district) who cultivate specified crops?
- what are the suitable types of fertilizers based on the method of application for specified crops?
stage 3: which are the most suitable control methods for a particular disease?
in context:
- suitable control methods based on the environment: what are the suitable control methods to control weed for radish which is grown in up country?
- suitable control methods based on preferences of users: what are the suitable chemical control methods and in what quantities to control damping-off for tomato?
- suitable control methods based on the farming stages: what are the suitable control methods to control bacterial wilt for brinjal before infestation of the disease?
generalized:
- what are the suitable control methods for different types of growing problems for a specified crop grown in a specified location?
- what are the different types of control methods for a specified growing problem of a crop?
- what is the suitable control method, based on the specified farming stage, for a specified growing problem of a crop?
these are the range of questions for which we want to obtain answers by organizing agricultural information and knowledge to be queried in context using an ontology.
c. representation of contextualized information
an ontology provides a structured view of the domain knowledge and acts as a repository of concepts in the domain. this structured view is essential to facilitate knowledge sharing, knowledge aggregation, information retrieval, and question answering [7]. mainly to handle the complex nature of the relationships among various concepts, to attenuate the incompleteness of the data, and also to add semantics and background knowledge about the domain (see fig. 3), we have selected a logic based ontological approach to represent the contextualized information/knowledge (in table iii) that can be used to find a response to queries within a specified context in the agriculture domain. we reviewed ontology development methodologies and techniques to identify a suitable ontology development approach. grüninger and fox [19] have published a formal approach to designing an ontology while providing a framework for evaluating the adequacy of the developed ontology. we therefore selected grüninger and fox's methodology, a logic based approach, to develop a user centric ontology for the agriculture community. our ontology creation begins with the definition of the set of users' information needs identified in table i. we take these information needs as the main motivation scenario of our application to provide information in context. competency questions (cqs) determine the scope of the ontology and are used to identify the contents of the ontology. the ontology should be able to represent the cqs using its terminologies, axioms and definitions. then, a knowledge base based on the ontology can provide answers to these questions [19]. therefore, the formulation of the cqs is a very important step, because these questions guide the development of the ontology. in our application, the contextualized information (see table iii) has been used as the cqs to develop the ontology, because it satisfies the expressiveness and reasoning requirements of the ontology (see fig. 3). the different constraints in the domain are represented using owl 2 dl (see fig. 3). fig. 3 (a) represents the semantic meaning of the concepts using the class hierarchies.
the sub concepts inherit the properties of the parent concepts, and instances of a sub concept then act as instances of the super concept, because of the taxonomic hierarchy (is-a relationship). the definition of a concept, for example dryzone, is represented in fig. 3 (b). the instances need to be classified based on these definitions; the reasoner attached to the protégé tool can be used for this classification. by using the transitive property, the relation belongsto with respect to the instances of the location concept is defined and shown in fig. 3 (c). based on the existing information, additional knowledge can be inferred using the composition of relations (e.g. the relation grandfatherof is composed of the relations fatherof and parentof). we used this property to infer additional knowledge (see fig. 3 (d)). the object property chain in the protégé tool is used for this representation.
fig. 3 representation of different constraints
(a) class hierarchies: the semantic meaning of the concepts, e.g. an agro zone is a zone, a variety is a crop, the semantic representation of user location, and environmentalfactor represented as the union of a set of mutually-disjoint classes (an exhaustive partition).
(b) class definition: the concept definition of dryzone, ∀x (zone(x) ∧ ∃y,z ∈ nonnegativeinteger (hasmaximumrainfall(x, y) ∧ (y ≤ 1750) ∧ hasminimumrainfall(x, z) ∧ (z ≥ 0)) ↔ dryzone(x)); this definition is represented in the protégé implementation.
(c) transitive property: the instances of the location concept are related as follows: ∀x,y,z ∈ location: (x belongsto y ∧ y belongsto z) → x belongsto z. for example, if galle belongs to wet zone and wet zone belongs to low country, then galle belongs to low country. this semantics can be represented in owl.
(d) composition of relations: based on the existing information, additional knowledge can be inferred. for example, if a crop has a growingproblemevent and the growingproblemevent has a growingproblem, then it can be inferred that the crop is affected by this growing problem (the object property chain in protégé was used to represent this): hasgrowingproblemevent ∘ hasgrowingproblem ⊑ isaffectedby. similarly, if a growingproblem is the growingproblem of a growingproblemevent and the growingproblemevent has a related controlmethod, then it can be inferred that the growingproblem is controlled by this controlmethod: isgrowingproblemof ∘ hasrelatedcontrolmethod ⊑ iscontrolledby.
fig. 4 shows the fertilizer event represented using the cmap tool. the cmap (concept map) tool is used to view a graphical representation of the ontology for better user understanding [18]. the details of modeling the events associated with the second and third stages of the farming life cycle, and the associated challenges, are explained in [10]. the implemented ontology using protégé is available at http://www.sln4mop.org/ontologies/2014/sln_ontology. it consists of 90 concepts, 205 object properties, and 45 data properties. currently it has 23 vegetable crops, 10 fertilizers, 19 growing problems, and 30 control methods. more details of the ontology development are explained in [11].
fig. 4 fertilizerevent concept
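the constraints illustrated in fig. 3 (c) and (d) correspond to standard owl 2 constructs (a transitive object property and property chain axioms). as a minimal sketch, the snippet below writes them down in turtle and loads them with rdflib in python; the namespace iri is an assumption made for illustration, since the full prefix declaration is not reproduced here, and the inferences themselves are drawn by an owl reasoner such as the one attached to protégé, not by rdflib.

```python
from rdflib import Graph

# owl 2 axioms corresponding to fig. 3 (c) and (d); the sln: iri is assumed.
AXIOMS = """
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sln: <http://example.org/sln#> .

# (c) transitive property: belongsTo chains across locations
sln:belongsTo a owl:TransitiveProperty .

# (d) composition of relations (property chains)
sln:isAffectedBy   owl:propertyChainAxiom ( sln:hasGrowingProblemEvent sln:hasGrowingProblem ) .
sln:isControlledBy owl:propertyChainAxiom ( sln:isGrowingProblemOf sln:hasRelatedControlMethod ) .
"""

g = Graph()
g.parse(data=AXIOMS, format="turtle")
print(len(g), "triples loaded")   # a reasoner is still needed to materialize the inferred facts
```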
d. generalizing approach
we have generalized the specific approach that was developed to create the user centered ontology for social life networks. fig. 5 shows this generalized approach. according to this approach, we first identify a set of questions (users' information needs) that reflect various motivation scenarios. next we create a model to represent information in user context. then we derive the contextualized information by incorporating the user context and task modeling with the generic knowledge module. we refer to this contextualized information (see table iii) as the informal cqs. these cqs are used to identify the ontology components according to grüninger and fox's methodology to develop the ontology. using this framework, we can extend the ontology for different scenario problems. for example, when answering a scenario question like 'how to control growing problems such as diseases, weeds, or pests in an environmentally safe manner?', we need to take into account suitable criteria for selecting control methods and the users' context with respect to each criterion factor. we can then formulate the contextualized information based on this systematic approach. these questions drive the development of the ontology. by doing so, the contextual information/knowledge can be represented in a way that satisfies the user needs.
fig. 5 ontology design framework (domain knowledge from reliable knowledge sources and the outcomes of interviews with domain users and domain experts lead to users' information needs, formally referred to as motivation scenarios; generic crop knowledge (in the agriculture domain), the user context model, and task criteria modeling produce contextualized (or personalized) information, formally referred to as competency questions; these drive the ontology: classification, axioms, main ontology components, specialization and generalization)
e. validation and evaluation process
it is very important to check the validity of the ontology. in this study, the correctness of the contents and the correctness of the construction of the ontology have been validated. the content correctness depends on the definitions of concepts, the relationships between concepts, hierarchical structures, concept properties, and the information constraints of the ontology. the delphi method is a research technique that is used to obtain responses to a problem from a group of domain experts [20]. we selected the delphi method to obtain expert advice and responses to check the definitions of concepts, relationships, data properties, and hierarchical structures. the validation process was mainly done by agricultural experts from different agricultural institutes, using questionnaires based on the delphi method. they verified the correctness, relevancy, and consistency of the ontology components against a set of predefined criteria. the modified delphi method can be adapted for use in face-to-face group meetings, allowing group discussions [21]. since we needed more dialogue and collaboration among the participants in the delphi group, we arranged a discussion based on the modified delphi method. for this discussion, eleven (11) agricultural instructors (ais) gathered at the lunama govi jana seva center, ambalanthota. the main aim of the discussion was to check the criteria relevant to fertilizer application, growing problems and control methods, etc. the delphi investigator (one of the authors of this paper) explained the problems in detail to elicit the experts' knowledge. the investigator also allowed them to discuss the problems and possible solutions. based on their responses, comments, and suggestions, we made judgments about the design criteria and the assumptions we made during the design process. the contents of the ontology have been refined based on the domain experts' feedback and comments. one approach for checking the correctness of the construction is to analyze whether the ontology contains anomalies or pitfalls [22].
we first identified the common pitfalls before the implementation. next we identified the types of ontology design patterns (odps) that help to avoid the pitfalls by adapting or combining existing odps [22]. design patterns are shared guidelines that help to solve design problems, for example semantic web best practices and development under the w3c [23]. we also used the web-based tool called oops! [22] to detect potential pitfalls in the ontology. using the above methods we validated the ontology in terms of accuracy and quality. the ontology implemented using protégé is used to evaluate the ontological commitments internally and also to test the consistency and inferences using reasoners. we used the cqs to evaluate the ontological commitments, to see whether the ontology meets the users' requirements, using description logic (dl) queries and sparql queries [11]. next we checked user satisfaction with the information/knowledge in the ontology. we used a mobile based application for this evaluation; a mobile based application was developed to provide information by using this ontology [24]. the first evaluation was done only for crop selection, with a group of 32 farmers in sri lanka [24]. we gathered suggestions from farmers and other stakeholders of the domain for our future designs. the knowledge base based on the ontology was created by populating the ontology with instances, to share and reuse the agricultural information via the web [11]. the online knowledge base can also be used for the evaluation process. we can query the contextualized information on the web via this application (a sparql endpoint) using sparql queries (refer to http://webe2.scem.uws.edu.au/arc2/select.php). this application is especially useful for agricultural instructors, researchers, and people at the department of agriculture to find information based on their needs. for example, the following sparql query lists the suitable environmentally safe control methods to control the bacterial wilt disease for the brinjal crop. we evaluated the knowledge represented in the ontology by evaluating the outputs of such queries. the output of the following query is shown in fig. 6.
prefix sln:
select distinct ?controlmethods
where {
  { ?p sln:iscontrolmethodeventofcrop sln:brinjal . }
  { ?p sln:hasrelatedgrowingproblem sln:bacterial_wilt . }
  { ?controlmethods sln:iscontrolmethodof ?p }
  { ?controlmethods sln:hascontrolmethodtype "cultural"^^xsd:string }
}
limit 250
controlmethods:
use_resistance_varieties_for_bacterial_wilt
deep_drain_to_facilitate_drainage
crop_rotation_with_non_solanaceos_crops
fig. 6 the output of the above query
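the endpoint above can also be queried programmatically. the following is a minimal sketch using python and the sparqlwrapper library; the endpoint url is the one quoted in the text (it may no longer be online), and the sln: namespace iri is an assumption based on the published ontology url, since the article does not reproduce the full prefix declaration. identifier casing follows the text as printed and may differ in the actual ontology.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://webe2.scem.uws.edu.au/arc2/select.php"   # from the article; may be offline

QUERY = """
PREFIX sln: <http://www.sln4mop.org/ontologies/2014/sln_ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?controlmethods WHERE {
  ?p sln:iscontrolmethodeventofcrop sln:brinjal .
  ?p sln:hasrelatedgrowingproblem sln:bacterial_wilt .
  ?controlmethods sln:iscontrolmethodof ?p .
  ?controlmethods sln:hascontrolmethodtype "cultural"^^xsd:string .
}
LIMIT 250
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["controlmethods"]["value"])   # e.g. cultural control methods for bacterial wilt
```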
therefore it is very important to be able to practically maintain a developed ontology by updating the content of the ontology in a timely manner, for example, extending the ontological structure by improving coverage and modifying the instances (individuals) in the knowledge base. after developing the ontology we had to devise a method to maintain it. a community based facility to manage the structure of the developed ontology in the long term, as well as to further populate the knowledge base, is very useful. for this we have developed an end-to-end semi-automatic collaborative ontology management system for large-scale development and maintenance purposes, providing facilities to reuse, modify, extend, and prune the ontology components as required. it also has facilities to capture users' information needs in their context, as well as to search domain information in user context. we use a web based application to deploy the proposed framework. with the help of this web based ontology management system, people with little knowledge about the ontology can help to modify the ontology, and can use the ontological information and knowledge for their needs.

fig. 7 a framework for an end-to-end ontology management system

fig. 7 shows the proposed framework for an end-to-end ontology management system. the full details of the design of the framework and the development of the end-to-end ontology management system based on the framework are explained in [12]. in this paper we briefly present the processes belonging to this framework. this framework mainly has four processes: populate the ontology, modify the ontology, search domain information in context, and capture users' information needs and related user context for community based ontology development and maintenance. this framework provides the essential facilities to manage the ontology life cycle by supporting the identified processes. each process is briefly mentioned below.

a. populate the ontology
using this process we can get the support of the agriculture community to fully populate the knowledge base in the long term. to populate it, we especially seek the involvement of the people in the domain, for example, domain experts such as agricultural instructors, information specialists and researchers in the agriculture community. to fully populate the ontology with real data, we developed a semi-automated system to capture this information using a web based application. for that we have used a framework called "cbeads": component based ebusiness application development and deployment shell [26] as a data capturing application. this framework, which is created using php and mysql, has the potential to evolve with changing requirements. more details related to each process can be found in [12]. the form shown in fig. 8 is used to gather the required data using the cbeads application.

fig. 8 data gathering interface for crop variety

b. modify the ontology
this process helps us to extend and prune the ontology based on the changing and/or expanding user requirements and related user contexts. this process can be performed by agriculture domain experts and ontology developers. since the process of modifying the structure of the ontology is complex, we need to manage this process carefully. it has three sub-processes: insertion, deletion, and updating (change). each sub-process has three main activities. for example, the insertion process needs to consider inserting concepts, inserting data properties, and inserting object properties.
in the same manner these activities can be seen in the deletion and updating processes. in this model, seven steps have been proposed to modify the ontology:
1. view the ontology structure (initial structure) represented in the cmap tool;
2. extract domain terms, concepts, and basic hierarchies using the text-to-onto tool;
3. view the ontology design framework used to represent the information in user context;
4. based on the design framework, modify the structure using metadata (the metadata provides information to users on how to modify the structure of the ontology, for example, how to insert concepts or how to delete data properties or object properties);
5. validate the modified content using web forms;
6. convert the modified content into rdf or owl format; and
7. finally, import the modified content into the initial ontology for information integration.
the way of modifying the ontology related to this application is outside the scope of this paper and is explained in [12].

c. search domain information in context
to allow all the stakeholders in the community to benefit from the knowledge base by finding the right information based on their context, we have included the process "search domain information in context". through this system we provide two facilities: normal users such as farmers can view the domain information in their context, and other stakeholders in the domain, especially agricultural instructors and researchers, can retrieve domain information and knowledge based on their interests. for the farmers, we have provided specific answers to their questions in their context using natural language (in english, sinhala, and tamil). fig. 9 shows a user friendly interface for searching information.

fig. 9 interface for searching information in context

d. capture user information needs in context
the process "capture the user information needs and related user context" collects the information required to extend the ontology further. since delivering these benefits to a broad audience is an even more challenging task, this collaborative end-to-end ontology management system, accessed via a web based interface, has now been expanded to capture users' requirements in context. then we can extend our ontology with different motivation scenarios that provide an even richer knowledge environment to support the agriculture community. since this is a collaborative approach, the system mostly relies on the users of the domain, their participation in the system, and the developers' and administrators' skills in overseeing the collaborative processes. in our system (refer http://webe2.scem.uws.edu.au/oms/index.php), there are three main user categories (domain experts, normal users and ontology developers) with different access rights. fig. 10 shows the home page of the oms (in the english language).

fig. 10 ontology management system (oms)

the domain experts and ontology developers need to be logged in to the system for populating and modifying the ontology. domain experts and ontology developers can change or extend the ontology by getting the requirements and user constraints from the system. there are processes to capture user information needs and related user context from the users to represent domain information in context. domain experts are also involved in populating the ontology by capturing instance values through the forms.
through this system all the stakeholders of the community can search information by viewing user friendly interfaces (for normal users such as farmers) and/or by querying the sparql endpoint in context (for advanced users). we have developed this web based application in english, but english is not the official language of sri lanka. sri lankan people mainly use their native languages such as sinhala and tamil. we therefore provide the facility to use this application in their native languages. fig. 11 shows the overall development process of a community based ontology by summarizing the above two sections (ii and iii). this is an iterative process. based on the results and feedback of the validation and evaluation processes, the design of the ontology is refined using the design framework shown in fig. 5. then the ontology can be maintained using the web based ontology management system based on the framework represented in fig. 7.

fig. 11 overall development process of the ontology (ontology design using the design framework of fig. 5; implementation in the web ontology language (owl) using protégé; validation of the correctness of content with the delphi and modified delphi methods and of the correctness of construction with odps and oops!; evaluation through the mobile based application, the web based application using question answering in natural languages, and the online knowledge base (sparql endpoint); maintenance through the web based ontology management system built on the end-to-end oms design framework in fig. 7; and refinement of the ontology based on the feedback and comments from the validation and evaluation process.)

iv. conclusions
agriculture is the most important sector in the sri lankan economy. the people in the agriculture domain in sri lanka need agricultural information and relevant knowledge to make optimal decisions for successful farming and/or to do research for the development of the agriculture sector and the enhancement of the farming industry. since the lack of agricultural knowledge repositories that can be easily accessed by the people in the agriculture community within their own context is a major problem, a user centric knowledge environment has been developed as a solution. through this study, we first identified the user context model related to the farmers in sri lanka. next we developed a logic based ontological approach to meet the information needs to suit the identified context. we have achieved this by modifying how contextualized information is formulated in a well-established methodology. this article presents a summary of the overall ontology development process to organize domain knowledge by meeting particular access requirements effectively using the guidelines shown in fig. 11. we validated the ontology in terms of accuracy and quality by using the delphi and modified delphi methods, a web-based tool, and odps. we evaluated the ontology against the user requirements by using mobile based and web based applications. an online knowledge base with a sparql endpoint to share and reuse the domain knowledge was created. to fully populate the knowledge base as well as to modify the ontology by extending the coverage of the domain, we developed a semi-automatic end-to-end ontology management system that helps us to develop and manage complex real-world application based ontologies in the long term as a collaborative process. therefore this oms is a community activity. we received very valuable feedback from the domain experts during the group discussions in the modified delphi method as well as from the field trials. based on this feedback we are now refining our application.

acknowledgment
we acknowledge the financial assistance provided to carry out this research work by the hrd program of the hetc project of the ministry of higher education, sri lanka (ruh/o-sci/n2) and the valuable assistance from other researchers working on the social life network project. assistance from the national science foundation (ntrp/2012/fs/pg-01/p-02) to carry out the field visits is also acknowledged.
we would also like to convey our gratitude to the farmers who took their valuable time to share their ideas to clearly identify the user information needs, and also to the agricultural experts from different institutes who gave us valuable suggestions and feedback by taking part in the delphi method and modified delphi method, and in other activities in the ontology management system.

references
[1] field listing :: labor force by occupation, available: https://www.cia.gov/library/publications/the-world-factbook/fields/2048.html. accessed 2015.
[2] l. n. c. de silva, j. s. goonetillake, g. n. wikramanayake, and a. ginige, "towards using ict to enhance flow of information to aid farmer sustainability in sri lanka," in proc. 23rd australasian conference on information systems (acis), 2012, pp. 1-10.
[3] s. lokanathan, and n. kapugama, "smallholders and microenterprises in agriculture: information needs and communication patterns," in proc. lirneasia, 2012.
[4] c. j. glendenning, s. babu, and k. asenso-okyere, "review of agriculture extension in india: are farmers' information needs being met?," international food policy research institute, 2010.
[5] a. ginige. social life networks for the middle of the pyramid. available: http://www.sln4mop.org/index.php/sln/articles/index/1/3. accessed 2013.
[6] a. i. walisadeera, g. n. wikramanayake, and a. ginige, ed., an ontological approach to meet information needs of farmers in sri lanka, ser. lecture notes in computer science. berlin heidelberg: springer-verlag, 2013, vol. 7971.
[7] t. r. gruber, "toward principles for the design of ontologies used for knowledge sharing," international journal of human-computer studies, vol. 43, pp. 907-928, 1995.
[8] t. r. gruber, "a translation approach to portable ontology specifications," knowledge system laboratory, stanford university, california, ksl tech. rep. 92-71, 1993.
[9] a. i. walisadeera, g. n. wikramanayake, and a. ginige, "designing a farmer centred ontology for social life network," in proc. 2nd international conference on data management technologies and applications (data 2013), 2013.
[10] a. i. walisadeera, a. ginige, and g. n. wikramanayake, ed., conceptualizing crop life cycle events to create a user centered ontology for farmers, ser. lecture notes in computer science. switzerland: springer, 2014, vol. 8583.
[11] a. i. walisadeera, a. ginige, and g. n. wikramanayake, "user centered ontology for sri lankan farmers," (in press, corrected proof), ecological informatics, elsevier b.v., netherlands, issn: 1574-9541, doi: http://dx.doi.org/10.1016/j.ecoinf.2014.07.008.
[12] a. i. walisadeera, a. ginige, g. n. wikramanayake, a. l. p. madushanka, and a. a. s. udeshini, "a framework for end-to-end ontology management system," (submitted), 2015.
[13] department of agriculture, sri lanka. available: http://www.agridept.gov.lk. accessed 2013.
[14] navagoviya, cic (private sector). available: http://www.navagoviya.org. accessed 2013.
[15] s. a. narula, and n. nainwal, "icts and agricultural supply chains - opportunities and strategies for successful implementation, information technology in developing countries," in: a newsletter of the international federation for information processing (ifip) working group 9.4, centre for electronic governance, indian institute of management, ahmedabad, vol. 20, pp. 24–28, 2010.
[16] a. kawtrakul, "ontology engineering and knowledge services for agriculture domain," journal of integrative agriculture, vol. 11, pp. 741-751, 2012.
[17] "fertilizer use by crop in india," food and agricultural organization of the united nations (fao), rome, 2005.
[18] j. novak and a. cañas, "the theory underlying concept maps and how to construct them," technical report ihmc cmaptools 2006-01, florida institute for human and machine cognition, 2006. available: http://cmap.ihmc.us/publications/researchpapers/theoryunderlyingconceptmaps.pdf.
[19] m. grüninger, and m. s. fox, "methodology for the design and evaluation of ontologies," in proc. workshop on basic ontological issues in knowledge sharing (ijcai-95), 1995.
[20] m. mattingley-scott. delphi method. available: http://www.12manage.com/methods_helmer_delphi_method.html. accessed 2014.
[21] "nominal group technique," handout: the skilled group leader. available from: http://www2.ca.uky.edu/agpsd/nominal.pdf. accessed 2015.
[22] m. poveda, m. c. suárez-figueroa, and a. gómez-pérez, ed., common pitfalls in ontology development, ser. lecture notes in artificial intelligence. berlin heidelberg: springer-verlag, 2010, vol. 5988.
[23] semantic web best practices and development. w3c recommendation, available: http://www.w3.org/2001/sw/bestpractices/. accessed 2013.
[24] l. n. c. de silva, j. s. goonetillake, g. n. wikramanayake, and a. ginige, ed., farmer response towards the initial agriculture information dissemination mobile prototype, ser. lecture notes in computer science. berlin heidelberg: springer-verlag, 2013, vol. 7971.
[25] a. stojanovic, b. maedche, l. motik, and n. stojanovic, "user-driven ontology evolution management," in proc. 13th european conference on knowledge engineering and knowledge management (ekaw'02), madrid, 2002.
[26] a. ginige, and b. de silva, ed., a framework to support meta-design paradigm, lecture notes in computer science, part i, pp. 107–116. springer, heidelberg, 2007.

international journal on advances in ict for emerging regions 2021 14(2): march 2021

user-controlled subflow selection in mptcp: a case study

kshithija liyanage1*, tharindu wijethilake#2*, kasun gunawardana3*, kenneth thilakarathne4*, primal wijesekera5ϯ, chamath keppitiyagama6*

abstract— it is common to find multiple network interfaces connected to different internet service providers (isps) in devices such as smartphones. multipath tcp (mptcp) enables a single tcp connection to use all these network interfaces in an application transparent manner.
mptcp schedules the traffic of one tcp connection over subflows created over these network interfaces. it is evident that this requires some scheduling policy. there have been some attempts to allow applications to decide on the scheduling policy. however, this violates the application transparency of mptcp, and applications do not have all the information required to decide on such a policy. in addition, this allows applications to monopolize the network connection, thus posing a security threat as well. we argue that only the owner of the device (the user) has the right to make that policy decision and only the user can make an informed decision on the scheduling policy. for example, the user has the information on the monetary cost of the connections through different interfaces. in this paper we present a mechanism that allows the user to provide hints to the tcp scheduler to alter its scheduling policy. while this is not a mechanism to implement generic scheduling policies, it demonstrates how a user can guide the scheduling policies. as a proof of concept, we demonstrate how the mptcp scheduler can be influenced to select a less stable and lossy path over a stable path based on a user preference.

keywords— multipath tcp, mptcp subflow, mptcp scheduler

i. introduction
consider a smartphone with two network interfaces: cellular and wifi. this device is multihomed, i.e., it is connected to the internet through two different isps via the two network interfaces. it has the potential to reach a destination via the two isps utilizing the two network interfaces, thus increasing the available bandwidth and the path redundancy. despite having multiple interfaces and network connections, the communication of a particular application may be limited to a single network interface or connection due to the limitations imposed by the underlying communication protocols and their implementations. for example, a tcp connection opened by an application can use only a single endpoint since a tcp connection is tied to a single ip address at one end. there are a number of techniques to overcome these limitations and reap the benefits of having multiple network interfaces. for example, assume a device with two network interfaces. in such a device, it is possible for an application to be aware of both interfaces and open two tcp connections to a destination through those interfaces. the application itself can schedule traffic over these two connections according to a policy. however, such a solution is neither scalable nor portable. link aggregation or channel bonding can be used to increase the bandwidth of a particular communication [1]. channel bonding/aggregation techniques combine several network interfaces/connections in order to increase the bandwidth, where the implementation may be through hardware, software or both. the traffic division of channel bonding happens at the network layer or data link layer with respect to the osi model, depending on the implementation; i.e., if it is an ethernet connection, the switch and the operating system of the host machine have to be configured to use channel bonding. however, in the above mentioned scenario channel bonding does not provide a solution. since the interfaces are connected to two different isps, it is impossible to bond the two interfaces at the network or data link layers. therefore, there have been a number of attempts to find a solution at the transport layer and consequently mptcp has been introduced [4].
the transport layer protocol (tcp) is confined to a single network interface in the traditional tcp/ip implementation. however, mptcp supports establishing tcp connections over multiple network interfaces. it is an extension that has been proposed by the ietf to enable the transport layer to utilize multiple network interfaces available in a device and to improve the reliability of communication for its applications by enabling multiple redundant paths [4]. moreover, the proposed extension also maintains the abstraction provided by the transport layer to the upper layers/applications through a mechanism that aggregates multiple tcp connections. therefore, applications are not aware of the multiple tcp connections. thus applications need no modifications to use mptcp. currently mptcp is available for linux operating systems, where it has to be installed and configured deliberately. apple ios uses mptcp for its voice assistant 'siri' [17]. apple ios is claimed to be the first mobile operating system which implements mptcp [18]. further, apple states that if an application needs to use its mptcp, then the application needs to be written in support of using mptcp [19], which implies that the particular application on the apple mobile platform is aware of mptcp, i.e., transport layer protocols are made visible at the application layer to a certain extent to have better control over mptcp in catering to specific application requirements.

correspondence: t. wijethilake#2 (e-mail: tnb@ucsc.cmb.ac.lk) received: 14-01-2021 revised: 16-03-2021 accepted: 17-03-2021. this paper is an extended version of the paper "priority based subflow selection in mptcp: a case study" presented at the icter 2020 conference. k. liyanage1*, t. wijethilake#2*, k. gunawardana3*, k. thilakarathne4*, and c. keppitiyagama6* are from the university of colombo school of computing (ucsc). (kshithijal@gmail.com, tnb@ucsc.cmb.ac.lk, kgg@ucsc.cmb.ac.lk, kmt@ucsc.cmb.ac.lk, chamath@ucsc.cmb.ac.lk) p. wijesekera5ϯ is from the university of california, berkeley & international computer science institute, usa. (primal@cs.berkeley.edu) doi: http://doi.org/10.4038/icter.v14i2.7225 © 2021 international journal on advances in ict for emerging regions

the emergence of mptcp increases the transport layer's involvement in end-to-end routing decisions by allowing it to have one or more communication paths, subflows. however, mptcp has to make these decisions based on the information available to traditional tcp. as a result of inheriting the tcp design, mptcp has a proclivity to select more stable communication paths, despite more viable alternative paths with a lower opportunistic cost being available. we believe that enabling/assisting such decision-making ability at the transport layer would help mptcp attain its objectives. we envisioned the benefits of users being able to take part in decision making with respect to mptcp path selection by providing their preference. we make a clear distinction between the user and applications. in this work, the user refers to the person who owns or controls the device and uses the applications. for example, in a circumstance where an ad-hoc network and a 4g mobile data connection are available for the user to communicate, the user would choose to connect through the ad-hoc network if the user is more concerned about the cost than about the delay in communication.
only the user, not the individual applications on the device, has the information to make such a decision. hence it is crucial to get user input in scheduling traffic over multiple network interfaces. however, the current gnu/linux mptcp implementation does not have provisions for such information propagation from the user to the mptcp scheduler in the kernel. wijethilake et al. [8] and neira-ayuso et al. [7] discuss techniques for passing information from user space to kernel space by modifying the socket implementation, which requires modifications to existing applications. however, we believe that the user should be involved in the path selection decision while applications should not be affected by the changes made at the lower layers of the network stack. therefore, an alternative way of passing a user provided hint from user space to kernel space, bypassing the application, was studied by liyanage et al. [25]. in this study, we further explore that approach and present a novel mptcp scheduling algorithm that takes user hints to decide on and prioritize subflows.

ii. background
mobile device usage has been continuously growing and, as of the year 2019, there were 4.68 billion [4] mobile users around the globe. nowadays, most mobile devices are equipped with multiple communication interfaces, such as wifi, cellular, and bluetooth. generally, wifi communication is cheaper than cellular communication. due to the widespread usage of mobile devices, building an ad-hoc network and establishing a communication path is possible. using an ad-hoc network can make communication more cost-effective, evading expensive communication via isps. however, this introduces several challenges, such as prioritizing the flow of network packets through the ad-hoc connection using mptcp and giving the user of the mobile device the authority to choose when to prioritize the ad-hoc connection. thus, this study aims to combine the infrastructure and ad-hoc network connections to prioritize the flow of network packets through the ad-hoc network and to obtain the user's hint in deciding when to prioritize the ad-hoc connection.

a. multipath tcp
mptcp uses a tcp subflow per interface to establish an mptcp connection between two hosts. the individual mptcp subflows work as independent tcp connections by maintaining their own data structures to attain the reliability of tcp. mptcp maintains additional data synchronization mechanisms to integrate data pertaining to a particular mptcp connection that may be received out of order from multiple subflows [4]. in order to create an mptcp connection, both hosts must be configured with mptcp; otherwise, traditional tcp is used for the communication. when establishing an mptcp connection between two hosts, mptcp first initiates the connection via one of the available network interfaces. this is called the first subflow in mptcp terminology. if both hosts are compatible with mptcp, the hosts can initiate another subflow via a second network interface available in the device. as an example, a mobile phone that has two network interfaces (i.e., a wifi interface and a cellular interface) can initiate the first subflow using the wifi interface when it establishes an mptcp connection with a server. this connection will examine whether both devices are compatible with mptcp. if the mobile phone and the server are both compatible, it will initiate the second subflow via the cellular interface.
a client application running on the mobile phone will assume a single network connection, but underneath it has two different tcp connections via two network interfaces, which probably operate over two different isps, as shown in fig. 1.

fig. 1 mobile phone connecting to a server using mptcp via two isps

a vital attribute of mptcp is its backward compatibility. if one of the hosts of an mptcp connection is incompatible with mptcp, it will downgrade the connection to traditional tcp. even though mptcp achieves this level of flexibility, it does not change the existing headers of traditional tcp. therefore, to communicate the control signals related to mptcp, it uses the 'options' field of the tcp header segment. one of the challenges faced when using mptcp in practice is that mptcp traffic could get dropped by a middlebox, such as a firewall or intrusion prevention system, if it is not aware of mptcp. when initiating the connection between two hosts, mptcp uses the same three-way handshake as traditional tcp. it will send syn, syn/ack and ack messages to and from the hosts and establish the connection. however, to negotiate the mptcp communication, it uses a set of mptcp options. the mptcp options reside inside the "options" field of the tcp header section and are exchanged in the initial three-way handshake. in the first subflow, mptcp will send the mp_capable option within the syn message to the host with which it intends to communicate (fig. 2). when this message is received at the recipient and if the recipient is configured with mptcp, it will reply with a syn/ack message including mp_capable. if not, it will reply to the syn without including the mp_capable option. if the reply contains the mp_capable option, the initiator will recognize that the recipient is also compatible with mptcp. when the reply contains a traditional tcp syn/ack without the mp_capable option, it implies that the recipient is not configured with mptcp. with the mp_capable option, mptcp will negotiate a set of keys which will be used to authenticate the next subflows it will generate. after establishing the first subflow, mptcp will use other network interfaces to create the consecutive subflows. again the tcp three-way handshake will happen, but with the mp_join option included in the tcp "options" field. mp_join will use the keys negotiated in the first subflow using mp_capable to authenticate the newly created subflow with the destination host. if the host agrees to create the next subflow, it will reply to the syn message with the mp_join option and the key material to confirm the connection. this mechanism is used to join any number of subflows to the mptcp connection.

fig. 2 mptcp three-way handshake which uses the mp_capable option in the first subflow and the mp_join option in the next subflow.

mptcp uses two methods in scheduling packets at the initial stage of its connectivity:
1. all the network packets are duplicated among the available subflows to improve redundancy.
2. the packets are segmented and scheduled to the subflows in a round-robin manner to improve overall performance and bandwidth [20].
however, further studies present more advanced schedulers and congestion control mechanisms for mptcp [21].
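to make the application transparency noted above concrete, the following minimal c client opens an ordinary tcp socket with no mptcp-specific code. this is only an illustrative sketch: it assumes a host running the out-of-tree mptcp.org gnu/linux implementation with mptcp enabled system-wide, in which case such an unmodified client is carried over mptcp subflows, whereas the newer upstream linux implementation requires the application to opt in by passing IPPROTO_MPTCP to socket(). the server address and port used here are placeholders.

/* minimal tcp client: no mptcp-specific code is needed when the kernel
 * transparently upgrades tcp sockets to mptcp (mptcp.org kernel).
 * on upstream linux >= 5.6 an application opts in explicitly by creating
 * the socket with IPPROTO_MPTCP instead of 0. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_in srv = { .sin_family = AF_INET,
                               .sin_port   = htons(8080) };      /* placeholder port   */
    inet_pton(AF_INET, "192.0.2.10", &srv.sin_addr);              /* placeholder server */

    int fd = socket(AF_INET, SOCK_STREAM, 0);                     /* plain tcp socket   */
    if (fd < 0 || connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0) {
        perror("connect");
        return 1;
    }
    const char *msg = "hello over (possibly) multiple subflows\n";
    send(fd, msg, strlen(msg), 0);
    close(fd);
    return 0;
}

the same binary runs unchanged whether or not the kernel negotiates mp_capable with the server; the handshake described above happens entirely below the socket api.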
with the schedulers and congestion control mechanisms currently available in gnu/linux systems, mptcp schedules packets to the subflows without any user or application intervention, purely based on the information available through the regular congestion control implementation at the transport layer for each subflow. on the other hand, frömmgen et al. and others have shown the limitations of the current mptcp schedulers and suggested an application aware mptcp scheduler, which ultimately gives mptcp more benefits than throughput optimization alone [16]. the study further mentions that applications might have different preferences or requirements, not only throughput. for such cases, they have introduced a programming model which can be used to create and deploy application and preference aware schedulers for mptcp. however, in our study, we argue that, as the primary stakeholder of the system, the user of the particular device, rather than the application, should hold the ability to make the decision about path selection and scheduling, which we consider the user policy. most users of such devices are not technically capable of creating their own schedulers or of configuring the pluggable schedulers by using sysctl commands. therefore, selecting or configuring the user policy has to be made more user friendly for the end users of the devices, for example, a user interface with a toggle to prioritize the subflow which is connected to a particular service provider that may provide higher bandwidth or lower charges. therefore, we believe that it is prudent to base the decision on prioritizing or selecting a communication path via a particular interface on a user policy and on the information that the transport layer can discover/learn. as mentioned earlier, the user policy may be a composition of different factors such as risk, opportunistic monetary cost, throughput, and some other affinity towards a particular path. further, these factors may be computed or taken directly as input from the user because, in any information system, the user plays a vital role as a stakeholder. however, the existing tcp/ip specification does not have provisions to capture such information or to propagate it to the transport layer. further, making such an enhancement to the protocol without compromising existing systems and applications would be a challenging task. in one of the early works, wijesekera et al. suggested comonet as a user transparent way of switching a conventional call between a costly mobile provider and a free community-driven ad-hoc network [3]. the switching between connections occurs at the application layer. however, mptcp provides a less costly way to switch at the kernel level. the use of heterogeneous paths (subflows) with mptcp has led researchers to explore finding the optimal path based on its network performance. thus, ample research has been conducted on finding the optimal path based on path attributes [12][14]. further, a number of studies have been carried out on application/context aware path selection [11][13]. despite these findings, we believe that the user's preference is also a vital factor in deciding the path, as the user possesses additional knowledge, such as cost-effectiveness and reliability, which is unavailable to and not considered in current mptcp path selection.
b. mptcp scheduler
the selection of a subflow and the segmentation/integration of data into/from multiple subflows is done by the mptcp scheduler (fig. 3). the availability of the subflows for selection is provided by the path management component of mptcp. the mptcp scheduler considers a number of factors, such as the tcp subflow's state (active or not), the congestion window of the subflow, and the round trip time (rtt) estimation, in making the subflow selection decision. a combination of these factors is considered a subflow scheduling policy. currently, there are four such scheduler policies in the gnu/linux mptcp implementation, namely mptcp blest, mptcp round-robin, mptcp redundant, and the default mptcp scheduler [4]. however, the scheduling policy in gnu/linux can easily be changed to cater to different requirements [15] (a configuration sketch is given below).

fig. 3 mptcp scheduler

according to the study of frömmgen et al., applications may have different bandwidth requirements and some applications are very sensitive to latency. in such cases the packets have to be scheduled based on the requirements of the application. therefore, they have highlighted the need for intelligent mptcp schedulers which can cater to the needs of these applications. with their implementation, they have provided a python library for user space applications which hides all the complexities of network sockets and scheduling. therefore, the application developer can load a scheduling specification for the application by using the programming model provided. the implementation has several layers of functionality which are used to optimize the compilation and interpretation of the scheduler. finally, they have a runtime environment in the gnu/linux kernel to execute the application defined scheduler. the real argument to make at this point is: what is the most suitable position at which to make the subflow selection decision? is it at the mptcp level, the application level, or the user level? in the original mptcp, the subflow is selected based on factors like the tcp subflow's state and the congestion window of the subflow, as mentioned earlier. mptcp has most of the information necessary to make the scheduling decision. in the mechanism proposed by frömmgen et al., the decision making authority for scheduling has been given to the application. the third option is the user level. in this case, the user can consider external factors when making the subflow decision, such as the cost of the connection on each subflow, privacy issues, and the requirements of the user. when considering these three different levels, there are pros and cons to discuss. when subflow selection and scheduling are performed at the mptcp level, the user and the application are completely blind to the path selection and scheduling process. the user and the application do not even have any clue whether the communication is using mptcp or traditional tcp. all the hard work related to scheduling is handled by mptcp. this is an advantage for the user as well as for the application, in that neither the application nor the user needs to worry about selecting the best path for scheduling the packets. but the question is: does the user or the application need to interfere with the scheduling process?
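the configuration sketch below illustrates how the pluggable scheduling policy mentioned above is switched in practice. it is a hedged example, not part of the mechanism proposed in this paper: it assumes the sysctl entries exposed by the out-of-tree mptcp.org kernel (net.mptcp.mptcp_enabled and net.mptcp.mptcp_scheduler) and needs root privileges; the equivalent sysctl command-line invocations would do the same.

/* sketch: switch the system-wide mptcp scheduler on an mptcp.org kernel.
 * assumes the sysctls net.mptcp.mptcp_enabled and net.mptcp.mptcp_scheduler
 * exposed by that out-of-tree kernel; must be run as root. */
#include <stdio.h>

static int write_proc(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fprintf(f, "%s\n", value);
    return fclose(f);
}

int main(void)
{
    /* make sure mptcp is enabled, then pick one of the pluggable
     * schedulers mentioned in the text (e.g. default, roundrobin, redundant). */
    if (write_proc("/proc/sys/net/mptcp/mptcp_enabled", "1") != 0)
        return 1;
    return write_proc("/proc/sys/net/mptcp/mptcp_scheduler", "roundrobin") != 0;
}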
in some cases, the user might have requirements to allow or restrict sending packets via specific connections due to monetary cost or even privacy factors. in this study, we incorporate a new policy to obtain the user's preference and assign different priorities to mptcp subflows.

c. mptcp recovery from packet losses
in tcp, there are three mechanisms available to recover from packet loss: retransmission timeouts (rto), fast recovery (fr), and tcp loss probe (tlp) [5]. mptcp also inherits these mechanisms. if mptcp handles the recovery, the lost packets are retransmitted through an available subflow according to the scheduler's policy at that particular time. if tcp handles the recovery, packets are retransmitted through the same subflow as before. the gnu/linux implementation of mptcp uses heuristics to decide whether the retransmission is fr based or rto based. in fr, the segments use the same subflow as the previous communication, whereas in rto, the scheduler re-evaluates the packet transmission and may use a different subflow for recovery [5]. an ad-hoc network is composed of nodes that are mobile and sparsely connected to create a communication network. therefore, ad-hoc networks are intrinsically dynamic in their routing. as a result, they have significant delays in packet transmission, which could result in frequent retransmissions [2]. therefore, employing an ad-hoc network as a subflow of mptcp would make mptcp select more stable alternative subflows, such as a flow over a stable fixed link [4]. the gnu/linux kernel supports both ad-hoc networking and mptcp. therefore, employing an ad-hoc network as an mptcp subflow in the gnu/linux environment gives us a viable experimental setup to study how user preferences can be passed to the tcp layer to push mptcp to select paths that it would otherwise abandon. the user preference is passed to the kernel as a soft directive, a hint. this testbed, rather than a simulation, allows us to explore the issue in a realistic environment. to this end, we show the viability of passing user hints to the kernel to focus on a subflow over the other alternatives. we contribute the following:
1. we developed a method to influence the path selection and scheduling of mptcp with the preference/requirement of the end user.
2. we show that the proposed modifications are practical and have negligible overhead.
3. we show that the proposed work can pave the way to interesting security research.

iii. design
as mentioned before, the goal of this research is to prioritize the flow of network packets through the ad-hoc connection and to obtain the user preference for that process. fig. 4 illustrates a high level view of the experimental setup. one connection was created through an isp, whereas the other connection was established through an ad-hoc network. the application layer is kept unaware of mptcp so that existing applications remain usable despite the changes at the lower layers of the network stack. the user preference is set to the ad-hoc network in order to prioritize traffic through the ad-hoc network. a new scheduling mechanism is introduced to honor the user preference in selecting mptcp subflows. in order to achieve these goals, we have identified three main tasks. they are:
• passing user hints to the kernel.
• prioritizing the ad-hoc subflow.
• implementing alternative loss recovery.

fig. 4 design
a. passing user hints to the kernel
there are several mechanisms to pass user hints from user space to kernel space, such as using netlink sockets [7] and using extra fields like sin_zero in the socket structure [8]. however, it is prudent to pass the hint from the user to the kernel without modifying user applications. gnu/linux has an abstraction layer providing an interface to the kernel data structures via a pseudo file system called "/proc". therefore, in order to convey the hint from user space to kernel space without modifying an application, we use the proc file system [9]. we pass the internet protocol (ip) address of the subflow we prefer to prioritize via the proc filesystem, so that the mptcp scheduler can act accordingly. since we use an ad-hoc network as a possible path of communication, we keep the availability of an ad-hoc network in a new variable (is_adhoc_avail) which is stored in the mptcp control buffer (mptcp_cb). in the mptcp architecture, the mptcp control buffer is visible to all subflows. we use the mptcp controller to set the is_adhoc_avail variable upon detecting an ad-hoc network. the mptcp scheduler is modified to refer to this variable to check the availability of ad-hoc networks when selecting subflows (see fig. 5).

fig. 5 modified mptcp scheduler

b. prioritizing ad-hoc subflow
the default scheduler policy is to use the subflow with the minimum latency and the lowest number of scheduled packets. therefore, given the nature of ad-hoc networks, there is very little chance of an ad-hoc network being scheduled by the default mptcp scheduler. therefore, we propose a derived version of the mptcp scheduler, illustrated in fig. 3, henceforth referred to as the scheduler. the scheduler initially checks whether any ad-hoc networks are connected to one of the network interfaces by referring to the is_adhoc_avail variable set by the mptcp controller. even though the ad-hoc network is available, it may be in an error state. a subflow can become unusable for various reasons such as a higher rtt, a higher error rate, or complete loss of the channel. such subflows are put into an error state using the adhoc_priority flag in the mptcp control buffer to make sure that the scheduler does not use that particular subflow. we then keep track of the erroneous ad-hoc network subflow using a socket flag corresponding to the particular subflow [6]. the erroneous subflows related to ad-hoc networks are separately probed at regular intervals using a retransmission packet to check whether they can be reactivated. an acknowledgement received in response to a retransmission triggers reactivation, reversing the value of the adhoc_priority flag to make the subflow active. thus, the subflow needs to be checked for usability by referring to the value of adhoc_priority in the mptcp control buffer before scheduling. in this implementation, as shown in equation (1), we use the deactivation threshold, δ, to deactivate the prioritized subflow:

δ = 2 * min(srtt)    (1)

where srtt is the smoothed rtt and min(srtt) is the minimum of the srtts across all the subflows. the prioritized subflow is deactivated if its srtt is larger than the deactivation threshold. note that this threshold is an arbitrary value used for our experiments and it can be set by the user.
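as a concrete illustration of the "/proc write" mechanism described in section iii.a, the sketch below is a small user-space helper that passes the preferred subflow's ip address to the kernel. the paper does not name the actual proc entry it uses, so the path /proc/net/mptcp_user_hint and the helper itself are purely hypothetical.

/* hypothetical user-space helper: pass the ipv4 address of the preferred
 * (e.g. ad-hoc) subflow to the kernel through a proc entry.  the proc path
 * below is an assumption for illustration only; the paper states that the
 * proc filesystem is used but not the entry's name. */
#include <arpa/inet.h>
#include <stdio.h>

#define HINT_PROC_PATH "/proc/net/mptcp_user_hint"   /* hypothetical entry */

int main(int argc, char **argv)
{
    struct in_addr addr;

    if (argc != 2 || inet_pton(AF_INET, argv[1], &addr) != 1) {
        fprintf(stderr, "usage: %s <ipv4-address-of-preferred-subflow>\n", argv[0]);
        return 1;
    }

    FILE *f = fopen(HINT_PROC_PATH, "w");
    if (!f) { perror(HINT_PROC_PATH); return 1; }

    /* the kernel side (mptcp controller/scheduler) would parse this value
     * and mark the matching subflow as the user-preferred one. */
    fprintf(f, "%s\n", argv[1]);
    fclose(f);
    return 0;
}

a settings toggle or small user-level tool could invoke such a helper with, for example, the ad-hoc interface's address, so that no application has to be modified.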
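the extra checks that section iii.b adds around the default scheduling decision can be summarised in the following sketch. this is an illustrative rendering in plain c, not the authors' kernel patch: the structures below are simplified stand-ins for the kernel's mptcp control buffer and subflow state, keeping only the fields named in the text (is_adhoc_avail, adhoc_priority, srtt) and the deactivation threshold of equation (1).

/* illustrative sketch (not the actual kernel code) of the checks the
 * modified scheduler applies before preferring the ad-hoc subflow. */
#include <stdbool.h>
#include <stdint.h>

struct subflow {
    uint32_t srtt_us;        /* smoothed rtt of this subflow (microseconds) */
    bool     is_adhoc;       /* subflow runs over the ad-hoc interface      */
    bool     adhoc_priority; /* false => subflow currently marked erroneous */
};

struct mptcp_cb {            /* simplified mptcp control buffer             */
    bool is_adhoc_avail;     /* set by the mptcp controller                 */
};

/* deactivation threshold from equation (1): delta = 2 * min(srtt). */
uint32_t deactivation_threshold(const struct subflow *sf, int n)
{
    uint32_t min_srtt = UINT32_MAX;
    for (int i = 0; i < n; i++)
        if (sf[i].srtt_us < min_srtt)
            min_srtt = sf[i].srtt_us;
    return 2 * min_srtt;
}

/* return true if the ad-hoc subflow sf may be preferred by the scheduler. */
bool adhoc_subflow_usable(const struct mptcp_cb *cb, const struct subflow *sf,
                          const struct subflow *all, int n)
{
    if (!cb->is_adhoc_avail || !sf->is_adhoc)
        return false;                      /* no usable ad-hoc path at all  */
    if (!sf->adhoc_priority)
        return false;                      /* marked erroneous; wait for a  */
                                           /* successful reactivation probe */
    /* deactivate if the subflow's srtt exceeds delta = 2 * min(srtt).      */
    return sf->srtt_us <= deactivation_threshold(all, n);
}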
by design, tcp retransmission intervals are incremented exponentially over time, thus making the reactivation time of a particular path increase exponentially. therefore, a connection with regular interruptions, such as an ad-hoc network, cannot be used effectively with the existing tcp design. hence, we propose a redesigned tcp retransmission strategy incorporating the user preference as a hint for mptcp to reactivate a subflow, thus making a subflow attached to an ad-hoc network considered for scheduling. considering the deficient reliability of ad-hoc networks, tcp connection initiation is not scheduled through the ad-hoc network. therefore, the subflow corresponding to the ad-hoc interface is only scheduled after confirming that the initial communication has happened through one of the interfaces other than the ad-hoc network. a subflow complying with the stated conditions gets scheduled by the scheduler with packets to be sent to an mptcp aware destination.

c. implementing alternative loss recovery
ad-hoc networks are volatile. therefore, the standard loss recovery mechanism used by mptcp has to be altered according to the behaviour of ad-hoc networks. in the standard mptcp loss recovery, the retransmission timeout doubles for each unsuccessful retransmission. in our implementation, we first check whether the retransmission is happening through an ad-hoc socket. if it is an ad-hoc socket, then the retransmission timeout doubles once and is kept constant for a defined number of retransmissions. this number is taken as part of the hint set by the user, reflecting a weightage for using the ad-hoc network. for our evaluations, we have taken the maximum weightage. after that, it switches back to doubling the retransmission timeout as in the standard mptcp. with this, we check the availability or recovery of ad-hoc connections more frequently compared to standard mptcp. the main objective of this more persistent retrying is to make sure that priority is given to the ad-hoc network. we believe that this will result in longer waiting, but we hypothesize that there will be cases where the longer waiting is a worthy compromise over switching to an unpreferred subflow.
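the alternative backoff just described can be summarised in a short sketch. this is an illustrative rendering, not the authors' kernel code; the retry budget hold_count below stands for the user-supplied weightage, whose concrete encoding is not given in the paper.

/* illustrative sketch of the alternative loss recovery: for an ad-hoc
 * subflow the rto is doubled once and then held constant for a
 * user-weighted number of retries before falling back to the standard
 * exponential doubling. */
#include <stdbool.h>
#include <stdint.h>

/* compute the next retransmission timeout for a subflow.
 *   rto        : current rto (e.g. in milliseconds)
 *   retry      : number of retransmissions that have already failed (0-based)
 *   is_adhoc   : retransmission goes through the ad-hoc subflow
 *   hold_count : user weightage = number of retries for which the rto is
 *                held constant (the maximum weightage was used in the paper) */
uint32_t next_rto(uint32_t rto, unsigned retry, bool is_adhoc,
                  unsigned hold_count)
{
    if (!is_adhoc)
        return rto * 2;          /* standard tcp/mptcp behaviour            */

    if (retry == 0)
        return rto * 2;          /* double once ...                         */
    if (retry <= hold_count)
        return rto;              /* ... then hold constant so the ad-hoc    */
                                 /* path is probed more persistently        */
    return rto * 2;              /* afterwards, back to exponential backoff */
}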
iv. evaluation
as proposed in section iii, we modified the standard mptcp in order to pass user hints to the tcp scheduler. to evaluate the correctness and the effectiveness of the modified protocol, we carried out several experiments in a virtual environment. each host in the virtual environment was configured to have two wifi network interfaces, such that one interface was configured in infrastructure mode and the other was configured in ad-hoc mode. further, the communication links were restricted to a maximum bandwidth of 1 mb/s by using vmware [10] to emulate the behavior of physical networks. the rest of this section presents the experiments.

a. overhead of the new mptcp kernel stack
the objective of our first experiment was to investigate whether standard mptcp operation is unduly affected by the changes introduced into the mptcp kernel stack. thereby, the behavior of the data flow was examined using the data rate as a metric, and the modified mptcp was compared against the standard mptcp. to eliminate the bias of latency in one interface over the other, both interfaces were configured with the same latency. as we see in fig. 6 and fig. 7, both the standard and the modified mptcp reached stable data rates of up to 500 kb/s. achieving the same stable level of data rate shows that the presented modifications do not incur significant overheads.

fig. 6 data rates on both the interfaces of modified mptcp with same latency

fig. 7 data rates on both the interfaces of standard mptcp with same latency

to explore the rate of data flow when using the modified mptcp with the user controlled subflow selection, we first prioritized the network interface that was configured in ad-hoc mode by providing a user hint. then we initiated the connection using the prioritized interface and observed the data rate over time. thereafter, we explored the rate of data flow separately by initiating another connection using the second interface, which was non-prioritized. fig. 8 and fig. 9 show the graphs for those two experiments respectively. with these two experiments, we discovered that the scheduler scheduled the segments only to the prioritized interface and the data rate dropped to half of the full capacity. the sole reason for the bandwidth reduction is the user control over the subflow selection, where the data was sent only through the prioritized ad-hoc interface.

fig. 8 data rates when creating the initial connection through the prioritized interface

fig. 9 data rates when creating the initial connection through the non-prioritized interface

b. path switching on modified mptcp
since the modified mptcp permits prioritizing subflows, we have to test the path switching ability when the prioritized path fails. typically, the ad-hoc connection is unstable compared to the infrastructure connection in a host with infrastructure based network connectivity and ad-hoc connectivity. ideally, in this experimental setup, even though the ad-hoc connection is prioritized, the host has to switch to the infrastructure based connection when the ad-hoc connection fails. in order to test whether it switches back and forth between the prioritized subflow and the non-prioritized subflow, we conducted an experiment. to simulate breaking the prioritized subflow, we artificially block the prioritized subflow from time to time. the latency of the connections was set to 40 ms using the tc command and the maximum bandwidth was set to 512 kb/s. the server drops all packets from the ad-hoc connection, which is the prioritized connection, for a five second period, with a gap of ten seconds. during those periods, we observed that the path switched to the connection through the infrastructure based network, and once the ad-hoc connection became active again, the connection switched back to the prioritized ad-hoc connectivity. fig. 10 clearly shows that the data rate via the non-prioritized standard interface is minimal while the prioritized ad-hoc connection has a higher data rate, and vice versa. this implies that when the prioritized connection is active, the data flows through it, and when the prioritized connection fails, data starts to flow through the non-prioritized infrastructure based connection.

fig. 10 path switching between prioritized and non-prioritized interfaces

c. use of prioritized subflow with high latency
one of the main objectives of this study is to make mptcp tolerate high latency in ad-hoc networks based on user preference. in this experiment, the prioritized subflow, which is the ad-hoc connection, was configured with a high latency (< deactivation threshold) compared to the non-prioritized subflow.
fig. 11 shows that the prioritized subflow with high latency is still used to transfer the data. for this evaluation, a latency of 40 ms was used on the non-prioritized subflow and 60 ms on the prioritized subflow.

fig. 11 with high latency (< deactivation threshold)

d. switching in the face of higher latency
in this experiment, 40 ms was used as the latency of the non-prioritized subflow and 200 ms as the latency of the prioritized subflow. as shown in fig. 12, when the latency difference is high, the scheduler falls back to its default behaviour and more data is transferred through the interface with the lower latency.

fig. 12 throughput between prioritized and non-prioritized interfaces when the latency difference is high

e. prioritized subflow recovery speed evaluation
this experiment was conducted to observe the path recovery speed of the modified mptcp compared with the standard mptcp. a latency of 40 ms was used on all the subflows. in the standard mptcp connection, one subflow was dropped and the time taken to recover that particular path was measured. the same method was used with the modified mptcp, and the time taken to recover the dropped subflow was measured. it took around 90 ms to recover the path in the modified mptcp and around 140 ms in the standard mptcp. therefore, the modified mptcp shows a significant improvement over the standard mptcp by recovering the path around 50 ms earlier.

v. discussion
the main objective of this study is to explore the viability of influencing the mptcp scheduler in the kernel on the selection of a subflow for mptcp. we show that (a) it is practically feasible to incorporate user preference when selecting subflows and scheduling paths, and (b) the proposed modifications incur minimal overhead, making the proposed solution a practical alternative.

a. conditional priority
the paper presents a use case where a user can instruct the kernel to use the ad-hoc network over infrastructure mode. the hint creates a priority for a given subflow while giving other alternatives a chance, i.e., the kernel will try to use the preferred subflow as much as possible while switching to another subflow as soon as the preferred subflow is no longer viable. in a future setup, one could argue that users could potentially submit conditions to set up the priorities. in such a setup, whichever flow fulfills the condition would be used to send the packets. we believe the proposed research will pave the way for more complicated priority setups or directions in the future.

b. application-level decision making
technically speaking, priority setup can be achieved at any layer of the tcp/ip stack. lower in the stack there is less flexibility for users but more abstraction for those who want less configuration headache, while higher up in the stack there is more user control but potentially additional user involvement, which could be a distraction for some. however, it is important to discuss the pros and cons of each approach. for an application to participate in the subflow selection, the application should have access to a number of parameters, such as the state of the interface (active or not), the congestion window of the subflow, and the rtt estimation, which are not readily available at the application layer.
further, making such information available to the application layer would widen the threat surface as well, by putting the application literally in control of the transport layer of the device. given recent events in which applications were found rerouting traffic and moderating content for surveillance purposes, it is better not to give applications such control [22].

c. user space decision making
with the design we proposed in this study, mptcp has the relevant and standard information to select the most suitable subflow for a particular situation to schedule the communication. user hints and priorities (the policy) fed to the scheduler act as additional information to make more effective decisions catering to the specific user needs, i.e., the system gets qualitative and subjective insights, such as opportunistic costs and privacy concerns, in a quantitative form to make more accurate and effective decisions, which are otherwise not available to the mptcp scheduler/kernel. the user space decision making or priority setting can be taken one level further than the proposed research. one could also design it in a manner that lets users specify priorities on a per application basis. to demonstrate the effectiveness of the mechanism we used, we tested our system between infrastructure mode and ad-hoc mode wifi connections. if vanilla mptcp is allowed to take path decisions, it prefers the infrastructure network since it is less volatile than the ad-hoc network. however, from the point of view of the user, the ad-hoc network is less costly than the isp based infrastructure network and should be used whenever possible. hence, we showed that we can influence the kernel to use the ad-hoc connection over the infrastructure mode.

d. privacy and security concerns
mptcp provides useful functionality from a congestion and flow control perspective. however, we believe that this functionality has exciting security and privacy implications that can benefit users by safeguarding them from data-hungry malicious actors. in the era of tight cyber-surveillance and control of internet access, mptcp can be a solution where users can instruct the kernel for packet-level, finer-grained data routing over preferred network connections. users with restrictions over internet access can instruct the kernel to send certain portions of network communication over the subflow with less scrutiny and control. such fine-grained routing can be completely transparent to the user-level application, but more work needs to be done on session management at the kernel. the subflow management can also be a privacy win for users. using different subflows prevents isp or state-level actors from having full visibility into the communication at a given moment. that will reduce the possibility of network traffic reconstruction or data leakage from a given entity. the lack of full visibility will also let the user have more control over who can see which data in their network traffic. one interesting future avenue would be to understand merging tor and mptcp [23, 24]. prior work has already looked into this, but we believe it is interesting to understand how different decision making layers affect consumer privacy and security expectations when merging with tor.
these exciting avenues are possible only if the user can pass on their preferences and priorities to the kernel and transparently to the application. thus, we believe this will open up exciting research, along with security and privacy. however, the current widespread tcp adoption is due to its simplicity and stability from the design of tcp to its implementations. while mptcp provides much-needed functionality for security, cost reduction, etc., this could eventually challenge the core simplicity. this reason might further explain the conservative adoption of mptcp in the wild. while current complicated user needs will call for more tcp functionality, we should be cautious not to harm the robust core of tcp. vi. conclusion mptcp has introduced dynamism to the traditional tcp by permitting to establish multiple paths through different network interfaces in a host. mptcp scheduler is responsible for segmentation/integration of data into/from multiple subflows and making the decision on selecting a subflow. however, we believe that the mptcp scheduler does not possess or cannot discover all the metrics required to make such a decision independently. in addition to that, though mptcp increases the reliability and the throughput of the connection by providing redundant paths, the user has no control over path selection for the data flow. considering all these limitations, in this study, we proposed an approach to take user preferences into account when selecting subflows in mptcp. we modified the mptcp kernel stack and carried out several experiments to investigate the viability of this mechanism. the results of the experiments exemplified that our proposed mechanism can take user preferences into account when selecting mptcp subflows. also, we demonstrated that it is possible to pass user preferences as a hint to the kernel without changing the applications. a hint is not a hard rule, rather a soft directive to the mptcp scheduler to follow. in such a context, applications are unaware of the presence of mptcp or the use of user-supplied hints. further, experiments showed that our changes to the mptcp scheduler did not introduce extra overhead to the regular mptcp operation. finally, we used a test environment with a handicapped (in terms of latency and stability), but user-preferred, subflow to demonstrate that it is possible to cater to user preferences even in such an extreme environment. references [1] z. khan, h. ahmadi, e. hossain, m. coupechoux, l. a. dasilva, and j. j. lehtomäki, “carrier aggregation/channel bonding in next generation cellular networks: methods and challenges,” ieee network, vol. 28, no. 6, pp. 34–40, 2014. [2] "rfc 793 transmission control protocol", tools.ietf.org, 2020. [online]. available: https://tools.ietf.org/html/rfc793. [accessed: 19 jan2020]. [3] p. wijesekera and c. keppitiyagama, "comonet: community mobile network", arxiv preprint arxiv:2009.05966. 2020. [4] “rfc 6824 tcp extensions for multipath operation with multiple ad dresses.” https://tools.ietf.org/html/rfc6824. [accessed on 03/20/2019]. [5] m. handley, c. raiciu, a. ford, j. iyengar, and s. barre, “[5]"rfc 6182 architectural guidelines for multipath tcp development", tools.ietf.org, 2020. [online]. available: https://tools.ietf.org/html/rfc6182. [accessed: 19feb2020]. [6] m. lima, n. fonseca, and j. de rezende, “on the performance of tcp loss recovery mechanisms,” pp. 1812 – 1816 vol.3, 06 2003. [7] p. neira-ayuso, r. gasca and l. 
lefevre, "communicating between the kernel and user-space in linux using netlink sockets", software: practice and experience, p. n/a-n/a, 2010. available: 10.1002/spe.981 [accessed 19 february 2020 [8] t. wijethilake, k. gunawardana, c. keppitiyagama and k. de zoyza, "an alternative approach to authenticate subflows of multipath transmission control protocol using an application level key", in 13th international research conference, general sir john kotelawala defence university, 2020. [9] t. bowden, b. bauer, j. nerin and s. feng, "the /proc filesystem", [online] available: https://www.kernel.org/doc/documentation/filesystems/proc.txt [accessed: 19feb2020] [10] "vmware inc., "using vmware workstation pro", vmware, inc., 2019. [online]. available: https://docs.vmware.com/en/vmwareworkstation-pro/. [accessed: 19feb2020]. [11] r. withnell and c. edwards, “towards a context aware multipathtcp,” 2015 ieee 40th conference on local computer networks (lcn), 2015. [12] s. h. baidya and r. prakash, “improving the performance of multipath tcp over heterogeneous paths using slow path adaptation,” 2014 ieee international conference on communications (icc), 2014. [13] a. elgabli and v. aggarwal, “smartstreamer: preference-aware multipath video streaming over mptcp,” ieee transactions on vehicular technology, vol. 68, no. 7, pp. 6975–6984, 2019. [14] y.-s. lim, e. m. nahum, d. towsley, and r. j. gibbens, “ecf: an mptcp path scheduler to manage heterogeneous paths,” proceedings of the 13th international conference on emerging networking experiments and technologies, 2017. [15] c. paasch, s. ferlin, o. alay, and o. bonaventure, “experimental evaluation of multipath tcp schedulers,” proceedings of the 2014 acm sigcomm workshop on capacity sharing workshop csws 14, 2014. [16] a. frömmgen, a. rizk, t. erbshäußer, m. weller, b. koldehofe, a. buchmann, and r. steinmetz, “a programming model for application-defined multipath tcp scheduling,” proceedings of the 18th acm/ifip/usenix middleware conference, 2017.] [17] o. bonaventure, "observing siri : the three-way handshake", multipath-tcp.org, 2014. [online]. available: http://blog.multipath tcp.org/blog/html/2014/02/24/observing_siri.html. [accessed: 22 feb-2020]. [18] "the first multipath tcp enabled smartphones — mptcp", blog.multipath-tcp.org, 2021. [online]. available: http://blog.multipathtcp.org/blog/html/2018/12/10/the_first_multipath_tcp_enabled_smart phones.html. [accessed: 08mar2020]. [19] "apple developer documentation", developer.apple.com, 2020. [online]. available: https://developer.apple.com/documentation/foundation/urlsessioncon figuration/improving_network_reliability_using_multipath_tcp. [accessed: 08mar2020]. user-controlled subflow selection in mptcp: a case study 10 international journal on advances in ict for emerging regions march 2021 [20] "an evaluation of multi-path transmission control protocol (m/tcp) with robust acknowledgement schemes", ieice transactions on communications, vol. 87-, no. 9, pp. pp.26992707, 2004. available: https://search.ieice.org/bin/summary.php?id=e87-b_9_2699. [accessed 10 january 2021]. [21] s. barré, c. paasch and o. bonaventure, "multipath tcp: from theory to practice", in networking 2011 10th international ifip tc 6 networking conference, valencia, 2011. [22] d. harwell and e. nakashima, "federal prosecutors accuse zoom executive of working with chinese government to surveil users and suppress video calls", washington post, 2020. [online]. 
available: https://www.washingtonpost.com/technology/2020/12/18/zoomhelped-china-surveillance/. [accessed: 27dec2020]. [23] w. de la cadena, d. kaiser, a. panchenko and t. engel, "out-ofthe-box multipath tcp as a tor transport protocol: performance and privacy implications," 2020 ieee 19th international symposium on network computing and applications (nca), cambridge, ma, usa, 2020, pp. 1-6, doi: 10.1109/nca51143.2020.9306702. [24] "annymous/oniontinc", github, 2020. [online]. available: https://github.com/annymous/oniontinc. [accessed: 12dec2020]. [25] k. liyanage, t. wijethilake, k. gunawardana, k. thilakarathne, p. wijesekera and c. keppitiyagama, "priority based subflow selection in mptcp: a case study", in 2020 20th international conference on advances in ict for emerging regions (icter), colombo, sri lanka, 2020. ieee paper template in a4 (v1) international journal on advances in ict for emerging regions 2021 14(2): march 2021 international journal on advances in ict for emerging regions cluster identification in metagenomics – a novel technique of dimensionality reduction through autoencoders kalana wijegunarathna#1*,uditha maduranga2*, sadeep weerasinghe3*, indika perera4*, and anuradha wickaramarachchi5 ϯ abstract— analysis of metagenomic data is not only challenging because they are acquired from a sample in their natural habitats but also because of the high volume and high dimensionality. the fact that no prior lab based cultivation is carried out in metagenomics makes the inference on the presence of numerous microorganisms all the more challenging, accentuating the need for an informative visualization of this data. in a successful visualization, the congruent reads of the sequences should appear in clusters depending on the diversity and taxonomy of the microorganisms in the sequenced sample. the metagenomic data represented by their oligonucleotide frequency vectors is inherently high dimensional and therefore impossible to visualize as is. this raises the need for a dimensionality reduction technique to convert these higher dimensional sequence data into lower dimensional data for visualization purposes. in this process, preservation of the genomic characteristics must be given highest priority. currently, for dimensionality reduction purposes in metagenomics, principal component analysis (pca) which is a linear technique and t-distributed stochastic neighbor embedding (t-sne), a non-linear technique, are widely used. albeit their wide use, these techniques are not exceptionally suited to the domain of metagenomics with certain shortcomings and weaknesses. our research explores the possibility of using autoencoders, a deep learning technique, that has the potential to overcome the prevailing impediments of the existing dimensionality reduction techniques eventually leading to richer visualizations. keywords— metagenomic data visualizations, nonlinear dimensionality reduction, autoencoders, clustering i. introduction he field of metagenomics has shown popular interest among bioinformatics and computer science researchers in the recent years. it has opened up new pathways in many areas including population-level genomic diversity of the microbial organisms. metagenomics [1] was coined, with the idea of performing analysis on similar in certain criteria, yet non identical, microorganisms which are extracted from diverse environmental samples or from the natural habitats in order to study the structure and functions of microorganisms. 
an earliest-known method for studying metagenomic dna is the abundance of guanine-cytosine (%gc) content. %gc varying widely between species but remaining relatively constant within the species is proven. acquiring oligonucleotide frequencies of the microbial organisms is a widely used method that identifies the nucleotide composition with much better accuracy and effectiveness, compared to %gc [2]. contemporary studies have shown that the oligonucleotide frequencies as they appear in genomic sequences is unique for a given microorganism. research on this which runs back to 1960s, showcase the fact that oligonucleotide frequencies having species-specific signatures [3]. because of this, an array of all oligonucleotide frequencies for a given length provides genomic signatures for microorganisms. oligonucleotide frequencies can be represented as vectors in high dimensional euclidean space. visualization of metagenomic data, without prior taxonomic references using sequence fragments can use frequency vectors to be used as genomic signatures. an ideal visualization must be capable of capturing the authentic characteristics of the microorganisms in the sample, given a set of metagenomic sequence fragments, and display the alike species separated from the rest. consequently, the visuals must be capable to display the taxonomic structure which is inherited by the original sequence data. being self-explanatory and ability to carry out further analysis are few of the other characteristics that are expected of visualizations. visualization of metagenomic data is broadly twofold. the visualization of a single metagenome and the visualization of multiple metagenomes. while visualization of multiple metagenomes gives insight into the nature of the same or different types of species found in the two different environments and helps researchers gain insightful information on the environments, the study of single metagenomes focuses on the species richness and diversity in the particular environmental sample. despite the high dimensionality and other challenges, metagenomic data can be visualized in various techniques as described by sudarikov et al [4]. conversely, taxonomic classification of metagenomic data can be broadly categorized into four categories. sequence similarity based classification employs a database search on a database of reference sequences. this method has been successfully used in the identification of reads of length as short as 80 base pairs where most other methods have failed [5]. this method is usually slower and uses basic local alignment search tool (blast) [6] to identify similarities. classification based on sequence composition is another method of taxonomic classification. one way of doing this is by using nucleotide composition of the reads. the nucleotide composition of the reads are compared with the models built using the composition of reference genomes and the model that fits the composition of the reads best are chosen. the t correspondence: k. wijethunga#1(e-mail: kalanainduwara.16@cse.mrt.ac.lk) received: 20-12-2020 revised: 25-02-2021 accepted: 14-03-2021 this paper is an extended version of the paper “dimensionality reduction for cluster identification in metagenomics using auto encoders” presented at the icter 2020 conference. k. wijegunarathna#1*,u. maduranga2*, s. weerasinghe3*, i. perera4* are from the department of computer science & engineering, university of moratuwa, sri lanka. 
{kalanainduwara.16, maduranga.16, sadeep.16, indika}@cse.mrt.ac.lk anuradha wickramarachchi5ϯ is from the australian national university, australia. (anuradha.wickramarachchi@anu.edu.au) doi: http://doi.org/10.4038/icter.v14i2.7224 © 2021 international journal on advances in ict for emerging regions cluster identification in metagenomics – a novel technique of dimensionality reduction through autoencoders 2 international journal on advances in ict for emerging regions march 2021 main drawback of this method is that it fails to classify reads shorter than 1000 base pairs with reasonable accuracy [7]. the third classification method is a hybrid of the two previous methods that combines both approaches of read similarity and nucleotide composition. it is possible to increase the accuracy by taking an aggregation of the two matches for a better result. in this case the score from the read-sequence similarity and the score from the read composition and reference genome model are combined. alternatively, it is possible to narrow down the database using composition matching as the initial step to apply the similarity search on the filtered, reduced database [8]. the last of the four approaches is the use of marker genes. the reads are classified according to the markers they hit. although this method is faster and efficient, it induces a bias towards organisms with larger genomes since they generate a larger number of reads [7]. due to variability in 16s rrna (ribosomal ribonucleic acid) [9] copy number, amplicon sequencing [10] also suffers with bias. a key challenge in bioinformatics as well as metagenomics to date is the visualization of metagenomic fragment data without prior taxonomic identification. usually for the visualization needs, the higher dimensional frequency vectors need to be embedded into lower dimensions (2d or 3d). preservation of the inherited data from the genomic sequences is a deciding factor of the quality and accuracy of the visualizations when reducing into lower dimensions. principal component analysis (pca) [11] and t-distributed stochastic neighbor embedding (t-sne) [12] are two of the most widely used techniques for this purpose. but limitations in these techniques which hinder faithful visualizations prevail. massive volumes of genomic data can be produced in such an efficient manner using the advancements in the field of sequencing and with the introduction of latest sequencing technologies like next-generation sequencing (ngs). it is evident that with the availability of large volumes of genomic data, optimizations and new analyzing techniques are becoming more crucial. cutting-edge technologies that are being used throughout in computer science can be adopted in the field of bioinformatics as well. deep learning techniques are built to handle the rapid rate of generation of data and for the intuitive and rapid analysis. this research aimed at producing better visualizations in metagenomics for cluster identification by adopting an autoencoder based approach of dimensionality reduction. autoencoders can be used in this context to convert higher dimensional metagenomic data into visualizable lowerdimensional data preserving the important characteristics of the original sequences. since it is a deep learning technique it will be useful in a context like genomics. this allows better analysis on data-inherent taxonomic structure, free from alignments. the rest of the paper is organized as follows. 
the related work on dimensionality reduction techniques and research that are carried out on using autoencoders in the genomics domain are discussed under section 2. a broader introduction to autoencoders is given in section 3. dataset, methodology for dimensionality reduction and visualizations are discussed in section 4 followed by the experimental results which are described and demonstrated in section 5 in detail. finally, section 6 concludes the paper giving an overview of the future work. ii. related work a. dimensionality reduction techniques representing data visually can be considered as a major challenge in the field of genomics. processes such as transforming, scaling, normalizing, color-encoding and clustering play major roles in visualizing genomic data. it is important not to hinder users’ ability to interpret data while facilitating the users to carry out their analysis more conveniently and precisely. according to the studies done by rall et al. [13], still number of challenges prevail in visualizing the genomic data. one of the leading challenges will be dealing with the dimensionality of the genomic sequence data. a plethora of research were carried out on lower-dimensional embedding techniques. one of the dominant concerns is dimensionality reduction, in bioinformatics fields, which looks into analyzing sequence data. genomic sequence data consists of extensive amounts of sequence data and features. thus, it is essential to reduce the dimensionality of data to extract useful analysis and visualization by avoiding the curse of dimensionality. transformation of high dimensional data to a lower number of dimensions is a major goal in dimensionality reduction, providing simple interpretations. in the ideal case, dimensionality of reduced representation must have the dimensionality that corresponds to the intrinsic data [14]. one of the related concerns is the dense preservation of information. pca, without a doubt, is the most commonly used linear technique in dimensionality reduction across multiple domains. it converts data in the higher dimension spaces to the lower-dimensional subspace making sure maximized variance in the projected data. making sure that the maximum variance in the projected data also means pca minimizes the squared reconstruction error. one of the leading drawbacks of pca is its restriction with respect to linear transformations. hauskrecht et al. [15] in in their work also displays the restrictions of pca to guarantee highquality features for discriminatory purposes because it is a totally unsupervised technique. stochastic neighbor embedding (sne) [16] introduced by hinton and roweis is a non-linear technique to get lower dimensional representations. it uses a gaussian distribution on each point of data in the higher dimension and defines a probability distribution for its neighbors. this unsupervised dimensionality reduction technique has been commonly used over many years. t-sne [12] is a variant of the sne for nonlinear dimensionality reduction. this method is also used to produce lower dimensional representations of higher dimensions that can be visualized with ease, especially in scatterplots. t-sne preserves the global structures of the data like clusters while capturing the local structure of the higher dimensions. t-sne adopts a gaussian kernel in order to identify the similarities between points in the higher dimensions. 
lower dimensional points are plotted while providing similar probabilities to the points and they are usually configured in such a way that they will reduce the divergence between the two distributions. importantly, t-sne strongly advocates against using gaussians to measure distances in lower dimensions. it will instead opt for the onedimensional t distribution (i.e. the cauchy distribution). thus, it has heavier tails and allows for more spread in the lower dimensional representation than gaussian. nevertheless, a 3 k. wijegunarathna#1*, u. maduranga2*, s. weerasinghe3*, i. perera4*, a wickaramarachchi5ϯ march 2021 international journal on advances in ict for emerging regions significant limitation of sne as well as t-sne is that their computational and memory complexity scale quadratically in the number of data objects n. thus, sne and its variants can only be used with limited number of points [17]. the standard implementation of t-sne has a time complexity of o(n2), where n is the number of genomic fragments. barnes-hut-sne (bh-sne) introduced by laurens van der maaten [17] has a time complexity of o(n log n). therefore, bh-sne is significantly faster compared to t-sne. despite the efficiency, the results produced by bhsne are similar to that of t-sne. sparse similarities between each pair of points were obtained using vantage-point trees while the forces of the corresponding embedding were acquired using an enhanced version of barns-hut algorithm [18]. due to the lower time complexity, bh-sne can process more than a million data points within a reasonable time. a 2009 paper [19] by dick et al describes the use of selforganizing maps [13] (soms) for reducing the dimensionality of tetranucleotide genomic signatures belonging to two acidophilic biofilm communities. the unsupervised learning technique som uses an artificial neural network to generate a two-dimensional representation of the high dimensional data. this technique was used by the researchers to bin tetranucleotide sequence fragments obtained from isolated genomes and metagenomic samples. despite being neural networks based, som does not rely on error-correction learning. instead, som depends on competitive learning to map the inputs to an output while preserving characteristics in the input space. gisbrecht et al. [21], in their research on “nonlinear dimensionality reduction for cluster identification in metagenomic samples” has compared several currently used techniques for dimensionality reduction. the researchers have obtained oligonucleotide frequency vectors from a set of sequences generated by simulating metagenomic nextgeneration sequencing. these vectors have then been fed into dimensionality reduction algorithms. researchers have used the effectiveness of these algorithms at clustering the output to compare the dimensionality reduction techniques. the techniques being compared are pca, gtm (generative topographic mapping) [22] and t-sne. the research demonstrates that t-sne outperforms the other techniques in terms of accuracy. the researchers have also introduced improvements to t-sne to overcome its high complexity. datasets in bioinformatics are typically large. the quadratic time complexity of t-sne does not scale well for the needs of the genomic datasets. this is a shortcoming that needs to be addressed. the use of autoencoders for this purpose has been explored by wang and gu in their paper on dimension reduction and visualization of single-cell rna seq data [23]. b. 
autoencoders in genomics as deep learning became mainstream, we have observed a rise of neural network based techniques that rival the traditional mathematical & probabilistic methods in their corresponding areas. in the case of dimensionality reduction, autoencoders can be named as the candidate, deep learning has to offer. the viability of autoencoders to replace t-sne, and pca in the context of bioinformatics has to be explored. autoencoders, however, have already played a role in the bioinformatics domain in several instances. a 2019 paper by ersalan et al. [24] describes how they managed to denoise single-cell rna sequencing (scrna-seq) datasets using deep count autoencoders (dca). they used an enhanced version of autoencoders although a number of other methods existed to perform this very task. the enhancement was in terms of the specialized loss functions which drive in the direction of denoised scrna-seq data. in addition to that, dca scales linearly with regard to the number of cells overcoming the limitation of limiting the datasets. experiment results suggest that dca has surpassed the existing methods for imputation by means of quality and speed. wang et al. [23] in the research on dimension reduction of single-cell rna seq data, propose the use of variational autoencoders. research experimented the use of variational autoencoders for single-cell rna seq data because the chances of dropouts are higher when dealing with single-cell levels with higher transcriptional fluctuations. this experiment went on to show how to overcome the limitations of pca and zifa (zero inflated factor analysis) [25] by testing on over 20 different datasets. variational autoencoders gain a special edge over both pca and zifa because of its ability to deal with complex non-linear relationships. wang and wang [26] in their research have used variational autoencoders to study two subtypes of lung cancers, lung adenocarcinoma (luad) and lung squamous cell carcinoma (lusc). although the researchers were expected to capture underlying dna methylation patterns of the different original subtypes separately, some lusc samples were classified into a luad group, which may be an indication that some parts of lusc tumor samples may have similar dna methylation expression compared to luad tumor. it was evident that a biologically meaningful latent space can be extracted using variational autoencoders from the data consisting of two or more subtypes of the lung cancers. iii. autoencoders although autoencoders took the center stage in the late 2000s with the introduction of the deep architecture, the original concept goes back as far as 1980s when rumelhart et al. [27] introduced a new learning procedure that recurrently adjusts the weights of the connections of the network in such a way that the disparity between the actual output and the expected output of the network is minimal. autoencoders can be considered as a special type of neural networks, which consists of two symmetric components namely encoder and decoder. encoder maps input to more compressed lower dimensions, in contrast to the decoder which does exactly the opposite. autoencoders are an unsupervised learning technique that leverages reducing the dimensionality of the input vectors efficiently, such that it will preserve the important characteristics of the original data and then reconstructing the original data with minimum loss using compressed representation. 
l(x, x’), which represents the deviation between the original input (x) and the consequent reconstruction (x’), should be minimized to obtain better reconstructions. a general autoencoder (fig. 1) can be expressed using the tuple x, y, φ, ψ, x’, where φ denotes the encoder function and ψ the decoder function of the autoencoder, and x and x’ are the input and output vectors of the autoencoder respectively. fig. 1 structure of a general autoencoder φ : χ → υ (1) ψ : υ → χ (2) φ, ψ = argmin_(φ, ψ) loss(x, x’) (3) φ denotes the function that maps the original data x to a latent space y, which appears at the bottleneck, and ψ does the opposite. deep autoencoders can have multiple hidden layers: when the number of hidden layers is increased, such autoencoders are called deep autoencoders, and the added capacity helps the network reconstruct the input as faithfully as possible. a well trained autoencoder is capable of reconstructing the input with minimum loss. although the reconstruction is done by the decoder, preserving the important characteristics of the input data up to the bottleneck layer plays a major role. autoencoders can be considered a non-linear generalization of pca that converts higher dimensional data into a lower dimensional code using the encoder part [28]. autoencoders are used as a powerful tool for dimensionality reduction [29] and have proven useful in fields with extensive amounts of data. iv. methodology a. dataset for the experiment, we chose a subset of the genomic sequences used by gisbrecht et al. [21] as our dataset. the sequences were obtained from the ncbi (national center for biotechnology information) microbial genomes database (https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/). we first reproduced the pca and t-sne based results to serve as the baseline for the research. the genomic sequences were downloaded in the fasta file format, and complete genomes of 8 microbial species were used. our chosen subset contained species that are taxonomically very close to each other as well as species that are very different from each other; the intention was to explore how well the dimensionality reduction process preserves taxonomic similarities and dissimilarities. b. implementation the dna sequences we obtained must go through a multi-stage process before they are ready to be visualized. this does not mean that significant alterations are made to the underlying genomic features of the sequences. two- or three-dimensional scatter plots are drawn from the input sequences at the end of the following process. • processing raw microbial dna sequences and extracting genomic metrics • converting the higher dimensional genomic metrics into a lower dimensional state • visualizing the lower dimensional state with scatter plots first, we read contiguous blocks from each input dna sequence. the blocks were taken from random locations within the sequence, and the block lengths were chosen so that they form a normal distribution with a mean length of 5000 base pairs and a standard deviation of 1000 base pairs.
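a minimal sketch of this block-sampling step is shown below. it is an illustrative reconstruction rather than the authors' code: the simple fasta reader, the fixed random seed and the example file name are our own assumptions.

```python
import random

def read_fasta(path):
    # concatenate the sequence lines of a fasta file (header lines start with '>')
    parts = []
    with open(path) as fh:
        for line in fh:
            if not line.startswith(">"):
                parts.append(line.strip())
    return "".join(parts).upper()

def sample_blocks(genome, n_blocks=1000, mean_len=5000, sd_len=1000, seed=0):
    # draw contiguous blocks from random positions; block lengths follow n(5000, 1000)
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        length = min(max(1, int(rng.gauss(mean_len, sd_len))), len(genome))
        start = rng.randrange(0, len(genome) - length + 1)
        blocks.append(genome[start:start + length])
    return blocks

# hypothetical usage: blocks = sample_blocks(read_fasta("genome.fna"))
```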
for a given k value, we then counted the number of occurrences of each k-mer within each block. a k-mer and its reverse complement were considered a single k-mer; therefore, we took the sum of the occurrences of the two k-mers in the pair and used that value to represent the k-mer and its reverse complement. this method was proposed by abe et al. [30]. for each block, we created a vector of all such k-mer frequencies, and the resulting vector was then normalized. after following the above process, what we end up with is a set of vectors of equal dimensionality, where the dimensionality depends on k. the high dimensionality of these vectors makes them impossible to visualize as they are. transforming these high dimensional vectors to a lower dimensionality is done by the autoencoder. the output of the autoencoder is a set of lower dimensional vectors, typically of dimensionality less than 4, and these vectors are then used directly for visualization by drawing scatter plots. other techniques can also be used in place of the autoencoder for dimensionality reduction. the autoencoder performs dimensionality reduction in two stages. in the first stage, we feed the high dimensional vectors to train the autoencoder: the input layer of the neural network is provided with the vectors, and the same vectors are expected from the output layer, i.e., we train the network to reproduce its own input. each vector fed into the autoencoder corresponds to one of the blocks we read originally. once training is over, we feed the same set of vectors as input and obtain the values produced at the bottleneck layer; these values correspond to a lower dimensional representation of the input vector. we experimented with various neuron counts for each layer and analyzed the results, and identified that the following configurations worked best for 3-mers and 4-mers respectively: 1. {32, 16, 2, 16, 32} 2. {136, 64, 2, 64, 136} the results obtained using these autoencoders are presented in the paper. we used sigmoid as the activation function for the neurons, while ‘adam’ and mean square error were used as the optimizer and the loss function respectively. 1000 reads (blocks) were taken from each of the original genomes, each on average having a length of 5000 bp. a code sketch of the frequency-vector and autoencoder steps is given at the end of this subsection. for comparison purposes, the results of pca and t-sne obtained using the same data fed to the autoencoder were visualized; the sequence lengths and standard deviation were kept constant across all three approaches. t-sne and pca were both imported from the ‘scikit-learn’ [31] library, and the seaborn data visualization library was used to visualize the data in its reduced 2d space. the availability of ground truth labels ensured comparability of the performance of the three dimensionality reduction techniques using the clusters produced by the respective approaches. due to the use of a manually curated, simulated metagenomic dataset, the ground truth label, i.e., the true identity of the microorganism to which each oligonucleotide frequency vector belongs, is known. but this is rarely the case in the field of metagenomics, where a single metagenomic sample obtained from the environment may consist of large amounts of dna fragments from a wide variety of species that may or may not be related to each other.
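to make the two steps above concrete, the following sketch builds the canonical (reverse-complement-merged) k-mer frequency vectors and the {32, 16, 2, 16, 32} autoencoder. it is an illustrative sketch, not the authors' implementation: it reads the configuration as input-hidden-bottleneck-hidden-output layer sizes, and the epoch and batch-size values are assumed, since they are not reported.

```python
import numpy as np
from itertools import product
from tensorflow.keras import layers, models

_COMP = str.maketrans("ACGT", "TGCA")

def revcomp(kmer):
    return kmer.translate(_COMP)[::-1]

def canonical_index(k):
    # one representative (lexicographic minimum) per {k-mer, reverse complement} pair:
    # 32 canonical 3-mers, 136 canonical 4-mers
    reps = sorted({min("".join(p), revcomp("".join(p))) for p in product("ACGT", repeat=k)})
    return {kmer: i for i, kmer in enumerate(reps)}

def frequency_vector(block, k, index):
    # sum the counts of each k-mer and its reverse complement, then normalize to frequencies
    vec = np.zeros(len(index))
    for i in range(len(block) - k + 1):
        canon = min(block[i:i + k], revcomp(block[i:i + k]))
        if canon in index:            # k-mers containing ambiguous bases (e.g. 'N') are skipped
            vec[index[canon]] += 1.0
    total = vec.sum()
    return vec / total if total else vec

def build_autoencoder(input_dim, hidden, bottleneck=2):
    # hidden=16 with input_dim=32 gives {32, 16, 2, 16, 32}; hidden=64 with 136 gives the 4-mer model
    inp = layers.Input(shape=(input_dim,))
    enc = layers.Dense(hidden, activation="sigmoid")(inp)
    code = layers.Dense(bottleneck, activation="sigmoid", name="bottleneck")(enc)
    dec = layers.Dense(hidden, activation="sigmoid")(code)
    out = layers.Dense(input_dim, activation="sigmoid")(dec)
    autoencoder = models.Model(inp, out)
    encoder = models.Model(inp, code)          # exposes the bottleneck values
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

# index = canonical_index(3)
# x = np.array([frequency_vector(b, 3, index) for b in blocks])
# autoencoder, encoder = build_autoencoder(x.shape[1], hidden=16)
# autoencoder.fit(x, x, epochs=200, batch_size=32, verbose=0)    # epochs/batch size assumed
# embedding_2d = encoder.predict(x)                              # 2d representation for plotting
```

the 2d values taken from the bottleneck are what the scatter plots and the clustering described below operate on.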
in real metagenomic samples, where such ground truth is unavailable, evaluating latent space visualizations is therefore extremely challenging. though this can be broadly framed as a clustering problem, several acute complications remain. while the creation of isolated clusters consisting of data points from the right species is useful in the context of identifying the number of organisms in the sample, this sort of isolation can be costly in terms of taxonomic information. not only is the creation of clusters consisting of a single species important, but the whole visualization must also reflect the connections and similarities these different clusters bear to each other. hence, mere arbitrary separation is not recommended. in our experiment, we used dbscan (density-based spatial clustering of applications with noise) [32] for clustering. though the ideal method for clustering low dimensional metagenomic data is still a point of debate in the research community, dbscan is a powerful algorithm that discovers clusters of arbitrary shapes and can efficiently deal with the large datasets that are part and parcel of genomic problems. dbscan relies on a density-based notion of clusters and requires only one essential input parameter. the low dimensional (2d) coordinates of the oligonucleotide frequency vectors, obtained from the bottleneck layer of the autoencoder, are fed to dbscan for clustering. these 2d coordinate pairs are clustered by dbscan without knowledge of their ground truth labels, solely based on their distances in the 2d coordinate plane, and it is these clusters produced by dbscan that need evaluation. clustering evaluation can be broadly categorized into two groups: intrinsic metrics and extrinsic metrics [33]. extrinsic metrics, when calculating the quality of a clustering, consider the ground truth label of the data points in each of the clusters; the use of an extrinsic metric therefore demands ground truth labels, which we fortunately possess. contrary to extrinsic metrics, intrinsic metrics only consider intra-cluster closeness and inter-cluster distance, without taking the ground truth labels of the data into consideration; intrinsic evaluations assess the integrity of the clusters as they appear in the low dimensional space that meets the user's eye. the primitive logic used in the density-based clustering approach is derived from a human-intuitive clustering method: in dbscan the resultant clusters are the dense regions in the given dataset, separated by regions of lower point density. the ability to identify arbitrarily shaped clusters and robustness towards outliers make dbscan effective on datasets whose clusters have relatively similar densities. the algorithm is based on connecting data points within certain distance thresholds. for any given set of data points, dbscan separates the points into three categories. • hub points – points at the interior (centre) of a cluster. • edge points – points that fall within the neighborhood of a hub point but are not themselves hub points. • noise points – any point that is neither a hub point nor an edge point. the most important factor to consider when using dbscan is choosing appropriate values for the parameters of the algorithm. although there are a few parameters that can be tuned, the eps parameter and min_points (min_samples) are crucial; a minimal sketch of this clustering step is given below.
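the following sketch, again illustrative rather than the authors' code, shows how this clustering step could look with scikit-learn and the kneed package, using the knee-point heuristic for eps that is described in the next paragraph; the min_samples value of 10 is an assumption, as the value actually used is not reported.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN
from kneed import KneeLocator

def estimate_eps(points, n_neighbors=10):
    # sort every point's distance to its n_neighbors-th neighbour and take the knee of that curve
    distances, _ = NearestNeighbors(n_neighbors=n_neighbors).fit(points).kneighbors(points)
    d = np.sort(distances[:, -1])
    knee = KneeLocator(np.arange(len(d)), d, curve="convex", direction="increasing")
    return float(d[knee.knee]) if knee.knee is not None else float(np.median(d))

def cluster_embedding(points_2d, min_samples=10):    # min_samples is assumed
    eps = estimate_eps(points_2d)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_2d)
    return labels, eps                               # label -1 marks noise points

# hypothetical usage: labels, eps = cluster_embedding(embedding_2d)
```

a convex hull (for example scipy.spatial.ConvexHull) can then be drawn around each labelled group to produce the cluster boundaries shown in the figures.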
among those two, min_points defines what counts as a dense region: it is the number of neighboring points required for a point to be considered part of a dense region. the quality of the clusters created by dbscan is governed by its other vital input parameter, epsilon. epsilon is the threshold on the maximum distance between two data points, above which the two points are no longer neighbors. for a fair estimation of epsilon in each dbscan clustering, the “knee point” [35] of the plot of the sorted distances to the 10 nearest neighbors versus distance was calculated to find the optimal epsilon; scikit-learn's nearestneighbors and kneed's kneelocator were used for the calculations. to improve the visibility of the clusters, a convex hull algorithm [34] was used to draw boundaries around them. v. results for a formal evaluation, two metrics were employed in addition to direct visual inspection. separate autoencoders, with different numbers of layers and neurons, were used for the trinucleotide and tetranucleotide data to obtain separate visualizations. clustering was then applied to the reduced dimensionality data obtained through the three dimensionality reduction approaches. as discussed earlier, a comprehensive evaluation of clustering must take into account an intrinsic measure as well as an extrinsic measure. several metrics can be used to evaluate clustering, but most of them on their own do not give an effective performance evaluation, as each can be biased. in this experiment one intrinsic metric and one extrinsic metric are used for the evaluations and comparisons: we chose the v-measure and the silhouette coefficient to evaluate the overall clustering performance of the three techniques being analyzed. the validity measure (v-measure) [36], based on two other metrics called homogeneity and completeness, is an entropy based extrinsic clustering evaluation metric. it evaluates how successfully data points are clustered with respect to their ground truth labels. homogeneity and completeness pull in opposite directions, and a good clustering should maintain a balance between the two. homogeneity evaluates the extent to which each cluster contains only data points of a single class; maximum homogeneity is obtained by a clustering whose clusters contain only data points of the relevant class, which in turn results in zero conditional entropy. assume a clustering with n total data samples, a set of class labels c and a set of clusters k, and let a_c,k be the number of data points from class c in cluster k.

table i
clustering using trinucleotides (3-mer data representation)
                                   v-measure    silhouette coefficient
autoencoder {32, 16, 2, 16, 32}    0.901        0.564
t-sne                              0.878        0.614
pca                                0.745        0.485

homogeneity is then defined, following [36], through the conditional entropy of the class distribution given the clustering:
h(c|k) = − Σ_k Σ_c (a_c,k / n) log( a_c,k / Σ_c′ a_c′,k ) (4)
h(c) = − Σ_c (n_c / n) log( n_c / n ), where n_c = Σ_k a_c,k (5)
homogeneity = 1 − h(c|k) / h(c) (6)
completeness is a symmetric metric to homogeneity. a clustering is complete when, for each class, all the data points from that class are assigned to the same cluster. completeness can be evaluated using the conditional entropy of the cluster distribution given the class labels:
h(k|c) = − Σ_c Σ_k (a_c,k / n) log( a_c,k / Σ_k′ a_c,k′ ) (7)
h(k) = − Σ_k (n_k / n) log( n_k / n ), where n_k = Σ_c a_c,k (8)
completeness = 1 − h(k|c) / h(k) (9)
the v-measure, the metric that is used, can then be computed as the weighted harmonic mean of homogeneity and completeness:
v_β = (1 + β) · homogeneity · completeness / (β · homogeneity + completeness) (10)
where β = 1 gives the balanced form 2 · homogeneity · completeness / (homogeneity + completeness).
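both the extrinsic scores defined above and the silhouette coefficient introduced next are available in scikit-learn, so the evaluation step can be sketched as follows; the wrapper function is ours, while the metric functions are standard scikit-learn apis.

```python
from sklearn.metrics import (homogeneity_score, completeness_score,
                             v_measure_score, silhouette_score)

def evaluate_clustering(points_2d, dbscan_labels, true_labels):
    # the extrinsic scores compare dbscan's clusters with the ground-truth species labels;
    # the silhouette coefficient only looks at the geometry of the 2d embedding
    # (dbscan noise points, labelled -1, are treated here as one extra cluster)
    return {
        "homogeneity": homogeneity_score(true_labels, dbscan_labels),
        "completeness": completeness_score(true_labels, dbscan_labels),
        "v_measure": v_measure_score(true_labels, dbscan_labels),
        "silhouette": silhouette_score(points_2d, dbscan_labels),
    }

# hypothetical usage: scores = evaluate_clustering(embedding_2d, labels, species_labels)
```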
the second evaluation metric, the silhouette coefficient [37], is a measure of distance. it gives a measure of how close each data point in one cluster is to the data points in the neighboring clusters. the silhouette coefficient ranges between -1 and 1; a value of 0 means that the data point lies on the inflection point between two clusters. for a data point, with a the mean distance to the other points in its own cluster and b the mean distance to the points in the nearest neighboring cluster,
s = (b − a) / max(a, b) (11)
the tetranucleotide and trinucleotide reductions of the autoencoders were separately compared and evaluated against the reductions of t-sne and pca. the autoencoder generated reductions, colored using the ground truth labels, can be seen in fig. 2.

table iii
clustering using tetranucleotides (4-mer data representation)
                                     v-measure    silhouette coefficient
autoencoder {136, 64, 2, 64, 136}    0.932        0.621
t-sne                                0.886        0.654
pca                                  0.646        0.536

as visible, there are clear clusters and separations. the same 2d data points, clustered using dbscan without taking their ground truth labels into consideration, can be seen in fig. 4. worthy of notice is the close affinity of the two clostridium bacteria, clostridium phytofermentans and clostridium beijerinckii. as seen in fig. 4, dbscan has managed to differentiate these two species into two different clusters, but this is not always the case. in fig. 3 and fig. 5, with trinucleotide frequencies, the reduced dimensions do not distinctly separate the two clostridium bacteria into two different clusters. but the close affinity of the two species on both occasions signals the preservation of taxonomic information and relationships during dimensionality reduction, in contrast to t-sne. in comparison, t-sne's performance on the same 4-mer data can be found in fig. 7 and fig. 8. the distances between the clusters of the two clostridium bacteria are arbitrary and convey no information about the two species belonging to the same genus: clostridium phytofermentans is in as close affinity to microcytis, mycobacterium, erythrobacter and rubrobacter as it is to clostridium beijerinckii. furthermore, dbscan seems to have identified separate clusters within the two clostridium bacteria as well. the pca plots, fig. 9 and fig. 13, are subpar both to the naked eye and as suggested by the evaluation metrics, falling well below the scores for t-sne and the autoencoders. note that a new optimized epsilon was calculated for each dbscan clustering, tuning the algorithm to the data presented by each dimensionality reduction technique. fig. 6 gives the legend for the plots colored using the ground truth labels. the autoencoder outperforms both t-sne and pca in all cases with regard to the v-measure. this improvement reflects a relatively lower loss of information with respect to the ground truth labels. the formation of string-like shapes in t-sne is another reason for the relatively lower v-measures of t-sne compared to the autoencoders: the breakage of these strings has more than once led dbscan to identify the data points of the same species as a few different clusters. the silhouette coefficient, on the other hand, is highest for t-sne; compared to the autoencoder, t-sne's silhouette coefficient is always slightly higher. this is due to the arbitrary yet clear separation of data points of different species, but this arbitrary separation comes at a greater cost: the arbitrary distancing of the different clusters by t-sne has led to a loss of taxonomic relationships among species. in contrast, the autoencoder's cluster distances are not arbitrary, and the relative distances between clusters provide useful insights into the taxonomic relationships of the real species the data points belong to. 7 k. wijegunarathna#1*, u.
maduranga2*, s. weerasinghe3*, i. perera4*, a wickaramarachchi5ϯ march 2021 international journal on advances in ict for emerging regions fig. 2 dimensionality reduction of tetranucleotide frequencies using autoencoder {136, 64, 2, 64, 136}. coloured according to known true labels fig. 3 dimensionality reduction of trinucleotide frequencies using autoencoder {32, 16, 2, 16, 32}. coloured according to known true labels vi. conclusion and future work the results obtained from the research backed by the superior results obtained by autoencoders back the potential of using autoencoders in the field of metagenomics for dimensionality reduction and visualization of metagenomic reads. the systematically optimized dbscan clustering algorithm has always managed to identify a number of clusters that is quite close to the actual number of microorganisms present in the sample. this congruency of the lower dimensions with the information in the higher dimensions is reflected in the improved v-measures. the significantly higher v-measure produced in comparison with the t-sne and pca dimensionality reductions prove autoencoder’s ability to preserve the intrinsic dimensionality of data in the process of dimensionality reduction. silhouette coefficient does not consider whether the points in the same cluster actually belong to a single cluster in the higher dimensions. the autoencoder remains in close contention with t-sne on the silhouette coefficient which is an intrinsic measure of clustering that only considers the visual integrity of the data in the lower dimensions. not only have autoencoders outperformed pca and t-sne on the metrics front but it has also managed to preserve taxonomic data by placing the species of the same genus in fig. 4 dbscan cluster identification with convex hull for tetranucleotide frequencies in fig. 2. black dots show noise points. fig. 5 dbscan cluster identification with convex hull for trinucleotide frequencies in fig. 3. black dots show noise points. relatively closer affinity. taxonomic data preservation ability of autoencoders standout in contrast to t-sne’s shortcoming of arbitrarily separating clusters giving no relevance to species’ relationships. additionally, unlike t-sne’s faltering quality with growing data volume, autoencoders, being a deep learning technique thrives with growing data volumes. these results demand a place for autoencoders in bioinformatics. noisy reads can however plague the ultimate analysability and interpretability of visualizations. noise can result in loss of important insights and false interpretations. the use of denoising autoencoders to denoise large volumes of sequence data is another potential avenue of research. other forms of autoencoders like the denoising autoencoders and variational autoencoders are potential techniques that can be integrated to improve metagenomic analysis and further the use of deep learning in the wider domain of bioinformatics. fig. 6 legend for fig. 2, fig 3, fig 7, fig 8, fig 9 and fig 13. cluster identification in metagenomics – a novel technique of dimensionality reduction through autoencoders 8 international journal on advances in ict for emerging regions march 2021 fig. 7 dimensionality reduction of tetranucleotide frequencies using t-sne . coloured according to known true labels fig. 8 dimensionality reduction of trinucleotide frequencies using t-sne. coloured according to known true labels fig. 9 dimensionality reduction of tetranucleotide frequencies using pca. 
coloured according to known true labels fig. 10 dbscan cluster identification with convex hull for tetranucleotide frequencies in fig. 7. black dots show noise points. fig. 11 dbscan cluster identification with convex hull for trinucleotide frequencies in fig. 8. black dots show noise points. fig. 12 dbscan cluster identification with convex hull for tetranucleotide frequencies in fig. 9. black dots show noise points. 9 k. wijegunarathna#1*, u. maduranga2*, s. weerasinghe3*, i. perera4*, a wickaramarachchi5ϯ march 2021 international journal on advances in ict for emerging regions fig. 13 dimensionality reduction of trinucleotide frequencies using pca. coloured according to known true labels references [1] j. handelsman, m. r. rondon, s. f. brady, j. clardy, and r. m. goodman. “molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products”. chem. biol. 5:r245–r249, 1998. [2] b. j. baker, g. w. tyson, r. i. webb, j. flanagan , p. hugenholtz, e. e. allen, j. f. banfield, “lineages of acidophilic archaea revealed by community genomic analysis”. science 2006, 314:1933-1935. [3] c. c. laczny, n. pinel, n. vlassis, and p. wilmes, “alignment-free visualization of metagenomic data by nonlinear dimension reduction,” sci. rep., vol. 4, pp. 1–12, 2014. [4] k. sudarikov, a. tyakht, and d. alexeev, “methods for the metagenomic data visualization and analysis,” curr issues mol biol, vol. 24, pp. 37– 58, 2017. [5] c. ander, o. b. schulz-trieglaff, j. stoye, and a. j. cox, “metabeetl: high-throughput analysis of heterogeneous microbial populations from shotgun dna sequences,” in bmc bioinformatics, vol. 14, p. s2, springer, 2013. [6] s. f. altschul, w. gish, w. miller, e. w. myers, and d. j. lipman, “basic local alignment search tool,” journal of molecular biology, vol. 215, no. 3, pp. 403–410, 1990. [7] m. a. peabody, t. van rossum, r. lo, and f. s. brinkman, “evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities,” bmc bioinformatics, vol. 16, no. 1, p. 362, 2015. [8] m. h. mohammed, t. s. ghosh, n. k. singh, and s. s. mande, “sphinx—an algorithm for taxonomic binning of metagenomic sequences,” bioinformatics, vol. 27, no. 1, pp. 22–30, 2011. [9] j. m. janda and s. l. abbott, “16s rrna gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls,” journal of clinical microbiology, vol. 45, no. 9, pp. 2761–2764, 2007. [10] n. a. bokulich, s. subramanian, j. j. faith, d. gevers, j. i. gordon, r. knight, d. a. mills, and j. g. caporaso, “quality-filtering vastly improves diversity estimates from illumina amplicon sequencing,” nature methods, vol. 10, no. 1, pp. 57–59, 2013. [11] s. wold, k. i. m. esbensen, and p. geladi, “principal component analysis,” vol. 2, pp. 37–52, 1987. [12] c. r. garcía-alonso, l. m. pérez-naranjo, and j. c. fernándezcaballero, “multiobjective evolutionary algorithms to identify highly autocorrelated areas: the case of spatial distribution in financially compromised farms,” ann. oper. res., vol. 219, no. 1, pp. 187–202, 2014. [13] g. f. rall, m. j. schnell, b. m. davis, “tasks, techniques, and tools for genomic data visualization,” comput graph forum., vol. 176, no. 1, pp. 139–148, 2019. [14] l. j. p. van der maaten, e. o. postma, and h. j. van den herik, “dimensionality reduction: a comparative review,” j. mach. learn. res., vol. 10, pp. 1–41, 2009. fig. 
14 dbscan cluster identification with convex hull for trinucleotide frequencies in fig. 13. black dots show noise points. [15] m. hauskrecht, r. pelikan, m. valko, j. lyons-weiler, “feature selection and dimensionality reduction in genomics and proteomics,” fundamentals of data mining in genomics and proteomics, springer, pp.149-172, 2006, [16] g. hinton and s. roweis, “stochastic neighbor embedding,” in advances in neural information processing systems, 2003. [17] l. van der maaten, “barnes-hut-sne,” 1st int. conf. learn. represent. iclr 2013 conf. track proc., pp. 1–11, 2013. [18] j. barnes and p. hut., “a hierarchical o(n log n) force-calculation algorithm,” nature, 324(4):446–449, 1986. [19] g. j. dick et al., community-wide analysis of microbial genome sequence signatures. genome biol 10, r85 (2009). [20] t. kohonen: self-organizing maps. volume 0. new york: springerverlag; 1997. [21] a. gisbrecht, b. hammer, b. mokbel, and a. sczyrba, “nonlinear dimensionality reduction for cluster identification in metagenomic samples,” proc. int. conf. inf. vis., pp. 174–179, 2013. [22] c. m. bishop, m. svensn, and c. k. i. williams, “gtm: the generative topographic mapping,” neural computation, vol. 10, pp. 215–234, 1998. [23] d. wang and j. gu, “vasc: dimension reduction and visualization of single-cell rna-seq data by deep variational autoencoder,” genomics, proteomics bioinforma., vol. 16, no. 5, pp. 320–331, 2018. [24] g. eraslan, l. m. simon, m. mircea, n. s. mueller, and f. j. theis, “single-cell rna-seq denoising using a deep count autoencoder,” nat. commun., vol. 10, no. 1, pp. 1–14, 2019. [25] e. pierson and c. yau, “zifa: dimensionality reduction for zeroinflated single-cell gene expression analysis,” genome biol., vol. 16, no. 1, pp. 1–10, 2015. [26] z. wang and y. wang, “extracting a biologically latent space of lung cancer epigenetics with variational autoencoders,” bmc bioinformatics, vol. 20, no. suppl 18, pp. 1–7, 2019. [27] d. e. rumelhart, g. e. hinton, and r. j. williams, “learning representations by back-propagating errors,” nature, vol. 323, no. 6088, pp. 533–536, 1986. [28] a. j. holden et al., “reducing the dimensionality of data with neural networks,” science (80-. )., vol. 313, no. july, pp. 504–507, 2006. [29] w. wang, y. huang, y. wang, and l. wang, “generalized autoencoder: a neural network framework for dimensionality reduction deepvision: deep learning for computer vision 2014,” cvpr work., pp. 490–497, 2014. [30] t. abe, h. sugawara, s. kanaya, m. kinouchi, and t. ikemura, “selforganizing map (som) unveils and visualizes hidden sequence characteristics of a wide range of eukaryote genomes,” gene 365, 27– 34 (2006). [31] g. varoquaux, l. buitinck, g. louppe, o. grisel, f. pedregosa, and a. mueller, “scikit-learn,” getmobile mob. comput. commun., vol. 19, no. 1, pp. 29–33, 2015. [32] m. daszykowski and b. walczak, “density-based clustering methods,” compr. chemom., vol. 2, pp. 635–654, 2009. cluster identification in metagenomics – a novel technique of dimensionality reduction through autoencoders 10 international journal on advances in ict for emerging regions march 2021 [33] e. amigó, j. gonzalo, j. artiles, and f. verdejo, “a comparison of extrinsic clustering evaluation metrics based on formal constraints,” inf. retr. boston., vol. 12, no. 4, pp. 461–486, 2009. [34] b. chazelle, “an optimal convex hull algorithm in any fixed dimension,” discrete comput. geom., vol. 10, no. 1, pp. 377–409, 1993. [35] v. satopää, j. albrecht, d. irwin, and b. 
raghavan, “finding a ‘kneedle’ in a haystack: detecting knee points in system behavior,” proc. int. conf. distrib. comput. syst., pp. 166–171, 2011. [36] a. rosenberg and j. hirschberg, “v-measure: a conditional entropybased external cluster evaluation measure,” emnlp-conll 2007 proc. 2007 jt. conf. empir. methods nat. lang. process. comput. nat. lang. learn., no. june, pp. 410–420, 2007. [37] m. pant, t. radha, and v. p. singh, “particle swarm optimization using gaussian inertia weight,” proc. int. conf. comput. intell. multimed. appl. iccima 2007, vol. 1, pp. 97–102, 2008. 2. {136, 64, 2, 64, 136} ieee paper template in a4 (v1) international journal on advances in ict for emerging regions 2021 14(1): february 2021 international journal on advances in ict for emerging regions an affordable, virtual reality based training application for cardiopulmonary resuscitation h. n. kegalle#1, h. s. u. liyanage2, k. l. jayaratne3, m. i. e. wickramasinghe4 abstract— in medical science, proficiency in cardiopulmonary resuscitation (cpr) is considered as a vital skill for physicians. for training cpr, medical professionals use mechanical manikin which has some drawbacks when it comes to the realism of the simulation and the feedback of performance. this paper presents a virtual reality (vr) based solution to address some of these shortcomings. the approach here is augmenting the mechanical manikin with vr using htc vive, leap motion controller, and a glove. to test the acceptance of this solution, a user-based evaluation was carried out. 85.7% of the users who have participated in the evaluation have expressed their preference upon using vr in cpr training. even though the overall evaluation depicts a neutral output, this study opens avenues for future research in combining vr into medical training processes. keywords— cardiopulmonary resuscitation training, engineering in medicine and biology, medical simulation, virtual reality, hand occlusion i. introduction ardiopulmonary resuscitation (cpr) is a lifesaving procedure that is considered as an essential skill for medical professionals. it prevents failure and curbing damage that may badly affect the critical organs of the human body due to the low oxygen flow [1]. cpr is conducted by compressing the chest of a patient suffering from cardiac arrest. acquiring proper training in cpr is crucial because an untrained person might harm the patient’s life. cpr should consist of 30 compressions performed at a depth of 5-10 cm underneath the sternum at a constant rhythm with an artificial airway established through intubation [2]. each compression also has to have a constant pressure applied through the duration of the compression. these guidelines have been set by the european resuscitation council (erc). in most countries, cpr training is carried out on mechanical manikins as they offer a good approximation of the tactile and haptic feel to the person who is administering cpr. but these manikins are not capable of giving the cpr practitioner an immersive and dynamic training. additionally, there are more advanced manikins that are not affordable for the government-sponsored health sector of developing countries such as sri lanka. therefore, integrating vr into cpr training presents an interesting path of research. some studies have explored the potential of integrating vr into cpr training [3]–[5]. but these approaches are not suitable as the solutions are too bulky or expensive. 
one main objective of this study is to implement an economically viable solution while providing the user with an experience that allows the process of cpr to be trained accurately without the aid of a trainer. to accomplish the targets of immersive experience and commercial viability, the proposed solution is implemented with affordable, commercially available off-the-shelf hardware. the center of the setup is a mechanical manikin, which provides tactile feedback to the user. the htc vive is used as the viewing device and the leap motion controller (lmc) maps the user's hands into the virtual world. the pressure applied during chest compressions is measured using an arrangement of four (4) load cells [6]. a vr application containing a humanoid avatar lying in a hospital setting is created using unity 3d. the avatar is then mapped onto the mechanical manikin in the physical world. when performing chest compressions on the manikin, the user can see the compressions being performed on the patient avatar through the head-mounted device (hmd) [7]. a user-based evaluation was carried out to assess the acceptance level of the proposed approach. two user groups were created according to their prior experience in cpr. both novice users and expert users were given the vr application and the traditional manikin, and their feedback was collected through a questionnaire.

during the implementation process, the researchers encountered several problems. the occlusion problem of the lmc and the requirement for users to keep looking at their hands so that they remain within the field of view of the leap motion controller were the two major limitations found. in the posture used for chest compressions, one of the user's hands is kept on top of the other. the researchers realized that the occluded hand of the user was not visible in the virtual environment in this posture. the reason for this issue was the failure of the lmc to track the occluded hand. a solution for this hand occlusion problem is also presented in this paper. the occlusion problem experienced in this study is considered type-i inter-object occlusion, as the occluding objects are of the same type [8]. several studies have addressed the problem of human hand occlusion [9]–[11], but the proposed hand tracking methods are not suitable for integration into the above-mentioned vr application, as some of the solutions are bulky and some require a large dataset to train a neural network. in this research, a lightweight glove has been introduced to track the occluded hand of the user. finger movements of the occluding hand can also be identified with the proposed glove-based method. the integration of the implemented glove into the existing vr application is done by mapping the glove coordinates onto the graphical occluded hand in the vr application. to gauge the acceptance of the solution proposed for the occluding hand, quantitative and qualitative evaluations were conducted. the position accuracy of the glove was measured by moving the glove on a grid, and the experience of the users was evaluated through a questionnaire.

correspondence: h. n. kegalle #1 (e-mail: hnk@ucsc.cmb.ac.lk) received: 13-09-2019 revised: 01-02-2021 accepted: 24-02-2021 h. n. kegalle#1, h. s. u. liyanage2, k. l. jayaratne3, m. i. e. wickramasinghe4 are from the university of colombo school of computing (hnk@ucsc.cmb.ac.lk, samali.liyanage93@gmail.com, klj@ucsc.cmb.ac.lk, mie@ucsc.cmb.ac.lk).
doi: 10.4038/icter.v14i1.7222 © 2021 international journal on advances in ict for emerging regions an affordable, virtual reality based training application for cardiopulmonary resuscitation 2 international journal on advances in ict for emerging regions february 2021 there are two main contributions of this study. the first one is the design and the implementation of a preliminary solution for cpr training using vr. the other contribution is the hand tracking glove introduced as the solution for the hand occlusion problem. the rest of the paper is structured as follows: section ii comprises of an analysis of the related work to this study. the design overview of the proposed solution is described in section iii. section iv will describe the overall setup of the solution and, the experiments and results will be detailed in section v. finally, the conclusions of the study will be detailed in section vi along with the future work. ii. related work vr has been used in the implementation of different simulated training solutions in a variety of vocations [12]. in the field of medicine, vr has been used to train medical personnel in different procedures from simple checkups to complex surgeries [3]. the integration of vr to train cpr is one such instance where immersive [13] and non-immersive [4] solutions have been explored. to simulate haptic and tactile feedback, either a mechanical manikin or an external force feedback machine has been used [4]. the respective studies of semeraro et al. [14] and pramanik and mannivanam [13], have enhanced a mechanical manikin and relied on them for the tactile and haptic component of the setups. further, pramanik and mannivanam have enhanced the manikin by adding a ‘force plate’ to measure the pressure applied during the compression. tian et al. [4] and khanal et al. [5] have used force feedback hardware to remove the reliance on the manikin. the final goal of the vr based cpr training application proposed in this study is also to replace the mechanical manikin. but the devices used, novint falcon and the omega-3 are expensive compared to the manikin and thus the cost of the final setup suggested by this study may increase. the evaluation of semeraro et al. [14], have focused on the user’s experience while they train with the setup. the experts of cpr were the group of participants who have used the system and submitted their feedback through a questionnaire. 84.6% of the participants have had a positive attitude towards using vr in cpr. novices in cpr were not associated with the evaluation and because of that, it was not possible to check whether there is any effect on the results, if the users had no prior experience in cpr. a comparison on the skills acquired using the vr solution against mechanical manikin was conducted by pramanik and mannivanam [13]. in their research, the participants were given a score for their performance, but the scoring system has not been specified in the paper. the results of the comparison have shown that the users trained with the vr solution have acquired a higher score than the users who were trained with the manikin. this result explains the possible achievements of integrating vr into the cpr training application. the studies mentioned above have indicated that there is a possibility to implement a cpr training application using vr. but further explorations are needed for some of the matters, such as the impact on user attitudes depending on their prior experience in cpr. 
this study focuses to examine those problems while providing a vr solution for cpr training. on reviewing the body of work available for hand occlusion, it can be seen that a diversity of approaches has been taken by the researches to discover a solution. to identify the finger movements of the occluding hand, yamamoto et al. [15] and baldi et al. [16] have come up with a glove implementation. the study of yamamoto et al. has attached color markers on to the fingertips of the glove and baldi et al. have estimated the finger movements using marg sensor data. researches have made several attempts [17], [18] to explore a solution for the occlusion problem using hardware devices. the lighthouse base station of the htc vive system is one such device that emits infrared rays (ir) for object tracking in the existence of occlusion. quinsones et al. [18] have implemented a device called “hive tracker” which communicates with the lighthouse base stations to perform positional tracking. according to quinsones et al., since the implemented tracker is small in size, it can be connected to small surfaces that need to be tracked. hive tracker provides the positional tracking ability, which this study hopes to explore. according to the average error rate specified, that may lead to an inaccurate hand tracking in this proposed study. when it comes to the evaluation process of quinsones et al. [18] have concentrated on tracking the path of the implemented small device. the commercial hand tracker of the vive setup and the hive tracker both were moved along a specific path parallel to the floor and compared their tracking ability. the evaluation fails to gauge the movement along the y-axis to compare the accuracy in a 3d environment. with the evaluation conducted, the hive tracker marks a higher average error of 10mm, compared to the commercial tracker. the above approaches point out the ability to track the occluded objects and also, the channels to be explored such as providing an improved method of 3d positioning in an ir environment. this study aims at exploring the above problem while making it integrable to the vr based cpr training application. iii. design overview to accomplish the aim of enhancing the mechanical manikin with vr, a set of activities has to be addressed in the hardware aspect. a. track the location and orientation of the manikin the mechanical manikin, which represents the patient is the tactile source of this application. to correctly overlay the virtual avatar on the manikin, it is a must to identify the manikin’s location and the orientation within the 3d environment. b. track the user’s hands and map them into the virtual environment the hands of the user, are the main administering tool that helps in performing cpr. therefore, the hands have to be tracked up to the finger level, and then it has to be mapped with the virtual environment. c. provide the user with a view of the synthetic environment user should be able to immerse in the synthetic environment, to interact with the virtual avatar. in addition, 3 h. n. kegalle#1, h. s. u. liyanage2, k. l. jayaratne3, m. i. e. wickramasinghe4 february 2021 international journal on advances in ict for emerging regions fig. 1 overall flow of tasks the method uses to provide the synthetic view for the user has to be a method that does not make a disturbance in navigation. d. measure the parameters of chest compressions the final goal of this application is to be used as a training source for the cpr trainees. 
therefore, the application must be able to detect successful compressions by measuring the parameters such as the compression depth, compression count and the pressure applied. from the software perspective, the study has to address another set of activities. e. create a virtual world a virtual world that preserves the realistic condition has to be created and then, the created illusion can be projected on to the viewing device of the user. f. overlay the avatar onto the manikin when the location and the orientation of the manikin in the 3d environment are acquired, the vr software should be able to find the corresponding coordinates in the virtual world. then mapping the virtual avatar on the identified location has to be performed. g. calculate compression parameters the parameter values mentioned in section iii-d are calculated based on hardware readings. these values have to be converted into a form in which the user can identify the validity of compression by comparing the parameter values. h. provide the user with feedback the user has to be provided with the feedback, based on the calculated parameters in section iii-d. this feedback is crucial for the training application as the validity of the compressions are decided based on it. according to section i, the problem of tracking the occluded hand is an impediment that occurred in the course of the implementation. to provide a solution to that problem, the researches have used the lighthouse base stations of the htc vive system. the following are the activities that have been taken place for tracking the occluded hand of the user. i. obtain the base station information the lighthouse base station of the vive system is a device that emits ir (infrared) rays in a specific pattern. to identify the coordinates of the ir absorbing objects in a 3d environment, information such as the base station center coordinates and rotation matrix of the base station has to be acquired. j. track the positions of the photodiodes the application should be able to track the location of moving photodiodes with the least latency. then the values obtained have to be converted to the coordinates in the 3d environment. k. map virtual hand into the tracked location to improve the realism of the application while providing the user with better training, the user needs to observe correct hand posture through the viewing device. therefore, the occluded hand of the user in the real world has to be mapped with the graphical hand in the virtual world. the flow of these tasks can be seen in fig. 1 along with the relationships between them on the hardware and software aspect. iv. implementation the implementation of this study will be discussed in two stages. the initial part of this section will explain the implementation details on enhancing the mechanical manikin with vr, while the second part of this section will discuss glove implementation which is the proposed solution for the occluded hand. an affordable, virtual reality based training application for cardiopulmonary resuscitation 4 international journal on advances in ict for emerging regions february 2021 to address the design aspects mentioned under section iii, the study has used commercially available hardware such as fig. 2 hand controller attached to the manikin htc vive, leap motion, and arduino board connected to a series of electronic components. besides, free or community editions of software unity3d and blender 7 has been used in the implementation process. 
the application created using unity3d is hosted on a pc which has the following specifications, to meet the processing needs of the lmc and the htc vive vr devices. • processor: intel ® core i7-7700 • ram: 8.00gb • graphics card: nvidia geforce gtx 1060 6gb • operating system: windows 10 a. creating and viewing the virtual world the virtual environment consists of an avatar lying on a bed in a hospital setting is created with unity3d. then the view is projected to the vive hmd. the human avatar is downloaded from mixamo.org and modified by adding new bones to the skeleton system of the avatar following the requirements. the connection between the unity application and the hmd is created by adding steam vr asset into the unity project. there are two box-like devices called base stations, which help in tracking the location of hmd and two hand controllers. therefore, the location of the user, wearing hmd also can be tracked within the vr environment. the user’s view through hmd is getting changed according to the user’s head movements. b. tracking the location of the manikin and overlaying the avatar to track the location of the manikin, the hand controller provided with htc vive is attached as seen in fig. 2. the position of the hand controller in the real world is converted to the coordinates of the virtual world by a component in the software asset that is used to connect vive hardware with the vr application. the component in the above-mentioned unity asset is then attached to the avatar that represents the patient. the positioning of the avatar concerning the position of the hand controller asset in the application is determined empirically. with this, the avatar is overlaid onto the manikin when the vr application is launched. it was detected that when performing chest compressions, as the manikin’s chest moves there is a tendency for the hand controller to be moved as well. due to this movement of the controller, the avatar also moves and makes misalignment with the manikin. based on the assumption that the manikin is stationary, this problem is solved by tracking the location of the manikin only once at the beginning of the simulation. after 5 seconds the avatar was decoupled from the hand controller. c. tracking the user’s hands and mapping them into the virtual world leap motion controller is a device that uses ir rays to track human hands. in this stage of the study, hand tracking is done using an lmc attached in front of the vive hmd. as discussed in section iv-b, similar to the vive device, lmc also has a unity asset that converts the position of the hands in the real-world to the synthetic world coordinates. during the implementation, it was observed that the lmc fails to track the location of the occluded hand. following the details provided in section ii, one hand of the user should be placed on top of the other, to perform chest compressions. due to this standard posture, failure in tracking the occluded hand is inevitable. a temporary solution for this issue was implemented with the ‘collider’ construct available in unity3d. colliders are similar to the bounding box concept [19] in computer graphics. the colliders can be used to trigger events when entering, exiting, and staying within each other’s bounds. d. handling the failure to track hands due to occlusion under the assumption, the left hand is occluded by the right hand, the following algorithm was used to handle the failure of hand tracking when hands occlude each other. 
1: lh = graphical left hand;
2: rh = graphical right hand;
3: lh.c = collider attached to the graphical left hand;
4: rh.c = collider attached to the graphical right hand;
5: declare copy_lh;
6: while application is running do
7:   if lh.c and rh.c collide then
8:     copy_lh = copy(lh);
9:     attach copy_lh to rh;
10:    deactivate(lh);
11:  else if (lh.c and rh.c do not collide) and (lh is not active) then
12:    destroy(copy_lh);
13:    activate(lh);
14:  end if
15: end while

the graphical hands were attached with two separate colliders, as mentioned in lines 3 and 4. in the implementation of this algorithm, to test the condition specified in line 7, an event is triggered when the left hand's collider stays within the right hand's collider. as the graphical left hand disappears from the virtual environment due to the occlusion, the deactivation of the left hand mentioned in line 10 happens automatically. each time the condition in line 7 is met, a copy of the graphical left hand is created and stored in memory. to eliminate this process of unnecessarily storing copies of the graphical left hand, the copies made are destroyed as described in line 12.

fig. 3 collider added to the chest area of the avatar

e. detecting compressions and measuring compression parameters

the correct positioning of the user's hands on the patient's chest has to be identified. then the depth of the compression has to be measured, to detect whether the compression provided by the user falls within the standard range. additionally, the compression depth is needed to deform the avatar's chest to provide the visual experience to the user. a collider, seen in fig. 3, is attached to the chest of the patient. when the graphical hands enter and stay within this collider, it is decided that the user's hands have been placed correctly to begin cpr.

fig. 4 calculating compression depth

1) measuring compression depth: a game object titled "chest level" has been placed in the avatar's chest area and is taken as the marker against which the chest compressions are measured. at the start of the compression, the user's hands are placed at this level. the y-coordinates of the hand position and the chest level object are compared to calculate the compression depth, as seen in fig. 4. the compression depth d(y) is calculated using equation 1, where c(y) is the y-coordinate of the chest level object and h(y) is the y-coordinate of the hand.

d(y) = c(y) − h(y), since h(y) < c(y) (1)

2) deforming the chest of the avatar: the bone added to the skeleton of the avatar is moved along the y-axis by the value of d(y) to deform the avatar's chest. if the coordinate of the new position of the bone is b′(y) and that of the original position is b(y), equation 2 can be derived.

d(y) = b(y) − b′(y), since b′(y) < b(y) (2)

the new position of the bone, given in equation 3, can be derived using equations 1 and 2.

b′(y) = b(y) − c(y) + h(y) (3)

in this section, the y-coordinate refers to the coordinate along the y-axis of the coordinate system of the unity game engine.

3) number of correct compressions: in this study, a compression that reaches a depth between 5 cm and 10 cm is considered a correct compression. therefore, when a compression meets the 5 cm threshold and stays below 10 cm, the compression count is increased by one.
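to make the relationship between equations 1–3 and the unity scene concrete, the following is a minimal unity c# sketch of how such a compression meter could be scripted, assuming the scripts are written in c# as is usual for unity3d. the object names (chestLevel, handTip, chestBone), the "hand" tag and the once-per-stroke latch are illustrative assumptions rather than the authors' actual implementation; the 5–10 cm window, the chest-bone displacement and the resetting of the count when the hands leave the chest collider follow the behaviour described in the surrounding text.

using UnityEngine;

// hypothetical component attached to the chest collider of the avatar (set as a trigger).
// it applies equation (1) for the compression depth, moves the added chest bone by the
// same amount (equation (3)), and counts compressions that reach the 5-10 cm window.
public class CompressionMeter : MonoBehaviour
{
    public Transform chestLevel;   // "chest level" marker object, c(y)
    public Transform handTip;      // tracked graphical hand, h(y)
    public Transform chestBone;    // bone added to the avatar's skeleton

    const float minDepth = 0.05f;  // 5 cm
    const float maxDepth = 0.10f;  // 10 cm

    int correctCompressions;
    bool handsOnChest;
    bool countedThisStroke;
    float boneRestY;

    void Start()
    {
        boneRestY = chestBone.position.y;   // resting height of the chest bone, b(y)
    }

    void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("Hand")) handsOnChest = true;   // hands correctly placed on the chest
    }

    void OnTriggerExit(Collider other)
    {
        // count reverts to 0 when the hands leave the chest area
        if (other.CompareTag("Hand")) { handsOnChest = false; correctCompressions = 0; }
    }

    void Update()
    {
        if (!handsOnChest) return;

        // equation (1): d(y) = c(y) - h(y), valid while the hand is below the chest level marker
        float depth = Mathf.Max(0f, chestLevel.position.y - handTip.position.y);

        // equation (3): b'(y) = b(y) - d(y), i.e. push the chest bone down by the measured depth
        Vector3 p = chestBone.position;
        chestBone.position = new Vector3(p.x, boneRestY - depth, p.z);

        // count a compression once per stroke when it reaches the 5-10 cm window
        if (depth >= minDepth && depth <= maxDepth && !countedThisStroke)
        {
            correctCompressions++;
            countedThisStroke = true;
        }
        else if (depth < 0.01f)   // chest released, ready for the next stroke
        {
            countedThisStroke = false;
        }
    }
}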
if the user’s hands are removed from the chest avatar’s chest area, the compression count will be reverting to 0. 4) measuring the pressure applied: to find the pressure applied during the compression, a full-bridge load cell arrangement has been used. the arrangement is placed between the rubber casing and the inner sponge of the manikin, in the chest area where the compressions are expected to be performed. the analog signal generated by the load-cell arrangement has been amplified using an hx711 amplifier and then connected to an arduino uno board. the data is streamed to the vr application hosts in the pc through the serial port. initially, the load-cell arrangement is calibrated using an arduino sketch. then another sketch is uploaded onto the board to measure, convert, and stream the applied pressure. both these sketches were downloaded from the bogde/hx711 [20] git repository. f. providing the user with information and feedback as this is a training application, it should be able to provide information and feedback to the user on their performance of cpr. for that, the measurements and the calculations are done on various parameters of the compressions can be used. an information panel as seen in fig. 5 is attached to a fixed point in the field view of the user. the panel contains an outline of the human upper body, as seen in fig.5, to indicate whether the user has placed the hands at the correct position. the outline remains red when the hands are incorrectly placed on the avatar’s chest and will turn into yellow while the hands in position. when the compression depth lies between 5cm to 10cm, the outline turns green but when it is below 5cm will remain yellow and turn purple when the compression reaches beyond 10cm. the integer in between two lines indicates the number of correct compressions performed in one cycle [21]. an affordable, virtual reality based training application for cardiopulmonary resuscitation 6 international journal on advances in ict for emerging regions february 2021 fig. 6 glove with three photodiode circuits g. tracking the occluded hand using the glove as stated in section iii b, the hands of the user play a huge role in performing chest compressions. for a better visual experience, the user should be able to see a similar virtual view as the real world. therefore, it is very important to display both hands in the original posture. to address this issue, a reliable solution to track the occluding hand was discovered using a glove. fig. 7 circuit diagram for one photodiode circuit 1) identifying the position of the glove: photodiode is a device that absorbs ir and generates an electric current. in an environment where ir rays are flooding, it is possible to find the locations of photodiodes. because the lighthouse base stations of the vive system emits ir, a glove implemented with photodiodes can be used to track the location of the hand. in this study, it was assumed that it is always the right hand that occludes the left. under this assumption, a glove for the left hand, seen in fig.6 has been implemented using a teensy 3.2 usb development board and a set of photodiode circuits. the processing module; the teensy board was attached to an adjustable wrist band. three photodiode circuits were attached to the wrist, thumb, and middle finger of the glove. to capture ir in 360°, where one photodiode detects ir in a range of 120°, the circuit is built with three photodiodes connected in the shape of a tetrahedron. 
according to the circuit diagram in fig.7 the generated signal is amplified and then connected to the teensy development board. to accomplish the target of making the circuit small in size, the circuit was designed as a printed circuit board (pcb) of size 0.8cm x 0.8cm with surface mount devices (smd). the process of streaming data from the teensy board to the pc was done through a usb port. the base station information mentioned in section iii– i was collected using a teensy sketch. to acquire the x, y, z coordinates of the photodiode circuit, another sketch was uploaded onto the teensy board. the codes were downloaded from the git repository, ashtuchkin/vive-diy-position-sensor [22]. 2) mapping the occluded hand into the virtual world: a 3d hand model of the left hand has been downloaded and added into the vr application. the x, y, z coordinates collected from each sensor, which were attached to the wrist, thumb, and middle finger have been mapped with the respective parts of the 3d hand model. the positioning of the hand model in relation to the position of the sensor is determined empirically. the necessity of an additional hand tracking method occurs with the matter of two hands occlude each other. therefore, according to the proposed solution, the 3d hand model should be inserted into the vr application only at the occurrence of occlusion. to handle this insertion of the 3d hand model, the following algorithm has been used. 1: lh=virtual left hand; 2: rh=virtual right hand; 3: lh.temp=new temporary virtual left hand; 4: while application is running do 5: if lh not visible then 6: make lh.temp visible; 7: end if 8: if lh visible then 9: make lh.temp not visible; 10: end if 11: end while according to lines 5 and 6, when the graphical left hand created with lmc is not visible in the vr environment, the added hand model is set active. otherwise, the 3d hand model is set inactive. this solution can be extended to handle the occlusion of both hands by creating another glove which suits for the right hand. fig. 8 connectivity of software and hardware components 7 h. n. kegalle#1, h. s. u. liyanage2, k. l. jayaratne3, m. i. e. wickramasinghe4 february 2021 international journal on advances in ict for emerging regions h. component connectivity a collection of hardware devices was used in the implementation of the vr application. namely, htc vive (hmd, hand controller, base stations), leap motion controller, full-bridge load cells, arduino uno board, mechanical manikin, teensy development board, photodiodes. all the devices are connected to a pc that contains the software drivers. the relationships among the components are given in fig. 8. v. evaluation and results a. acceptance of the vr application a user-based evaluation was conducted to determine the acceptability of the proposed vr application from the perspective of potential users. two user groups were selected for the evaluation based on their experience in performing cpr. one group consists of assistant nurses from the sri lanka navy and the sri lanka fire brigade who were proficient in cpr. the other group was novices, students, and research staff from the university of colombo school of computing (ucsc). the number of participants in expert and novice groups is 23 and 51 respectively. the dissimilarity of these group size is due to the difficulty of finding cpr trained participants. in the expert group, none of the participants had experience using vr. 
participants of the novice group were aware of the technology, but they had little or no experience in using vr. therefore, it can be said that both groups had a similar experience in the case of using vr. to remove the biases that may arise due to the exposure of one method before exposing to the other, the novice group was again divided into two subgroups (n1 and n2). one group was given the vr solution before the manikin, while the other group was given with manikin before using the vr solution. but the expert group was not divided into two subgroups, because all the participant in that group already have the experience on training cpr with the mechanical manikin. therefore, the expert group was directly exposed to the vr solution in order to collect responses. after carrying out cpr training with both methods for the two subgroups n1 and n2 of the novice users, feedback was collected with a survey of likert scale questions. the points of the likert scale arrangement can be seen in table i. table i likert scale arrangement 1 strongly disagree 2 disagree 3 somewhat disagree 4 neither agree or disagree 5 somewhat agree 6 agree 7 strongly agree the 12 questions given for the users to collect feedback is given in table ii. table iii questions used for the evaluation of vr application questions q1 it was difficult to wear the headset q2 it has been difficult to perform chest compressions using the modified manikin q3 it has been difficult to navigate in the virtual space q4 the information panel provided useful information q5 i felt the patient was really in front of me q6 i felt my hands were aligned with the virtual ones q7 the visualization of the chest compressions felt real q8 i could feel i was immersed in a 3d space q9 it was difficult to reach out and touch the patient q10 the interaction with the patient felt realistic q11 the traditional manikin can be improved using virtual reality q12 i would like to use virtual reality in my cpr training fig. 9 results of the user-based evaluation for novice user subgroups comparison of group n1 and n2 is used to detect whether there is any bias in using the vr application for cpr before the mechanical manikin and vice versa. according to figure 9, it can be seen that there are no large distinctions between two groups on each of the statement, while slight dissimilarities can be seen regarding q1, q2 and q12. a point to be discussed from the analysis is the spread of opinion for q1; “it was difficult to wear the headset”. spread of the median values obtained for n1 (=2) and n2 (=3) might be because some of the novice users found it difficult to put the hmd and adjust it for comfortable use. the discrepancy in the opinion for q2 and q12 can be explained with a few extreme cases of responses compared to the general response in the ideas of the difficulty in performing chest compression with the modified manikin and the preference of using vr in cpr training. thus, it can be assumed that the two user groups (n1 and n2) can be combined and create a homogeneous group since the opinions on most of the statements are generally in the same direction. it can be also stated that the order of the method being exposed generates a little to no bias based on the observations made. an affordable, virtual reality based training application for cardiopulmonary resuscitation 8 international journal on advances in ict for emerging regions february 2021 fig. 
10 results of the main user-based evaluation for novice and expert user groups the comparison between novice (n) and expert (e) user groups are presented in figure 10. according to the information, a wide disagreement between the user groups can be seen in the opinion for the statement in q9 which depicts the difficulty of reaching and touching the patient. the difference in the responses of two user groups, novices (median=2) disagreed while the experts agreed (median=5) upon q9 might be due to the difficulty of expert users to keep hands in the field view of lmc. since the novice users were technically proficient compared to the experts, novice group had not experienced the trouble of managing hand postures to reach out the patient. the difference in the spread of opinion between n (mode=6) and e (mode=3), for the statement in q4 is because of the experience gap between two user groups. most of the expert users claimed that they would prefer more advanced details to be displayed on the information panel while the novices, who had no experience in cpr, wish to have any information that will be useful for the training. when considering the overall feedback, there are slight differences of the responses received upon several expressions when comparing the novice users and the experts. but still, the responses obtained for q11 and q12 illustrates the participants of both groups have agreed on using vr for cpr training with the improved manikin. the proficiency gained by an individual who is inexperienced in performing chest compressions cannot be measured by conducting an evaluation once, given that the expertise level improves gradually with the time. therefore, it is proposed to conduct an evaluation with several iterations with the same set of users. however, the proposed approach of integrating vr into cpr as an affordable training application extends new research paths. b. acceptance of the glove-based solution to examine the acceptance of the solution provided for the hand occlusion problem, quantitative evaluation, and qualitative evaluation was conducted. the objective of the quantitative evaluation is to analyze the position accuracy of the photodiode circuits. a user-based evaluation was carried out to fulfill the goal of determining the attitudes of users towards the proposed solution. the accuracy of position tracking was inspected using a cartesian grid of size 0.4m x 0.4m with grid lined spaced by 10cm. the implemented circuit was placed on different grid points and their 3d coordinates were collected. according to the results, the average position errors are as follows: • average position error in x direction = 0.01409m • average position error in y direction = 0.00402m • average position error in z direction = 0.00431m considering the low error rates for the positions of x, y, z directions, it can be said that the majority of the position data given by the system is accurate. but essentially, for the circuits to be tracked, photodiodes must be exposed to the base stations. the evaluation carried out to explore the user attitudes towards the glove-based solution was conducted with the participation of 20 students from the university of colombo school of computing. the participants were divided into two groups (g1 and g2) to eliminate any bias that may occur due to the order of the method that exposes. the users in group one was submitted through the vr solution of detecting the hands only using the lmc and next given the glove-based approach. 
for the users in group two, the glove-based solution was given before the method of lmc. after experiencing both solutions, the participants were given a survey containing 9 likert scale questions to collect the feedback. ordering of the likert scale is similar as seen in table i. the questions provided in the questionnaire can be found in table iii. table iii questions used for the evaluation of the glove-based solution questions q1 it was difficult to wear the glove with the circuits q2 it has been difficult to perform chest compressions with the glove q3 i felt my hands were aligned with the virtual ones q4 i experienced latency in hand movements when com pressing the chest of the patient q5 it was difficult to reach and touch the patient q6 i felt that the occluding hand gets disappear when crossing one another q7 the overall application was up to the required realism level q8 the glove has improved the realistic level of the application q9 i would like to use hand detection glove in the vr based cpr application. fig. 11 results of the user-based evaluation for glove acceptability the comparison between g1 and g2 can be seen in figure 11. the most noticeable difference can be observed in the opinions gained for the statement q6. the spread of opinion for q6 among g1 (iqr=1) and g2 (iqr=2.75) can be explained by some of the users marking in 9 h. n. kegalle#1, h. s. u. liyanage2, k. l. jayaratne3, m. i. e. wickramasinghe4 february 2021 international journal on advances in ict for emerging regions agreeing that the occluding hand gets disappears while the majority disagree on the statement. when considering the mode values obtained for the groups g1 (= 1) and g2 (= 2), both are aligned with the direction of agreeing. the opinion for q8 as the spread among g1 (iqr=1.75) is larger compared to g2 (iqr=0.75), as some participants have marked disagreeing with the statement “the glove improved the realistic level of the application”. but both groups can be observed with similar values for mode (= 6) and median (= 6). additionally, there are some other distinctions of the opinions of two user groups according to the graph in figure 11. considering the overall results, responses given for q8 and q9 are leaned towards the agreed end which infers that the users have agreed to the idea of incorporating a hand detection glove for the vr based cpr training application to improve the realistic level. for the comparison of g1 and g2, fisher’s exact test has been used as it compares proportions and thus gives a comparison of the distribution of answers among the groups. in the test, the null hypothesis (h0) for each statement (q1q9) was that both groups make no significant difference in the agreement for that statement. and the alternative hypothesis (h1) was that there is a significant difference in the agreement for the given statement, while the statistical significance is 5%. as seen in table iv, in each statement, p-value is greater than 0.05. thus, we cannot reject the null hypothesis which explains that there is no significant difference when a group either uses the only lmc for hand tracking first or the glovebased approach first in the evaluation. therefore, it can be state that the order of hand tracking methodology exposed by the users do not make any vital impact on the evaluation. table iv p-values for fisher exact test run on statements among user groups g1 and g2 statement p-value q1 0.8483 q2 0.7731 q3 0.6356 q4 0.5383 q5 0.6843 q6 0.0604 q7 0.1 q8 0.1125 q9 0.1125 vi. 
conclusion and future work the two main contributions of this study have been the development of a vr application as an affordable solution for cpr training and the implementation of a wearable glove to track the occluding hand while performing chest compressions. carrying out implementation using off-theshelf hardware has been considered as a way of cost reduction [23], being technology becomes widely available. the opportunity of using free development software or at least the software that has a free community version cut down the licensing fees that can be also contributed to the costs. because of this utilization of customized, cost effective setup, the proposed application is affordable to the medical sector of developing countries such as sri lanka. the vr application proposed in this paper has been evaluated with two groups of users categorized by the level of proficiency in cpr. both groups have responded favorably towards integrating vr into cpr training. the attitude of expert users expresses that they see a benefit of using vr for cpr training over the use of mechanical manikin. the positive feedback of the novice users inferred that even the users with no prior experience can use this application to gain cpr training. the position accuracy of the glove has been assessed by comparing the 3d coordinates streamed out from the circuits with the real-world coordinates. the average position error in x, y, z directions has concluded that the coordinates given by the system are accurate. to figure out the user acceptability of the glove, feedback of the users has been collected on the experience of using the suggested occluded hand tracking method. the majority of the participants have admitted that integrating hand detection glove, increases the realistic level of the application. to improve the realistic experience and the ability of acquiring accurate training, while making the training application more affordable, refinements must be done on the current version of the development. for future work, replacing the mechanical manikin will be the main task, as it removes the reliance on the manikin in producing haptic and tactile feedback. manipulating the avatar to present signs of recovery is another task to be conducted. a suitable solution for the problem of latency in vertical hand movements also must be explored. possibly by improving the sensor (photodiode) accuracy or some other method that can handle the latency of hand movements while not disturbing the user’s experience of the simulation. a longitudinal study must be conducted in order to evaluate the output of the entire training application. the improvements gained by the trainers in the process of performing chest compressions has to be measured over a time period since acquiring the proficiency in performing cpr is an incremental process. the procedures such as endotracheal intubation and defibrillation can also be integrated into the solution. intubation and defibrillation are also skills that can be added as these are essential for target users, which is emergency medical personnel. references [1] c. a. bon, “cardiopulmonary resuscitation (cpr).” [online]. available: https://emedicine.medscape.com/article/1344081-overview [2] j. soar, j. p. nolan, b. w.bottiger, k. sunde, and c. d.deakin, “european resuscitation council guidelines for resuscitation 2015.” [online]. available: https://www.resuscitationjournal.com/article/s0300-9572(15)00328-7 [3] a. kotranza and b. 
lok, “virtual human + tangible interface = mixed reality human an initial exploration with a virtual breast exam patient,” ieee virtual reality conference, pp. 99–106, 2008. [4] y.tian, s. raghuraman, y. yand, x. guo, and b.prabhakaran, “3d immersive cardiopulmonary resuscitation (cpr) trainer,” proceedings of the 22nd acm international conference on multimedia, p. 749–750, november 2014. [5] p. khanal, a. vankipuram, a. ashby, m. vankipuram, a. gupta, d. drumm-gurnee, k. josey, l. tinker, and m. smith, “collaborative virtual reality based advanced cardiac life support training simulator using virtual reality principles,” journal of biomedical informatics, vol. 51, pp. 49 – 59, 2014. [online]. available: http://www.sciencedirect.com/science/article/pii/s1532046414000902 an affordable, virtual reality based training application for cardiopulmonary resuscitation 10 international journal on advances in ict for emerging regions february 2021 [6] s. u. liyanage, l. jayaratne, m. wickramasinghe and a. munasinghe, "towards an affordable virtual reality solution for cardiopulmonary resuscitation training," 2019 ieee conference on virtual reality and 3d user interfaces (vr), osaka, japan, 2019, pp. 1054-1055, doi: 10.1109/vr.2019.8798159. [7] s. liyanage. (2018, july 25). an affordable solution for cardiopulmonary resuscitation training using virtual reality [video]. youtube. https://www.youtube.com/watch?v=hyy8azznbzk [8] d. t. nguyen, w. li, and p. o. ogunbona, “inter-occlusion reasoning for human detection based on variational mean field,” neurocomputing, vol. 110, pp. 51 – 61, 2013. [online]. available: http://www.sciencedirect.com/science/article/pii/s0925231213000039 [9] t. l. baldi, s. scheggi, l. meli, m. mohammadi, and d. prattichizzo, “gesto: a glove for enhanced sensing and touching based on inertial and magnetic sensors for hand tracking and cutaneous feedback,” ieee transactions on human-machine systems, vol. 47, no. 6, pp. 1066–1076, 2017. [10] a. clark and d. moodley, “a system for a hand gesture-manipulated virtual reality environment,” in proceedings of the annual conference of the south african institute of computer scientists and information technologists, ser. saicsit ’16. new york, ny, usa: association for computing machinery, 2016. [online]. available: https://doi.org/10.1145/2987491.2987511 [11] f. mueller, d. mehta, o. sotnychenko, s. sridhar, d. casas, and c. theobalt, “real-time hand tracking under occlusion from an egocentric rgb-d sensor,” corr, vol. abs/1704.02201, 2017. [online]. available: http://arxiv.org/abs/1704.02201 [12] n. gavish, t. gutierrez,´ s. webel, j. rodr´ıguez, m. peveri, u. bockholt, and f. tecchia, “evaluating virtual reality and augmented reality training for industrial maintenance and assembly tasks,” interactive learning environments, vol. 23, no. 6, pp. 778–798, 2015. [online]. available: https://doi.org/10.1080/10494820.2013.815221 [13] s. pramanik and m. mannivanan, “immersive virtual reality based cpr training system,” pp. 463–464, 2015. [14] f. semararo, a. frisoli, m. bergamasco, and e. l. cerchiari, ”virtual reality enhanced mannequin (vrem) that is well received by resuscitation experts,” resuscitation, vol. 80, no. 4, pp. 489-492, 2009. [online]. available: http://www.sciencedirect.com/science/article/pii/s0300957209000136 [15] s. yamamoto, k. funahashi, and y. 
iwahori, “a study for vision based data glove considering hidden fingertip with self-occlusion,” in 2012 13th acis international conference on software engineering, artificial intelligence, networking and parallel/distributed computing, aug 2012, pp. 315–320. [16] t. l. baldi, s. scheggi, l. meli, m. mohammadi, and d. prattichizzo, “gesto: a glove for enhanced sensing and touching based on inertial and magnetic sensors for hand tracking and cutaneous feedback,” ieee transactions on human-machine systems, vol. 47, no. 6, pp. 1066–1076, dec 2017. [17] y. yang, d. weng, d. li, and h. xun, “an improved method of pose estimation for lighthouse base station extension,” vol. 17, p. 2411, 10 2017. [18] d. r. quinones,˜ g. lopes, d. kim, c. honnet, d. moratal, and a. kampff, “hive tracker: a tiny, low-cost, and scalable device for submillimetric 3d positioning,” in proceedings of the 9th augmented human international conference, ser. ah ’18. new york, ny, usa: acm, 2018, pp. 9:1–9:8. [online]. available: http://doi.acm.org/10.1145/3174910.3174935 [19] m. s. ibrahim, a. a. badr, m. r. abdallah, and i. f. eissa, “bounding box object localization based on image superpixelization,” procedia computer science, vol. 13, pp. 108-119, 2012, proceedings of the international neural network society winter conference (inns-wc2012). [online]. available: http://www.sciencedirect.com/science/article/pii/s1877050912007260 [20] “an arduino library to interface the avia semiconductor hx711 24-bit analog-to-digital converter (adc) for weight scales,” 2017. [online]. available: https://github.com/bogde/hx711 [21] s. liyanage. (2019, march 17). an affordable virtual reality solution for cardiopulmonary resuscitation training [video]. youtube. https://www.youtube.com/watch?v=t0r7cdrlpy4 [22] “vive-diy-position-sensor,” 2017. [online]. available: https://github.com/ashtuchkin/vive-diy-position-sensor [23] “htc vive reduces price by $200, making the best virtual reality system more accessible to the mass market,,” 2017. [online]. available: https://www.vive.com/us/newsroom/2017-08-21/ ieee paper template in a4 (v1) international journal on advances in ict for emerging regions 2011 04 (02) :39 49 et guide assisting the etransformation journey: a case study approach ana hol, and athula ginige  abstract — etransformation, the ability to select and implement appropriate information and communication technologies (ict) based on company’s goals, objectives and operations is essential for the businesses today. this paper identifies how an online system etranformation guide (et guide) allows companies especially small to medium enterprises (smes) to track, measure and guide their etransformation journey. in particular this paper outlines how three companies from different industry sectors namely: manufacturing – staircase design, service – finance and accounting services and tourism and hospitality – boat cruise have travelled the etransformation journey over the period of last year. furthermore, the study highlights that et guide recommendations are important indicators to the companies in terms of identifying into which dimension they should be investing into next and what changes they should be making for the future benefits of their etransformation journey. moreover, the findings also benefit the research community that is investigating how best companies can undertake the etransformation journey. keywords — etransformation, et guide, ict implementation, ict selection, smes i. 
introduction oday, we realise that to survive in a globally competitive environment businesses are required to identify needs of the customers and select appropriate tools and technologies to help them remain competitive. the requirement of businesses to adapt to the changing needs is not only the effect of the changes information era brought but evolutionary effect which has been evident for centuries. for example, in order to survive in the agricultural era it was essential to make the appropriate use of the field, structure working times during the days and distribute jobs so that they can be completed throughout the year. industrial era brought changes with the introductions of the machines, electricity and conveyer belt. it has also changed the field working patterns. today, we see the requirement for change too but in a different way. it is now becoming more and manuscript received december 08, 2010. recommended by prof. maria lee on november 20, 2011. ana hol is with aeims research group, school of computing and mathematics university of western sydney, sydney, australia. (e-mail: ana hol is with aeims research group, school of computing and mathematics university of western sydney, sydney, australia. (e-mail: a.hol@uws.edu.au) athula ginige also with aeims research group, school of computing and mathematics university of western sydney, sydney, australia..(e-mail: a.ginige@uws.edu.au) more apparent that successful businesses require implementations of appropriate information and communication technologies (ict) which in turn may require changes to business operations and even sometimes business strategies to assure ict can bring appropriate benefits. throughout the research that has been conducted up to now we observe the process of change through which companies make modifications to their business processes, operations and structures so that they can implement appropriate technologies. this process has been named etransformation [1]. therefore, to survive in the continuously changing environment businesses, particularly small to medium (smes) organisations are required to actively respond to the changes surrounding them, which in turn may require changes and selections and implementations of appropriate ict. the selection and implementation of ict, however is not straight forward as it depends on a number of factors. previous studies indicate that there are a number of models of organisational change and etransformation, however that none of the available models explains the process in full [2]. the model that only explains it tools and systems and the process through which a company makes staged improvements within this dimension is depicted within the etransformation road map [3]. fig. 1. etransformation road map [3] new processes new processes e-commerce site e-commerce site interactive site interactive site basic website basic website effective organisation effective organisation effective team effective team effective individual effective individual p r o c e s s s o p h is t ic a t io n e x t e r n a l p r o c e s s e s i n t e r n a l p r o c e s s e s external processesexternal processes internal processesinternal processes convergenceconvergence stage 1 stage 2 stage 3 stage 4 etransformation road map t 40 ana hol and athula ginige international journal on advances in ict for emerging regions 04 december 2011 the road map indicates that selection and implementation of required it tools and systems is a staged process. 
the road map however, does not explain the effects itc adoption, selection and implementation have on the other areas within the organisation. previous studies also indicate that etransformation is a staged, multidimensional process [4-14]. however, dimension seen as crucial for etransformation is not only the dimension of it tools and systems but also dimensions of strategy, structure and tasks and processes [2].therefore, to implement appropriate technology it is essential to first identify company’s strategy – goals, objectives and vision. the understanding of what the company would like to achieve and what its plans are for the future, are likely indicators of what types of the technologies would be the most optimal for the company’s successful operations. next, it is essential to identify the type of organisational structure that would fit the organisation and its operations the best. this should take into the account decision making processes, departments, flexibility, company location and extent to which bureaucracy may play a role in company’s leadership. after strategy and structure have been identified it is crucial to define the most optimal tasks and processes that will allow the company to achieve its objectives. in addition, table i et dimensions and categories [2] category stage 1 stage 2 stage 3 stage 4 s tr a te g y the environment 1.1 smes competitors awareness 1.2 competitors – products & services 1.3 matching competition 1.4 be better then competition plans & visions 2.1 meet essential deadlines 2.2 meet all deadlines 2.3 create improvements 2.4 vision for the future customers 3.1 smes customers awareness 3.2 customer requirements 3.3 smes marketing 3.4 smes learn from systems products & services 4.1 standards & certifications 4.2 marketing strategies 4.3 support & guarantee for customers 4.4 new improved products & services employees 5.1 knowledge requirements 5.2 education and new ideas 5.3 employees & future 5.4 innovation goals 6.1 smes goals 6.2 reality vs goals 6.3 strategy vs goals 6.4 skills & resources vs goals s tr u c tu r e centralisation / decentralisation 1.1 decision – ceo 1.2 decision managing director 1.3 decision some employees 1.4 decisionwhole organisation functions / divisions 2.1 operations fixed 2.2 diversification present 2.3 operations can be changed if needed 2.4 smes adaptable to new circumstances formalisation 3.1 business functions / operations 3.2 smes focus 3.3 global needs 3.4 new ideas, innovation and future t a sk s a n d p r o c e ss e s nature of tasks 1.1streamlining 1.2 automation 1.3 creation of new tasks 1.4 reassessment of existing tasks – fit from tasks to processes 2.1 removal of repetitive tasks 2.2 improvement in operations 2.3 improvement in profitability 2.4 reassessment of existing processes – fit task & process streamlining 3.1 avoid change 3.2 change when essential 3.3 change for benefits 3.4 change for innovation task & process integrations 4.1 activities are independent 4.2 activities are grouped – tasks 4.3 activities form processes 4.4activities across processes are integrated it t o o ls a n d s y st e m s it tools 1.1 stand alone tools 1.2 networked – sections, whole integration not present 1.3 networked-partially integrated 1.4 enterprise wide network – fully integrated tool users 2.1 few employees 2.2 all departments not all employees 2.3 all employees 2.4 all employees & some stakeholders internet 3.1 searching 3.2 customer contacts 3.3 advertising 3.4 external & business contacts website 4.1 static 4.2 
interactive 4.3 ecommerce 4.4 convergence it support 5.1 limited support (internally or externally) 5.2 some support – usually ongoing 5.3 basic it department 5.4it department fully operational it systems 6.1 office management – file management 6.2 operational systems – crm, tps 6.3 kms, dss – emerging 6.4 erp, dss, ess security 7.1 antivirus and antispyware software 7.2 user access rights, authorisation and authentication, proxies and firewalls 7.3 network traffic encryption (ssl, tls) 7.4 system monitoring (intrusion detection, full system disaster recovery plan) ana hol and athula ginige 41 december 2011 international journal on advances in ict for emerging regions 04 when identifying tasks and processes it is essential to review if some processes can be automated or streamlined to assure optimal operations. only after these three dimensions have been identified the company is ready to review its information technology (it) tools and systems (detailed explanation of each dimension is presented in table 1.). therefore, as depicted in figure 2 a company should always review dimensions in a set order as only this way it is possible to assure a set strategy will be met and that the associated structure, tasks and processes have been carefully outlined so that the technology implementation can bring positive outcomes. for the companies to be able to do the comprehensive analysis of the four dimensions namely: strategy, structure, tasks and processes and it tools and systems they are usually required to invest into consulting services which are extremely expensive and hardly affordable by small organisations as they can rarely afford to take time off their day to day duties. therefore, to allow the companies to gain the benefits of the comprehensive study requirements and to be able to screen the dimensions (strategy, structure, tasks and processes and it tools and systems) themselves as well as carry out the etransformation analysis aeims (advance enterprise information management systems) research group of the university of western sydney has developed an online etransformation (et guide) that companies particularly small to medium enterprises (smes) can use to track, measure and guide their etransformation journey [13]. the et guide aims to enable etransorming businesses and to allow them to get a better understanding of their organisations. through series of questions an online system is able to identify current business strategy and suggest required changes to it as well as it is able to assess business structure, tasks and processes as well as give recommendations for the future developments. in addition, the power of the system is in its functionality and time stamps. it is possible for the organisation to use the system over a period of time, reflect to its current dimensional stages and identify future progress. it is also possible to do longitudinal analysis with the data gathered so that managers can get a better perspective of the organisational growth. furthermore, managers and their associates can use the system at the times of their convenience. the system can be accessed from anywhere with the internet connection and primary user can allocate unlimited number of additional users who can also then monitor companies progress. furthermore, all question answers for the dimensions are recorded in real time, so users can always go back to their previous answers and are at any time allowed to continue previously commenced surveys. 
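as an illustration of the kind of record keeping this functionality implies, the sketch below models time-stamped answers recorded per dimension, category and stage, from which partially completed surveys can be resumed and longitudinal snapshots compared. it is written in c# only to match the other sketch in this document; the paper does not describe the et guide's internal implementation, and all names here (EtGuideSurveyStore, Answer, SnapshotAt) are hypothetical.

using System;
using System.Collections.Generic;
using System.Linq;

// hypothetical storage model for et guide answers: each answer is kept against a
// dimension, category and stage (1-4) together with a time stamp and the user who
// entered it, so earlier surveys can be revisited and progress compared over time.
enum Dimension { Strategy, Structure, TasksAndProcesses, ItToolsAndSystems }

record Answer(Dimension Dimension, string Category, int Stage, DateTime RecordedAt, string UserId);

class EtGuideSurveyStore
{
    readonly List<Answer> answers = new();

    // answers are recorded in real time as the questions are completed
    public void Record(Dimension dimension, string category, int stage, string userId) =>
        answers.Add(new Answer(dimension, category, stage, DateTime.UtcNow, userId));

    // highest stage reached in each dimension up to a given date - the kind of snapshot
    // a manager would compare across visits to track the etransformation journey
    public Dictionary<Dimension, int> SnapshotAt(DateTime when) =>
        answers.Where(a => a.RecordedAt <= when)
               .GroupBy(a => a.Dimension)
               .ToDictionary(g => g.Key, g => g.Max(a => a.Stage));
}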
taking into account that etransformation is hugely important for businesses today, this paper aims to demonstrate how three smes approached etransformation and used the et guide to make decisions about their future business changes, with the aim of reaching their set goals and remaining competitive in the turbulent markets of the information era.

fig. 2. iterative model of etransformation [2]

to conduct the study and demonstrate the use and applicability of the et guide we selected three case study companies: one from the manufacturing sector (staircase design), one from the service sector (finance and accounting services) and one from the tourism and hospitality sector (a boat cruise company). all three companies are smes (small to medium enterprises) from the metropolitan sydney region, australia. to ensure the companies' data stays confidential, company names have been changed. initially the three companies were interviewed to get a holistic understanding of what their operations were and to what extent they use technology. data from the interviews is summarised below.

a. staircase design

staircase design is a manufacturing sector company. it has 54 employees and has been in business for 22 years. its main operations focus on designing staircases for private houses or, occasionally, small businesses. their primary clients are from the surrounding suburbs, usually recommended by builders, carpenters or other tradesmen. main decisions within the company are made by the ceo in consultation with the other two senior staff members. there are five employees in total who work in the office, mostly conducting admin tasks, marketing, finance and stock counting. these employees are also in contact with the clients. measurements, material picking and selections for the customers are done by two staff who also work in the office and four other staff members who occasionally assist on the shop floor depending on how busy the company is. the rest of the employees mostly work on the shop floor, although in some instances they may be required to fill in on other jobs such as deliveries. the company has a computer on the shop floor, and this computer keeps track of employee working hours as well as the material required for the day. the other four computers are in the office. all computers are networked and data is mostly transferable from one to the other. the computer with the specialised staircase software, however, is only accessible by the ceo, who is the main designer of the products. finalised drawings are accessible as pdf files through shared networks. the company uses portable storage media for data storage and backup. the company website is up to date and interactive, with an inquiry form and catalogues. to complete its daily tasks the company uses e-mail, fax and phone for basic communication with customers, which includes inquiries, requirements gathering, quotes and business agreements. the company uses the myob application for financial record keeping; outlook for customer data keeping and communication; pdf files for record keeping; excel spreadsheets for production and order tracking; and a specialised staircase package for drawings and quoting.
fig. 3. ict development for staircase design

to get an understanding of the ict development of the companies we used the etransformation road map to depict the stage of each organisation's it tools and systems development. from figure 3 it can be seen from our initial investigation that the company has an interactive site, signified by online catalogues, and that its internal systems lie somewhere between effective individuals and effective teams, as only a certain group of employees share data, and they usually do so by exchanging data on a portable storage device. internal systems within the company are networked, which includes computers and printers. in addition, the computer located on the shop floor is also accessible via the office computers.

b. finance and accounting services

finance and accounting services is a service based, family run business with three employees – father, mother and their son. the company has been in business for seven years. decisions are made equally by all three employees. the company systems are networked across the employees' homes (two homes – the parents' home and the son's) as well as the office, which is located near the parents' home. the company has an interactive website listing sample products and featuring an inquiry form. it was designed by the father, the son and a family friend. currently the son is looking after the site and keeping it up to date. to communicate with its customers the company uses e-mail, fax and phone. initial inquiries are usually made via phone or e-mail. bills and statements are usually exchanged via fax or occasionally via e-mail. data collection and finance approvals are only done through face to face interviews, which may happen in the office or at the client's location. for approvals to be granted, staff members are required to fill out forms and submit the required data for each client. data for each approving institution needs to be in a slightly different format, which makes it difficult to standardise internal company documents. files for the customers are kept as pdfs and are forwarded to the approving organisations as required. the company uses the myob application for financial record keeping; outlook for customer data keeping; specialised online bank forms for data collection and finance approvals; and pdf files for past records storage.

fig. 4. ict development for finance and accounting services

based on the data within figure 4 it can be seen that the company has an interactive website with forms. it can also be seen that the company operates as an effective team, as all its locations are networked and data is easily accessible. electronic exchange of documents is very apparent internally within the company, however not outside it.
when communicating with the approving organisations the company is required first to gather all the data required for the customers and then fill out the forms needed by the financial institutions. there is no exchange of data between institutions.

c. cruise company

the cruise company is in the tourism and hospitality sector. the company has 20 employees and has been in business for 34 years. primary decisions are made by the family owners – mother, father and two children – who are all full time employees. there are six employees in the office and the rest work in the field or on the boat to ensure each cruise runs smoothly. it is also essential to always have employees at the docks when a boat is due to arrive, and to ensure the boats are cleaned, fuelled and equipped with safety equipment as well as the equipment required for the particular occasion before each trip. the company uses e-mail, fax and phone for communication with customers and contractors, which include caterers and entertainers. moreover, the company uses the myob application for financial record keeping; outlook for customer data keeping; excel spreadsheets for rostering and events planning; and word for typing documents. in addition, the company has three networked computers, of which one is used for payments processing and is mostly kept offline, as the owners feel that this way their confidential data will be more secure. this data is also kept on the owner's flash drive. no other backup procedures are in place. only the primary decision makers are given access to electronic data. all other employees receive printed sheets each morning, which give them an overview of the tasks that need to be completed and the tasks they are responsible for. electronic data is never shared. contractors may receive e-mails to arrange meetings and finalise bills, however their schedules and activity lists are always done on paper. in addition, the company advertises its services via a simple brochure ware website and pamphlets that are given to the local accommodation places (motels and hotels) in the area.

fig. 5. ict development for the cruise company

based on the data gathered it can be seen from the figure above that the cruise company has a very basic brochure ware website and that it uses it tools only to complete individual tasks. even the basic networking is not continuous, as it is constantly interrupted due to a false perception of security. electronic documents are never shared and only two copies of them are kept. all day to day activities are completed with paper sheets. following the review of the interview data it was seen as essential that companies such as the three described above etransform. this, however, was not easy, as companies are often reluctant to change and are unwilling to experiment while conducting their main business operations.
therefore, the researchers identified that in order for these companies to review their current operations they would be given access to the etransformation guide, and their status would also be followed up with an interview.

ii. use and application of the et guide

once the three above mentioned case study companies had been given access to the et guide, each company's ceo or general manager had online access to the system, which they could reach from any location with internet access using a dedicated user name and password, and could use to track, measure and guide their etransformation journey. the system, the et guide (see figures 7 and 8), is composed of a series of questions. the questions relate to the four dimensions, namely strategy, structure, tasks and processes, and it tools and systems, as well as their categories shown in figure 6.

fig. 6. etransformation dimensions and their categories

as each dimension is composed of a set of categories, a company is required to answer questions for each category within each dimension (for example, there are seven categories within it tools and systems – fig. 6 and table i). moreover, it is worth noting that etransformation is seen as an incremental staged journey [14] and therefore the questions for each category within a dimension are split across four developmental stages. for example, questions about the website (category) within the it tools and systems dimension are, within the first stage of development, about static websites, then within stage two about interactive websites, within stage three about ecommerce sites and within stage four about convergence and system integrations. after the questions for all categories within a dimension have been answered, a company can review its developmental stage within that dimension and identify its abilities – what it can do – and its recommendations – what it should improve in. furthermore, once the company has answered all of the set questions for each dimension, it can see which dimension it needs to invest in next, based on the etransformation position, which is calculated from the questions answered and the developmental stages the company has reached within each of the etransformation dimensions. furthermore, the results given by the et guide allow the company to measure, track and guide its etransformation journey.

fig. 7. et guide [2]

fig. 8. et guide [2]

with the system, etransformation can be measured using the et position, which identifies the percentage completion of each of the dimensions across each of the four developmental stages. furthermore, the et position can indicate where the next developments should happen, as progress through the stages is incremental and should always happen first within strategy, then structure, followed by tasks and processes and finally it tools and systems. moreover, etransformation can be tracked using the et history (a sample history is provided in table ii), which can only be analysed if a company has been using the system for a prolonged period of time.
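the et position calculation described above (percentage completion of each dimension across the four developmental stages, and the suggestion of where to invest next) can be illustrated with the following minimal sketch. the data layout, function names and example figures are hypothetical and are not taken from the et guide implementation; they only mirror the rule that progress should move through strategy, structure, tasks and processes and it tools and systems in that order.

```python
# hypothetical sketch of the et position idea: percentage completion of each
# dimension across the four developmental stages, plus the next dimension to
# invest in (strategy -> structure -> tasks & processes -> it tools & systems).
# not the actual et guide code.

DIMENSION_ORDER = ["strategy", "structure", "tasks & processes", "it tools & systems"]

def et_position(answers):
    """answers[dimension][stage] = (questions completed, total questions)."""
    position = {}
    for dimension, stages in answers.items():
        position[dimension] = [
            round(100 * done / total) if total else 0
            for done, total in stages
        ]
    return position

def next_investment(position, threshold=100):
    """walk the stages in order; within a stage, walk the dimensions in the
    set order and return the first dimension that is not yet fully complete."""
    for stage in range(4):
        for dimension in DIMENSION_ORDER:
            if position.get(dimension, [0, 0, 0, 0])[stage] < threshold:
                return dimension, stage + 1
    return None

# example with figures similar to the staircase design case below
answers = {
    "strategy": [(6, 6), (5, 6), (4, 6), (3, 6)],
    "structure": [(3, 3), (3, 3), (2, 3), (1, 3)],
    "tasks & processes": [(2, 4), (0, 4), (0, 4), (0, 4)],
    "it tools & systems": [(5, 7), (3, 7), (1, 7), (1, 7)],
}
pos = et_position(answers)
print(pos["tasks & processes"])   # [50, 0, 0, 0]
print(next_investment(pos))       # ('tasks & processes', 1)
```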
furthermore, the results generated by the et guide allow the company to identify the progress it has made so far through the et report, which provides a detailed summary of the company's achievements as well as outlining recommendations identifying the etransformation changes and investments the company should undertake next within each of the four etransformation dimensions. the report can be generated for one dimension or for a number of specified dimensions (i.e. structure and it tools and systems, or all four). specific data generated through the use of the et guide for the three studied cases is presented in the next section.

table ii. sample et history [2]

a. staircase design

based on the data records within the system the following can be identified for the staircase design company:

et position

table iii. et position: staircase design
dimension | stage 1 | stage 2 | stage 3 | stage 4
strategy | 100% | 83% | 66% | 50%
structure | 100% | 100% | 66% | 33%
tasks & processes | 50% | 0% | 0% | 0%
it tools & systems | 71% | 42% | 14% | 14%

from table iii above it can be noted that strategy and structure in stage 1 have been completed. to be able to develop further within stage 1 the company will need to invest in tasks and processes, as this dimension has only been completed to 50% and is not at all developed within stage 2. this also indicates that further improvements within it tools and systems strongly depend on a careful identification and assessment of the tasks and processes dimension. furthermore, based on the et report, which outlines future recommendations for the company, it can be seen that staircase design should first of all review its tasks and processes dimension and identify the changes that need to happen next. in addition, it is important to take into account that the organisation is using old software and is predominantly relying on past knowledge and on a cumbersome management of production, handling it with excel spreadsheets. to be able to advance, the company should identify how best their processes can be streamlined. furthermore, the company states that currently they are not getting as many jobs as they used to. in order to get back into the market they should analyse their current marketing strategies and identify how these should be changed. the company could also explore possibilities of collaborating, for example with builders, carpenters and tilers, so that they can get more jobs. it is also worth noting that even now most of their jobs come as a result of a recommendation from one of the above. making these relationships more formal would in turn open up more active job seeking possibilities. such collaborations could also be formalised through common marketing strategies, or even in some instances machine and equipment sharing or bulk ordering of commonly required raw materials. moreover, so that the company can easily adapt to the required change, they should assess their workflow operations as well as their day to day operations to ensure that the selected tasks and processes are those most suitable for reaching the company's goals. day to day operations, where the majority of tasks are recorded in spreadsheets and where designs are done only by the ceo, are very difficult to manage and at times hardly managed at all. this also means that activities take time to be distributed, as tasks wait for the office clerks.
furthermore, to make appropriate investments in the it tools and systems dimension the company should closely review its stage 2 strategy. this may involve a review of the current products and services, including possible diversification or collaboration with business partners within or outside the company's industry sector. in addition, this would help the company strengthen its goals and ensure that it can gain a competitive advantage. trying only to find jobs where staircases are to be designed for personal homes may not be optimal when the organisation has the capacity to design staircases for businesses as well. job diversification and marketing changes may need to be looked into. only after strategy, structure and tasks and processes have been reviewed for stages 1 and 2 can the company review the it tools and systems dimension. based on the recommendations it seems that staircase design requires someone who will be able to look after their ict, as currently the company does not have a person it can rely on for dedicated ict assistance. furthermore, the company backs up data on a portable storage device, which is then carried by employees from the office to a remote location. the company should look into more optimal backup procedures and assurance that their data is secure. for remote access the company may explore the use of a vpn or ftp. in addition, to be able to attract more customers it would be good if the company could develop an online showcase of past implementations to allow customers to search for the required products more easily. furthermore, in the future the company could consider the implementation of a customer relationship management system, allowing the company to keep track of its customers as well as past and present orders. in addition, the implementation of a tracking system would allow the company to transfer the data it currently keeps in spreadsheets and easily monitor warehouse stock, orders, production and delivery, which are currently not handled effectively. if the company takes the specified recommendations on board it will be able to satisfy stage 2 across all dimensions and move closer to stage 3. by analysing stage 3, we can also see that once the company makes effective changes in tasks and processes it will be able to move effectively across the other three dimensions.

b. finance and accounting services

based on the data records within the system the following can be identified for finance and accounting services.

et position

table iv. et position: finance and accounting services
dimension | stage 1 | stage 2 | stage 3 | stage 4
strategy | 83% | 66% | 50% | 16%
structure | 100% | 33% | 33% | 33%
tasks & processes | 75% | 25% | 25% | 25%
it tools & systems | 57% | 28% | 0% | 0%

from table iv it can be seen that the finance and accounting services company has only developed its structure to 100% in stage 1. this result should be taken with caution, as changes in strategy may force the structure to change. therefore, to make advancements in its etransformation journey the company should review its strategy and ensure that its goals and objectives are set. after this, the company should review its structure, then its tasks and processes and finally its it tools and systems. in addition, the data within the et position table identifies that structure and tasks and processes for stages 2, 3 and even 4 have progressed along the same path; however, it is unlikely that they will progress any further before the company has reviewed its strategy.
based on the recommendations the company received within the et report, it can be noted that the company needs to review the services it offers and the business partners it collaborates with, as well as review other financial services. furthermore, to be able to work effectively off site it is essential for the company's strategy to reflect this business requirement, and therefore for the changes to be reflected within its business activities, tasks and processes. currently, the company is required to duplicate work because, when working off site, they are unable to get live data from the network in time. to ensure the company is able to work effectively, the workflow procedures and current company objectives would need to be assessed and modified accordingly. moreover, to be able to give quick responses to customers it is essential for the business operations to change so that service delivery can be streamlined and customers can get answers about their loans more quickly. the company should also explore extending its networking capabilities beyond the homes and the office and ensure that their work activities can be fully carried out from customer locations; however, this may require a review of online data sharing access as well as of the security measures required to access files from remote locations. to ensure the company's tasks and processes are adequate it will be essential for the company to review them. the review of tasks and business processes should encompass an assessment of the workflows as well as of the external operations the company relies on, such as the formats of data collection, editing and retrieval for various banks and other financial institutions, as well as assurance that the company is able to respond to customer inquiries in a timely manner. finally, after the review of strategy, structure and tasks and processes has been conducted, recommendations for the it tools and systems dimension can be given. firstly, it is important that the company has someone who will look after their ict holistically and ensure appropriate security measures are implemented, as the financial data the company deals with is highly confidential. secondly, it is crucial for the networking capabilities to be extended so that work can be performed from locations other than the homes and the office, and finally it is crucial for the systems to be centrally monitored so that data for customers from different financial institutions can be collated through the same input fields. after the strategy has been carefully assessed for stages 1 and 2, changes in the other dimensions will be able to follow.

c. cruise company

based on the data records within the system the following can be identified for the cruise company: as per the et position (table v) of the cruise company, it can be seen that the company has achieved 100% in strategy and structure for stage 1. this identifies that the next investment should happen within tasks and processes in stage 1. to further identify other changes that need to happen for the company to effectively implement the required ict, it is crucial to note that strategy within stage 2 has only been completed to 66%. therefore, the company should imperatively review its strategy before undertaking further investments within tasks and processes.
the stagnation of improvement for structure at 66% within stages 3 and 4, and at 25% for tasks and processes, also indicates that for change to happen and to be effective the company would essentially need to review its strategy before undertaking any further changes.

et position

table v. et position: cruise company
dimension | stage 1 | stage 2 | stage 3 | stage 4
strategy | 100% | 66% | 66% | 33%
structure | 100% | 100% | 66% | 66%
tasks & processes | 75% | 50% | 25% | 25%
it tools & systems | 71% | 42% | 28% | 14%

the et report and the outlined recommendations highlight that it is important for the company to review its strategy first. this would mean that the organisation should assess its goals and visions carefully. for example, the company may like to review whether it would like to change its operations and possibly introduce daily, or at least weekly, trips rather than relying purely on group and function bookings. to be able to survive in the information age it is also crucial to identify appropriate marketing strategies and potentially join collaborative ventures with other organisations. for example, the company could establish links with accommodation places in the area, tourist guides and tourist agencies to increase its customer base. furthermore, to be able to deal with a larger customer base it would be essential to identify the changes that would need to be undertaken in order to streamline business processes and make them smoother. for example, inquiries and quotes should not be handled only by the ceo. having a system where a number of people could have access to the quote generation data would streamline planning and quote generation. this would in turn allow the company to select the it tools and systems required for such operations, which would also help the business achieve the required outcomes. furthermore, the business should not run purely on the basis of printed sheets that have been prepared for the day. data should at least be e-mailed to all involved. it would also be important to have a central hub where information about the trips can be stored, so that all staff have an understanding of what activities are set to take place and what they are responsible for. this would also allow any staff changes to be made quickly, which would in turn open up more possibilities. in addition, customer data is a valuable resource and should be kept in a more manageable format than the excel sheets. if a crm is used, the company can learn from its past experiences. furthermore, it can also use its existing customer base to strengthen existing relationships, invite customers to promotional tours and consequently expand its business.

iii. recommendations and future of etransformation

based on the recommendations given by the et guide, each company has attempted to make the required changes in order to undertake etransformation. one year after the initial interviews, follow up interviews were conducted. during this time it was identified that the three case study companies approached etransformation in slightly different ways. the section below specifies the changes each of the three companies made, or at least identified as important, for the future of their etransformation.

a. staircase design

the staircase design company has identified that in order to implement appropriate ict some of their tasks and processes would need to be modified.
in particular, they have identified that it would be beneficial if they could keep customer data in a more manageable form than they do now, one in which they could link staircase proposals and the actual drawings in pdf format to customer details and keep the data in one place. the company feels that this would allow them to manipulate drawings more easily, adjust them to customer requirements, develop final quotes more quickly and start working on jobs sooner. this would also mean that anyone within the office could easily communicate with the customer about their designs. they would also be able to record notes and identify customer requirements, which would in turn increase customer satisfaction and improve response time. furthermore, the company has identified that it would be good if they could implement a tracking system that would allow them to track warehouse storage space, incoming and outgoing raw materials, and staircase developments and installations. at present this is done by physically visiting the warehouse and writing down the stock numbers, which means that company records are not precise, which occasionally creates delays in production and deliveries. in addition, the company has identified that it would be essential for them to update their website, as their current website only reflects their operations from the early 2000s. the updates would include sample videos, showcases of staircase installations, and staircase building tools so that customers can explore the products and identify their requirements more easily. the company has also identified future collaboration channels with builders, plumbers and other tradesmen in the area. currently this working relationship is relatively casual, however it has already resulted in new jobs within a new development site in the nearby area.

b. finance and accounting services

the finance and accounting services company has identified that in order to survive in the information era it would be essential for them to review the way the company deals with its customers. in particular, the company has identified that outlook alone is not sufficient for them to be able to keep customer data in an effective manner, as for each customer they often have over ten to fifteen different pdf files. to be able to do this, the company has recognised that they should invest in a customer records management system that will allow them to keep records for each customer and record all conversations they have had with each customer in chronological order, including phone calls, e-mails and meetings. furthermore, the company has identified that it would be crucially important for them to market certain services to particular niche markets and also to have the ability to contact individual customers when needed. in addition, the company has also identified that it would be beneficial if they could track the communication they have with customers, business partners and the other financial institutions, so that at the click of a button they can find all the correspondence they have had with each stakeholder. according to the company, this would help speed up operations as well as improve decision making. they have also identified that more personalised services in their industry are highly valued, and they have expressed interest in being involved in a prototype development of the crm mass e-mailing system that would allow them to select customers with a particular interest and e-mail them information based on their preferences.
furthermore, the company has realised that various banks and financial institutions keep financial data and forms in different formats, although the data is in essence identical. consequently, the company has realised that it would help if they had a system which would allow them to input data into the required fields just by selecting functions which are already set within the company systems. this, however, may require further exploration and discussions with the industry sectors, whereby providers such as finance and accounting services can come to an agreement with the industry regarding an application that could be used to solve the current issue.

c. cruise company

after the cruise company received the recommendations via the et guide, the company identified that it should seek better ways to manage customer communication and the generation of quotes. furthermore, the company also identified that the spreadsheets they currently use are effective, however in order for the operations to happen more quickly and smoothly it would be essential that the files can be accessed by multiple users at the same time and that data input can happen online. in addition, the company has also identified that in order for them to stay in business, retain their existing customers and encourage them to come and visit the company again, it would be good if the company could have a customer relationship management system which would allow them to enter data about the customers, add documentation and identify requirements, as well as later on be able to group customers and send them promotional material. in addition, the company has also identified that it would be beneficial if they could mass e-mail customers and inform them about upcoming events. in particular, the company has noted that they would like to offer past customers cruise birthday celebrations at a discounted price; they have also identified that it is important to keep in touch with tour organizers who plan large cruises, as well as wedding planners, corporate party organizers and businesses who have in the past been their guests. in addition, they have also decided that in instances where the boat is booked for a leisure cruise and is not filled to its capacity, they will offer past members a thirty percent discount and will encourage them to join the cruise. moreover, to be able to manage quotes, the company has identified that it would be beneficial if they could have an electronic package that would allow them to plan cruises as well as organize staff, catering, docking and entertainment, which would in turn allow the company to have a more holistic view of its day to day operations.

iv. beyond the et guide

based on the et guide outputs, both in terms of the companies' abilities and the recommendations of the companies' future requirements, researchers at the university of western sydney have identified that with the et guide companies can now guide, track and measure their etransformation journey. they have also identified that the et guide outputs can help ict practitioners, contractors or business analysts to identify problem areas more quickly and pinpoint the most optimal business solutions.
moreover, the researchers note that the et guide may have further potential in its ability to easily identify problem areas as well as point to potential solutions which could potentially be implemented by the organizations instantaneously. this initiated thoughts about developing systems associated with the et guide that appear to be of crucial importance for small to medium companies. such systems could be embedded within the services offered by the et guide and suggested to companies when the et guide identifies a need for one of them. based on the data collected via the et guide so far, the researchers have identified the need for a particular set of systems. therefore, to streamline the etransformation process and assist the companies, the researchers have commenced the development of the following systems. the first is a customer relationship management system (crm) for small to medium enterprises, in which businesses can easily store data about their customers, contact details, transcripts and logs of phone conversations, meetings, the staff responsible for particular customers, as well as records of past purchases, current enquiries and a list of customer associated documents. the second system is the mass e-mailing tool. the tool is originally embedded within the crm, however it can be used as a standalone package if needed. the tool allows its users to create personalized mass e-mails and to extract recipients based on particular grouping criteria; for example, a company may like to e-mail customers who have purchased a particular product, or may like to give customers discounts for purchases made on their birthdays. the third system under development is spreadsheets on the web. this system allows its users to manipulate spreadsheet data online with multiple user access, something that currently existing google documents do not allow. the system allows users at remote locations to work on the same spreadsheet files simultaneously. data is entered into the cells through a cell locking algorithm, meaning that each user occupies one cell at a time. while data is being entered into a cell by one user, that cell is locked for the other users, who are unable to make changes to that cell or to the cells requiring data from the cell being modified. this project is in the early stages of its development. it is believed that new systems, similar to those described above, will allow companies to meet their business requirements more smoothly. as soon as a company realizes that it needs one of the above mentioned systems, or is given a recommendation by the et guide to try one of the available systems, the company will not necessarily need to hire it consultants in order to implement a new solution but will be able to set it up on its own and use it instantaneously. furthermore, the advantage of using such custom based applications is that the designed systems will be modeled taking into account business requirements, predominantly the requirements of small to medium enterprises, and environmental demands, including demands from customers, business partners, industry sectors and government organizations. furthermore, the et guide users are to play a vital role when it comes to future et guide developments. the higher the number of et guide users, the more easily small business requirements will be gathered and the required systems developed.
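the per-cell locking idea described above for the spreadsheets on the web system can be sketched as a small in-memory lock table in which a user who starts editing a cell also locks the cells whose values depend on it. the class name, the dependency map and the threading model are assumptions made for this example; the system under development may work quite differently.

```python
# illustrative in-memory sketch of per-cell locking for concurrent spreadsheet
# editing: editing a cell locks that cell and, transitively, every cell whose
# value derives from it; other users cannot modify locked cells until release.
import threading

class SheetLocks:
    def __init__(self, dependents):
        # dependents maps a cell id to the cells computed from it, e.g. {"a1": ["b1"]}
        self.dependents = dependents
        self.locks = {}              # cell id -> user id currently holding the lock
        self._guard = threading.Lock()

    def _affected(self, cell):
        # the edited cell plus (transitively) every cell depending on it
        seen, stack = set(), [cell]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.dependents.get(c, []))
        return seen

    def acquire(self, cell, user):
        """try to lock a cell (and its dependents) for a user; True on success."""
        with self._guard:
            cells = self._affected(cell)
            if any(self.locks.get(c) not in (None, user) for c in cells):
                return False         # another user is editing an affected cell
            for c in cells:
                self.locks[c] = user
            return True

    def release(self, cell, user):
        """release every lock this user holds for the cell and its dependents."""
        with self._guard:
            for c in self._affected(cell):
                if self.locks.get(c) == user:
                    del self.locks[c]

# usage: two remote users editing the same sheet
locks = SheetLocks({"a1": ["b1"], "b1": ["c1"]})
print(locks.acquire("a1", "user1"))  # True: a1, b1 and c1 are locked for user1
print(locks.acquire("c1", "user2"))  # False: c1 derives from the cell being edited
locks.release("a1", "user1")
print(locks.acquire("c1", "user2"))  # True
```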
as the user base grows, the researchers will take into account the system's recommendations for the businesses and, based on these suggestions, will plan future research implementations and system developments. in addition, this would also enable the et guide community to develop more quickly and allow its businesses to have a variety of tools that they can use so that they can continue carrying out important business activities within the information era.

v. conclusions

in summary, it can be seen that without the et guide companies, especially small to medium enterprises, may struggle to guide, track and measure their etransformation journey due to their busy schedules and, at times, lack of ict budget. the et guide acts as a consultant, allowing busy company owners and managers to identify their most optimal etransformation directions based on the answered questions for the four important etransformation dimensions. as identified, etransformation is not just the pure selection and implementation of it tools and systems, but also the identification of the company's goals and objectives – the strategy; the company's structure and decision making; and the identification, streamlining and automation of the company's tasks and processes, without which companies are unable to deliver the required services or produce the needed products. the et guide assesses the company holistically. it takes into account current operations, activities and technologies as well as company goals, visions and objectives. moreover, the system also takes into account that an organization does not exist in isolation from its industry sector, competitors, and increasingly agile customers and business partners. in addition, the et guide allows companies to observe their growth and their current status through their abilities, and guides their developments while assisting them with recommendations, with the aim of helping companies make the most appropriate decisions. we also see that with the use of the et guide companies are able to explore future possibilities and identify much more easily how some of their processes can be modified and linked to the required ict resources, so that jobs can be completed more swiftly and smoothly and customers' demands can be met more effectively and in time. in the future, it is expected that the et guide will also be associated with packages such as spreadsheets on the web and the crm for small to medium businesses, which will make the etransformation journey with the et guide even more fulfilling.

references
[1] a. hol & a. ginige, dimensions of etransformation. paper presented at the 4th international conference on information and automation for sustainability (iciafs), ieee, 12-14 december 2008, colombo, sri lanka.
[2] a. hol & a. ginige, etransformation guide: an online system for smes. paper presented at the ieee dest, 31 may - 3 june 2009, istanbul, turkey.
[3] a. ginige, s. murugesan & p. kazanis, a road map for successfully transforming smes into ebusiness. cutter it journal, 15(5), 13, 2001.
[4] j. burn & c. ash, a dynamic model of e-business strategies for erp enabled organisations. industrial management & data systems, 105(8), pp. 1084-1095, 2005.
[5] m. j. earl, management strategies for information technology. burr ridge: irwin, 1989.
[6] m. j. earl, evolving the e-business. business strategy review, 11(2), pp. 33-38, 2000.
[7] r. d. galliers, y. merali & l. spearling, coping with information technology? how british executives perceive the key information systems management issues in the mid 1990s. journal of information technology, 9, pp. 223-238, 1994.
[8] a. mawson, the advanced organisation, new models for turbulent times, 2000. retrieved 27 february 2006, from www.advancedworkplace.com
[9] j. mckay, a. prananto & p. marshall, e-business maturity: the sog-e model. paper presented at the 11th australasian conference on information systems (acis), brisbane, australia, 6-8 december 2000.
[10] r. l. nolan, managing the crises in data processing. harvard business review, 57(2), pp. 115-116, 1979.
[11] people and process. business crisis management, 2005. retrieved 3 march 2006, from www.the-process-improver.com/business-crisismanagement.html
[12] j. f. rayport & b. j. jaworski, introduction to e-commerce. boston, 2002.
[13] r. h. waterman, t. j. peters & j. r. phillips, structure is not organization. business horizons, 23(3), pp. 14-26, 1980.
[14] e. wons, organisational change: an ethical, means based approach. boston: jpc training & consulting llc, 1999.

international journal on advances in ict for emerging regions 2016 9 (1), june 2016

improving citation network scoring by incorporating author and program committee reputation

dineshi peiris, ruvan weerasinghe

abstract— publication venues play an important role in the scholarly communication process. the number of publication venues has been increasing yearly, making it difficult for researchers to determine the most suitable venue for their publication. most existing methods use citation count as the metric to measure the reputation of publication venues. however, this does not take into account the quality of citations. therefore, it is vital to have a publication venue quality estimation mechanism. the ultimate goal of this research project is to develop a novel approach for ranking publication venues by considering publication history. the main aim of this research work is to propose a mechanism to identify the key computer science journals and conferences from various fields of research. our approach is completely based on the citation network represented by publications. a modified version of the pagerank algorithm is used to compute the ranking scores for each publication. in our publication ranking method, there are many aspects that contribute to the importance of a publication, including the number of citations, the rating of the citing publications, the time metric and the authors' reputation. known publication venue scores have been formulated by using the scores of the publications. new publication venue ranking is handled using the scores of program committee members, which are derived from their ranking scores as authors. experimental results show that our publication ranking method reduces the bias against more recent publications, while also providing a more accurate way to determine publication quality.

keywords—ranking; citation network; publication venues; publications; publication authors

i. introduction

the internet has opened up new ways for researchers to demonstrate research results and share their research findings at a more rapid pace than the traditional methods allow. today, researchers tend to submit their findings to a wide variety of publication venues such as conferences, journals, and seminars.
these publication venues play an important role in the scholarly communication process and in the visibility that researchers' work receives. researchers are often concerned with knowing the most important publication venues for publishing their research [1]. however, the selection of publication venues is usually based on the researcher's existing knowledge of the field of his/her discipline [2, 3]. as a result, researchers may not be aware of more appropriate publication venues to which their publications could be submitted.

manuscript received on 23 nov 2015. recommended by prof. k. p. hewagamage on 16 june 2016. this paper is an extended version of the paper "citation network based framework for ranking academic publications and venues" presented at the icter 2015 conference. dineshi peiris holds a b.sc. (honours) in computer science from the university of colombo school of computing, sri lanka. (e-mail: dineshi.peiris89@gmail.com). dr. ruvan weerasinghe is a senior lecturer at the university of colombo school of computing. (e-mail: arw@ucsc.cmb.ac.lk).

on the other hand, computer science (cs) is a highly active research area that brings together multiple disciplines such as physics, mathematics, and the life sciences. the number of publication venues has been increasing continuously, making it difficult for researchers to be fully aware of the appropriateness of such publication venues [4]. with an abundance of available publication venues, it becomes a very difficult task for new researchers to find exactly what they are looking for, or for researchers to keep up to date with all the information [2]. most of the existing methods to measure the reputation of publication venues use citation count as their chief metric [1]. for journals, the most popular citation analysis method among existing methods is garfield's impact factor (if), which is itself based on citation counts [5]. the number of citations is not a good individual indicator of the quality of publications, since it does not take into account the quality of the citations [6, 7, 8]. in the case of conferences, there are no criteria or consolidated metrics for measuring impact. unlike in some other fields, conferences are essential instruments for the timely dissemination of computer science research [9]. as demonstrated in [10], computer science programs follow a publication ratio of more than two conference papers per journal paper. in addition, conferences have the particular benefit of giving rapid publication of papers [11]. therefore, the impact of a publication venue is a key consideration for researchers, whether the venue is a journal or a conference [3]. selecting the most appropriate venue to which to submit a new paper minimizes the risk of publishing in disreputable or fake publication venues. on the other hand, the quality of a publication venue is also important in helping with decisions about awards as well as for deciding about scholarships funded by research institutions [12]. if publication venue ranking scores are measured successfully, then researchers can make better decisions about a particular publication venue much more quickly based on such a mechanism. there is a significant requirement for an automated process of measuring publication venue scores to support researchers, so that they can easily recognize the venues in which to publish their research. the findings of this research will therefore be beneficial to researchers, which in turn gives this work its importance.
in our research, we propose a novel approach for ranking publication venues by considering publication history. we have used a modified version of the pagerank algorithm [13] to generate the scores for publications. we have considered two types of publication venues for which we normally need such information:
1. known publication venues about which we have historical data
• for example, publications of previous conference venues in the series, with citation and author data
2. new publication venues about which we have little information
• for such new conferences, we often only have information about the program committee (pc). for new journals, we often only have information about the editorial board.
the paper is organized as follows: first, we briefly describe our data sets. then the major modules of the conceptual approach (citation network construction, out-links and in-links creation, publication score generation, author score estimation, smoothing of the lower citation counts of recent publications, and publication venue ranking) are presented in section ii. the results of our experiments on the real datasets obtained from dblp and eventseer.net are presented in section iii. a survey of the existing approaches which perform academic publication analysis is then conducted, and the strengths and weaknesses of these approaches are given in section iv. finally, a conclusion is provided in section v, where some directions for future research work are also suggested.

ii. our approach

fig. 1 illustrates the architecture of our proposed citation network-based publication venue ranking approach, which consists of an academic database and six major modules: citation network construction, out-links and in-links creation, publication score generation, author score estimation, smoothing of the lower citation counts of recent publications, and publication venue ranking. first of all, data preparation is discussed in detail. then we explain the design of each module in our proposed approach.

fig. 1. architecture of the proposed approach

a. datasets

we had to work with data that we have access to, which generally is citation data and pc/editorial board data, and which may not be easy to get directly through sources such as google scholar (http://scholar.google.com/). besides, there are diverse digital repositories freely available to the general public. the dblp (http://www.informatik.uni-trier.de/~ley/db/), acm (http://portal.acm.org/dl.cfm), microsoft academic search (http://academic.research.microsoft.com/) and citeseer (http://citeseerx.ist.psu.edu/) digital libraries are vast collections of citations of past publications. dbworld (http://www.cs.wisc.edu/dbworld/) and eventseer.net (http://eventseer.net/) contain most of the cfps for conferences in computer science. from these sources we can collect a list of upcoming and past publication venues with information about topics and organizations, among others. our approach used data from two primary sources: dblp and eventseer.net. these data sources offer different data services: from dblp we obtained xml records, while data from eventseer.net can only be extracted from its website using an html parser. dblp offers xml records for its dataset, which can be downloaded from its website. the dblp dataset contains information about publications from numerous fields published over the years.
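as an illustration of how such dblp xml records could be turned into lightweight publication records, the sketch below streams the dump with a standard xml parser. it assumes the publicly documented element names (article, inproceedings, author, title, year, cite and the key attribute), omits the dtd/entity handling that the full dump requires, and is not the extraction pipeline actually used in this work.

```python
# hedged sketch: streaming a dblp-style xml export into publication records
# (key, title, venue, year, authors, cited keys). element and attribute names
# follow the public dblp dump; treat this as illustrative only.
import xml.etree.ElementTree as ET

PUB_TAGS = {"article", "inproceedings"}   # journal and conference papers

def parse_dblp(path):
    publications = {}
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag in PUB_TAGS:
            key = elem.get("key")
            publications[key] = {
                "title": (elem.findtext("title") or "").strip(),
                "venue": elem.findtext("journal") or elem.findtext("booktitle"),
                "year": int(elem.findtext("year") or 0),
                "authors": [a.text for a in elem.findall("author") if a.text],
                # out-links: dblp keys of the publications this one cites
                "cites": [c.text for c in elem.findall("cite")
                          if c.text and c.text != "..."],
            }
            elem.clear()                  # keep memory bounded on the large dump
    return publications

# pubs = parse_dblp("dblp.xml")
# print(len(pubs), "publications loaded")
```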
the dblp dataset stores a set of metadata for each publication, including the publication title, author(s), type of publication, year of publication and citations. each publication is represented by a unique key from dblp. the dblp team has put a lot of work into resolving the name ambiguity problem (the same person referenced with many names); because of that, the work in this study relied on the dblp dataset for author and citation data. eventseer.net contains most of the call for papers (cfps) for conferences in computer science. this dataset is therefore essential for our work, since it is required for ranking upcoming new publication venues. from eventseer.net, we collected a list of five new conferences along with information about the listed pc members. summary statistics of the collected data are shown in table i.

table i. summary statistics of the collected data
data | quantity
publications | 2645295
unique authors | 1441289
conferences | 3642
journals | 1345
pc members (within five new conferences) | 177

b. citation network construction

citation networks help in evaluating the impact of publication venues, publications and authors [14]. citation networks are directed networks in which one publication cites another publication. in most cases, authors cite older publications in order to identify the related body of work or to critically analyze earlier work. hence citation networks are networks of relatedness on subject matter [15]. on the other hand, publications are well defined units of work, and accepted papers play an important part in the success of a publication venue [16]. our approach is completely based on the citation network represented by publications. we built the citation network as a directed graph, with each publication representing a vertex and the citations representing the edges in the graph, the edges being directed from the citing vertex to the cited vertex [1]. each vertex has several attributes, including the publication title, conference/journal of publication, year of publication, author(s) and a unique key from the dblp dataset.

c. the ranking method

the method for ranking publications consists of four phases:
• creating the publication in-links and out-links
• using a modified version of the iterative pagerank algorithm to calculate the ranking score for each publication
• estimating ranking scores for authors using the ranking scores of stable publications
• smoothing the lower citation counts of recent publications

1) creating the publication in-links and out-links
the method for ranking publications based on the citation network uses two forms of edges: out-links and in-links.
definition 1. out-links: from a given publication p, link all the publications pi that the publication p cites.
definition 2. in-links: to a given publication p, link all the publications pj that cite the publication p.

2) generating publication scores
according to a class of publication-based ranking methods, the graph vertices represent publications, whereas an edge from node ni to node nj represents a citation from publication pi to publication pj. computing the ranking at the publication level has the benefit that only a single procedure is performed to evaluate more than one entity: the publication itself, the publication venue it belongs to, and the author(s) [14].
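the in-link and out-link bookkeeping of definitions 1 and 2 can be sketched as follows, reusing the illustrative record layout from the previous sketch; the data structures are assumptions for the example rather than the authors' implementation.

```python
# hedged sketch of definitions 1 and 2: for each publication p, out_links[p]
# are the publications p cites, and in_links[p] are the publications citing p.
from collections import defaultdict

def build_citation_network(publications):
    out_links = {}
    in_links = defaultdict(list)
    for key, record in publications.items():
        # keep only citations that resolve to publications we actually have
        cited = [c for c in record["cites"] if c in publications]
        out_links[key] = cited
        for c in cited:
            in_links[c].append(key)
    return out_links, dict(in_links)

# out_links, in_links = build_citation_network(pubs)
```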
pagerank offers a computationally simple and effective way to assess the relative importance of publications beyond mere citation counts [6]. unlike other methods, pagerank constructs a single model that integrates both out-links and in-links [17]. the pagerank of a publication is defined as follows [18]:

definition 3. assume publication p has publications r1...rn which point to it. the parameter d is a damping factor which can be set between 0 and 1; it is usually set to 0.85. c(p) is defined as the number of out-links of publication p. the pagerank of a publication p is given as follows:

$pr(p) = (1 - d) + d\left(\frac{pr(r_1)}{c(r_1)} + \cdots + \frac{pr(r_n)}{c(r_n)}\right)$   (1)

pagerank is calculated using an iterative algorithm, and corresponds to the eigenvector of the normalized link matrix [18]. the damping factor allows for personalization and can make it nearly impossible to deliberately mislead the calculations in order to get a higher ranking score [18]. pagerank extends the idea of citation counting by not counting citations from all publications equally, and by normalizing the contribution of a citing publication by its number of out-links [18]. another important justification is that a publication can have a high pagerank value if there are many publications that point to it, or if there are publications that point to it which themselves have high pagerank values [18]. pagerank handles both these cases by recursively using the link structure of the citation network.

there are many aspects that contribute to the importance of a publication, such as the number of citations it has received, the rating of the citing publications, the time metric of these citations and its author(s). pagerank only includes the first two factors. the publication environment is not static but changes continuously. pagerank favors older publications, because older publications have many citations accumulated over time. bringing emerging publications to researchers is very important, since most of them want the latest valuable information. aditya pratap singh et al. [1] have introduced a timing factor into the pagerank algorithm [13] to reduce the bias against more recent publications, which have had less time than older publications to get cited. to make the algorithm time-independent, the metric they propose to use is the average of the total number of citations of the publications published in each year. we have also modified the formula for the calculation of the pagerank of a publication p to make the algorithm time-independent in this way. the timed pagerank value of a publication p is given by the following:

$tpr(p) = (1 - d) + d \cdot \frac{\sum_{n} \frac{tpr(r_n)}{c(r_n)}}{aycc[y]}, \quad y = py[p]$   (2)

where py[p] is the year of publication p, tpr(p) is the timed pagerank of p, tpr(r_n) is the timed pagerank of a publication r_n that links to publication p, c(r_n) is the number of out-links of publication r_n, aycc[y] is the average number of citations in the year y, and d is a damping factor, which is set to 0.85.

3) alternative method to smooth for lower citation counts
summarizing the weaknesses of the ranking methods, we observe that:
• citation count does not take into account the quality of the citing publications.
• pagerank does not capture the fact that an older publication has had more time to be cited in comparison to recent publications.
• timed pagerank is able to adjust the rank of emerging quality publications.
the timed pagerank algorithm is adequate for ranking most publications, because it captures the fact that an older publication has had more time to be cited than a recent one. it is, however, not sufficient for all publications: publications of recent years have only a few or no in-links, so new publications, which may be of high quality, are left behind, and the time-independent metric of a recent publication can even become zero. a study of conference and journal publications indicates that many references reach back five or more years, giving newer publications comparatively little opportunity to get cited [19]. with this weakness in mind, we defined an alternative method that smooths the lower citation counts of recent publications by modifying the timed pagerank method.

a) ranking scores for authors
to address the weakness of the timed pagerank algorithm, we propose an alternative metric that uses an author score derived from the citations received by that author's previous publications. the author(s) of a recent publication are a useful signal for assessing its quality [20]. it is important to use stable publications for calculating author scores, since these author scores are then used for smoothing the lower citation counts of recent publications. note that the dblp dataset we used contains very little citation data after the year 1999 (see table ii); we therefore took 1999 as the margin year that separates stable publications from recent publications. an author score is computed by averaging the timed pagerank values of all the publications a given author has written up to the year 1999. the equation for the score of an author ai is

ars_ai = ( Σ tprs_pai ) / apc[ai]        (3)

where ars_ai is the author ranking score, tprs_pai is the timed pagerank score of a publication pai written by author ai, and apc[ai] is the number of such publications written by ai.

table ii. average number of citations per publication from 1999 to 2014
year   average year citation count      year   average year citation count
1999   1.547473x10^-2                   2007   1.231x10^-5
2000   2.53011x10^-3                    2008   5.81x10^-6
2001   7.080x10^-5                      2009   1.073x10^-5
2002   0                                2010   1.541x10^-5
2003   1.034x10^-5                      2011   2.431x10^-5
2004   1.752x10^-5                      2012   4.68x10^-6
2005   1.49x10^-5                       2013   0
2006   0                                2014   0

b) smoothing for lower citation counts of recent publications
using these author scores, we adjust the scores of publications that appeared after the year 1999: the score of a new publication is the average score of all of its authors. if this newly calculated publication ranking score is lower than the timed pagerank score of the publication, we take the timed pagerank score as the score of the publication. the equation for the score of a publication pi is

nprs_pi = ( Σ ars_api ) / pac[pi]        (4)

where nprs_pi is the new publication ranking score for a publication with a low citation count, ars_api is the author ranking score of an author api who has written publication pi, and pac[pi] is the number of authors of publication pi.
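a compact sketch of equations (3) and (4), together with the fallback rules described in this paper, is given below; the data structures (a per-author publication list, a per-publication author list) and the function names are assumptions for illustration only.

```python
def author_scores(tpr, pubs_of_author, pub_year, margin_year=1999):
    """equation (3): average the timed pagerank of an author's stable
    publications (those published up to the margin year)."""
    ars = {}
    for author, pubs in pubs_of_author.items():
        stable = [p for p in pubs if pub_year[p] <= margin_year]
        if stable:
            ars[author] = sum(tpr[p] for p in stable) / len(stable)
    return ars

def smoothed_score(pub, authors_of_pub, ars, tpr):
    """equation (4) with the fallbacks used in this work: authors without a
    score are ignored, and the timed pagerank value is kept whenever it is
    higher than (or there is no) author-based score."""
    scores = [ars[a] for a in authors_of_pub[pub] if a in ars]
    if not scores:
        return tpr[pub]
    return max(tpr[pub], sum(scores) / len(scores))
```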
d. ranking publication venues
in our adjusted pagerank method, many aspects contribute to the importance of a publication, including the number of citations it has received, the rating of the citing publications, the time metric of these citations and its authors' prior reputation. in addition, computing the ranking at the publication level has the benefit that a single procedure evaluates more than one entity: the publication itself, the publication venue it belongs to, and the authors of the publication [14]. hence we can evaluate publication venues based on these adjusted pagerank scores.

1) type i: generating the scores for known publication venues
the quality of accepted papers plays an important part in determining the success of a publication venue [16], and the ranking score of a venue depends on the quality of the research papers it publishes [1]. this is the key idea behind our approach for ranking known publication venues. we adjusted the publication ranking scores to deal with computer science publication venues, and we formulate venue scores from the timed pagerank scores and the new publication ranking scores of the publications: the score of a known publication venue is derived from the scores of its publications. the equation for the score of a publication venue vj is

pvrs_vj = ( Σ aprs_pvj ) / vpc[vj]        (5)

where pvrs_vj is the publication venue ranking score, aprs_pvj is the adjusted pagerank score of a publication pvj that appeared in venue vj, and vpc[vj] is the number of publications in venue vj. aprs_pvj is either tprs_pvj, the timed pagerank score of a stable publication, or nprs_pvj, the new publication ranking score of a publication with a low citation count.

2) type ii: generating the scores for new publication venues
adjusted publication ranking scores are not sufficient for all venues, because new venues only have pc/editorial board data. research indicates that the quality of a conference is related to that of its pc members [4], so the pc members of a new conference are useful for assessing its importance. as a proof of concept, new publication venue scores are generated only for selected conferences. a recent study of pc candidate recommendation shows that publication history is the strongest indicator for being invited as a pc member [16]. the ranking of a new publication venue is therefore driven by the scores of its pc members, which derive from their ranking scores as authors. the score of a pc member is simply the author score of that person. earlier, we used the publications an author had written up to the year 1999 to calculate the author score and then adjusted the publication scores; we can now recalculate the author score of each pc member using the adjusted pagerank scores of that member's publications. the score of an author is the average of the adjusted pagerank values of the publications the author has written, and the ranking score of a new conference is the average score of all the pc members of that conference. the equation for the score of an author ai is

ars_ai = ( Σ aprs_pai ) / apc[ai]        (6)

where ars_ai is the author ranking score, aprs_pai is the adjusted pagerank score of a publication pai written by author ai, and apc[ai] is the number of publications written by ai.
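a minimal sketch of equation (5), assuming a dictionary of adjusted publication scores (timed pagerank scores for stable publications, new publication ranking scores for recent ones) and a mapping from each publication to its venue; the names are illustrative.

```python
from collections import defaultdict

def venue_scores(adjusted_score, venue_of):
    """equation (5): a known venue's score is the mean adjusted pagerank
    score of the publications that appeared in it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pub, score in adjusted_score.items():
        venue = venue_of[pub]
        totals[venue] += score
        counts[venue] += 1
    return {v: totals[v] / counts[v] for v in totals}
```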
the equation for the score of a new conference cj is

ncrs_cj = ( Σ ars_cj ) / cpc[cj]        (7)

where ncrs_cj is the new conference ranking score, ars_cj is the author ranking score of a pc member of conference cj, and cpc[cj] is the number of pc members of conference cj.

iii. experiments and results
a. ranking publications
we carried out our comparative study mainly on the basis of earlier studies on academic publication analysis [1, 6, 14, 21]. most existing methods use citation count (cc) to determine the impact of publications [5, 22, 23]. there has also been work on academic ranking using the pagerank (pr) algorithm [1, 6, 21], which considers the importance of the citing publication when ranking the cited publication. to integrate the time dimension, we added a timing factor to the pagerank algorithm, giving timed pagerank (tpr). since our approach is derived from the methods mentioned above, we can compare our adjusted pagerank (apr) method against all of them.

table iii. ranking methods
method              notation
citation count      cc
pagerank            pr
timed pagerank      tpr
adjusted pagerank   apr

table iv. summary of publication ranking methods
method/factor                                                cc   pr   tpr   apr
number of citations                                          x    x    x     x
rating of the citing publications                                 x    x     x
time metric                                                             x     x
smoothing for lower citation counts of recent publications                    x

1) comparison between apr and cc
table v shows the top 10 publications as determined by our method. along with each publication's apr rank and score, we also show its citation count and citation rank.

table v. top 10 publications in the apr method and their citation ranks
title                                                                                            apr rank   apr score    cc rank   cc count
data cube: a relational aggregation operator generalizing group-by, crosstab, and sub-total.     1          1            79        90
implementing data cubes efficiently.                                                             2          0.93830579   71        95
a relational model of data for large shared data banks.                                          3          0.88883433   2         580
mining association rules between sets of items in large databases.                               4          0.77657482   45        111
fast algorithms for mining association rules in large databases.                                 5          0.71465106   62        100
object exchange across heterogeneous information sources.                                        6          0.71062948   108       77
the entity-relationship model: toward a unified view of data.                                    7          0.59574822   1         604
relational completeness of data base sublanguages.                                               8          0.52836066   18        170
query evaluation techniques for large databases.                                                 9          0.51691711   70        95
organization and maintenance of large ordered indices.                                           10         0.50069555   21        153

on analyzing the table, the following key observations were made:
• citation count, the most common measure of publications, is based on raw citation counts and does not account for the quality of the publications from which the citations originate. the table illustrates how accounting for citation origin changes the citation ranking of publications.
• adjusting for citation origin provides a more refined measure of publication status and changes the publication rankings.

2) comparison between apr and pr
table vi shows the year-wise contribution to the top 100 publications of both the pagerank and the adjusted pagerank methods.
table vi. comparison of the distributions of the top 100 publications between the pagerank and adjusted pagerank methods
year   pr   apr      year   pr   apr
1965   0    1        1992   1    5
1970   1    1        1993   1    9
1971   3    2        1994   1    7
1972   2    2        1995   0    21
1973   0    0        1996   2    13
1974   4    1        1997   0    10
1975   13   0        1998   0    0
1976   9    2        1999   0    1
1977   11   1        2000   0    0
1978   6    1        2001   0    1
1979   9    1        2002   0    1
1980   2    0        2003   0    1
1981   8    0        2004   0    1
1982   4    0        2005   0    2
1983   3    0        2006   0    1
1984   6    2        2007   0    0
1985   1    0        2008   0    0
1986   5    0        2009   0    0
1987   5    0        2010   0    1
1988   0    0        2011   0    4
1989   1    1        2012   0    2
1990   2    2        2013   0    1
1991   0    0        2014   0    2

fig. 2. the number of publications distributed over the years in the adjusted pagerank and pagerank methods

fig. 2 shows the variation of the number of top-100 publications under both the apr and the pr methods over the years from 1965 to 2014. on analyzing the graph, the following key observations were made:
• the top publications under pr are mostly from the 1970s and 1980s, whereas under apr the top publications are mostly from the 1990s and 2000s. this shows that pr favors older publications, because older publications have accumulated many citations over time.
• our method therefore reduces the bias against recent publications, which have had less time than older publications to be cited, and is able to adjust the rank of emerging quality publications.

3) comparison between apr and tpr
table vii shows the year-wise contribution to the top 100 publications of both the timed pagerank and the adjusted pagerank methods.

table vii. comparison of the distributions of the top 100 publications between the timed pagerank and adjusted pagerank methods
year   tpr   apr      year   tpr   apr
1965   1     1        1992   7     5
1970   1     1        1993   10    9
1971   2     2        1994   11    7
1972   2     2        1995   24    21
1973   1     0        1996   14    13
1974   2     1        1997   10    10
1975   0     0        1998   0     0
1976   4     2        1999   1     1
1977   1     1        2000   0     0
1978   1     1        2001   0     1
1979   1     1        2002   0     1
1980   0     0        2003   0     1
1981   1     0        2004   0     1
1982   0     0        2005   0     2
1983   0     0        2006   0     1
1984   2     2        2007   0     0
1985   0     0        2008   0     0
1986   0     0        2009   0     0
1987   0     0        2010   0     1
1988   0     0        2011   0     4
1989   1     1        2012   0     2
1990   2     2        2013   0     1
1991   1     0        2014   0     2

fig. 3. the number of publications distributed over the years in the adjusted pagerank and timed pagerank methods

fig. 3 shows the variation of the number of top-100 publications under both the apr and the tpr methods over the years from 1965 to 2014. on analyzing the graph, the following key observations were made:
• under the apr method, the top publications are spread more evenly over the years than under the tpr method.
• tpr is able to adjust the rank of emerging quality publications, but it is not sufficient for recent publications (after 1999), which have only a few or zero citations. this is clearly visible in the graph: the tpr method cannot assess the importance of recent publications, whereas the apr method can, based on their authors.
• to assess the importance of a recent publication, its authors are useful.
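the year-wise comparisons in tables vi and vii can be produced with a short helper like the following sketch, assuming score dictionaries for each ranking method and a publication-year lookup; the function name is illustrative.

```python
from collections import Counter

def year_distribution(scores, pub_year, top_k=100):
    """count, per publication year, how many papers fall in the top-k of a
    ranking; applying this to the pr, tpr and apr score dictionaries yields
    the kind of distributions compared in tables vi and vii."""
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return Counter(pub_year[p] for p in top)
```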
for a closer analysis, we selected a few young publications for which citation statistics are not readily available in our dataset, and examined their normalized ranking scores under the timed pagerank and the adjusted pagerank methods over the recent years, as shown in table viii.

table viii. normalized publication scores in the adjusted pagerank and the timed pagerank methods
year   title                                                                                                                  tpr score    apr score
2006   a whole genome long-range haplotype (wglrh) test for detecting imprints of positive selection in human populations.    0.06421025   0.34819944
2007   distributed resource management and admission control of stream processing systems with max utility.                   0.06421025   0.20661683
2008   a low-power rf front-end of passive uhf rfid transponders.                                                             0.06421025   0.20620484
2009   a revised r*-tree in comparison with related index structures.                                                         0.06421025   0.24453887
2010   fair power control for wireless ad hoc networks using game theory with pricing scheme.                                 0.06421025   0.24960894
2011   permission re-delegation: attacks and defenses.                                                                        0.06421025   0.19908869
2012   anatomy of a gift recommendation engine powered by social media.                                                       0.06421025   0.24091758
2012   clickjacking: attacks and defenses.                                                                                    0.06421025   0.19908869
2013   optimizing budget constrained spend in search advertising.                                                             0.06421025   0.17896617
2014   thermal design and simulation of automotive headlamps using white leds.                                                0.06421025   0.20620484

on analyzing the table, the following key observations were made:
• every publication has a pagerank of 0.15 even when nothing references it. in the timed pagerank method this corresponds to a ranking score of 0.06420859 once the value of 0.15 is normalized into the range (0, 1).
• recent publications, which may be of high quality but have no in-links, climb up in ranking score when switching from the timed pagerank to the adjusted pagerank method.
• the table thus shows that our method reduces the bias against recent publications that have no in-links.

b. ranking publication venues
1) type i: known publication venues
we relied on our publication ranking method to compute publication venue ranking scores. one straightforward approach is to compute the average score over all the publications of a venue. table ix shows the top 10 publication venues obtained by averaging over all their publications.

table ix. top 10 publication venues by averaging over all their publications. type: whether the venue is a journal (j) or a conference (c)
publication venue                                                      type   score
ipsj                                                                   c      0.28796447
electronic networking: research, applications and policy              j      0.16220377
vldb workshop on management of uncertain data (mud)                   c      0.08146009
foundations and trends in databases (ftdb)                            j      0.08097285
acm trans. database syst. (tods)                                      j      0.08017983
conference on very large data bases (vldb)                            c      0.07990088
acm sigmod international conference on management of data (sigmod)   c      0.07961964
science                                                                j      0.07911258
views                                                                  c      0.07818149
performance and evaluation of data management systems (expdb)        c      0.07746338

averaging over all publications, however, can be misleading. for instance, suppose publication venue a has 30 publications, of which only 20 are top ranking publications, and assume that these high quality publications have a score of 10 points each while the remaining ones have a score of 1 point.
publication venue b, on the other hand, has 5 publications in total, of which 4 are top ranking publications. it is reasonable to expect publication venue a to be ranked higher than publication venue b for its scientific contribution, because a has 5 times as many top ranking publications as b. yet if we compute the average over all publication scores, venues a and b obtain 7 and 8.2 points respectively, so a plain average is not a fair way to compute venue scores. to deal with this problem, we take only the top n% of publications into account when calculating a publication venue score. our problem therefore becomes choosing the fraction n% of the publications of each venue that should be considered in the ranking. we performed the following experiment to determine n. we computed the average score of each publication venue using its top n% of publications, for n ∈ {25, 50, 75}, producing three candidate ranking lists for the publication venue ranking task. as a test bed we used the core 2013 conference ranking list (http://www.core.edu.au/), in which conferences are assigned a rank of a* (flagship conference), a (excellent conference), b (good conference) or c (other ranked conference venues). for each candidate list, we calculated the ratio of a* and a conferences within its top 10 publication venues; the higher the ratio, the better the publication venue ranking list was considered to be. note that only the conferences within the top 10 publication venues were used to compute the ratio. the following tables show the top 10 publication venues obtained by averaging the top 25%, 50% and 75% of publications respectively; along with our venue rank and score, we also show the core 2013 ranking.

table x. top 10 publication venues by averaging the top 25% of publications. type: whether the venue is a journal (j) or a conference (c)
publication venue                                                      type   score        core ranking
foundations and trends in databases (ftdb)                            j      0.12188608
views                                                                  c      0.11511190
acm sigmod international conference on management of data (sigmod)   c      0.11356147   a*
vldb workshop on management of uncertain data (mud)                   c      0.11322808
conference on very large data bases (vldb)                            c      0.11307256   a*
acm trans. database syst. (tods)                                      j      0.11122050
acm sigmod digital symposium collection (disc)                        j      0.10751429
performance and evaluation of data management systems (expdb)        c      0.10126516
conference on parallel and distributed information systems (pdis)    c      0.10032141   c
conference on innovative data systems research (cidr)                c      0.09893309   a

table xi. top 10 publication venues by averaging the top 50% of publications. type: whether the venue is a journal (j) or a conference (c)
publication venue                                                               type   score        core ranking
foundations and trends in databases (ftdb)                                     j      0.09897220
vldb workshop on management of uncertain data (mud)                            c      0.09815774
acm sigmod international conference on management of data (sigmod)            c      0.09391697   a*
conference on very large data bases (vldb)                                     c      0.09386330   a*
acm trans. database syst. (tods)                                               j      0.09375841
views                                                                           c      0.09076796
performance and evaluation of data management systems (expdb)                 c      0.09055314
conference on innovative data systems research (cidr)                         c      0.08719047   a
acm sigact-sigmod-sigart symposium on principles of database systems (pods)   c      0.08603544   a*
acm sigmod digital symposium collection (disc)                                 j      0.08586227
table xii. top 10 publication venues by averaging the top 75% of publications. type: whether the venue is a journal (j) or a conference (c)
publication venue                                                      type   score        core ranking
vldb workshop on management of uncertain data (mud)                   c      0.08721005
foundations and trends in databases (ftdb)                            j      0.08706834
acm trans. database syst. (tods)                                      j      0.08532985
conference on very large data bases (vldb)                            c      0.08508535   a*
acm sigmod international conference on management of data (sigmod)   c      0.08475611   a*
performance and evaluation of data management systems (expdb)        c      0.08391501
views                                                                  c      0.08360480
conference on innovative data systems research (cidr)                c      0.08026346   a
workshop on data management on new hardware (damon)                  c      0.07970309
acm sigact-sigmod symposium on principles of database systems (pods) c      0.07965752   a*

based on these tables, table xiii shows the ratios of a* and a conferences within the top 10 venues when averaging the top 25%, 50% and 75% of publications respectively. from this experiment we concluded that averaging the top 50% of publications yields the most appropriate publication venue ranking list. the equation for the ratio is

ratio = x / y        (8)

where x is the number of a* and a conferences within the top 10 publication venues, and y is the number of conferences within the top 10 publication venues.

table xiii. the ratio of a* and a conferences within the top 10 publication venues
                                                                        25%      50%      75%
number of conferences within the top 10 publication venues             7        7        8
number of a* and a conferences within the top 10 publication venues    3        4        4
ratio                                                                   0.4286   0.5714   0.5

furthermore, we produced a venue ranking list using a cut-off of 50 publications, to reduce the effect of venues with very few publications. table xiv shows the top 10 publication venues that have more than 50 publications.

table xiv. top 10 publication venues with more than 50 publications. type: whether the venue is a journal (j) or a conference (c)
publication venue                                                               type   score        core ranking
acm trans. database syst. (tods)                                               j      0.08017982
conference on very large data bases (vldb)                                     c      0.07990088   a*
acm sigmod international conference on management of data (sigmod)            c      0.07961964   a*
conference on innovative data systems research (cidr)                         c      0.07623364   a
acm sigact-sigmod-sigart symposium on principles of database systems (pods)   c      0.07582618   a*
parallel and distributed information systems (pdis)                           c      0.07480646   c
journal on very large data bases (vldb j.)                                     j      0.07449637
international workshop on the web and databases (webdb)                       c      0.07401554   c
international conference on database theory (icdt)                            c      0.07394727   a
international conference on data engineering (icde)                           c      0.07238653   a*

2) type ii: new publication venues
as a proof of concept, new publication venue scores were generated only for the following five conferences. among recent cfps, we took only conferences that were announced as being held for the first time. table xv shows the selected new conference venues along with their ranking scores.
table xv. new conference venues and corresponding ranking scores
conference venue                                                                                                   year   score
1st international conference on geographical information systems theory, applications and management (gistam)    2015   0.15460470
1st ieee international conference on multimedia big data (bigmm)                                                  2015   0.15156164
1st biomedical linked annotation hackathon (blah)                                                                 2015   0.15059667
1st international conference on fundamentals and advances in software systems integration (fassi)                 2015   0.15053273
1st international conference on decision support system technology (icdsst)                                       2015   0.15007116

iv. related work
there has been considerable work in the field of academic research evaluation. among existing methods, the most widely adopted way of measuring the quality of publication venues is garfield's impact factor (if). this metric uses citations from only the last two years, which neglects the importance of the older papers that publications cite, and it has also been criticized for depending solely on citation counts [7]. as a result, many alternative methods, e.g. the h-index [22], the g-index [23] and the pagerank algorithm [13], have been used to rank venues [24]. most research on academic publications uses citation count as the metric; however, metrics such as if, h-index and g-index are based on citation counts and hence do not give accurate results in all scenarios [1]. the number of citations alone is not a good indicator of publication quality, since it ignores the importance of the citing publications [6]. it is therefore important to use a metric that considers the importance of the citing publications when ranking the cited publication.

there has also been much interest in applying social network-based methods to generate recommendations and measure conference quality. a recommender system for academic events and scientific communities based on social network analysis (sna) is presented in [25]. this work builds on co-authorship and citation networks: the system constructs an academic event participation matrix, from which the similarity between any two researchers is computed. to make recommendations to a target researcher, a group of the most similar researchers is first selected, and the rank of upcoming events is then determined by aggregating their ratings. zhuang et al. [4] identified a set of heuristics for automatically determining the quality of conferences based on characteristics of the pc. this research rests entirely on the hypothesis that the quality of a conference is closely correlated with the reputation of its pc members. the heuristics were examined both in combination and in isolation under a classification scheme, and when combined they achieved a satisfying accuracy in differentiating conferences [4]; they have also been used to rank and recommend conferences. the proposed heuristics, however, rely on the completeness of the list of pc members, and a small number of cfps do not provide a full pc list. there has also been some work on academic ranking using the pagerank algorithm [1, 6, 21]. ding et al. [6] used pagerank to rank authors based on the co-citation network.
the closest work to ours is [1], which uses an efficient approach to rank the papers appearing in various conferences. a modified version of pagerank is used to rank papers as well as conferences, and a timing factor is introduced into the algorithm to reduce the bias against new papers, which have had little time to be cited. using the year of publication of the papers, a year-wise score is calculated for each conference venue. however, the timing factor is not sufficient for all publications, since new publications have only a few or zero citations, and it remains unclear how to estimate scores for new venues for which no citation data are available. our work is motivated by this work and takes two further steps: to address its weakness, we propose an alternative metric that uses an author score derived from the citations received by the author's previous publications, and we introduce a new way to assess the importance of both old and new publication venues.

v. conclusion and future work
we proposed a novel approach for ranking publication venues by taking publication history into account. the timed pagerank algorithm is not sufficient for all publications, since publications of recent years have only a few citations, and new publications, which may be of high quality, are left behind. to assess the relative importance of recent publications, we adjusted their timed pagerank values using the past publication scores of their authors. in our approach, many aspects contribute to the importance of a publication, including the number of citations it has received, the rating of the citing publications, the time metric of those citations and the reputation of its authors. the experimental results indicate that our method reduces the bias against recent publications, which have only a few citations. based on this mechanism, researchers can make better decisions about a particular venue more quickly and easily.

the dblp dataset we used has few or no citation data after the year 1999, so we took 1999 as the margin year separating stable publications from new publications. there is certainly room for improvement in the choice of margin year, since it relies on the completeness of the citation data. one issue is that our database does not contain a complete list of citations; for example, a quality publication may receive many citations from scientific domains that are not covered by the dblp dataset. in such cases, further citation data need to be harvested before the proposed approach can be applied. the ranking scores for authors were derived from publication ranking scores up to the year 1999 only. using these author scores, the lower citation counts of recent publications were adjusted by calculating an average author score for each publication after 1999; the score of a low-citation publication was taken as the average score of all the authors who wrote that paper. if no author score was available for a particular author, that author was ignored and the average was taken over the remaining authors. if no author scores were available for any of the authors of a paper, no smoothed score could be computed, and we took the previously measured timed pagerank value as the score of the publication.
our smoothing method relies on the generated author scores; in such cases, as mentioned earlier, further citation data as well as author data need to be harvested before the proposed approach can be applied. currently, five cfps from eventseer.net have been imported into our database. in the future, it would be of interest to add further cfps to the venue ranking problem. eventseer.net does not offer a structured dataset like dblp, so its website has to be parsed to extract the relevant information; regular expressions could be used to process parts of the cfp text. the dblp xml records and the eventseer.net data also need to be combined into one unified dataset. the problem is to connect these two data sources into a single data repository for publications; regular expressions could be used to match authors' names and to join pc members' names in eventseer.net to the dblp dataset. various data refining techniques could be applied to make the analysis more precise, and other data sources such as google scholar, citeseer and acm could be integrated into our repository to make it more complete. currently, data from dblp and eventseer.net are imported into our database; to obtain better ranking results, data from other sources are needed. gathering publication data from the web with a web crawler is also an interesting direction for future development.

acknowledgment
we would like to thank all the staff members of the network operating center of the ucsc for providing a virtual server instance to facilitate our research activities.

references
[1] singh a. p., shubhankar k. and pudi v. (2011). an efficient algorithm for ranking research papers based on citation network. data mining and optimization (dmo), 2011 3rd conference on. ieee, pp. 88-95.
[2] luong h., huynh t., gauch s., do p. and hoang k. (2012). publication venue recommendation using author network's publication history. intelligent information and database systems. springer, pp. 426-435.
[3] luong h. p., huynh t., gauch s. and hoang k. (2012). exploiting social networks for publication venue recommendations. kdir, pp. 239-245.
[4] zhuang z., elmacioglu e., lee d. and giles c. l. (2007). measuring conference quality by mining program committee characteristics. proceedings of the 7th acm/ieee-cs joint conference on digital libraries. acm, pp. 225-234.
[5] garfield e. (1999). journal impact factor: a brief review. canadian medical association journal, 161(8): 979-980.
[6] ding y., yan e., frazho a. and caverlee j. (2009). pagerank for ranking authors in co-citation networks. journal of the american society for information science and technology, 60(11): 2229-2243.
[7] saha s., saint s. and christakis d. a. (2003). impact factor: a valid measure of journal quality? journal of the medical library association, 91(1): 42.
[8] seglen p. o. (1997). why the impact factor of journals should not be used for evaluating research. british medical journal, 314(7079): 497.
[9] patterson d. a. (2004). the health of research conferences and the dearth of big idea papers. communications of the acm, 47(12): 23-24.
[10] laender a. h. f., de lucena c. j. p., maldonado j. c., de souza e silva e. and ziviani n. (2008). assessing the research and education quality of the top brazilian computer science graduate programs. acm sigcse bulletin, 40(2): 135-145.
[11] franceschet m. (2010). the role of conference publications in cs.
communications of the acm, 53(12): 129-132. [12] martins w. s., goncalves m. a., laender a. h. and pappa g. l. (2009). learning to assess the quality of scientific conferences: a case study in computer science. proceedings of the 9th acm/ieee-cs joint conference on digital libraries. acm, pp. 193-202. [13] page l., brin s., motwani r. and winograd t. (1999). the pagerank citation ranking: bringing order to the web. stanford infolab, tech. rep. [14] sidiropoulos a. and manolopoulos y. (2006). generalized comparison of graph-based ranking algorithms for publications and authors. journal of systems and software, 79(12): 1679-1700. [15] valmarska a. (2014). analysis of citation networks. diploma thesis, faculty of computer and information science, university of ljubljana. [16] han s., jiang j., yue z. and he d. (2013). recommending program committee candidates for academic conferences. proceedings of the 2013 workshop on computational scientometrics. acm, pp. 1-6. [17] mihalcea r. (2004). graph-based ranking algorithms for sentence extraction, applied to text summarization. proceedings of the acl 2004 on interactive poster and demonstration sessions. association for computational linguistics, pp. 20-23. [18] brin s. and page l. (1998). the anatomy of a large-scale hypertextual web search engine. computer networks and isdn systems, 30(1): 107117. [19] rahm e. and thor a. (2005). citation analysis of database publications. acm sigmod record, 34(4): 48-53. [20] yu p. s., li x. and liu b. (2004). on the temporal dimension of search. proceedings of the 13th international world wide web conference on alternate track papers & posters. acm, pp. 448-449. [21] dellavalle r. p., schilling l. m., rodriguez m. a., van de sompel h., and bollen j. (2007). refining dermatology journal impact factors using pagerank. journal of the american academy of dermatology, 57(1): 116119. [22] hirsch j. e. (2005). an index to quantify an individual's scientific research output. proceedings of the national academy of sciences of the united states of america, 102(46): 16 569-16 572. [23] egghe l. (2006). theory and practise of the g-index. scientometrics, 69(1): 131-152. [24] katerattanakul p., han b. and hong s. (2003). objective quality ranking of computing journals. communications of the acm, 46(10): 111-114. [25] klamma r., cuong p. m. and cao y. (2009). you never walk alone: recommending academic events based on social network analysis. complex sciences. springer, pp. 657-670. international journal on advances in ict for emerging regions 2019 12 (2): classification of voice content in the context of public radio broadcasting g.a.g.s.karunarathna#1, k.l.jayaratne #2, p.v.k.g.gunawardana#3 abstract— with the rapid development of mass media technology, content classification of radio broadcasting has emerged as a major research area facilitating the automation of radio broadcasting monitoring process. this research focuses on the voice dominant content classification of radio broadcasting by employing a multi-class support vector machine (svm) in order to automate monitoring of radio broadcasting in sri lanka. this study investigates the performance of “one vs. one” and “one vs. all” methods known to be two conventional ways to build a multi-class svm. these two multi-class svm models are trained to classify five voice dominant classes as news, conversations, and advertisements without jingles, radio drama and religious programs. 
one of the substantial measures in creating such a classification is selection of the optimal feature sets. for that purpose, time domain features, frequency domain features, cepstral features, and chroma features are manually analyzed for each binary svm classifier independently. two multi-class svm models are trained based on the selected features and the “one vs. one” model was able to better classify the recordings with an 85% accuracy compared to 83% accuracy achieved by “one vs. all” model. further, the results revealed the importance of careful feature selection in order to achieve higher classification accuracies. keywords— audio monitoring, audio classification, radio broadcasting, audio feature analysis, support vector machines i. introduction the radio is a dynamic and amiable communication device to people for many decades since its invention. unlike other communications devices such as computers and smartphones, anyone can easily use the radio without an age bracket. according to the statistic in 2015, more than half of the population use the radio while they are driving, women tend to listen to the radio while they are cooking, and most of the people listen to the radio even at their workplaces [1]. therefore, unlike the rest of the communication mediums radio plays an important role in sharing information. in radio transmission, radio station and the listeners are the two endpoints. radio stations broadcast unidirectional wireless signals over space to the multitudes of individual listeners with radio receivers. radio stations broadcast a sequence of content categories such as songs, advertisements, news, interviews, conversations, and radio dramas. the number of listeners of a radio channel always relies on the programs that the radio stations are broadcasting. thus, in order to grasp the audience to a program, the program should be performed well, and fit into the audience. hence, for the purpose of measuring the performance of a broadcast program, radio stations need to monitor the broadcasting content regularly. broadcast content monitoring helps to verify when and where the broadcast content placed, protect copyrights by knowing precisely how the content is being used, and measure performance across other broadcast channels [2, 3, 4, 5, 6]. therefore, broadcast content monitoring is a necessary thing for radio stations. furthermore, the stakeholders of radio channels also need to monitor the broadcast content for different purposes such as business, political and legal needs [7, 8]. authorized people in mass media and information corporations need to track the fm channels regularly to ensure whether the broadcasting contents adhere to the rules and regulations and has a diversity of available programs. singers and composers need monitoring of songs to claim their rights [9, 10, 11, 12, 13]. advertising agents are keen on the frequencies of the advertisement broadcasting that have a huge impact on their corporate income. political parties also keep an alert on their name referencing in the radio broadcasting content, especially on the news and political discussions. therefore, it is reasonable to state that many stakeholders are interested in monitoring the radio broadcasting content for a wide range of reasons. in the monitoring of radio broadcasting, both manual monitoring and automated monitoring are used. 
in manual monitoring, techniques such as having an observer listen to the radio content, reading the attached meta-data, or requesting broadcast reports from the broadcast stations are used. these techniques become inefficient and resource intensive when the amount of content to be monitored is high. in automated radio monitoring, a well-trained machine monitors the radio content more effectively and efficiently than a manual process, and most developed countries use automated radio monitoring [14]. as a developing country, sri lanka has not yet established such a technology to monitor radio broadcasts. since there is a large number of radio channels in sri lanka, manual monitoring is not practical. unfortunately, the mechanisms used in developed countries cannot simply be substituted for fm channels in sri lanka, due to differences in languages and pronunciation. hence it is imperative for the sri lankan broadcasting context to have an automatic radio monitoring program. as the initial step towards building an automated monitoring process, it is essential to identify the different content classes (i.e. songs, advertisements, news, interviews, conversations, and radio dramas) in radio broadcasting content.

when analyzing the current state of the above-mentioned problem, classifying broadcast context for onset detection is recognized as the closest research work [15]. (manuscript received on 22nd feb. 2019. recommended by dr. d.n. ranasinghe on 30th dec. 2019. g.a.g.s.karunarathna, k.l.jayaratne and p.v.k.g.gunawardana are from the university of colombo school of computing, sri lanka. gothamikarunarathna@gmail.com, klj@ucsc.cmb.ac.lk, kgg@ucsc.cmb.ac.lk.) onset detection is the mechanism used to identify the places where content changes happen in a musical note or another sound stream. the researchers proposed a unified methodology to automate radio broadcasting monitoring, which detects onsets in the radio broadcasting context with the assistance of a classification of the broadcasting content. the proposed mechanism distinguishes songs, commercial advertisements with jingles, news, and other content in a radio stream. the issue with this unified method, however, is that it is unable to identify voice dominant content classes in the broadcasting context. hence, the classification of different voice dominant contents in the radio broadcasting stream, such as news, advertisements without jingles, conversations, radio dramas and religious programs, is identified as the knowledge gap between the requirement and the existing solutions. therefore, in the context of radio broadcasting, this research proposes a methodology to classify voice dominant content.

ii. related work
as audio classification has emerged as a demanding research area, a considerable amount of related work can be examined. based on the approaches the researchers followed, these works can be divided into two groups: algorithmic approaches and machine learning approaches.

a. algorithmic approaches
lie lu, stan z. li and hong-jiang zhang [21] proposed an algorithm that is able to classify an audio stream into speech, music, environmental sounds and silence. silence detection is performed based on short-time energy and zero-crossing rate (zcr) in a one-second window.
linear spectral pairs (lsp) distance analysis is used to apply refinements over the proposed algorithm. the result of this research has some misclassifications between music and environment sound due to the overlaps in the distribution of the features. an algorithm for discriminating speech from music on broadcast fm radio based on zcr of the time domain waveform is proposed by john saunders [16]. this technique emphasized the characteristics of speech such as limited bandwidth, alternate voiced and unvoiced sections, energy contour between high and low levels are well capable of separating speech from music. barzilay et al [17] proposed an algorithmic approach for the speaker’s role identification in radio broadcasting context. this approach classified anchor, journalist and guest programmer by considering lexical features, features from the surrounding context and the duration features. though the algorithmic approaches show promising results, when the number of classes in the classification is increasing the problem becomes non-trivial. identifying the threshold values to discriminate each class is also difficult. in order to avoid these negatives on the algorithmic approaches, researchers recently moved to machine learning approaches. b. machine learning approaches the machine learning community has done numerous works the under both supervised learning and unsupervised learning. the learning approaches associated with supervised learning are neural networks (nn), hidden markov model (hmm), and support vector machine (svm). unsupervised learning approaches are k means and gaussian mixture model (gmm). since our problem domain refers to the supervised learning approaches, more attention goes to nn, hmm, and svm. the most recent and closest work of addressing the same problem is, “classification of public radio broadcast context for onset detection” conducted by c. weeratunga et al [15]. in this approach, the onset detection mechanism along with a classification model is proposed to predict four classes (i.e. songs, voice-related segments, news, and radio commercials). a supervised neural network model with 38 extracted features has included in the classification framework. radio commercials, songs, news, and other voice contents are classified with accuracies of 76%, 75%, 41%, and 59% respectively. in this approach, the output of the onset detection largely depends on the accuracy of the classification. currently, it has 82% accuracy for onset detection with respect to prior mentioned audio classes in radio broadcasting context. in order to automate the radio broadcasting monitoring process, the existing onset detection method should be improved. therefore as a further step, the focus should be on the voice dominant content classification in radio broadcasting events. another supervised neural network approach has used by khan et al [22] to classify speech and music. as the classification framework, multilayer perceptron neural network and back-propagation learning algorithms are used. the experimental results have shown an overall accuracy of 96.6%, with 100% accuracy in recognizing music from speech. in the research done by r. kotsakis, g. kalliris, c. dimoulas [14], various audio pattern classifiers in the broadcast-audio semantic analysis are investigated using radio program-adaptive classification strategies with supervised ann system. in the evaluation, kotsakis et al found ann and knn classifiers quite effective than tree complex and smo methods. 
kons et al. [23] suggested a deep neural network (dnn) as a solution for classifying four classes: crowds of people, cars/road noise, applause and yelling/cheering, and various kinds of music recorded outdoors. the dnn classifier achieved the best overall performance in most of the classes, except for the music class, where the svm performed better. like the neural network approaches, the hidden markov model has also shown high performance in radio broadcasting content classification problems. hmm is used for radio commercial classification by g. koolagudi et al. [19]; as they observed, in some situations where the ann failed (i.e. when background music follows an advertisement), the hmm performed well. another hmm-related work, conducted by yang liu [18], identifies the roles of speakers in radio broadcast news content. well-structured news content, which highlights the speaker role sequences, is used in this research; an accuracy of 80% was obtained, and the beginnings and ends of sentences in the speaker's voice were found to be a good heuristic for role identification.

svm is basically designed for binary classification problems. as an extension, a multi-class svm is obtained by combining a set of binary svm classifiers, and there are three ways to design multi-class svms: one vs. all, one vs. one, and dagsvm [24]. the main advantage of svm compared to other machine learning approaches is that it often performs better because it finds the best hyperplane(s) separating the data into different classes, even when the dataset is small [25]. aurino et al. [26] proposed a one-class svm based approach to detect anomalous events, that is, abnormal environmental sounds such as gunshots, screaming and breaking glass. the proposed methodology consists of two stages: in the first stage, a new mechanism called "majority voting and rejection" classifies short time frames into predefined classes; in the second stage, the results of the first stage are aggregated into longer time frames and reclassified. in the work of bouril et al. [27], 3000 phonocardiograms from 9 locations on the bodies of both adults and children were used to identify normal and abnormal heart sounds using svm. here, 74 time and frequency domain features were considered. the svm model used a gaussian kernel and allowed three outputs: -1 for a normal heartbeat, 0 for sounds that are ambiguous due to noise, and 1 for an abnormal heartbeat. in this research, a binary svm was found to be effective for normal/abnormal classification. audio-based event detection in live office environments using optimized mfcc features with an svm model was implemented by kucukbay et al. [28]; sixteen classes, such as alert beeping, throat clearing, keyboard and switch on/off sounds, were classified using a one vs. all multi-class svm. martin morato et al. [29] conducted a case study on feature sensitivity for audio event classification using a one vs. all multi-class svm. as in [28], sixteen classes were differentiated using mfcc and mfe features with a 2.5 s frame length, 1 s overlap and a 44.1 khz sampling frequency. wang et al. [30] used a frame-based multi-class svm classifier to differentiate fifteen audio classes including both male and female voices.
the frame-based classifier segments each audio file into several frame sizes and trains a classifier for each. even though this method improves accuracy (a 13.9% difference is reported), the preprocessing and training time was considerably high. lie lu, stan z. li and hong-jiang zhang [31] proposed a hierarchical binary support vector machine method for audio segmentation and classification. the researchers considered five predefined classes: silence, music, background sound, pure speech, and non-pure speech (speech over music and speech over noise). the evaluation showed that the accuracy of the svm based method is better than that of methods based on knn and gmm, but the major disadvantage of this approach is that misclassifications at upper levels can be propagated to the classifiers at the lower levels. since broadcasting fm channels demand news content classification of the broadcast context, vavrek et al. [20] also proposed a hierarchical tree to address the news content classification problem. this hierarchical classification strategy uses a particular feature set for each binary svm classifier, and the f-score feature selection algorithm is used to obtain optimal features for each svm. the drawback of this work is that errors at the upper levels of the tree propagate to the bottom levels; to prevent this, misclassifications at upper levels were not considered. the work of zhu, ming and huang [32] classifies six audio classes using a clip-based svm method: pure speech, music, silence, environmental sounds, speech with music, and speech with environmental sounds. the key finding of this work is that the performance of svm is better than that of decision trees, knn, and neural networks in similar cases.

the potential of these approaches varies from one problem domain to another. based on the research question and past studies in the domain, the following choices were made in order to carry out this study. since the dataset consists of a set of pre-defined classes, a supervised learning approach is proposed for the classification, and unsupervised classifiers were therefore eliminated. as mentioned before, a unique set of features to discriminate each class from the rest has been identified; if all the features were input to the classification model together, the accuracy of the model would be reduced, because features irrelevant to some classes would be included. therefore, another facet of this research is to input a different feature set for discriminating each class. with an ann approach, it is not possible to provide unique sets of features for each class separately. moreover, according to the research conducted by c. weeratunga et al. [15], the ann model is not the best approach for distinguishing voice dominant categories such as news. on the other hand, hmm was rejected because the sequence in which the audio events appear is not beneficial to our problem. accordingly, the svm classification model was selected after considering all aspects. since svms are originally designed for binary classification, the multi-class svm is built as a combination of binary svm classifiers. as we have already identified specific features for each class, we can input only the relevant features to each binary svm when using a multi-class svm model, because it holds multiple binary svm models; a minimal code sketch of this idea is given below.
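as a concrete illustration of feeding a separate feature subset to each binary svm, the following is a minimal one vs. one sketch with scikit-learn; it is not the implementation used in this study, and the class labels, column indices and kernel choice are illustrative assumptions. a one vs. all variant would simply train one classifier per class against the rest of the data.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

CLASSES = ["news", "ads", "conversation", "drama", "religious"]

def train_one_vs_one(x, y, feature_idx):
    """train n(n-1)/2 binary svms, each restricted to its own feature subset.
    feature_idx[(a, b)] is the list of column indices selected for that pair."""
    models = {}
    for a, b in combinations(CLASSES, 2):
        mask = np.isin(y, [a, b])
        cols = feature_idx[(a, b)]
        models[(a, b)] = SVC(kernel="rbf").fit(x[mask][:, cols], y[mask])
    return models

def predict_one_vs_one(models, feature_idx, sample):
    """max-wins voting over the pairwise classifiers for one feature vector."""
    votes = {c: 0 for c in CLASSES}
    for (a, b), clf in models.items():
        winner = clf.predict(sample[feature_idx[(a, b)]].reshape(1, -1))[0]
        votes[winner] += 1
    return max(votes, key=votes.get)
```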
accordingly, the multi-class svm is chosen as the most suitable classifier for our problem domain. as it is a composition of several binary svms, a multi-class svm can be designed using one of the following methods [24]:
• one vs. one
• one vs. all
• dynamic acyclic graph svm (dagsvm)

one vs. all constructs n binary svm models for n classes. each binary svm is trained with all the data of one class as positive labels and the rest as negative labels, and the class whose decision function has the largest value is taken as the predicted class. one vs. one constructs n(n-1)/2 binary svm models, each trained on only two classes, and a class is predicted using the "max-winning" strategy. like one vs. one, dagsvm also constructs n(n-1)/2 binary svm models, each trained on two classes, but these binary svms are structured as a top-to-bottom hierarchical graph: evaluation starts at the root node, a binary decision function is evaluated, and the process moves to the left or right child depending on the output of the previous node, so that (n-1) decision nodes are evaluated before a prediction is made. since dagsvm is a hierarchical graph, misclassifications at upper levels can propagate to lower levels [33], leading to erroneous results; hence dagsvm was rejected at the very first step. one vs. one and one vs. all both have benefits as well as limitations [24], and which is preferable depends on the application domain. hence this research attempts to identify the most reliable method by modeling the multi-class svm in both ways.

iii. proposed approach
the proposed approach focuses on the classification of different voice dominant classes in radio broadcasting in sri lanka. since the dataset consists of a set of pre-defined classes (i.e. news, advertisements without jingles, radio dramas, conversations, and religious programs), a supervised learning approach is proposed for the classification. slbc (sri lanka broadcasting corporation) audio recordings are used as the dataset in order to represent sri lankan fm channels. the dataset is 5 hours and 50 minutes long and contains both male and female voices. as shown in figure 1, the whole dataset is first divided into 60% for training and 40% for evaluation; the training portion is then further divided into 70% for training and 30% for testing. the numbers of frames in the training and testing datasets are given in table 1 and table 2, and the frame length is 5 s. a quantitative interpretation of the audio data is required in order to identify the most suitable features for distinguishing each class; this research specifically focuses on the analysis of the time series and frequency series of the audio signals. figure 2 depicts the design of the proposed approach.

a. feature analysis
features are used to capture the measurable information in the dataset. in a classification task, identifying the most appropriate features is essential for differentiating one class from another, so the dataset should be analyzed thoroughly before selecting features. here, all 34 features used in the most recent and relevant past study [15] are analyzed.
a. feature analysis
features are used to capture the measurable information in the dataset. in a classification task, identifying the most appropriate features is essential for differentiating one class from another, and in order to select the appropriate features the dataset should be thoroughly analyzed. here, altogether 34 features used in the most recent and relevant past study [15] are analyzed. these features belong to the time domain features, frequency domain features, cepstral features, and chroma features. the novelty of this research is that, rather than feeding all the features together, specific sets of features are fed separately for each class. this helps to avoid the input of unnecessary features, reduces dimensionality, and makes the classification faster and more accurate. since a multi-class svm holds multiple binary svms, one of the advantages of using a multi-class svm is that specific features can be input to each binary svm separately. since this research compares the performance of two types of multi-class svm models, the feature selection is carried out separately for both multi-class svm models. as illustrated in table 3, the binary svm models used to construct the multi-class svm models are trained to classify different class pairs; therefore, for each binary svm model, the features are identified according to the class pair that is to be classified.
table 3: binary classifiers of the two multi-class svms
one vs. one:
svm 1 - news vs. advertisements
svm 2 - news vs. conversations
svm 3 - news vs. radio drama
svm 4 - news vs. religious program
svm 5 - advertisements vs. conversations
svm 6 - advertisements vs. radio drama
svm 7 - advertisements vs. religious program
svm 8 - conversations vs. radio drama
svm 9 - conversations vs. religious program
svm 10 - radio drama vs. religious program
one vs. all:
svm 1 - news vs. others
svm 2 - advertisements vs. others
svm 3 - conversations vs. others
svm 4 - radio drama vs. others
svm 5 - religious program vs. others
1) feature analysis: one vs. one
according to table 3, the best features for distinguishing the ten class pairs are identified by inspecting feature spectrums. for this purpose, 10 audio clips are prepared as shown in figure 3, one per class pair, with 15 minutes of audio per class. by observing the spectrums of each pair, 24 of the 34 features are selected, eliminating the features that do not show a discriminative spectrum pattern for any class pair. as an example, figure 4 shows the spectrum of the energy feature, which is selected to distinguish advertisements from religious programs.
figure 3: audio clip structure designed for analyzing the features of two classes
figure 4: energy feature spectrum for advertisements vs. religious programs
the selected features are then ranked by calculating a feature importance score. feature importance is scored using a tree-based classifier, which provides a measurement of the relevance of a feature to the output variable; it is an inbuilt class provided by the scikit-learn library. table 4 shows the selected features for the one vs. one model with their ranks.
table 4: selected features for the one vs. one model, with the ranks assigned in the binary svms where each feature was selected:
zcr: 7
energy: 7, 3, 4, 1, 3
energy entropy: 1, 3, 5, 4
spectral centroid: 2, 3, 4, 2, 1, 3, 6
spectral spread: 4, 1, 6, 1, 2, 1
spectral entropy: 1, 1, 4, 1
spectral flux: 6
spectral roll off: 4, 4, 3
mfcc 1: 7
mfcc 2: 3, 5
mfcc 3: 5, 2, 4, 2
mfcc 4: 3, 5, 1, 2, 4, 8
mfcc 5: 6, 7
mfcc 6: 6
mfcc 7: 5, 5
mfcc 8: 6, 7, 6, 8
mfcc 9: 2, 2, 2, 6, 3
mfcc 10: 7, 3, 7
mfcc 11: 6, 6, 5
mfcc 12: 5, 2
mfcc 13: 5, 9
chroma vector 1: -
chroma vector 2: 5
chroma vector 3-9: -
chroma vector 10: 7
chroma vector 11: 4
chroma vector 12: -
chroma std: -
2) feature analysis: one vs. all
as shown in table 3, the one vs. all method requires only five binary classifiers, because it constructs a number of binary svms equal to the number of classes, and each classifier is allocated to the classification of one class. here, the features relevant for distinguishing a class from the rest are analyzed. for that, an audio clip of 2 hours and 5 minutes is prepared as shown in figure 5, which includes all classes with 25 minutes per class. figure 6 shows a pattern obtained from the frequency spectrum of spectral entropy against the five classes.
figure 5: audio clip structure designed for analyzing the features of one class against the rest
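both feature analyses above rank the candidate features with a tree-based importance score from scikit-learn. the snippet below is only a minimal sketch of that kind of ranking; the ExtraTreesClassifier estimator, its settings, and the placeholder data are assumptions, since the exact inbuilt class and parameters used by the authors are not specified here.

# rank candidate features for one binary problem (e.g. news vs. advertisements)
# using tree-based feature importances; the estimator choice is an assumption.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

feature_names = ["zcr", "energy", "energy entropy", "spectral centroid",
                 "spectral spread", "spectral entropy", "spectral flux",
                 "spectral rolloff"] + [f"mfcc {i}" for i in range(1, 14)] + \
                [f"chroma {i}" for i in range(1, 13)] + ["chroma std"]

rng = np.random.default_rng(0)
x_pair = rng.random((200, 34))          # placeholder frames of the two classes
y_pair = rng.integers(0, 2, 200)        # placeholder binary labels

forest = ExtraTreesClassifier(n_estimators=200, random_state=42)
forest.fit(x_pair, y_pair)

ranking = sorted(zip(feature_names, forest.feature_importances_),
                 key=lambda item: item[1], reverse=True)
for rank, (name, score) in enumerate(ranking, start=1):
    print(rank, name, round(float(score), 4))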
using this 2 hours and 5 minutes long sample audio clip, the 34 features are analyzed and 24 features are selected, as shown in table 5.
figure 6: spectral entropy for the five classes
table 5: selected features for the one vs. all model, with the ranks assigned in the binary svms where each feature was selected:
zcr: 4, 7
energy: 5, 2
energy entropy: 8, 13, 8
spectral centroid: 2, 5
spectral spread: 5, 1
spectral entropy: 4, 1
spectral flux: 12, 9
spectral roll off: 5, 3
mfcc 1: 2
mfcc 2: 1, 6
mfcc 3: 7, 4, 3
mfcc 4: 1, 3
mfcc 5: 10, 9
mfcc 6: 10
mfcc 7: 10, 11
mfcc 8: 9, 2
mfcc 9: 2, 7, 6, 4
mfcc 10: 6, 8
mfcc 11: 3, 7, 6
mfcc 12: 3, 1
mfcc 13: 5
chroma vector 1-2: -
chroma vector 3: 4
chroma vector 4-6: -
chroma vector 7: 11
chroma vector 8-10: -
chroma vector 11: 6
chroma vector 12: -
chroma std: -
b. data pre-processing
data pre-processing brings the raw audio data into a consistent form before the features are extracted. as shown in figure 2:
 in the data formatting stage, the data files are converted to the .wav file format; the monophonic channel is chosen as the channel type and 44.1 khz is selected as the sample rate [34].
 in the data annotation stage, the audio clips are manually listened to using the "audacity" tool, segmented into the different classes, and labelled with the relevant class labels.
 then, silence is removed only from news and conversations; the reason for removing silence only from news and conversations is described in section iv.
c. feature extraction
in audio classification, feature extraction is the most important component. frame blocking is performed before extracting the features: the audio waves are framed into blocks of 5 s. the best subset of the selected features is then extracted from each frame, and the extracted features are expressed as feature vectors. table 6 lists all the features that are extracted for both one vs. one and one vs. all.
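the sketch below illustrates the pre-processing and frame-level feature extraction described above: a clip is loaded as 44.1 khz mono, blocked into 5 s frames, and a handful of the 34 features are computed per frame. the librosa library and the file name are assumptions for illustration only and are not the authors' exact tooling.

# minimal sketch: load a labelled clip as 44.1 khz mono, cut it into 5 s frames,
# and compute a few of the frame-level features (zcr, energy, spectral, mfcc, chroma).
import numpy as np
import librosa

SR = 44100          # sample rate used in the study
FRAME_SEC = 5       # frame length in seconds

y, sr = librosa.load("news_clip.wav", sr=SR, mono=True)   # placeholder file name

frame_len = FRAME_SEC * SR
n_frames = len(y) // frame_len

rows = []
for i in range(n_frames):
    frame = y[i * frame_len:(i + 1) * frame_len]
    zcr = librosa.feature.zero_crossing_rate(frame).mean()
    energy = float(np.mean(frame ** 2))
    centroid = librosa.feature.spectral_centroid(y=frame, sr=sr).mean()
    rolloff = librosa.feature.spectral_rolloff(y=frame, sr=sr).mean()
    mfcc = librosa.feature.mfcc(y=frame, sr=sr, n_mfcc=13).mean(axis=1)
    chroma = librosa.feature.chroma_stft(y=frame, sr=sr).mean(axis=1)
    rows.append(np.concatenate([[zcr, energy, centroid, rolloff], mfcc, chroma]))

features = np.vstack(rows)      # one feature vector per 5 s frame
print(features.shape)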
d. classification
as stated in the related work, the multi-class svm classifier is selected as the approach that best fits our problem domain. to obtain the best performance, one vs. one and one vs. all multi-class svm models are implemented and evaluated in parallel.
table 6: extracted features for each binary svm
one vs. one:
svm 1 (7 features): chroma 2, chroma 10, chroma 11, energy entropy, mfcc 4, mfcc 8, mfcc 9
svm 2 (7 features): mfcc 2, mfcc 4, mfcc 5, mfcc 10, spectral centroid, spectral entropy, spectral rolloff
svm 3 (7 features): energy, mfcc 4, mfcc 6, mfcc 9, mfcc 12, spectral centroid, spectral spread
svm 4 (6 features): energy, mfcc 3, mfcc 9, mfcc 11, spectral centroid, spectral spread
svm 5 (7 features): mfcc 2, mfcc 5, mfcc 10, spectral centroid, spectral entropy, spectral flux, spectral rolloff
svm 6 (7 features): energy entropy, mfcc 4, mfcc 7, mfcc 8, spectral centroid, spectral spread, spectral entropy
svm 7 (7 features): energy, mfcc 3, mfcc 7, mfcc 9, mfcc 10, spectral spread, spectral centroid
svm 8 (6 features): mfcc 4, mfcc 8, mfcc 12, mfcc 13, spectral entropy, spectral rolloff
svm 9 (8 features): energy, energy entropy, mfcc 1, mfcc 3, mfcc 4, mfcc 9, mfcc 11, spectral spread
svm 10 (9 features): energy, energy entropy, mfcc 3, mfcc 11, mfcc 13, mfcc 8, spectral centroid, spectral spread, zcr
one vs. all:
svm 1 (6 features): chroma 3, chroma 11, energy, mfcc 4, mfcc 9, mfcc 11
svm 2 (10 features): energy entropy, mfcc 1, mfcc 2, mfcc 5, mfcc 8, mfcc 9, mfcc 10, mfcc 12, spectral entropy, spectral rolloff
svm 3 (13 features): chroma 7, energy entropy, mfcc 2, mfcc 3, mfcc 5, mfcc 7, mfcc 10, spectral entropy, spectral centroid, spectral flux, spectral rolloff, spectral spread, zcr
svm 4 (7 features): mfcc 3, mfcc 4, mfcc 8, mfcc 9, mfcc 11, mfcc 12, mfcc 13
svm 5 (11 features): energy, energy entropy, mfcc 3, mfcc 6, mfcc 7, mfcc 9, mfcc 11, spectral flux, spectral centroid, spectral spread, zcr
1) one vs. one model
in this approach, n(n-1)/2 binary svms are implemented to classify n classes. we therefore design ten binary svms, where each svm classifies one pair of classes as shown in table 3. the svm for the ith and jth classes classifies a data point d = (x_t, y_t) according to equations (1) and (2), where (w^{ij})^T \varphi(x_t) + b^{ij} is the decision boundary, w^{ij} is the weight vector, x_t is the input vector, b^{ij} is the bias, and x_t is mapped to a higher dimensional space by the function \varphi:
(w^{ij})^T \varphi(x_t) + b^{ij} \ge 1 - \varepsilon_t^{ij} \Rightarrow y_t = \text{class } i \quad (1)
(w^{ij})^T \varphi(x_t) + b^{ij} \le -1 + \varepsilon_t^{ij} \Rightarrow y_t = \text{class } j \quad (2)
the motivation behind the svm is to maximize the margin of the decision boundary between the two classes. the maximum margin for the ith and jth classes is obtained by minimizing the magnitude of w^{ij}, as in equation (3). when the data is not linearly separable, the penalty term C \sum_t \varepsilon_t^{ij} is introduced to reduce the number of training errors:
\min_{w^{ij}, b^{ij}, \varepsilon^{ij}} \; \frac{1}{2} (w^{ij})^T w^{ij} + C \sum_t \varepsilon_t^{ij} \quad (3)
the one vs. one model builds ten binary svms to classify the five classes. since there are ten decision boundaries, the predicted class for a particular data point is identified using a voting strategy called the "max winning" strategy: if a decision boundary says the data point belongs to the ith class, a vote is cast for the ith class; otherwise, a vote is cast for the jth class. the class with the maximum number of votes is taken as the predicted class.
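a minimal sketch of the one vs. one scheme with per-pair feature subsets and max-winning voting is given below, using scikit-learn's SVC for the individual binary classifiers; the feature-column indices, the rbf kernel settings and the placeholder data are illustrative assumptions rather than the values used in the study.

# sketch: one vs. one multi-class svm where every class pair gets its own
# binary svm trained only on the feature columns selected for that pair.
from itertools import combinations
from collections import Counter
import numpy as np
from sklearn.svm import SVC

classes = ["news", "advertisements", "conversations", "radio drama", "religious program"]

# hypothetical mapping from class pair to selected feature-column indices
pair_features = {("news", "advertisements"): [2, 16, 20, 21, 29, 30, 31],
                 ("news", "conversations"): [3, 5, 7, 14, 16, 17, 22]}
# ... the remaining eight pairs would be filled in the same way from table 6

def train_one_vs_one(x, y):
    models = {}
    for a, b in combinations(classes, 2):
        cols = pair_features.get((a, b))
        if cols is None:
            continue
        mask = np.isin(y, [a, b])
        models[(a, b)] = SVC(kernel="rbf", C=1.0).fit(x[mask][:, cols], y[mask])
    return models

def predict_max_winning(models, sample):
    votes = Counter()
    for (a, b), clf in models.items():
        cols = pair_features[(a, b)]
        votes[clf.predict(sample[cols].reshape(1, -1))[0]] += 1
    return votes.most_common(1)[0][0]   # ties fall back to the first class seen

# placeholder data standing in for the real 34-dimensional frame features
x_demo = np.random.rand(200, 34)
y_demo = np.random.choice(classes, 200)
ovo_models = train_one_vs_one(x_demo, y_demo)
print(predict_max_winning(ovo_models, x_demo[0]))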
2) one vs. all model
the one vs. all method constructs n binary svms, where n is the number of classes to classify; therefore, five svm classifiers are designed as shown in table 3. each svm is trained with the whole dataset, taking the data belonging to the ith class with positive labels and the remainder of the data with negative labels. an svm classifies a data point d = (x_t, y_t) for the ith class according to equations (4) and (5):
(w^{i})^T \varphi(x_t) + b^{i} \ge 1 - \varepsilon_t^{i} \Rightarrow y_t = \text{class } i \quad (4)
(w^{i})^T \varphi(x_t) + b^{i} \le -1 + \varepsilon_t^{i} \Rightarrow y_t \ne \text{class } i \quad (5)
to find the maximum margin, the magnitude of w^{i} should be minimized as in equation (6), where C is the constant used to reduce training errors:
\min_{w^{i}, b^{i}, \varepsilon^{i}} \; \frac{1}{2} (w^{i})^T w^{i} + C \sum_t \varepsilon_t^{i} \quad (6)
the one vs. all model implements five binary svms, each classifying one class individually. after training the five classifiers, the class of a data point x is predicted by finding the decision boundary with the maximum value; equation (7) gives the prediction function for a data point x:
\text{class of } x = \arg\max_{i} \left( (w^{i})^T \varphi(x) + b^{i} \right) \quad (7)
e. evaluation
in the evaluation, the performance of the one vs. one and one vs. all multi-class svm models is assessed. ground truth data is required to evaluate the accuracies of the two models; the 40% of the total data that was never in the training set is taken as the ground truth data. the ground truth data contains 28 minutes of audio recordings for each class (news, conversations, advertisements, drama, and religious programs), and the "audacity" tool is used to annotate it. the models are evaluated under the different criteria depicted in figure 7: the performance of the models is evaluated while increasing the number of features, and the two models are additionally evaluated using selected frame lengths, selected sample rates, and before and after silence removal. the performance of the two models is presented using graphs and confusion matrices. necessary refinements for both models are made based on the evaluation results, and the precision value is taken as the performance measure.
figure 7: evaluation plan
iv. experiments and results
the models were initially evaluated using data that were not used in the training phase. all the experiments mentioned in section iii.e were repeated five times, and the average of the results was calculated to determine the overall performance of the system. according to the results, the necessary refinements were made to the classification model.
a. increase the number of features
this experiment is used to prevent "diminishing returns". first, the features selected for each svm are ranked according to their feature importance scores. thereafter, several rounds of experiments were conducted, increasing the input feature count in order to determine the optimal feature set. figure 8 and figure 9 illustrate the results obtained for the one vs. one and one vs. all models. these figures indicate that a smaller number of features can achieve the highest precision value. the minimal required subset of features for each svm is selected through this process; the obtained features are listed in table 6.
figure 8: change of precision when increasing the features of the binary svms in the one vs. one model
figure 9: change of precision when increasing the features of the binary svms in the one vs. all model
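the incremental feature-count experiment above can be sketched as a small loop that retrains one binary svm on the top-k ranked features and records the precision for each k. the snippet below is only an illustration of that procedure, with placeholder data and a hypothetical ranked feature list standing in for the study's actual frames and rankings.

# sketch: evaluate one binary svm while growing the input from the top-1 ranked
# feature to the full ranked list, keeping the smallest subset with peak precision.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)
x_train, y_train = rng.random((300, 34)), rng.integers(0, 2, 300)   # placeholders
x_test, y_test = rng.random((100, 34)), rng.integers(0, 2, 100)

ranked_columns = [9, 4, 16, 21, 3, 28, 0]   # hypothetical importance-ranked indices

precisions = []
for k in range(1, len(ranked_columns) + 1):
    cols = ranked_columns[:k]
    clf = SVC(kernel="rbf", C=1.0).fit(x_train[:, cols], y_train)
    y_pred = clf.predict(x_test[:, cols])
    precisions.append(precision_score(y_test, y_pred, zero_division=0))

best_k = int(np.argmax(precisions)) + 1
print("best subset:", ranked_columns[:best_k], "precision:", precisions[best_k - 1])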
b. different frame sizes
selecting a correct frame size for extracting the features is essential in a classification problem. the closest work to this research, by c. weeratunga et al. [15], used 2.5 s as the frame size. other works found in the literature have used different frame sizes such as 25 ms, 0.25 s, and 0.3 s. therefore, the models are evaluated with respect to different frame sizes, and the results are reported for the chosen frame sizes of 0.25 s, 2.5 s, 4 s, and 5 s. increasing the frame size beyond 5 s is not possible in this case, since some of the data segments in the dataset have lengths between 5 s and 6 s. when changing the frame size, the rest of the model's parameters, such as the k value and the sample rate, were kept constant. the obtained results are shown in figure 10; a frame length of 5 s is selected as the best frame size.
figure 10: variation of precision against different frame sizes
c. different sample rates
fm radio channels have an audio bandwidth of approximately 15 khz. bandwidth is the difference between the highest and lowest frequencies carried in an audio stream. according to the nyquist-shannon theorem, the highest representable frequency is half of the sample rate. practically, the highest frequency of interest for a radio stream lies between 20000 hz and 22050 hz, because the highest audible frequency for humans is 20000 hz [34]. thus, logically the best sample rate for our study is 44100 hz. in addition, 16000 hz and 22050 hz have also been used as sample rates in previous work related to radio broadcast classification: weeratunga et al. [15] used 22050 hz, while john saunders [16] and vavrek, j. et al. [20] used 16000 hz in their studies. therefore, we evaluate the models with sample rates of 16000 hz, 22050 hz, and 44100 hz to find the most reliable sample rate for this research. figure 11 illustrates the change in performance against the different sampling rates for both the one vs. one and one vs. all models.
figure 11: variation of precision against different sample rates
d. silence removal
in the data pre-processing phase, silence removal is applied only to news and conversation content, because these classes have long silent periods within an audio clip. for evaluation purposes, the models are also evaluated without silence removal and with silence removal applied to all classes. figure 12 and figure 13 show the change in classification performance without silence removal and with silence removal for the one vs. one model and the one vs. all model respectively. as shown in the figures, after removing silence from all data, performance increased only for news and conversations; therefore, we decided to remove silence only from news and conversations.
figure 12: variation of the precision values of one vs. one against silence removal
figure 13: variation of the precision values of one vs. all against silence removal
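the silence-removal step for news and conversation clips can be illustrated with librosa's energy-based splitting, as in the short sketch below; the 30 db threshold, the library choice and the file name are assumptions for illustration, since the exact silence-detection settings are not reported here.

# sketch: drop silent gaps from a news/conversation clip by keeping only the
# intervals whose energy is within 30 db of the peak (threshold is an assumption).
import numpy as np
import librosa

y, sr = librosa.load("news_clip.wav", sr=44100, mono=True)   # placeholder file

intervals = librosa.effects.split(y, top_db=30)      # (start, end) sample indices
voiced = np.concatenate([y[start:end] for start, end in intervals])

print(f"kept {len(voiced) / len(y):.0%} of the original samples")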
v. evaluation
one of the main aspects of this research is the selection of an optimal subset of features for classification. table 7 and table 8 provide the training accuracy (precision value) of each svm when using all features and when using the selected subset of features. this indicates that the overall performance of the models increases with the optimal subset of features.
table 7: training accuracies with all the features and with the optimal subset of features for the one vs. one model (precision):
news/advertisement: 85% (all features), 87% (optimal subset)
news/conversation: 88%, 92%
news/drama: 90%, 92%
news/religious program: 99%, 100%
advertisement/conversation: 89%, 92%
advertisement/drama: 91%, 93%
advertisement/religious program: 98%, 99%
conversation/drama: 84%, 87%
conversation/religious program: 95%, 95%
drama/religious program: 97%, 99%
one vs. one (overall): 81%, 85%
table 8: training accuracies with all the features and with the optimal subset of features for the one vs. all model:
news/other: 76% (all features), 82% (optimal subset)
conversation/other: 82%, 92%
advertisement/other: 86%, 91%
drama/other: 79%, 82%
religious program/other: 93%, 96%
one vs. all (overall): 78%, 83%
after applying the optimal features with the selected parameters, the performance of the one vs. one and one vs. all models is shown in table 9 and table 10 respectively.
table 9: confusion matrix of the one vs. one model (rows: true class; columns: predicted class, followed by support and precision):
news: news 260, conv 8, advert 43, drama 6, relprog 0 (support 317, precision 81%)
conv: news 25, conv 247, advert 14, drama 33, relprog 6 (support 325, precision 85%)
advert: news 28, conv 12, advert 242, drama 22, relprog 3 (support 307, precision 80%)
drama: news 7, conv 13, advert 6, drama 290, relprog 0 (support 316, precision 82%)
relprog: news 0, conv 7, advert 0, drama 2, relprog 314 (support 323, precision 98%)
overall results: 85.20%
table 10: confusion matrix of the one vs. all model (rows: true class; columns: predicted class, followed by support and precision):
news: news 265, conv 9, advert 38, drama 5, relprog 0 (support 317, precision 73%)
conv: news 38, conv 247, advert 4, drama 34, relprog 2 (support 325, precision 85%)
advert: news 44, conv 11, advert 198, drama 51, relprog 3 (support 307, precision 80%)
drama: news 13, conv 12, advert 6, drama 283, relprog 2 (support 316, precision 76%)
relprog: news 1, conv 11, advert 0, drama 0, relprog 311 (support 323, precision 98%)
overall results: 83.37%
when considering the results of both models, a considerable number of news frames were misclassified as advertisements, while conversation and advertisement frames were misclassified as news and drama. the religious program class was classified better than the others. even though news reading can be considered monotonic, in some scenarios conversations and drama also have a monotonic nature similar to news, and even advertisements show a monotonic nature after the removal of music and jingles. therefore, the monotonic nature of news, conversations, advertisements, and drama might be the reason for these misclassifications. table 11 and table 12 show the obtained accuracies of each binary svm in both models. as shown in table 11, even though the ten binary svms were trained with high training accuracies, the accuracy of the combined model is degraded after the ten models are put together. the reason might be the "max winning" strategy used to predict the classes in one vs. one: when computing the mode, if the maximum number of votes is shared by two classes, only the class that appears first in the array is output. this might degrade the final result of the one vs. one svm. in our training dataset, approximately 13% of the data faced this issue when the frame length was 0.25 s, but when the frame length is increased, the proportion of tied (identical) classes decreases to 4%, as shown in figure 14.
figure 14: proportion of the identical classes against the frame length
table 11: overall results of the one vs. one model (accuracy, precision, recall, f1-score):
news/advertisement: 87%, 87%, 87%, 87%
news/conversation: 90%, 92%, 91%, 91%
news/drama: 92%, 92%, 92%, 92%
news/religious program: 99%, 100%, 100%, 100%
advertisement/conversation: 91%, 92%, 92%, 92%
advertisement/drama: 94%, 93%, 94%, 94%
advertisement/religious program: 99%, 99%, 99%, 99%
conversation/drama: 85%, 87%, 86%, 87%
conversation/religious program: 95%, 95%, 95%, 95%
drama/religious program: 99%, 99%, 99%, 91%
one vs. one (overall): 85%, 85%, 85%, 85%
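confusion matrices and per-class precision values in the style of tables 9-12 can be computed with scikit-learn's metrics, as in the brief sketch below; the label arrays are placeholders standing in for the 40% ground-truth evaluation split and the models' actual predictions.

# sketch: build a confusion matrix and per-class precision for the evaluation split.
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

classes = ["news", "conv", "advert", "drama", "relprog"]
rng = np.random.default_rng(1)
y_true = rng.choice(classes, 500)            # placeholder ground-truth labels
y_pred = rng.choice(classes, 500)            # placeholder model predictions

cm = confusion_matrix(y_true, y_pred, labels=classes)
per_class_precision = precision_score(y_true, y_pred, labels=classes,
                                      average=None, zero_division=0)

for name, row, prec in zip(classes, cm, per_class_precision):
    print(name, row, f"precision {prec:.0%}")
print("overall precision:",
      f"{precision_score(y_true, y_pred, average='weighted', zero_division=0):.2%}")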
table 12: overall results of the one vs. all model (accuracy, precision, recall, f1-score):
news/other: 86%, 82%, 75%, 78%
conversation/other: 91%, 92%, 78%, 84%
advertisement/other: 91%, 91%, 79%, 85%
drama/other: 88%, 82%, 81%, 82%
religious program/other: 98%, 96%, 97%, 97%
one vs. all (overall): 83%, 83%, 83%, 83%
as depicted in table 12, the one vs. all model shows lower accuracy than the one vs. one model. one disadvantage of the one vs. all model is that, when analyzing the features, each binary svm must look at the features as one class against the other four classes; it is therefore difficult to observe the distinguishing features for one class versus four and to choose the best ones. when analyzing features for the one vs. one model, however, the features are analyzed against only two classes at a time, where it is easier to identify a pattern that distinguishes the two classes. another drawback of the one vs. all model is its high training time, since each binary svm classifier in the one vs. all model requires the complete dataset for its individual training, whereas the binary svm classifiers in the one vs. one model take less training time because each requires data from only two classes. according to table 9 and table 10, the one vs. one model achieved 85% overall precision and the one vs. all model achieved 83% overall precision. the obtained results of this study guide us in finding the most suitable multi-class svm for this problem domain; according to the performance of the two models, the one vs. one model, with 85% precision, is chosen as the most appropriate model for this research.
vi. conclusion and future work
the main aim of this research is to identify voice dominant content categories in order to automate the radio broadcasting context in sri lanka. for that, a multi-class svm was proposed. the multi-class svm was built in the two conventional ways, "one vs. one" and "one vs. all", and their performance was compared to find the best model for this domain. the novelty of this approach is that instead of feeding all the features at once, only selected features were fed separately to each classifier in the model. the performance of the two models was evaluated under different criteria. the one vs. one model successfully classified the pre-defined content categories with accuracies of 81% for news, 85% for conversations, 80% for advertisements, 82% for drama and 98% for religious programs. the one vs. all model successfully classified the categories with accuracies of 73% for news, 85% for conversations, 80% for advertisements, 76% for drama and 98% for religious programs. the final overall accuracies of the one vs. one and one vs. all models are 85% and 83% respectively. moreover, the proposed methodology is able to increase the classification accuracy of news content to 81%, whereas the accuracy of the existing methodology [15] was 41%. the major limitation of this research is that the model is trained and tested only on the "slbc" radio fm channel; however, this study creates a platform for further generalizing this model to all sri lankan fm channels. another limitation is that the data of each class is restricted to 1 hour and 10 minutes.
the reason is that no more than 1 hour and 10 minutes of religious program data was available; therefore, the data of the remaining classes was also limited to 1 hour and 10 minutes to avoid proportional bias in the dataset. using more training data for classification is therefore a good choice for improving performance, and identifying the most prominent features yields more accurate results.
references
[1] c. r. s., "celebrating radio: statistics / world radio day 2015," 2018. [online]. available: http://www.diamundialradio.org/2015/en/content/celebrating-radiostatistics.html
[2] nishan w. senevirathna and k. l. jayaratne, "a highly robust audio monitoring system for radio broadcasting," proceedings of sixth annual international conference on computer games, multimedia and allied technology, gstf journal on computing (joc), vol. 3, no. 2, pp. 8798, 2013.
[3] n. senevirathna and k. l. jayaratne, "automated content based audio monitoring approach for radio broadcasting," proceedings of sixth annual international conference on computer games, multimedia and allied technology (cgat 2013), singapore, pp. 110-118, cgat, 2013.
[4] e. n. w. senevirathna and k. l. jayaratne, "audio music monitoring: analyzing current techniques for song recognition and identification," gstf journal on computing (joc), vol. 4, no. 3, pp. 23-34, 2015.
[5] e. d. n. w. senevirathna and k. l. jayaratne, "automated audio monitoring approach for radio broadcasting in sri lanka," proceedings of international conference on advances in ict for emerging regions (icter 2017), sri lanka, pp. 92-98, 2017.
[6] e. d. n. w. senevirathna and lakshman jayaratne, "radio broadcast monitoring to ensure copyright ownership," international journal on advances in ict for emerging regions (icter), 11(1), 2018.
[7] dhanith chaturanga and lakshman jayaratne, "automatic music genre classification of audio signals with machine learning approaches," international journal of computing (joc) by global science and technology forum (gstf), 3(2):137-148, 2013.
[8] dhanith chaturanga and lakshman jayaratne, "musical genre classification using ensemble of classifiers," proceedings of fourth international conference on computational intelligence, modeling and simulation (cimsim 2012), kuantan, malaysia, 2012.
[9] rajitha amarasinghe and lakshman jayaratne, "supervised learning approach for singer identification in sri lankan music," european journal of computer science and information technology (ejcsit) by european centre for research training and development uk, 4(6):1-14, 2016.
[10] rajitha peiris and lakshman jayaratne, "musical genre classification of recorded songs based on music structure similarity," european journal of computer science and information technology (ejcsit) by european centre for research training and development uk, 4(5):70-88, 2016.
[11] tharika madurapperuma, gothami abayawickrama, nesara dissanayake, viraj b. wijesuriya and k. l. jayaratne, "highly efficient and robust audio identification and analytics system to secure royalty payments for song artists," proceedings of ieee international conference on advances in ict for emerging regions (icter 2017), sri lanka, pp. 149-157, 2017.
[12] rajitha peiris and lakshman jayaratne, "supervised learning approach for classification of sri lankan music based on music structure similarity," proceedings of ninth annual international conference on computer games, multimedia and allied technology (cgat 2016), singapore, pp. 84-90, 2016.
[13] m. g. viraj lakshitha and k. l. jayaratne, "melody analysis for prediction of the emotion conveyed by sinhala songs," proceedings of ieee international conference on information and automation for sustainability (iciafs 2016), sri lanka, 2016.
[14] r. kotsakis, g. kalliris, and c. dimoulas, "investigation of broadcast-audio semantic analysis scenarios employing radio-programme-adaptive pattern classification," speech communication, vol. 54, no. 6, pp. 743-762, 2012.
[15] c. o. b. weerathunga, p. v. k. g. gunawardena and k. l. jayaratne, "classification of public radio broadcast context for onset detection," european journal of computer science and information technology (ejcsit) by european centre for research training and development uk, 7(6):1-22, 2018, published by ecrtd-uk, issn 2054-0957 (print), 2054-0965 (online), www.eajournals.org.
[16] j. saunders, "real-time discrimination of broadcast speech/music," in icassp, ieee, 1996, pp. 993-996.
[17] r. barzilay, m. collins, j. hirschberg, and s. whittaker, "the rules behind roles: identifying speaker role in radio broadcasts," in aaai/iaai, 2000, pp. 679-684.
[18] y. liu, "initial study on automatic identification of speaker role in broadcast news speech," in proceedings of the human language technology conference of the naacl, companion volume: short papers, association for computational linguistics, 2006, pp. 81-84.
[19] s. g. koolagudi, s. sridhar, n. elango, k. kumar, and f. afroz, "advertisement detection in commercial radio channels," in industrial and information systems (iciis), 2015 ieee 10th international conference on, ieee, 2015, pp. 272-277.
[20] j. vavrek, e. vozáriková, m. pleva, and j. juhár, "broadcast news audio classification using svm binary trees," in telecommunications and signal processing (tsp), 2012 35th international conference on, ieee, 2012, pp. 469-473.
[21] l. lu, h. jiang, and h. zhang, "a robust audio classification and segmentation method," in proceedings of the ninth acm international conference on multimedia, acm, 2001, pp. 203-211.
[22] m. khan, w. g. al-khatib, and m. moinuddin, "automatic classification of speech and music using neural networks," in proceedings of the 2nd acm international workshop on multimedia databases, acm, 2004, pp. 94-99.
[23] z. kons, o. toledo ronen, and m. carmel, "audio event classification using deep neural networks," in interspeech, 2013, pp. 1482-1486.
[24] c. w. hsu and c. j. lin, "a comparison of methods for multiclass support vector machines," ieee transactions on neural networks, vol. 13, no. 2, pp. 415-425, 2002.
[25] c. lin, "a comparison of methods for multi-class support vector machines," ieee transactions on neural networks, vol. 13, no. 2, pp. 415-425, 2002.
[26] f. aurino, m. folla, f. gargiulo, v. moscato, a. picariello, and c. sansone, "one-class svm based approach for detecting anomalous audio events," in intelligent networking and collaborative systems (incos), 2014 international conference on, ieee, 2014, pp. 145-151.
[27] a. bouril, d. aleinikava, m. s. guillem, and g. m. mirsky, "automated classification of normal and abnormal heart sounds using support vector machines," in computing in cardiology conference (cinc), 2016, ieee, 2016, pp. 549-552.
[28] s. e. küçükbay and m. sert, "audio-based event detection in office live environments using optimized mfcc-svm approach," in semantic computing (icsc), 2015 ieee international conference on, ieee, 2015, pp. 475-480.
[29] i. martín-morato, m. cobos, and f. j. ferri, "a case study on feature sensitivity for audio event classification using support vector machines," in machine learning for signal processing (mlsp), 2016 ieee 26th international workshop on, ieee, 2016, pp. 1-6.
[30] j. c. wang, j. f. wang, c. b. lin, k.-t. jian, and w. kuok, "content-based audio classification using support vector machines and independent component analysis," in pattern recognition, 2006. icpr 2006. 18th international conference on, vol. 4, ieee, 2006, pp. 157-160.
[31] l. lu, s. z. li, and h. j. zhang, "content-based audio segmentation using support vector machines," in proc. icme, vol. 1, 2001, pp. 749-752.
[32] y. zhu, z. ming, and q. huang, "automatic audio genre classification based on support vector machine," in natural computation, 2007. icnc 2007. third international conference on, vol. 1, ieee, 2007, pp. 517-521.
[33] b. kijsirikul and n. ussivakul, "multiclass support vector machines using adaptive directed acyclic graph," in neural networks, 2002. ijcnn'02. proceedings of the 2002 international joint conference on, vol. 1, ieee, 2002, pp. 980-985.
[34] "sample rates - audacity manual," https://manual.audacityteam.org/man/sample rates.html (accessed on 12/22/2018).
[35] t. giannakopoulos, "pyaudioanalysis: an open-source python library for audio signal analysis," plos one, vol. 10, no. 12, p. e0144610, 2015.
international journal on advances in ict for emerging regions 2021 14(3)
don't forget to include that camera in the threat model: vulnerability of atm systems due to surveillance cameras
piyumi seneviratne#1, dilanka perera2, harinda samarasekara3, chamath keppitiyagama4, kasun de soyza5, kenneth thilakarathna6, primal wijesekera7
abstract — video surveillance systems (vss) that are used to provide physical protection to the assets and personnel of organizations open up new information channels, but they are often not considered an integral part of the organization's information system. therefore, more often than not, the vss is not considered when designing and evaluating an organization's information security. hence, a vss may weaken the information security of an organization while strengthening its physical security. we present such a threat: the vss used in the atm kiosks of sri lankan banks can severely weaken atm pin security due to the ad hoc placement of cameras. while we have observed that in some installations the video camera directly captures the pin pad, we show that the visibility of forearm movements is sufficient to infer pins with a significant level of accuracy.
we used a mock-up of an atm kiosk for our analysis, and we show that a human observer can guess a pin with 22.5% accuracy within 3 attempts without any view of the pin pad. a computer can infer the pin from the same footage with an accuracy of 50% using a straightforward algorithm. critical processes in the banks, such as authentication, are built around the assumption of the confidentiality of the pin, and banks therefore invest heavily in the pin generation process. this well-protected pin is exposed to the vss when the pin is entered, violating a crucial assumption. however, this violation has hitherto gone unnoticed by the banks' security audits because the vss is not considered an inalienable component of the information system.
keywords — video surveillance, inferring keyboard inputs, side-channel vulnerabilities, computer vision-based attacks, atm pin security, video analysis, shoulder surfing, threat modelling, information security.
i. introduction
security plays a crucial role in an information system. the implementation of video surveillance systems (vss) has come to a point where it is accepted as a common practice to have vss indoors and outdoors as a security control [1][2]. vss is used to provide physical security to the assets of organizations, and one such scenario is the implementation of vss inside atm kiosks. while vss provides a mechanism to increase security and thwart physical security breaches in an atm kiosk, there is a lack of literature on whether a vss setup could be leaking confidential and security sensitive data and information of an atm user, in turn jeopardizing the overall atm system's security assumptions. the central bank of sri lanka (cbsl), the local governing body for banks, suggests and encourages banks to have vss [3] in atm environments; however, it does not provide any specifications or guidelines on the vss installation. further, the payment card industry pin transaction security point of interaction (pci pts poi) security requirements guidelines [4] state, "the location for camera installation at atms should be carefully chosen to ensure that images of keypad entry are not captured and the camera should support the detection of the attachment of alien devices to the atm front view and possess the ability to generate an alarm for remote monitoring if the camera is blocked or otherwise disabled". this was the only relevant guideline available giving specific instructions on installing vss in an atm. we believe that this specific pci guideline exists to address the possible issue of leakage of the user pin from a camera that is directly focused on the pin pad of an atm. to this end, we examined whether banks adhere to the previously mentioned pci standards and guidelines when installing and managing vss in atm environments, and whether adherence to these guidelines is sufficient to prevent possible pin leakage through the vss footage of atm kiosks. we interviewed 9 employees, whose duties are related to the atm system and its functions, from 3 local banks, as well as a technical expert in the atm systems domain. after the interviews, we found that banks do not follow any formal guidelines when installing the vss in atm kiosks, which has resulted in ad hoc placement of security cameras. in our 3 on-site visits for each of the 3 banks, we observed many incidents where the keypad and the keypress events are visible through the vss footage due to this ad hoc installation practice of surveillance cameras in atm kiosks.
to further investigate the possibility of predicting the passcode through vss footage, we extended our study to investigate whether it is possible to infer the typed pin from vss footage. for this, we developed an experimental setup mimicking an atm kiosk, with a standard prototype of an atm pin pad and a camera, and used it to gather video data of users entering different pins. we were able to show that it is possible to infer the atm pin even in situations where the keypad and the fingertips of the user are not visible to the attacker: we were able to infer the passcode with 22.5% accuracy, which indicates that there is indeed a possibility of inferring the pin from vss footage even if the pin pad is not visible.
correspondence: p. seneviratne#1 (e-mail: piyumis@scorelab.org). received: 15-02-2021, revised: 24-03-2021, accepted: 07-04-2021. p. seneviratne#1, d. perera2, h. samarasekara3, c. keppitiyagama4, k. de soyza5, k. thilakarathna6 are from university of colombo school of computing (piyumis@scorelab.org, dilankap@scorelab.org, harindas@scorelab.org, chamath@ucsc.cmb.ac.lk, kasun@ucsc.cmb.ac.lk, kmt@ucsc.cmb.ac.lk). p. wijesekera7 is from university of california, berkeley & international computer science institute, berkeley, usa (primal@cs.berkeley.edu). this paper is an extended version of the paper "impact of video surveillance systems on atm pin security" presented at the icter conference (2020). doi: http://doi.org/10.4038/icter.v14i3.7229 © 2021 international journal on advances in ict for emerging regions
further, we explored the possibility of increasing the accuracy of successful pin inference by automating the footage analysis rather than relying on naked-eye observation. for this, we used a programme that tracks the forearm movement of the person entering the pin and feeds the x, y coordinates of the forearm at adjacent keypress events into our heuristic algorithm to infer the pin. using this approach, we were able to increase the accuracy to 50%, considerably higher than the human-observer approach. this further solidifies the risk of revealing security sensitive information through ad hoc installations of cctv cameras in atm environments. to the best of our knowledge, this study is the first work to examine the real-world practice of installing vss inside atm kiosks and to address the possible threats and implications. previous studies by balzarotti et al. [5] and xu et al. [6] have shown the possibility of reconstructing typed input from such video recordings; they tracked fingertip movements when there was a direct or indirect line of sight to the fingertips during the typing process. in our study, we show that the visibility of forearm movements alone is sufficient to infer pins with a significant level of accuracy.
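the paper's exact heuristic is not spelled out in this section, so the sketch below is only a hypothetical illustration of how forearm displacements between successive keypress events could be matched against a 3x4 keypad geometry to rank candidate pins; the key coordinates, the scale factor, the scoring function and the demo data are all assumptions and not the authors' algorithm.

# hypothetical sketch: rank candidate 4-digit pins by how well the observed
# forearm displacements between keypress events match an assumed keypad geometry.
from itertools import permutations
import numpy as np

# approximate key centres of a 3x4 pin pad, in arbitrary units (assumed layout)
KEYS = {d: np.array(((d - 1) % 3, (d - 1) // 3)) for d in range(1, 10)}
KEYS[0] = np.array((1, 3))

def pin_score(pin, observed_deltas, scale=1.0):
    """sum of distances between observed forearm deltas and the deltas the
    keypad geometry would predict for this candidate pin (lower is better)."""
    predicted = [KEYS[b] - KEYS[a] for a, b in zip(pin, pin[1:])]
    return sum(np.linalg.norm(scale * np.array(p) - o)
               for p, o in zip(predicted, observed_deltas))

def rank_pins(observed_deltas, top=3):
    candidates = permutations(range(10), 4)        # pins with 4 unique digits
    scored = sorted(candidates, key=lambda pin: pin_score(pin, observed_deltas))
    return ["".join(map(str, pin)) for pin in scored[:top]]

# observed_deltas would come from the forearm (x, y) coordinates extracted at
# consecutive keypress events; here they are fabricated for the pin 1-5-9-0
demo_deltas = [np.array((1.0, 1.0)), np.array((1.0, 1.0)), np.array((-1.0, 1.0))]
print(rank_pins(demo_deltas))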
in this paper, section ii describes the background study, which focuses on the current practice of installing vss in atm kiosks, and section iii gives the details of previous work on reconstructing typed inputs using different approaches. section iv and section v explain our approach to demonstrating the existing threat to atms through a qualitative analysis and a quantitative analysis respectively. the results of our work and the discussion of the results are presented in section vi and section vii respectively. section viii presents the conclusion, and future work is given in section ix.
ii. background
we performed a detailed background analysis of the rules and regulations for installing vss at atm kiosks. policies, guidelines, rules, and regulations issued by both local and global regulatory bodies concerning the atm system were reviewed to identify whether there is any regulatory framework that should be followed when installing surveillance cameras in atm kiosks. cbsl is the governing authority for the financial sector, which regulates and supervises banks and selected non-bank financial institutions in sri lanka. according to the cbsl, the number of atm terminals at the end of quarter 3 of 2018 was recorded as 4,296 [7]. the analysis of regulations imposed by local regulatory bodies revealed that there is only one circular, issued by cbsl in 2006, concerning the surveillance cameras placed in atm cubicles [3]. this circular was addressed to all licensed banks of sri lanka to encourage the installation of cctv cameras to enhance surveillance at atm kiosks; however, it does not make any recommendations on the positioning of the surveillance cameras in atm kiosks. when analyzing the global regulations relating to surveillance camera installation at atm kiosks, the pci atm security guidelines were found to place greater emphasis on addressing information security in an atm kiosk [4]. the pci atm security guidelines encourage the installation of surveillance cameras at atm kiosks where possible and allowed by law. however, the guidelines and best practices of the pci atm security guidelines state that the location of the surveillance camera installation needs to be carefully identified to ensure that images of keypad entry are not captured. the surveillance camera needs to be installed to assist in identifying the attachment of alien devices to the atm front, and should possess the capability to produce an alert for remote monitoring if the camera is blocked or disabled.
iii. related work
we have considered previous studies related to both surveillance systems and the inference of keyboard inputs using camera footage. andrei costin [9] conducted a systematic review of existing and novel threats and vulnerabilities in video surveillance, cctv and ip-camera systems, based on publicly available data, to identify the security and privacy risks associated with the development and deployment of these systems. the paper also presents a set of recommendations and mitigations to enhance the security and privacy aspects of video surveillance systems. michael et al. [10] address vulnerabilities associated with wi-fi ip based cctv systems; the authors consider the relevant vulnerabilities and their significance in terms of confidentiality, integrity and availability, and present a framework that can be utilized to minimize the associated security risks. mowery et al. [11], in their research, discuss the use of thermal camera footage of a keypad after a user's typing session to derive the possible keys pressed; the atm pin is one of the main focus areas of their study. first, the pin recovery results from human analysis are presented.
secondly, the researchers incorporated computer vision techniques to automatically extract the code from the heat map created from the thermal camera data. even though the automated analysis only slightly outperformed the manual analysis, it demonstrated the potential to scale such an attack scenario in practice. balzarotti et al. [5] present a tool named clearshot, which automatically reconstructs the most probable text from video footage of a user's keyboard typing process. the video is captured using an over-the-keyboard video camera which has a full view of the hand movements and the keyboard. in the process of recovering the text being typed, the video recording of the user's typing session is first analyzed to identify the hand or finger movements and possible keypresses; the researchers then follow an occlusion-based analysis to find keypresses, and a text analysis process constructs the most probable text that has been typed. clearshot was able to extract a substantial proportion of the text typed by the user and to suggest around 80% of the words correctly within the first 50 proposed choices. the study by maggi et al. [12] shows that, under dynamic conditions, the key-magnifying feedback provided by iphone, android and blackberry mobile devices while typing a pin is vulnerable to shoulder-surfing attacks; hence, no specific positioning of the attacking camera relative to the target device is required. the discussed attack was facilitated by computer vision and image processing techniques used to identify possible key-magnifying events. the study proposed a fast method to infer keystrokes from either online or offline videos, and it concludes that key magnification feedback is not suitable for applications that require high security. however, both of these attacks, by balzarotti et al. [5] and maggi et al. [12], require the camera to record the typing process with a complete view of the hand and finger movements on the keyboard, whereas our study focuses on a situation where only the forearm movements are visible. qinggang yue et al. [13] used a google glass-based spy camera attack on touch screens to decode the typed input when the input is not visible to the naked eye; however, it still needs a direct sightline to the fingertips of the user. the basic idea of this approach is to track the movement of the fingertip and use its relative position on the touch screen to detect the pressed keys. by applying optical flow, a deformable part-based model, k-means clustering, and other computer vision techniques to automatically track and analyze fingertip movements, they were able to decode more than 90% of the typed passcodes. the studies by maggi et al. [12], balzarotti et al. [5] and qinggang yue et al. [13] thus show the requirement of an undisturbed view of the keypad or of the typing fingers in order to infer keypresses and construct probable text. jin et al. [14] also present a novel vision-based attack on keyboard inputs. in this study, the researchers created a tool called vivisnoop, which uses a video recording of a typing session to reconstruct the typed phrase. they analyzed the video with image processing techniques to identify the subtle vibrations of the desk on which the keyboard is placed, which occur with each keypress.
it emphasizes the fact that, even though they used a mobile phone camera, a webcam or an ordinary surveillance camera could be used to infer typed inputs without a direct sightline to the keyboard. a recent study by chen et al. [15] prototypes and evaluates gaze-based attacks, showing that video-based exploitation is possible within a short distance and at a small angle between the camera and the victim. the work proposes a novel keystroke inference method that exploits recordings of the eye gaze and was able to infer pins, unlock patterns and text input on mobile devices. both of these studies, by jin et al. [14] and chen et al. [15], state that there is a threat to keyboard inputs from video recordings even when there is only an indirect view of the keyboard, keypad or the user's input. however, they consider eye gaze and subtle vibrations, without focusing on the keypad, to create their attack vectors, whereas we use forearm movements to guess the pin of an atm transaction and present the resulting information security threat. shukla et al. [16] show that typed inputs can be reconstructed even if the keyboard is invisible. their work analyzes around 200 videos of the typing process, captured using an htc phone with the camera focusing on the rear side of the target device, and uses the spatio-temporal dynamics of the hand of the user typing on the smartphone to reconstruct the typed text. this paper emphasizes the scenario in which a user holds the smartphone in one hand and types with the other; the scenario does not depend on having a direct or complete view of the keypad of the target device. shukla et al. were able to infer an average of over 50% of pins in the first attempt and up to an average of over 85% of pins by the tenth attempt. the study by shukla et al. [16] can be considered similar to our work in that both studies do not require the visibility of either the fingertips or the keypad to infer the pin. as we consider the atm as our threat scenario, the visible area of a surveillance camera placed inside an atm kiosk might not be able either to capture the subtle vibrations of the atm [14] or to record the eye gaze of the subject [15] in order to infer the user's pin input. in contrast, our study focuses on how surveillance cameras can become a threat to information security by taking a scenario in which neither the pin pad nor the fingertips of the atm user are visible through the surveillance camera, and only the forearm movement of the user during the typing process is used, which is intuitively thought to be an unfavourable situation for an adversary.
iv. qualitative analysis
as the first step, we conducted a preliminary study to qualitatively analyse the current state of installing vss inside atm kiosks. we conducted interviews with employees from 3 banks and one technical expert to further study the context and validate our hypothesis. the interviews commenced only after receiving consent from the relevant authorities of the banks, and the banks' authorities also gave their consent to publish the analysed results while maintaining the anonymity of the banks. after this qualitative analysis, we built an experimental setup to quantitatively analyse the threat.
a. interviews with three banks
besides the circular issued by cbsl in 2006, no other specific regulation was found that addresses the preservation of the information security which is put at stake by surveillance cameras.
hence, we conducted interviews with three leading banks of sri lanka (two state-owned banks and one private bank) to explore the purposes of installing surveillance cameras at atm kiosks, the procedures followed by the banks when installing surveillance cameras at atm kiosks, and the overall management of atm surveillance camera footage. the banks were selected based on convenience and acceptance of our request.
table i: interview questions
q1: how many atms are under operation by the bank?
q2: is there a separate vendor(s) to install surveillance cameras at atms?
q3: are you aware of the pci atm security guidelines and cbsl regulations concerning the placement of cctv in atm kiosks?
q4: how is the monitoring procedure of atm surveillance camera footage being managed?
q5: who in the bank has access to the atm kiosk's surveillance camera streams?
interviews were carried out with employees whose duties are related to the atm system and its functions. as interviewees, a cio, a security divisional head, an it manager, 4 employees from the atm division at the bank's head office and 2 employees from the centralized surveillance monitoring room participated in total. employees were selected based on convenience, and face-to-face semi-structured interviews were carried out with a set of predetermined questions (table i). all the interviews were transcribed and analyzed to identify the processes and operations concerning the surveillance cameras at atm kiosks and their installation procedure. based on the responses to the interviews, we identified that there are mainly two requirements for the installation of surveillance cameras at atm kiosks: 1) to monitor and capture the face of the person who enters the atm kiosk, and 2) to focus on the cash dispensing area of the atm as a precaution against any dispute. however, it was identified that not all the atm kiosks in sri lanka are equipped with surveillance cameras, while some atm kiosks are set up with multiple surveillance cameras. based on the interviews we had with the 3 banks, it was found that bank a and bank c install surveillance cameras inside atm kiosks to meet the above two requirements and have their own documented guidelines for the installation of surveillance cameras, while bank b does not have any documented guidelines (table ii). bank a and bank c have live monitoring of the surveillance video of the atm kiosks both at branch level and centralized at their head offices; security division personnel and branch managers have access to those videos, while in bank b only branch managers have access to the surveillance video footage. it was also stated that all three banks are aware of the pci atm security guidelines and the cbsl guidelines concerning the installation of surveillance cameras at atm kiosks. during the interviews, real video streams from atm surveillance cameras were observed on screens in the surveillance control rooms. during this inspection, it was identified that there are incidents where the pin pad is fully or partially visible when a customer is entering the pin, due to the ad hoc installation of surveillance cameras at atm kiosks. although real video streams were not observed at bank b, it was stated that the pin pad may be captured through the surveillance cameras at its atm kiosks.
in particular, interviewees from bank c stated that the bank uses high-quality hd cameras with which it is even possible to determine the colour of a note dispensed from the atm. however, the analysis of the information gathered from the three banks reveals that the current practices of installing surveillance cameras do not comply with the pci atm security guidelines and contradict atm pin security controls. the banks have not considered the security threat that surveillance cameras might pose to atm pin security.
table ii: summary of the interview findings
date(s) of the interview - bank a: 24/10/2019; bank b: 04/11/2019 and 06/11/2019; bank c: 05/11/2019
national rating by fitch ratings (lanka) ltd. [8] - bank a: aa+ (lka); bank b: aa+ (lka); bank c: aa (lka)
type of bank - bank a: licensed specialised bank; bank b: licensed commercial bank; bank c: licensed commercial bank
number of atm terminals owned - bank a: 310; bank b: 1141; bank c: 853
installer of cameras - bank a: external party; bank b: external party; bank c: own staff
follows any policies or guidelines when installing cameras - bank a: yes (to meet the requirements mentioned in section iv.a); bank b: no (a minor feasibility study is conducted with a new atm installation); bank c: yes (to meet the above-mentioned two requirements, considering the distance and height, the location and the size of the atm kiosk)
awareness of pci dss guidelines and cbsl guidelines on surveillance camera installation at atms - bank a: yes; bank b: yes; bank c: yes
mechanism to monitor the live atm video stream - bank a: both central and at branch level; bank b: only at branch level; bank c: both central and at branch level
who has access to the atm video stream - bank a: security division employees in the control room and the branch manager; bank b: branch manager and higher authorities of the bank; bank c: branch manager and higher authorities of the bank
b. interview with the technical expert
a face-to-face unstructured interview was conducted with one of the technical experts of the atm system information technology and services industry in sri lanka, to explore knowledge of the functions of the atm system and atm pin management. based on the interview, we gathered detailed insights concerning the operations of the atm system and its use of the hsm (hardware security module). banks employ hsms for pin generation, management, and validation; the hsm is the part of the atm system that is employed for the atm pin management and validation process, and in an atm system all the pins are stored in the hsm. in general, there are two ways for a user to create a pin: either a random pin is generated by the hsm and delivered through a pin mailer, or the user creates his or her own pin via the atm. a pin mailer is used to securely print the pin without revealing it to anyone except the user who owns it. either way, the pin offset is securely stored in the hsm. the pin offset is the reference key to the pin block, which is stored in the client information database, whereas the pin block is the encrypted pin that is stored in the hsm. in the user authentication process of the atm system, when the user enters the pin, the system validates the entered pin by matching the pin block together with the relevant pin offset and the hsm. with this mechanism, user pins are protected and stored securely to ensure confidentiality, integrity and availability.
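the exact pin-offset scheme used by the banks is not given here, so the following is only a simplified, hypothetical sketch of how a pin-offset style verification can work, meant purely to make the pin block / pin offset relationship above concrete; the hashing, key handling and digit arithmetic are illustrative assumptions and not a real hsm implementation.

# simplified, hypothetical illustration of pin-offset style verification: a
# "natural pin" is derived from the card number and a secret key (inside the hsm
# in a real system), and only the digit-wise offset between the customer's pin
# and that natural pin is stored outside the hsm. teaching sketch only.
import hashlib

def natural_pin(pan: str, secret_key: bytes, length: int = 4) -> str:
    digest = hashlib.sha256(secret_key + pan.encode()).hexdigest()
    return "".join(str(int(c, 16) % 10) for c in digest[:length])

def make_offset(customer_pin: str, pan: str, secret_key: bytes) -> str:
    nat = natural_pin(pan, secret_key)
    return "".join(str((int(c) - int(n)) % 10) for c, n in zip(customer_pin, nat))

def verify(entered_pin: str, offset: str, pan: str, secret_key: bytes) -> bool:
    nat = natural_pin(pan, secret_key)
    expected = "".join(str((int(n) + int(o)) % 10) for n, o in zip(nat, offset))
    return entered_pin == expected

key = b"hsm-master-key (placeholder)"
offset = make_offset("4921", "4111111111111111", key)
print(offset, verify("4921", offset, "4111111111111111", key))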
hence, the atm system is mainly operating under the trust assumption that atm pin is secured and kept confidential from both the banking system and employees of the bank. c. threat model for the existing atm system based on the information and insights gathered from all four interviews, we developed a threat model for the existing atm system considering the atm pin as the asset. the existing atm system operates under the assumption that the atm pin is secured, and the system provides confidentiality for the pin. it also assumes that the pin is securely stored and is not revealed even to the internal employees of the banking system who are considered to be in the trusted boundary. as shown in fig. 1, the threat model of the existing atm system, the banking system functions within a trust boundary. atm kiosks are located in an untrusted environment as their operation needs a publicly open interface. banks control the communication between trusted and untrusted environments using channel encryption as a security control. banks have employed hsms to securely manage pins, and also, they use physical and technical access controls to ensure the security of the pin. hence, banks are under the assumption that the protection of atm pin is ensured. hence, it is assumed that the atm pin is secured within the threat model illustrated in fig. 1 and further confidentiality of atm pins are guaranteed. however, banks have not considered the vss in the atm kiosk in the threat model (fig. 1). the vss is only regarded as a physical control to ensure operational security at atm kiosks by banks and the consequences resulting from the physical security control which might cause threats to atm pin security are not considered. hence, with the observations and finding we gathered during the interviews, we state that there is a threat to the pin, as well as to the banking information system from the installation of vss. v. quantitative analysis as our second step of the research, we demonstrate that the current practice of installing surveillance cameras in atm kiosks possess a threat to atm pin security, even in the scenarios where the pin pad is not visible through the surveillance camera. incorporating the information gathered during the preliminary study, we created an experimental setup to simulate the pin entering scenario at an atm kiosk due to legal constraints to collect or obtain actual surveillance camera footage from banks. fig. 2 shows the implementation flow of our approach and the experimental setup. the actual implementation should consider the vss camera footage but due to the constraints, we use footage recorded with a mobile phone camera. in our pin inferencing process, we used the video clips that were obtained by employing the experimental setup and demonstrated the possibility of inferring the pin from captured surveillance footage. fig. 2 implementation flow fig. 1 existing threat model don’t forget to include that camera in the threat model: vulnerability of atm systems due to surveillance cameras 6 july 2021 international journal on advances in ict for emerging regions a. experimental setup 1) pin pad we used a pin pad prototype (fig. 3) built using the dimensions of juste6021 atm pin pad [17] (fig. 4) which is commonly used for atms in sri lanka as stated by officials, working at a leading commercial bank in sl in an interview on november, 2019. we have created this pin pad prototype as an electronic keypad using 12 push buttons and a dot board. 
the size of the keys and distance between keys were kept identical to the juste6021 pin pad (fig. 4). we set up an led to the pin pad prototype as an indicator to identify the actual keypress events. fig. 3 prototype of the pin pad fig. 4 juste6021 pin pad 2) pins 20 randomly generated pins were used, and each pin was containing 04 unique digits. 3) camera settings we captured the pin entering process using a mobile phone camera which has an f/22 and 26mm focal length. we recorded the videos without any camera effects, zooming, filters, or flashers. in the real-world scenario, some banks have full hd surveillance cameras with 30fps. therefore, the frame rate was set to 30fps while the resolution was set to 1080p. however, the sensor sizes of the surveillance cameras used in banks are bigger than the mobile phone camera sensor. hence, actual footage from cameras in atm kiosks produce better quality video with high resolution and more detail. unlike the footage we capture, the quality of actual vss streams is good enough to observe the number of dispense notes, their colour and sometimes even the serial number on the note. 4) camera position during the background study, we were able to observe live footage from vss control rooms. two banks of the study state that they focus their camera to capture the cash dispense area. in such cases, forearm movements are clearly visible through the surveillance footage. considering the observations and the facts gathered, we selected an arbitrary position on the left side of an atm user who is facing towards the prototype of the pin pad to place the camera. as shown in fig. 5, from that position, forearm movements were visible during the pin entering process but nor the pin pad, neither the fingertips of the atm user. hence, it creates a comparatively unfavourable situation for the attacker. two users took part in the simulation of the pin entering process. both of them only used the index finger of their right hand to enter the pin. the two users (referred to as user a, user b later) were given all 20 pins and using this experimental setup we have collected 20 video footages for each user entering given pins. all those footages were labelled with the respective pin (referred to as ground truth) entered. b. pin guessing we used two methods to find out the possibility of revealing a pin by using the video footage of the pin entering process that we captured using the experimental setup where the pin pad or the fingertips are not visible through the video footage. 5) pin guessing method 01 – human observer in this phase, a lab study was conducted with twenty participants of a convenient sample of undergraduates from the university of colombo school of computing. here we have used randomly selected 10 video footages out of 40 recorded. in the first attempt given to the participant, all ten videos without the pin label were put on a laptop computer display, one at a time. the participant was instructed to observe the forearm movement of the user who is entering the pin and guess the pin at the end of each video. without providing any feedback on whether they have guessed the correct digit(s) or not, three such attempts were given to each participant to watch each video consecutively and to guess the digit(s) of the fig. 5 pin entry process 7 p. seneviratne#1, d. perera2, h. samarasekara3, c. keppitiyagama4, k. de soyza5, k. thilakarathna6, p. wijesekera7 july 2021 international journal on advances in ict for emerging regions pin. 
a structured document was given to each participant to record the digit(s)/pin they guessed at each attempt.

table iii
pin guessing phases
pin guessing phase | description
method 01 – human observer | pin pad prototype with the actual dimensions of the juste6021 atm pin pad; lab study on atm pin guessing with human observation.
method 02 – computer analysis | pin pad prototype with the actual dimensions of the juste6021 atm pin pad; an algorithmic approach for automated pin guessing.

6) pin guessing method 02 – computer analysis: in the automated analysis, we automated the pin guessing process by incorporating an algorithmic approach. this phase used the same video footage as pin guessing method 01 together with the rest of the video footage we collected using the experimental setup. the computer analysis approach for pin guessing contains 04 basic steps, as shown in fig. 6 (steps of computer analysis).

step 1: align frames: we rotated each frame in the video so that the rows of the pin pad align with the x direction and the columns of the pin pad align with the y direction of the opencv coordinate system. therefore, an increment in the x direction and an increment in the y direction of the forearm movement correspond to increments in the key values on the pin pad. ultimately, it is straightforward to map the direction of the forearm movements to the pin patterns.

step 2: marker-based forearm detection and tracking: the main objective of the research was to test whether we can identify pin patterns entered by a person using forearm movements. therefore, tracking of forearm movements was done by placing a marker that has a contrasting colour from the surroundings on the user's forearm during the pin entering process. an opencv python program was written to extract the center coordinates of the minimum enclosing circle of the placed marker. the x, y coordinates of the center of the minimum enclosing circle were recorded against the frame number to trace the forearm movement throughout the entire video footage. alongside, the actual keypress events, i.e. the ground truth, were detected using the led attached to the prototype of the pin pad. we used an additional opencv programme to identify the blinks of the led more accurately and to mark the corresponding video frame numbers as "actual keypress events". a researcher may extend this work to fine-tune the code to infer the pin without using a marker to track forearm movements. fig. 9 shows the x, y data points of the forearm in each frame; the actual keypress events identified using the attached led are also marked in the same graph.

step 3: keypress detection: in this paper, we propose a novel method to detect keypress events considering the gradient of the forearm movement during the pin entry process. here, we have applied a heuristic that, when the atm user is pressing a key, the movement of the forearm (in the aligned frames) in the x direction and the y direction is very small compared to the movement that happens while travelling between keys, or no movement happens at all. therefore, the gradient of the forearm movement is close to zero during a keypress event. the net gradient of the forearm was calculated from the gradient of the forearm movement in the x direction (dx) and the gradient of the forearm movement in the y direction (dy) using equation (1):

$\text{net gradient} = \sqrt{dx^{2} + dy^{2}}$  (1)

according to the heuristic we applied, the keypress events fall on the local valleys of the gradient graph, where the gradient is near zero.
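steps 2 and 3 above, together with the clustering into per-digit keypress positions described next, can be sketched roughly as follows. this is a minimal illustration and not the authors' code: the hsv colour range for the marker, the valley-detection window and the use of k-means to group valley frames are assumptions made for the sketch, and the frame-alignment rotation of step 1 is omitted.

```python
# sketch of marker tracking (step 2) and gradient-based keypress detection
# (step 3); assumes the opencv 4.x api and a marker colour given as an hsv range.
import cv2
import numpy as np
from scipy.signal import argrelextrema
from sklearn.cluster import KMeans


def track_marker(video_path, hsv_lo=(100, 120, 70), hsv_hi=(130, 255, 255)):
    """per-frame centre of the minimum enclosing circle of the marker."""
    xs, ys = [], []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, np.array(hsv_lo), np.array(hsv_hi))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            (x, y), _r = cv2.minEnclosingCircle(max(contours, key=cv2.contourArea))
            xs.append(x)
            ys.append(y)
    cap.release()
    return np.array(xs), np.array(ys)


def detect_keypresses(xs, ys, num_digits=4, order=5):
    """keypresses sit in local valleys of the net gradient, equation (1);
    valley frames are grouped into num_digits clusters and the cluster
    centroids are taken as the keypress positions."""
    dx, dy = np.gradient(xs), np.gradient(ys)
    net_grad = np.sqrt(dx ** 2 + dy ** 2)                       # equation (1)
    valleys = argrelextrema(net_grad, np.less_equal, order=order)[0]
    km = KMeans(n_clusters=num_digits, n_init=10, random_state=0)
    labels = km.fit_predict(valleys.reshape(-1, 1).astype(float))
    presses = []
    for c in np.argsort(km.cluster_centers_.ravel()):
        f = int(round(valleys[labels == c].mean()))
        presses.append((f, float(xs[f]), float(ys[f])))          # (frame, x, y)
    return presses
```

the returned (frame, x, y) triples correspond to the cluster-centroid keypress positions that feed the pin guessing algorithm described below.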
keypress events were identified by extracting these local valleys in the gradient graph. for a particular piece of footage, the related frame numbers were backtracked from the extracted local valleys. these frame numbers were clustered into 04 clusters, because there are 04 digits in the pin, and the x, y coordinates of the cluster centroids were taken as the position of the forearm during each keypress event. fig. 10 depicts the traced forearm movement for pin 6031 with the actual keypress events shown as dark blue dots. fig. 11 shows the gradient graph corresponding to the graph in fig. 10; in fig. 11, the actual keypress events are marked as orange dots. hence, it confirms that the heuristic is applicable for finding the keypress events using the gradient.
(figure captions: fig. 7 orientation of the video frame after alignment; fig. 8 aligned pin pad with the opencv coordinate system; fig. 9 x, y coordinates of the forearm with actual keypress events; fig. 10 traced forearm movement for pin 6031; fig. 11 gradient graph of pin 6031)

c. pin guessing algorithm
as our main focus in this study is on identifying and exhibiting the existence of threats exploiting the vss, and due to the time-consuming data collection process involved in a machine learning-based approach, we opted not to follow a machine learning-based approach for the study. instead, we came up with an algorithmic approach to guess the pin using the x, y coordinates obtained from step 3. those coordinates were used to calculate the input parameters to the pin extraction algorithm. first, the change matrix, i.e. the matrix containing the unit difference of rows and columns between adjacent keypresses, was calculated for user b.
1) calculating change matrix: consider a scenario where a user enters the pin 1789 and the following x, y coordinates were extracted for each key pressed (fig. 12 coordinates of pin 1789).
• 1 → (517, 513)
• 7 → (523, 560)
• 8 → (532, 567)
• 9 → (545, 567)
the changes were as follows.
• 1 → 7; 0 column change, +2 rows down
• 7 → 8; +1 column to the right, 0 row change
• 8 → 9; +1 column to the right, 0 row change
these changes were recorded as {0, +1, +1} for columns and {+2, 0, 0} for rows. this change matrix is then mapped to possible matching pins. to determine the column and row changes between two presses, the coordinates of adjacent presses were subtracted and the difference was calculated. as shown in table iv, a 0 change of columns or rows could still record a movement, i.e. a difference in the x and y coordinates, because of camera tilt and relative hand movement, even when the same button is pressed. in the 1st change, a -6 change was recorded on the x-axis while the ground truth for the change of columns remains 0. further, a difference of +47 was recorded on the y-axis in the 1st change. this leads to two main challenges:
• identifying actual movement in columns or rows (does the delta of the coordinates of two adjacent keypresses actually depict a change in rows and columns?);
• differentiating between a single unit of change and two or more units of change in columns and rows.
2) calculating threshold: to address the aforementioned challenges, we established an approach of defining two separate threshold values for columns (tx) and rows (ty). thus, only if the difference between two adjacent keypresses exceeds the tx or ty threshold value is it considered a possible change of columns or rows. a sketch of this change-matrix and threshold idea is given below.
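as referenced above, a rough sketch of the change-matrix computation and of the threshold calibration described in the next subsection follows. the key_pitch values used to turn a supra-threshold delta into one or more unit changes, and the candidate threshold range, are assumptions made for the sketch; the paper only specifies that a delta not exceeding tx or ty is treated as no change.

```python
# hypothetical sketch of the change matrix and threshold calibration; the
# key_pitch parameters and the candidate threshold range are assumptions,
# chosen so that the quoted pin-1789 example reproduces, not values from the paper.
def change_matrix(keypress_xy, tx, ty, key_pitch_x=13, key_pitch_y=25):
    cols, rows = [], []
    for (x1, y1), (x2, y2) in zip(keypress_xy, keypress_xy[1:]):
        dx, dy = x2 - x1, y2 - y1
        # a delta that does not exceed the threshold is treated as "same column/row"
        cols.append(0 if abs(dx) <= tx else round(dx / key_pitch_x))
        rows.append(0 if abs(dy) <= ty else round(dy / key_pitch_y))
    return cols, rows


def calibrate_thresholds(labelled_presses, candidates=range(1, 31)):
    """labelled_presses: list of (coords, true_cols, true_rows) built from the
    calibration user's footage and ground-truth pins (user a in the paper)."""
    best_score, best_tx, best_ty = -1, None, None
    for tx in candidates:
        for ty in candidates:
            score = 0
            for coords, true_cols, true_rows in labelled_presses:
                cols, rows = change_matrix(coords, tx, ty)
                score += sum(c == t for c, t in zip(cols, true_cols))
                score += sum(r == t for r, t in zip(rows, true_rows))
            if score > best_score:
                best_score, best_tx, best_ty = score, tx, ty
    return best_tx, best_ty


# example with the pin 1789 coordinates quoted above
coords_1789 = [(517, 513), (523, 560), (532, 567), (545, 567)]
print(change_matrix(coords_1789, tx=7, ty=11))  # -> ([0, 1, 1], [2, 0, 0])
```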
an algorithm was developed to determine the most suitable tx and ty using the keypress coordinates of user a. each pin was labelled with its actual value and with the keypress coordinates inferred from step 3: keypress detection. each pin was then iterated over to find the coordinate difference between two adjacent keypresses, and the resulting row and/or column change was compared with the ground truth. if it results in the same value, the true counter for that specific threshold was incremented. the threshold values which had the highest true count were then determined as the best matching values for tx and ty.

table iv
column & row changes of adjacent keypresses derived from the coordinates of the user
change | xt+1 − xt (change of x) | actual change in columns | yt+1 − yt (change of y) | actual change of rows
1st change | 517-523 = -6 | 0 | 560-513 = +47 | 2
2nd change | 532-517 = +15 | +1 | 567-560 = +7 | 0
3rd change | 545-532 = +13 | +1 | 567-567 = 0 | 0

with the data set of user a, which was used to determine the best matching threshold values, the results were:
• for columns (change of x-axis) → tx = 7 or 8
• for rows (change of y-axis) → ty = 11
using the threshold values (tx and ty) derived from the data set of user a, the change matrix for the column and row changes between the keypresses was calculated for the data set of user b. that change matrix was passed to the algorithm and the most probable pin was matched accordingly. this process was automated for the data set of user b for a total of 20 pins.

d. limitations
we used a marker-based forearm tracking method mainly for two reasons: 1) the implementation of more accurate forearm tracking methods was not considered, and 2) due to the video quality and the camera tilt, using bounding-box methods focused on the forearm was not successful. the gradient-based keypress detection method is only applicable when the hand is not resting on the pin pad during the keypresses and the hand is moving freely between keypresses.

vi. results
a. pin guessing method 01 – human observer-based
the responses provided by ten participants during the lab study were analyzed to determine the accuracy against the actual pin in each sample video of the pin entering process. the responses for all three attempts were considered to discover the number of one, two, three and four (exact pin) digits correctly identified. the following equation (2) was applied for the calculation of the accuracy of the lab study responses:

$A(d,n) = \frac{\sum_{i=1}^{n} C(d,i)}{S \times V}$  (2)

where a(d,n) indicates the accuracy of correctly guessing at least d digits within n attempt(s), c(d,i) is the number of correctly guessed instances with at least d digits in the i-th attempt, and s and v represent the number of subjects and the number of sample videos respectively. as shown in fig. 13, the analysis of the responses of the lab study indicates that there is 6% accuracy in guessing all four digits of the pin by all ten volunteers within the first attempt, while the accuracy increased to 16% and 22.5% within the second and third attempts respectively. also, the accuracy of guessing at least one digit of the pin within the first attempt is 76%; this increased to 83% and 85.5% in the second and third attempts. this emphasizes that the accuracy of guessing four digits (the exact pin), three digits, two digits and one digit of the actual pin improves with the increase in the number of attempts.

b. pin guessing method 02 – computer analysis
out of the 20 pins, the automated pin guessing derived the correct pin in 10 instances.
this result (fig. 14, results of the computer analysis approach) shows that 50% accuracy was obtained in guessing the entire pin, while the accuracy increased to 80% for at least one digit (fig. 13, results of the human observer-based approach). the human observer-based approach indicates that the sight of the forearm during the pin entry process increases the chances of guessing the pin correctly compared to a random guess. a random guess has an accuracy of 0.01% (1/10000 = 0.01%), while the human observer-based approach produces 6% accuracy in one attempt. this accuracy improved as the number of attempts increased, and it increased further with the computer analysis. the human observers considered in pin guessing method 01 are not experts; hence, a well-trained human attacker might succeed in guessing the pin with much higher accuracy.

vii. proposed threat model
in this study, we identified that the ad-hoc installation of surveillance cameras at atm kiosks can result in revealing the atm pin itself. it was shown that the pin can be inferred even in situations where both the pin pad and the fingertips are not visible in the surveillance camera footage during the pin entering process. however, the threat model of the existing atm system (section iv. c, threat model for the existing atm system) does not consider the implications of possessing surveillance cameras at atm kiosks and the potential threat to atm pin security. threat modelling requires a holistic approach to accurately identify threats and vulnerabilities. therefore, the side-channel vulnerability of revealing the pin through surveillance video footage can only be identified by including the surveillance camera as an actor in the threat model; only then can the potential threats to atm pin security be identified. even though surveillance cameras are installed as a physical control at atm kiosks to enhance security, it is important to rigorously consider these consequences and to have a holistic approach when developing the threat model. therefore, it is necessary to consider surveillance cameras when defining the threat boundary of the atm system's threat model. fig. 15 shows an improved version of the threat model that considers this. a holistic approach would assist banking systems to comply with the basic principles of information security as well as physical security when deploying atm systems.

viii. conclusion
the pin is one factor of the two-factor authentication system used in atm transactions. banks invest heavily to ensure that a pin is generated inside an hsm and revealed only to the customer. this indicates that banks operate under the assumption that the pin is known only to the customer. however, surveillance cameras installed inside atm kiosks to improve physical security open up a side-channel that can potentially reveal the pin to third parties. we demonstrated that it is sufficient to capture the forearm movement to infer the pin with a significant level of accuracy, and that it is also possible to automate this inference process.
if we consider an atm transaction such as cardless cash, which allows the user to withdraw a limited sum of money without producing the debit card, the pre-assigned pin is the sole authentication factor. in such situations, the impact is very high on the security of the banking information system if their vss possesses side-channel vulnerabilities towards atm pin security. we have further analysed the pins guessed by human observers, even though those guessed pins did not match with the actual pin. when analysing the responses, we have identified that the participants of the lab study have been guessed pins that have similar trajectory movement to the actual pin of the related sample video. we have conscientiously considered the angle of the movement from fig. 15 proposed threat model 11 p. seneviratne#1, d. perera2, h. samarasekara3, c. keppitiyagama4, k. de soyza5, k. thilakarathna6, p. wijesekera7 july 2021 international journal on advances in ict for emerging regions one digit to another digit when determining the trajectory of the actual pin to the guessed pin. for example, when considering the actual pin like 5792, the shift from digit 9 to digit 2 has a left-angled movement as shown in fig. 16 (a). therefore, when considering a trajectory of a guessed pin, these facts are analysed. hence, a guessed pin like 5731 was regarded as a similar trajectory (fig. 16 (b)) and a guessed pin was like 5793 (fig. 16 (c)) as dissimilar as it has a straight angle movement from 9 to 3. accordingly, for each pin guessed by the participants in the survey for all three attempts, the corresponding trajectory was plotted to identify pin patterns. the outcome of this further analysis shows that human observers can identify the trajectory of the pin with 44.5% accuracy. this implies that even the sight of forearm movements can make favourable situation to guess the pin rather than a random guess. in this paper, we have taken the pin entry process at atm kiosks as a case study to exhibit the possible threats towards information security due to the implementation of vss. our algorithmic approach is not the optimal way to infer the pin using forearm movements, but we present the possibility of pin guessing simply by tracking the forearm movements. as a solution to this unforeseen threat toward information systems security, we suggest that proper guidelines and standards on the placement of vss can mitigate this vulnerability to a certain extent. however, to date, there are no such guidelines, standards, or regulations to govern the surveillance camera placement in sri lanka. as conscientious researchers, we took necessary measures to convey the problems disclosed through the research to banks and banking authorities prior to publishing them. ix. future work our work was carried out to expose the vulnerability of having vss in the environments where security-sensitive data is being used. this case study is an example of one such scenario. however, there are other vss in banking systems that we did not consider for this study. we have observed that banks also use vss inside their work environment where employees log in to their user accounts in the core banking system. the passwords and usernames might be visible through vss or it might be possible to infer those data by analysing the vss footage as well. 
we plan to extend this study to identify the impact of having vss inside the bank and as well as other work environments where securitysensitive data and/or the data entry process is revealed through the vss footage. our algorithm brings to light that there is at least one possible method to infer the pin, even though the pin pad and fingertips are not visible through vss footage. yet, this is not the ideal algorithm to guess the pin by tracking forearm movements. we intend to improve this work with the use of alternative methods for forearm detection and tracking along with a better algorithm to improve the pin guessing accuracy. we followed an experimental study with a limited amount of data. therefore, we did not consider applying machine learning techniques for computer analysis. we plan to collect a dataset with a large number of pins collected for different users to take a machine learning approach to guess the pin. the purpose of this proposed extension to work is to develop a pin-guessing algorithm that auditors can use to evaluate the security of a vss. references [1] sikandar, t., ghazali, k. and rabbi, m., 2018. atm crime detection using image processing integrated video surveillance: a systematic review. multimedia systems, 25(3), pp.229-251 [2] r. mandal and n. choudhury, "automatic video surveillance for theft detection in atm machines: an enhanced approach," 2016 3rd international conference on computing for sustainable global development (indiacom), new delhi, 2016, pp. 2821-2826. [3] supervision department central bank of sri lanka, "directions, determinations, and circulars issued to licensed commercial banks", sri lanka , 2021 [online]. available: https://www.cbsl.gov.lk/sites/default/files/cbslweb_documents/laws/ cdg/bsd_lcb_up_to_30_nov_2013_compressed_0.pdf. [accessed: jan-272021]. [4] standard: “pci pin transaction security point of interaction security requirements (pci pts poi),” pci security standards council, january 2013. [5] d. balzarotti, m. cova, and g. vigna, “clearshot: eavesdropping on keyboard input from video,” in 2008 ieee symposium on security and privacy (sp 2008). ieee, 2008, pp. 170–183. [6] y. xu, j. heinly, a. m. white, f. monrose, and j.-m. frahm, “seeing double: reconstructing obscured typed input from repeated compromising reflections," in proceedings of the 2013 acm sigsac conference on computer & communications security, pp. 1063-1074, acm, 2013. [7] payments and settlements department central bank of sri lanka, "payment bulletin third quarter 2019", sri lanka , 2021. [online]. available: https://www.cbsl.gov.lk/sites/default/files/payments_bulletin_3q201 9.pdf. [accessed: jan-272021]. [8] lmd.lk. 2020. national ratings by fitch ratings (lanka) ltd. as at 3 february 2020. [online] available at: . [9] a. costin, “security of cctv and video surveillance systems: threats, vulnerabilities, attacks, and mitigations," in proceedings of the 6th fig. 16 determining trajectories don’t forget to include that camera in the threat model: vulnerability of atm systems due to surveillance cameras 12 july 2021 international journal on advances in ict for emerging regions international workshop on trustworthy embedded devices, pp. 45-54, acm, 2016. [10] m. coole, a. woodward, and c. valli, “understanding the vulnerabilities in wi-fi and the impact on its use in cctv systems" 5th australian security and intelligence conference, western australia, december 2012. [11] k. mowery, s. meiklejohn, and s. 
savage, “heat of the moment: characterizing the efficacy of thermal camera-based attacks," in proceedings of the 5th usenix conference on offensive technologies, pp. 6-6, usenix association, 2011. [12] f. maggi, a. volpatto, s. gasparini, g. boracchi, and s. zanero, “a fast eavesdropping attack against touchscreens,” in 2011 7th international conference on information assurance and security (ias). ieee, 2011, pp. 320–325. [13] q. yue, z. ling, x. fu, b. liu, w. yu, and w. zhao, “my google glass sees your passwords,” proceedings of the black hat usa, 2014. [14] k. jin, s. fang, c. peng, z. teng, x. mao, l. zhang, and x. li, “vivisnoop: someone is snooping your typing without seeing it!,” in2017 ieee conference on communications and network security (cns), pp. 1–9, ieee, 2017. [15] a. t.-y. chen, m. biglari-abhari, i. kevin, and k. wang, “context is king: privacy perceptions of camera-based surveillance,” in2018 15th ieee international conference on advanced video and signal based surveillance (avss), pp. 1–6, ieee, 2018. [16] d. shukla, r. kumar, a. serwadda, and v. v. phoha, “beware, your hands reveal your secrets!,” in proceedings of the 2014 acm sigsac conference on computer and communications security, pp. 904–917, acm, 2014. [17] "wincor atm rkl encrypting pin pad e6021 (j6, j6.1) pci5.0 approved", justtide, 2021. [online]. available: https://www.justtide.com/product/wincor-atm-rkl-encrypting-pinpad-e6021-j6-j61-pci50-approved.html. [accessed: 24mar2021]. international journal on advances in ict for emerging regions 2019 12 (1): september 2019 international journal on advances in ict for emerging regions classifying sentences in court case transcripts using discourse and argumentative properties g. rathnayake#1, t. rupasinghe #2, n. de silva #3 , m. warushavithana #4, v. gamage #5, m. perera #6, a.s. perera #7 abstract—information that is available in court case transcripts which describes the proceedings of previous legal cases are of significant importance to legal officials. therefore, automatic information extraction from court case transcripts can be considered as a task of huge importance when it comes to facilitating the processes related to the legal domain. a sentence can be considered as a fundamental textual unit of any document which is made up of text. therefore, analyzing the properties of sentences can be of immense value when it comes to information extraction from machine-readable text. this paper demonstrates how the properties of sentences can be used to extract valuable information from court case transcripts. as the first task, the sentence pairs were classified based on the relationship type which can be observed between the two sentences. there, we defined relationship types that can be observed between sentences in court case transcripts. a system combining a machine learning model and a rule-based approach was used to classify pairs of sentences according to the relationship type. the next classification task was performed based on whether a given sentence provides a legal argument or not. the results obtained through the proposed methodologies were evaluated using human judges. to the best of our knowledge, this is the first study where discourse relationships between sentences have been used to determine relationships among sentences in legal court case transcripts. similarly, this study provides novel and effective approaches to identify argumentative sentences in court case transcripts. 
keywords— discourse relations, natural language processing, machine learning, support vector machine i. introduction case law can be described as a part of common law, consisting of judgments given by higher (appellate) courts in interpreting the statutes (or the provisions of a constitution) applicable in cases brought before them [1]. in order to make use of the case law, lawyers and other legal officials have to manually go through related court cases to find relevant information. this task requires a significant amount of effort and time. therefore, automatic extraction of information from legal court case transcripts would generate numerous benefits to the people working in the legal domain. from this point onwards, we are referring to the court case in the process of extracting information from legal court cases, it is important to identify how arguments and facts are related to one another. the objective of this study is to automatically determine the relationships between sentences which can be found in documents related to previous court cases of the united states supreme court. transcripts of u.s. court cases were obtained from findlaw following a method similar to numerous other artificial intelligence applications in the legal domain [2]–[6]. when a sentence in a court case is considered, it may provide details on arguments or facts related to a particular legal situation. some sentences may elaborate on the details provided in the previous sentence. it is also possible that the following sentence may not have any relationship with the details in the previous sentence and may provide details about a completely new topic. another type of relationship is observed when a sentence provides contradictory details to the details provided in the previous sentence. determining these relationships among sentences is vital to identifying the information flow within a court case. to that end, it is important to consider the way in which clauses, phrases, and text are related to each other. it can be argued that identifying relationships between sentences would make the process of information extraction from court cases more systematic given that it will provide a better picture of the information flow of a particular court case. to achieve this objective, we used discourse relations-based approach to determine the relationships between sentences in legal documents. several theories related to discourse structures have been proposed in recent years. cross-document structure theory (cst) [7], penn discourse tree bank (pdtb) [8], rhetorical structure theory (rst) [9], [10] and discourse graph bank [11] can be considered as prominent discourse structures. the main difference that can be observed between each of these discourse structures is they have defined the relation types in a different manner. this is mainly due to the fact that different discourse structures are intended for different purposes. in this study, we have based the discourse structure on the discourse structure proposed by cst. a sentence in a court case transcript can contain different types of details such as descriptions of a scenario, legal arguments, legal facts or legal conditions. the main objective of identifying relationships between sentences is to determine which sentences are connected together within a single flow. if there is a weak or no relation between two sentences, it would probably infer that those two sentences provide details on different topics. consider the following sentence pair taken from lee v. 
united states [12] shown in example 1. it can be seen that sentence 1.2 elaborates further on the details provided by sentence 1.1 to give a more comprehensive idea on the topic which is discussed in sentence 1.1. these two sentences are connected to each other within the same flow of information. this can be considered as elaboration relationship, which is a relationtype described in cst. now, consider the following manuscript received on 25 feb. 2019. recommended by prof. g.k.a. dias on 13 june 2019. this paper is an extended version of the paper “identifying relationships among sentences in court case transcripts using discourse relations” presented at the icter 2018. g. rathnayake, t. rupasinghe, n. de silva, m. warushavithana, v. gamage and a.s. perera are from the department of computer science & engineering, university of moratuwa sri lanka. (gathika.14@cse.mrt.ac.lk, thejanrupasinghe.14@cse.mrt.ac.lk, nisansadds@cse.mrt.ac.lk, menuka.14@cse.mrt.ac.lk, viraj.14@cse.mrt.ac.lk, shehan@cse.mrt.ac.lk). m. perera is from university of london international programmes university of london . (madhaviperera58@gmail.com). classifying sentences in court case transcripts using discourse and argumentative properties 2 international journal on advances in ict for emerging regions september 2019 sentence pair which was also taken from lee v. united states [12]: in this example, it is evident that the two sentences have the follow up relationship as defined in cst. but still, these two sentences are connected together within the same information flow in a court case. also, there are situations where we can see sentences are showing characteristics which are common to multiple discourse relations. therefore, several discourse relations can be grouped together based on their properties to make the process of determining relationships between sentences in court case transcripts more systematic. the two sentences for example 3 below were also taken from lee v. united states [12]: the sentence 3.2 follows sentence 3.1. a significant connection between these two sentences cannot be observed. it can also be observed that sentence 3.2 starts a new flow by deviating from the topic discussed in sentence 3.1. these observations which were provided by analyzing court cases emphasize the importance of identifying relationships between sentences. in order to identify the relationships among sentences, we defined the relationship types which are important to be considered when it comes to information extraction from court cases. next, for each of the relationship type defined, we identified the relevant cst relations [7]. finally, we developed a system to predict the relationship between given two sentences of a court case transcript by combining a machine learning model and a rule-based component. identifying sentences which provide legal arguments can be considered as another vital task when it comes to legal information extraction based on properties related to sentences. identifying such arguments from previous court cases can hugely benefit legal officials when handling a new legal scenario. in order to have a clear picture of argumentative sentences, it is vital to understand the structure of a us court case transcript. to that end, the following major sections can be observed. 
1) summary of the case 2) opinion of the court 3) concurring opinions 4) dissenting opinions at the beginning of a court case transcript, a summary of the case which presents an overview of the case, main argument, and the decision of the court is presented. then the opinion of the court section brings out the decision of the majority of the judges with the facts and arguments supporting the particular decision. if there are concurring and dissenting opinions, they are presented after the opinion of the court. concurring opinion section is present in cases where there exist one or more judges who agree with the decision of the court but states different or additional reasons for the decision. dissenting opinion section is present in cases where there exist one or more judges who disagree with the opinion of the court and brings out reasons for the disagreement. the description in the court case transcript contains valuable statements presented in the court by the major parties involved in the legal scenario. some of these statements are in the form of legal arguments. other statements provide background information which can be considered as mere facts, which are mainly intended to support a legal argument. such facts can be considered as non-arguments. furthermore, the decisions of the court can also be considered as non-arguments. the following example contains statements in taken from lee v. united states [12] can be used to properly understand the difference between argumentative sentences and non-argumentative sentences. example 4: ● argument: lee contends that he can make this showing because he never would have accepted a guilty plea had he known the result would be deportation. ● fact: petitioner jae lee moved to the united states from south korea with his parents when he was 13. ● court’s decision: the district court, however, denied relief, and the sixth circuit affirmed therefore, identifying argumentative sentences from nonargumentative sentences can be considered as a task of significant importance. in this study, a rule-based approach based on linguistic features was used to determine whether a given sentence provides a legal argument or not. section ii provides an overview of the related work done on identifying relationships among sentences and legal information extraction. section iii describes the methodology which was followed when implementing the proposed systems. section iv describes the approaches we took to evaluate the proposed methodologies. the results obtained by evaluating the system are analyzed in section iv. finally, we conclude our discussion in section v. example 1: ● sentence 1.1: the government makes two errors in urging the adoption of a per se rule that a defendant with no viable defense cannot show prejudice from the denial of his right to trial. ● sentence 1.2: first, it forgets that categorical rules are ill suited to an inquiry that demands a “case-by-case examination” of the “totality of the evidence”. example 2: ● sentence 2.1: courts should not upset a plea solely because of post hoc assertions from a defendant about how he would have pleaded but for his attorney’s deficiencies. ● sentence 2.2: rather, they should look to contemporaneous evidence to substantiate a defendant’s expressed preferences. example 3: ● sentence 3.1: the question is whether lee can show he was prejudiced by that erroneous advice. 
● sentence 3.2: a claim of ineffective assistance of counsel will often involve a claim of attorney error “during the course of a legal proceeding”–for example, that counsel failed to raise an objection at trial or to present an argument on appeal. 3 g. rathnayake#1, t. rupasinghe#2, n. de silva#3 , m. warushavithana #4, v. gamage #5, m. perera #6, a.s. perera #7 september 2019 international journal on advances in ict for emerging regions ii. background information extraction from machine-readable text can be considered as an integral aspect when it comes to applying artificial intelligence and computer science to various domains. the processes related to information extraction creates new challenges each time they are being applied to a new domain, due to the domain-specific nature of the text and documents. the legal domain can be considered as such a challenging domain when it comes to natural language processing, mainly due to the nature of legal documents, which employ a vocabulary of mixed origin ranging from latin to english [5]. this challenging nature has stimulated the emergence of legal domain specific works related to different areas such as information extraction [3], information organization [2], [4] and sentiment analysis [13]. as a major task in this study, we attempt to identify the relationships among sentences in court case transcripts. understanding how information is related to each other in machine-readable texts has always been a challenge when it comes to natural language processing. determining the way in which two textual units are connected to each other is helpful in different applications such as text classification, text summarization, understanding the context, evaluating answers provided for a question. analyzing discourse relationships or rhetorical relationships between sentences can be considered as an effective approach to understanding the way how two textual units are connected with each other. discourse relations have been applied in different application domains related to nlp. [14] describes cst [7] based text summarization approach which involves mechanisms such as identifying and removing redundancy in a text by analyzing discourse relations among sentences. [15] compares and evaluates different methods of text summarizations which are based on rst [10]. in another study [16], text summarization has been carried out by ranking sentences based on the number of discourse relations existing between sentences. [17]–[19] are some other studies where discourse analysis has been used for text summarization. these studies related to text summarization suggest that discourse relationships are useful when it comes identifying information that discusses on same topic or entity and also to capture information redundancy. analysis of discourse relations has also been used for question answering systems [20], [21] and for natural language generation [22]. in the study [23], discourse relations existing between sentences are used to generate clusters of similar sentences from document sets. this study shows that a pair of sentences can show properties of multiple relation types which are defined in cst [7]. in order to facilitate text clustering process, discourse relations have been redefined in this study by categorizing overlapping or closely related cst relations together. in [24], the discourse relationships which are defined in [23] have been used for text summarization based on text clustering. 
the studies [23], [24] emphasize how discourse relationships can be defined according to the purpose and objective of the study in order to enhance effectiveness. when it comes to applying discourse relations into the legal domain, [25] discusses the potential of discourse analysis for extracting information from legal texts. [26] describes a classifier which determines the rhetorical status of a sentence from a corpus of legal judgments. in this study, the rhetorical annotation scheme is defined for legal judgments. the study [27] provides details on the summarization of legal texts using rhetorical annotation schemes. the studies [26], [27] focus mainly on the rhetorical status in a sentence, but not on the relationships between sentences. an approach which can be used to detect the arguments in legal text using lexical, syntactic, semantic and discourse properties of the text is described in [28]. in contrast to these studies, our study is intended to identify relationships among sentences in court case transcripts by analyzing discourse relationships between sentences. identifying relationships among sentences will be useful in the task of determining the flow of information within a court case. extracting argumentative sentences from court case transcripts is another significant task when it comes to information extraction in the legal documents. various researches have been carried out on automatic extraction of arguments from legal texts. the study [29] by wyner, et al. brings out extensive background research on the literature of argumentation and argument extraction with an analysis of various argument corpora. araucaria [30], [31] is a database of arguments from various sources and a tool for diagramming and representing arguments. in another study [30], reed and rowe, introducing araucaria tool, point out that arguments can be graphically represented in a tree, where premises are being branched off of conclusions. arguments in araucariadb are manually annotated and marked up in an xml-based format, aml (argument markup language). the study by wyner et al [29] also presents out how legal arguments can be extracted, using a context-free grammar. it describes legal argument construction patterns, to identify premises and conclusions, which they came up with, by analyzing legal cases from echr (european court of human rights). studies [28], [32] on legal argument automatic detection is also done on echr cases. moens, et al. describe argument detection as a sentence classification problem between arguments and non-arguments [28]. there a classifier is trained on a set of manually annotated arguments, considering sentences in isolation. they have evaluated different feature sets involving lexical, syntactic, semantic and discourse properties of the texts. in the study [32] mochales and moens, points out that arguments are always formed by premises and conclusions. so they have determined argument extraction as a sentence classification problem among premises, conclusions, and non-arguments. furthermore, they have improved the feature set used in [28] by including features that refer to content in previous sentences. all these researches, done on argument extraction, have used echr cases as their corpus. to the best of our knowledge, there has been no research carried out about argument extraction from us court case transcripts. argument patterns identified in the study [29] are very rigid and they have specifically been identified for echr cases. 
the reporting structures in us court case transcripts are significantly different from echr case reports. therefore, the rules that are described in the study [29] are not directly applicable for extracting argumentative sentences from us court cases. also, to the best of our knowledge, there is no existing annotated corpus which contains argumentative sentences extracted from us court cases. the consequence is machine learning approaches described in previous studies on argument identification in echr cases [28], [32] cannot be used. therefore, it is needed to come up with novel ways to classifying sentences in court case transcripts using discourse and argumentative properties 4 international journal on advances in ict for emerging regions september 2019 identify arguments and non-arguments in us court case transcripts. iii. methodology a. defining discourse relationships observed in court cases five major relationship types were defined by examining the nature of relationships that can be observed between sentences in court case transcripts. ● elaboration one sentence adds more details to the information provided in the preceding sentence or one sentence develops further on the topic discussed in the previous sentence. ● redundancy two sentences provide the same information without any difference or additional information. ● citation a sentence provides references relevant to the details provided in the previous sentence. ● shift in view two sentences are providing conflicting information or different opinions on the same topic or entity. ● no relation no relationship can be observed between the two sentences. one sentence discusses a topic which is different from the topic discussed in the other sentence. after defining these relationships, we adopted the rhetorical relations provided by cst [7] to align with our definitions as shown in table i. table i adopting cst relations definition cst relationships elaboration paraphrase, modality, subsumption, elaboration, indirect speech, follow-up, overlap, fulfillment, description, historical background, reader profile, attribution redundancy identity citation citation shift in view change of perspective, contradiction no relation it is very difficult to observe the same sentence appearing more than once within nearby sentences in court case transcripts. however, we have included redundancy as a relationship type in order to identify redundant information in a case where the two sentences in a sentence pair are the same. b. expanding the dataset a machine learning model was developed in order to determine the relationship between two sentences in court cases. we used the publicly available dataset of cst bank [33] to learn the model. the dataset obtained from cst bank contains sentence pairs which are annotated according to the cst relation types. since we have a labeled dataset [33], we performed supervised learning to develop the machine learning model. support vector machine (svm) was used as svms have shown promising results in previous studies where discourse relations have been used to identify relationships between sentences [23], [24]. table ii provides details on the number of sentence pairs in the data set for each relationship type. 
table ii number of sentence pairs for each relationship type cst relationship number of sentence pairs identity 99 equivalence 101 subsumption 590 contradiction 48 historical background 245 modality 17 attribution 134 summary 11 follow-up 159 indirect speech 4 elaboration 305 fulfillment 10 description 244 overlap (partial equivalence) 429 by examining the cst relationship types available in the dataset as shown in table ii, it can be observed that a relationship type which suggests that there is no relationship between sentences cannot be found. but no relation is a fundamental relation type that can be observed between two sentences in court case transcripts. therefore, we expanded the data set by manually annotating 50 pairs of sentences where a relationship between two sentences cannot be found. this new class was named as no relation. the 50 sentence pairs which were annotated were obtained from previous court case transcripts. a sentence pair is made up of a source sentence and a target sentence. the source sentence is compared with the target sentence when determining the relationship that is present in the sentence pair. for example, if the source sentence contains all the information in target sentence with some additional information, the sentence pair is said to have the subsumption relationship. similarly, if the source sentence elaborates the target sentence, the sentence pair is said to have the elaboration relationship. c. determining the relationship between sentences using svm model in order to train the svm model with annotated data, features based on the properties that can be observed in a pair of sentences were defined. before calculating the features related to words, we removed stop words in sentences to eliminate the effect of less significant words. also, coreferencing was performed on a given pair of sentences using stanford corenlp corefannotator (“coref”) [34] in order to make feature calculations more effective. the two sentences for example 4 are also taken from lee v. united states [12], example 4: ● sentence 4.1 (target): petitioner jae lee moved to the united states from south korea with his parents when he was 13. ● sentence 4.2 (source): in the 35 years he has spent in this country, he has never returned to south korea, nor has he become a u. s. citizen, living instead as a lawful permanent resident. 5 g. rathnayake#1, t. rupasinghe#2, n. de silva#3 , m. warushavithana #4, v. gamage #5, m. perera #6, a.s. perera #7 september 2019 international journal on advances in ict for emerging regions here the “petitioner jae lee” in the target sentence, is referred using the pronouns “he” and “his” in both sentences. as all these words are referring to the same person, the system replaces “he” and “his” with their representative mention “petitioner jae lee”. then the sentences in example 4 are changed as shown below. example 4 (updated): ● sentence 4.1 (target): petitioner jae lee moved to the united states from south korea with petitioner jae lee parents when petitioner jae lee was 13. ● sentence 4.2 (source): in the 35 years petitioner jae lee has spent in this country, petitioner jae lee has never returned to south korea, nor has petitioner jae lee become a u. s. citizen, living instead as a lawful permanent resident. by resolving co-references calculating noun similarity, verb similarity, adjective similarity, subject overlap ratio, object overlap ratio, subject noun overlap ratio and semantic similarity features between two sentences are made more effective. 
all the features are calculated and normalized such that their values fall into [0,1] range. we have defined 9 feature categories based on the properties that can be observed in a pair of sentences. following 5 feature categories were adopted mainly from [23] though we have done changes in implementation such as use of co-referencing. 1) cosine similarities following cosine similarity values are calculated for a given sentence pair, ● word similarity ● noun similarity ● verb similarity ● adjective similarity following equation is used to calculate the abovementioned cosine similarities. 𝐶𝑜𝑠𝑖𝑛𝑒 𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡 = ∑𝑛𝑖=1 𝐹𝑉𝑆,𝑖 ∗ 𝐹𝑉𝑇,𝑖 √∑𝑛 𝑖=1 (𝐹𝑉𝑆,𝑖 ) 2 ∗ √∑𝑛𝑖=1 (𝐹𝑉𝑇,𝑖 ) 2 here fvs,i and fvt,i represents frequency vectors of source sentence and target sentence respectively. stanford corenlp pos tagger (“pos”) [30] is used to identify nouns, verbs and adjectives in sentences. in calculating the noun similarity feature, singular and plural nouns, proper nouns, personal pronouns and possessive pronouns are considered. both superlative and comparative adjectives are considered when calculating the adjective similarity. the system ignores verbs that are lemmatized into “be”, “do”, “has” verbs when calculating verb similarity feature as the priority should be given to effective verbs in sentences. 2) word overlap ratios two ratios are considered based on the word overlapping property. one ratio is measured in relation to the target sentence. other ratio is measured in relation to the source sentence. these ratios provide an indication of the equivalence of two sentences. for example, when it comes to a relationship like subsumption, source sentence usually contains all the words in the target sentence. this property will be also useful in determining relations such as identity, overlap (partial equivalence) which are based on the equivalence of two sentences. 𝑊𝑂𝑅(𝑇) = 𝐶𝑜𝑚𝑚(𝑇, 𝑆) 𝐷𝑖𝑠𝑡𝑖𝑛𝑐𝑡(𝑇) 𝑊𝑂𝑅(𝑆) = 𝐶𝑜𝑚𝑚(𝑇, 𝑆) 𝐷𝑖𝑠𝑡𝑖𝑛𝑐𝑡(𝑆) wor(s), wor(t) represents the word overlap ratios measured in relation to source and target sentences respectively. distinct(s), distinct(t) represents the number of distinct words in source sentence and target sentence respectively. the number of distinct common words between two sentences are shown by comm(t, s). 3) grammatical relationship overlap ratios three ratios which represent the grammatical relationship between target and source sentences are considered. ● subject overlap ratio 𝑆𝑢𝑏𝑗𝑂𝑣𝑒𝑟𝑙𝑎𝑝 = 𝐶𝑜𝑚𝑚(𝑆𝑢𝑏𝑗(𝑆), 𝑆𝑢𝑏𝑗(𝑇)) 𝑆𝑢𝑏𝑗(𝑆) ● object overlap ratio 𝑂𝑏𝑗𝑂𝑣𝑒𝑟𝑙𝑎𝑝 = 𝐶𝑜𝑚𝑚(𝑂𝑏𝑗(𝑆), 𝑂𝑏𝑗(𝑇)) 𝑂𝑏𝑗(𝑆) ● subject noun overlap ratio 𝑆𝑢𝑏𝑗𝑁𝑜𝑢𝑛𝑂𝑣𝑒𝑟𝑙𝑎𝑝 = 𝐶𝑜𝑚𝑚(𝑆𝑢𝑏𝑗(𝑆), 𝑁𝑜𝑢𝑛(𝑇)) 𝑆𝑢𝑏𝑗(𝑆) all these features are calculated with respect to the source sentence. subj, obj, noun represents the number of subjects, objects and nouns respectively. comm gives the number of common elements. stanford corenlp dependencyparseannotator (“depparse”) [36] is used here to identify subjects and objects. all the subject types including nominal subject, clausal subject, their passive forms and controlling subjects are taken into account in calculating the number of subjects. direct and indirect objects are considered when calculating the number of objects. all subject and object types are referred from stanford typed dependencies manual [37]. 4) longest common substring ratio longest common substring is the maximum length word sequence which is common to both sentences. 
4) longest common substring ratio
the longest common substring is the maximum-length word sequence which is common to both sentences. when the number of characters in the longest common substring is taken as $n(LCS)$ and the number of characters in the source sentence is taken as $n(S)$, the longest common substring ratio (lcsr) can be calculated as

$LCSR = \frac{n(LCS)}{n(S)}$

this value indicates, as a fraction of the source sentence, how much of it is shared with the target sentence. thus, it is useful especially in determining discourse relations such as overlap, attribution and paraphrase.
5) number of entities
the ratio between the numbers of named entities can be used as a measurement of the relationship between two sentences.

$NERatio = \frac{NE(S)}{\max(NE(S), NE(T))}$

$NE$ represents the number of named entities in a given sentence. the stanford corenlp named entity recognizer (ner) [33] was used to identify named entities belonging to 7 types: person, organization, location, money, percent, date and time.
in addition to the features mentioned above, the following features have been introduced to the system.
1) semantic similarity between sentences
this feature provides the semantic closeness between two sentences. a method described in [39] is adopted when calculating the semantic similarity between two sentences, and the semantic similarity score for a pair of words is calculated using wordnet::similarity [40].

$Score = \mathrm{Average}\!\left(\sum_{i=1}^{n} NounScore_{i} + \sum_{i=1}^{n} VerbScore_{i}\right)$

2) transition words and phrases
the availability of a transition word or a transition phrase at the start of a sentence indicates a high probability of a strong relationship with the previous sentence. for example, sentences beginning with transition words like "and" and "thus" usually elaborate on the previous sentence. phrases like "to make that" and "in addition" at the beginning of a sentence also imply that the sentence is elaborating on the details provided in the previous sentence. considering these linguistic properties, two boolean features were defined.
i. elaboration transition: if the first word of the source sentence is a transition word which implies elaboration, such as "and", "thus" or "therefore", or if a transition phrase is found within the first six words of the source sentence, this feature outputs 1. if both of these conditions are false, the feature returns 0. two lists containing 59 transition words and 91 transition phrases which imply elaboration were maintained. though it is difficult to include every transition phrase in the english language that implies an elaboration relationship, the presence of such a phrase at the beginning of a sentence makes it very likely that the sentence elaborates on the previous one.
ii. follow-up transition: if the source sentence begins with words like "however" or "although", or with phrases like "in contrast" or "on the contrary", which imply that the source sentence is following up the target sentence, this feature outputs 1; otherwise, it outputs 0.
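the two transition-based boolean features can be sketched in python as follows; the word and phrase lists shown here are small illustrative subsets of the 59-word and 91-phrase lists described above, not the full lists used in the study.

ELABORATION_WORDS = {"and", "thus", "therefore"}
ELABORATION_PHRASES = ("in addition", "to make that")
FOLLOWUP_WORDS = {"however", "although"}
FOLLOWUP_PHRASES = ("in contrast", "on the contrary")

def elaboration_transition(source_sentence):
    # 1 if the first word is an elaboration transition word, or an elaboration
    # transition phrase occurs within the first six words; 0 otherwise
    tokens = source_sentence.lower().split()
    if not tokens:
        return 0
    window = " ".join(tokens[:6])
    return int(tokens[0] in ELABORATION_WORDS
               or any(p in window for p in ELABORATION_PHRASES))

def followup_transition(source_sentence):
    # 1 if the sentence starts with a follow-up transition word or phrase
    text = source_sentence.lower().strip()
    tokens = text.split()
    if not tokens:
        return 0
    return int(tokens[0] in FOLLOWUP_WORDS
               or any(text.startswith(p) for p in FOLLOWUP_PHRASES))

print(elaboration_transition("thus the court agreed with that argument"))  # 1
print(followup_transition("in contrast, the government disagreed"))        # 1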
3) length difference ratio
this feature considers the difference in length between the source sentence and the target sentence. when $length(S)$ and $length(T)$ represent the numbers of words in the source sentence and the target sentence respectively, the length difference ratio (ldr) is calculated as shown below.

$LDR = 0.5 + \frac{length(S) - length(T)}{2 \cdot \max(length(S), length(T))}$

in a relationship like subsumption, the length of the source sentence has to be greater than the length of the target sentence, while in an identity relationship both sentences are usually of the same length. these properties can be identified using this feature.
4) attribution
this feature checks whether one sentence describes a detail in another sentence in a more descriptive manner. within this feature, we check whether a word or phrase in one sentence is cited in the other sentence. this is also a boolean feature. the source sentence and target sentence for example 5 were obtained from turner v. united states [41]:
example 5:
● sentence 5.1 (target): such evidence is 'material' . . . when there is a reasonable probability that, had the evidence been disclosed, the result of the proceeding would have been different.
● sentence 5.2 (source): a 'reasonable probability' of a different result is one in which the suppressed evidence 'undermines confidence in the outcome of the trial.
it can be seen that the source sentence defines, or provides more details on, what is meant by "reasonable probability" in the target sentence. such properties can be identified using this feature.

d. determining explicit citation relationships in court case transcripts
in legal court case documents, several standard ways are used to point out the source from which a particular fact or condition was obtained. the target sentence and source sentence in example 6 are obtained from lee v. united states [12].
example 6:
● sentence 6.1 (target): the decision whether to plead guilty also involves assessing the respective consequences of a conviction after trial and by plea.
● sentence 6.2 (source): see ins v. st. cyr, 533 u. s. 289, 322-323 (2001).
the two sentences given in example 6 are adjacent to each other, and it can be clearly seen that the source sentence provides a citation for the target sentence. this is only one of the many ways of providing citations in court case transcripts. after observing the different ways of providing citations in court case transcripts, a rule-based mechanism to detect such citations was developed. if this rule-based system detects a citation relationship, the pair of sentences is assigned the citation relationship, and such a pair of sentences is not passed to the svm model for further processing. from this point onward, this system, which is intended to identify relationships among sentences, will be referred to as the sentence relationship identifier.
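the paper does not list the individual citation rules, so the following python sketch shows only one plausible, hypothetical rule: a regular expression that matches citations of the form "see <party> v. <party>, <volume> u. s. <page>", as in example 6.

import re

CITATION_RULE = re.compile(
    r"(see\s+)?[\w\s.']+\s+v\.\s+[\w\s.']+,\s*\d+\s*u\.?\s*s\.?\s*\d+",
    re.IGNORECASE)

def has_citation(sentence):
    # a pair is assigned the citation relationship when the source sentence
    # matches at least one rule of this kind
    return CITATION_RULE.search(sentence) is not None

print(has_citation("see ins v. st. cyr, 533 u. s. 289, 322-323 (2001)."))              # True
print(has_citation("the decision whether to plead guilty involves assessing risks."))  # False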
e. extracting argumentative sentences from court case transcripts
two major approaches were followed to extract arguments, with the consultation of a legal expert.
1) linguistically identifying arguments using verbs: at first, words such as argue, agree, conclude, rejected, contest, contend, consider, testify, concede, claim and affirm were considered to identify legal arguments. as examples, consider the following sentences taken from lee v. united states [12]:
(i) a claim of ineffective assistance of counsel will often involve a claim of attorney error "during the course of a legal proceeding"–for example, that counsel failed to raise an objection at trial or to present an argument on appeal.
(ii) lee contends that he can make this showing because he never would have accepted a guilty plea had he known the result would be deportation.
(iii) in post conviction proceedings, they argued that seven specific pieces of withheld evidence were both favorable to the defense and material to their guilt under brady v. maryland, 373 u. s. 83.
(iv) the d. c. superior court rejected petitioners' brady claims, finding that the withheld evidence was not material. the d. c. court of appeals affirmed.
here, sentences 1 and 4 cannot be considered as arguments: sentence 1 represents an opinion and sentence 4 brings out the decision of the court. by observing the selected sentences, we refined the word list and decided to consider only verbs to identify arguments. the lemmatized forms of the verbs in a sentence are extracted using the stanford pos tagger [35] and then compared with the predefined list of verbs to check whether the sentence brings out an argument (a short sketch of this filter is given at the end of this subsection).
2) citation-based argument extraction: in a legal case, there are statements with citations which link to previous cases in which the judgments have already been finalized. those statements come under the category of case law. if a statement has a citation, it means that the statement is taken from a previous case and that the same statement applies to the current legal case as well. therefore, lawyers can present the same argument in other legal cases to prove their point. consider the following examples, again taken from lee v. united states [12]:
• the decision whether to plead guilty also involves assessing the respective consequences of a conviction after trial and by plea. see ins v. st. cyr, 533 u. s. 289, 322-323.
• but in this case counsel's "deficient performance arguably led not to a judicial proceeding of disputed reliability, but rather to the forfeiture of a proceeding itself." flores-ortega, 528 u. s., at 483.
the first statement links to ins v. st. cyr, 533 u. s., which means that it is taken from the cited case, and the same statement can be presented as an argument in any other case if it is appropriate under the conditions of that legal case. we have taken a rule-based approach to identify these kinds of arguments.
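a minimal sketch of the verb-based filter from approach 1 is given below. it assumes the sentence has already been pos-tagged and lemmatized (for example with the stanford pos tagger mentioned above); the (token, pos, lemma) triple format and the reduced verb list are simplifying assumptions, not the exact list used in the study.

ARGUMENT_VERBS = {"argue", "contend", "claim", "testify", "concede", "contest"}

def is_argumentative(tagged_sentence):
    # flag the sentence if the lemma of any verb is in the predefined verb list
    return any(pos.startswith("VB") and lemma in ARGUMENT_VERBS
               for _token, pos, lemma in tagged_sentence)

sentence = [("lee", "NNP", "lee"), ("contends", "VBZ", "contend"),
            ("that", "IN", "that"), ("he", "PRP", "he"),
            ("can", "MD", "can"), ("make", "VB", "make"),
            ("this", "DT", "this"), ("showing", "NN", "showing")]
print(is_argumentative(sentence))   # True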
iv. results
a. results obtained from the sentence relationship identifier
in order to determine the effectiveness of our system, it is important to carry out evaluations using legal court case transcripts, as that is the domain in which this system is intended to be used. court case transcripts related to the united states supreme court were obtained from findlaw. the transcripts were then preprocessed in order to remove unnecessary data and text; the court case title and section titles are examples of details which were removed in preprocessing, as those details are irrelevant when determining relationships between sentences. the relationship types of the sentence pairs were assigned using the system. first, the pairs were checked for the citation relationship using the rule-based approach; the relationship types of the sentence pairs for which a citation relationship could not be detected using the rule-based approach were determined using the support vector machine model. the results obtained for the sentence pairs extracted from the court case transcripts were then stored in a database, and from those sentence pairs, 200 sentence pairs were selected to be annotated by human judges. before selecting the 200 sentence pairs, the sentence pairs were shuffled to eliminate the potential bias that could have existed due to a particular court case; shuffling helped to make sure that the sentence pairs to be annotated by human judges were related to different court case transcripts. the selected 200 pairs of sentences were then grouped into clusters of five sentence pairs, and each cluster was annotated by two human judges who were trained to identify the relationships between sentence pairs as defined in this study. as expected, the redundancy relationship could not be observed within the sentence pairs annotated by the human judges: among the 200 sentence pairs, our system did not predict the redundancy relationship for any pair, and similarly the human judges did not assign the redundancy relationship to any pair. the confusion matrix generated from the results obtained is given in table iii. the details provided in the matrix are based only on the sentence pairs for which the two human judges agreed on the same relationship type; the reasoning behind this approach is to eliminate sentence pairs where the relationship type is ambiguous. the same approach was used to obtain the results presented in table iv. in contrast, table v contains results obtained by considering sentence pairs where at least one of the two judges who annotated the pair agrees on a particular relationship type. the recall results given in table iv are particularly important, as all the sentence pairs in that result set are annotated with a relationship type agreed upon by both human judges. the precision results provided in table v indicate the probability of at least one human judge agreeing with the system's prediction for each relationship type. the evaluation results in table iv and table v show that the system works well when identifying the elaboration, no relation and citation relationship types, with f-measure values above 75% in all cases. the shift in view relationship type was not assigned by the system to any of the 200 sentence pairs considered in the evaluation. the human vs human correlation and the human vs system correlation in identifying these relationship types were also analyzed. first, we calculated these correlations without considering the relationship type, using the following approach. for a given sentence pair p, m(p) is the value assigned to the pair, and n is the number of sentence pairs.
1) human vs human correlation ( cor(h,h) ): when both human judges agree on a single relationship type for the pair p, we assign m(p) = 1; otherwise, we assign m(p) = 0.

$Cor(H,H) = \frac{\sum_{P=1}^{n} m(P)}{n}$

2) human vs system correlation ( cor(h,s) ): when both human judges agree with the relationship type predicted by the system for the sentence pair p, we assign m(p) = 1. if only one human judge agrees with the relationship type predicted by the system for p, we assign m(p) = 0.5. if both human judges disagree with the relationship type predicted by the system for p, we assign m(p) = 0.

$Cor(H,S) = \frac{\sum_{P=1}^{n} m(P)}{n}$
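a minimal python sketch of these two correlation measures is given below; each record in annotations is a hypothetical (judge_1, judge_2, system) triple holding the relationship type assigned to one sentence pair.

def human_human_correlation(annotations):
    # m(p) = 1 when both judges agree on a single relationship type, else 0
    return sum(1.0 if j1 == j2 else 0.0 for j1, j2, _ in annotations) / len(annotations)

def human_system_correlation(annotations):
    # m(p) = 1, 0.5 or 0 depending on how many judges agree with the system
    def m(j1, j2, system):
        agreements = (j1 == system) + (j2 == system)
        return {2: 1.0, 1: 0.5, 0: 0.0}[agreements]
    return sum(m(*record) for record in annotations) / len(annotations)

annotations = [("elaboration", "elaboration", "elaboration"),
               ("no relation", "elaboration", "elaboration"),
               ("citation", "citation", "no relation")]
print(human_human_correlation(annotations), human_system_correlation(annotations))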
table iii
confusion matrix (values in each row are percentages of the actual class; ∑ gives counts)

actual \ predicted   elaboration   no relation   citation   shift in view   ∑
elaboration          93.9          6.1           0.0        0.0             99
no relation          11.9          88.1          0.0        0.0             42
citation             0.0           4.8           95.2       0.0             21
shift in view        100           0.0           0.0        0.0             3
∑                    101           44            20         0               165

table iv
results comparison of pairs where both judges agree

discourse class   precision   recall   f-measure
elaboration       0.921       0.939    0.930
no relation       0.841       0.881    0.861
citation          1.000       0.952    0.975
shift in view     0

the following results were observed after calculating the correlations:
● the correlation between a human judge and another human judge = 0.805
● the correlation between a human judge and the system = 0.813

table v
results comparison of pairs where at least one judge agrees

discourse class   precision   recall   f-measure
elaboration       0.930       0.902    0.916
no relation       0.846       0.677    0.752
citation          1.000       0.910    0.953
shift in view     0

when analyzing these two correlations, it can be seen that our system performs with a capability which is close to human capability. the results obtained by calculating the human vs human and human vs system correlations for each relationship type are given in table vi. the following approach was used to calculate these two correlations for each relationship type. consider the relationship type r and let:
● s = the set containing all the sentence pairs which are predicted by the system as having the relationship type r,
● u = the set containing all the sentence pairs which were annotated by at least one human judge as having the relationship type r,
● v = the set containing all the sentence pairs which were annotated by both human judges as having the relationship type r.
corr(h,h) represents the human vs human correlation and corr(h,s) represents the human vs system correlation. for a given set a, n(a) indicates the number of elements in set a.

$Corr(H,H) = \frac{n(V)}{n(U)} \qquad Corr(H,S) = \frac{n(S \cap U)}{n(S \cup U)}$

the results obtained using this approach are provided in table vi.

table vi
correlations by type

discourse class   human-human   human-system   (human-system) / (human-human)
elaboration       0.750         0.843          1.124
no relation       0.646         0.603          0.933
citation          1.000         0.955          0.955
shift in view     0.188         0.000          0.000
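the per-type correlations reported in table vi can be computed from the three sets defined above, as in the following sketch; the pair identifiers used in the example are hypothetical.

def per_type_correlations(system_pairs, one_judge_pairs, both_judge_pairs):
    # s: pairs the system labelled with type r, u: pairs at least one judge
    # labelled with r, v: pairs both judges labelled with r
    s, u, v = set(system_pairs), set(one_judge_pairs), set(both_judge_pairs)
    corr_hh = len(v) / len(u) if u else 0.0
    corr_hs = len(s & u) / len(s | u) if (s | u) else 0.0
    return corr_hh, corr_hs

s = {1, 2, 3, 4}   # predicted by the system as type r
u = {1, 2, 3, 5}   # annotated with r by at least one judge
v = {1, 2}         # annotated with r by both judges
print(per_type_correlations(s, u, v))   # (0.5, 0.6)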
the results in table vi suggest that the system performs with a capability close to human capability when identifying relationships such as elaboration, no relation and citation. enhancing the system's ability to identify the shift in view relationship is one of the major future challenges. at the same time, the human vs human correlation for identifying the shift in view relationship type is only 0.188, which indicates that humans also find it ambiguous to identify shift in view relationships between sentences. either an elaboration or a shift in view relationship occurs when two sentences discuss the same topic or entity; shift in view occurs instead of elaboration when the two sentences provide different views or conflicting facts on the same topic or entity. the no relation relationship can be observed when two sentences are no longer discussing the same topic or entity; in other words, the no relation relationship suggests that there is a shift in the information flow. as shown in table iii, the sentence pairs with the shift in view relationship are always predicted by the system as having the elaboration relationship. from these results, it can be seen that in most cases the system is able to identify whether two sentences are discussing the same topic or not.

b. results obtained from argumentative sentence detection approaches
individual sentences were extracted from court cases to evaluate the proposed approaches for detecting argumentative sentences. the sentences detected as argumentative by each of the two approaches were then annotated by the human judges, and the precision of each argumentative sentence detection approach was calculated by comparison with the human annotations. table vii shows the results obtained.

table vii
results comparison of approaches used to detect argumentative sentences

approach                   no. of detected sentences   precision
argumentative verb-based   77                          64.93%
citation-based             93                          90.32%

as shown in table vii, the citation-based approach shows higher precision than the argumentative verb-based approach. however, both approaches work at a satisfactory level, with over 60% precision, suggesting the effectiveness of the proposed approaches for detecting legal arguments.

v. conclusions
the methods and experiments presented in this journal paper on legal information extraction based on sentence classification are extensions of our conference paper [42] on identifying relationships among sentences in court case transcripts using discourse relations. linguistic rule-based approaches that can be used to identify sentences which provide legal arguments are presented exclusively in this journal paper. demonstrating how sentence classification can facilitate the process of information extraction from legal court case transcripts can be considered the primary research contribution of this study. this study presents the way in which discourse relationships between sentences can be used to identify the relationships among sentences in united states court case transcripts. five discourse relationship types were defined in this study in order to automatically identify the flow of information within a court case transcript. the study also describes how a machine learning model and a rule-based system can be combined to enhance the accuracy of identifying relationships between sentences in court case transcripts, and features based on the properties that can be observed between sentences have been introduced to enhance the accuracy of the machine learning model. the study also proposes novel approaches that can be used to extract argumentative sentences from court case transcripts. the approaches to classifying argumentative and non-argumentative sentences have the potential to support automatic extraction of legal arguments from united states court case transcripts. the proposed system for identifying relationships among sentences can be successfully applied to identify sentences which develop the same discussion topic or entity. in addition, it is capable of identifying situations in court cases where the discussion topic changes. the system is highly successful in the identification of legal citations. the empirical results also demonstrate the effectiveness of the approaches used to identify sentences which provide legal arguments.
these outcomes demonstrate that the information extraction mechanisms proposed in this study have promising potential to be applied in tasks related to systematic information extraction from court case transcripts. one such task is the identification of supporting facts and citations related to a particular legal argument; another is the identification of changes in discussion topics within a court case. despite the usefulness and applicability of the proposed approaches, the outcomes of this study have also demonstrated that the proposed mechanisms are not sufficient for detecting situations where two sentences provide different opinions on the same discussion topic. enhancing this capability of the system can be considered the major future work.

references
[1] webfinance inc, "what is case law? definition and meaning businessdictionary.com," http://www.businessdictionary.com/definition/case-law.html, (accessed on 05/17/2018).
[2] v. jayawardana, d. lakmal, n. de silva, a. s. perera, k. sugathadasa, b. ayesha, and m. perera, "word vector embeddings and domain specific semantic based semi-supervised ontology instance population," international journal on advances in ict for emerging regions, vol. 10, no. 1, p. 1, 2017.
[3] k. sugathadasa, b. ayesha, n. de silva, a. s. perera, v. jayawardana, d. lakmal, and m. perera, "synergistic union of word2vec and lexicon for domain specific semantic similarity," in industrial and information systems (iciis), 2017 ieee international conference on. ieee, 2017, pp. 1–6.
[4] v. jayawardana, d. lakmal, n. de silva, a. s. perera, k. sugathadasa, and b. ayesha, "deriving a representative vector for ontology classes with instance word vector embeddings," in innovative computing technology (intech), 2017 seventh international conference on. ieee, 2017, pp. 79–84.
[5] k. sugathadasa, b. ayesha, n. de silva, a. s. perera, v. jayawardana, d. lakmal, and m. perera, "legal document retrieval using document vector embeddings and deep learning," arxiv preprint arxiv:1805.10685, 2018.
[6] v. jayawardana, d. lakmal, n. de silva, a. s. perera, k. sugathadasa, b. ayesha, and m. perera, "semi-supervised instance population of an ontology using word vector embedding," in advances in ict for emerging regions (icter), 2017 seventeenth international conference on. ieee, 2017, pp. 1–7.
[7] d. r. radev, "a common theory of information fusion from multiple text sources step one: cross-document structure," in proceedings of the 1st sigdial workshop on discourse and dialogue - volume 10. association for computational linguistics, 2000, pp. 74–83.
[8] r. prasad, n. dinesh, a. lee, e. miltsakaki, l. robaldo, a. joshi, and b. webber, "the penn discourse treebank 2.0," in proceedings of the sixth international conference on language resources and evaluation (lrec'08), marrakech, morocco, may 2008. european language resources association (elra).
[9] w. c. mann and s. a. thompson, rhetorical structure theory: a theory of text organization. university of southern california, information sciences institute, 1987.
[10] l. carlson, m. e. okurowski, and d. marcu, rst discourse treebank. linguistic data consortium, university of pennsylvania, 2002.
[11] f. wolf, e. gibson, a. fisher, and m. knight, "discourse graphbank," linguistic data consortium, philadelphia, 2004.
[12] "lee v. united states," in us, vol. 432, no. 76-5187. supreme court, 1977, p. 23.
[13] v. gamage, m. warushavithana, n. de silva, a. s. perera, g. ratnayaka, and t. rupasinghe, "fast approach to build an automatic sentiment annotator for legal domain using transfer learning," arxiv preprint arxiv:1810.01912, 2018.
[14] z. zhang, s. blair-goldensohn, and d. r. radev, "towards cst-enhanced summarization," in aaai/iaai, 2002, pp. 439–446.
[15] v. uzeda, t. pardo, and m. nunes, "a comprehensive summary informativeness evaluation for rst-based summarization methods," international journal of computer information systems and industrial management applications (ijcisim), issn 2150–7988, 2009.
[16] m. l. d. r. castro jorge and t. a. s. pardo, "experiments with cst-based multidocument summarization," in proceedings of the 2010 workshop on graph-based methods for natural language processing. association for computational linguistics, 2010, pp. 74–82.
[17] d. marcu, "from discourse structures to text summaries," intelligent scalable text summarization, 1997.
[18] d. r. radev, h. jing, m. stys, and d. tam, "centroid-based summarization of multiple documents," information processing & management, vol. 40, no. 6, pp. 919–938, 2004.
[19] a. louis, a. joshi, and a. nenkova, "discourse indicators for content selection in summarization," in proceedings of the 11th annual meeting of the special interest group on discourse and dialogue. association for computational linguistics, 2010, pp. 147–156.
[20] k. c. litkowski, "cl research experiments in trec-10 question answering," no. 250. national institute of standards & technology, 2002, pp. 122–131.
[21] s. verberne, l. w. j. boves, n. h. j. oostdijk, and p. a. j. m. coppen, "discourse-based answering of why-questions," traitement automatique des langues, vol. 47, pp. 21–41, 2007.
[22] p. piwek and s. stoyanchev, "generating expository dialogue from monologue: motivation, corpus and preliminary rules," in human language technologies: the 2010 annual conference of the north american chapter of the association for computational linguistics. association for computational linguistics, 2010, pp. 333–336.
[23] n. a. h. zahri, f. fukumoto, and s. matsuyoshi, "exploiting discourse relations between sentences for text clustering," in 24th international conference on computational linguistics, 2012, p. 17.
[24] n. a. h. zahri, f. fukumoto, m. suguru, and o. b. lynn, "exploiting rhetorical relations to multiple documents text summarization,"
[25] b. hachey and c. grover, "a rhetorical status classifier for legal text summarisation," text summarization branches out, 2004.
[26] b. hachey and c. grover, "extractive summarisation of legal texts," artificial intelligence and law, vol. 14, no. 4, pp. 305–345, 2006.
[27] c. reed and g. rowe, "araucaria: software for argument analysis, diagramming and representation," international journal on artificial intelligence tools, vol. 13, no. 04, pp. 961–979, 2004.
[28] c. reed, r. mochales palau, g. rowe, and m.-f. moens, "language resources for studying argument," in proceedings of the sixth international conference on language resources and evaluation (lrec'08). marrakech, morocco: european language resources association (elra), may 2008, http://www.lrec-conf.org/proceedings/lrec2008/.
[29] international journal of network security & its applications, vol. 7, no. 2, p. 1, 2015.
[30] m.-f. moens, c. uyttendaele, and j. dumortier, "information extraction from legal texts: the potential of discourse analysis," international journal of human-computer studies, vol. 51, no. 6, pp. 1155–1171, 1999.
[31]
[32] r. mochales-palau and m. moens, "study on sentence relations in the automatic detection of argumentation in legal cases," frontiers in artificial intelligence and applications, vol. 165, p. 89, 2007.
[33] d. radev, j. otterbacher, and z. zhang, "cstbank: cross-document structure theory bank," http://tangra.si.umich.edu/clair/cstbank, 2003.
[34] k. clark and c. d. manning, "entity-centric coreference resolution with model stacking," in proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: long papers), vol. 1, 2015, pp. 1405–1415.
[35] k. toutanova, d. klein, c. d. manning, and y. singer, "feature-rich part-of-speech tagging with a cyclic dependency network," in proceedings of the 2003 conference of the north american chapter of the association for computational linguistics on human language technology - volume 1. association for computational linguistics, 2003, pp. 173–180.
[36] d. chen and c. manning, "a fast and accurate dependency parser using neural networks," in proceedings of the 2014 conference on empirical methods in natural language processing (emnlp), 2014, pp. 740–750.
[37] m.-c. de marneffe and c. d. manning, "stanford typed dependencies manual," technical report, stanford university, 2008.
[38] j. r. finkel, t. grenager, and c. manning, "incorporating non-local information into information extraction systems by gibbs sampling," in proceedings of the 43rd annual meeting on association for computational linguistics. association for computational linguistics, 2005, pp. 363–370.
[39] m. a. tayal, m. raghuwanshi, and l. malik, "word net based method for determining semantic sentence similarity through various word senses," in proceedings of the 11th international conference on natural language processing, 2014, pp. 139–145.
[40] t. pedersen, s. patwardhan, and j. michelizzi, "wordnet::similarity: measuring the relatedness of concepts," in demonstration papers at hlt-naacl 2004. association for computational linguistics, 2004, pp. 38–41.
[41] "turner v. united states," in us, vol. 396, no. 190. supreme court, 1970, p. 398.
[42] g. ratnayaka, t. rupasinghe, n. de silva, m. warushavithana, v. gamage, and a. s. perera, "identifying relationships among sentences in court case transcripts using discourse relations," in advances in ict for emerging regions (icter), 2018 eighteenth international conference on. ieee, 2018, pp. 13–20.

international journal on advances in ict for emerging regions 2020 13 (1): april 2020

identification of abusive sinhala comments in social media using text mining and machine learning techniques
h.m.s.t. sandaruwan#1, s.a.s. lorensuhewa#2, m.a.l. kalyani#3

abstract— with the technology revolution, most of the natural languages used all over the world have entered the digital world, and people now use modern technologies such as social media and the internet in their native languages. as a result, people with strong egos about their tradition, race, caste, religion and other social factors tend to abuse others who do not belong to the same social group, in their native languages.
since social media platforms do not have centralized control, they have become a convenient platform for spreading such backward ideas without being governed or monitored. the sinhala language has also been added to most of the famous social media platforms. though the sinhala language has a history of more than 2500 years, it does not have rich resources for computer-based natural language processing, and it has therefore been a very difficult task to automatically detect the sinhala abusive comments which are being published and shared on social media platforms. here, we have used an evenly distributed 2000-comment sinhala corpus, split between offensive and neutral classes, to train three different models, multinomial naïve bayes (mnb), support vector machine (svm) and random forest decision tree (rfdt), with features extracted from the bag of words model, word n-gram model, character n-gram model, and word skip-gram model, in order to automatically detect sinhala abusive comments. after the training process, each model was tested with a 200-comment evenly distributed corpus, and mnb showed the highest accuracy of 96.5% with 96% average recall for both the character tri-gram and character four-gram models. further, two lexicon-based approaches, a cross-lingual lexicon approach and a corpus-based lexicon approach, were considered for detecting sinhala abusive comments. of these two lexicon-based approaches, the corpus-based lexicon gave the highest accuracy of 90.5% with an average recall of 90.5%.
keywords— sinhala abusive comment detection, machine learning, text mining, natural language processing.

i. introduction
with the rapid growth of technology, traditional communication media such as newspapers, radio channels, and television have been invaded by internet-based communication media. since these new internet-based communication media provide person-based telecommunication platforms, people who do not have a space to address the mass community welcome social media platforms warmly. therefore, almost all the people around the world are now connected by some kind of social media. despite traditional ideologies, the young generation of each country joins social media networks massively day by day. since non-international languages are supported in social media, people who know only their mother tongue have been able to work on these platforms without any difficulties. it is a great opportunity for people who do not have a place to share their ideologies, experiences or anything else they would like to share. though the basic and primary concept of social media is connecting people, some social media users misuse this primary goal by making offensive remarks towards others. since the number of users involved with social media is higher than for traditional mass media, the impact these social media can have on society is very serious. when we narrow our focus to sri lanka, according to online statistics [15], more than 16 million people use sinhala as their communication language and more than 7 million people use the internet. the sinhala language has a remarkable history and belongs to the indo-aryan language family, which is a subfamily of indo-european [1]. it consists of 18 vowels and 42 consonants, but now only 16 vowels and 41 consonants are used [17]. since sinhala is a morphologically rich language [1], it is a very complex task to build algorithmic natural language processing resources.
like other languages, sinhala has also been used in social media networks to post abusive comments. from 2014 to 2018, sri lanka faced two racism-based riots due to these kinds of abusive comments, and neither the government nor the social media authorities could control the situation, as there was no method available to identify the sinhala abusive comments that were being shared. in the 2018 incident, the government had to ban social media for a few days, but users were able to access their accounts by using third-party virtual private networks (vpn). hence, it is clear that banning social media will not be an optimal solution for controlling such situations in the future. this was proven by a few subsequent incidents that occurred after the easter bomb attack in sri lanka on 21 april 2019: even though the government banned social media at that time, people shared a lot of abusive comments by enabling vpns. one of the major problems in social media is the lack of governance over the content which is being published and shared. until someone reports that a particular piece of content violates the policies, that content will not be monitored and governed. also, social media providers do not have enough linguistic expertise to handle such situations; most of the time, when a complaint is received, they say the reported content does not violate the policies even when it does.

manuscript received on 14 nov. 2019. recommended by dr. d.d. karunaratna on 02 april 2020. this paper is an extended version of the paper "sinhala hate speech detection in social media using text mining and machine learning" presented at icter 2019. h.m.s.t. sandaruwan, s.a.s. lorensuhewa, and m.a.l. kalyani are from the department of computer science, university of ruhuna, sri lanka. (stsandaruwan001@gmail.com, aruna@dcs.ruh.ac.lk, kalyani@dcs.ruh.ac.lk).

although everyone has a right to share their opinions and ideologies, those ideas should not invade others' feelings, because we are all human beings and have a right to be protected from any kind of offensiveness. since there are no perfect men and women, we should have a way to control situations such as the social riots which occurred because of a few abusive comments. a social study [1] done in 2014 shows that sri lankan people also use the sinhala language in social media to spread abusive comments, and it discusses the importance of identifying online sinhala abusive comments in an automated manner. in this research, we focused on the identification of sinhala abusive comments using text analytical models and lexicon models; hence it is a binary classification task.

ii. literature review
abusive speech detection on web content is an ongoing research area. a considerable amount of research has been done for the english language ([3],[7],[13]), but very little for other languages such as arabic [16], german [7], and chinese. from 2015 to 2018, a lot of research was carried out to detect abusive speech, so the topic has attracted significant attention from researchers working in the text mining area. though it is difficult to compare the different methodologies that have been used for hate speech detection, due to the different data sets they use, we can identify the main approaches that have been used to detect online abusive comments.
according to the literature that we studied, abusive speech identification has been carried out through two different approaches, namely lexicon-based abusive speech detection and machine learning based abusive speech detection; some researchers have also used a combination of these two approaches to detect abusive comments. in the research [3] published in 2015, a lexicon-based approach was used to detect online hate speech. the authors started with a comment collection phase, gathering comments from online forums, blogs and the comment sections of news reviews. as the data sources for the research, they created two data sets. the first source includes 100 blog postings from 10 different web sites, with 10 blog postings collected from each site; these 10 web sites were taken from a list provided in the hate directory [4]. the second source was web page content related to the israel-palestinian conflict, a 150-page web document. annotation of the content was done by two graduate students at their university, and 30% of each source was annotated. the research was conducted in three phases, with subjectivity detection done in the first phase. for that, they needed a sentiment lexicon and therefore used [5] and sentiwordnet [6] as resources. to determine whether a given sentence is subjective or not, they extracted the positive or negative score of each sentiment word in the sentence, calculated both the positive and negative scores, and subtracted the negative score from the positive score. if the synset score is greater than 0.5 or less than -0.5, the sentence is considered positive or negative respectively. therefore, if a sentence has either a positive or a negative score according to the given rules, that sentence is considered subjective rather than objective. about 56% of the first corpus and 75% of the second corpus were subjective in this research. the lexicon of hate speech was built in the second phase, using three different sets of features. the negative opinionated words identified in the subjectivity analysis were selected as the first feature set of the hate speech lexicon. for the second set, they selected all verbs related to their hate corpus but not in the first feature set and then gathered their hypernyms; if those words exist in the corpus, they are added to the lexicon. the third set was created with hate nouns related to three types: religion, race, and nationality. using named entity recognition (ner) software, they identified the sources and the recipients of opinions. the experiment was done as follows: at the first stage, only negative semantic features were considered, and the accuracy was less than 70%; then hate verbs were included in the semantic features, and the accuracy increased above 70%. the best result obtained was an f-score of 70.83 for the first corpus with three features. since we do not have a sentiment lexicon such as sentiwordnet for the sinhala language to evaluate a sentence as positive or negative before the abusive speech detection phase, we have to apply the lexicon directly to the given comments. as sinhala does not have a good ner, we have not been able to obtain the source and the recipient as they did in their research.
however, lexicon-based approaches alone are not sufficient for online abusive speech detection, because identifying abusive comments with a lexicon approach is limited to the words that exist in the pre-built lexicon. therefore, some research has focused on machine learning techniques to identify online abusive comments more precisely. machine learning allows computers to learn from data without a programmer specifying all the steps or instructions needed to perform a specific task. the foundation of machine learning is training data: training data plays a huge role, and the learning algorithm generates a new set of rules based on inference from the data. these algorithms are formally known as models, and the same algorithm can be used with different data sets to generate different models. machine learning approaches can be categorized into two strategies, supervised learning and unsupervised learning, distinguished by a property of the training data: if the training data is labeled with its class, the learning model is considered supervised; otherwise, the model is known as unsupervised. among the various supervised learning algorithms, naïve bayes, decision tree algorithms, the support vector machine algorithm, and the logistic regression algorithm have been considered in abusive speech classification, while unsupervised learning algorithms such as k-means clustering and hierarchical clustering are rarely used in the abusive speech detection process. in 2018, research [7] was done on hate speech detection for the german language, where the authors tried to apply the approaches used in english hate speech detection research to build their models. in that study, they showed the importance of hate speech detection in languages other than english due to certain crises. the main goal of their research was to "investigate the potential value of automatic analytics of german texts to detect hate speech". as the dataset, user-generated comments were taken from news platforms on the internet, and they had to use web scraping technology since most german news platforms do not offer apis. they used a python framework called scrapy; with this web scraping technology, 376,143 comments and 21,740 articles were collected, and the comments were annotated by 247 individual participants. in that process, they considered only two class labels, "hate" and "non-hate". they received a total of 11,973 rated comments, distributed among three classes: 3875 hate comments, 6073 non-hate comments, and 2025 unclassified comments. for the feature extraction process, they used bag of words (bow), n-grams, linguistic features such as the number of punctuation marks and the number of words in a comment, and word2vec/doc2vec. from the annotated comments, they used 811 hate and 1561 non-hate comments to create the model. because of the imbalance between the two classes, the comments were under-sampled, and 1622 comments including both hate and non-hate comments were applied to the logistic regression model; the highest accuracy of 0.7608 was obtained for the bow feature.
since they did not have a hate speech corpus, they used web crawling techniques to extract comments from online forums. we also had to use the same web scraping approach, as we did not have a comment corpus containing sinhala abusive speech. the research in [7] was based on the german language, and it shows that bow features are good at detecting hate speech, as they are for english, despite the language difference; this guided us to use the bow feature for sinhala abusive speech detection. as the very first attempt at hate speech detection in sinhala comments published in sinhala unicode, research [8] was published in 2018. in that study, the authors focused on sinhala racism-based speech detection. however, as the first attempt at sinhala abusive speech detection, it comes with a few limitations. the first limitation is that, although they obtained 70.8% accuracy, that accuracy cannot be considered a good measurement, because they used an unbalanced data set containing 73 racism-based comments and 111 non-racism-based comments. the second limitation is the feature extraction method used: though many similar studies on other languages ([19]) have tried several feature extraction methods, this study [8] considered only word bi-gram models, so it does not give a clear idea about the best-fitting features for the sinhala abusive comment identification process. therefore, there is a significant need to identify other well-fitting features that can be used for sinhala abusive comment detection. since abusive comments are not limited to racism-based comments, we also need to consider other characteristics of abusive comments, such as religion, gender, sexual orientation or any other disability, since these types of comments should also be removed from social media. the third limitation we identified is the model used for training: since svm was the only model considered, we cannot get a clear idea of the other existing models that could be applied to the sinhala abusive comment identification process. when we went through the literature, we found that almost all the studies have used python with nltk libraries [18] for pre-processing and model building. further, supervised learning methods are widely used in abusive speech detection, and machine learning approaches rather than deep learning techniques have been used for this task. therefore, we also considered statistical model-based machine learning approaches, due to the lack of a large number of comments required for deep learning approaches. in [13], natural language processing feature selection methods such as character n-grams, word n-grams, word skip-grams and brown clusters were used with a support vector machine classifier, and 78% accuracy was obtained for the character 4-gram feature. as research [18] shows, unigram, bigram, and trigram features can also be used with the tf-idf method, and according to their results, logistic regression works better, with the best model achieving 0.91 precision, 0.90 recall, and 0.90 f1-score. recent research [19] done for the indonesian language discusses the importance of identifying hate speech in social media published in indonesian. since there was not much related research for the indonesian language, they were guided by research on other languages.
therefore, they used word n-gram and character n-gram feature extraction methods with naïve bayes, svm, bayesian logistic regression (blr), and rfdt classifiers. the best performance, an f-measure of 93.5%, was scored for the word n-gram features with the rfdt algorithm. considering almost all of these factors, we proceeded with the research using a few different feature groups, character n-gram, word n-gram, word skip-gram, and bow features, as experimented with in ([7],[8],[19]). these extracted features were used to train and test a few statistical models, mnb, svm, and rfdt, as these classifiers have shown good performance ([8],[18],[19]) despite the language differences, in order to fill the gaps in the sinhala abusive comment identification process. other research on various languages, such as ([7],[18],[19]), shows that applying pre-processing techniques such as stemming and stop word removal is best for model training; therefore, we also considered these pre-processing techniques in our study.

iii. methodology
the goal of our research was to compare multiple models and investigate the effect of the features that can be used to train models in the sinhala abusive speech detection process. therefore, a methodology based on training, validation, and testing was used. at the training phase, we conducted experiments with three different variations. as method 1, we built a corpus-based lexicon of hate and offensive words, i.e. an abusive lexicon for sinhala, and that lexicon was then used for abusive speech classification. as method 2, machine learning techniques were used: by applying the extracted features, three different models, multinomial naïve bayes (mnb), support vector machine (svm) and random forest decision tree (rfdt), were trained. as method 3, using the trained models and the created lexicon, we evaluated the test data set containing 200 evenly distributed comments. since we could not find any sinhala abusive or offensive related corpus online, we initiated the research with corpus construction.

a. data set construction
in supervised learning methods, a labeled data set is required for training. the resulting system's ability depends on the content of the data set and its annotation; when the data set is highly correlated with the topic under consideration, the predicted results can be trusted. though many hate speech annotated data sets are available on the internet for languages such as english, german and arabic, no such resource is available for the sinhala language. therefore, we extracted and annotated the comments according to the details given in the following section.
1) collection of sinhala comments: here we considered two online social media platforms, facebook and youtube, together with a sri lankan gossip site. since facebook and youtube have apis, we could access them by creating api keys, but for the sinhala gossip site we had to build an in-house web crawler to extract the comments.
2) comments filtering process: since some of the collected comments were not in sinhala unicode encoding, we filtered them out of our corpus. further, special characters such as emoji, urls and other non-sinhala characters were removed from our data corpus using regular expressions.
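a minimal sketch of this filtering step is shown below. the exact regular expressions used in the study are not given, so the patterns here are assumptions: urls are stripped first, then every character outside the sinhala unicode block (u+0d80 to u+0dff) and whitespace is removed, and comments left with no sinhala content are dropped.

import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_SINHALA = re.compile(r"[^\u0D80-\u0DFF\s]")   # keep sinhala letters and spaces only

def clean_comment(comment):
    comment = URL_PATTERN.sub(" ", comment)        # remove urls
    comment = NON_SINHALA.sub(" ", comment)        # remove emoji and non-sinhala symbols
    comment = re.sub(r"\s+", " ", comment).strip()
    return comment or None                         # None signals "filter this comment out"

print(clean_comment("http://example.com great!!"))   # None: no sinhala content left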
b. comments annotation
after the comment extraction process, the extracted comments were annotated with the help of three annotators, since we are focusing on a supervised learning approach. the class of each comment was chosen based on a majority vote: if a comment was judged to be neutral it was annotated with the label "0", and if it was an abusive comment it was labeled "1". from the annotated comments, 2000 comments were selected to train the models, while 200 comments were randomly selected for testing. table i shows the number of instances of each class label.

table i
annotated comments

comment type        count
offensive comments  1100
neutral comments    1100

from the annotated comments, 100 comments from each class were randomly selected, and that set of comments was used as the testing data set. after the annotation process, we carried out a corpus analysis of the collected data.

c. corpus analysis
to identify the characteristics of our corpus, we applied a few corpus analysis techniques as follows.
1) length analysis: as the first step of the corpus analysis, we did a length analysis. the annotated data set was used for the analysis, and we divided the comments into two corpora, offensive and neutral. this separation was based on each comment's class and was carried out with a python script: if a comment's label is "1" the comment is put into the offensive corpus, and if the label is "0" it is put into the neutral corpus. each corpus contained 1000 comments. in the length analysis we considered the following criteria, and the results obtained are listed in table ii.
• average word length: in this analysis, we considered the average length of the words in each corpus. it is important to identify the number of characters used in each corpus, because every language has a fixed set of abusive words, though they may vary according to the context.
• average sentence length: we analyzed sentence length to check whether offensive and neutral speech uses a small number of words or not, as we wanted to discover the average sentence length used in offensive and neutral comments.

table ii
length analysis of each corpus

                          offensive   neutral
average sentence length   11.03       12.377
average word length       4.973       5.033

according to the analysis, we found that neutral comments use more words than offensive comments, and that sinhala abusive words have fewer characters than normal words.
2) vocabulary analysis: here, we considered the number of words used in each corpus. our objective was to identify the number of words used in both corpora and to analyze the word count behaviour in each. therefore, we performed the following two analyses for each of the offensive and neutral corpora; the results are listed in table iii.
• total number of words: here we considered the total number of words used in each corpus, in order to compare and contrast the total number of words used in offensive speech and in neutral speech.
• total number of unique words: we found the number of unique words in each corpus, to check whether people use the same set of words to make comments or not.

table iii
vocabulary analysis

                            offensive corpus   neutral corpus
total num. of words         11041              12389
total num. of unique words  4642               5346

though each corpus contained 1000 comments, from these results we can conclude that offensive speech uses only a few words, since it has fewer unique words than the neutral comment corpus, and these words are used again and again.
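the length and vocabulary statistics in tables ii and iii can be reproduced with a short script such as the one below, assuming simple whitespace tokenization of the already separated corpora.

def corpus_statistics(comments):
    # comments: list of comment strings belonging to one corpus
    tokenized = [c.split() for c in comments]
    words = [w for tokens in tokenized for w in tokens]
    return {
        "average sentence length": sum(len(t) for t in tokenized) / len(comments),
        "average word length": sum(len(w) for w in words) / len(words),
        "total number of words": len(words),
        "total number of unique words": len(set(words)),
    }

# e.g. corpus_statistics(offensive_comments) and corpus_statistics(neutral_comments)
# give the rows of tables ii and iii for the two corpora.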
3) zipf's law analysis: this law was formulated in 1935 and was the first academic study of word frequencies. it states that the frequency f of any word in a given natural language corpus is inversely proportional to its rank r in the frequency table. the formula of zipf's law is as follows:

$f \propto \frac{1}{r^{a}}$

where f is the frequency of the word ranked r-th and the exponent a is close to 1. the behaviour of zipf's law for the sinhala language was studied in 2004 [9], and that work shows that sinhala almost follows the law. since that corpus is based on grammatically correct sinhala sentences, we decided to study the behaviour of zipf's law on an ungrammatical, neutral and offensive corpus. therefore, we divided our sentences into two corpora as in the analyses above. each corpus consisted of 1000 comments, and those comments were tokenized. the tokenization process was done in two ways: in the first approach, stop word removal and stemming were not applied, and in the second approach, words were tokenized after applying stop word removal and stemming techniques.
• zipf's law behaviour of abusive comments without stemming and stop word removal: we took 1000 comments to study the behaviour of zipf's law in sinhala offensive comments. in this analysis, we did not apply any stemming or stop word removal techniques. the top 10 words with the highest frequencies are shown in table iv; some letters of some words are replaced by the "#" symbol due to the abusiveness of the words.

table iv
top 10 offensive words without stemming and stop word removal

rank   word        frequency
1      මේ          151
2      කැ#         92
3      ම ො# න      90
4      හු#ම ෝ       67
5      ම ො#නයො    60
6      එ ො         53
7      හු#ම ො       52
8      පු#          51
9      ම ොන්        50
10     අමන්        50

we then plotted the log value of the offensive word frequencies against the log value of the word ranks; fig. 1 shows the graph obtained.

fig. 1 zipf's law behaviour of offensive corpus without stemming.

according to the graph, we can see that even though the corpus is not grammatically correct, it follows zipf's law. when we investigated further, we found that many of the frequently occurring words are used only to build the structure of a sentence. in other words, some of the frequently occurring words are not useful for separating the comment classes, since they are used in both abusive and neutral comments; here, we treated them as stop words. after removing stop words from the corpus, we did another experiment to analyze the behaviour of zipf's law.
• zipf's law behaviour of abusive comments with stemming and stop word removal: here, we applied the same approach to the offensive corpus. after applying the stemming and stop word removal techniques, we obtained word tokens with their frequencies. the results are shown in table v; some characters of the words in the table are replaced by the "#" symbol since they are disgraceful words.

table v
top 10 offensive words with stemming and stop word removal

rank   word      frequency
1      ම ො#න     227
2      හු#        215
3      #         127
4      මේ#       106
5      කැ#       103
6      පු#        102
7      එක        99
8      අේම       63
9      උබ        58
10     එ         55

according to the results, we observed that applying stemming as a pre-processing step is also good practice, since it reduces the feature vector by replacing words with their root words. as done previously, we plotted the log value of the offensive word frequencies against the log value of the word ranks; fig. 2 shows the graph obtained. according to the graph, we can conclude that, even after applying the pre-processing techniques, the corpus follows zipf's law.

fig. 2 zipf's law behaviour of offensive corpus with stemming.
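the zipf's-law plots (fig. 1 and fig. 2) can be produced with a sketch like the following, which ranks token frequencies and returns log-rank / log-frequency points; plotting them (for example with matplotlib) should give a roughly straight line with slope close to -1 if the corpus follows the law.

import math
from collections import Counter

def zipf_points(tokenized_comments):
    # tokenized_comments: list of token lists for one corpus
    frequencies = Counter(token for comment in tokenized_comments for token in comment)
    ranked = sorted(frequencies.values(), reverse=True)
    return [(math.log(rank), math.log(freq))
            for rank, freq in enumerate(ranked, start=1)]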
2 shows the graph obtained. according to the graph, we can conclude that, even after applying pre-processing techniques, the corpus follows zipf’s law. d. sinhala abusive lexicons construction since there is no lexicon resource for sinhala abusive speech detection, we made two lexicons by following two approaches. in the following sub-sections, these approaches are discussed. identification of abusive sinhala comments in social media using text mining and machine learning techniques 6 international journal on advances in ict for emerging regions april 2020 fig. 2 zipf’s law behaviour of offensive corpus with stemming. 1) dictionary-based abusive lexicon creating: as there is no online dictionary for sinhala hate speeches, we used “google bad word list” [10] based on the user policies of them and also other resources that contain social media banned english words. therefore, we used an online sinhala english dictionary to map each banned word in english list to sinhala. fig. 3 dictionary-based lexicon building. though we sent 1703 english bad words to the online sinhala-english dictionary, it gave 1128 translated words. then that translated words were given to the annotators to get classified. among 1128 words, only 157 words were annotated as abusive. table vi shows the annotator’s agreement on each sinhala word. table vi annotator’s agreement on translated sinhala words word type count abusive words 157 neutral words 971 total translated words 1128 since dictionary-based lexicon does not cover all words that can be used in the context of sinhala abusive comments, we decided to consider corpus-based lexicon construction. 2) corpus-based abusive lexicon creating: in this approach, we used our 2000 comment corpus as the resource to identify words that are specific to sinhala abusive speeches. we separated the comment corpus into two corpora called abusive and neutral by considering their class. each corpus contains 1000 comments and using a seed word set taken from an online source, we searched their variations in the offensive corpus. fig. 4 shows the steps that were used to build the corpus-based lexicon. this seed word set contains base forms of sinhala abusive words. by adding suffixes to these seeds words, their variations were identified. fig. 4 corpus-based lexicon building. annotators’ agreement on each word is listed in table vii. then both dictionary-based and corpus-based lexicons were used to identify sinhala abusive speeches. table vii annotators’ agreement on propagated words seed words propagated words annotators’ agreement accuracy 64 279 277 99.28% e. text pre-processing. to reduce the dimensionality of the feature vector, we applied several famous pre-processing techniques, before the models are trained. all non-sinhala characters, emojis, and urls were removed as the basic step of pre-processing by using regular expressions. further, the following standard techniques were used. 1) stop words and stop word removing: stop words are words that have a little meaning but they are essential to maintain the structure and grammatical relationship among other words in a sentence. in general, stop words are known as the most common words in a given language. in natural language processing, these words are dropped to reduce the dimensionality of the feature vector. since the sense of these words affects the sentiment of a given sentence, before the classification, it is essential to decide whether these stop words should be removed or not. 
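for illustration, stop word removal of the kind discussed here amounts to dropping every token that appears in the stop word list before feature extraction. the following is a minimal sketch with a placeholder list, not the published 425-word sinhala resource.

```python
# minimal sketch of stop word removal: tokens found in the stop word list are dropped.
# `stop_words` is a placeholder; the study uses a published sinhala stop word list [11].
stop_words = {"saha", "da", "tha"}   # illustrative romanized placeholders

def remove_stop_words(comment):
    return [t for t in comment.split() if t not in stop_words]

print(remove_stop_words("word1 saha word2 da word3"))   # -> ['word1', 'word2', 'word3']
```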
even though sinhala is a less resource language, in this study we have used a stop word list which is compiled and published by [11]. that list contains 425 words. but some words in that list are important as they are the cause for the negativity of a sentence as well as their presence in sinhala abusive comments. therefore those words were removed from the standard stop word list before the classification. the 7 h.m.s.t. sandaruwan#1, s.a.s. lorensuhewa#2, m.a.l. kalyani#3 april 2020 international journal on advances in ict for emerging regions removed words from that stop word list are listed in the following table viii. table viii removed stop words from the stop word list removed words from stop word list නැ මනොමේ බැහැ බෑ අම ොයි අයිමයෝ අේමමෝ ආමන් අප් ච්චිමේ අම ෝ ිහ ් ඕනෑ එ ො ඌයි ෂික් ෂො නැහැ නෑ 2) stemming: stemming is the process of reducing a word into its word root by stripping recognized prefixes and suffixes. it is very important to identify stems of words because it reduces the dimensionality of the feature vector by converting words into its relevant word stem. since sinhala is a less resource language in natural language processing, it does not have a stemmer such as porter’s stemmer for english. therefore, in this study, we used a shallow stemming method proposed [12] for sinhala. the main problem it has is, if the document does not contain the stem word itself, the algorithm is unable to find the stem of a particular word. since its programmatic implementation is not available publically, we implemented the concept and applied it to our corpus. table ix shows few stem roots that were found through our corpus. table ix stems and their variations stem words අන් වොදීන් අන් වොදීන්මේ, අන් වොදීන්ට අමන අමනයන්මේ, අමනයනි, අමනයො, අමනො මකල්ල මකල්ලක්, මකල්ලට, මකල්ලන්ට, මකල්ලයි identified stems were listed alphabetically and saved in a text file and it was used as a stemming dictionary in the process of feature extractions. the steps that we followed to apply stemming are shown in the following figure, fig. 5. fig. 5 stemming. since this algorithm is based on removing suffixes, we needed a standard suffixes list for sinhala. therefore, we used a standard list that contains 413 suffixes, published by [11]. some sample suffixes are listed in table x. table x sample set of sinhala suffixes ක ය ක් යක කක යකි කක් යක් කකට යකම න් f. feature extraction. in our study, we use machine learning algorithms to detect abusive sinhala comments and therefore identified features through the literature could be used to train these models. since sinhala abusive comment detection is a novel area, it is a challenging task to identify best fitting feature types. however, in this study, we considered four different feature types: bag of words (bow), word n-gram, character n-gram, and word skip-gram to train the models. 1) bag of words (bow) features: bow is the most famous [7] and the simplest feature that can be used in natural language processing. therefore, it has been used in many text mining related researches. in this approach, a text is represented as a bag of its words disregarding grammar and the word order. though it neglects the grammar and the word order, the frequency of each word’s occurrence is recorded with the word. 2) word n-gram features: word n-gram is a word model that captures the structure of sentences or corpus. though bow is good at feature extraction, it is not sufficient since natural languages do not contain just words but words with some structure. 
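as a hedged illustration of the difference between plain bag-of-words counts and word n-gram features, the following sketch uses scikit-learn's countvectorizer with different ngram_range settings; the documents and settings are placeholders, not the study's configuration.

```python
# illustrative comparison: bag-of-words (unigram) counts versus word bigram features
# extracted with scikit-learn's CountVectorizer. documents are placeholders.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["this is a sample comment", "this comment is another sample"]

bow = CountVectorizer(ngram_range=(1, 1))        # single words only
bigrams = CountVectorizer(ngram_range=(2, 2))    # pairs of consecutive words

bow.fit(docs)
bigrams.fit(docs)
print(bow.get_feature_names_out())       # individual words
print(bigrams.get_feature_names_out())   # word pairs such as 'sample comment'
```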
these ngram features can be a unigram, bigram, trigram or combination of these features. similar researches in other languages such as english [13] show that considering word n-gram features to identify abusive speeches makes good results. therefore we considered several groups of word ngram features and they are listed in table xi. table xi word n-gram feature groups group pre-processing feature set w01 rn+sr+st ug w02 rn+sr+st bg w03 rn+sr+st tg w04 rn+sr+st ug+bg w05 rn+sr+st ug+bg+tg ugunigram tgtrigram bgbigram ststemming srstop word removing rn – removing non-sinhala symbols the main difference between word unigram and bow is the word count that is considered in each approach. in bow, all the words are considered when the feature vector is constructed. but identification of abusive sinhala comments in social media using text mining and machine learning techniques 8 international journal on advances in ict for emerging regions april 2020 in word unigram, it does not consider all the words in the corpus to make the feature vector. 3) character n-gram features: most of the languages such as english and sinhala are composed of characters or letters, digits, punctuations, and spaces. in social media comments, where many words very often to be misspelled, character n-grams are especially powerful at detecting patterns in such things and substantially less sparse than previously introduced word n-gram features. though many kinds of research in abusive speech detection domain have used word n-grams for feature extraction, character n-grams were considered by less number of researches. a hate speech identification study [7] in the german language has used character 2-gram and 3-gram features to identify hate speeches with 0.62 and 0.65 accuracies respectively. in our study, we used up to 4-gram character levels separately and combined them to identify hate, neutral and offensive speeches. table xii shows the feature groups that were considered in our study. table xii character n-gram feature groups group pre-processing feature set c01 rn+sr+st 2g c02 rn+sr+st 3g c03 rn+sr+st 4g c04 rn+sr+st 2g+3g c05 rn+sr+st 2g+3g+4g 2gbigram 3gtrigram 4gfour-gram ststemming srstop word removing rn –removing nonsinhala symbols 4) word skip-gram features: these features are similar to word n-gram features and the difference is that skip-gram models extract features from a text by parsing some words from its current position. as an example, consider the sentence “i went to school in the morning”, if the features are taken with 1-skip gram then the features will be as this: “i to”, “went school”, “to in”, and “school the”, “in morning”. here we considered 1-skip gram and 2-skip grams for bigrams and these were combined with normal bigram features and with unigram features. table xiii shows the word skip-gram feature groups that were considered in our study. table xiii word skip-gram feature groups group pre-processing feature set s01 rn+sr+st 1sg s02 rn+sr+st 2sg s03 rn+sr+st 1sg+ug+bg s04 rn+sr+st 2sg+ug+bg s05 rn+sr+st 1sg+ug+bg+tg 1sgskip-1-gram 2sgskip-2-gram these features were extracted by using scikit learns count vectorizer and skip-gram features were obtained by customizing the count vectorizer. g. feature vectorization. the process of converting natural language text into numbers is called as vectorization in machine learning. 
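looking back at the word skip-gram extraction just described, one possible way to customize scikit-learn's countvectorizer is to pass it a callable analyzer that emits skip-bigrams; the study's exact customization is not published, so the sketch below is only an illustration that reproduces the 1-skip bigram example given above.

```python
# hedged sketch: a callable analyzer makes CountVectorizer emit bigrams whose two
# words are separated by exactly k intervening words, matching the worked example
# in the text ("i went to school in the morning" -> "i to", "went school", ...).
from sklearn.feature_extraction.text import CountVectorizer

def exact_skip_bigrams(k):
    def analyze(doc):
        tokens = doc.split()
        return [tokens[i] + " " + tokens[i + k + 1] for i in range(len(tokens) - k - 1)]
    return analyze

vec = CountVectorizer(analyzer=exact_skip_bigrams(1))
vec.fit(["i went to school in the morning"])
print(vec.get_feature_names_out())   # the five 1-skip bigrams, alphabetically ordered
```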
machines are not able to understand natural language as a human does, and the statistical methods used for text classification require input data in numeric form. therefore, feature vectorization plays a major role in natural language processing. in this study, we used two feature vectorization methods, scikit-learn's countvectorizer [14] and tfidftransformer, to vectorize the extracted features. 1) countvectorizer: it is an implementation in the scikit-learn machine learning package and is widely used in ongoing research. it provides a simple way to tokenize a collection of text documents and build a vocabulary of known words. when new documents need to be classified, they are encoded using that vocabulary. 2) tfidfvectorizer: tfidfvectorizer is another feature vectorization implementation. it is based on the term frequency-inverse document frequency (tf-idf) concept. typically, tf-idf is composed of two terms whose product gives a weight. these two terms can be explained as follows.
• tf: term frequency measures how frequently a term occurs in a document. since documents are of different lengths, a term may appear many more times in a lengthy document than in a shorter one. therefore, the term frequency is divided by the document length (total number of words) for normalization: tf(t, d) = (number of occurrences of term t in document d) / (total number of terms in d).
• idf: inverse document frequency measures the importance of a term. tf treats all terms equally, but it is known that some terms are less important when classifying sentences. therefore, we need to weigh down the frequent terms while scaling up the rare ones by computing: idf(t) = log_e(n / n_t), where n is the total number of documents in the corpus and n_t is the number of documents containing term t.
the tf-idf score of a term t in a document d is then calculated as: tf-idf(t, d) = tf(t, d) × idf(t). since we use scikit-learn's countvectorizer to vectorize features, we did not use tfidfvectorizer directly; instead tfidftransformer, another implementation in the scikit-learn package, is used to convert the countvectorizer output into tf-idf weighted features. h. classifiers and machine learning. machine learning is a necessary component of advanced text classification. the primary aim of machine learning is to allow computers to learn automatically without involving humans. in this study, we focused on supervised machine learning algorithms for the process of identifying offensive and neutral comments. we have therefore used three different machine learning algorithms: support vector machine (svm), multinomial naïve bayes (mnb), and random forest, since, as per the analysis in the literature review, most studies have reported the best performance for these classifiers. in the following subsections, these algorithms are described separately. here we used the algorithm implementations of scikit-learn. 1) naïve bayes (nb): naïve bayes classifiers are simple probabilistic classifiers based on the application of bayes' theorem with strong independence assumptions between the features. in this study, the multinomial naïve bayes (mnb) classification technique has been used.
it considers word frequency information in the document for analysis, where a document is considered to be an ordered sequence of words obtained from the vocabulary. the main difference between multinomial nb and bernoulli nb is that bernoulli nb cares only the presence or absence of a particular feature (word) while multinomial nb considers the occurrence (frequency count) of the features (words). here we used the scikit learn’s implementation of mnb to classify a given sentence. 2) support vector machine (svm): support vector machine is a very famous classification method that is being used in the area of natural language processing. unlike the naïve bayes algorithm, svm is a non-probabilistic classifier algorithm. it is an efficient classification method when the feature vector is high dimensional. svm separates data points using a hyperplane with the largest amount of margin. it constructs a hyperplane in multidimensional space to separate different classes. one of the main advantages of svm is the robustness in general and effectiveness when the number of dimensions is greater than the number of samples. here we used the scikit learn’s svm implementation to achieve the classification goal. 3) random forest algorithm (rfdt): random forests also known as random decision forests are famous as an ensemble method that can be used to build classification models. it consists of several decision trees and based on the majority vote, a particular sentence or a document is classified. this algorithm differs from previous mnb and svm since this decides by considering the majority vote from several trees. since more trees are there, the random forest will not overfit the model and it is a reason for using the random forest in text mining researches as well as in hate speech detection. i. experiments we started our experiments with generated lexicons and later we used feature extraction methods with classifiers to identify abusive speeches in sinhala. therefore, we selected a new 200 comments set randomly, that are annotated by previous annotators. this comment set also balanced among abusive and neutral classes. 1) experiment 01:dictionary-based lexicon approach for abusive speech detection: despite the inability to find context-based opinion words in the dictionary, we made the first experiment for 200 annotated comments. therefore, we selected 100 neural comments and 100 abusive comments which were not used to construct the corpus-based lexicon. finally, we got 200 comments, 100 as neutral and 100 as abusive. we used the dictionary that we built to classify the given comments. that dictionary contains 157 abusive words, together all of them we considered as an abusive word dictionary. after that, we built an algorithm to check whether any comment consists of an offensive word or not. if a sentence contains an offensive word that is listed in the dictionary, the algorithm identifies it as an offensive that is abusive, otherwise as a neutral comment. in the first phase of the algorithm, we tokenized the sentences and sent them through the algorithm. 2) experiment 02: corpus-based lexicon approach for abusive speech detection: we did the same experiment here with the same algorithm by changing the lexicon to a corpus-based lexicon. 3) experiment 03:word n-gram features for abusive speech identification: here we used feature groups that are listed in table xi with three classifiers: mnb, svm, and rfdt. 
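a minimal sketch of such a training set-up is shown below: counts are turned into tf-idf weights and each of the three classifiers is fitted. the data, feature settings and hyper-parameters are placeholders, not the study's actual configuration.

```python
# hedged sketch: train MNB, SVM and random forest on tf-idf weighted word n-gram
# counts. the tiny training/test sets here are placeholders for the annotated corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

train_texts = ["neutral sample comment text", "offensive sample comment text words"]
train_labels = [0, 1]     # 0 = neutral, 1 = offensive
test_texts = ["another sample comment"]

for name, clf in [("mnb", MultinomialNB()), ("svm", SVC()), ("rfdt", RandomForestClassifier())]:
    pipe = make_pipeline(CountVectorizer(ngram_range=(1, 1)), TfidfTransformer(), clf)
    pipe.fit(train_texts, train_labels)
    print(name, "prediction:", pipe.predict(test_texts))
```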
to extract features, we use the comment corpus (table i), which contains 2000 evenly distributed comments. before the word n-gram extraction, we applied the previously described pre-processing techniques. the extracted features were then vectorized and weighted. thereafter, each classifier was trained separately using those weighted features. 4) experiment 04: character n-gram features for abusive speech identification: we trained each classifier using the character n-gram feature groups introduced in table xii. in this experiment, characters within word boundaries were considered; therefore countvectorizer's "char_wb" option was used. 5) experiment 05: word skip-gram features for abusive speech identification: as the final experiment of our study, we used the word skip-gram features listed in table xiii to train each classifier. since we do not have prior knowledge of the best features for identifying sinhala abusive speech, we used five word skip-gram feature groups here. in experiments 03, 04 and 05, features were extracted using the comment corpus listed in table i. the extracted features were then applied to the three classifiers and the models were trained. the evenly distributed 200-comment test corpus was used to test every model trained through the experiments. the results obtained through these experiments are discussed in the next section. iv. experimental results and evaluation as our research follows a quantitative approach, we can use statistical and mathematical techniques to evaluate what is done in the study. therefore, we have used precision, recall, and accuracy as performance evaluation measurements. since binary classification is considered, confusion matrices with two class labels are constructed in this study. table xiv gives the structure of the confusion matrix, and based on that matrix, precision, recall, and accuracy are obtained. here we considered the binary classification confusion matrix for describing precision, recall, and accuracy since it is the standard way to present these measurements.
table xiv confusion matrix structure (rows: true class, columns: predicted class)
true offensive: predicted offensive = true positive (tp), predicted neutral = false negative (fn)
true neutral: predicted offensive = false positive (fp), predicted neutral = true negative (tn)
at each experiment, a unique confusion matrix is built, and based on its true positive, false positive, false negative and true negative values, accuracy, precision, and recall are calculated. the accuracy of a model is the fraction of correct predictions and is calculated according to the following equation:
accuracy = (tp + tn) / (tp + tn + fp + fn)
but accuracy has some very common problems. the major problem is that accuracy is not a good measure when the data set is unbalanced across the classes. since our data set is balanced across the two classes, accuracy can be used as a performance evaluation method. precision is the fraction of relevant instances among the retrieved instances; here, precision for the offensive class is the fraction of actual offensive comments among the predicted offensive comments. recall is the fraction of the relevant instances that are successfully retrieved. since it does not involve true negatives, it is a good measurement in our study: it measures the correctly predicted instances among all actual instances of a particular class label, and therefore it can be taken as the main measurement here. f1-score is the harmonic mean of precision and recall. it ensures that there is no over-reliance on either precision or recall. therefore, the f1-score is considered as another performance measurement in this study.
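the measurements above can be computed directly from the confusion matrix. the following hedged sketch uses scikit-learn's metrics on placeholder labels (1 = offensive, 0 = neutral), with the standard formulas noted in the comments.

```python
# illustrative computation of the evaluation measurements from predicted labels.
#   accuracy  = (tp + tn) / (tp + tn + fp + fn)
#   precision = tp / (tp + fp)
#   recall    = tp / (tp + fn)
#   f1-score  = 2 * precision * recall / (precision + recall)
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0]   # placeholder gold labels
y_pred = [1, 1, 0, 0, 0, 1]   # placeholder model output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("tp, fn, fp, tn:", tp, fn, fp, tn)
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision (offensive):", precision_score(y_true, y_pred))
print("recall (offensive):", recall_score(y_true, y_pred))
print("f1-score (offensive):", f1_score(y_true, y_pred))
```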
the results obtained through every experiment are discussed below. a. experiment 01 results as described for experiment 01, we tested the dictionary-based lexicon with the 200 evenly distributed comment corpus. the results obtained through the experiment are listed in table xv and its confusion matrix is listed in table xvi.
table xv performance measurements of the dictionary-based approach
abusive: precision 0.83, recall 0.10, f1-score 0.18 (overall accuracy 0.54)
neutral: precision 0.52, recall 0.98, f1-score 0.68
though the model has 0.54 accuracy, the recall of the abusive class is 0.10. as described previously, recall plays a major role among the other performance measurements, and a high recall value builds trust in the model. therefore, with this low recall value for the offensive class, we cannot conclude that a cross-lingual dictionary is a good approach for sinhala abusive speech detection.
table xvi confusion matrix of experiment 01
true offensive: predicted offensive = 10, predicted neutral = 90
true neutral: predicted offensive = 2, predicted neutral = 98
translated words contain the dictionary definitions of words rather than their practical forms; that is why these measurements gave low values. therefore, we can conclude that abusive speech identification cannot be done efficiently by translating one language's hate or offensive words into another language. b. experiment 02 results since experiment 01 is not efficient and sufficient for sinhala abusive speech identification, the constructed corpus-based lexicon was tested with the same 200 comment corpus. the results obtained through the process are listed in table xvii and the confusion matrix is listed in table xviii.
table xvii performance measurements of the corpus-based approach
abusive: precision 0.98, recall 0.83, f1-score 0.90 (overall accuracy 0.905)
neutral: precision 0.85, recall 0.98, f1-score 0.91
the corpus-based lexicon gives 0.905 accuracy in the process of abusive speech identification, and this accuracy is greater than that of the dictionary-based lexicon approach. therefore, it is clear that corpus-based lexicons are more effective than dictionary-based translated lexicons in the process of sinhala hate speech detection.
table xviii confusion matrix of experiment 02
true offensive: predicted offensive = 83, predicted neutral = 17
true neutral: predicted offensive = 2, predicted neutral = 98
though the corpus-based approach gave 90.5% accuracy, it is not sufficient for abusive language detection, since the lexicon approaches depend on already classified words. once these approaches meet unseen abusive words, they will not be able to identify them as abusive. therefore we need to consider machine learning approaches. c. experiment 03 results in this experiment, we trained three different classifiers with five different feature groups and therefore obtained fifteen models to test. since we have limited space here, we considered only one confusion matrix. the performance measurements for each feature group with each classifier are listed in table xix. as table xix shows, the word n-gram feature with mnb gives the highest accuracy for the w01 feature group, which contains unigram features. feature groups w04 and w05 have also shown better accuracies and recall values.
though w02 and w03 contain more information than w01, they show low accuracies in sinhala hate speech detection. although w01 shows the highest accuracy for rfdt as well as mnb, w04 gave the highest accuracy for the svm classifier. therefore, it is clear that the same feature groups may not be suitable when classifiers are changed. among three classifiers, mnb has given the highest accuracy with the highest f1-score for the w01 feature group. table xix performance measurements of word n-gram features class feature groups w01 w02 w03 w04 w05 mnb classifier precision a 0.94 0.63 0.53 0.93 0.92 n 0.95 0.84 1.00 0.93 0.94 recall a 0.95 0.91 1.00 0.93 0.94 n 0.94 0.47 0.11 0.93 0.92 f1-score a 0.95 0.75 0.69 0.93 0.93 n 0.94 0.60 0.20 0.93 0.93 accuracy 0.945 0.69 0.555 0.93 0.93 svm classifier precision a 1.00 0.95 1.00 0.96 0.95 n 0.79 0.55 0.51 0.86 0.86 recall a 0.74 0.18 0.05 0.85 0.84 n 1.00 0.99 1.00 0.96 0.96 f1-score a 0.85 0.30 0.10 0.90 0.89 n 0.88 0.70 0.68 0.91 0.91 accuracy 0.87 0.585 0.525 0.905 0.90 rfdt classifier precision a 1.00 0.59 0.52 0.88 0.83 n 0.85 0.89 1.00 0.88 0.85 recall a 0.83 0.96 1.00 0.88 0.86 n 1.00 0.34 0.06 0.88 0.82 f1-score a 0.91 0.73 0.68 0.88 0.84 n 0.92 0.49 0.11 0.88 0.84 accuracy 0.915 0.65 0.53 0.88 0.84 a-abusive nneutral since all the confusion matrices cannot be listed, in table xx we presented mnb’s wo1 feature group confusion matrix just because it gave the highest accuracies among other feature groups and models. table xx confusion matrix of experiment 03’s mnb with w01 feature group predicted class true class offensive neutral offensive 95 5 neutral 6 94 d. experiment 04 results according to the performance measurements, we can conclude that all the feature groups that we considered in character n-gram are good at sinhala abusive comments identification process as it gives higher precision, recall, and f1-score values regardless of the model that we used to train. when we compare the precision, recall, f1-score, and accuracy values of the mnb, svm, rfdt classifiers from the table xxi, it is clear that the mnb has outperformed the other two models for the character n-gram model. table xxi performance measurements of character n-gram features class feature groups c01 c02 c03 c04 c05 mnb classifier precision a 0.97 0.99 0.98 0.99 0.98 n 0.95 0.93 0.95 0.94 0.94 recall a 0.95 0.93 0.95 0.94 0.94 n 0.97 0.99 0.98 0.99 0.98 f1-score a 0.96 0.96 0.96 0.96 0.96 n 0.96 0.96 0.97 0.97 0.96 accuracy 0.96 0.96 0.965 0.965 0.96 svm classifier precision a 0.99 0.98 0.99 0.96 0.95 n 0.76 0.92 0.88 0.95 0.97 recall a 0.69 0.92 0.86 0.95 0.97 n 0.99 0.98 0.99 0.96 0.95 f1-score a 0.81 0.95 0.92 0.95 0.96 n 0.86 0.95 0.93 0.96 0.96 accuracy 0.84 0.95 0.925 0.955 0.96 rfdt classifier precision a 0.94 0.99 0.98 0.98 0.99 n 0.89 0.90 0.88 0.91 0.89 recall a 0.88 0.89 0.87 0.90 0.88 n 0.94 0.99 0.98 0.98 0.99 f1-score a 0.91 0.94 0.92 0.94 0.93 n 0.91 0.94 0.93 0.94 0.94 accuracy 0.91 0.94 0.925 0.94 0.935 a-abusive nneutral further, all three models have given good performances for c03, c04 and c05 feature groups. the confusion matrix for mnb with character four-gram is listed in table xxii. identification of abusive sinhala comments in social media using text mining and machine learning techniques 12 international journal on advances in ict for emerging regions april 2020 table xxii confusion matrix of experiment 04’s mnb with co3 feature group predicted class true class offensive neutral offensive 95 5 neutral 2 98 e. 
experiment 05 results in this section, we discuss the results obtained through the models that were trained by word skip-gram features. as previously described in two subsections, here we also tested ten models with a test comment corpus which has 200 evenly distributed comments. table xxiii shows the performance measurements that were obtained. table xxiii performance measurements of word skip-gram features class feature groups s01 s02 s03 s04 s05 mnb classifier precision a 0.59 0.54 0.95 0.95 0.95 n 0.85 0.82 0.97 0.96 0.96 recall a 0.94 0.96 0.97 0.96 0.96 n 0.35 0.18 0.95 0.95 0.95 f1-score a 0.73 0.69 0.96 0.96 0.96 n 0.60 0.30 0.96 0.95 0.95 accuracy 0.645 0.57 0.96 0.955 0.955 svm classifier precision a 0.52 0.54 0.84 0.99 0.99 n 0.80 0.79 0.95 0.88 0.88 recall a 0.98 0.95 0.96 0.86 0.86 n 0.08 0.19 0.82 0.99 0.99 f1-score a 0.68 0.69 0.90 0.92 0.92 n 0.15 0.31 0.88 0.93 0.93 accuracy 0.53 0.57 0.89 0.925 0.925 rfdt classifier precision a 0.51 0.52 0.89 0.89 0.86 n 0.78 1.00 0.90 0.86 0.89 recall a 0.98 1.00 0.90 0.86 0.89 n 0.07 0.07 0.89 0.89 0.86 f1-score a 0.68 0.90 0.87 0.88 0.67 n 0.13 0.89 0.88 0.87 0.13 accuracy 0.525 0.535 0.895 0.875 0.875 a-abusive nneutral though all skip-gram feature groups show more than 50% accuracy, recall for neutral comments are less than 40% in both s01 and s02 feature groups. according to the results, feature groups s01 and s02 with mnb tend to classify neutral comments as abusive comments. from all measurements, it shows that feature group s03 is the best to identify offensive speeches in sinhala with multinomial naïve bayes classifier. by focusing on the s01 and s02 feature groups, we can conclude that s01 and s02 are not good to detect sinhala offensive speeches with svm since their recall measurement of neutral speech is less than 20%. these two models tend to classify neutral comments wrongly as offensive comments. both s04 and s05 feature groups show the best accuracy as 92.5% with the svm classifier. random forest classifier performs worst with s01 and s02 feature groups since they have low recall values for neutral speeches. the highest average f1-score and accuracy has obtained for the s03 feature group. therefore s03 ssssis the best word skip-gram feature that can be used to identify sinhala offensive speeches. from these experiments, it is clear that character fourgram (c03), feature group c04 and feature group c05 are the best features for sinhala abusive speech detection since it gives high accuracies and f1-scores for all classification models. therefore, we can conclude that these features can be taken in the process of sinhala abusive speech identification effectively though the classification models are changed. as the test results show, we can conclude that mnb performs well with each feature group than the other two classifiers. v. conclusion and future works in this study, we have identified the best features and classification models that can be used in sinhala abusive speech identification with machine learning. further, we built two new lexicons which contain sinhala abusive words. abusive speech detection was done by using these two lexicons and we observed that corpus-based lexicons are the best approaches in sinhala abusive speech detection process. as per the results we obtained through the experiments, character four-gram (c03), c04 and c05 features have outperformed all other feature types that were considered in this study. 
since many abusive speeches are published with spelling mistakes and substituted with similar characters, we can conclude that character n-gram features perform well in abusive sinhala speech detection. regardless of the feature extraction method, when we focus on the performance measurement values of table xix, table xxi, and table xxiii, it is clear that the rfdt classifier has the least accuracy values for the sinhala abusive comments detection. since we considered only supervised methods, it is an open area to apply unsupervised learning techniques to identify sinhala hate speeches. as the features of this research, we only considered macro features such as word n-grams and character n-grams. therefore, micro features such as patterns of the speeches using pos tags, presence and frequencies of punctuation marks and word counts in a sentence can be used to identify sinhala hate speeches. here we only considered abusive speeches which are published using sinhala unicode, because we did not have enough comments to consider singlish (sinhala words are written in english) speeches. therefore, singlish is also an open area to be considered in hate speech detection domain. working with these transliterated forms (singlish words) may be a challenging task as singlish comments contain both pure english and sinhala comments. it will make the stemming and other pre-processing techniques as well as feature vectorization techniques too complex. further, comparison between stop word removing and without removing stop words should also be investigated, since sinhala is a rich morphological language. s acknowledgment we would like to thank the people who support us to make research data set available. 13 h.m.s.t. sandaruwan#1, s.a.s. lorensuhewa#2, m.a.l. kalyani#3 april 2020 international journal on advances in ict for emerging regions references [1] medagoda, n. (2017). framework for sentiment classification for morphologically rich languages: a case study for sinhala. http://aut.researchgateway.ac.nz/handle/10292/10544 [2] “liking violence: a study of hate speech on facebook in sri lanka” [online] available: https://www.cpalanka.org/wpcontent/uploads/2014/09/hate-speech-final.pdf [3] gitari, n. d. et al. (2015) ‘a lexicon-based approach for hate speech detection’, international journal of multimedia and ubiquitous engineering, 10(4), pp. 215–230. doi: 10.14257/ijmue.2015.10.4.21. [4] “the hate directory” [online] available: http://www.hatedirectory.com/ [5] riloff, e. and wiebe, j. (2003) ‘learning extraction patterns for subjective expressions’, proceedings of the 2003 conference on empirical methods in natural language processing -, 10, pp. 105–112. doi: 10.3115/1119355.1119369. [6] cambria, e. et al. (2010) ‘senticnet : a publicly available semantic resource for opinion mining’, artificial intelligence, pp. 14–18. doi: 10.1038/leu.2012.122. [7] köffer, s. et al. (2018) ‘discussing the value of automatic hate speech detection in online debates’, multikonferenz wirtschaftsinformatik, (october), pp. 83–94. doi: 10.1111/j.13652923.2008.03277.x. [8] dias, d. s., welikala, m. d. and dias, n. g. j. (2019) ‘identifying racist social media comments in sinhala language using text analytics models with machine learning’, (september), pp. 1–6. doi: 10.1109/icter.2018.8615492. [9] gallege, s. (2004) ‘analysis of sinhala using natural language processing techniques’. 
[10] “google bad word list” [online] available: https://www.freewebheaders.com/full-list-of-bad-words-banned-bygoogle/ [11] language technology research laboratory [online] available: http://ltrl.ucsc.lk/ [12] welgama, v. (2011) ‘evaluation of a shallow stemming algorithm for sinhala’, language, (35), p. 2009. [13] malmasi, s. and zampieri, m. (2017) ‘detecting hate speech in social media’, pp. 467–472. doi: 10.26615/978-954-452-049-6_062. [14] countvectorizer documentation [online] available: https://scikitlearn.org/stable/modules/generated/sklearn.feature_extraction.text.co untvectorizer.html [15] central intelligence agency world fact book (https://www.cia.gov/library/publications/theworldfactbook/geos/ce.html) [16] mubarak, h., darwish, k. and magdy, w. (2017) ‘abusive language detection on arabic social media’, proceedings of the first workshop on abusive language online, pp. 52–56. available at: https://drive.google.com/open?id=0b4xdagbwzjjqzkrzltrzctq 0zke. [17] nandasara, s. t. (2015) ‘bridging the digital divide in sri lanka : some challenges and opportunities in using sinhala in ict bridging the digital divide in sri lanka : some challenges and opportunities in using sinhala in ict’, (may). doi: 10.4038/icter.v8i1.7162. [18] davidson, t. et al. (2017) ‘automated hate speech detection and the problem of offensive language’, (icwsm), pp. 512–515. doi: 10.1561/1500000001. [19] alfina, i. et al. (2018) ‘hate speech detection in the indonesian language: a dataset and preliminary study’, 2017 international conference on advanced computer science and information systems, icacsis 2017, 2018–janua(october), pp. 233–237. doi: 10.1109/icacsis.2017.8355039. http://ltrl.ucsc.lk/ ieee paper template in a4 (v1) international journal on advances in ict for emerging regions 2020 13 (2): june 2020 international journal on advances in ict for emerging regions invisible colour image watermarking technique for colour images using dwt and svd tharindu ketipearachchi#1, manjusri wickramasinghe#2 abstract— recent popularity of social media and content sharing networks paired with fast internet access has paved the way for mass multimedia content sharing on the internet. as a result, volumes of sensitive data are been shared in various digital forms such that the risk of copyright infringements and misappropriations of digital content has grown at the same rate. in order to preserve copyright or authenticate digital content, digital watermarking has been considered as a viable solution for these problems. in the domain of digital watermarking, embedding a colour image as an invisible watermark to a colour cover image remain a challenge. as a solution to this problem, this paper presents such an invisible watermarking scheme based on dwt and svd. the watermark created using the novel approach is experimentally shown to have the robustness, imperceptibility and capacity requirements. keywords— image watermarking, dwt, svd, colour image watermarking, invisible watermarking. i. introduction at present, most individuals on the internet are used to post images and videos of their day-to-day activities on social media and sharing platforms. as a result, volumes of sensitive data have been spread across the internet making such data vulnerable to unauthorized access. such unauthorized access paves the way for intellectual property violations which are aggravated by the accessibility of the internet. 
in recent years, unauthorized use, misappropriating, and misrepresentation [1] of digital data is on the rise with the recent technological advancements with copyright infringements and misappropriations are being considered the major threats. due to the vast amount of data being shared, difficulty in tracking such data copyright violations are a common sight in the internet. furthermore, as a result such infringements, thousands of cases are being filed at courts for proof of ownership. the ability to prove the ownership of an image or digital media is the key to claim of copyright violation. however, there is no accepted standard to prove the ownership of digital data if it was not filled in an intellectual property registry which costs a significant amount of money for the digital content developers. at present, the concept of digital watermarking has emerged as a popular method to prove ownership. when including a watermark into digital data there are a set of aspects that govern the quality of the embedding a watermark to a digital data. these aspects are that the embedded watermark should not cause damage to the actual data, be robust against possible attacks and should have a reasonable capacity. the algorithm explained in the paper represents an invisible colour image watermarking scheme with an informed (non-blind) detector based on discrete wavelet transformation (dwt) and singular value decomposition (svd) for colour images. the watermarks embedded by the algorithm is experimentally shown to satisfy the quality aspects mentioned above. the remainder of the paper is structured as follows. section ii provides the related work and the primary motivation for this research followed by the proposed method in section iii. section iv presents the experiments conducted and the evaluation and this paper concludes in section v. ii. motivation when considering the domain of digital watermarking all algorithms can be broadly categorised in two types as visible watermarking algorithms and invisible watermarking algorithms. a. visible image watermarking in early days, most of the research on digital watermarking concentrated on visible image watermarking techniques [2] – [3]. thus, this branch of watermarking has been studied extensively and is considered a solved problem with many algorithms meeting the necessary quality aspects of a watermark. however, the visible image watermarking is not a viable solution for the copyright infringement problem. since the watermark is visible it can be easily cropped or digitally process to obscure such a watermark. furthermore, such a watermark also harms the fidelity of the cover work as it deprives the user of its original experience. such shortcomings in visible image watermarking has paved the way for invisible image watermarking. b. invisible image watermarking the core principle behind the invisible image watermarking is that the embedding algorithm embeds an watermark to the cover image in invisible way such that it is undetectable to the user of such digital media [4]. this method is considered a better solution as it does not disturb the fidelity cover image. number of studies exist in the literature on implementing invisible watermarking scheme. accordingly such schemes can be categorized as belonging to four class as spatial domain techniques, frequency domain techniques, colour histogram and colour quantization [5]. 
among these classes, frequency domain techniques are the popular method and techniques such as discrete wavelet transform (dwt), discrete cosine transform (dct), discrete fourier transform (dft) and singular value decomposition (svd) are some of the mostly used frequency domain techniques [6]. invisible image watermarking can be further divided as schemes that embed grayscale images, strings and signals and schemes that embed colour images. 1) invisible watermarking of grayscale images, strings and signals: at nascent invisible watermarking schemes embeded text strings or numbers as a watermark for grayscale cover images. for example, swanson et al. [7] proposed an algorithm to embed pseudo-noise sequences as signature for a grayscale image. although, this algorithm has guaranteed the robustness of the watermark, its fidelity dropped when it comes to jpeg coded version of the image. subsequent cox tharindu ketipearachchi and manjusri wickramasinghe are from the university of colombo school of computing (ucsc), sri lanka. (katramesh91@gmail.com#1, mie@ucsc.cmb.ac.lk#2). manuscript received on 2nd nov. 2019. recommended on 23rd june, 2020. invisible colour image watermarking technique for colour images using dwt and svd 2 international journal on advances in ict for emerging regions june 2020 et al. [8] proposed a transparent watermarking scheme using frequency domain techniques fast fourier transformation (fft) and dct. the scheme is robust against many kinds of attacks such as image cropping and jpeg compression. according to the cox et al. only number sequence can be encoded using this algorithm. in their research they have encoded randomly generated number sequence to the colour image and recovered it successfully. furthermore, it is illustrated that the algorithm was able maintain the quality of the cover image after the watermarking process as well. this is comparatively better than the approach introduced in swanson’s et al. [9]. similarly, fleet and heeger employed sinusoidal signal embedding as a way to achieve invisible watermarking in colour images [10]. the techniques involved using s-cielab technique on the yellow and blue components of the cover image for signal encoding. the scheme is shown to be robust, efficient and the deviation between cover and watermarked images are reasonably low. however, this scheme is only supported for sinusoidal signal encoding. another blind watermarking scheme based on the dwt and the arnold transformation was proposed by dharwadkar and amberker [12]. this algorithm has facilitated an opportunity to embed vector image as a watermark instead of the signals and strings. although, this algorithm was illustrated to be resilient to different types attacks, the decoded watermark image is bit unclear. santi and thangavelu has also suggested an approach to embed vector image to a colour image [13]. they have proposed an algorithm in which the features of dwt, dct, svd techniques are combined. this solution is robust against various kind of attacks such as salt and pepper noise, gaussian noise, gaussian blur, cropping, colour contrast and compression attacks. similarly, watermarks embedded in high frequency components are resistant to image sharpening, histogram equalization, and resize. extracted watermark also with the reasonably better quality. but the proposed algorithm is not robust to the rotation attack. humming et al. [14] proposed another approach to embed grayscale vector image to a colour image [14]. 
the algorithm is an efficient watermarking embedding algorithm for colour image which combines the spectral characteristics of hvs and the characteristics of green component in a colour image. however, this proposed algorithm is less robust against rotating. when considering the deviation between the cover image and watermarked image, quality of the extracted image, is minimal. when we consider above researches, there are number solutions available for invisible image watermarking on grayscale images as well as text string and grayscale image embedding on colour images. hence, it can be construed to be a solved problem. 2) invisible watermarking of colour images: when considering the embedding of a colour image on to a colour image cover several solutions are found in literature. however, the challenge is to achieve all the basic watermarking requirements in single approach. imperceptibility, capacity and robustness are the major requirements of a good watermarking scheme. finding such a scheme in this category of invisible watermarking remains a challenge. chou and wu [15] has proposed an efficient algorithm for invisible colour image watermarking based on the image quantization approach for watermark encoding. the scheme is computationally simple and quite robust in face of various attacks such as cropping, low-pass filtering, whitenoise addition, scaling, and jpeg compression with high compression ratios. however, the algorithm is only tested for vector image and when raster colour images as watermarks are used there might be issues with extracted watermark. bas et al. [16] suggested an invisible colour image watermarking scheme using the hyper complex numbers representation and the quaternion fourier transform (qft). eventhough, this scheme was able to embed a watermark with minimum deviation of the cover image, they have not described much about the watermark extractions results. however, this scheme has evaluated for different colour image filtering process (jpeg, blur) and the fact that perceptive qft embedding can offer robustness to luminance filtering techniques. chan et al. [17] and others proposed another image quantization based watermarking scheme. this scheme used colour quantization and principal component analysis (pca). the image quality of the extracted watermark image is of acceptable quality. however, this algorithm is not properly tested against attacks. robustness of the algorithm yet to be assured. mohanty et al. [18] proposed a novel approach to invisible colour image watermarking using dct. but in here, they insert watermark image to part of the base image to make the actual watermark image (perceptually) which is very unclear. after the extraction process they were only able to obtain that perceptual image which is unclear. the size of the watermark image is very small. however, the robustness of the algorithm is assured. kaarna et al. [19] have proposed another scheme using ica, dwt and dct with unclear extracted watermark. agarwal and venugopalan [20] has suggested a scheme that divide both cover image and watermark image to its rgb components and embed respective components separately [20]. the algorithm is based on the well-known matrix factorization technique of the singular vector decomposition. the robustness of the algorithm was proven against various attacks. but the extracted watermark image is not sufficiently clear enough. chaitanya et al. [21] and others suggested a scheme using dwt and dct. they have used colour vector image as a watermark. 
performance with the raster images may be doubtful. since only blue component is used to embedding, size of the watermark image is small when compared to the cover image. in here they have used 1024×1024 cover image and 32×32 watermark image. pradhan [4] has suggested an approach for invisible image watermarking using dct, dwt and svd. although, in grayscale images this approach provides accurate results, the algorithm fails in coloured watermark images. it failed to extract similar image to the original watermark image from the watermarked image. vishnavi and subashini has proposed scheme using svd [22]. this algorithm can embed and extract the colour image perfectly. but in this they are only embedding blue component of the watermark image to the cover image. when considering the above solutions it can be concluded that embedding colour image to a colour image as an invisible watermark still remain an open problem. different researchers have achieved some of the basic watermarking requirements. but none of them were able to achieve all the basic requirements together in one solution. there is no perfect solution found yet. almost all of them are need to be improved. 3 tharindu ketipearachchi#1, manjusri wickramasinghe#2 june 2020 international journal on advances in ict for emerging regions iii. proposed method as per the background research, there are number of issues identified with respect to invisible colour image watermarking on a colour image cover. the issues identified on various schemes can be categorized into three aspects as attempting to solve the problem using single technique such as dct or incorrect combination of techniques, applying the techniques that were successful with grayscale images to the colour images in a similar manner, trying to embedded watermark to the entire image. by considering the pros and cons of technologies of each domain we have selected dwt and svd as our base techniques to build our novel watermarking scheme. frequency domain was chosen due to the higher compression ratio and good localization and dwt can preserve a higher fidelity and a higher capacity. however, as observed with other frequency domain techniques as well, dwt is less robust against attacks. in order to overcome this vulnerability, dwt was combined with svd which makes watermarks more robust against attacks. furthermore, svd does not compromise the basic strengths of the dwt techniques. to achieve the desired robustness against attacks, we augment the the resulting image matrix with the svd components without changing the original output form. 1) discrete wavelet transform (dwt) dwt is a neoteric technique consecutive used in digital image processing, compression, digital watermarking etc [6]. this is widely used very popular technique in digital watermarking domain since its more efficient than other similar techniques such as the dct. in dwt, image is dissolved into high and low frequency elements in two level discrete wavelet transform (dwt). the robustness with respect to various attacks increases when the watermark is embedded in low frequencies gained by the wd (wavelet decomposition). at first, the digital media is segmented into frames, then discrete wavelet transform is applied to luminance element of each frame which outcomes into discrete sub bands. subsequently these bands are dissolved into discrete components and the covariance matrix is computed for each such component. 
the watermarked luminance component of the frames is obtained by applying inverse discrete wavelet transform. ultimately watermarked digital media is gained by renewing the watermarked frame [6,23,24]. 2) svd(singular value decomposition): singular value decomposition is a numerical technique which is utilized to diagonalize matrices in numerical analysis [25]. in variety of applications singular value decomposition is used as an algorithm. in this singular value decomposition transformation, one matrix can be dissolved into three matrices. these matrices are of the equal size as the original matrix. by the linear algebra, an image is an array of nonnegative entries of scalar values that can be deduced as a matrix [23]. a. dwt/svd based approach for invisible image watermarking. the proposed watermarking scheme uses the 2dimensional dwt with ‘haar’ signal along with svd. the major concept of suggested approach is to segment both cover and watermark colour images to its red, blue, green components and apply the watermarking process separately for each component. subsequently, merge the watermarked components again to form the watermarked image. the dwt/svd major image watermarking algorithm explained in section iii.2(b) followed by the component based watermarking algorithm to embed watermark image to red, blue, green components in section iii.2(c). b. dwt/svd major image watermarking algorithm in this major watermarking algorithm both the cover image and the watermark image will be decomposed to red, blue, green components. as we illustrated in the figure 1, cover image is the image which we are going to insert the watermark. watermark image is the image which we are going to insert as the watermark. first, we decompose cover image into the components r, g, b components. br, bg, bb are the resulted red, green, blue components of the cover image respectively. by applying the same process to the watermark image, wr, wg, wb will output as the respective red, green, blue components of the watermark image. as illustrated in fig. 1, the decomposed components are then used as the inputs to the component based watermarking algorithm component based watermarking algorithm which will be explained in the next section iii.2(c). the inputs to the component-based algorithm are of three components. these are the corresponding rgb components (e.g. br, wr) of the cover and the watermark image and the watermark strength factor ( 0 ≤  ≤ 1). once the component-based algorithm is applied to the each components of the cover and watermark images, the algorithm will output the corresponding component of the watermarked image (e.g r’). this process is applied to all components of the images separately by providing a separate alpha value. c. dwt/svd component based watermarking algorithm as mentioned in the previous section component-based algorithm designed as suitable for 1-layer images (e.g single component of an image in case of rgb). the algorithm applies transformation techniques to the image components followed by the embedding of the watermark as detailed below. fig. 1 dwt/svd major image watermarking algorithm invisible colour image watermarking technique for colour images using dwt and svd 4 international journal on advances in ict for emerging regions june 2020 fig. 2 dwt/svd component based watermarking process as illustrated in fig. 2, initial step is to apply 2dimensional dwt to the cover image component using ‘haar’ dwt signal type. 
as a result, the approximate image (ll1) and the other detail components of the image (hl1, lh1, hh1) are obtained. in the second step, the approximate image component (ll1) is reapplied with the 2d dwt transformation, which results in the ll2, hl2, lh2, hh2 components respectively. as the third step, svd is performed on the ll2 component and on the watermark image component, obtaining the resulting matrices [uy, sy, vy] and [uw, sw, vw] respectively. then the s matrices of both decompositions are combined with the alpha value to obtain the new s component. the combination of the s components is obtained as:
smark = sy + α × sw
where smark is the combined s component and α is the embedding strength (0 ≤ α ≤ 1). then we rebuild the approximate image component ll2 such that
ll2m = uy × smark × vy′
where uy and vy are the matrices obtained from the svd of the cover image's ll2 component, vy′ is the transpose of vy, and ll2m is the new ll2 component. then we combine this ll2m component with the earlier derived hl2, lh2, hh2 components and apply the inverse 2d dwt process, which derives another approximate image component (ll1m):
[ll2m, hl2, lh2, hh2] → (inverse 2d dwt, 'haar') → ll1m
then we combine the ll1m component with hl1, lh1, hh1 and apply the inverse 2d dwt process again, which gives the watermarked image component (wm) as the result:
[ll1m, hl1, lh1, hh1] → (inverse 2d dwt, 'haar') → wm
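a minimal sketch of these component-level embedding steps is given below, assuming python with numpy and pywavelets (the paper does not state its implementation language), and assuming the cover channel is square with its second-level approximation matching the watermark channel's size. variable names follow the notation above, but the code is only an illustration, not the authors' implementation.

```python
# hedged sketch of the component-based embedding: two levels of 'haar' dwt on the
# cover channel, svd of ll2 and of the watermark channel, singular values combined as
# smark = sy + alpha * sw, then two inverse dwt steps rebuild the watermarked channel.
import numpy as np
import pywt

def embed_component(cover_ch, wm_ch, alpha):
    ll1, (hl1, lh1, hh1) = pywt.dwt2(cover_ch, 'haar')    # first-level decomposition
    ll2, (hl2, lh2, hh2) = pywt.dwt2(ll1, 'haar')         # second-level decomposition
    uy, sy, vyt = np.linalg.svd(ll2)                      # svd of the cover's ll2
    uw, sw, vwt = np.linalg.svd(wm_ch)                    # svd of the watermark channel
    smark = sy + alpha * sw                               # combined singular values
    ll2m = uy @ np.diag(smark) @ vyt                      # rebuilt ll2 component
    ll1m = pywt.idwt2((ll2m, (hl2, lh2, hh2)), 'haar')    # inverse dwt, level 2
    wm_out = pywt.idwt2((ll1m, (hl1, lh1, hh1)), 'haar')  # inverse dwt, level 1
    return wm_out, (uw, vwt, sy)   # uw, vwt and sy would be kept for non-blind extraction

cover = np.random.rand(256, 256)      # placeholder cover channel
watermark = np.random.rand(64, 64)    # placeholder watermark channel (ll2-sized)
watermarked, keys = embed_component(cover, watermark, alpha=0.1)
print(watermarked.shape)              # (256, 256)
```

applying this to each red, green and blue component with its own alpha value and merging the results corresponds to the major algorithm of fig. 1.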
d. dwt/svd major image watermark extraction algorithm
this algorithm is also based on the red, green and blue component decomposition. a non-blind watermark extraction approach is followed, in which some components of the original watermark image are required for the extraction process; the entire image, however, is not required. the watermark extraction process mirrors the embedding process, in which a major algorithm is followed by the component-based step. this section explains the major watermark extraction algorithm.
fig. 3 dwt/svd major image watermark extraction algorithm
as illustrated in fig. 3, the first step is to decompose the watermarked image into its red, green and blue components. then the relevant alpha value, together with the [u_w, v_w] matrices of the original watermark component, is supplied, and the component-based watermark extraction process is applied to each component separately. given these inputs, the component-based algorithm generates the extracted watermark image component, and merging the three red, green and blue components forms the extracted watermark image.

e. dwt/svd component based watermark extraction algorithm
fig. 4 dwt/svd component-based watermark extraction process
fig. 4 depicts the complete flow of the component-based watermark extraction algorithm. initially the 2d dwt with the 'haar' wavelet is applied to the watermarked image component. this decomposes the image into an approximate image and detail components [ll1wm, hl1wm, lh1wm, hh1wm]. the ll1wm component is then taken and the 2d dwt is reapplied, which results in another four components [ll2wm, hl2wm, lh2wm, hh2wm]. svd is now applied to the generated ll2wm component to produce the three matrices [u_yw, s_yw, v_yw]. using the s component of this decomposition and the s component of the previously decomposed ll2 of the cover image (s_y), the s component of the extracted image (s_wrec) is generated:

s_wrec = (s_yw − s_y) / α

using this s_wrec component and the provided [u_w, v_w] matrices of the original watermark image, the watermark image component is reconstructed:

img_ex = u_w × s_wrec × v_w'

that is, following svd theory, the extracted watermark component is formed as the product of u_w, s_wrec and the transpose of v_w.
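a matching sketch of the component-based extraction, under the same assumptions as the embedding sketch (pywt and numpy, not the authors' code). the key tuple (u_w, v_w', s_y) is the side information returned by the embed_component sketch above.

```python
import numpy as np
import pywt

def extract_component(watermarked_plane, key, alpha):
    """Recover one watermark plane from a watermarked plane.

    key = (Uw, Vwt, Sy): Uw and Vw' of the original watermark plane and Sy of
    the cover's LL2 sub-band, as required by the non-blind scheme above."""
    Uw, Vwt, Sy = key

    # two-level haar decomposition of the watermarked plane
    LL1w, _details1 = pywt.dwt2(watermarked_plane, 'haar')
    LL2w, _details2 = pywt.dwt2(LL1w, 'haar')

    # singular values of the watermarked LL2 sub-band
    _Uyw, Syw, _Vywt = np.linalg.svd(LL2w, full_matrices=False)

    # s_wrec = (s_yw - s_y) / alpha, truncated to a common length
    n = min(len(Syw), len(Sy), Uw.shape[1])
    Swrec = (Syw[:n] - Sy[:n]) / alpha

    # img_ex = Uw * diag(s_wrec) * Vw'
    return Uw[:, :n] @ np.diag(Swrec) @ Vwt[:n, :]
```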
iv. experiment and evaluation
three main conflicting properties of a watermarking algorithm are commonly used to measure its quality: imperceptibility, robustness and capacity. the experimental setup attempts to show that the proposed approach achieves these conflicting requirements to an acceptable level. apart from these main requirements there is one additional requirement: the watermark image and the extracted watermark should look similar, i.e. recognizable; in other words, the deviation between the watermark image and the extracted watermark should also stay minimal. separate experiments are designed to evaluate each property and requirement, and the deviation between the watermark and the extracted watermark is also evaluated in each experiment for the conflicting properties. this section discusses the experiments and analyses their observations.

a. measurements
1) mean square error (mse)
the mse measures the error between two images by averaging the sum of squared differences of their pixel values:

mse = (1 / mn) Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} (p1(i, j) − p2(i, j))²

where p1 and p2 represent two images of size m × n. a lower mse signifies a smaller error in the reconstructed image [26].

2) peak signal to noise ratio (psnr)
the psnr estimates the quality of a reconstructed image in comparison to the original; it is a standard way of measuring image fidelity [27]. here the 'signal' is the original image and the 'noise' is the error in the modified image. psnr is a single number that reflects the quality of the reconstructed image and is measured in decibels (db). it is most easily defined using the mse of two monochrome images p1 and p2, where one image is considered a noisy approximation of the other:

psnr = 10 log10(max_p1² / mse) = 20 log10(max_p1 / √mse)

here max_p1 is the maximum possible pixel value of the image; for pixels represented using 8 bits per sample this is 255 [26].
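the two measures above can be computed directly; the following is a small sketch in python with numpy, matching the definitions given in this section.

```python
import numpy as np

def mse(p1, p2):
    """Mean squared error between two equally-sized images."""
    d = p1.astype(np.float64) - p2.astype(np.float64)
    return float(np.mean(d ** 2))

def psnr(p1, p2, max_val=255.0):
    """Peak signal-to-noise ratio in dB; max_val = 255 for 8-bit images."""
    err = mse(p1, p2)
    if err == 0:
        return float('inf')          # identical images
    return 10.0 * np.log10((max_val ** 2) / err)
```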
b. imperceptibility
imperceptibility is also called fidelity. fidelity corresponds to the visibility of artefacts introduced into an image by the watermarking process; simply put, it is the "perceptual similarity" between the watermarked and unwatermarked versions of an image, i.e. the difference between the cover image and the watermarked image. after the watermark embedding process, the cover image should still look the same as the original cover image. in order to evaluate the imperceptibility of the proposed algorithm, the difference between the cover image and the watermarked image is computed once the watermarking process has completed. the psnr and mse values between the cover image and the watermarked image are used to quantify the deviation between the images.

1) experiment
in this experiment we evaluate the imperceptibility of the proposed method. in the same setup the alpha values of the three components are varied iteratively in order to obtain an optimized set of alpha values for the suggested approach. fig. 5 depicts the high-level overview of the experiment.
fig. 5 imperceptibility evaluation experiment design
in this experiment the alpha value is increased by 0.1 for each execution. initially, all possible alpha value combinations for the red, green and blue components are generated. then, for each alpha value combination, the cover image and the watermark image are given as inputs, the watermarking algorithm is executed, and the psnr and mse values between the cover image and the watermarked image are computed and recorded. subsequent to the embedding of the watermark, the extraction of the watermark is also performed for every combination and the psnr and mse values between the watermark image and the extracted watermark image are obtained.

2) results analysis and evaluation
the following data were observed from the experiment explained in section iv.b(1) above; an excerpt of the recorded data is shown here.

table i extracted part of psnr, mse values observed for different alpha value combinations (without attacks)

r alpha  g alpha  b alpha   cover vs watermarked      watermark vs extracted
                            psnr      mse             psnr      mse
0.1      0.1      0.1       37.404    11.823          28.686    88.008
0.1      0.1      0.2       34.066    25.494          28.686    88.008
0.1      0.1      0.3       31.311    48.085          28.686    88.008
0.1      0.1      0.4       29.121    79.607          28.686    88.008
0.1      0.1      0.5       27.343    119.89          28.686    88.008
0.1      0.1      0.6       25.856    168.82          28.686    88.008
0.1      0.1      0.7       24.584    226.3           28.686    88.008
0.1      0.1      0.8       23.478    291.94          28.686    88.008
0.1      0.1      0.9       22.498    365.84          28.686    88.008
0.1      0.1      1.0       21.621    447.72          28.686    88.008
0.1      0.2      0.1       34.355    23.854          28.686    88.008
0.1      0.2      0.2       32.388    37.525          28.686    88.008
0.1      0.2      0.3       30.341    60.116          28.686    88.008
0.1      0.2      0.4       28.51     91.638          28.686    88.008
0.1      0.2      0.5       26.928    131.93          28.686    88.008
0.1      0.2      0.6       25.558    180.85          28.686    88.008
0.1      0.2      0.7       24.359    238.33          28.686    88.008
0.1      0.2      0.8       23.302    303.97          28.686    88.008
0.1      0.2      0.9       22.357    377.87          28.686    88.008
0.1      0.2      1.0       21.506    459.75          28.686    88.008
0.1      0.3      0.1       31.721    43.749          28.686    88.008
0.1      0.3      0.2       30.54     57.42           28.686    88.008
0.1      0.3      0.3       29.099    80.011          28.686    88.008
0.1      0.3      0.4       27.657    111.53          28.686    88.008
0.1      0.3      0.5       26.317    151.82          28.686    88.008
0.1      0.3      0.6       25.104    200.75          28.686    88.008
0.1      0.3      0.7       24.011    258.23          28.686    88.008
0.1      0.3      0.8       23.027    323.87          28.686    88.008
0.1      0.3      0.9       22.135    397.77          28.686    88.008
0.1      0.3      1.0       21.322    479.65          28.686    88.008

table i shows the extracted part of the data set obtained from the experiment described above. it can be noticed that the psnr and mse values between the cover image and the watermarked image change with the alpha values, while the psnr and mse values between the watermark image and the extracted watermark remain the same; they do not change with the alpha value combination. as explained in section iii, the following formula is used to embed the s component of the watermark into the s component of the cover image:

s_mark = s_y + α × s_w

the s component is rebuilt as per this formula using the alpha value, and a value between 0 and 1 is used as alpha. the formula is applied to the points of the image matrix of each red, green and blue component, where each point represents a pixel value in the range 0.0 to 255.0; values close to 0 are darker and values close to 255.0 are brighter. according to the formula above, the pixel values are incremented, which means that after the watermarking process the cover image becomes brighter. if it becomes too much brighter, the change is easily noticeable to the human eye. therefore, in order to minimize the brightness deviation resulting from the embedding process, alpha is chosen between 0 and 1: the pixel values of the watermark component are multiplied by a decimal value so that the values of the rebuilt component stay below 255, since a pixel cannot take a value higher than 255. moreover, since svd has already decomposed the single image matrix into a product of three matrices, the values being combined are not close to 255.

when 0.1 is used as the alpha value, it can be observed that each pixel changes by about 1.0 to 2.0 intensity levels; the watermarked image becomes brighter, but not by a noticeable amount. when 1.0 is used as the alpha value, the pixel values change by about 15.0 to 16.0 intensity levels, which is a perceptible change. values of alpha greater than 1.0 result in a change in pixel intensity that is easily observed. figs. 6, 7, 8 and 9 depict the original cover image and the watermarked images for 0.1, 1.0 and 2.0 alpha respectively. it is evident that the original cover image and the 0.1-alpha watermarked image look the same; there is no perceptually noticeable difference between them. at 1.0 alpha the image becomes much brighter and the difference is noticeable, and at 2.0 alpha the image has lost its identity altogether. due to the reasons stated above, the alpha value range between 0 and 1.0 was considered, with 0.1 increments.

fig. 10 shows the original image used as the watermark, and figs. 11, 12 and 13 show the extracted watermark images for watermark embedding using 0.1, 1.0 and 2.0 alpha values respectively. the extracted watermark looks the same for every value; it simply does not change with the alpha value. this is because the following formula is used to extract the s component:

s_wrec = (s_yw − s_y) / α

the alpha value is thus eliminated from the calculation, and hence the extracted image is independent of the alpha values unless the pixel values have been changed by an attack or some other kind of processing.

moving on to the data recorded by the algorithm, the image with the minimum mse and maximum psnr is considered the most successful, because minimum mse and maximum psnr mean the minimum deviation from the original image. by looking at the maximum psnr we can find the optimal combination of alpha values. table ii shows an extracted part of the recorded data: the maximum psnr is 37.404 and the minimum mse is 11.823, and both belong to the same alpha value combination, namely 0.1 for all components. when the psnr is higher the mse is lower, because the psnr is calculated directly from the mse according to the formula discussed earlier; considering one of these measures is therefore enough for the analysis.
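the sweep described in section iv.b(1) could be sketched as follows, reusing the embed_colour, extract_component, psnr and mse sketches above. the 10 × 10 × 10 grid of 0.1 steps yields 1000 alpha value combinations; the input arrays cover and watermark are assumed to be loaded elsewhere as HxWx3 uint8 images.

```python
import itertools
import numpy as np

def alpha_sweep(cover, watermark):
    """Embed and extract for every (r, g, b) alpha combination and record PSNR/MSE."""
    grid = [round(0.1 * i, 1) for i in range(1, 11)]          # 0.1 .. 1.0
    results = []
    for alphas in itertools.product(grid, repeat=3):           # 1000 combinations
        marked, keys = embed_colour(cover.astype(float),
                                    watermark.astype(float), alphas)
        extracted = np.dstack([
            extract_component(marked[:, :, c], keys[c], alphas[c])
            for c in range(3)
        ])
        results.append({
            'alphas': alphas,
            'cover_vs_marked': (psnr(cover, marked), mse(cover, marked)),
            'wm_vs_extracted': (psnr(watermark, extracted), mse(watermark, extracted)),
        })
    return results
```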
table ii optimized alpha value combination for watermarking without attacks

r alpha  g alpha  b alpha   cover vs watermarked      watermark vs extracted
                            psnr      mse             psnr      mse
0.1      0.1      0.1       37.404    11.823          28.686    88.008
0.1      0.1      0.2       34.066    25.494          28.686    88.008
0.1      0.1      0.3       31.311    48.085          28.686    88.008
0.1      0.1      0.4       29.121    79.607          28.686    88.008
0.1      0.1      0.5       27.343    119.89          28.686    88.008
0.1      0.1      0.6       25.856    168.82          28.686    88.008
0.1      0.1      0.7       24.584    226.3           28.686    88.008
0.1      0.1      0.8       23.478    291.94          28.686    88.008
0.1      0.1      0.9       22.498    365.84          28.686    88.008
0.1      0.1      1.0       21.621    447.72          28.686    88.008
0.1      0.2      0.1       34.355    23.854          28.686    88.008

fig. 6 original cover image.
fig. 7 watermarked image using 0.1 alpha value.
fig. 8 watermarked image using 1.0 alpha value.
fig. 9 watermarked image using 2.0 alpha value.
fig. 10 original watermark image.
fig. 11 extracted watermark image for 0.1 alpha embedding.
fig. 12 extracted watermark image for 1.0 alpha value.
fig. 13 extracted watermark image for 2.0 alpha value.
fig. 14 depicts the results of the watermarking process for the 0.1, 0.1, 0.1 alpha value combination without attacks.
fig. 14 watermarking result images for the 0.1, 0.1, 0.1 alpha value combination

c. robustness
robustness means stability against different kinds of attacks. various kinds of attacks are executed against images, and a watermarking algorithm should be implemented in such a way that attackers cannot simply remove the watermark by executing attacks against the watermarked image. in order to analyse the robustness of the suggested approach against different kinds of attacks, the following experiment was designed.

1) experiment
fig. 15 depicts the experimental design. this experiment is designed in the same way as the experiment discussed in the imperceptibility section, with small modifications. initially the alpha values are varied and the experiment is carried out for the different alpha value combinations, exactly as in the previous section. then, after the watermarking process is complete, attacks are performed on the watermarked images and the psnr and mse between the cover image and the attacked watermarked image are measured. seven (07) types of attacks, which are among the most common and effective in the watermarking domain, are executed on the watermarked image (a sketch of such an attack suite is given at the end of this subsection):
• gaussian noise attack
• salt and pepper attack
• median filter attack
• jpeg compression attack
• mean filter attack
• butterworth high pass filter attack
• butterworth low pass filter attack
fig. 15 robustness evaluation experimental design
for each attack, the optimized alpha value combination is captured. finally, we come up with a generalized alpha value combination which is suitable for all the attacks as well as attack-less situations.
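the following is a hedged python sketch of such an attack suite, assuming the scikit-image, scipy and pillow libraries. the concrete parameter values (noise variance, salt-and-pepper amount, filter sizes, jpeg quality and butterworth cutoff) are illustrative assumptions, since the paper does not state the exact settings used.

```python
import io
import numpy as np
from PIL import Image
from scipy import ndimage
from skimage import util, filters   # filters.butterworth needs a recent scikit-image

def attack_suite(img_u8):
    """img_u8: HxWx3 uint8 watermarked image -> dict of attacked uint8 images."""
    f = img_u8.astype(np.float64) / 255.0
    attacks = {
        'gaussian_noise': util.random_noise(f, mode='gaussian', var=0.01),
        'salt_pepper':    util.random_noise(f, mode='s&p', amount=0.02),
        'median_filter':  ndimage.median_filter(f, size=(3, 3, 1)),
        'mean_filter':    ndimage.uniform_filter(f, size=(3, 3, 1)),
        'butterworth_hp': filters.butterworth(f, cutoff_frequency_ratio=0.05,
                                              high_pass=True, channel_axis=-1),
        'butterworth_lp': filters.butterworth(f, cutoff_frequency_ratio=0.05,
                                              high_pass=False, channel_axis=-1),
    }
    # JPEG compression attack: round-trip through an in-memory JPEG file
    buf = io.BytesIO()
    Image.fromarray(img_u8).save(buf, format='JPEG', quality=50)
    buf.seek(0)
    attacks['jpeg'] = np.asarray(Image.open(buf)).astype(np.float64) / 255.0

    return {name: np.clip(a * 255.0, 0, 255).astype(np.uint8)
            for name, a in attacks.items()}
```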
2) results analysis and evaluation
psnr and mse values between the cover and the attacked images, and psnr and mse values between the watermark and the extracted watermark images, were collected for the seven attacks. data were recorded for 1000 alpha value combinations and then analysed in a stepwise manner as explained below. as the first step we looked at the deviation between the attacked image and the watermarked image. table iii provides the summary of the maximum psnr and minimum mse values for each of the seven attacks.

table iii summary of minimum image deviation between cover image and attacked image for all attack types

type of the attack        max psnr   min mse   r alpha  g alpha  b alpha
without attacks           37.404     11.823    0.1      0.1      0.1
gaussian noise            11.823     598.15    0.1      0.1      0.1
salt & pepper             17.64      1119.5    0.1      0.1      0.1
median filter             35.242     19.45     0.1      0.1      0.1
mean filter               37.404     11.823    0.1      0.1      0.1
jpeg compression          29.652     70.449    0.1      0.1      0.1
butterworth high-pass     13.323     3025.5    0.1      0.1      0.1
butterworth low-pass      10         6501.8    0.1      0.1      0.1

from this we conclude that the minimum image deviation is obtained with the 0.1, 0.1, 0.1 alpha values for every attack. in the previous experiment we concluded that this is also the optimal alpha value combination for image watermarking without any attacks. what was done here was to repeat the same process as the previous experiment, then execute an attack on the watermarked image, and then measure the psnr and mse between the attacked image and the cover image. according to the above results it is evident that every attack deviates the image by some roughly constant amount from the watermarked image; because of that, the image with the minimum deviation in the watermarking process also has the minimum deviation after the attack. hence, the deviation between the cover image and the attacked image cannot be used as the factor to measure the robustness of our algorithm. there are a few reasons for this. first, the robustness of the watermarked image against attacks does not depend only on the watermarking algorithm; other factors, such as the structure, quality and nature of the original image, also affect it. there are two main objectives of our algorithm: the first is to minimize the deviation between the cover and the watermarked image, and the second is to minimize the deviation between the watermark and the extracted watermark image. in order to fulfil the objectives of an efficient watermarking scheme, the deviation between the cover and attacked images can therefore be removed from consideration; the primary focus should be on the deviation between the watermark image and the extracted image. robustness of the watermark against attacks means that the embedded watermark image stays the same under any kind of attack, so keeping the extracted watermark image's deviation minimal against the attacks is the accurate factor for measuring the robustness of the watermarking algorithm. accordingly, we found the optimal alpha value combination that makes the extracted watermark image's deviation minimal against each attack.

table iv optimized alpha value combination for each type of attack

attack                    max psnr   min mse   r alpha  g alpha  b alpha
without attack            28.686     88.008    0.1      0.1      0.1
gaussian                  15.237     1947.2    0.7      0.6      0.8
salt & pepper             15.382     1883.1    0.7      0.9      0.9
median filter             19.683     699.42    0.1      0.3      0.3
mean filter               17.766     1087.5    0.3      0.4      0.6
jpeg compression          18.082     1011.4    0.2      0.4      0.3
butterworth high-pass     6.7039     13890     1        1        1
butterworth low-pass      5.7119     17454     1        1        1

table iv shows the optimal alpha value combination for each attack. it can be seen that the strength of the attacks increases from the top to the bottom of the table, and the psnr value decreases with the strength of the attack. at the same time, the respective alpha value combinations gradually increase. fig. 16 illustrates the mean of the alpha value combinations against the psnr value.
fig. 16 mean of alpha values variation with psnr
psnr has an inverse relationship with the strength of the attack: the lower the psnr, the stronger the attack. as the graph in fig. 16 depicts, the optimum alpha value increases with the strength of the attack. we can conclude that when the alpha value gets higher, the watermark becomes more robust against attacks. conversely, when the alpha values increase, the image deviation between the cover and the watermarked image also increases, which means the image becomes much brighter. hence a balance between the alpha values and perceptibility is required.

fig. 17 red, green, blue alpha values variation with psnr
fig. 17 shows the variation of the red, green and blue alpha values with the psnr value, with the red, green and blue values represented by red, green and blue bars respectively. the figure illustrates that most of the time the alpha value of the blue component is higher than that of the other components. this observation agrees with the human visual system (hvs) finding that the blue component of an image attracts the least attention, which means that changes to the blue component may be less noticeable to the human eye than changes to the green and red components.

figs. 18 to 24 depict the inputs and outputs of the watermarking process for the optimal alpha value combination of each attack. they show that our watermarking approach gives reasonable robustness against most of the attacks: the damage done to the watermark image by the attacks is reasonably low, and hence the watermark image can be extracted with good quality. however, when it comes to the butterworth attacks, the watermark image is destroyed by a considerable amount; the extracted watermark image has significant damage and is barely recognizable. although the watermark image is significantly distorted, the butterworth attack does the same kind of damage to the watermarked image as well; hence we can still consider the proposed approach acceptable.

fig. 18 result images using optimal (0.7, 0.6, 0.8) alpha values in gaussian noise attack.
fig. 19 result images using optimal (0.7, 0.9, 0.9) alpha values in salt & pepper attack.
fig. 20 result images using optimal (0.1, 0.3, 0.3) alpha values in median filter attack.
fig. 21 result images using optimal (0.3, 0.4, 0.6) alpha values in mean filter attack.
fig. 22 result images using optimal (0.2, 0.4, 0.3) alpha values in jpeg compression attack.
fig. 23 result images using optimal (1.0, 1.0, 1.0) alpha values in butterworth high-pass filter attack.
fig. 24 result images using optimal (1.0, 1.0, 1.0) alpha values in butterworth low-pass filter attack.

so far we have come up with a specific alpha value combination for each type of attack, but we also require a more generalized solution which is suitable for any kind of attack as well as for attack-less situations. in attack-less situations the image deviation between the original watermark and the extracted watermark does not change with the alpha value combination; only the deviation between the cover image and the watermarked image differs according to the alpha values.
hence, for the calculation of a generalized value for all situations, we considered the deviation between the cover and watermarked image for the without-attack situation and the deviation between the watermark and the extracted watermark for the attack situations. we took the psnr variation of each of the situations listed below and analysed them in order to find a generalized alpha value combination. the following explains how the generalized alpha value set is calculated. the following data fields were selected:
• psnr between cover and watermarked image without any attack
• psnr between watermark and extracted watermark image under the gaussian noise attack
• psnr between watermark and extracted watermark image under the salt & pepper attack
• psnr between watermark and extracted watermark image under the median filter attack
• psnr between watermark and extracted watermark image under the mean filter attack
• psnr between watermark and extracted watermark image under the jpeg compression attack
• psnr between watermark and extracted watermark image under the butterworth high pass filter attack
• psnr between watermark and extracted watermark image under the butterworth low pass filter attack

as the first step we calculated the mean and the standard deviation of each of these columns. table v shows the calculated results.

table v calculated mean and standard deviation for each column

attack type                          mean         standard deviation
without attacks                      22.707257    2.728362569
gaussian noise attack                13.8435026   1.340501433
salt and pepper attack               13.2439813   1.722956471
median filter attack                 18.041697    0.75662693
mean filter attack                   16.885489    0.528859425
jpeg compression attack              17.272446    0.427591227
butterworth high-pass filter attack  5.5107111    0.477158847
butterworth low-pass filter attack   4.883672     0.263098637

each value in the data set was then standardized using the mean and standard deviation of its column, according to the following formula:

z = (x − μ) / σ

where μ is the mean of the column, σ is the standard deviation of the column, and z is the standardized value. new columns with standardized values were then calculated; tables vi and vii show part of the standardized data.

table vi standardized values of each column (part 1)

r alpha  g alpha  b alpha   without attacks  gaussian noise  salt and pepper  median filter
0.1      0.1      0.1       5.38665321       3.4004459       3.096527         1.38285192
0.1      0.1      0.2       4.16320878       2.6919051       2.6275076        1.7886001
0.1      0.1      0.3       3.15344562       2.6076083       2.4377176        1.82428479

table vii standardized values of each column (part 2)

r alpha  g alpha  b alpha   jpeg compression  mean filter  butterworth high-pass
0.1      0.1      0.1       -0.9341773        3.3061508    -2.1047731
0.1      0.1      0.2       -0.6582128        2.2170145    -2.0574094
0.1      0.1      0.3       -0.6137778        1.9882202    -1.9769331

from this standardized data set, the mean value and the standard deviation were then generated for each row, i.e. for each alpha value combination. the mean was divided by the standard deviation for each row, and the combination with the maximum ratio was taken as the optimal alpha combination (a sketch of this selection procedure is given at the end of this subsection). part of the data generated for this calculation is given below.

table viii mean / standard deviation for the optimal alpha value combination

r alpha  g alpha  b alpha   mean         sd           mean / sd
0.6      0.7      1         14.2012875   0.91073721   15.593178
0.6      0.8      0.1       13.4593375   0.5446826    24.7104232
0.6      0.8      0.2       14.0753      0.17407591   80.8572524
0.6      0.8      0.3       14.25675     0.3157713    45.1489728
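a minimal sketch of a literal reading of the selection procedure above, assuming the eight selected psnr series have been collected into a numpy array with one row per alpha value combination; whether the paper used the population or the sample standard deviation is not stated, so the ddof choice below is an assumption.

```python
import numpy as np

def generalised_alphas(psnr_table, alpha_combos):
    """psnr_table: shape (n_combos, 8), one column per selected PSNR series
    (cover vs watermarked without attack, then watermark vs extracted under each
    of the seven attacks); alpha_combos: list of (r, g, b) tuples, same row order."""
    # column-wise standardisation: z = (x - mu) / sigma  (Table V statistics)
    mu = psnr_table.mean(axis=0)
    sigma = psnr_table.std(axis=0, ddof=1)        # sample std (assumption)
    z = (psnr_table - mu) / sigma

    # per alpha combination (row): mean of the standardised scores divided by
    # their standard deviation; the combination with the largest ratio wins
    row_mean = z.mean(axis=1)
    row_sd = z.std(axis=1, ddof=1)
    score = row_mean / row_sd
    return alpha_combos[int(np.argmax(score))]
```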
as per table viii above, the following is the generalized alpha value combination, which is suited for every kind of situation.

table viii optimal alpha value combination for every situation

red alpha    0.6
green alpha  0.8
blue alpha   0.2

figs. 25 to 32 illustrate the image results generated by applying the watermarking and watermark extraction algorithms with the above optimal alpha value combination.
fig. 25 result images using generalized alpha values without any attack being executed.
fig. 26 result images using generalized alpha values for the gaussian noise attack.
fig. 27 result images using generalized alpha values for the median filter attack.
fig. 28 result images using generalized alpha values for the mean filter attack.
fig. 29 result images using generalized alpha values for the salt & pepper attack.
fig. 30 result images using generalized alpha values for the jpeg compression attack.
fig. 31 result images using generalized alpha values for the butterworth high-pass filter attack.
fig. 32 result images using generalized alpha values for the butterworth low-pass filter attack.

table ix summarizes the psnr and mse measures for each attack type for the generalized alpha value combination.

table ix psnr, mse values for each attack type using the generalized alpha value combination

attack type                          cover vs watermarked      watermarked vs extracted
                                     psnr      mse             psnr      mse
without attacks                      22.714    348.1           28.686    88.008
gaussian noise attack                18.487    921.32          14.141    2506.2
salt and pepper attack               16.538    1443.1          13.345    3010.1
median filter attack                 22.653    352.97          17.773    1086
mean filter attack                   22.714    348.1           16.927    1319.5
jpeg compression attack              22.077    403.11          17.305    1209.4
butterworth high-pass filter attack  13.421    2958.1          5.4586    18502
butterworth low-pass filter attack   10.317    6044.6          4.9388    20855

to get an idea of how much the results deviate from their individual optimal values when the generalized alpha value combination is used, the data for both situations need to be analysed. the summary of the data for both situations is presented in table x.

table x psnr, mse values comparison for the generalized alpha value combination and the individual optimal alpha value combinations

attack type                generalized alpha value combination               individual optimal alpha value combination
                           cover vs watermarked   watermarked vs extracted   cover vs watermarked   watermarked vs extracted
                           psnr      mse          psnr      mse              psnr      mse          psnr      mse
without attacks            22.714    348.1        28.686    88.008           37.404    11.823       28.686    88.008
gaussian noise attack      18.487    921.32       14.141    2506.2           17.648    1117.6       15.237    1947.2
salt and pepper attack     16.538    1443.1       13.345    3010.1           15.499    1833         15.382    1883.1
median filter attack       22.653    352.97       17.773    1086             28.78     86.118       19.683    699.42
mean filter attack         22.714    348.1        16.927    1319.5           24.156    249.75       17.766    1087.5
jpeg compression attack    22.077    403.11       17.305    1209.4           25.735    173.61       18.082    1011.4
butterworth high-pass      13.421    2958.1       5.4586    18502            13.323    3025.6       6.7039    13890
butterworth low-pass       10.317    6044.6       4.9388    20855            10.742    5480.9       5.7119    17454

fig. 33 illustrates the psnr variation between the cover image and the watermarked image for both settings. the blue bars of the graph depict the psnr values for the generalized alpha value combination and the orange bars show the psnr values for the individual optimal alpha value combination for each attack.
fig. 33 difference between psnr values of cover vs watermarked images in both settings.
according to fig. 33, it can be concluded that for the higher psnr values the difference is noticeable, while for the lower psnr values the difference is very small. we also plotted the variation of the psnr values between the original watermark image and the extracted watermark image; fig. 34 shows that variation. in that figure the blue bars again depict the psnr values for the generalized alpha value set and the orange bars show the psnr values for the individual optimal alpha value combination. the psnr values change only by very small amounts for every attack type. when the difference between the watermark image and the extracted watermark is taken into consideration, the generalized alpha value set makes much more sense, because the differences in psnr are so low that they can be removed from consideration.
fig. 34 difference between psnr values of watermark and extracted watermark images in both settings.

table xi summarizes the differences of the psnr values of cover vs watermarked and watermark vs extracted images in both settings. it can be seen that the cover vs watermarked image deviation is sometimes higher in the individual optimal value scenario than with the generalized alpha value combination; the reason is that only the deviation between the original watermark image and the extracted watermark image was considered in the attack instances.

table xi psnr value differences between the generalized alpha values and the individual optimal alpha values for each attack type

attack type                          difference of psnr values
                                     cover vs watermarked   watermark vs extracted
without attacks                      14.69                  0
gaussian noise attack                -0.839                 1.096
salt and pepper attack               -1.039                 2.037
median filter attack                 6.127                  1.91
mean filter attack                   1.442                  0.839
jpeg compression attack              3.658                  0.777
butterworth high-pass filter attack  -0.098                 1.2453
butterworth low-pass filter attack   0.425                  0.7731

fig. 35 represents the psnr value differences between the generalized alpha value combination and the individual optimal alpha value combinations; the blue bars show the deviation between the cover image and the watermarked image, and the orange bars show the deviation between the original watermark image and the extracted watermark image.
fig. 35 difference of psnr values between the generalized alpha value combination and the individual optimal value combinations.
in this figure the difference between the cover and watermarked images is significant, whereas the difference between the watermark image and the extracted watermark image is not very significant. hence, it can be concluded that the generalized alpha value combination also gives reasonably good results.

d. capacity
capacity is also one of the main properties of a watermarking algorithm. a watermarking algorithm should provide the ability to embed a reasonably sized image as a watermark. embedding a watermark image that is larger than the cover image may not be possible, but at the least the algorithm should allow embedding an image of 50% to 75% of the size of the cover image. this section describes an experiment to test the capacity of the suggested approach, together with a discussion and evaluation of the results.

1) experiment
fig. 36 experiment design for capacity evaluation.
fig. 36 shows the design of the suggested experiment. here we used peppers.png (96 × 128) from the matlab toolbox as the cover image. then we selected matlab toolbox images of different sizes, using the 0.1, 0.1, 0.1 alpha value combination, which is the optimal alpha value combination for the without-attack situation. one image was selected at a time and the watermarking and watermark extraction processes were executed separately; the psnr values of the watermarked image and the extracted image were recorded.

2) results analysis and evaluation
this section presents and evaluates the data captured in the above experiment. table xii shows the captured data.

table xii captured data from the capacity experiment

image name             cover image    watermark image   ratio (%)   cover vs watermarked   watermark vs extracted
                       wid.   hei.    wid.   hei.                   psnr     mse           psnr     mse
onion.png              512    384     198    135        13.59       39.75    6.88          37.38    11.884
hestain.png            512    384     303    227        34.98       35.26    19.34         30.81    53.931
football.jpg           512    384     320    256        41.66       41.61    4.48          29.42    74.161
gantrycrane.png        512    384     400    264        53.71       38.92    8.32          22.56    359.98
westconcordaerial.png  512    384     369    394        73.94       37.16    12.47         25.38    188.28
greens.jpg             512    384     500    300        76.29       42.39    3.74          20.27    609.96
pillsetc.png           512    384     512    384        100         40.88    5.30          25.19    196.44
peppers.png            512    384     512    384        100         40.64    5.60          31.00    51.569
tape.png               512    384     512    384        100         42.23    3.88          29.85    67.294
fabric.png             512    384     640    480        156.25      40.49    5.79          21.68    441.13
pears.png              512    384     732    486        180.94      37.80    10.77         31.73    43.578
tissue.png             512    384     800    506        205.89      34.89    21.08         17.00    1296.7
saturn.png             512    384     903    600        275.57      42.30    3.82          26.25    154.1
saturn.png             512    384     1200   1500       915.53      43.52    2.88          34.61    22.466
concordaerial.png      512    384     3060   2036       3168.8      38.92    8.32          21.19    493.38

the psnr change against the capacity of the watermark image is plotted in fig. 37; the blue graph shows the deviation between the cover image and the watermarked image and the orange graph shows the deviation between the watermark image and the extracted watermark. no strong trend in the deviation can be seen with respect to the ratio, but the 0% to 200% ratio range clearly achieves the better results. the suggested approach is able to achieve a capacity of more than 100%, which is a significant result, as no previous watermarking approach has achieved this capacity level.
fig. 37 psnr of cover vs watermarked and watermark vs extracted variation with the capacity of the watermark image.

v. conclusion
a. conclusion
in this research we introduced a watermarking algorithm for colour images. our aim was to build a watermarking algorithm that achieves the main conflicting properties of a good watermarking algorithm: imperceptibility, capacity and robustness. simply put, imperceptibility means that the embedding process should not deviate the original image by a noticeable amount. both the deviation between the cover and watermarked image and the deviation between the watermark image and the extracted watermark image can be placed under imperceptibility. this is the main visible requirement of the algorithm; other requirements such as capacity and robustness are not directly visible to the human eye.
there have been solutions that achieved one or two of these requirements, but none of them achieved all three requirements together in a single solution. hence, the aim of this research was to come up with a solution that achieves all three requirements together. according to the evaluation we carried out, the alpha value set of 0.6, 0.8 and 0.2 for red, green and blue respectively gives the highest robustness across all kinds of attacks as well as the attack-less situation, and this value set was able to achieve all three basic watermarking requirements together. the deviation between the cover and watermarked image ranged from 22.714 to 10.317 psnr across every kind of attack; table ix gives the full details of the psnr and mse values for the generalized alpha value combination. this means our algorithm was able to achieve both the imperceptibility and robustness requirements together to a considerable degree. according to fig. 37, we can embed a watermark image of up to 200% of the size of the cover image. this is a significant achievement because the 100% margin has been passed as well, which means a watermark image double the size of the cover image can be used. we can conclude that the suggested algorithm was able to achieve all three basic requirements together, which is a significant achievement and makes this algorithm more capable than the existing algorithms considered. when the optimized alpha value combination for each type of attack is used, as shown in table iv, the algorithm can also be used against specific types of attacks; those optimized alpha value combinations give better results than the generalized alpha value combination.

b. limitations and future work
however, there are a few limitations to this approach. the configurations of the cover image and the watermark image should be alike for the watermarking process; configurations such as the bit depth should be equal for both images, and in our experiments we mostly used 8-bit images. there is a possibility of improving this algorithm to work with images of different bit depths as well. for that, the bit depth of the watermark image needs to be converted programmatically to match the bit depth of the cover image first.
then the existing watermarking algorithm and the watermark extraction algorithm can be applied without any issues, and after the extraction process the extracted watermark is converted back to its original bit depth. such improvements are possible for this algorithm.

this approach is also a non-blind one; it would be better if it could be improved into a blind watermarking technique as well. our approach requires one of the svd-decomposed matrices of the cover image to execute the watermark extraction process, and it would be better if the extraction process could be executed without taking any components of the cover image. to achieve that, considerable modifications to the existing equations of the algorithm are required: we need a way to recognize the embedded data without the help of the original cover image, which means marking the embedded data using some mechanism that allows it to be separated from the data of the original cover image.

robustness against attacks can also be further improved; there is nothing wrong with improving the robustness of the watermark as much as possible. we have seen that attacks such as butterworth filtering do considerable damage to the watermarked image as well as to the extracted watermark. the robustness of the watermark can be further improved by a few different means. currently the dwt is applied only twice per component; the dwt could be applied to the cover image more times, from which a reasonable increase in robustness could be expected. the formulas of the current algorithm could also be modified and made more complex; in this approach the formulas were kept simple because we wanted to observe clearly the effect of other factors, such as the alpha values, on the algorithm.

references
[1] e. lin and e. j. delp, “a review of fragile image watermarks,” 1999, pp. 25–29.
[2] ying yang, xingming sun, hengfu yang, chang-tsun li, and rong xiao, “a contrast-sensitive reversible visible image
[3] yongjian hu, byeungwoo jeon, “reversible visible watermarking and lossless recovery of original images,” ieee trans. circuits syst. video technol., vol. 16, no. 11, nov. 2006.
[4] d. pradhan, “implementation of invisible digital watermarking technique for copyright protection using dwt-svd and dct,” ijaers res. j., vol. 4, pp. 063–069, jul. 2017.
[5] a. treméau and d. muselet, “recent trends in color image watermarking,” j. imaging sci. technol., vol. 53, jan. 2009.
[6] c. song, s. sudirman, and m. merabti, “recent advances and classification of watermarking techniques in digital images,” p. 6.
[7] m. d. swanson, b. zhu, and a. h. tewfik, “transparent robust image watermarking,” 1996, pp. 211–214.
[8] i. cox, m. miller, j. bloom, j. fridrich, and t. kalker, digital watermarking and steganography, 2nd ed. san francisco, ca, usa: morgan kaufmann publishers inc., 2008.
[9] i. j. cox, j. kilian, t. leighton, and t. shamoon, “a secure, robust watermark for multimedia,” in information hiding, 1996, pp. 185–206.
[10] d. j. fleet and d. j. heeger, “embedding invisible information in color images,” 1997, vol. 1, pp. 532–535.
[11] g. m. johnson and m. d. fairchild, “a top down description of s-cielab and ciede2000,” color res. appl., vol. 28, no. 6, pp. 425–435, dec. 2003.
[12] n. v. dharwadkar and b. b. amberker, “an efficient and secured non blind watermarking scheme for color images using dwt and arnold transform,” vol. 9, no. 2, p. 9, 2010.
[13] v. santhi and a. thangavelu, “dc coefficients based watermarking technique for color images using singular value decomposition,” int. j. comput. electr. eng., pp. 8–16, 2011.
[14] h. gao, l. jia, and m. liu, “a digital watermarking algorithm for color image based on dwt,” vol. 11, no. 6, p. 8, 2013.
[15] c.-h. chou and t.-l. wu, “embedding color watermarks in color images,” eurasip j. appl. signal process., p. 9, 2003.
[16] p. bas, n. le bihan, and j.-m. chassery, “color image watermarking using quaternion fourier transform,” 2003, vol. 3, pp. iii-521–4.
[17] c.-s. chan, “color image hiding scheme using image differencing,” opt. eng., vol. 44, no. 1, p. 017003, jan. 2005.
[18] s. p. mohanty, p. guturu, e. kougianos, and n. pati, “a novel invisible color image watermarking scheme using image adaptive watermark creation and robust insertion-extraction,” in eighth ieee international symposium on multimedia (ism’06), 2006, pp. 153–160.
[19] p. a. kaarna and v. botchko, “a technique for digital color image watermarking using ica,” p. 53, 2006.
[20] r. agarwal and k. venugopalan, digital watermarking of color images in the singular domain. 2011.
[21] “digital color image watermarking using dwt-dct coefficients in rgb planes | global journal of computer science and technology,” 2013. [online]. available: https://computerresearch.org/index.php/computer/article/view/175. [accessed: 13-aug-2018].
[22] d. vaishnavi and t. s. subashini, “robust and invisible image watermarking in rgb color space using svd,” procedia comput. sci., vol. 46, pp. 1770–1777, 2015.
[23] j. guru and h. damecha, “a review of watermarking algorithms for digital image,” vol. 2, no. 9, p. 8, 2007.
[24] g. chawla, r. saini, and r. yadav, “classification of watermarking based upon various parameters,” int. j. comput. appl., p. 4, 2012.
[25] s. madhesiya and s. ahmed, “advanced technique of digital watermarking based on svd-dwt-dct and arnold transform,” vol. 2, no. 5, p. 6.
[26] kamal ali ahm, “digital watermarking of still images,” 2013.
[27] b. p. mishra and d. h. n. pratihari, “dct based grey scale still image watermarking using 1-d walsh code and biometric protection,” vol. 4, no. 2, p. 5, 2015.

international journal on advances in ict for emerging regions 2020 13 (1): april 2020

extraction of semantic content and styles in comic books
damitha lenadora#1, rakhitha ranathunge#2, chamath samarawickrama#3, yumantha de silva#4, indika perera#5, anuradha welivita#6

abstract— digitisation of comic books would play a crucial role in identifying new areas in which digital comics can be used. currently, existing systems in this domain lack the capacity to achieve complete digitisation. digitisation requires a thorough analysis of the semantic content within comic books. this can be further sub-categorised as detection and identification of comic book characters, extraction and analysis of panels as well as texts, derivation of associations between characters and speech balloons, and analysis of different styles of reading. this paper provides an overview of using several object-detection models to detect semantic content in comics. this analysis showed that, under the constraint of limited computational capacity, yolov3 was the best-suited model out of the models evaluated. a study of text extraction and recognition using optical character recognition, a method for determining associable speech balloons, as well as a distance-based approach for associations between characters and speech balloons are also presented here. this association method provides an increased accuracy compared to the euclidean distance-based approach. finally, a study on comic style is provided along with a learning model with an accuracy of 0.89 to analyse the reading order of comics.

keywords — comics, digitisation, content detection, text recognition, speech balloon to character association, comic styles

i. introduction
a barrier that comic books face in expanding their horizons, with regard to usage as well as to acquiring more readers, is that they are traditionally a paper-based medium of entertainment. a remedy for this would be the digitisation of comic books.
although this is possible by simply taking pictures of the comic book in question, one can argue that this form of digitisation is not complete, as it would not contain any information about the content within the comic book; it would also retain a significant amount of noise. digitising to such a complete extent would open the doors to a wide variety of uses for comic books. performing indexing and plot analysis with ease, as well as viewing the content within comic books in an immersive environment, are examples of such potential use cases. however, let alone systems to completely digitise comic books, even systems which are capable of extracting the semantic content within comic books without any extensive human intervention are limited. a reason for this may be the high diversity of the origins of comic books, authors as well as artists, which leads to an extensive number of styles of artwork as well as methods of storytelling via images. nonetheless, there are certain semantic elements within comic books that are considered to be common across most comic books. these are panels, characters and balloons, as shown by fig. 1. panels are sections of a comic book page that depict a certain scene in the story being told, characters are those who play a role in the story, and balloons are used to display various texts in the comic book. in addition, if a balloon has an associated character that utters or thinks the content within the balloon, it can be referred to as a speech balloon; in such instances it may also contain a segment called a tail that points towards the said character. this can also be seen in fig. 1. all of these combine to form the narrative intended by the authors of the comic and present it to the readers. thus, with the final goal of achieving the monstrous task of digitising comic books, the main focus of the research mentioned in this article is to analyse, locate and extract the aforementioned semantic content.

though there exist many conventional semantic content extraction methods applied in the domain of comic books, the usage of learning models, especially deep learning which requires a lot of data, is only starting to raise its head within this domain. however, with multiple comic book datasets now available, the time is nigh to apply such techniques to aid the task of extracting semantic content from comic books. thus, the prime focus of this research is the application of learning techniques such as object detection to extract the semantic content in comic books and the analysis regarding the need for their usage. in addition, the extension of existing conventional techniques is also looked into, as there can be certain instances where such techniques give acceptable results.

manuscript received on 06 nov. 2019. recommended by dr. d.d. karunaratna on 07 march 2020. this paper is an extended version of the paper “comic digitization through the extraction of semantic content and style analysis” presented at icter 2019. damitha lenadora, rakhitha ranathunge, chamath samarawickrama, yumantha de silva and indika perera are from the department of computer science & engineering, university of moratuwa, sri lanka. (damitha.15@cse.mrt.ac.lk, rakhitha.15@cse.mrt.ac.lk, chamath.15@cse.mrt.ac.lk, yumantha.15@cse.mrt.ac.lk, indika@cse.mrt.ac.lk). anuradha welivita is from the computer & communication sciences, école polytechnique fédérale de lausanne, lausanne, switzerland (kalpani.welivita@epfl.ch).
fig. 1 semantic content in comics.

the remaining sections of the paper are as follows: section ii contains the review of the existing literature relevant to this research.
the datasets that have been used for this work are presented in section iii. section iv presents the discussion regarding panel and character extraction as well as character recognition. the content of section v is the analysis of the text extraction processes used as well as text recognition using optical character recognition (ocr). section vi is the analysis of the character and speech balloon association. afterwards, section vii analyses the different styles found in comic books. section viii, located at the end, gives the conclusion to the research mentioned in this article.

ii. literature review
the accurate identification of semantic content in comic books is a must for a reader to properly understand the story shown through comic book images, let alone to consider their digitisation. numerous techniques have been developed to extract such semantic content from comics. however, these methods, which often have vast differences among one another, are generally only able to extract a single type of semantic content. when traditional panel extraction techniques are considered, the usage of density gradients [1] as well as line segment combination [2] can be observed. although the results of the tests conducted for these methods prove that they work well, their performance is not very good for comics with border-free panels and for those with no distinct separation between the background and the foreground, since such cases were not taken into consideration when the methods were devised. in addition, as a solution for comic books with panels which are not separated by white backgrounds, techniques that utilize region of interest detection [3] as well as recursive binary splitting [4] have been developed. on the other hand, techniques such as binarization [5] and morphological operations [6] are prime examples for the extraction of speech balloons; the use of these two approaches can be seen in [7] and [8]. another common technique that can be seen in many of the related works is connected component labelling [9]. it is an algorithm which uses graph theory and is capable of detecting regions within binarized images; this technique is used in both [10] and [11]. the active contour model [12] is utilized successfully for the extraction of speech balloons in [13]. for face detection in manga, the viola-jones detection framework [14] is used in the research done in [15]. however, [16] explains that traditional face detection and face recognition techniques do not perform well when applied to comic book characters, due to the fact that they possess considerable differences when compared to real humans. as a solution to this, a method which utilizes skin colour and regions has been suggested by the authors. the work in [17] has extended the aforementioned method for the detection and identification of comic book characters. furthermore, [18] suggests the usage of a graph-theory based method for identifying main characters in comics. this is done by representing each panel as an attributed adjacency graph and locating the most common colour patterns.
the authors in [19] have carried out similar research using a sift descriptor, which shows good potential for the task of character identification. content extraction from comic books through the utilisation of deep learning has also seen a sharp rise throughout the past few years. the ever-popular convolutional neural networks (cnn) and their derivations are often used here, as they show great potential in dealing with images. the studies carried out in [20], [21] and [22], which use the object detection models yolov2 [23], a customised faster r-cnn [24] model and mask r-cnn [25] respectively, stand as a testament to this fact. text identification in comics varies depending on whether the text is typewritten or handwritten; handwritten text is more difficult to identify due to the different styles of writing used. tesseract [26] ocr is one of the prominent tools used in identifying text, and it is capable of recognising text of a similar style once trained. [27] uses this with a long short-term memory (lstm) [28] algorithm called ocropus to identify text automatically in a segmentation-free manner. a different method is proposed in [29] using lstm: rather than using a pre-trained ocr, this approach focuses on developing an ocr system with token dictionaries using k-means clustering. the advantage of this is a low error rate in text recognition and the potential to outperform existing ocr tools. the final step is the character to speech balloon association in comic books. the method of association proposed in [30], via the usage of geometric graph analysis, is the earliest research found on this particular topic. this method requires the identification of panels, balloons as well as characters in comic books, and predicts a relation per speech balloon solely by relying on the minimum euclidean distance to a character. considering association recognition as a binary classification problem, the proposed model in [22] also trains a portion of the model to predict these associations in comics. a notable observation in both methods mentioned above is that they do not utilise the tail of a balloon, which comic book authors use specifically to associate a speech balloon with a character. however, the authors intend to utilise balloon tails as a plausible future improvement for their proposed methods.

iii. datasets
a. ebdtheque
the training of object detection models, the analysis of text recognition and the speech balloon-character association were done using the ebdtheque [31] dataset. the ebdtheque dataset is a selection of comic pages from america, japan (manga) and europe. each comic page consists of three categories of visible objects: characters, balloons and panels. apart from that, the dataset also comprises speech balloon to character associations and general metadata related to the comic pages, such as the isbn, language, author and editor names, which are included in the ground truth files. to elaborate further on the statistics of the annotated elements:
• pages – 100
• frames (panels) – 850
• balloons – 1,081
• characters – 1,620
• text lines – 4,693
the ground truth files are in scalable vector graphics (svg) format and follow the underlying xml and encoding information, as shown in fig. 2. semantic and visual annotations for a given page are gathered in such files, making the files in the database simple and easily shareable.
fig. 2 svg files content and its representation.
the original svg format of the ground truths is unique to the ebdtheque dataset, and none of the object detection models supports this format as an input. there were two alternatives to overcome this problem: change the model's input-output pipeline to support this format, or convert the svg files to a supported format such as pascal voc, coco or tfrecord. we chose the latter alternative and were able to convert the svg files to the coco format as well as the tfrecord format using several python scripts; minor changes were made later to support specific models. thereby, we managed to train and validate different models using the data in the ebdtheque dataset. the dataset was divided into two parts, a training dataset and a validation dataset. we decided that the most suitable way to divide the dataset was to allocate 80% of the data to the training set and the rest to the validation set, i.e. 80 images to the training set and 20 images to the validation set. due to the presence of images of different comic types, we decided to put at least one of each kind into the validation set to ensure that the mean average precisions (maps) obtained covered all types of comic pages in the dataset.
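a hedged sketch of what such an svg-to-coco conversion script might look like. the exact layout of the ebdtheque ground-truth files is an assumption here: the code assumes each annotated object appears as an svg polygon carrying a 'class' attribute that names its category, and real files may require adjusting that lookup.

```python
import xml.etree.ElementTree as ET

SVG_NS = '{http://www.w3.org/2000/svg}'
CATEGORIES = {'panel': 1, 'balloon': 2, 'character': 3}

def polygon_to_bbox(points_attr):
    """Convert an SVG 'points' attribute into a COCO [x, y, width, height] box."""
    pts = [tuple(map(float, p.split(','))) for p in points_attr.split()]
    xs, ys = zip(*pts)
    x, y = min(xs), min(ys)
    return [x, y, max(xs) - x, max(ys) - y]

def svg_page_to_coco(svg_path, image_id, ann_start_id=1):
    """Collect COCO-style annotation dicts for one ground-truth SVG page."""
    tree = ET.parse(svg_path)
    annotations, ann_id = [], ann_start_id
    for poly in tree.iter(SVG_NS + 'polygon'):
        label = (poly.get('class') or '').lower()   # assumed location of the label
        if label in CATEGORIES:
            bbox = polygon_to_bbox(poly.get('points'))
            annotations.append({
                'id': ann_id, 'image_id': image_id,
                'category_id': CATEGORIES[label],
                'bbox': bbox, 'area': bbox[2] * bbox[3], 'iscrowd': 0,
            })
            ann_id += 1
    return annotations
```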
comics dataset comics dataset [34] is a massive dataset with a size of over 120gb comprised of over 1.2 million panels paired with automatic textbox transcriptions. the main objective of this dataset is to demonstrate that neither text nor image alone can tell a comic book story. therefore, a computer must understand both modalities to keep up with the plot. this dataset consists of comic book panels and text boxes. panels are marked with rectangular bounding boxes in 500 randomly selected pages. each bounding box encloses both the panel artwork and the textboxes within the panel. 1500 selected panels are annotated for textboxes. to get the text in textboxes, they have applied ocr to the selected text boxes. the statistics of the dataset are as follows; • books – 3,948 • pages – 198,657 • panels – 1,229,664 • textboxes – 2,498,657 other than these, this dataset has elements called text cloze, visual cloze and character coherence, which are aimed towards reaching the main objective of creating the dataset. iv. content extraction via object detection a. faster r-cnn the r-cnn family is a set of object detection frameworks that performs selection as well as the classification of objects in images through proposing candidate object regions. predicting bounding boxes for the objects detected, the evolved faster r-cnn framework performs significantly better than its predecessors. a faster r-cnn object detection network that uses inception v2 as its backbone was trained on the ebdtheque dataset. for the implementation, the tensorflow object detection api [35] was used with the configurations of the model being, a batch size of 1, an initial learning rate of 0.002, and momentum optimiser value of 0.9. after approximately 60,000 steps, signs of overfitting began to emerge, and further training was halted. the best results of the model were an map fig. 2 svg files content and its representation. extraction of semantic content and styles in comic books 4 international journal on advances in ict for emerging regions april 2020 of 62% with a 50% intersection and a mean average recall (mar) of 41% per 100 detections. the model’s reliance on whitespace to detect the boundaries of a panel could be observed due to the incorrect detection of additional panels in black and white images. b. single shot multibox detector single shot multibox detector (ssd) [36], is a method for encapsulating object detection processes and computation into a single network. in the ssd architecture, there are two types of deep neural networks. first, a base network for high-quality image classification tasks and then, a multibox detection network which detects objects through bounding boxes. with the tensorflow object detection api [35], ssd mobilenet model was implemented. the label map and the trainer were configured to have three classes to accommodate the ebdtheque dataset. the input images were resized to 300x300 with the use of a fixed shape resizer since the ssd network ends with fully connected layers. the trainer was set to have a batch size of 1, an initial learning rate of 0.004 and a momentum optimiser value of 0.9. after training the network for 200,000 steps, the obtained map values were shown to be extremely poor, standing at less than 0.4. the mar values stood at around 0.3, which was still, far below acceptable. the main reasons for this underwhelming performance could be the lack of training data and the initial resizing of the input that happens in the model. c. yolo yolo stands for you only look once [37]. 
it is an object detection framework, which spatially separates bounding boxes and associates with class probabilities. through this, it is able to predict both the bounding boxes as well as the class probabilities from full images in one evaluation while using a single neural network. yolov3 [38], which is the 3rd version of the yolo network, is an incremental improvement of its predecessor, yolov2(also known as yolo9000). the neural network used for this version is a network with 53 convolutional layers, named darknet-53. this model predicts an objectness score for each bounding box using logistic regression. class prediction is made using multilabel classification, and independent logistic classifiers are used instead of softmax. binary cross-entropy loss for class predictions is used during training. bounding boxes are predicted at three different scales. the final convolutional layer predicts a 3d tensor which encodes bounding box, objectness and class prediction. bounding box priors are determined using k-means clustering, which is similar to the previous version of yolo. yolov3, the latest model, which is explained above, was used in our experiments. first, we adjusted the darknet-53 configuration files to be in line with the ebdtheque database. there are three main classes in ebdtheque dataset; character, panel and balloon. we also changed the number of filters in various convolutional layers to suit the dataset. we used the aforementioned conversion methods to obtain the labels for the training set and the validation set. the network, darknet-53 was trained for 250 epochs under a batch size of 1, and the following results were obtained. the confidence threshold was set to 0.3, nms threshold was set to 0.45 and iou threshold was set to 0.5. the two graphs which are shown in fig. 3 and fig. 4 depict how the total map and the maps of each class change with respect to the number of epochs, respectively. it can be clearly observed that there is a shortage of training data from the results shown in fig. 3 and fig. 4. the maximum total map that could be obtained was slightly below 0.6, even though the network was trained for a considerable number of epochs. when looking at the graph as a whole, the total map can be observed to arrive at a slightly stable state after about 80 epochs. thereby, we can conclude this to be a rough estimation for the optimum number of epochs the model should be trained for. the map per class graph, which is shown by fig. 4 clearly indicates the low map for the “character” class, which maxes out at around 0.35. the main reason for this is the large variations among different characters in different comics, as well as between the same characters when drawn by different artists, who have their own drawing styles. due to this and the low amount of data, the model is currently unable to generalize well for detecting different comic characters. on the other hand, the speech balloons and panels give considerably good maps, which are above 0.7. the reason for this is the consistent styles of these which do not differ within the same comic and even among different comics. therefore, the model is able to generalize well for these classes, even with a limited amount of data. we used yolo to test the performance in the manga109 dataset as well. the training and testing processes were carried out in a similar fashion to those done using ebdtheque dataset. the model was able to reach a total map value of 0.682, a precision value of 0.755 and a recall value of 0.738. 
however, manga109 dataset purely consists of black and white pictures. due to this, if the model is trained solely using this dataset, it would fail to generalize to other types of comics, of which a majority are coloured. the first aspect we considered when tuning the hyperparameters was finding out which optimizer to use to train the model. the tests were carried out using the ebdtheque dataset. the dataset was trained for 100 epochs, with a learning rate of 0.001, an nms threshold of 0.45, an iou threshold of 0.5 and a confidence threshold of 0.3. the batch size used for the purpose was 16. the tests were carried out with the hyperparameters stated above while varying the optimizers. the following tables show the total results obtained and the class-wise results obtained for each of the tested optimizers, respectively. table i results from the total dataset for different optimizers total map mean precision mean recall adadelta 0.0554 0.0148 0.4024 adagrad 0.2636 0.3694 0.4179 adam 0.5587 0.5900 0.6542 rmsprop 0.2942 0.4187 0.3887 sgd 0.5072 0.5253 0.6254 table ii results per each class in the dataset for different optimizers character map panel map balloon map adadelta 0.0275 0.0709 0.0821 adagrad 0.1797 0.2584 0.3743 adam 0.4106 0.6458 0.6862 rmsprop 0.1948 0.3734 0.3331 sgd 0.4230 0.5178 0.6166 5 damitha s. lenadora#1, rakhitha r. ranathunge#2, chamath samarawickrama#3, yumantha de silva#4, indika perera#5, anuradha welivita#6 april 2020 international journal on advances in ict for emerging regions by observing the above results, adam was concluded to be the best optimizer fit for our purpose. sgd was a close second and also managed to outperform adam for the “character” class. however, through further testing, we discovered that adam was able to give better results than sgd, even for the “character” class when trained further (the test was done for 250 epochs, using the same hyperparameters used for the previous test). another test was carried out to determine the optimum learning rate for the model. this was also done using the ebdtheque dataset. the dataset was trained for 100 epochs, with adam set as the optimizer, an nms threshold of 0.45, an iou threshold of 0.5 and a confidence threshold of 0.3. a batch size of 16 was used for this task. the tests were carried out with the above-stated hyperparameters while varying the learning rates. the learning rates were increased by roughly threefold from the previous test case. the tested leaning rates started at 0.001 and ended at 0.1, which was deemed to be a worthy range for carrying out the tests. table iii shows the accuracies corresponding to the tested learning rates. table iii accuracies corresponding to different learning rates learning rate accuracy 0.001 0.4309 0.003 0.3907 0.01 0.3454 0.03 0.2959 0.1 0.2000 it was concluded that 0.001 is the best learning rate to train the model in order to get the most accurate results, considering the above. out of the tested models which we have stated above, yolo was able to get considerably better results compared to the ssd model and was able to do so whilst using much better memory usage compared to the r-cnn family, in the task of object detection. due to the better results as well as the limited computation capabilities available, yolo was decided to be the best model for the task of object detection. with more training data and slight hyperparameter adjustments, this model has the best potential to generalize well for the task at hand. v. 
text extraction and analysis text extraction and analysis from panels involve two main steps: (1) extracting the speech balloons from the panel, and (2) processing the text within the speech balloons using ocr software. here, we will discuss the methodology we followed in order to extract the text from the comic book panels and process them for further analysis. even though we use models such as yolo, faster r-cnn, and ssd for content detection, we decided to take a different approach to speech balloon detection. the model we have used here is mask-rcnn, which is an improved version of its predecessor; faster-rcnn. we used a tensorflow and keras implementation [39] of mask rcnn instead of using the original version to reduce the usage of resources. the main reason for using mask-rcnn is the generation of masks that mark the object boundary as opposed to just fig. 4 map per class; map for the character class – red, map for the panel class – green, map for the balloon class – blue. fig. 3 total map of yolo. extraction of semantic content and styles in comic books 6 international journal on advances in ict for emerging regions april 2020 locating the object. mask-rcnn facilitates instance segmentation by providing the exact coordinates of the objects, unlike other object detection models. i.e. the exact pixels of the detected speech balloon in the comic book page will be returned by the model. the generated mask will be vital in detecting the tail of the speech balloon. this will be used in making associations between the speech balloons and characters, as mentioned in section vi. the ebdtheque dataset was used to train the model in order to detect speech balloons. since we were trying to distinguish only the speech balloons, adjustments were made to accommodate the model to a single class. the model we chose uses a resnet50 [40] backbone architecture. the default configurations of the model were for the coco dataset, which has about 80 different classes. the model was trained for 100 epochs with 80 steps per epoch while having a batch size of 1. we decided to use a transfer learning approach to train the model by initializing the model with coco pre-trained weights and fine-tuning the layer heads with ebdtheque data. the heads include the region proposal network, classifier, and mask heads of the network. the model was trained with a learning rate of 0.001, a momentum of 0.9, and with stochastic gradient descent as the optimiser. however, if we increase the number of epochs, the validation loss would increase, causing the model to overfit. so, the model obtained in this scenario is more generic to all data. the map of the model over the validation set is around 0.65. the main reason for this would be the lack of diverse training data. fig. 5 shows some of the detections done using the model we trained. the coloured speech balloons show the masks generated when a detection is made by the model. the model returned average precision (ap) of 0.758 and 0.905, respectively for the above two figures for the iou threshold range 0.5 0.95. we decided to use the tesseract ocr for the analysis of text found in extracted speech balloons. the latest version of tesseract 4 options in the ocr engine. these are legacy engine, lstm engine, legacy + lstm engine, and default engine. we decided to use the neural network-based lstm engine for our purpose. this also provides different page segmentation options which can be crucial under different conditions you come across. 
the results we obtained with both tesseract command-line and pytesseract, which is a tesseract package in python, are similar. table iv contains results obtained through tesseract for three different styles of text found in the comic books. the first two instances are in english, but one is old english while the other is modern english. the third instance is in french. it is clear from the results that the text recognition in the 1st instance is somewhat weak compared to the second instance. the style of the letters and characters is a huge factor here. some characters are difficult to be recognized even though the naked eye. the character error rate is quite high in this case due to that. in both cases, the word error rate is lower than the character error rate. in the third instance; which is in french, the recognised text is identical to what’s in the extracted speech balloon. the error rate will increase if the style of the text varies a lot from the standard style. this is due to the fact that tesseract is trained using characters which belong to a standard format. one reason for the errors in recognition would be the noise in the extracted parts. there is a possibility to improve the results by pre-processing the extraction before processing fig. 5 results of speech balloon detection of different comics by mask-rcnn. 7 damitha s. lenadora#1, rakhitha r. ranathunge#2, chamath samarawickrama#3, yumantha de silva#4, indika perera#5, anuradha welivita#6 april 2020 international journal on advances in ict for emerging regions them using tesseract. examples for such pre-processing techniques are noise removal, blur, and binary thresholding. table iv tesseract ocr results stay! loon ahead! a stag in the cleabing / 'tis sent us by da fortune jnersele | quiet / t told the captain of my fears, but he paid no attention to what t said and_ left without deigning-to reply. pour faire correctement un frou, il faut être deux. vi. character and speech balloon association the association between a speech balloon in a comic and the relevant characters that uttered the content in the said balloon play a significant role in presenting the story intended by the author to the readers of the comic book. for a human reader, such associations are easily understood by the hints given through the artwork within comic books. one such clue which is primarily used is the tail that can be seen in speech balloons. these tails are usually represented by an elongated section of a speech balloon that ends in a point facing towards the associated comic book character who is within the same panel as the balloon. however, when performing this association via a program, it is not as simple as for humans due to intuition also playing a pivotal role in accurately identifying a correct association. a. association methods building upon the works of rigaud et al. [30], a method of character to speech balloon association given the location of panels, characters, speech balloons and the tail of the speech balloons was looked into. thus, for obtaining an association, each speech balloon was checked to locate its associated character rather than checking each character. furthermore, each speech balloon and character were assigned to a panel within the comic. this association was done by selecting the panel containing the maximum area of the associating speech balloon or character. 
if either a balloon or character does not have any area in any of the panels in the page, it is then assigned to a separate outlier panel created to hold such objects. as shown in fig. 6, not all balloons have an associated character as some balloons are used to describe scenes. thus, the selection of balloons that are assumed to have an association with a character is necessary. for this, three different methods were evaluated. these methods are selecting all balloons, selecting balloons that possess tails and selecting balloons that possess tails or are dissimilar to a square shape. the intentions behind the final two selection methods are respectively, due to speech balloons with associations usually possessing tails and due to most balloons that describe scenes are of a shape similar to a square. the dissimilarity from a square shape is assumed if the area of the balloon itself is 90% or less of the bounding box that surrounds the balloon. afterwards, the following methods were tested to identify a way of associating speech balloons to characters with considerably high accuracy. 1) method a: the character with the minimum euclidean distance from the tail of the speech balloon to the centre of the character. this is a method that has shown the highest results in [30]. 2) method b: the character with the minimum perpendicular distance from its centre to the line drawn in the direction pointed by the tail of a balloon, which is also intersected by the line extended from the tail in the direction the said tail points towards. 3) method c: the character with the minimum euclidean distance from the tail of the speech balloon to the centre of the character, which is also intersected by the line extended from the tail in the direction the said tail points towards. 4) method d: the combination of method b and method c. this is done by associating the character with the minimum sum of squares of the results of the aforementioned two methods. out of these four methods, method a was chosen as a baseline to evaluate the performance of the other methods. the line drawn in the direction of the tail is obtained by utilizing the point where the tail is located in a balloon and the mean point of the two adjacent points to the tail. in the latter three methods, checking that if the line extended from the tail intersects any part of the bounding box of the associated character is done to avoid associating a speech balloon to a character in other directions than the one the tail is pointing towards. in the situation that a character is not intersected by the line extended from the tail, infinity would be returned as a result. b. experiments and analysis of results for experimenting the speech balloon to character association methods, the ebdtheque v3 dataset along with its annotations was used as a test set. though the acquisition of the previously mentioned prerequisites would be possible to a certain degree utilizing object detection, for the sake of experimentation, the annotated data of the panels, characters, and balloons of the ebdtheque dataset were used. table v performance of the selection of a balloon having a speaker association fig. 
6 types of comic balloons extraction of semantic content and styles in comic books 8 international journal on advances in ict for emerging regions april 2020 method of balloon selection with associations performance precision recall all balloons 81.96% 100% balloons with tails 96.09% 97.07% balloons with tails or dissimilar to a square shape 93.64% 99.67% when considering the performance of the selection of speech balloons having an association with a character as shown by table v, if selecting all balloons, the 195 balloons out of 1081 in the dataset that have no association to a character were also selected. though it is possible to recall all the balloons with associations through this method, the precision drops considerably due to the balloons with no association. with the method of selecting only the balloons with tails, the precision rises significantly yet is not 100% due to the presence of speech balloons where the characters who make the utterance in the said balloons are not depicted in the same panel. the recall drops to 97.07% as speech balloons with no tails are ignored in this method. the final method of selection, which is to select balloons that either has a tail or are dissimilar to a square shape, gives a more balanced result compared to the prior two methods. however, an issue in this method is that it would still take into account non-speech balloons that are not square, as well as ignore speech balloons with no tails and shaped similar to squares as shown by fig. 7. the initial evaluation of the association methods alone is done by taking the percentage of correct associations out of all the speech balloons with tails as well as associations to characters that are mentioned in the annotations of the dataset. this is done since all the association methods experimented upon requiring the identification of the tail of the speech balloon. the content of table vi shows the results of the evaluation. table vi performance of association methods character to balloon association performance method a 93.95% method b 86.62% method c 94.53% method d 95% using method a as a baseline, it can be seen that method b’s performance is significantly poorer. the reason for this drop was because this method tends to ignore characters who are closer to the balloon and associated characters who are more matching in regard to the direction the tail of the balloon points to. this is especially true for panels that are crowded with characters. thus, through the evaluation of this method, it is possible to speculate that the distance from a speech balloon also plays great importance, sometimes even more so than the balloon tail for associating speech balloons and characters. when evaluating method c, a considerable increase in accuracy can be observed. this is due to this method rectifying the weaknesses of both method a and method b. the intuition behind this method is to select the closest character who is in the direction the balloon points to. a weakness that could be observed in this method was when the tail of a speech balloon is located within the bounding box of a character not associated with the balloon, as shown in fig. 8. the balloon, which is highlighted in blue, is truly associated with the character on the left although its tail is located within the bounding box of the character that is highlighted in yellow. fig. 8 speech balloon located in the bounding box of another character. 
it is possible to circumvent this weakness by making the conditions to check if a given character is in the direction that the tail of a speech balloon points to stricter via using the centroid of the character rather than simply intersecting the bounding box. however, this results in a lower overall accuracy since the margin to determine if a character is in the direction that the tail points to reduces. this is especially true for balloons where the associated character is directly below or above it with the tails being represented vertically. the intuition behind method d is to combine both the results of method b and method c, as a single distance measure. as shown in table vi, this gives the highest accuracy out of all the methods considered. upon analysing the results of this method, the weakness of method c is also negated due to considering the distance from the line extended in the direction the tail of the balloon points to. this again brings up the weakness of method b, wherein certain situations, the direction the tail point towards is considered at a higher degree and incorrectly predicts the associations that method c predicts correctly. fig. 7 balloons with differing styles to the norm. fig. 9 balloons, which are pointing to other balloons. 9 damitha s. lenadora#1, rakhitha r. ranathunge#2, chamath samarawickrama#3, yumantha de silva#4, indika perera#5, anuradha welivita#6 april 2020 international journal on advances in ict for emerging regions analysis of the results of the four associations reveals the presence of balloons that point toward other balloons in the ebdtheque v3 dataset, as shown in fig. 9. these are situations where the utterances a character makes are spread across several speech balloons. when combined as a single balloon, it possible to observe an increase in accuracy in the methods a, b, c, and d respectively to 94.26%, 87.8%, 95.41% and 95.7%. another situation where the latter three methods gave incorrect results were when the tails of the balloons did not point towards characters directly, but rather vaguely, as shown in fig. 10. here, the highlighted balloon points towards the edge of its associated character and not directly. when analysing the results of the observations of the association presence selector via balloon tail or dissimilarity to a square shape, as well as the character association methods as a whole, a way of measuring the performance would be the collective correct decisions, which are the correct character associations and the correctly detected non-associated balloons, divided by the total number of balloons. a point to note would be that for the balloons that did not have a tail but are dissimilar to a square shape was associated to characters utilizing the minimum euclidean distance from the centroid of the balloon to the centroid of the associable characters. furthermore, in addition to the balloons which were deemed not to possess an association by the tail selection, certain balloons were deemed not to have an association by the association methods themselves in situations where an identifiable associated character could not be located. the collective performances using the unmodified ebdtheque v3 dataset, of the selection by tail or dissimilarity to a square shape, and method a, method b, method c and method d, were 93.52%, 89.18%, 95.47% and 95.84% respectively. these results, which are not at the level of perfection, are acceptable to a certain degree with the method based on method d being the best. 
however, a critical point of concern would be for speech balloons that possess multiple associations and other different styles to that of the norm. the methods contained in the evaluation done prior does not consider such situations. one cannot ignore these situations as comics are comprised of the artwork of the authors. this art can be abstract and can be used to give different meanings from author to author as well as require intuition rather than following a simple set of rules or patterns to grasp its true meaning. thus, the need for the incorporation of learning methods, as well as different learning models per art style seem to be hinted through these observations. in addition, the requirement of the detection of panels, characters, speech balloons and the tails of the speech balloons would also cause a considerable limitation if extracted from a computer-based system instead of directly from human annotations. this is due to each said detection practically possessing a considerable chance of being incorrect as well as imprecise. this would collect more and more and leave an overall low accuracy at the end of the association process. thus, a high accuracy must be achieved in this detection or circumventing the requirement would be a necessity. vii. style analysis in the process of analysing semantic content of comics, determining the reading style of panels is a crucial step. comics from different origins have shown to have differences in the art style, formatting, panelling as well as the reading style. to determine the reading style and the order, a human reader should take cues from the art style, formatting, panelling, and the origin of a comic. furthermore, comics which follow a reading style can have diverse art styles depending on multiple attributes like the region, artist, published year, etc. hence, a significant level of expertise is required to determine the reading order of a comic. reading style of a comic can mainly take one of the following three flavours. left-to-right, right-to-left or top-tobottom. generally, the left-to-right reading style is practised in western comics and in chinese manhua. right-to-left reading style is heavily popularized by manga, which is the japanese version of comics. the top-to-bottom reading style is adopted by korean manhwa and webtoons [41]. fig. 11 shows the aforementioned reading styles in comics. here, the numbers in the panels indicate the order in which they should be read. distinguishing between the different art styles, formatting and panelling in comics that belong to manga, manhwa, manhua and western origins help to determine the reading style of the comics, whether it’d be left-to-right, right-to-left or top-to-bottom. a. the difference in art style the art style in a comic refers to the specific ways the colouring, the character drawing and the background is done. in terms of colour, manga is generally black and white although the cover pages and the first pages of chapters are usually coloured. manhwa, manhua and western comics are generally coloured since they are digitally drawn and are available in digital formats. style of character drawing contributes immensely to distinguish the art styles between manga, manhwa, manhua and western comics. here, western comics differ from the rest considerably with its wide variety where differences between manga, manhua and manhwa are rather minute. 
chinese manhua mostly uses slender characters with largely drawn muscles and narrow waists, where japanese manga uses more slender characters with no overgrown muscle types [42]. korean manhwa is a combination of manhua and manga with the focus lying on drawing the characters realistically [41]. fig. 10 highlighted balloon pointing to its respective character vaguely. extraction of semantic content and styles in comic books 10 international journal on advances in ict for emerging regions april 2020 manga, manhwa and manhua also use more of a cinematic style than western comics, portraying characters in dramatic angles more in sync with a film than a comic book [43]. the background style is another factor which aids to distinguish between different comic styles. manhua and western comics differentiate from the rest with their more detailed backgrounds. due to its limitation of not using colour, manga usually has much simpler backgrounds compared to others. b. the difference in formatting and panelling in western comics, the establishing shot is generally centred on occupying the first scene in the comic. manga, however, places its establishing shot at the bottom of the page. manga, manhua and manhwa structures their scenes frameby-frame, representing a snapshot of the action and in sync with the dialogue. western comics are graphic novels, and as such, the stories and visuals don't necessarily sync with the dialogue and visual action [43]. c. experiments and analysis of results the defining characteristics of manga, manhwa, manhua and western comic genres were used to identify to which genre the comic belongs to. the reading styles were then inferred as left-to-right for western comics and manhwa, right-to-left for manga and top-to-bottom for manhua. a custom dataset with comic images belonging to manga, manhwa, manhua and western genres was used to experiment with the above approach. three approaches that try to determine the reading style of the comics were tested on this dataset. the first approach was to determine the reading style of comic images based on its colour. with the presumption being most of the black and white comics belong to the genre manga, this approach attributed a right-to-left reading style, which is the general reading style for manga comics, to every black and white image. this approach was tested on the dataset, and it yielded an accuracy of 0.89. however, the result can be biased due to the limited dataset and the imbalanced nature of it. the second approach was to classify the comic images into the main genre classes; western, manga, manhwa and manhua, using a neural network model based on the mobilenet v2 [44]. in this approach, the model, trained on the custom dataset, would classify the image to one of the genre classes. afterwards, the reading style is attributed to the image, following the reading style of that genre. with the first step being classifying comics to their genres, this can later be extended to develop genre-specific extraction models. this approach yielded an accuracy of 0.80 on the tested dataset. the third approach was an improvement from the second approach, where the model used for the classification changed whilst the methodology remained unchanged. there, a convolutional neural network model was trained using the custom dataset to classify comic images into their genre classes. this approach yielded an accuracy of 0.89 superseding the performance of the mobilenet v2 based model. 
a summary of the results for the approaches that were tested is presented in table vii. table vii summary of the results approach accuracy colour-based classification 0.89 mobilenetv2_1.00_224 0.80 cnn conv2d 0.89 fig. 11 left-to-right, right-to-left and top-to-bottom reading styles. 11 damitha s. lenadora#1, rakhitha r. ranathunge#2, chamath samarawickrama#3, yumantha de silva#4, indika perera#5, anuradha welivita#6 april 2020 international journal on advances in ict for emerging regions out of the three approaches, the third approach with the cnn model was selected because of its high accuracy and the ability to classify the genres of comics which can be extended to develop genre-specific extraction models in future developments. viii. conclusion the initial component of this research consisted of an analysis regarding the semantic content extraction of comic books with the utilization of object detection techniques. for this, the object detection models; faster r-cnn, ssd mobilenet as well as yolov3 were used. from these models, it was deemed that yolov3 performed the best under the constraint of limited computational power, due to its low memory usage as well as high accuracy in making detections in the test set. afterwards, focusing on text extraction, test results on speech balloon detection with a mask r-cnn model were presented. furthermore, test results on analysing the extracted speech balloons with the use of the tesseract ocr engine were also shown. subsequent to extracting the semantic content in comics, this research presents a method for character-speech balloon association. an evaluation of three methods capable of selecting balloons for an association as well as four methods for predicting associations with characters and speech balloons was performed in this section. with test results for each method, it was concluded that for the selection for an association, selecting balloons that possess tails or are dissimilar to a square shape performed the best. in addition, the best results for character association was given when both the distance from the balloon as well as the distance from the direction the tail of the balloon points towards was taken as a distance measure. finally, after analysing the variations of reading styles in comics, a method is proposed by this research for determining the reading style for different genres of comics. a model-based approach was presented, with a data set that is currently being developed. thus, by combining the various components of this research, it would be possible to yield a promising result for digitising comics. acknowledgement we appreciate the support of the department of computer science and engineering at the university of moratuwa, sri lanka. we are also grateful for the publishers of the public domain datasets and comics used for this research. references [1] t. tanaka, k. shoji, f. toyama and j. miyamichi, “layout analysis of tree-structured scene frames in comic images,” in 20th international joint conference on artificial intelligence, hyderabad, india, 2007. [2] y. wang, y. zhou and z. tang, “comic frame extraction via line segments combination,” in international conference on document analysis and recognition, nancy, france, 2015. [3] m. stommel, l. i. merhej and m. müller, “segmentation-free detection of comic panels,” in computer vision and graphics: proc. int. conf. iccvg, 2012, poland, 2012. [4] x. pang, y. cao, r. w. lau and a. b. 
chan, “a robust panel extraction method for manga,” in proceedings of the 22nd acm international conference on multimedia, orlando, florida, usa, 2014. [5] f. niklas, “image processing binarization,” 2019. [online]. available: http://felixniklas.com/imageprocessing/binarization. [6] “morphological dilation and erosion,” the mathworks inc., 2019. [online]. available: https://www.mathworks.com/help/images/morph ological-dilation-and-erosion.html. [7] c. rigaud, j.-c. burie and j.-m. ogier, text-independent speech balloon segmentation for comics and manga, 2017. [8] h. jomaa, m. awad and l. ghaibeh, “panel tracking for the extraction and the classification of speech balloons,” in international conference on image analysis and processing, genoa, 2015. [9] r. fisher, s. perkins, a. walker and e. wolfart, “image analysis connected component labeling,” 2003. [online]. available: http s://homepages.inf.ed.ac.uk/rbf/hipr2/label.htm. [10] c. rigaud, n. tsopze, j.-c. burie and j.-m. ogier, “robust frame and text extraction from comic books,” hal, no. hal-00841493, 2013. [11] a. ghorbel, j.-m. ogier and n. vincent, “text extraction from comic books,” in grec 2015 eleventh iapr international workshop on graphics recognition, nancy, 2015. [12] m. kass, a. witkin and d. terzopoulos, “snakes: active contour models,” international journal of computer vision, vol. 1, no. 4, p. 321–331, 1988. [13] c. rigaud, j. burie, j. ogier, d. karatzas and j. v. d. weijer, “an active contour model for speech balloon detection in comics,” in 2013 12th international conference on document analysis and recognition, washington, dc, 2013. [14] p. viola and m. j. jones, “robust real-time face detection,” international journal of computer vision, vol. 57, no. 2, pp. 137-154, may 2004. [15] w. sun and k. kise, “detection of exact and similar partial copies for copyright protection of manga,” international journal on document analysis and recognition (ijdar), vol. 16, no. 4, pp. 331-349, december 2013. [16] k. takayama, h. johan and t. nishita, “face detection and face recognition of cartoon characters using feature extraction,” in ieej image electronics and visual computing workshop, kuching, malaysia, 2012. [17] f. s. khan, r. m. anwer, j. v. d. weijer, a. d. bagdanov, m. vanrell and a. m. lopez, “color attributes for object detection,” in ieee conference on computer vision and pattern recognition, providence, ri, usa, 2012. [18] h. n. ho, c. rigaud, j.-c. burie and j.-m. ogier, “redundant structure detection in attributed adjacency graphs for character detection in comics books,” in 10th iapr international workshop on graphics recognition, washington, d.c., usa, 2013. [19] w. sun, j.-c. burie, j.-m. ogier and k. kise, “specific comic character detection using local feature matching,” in 12th international conference on document analysis and recognition, washington, dc, usa, 2013. [20] n.-v. nguyen, c. rigaud and j.-c. burie, “digital comics image indexing based on deep learning,” journal of imaging, 2017. [21] x. qin, y. zhou, z. he, y. wang and z. tang, “a faster r-cnn based method for comic characters face detection,” in 14th iapr international conference on document analysis and recognition, 2017. [22] n.-v. nguyen, c. rigaud and j.-c. burie, “multi-task model for comic book image analysis,” in mmm 2019, thessaloniki, 2019. [23] j. redmon and a. farhadi, “yolo9000: better, faster, stronger,” in ieee conference on computer vision and pattern recognition (cvpr), honolulu, hi, usa, 2017. [24] s. ren, k. he, r. girshick and j. 
sun, “faster r-cnn: towards realtime object detection with region proposal networks,” in tpami, 2017. [25] k. he, g. gkioxari, p. doll´ar and r. girshick, “mask r-cnn,” corr, vol. abs/1703.06870, 2017. [26] google, “tesseract open source ocr engine,” github repository, 2018. [27] c. rigaud, j.-c. burie and j.-m. ogier, “segmentation-free speech text recognition for comic books,” hal, vols. hal-01719619, 2018. [28] c. olah, “understanding lstm networks,” august 2015. [online]. available: http://colah.github.io/posts/2015-08-understanding-lstm s/. [29] t. breuel, a. ul-hasan, m. a. al-azawi and f. shafait, “highperformance ocr for printed english and fraktur using lstm http://felixniklas.com/imageprocessing/binarization https://www.mathworks.com/help/images/morphological-dilation-and-erosion.html https://www.mathworks.com/help/images/morphological-dilation-and-erosion.html https://homepages.inf.ed.ac.uk/rbf/hipr2/label.htm https://homepages.inf.ed.ac.uk/rbf/hipr2/label.htm http://colah.github.io/posts/2015-08-understanding-lstms/ http://colah.github.io/posts/2015-08-understanding-lstms/ extraction of semantic content and styles in comic books 12 international journal on advances in ict for emerging regions april 2020 networks,” in international conference on document analysis and recognition, washington, dc, 2013. [30] c. rigaud, n. l. thanh, j.-c. burie, j.-m. ogier, m. iwata, e. imazu and k. kise, “speech balloon and speaker association for comics and manga understanding,” in 2015 13th international conference on document analysis and recognition (icdar), tunis, tunisia, 2015. [31] c. guérin, c. rigaud, a. mercier, f. ammar-boudjelal, k. bertet, a. bouju, j.-c. burie, g. louis, j.-m. ogier and a. revel, “ebdtheque: a representative database of comics,” in 12th international conference on document analysis, aug 2013, washington d.c., 2013. [32] aizawa-yamasaki lab, “manga109,” [online]. available: http://www.manga109.org. [accessed 29 04 2019]. [33] f. schroff, d. kalenichenko and j. philbin, “facenet: a unified embedding for face recognition and clustering,” in ieee computer society conference on computer vision and pattern recognition, 2015. [34] m. iyyer, v. manjunatha, a. guha, y. vyas, j. boyd-graber, h. d. iii and l. davis, “the amazing mysteries of the gutter: drawing inferences between panels in comic book narratives,” in ieee conference on computer vision and pattern recognition, 2017. [35] “tensorflow/models/research/object_detection/,” github, [online]. available: https://github.com/tensorflow/models/tree/master/research/ object_detection. [accessed 25 february 2019]. [36] w. liu, d. anguelov, d. erhan, c. szegedy, s. reed, c.-y. fu and a. c. berg, “ssd: single shot multibox detector,” in european conference on computer vision, amsterdam, the netherlands, 2016. [37] j. redmon, s. divvala, r. girshick and a. farhadi, “you only look once: unified, real-time object detection,” in ieee conference on computer vision and pattern recognition, las vegas, nv, usa, 2016. [38] j. redmon and a. farhadi, yolov3: an incremental improvement, 2018. [39] w. abdulla, “mask r-cnn for object detection and instance segmentation on keras and tensorflow,” github repository, 2017. [40] k. he, x. zhang, s. ren and j. sun, “deep residual learning for image recognition,” in ieee conference on computer vision and pattern recognition, las vegas, 2016. [41] godanimereviews, “difference and origin of manga, manhua and manhwa,” 19 august 2018. [online]. 
available: https://godanimerevie ws.com/difference-origin-manga-manhua-manhwa/. [accessed 28 april, 2019]. [42] l. graillat, “america vs japan: the influence of american comics on manga,” refractory: a journal of entertainment media, vol. 10, 2006. [43] c. sager, “what's up with manga? a comics fan's deep dive,” 19 march 2012. [online]. available: http://geekout.blogs.cnn.com/2012/03/29/ whats-up-with-manga-a-comics-fans-deep-dive/. [accessed 28 april 2019]. [44] m. sandler, a. howard, m. zhu, m. zhu and l.-c. chen, “mobilenetv2: inverted residuals and linear bottlenecks,” in ieee/cvf conference on computer vision and pattern recognition, salt lake city, usa, 2018. http://www.manga109.org/ https://github.com/tensorflow/models/tree/master/research/object_detection https://github.com/tensorflow/models/tree/master/research/object_detection https://godanimereviews.com/difference-origin-manga-manhua-manhwa/ https://godanimereviews.com/difference-origin-manga-manhua-manhwa/ http://geekout.blogs.cnn.com/2012/03/29/whats-up-with-manga-a-comics-fans-deep-dive/ http://geekout.blogs.cnn.com/2012/03/29/whats-up-with-manga-a-comics-fans-deep-dive/ international journal on advances in ict for emerging regions 2011 04 (02) : 24 36 trust management in cloud computing: a critical review mohamed firdhous, osman ghazali and suhaidi hassan 1 abstract—cloud computing has been attracting the attention of several researchers both in the academia and the industry as it provides many opportunities for organizations by offering a range of computing services. for cloud computing to become widely adopted by both the enterprises and individuals, several issues have to be solved. a key issue that needs special attention is security of clouds, and trust management is an important component of cloud security. in this paper, the authors look at what trust is and how trust has been applied in distributed computing. trust models proposed for various distributed system has then been summarized. the trust management systems proposed for cloud computing have been investigated with special emphasis on their capability, applicability in practical heterogonous cloud environment and implementabilty. finally, the proposed models/systems have been compared with each other based on a selected set of cloud computing parameters in a table. index terms—cloud computing, trust, trust management, trust models i. introduction istributed systems like peer-to-peer systems, grid, clusters and cloud computing have become very popular among users in the recent years. users access distributed systems for different reasons such as downloading files, searching for information, purchasing goods and services or executing applications hosted remotely. with the popularity and growth of distributed systems, service providers make new services available on the system. all these services and service providers will have varying levels of quality and also, due to the anonymous nature of the systems, some unscrupulous providers may tend to cheat unsuspecting clients. hence it becomes necessary to identify the quality of services and service providers who would meet the requirements of the customers [1]. manuscript received on february 27, 2011. accepted july 16, 2011. mohamed firdhous is a senior lecturer attached to the faculty of information technology, university of moratwua, sri lanka. (email: firdhous@itfac.mrt.ac.lk & mfirdhous@internetworks.my) dr. 
osman ghazali is a senior lecturer and the head of school of computing, college of arts and sciences of the universiti utara malaysia. (email: osman@uum.edu.my) prof. suhaidi hassan is an associate professor and the asst. vice chancellor of the college of arts and sciences of the universiti utara malaysia. he is the team leader of the internetworks research group. (email: suhaidi@uum.edu.my) in this paper the authors take a look at the trust and trust management systems along with the trust models developed for distributed systems. then a critical look at the trust development and management systems for cloud computing systems reported in literature in the recent times has been taken with special reference to the pros and cons of each proposal. ii. cloud computing cloud computing has been called the 5 th utility in line of electricity, water, telephony and gas [2]. the reason why cloud has been nomenclature with such a name is that cloud computing has been changing the way computer resources have been used up to now. until the development of cloud computing, computing resources were purchased outright or leased in the form of dedicated hardware and software resources. cloud computing has brought a paradigm change in how computing resources have been purchased. with the advent of cloud computing, users can use the services that have been hosted on the internet without worrying about whether they have been hosted or managed in such a manner that the customers have to pay only for the services they consumed as in the case of making use of other services. cloud providers host their resources on the internet on virtual computers and make them available to multiple clients. multiple virtual computers can run on one physical computer sharing the resources such as storage, memory, the cpu and interfaces giving the feeling to the client that each client has his own dedicated hardware to work on. virtualization thus gives the ability to the providers to sell the same hardware resources among multiple clients. this sharing of the hardware resources by multiple clients help reduce the cost of hardware for clients while increasing profits of providers. accessing or selling hardware in the form of virtual computers is known as infrastructure as service (iaas) in the cloud computing terminology [3]. once a client has procured infrastructure from a service provider, he is free to install and run any operating system platform and application on it. other kinds of services that are made available via the cloud computing model are platform as a service (paas) and software as a service. figure 1, shows the architecture of a typical cloud computing system. under paas, the development platform in the form of an operating system has been made available where customers can configure the environment to suit their requirements and install their development tools [5]. paas helps developers d trust management in cloud computing: a critical review 25 september 2011 international journal on advances in ict for emerging regions 04 develop and deploy applications without the cost of purchasing and managing the underlying hardware and software. paas provides all the required facilities for the complete life cycle of building and delivering web applications. 
thus paas usually offers facilities for application design, application development, testing, deployment and hosting as well as application services such as team collaboration, web service integration and marshalling, database integration, security, scalability, storage, persistence, state management, application versioning, application instrumentation and developer community facilitation. saas is the cloud model where an application hosted by a service provider on the internet is made available to users in a ready to use state. sass eliminates the requirement of installation and maintenance of the application in the user‘s local computer or server in his premises [5]. saas has the advantage of being accessible from any place at any time, no installation or maintenance, no upfront cost, no licensing cost, scalability, reliability and flexible payment schemes to suit the customer‘s requirements. iii. trust and trust management the trust and reputation have their origin in the social sciences that study the nature and behavior of human societies [6]. trust has been studied by researchers in diverse fields such as psychology, sociology and economics [7]. psychologists study trust as a mental attitude and focus on what happens in a person‘s mind when he/she trusts or distrusts someone [8]. based on this notion, several cognitive trust models have been developed [9-12]. sociologists approach to trust as a social relationship between people. social context of trust has been commonly employed in multi agent systems and social networks [7,13-14]. the similarity between multi agent system and a social network are exploited in these works as agents and people behave in a similar fashion interacting with, gathering information from and modeling each other for developing trust in each other. economists perceive trust in terms of utility [15]. game theory has been one of the most popular tools used by experts in the computer field to study how users develop trust using different strategies [16-17]. the prisoner‘s dilemma is the commonly used scenario to study this scenario [18-19]. researchers in computer sciences have exploited the benefit of all these studies as they provide vital insight into human behavior under various circumstances [13, 20-21]. the role of trust and reputation in open, public distributed systems such as e-commerce, peer to peer networks, grid computing, semantic web, web services and mobile networks have been studied by several researchers [22-25]. although the rich literature available on trust from diverse fields is of great benefit to computer scientists, it has the drawback of presenting a complex and confusing notion for trust. this is mainly due to the reason that there is no common agreement of a single definition for what trust is? it can be seen that different researchers have defined trust as attitudes, beliefs, probabilities, expectations, honesty and so on. even if different disciplines and researchers look at trust from different angles, it is possible to identify some key factors that are common to everything. they are;  trust plays a role only when the environment is uncertain and risky.  trust is the basis based on which certain decisions are made.  trust is built using prior knowledge and experience.  trust is a subjective notion based on opinion and values of an individual.  trust changes with time and new knowledge while experience will have overriding influence over the old ones.  trust is context-dependent.  trust is multi-faceted. 
mcknight and chervany have identified 16 characteristics of trust and grouped them under five groups. they are,  competence; competent, expert, dynamic  predictability; predictable  benevolence; good (or moral), good-will benevolent (caring), responsive  integrity; honest, credible, reliable, dependable  other; open, careful (or safe), shared understanding, personally attractive [8]. de oliveira and maziero have classified trust relations into hierarchical trust, social groups and social networks. fig. 1. cloud computing architecture 26 mohamed firdhous, osman ghazali and suhaidi hassan international journal on advances in ict for emerging regions 04 september 2011 hierarchical trust considers all relationships in a hierarchical manner and represented by a tree organization where nodes represent individuals and edges represent the trust degrees between the pair of nodes. any two nodes can define a trust degree between them through transitivity through other nodes [26]. zhang et al., have classified the trust functions based on the following four dimensions [27].  subjective trust vs. objective trust  transaction-based vs. opinion-based  complete information vs. localized information  rank-based vs. threshold-based capability of an entity's trustworthiness being measured objectively against a universal standard, results in objective trust. if the trust being measured depends on an individual‘s tastes and interest, the resulting trust is called subjective trust. decisions made based on the individual transactions and their results is known as transaction based trust, whereas the trust built based on just opinion of the individuals, is opinion based trust. if the trust building operation requires information from each and every node, it is called, complete information and it is known as either global trust function or complete trust function. if the information collected only from one‘s neighbors, it is called, localized information trust function. if the trust worthiness of an entity is ranked from the best to worst, it is rank based trust whereas the trust declared yes or no depending on? preset trust threshold is known as threshold based trust. iv. trust models several models have been developed by researchers for the purpose of building practical trust systems in distributed systems. this section takes a brief look at some of the commonly used trust models. a. cuboidtrust cuboidtrust is a global reputation-based trust model for peer to peer networks. it takes three factors namely, contribution of the peer to the system, peer‘s trustworthiness in giving feedback and quality of resources to build four relations. then it creates a cuboid using small cubes whose coordinates (x,y,z) where z – quality of resource, y – peer that stores the value and x – the peer which rated the resource and denoted by px,y,z. the rating is binary, 1 indicating authentic and (–1) indicating inauthentic or no rating. global trust for each peer has been computed using power iteration of all the values stored by the peers [28]. b. eigentrust eigentrust assigns each peer a unique global trust value in a p2p file sharing network, based on the peer‘s history of uploads. this helps to decrease the downloading of inauthentic files. local trust value sij has been defined sij = sat(i,j) − unsat(i,j), where sat(i,j) denotes the satisfactory downloads by i from j and unsat(i,j) is the unsatisfactory downloads by i from j. power iteration is used to compute the global trust for each peer [29]. c. 
c. bayesian network based trust management (bnbtm)

bnbtm uses multidimensional, application specific trust values, and each dimension is evaluated using a single bayesian network. the distribution of trust values is represented by beta probability distribution functions based on the interaction history [30]. the trust value of peer i is given by the expected value of the corresponding beta distribution (1), whose two parameters are the numbers of past interactions with a given outcome and with its converse respectively. the outcomes considered are shipping goods, shipping lower quality goods and not shipping any goods, each paired with its converse.

d. grouprep

grouprep is a group based trust management system. it classifies trust relationships into three levels, namely trust relationships between groups, between groups and peers, and between peers only [31]. the trust of group i held by group j is given by equation (2) as a function of the utility and the cost assigned by nodes in group j to nodes in group i. trust inferred through other groups is defined as the minimum trust value along the most trustworthy reference path.

e. antrep

the antrep algorithm is based on swarm intelligence. in this algorithm, every peer maintains a reputation table similar to a distance vector routing table. the reputation table differs slightly from the routing table in that (i) each peer in the reputation table corresponds to one reputation content, and (ii) the metric is the probability of choosing each neighbor as the next hop, whereas in the routing table it is the hop count to destinations. both forward ants and backward ants are used for finding reputation values and propagating them. if the reputation table has a neighbor with the highest reputation, a unicast ant is sent in that direction. if no preference exists, broadcast ants are sent along all the paths [32]. once the required reputation information is found, a backward ant is generated. as this ant travels back, it updates the reputation tables in each node on its way.

f. semantic web

zhang et al. have presented a trust model which searches all the paths that connect two agents in order to compute the trustworthiness between them. for each path the ratings associated with each edge are multiplied, and finally all the paths are added up to calculate the final trust value [33]. the weight of path i, wi, is calculated using equation (3), where n is the number of paths between agents p and q, di is the number of steps between p and q on the i-th path, and mi is q's immediate friend or neighbor on the i-th path (m being the set of q's friends or neighbors). this gives a higher weight to shorter paths. if agent p and agent q are friends or neighbors, then p's trust in q can be computed directly. otherwise it is computed from the trust along the connecting paths using equation (4), where the reliability factor denotes the degree to which one agent believes in another agent's words or opinions.

g. global trust

several authors have presented methods that compute an improved global trust value for selecting a trusted source peer in peer to peer systems [34-36]. the global trust value for node i is defined as ti = Σk cki tk (5), where cki is the local trust value from peer k towards peer i and tk is the global trust value of peer k.
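the beta-distribution idea behind bnbtm can be illustrated with a small python sketch; the uniform prior (alpha = beta = 1), the single dimension shown and the toy interaction history are assumptions made for illustration and may differ from the exact update rule used in [30].

```python
from dataclasses import dataclass

@dataclass
class BetaTrust:
    """illustrative beta-reputation counter for one trust dimension."""
    alpha: float = 1.0   # count of interactions with the positive outcome (plus prior)
    beta: float = 1.0    # count of interactions with the converse outcome (plus prior)

    def update(self, positive: bool) -> None:
        # record one more interaction in this dimension
        if positive:
            self.alpha += 1
        else:
            self.beta += 1

    def trust(self) -> float:
        # expected value of beta(alpha, beta), used here as the trust value
        return self.alpha / (self.alpha + self.beta)

# one counter per outcome type, e.g. "shipping goods"; the history is made up
shipping = BetaTrust()
for outcome in [True, True, False, True]:
    shipping.update(outcome)
print(round(shipping.trust(), 3))   # 0.667 with this toy history
```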
h. peer trust

peertrust is a reputation-based trust supporting framework. it includes a coherent adaptive trust model for quantifying and comparing the trustworthiness of peers based on a transaction-based feedback system. it introduces three basic trust parameters, namely the feedback a peer receives from other peers, the total number of transactions a peer performs and the credibility of the feedback sources, together with two adaptive factors, the transaction context factor and the community context factor, in computing the trustworthiness of peers. it then combines these factors to compute a general trust metric [37].

i. patrol-f

patrol-f incorporates many important concepts for the purpose of computing peer reputation. the main components used in computing peer trust are: direct experiences and reputation values, the node credibility to give recommendations, the decay of information with time based on a decay factor, first impressions and a node system hierarchy [38]. it uses three fuzzy subsystems:
1. the first is used to set the importance factor of an interaction and related decisions. deciding which data is critical or indispensable, or which data is needed more quickly, is a concept close to humans that fuzzy logic can model.
2. then there is the region of uncertainty, where an entity is not sure whether to trust or not (when the reputation of a host lies between the absolute mistrust level φ and the absolute trust level θ). fuzzy techniques are effectively applied in this region.
3. finally, for the result of interaction (ri) value, fuzzy logic can be used to capture the subjective and humanistic four level concept of "good" or "better" and "bad" or "worse" interactions. ri is the result of several concepts effectively combined to produce a more representative value.

the decay factor τ is calculated based on the difference between a host's ri values over successive interactions.

j. trust evolution

wang et al. have presented a trust evolution model for p2p networks. this model uses two critical dimensions, experience and context, to build trust relationships among peers. it builds two kinds of trust, direct trust and recommendation trust, and quantifies trust within the interval [0,1] [39]. direct trust (dt) between two peers is computed using the last n interactions between those entities. recommendation trust is calculated using recommendations from other peers and the previous interactions with the recommending peers.

k. time-based dynamic trust model (tdtm)

tdtm is an ant colony based system that identifies the pheromone with the trust and the heuristic with the distance between two nodes. the trust value calculated by this model depends on the frequency of interaction: the trust value increases with frequent interactions and decreases as the interactions go down [40]. the trust-pheromone between nodes i and j at time (t+1) is defined by equation (6) as the current pheromone level, diluted by the trust dilution factor ρ, plus an additional intensity στij(t) contributed by each inter-operation between the entities; στij(t) takes a positive value if i and j interact at time t and is 0 otherwise (7). if the trust value pij(t) between nodes i and j at time t is greater than a certain threshold r, the nodes can validate each other's certificates, otherwise they cannot.
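the dilute-and-reinforce behaviour described for tdtm can be sketched as follows in python; since the exact form of equations (6) and (7) is not recoverable from the text above, the dilution rule, the constant additional intensity and the threshold value are assumptions made for illustration only.

```python
def update_trust_pheromone(tau, interacted, rho=0.1, delta=0.5):
    """one illustrative tdtm-style update step.
    tau        -- current trust-pheromone between nodes i and j
    interacted -- whether i and j interacted during this time step
    rho        -- trust dilution factor, assumed to lie in [0, 1]
    delta      -- additional intensity per interaction (assumed constant)"""
    diluted = (1.0 - rho) * tau           # trust fades when nothing happens
    extra = delta if interacted else 0.0  # sigma-tau term: reinforcement on interaction
    return diluted + extra

def can_validate(trust, threshold=0.6):
    # nodes validate each other's certificates only above the threshold r
    return trust >= threshold

tau = 0.5
for step_interaction in [True, True, False, False, True]:  # made-up interaction trace
    tau = update_trust_pheromone(tau, step_interaction)
print(round(tau, 3), can_validate(tau))
```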
l. trust ant colony system (tacs)

tacs is based on the bio-inspired ant colony system algorithm. in this model pheromone traces are identified with the amount of trust a peer has in its neighbors when supplying a specific service. it computes and selects both the most trustworthy node to interact with and the most trustworthy path leading to that peer. each peer needs to keep track of the current topology of the network, as every peer has its own pheromone traces for every link. ants travel along every path searching for and building the most trustworthy path leading to the most reputable server [41]. ants stop the search once they find a node that offers the service requested by the client and the pheromone traces belonging to the current path leading to it are above a preset threshold; otherwise they carry on, selecting a neighbor that has not been visited yet.

m. trummar (trust model for mobile agent systems based on reputation)

trummar is a general model for the calculation of reputation values and the determination of trust decisions. trummar identifies three types of nodes from which a host can receive trust values: neighbors, friends and strangers. neighbors are hosts on the host's own network that are under the same administrative control, friends are hosts from different networks that are under different but trusted administrative control, and strangers are hosts that are willing to volunteer information but are neither neighbors nor friends [42]. the trust value of y as seen by x is calculated as in equation (8), as a weighted combination of the reputation value last calculated by x (modified to account for the time elapsed), the weighted sum of reputations reported by neighbors, the weighted sum of reputations reported by friends and the weighted sum of reputations reported by strangers. the factors i, j and l are weighing factors which depend on the reputation of the individual neighbors, friends and strangers in the host space, respectively, while a, b, c and d are weighing factors for the contributions of self, neighbors, friends and strangers in the agent space, with a > b > c > d. reputation values are restricted to values between 0 and k.
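the weighted combination described for trummar (equation (8)) can be illustrated with the following python sketch; the decay applied to the previously computed reputation, the numeric weights satisfying a > b > c > d and the cap k are assumptions chosen for the example, not values taken from [42].

```python
def trummar_like_reputation(last_rep, elapsed, neighbor_reps, friend_reps, stranger_reps,
                            a=0.4, b=0.3, c=0.2, d=0.1, half_life=10.0, k=10.0):
    """illustrative trummar-style combination of reputation sources."""
    decayed_self = last_rep * 0.5 ** (elapsed / half_life)   # account for the time lapsed

    def weighted_mean(values):
        # equal weights stand in for the per-host factors i, j and l
        return sum(values) / len(values) if values else 0.0

    rep = (a * decayed_self
           + b * weighted_mean(neighbor_reps)
           + c * weighted_mean(friend_reps)
           + d * weighted_mean(stranger_reps))
    return max(0.0, min(k, rep))   # reputation values stay within [0, k]

# toy values for one reputation query about host y
print(round(trummar_like_reputation(8.0, 5.0, [7.0, 9.0], [6.0], [4.0, 5.0]), 3))
```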
n. patrol (comprehensive reputation-based trust model)

patrol is a general purpose reputation based trust model for distributed computing. patrol is an enhancement of trummar. this model is based on multiple factors such as reputation values, direct experiences, trust in the recommender, time dependence of the trust value, first impressions, similarity, popularity, activity, cooperation between hosts and the hierarchy of host systems. the decision to interact with another host depends on two factors, namely the trust in the competence of a host and the trust in the host's credibility to give trusted advice. the trust in the competence of a host is calculated from direct interactions and is the confidence that the other host would be able to complete the intended task to the initiator host's expectations. the trust in a host's ability to give trusted advice is the confidence that the host gives consistent and credible advice and feedback. the overall trust value is a combination of the weighted values calculated independently for the different factors [43]. the operation of the model is as given below:
1. host x wants to interact with host y.
2. x calculates the time since it last interacted with y. if this time is smaller than a predetermined threshold, it decays the stored trust value and compares it against a predetermined trust threshold; if the value is larger than the threshold, it interacts with y, otherwise not.
3. if the last interaction time was larger than the threshold, x involves other trusted hosts in its calculation of the trust value for y.
4. the queried hosts decay their stored trust values for y and send them along with their reputation vectors.
5. x calculates the trust for y and checks it against the threshold. if the trust value is greater than the threshold, it interacts with y, otherwise no interaction takes place.

o. meta-tacs

meta-tacs is an extension of the tacs algorithm developed by the authors of [41]. they have extended the tacs model by optimizing the working parameters of the algorithm using genetic algorithms [44].

p. catrac (context-aware trust- and role-based access control for composite web services)

role-based access control (rbac) and trust-based access control (tbac) have been proposed to address threats to security in single web service scenarios, but these solutions fail to provide the required security level in situations involving composite web services. catrac has been proposed as a security framework for composite web services [45]. catrac combines both rbac and tbac in order to arrive at an optimum solution. three conditions must be satisfied to gain access to a specific web service:
- the client's attributes must be authenticated by the web service provider.
- the client's global role must be valid and contain the right permissions.
- the client's trust level must be equal to or greater than the threshold level set for the particular service.

a trusted third party called the role authority issues, signs and verifies the roles assigned to the clients. trust levels are expressed as a vector ranging from 0 to 10, indicating fully distrusted to fully trusted respectively. five (5) indicates a neutral or uncertain level, which is commonly assigned to new clients. catrac is made up of three entities, namely the role authority, servers and clients. clients accumulate trust points when their behavior is considered good; otherwise they lose trust points. also, a client's trust level decays gradually towards the neutral value with time if no interaction takes place. the trust level is decayed using equation (9) if the current trust level is above the neutral trust level, and using equation (10) otherwise, where the quantities involved are the decayed trust level for client c, the current trust level for client c, the neutral trust level, the time elapsed t and a constant memory factor (memos).
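the gradual decay of a catrac trust level toward the neutral value can be sketched as below; equations (9) and (10) are not fully recoverable from the text, so a single symmetric exponential pull toward the neutral level is assumed here, with the memory factor and the elapsed time as the only inputs.

```python
import math

def decay_trust_level(current, neutral=5.0, elapsed=0.0, memory=30.0):
    """illustrative decay of a catrac-style trust level toward the neutral value.
    `memory` stands in for the constant memory factor and `elapsed` for the time t;
    the exact formulae (9) and (10) in [45] may differ from this assumed form."""
    weight = math.exp(-elapsed / memory)        # 1 = no decay, 0 = fully decayed
    return neutral + (current - neutral) * weight

# a trusted client (8/10) and a distrusted client (2/10) both drift toward 5 over time
for start in (8.0, 2.0):
    print([round(decay_trust_level(start, elapsed=t), 2) for t in (0, 15, 60)])
```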
q. bayesian network-based trust model

the bayesian network-based trust model computes trust values by combining multiple input attributes [46]. in this model, the different capabilities of providers, such as the type of the file, the quality of the file and the download speed, are taken into account. it also looks at the contextual representation of trust values: if two agents compute trust values using the same criteria, they can trust each other's recommendations, whereas if the agents use different criteria they may not trust each other's recommendations even if both are truthful. in this system each peer, identified as an agent, develops a naïve bayesian network for each provider it has interacted with. each bayesian network has a root node t with two branches named "satisfying" and "unsatisfying", denoted by 1 and 0, respectively. the agent's overall trust in the provider's competencies is represented by p(t=1), which is the ratio of interactions with satisfactory results out of all the interactions with the same provider. on the other hand, p(t=0) is the ratio of unsatisfactory results under the same criteria. hence:

p(t=1) + p(t=0) = 1 (11)

depending on the results of the previous interactions, the agent creates a conditional probability in the form of p(file type = "music" | t = 1) or p(download speed = "high" | t = 1) for each quality attribute such as file type, file quality and speed. these conditional probability values are stored in a table called the conditional probability table (cpt). finally, the provider's trustworthiness in different aspects, such as p(t = 1 | file type = "music" and download speed = "high"), is computed by combining the conditional probability values stored in the cpt using the bayes rule. this combined trustworthiness value is the overall trust score of the provider for the given attribute(s) or aspect(s).

the models discussed above have been proposed for different types of distributed systems such as clusters, grids and wireless sensor networks, but none of them has been tested in a cloud computing environment. hence an extensive evaluation of these models needs to be carried out to understand their advantages and disadvantages for use in cloud computing. the authors propose to carry out this kind of evaluation in future work. the next section takes an in depth look at the trust models proposed for cloud computing.

v. trust in cloud computing

security is one of the most important areas to be handled in the emerging area of cloud computing. if security is not handled properly, the entire area of cloud computing would fail, as cloud computing mainly involves managing personal, sensitive information in a public network. security from the service provider's point of view also becomes imperative, in order to protect the network and its resources and so improve their robustness and reliability. trust management that models the trust placed on the behavior of the elements and entities would be especially useful for the proper administration of cloud systems and cloud services. several leading research groups in both academia and industry are working in the area of trust management in cloud computing. this section takes an in depth look at the recent developments in this area with the objective of identifying and categorizing them for easy reference.

khan and malluhi have looked at trust in the cloud system from a user's perspective. they analyze the issues of trust from what a cloud user would expect with respect to their data in terms of security and privacy. they further discuss what kind of strategy the service providers may undertake to enhance the trust of the user in cloud services and providers. they have identified control, ownership, prevention and security as the key aspects that decide a user's level of trust in services. diminishing control and lack of transparency have been identified as the issues that diminish user trust in cloud systems. the authors predict that remote access control facilities for the users' resources, transparency with respect to the cloud providers' actions in the form of automatic traceability facilities, certification of cloud security properties and capabilities through an independent certification authority and the provision of a security enclave for users could be used to enhance the trust of users in the services and service providers [47].
zhexuan et al., have taken a look at the security issues saas might create due to the unrestricted access on user data given to the remotely installed software [48]. the authors have presented a mechanism to separate software from data so that it is possible to create a trusted binding between them. the mechanism introduced involves four parties namely the resource provider, software provider, data provider and the coordinator. the resource provider hosts both data and software and provides the platform to execute the software on data. the software provider and data provider are the owners of the software and data respectively. the coordinator brings the other parties together while providing the ancillary services such as searching for resources and providing an interface to execute the application on the data. the operation of the model is as follows: software provider and data provider upload their resources to the resource provider. these resources will be encrypted before stored and the key will be stored in the accountability vault module of the system. a data provider searches for and finds the required software through a coordinator and then runs the software on the data uploaded to the resource provider‘s site. once the execution has started an execution reference id is generated and given to the data provider. when the execution of the software is over, the results are produced only on the data provider‘s interface which can be viewed, printed or downloaded. data provider will then pay for the service that will be split between the software provider and resource provider. an operation log has been created and posted to the software provider without disclosing the data provider‘s identity or the content on which software was run. this helps the software provider know that his software has been used and the duration of use. even though the authors claim that this model separates the software and data, there is no assurance that the software cannot make a copy while the data is being processed as only the algorithm or description of the software is provided to the data owner. without the source code, there is no assurance that the code will not contain any malicious code hidden inside. also, since the software runs on data owner‘s rights and privileges, the software would have complete control over data. this is a security threat and the audit trail even if it is available, will not detect any security breaches. the authors do not address the question of trust on the proposed platform as this would be another application or service hosted on the cloud. both application providers and data providers need some kind of better assurance as now they are entrusting their data and software to a third party software. sato et al., have proposed a trust model of cloud security in terms of social security [49]. the authors have identified and named the specific security issue as social insecurity problem and tried to handle it using a three pronged approach. they have subdivided the social insecurity problem in to three sub areas, namely; multiple stakeholder problem, open space security problem and mission critical data handling problem. the multiple stakeholder problem addresses the security issues created due to the multiple parties interacting in the cloud system. as per the authors, three parties can be clearly identified. they are namely, the client, the cloud service providers and third parties that include rivals and stakeholders in business. 
the client delegates some of the administration/operations to cloud providers under a service level agreement (sla). even if the client would like to apply to the delegated resources the same type of policies it would apply if the resources were hosted on site, the provider's policy may differ from that of the client; the providers are bound only by the sla signed between the parties, and the sla plays the role of glue between the policies. the authors also opine that once the data is put in the cloud it is open for access by third parties once they are authenticated by the cloud provider. the open space security problem addresses the loss of control over where the data is stored and how it is physically managed once control of the data is delegated to the cloud provider. they advise encrypting the data before transferring it, which converts the data security problem into a key management problem, as the keys used for encryption/decryption must now be handled properly. the mission critical data handling problem looks at the issue of delegating the control of mission critical data to a service provider. they advise not to delegate control of this data but to keep it in a private cloud in a hybrid setup, where the organization has unhindered control. however, setting up a private cloud may not be an option for small and medium sized organizations due to the high costs involved; hence enhancing the security of the public cloud is the only option that serves everybody. the authors have developed a trust model named the 'cloud trust model' to address the problems raised above. two more trust layers have been added to the conventional trust architecture, named the internal trust layer and the contracted trust layer. the internal trust layer acts as the platform on which the entire trust architecture is built. it is installed in the in house facilities and is hence under the control of the local administration. id and key management are handled under the internal trust layer, and any data that is considered critical or needs extra security must be stored under this layer. contracted trust has been defined as the trust enforced by an agreement. a cloud provider places its trust upon the client based on the contract, which is made up of three documents known as the service policy/service practice statement (sp/sps), the id policy/id practice statement (idp/idps) and the contract. the level of trust required can be negotiated by the parties depending on the level of security needed for the data. a cloud system thus installed is called a secure cloud by the authors.

li et al. propose a domain-based trust model to ensure the security and interoperability of cloud and cross-cloud environments, and a security framework with an independent trust management module on top of traditional security modules [50]. they also put forward some trust based security strategies for the safety of both cloud customers and providers based on this security model.

a cloud trust model based on family gene technology, which is fundamentally different from public key infrastructure based trust models, has been proposed by wang et al. the authors have studied basic operations such as user authentication, authorization management and access control, and proposed a family-gene based model for cloud trust (fbct) integrating these operations [51-52].
manuel et al. have proposed a trust model that is integrated with the care resource broker [53]. the proposed trust model can support both grid and cloud systems. the model computes trust using three main components, namely a security level evaluator, a feedback evaluator and a reputation trust evaluator. security level evaluation is carried out based on the authentication type, the authorization type and the self security competence mechanism; multiple authentication, authorization and self security competence mechanisms are supported, and depending on the strength of the individual mechanism, different grades are assigned to the trust value. feedback evaluation goes through three different stages, namely feedback collection, feedback verification and feedback updating. the reputation trust evaluator computes the trust values of the grid/cloud resources based on their capabilities in terms of computational and network parameters. finally, the overall trust value is computed as the arithmetic sum of all the individual trust values.

shen et al. and shen and tong have analyzed the security of the cloud computing environment and described the function of the trusted computing platform in cloud computing [54-55]. they have also proposed a method to improve the security and dependability of cloud computing by integrating the trusted computing platform (tcp) into the cloud computing system. the tcp is used for authentication, confidentiality and integrity in the cloud computing environment. finally, the model has been developed as software middleware known as the trusted platform software stack (tss).

alhamad et al. have proposed an sla based trust model for cloud computing. the model consists of sla agents, a cloud consumer module and a cloud services directory [56]. the sla agent is the core module of the architecture, as it groups the consumers into classes based on their needs, designs sla metrics, negotiates with cloud providers, selects providers based on non functional requirements such as qos, and monitors the activities of the consumers and the sla parameters. the cloud consumer module requests the external execution of one or more services. the cloud services directory is where service providers can advertise their services and consumers seek to find providers who meet their functional requirements, such as database providers, hardware providers, application providers etc. the authors have proposed only the model; no implementation or evaluation has been developed or described. hence each and every module will have to be evaluated for its functionality and effectiveness, and finally the overall model will have to be evaluated for its effectiveness.
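the way the model of manuel et al. combines its three evaluator outputs can be illustrated with a short python sketch; the grade tables for the security level evaluator and the toy feedback and reputation scores are assumptions made for illustration, while the final arithmetic-sum step follows the description given above for [53].

```python
def security_level_score(auth_type, authz_type, self_security):
    """illustrative grading of the security level component; the actual grade
    tables used in [53] are not given in the text, so these maps are assumed."""
    auth_grades = {"password": 1, "certificate": 2, "multifactor": 3}
    authz_grades = {"acl": 1, "rbac": 2}
    self_grades = {"none": 0, "antivirus": 1, "ids": 2}
    return auth_grades[auth_type] + authz_grades[authz_type] + self_grades[self_security]

def overall_trust(security, feedback, reputation):
    # the text describes the overall value as the arithmetic sum of the
    # individual evaluator outputs
    return security + feedback + reputation

sec = security_level_score("certificate", "rbac", "ids")     # = 6 with the assumed grades
print(overall_trust(sec, feedback=3.5, reputation=4.2))       # toy feedback/reputation scores
```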
yong et al. have proposed a model called the multi-tenancy trusted computing environment model (mtcem) for cloud computing [57]. mtcem has been proposed to deliver trusted iaas to customers with a dual level transitive trust mechanism that supports a security duty separation function simultaneously. since cloud facilities belong to multiple stakeholders such as cloud service providers (csp) and customers, they belong to multiple security domains and serve different security subjects simultaneously. the different stakeholders may be driven by different motives, such as best service or maximization of the return on investment, and hence may work to the detriment of the other party involved. hence cloud computing should have the capability to compartmentalize each customer and csp and support security duty separation, defining clear and seamless security responsibility boundaries for csps and customers. mtcem has been designed as a two-level hierarchy transitive trust chain model which supports security duty separation and three types of distinct stakeholders, namely csps, customers and auditors. in this model, the csp assumes the responsibility of keeping the infrastructure trusted, while the customer assumes responsibility starting from the guest os which is installed by the customer on the virtual machines provided by the csp. the auditor monitors the services provided by the csp on behalf of the customers. the authors have implemented a prototype system to prove that mtcem is capable of being implemented on commercial hardware and software, but no evaluation of the prototype's performance has been presented.

yang et al. have studied the existing trust models and firewall technology. the authors have found that all the existing trust models ignore the existence of firewalls in a network [58]. since a firewall is an integral and important component of any corporate security architecture, this non inclusion of the firewall is a huge shortcoming. the authors have proposed a collaborative trust model of firewall-through based on cloud theory. the paper also presents the detailed design calculations of the proposed trust model and practical algorithms for measuring and updating the value of dynamic trust. the model has the following advantages compared to other models:
- there are different security policies for different domains.
- the model considers the transaction context, the historical data of entity influences and the measurement of trust value dynamically.
- the trust model is compatible with the firewall and does not break the firewall's local control policies.

fu et al. have studied the security issues associated with software running in the cloud and proposed a watermark-aware trusted running environment to protect the software running in the cloud [59]. the proposed model is made up of two components, namely the administrative center and the cloud server environment. the administrative center embeds the watermark and customizes the java virtual machines (jvms), and the specific trusted server platform includes a series of cloud servers deployed with the customized jvms. only specific and complete java programs are allowed to run on the jvms, while all unauthorized programs such as invasion programs are rejected. the main advantage of this approach is that it introduces a watermark aware running environment to cloud computing.

ranchal et al. have studied identity management in cloud computing and proposed a system without the involvement of a trusted third party [60]. the proposed system, which is based on the use of predicates over encrypted data and multi-party computing, is capable of using not only trusted hosts but also untrusted hosts in the cloud. since the proposed approach is independent of a third party, it is less prone to attack as it reduces the risk of correlation attacks and side channel attacks, but it is prone to denial of service as the active bundle may not be executed at all in the remote host.
takabi et al. have proposed a security framework for cloud computing consisting of different modules to handle the security and trust issues of key components [61]. the main issues discussed in the paper are identity management, access control, policy integration among multiple clouds, trust management between different clouds and between cloud providers and users. the framework identifies three main players in the cloud: cloud customers, service integrators and service providers. the service integrator plays the role of the mediator who brings the customers and service providers together. the service integrator facilitates collaboration among different service providers by composing services to meet customer requirements. it is the responsibility of the service integrator to establish and maintain trust between provider domains and between providers and customers. the service integrator discovers services from service providers or other service integrators, and negotiates and integrates services to form collaborating services that will be sold to customers. the service integrator module is composed of a security management module, a trust management module, a service management module and a heterogeneity management module. the heterogeneity management module manages the heterogeneity among the service providers. in addition to the above modules there are other minor modules that handle small but important tasks. overall this is a very comprehensive framework, but the authors have not discussed the interoperability of the components in the framework or implemented a prototype to evaluate the function and efficiency of the components or the overall framework.

table i summarizes the proposed cloud computing trust management systems under different cloud computing parameters. from this table it is evident that most of the models proposed remain short of implementation and only a few have been simulated to prove the concept. also, there is no single model that meets all the requirements of a cloud architecture, especially identity management, security of both data and applications, heterogeneity and sla management. also, none of these systems has been based on a solid theoretical foundation such as the trust models discussed in section iv.

table i. summary and comparison of cloud computing trust management systems
columns: work | type | identity mgmt / authentication | data security | cloud layer | sla support | heterogeneity support* | implemented | comments (* support across multiple heterogeneous clouds)
[47] discussed discussed yes no no concrete proposal. only discussed the issues.
[48] complete platform no yes saas no yes no only a mechanism has been proposed. no implementation or evaluation carried out.
[49] social security based discussed discussed discussed no no no concrete proposal. only discussed the issues.
[50] domain based no no saas paas iaas no yes no model has been tested using simulation.
[51 52] family gene based discussed no no no no model has been tested using simulation.
[53] integrated with care resource broker yes yes no yes no model has been tested using simulation.
[54 55] built on trusted platform service yes yes iaas no yes no only a model has been proposed.
[56] no no yes yes no only a model has been proposed.
[57] built on trusted computing platform no no iaas no no prototype implemented concept has been proved with a prototype.
[58] domain based no no no yes no model has been tested using simulation.
[59] watermark based security no no saas no no prototype implemented concept has been proved with a prototype.
[60] based on active bundles scheme yes no no yes prototype implemented concept has been proved with a prototype.
[61] yes no saas paas iaas no yes no only a model has been proposed.

vi. conclusions

cloud computing has been the new paradigm in distributed computing in recent times. for cloud computing to become widely adopted, several issues need to be addressed, and cloud security is one of the most important of them. trust management is one of the important components of cloud security, as a cloud environment will have different kinds of users, providers and intermediaries. proper trust management will help users select a provider based on their requirements and its trustworthiness; it would also help providers select clients who are trustworthy to serve. in this paper, a comprehensive survey has been carried out on the trust management systems implemented in distributed systems, with a special emphasis on cloud computing. several trust models have been proposed for distributed systems. these models were mainly proposed for systems like clusters, grids and wireless sensor networks, and they have not been used or tested in cloud computing environments. hence the suitability of these models for use in cloud computing cannot be recommended without an extensive evaluation. the authors propose to evaluate these models in future work. the trust management systems proposed for cloud computing have been extensively studied with respect to their capability, their applicability in practical heterogeneous cloud environments and their implementability. the results have been presented in table i for easy reference. during the evaluation of these systems, it was found that none of the proposed systems is based on a solid theoretical foundation or takes any quality of service attribute into account when forming trust scores. hence a solid theoretical foundation for building trust systems for cloud computing is necessary. the required theoretical basis can be achieved by adapting the trust models proposed for other distributed systems.

references
[1] sheikh mahbub habib, sebastian ries, and max mühlhäuser, "cloud computing landscape and research challenges regarding trust and reputation," in symposia and workshops on ubiquitous, autonomic and trusted computing, xi'an, china, 2010, pp. 410-415. [2] rajkumar buyya, chee shin yeo, srikumar venugopal, james broberg, and ivona brandic, "cloud computing and emerging it platforms: vision, hype, and reality for delivering computing as the 5th utility," journal of future generation computer systems, vol. 25, no. 6, pp. 599-616, june 2009. [3] radu prodan and simon ostermann, "a survey and taxonomy of infrastructure as a service and web hosting cloud providers," in 10th ieee/acm international conference on grid computing, banff, ab, canada, 2009, pp. 17-25. [4] christian vecchiola, suraj pandey, and rajkumar buyya, "high-performance cloud computing: a view of scientific applications," in 10th international symposium on pervasive systems, algorithms, and networks (ispan), kaohsiung, taiwan, 2009, pp. 4-16.
[5] michael boniface et al., "platform-as-a-service architecture for realtime quality of service management in clouds," in fifth international conference on internet and web applications and services (iciw), barcelona, spain, 2010, pp. 155-160. [6] han yu, zhiqi shen, chunyan miao, cyril leung, and dusit niyato, "a survey of trust and reputation management systems in wireless communications," proceedings of the ieee, vol. 98, no. 10, pp. 17551772, october 2010. [7] zaobin gan, juxia he, and qian ding, "trust relationship modelling in e-commerce-based social network," in international conference on computational intelligence and security, beijing, china, 2009, pp. 206210. [8] d harrison mcknight and norman l chervany, "conceptualizing trust: a typology and e-commerce customer relationships model," in 34th hawaii international conference on system sciences, island of maui, hi, usa, 2001. [9] wei wang and guo sun zeng, "bayesian cognitive trust model based self-clustering algorithm for manets," science china information sciences, vol. 53, no. 3, pp. 494–505, 2010. [10] mario gómez, javier carbó, and earle clara benac, "a cognitive trust and reputation model for the art testbed," inteligencia artificial. revista iberoamericana de inteligencia artificial (in english), vol. 12, no. 39, pp. 29-40, 2008. [11] huangmao quan and jie wu, "catm: a cognitive-inspired agentcentric trust model for online social networks," in ninth annual ieee international conference on pervasive computing and communications (percom), seattle, wa, usa, 2011. [12] cristiano castelfranchi, rino falcone, and giovanni pezzulo, "trust in information sources as a source for trust: a fuzzy approach," in proceedings of the second international joint conference on autonomous agents and multiagent systems (aamas '03), melbourne, australia, 2003, pp. 89-96. [13] stefano de paoli et al., "toward trust as result: an interdisciplinary approach," proceedings of alpis, sprouts: working papers on information systems, vol. 10, no. 8, 2010. [14] masoud akhoondi, jafar habibi, and mohsen sayyadi, "towards a model for inferring trust in heterogeneous social networks," in second asia international conference on modelling & simulation, kuala lumpur, malaysia, 2008, pp. 52-58. [15] ram alexander menkes, "an economic analysis of trust, social capital, and the legislation of trust," ghent, belgium, llm thesis 2007. [16] jie zhang and robin cohen, "design of a mechanism for promoting honesty in e-marketplaces," in 22nd conference on artificial intelligence (aaai), ai and the web track, vancouver, british columbia, canada, 2007. [17] jie zhang, "promoting honesty in electronic marketplaces: combining trust modeling and incentive mechanism design," waterloo, ontario, canada, phd theis 2009. [18] shashi mittal and kalyanmoy deb, "optimal strategies of the iterated prisoner's dilemma problem for multiple conflicting objectives," in ieee symposium on computational intelligence and games, reno, nv, usa, 2006, pp. 197 204. [19] jian zhou, jiangbo wang, rongshan liang, and yanfu zhang, "flexible service analysis based on the ―prisoner's dilemma of service"," in 6th international conference on service systems and service management (icsssm '09), xiamen, china, 2009, pp. 434 437. [20] hongbing huang, guiming zhu, and shiyao jin, "revisiting trust and reputation in multi-agent systems," in isecs international colloquium on computing, communication, control, and management, guangzhou, china, 2008, pp. 424-429. 
[21] lik mui, "computational models of trust and reputation: agents, evolutionary games, and social networks," boston, ma, usa, phd thesis 2002. [22] mohammad momani and subhash challa, "survey of trust models in different network domains," international journal of ad hoc, sensor & ubiquitous computing, vol. 1, no. 3, pp. 1-19, september 2010. [23] tzu yu chuang, "trust with social network learning in e-commerce," in ieee international conference on communications workshops (icc), capetown, south africa, 2010, pp. 1-6. [24] marcim adamski et al., "trust and security in grids: a state of the art," european union, 2008. [25] antonios gouglidis and ioannis mavridis, "a foundation for defining security requirements in grid computing," in 13th panhellenic conference on informatics (pci '09), corfu, greece, 2009, pp. 180-184. [26] leonardo b de oliveira and carlos a maziero, "a trust model for a group of e-mail servers," clei electronic journal, vol. 11, no. 2, pp. 1-11, 2008. [27] qing zhang, ting yu, and keith irwin, "a classification scheme for trust functions in reputation-based trust management," in international workshop on trust, security, and reputation on the semantic web, hiroshima, japan, 2004. [28] ruichuan chen, xuan zhao, liyong tang, jianbin hu, and zhong chen, "cuboidtrust: a global reputation-based trust model in peer-to-peer networks," in autonomic and trusted computing. berlin / heidelberg: springer, 2007, vol. 4610, pp. 203-215. [29] sepandar d kamvar, mario t schlosser, and hector garcia-molina, "the eigentrust algorithm for reputation management in p2p networks," in proceedings of the 12th international conference on world wide web (www '03), budapest, hungary, 2003, pp. 640-651. [30] yong wang, vinny cahill, elizabeth gray, colin harris, and lejian liao, "bayesian network based trust management," in autonomic and trusted computing. berlin / heidelberg: springer, 2006, pp. 246-257. [31] huirong tian, shihong zou, wendong wang, and shiduan cheng, "a group based reputation system for p2p networks," in autonomic and trusted computing. berlin / heidelberg: springer, 2006, pp. 342-351. [32] wei wang, guosun zeng, and lulai yuan, "ant-based reputation evidence distribution in p2p networks," in fifth international conference on grid and cooperative computing (gcc 2006), hunan, china, 2006, pp. 129-132. [33] yu zhang, huajun chen, and zhaohui wu, "a social network-based trust model for the semantic web," in autonomic and trusted computing. berlin / heidelberg: springer, 2006, pp. 183-192. [34] fajiang yu, huanguo zhang, fei yan, and song gao, "an improved global trust value computing method in p2p system," in autonomic and trusted computing. berlin / heidelberg: springer, 2006, pp. 258-267. [35] weijie wang, xinsheng wang, shuqin pan, and ping liang, "a new global trust model based on recommendation for peer-to-peer network," in international conference on new trends in information and service science, beijing, china, 2009, pp. 325-328. [36] xueming li and jianke wang, "a global trust model of p2p network based on distance-weighted recommendation," in ieee international conference on networking, architecture, and storage, hunan, china, 2009, pp. 281-284. [37] xiong li and liu ling, "peertrust: supporting reputation-based trust for peer-to-peer electronic communities," ieee transactions on knowledge and data engineering, vol. 16, no. 7, pp.
843-857, july 2004. [38] ayman tajeddine, ayman kayssi, ali chehab, and hassan artail, "patrol-f a comprehensive reputation-based trust model with fuzzy subsystems," in autonomic and trusted computing. berlin / heidelberg: springer, 2006, pp. 205-216. [39] yuan wang, ye tao, ping yu, feng xu, and jian lü, "a trust evolution model for p2p networks," in autonomic and trusted computing. berlin / heidelberg: springer, 2007, pp. 216-225. [40] zhuo tang, zhengding lu, and kai li, "time-based dynamic trust model using ant colony algorithm," wuhan university journal of natural sciences, vol. 11, no. 6, pp. 1462-1466, 2006. [41] felix gomez marmol, gregorio martinez perez, and antonio f gomez skarmeta, "tacs, a trust model for p2p networks," wireless personal communications, vol. 51, no. 1, pp. 153-164, 2009. [42] ghada derbas, ayman kayssi, hassan artail, and ali chehab, "trummar a trust model for mobile agent systems based on reputation," in the ieee/acs international conference on pervasive services (icps 2004), beirut, lebanon, 2004, pp. 113-120. [43] ayman tajeddine, ayman kayssi, ali chehab, and hassan artail, "patrol: a comprehensive reputation-based trust model," international journal of internet technology and secured transactions, vol. 1, no. 1/2, pp. 108-131, august 2007. [44] felix gomez marmol, gregorio mrtinez perez, and javier g marinblazquez, "meta-tacs: a trust model demonstration of robustness through a genetic algorithm," autosof journal of intelligent automation and soft computing, vol. 16, no. x, pp. 1-19, 2009. [45] cesar ghali, ali chehab, and ayman kayssi, "catrac: contextaware trustand role-based access control for composite web services," in 10th ieee international conference on computer and information technology, bradford, england, 2010, pp. 1085-1089. [46] yao wang and julita vassileva, "bayesian network-based trust model," in ieee/wic international conference on web intelligence (wi 2003), halifax, canada, 2003, pp. 372 378. [47] khaled m khan and qutaibah malluhi, "establishing trust in cloud computing," it professional, vol. 12, no. 5, pp. 20 27, 2010. [48] zhexuan song, jusus molina, and christina strong, "trusted anonymous execution: a model to raisetrust in cloud," in 9th international conference on grid and cooperative computing (gcc), nanjing, china, 2010, pp. 133 138. [49] hiroyuki sato, atsushi kanai, and shigeaki tanimoto, "a cloud trust model in a security aware cloud," in 10th ieee/ipsj international symposium on applications and the internet (saint), seoul, south korea, 2010, pp. 121 124. [50] wenjuan li, lingdi ping, and xuezeng pan, "use trust management module to achieve effective security mechanisms in cloud environment," in international conference on electronics and information engineering (iceie), vol. 1, kyoto, japan, 2010, pp. 14-19. [51] tie fang wang, bao sheng ye, yun wen li, and yi yang, "family gene based cloud trust model," in international conference on educational and network technology (icent), qinhuangdao, china, 2010, pp. 540 544. [52] tie fang wang, bao sheng ye, yun wen li, and li shang zhu, "study on enhancing performance of cloud trust model with family gene technology," in 3rd ieee international conference on computer science and information technology (iccsit), vol. 9, chengdu, china, 2010, pp. 122 126. [53] paul d manuel, thamarai selve, and mostafa ibrahim abd-ei barr, "trust management system for grid and cloudresources," in first international conference on advanced computing (icac 2009), chennai, india, 2009, pp. 176-181. 
[54] zhidong shen, li li, fei yan, and xiaoping wu, "cloud computing system based on trusted computing platform," in international conference on intelligent computation technology and automation (icicta), vol. 1, changsha, china, 2010, pp. 942-945. [55] zhidong shen and qiang tong, "the security of cloud computing system enabled by trusted computing technology," in 2nd international conference on signal processing systems (icsps), vol. 2, dalian, china, 2010, pp. 11-15. [56] mohammed alhamad, tharam dillon, and elizabeth chang, "sla-based trust model for cloud computing," in 13th international conference on network-based information systems, takayama, japan, 2010, pp. 321-324. [57] xiao yong li, li tao zhou, yong shi, and yu guo, "a trusted computing environment model in cloud architecture," in ninth international conference on machine learning and cybernetics (icmlc), vol. 6, qingdao, china, 2010, pp. 2843-2848. [58] zhimin yang, lixiang qiao, chang liu, chi yang, and guangming wan, "a collaborative trust model of firewall-through based on cloud computing," in 14th international conference on computer supported cooperative work in design (cscwd), shanghai, china, 2010, pp. 329-334. [59] junning fu, chaokun wang, zhiwei yu, jianmin wang, and jia guang sun, "a watermark-aware trusted running environment for software clouds," in fifth annual chinagrid conference (chinagrid), guangzhou, china, 2010, pp. 144-151. [60] rohit ranchal et al., "protection of identity information in cloud computing without trusted third party," in 29th ieee international symposium on reliable distributed systems, new delhi, india, 2010, pp. 1060-9857. [61] hassan takabi, james b.d joshi, and gail joon ahn, "securecloud: towards a comprehensive security framework for cloud computing environments," in 34th annual ieee computer software and applications conference workshops, seoul, south korea, 2010, pp. 393-398.

international journal on advances in ict for emerging regions 2011 04 (02): 12-23

cloud computing reliability, availability and serviceability (ras): issues and challenges
farzad sabahi

abstract—cloud computing is one of today's most exciting technologies because of its capacity to lessen the costs associated with computing while increasing flexibility and scalability for computer processes. during the past few years, cloud computing has grown from being a promising business idea to one of the fastest growing sectors of the it industry. but on the other hand, it organizations have expressed concerns about critical issues such as security that accompany the widespread implementation of cloud computing. security, in particular, is one of the most debated issues in the field of cloud computing, and several enterprises look at cloud computing warily due to projected security risks. there are also two other issues, the reliability and availability of the cloud, which are as important as security. although each of these three issues is associated with usage of the cloud, they have different degrees of importance. examination of the benefits and risks of cloud computing is necessary for a full evaluation of its viability. this article reviews issues and challenges of cloud computing's reliability, availability and security (ras). beginning with a brief discussion of virtualization technology, a key element of cloud infrastructure, it examines the issues faced in cloud ras fields. then, it addresses the challenges and problems in cloud computing ras.
it also examines intrusion detection methods and outlines countermeasures to improve cloud ras.

index terms— cloud computing, virtualization, reliability, availability, security, threat, intrusion, countermeasure.

i. introduction

cloud computing is based on virtualization technology, in which each user uses a virtual machine. virtualization technology includes two levels of virtual machines, namely vms (virtual machines) and hypervisors. the hypervisor has administrative rights to control the vms, but virtualization has some issues that could endanger system performance. from a cloud viewpoint, there are many important dimensions of virtualization technology to consider, but the hypervisor's reliability, availability and serviceability (ras) is an important aspect of virtualization technology and requires special attention. for example, from a security viewpoint, if someone gets control of the hypervisor, he will gain full control of all the vms that are under the hypervisor's control.

manuscript received on march 14, 2011. accepted july 16, 2011. farzad sabahi is with the azad university, school of computer and electrical engineering, iran. (e-mail: f.sabahi@ieee.org).

consequently, cloud technology has some problems in ras that it has inherited from virtualization technology. one such problem involves overloading of the system due to the excessive consolidation of vms onto a physical server, which affects availability and reliability. because of these issues, cloud systems are vulnerable to traditional attacks as well as to new attacks, some of which have migrated from virtualization. privacy is another issue which can decrease virtualization's and the cloud's overall performance, because the vms are located in what is practically a multitenant environment, making it possible for a user to access a past tenant's information in the same space. the use of encryption algorithms, together with appropriate arrangements such as using advanced algorithms to wipe a user's data to avoid information leaks, could be a good solution for the user or cloud provider. but the use of encryption algorithms has problems as well, such as the inability of owners to recover their data when they lose the decryption key. as we know, in the world of network computing there is a variety of attacks that can cause serious problems for internet-based technologies such as the cloud. this can make the cloud vulnerable to attacks like the dos (denial of service) family, which aims to make the target server inaccessible to legitimate users. the cloud can be a victim of dos attacks, but it can also be part of the solution by allocating more resources to a user under a dos attack in order to prevent the user from crashing. therefore, applying countermeasures to deal with security problems in the cloud is critical, and one of the main countermeasures is access control in the cloud. generally, the security countermeasures in the access control part of the cloud often involve prevention, for example management of permissions for the account to determine access to different levels of virtualization in the cloud. besides security, cloud providers are also responsible for reliability and availability, because all users expect the highest level of qos (quality of service). cloud providers use solutions such as partitioning to achieve maximum performance.
but depending on whether the cloud is public, private or hybrid, the management and control of these performance parameters from the ras viewpoint will vary.

this paper is organized as follows:
- section 2 provides a general overview of cloud computing.
- section 3 describes the virtualization technology that is the basis of cloud computing.
- section 4 overviews information security policies in cloud computing.
- section 5 comprehensively reviews the ras factor in virtualization.
- section 6 elaborates on the ras factor with particular attention to cloud computing.
- section 7 covers intrusion detection systems in cloud computing.
- section 8 describes security management and countermeasures to take against intrusions.
- finally, section 9 concludes the paper.

ii. cloud computing: an overview

cloud computing is a network-based environment that focuses on sharing computations and resources. clouds are internet-based and try to reduce complexity for clients by allowing them to virtually store data, applications and technologies at a remote site rather than keeping voluminous amounts of information on personal computers or on local servers. this is accomplished using virtualization technologies in combination with self-service abilities for computing resources via network infrastructure, especially the internet. in cloud environments, multiple virtual machines are hosted on the same physical server as infrastructure. customers only pay for what they use and avoid having to pay for local resources such as storage and infrastructure. cloud computing, then, ultimately refers to both the applications delivered as services over the internet, and the hardware and systems software in the datacenters that provide those services. currently, three types of cloud environments exist: public, private and hybrid. a public cloud is a standard model in which providers make several resources, such as applications and storage, available to the public. public cloud services may be free, or may come with an associated fee. in public cloud environments, applications are run externally by large service providers, offering some benefits over private cloud environments. for a private cloud, a business has internal services that are not available to other people. essentially, the term "private clouds" is a marketing term for an architecture that provides hosted services for a particular group of people behind a firewall. a hybrid cloud is an environment in which a company provides and controls some resources internally and provides other services for public use. in this type, the cloud provider has a service whereby a private cloud can be created (accessible only by internal staff and protected by firewalls from outside access), and a public cloud environment for access by external users is also created. cloud is a style of computing where massively scalable and flexible it-related abilities are provided "as services" to external customers using internet technologies. cloud providers offer various services in an xaas collection, of which the main ones are the following.

a. saas

saas (software as a service): software as a service is a well-known service that offers network-hosted applications.
saas is a software application delivery model by which cloud providers develop web-based software applications and then host and operate those applications over the infrastructure (usually the internet) for use by their customers. as a result, cloud customers do not need to buy software licenses or additional equipment and they typically only pay fees (also referred to as annuity payments) periodically to use the cloud provider’s web-based software [3]. there are two major kinds of saas: business applications that offer software which helps various businesses perform their tasks quickly and accurately. other type of saas is development tools which consist of software that is used mainly for product development and management. b. paas paas (platform as a service): in this category of service, cloud users are given a platform [4]. they can use it as their application platform independent of using their own local machine for installing those platforms. c. iaas iaas (infrastructure as a service): infrastructure as a service is a provision model in which an organization outsources the equipment used to support operations, including storage, hardware, servers and networking components. the service provider owns the equipment and is responsible for housing, running, and maintaining it. the client typically pays on a per-use basis [5]. iaas is sometimes referred to as hardware as a service (haas). d. other types of services it is important to mention that iaas, paas, and saas are the three main categories of cloud computing services and that fig. 1: unified ontology of cloud computing. 14 farzad sabahi international journal on advances in ict for emerging regions 04 september 2011 the other types of cloud services are subsidiary branches of these three major categories. a typical cloud computing ontology for some of these categories is illustrated in figure 1. other cloud services include the following:  daas (database as a service): database systems provide a user friendly interface for accessing and managing data. this type of service is very useful like many financial, business, and internet-based applications [6].  naas (network as a service): with naas, providers offer customers a virtualized network [7].  ipmaas (identity and policy management as a service): with this service, providers deliver identity and policy management to customers [8]. e. cloud customers nowadays, many it-related clients decide to use cloud computing for their own purposes. these can be divided into three main groups: regular customers, academics, and enterprises. 1) regular customers this group of users merely uses the services from the cloud [1]. they are not concerned with high performance; rather, they concentrate on the service and the privacy of their data on the cloud. saas is the most appropriate service for this group [9]. 2) academics academics usually have good networks and they often prefer to use the infrastructure that they already have to improve the performance of computations and resolve grid limits. for this group, cloud computing provides convenient access to a high-performance cluster or grid-based computation infrastructure and eliminates the need to buy new hardware. 3) enterprises the it industry reaps the most considerable benefits of cloud computing [10]. many companies have decided to enter cloud-related industries or use cloud services to reduce costs and improve performance in their own (it-related or non-itrelated) businesses. 
a) small and mid-size enterprises lower costs are attractive, particularly for small enterprises that simply cannot afford the cost of solutions [4]. with distributed processing, small enterprises can afford industrystandard pcs and network servers but not expensive supercomputers. in addition, they can use cloud software instead of local software or abstruse infrastructure, which can reduce the cost of purchasing and maintaining the required software. for mid-size businesses that are growing, cloud computing can also provide a cost-effective and efficient path to enterprise-grade software and infrastructure [11]. b) large-scale enterprises for these enterprises, lower costs are not as important as privacy. thus, large companies often create their own clouds or are skeptical about moving to the cloud. however, privacy of information is the most important issue, and most large companies have already spent significant amounts of money on their local systems [1]. nowadays, large-scale enterprises often collect and analyze large amounts of data to derive business insights. however, there are at least two challenges to meet the increasing demand. first, the growth in the amount of data far surpasses the growth in the computation power of uniprocessors [12]. the growing gap between the supply and demand of computation power forces enterprises to parallelize their application codes. unfortunately, parallel programming is both time-consuming and error-prone. second, the emerging cloud computing pattern imposes constraints on the underlying infrastructure, which forces enterprises to rethink their application architectures. iii. virtualization virtualization is one of the most important elements of cloud computing. it is a technology that helps it organizations optimizes their application performance in a cost-effective manner, but it can also present application delivery challenges that cause security difficulties. most of the current interest in virtualization revolves around virtual servers, in part because virtualizing servers can result in significant cost savings. the phrase virtual machine refers to a software computer that, like a physical computer, runs an operating system and applications. an operating system on a virtual machine is called a guest operating system. a layer called a vmm (virtual machine monitor), or hypervisor, creates and controls the virtual machine's other virtual systems. figure 2 illustrates a typical virtual machine architecture foundation in a cloud environment. a. hypervisor a hypervisor (see figure 2) is one of many virtualization techniques allowing multiple operating systems, termed guests, to run concurrently on a host computer using a feature called hardware virtualization. it is so named because it is conceptually one level higher than a supervisor. fig. 2. typical virtual machine architecture [1]. cloud computing reliability, availability and serviceability (ras): issues and challenges 15 september 2011 international journal on advances in ict for emerging regions 04 the hypervisor presents a virtual operating platform to the guest operating systems and also monitors the execution of them. multiple instances of a variety of operating systems may share the virtualized hardware resources. hypervisors are installed on server hardware dedicated to run guest operating systems [13]. iv. 
information security policies cloud computing raises a range of important policy issues which include issues of privacy, security, anonymity, telecommunications capacity, government surveillance, reliability, and liability, among others [1]. but the most important of these issues is; security and how it is assured by the cloud provider. in addition, according to this fact that security effect on computing performance, cloud providers have to find a way to combine security and performance. for example for enterprises, the most important problem is security and privacy because they may store their sensitive data in cloud. for them, high performance processing may not be as critical as for academia users. to satisfy enterprise needs, the cloud provider has to ensure robust security and privacy more than other needs. in cloud there are several security and privacy issues but in [14] there are the gartner’s seven well-known security issues which cloud clients should advert are listed below:  privileged user access: sensitive data processed outside the enterprise brings with it an inherent level of risk because outsourced services bypass the "physical, logical and personnel controls" it shops exert over in-house programs.  regulatory compliance: customers are ultimately responsible for the security and integrity of their own data, even when it is held by a service provider. traditional service providers are subjected to external audits and security certifications. cloud computing providers who refuse to undergo this scrutiny are "signaling that customers can only use them for the most trivial functions," according to gartner.  data location: when clients use the cloud, they probably will not know exactly where their data is hosted. distributed data storage is usually used by cloud providers, but this can cause lack of control and is not good for customers who have their data in a local machine before moving to the cloud.  data segregation: data in the cloud typically exists in a shared environment alongside data from other customers. encryption is effective but is not a cure-all. encryption and decryption is a classic way to cover security issues, but heretofore it could not ensure a perfect solution. while it is difficult to assure data segregation, customers must review the selected cloud’s architecture to ensure data segregation is properly designed and available but without data leakage. although data leakage has solution technology that named dlp.  recovery: if a failure occurs with the cloud, it is critical to completely restore client data. as clients prefer not to let a third-party control their data, this will cause an impasse in security policy in these challenging situations.  investigative support: cloud services are especially difficult to investigate because logging and data for multiple customers may be co-located and spread across an ever-changing set of hosts and data centers.  long-term viability: ideally, a cloud computing provider will never go bankrupt or be acquired by a larger company with new policies. however, clients must be sure that their data will remain available even after such an event. v. virtualization ras issues in a traditional environment consisting of physical servers connected by a physical switch, it organizations can get detailed management information about the traffic between the servers and the physical switch. unfortunately, that level of information management is not typically provided by a virtual switch. 
in such a scenario the virtual switch has links from the physical switch via the physical nic (network interface card) attached to virtual machines. the resultant is lack of visibility into the traffic flows between and among the virtual machines on the same physical level affects security and performance surveying. a potential problem also exists for virtualization when a provider combines too many virtual machines onto a physical server. this can result in performance problems caused by impact factors such as limited cpu cycles or i/o bottlenecks [15]. these problems can occur in a traditional physical server, but they are more likely to occur in a virtualized server because a single physical server is connected to multiple virtual machines all competing for critical resources. therefore, management tasks such as performance management and capacity planning management are more critical in a virtualized environment than in a similar physical environment. this means that it organizations must be able to continuously monitor the real-time utilization of both physical servers and virtual machines. this capability allows users to avoid both over and underutilization of server resources. in addition, they will able to reallocate resources based on changing business requirements. this capability also enables it organizations to implement policy-based remediation that helps them to ensure that their desired service levels are being met [16]. another challenge with virtualization is cloud organization management of virtual machines sprawl [17]. in virtualized environment with virtual machine sprawl, the number of virtual machines running in it increases because of unnecessary new virtual machines created rather than business necessity. virtual machine sprawl concerns include the overuse of infrastructure. to prevent virtual machine sprawl, a virtual machine manager should carefully analyze the need for all new virtual machines and ensure that unnecessary virtual machines migrate to other physical 16 farzad sabahi international journal on advances in ict for emerging regions 04 september 2011 servers. in addition, by migration, an unnecessary virtual machine will be able to move from one physical server to another with high availability and energy efficiency. determination of the virtual machine destination can be challenging; it is necessary to ensure that a migrated virtual machine keeps the same security, qos configurations and needed privacy policies. on the other hand, the destination must assure that all the required configurations of the migrated virtual machine are kept. a. virtual machine security and threats as illustrated in figure 2, there are at least two levels of virtualization which are virtual machines and the hypervisor. virtualization technique which is used in the virtual machines is not as new technology. unfortunately, it has several security issues which are now migrated to cloud technology and they are not good heritage for cloud. there are also other vulnerabilities and security issues which are exclusive or may have a more critical role in the cloud environment. as mentioned before, in the hypervisor, all users see their systems as self-contained computers isolated from other users, even though every user is served by the same machine. in this context, a virtual machine is an operating system that is managed by an underlying control program. 
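before turning to the threats at this level, the continuous utilization monitoring and sprawl detection described above can be illustrated with a small sketch. the metric source, thresholds and vm names below are assumptions chosen for illustration; a real deployment would pull these figures from the hypervisor's management interface.

```python
# a minimal, illustrative sketch of utilization monitoring and sprawl detection.
# the samples, thresholds and vm names are assumptions for illustration only.

from statistics import mean

# hypothetical samples: average cpu utilization (0.0-1.0) per vm on one host
cpu_samples = {
    "vm-web-01":  [0.72, 0.81, 0.77],
    "vm-db-01":   [0.55, 0.60, 0.58],
    "vm-test-17": [0.01, 0.00, 0.02],   # idle vm, a sprawl candidate
}

OVERLOAD_THRESHOLD = 0.85   # host considered overcommitted above this average
IDLE_THRESHOLD = 0.05       # vm considered idle below this average

def host_report(samples: dict[str, list[float]]) -> dict:
    per_vm = {vm: mean(vals) for vm, vals in samples.items()}
    host_load = sum(per_vm.values())          # naive aggregate host load
    return {
        "host_overcommitted": host_load > OVERLOAD_THRESHOLD * len(per_vm),
        "idle_vms": [vm for vm, u in per_vm.items() if u < IDLE_THRESHOLD],
        "per_vm_utilization": per_vm,
    }

if __name__ == "__main__":
    report = host_report(cpu_samples)
    print(report)   # flags vm-test-17 as an idle (sprawl) candidate
```

a policy-based remediation layer of the kind mentioned above would then decide whether flagged vms should be consolidated, migrated or retired.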
hence there are various threats and attacks in this level, but some of them are important than others that mentioned below:  virtual machine-level attacks: the hypervisor or virtual machine technology used by cloud vendors are potential problems in multi-tenant architectures [18]. these technologies involve “virtual machines,” remote versions of traditional on-site computer systems, including the hardware and operating system. the number of these virtual machines can be expanded or contracted on the fly to meet demand, creating tremendous efficiencies [19].  cloud-provider vulnerabilities: these could be platformlevel vulnerabilities, such as sql-injections, or cross-site scripting vulnerabilities that exist in the cloud service layer and cause insecure environments.  expanded network-attack surface: the cloud user must protect the infrastructure used to connect and interact with the cloud, a task complicated by the cloud being outside the firewall in many cases [4].  authentication and authorization: the enterprise authentication and authorization framework does not naturally extend into the cloud. enterprises have to merge cloud security policies with their own security metrics and policies.  availability of the cloud provider: cloud providers guarantee that their servers’ uptime compares well with cloud users’ own data centers and cloud providers ensure the clients which providers can handle their applications. an enterprise must be assured that a cloud provider is faithfully running a hosted application and delivering valid results [4]. scheduled and unscheduled maintenance is another availability factor that exists and it can harm the availability ratio of the cloud provider. although regularly scheduled maintenance does not count as downtime, unscheduled maintenance increases downtime and affects availability [20].  lock-in: there seems to be a great deal of anxiety regarding lock-in in cloud computing. the cloud provider can encrypt user data in a particular format if a user decides to migrate to another vendor or a similar situation arises [21].  data control in cloud: for mid-size businesses used to having complete visibility and control over their entire it portfolio, moving even some components into the cloud can create operational “blind spots,” with little advance warning of degraded or interrupted service [11]. b. hypervisor security in a virtualization environment, there are several virtual machines that have independent security zones that are not accessible from other virtual machines that have their own zones. in a virtualization environment, a hypervisor has its own security zone and is the controlling agent for everything within the virtualization host. a hypervisor can touch and affect all of the virtual machine’s actions running within the virtualization host [22]. there are multiple security zones, but these security zones exist within the same physical infrastructure that, in a more traditional sense, generally only exists within a single security zone. this can cause security issues, as if an attacker is able to take control of a hypervisor, then the attacker has full control of all the works within the territory of the hypervisor. another major virtualization security concern is “escaping the virtual machine” or being able to reach the hypervisor at the virtual-machine level. this will become an even greater concern in the future as more apis (application program interface) are created for virtualization platforms [23]. 
thereupon, so undamaged controls are to disable the functionality within a virtual machine, and this can reduce performance and availability. 1) confronting against hypervisor security problems as mentioned before, hypervisors are management tools, and the main goal of creating this security zone is building a trust zone. other available virtual machines are under the approval of the hypervisor, and they can rely on it, as users are trusting that administrators of system will do what they can to do tasks properly. as for security characteristics, there are three major levels in the security management of hypervisors:  authentication: users have to authenticate their account properly using the appropriate standard and available mechanisms.  authorization: users must receive authorization, and they must have permission to do what they are trying to do.  networking: using mechanisms that assure a secure connection to communicate by using available cloud computing reliability, availability and serviceability (ras): issues and challenges 17 september 2011 international journal on advances in ict for emerging regions 04 administration applications that most likely launch and work in a different security zone than that of users. authentication and authorization are some of the most interesting auditing aspects of management because there are so many methods available to manage a virtual host auditing purpose [24]. the general belief is that networking is the most important issue in transactions between users and the hypervisor, but there is much more to virtualization security than just networking. networking plays a critical role in security, but it is not solely significant for ensuring security. it is just as important to understand the apis and basic concepts of available hypervisors and virtual machines, and how those management tools work [22]. if a security manager can address authentication, authorization, virtual hardware, and hypervisor security as well as networking security, cloud clients are well on the way to a comprehensive security policy [1, 22]. if a cloud provider at the virtualization level does not, or just depends on network security to do the tasks, then the implemented virtual environment is at risk and has poor security capability. it is a waste of money if a cloud provider spends too much money on creating a robust secure network and neglects communication among virtual machines and the hypervisor, as this can cause several problems for the provider as well as for the users. vi. cloud ras issues using cloud means, that applications and data will move under a third-party control. the cloud services delivery model will create clouds with virtual perimeters as well as a security model with responsibilities shared between the customer and the cloud service provider. this shared-responsibility model will bring new security management challenges to the organization’s it operations staff [25]. basically, the first question an information security officer must answer is whether he/she has adequate transparency with cloud services to manage the governance (shared responsibilities) and implementation of security management processes (preventive and detective controls) to ensure the business that the data in the cloud is appropriately protected. 
the answer to this question consists of two parts: what security controls must the customer provide over and above the controls inherent in the cloud platform and how should an enterprise’s security management tools and processes adapt to manage security in the cloud. both answers must be continually reevaluated based on the sensitivity of the data and the service-level changes over time [25]. a. data leakage basically, when moving to a cloud, there are two changes for customers’ data. first, the data will be stored away from the customer's locale machine. second, the data is moved from a single-tenant to a multitenant environment. these changes may raise an important concern called, data leakage. this has become one of the greatest organizational risks from the security standpoint [26]. virtually every government worldwide has regulations that mandate protections for certain data types [26]. the cloud provider should have the ability to map its policy to the security mandate the user must comply with and discuss the issues. 1) dlp nowadays, there is an interest in the use of data leakage prevention (dlp) applications to protect sensitive data with the appearance of cloud computing. to prevent data leakage, some companies have thought of dlp products. dlp products existed before cloud computing. these products aim to ensure data confidentiality and detect unauthorized access to data, but they are not intended to be used for ensuring the integrity or availability of data. as a result, experts don’t expect from dlp products to address data’s integrity or availability in any cloud model. if data is stored in a public cloud, because of its nature, using dlp products is worthless to protect the confidentiality of that data in all types of clouds. generally in saas and paas, because cloud clients do not have control over security management used by the cloud provider, discovery of the client’s data with dlp agents is not possible except when the provider puts this capacity into its service. however, it is possible by embedding dlp agents into virtual machines in iaas to achieve some control over associated data. in private clouds, the customer has direct control over the whole infrastructure; it is not a policy issue whether dlp agents are deployed in connection with saas, paas, or iaas services. however, it may very well be a technical issue whether dlp agents interoperate with saas or paas services as architected [27]. in a hybrid cloud, if service is iaas, the client could embed dlp agents for some control over data. b. privacy cloud clients’ data stores in data centers that cloud providers diffuse all over the globe within hundreds of servers that communicate through the internet have several wellknown potential risks within them. because cloud services are using the internet as their communication infrastructure, cloud computing involves several kinds of security risks [26]. cloud providers, especially iaas providers, offer their customers the illusion of unlimited computer, network, and storage capacity, often coupled with a frictionless registration process that allows anyone to begin using cloud services [28]. the relative anonymity of these usage models encourages spammers, malicious users and other hackers, who have been able to conduct their activities with relative impunity [29]. paas providers have traditionally suffered most from such attacks; however, recent evidence shows the hackers have begun to target iaas vendors as well [28]. 
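returning briefly to the dlp agents discussed above: in essence such an agent inspects data before it leaves the vm and blocks or flags anything matching a sensitive-data policy. the sketch below is a minimal illustration; the two regular expressions and the block-on-match policy are assumptions, not the rule set of any real dlp product.

```python
# a minimal sketch of the kind of content inspection a dlp agent performs
# before data leaves a vm. the patterns below (a credit-card-like number and
# an e-mail address) are illustrative assumptions, not a real dlp rule set.

import re

SENSITIVE_PATTERNS = {
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_outbound(payload: str) -> list[str]:
    """return the names of sensitive patterns found in an outbound payload."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(payload)]

def release_allowed(payload: str) -> bool:
    """block (or flag for review) any payload that matches a sensitive pattern."""
    return not scan_outbound(payload)

if __name__ == "__main__":
    msg = "invoice for customer jane@example.com, card 4111 1111 1111 1111"
    print(scan_outbound(msg))      # ['card_number', 'email']
    print(release_allowed(msg))    # False
```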
as is clear in cloud-based services, a user's data is stored on the third-party’s storage location [1]. a service provider must implement sufficient security measures to ensure data privacy. generally, data encryption is a solution to ensure the privacy of the data in the databases against malicious attacks. therefore, encryption methods have significant performance 18 farzad sabahi international journal on advances in ict for emerging regions 04 september 2011 implications on query processing in clouds. integration of data encryption with data is useful to protect the user’s data against outside malicious attacks and to restrict the liability of the service provider. it seems protection from malicious users who might access the service provider's system is the final goal, but this is not enough when clients also prefer privacy protection from a accessing to their data by provider. any data privacy solution will have to use particular encryption, but this causes another availability issue: data recovery [30]. assume a user’s data is encrypted with a user-known key, and the user loses his/her key. how can the provider recover his/her data when it doesn’t know what the key is? if the user gives the provider authority to know the key, then this makes privacy by using a user-known encryption key useless. the simple way to solve this problem is to find a cloud provider which users can trust. this way is acceptable when data stored in the cloud is not very important. this method seems useful for enterprises with the maximum size of a small company which may decide to find trustable providers rather than finding a solution for the data recovery problem. for medium-sized companies to largesized companies, it is more critical to develop techniques and methods that enable query processing directly over encrypted data to ensure privacy from cloud providers [30]. if the service providers themselves are not trusted, protecting the privacy of users' data is a much more challenging issue. however, for those companies it seems using a private cloud is a wise solution. if data encryption is used as a wise solution for the data privacy problem, there are other issues in this context. one of the most important issues is ensuring the integrity of the data. both malicious and non-malicious users can cause compromise of the integrity of the users’ data when this happens and the client does not have any mechanism to analyze the integrity of the original data. hence, new techniques have to be applied to provide methods to check the integrity of users’ data hosted at the service provider side [8]. all encryption methods rely on secure and impressive key management architectures. one of the problems that can occur in an encrypted environment is encryption key management in the cloud. in the cloud environment several users may use their own encryption method, and managing these keys is another issue to address in the context of encrypted data. for example, if the cloud provides database service (daas), the cloud provider faces more challenges in key management architectures, such as generation, registration, storage, and update of encryption keys. 1) ras issues in database-based service: an example cloud systems provide an extremely attractive interface for managing and accessing data and have proven to be widely successful in many financial, business and internet applications. 
however, they have several serious limitations in database-based service such as the following which are mentioned in [6]:  database systems are difficult to scale: most database systems have hard limits beyond which they do not easily scale. once users reach these scalability limits, timeconsuming and expensive manual partitioning, data migration and load balancing are the only recourse.  database systems are difficult to configure and maintain: administrative costs can easily account to a significant fraction of the total cost of ownership of a database system. furthermore, it is extremely difficult for untrained professionals to get good performance out of most commercial systems.  diversification in available systems complicates selection: the rise of specialized database systems for specific markets complicates system selection, especially for customers whose workloads do not neatly fall into one category.  peak provisioning leads to unnecessary costs: database workloads are often tandem in nature hence they provision for the peak often results in an excess of resources during off-peak phases and thus causes unnecessary costs. c. data remanance data remanence is the residual physical representation of data that has been in some way erased. after storage media is erased, there may be some physical characteristics that allow data to be reconstructed [31]. as a result, any critical data must not only be protected against unauthorized access, but also it is very important that it is securely erased at the end of the data life cycle. basically, it organizations that have full control of their own servers use various available tools that give them the ability to destroy unwanted and important data for privacy and safety purposes. but when data is migrated to a cloud environment, they now have virtual servers that are controlled by a third party. as a solution, it organizations must choose cloud providers that can guarantee that all customer erased data is erased immediately and securely. a traditional solution to deleting data securely is overwriting, but this technique does not work without the collaboration of the cloud provider [4, 30]. in a cloud environment, customers can’t access the physical device or the data level. thus, there is only one solution: those customers encrypt their data with a confidential key that prevents reconstruction of the erased data from residual data. d. cloud security issues as mentioned before, the internet is the communication fig. 3. attack scenario within cloud. cloud computing reliability, availability and serviceability (ras): issues and challenges 19 september 2011 international journal on advances in ict for emerging regions 04 infrastructure for cloud providers that use the well-known tcp/ip protocol, which uses ip addresses to identify internet users. similar to a physical computer in the internet which has an ip address, a virtual machine in the internet also has an ip address. a malicious user, whether internal or external, like a legal user who exists in network, can find these ip addresses as well. in this case, a malicious user can find out which physical servers the victim is using, and implant a malicious virtual machine at that location from which to launch an attack [28]. because all users use the same infrastructure as the virtual machine, if a hacker steals a virtual machine or takes control of it, he also inherits the data within it. 
the hacker can then copy the data into his/her local machine before the cloud provider detects that the virtual machine is out of control; then the hacker can analyze the data, and may find valuable data afterward. 1) attacks in cloud nowadays, there are several kinds of attacks in the it world. basically, the cloud can give service to legal users, but it can also give service to users who have malicious purposes. a hacker can use a cloud to host a malicious application to achieve a task, which may be a ddos (distributed denial of service) attack against the cloud itself, or arranging an attack against another user in the cloud. for example, an attacker knows that his victim [30]. this situation is similar to this scenario in that both the attacker and the victim are in the same network, but with the difference that they use virtual machines instead of a physical network (figure 3). a) ddos attacks against cloud ddos attacks typically focus a high quantity of ip packets at specific network entry elements; usually any form of hardware that operates on a blacklist pattern is quickly overrun. in cloud computing, where the infrastructure is shared by a large number of clients, ddos attacks have the potential of much greater impact than they do against singletenant architectures. if the cloud does not have sufficient resources to provide services to its customers, the cause may be undesirable ddos attacks [30]. the traditional solution for this event is to increase the number of such critical resources. but a serious problem occurs when a malicious user deliberately performs a ddos attack using bot-nets. most network countermeasures cannot protect against ddos attacks, because they cannot stop the deluge of traffic, and typically cannot distinguish good traffic from bad traffic. ips (intrusion prevention systems) are effective if the attacks are identified and have pre-existing signatures, but are ineffective if there is legitimate content with bad intentions [27]. unfortunately, similar to ips solutions, firewalls are vulnerable and inefficient against ddos attacks because an attacker can easily bypass firewalls and ipss, because they are designed to transmit legitimate traffic, and attacks generate so much legitimate like traffic from so many distinct hosts that a server, or a cloud’s internet connection, cannot handle the traffic [27]. it may be more accurate to say that ddos protection is part of the network virtualization layer rather than server virtualization. for example, cloud systems that use virtual machines can be overcome by arp spoofing at the network layer; ddos protection is really about how to layer security across multivendor networks, firewalls, and load balances [32]. b) cloud against ddos attacks ddos attacks are one of the most powerful threats available in the world, especially when launched from a botnet with huge numbers of zombie machines. when a ddos attack is launched, it sends a heavy flood of packets at a web server from multiple sources. the cloud may be part of the solution; it’s interesting to consider that websites experiencing ddos attacks, which have limitations in server resources, can take advantage of using a cloud that provides more resources to tolerate such attacks [30]. cloud technology also offers the benefit of flexibility, with the ability to provide resources almost real-time as necessary and almost instantaneously to avoid site shutdown. vii. 
intrusion detection in cloud as we know, ids have been used widely to detect malicious behaviors in several types of network. ids management is an important capability for distributed ids solutions, which makes it possible to integrate and handle different types of sensors or collect and synthesize alerts generated from multiple hosts located in the distributed environment. facing new application scenarios in cloud computing, the ids approaches yield several problems since the operator of the ids should be the user, not the administrator of the cloud infrastructure. extensibility, efficient management and compatibility with virtualizationbased contexts need to be introduced into many existing ids implementations. additionally, the cloud providers need to enable possibilities to deploy and configure ids for the user. within this paper, we summarize several requirements for deploying ids in the cloud and propose an extensible ids architecture that is easily used in a distributed cloud infrastructure [33, 34]. a. intrusion detection at service level 1) ids in saas attacks on networks are a reality in the world. detecting and responding to those attacks is considered due diligence. the reality is that in saas, users will have no choice except to trust their providers to perform intrusion detection properly. some providers give their users the option of getting some system logs and users can use custom application for monitoring those data, but in reality, most intrusion detection activities must be done by the provider and the user can only report suspicious behavior for analysis. 2) ids in paas in paas, similarly to saas, most of the intrusion detection activities must be done by the cloud provider but with a little 20 farzad sabahi international journal on advances in ict for emerging regions 04 september 2011 difference. if intrusion detection systems are outside of the users’ application, they have no choice and must rely on the provider to implement ids. but paas configuration is more flexible than saas and users may have the choice to configure the security parameters of platforms that log on to a centralized place and users can incorporate intrusion detection performance [34, 35]. 3) ids in iaas iaas is the most flexible service for intrusion detection implementation. but the most important challenge in constructing a secure cloud-computing infrastructure is transparency. without it, the user cannot know if the cloud provider meets significant security requirements or not. moreover, the user cannot properly design application architecture to mitigate any risks that may exist. b. intrusion detection placement for operating intrusion detection in the cloud properly, the user must identify the possible and also proper places for hosting ids. in the traditional network, using intrusion detection allows the user to monitor, detect and alert about traffic that passes over the traditional network infrastructure [9, 36]. generally, there are some places in the network with more traffic than in other place (hotspots). placement of ids in the physical network part of the cloud is similar to a traditional network, because hotspots in both of them are the same. 1) in the virtual machine and network layer using intrusion detection in the virtual machine layer allows the user to monitor the system and detect and alert about issues that may arise. 
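a host-based check of this kind can be as simple as file-integrity monitoring inside the guest vm: record a known-good hash for a set of watched files and raise an alert when a hash changes. the sketch below is a minimal illustration; the watched paths and the baseline file name are assumptions rather than part of any particular hids product.

```python
# a minimal file-integrity check of the kind a host-based ids (hids) might run
# inside a guest vm. the watched paths and baseline file name are assumptions.

import hashlib
import json
from pathlib import Path

WATCHED = [Path("/etc/passwd"), Path("/etc/ssh/sshd_config")]  # hypothetical
BASELINE = Path("hids_baseline.json")

def digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def record_baseline() -> None:
    """store a known-good hash for every watched file that exists."""
    BASELINE.write_text(json.dumps({str(p): digest(p) for p in WATCHED if p.exists()}))

def check() -> list[str]:
    """return alerts for files whose current hash differs from the baseline."""
    baseline = json.loads(BASELINE.read_text())
    alerts = []
    for path_str, known in baseline.items():
        p = Path(path_str)
        if not p.exists() or digest(p) != known:
            alerts.append(f"integrity alert: {path_str} changed or missing")
    return alerts

if __name__ == "__main__":
    record_baseline()
    print(check())   # [] until a watched file is modified
```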
in addition, using intrusion detection to monitor the virtual network allows the user to monitor the network traffic between the virtual machines on the host, as well as the traffic between the virtual machines and the host. it should be noted that this network is different from traditional networks and that traffic never hits it [35]. 2) in the hypervisor layer as said before, the hypervisor presents to the guest operating systems a virtual operating platform and monitors how the guest operating systems are running [8]. deploying intrusion detection in the hypervisor allows the user to monitor everything that passes between the virtual machines. as illustrated in figure 4, the hyids runs inside the hypervisor. because the hypervisor interposes on all accesses between the guest kernel and the hardware, isis can monitor all operating system events and data structures for intrusions. c. intrusion detection techniques performance as is well known, intrusion detection has three wellknown main groups: host-based, network-based, and hybrid. this section discusses performance issues of intrusion detection techniques in the cloud, some of which are traditional solutions and some of which are special rectification solutions for use within the cloud. 1) traditional ids solutions in cloud a) host-based intrusion detection the first choice in intrusion detection is the traditional hids (host-based intrusion detection), which examines events and transmissions such as what file was accessed and what application was executed. this type of ids can be used on virtual machines as well as in the hypervisor level of cloud environments. using intrusion detection in the virtual machine layer allows the user to monitor the activity of the system and detect and alert about issues that may arise. at this level, the user can use an hids and have control over it. this type of ids can detect intrusion against his/her virtual machine. the provider may also deploy an hids in the hypervisor layer but only the provider is authorized to manage and configure it. in the hypervisor level, hids can also monitor traffic between virtual machines. the hids on the virtual machine would be used by the user of the cloud but the hids on the hypervisor level is for provider control; if the user wants to use the hypervisor intrusion detection data in his independent ids, he would have to coordinate with the provider. this issue is likely to pose difficulties because most cloud providers prefer not to share such data with customers due to privacy policies [26, 34]. while hids on the hypervisor level would be under the responsibility of the cloud provider, deploying and managing an hids on the virtual machine would be the user’s responsibility. b) network-based intrusion detection a network intrusion detection system (nids) is another traditional solution for performing security policies in computer networks. nidss work by examining network traffic but with this characteristic, only the cloud provider can deploy it. unfortunately, in cloud, because of the nature of nids, this type of ids has limitations. for example, it is unable to detect attacks within a virtual network that runs completely within the hypervisor. also, nids is useless in encrypted environments. this type of placement of ids is useful in detecting some attacks on the vms and hypervisor but it does have three important constraints. the first is that it is not useful when it comes to malicious activities within a vm, which is fulfilled completely in the hypervisor level. 
secondly, it has limited visibility into the host itself. thirdly, if the network traffic is encrypted by users, nids cannot decrypt the traffic for analysis [20, 34]. even if nids has all encryption keys used in the cloud, nids needs more computation resources to perform the decrypting. moreover, analyzing these data results in an increased cost of detection. 2) performance of traditional ids it seems that nids works better than hids but it must be considered that hidss are easy to implement while nids are difficult or at times impossible to fulfill in the cloud environment. in addition, in the cloud, nids falls completely into the area of the provider to operate and control. this paper cloud computing reliability, availability and serviceability (ras): issues and challenges 21 september 2011 international journal on advances in ict for emerging regions 04 has shown that cloud users need to think more about moving toward the cloud and also that cloud providers should give more attention to security matters. 3) hypervisor-based intrusion detection system another intrusion detection method is to use ids, which launches at the hypervisor layer but is not strictly a hids for the hypervisor, which is called hypervisor-based ids (hyids) [34] or isis ids (intrusion sensing and introspection system) [36]. one of the promising technologies in this method is the use of vm introspection. this type of ids allows users to monitor and analyze communications between vms, between the hypervisor and vm and within the hypervisor-based virtual network. the advantage of the hypervisor-based id is the availability of information, as it can see everything. the disadvantage is that the technology is new and users really need to know what they are looking for [21, 34]. there is a special type of intrusion detection in the hypervisor because of the level of accessing it contains and it has a good potential for improving the performance of intrusion detecting. as illustrated in figure 4, the hyids runs inside the hypervisor. the hypervisor can interpose on all accesses between the guest vm kernel and the hardware, while hyids can monitor all operating system events and data structures for intrusions. like nids, control and implementation of hyids is done entirely by the cloud provider [34, 37]. viii. countermeasures there are several traditional solutions to mitigate security problems that exist in the internet environment and the cloud infrastructure, but the nature of clouds causes some security problems that exist especially in cloud environments. on the other hand, there are traditional countermeasures against popular internet security problems that may be usable in clouds, but some of them must be improved or changed to use in cloud environments. a. access control to ensure the accessibility of authorized users the prevention of unauthorized access to information systems, formal procedures should be in place to control the allocation of access rights to services. the procedures should cover all stages in the lifecycle of user access, from the initial registration of new users to the final de-registration of users who no longer require access to information systems and services. special attention should be paid, where appropriate, to the necessity to control the allocation of privileged access rights, which allow users to override system controls [38]. the following are the six control statements [38]:  control access to information.  manage user access rights.  encourage good access practices. 
 control access to network services.  control access to operating systems.  control access to applications and systems. in the saas model, the cloud provider is responsible for managing all aspects of the network, server, and application infrastructure. in that model, since the application is delivered as a service to end users, usually via a web browser, networkbased controls are becoming less relevant and are augmented or superseded by user access controls, e.g., authentication using a one-time password [27, 38]. hence, customers should focus on user access controls (authentication, federation, privilege management, provisioning, etc.) to protect the information hosted by saas [39]. in the paas delivery model, the cloud provider is responsible for managing access control to the network, servers, and application platform infrastructure. however, the customer is responsible for access control to the applications deployed on a paas platform. access control to applications manifests as end user access management, which includes provisioning and authentication of users [28]. iaas customers are entirely responsible for managing all aspects of access control to their resources in the cloud. access to the virtual servers, virtual network, virtual storage, and applications hosted on an iaas platform will have to be designed and managed by the customer. in an iaas delivery model, access control management falls into one of the following two categories. access control management to the host, network, and management applications that are owned and managed by the cloud provider and user must manage access control to his/her virtual server, virtual storage, virtual networks, and applications hosted on virtual servers [30, 38]. b. incident countermeasure and response one of the important issues in cloud security, similar to other it fields, is finding problems and vulnerabilities that exist, but a more important issue is that the cloud provider has appropriate responses against all problems that it finds. basically, the cloud systems are built on a collection of storage and process engines, driven by a configurable distributed transaction coordinator. to achieve some important parameters such as flexibility, scalability and efficient usage of resources, cloud providers must face major challenges in the area of adaptability and workload. one of the main requirements of the cloud is the ability to be flexible; in the context of a cloud service, flexibility means dedicating resources where they are most needed [6]. this is particularly challenging in a database environment where there are large amounts of data that may need to be moved in order to reconcile data [6]. to allow high performance workloads to scale across multiple computing nodes, it is important for cloud provider to divide their data into partitions that maximize service performance. the main idea behind partitioning is to lessen the probability that a typical transaction has to access multiple nodes in cloud to compute its query. in migration, available methods must be able to predict adaptation time and try to avoid cloud node overload by some 22 farzad sabahi international journal on advances in ict for emerging regions 04 september 2011 procedure, such as partitioning, fragmenting, breaking big data packets in smaller pieces, and maintaining the ability to execute transactions while movement occurs [36]. 
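the partitioning idea described above can be made concrete with a very small sketch: route each record to a node by hashing its partition key, so that a transaction touching one customer's data normally lands on a single node. the node count and key choice below are assumptions for illustration.

```python
# a minimal sketch of hash partitioning as described above: records are routed
# to a node by hashing their key, so a transaction on one customer's data
# usually touches a single node. the node list and keys are illustrative.

import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]   # hypothetical cluster

def node_for(key: str) -> str:
    """map a partition key (e.g. a customer id) to one node deterministically."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

if __name__ == "__main__":
    for customer in ("cust-1001", "cust-1002", "cust-1003"):
        print(customer, "->", node_for(customer))
    # all rows for a given customer hash to the same node, so a typical
    # transaction for that customer does not need to span multiple nodes.
```

in practice a consistent-hashing scheme is the usual refinement when nodes are added or removed, since a plain modulo mapping reshuffles most keys during such changes.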
to balance workloads on virtual machines properly, it is necessary to analyze and classify cloud providers resource requirements to decide how these can be allocated to virtual machines. c. security management in the cloud the relevance of various security management functions available for each cloud delivery model is dependent on the context of deployment models. as mentioned before in the introduction, there are several important parameters in cloud security management: availability management, access control management, vulnerability and problem management, patch and configuration management, countermeasure response, and cloud system use and access monitoring. thus, according to the type of service provided, the customer or the provider must manage some or all of them independently, or perhaps partially [30]. thus, if a cloud is a private cloud, then the cloud provider generally manages all mentioned functions. but if a cloud is a public or hybrid cloud, then who manages which aspect depends on the type of cloud and the service provided. for example, if a cloud is saas, then the customer must partially manage access control and monitor system use and access, and also must manage incident response, and the cloud provider must manage the other functions. in other types of clouds (paas and iaas), the functions are limited to customer applications deployed in paas or iaas. ix. conclusion as outlined in the article, cloud computing helps it enterprises to optimize and secure application performance in a cost effective manner. cloud-based applications are based on network software running on a virtual machine in a virtualized environment. in view of the vital role of the hypervisor in a virtualization system, security at this level of virtualization needs special consideration. generally, a virtual application relieves some of the management issues in enterprises because most of the maintenance, software updates, configuration and other management tasks are automated and centralized at the cloud provider’s datacenter. but this way for decentralized application and access creates its own set of challenges and security problems. there are, however, risks and hidden costs in managing cloud compliance. cloud providers often have several powerful servers and resources that provide appropriate services for their users, but the cloud is at risk to a degree similar to that of other internet-based technologies. unfortunately, there are some attacks for which no perfect defense exists such as a powerful dos attack. but as paper discussed in occurrence of dos attacks, cloud may be a good solution or mitigation because cloud providers can use mirrors or devote more resources to protecting against attacks. however this solution’s performance depends on provider facilities. issues introduced by this paper are the main reasons for the precaution exercised by many enterprises and even some ordinary users to the adaptation of cloud computing. but benefits of using cloud have caused some of enterprises to have a plan in which cloud computing is used for lesssensitive data, but they may have local machines to store data which are of greater sensitivity. should cloud providers wish for clients to store greater amounts of sensitive data in the cloud computing environment, improving security (and also, of course, client perception of that security) is paramount. 
whilst cloud computing is an important trend that keeps transforming and will continue to transform the it industry, it doesn’t mean that all business it needs should move to the cloud computing model. the key to successful cloud computing initiatives is achieving a balance between the business benefits and the hidden potential risks lurking on the path to implementation. references [1] "securing virtualization in real-world environments," white paper2009. [2] p. coffee, "cloud computing: more than a virtual stack," ed: salesforce.com. [3] software as a service. available: http://www.wikinvest.com/concept/software_as_a_service [4] g. reese, cloud application architectures: o’reilly media, inc.,, 2009. [5] n. mirzaei, "cloud computing," 2008. [6] "database as a service," mit-csail-tr-2010-014. [7] m. riccuiti. stallman: cloud computing is stupidity. available: http://news.cnet.com/8301-1001_3-10054253-92.html [8] n. antonopoulos and l. gillam, cloud computing: springer-verlag london limited, 2010. [9] k. jackson, "secure cloud computing: an architecture ontology approach," defense information systems agency2009. [10] r. raja and v. verma, "cloud computing: an overview," research consultant, iiit hyderabad. [11] d. rowe. (2011, the impact of cloud on mid-size businesses. available: http://www.macquarietelecom.com/hosting/blog/cloudcomputing/impact-cloudcomputing-midsize-businesses [12] s. hanna, "cloud computing: finding the silver lining," juniper networks2009. [13] cloud computing. available: http://en.wikipedia.org/wiki/cloud_computing [14] j. brodkin. (2008). gartner: seven cloud-computing security risks. available: http://www.networkworld.com/news/2008/070208cloud.html [15] j. metzler. (2009, virtualisation can make application delivery much, much harder but you can fight back! available: http://searchnetworking.techtarget.com.au/articles/33471virtualisation-can-make-application-delivery-much-much-harder-butyou-can-fight-back [16] "virtualization: the next generation of application delivery challenges." [17] (2011). what is cloud sprawl and why should i worry about it? available: http://www.cloudbusinessreview.com/2011/06/08/what-iscloud-sprawl-and-why-should-i-worry-about-it.html [18] r. chow, p. golle, m. jakobsson, e. shi, j. staddon, r. masouka, and j. molina, "controlling data in the cloud: outsourcing computation without outsourcing control," presented at the ccsw'09, chicago, illinois, usa., 2009. [19] d. talbot. (2009). vulnerability seen in amazon's cloud-computing. available: cloud computing reliability, availability and serviceability (ras): issues and challenges 23 september 2011 international journal on advances in ict for emerging regions 04 http://www.technologyreview.com/printer_friendly_article.aspx?id=23 792 [20] j. w. rittinghouse and j. f. ransome, cloud computing implementation, management, and security: taylor and francis group, llc, 2010. [21] p. sefton, "privacy and data control in the era of cloud computing." [22] texiwill. (2009). is network security the major component of virtualization security? available: http://www.virtualizationpractice.com/blog/?p=350 [23] d. e. y. sarna, implementing and developing cloud computing applications: taylor and francis group, llc, 2011. [24] t. ristenpart and e. al, "hey, you, get off of my cloud: exploring information leakage in third-party compute clouds," 2009. [25] s. k. tim mather, and shahed latif, cloud security and privacy: o’reilly media, inc., 2009. [26] c. 
almond, "a practical guide to cloud computing security," 2009. [27] cloud security. available: http://cloudsecurity.trendmicro.com/ [28] n. mead, e. hough, and t. sehny, "security quality requirements engineering (square) methodolgy," carnegie mellon software engineering institute. [29] k. k. fletcher, "cloud security requirements analysis and security policy development using a high-order object-oriented modeling," master of science, computer science, missouri university of science and technology, 2010. [30] f. sabahi, "analysis of security in cloud environments," presented at the international conference on computer science and information technology, chengdu, china, 2011. [31] p. r. gallagher, a guide to understanding data remanence in automated information systems: the rainbow books, 1991. [32] (2009). cloud computing. available: http://groups.google.com/group/cloudcomputing/browse_thread/thread/21e585b137125554 [33] s. roschke, f. cheng, and c. meinel, "intrusion detection in the cloud," presented at the eighth ieee international conference on dependable, autonomic and secure computing, chengdu, china, 2009. [34] f. sabahi, "intrusion detection techniques performance in cloud environments," presented at the international conference on computer design and engineering, kuala lumpur, malaysia, 2011. [35] p. cox, "intrusion detection in a cloud computing environment," 2010. [36] l. litty, "hypervisor-based intrusion detection," master of science, 2005. [37] l. ponemon, "security of cloud computing users," 2010. [38] (2010). security management in the cloud. available: http://mscerts.net/programming/security%20management%20in%20th e%20cloud.aspx [39] (2010). security management in the cloud access control. available: http://mscerts.net/programming/security%20management%20in%20th e%20cloud%20-%20access%20control.aspx ieee paper template in a4 (v1) international journal on advances in ict for emerging regions 2014 07 (2) 1 a game based learning approach to enrich special education in sri lanka nadira t. perera, induni s. d. wijerathne, manori m. wijesooriya, a. t. dharmarathne, a. r. weerasinghe abstract— in this fast moving world, providing equal access to information and knowledge to everybody is restricted due to specific groups of students who are not capable to engage in learning in a regular manner because of their physical, mental or psychological disabilities. these kinds of students require individual attention and special assistance from their parents and teachers in learning process. they might benefit from some other special learning techniques provided as learning aids rather than conventional learning methods. since information and communication technology (ict) based learning is a novel approach which integrates learning with computing, this research is intended to explore the relevancy of ict based education for enhancing the learning effectiveness of students with special needs. a few game based activities have been developed considering both functional and non-functional requirements, gathered through discussions had with doctors and teachers in special education. these activities focus on basic concepts of three subject areas as colour, number and language. in order to evaluate the performance improvement, two tests were carried out as ‘pre-test’ and ‘post-test’ for a sample of students with the developed activities. 
the number of levels completed by each student, the number of mistakes made during the game and the time taken to finish a game were measured throughout the user evaluation phase. this experiment showed that ict can be used as a driving tool to enhance learning effectiveness in the special education domain. index terms— autism, down syndrome, ict based education, learning disabilities, skill development, special needs students i. introduction the term 'special educational needs' covers a collection of physical and mental disorders which affect the learning of students regardless of their intellectual abilities. these students suffer from emotional and behavioural disorders, developmental disorders and communication challenges that make it difficult for them to access education [1][2]. in this research, the focus is on students with intellectual disabilities such as autism spectrum disorder (asd), attention deficit hyperactivity disorder (adhd), down syndrome, auditory processing disorder and cerebral palsy. under the present circumstances in sri lanka, there is a considerable number of students suffering from such disabilities, and many of them do not even have basic reading and writing abilities [3]. according to [4], there are 17 special schools in sri lanka and the 'student to teacher' ratio in government, private and special schools is about 20:1. although these students could improve through specialized education, the majority of them do not receive such an education for many reasons, including economic barriers, lack of proper facilities and lack of specialized teachers. ict has leaped beyond boundaries, especially in the education sector, by opening new paths to access knowledge and information. new forms of ict based education have emerged, such as e-learning, game based learning and learning via virtual reality. consequently, there is a possibility of applying ict in special education. most countries use ict for special education in effective and efficient ways [15]; however, its use is still very limited in sri lanka. although specialized software has been developed in other countries, sri lankan students are unable to use it because of language barriers. therefore, the primary objective of this research is to investigate the possibility of introducing ict for special education in sri lanka and how effective it would be for sri lankan society. since ict is still a new direction for a country like sri lanka, a proper background analysis and an investigation of the feasibility of ict usage within the discipline of special education are necessary. a game based learning approach is proposed through this research by developing a number of computer based games and activities specially designed for students with special educational needs. these games cover three subject areas: colour, number and language. in order to eliminate the language barrier, the games were developed in sinhala. the effectiveness of the proposed solution is analysed by conducting a pre-test and a post-test. the evaluation was based on the number of steps each student proceeded with, the number of mistakes made and the average time taken to complete a single step. in addition, the type of disorder was taken into consideration in the analysis.
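as a concrete illustration of how these measures can be combined per student, the following python sketch records one game attempt and compares a pre-test result with a post-test result. it is a minimal sketch written for this description only; the field and function names are hypothetical and it is not the analysis code used in the study.

```python
# illustrative sketch only: names and structure are hypothetical,
# not the authors' actual analysis code.
from dataclasses import dataclass

@dataclass
class GameResult:
    levels_completed: int    # number of steps/levels the student finished
    mistakes: int            # total mistakes made during the game
    time_seconds: float      # total time spent on the game
    disorder: str            # e.g. "autism", "down syndrome", "other"

def avg_per_level(total: float, levels: int) -> float:
    """average of a measure per completed level; 0 if no level was finished."""
    return total / levels if levels > 0 else 0.0

def improvement(pre: GameResult, post: GameResult) -> str:
    """compare pre-test and post-test on levels completed and average time per level."""
    pre_t = avg_per_level(pre.time_seconds, pre.levels_completed)
    post_t = avg_per_level(post.time_seconds, post.levels_completed)
    if post.levels_completed > pre.levels_completed or post_t < pre_t:
        return "improved"
    if post.levels_completed == pre.levels_completed and post_t == pre_t:
        return "same"
    return "declined"

# hypothetical student: more levels completed, less average time per level
pre = GameResult(levels_completed=4, mistakes=6, time_seconds=300, disorder="down syndrome")
post = GameResult(levels_completed=6, mistakes=5, time_seconds=360, disorder="down syndrome")
print(improvement(pre, post))  # -> "improved"
```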
in this paper, section ii summarizes previous efforts in using ict to help those with special educational needs. section iii introduces the methodology followed to introduce ict based education in this research, while design considerations are discussed in section iv. implementation and evaluation of the proposed solution are included in sections v and vi respectively. an analysis of the gathered data is presented in section vii, and the final sections summarize the conclusions and describe possible future work related to this research. manuscript received may 02, 2014. recommended by dr. hakim usoof on june 25, 2014. nadira t. perera, induni s. d. wijerathne, manori m. wijesooriya, a. t. dharmarathne and a. r. weerasinghe are with the university of colombo school of computing, no. 35, reid avenue, colombo 07, sri lanka (e-mail: nadiratharanga@gmail.com, ishadiwijerathne@gmail.com, maduwijesooriya@gmail.com, atd@ucsc.lk, arw@ucsc.lk). ii. related work a. special education in the sri lankan context in order to evaluate the current status and problems of teachers in special education, a survey was carried out by the department of special needs education at the open university of sri lanka [5]. under this review, eight aspects of education were considered, including curriculum design, content and skill development. according to the department of special needs education, equal educational opportunities should be provided for all children in the country, and therefore giving educational opportunities to children with special educational needs is a responsibility of the whole society [5]. as mentioned in [5], learning modules are prepared first in english and then translated into sinhala and tamil. as a result, the translated modules become less comprehensive because the translations are done by people who do not know the subject. b. based on disability many research papers have been published on methods of utilizing ict to overcome the difficulties of students with learning disorders. these students mostly lack communicational and behavioural skills and are slow in understanding what is taught in the classroom. the software "learning to shop" [6] assists people with autism spectrum disorder (asd) in learning a shopping scenario. it consists of four stages: preparing the shopping list, preparing the money needed using a wallet, selecting the items on the list from the supermarket shelves, and paying at the cashier. the first level of this software provides photographs and videos for low-functioning users who can understand visual representations. the second level consists of pictures and animations, while the third level is enriched with written words. the fourth level is designed for higher-functioning students who can read words and sentences. the importance of this software is that it can be adapted for students with asd according to their level of disability. in fact, it is vital to have such a categorization in this type of software, since all these students cannot be placed at the same level [7].
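the level-based adaptation described for "learning to shop" can be pictured as a simple mapping from a student's functioning level to the content modality shown. the sketch below is an illustration of that idea written for this summary, not code from the software in [6], and the level numbers and content names are assumptions.

```python
# illustrative only: not part of "learning to shop" [6]; levels and names are hypothetical.
CONTENT_BY_LEVEL = {
    1: ["photographs", "videos"],       # low-functioning users: visual representation
    2: ["pictures", "animations"],      # second level
    3: ["pictures", "written words"],   # third level adds written words
    4: ["written words", "sentences"],  # higher-functioning users who can read
}

def content_for(functioning_level: int) -> list[str]:
    """select the content modality for a given functioning level (1 = lowest)."""
    # clamp to the supported range so every student gets some representation
    level = max(1, min(functioning_level, max(CONTENT_BY_LEVEL)))
    return CONTENT_BY_LEVEL[level]

print(content_for(1))  # ['photographs', 'videos']
print(content_for(4))  # ['written words', 'sentences']
```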
as most of these students lack behavioural skills, the suggested “visual schedules” [8] can be used to display planned activities using several symbols such as icons, images, photographs and actual objects. it helps students to understand the sequence of the steps of an activity. because of having both visual cues and verbal cues, it is very valuable for students with special needs. further, teachers use hardware devices to communicate with autistic students [9]. it is acknowledged in [10] that people with communication difficulties could be assisted by using gestures, a non electronic communication board, or an electronic voiceoutput device. personal support technologies, such as personal digital assistants (pdas) will be useful for learners with cognitive disabilities. specialized pda software which enables users to manage personal schedules is currently available. in [11], silva et al have focused only on the group who has autism spectrum disorder (asd) and identified the benefits of using ict to develop the communicative competences of those students using a novel multimedia platform called trocas. their focus on customizability is very important since this solution is for a special group of students, not for normal students. ease of adaptability to the target group, ease of being used by students and teachers, portability, configurability and adaptability according to the degree of severity of the target group are some of the features of trocas. this platform was designed to support photos, videos, audio and some more things as revealed on [11]. the message board is very useful since it helps these students to enhance their communication competences by exchanging their ideas. further, it does not require a special knowledge regarding the use of the computer, since it can be managed as the standard operating system. alphasmart keyboard is a strategy for augmentative communication for autistic children which shows pictures for keys and output the spoken words when the child chooses a picture [12]. this will be more appropriate for autistic children to improve their communication skills through this aural aid. it has been reported that dyslexics retain information more easily when more senses are involved in the learning process [13]. the use of ict can motivate and encourage the dyslexic learners by creating visual, auditory and kinesthetic environment to improve subject specific concepts and selfesteem with less fatigue. according to [13], ict usage offers the greatest independence in learning to all dyslexics, because of the speech recognition software. even today, the speech recognition software is not matured to an acceptable level and at least normal students are not using that software because it requires an immense amount of prior training and a noiseless environment for better work. because of the inability to identify letters correctly at once, dyslexic students have to write the same word again and again. however, if they can type the piece of work, it removes the pressure of rewriting the same thing many times in order to generate a neat piece of writing [13]. additionally, the font size, colour and line spacing can be changed accordingly to use in a comfortable way. though pupils have extremely slow typing speed, practice will develop a greater speed soon. again the author in [13] argues that handwriting of dyslexic pupils may deteriorate after the keyboard is introduced in. c. 
ict based special education according to the research “preparing special education frontline professionals for a new teaching experience” [14], the key factor of using ict for special education is not the technology itself, but the pedagogy used and the interaction between teacher, student and content. if the technology is used to support the acquisition of traditional skills, it will be a waste of time and technology. therefore, both theoreticians and practitioners have to think in a different and innovative way to apply the technological advancement into the real context. further it is suggested that ict can be used as a motivating factor rather than just using to communicate with the available knowledge because younger generation is very enthusiastic in new technological innovations. the students may easily be attracted to new educational environment and then they can be guided to the learning process indirectly. as a result of interacting with computers, the students will get more effective and efficient education than the traditional education. a game based learning approach to enrich the special education in sri lanka international journal on advances in ict for emerging regions 3 according to the analysis done by british educational communications and technology agency [15], ict can be used to support the learning of students with special educational needs in uk. the technology can help these students to overcome many of their communicational difficulties. this analysis has covered several areas including communication aids, software and web accessibility, teacher training and support and connecting learning communities. by using special access devices to use the computers, the students with physical disorders will motivate for doing the same lesson again. as revealed by analysis in [15], ict helps the students to communicate and learn more effectively. in addition, there are number of benefits for both teachers and parents [15]. further, it is essential to keep the relationship between the gaming subject and the educational subject in order to enrich both the game skill and curriculum knowledge. the facility of repeat and try again is very useful because usually these students are learning through repetition [16]. mainly the majority of special students are educated at normal schools. the inadequate resources in schools is one of the main problems and providing teachers’ training on special education is emphasized [17]. ghana has come up with many solutions that can address the problem of learning disabilities of these students. content-rich software is the main software that they are using in classroom to assist these students. they have refrained from using american and british software as it is difficult for these students to adapt to them. the attempt of conducting a teacher training program may add a value to this program because this may not succeed if teachers are not willing to adapt new technologies in to their teaching strategies [18]. explaining the importance of ict usage in special education and giving the hands-on experience for the teachers are worth in order to improve the usage of new technologies in education curriculum. teachers were able to build the structure of the lesson according to the individualized learning even at group works. it lets students to build the self-confidence and independent learning skills under their own comfortable speed. 
the findings of this study have showed that a classroom with technology has a motivational setting than a place without technology. therefore, using ict is worth since it is one of the successful motivation factors of learning. students are able to try out tasks in different levels of the games, quizzes and they will be challenged in every level. it helps them to bear a challenge in their real lives. the provided games will enhance the critical thinking power of students by aiming at winning the game, while forgetting they are learning through the computer. it is a good strategy for teaching, because the students may not be interested on those activities if they feel that they are learning. by gaining the competency with these games, students will experience a higher level of self-esteem when they get the chance to complete a task which they previously unable to achieve. d. serious games for students with special needs the students tend to absorb new information when playing digital games. self learning is one of the better ways of learning rather than teaching everything by a tutor. since the students can engage with goal oriented tasks in both real world and non-real world scenarios, it aims to improve the attention, memory, behaviour, cognitive and motor skills [19]. the ability of personalization of the games is useful because it can be adapted according to their level of disability of each individual. since these serious games deal with educational materials, the students can enhance their intellectual abilities while having fun. as stated in [9], serious games are a more effective way of learning. however, having a considerable difference between two subjects will be a burden for students and keeping cross disciplinary content is important for reducing the gap between subjects. as mentioned in [20], when developing a serious game for autistic students, attention should be paid for the characteristics related to interaction with the game, other than studying the associated technologies. further, it has mentioned that serious games for autism cover the subjects related to education, therapy for communication, psychomotor treatment and behaviour enhancement. according to f. shahbodin et al, arshia et al’s computer game based on digital story telling concept which supports autistic children of age between 9 and 14 years to practice the use of money. additionally, it provides an understanding about the social behaviour while helping to experience the shopping [20]. when designing serious games for students with special educational needs, designing the user interfaces for maximum accessibility and usability is recommended to reduce the cognitive load placed on the user when using software [21]. with regard to that, the authors have emphasized the design guidelines such as importance of using graphics, animations, providing alternatives to text, and auditory output to uphold user commitment. ‘cheese factory’ is a serious game [21] designed for teaching basic mathematics such as percentages, fractions and decimals where students have to match the given shape against another one in the interface to produce a full ‘cheese’. there were several speed levels and difficulty levels in that game to make it scalable for any student with special educational needs. further, the user interfaces are simple, colours are matched with overall interface, and instructions are clear. 
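the speed and difficulty scaling described for 'cheese factory' can be pictured as a small parameter table keyed by difficulty level. the sketch below is only an illustration of that idea; the parameter names and values are invented for the example and are not taken from the game in [21].

```python
# illustration of difficulty/speed scaling as described for 'cheese factory' [21];
# the parameters and numbers below are invented for the example, not taken from the game.
DIFFICULTY_LEVELS = {
    "easy":   {"shape_speed": 0.5, "pieces_to_match": 2, "hint_enabled": True},
    "medium": {"shape_speed": 1.0, "pieces_to_match": 4, "hint_enabled": True},
    "hard":   {"shape_speed": 1.5, "pieces_to_match": 6, "hint_enabled": False},
}

def settings_for(level_name: str) -> dict:
    """return game parameters for a named difficulty, defaulting to 'easy'."""
    return DIFFICULTY_LEVELS.get(level_name, DIFFICULTY_LEVELS["easy"])

print(settings_for("medium"))  # a slower game with fewer pieces can be chosen per student
```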
‘my appearance’ [21] supports the students to understand everyday “morning duties” from getting up until leaving home including having a shower and the breakfast. the game has been developed using flash and the graphic interfaces are clear and understandable. finally, the students receive a feedback on their performance using sounds and subtitles. e. effect of human computer interaction human computer interaction (hci) has been considered as the main factor on developing applications for autistic students as they are more interactive with attraction [25]. as discussed below, the interfaces, colour combinations and other key features should be taken into careful consideration since these students are different to normal students. it has been acknowledged that designing human-computer interfaces for students with physical disabilities is harder, because their abilities are dispensed in a larger range than able-bodied. biswas [22] has introduced a simulator to evaluate the assistive interfaces which can predict the interaction patterns when using variety of input devices. as mentioned in [23], there are several recommendations which should be accomplished when developing a system. using graphics and icons, using clear, unambiguous text, nadira t. perera, induni s. d. wijerathne, manori m. wijesooriya, a. t. dharmarathne, a. r. weerasinghe international journal on advances in ict for emerging regions 4 simple screen layout, consistency, contrasting colours, and large, clear navigation buttons, descriptive hyperlinks, minimizing scrolling, limiting the number of fonts, and supporting text browsers are some of them. graphics and icons will be more attractive than text because these students have difficulties in reading. further, using large fonts, understandable buttons and clear and simple layout is recommended for eliminating ambiguity. the colour combination should also be considered. the headings, titles and appropriate prompts should be placed at the same place of each page to keep the consistency. the name of the user can be used for the username in the login page. it will be more convenient for them because memorizing an extra word or phrase as the username is not required. it is better to use a natural voice of a person rather than using a synthesized voice. the assignment submission process can be done by providing the space to do the assignment at the same place in this e-learning system instead of changing the file format and attaching them. this type of system will help students to do their tasks without any supervision from a tutor. f. virtual environments florian and lani [25] mentioned that if students are suffering from severe learning disabilities, they could be provided with simulation and virtual environment to improve. adjustable virtual classroom [26] has mainly two components: “editor tool” and “viewer tool”. the editor tool is for the teacher or parent to edit the settings of the virtual classroom. according to the modifications done in editor tool, the viewer tool displays the classroom to the student. as in figure 1, the questions appear on the blackboard. the most important thing is the editor tool. one of the major advantages of this software is the ability to include the real faces on to the teacher and students’ avatar faces. the other avatars, in addition to teachers’ avatar are controlled by the software randomly. fig. 
1 an adjustable virtual classroom proposed by konstantinidis et al [19] konstantinidis et al [27] have used an avatar to respond on activities with correct and incorrect answers. this is really important on developing activities including words, colours and other related things and responses are motivating factors for these students. they have tried to enhance teacher-child education process in their approach by providing several difficulty levels. students are given images to select in a semi virtual environment. it uses an avatar to respond on activities with correct and incorrect answers respectively. developing activities including words, colours and other related things as responses are also important when developing activities, since they are motivating factors students. with the information provided, this approach seems to be an effective solution for autistic students. it has been acknowledged that the reaction of students with learning disabilities (especially adults) to the use of multimedia personal computer was extremely positive, and it was clear that the multimedia capability and modern software had considerable potential on those students [28]. g. special software it has been acknowledged that interactive software is capable of keeping the concentration of people in relevant educational activities [29]. it is essential to include more graphics, sounds and more interactive activities when developing software for the users with learning disabilities. there are professional resources, special needs software and assistive technologies such as; 1. software and typing books for students with cerebral palsy, missing fingers, learning disabilities, dyslexia, visually impaired. 2. learning solutions for classrooms and homes for kids 3. accessibility software and text to speech software further, word-processors, screen-reading software and problem-solving software packages still provide useful snapshots of ict applications. however, the ict itself also can create barriers. william’s study cited in [30] concludes that the current use of ict in education doesn’t concern about people with learning difficulties. although ict can be used for special educational needs, it will be useful only when it is specially designed for these types of students. screen readers, spell checkers, word prediction, text to speech and speech recognition software are some of the examples. further, accessible options built in to microsoft products such as magnifier, narrator and on-screen keyboard are also useful as temporary methods of accessing computers. according to [28], nordis educational software, written primarily for students with moderate/severe learning disability, is popular in schools and homes. integrating the computer games and animations in to learning activities was the main aim of nordis software. thomson [13] argued, the programs that use robotic sounding (synthesized) speech can help pupils to determine the accuracy of their text. since the special needs students cannot recognize at least the human voice well, recognizing a robotic voice is questionable. however, to improve the accuracy of the text, predictive programs can be used. those software help to reduce keystrokes, save typing time and aid spelling by suggesting the regular words that the user is trying to type [13]. the programs such as penfriend xp, co writer and texthelp are some examples. 
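the way such predictive programs reduce keystrokes can be shown in a few lines: given the letters typed so far, likely completions are suggested from a word list. the sketch below is a generic prefix-matching illustration, not the algorithm used by penfriend xp, co writer or texthelp.

```python
# generic prefix-based word prediction; not the method of any named product.
WORD_LIST = ["school", "shop", "shape", "sentence", "student", "square", "circle"]

def suggest(prefix: str, words=WORD_LIST, limit=3):
    """return up to `limit` words starting with the typed prefix (case-insensitive)."""
    p = prefix.lower()
    return [w for w in words if w.startswith(p)][:limit]

print(suggest("sh"))  # ['shop', 'shape'] -> the student picks a word instead of typing it out
```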
williams [10] reported that most of the teachers in uk think it is better if students are searching the internet for information or images of interests. it may be impossible for the dyslexics because of their reading and spelling difficulties. as a solution, dyslexic students can be allowed to ‘read’ the web via a screen reading software. a game based learning approach to enrich the special education in sri lanka international journal on advances in ict for emerging regions 5 iii. methodology interviews and participant observations are useful in understanding the students, their disabilities and the traditional special education strategies. as the first phase, seven teachers in special education centres and government schools, a consultant and a doctor who is specialized in autism were interviewed. some classroom activities of several learning categories such as speech, mathematics and arts were monitored. while observing, we could experience the way of teaching a 15-year old male student and his ability of working with microsoft word, excel and paint. for the convenience of the teaching process, the students with special needs are categorized in to three main levels as primary, intermediate and senior. the categorization is not based on the students’ age, but on their ability of knowledge acquisition. basically, colour, language and number skills of the students with special needs are addressed in their educational curriculum. therefore, developing games covering those three main subject areas is required for the three main levels. three primary colours, red, yellow and blue, are the main targets of the colour games developed for the primary students. regarding number concepts, only the numbers from one to nine are considered and those numbers has been simply introduced using the shape and the phonic of the particular number. same game theories were applied to introduce basic letters in alphabet. the same subject areas (colour, number, and language) that used for primary students were applied for the secondary students also. in order to improve colour identification skills, the games have been developed using both primary colours and secondary colours. giving activities only on secondary colours would badly affect on learning process of those students since there is a possibility of forgetting the primary colours that they have learnt earlier. the number games designed for the intermediate students will teach to map the shape of the number with real value of it and language games designed are aimed at simple words with two or three letters. for the senior level students, instead of colour games, more games are created to introduce the basic shapes such as circle, square, and triangle. gradually, the students will be taught to perform small calculations with the four basic mathematical operations (addition, subtraction, multiplication and division) through number games. to introduce sentences, different language games will be created for senior level students. a. activity designing 1) introductory sessions: the introductory sessions or activities have been designed to introduce the colours, numbers, letters, and sentences. these introductory sessions are not interactive and the user can view them and learn without doing an exercise. figure 2 shows colour introductory activity designed for primary level students. similar, introductory activities are designed to introduce numbers and letters for those students. 
for an example, it illustrates how the shape of number ‘1’ is mapping with the phonic of ‘one’ (as ‘එක’), the shape of letter ‘අ’ is mapping with the phonic of ‘අ’ (as ‘අයන්න’). in the beginning the letters which are easy to write (ex. ර, ට, ග etc) will be taught. fig. 2. colour introduction of red colour for primary students 2) games: the games can be used to practice the lessons learnt through the activities and traditional learning methods. in addition to that, the progress of the students’ also can be evaluated by using these games. the games are more interactive and they will react on the users’ responses on given activities. as an example if the user provides the correct answer for the given exercise it will send out a pleasant sound or an image of a happy face. an unpleasant sound or an image of an unhappy face will be the reaction for a wrong answer. the activity showed in fig. 3 is an example for a game. that aims at distinguishing primary colours and the final goal of that game is dragging the starts in to the cage of corresponding colour. fig. 3. a sample colour identification game for primary students further, some more interactive games are designed to practice the number and language skills. as an example, students are asked to build up some words by combining a set of given letters and afterwards to construct some small sentences by combining given words. the mathematical concepts will also be practiced by playing some games related to the mathematical calculation concepts. same as any other game, these games also should be designed in a way that the difficulty of the game is increasing gradually. the evaluation games (the games that used for evaluate the research) for colour, number and language have been developed considering many difficulty levels. in all of those levels, only abstract objects like circles, stars, balloons are used for easier understanding. nadira t. perera, induni s. d. wijerathne, manori m. wijesooriya, a. t. dharmarathne, a. r. weerasinghe international journal on advances in ict for emerging regions 6 3) tests tests are collections of several random games for each level in each category (colour, number or language) covering all required subject matters. once a student finishes the test, a parent or a teacher can check the time taken to finish that task. iv. design considerations as described in [30], referring to the study of friedman & bryen, designing computer game interfaces for students in special education is totally different than designing such interfaces for normal students. they have recommended several design considerations that should be followed when designing systems for students with learning disabilities. following are some of the design considerations used in practice when developing games for such students. a. colour combination of the interface colour is one of the most sensitive factors to be considered when designing interfaces [15]. when using colours in interfaces, applying only light colours and applying white background avoiding black, red and other dark colours are recommended. using many colours within a single interface will confuse the students and there is a possibility of rejecting games with such interfaces. fig. 4 illustrates one of the activity interfaces developed to test how primary students recognize numbers. this has been developed according to the design considerations figured out during the early phases of research. fig. 4. interface of number identification activity for primary students b. 
complexity as mentioned in [15], the complexity of a game is a major concern when designing interfaces for students with special needs. the simpler it is, easier for them to understand. using clear graphics and icons, unambiguous text, simple screen layout, consistency, contrasting colours, large and clear navigation buttons, descriptive hyperlinks, minimizing scrolling and limiting number of fonts are some of the recommendations to be attained in developing games for students with special needs [13]. further, there should not be any feature that can confuse and scare them. c. speed games for students with special needs should be interactive, flexible and should be in an appropriate speed [15]. since these students are very slow in their performance and response, this is a key factor to be addressed when designing games. d. clear audio in order to create a multi sensory learning environment, an auditory output also has been used for the activities and games. therefore, it may help the students to capture information than just viewing through the computer screen and to keep their concentration on games. basically, there are two kinds of audio clips that included in the activities and games. 1. human voice 2. audio clips to indicate correctness or incorrectness of the provided answer for a particular activity based on the recommendation of special education teachers, instructions were embedded in to the games as human voice, which were recorded with an adult female voice. it helps to eliminate the complexity of using instructions in textual format. in addition to that, another sound clip was used to indicate the end of the game while it serves as a reward for the student. however, it is not recommended to include a sound clip as background music, because it will disturb the students’ concentration. e. graphics as mentioned in [15], images and animations support to increase the interactivity between the games and the student. therefore, both images and animations have been used for the games as well as activities to make them more user-friendly. however, those images and animations should be simpler, clearer and familiar than the graphics used in games for ordinary students. further, graphics can be used to reward the students when they finish the game successfully. f. clear and unambiguous text the characteristics of text which is used for the games and activities of special students should be different from the characteristics of text used in games for ordinary students. the font size used in special educational games should be larger and they should be in same colour throughout the screen. further, the instructions should be given in very simple terms in a clearly understandable manner such that, how it is spoken in day-to-day life. however, it is not recommended to use text for the activities of primary level students. because those students get distracted with letters and it will be a barrier to acquire the targeted knowledge from the activities and games. g. simple screen layout a simple screen layout was maintained throughout the whole game because different layouts will loose the consistence of the sequence and it may distract the students’ acquiring abilities of the subject content. therefore, a single interface should be used with only the required changes in the interface without any additional things. as an example, when a child is playing a game, even navigation buttons should not be there on that screen. those buttons will be displayed after the student finishes the game. h. 
large clear navigation buttons the navigation buttons should be large enough in size in order to facilitate the visibility. further the buttons should be meaningful and visible clearly. a game based learning approach to enrich the special education in sri lanka international journal on advances in ict for emerging regions 7 v. implementation implementation standards and best practices were followed in order to produce a better product and to reduce the probability of introducing errors in the application. during the implementation, the games were developed to improve the colour, number and language skills of the students with special educational needs. the development of each game/s for different categories is discussed below. macromedia flash 8 and adobe flash cs4 were used with the help of action script 2.0 and action script 3.0 as development languages. adobe photoshop cs3 has been used as a supportive tool for editing the images used in the interfaces. the adult woman voice, which gives the instructions for the games, has been recorded, edited and optimized using audacity 1.3. further, format factory 2.10 was used to convert recorded voice clips and used sound clips into suitable file formats for maintaining quality of the developed games and games. to execute these activities, macromedia flash player 8 (or above) or any web browser with shockwave flash 10.3 (or above) as a plugin will be required. at the moment, the games can be played only using key board and mouse; however, they can be extended to play using special key boards and special mouse specifically designed for the children with special needs. further, output sound devices (as speakers) are required to produce the instructions and other sounds of the games. since these students can use only the primary hardware devices to interact with the computers, no advanced hardware devices are considered during implementation. by combining all the games in colour, number and language skills, the final product was created as a collection of games. as it contains attractive and very simple interfaces to select appropriate games, even the students can handle it without the help of a teacher or a parent. all the interfaces are in sinhala language and instructions also are given in sinhala language. in order to facilitate the usability of the final product, a user manual is provided which was written in sinhala. all the instructions to access the games are explained so that the teachers or parents can assist students to enter into the game. to introduce the product to the special education community in sri lanka, a compact disk (cd) was created including the final game. the game cd has been given to the teachers in some special education centres and for the parents of students in those centres. further, this game was given to some teachers in government schools which having a separate special education classes. however, providing the final game in a cd is limited to several special education centres and governmental schools. therefore, to improve the accessibility for this special game, a blog was created and the games also are uploaded for free access: http://ictbasedspecialeducation.blogspot.com/ this project has been released in two packages under gnu general public license (gpl) version 3. one package consists of the games in a non editable format (includes only.swf files) targeting normal users of the product. under the license, the users of that package have the freedom to share and distribute copies. 
the second package consists of the games in editable format (both .swf and .fla files) together with the source code. it was released targeting content developers. because of the gnu gpl license, they also have the freedom to share and distribute copies, and the additional freedom to modify the games. by making the source code available, the game content is expected to evolve with the support of a community willing to improve this kind of product for students with special needs. when these packages are distributed, for a fee or free of charge, all recipients receive the same freedoms because the gnu gpl is a copyleft license. vi. evaluation evaluation was done after several discussions with special education teachers on how the activities should be arranged for this purpose. the evaluation process was organized into three main sections: i. pre test round ii. practice round iii. post test round further, with the guidance of special education teachers, the test game was created to cover the colour, number and language skills. each game has many levels of activities which gradually increase in difficulty from level to level (fig. 5). if a student completes one level correctly, he/she can move on to the next level; if not, he/she returns to the same level to try it again. each level includes a restart button to start the game from the beginning according to the student's preference. fig. 5. flow diagram of levels within a single game the evaluation was done with a sample of 27 students covering both private special education centres and government schools. the sample was selected covering several disorders including autism, down syndrome and adhd. initially the pre-test round started with the colour game, which has ten levels. according to the teachers, starting with colours is better for these children since they are very interested in colours. therefore, the primary colours red, blue and yellow were considered at this stage, and the students' capability of distinguishing a colour out of two to three colours was tested. coloured, real world objects were used in the last level to make the students familiar with day to day objects. when a student completes the game, the time taken to complete it is displayed in the interface (figure 6).
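the level flow of fig. 5 can be summarised in a few lines of code: a correct attempt advances the student to the next level, a wrong attempt repeats the same level, and the restart button returns to the first level. the sketch below is written in python for illustration only (the actual games were implemented in actionscript), and the class and method names are hypothetical.

```python
# sketch of the fig. 5 level flow; python is used here for illustration only,
# the actual games were implemented in actionscript 2.0/3.0.
class GameSession:
    def __init__(self, total_levels: int):
        self.total_levels = total_levels
        self.level = 1
        self.mistakes = 0

    def answer(self, correct: bool) -> None:
        """correct answers advance to the next level; wrong answers repeat the level."""
        if correct:
            self.level = min(self.level + 1, self.total_levels + 1)
        else:
            self.mistakes += 1  # stay on the same level and try again

    def restart(self) -> None:
        """the restart button takes the student back to the beginning of the game."""
        self.level = 1

    def finished(self) -> bool:
        return self.level > self.total_levels

session = GameSession(total_levels=10)   # e.g. the ten-level colour game
session.answer(correct=True)             # move to level 2
session.answer(correct=False)            # mistake: repeat level 2
print(session.level, session.mistakes)   # -> 2 1
```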
the time taken to complete a game, the number of mistakes made within the game and the number of levels completed were recorded while the students played. in addition, other details such as the type of disorder, age, duration of formal special education and computer accessibility were recorded. further, special comments were noted on how each student performed in individual games. after the pre test round, the students practiced with the developed games for nearly a month (the practice round), and during that period each student received around 1 to 3 hours of training depending on his/her ability to acquire the skills. afterwards, the post-test stage was carried out to verify the improvement of these students. as in the pre-test stage, the same order was followed, starting from colour and then number and language skills. each student was individually examined and details were recorded, including the time taken to complete the game, the number of mistakes and the number of levels completed. according to the observations, these students were keen on playing the games, which helped a lot in carrying out the evaluation process properly. considerations in evaluation:
- before starting the pre test, a little time was given for students to practice mouse movements and clicks
- the mouse pointer had to be changed to a larger one (using the windows default mouse icons)
- more sounds were included in the games for better attention
vii. analysis of results in order to make decisions based on students' performance, a quantitative analysis was done with 35 student participants. however, not all of the students could participate in the post-test; thus, the comparison between pre-test and post-test results is based only on 27 students. (in all the figures hereafter, "std" stands for "student" and accordingly "std i" stands for the i th student, where i varies from 1 to 27.) a. analysis based on number of levels completed 1) colour tests: according to the results of the pre-test and post-test, 2 students (7%) did not proceed beyond level 0, while 13 students (48%) completed all 10 levels in both tests. the remaining students (about 44%) completed at least one level in the pre-test or post-test and generally performed better in the post-test than in the pre-test. therefore, the improvement in colour skills is about 44% in terms of the number of levels. however, no conclusion can be made with regard to those who completed the same number of levels in both tests. figure 7 and table i show the summary of these statistics. fig. 7 comparison of number of steps – colour skills
table i. results of colour games with respect to the number of steps (27 students)
completed all steps in both tests: 13 (48.14%)
escalation in post test compared to pre test: 12 (44.44%)
no improvement in tests: 2 (7.41%)
2) number tests: the number of levels completed in the pre test and post test of number skills indicates an improvement in almost all the students, while one student did not attempt even the first level in either test. 15 students (56%) completed all 6 levels and 2 students (7%) completed 5 levels in both tests. 8 students (30%) increased the number of levels in the post test. notably, one student (4%) ended up with fewer levels in the post test. see figure 8 and table ii for the summary of the statistics.
fig. 8 comparison of number of steps – number skills
table ii. results of number games with respect to the number of steps (27 students)
completed all steps in both tests: 15 (55.55%)
escalation in post test compared to pre test: 8 (29.62%)
completed the same number of steps in both tests: 2 (7.41%)
fewer steps in post test than pre test: 1 (3.70%)
3) language tests: the total number of students who participated in the language test was 25, since 2 students (std 12 and std 15) are tamil students and therefore cannot understand sinhala. the improvement of students in language skills based on the number of levels is about 28%, because 7 students completed more levels in the post test than in the pre test. the number of students who did all 6 levels in both tests is 14 (56%), 3 students (12%) ended up with fewer levels in the post test, and only one student completed the same number of levels in both tests without completing all 6. these statistics are presented in table iii and figure 9 (where std 12 and std 15 have been removed). fig. 9. comparison of number of steps – language skills
table iii. results of language games with respect to the number of steps (25 students)
completed all steps in both tests: 14 (56%)
escalation in post test compared to pre test: 7 (28%)
completed the same number of steps in both tests: 1 (4%)
fewer steps in post test than pre test: 3 (12%)
b. analysis based on average time taken per level this analysis would not be meaningful if the comparison were done with the total time each student spent in the pre-test and post-test, because different students left the game at different levels. therefore, the average time taken to complete a single level was calculated and the comparison between pre test and post test was done on that basis. 1) colour tests: fig. 10 shows how the students spent time in the colour game in both the pre-test and the post-test. according to the graph, 26 students took less average time per level in the post test compared to the pre test; as a percentage, this is nearly 96%. of those 27 students, only 1 student spent more average time in the post test. fig. 10. average time spent for a level by each student – colour skills the summary of the above statistics is shown in table iv below. table iv results of colour games with respect to the average time 2) number tests: as shown in fig. 11, among the sample of 27 students, 19 students (about 70%) took less average time, whereas the rest of the students (about 30%) took a higher average time in the post test than in the pre test. however, compared to the colour tests, the students did not perform as well in the number tests. fig. 11. average time spent for a level by each student – number skills the summary of the above statistics is shown in table v below. table v results of number games with respect to the average time 3) language tests: fig. 12. average time spent for a level by each student – language skills among the 25 students who participated in the language tests, 22 students took less average time in the post test, a percentage of around 88%.
other 3 students, about 12% of the sample have taken a higher average time in post-test than the pre-test. the summary of above statistics is shown in table vi below. table vi results of language games with respect to the average time c. analysis based on average number of mistakes similar to the average time taken, it is unable to compare the number of mistakes done by students when comparing their pre-test with post-test since most of the students have completed different number of levels in those two tests. therefore, the average number of mistakes per level was compared in this analysis. 1) colour tests: according to the results obtained in colour tests, one autistic student has completed the post-test with a less average number of mistakes than the pre-test. in the category of down syndrome, 10 students (about 63%) have done fewer mistakes while other 6 students have completed the post test with a higher number of mistakes. fig. 13. average number of mistakes per level – colour tests the summary of above statistics is shown in table vii. table vii average number of mistakes done by students in colour games 2) number tests: out of 6 autistic students in the sample group, there were 3 students (50%) have done post test with a less average number of mistakes than pre test. further, 13 students with down syndrome (about 81.25%) and 1 student from the other category (20%) have made less mistakes in post test than pre test. however, no interpretation can be a game based learning approach to enrich the special education in sri lanka international journal on advances in ict for emerging regions 11 made about those who have not done any mistake in both tests. fig. 14. average number of mistakes per level – number skills the summary of above statistics is shown in table viii. table viii average number of mistakes done by students in number games 3) language tests: among 25 students (two students were not participated for language tests), average number of mistakes in post test has been reduced compared with pre test in language category for 3 students with autism (50%), 10 students with down syndrome (66.67%) and 2 students from the other category (50%). fig. 15. average number of mistakes per level – language tests the summary of above statistics is shown in table ix below. table ix average number of mistakes done by students in language games d. analysis based on disorder type although the proposed solution is for all the categories of special educational needs, it is intended to test out whether there is a significant impact of that solution on any specific disorder. therefore, the collected data set is analysed categorizing students according to the disorder as down syndrome, autism and other. students who are having cerebral palsy and adhd belong to “other” category. in that sample of 27 students, there are 6 students with autism, 16 students with down syndrome and 5 students are belonging to ‘other’ category. for the language test, out of the sample of 25 students, 6 students are with autism, 15 students are with down syndrome and 4 students are belonging to ‘other’ category. the analysis is based on average number of mistakes done by a particular student per level and average time taken per level. in order to get a clarified interpretation the derived diagrams have been divided into four parts as a, b, c and d as shown in below diagrams (fig 16 to fig 21). all the discussions are based on transitions of students’ positions inbetween those four parts during pre test and post test. 
the transitions are as follows; a to b time has been increased while number of mistakes is high a to c taken time is high, number of mistakes has been reduced a to d number of mistakes has been reduced while time is less b to a time has been reduced while number of mistakes is high b to c number of mistakes has been reduced while time is high b to d both time and number of mistakes, have been reduced c to a number of mistakes has been increased,taken time reduced c to b number of mistakes has been increased while time is high c to d time has been reduced while number of mistakes is less d to a number of mistakes has been increased while time is less d to b both time and number of mistakes, have been increased d to c time has been increased while number of mistakes is less 1) colour skills colour pre test according to figure 16, std 4 and std 7 can be identified as outliers compared to the rest of the sample. std 4 has some disorders in his brain and as a result he lacks attention and eye contacts even with his parents and teachers. std 7 is with the disorder adhd. adhd students also lack attention. therefore, interacting with a computer for a nadira t. perera, induni s. d. wijerathne, manori m. wijesooriya, a. t. dharmarathne, a. r. weerasinghe international journal on advances in ict for emerging regions 12 long time is a good practice for such students as they would be able to focus even though they cannot accomplish given activities. fig. 16. comparison of students based on disorder type colour pre test colour post test fig. 17. comparison of students based on disorder type colour post test table x consists of the data derived by analysing figure 16 and figure 17. according to table x, nearly 43.75% of down syndrome students have done colour post test better than colour pre test and nearly same percentage of down syndrome students are in the same level in both pre test and post test. however, nearly 12.5% of students have not been improved compared to the rest of the sample of down syndrome. according to the statistics of autistic students, 50% are in the same level in both tests and 50% have not been improved when the whole sample is compared. when the “other” category is considered, all the students are in the same level, i.e. no visible improvements. the summary derived through the comparison of colour tests is shown in table x. table x summary of data in fig. 16 and fig. 17 2) number skills the students’ performance on number skills in terms of average time spent and average number of mistakes per level is depicted in figure 18 and figure 19. number pre test fig. 18. comparison of students based on disorder type number pre test number post test the summary of figure 18 and figure 19 is included in table xi and it depicts the details about number skills of sample students. among 16 of the down syndrome students, nearly 44% have done post test better than pre test in the total sample. 44% of students are in the same level in both tests while 13% of students have not been improved. approximately 17% of autistic students have done post test better than pre test. about 83% of students are in the same level in both tests. in “other” category, 80% of the students are in the same level compared to the whole sample a game based learning approach to enrich the special education in sri lanka international journal on advances in ict for emerging regions 13 while 20% of the students have ended up with less performance in the post test than the pre test. fig. 19. 
comparison of students based on disorder type number post test table xi summary of data in fig. 18 and fig. 19 3) language skills language pre test fig. 20. comparison of students based on disorder type language pre test language post test fig. 21. comparison of students based on disorder type language post test the data about language skills included in table xii has been derived from figure 20 and figure 21. according to the statistics of down syndrome students, about 33% have done post test better than pre test. about 47% of students are in same level in both tests and 20% of students have ended up with less performance in post test than pre test compared to the whole sample. according to the pre test and post test results of autistic students, improvement rate is about 17%. however, all the students in the “other” category have maintained the same level in both tests compared to the whole sample. table xii summary of data in fig. 20 and fig. 21 viii. discussion there is a list of significant findings that identified by analysing the diagrams, tables and information discussed in ‘analysis of results’ chapter. when comparing the ‘average time taken to complete a single level in the game’ and nadira t. perera, induni s. d. wijerathne, manori m. wijesooriya, a. t. dharmarathne, a. r. weerasinghe international journal on advances in ict for emerging regions 14 ‘number of mistakes per level’, it is clear that the performance improvement rates are higher in terms of average time. that means the efficiency of students has been improved rather than the effectiveness.  the improvement rate of students who have computers at home (about 85%) is higher than those who do not have computers at home (50%). therefore, the practice of using a computer at home also may have an effect on students’ performance improvement.  when the students who do not have computers at home are considered, students with down syndrome have improved more than the autistic students.  female students have shown a better improvement than male students in both number and language skills activities.  however, both genders have shown equal improvements in colour skills. ix. conclusion even though the society is in the knowledge era, unfortunately there is no opportunity for the students with special needs to take the advantage of ict based education. although most of the countries have carried out many researches and projects with regards to how ict can be utilized addressing this issue, in sri lankan context, it is a stranger. as a result, this research was carried out in order to study the potential of ict usage at special education in sri lanka and analyse the effect of ict based education those students’ performance. in this study it was focused on a critical evaluation on students’ performance using some computer based educational games specially developed for students with special educational needs, covering the three main subject areas: colour, number and language. those games were developed under the guidance of consultants and teachers in special education. since this is for a special category of students, many design considerations have to be considered. in order to evaluate the improvement of performance, two tests were carried out as pre test and post test. in addition to that, the sample of students was given a one-month training using designed games in between pre test and post test. 
a quantitative analysis was done based on the number of levels a particular student completed, the average time taken per level and the average number of mistakes made per level. a qualitative analysis is impractical in this kind of research area because it is not possible to obtain feedback from students with special needs about the proposed solution. however, throughout the research it was evident that these students are very interested in their lessons when supported by these applications. according to the analysis of the average number of mistakes per level, about 48% of students in colour skills, 63% in number skills and 60% in language skills made fewer mistakes per level across the two tests, and these rates ultimately indicate an improvement in performance. thus, it is evident that the use of ict in a game based approach as a learning aid has a positive impact on the performance of students with special needs. however, ict based special education cannot entirely replace traditional special education; the proposed solution should be seen as an aid for further sharpening the knowledge of students with special needs.
x. future work
repetition is a vital factor when it comes to students with special needs. although the quantitative analysis gives positive results and indicates a considerable improvement in the performance of these students, a further significant improvement could be achieved if they were given more practice both within school time and at home. the significance of this game-based learning approach will therefore be greater with continuous training. since some of the activities developed in this research are based on mouse clicking and dragging, handling the computer mouse is also a vital skill. it was somewhat problematic for these students, and some of them struggled with the mouse, especially the children with cerebral palsy. those students would benefit greatly from hardware solutions such as touch screens, as it would be much easier for them to engage in the activities. the students also get distracted when they see a pop up menu, as they tend to click the right mouse button instead of the left, and it was noticed that most of the students are interested in playing with the mouse wheel. since these behaviours break their concentration, a specially designed device with a single button and no wheel would be preferable. if these games could be delivered as apps for mobile devices such as ipads, the students would be happier to use them than computers; this would also help to break the monotony of the games and create more interest among the students.
xi. acknowledgement
we extend our gratitude to all the teachers at chithra lane special education centre and mahawila junior school, panadura for giving us valuable domain knowledge and allowing us to work with special needs students. further, we appreciate the great support of dr. sudath damsinghe and the staff of daddys lanka autism centre.
path index based keywords to sparql query transformation for semantic data federations
thilini cooray, gihan wikramanayake
international journal on advances in ict for emerging regions 2016 9 (1) june 2016
abstract— the semantic web is a highly emerging research domain. enhancing the ability to process keyword queries on semantic web data provides a huge support for familiarizing the general public with the usefulness of the semantic web. most of the existing approaches focus on simply matching user keywords to rdf graphs and outputting the connecting elements as results. the semantic web includes the sparql query language, which can process queries more accurately and efficiently than general keyword matching. there are only a couple of approaches available for transforming keyword queries to sparql, and they basically rely on real time graph traversal for identifying sub-graphs which can connect user keywords. those approaches are either limited to query processing on a single data store or on a set of interlinked data sets; they have not focused on query processing over a federation of independent data sets belonging to the same domain. this research proposes a path index based approach which eliminates real time graph traversal for transforming keyword queries to sparql. we have introduced an ontology alignment based approach for keyword query transformation on a federation of rdf data stored using multiple heterogeneous vocabularies. evaluation shows that the proposed approach has the ability to generate sparql queries which can provide highly relevant results for user keyword queries. the path index based query transformation approach has also achieved high efficiency compared to the existing approach.
keywords— semantic web; keyword query processing; sparql query generation; rdf federations
i. introduction
nowadays the world wide web (www) has become essential to everyone.
people always tend to search the web to retrieve information about almost everything. once a enters a query, the underlying query processors must be able to gather results from available sources. what user is interested is, receiving relevant answers for their questions, efficiently. ability to understand the meaning of user query is important to provide relevant results. once the user requirement is understood, it should be presented in a way which underlying data sources can understand and process. the relevancy of results provided by the data source depends on both the completeness of data stored in the source and how well the user query is understood by the data source. manuscript received on 23 nov 2015. recommended by prof. k. p. hewagamage on 16 june 2016. this paper is an extended version of the paper “an approach for transforming keyword-based queries to sparql on rdf data source federations” presented at the icter 2015 conference. thilini cooray holds a b.sc. (honours) in computer science from the university of colombo school of computing, sri lanka.(e-mail: thilinicooray.ucsc@gmail.com). prof. gihan wikramanayake is a senior lecturer at the university of colombo school of computing. (e-mail: gnw@ucsc.cmb.ac.lk). www contains huge amount of details about variety of topics. most of them are stored as web documents. web documents are capable of preserving complete details about topics rather than compacting them to traditional databases where only the details matches with the database schema are stored while skipping others despite their necessity for the completeness of information. however, as the amount of web documents are extremely increasing, requirement for effective storage mechanisms and efficient searching mechanisms were highly demanded. this paved way to the emergence of the concept of transforming web documents to web data. semantic web1 was introduced as a method of storing web data in such a manner which is understandable to a computer. . resource description framework2 (rdf) was presented as the standard format for storing and exchanging data. rdf preserves the interconnections among data elements and use graph structures for storage. using graph structures for data storing, is crucial for web data as they contain huge amounts of relationships that relational databases are incapable of maintaining. these relationships are essential when recognizing the relevancy of data for a user query. recently, many researchers and academic institutes have taken the initiative of exposing their data to the web in rdf format. sparql3 is the query language for rdf data. it is capable of both representing information needs along with relationships among elements and dive in rdf sources to extract information considering those relationships. relaxed models such as keyword queries are convenient for general users to query data sources as they do not have to consider the underlying complexity such as data structures and schema when composing queries. many researches have been carried out in keyword query search over tree [1], [2], [3], [4] and graph [5], [6] structured data. basic idea behind keyword search is to identify matching data elements for keywords from the underlying data source and retrieve substructures which connect all those identified elements. structured queries are capable of retrieving more relevant results efficiently and accurately compared to keyword queries. however, composing structured queries require expertise knowledge which is lacking among general users. 
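as a small illustration of the contrast just described, the following sketch queries a tiny hand-built rdf model with sparql using apache jena (the library later used for the implementation). it is illustrative only: the example vocabulary (the http://example.org/ properties) and a recent jena version are assumptions, and a real data source would of course be far larger.

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

// minimal sketch: a structured sparql query needs knowledge of the schema
// (property names, graph shape) that a plain keyword query does not.
public class SparqlVsKeywordSketch {
    public static void main(String[] args) {
        String ex = "http://example.org/";
        Model m = ModelFactory.createDefaultModel();
        Resource paper = m.createResource(ex + "paper-peter95");
        paper.addProperty(m.createProperty(ex + "author"), "james peter");
        paper.addProperty(m.createProperty(ex + "year"), "1995");
        paper.addProperty(m.createProperty(ex + "title"), "a paper on consistency");

        String q = "PREFIX ex: <http://example.org/> "
                 + "SELECT ?title WHERE { ?p ex:author \"james peter\" ; "
                 + "ex:year \"1995\" ; ex:title ?title }";
        try (QueryExecution qe = QueryExecutionFactory.create(q, m)) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                System.out.println(rs.next().getLiteral("title").getString());
            }
        }
    }
}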
sparql is capable of retrieving more relevant results from web data. therefore, bridging the gap between user friendly keyword queries and sparql allows general users to retrieve highly relevant results without having knowledge about underlying complexities. transforming keyword queries to sparql is still a novel topic which has not gained much attention among semantic research attention. however, it could be identified as one of the key points in familiarizing the importance of semantic web to general public and sharing 1 http://en.wikipedia.org/wiki/semantic_web 2 http://www.w3.org/rdf 3 http://www.w3.org/tr/rdf-sparql-query n path index based keywords to sparql query transformation for semantic data federations 2 international journal on advances in ict for emerging regions june 2016 its privileges with them for fulfilling their information needs efficiently while enhancing the relevancy of results. the process of translating keyword queries to sparql can be decomposed to following steps. 1) mapping user keywords to data elements. 2) identifying sub-graphs which can connect mapped data elements. 3) generating queries based on the relationships in the sub-graphs. most of the available approaches exploit graph traversal in real time for identifying suitable sub-graphs [3], [4], [6], [8], [9]. only limited set of functions are carried out as preprocessing. most of these approaches provide approximate results because traversing rdf graphs with millions or billions of data is very expensive and highly time consuming. hence there is a requirement to seek for approaches which can reduce graph traversal in query generating time. there are many contributors of the www who provides information on related topics individually. for an example, dblp and acm contain academic publication data individually. none of them have entire publication data. in contrast, google scholar connects sources such as dblp and acm to provide more complete set of results for the public. therefore, general public is attracted more towards google scholar for their publication related information needs. most rdf sources are also maintained as individual dumps. in order to provide more complete results with high accuracy, it is important to combine those together. rdf federations have been presented as a solution for this problem. yet existing rdf federations only accept sparql queries. seeking for approaches which can direct user queries to rdf federations will enhance completeness and accuracy of provided results. this research focuses on transforming keyword queries to sparql on a rdf federation in order to allow general users to access semantic web and fulfill their information needs. following are the main contributions of this research: • proposing an approach to map user keywords to data elements resolving vocabulary level heterogeneity an available ontology alignment mechanism is utilized for resolving vocabulary level heterogeneity. results of the alignment mechanism are combined with a keyword index to map source wise matching elements for user keywords. this mechanism is capable of returning a set of keyword matching elements for each data source in the federation. • building a path index capturing full paths accurately path index is an existing concept which reduces the cost of real time graph traversing for keyword query processing. the existing logic intends to store full paths from vertices to sink nodes as a preprocessing task. 
however, the breadth-first search based algorithm presented is unable to filter only full paths, causing unnecessary graph traversals at real time. therefore, this research has proposed a depthfirst search based approach which is capable of accurately capturing full paths. • utilizing the stored templates in the path index to generate sparql queries without graph traversing in real time path index was previously used only for keyword mapping. this research proposes a way which path index can be utilized for sparql query generation. the proposed approach is capable of generating queries which can be directly executed on sparql query engines. results of generated queries exhibit high precision and recall values. the paper is organized as follows: section 2 discusses related work; section 3 gives an overview of the methodology, section 4, 5 and 6 gives an in-depth explanation about each component of the suggested approach; finally section 7 provides experimental evaluations and in section 8, conclusions and future work are presented. ii. related work even though there is no existing approach for transforming keyword queries to sparql on rdf federations, research has taken place in addressing each step required for keyword to sparql transformation on rdf federations. they are as follows; resolving vocabulary level heterogeneity, mapping user keywords to data source elements and identifying suitable sub-graphs connecting keyword elements. heterogeneity resolution is a core function of federations according to sheth and larson [16]. there are different levels of heterogeneity such as vocabulary level heterogeneity and data level. however, vocabulary level heterogeneity must be resolved to process keyword based queries in a federation. sparql query rewriting is one approach for resolving vocabulary level heterogeneity [17], [18], [19]. the input query must be written in sparql using a specific vocabulary for applying this solution. this cannot be applied when input query is in keyword. ontology alignment is another method proposed for heterogeneity resolution among rdf data sources. concept level, property level and instance level are the ontology alignment types according to gunaratna et al. [20]. blooms [13], aroma [21] and rimom [14] are some concept level alignment approaches. gunaratna et al. [20] have mentioned that very less amount of research have been focused on property level alignments. alignment api [12] can be identified as a proper tool for property level alignments from different vocabularies. since this research aims at heterogeneity resolution for a federation, both concept level and property level vocabulary resolutions are required. indexing is the common approach adopted by almost the entire keyword query processing approaches for matching data elements with keywords. however types of indices they have used are different. searchweb [8], bidirectional search [6] keeps indices for both vertices and edges. they consider the possibility of a user keyword occurs at an edge as well as at vertices. blinks [5] believes that keywords can only occur in vertices of the graph so index only vertices. path index [7] only store sink nodes in their index arguing that keywords only reside in sink nodes. when storing details, most approaches index only the label of the graph element. however searchweb [8] and hermes [15] store some additional information along with the label. those details are used for efficient sub-graph identification on that approach. 
many approaches have been suggested for identifying suitable sub-graphs which can connect keyword elements. basic tree search algorithms such as breadth first search [22] and depth first search [23] were first applied for substructure identification in tree structured data. then several other algorithms [24], [25] were proposed by modifying those basic concepts. as rdf data sets mostly have a graph structure, graph exploration approaches were 3 thilini cooray, gihan wikramanayake june 2016 international journal on advances in ict for emerging regions proposed such as backward search [9] and bidirectional algorithm [6]. searchweb [8] has suggested a summarized graph which has reduced the size of graph which needs to be traversed at real time for finding suitable sub-graphs. real time graph traversing is highly time consuming. mks [26] uses a matrix to store the keyword relationships to eliminate graph traversal at real time. they only focus on binary relationships. g-ks [27] proposes a keyword relationship graph to find a suitable sub-graph to resolve the weakness of m-ks. tran et al. [10] suggest a graph based model to sub-graph identification. cappellari et al. [7] suggest a path index based approach, which stores edge sequences from source nodes to sink nodes of rdf graphs. as they have totally eliminated graph traversing at real time, efficiency of this approach is really high. however they require more storage space than other approaches. among keyword searching approaches, only three methods have been proposed for transforming keyword queries to sparql. searchweb [8] and hermes [15] have proposed a method for converting keyword queries to sparql by identifying suitable sub-graph and converting it to conjunctive queries. they have only proposed that approach either to a single data source or a set of linked data sources. they have not considered the federation scenario. unger et al. [28] have suggested a linguistic analysis based approach for transforming natural language queries to sparql. they have ignored the capabilities of semantic web in their solution. iii. methodology we propose an approach for transforming keyword queries to sparql on rdf data source federations within a set of defined limitations. the main objective of our research is to examine the feasibility of proposed approach for keyword to sparql transformation on federations as no existing approach has addressed this issue. we do not focus on examining the generalizability and scalability of this approach at this initial stage. we use academic publication data for explaining and evaluating this approach at the initial phase even though this proposed approach can be used domain independently. author name, published year and publication title are the initial set of keyword fields we are using for generating keyword queries. these fields were selected as those are the fields which are highly queried regarding academic publications even in digital libraries such as acm and dblp. we have defined a specified format of queries to this approach. all conditions of the user query should be represented according to the format :. comma should be used if multiple conditions are presented. field which needs results should be indicated using a question mark (?). for an example, accepted keyword query for “what are the publications of james published in 1995?” is “publication:?,author:james,year:1995”. rdf data are stored in graph structures. therefore cycles can occur. 
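before continuing with the cycle discussion, the keyword query format defined above (and validated by the query validator described in the next section) can be checked and split with a simple pattern. the java sketch below is illustrative only; the regular expression and class names are assumptions, not the actual grammar used by the implementation.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// illustrative parser for queries such as "publication:?,author:james,year:1995"
public class KeywordQueryParser {

    // each comma-separated condition is "field:value"; "?" marks the requested field
    private static final Pattern CONDITION = Pattern.compile("^[a-z]+:([^,:]+|\\?)$");

    public static Map<String, String> parse(String query) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String part : query.split(",")) {
            String trimmed = part.trim();
            if (!CONDITION.matcher(trimmed).matches()) {
                throw new IllegalArgumentException("invalid condition: " + trimmed);
            }
            String[] kv = trimmed.split(":", 2);
            pairs.put(kv[0], kv[1]);   // e.g. publication -> ?, author -> james
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("publication:?,author:james,year:1995"));
    }
}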
inverse relationships can be identified as a main reason for causing cycles in rdf graphs. this research only aims at resolving cycles caused by inverse relationships on rdf graphs. heterogeneity resolving is a main focus in federations. vocabulary level heterogeneity and instance level heterogeneity are the main levels of heterogeneity in rdf federations. approaches for dealing with those heterogeneous situations are needed to be included in the architecture. resolving vocabulary level heterogeneity in a federation is essential for retrieving complete results, because different vocabularies usually use different terminologies for similar concepts. identifying the similarity among vocabularies paves the way for extracting relevant results from heterogeneous data sources. when consider instance level heterogeneity, it contains a separate level of complexity. different data sets may store same value in different formats. for an example, an author name "a.bernstein" may have stored in one data set as "a.bernstein", while another data set as "arnold berstein". same name can reside in several attribute fields as well. for an example, name "levenshtein" can b e a name of an author as well as a part of an article titled such as "levenshtein distance". heterogeneity caused by same entity identified by different names are not going to be resolved in this approach. because the original format which the data are stored at each data store is required for generating sparql queries and retrieve answers. the ambiguity of same literal reside under different fields (author, title etc.) are resolved by introducing specific keyword fields to the user. architecture of our proposed approach is depicted in fig. 1. it contains three main components namely query validator which validates the user inserted query, keyword to attribute mapper which identifies the elements in user query, their relationships and how they can be related to existing data elements in the federation. final component is query converter where the sparql queries are generated based on the keyword queries entered by user. each of these components is discussed in upcoming sections. fig. 1 architecture of the proposed approach iv. query validator this component is used for validating the format of input keyword query. keyword queries adhere to a defined format are only focused in this methodology. keyword queries of proposed format are only expected by next steps in this approach as well. therefore, validating the input query format is essential. a simple regular expression based lexical path index based keywords to sparql query transformation for semantic data federations 4 international journal on advances in ict for emerging regions june 2016 analyzer and a parser is used for this purpose. once the user query is inserted, it is directed to the lexical analyzer for tokenizing and removing unnecessary white spaces. if any error occurred during this stage, the query will be rejected and an error is displayed. if the query was accepted by the lexical analyzer, a token steam is sent to the lexical parser. a regular expression which defines the accepted format of user query is included in the lexical parser. if the input query matches the grammar rules, it is a valid query. invalid queries are notified as errors. valid queries are stored as key-value pairs and sent to the next component. v. 
keyword to attribute mapper once the query validator sent a set of key value pairs based on the input user query, keywords to attribute mapper have several tasks to complete. they are as follows: identifying the fields which user query consists of, retrieving matching data source elements for user keywords from the federation, retrieving vocabulary dependent terms for user query’s variable fields from heterogeneous vocabularies in the federation, clustering identified matching element sequences for each data source based on the vocabulary they belong to and finally ranking those clusters based on their capability to answer the user query. there are several sub tasks aligned with the above list of tasks. in order to identify matching elements for user keywords, this research proposes maintaining pre-processed data as practiced by several other existing solutions [1], [6], [8], [10], [11]. with the pre-processed data, searching can be done faster. two types of pre-processed data are proposed for this phase namely attribute mapper and attribute index. a. attribute mapper for heterogeneity resolution in the federation vocabulary level heterogeneity is a common characteristic of rdf data source federations. even in the domain of academic publications, fields similar in meaning are represented in different labels. for an example, acm rdf data source identifies publication title as "title" and swe dblp identifies publication title as "label". however, in order to identify that all those different labels means the same entity, a mapping is required among those vocabularies. ontology alignment is a highly researched field which can also be utilized for resolving vocabulary level heterogeneity only. on the other hand, wordnet4 is a lexical database created for english. it has categorized different english words based on their similarities. also, it keeps details about the origin of words. for example, if we consider the word, "author", we can retrieve that the parent class of "author" is "person". also if we input "author" and "editor", we can retrieve the common word "person" which can be used to identify both of them. hyponym and hypernym relations indicated in wordnet serve this purpose. these are two options for resolving vocabulary level heterogeneity. when considering the terms in vocabularies, it was identified 4 http://wordnet.princeton.edu/ that some terms which are defined as concepts on some vocabularies are defined as properties on other vocabularies in the federation. therefore, both concept level and property level alignments are required in this approach. alignment api [12] was selected for this task as it has capability of property and concept alignment. its accuracy is better than other approaches [13], [14] and it also has the capability of integrating wordnet which provides added advantage. alignment api [12] is only capable of aligning two vocabularies at a time. therefore, this research uses a semiautomated approach to resolve vocabulary level heterogeneity among all the vocabularies in the federation and construct the attribute mapper. first, the required concepts for the specified scope are manually identified from a single vocabulary. for an example, if dblp is considered, "label" is the predicate used to identify publication title, "author" is the predicate used for indicating author list of a particular publication etc. then all other vocabularies in the federation are aligned with dblp source using alignment api. 
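for instance, the end product of this alignment step, the attribute mapper, can be pictured as a lookup from a canonical field to the vocabulary dependent term of each source; the alignment procedure itself continues below. the sketch is illustrative only: the class and map layout are assumptions, the "label"/"has-title"/"has-author" terms follow the examples given in the text, and the acm "year" term is assumed.

import java.util.Map;

// illustrative sketch of the attribute mapper built from the alignment results:
// one canonical field maps to the term used by each vocabulary in the federation.
public class AttributeMapperSketch {

    // canonical field -> (data source id -> vocabulary dependent term)
    private final Map<String, Map<String, String>> mappings = Map.of(
            "publication", Map.of("dblp", "label", "acm", "has-title"),
            "author",      Map.of("dblp", "author", "acm", "has-author"),
            "year",        Map.of("dblp", "year", "acm", "year")   // acm term assumed
    );

    public String termFor(String field, String source) {
        return mappings.getOrDefault(field, Map.of()).get(source);
    }

    public static void main(String[] args) {
        AttributeMapperSketch mapper = new AttributeMapperSketch();
        System.out.println(mapper.termFor("publication", "acm")); // has-title
    }
}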
for an example align vocabulary of acm, vocabulary of citeseer with dblp each at a time. once all the output alignment details are received, only the alignment of all vocabularies gets started. all the entities (classes, attributes etc.) aligned with previously identified concepts of dblp are clustered together along with their data source details. this helps to extract different terms used by heterogeneous sources to identify same entity in the federation. b. attribute index for mapping keywords with data elements in the federation matching data elements for user keywords must be identified as the first step of keyword to sparql conversion. commonly used keyword index approach [5], [15] is decided to use for this phase with several modifications. definition 1: a keyword index is a keyword to element map which returns a set of matching elements to a keyword. rdf is a graph structured data store. data vocabulary elements are represented by vertices and edges of a graph. definition 2: a rdf graph g is a tuple (v, l, e) where • v is a finite set of vertices as the union ve u vv with entity vertices ve and value vertices vv • l is a finite set of edge labels as the union lr u la with relation labels lr and attribute labels la • e is a finite set of edges of the form e(v1,v2) with v1,v2 ϵ v and e ϵ e. following types of edges can be defined: o e ϵ la (attribute edge) if and only if v1ϵ ve and v2 ϵvv o e ϵ lr (relation edge) if and only if v1,v2ϵ ve o type is a special relation which indicates the membership of an entity to a particular class most of the available approaches have indexed both v and l in their keyword index, arguing that keywords can occur in v and l both. but cappellari et al. [7] mentions that keywords can mostly reside on sinks. sink is a node in rdf graph which does not have any outgoing edges from it. they have introduced a term called source to identify vertices which have no incoming edges. so they have only indexed sink vertices of a rdf graph. we also have decided to follow this approach and generate the index only for sink vertices. we are adopting the data structure used by tran et al. [8] to 5 thilini cooray, gihan wikramanayake june 2016 international journal on advances in ict for emerging regions store additional details about indexed elements. additional details are the set of adjacent edges directed from one-edge distant vertices to this element, set of primary edges which are the adjacent edges to a source which this element belongs to, type of the source ve to which this sink vv belongs to and data source identifier. this additional information is required efficiently identifying suitable sub-graphs for query generation. apache lucene5 document index was utilized for building our attribute index. in most data sources, sink elements do not reside in one edge distance from its source vertex. for an example if we consider a publication, it can have many authors. in such cases, a collection approach is required to store multiple authors. so the adjacent edge to the element may not be the adjacent edge to the source. adjacent edge to the source is the one which defines the relationship not the adjacent edge to the sink. situations where these two are different have not been addressed by previous approaches. therefore, this research proposes a primary edge to be stored as well as the adjacent edge. primary edge is the adjacent edge of the source to which element connects. 
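an entry of the attribute index described above bundles the sink label with its adjacent edge, primary edge, source type and data source identifier. the following sketch shows how such an entry could be written to an apache lucene index; it is a sketch under assumptions (a recent lucene version, invented field names), not the paper's exact implementation, and the example values are taken from the dblp fragment discussed below.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;

// illustrative sketch: index one sink element together with the additional
// details kept in the data structure (adjacent edge, primary edge, source type, source id).
public class AttributeIndexSketch {
    public static void main(String[] args) throws Exception {
        try (IndexWriter writer = new IndexWriter(
                new ByteBuffersDirectory(),
                new IndexWriterConfig(new StandardAnalyzer()))) {
            Document entry = new Document();
            entry.add(new TextField("label", "james peter", Field.Store.YES)); // searchable sink label
            entry.add(new StringField("adjacentEdge", "ns#1", Field.Store.YES));
            entry.add(new StringField("primaryEdge", "author", Field.Store.YES));
            entry.add(new StringField("sourceType", "book chapter", Field.Store.YES));
            entry.add(new StringField("dataSource", "dblp", Field.Store.YES));
            writer.addDocument(entry);
        }
    }
}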
there can be no other edges between the primary edge and the adjacent edge on the path, the adjacent and primary edges can be the same edge, or several edges can occur between them; those intermediate edges are not stored in the data structure. the proposed approach uses the same keyword index to store all sink values from all the data sources in the federation. if the same element resides in two or more datasets, a separate data structure is created for each source and stored under the same key. only the sink element labels get indexed, while all the information residing in the data structure gets mapped to the indexed element. therefore this keyword index takes less space than the indexes of other methods, which index all the vertex data. fig. 2 shows a sample data graph fragment of dblp. "james peter" is a sink as it does not have any outgoing edges; paper-peter95 is its source, and a source does not have any incoming edges. the data structure returned from the attribute index for the term "james peter" is [james peter (node label), ns#1 (adjacent edge), author (primary edge), book chapter (type of source), dblp (data source id)].
fig. 2. data graph fragment of dblp
5 http://lucene.apache.org/
once the attribute mapper and attribute index are ready, real time processing begins. using the key-value pairs received from the query validator, this component identifies the variable fields of the keyword query. those variables are sent to the attribute mapper, which returns vocabulary dependent terms for them, while the condition values of the keyword query are sent to the attribute index to receive matches. the primary edges of those index results are matched against the attribute mapper to make sure the values are received under the required attribute. the results are then clustered based on the data source identifier, and a data source ranking approach is employed to decide which source is most capable of answering the user query. in a non-distributed environment it is advisable to generate a single query, process it and send its results to the user without making the user wait until queries for the entire federation are generated, because users need efficiency. if one data source has matching elements for all the user keywords while another source has matching elements for only some of them, the former data source gets higher priority in query conversion. those ranked data sources are then sent to the query converter component.
table i shows sample tuples of dblp and acm for the query "publication:?,author:james,year:1995". the variable field of the query is "publication"; it was sent to the attribute mapper and "label" and "has-title" were retrieved for dblp and acm respectively. considering the condition values, dblp has matching elements for both "james" and "1995" while acm has matching elements only for "james". therefore there is a higher possibility of retrieving accurate results from dblp than from acm for this query, so dblp gets higher priority.
table i. output tuples from keyword to attribute mapper
dblp (variable field: publication, vocabulary dependent value: label)
keyword: james peter | primary edge: author | adjacent edge: ns#1 | source type: book chapter
keyword: 1995 | primary edge: year | adjacent edge: year | source type: book chapter
acm (variable field: publication, vocabulary dependent value: has-title)
keyword: james mcclelland | primary edge: has-author | adjacent edge: fullname | source type: article reference
vi.
query converter this component is used to identify the most suitable subgraph which can connect identified keyword elements from the previous component and generate sparql queries based on identified sub-graphs. sparql queries are generated by identifying the format (template) of the sub-graphs which consist of the answers for the query. vertices of the target sub graph were found by keyword to attribute mapper. however, edges which connect those vertices were not provided. hence query converter first has to seek for a method for identifying relationships among sent elements by previous component. many approaches [6], [8], [9] try to find suitable sub-graphs by graph traversing at real time, which is highly time consuming. a path store [7] is utilized in this research for sub-graph identification because it has totally eliminated real time graph traversal. objective of path index is to keep records of how vertices (classes, property values and literals of rdf graphs) are connected to each other prior to actual query processing starts. paths are defined as the route from given element to another in rdf graph. efficiency of query processing improves by utilizing those pre-processed data. therefore, the cost required for real time graph traversal reduces heavily. those path data are stored in a relational database. sources and sinks are the main elements required for defining paths. full path is defined by a route from a source to a sink. template of a path is retrieved by replacing vertices of a path by wild cards. templates indicate the relationship among vertices. when considering the sparql query generation context, those templates are the components which we need to find for generating queries. when considering a sparql query, intermediate nodes of a path is always indicated by variables. therefore, we can easily generate queries by utilizing suitable templates. database schema presented in path index mainly consists of four main relations; node, template, path and pathnode. path index assumes that user queries targets only sinks [7]. therefore, only data about sinks are stored in node table. path table keeps data about all the unique paths from sources to sinks. each tuple consists of path is, template id, length of the path and id of the final node of the path. pathnode table keeps track of which node resides in which position of a path. index organized tables concept is used for indexing these tables. database schema presented in path index was adopted for this research due to its simplicity and usefulness for generating queries. cappellari et al. [7] has presented a breadth first search based approach for exploring the rdf graph when populating tables with data in the database. this approach stores intermediate paths as well as full paths in path table. when utilizing path index for sparql query generation, it was decided that storing full-paths are only required. since the main intention of using this index is to identify relationships among keyword elements, sources can be considered as most promising connecting elements which can reach many other elements quickly once matching elements are found from sinks. therefore sources can behave as local connecting points when we are trying to find the best sub-graph to connect keyword elements. a. full-path identification cappellari et al. [7] have presented a breadth-first search based algorithm for capturing full-paths in a given rdf dataset. 
however that algorithm stores full paths as well as partial paths which requires huge unnecessary space. therefore a depth-first search based algorithm was proposes in this methodology which can identify full-paths accurately avoiding unnecessary space wastage caused by the original algorithm. proposed algorithm explores the rdf graph in depth first manner. sources available for each dataset are identified and depth first traversal starts from each source. it searches the entire rdf graph until it meets all the sinks which can be reached by the current source. algorithm locally creates an n-ary tree considering current source as the root. all the triples whose subject is root are considered as the branches of the tree. if the object of each triple has become a subject in another triple, those branches grow accordingly. sink nodes never become a subject of a triple so occur as leaves of the tree. proposed algorithm for generating path tree for each source is shown below. algorithm 1 dfs based graph exploration for full-path capturing input : rdf triple dataset data, path tree tree, parent node p, matching triple set triples 1. for each triple t in triples do 2. add t to node p on tree 3. var newtriples = get triples from 4. data (subject = t.object) 5. if (newtriples.count > 0) then 6. dfsbasedgraphtraversal (data, tree,t.object,newtriples) 7. end if 8. end for once a source-tree is generated, a method is required to identify all the full paths in the tree because those are needed to be stored in path index. since tree nodes have only a single parent, full-paths can be captured by recursively traversing from each leaf to root. proposed algorithm is shown below. algorithm 2 complete algorithm for path index building input : rdf triple dataset data, path tree tree, source list sources 1. for each source s in sources do 2. var triples = get triples from data (subject = s) 3. var tree = generate tree (root = s) 4. algorithm 1 (data,tree,s,triples) 5. var pathlist 6. var leaves = tree.leaves 7. for each leaf l in leaves do 8. var leafnode = leaf 9. var path 10. while (leafnode tree.root) do 11. add leafnode to path 12. leafnode = leafnode.parent 13. end while 14. add path to pathlist 15. end for 16. store pathlist in pathindex 17. end for 7 thilini cooray, gihan wikramanayake june 2016 international journal on advances in ict for emerging regions b. resolving cycles caused by inverse properties inverse relationships are a main reason for occurring cycles in rdf graphs. suppose there is a triple is a rdf graph whose subject is s, predicate is p, object is o. if there exists another triple in the same rdf graph whose subject is o, predicate is p’ and object is s, p and p’ has an inverse relationship. both triples connected by an inverse relationship contains same amount of information. therefore removing one does not cause any information loss in the rdf graph. a popularity based approach is used for filtering the most appropriate triple. popularity is measured by the number of unique predicates each subject has. higher the number of unique predicates, higher the amount of information it has access to. proposed algorithm for inverse property based cycle resolution is shown in algorithm 3. algorithm 3 resolution for inverse property based cycles in graphs input : rdf triple dataset data output : cycle resolved dataset data initialisation : inverse statement list inverselist 1. for each triple t in data do 2. if (inverselist.notcontain(t)) then 3. 
var inversetriples = get triples (subject=t.object,object=t.subject) 4. for each inverse i in inversetriples do 5. var subjectpopularity = get unique predicate count (i.subject) 6. var objectpopularity = get unique predicate count (i.object) 7. if (subjectpopularity > objectpopularity) then 8. add t to inverselist 9. remove t from data 10. else 11. add i to inverselist 12. remove i from data 13. end if 14. end for 15. end if 16. end for c. path index based sub-graph identification and sparql query generation once path index generated, sparql query generation should be done in real time. once mappings are retrieved, path index is queried to retrieve paths whose final node is the value of the mapping received from attribute index. there can be several sub-graphs in a data source which can connect those keyword elements. but they all use the same schema (vocabulary) consider a situation in dblp where 3 subgraphs exist which connects author james, year 1995 with 3 publications. those publications become answers as publication was the variable in user query. if all the vertices of each sub-graph are replaced with wild cards, the result graph is totally similar. that graph is the sub-graph which is needed to traverse to get answers. that is the sub-graph which we should convert to sparql syntax for generating the query. this sub-graph with wildcards is known as template graph. . for an example, template of full-path “paperperter95-year-1995” is “#-year-#”. when converting a sub-graph to sparql syntax, we have to replace the vertices by variables. several different templates can be received as matching paths when same keywords repeat under different concepts in the vocabulary. for an example, consider person "james". he can be a program committee member of one conference in 1995 while being an author of publications. since templates were extracted only considering full-paths whose sink nodes are keywords without focusing on their relationship with variable field, templates matching for both scenarios will occur. if results were generated for both these template graphs, overall relevancy of results will become low. therefore a filtering process is required. additional information stored in index documents comes to use at this situation. filtering process considers templates which matches with primary and adjacent edges keyword elements and ignore others. if there are several results on this approach, shortest template will be selected as the suitable sub-graph as lesser the length of the path, faster the query processor can reach it and output results. tran et al. [8] has also mentioned path length as a common matrix used by many graph traversal approaches to rank selected sub-graphs. lesser the path length, there is a high probability that it would reduce the overall size of the sub-graph it resides in. next this proposed methodology looks for extracting suitable templates for variable fields in the user query. shortest template which contains the vocabulary specific properties of the variable field is selected as the template. a sub-graph which can connect all the keywords is required for generating a sparql query. now paths for each field have been retrieved, finding connecting elements is required to generate a sub-graph using these paths. details stored in path index are used for finding connecting elements. sources of the data sets were identified while building path index. sources are operating as centre nodes to connect all the property values of the source together. 
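as the sample query in the next paragraphs shows, turning the chosen templates into a sparql string amounts to joining them at a shared source variable and replacing wild cards with variables or the keyword literals. the sketch below is a simplified illustration of that assembly step, not the paper's actual generator; the edge labels and the ns#1 property are taken from the running example, everything else is assumed.

// simplified illustration: assemble a sparql query by joining templates such as
// "#-year-#" and "#-author-ns#1-#" at a shared source variable ?x.
public class TemplateToSparqlSketch {

    static String triple(String s, String p, String o) {
        return "  " + s + " " + p + " " + o + " .\n";
    }

    public static void main(String[] args) {
        StringBuilder where = new StringBuilder();
        where.append(triple("?x", "type", "book chapter"));     // type of the connecting source
        where.append(triple("?x", "label", "?z"));              // variable field (publication)
        where.append(triple("?x", "year", "1995"));             // template #-year-#
        where.append(triple("?x", "author", "?y"));             // template #-author-ns#1-#
        where.append(triple("?y", "ns#1", "\"james peter\""));

        String query = "select ?z where {\n" + where + "}";
        System.out.println(query);
    }
}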
sources mostly represent the main focus of the vocabulary. for example, in the dblp and acm vocabularies publication is the main focus, and all the attributes related to a publication are defined as its properties. therefore sources can be identified as potential elements for connecting the paths to generate a possible sub-graph, and sparql queries are generated based on this argument. the following is a sample query generated for the example keyword query "publication:?,author:james,year:1995".
select ?z where { ?x type book chapter . ?x label ?z . ?x year 1995 . ?x author ?y . ?y ns#1 "james peter" . }
if there are several matching elements for a single keyword (for example, many different people with "james" as part of their name) or many different sub-graphs match the query, the filter option of sparql is used in the generated queries. the following example covers the scenario where there are multiple authors named "james":
select ?z where { ?x type book chapter . ?x label ?z . ?x year 1995 . ?x author ?y . ?y ns#1 ?a . filter(regex(str(?a), "james")) . }
vii. results
the proposed approach was implemented using java with the support of the apache jena semantic web library and the oracle 11g dbms. once a keyword query is inserted, it identifies the data sources in the federation which can answer the keyword query and transforms the input query into vocabulary dependent sparql queries matching the data sources in the federation. the evaluation setup is as follows. a federation of three publication data sets was created: a dblp6 rdf dataset with 37446 triples, an acm7 rdf dataset with 63980 triples and the semantic web dog food8 (sdf) dataset with 37105 triples were used as test data. portions of the dblp and acm data sets were used because the sdf data set is not as large as they are, and only publication details between 1986 and 2005 were assessed due to the availability of data in all three data sets. these three sources were selected for the following reasons. the publishers of these data sets have used three different ontologies (the swetodblp, aktors and swrc ontologies respectively) for publishing the data, so schema level heterogeneity is clearly showcased among the selected data sources. rdf data are stored in graph structures, so problems in graph data handling, such as cycles, also arise when dealing with rdf data. here two data sets with cycles were selected: the acm data set has cycles caused by triples whose subject also appears as their own object, and the swrc ontology, being an integration of several ontologies, has inverse relations which introduce cycles in the sdf data set. these data sets are used to test the proposed approach's ability to deal with common cyclic scenarios of rdf data. the dblp data set does not contain cycles; however, it contains blank nodes, which are anonymous nodes in an rdf graph that can be used to group sub-properties of an instance. data sets exhibiting different characteristics, covering most of the common characteristics of rdf graphs, were therefore used in the experiment in order to show the generalizability of this approach for rdf federations. experiments were conducted on a machine with an amd v120 processor of 2.2 ghz and 4 gb ram.
6 http://lsdis.cs.uga.edu/semdis/swetodblp/
7 http://datahub.io/dataset/rkb-explorer-acm
8 http://data.semanticweb.org/
a test keyword query set of 10 queries was created by considering all the possible combinations of the three keyword fields (author, year, publication) selected for academic publication data. table ii shows the test query set.
table ii. test query set
q1: publication : consistency , author : ?
q2: publication : distributed , year : ?
q3: author : sylvia , year : ?
q4: author : andrew , publication : ?
q5: year : 1986 , author : ?
q6: year : 1988 , publication : ?
q7: publication : multimedia , author : daniel , year : ?
q8: publication : concurrent , year : 1990 , author : ?
q9: author : david , year : 2004 , publication : ?
q10: publication : protocol , author : ? , year : ?
a. quality evaluation of the federation
the first evaluation criterion was to evaluate whether the proposed approach actually carries out its intended task of correctly translating user keyword queries to sparql. measurements were taken by considering the relevancy of the results retrieved by executing the generated queries. since the main intention of this approach is to give general users more relevant and accurate results using the privileges of the semantic web, the quality of the generated sparql queries was evaluated by measuring the precision, recall and f-measure of the results received for the above test query set. f-measure is a balanced measurement used to capture the balance between the precision and recall of each result set; it was used because some results can be high in precision but low in recall and vice versa, and f-measure gives a balanced score in such situations. gold standard results were obtained by running sql queries on the path index of the data sources and by manual evaluation of the raw rdf data sets. fig. 3 shows the results graphically.
fig. 3. quality evaluation of the federation
overall precision has an average of 0.98 and overall recall has an average of 0.9. based on the precision results, the proposed approach is capable of generating queries which give accurate results. however, a loss of recall was detected compared to precision. this was caused by the decision made about finding the connecting elements of sub-graphs: the source node was selected as the connecting element and its type was decided by the type with the maximum matches from the attribute index, and sometimes this type did not match the actual source of the paths retrieved from the path index, which caused a loss of results. another point is that a data source was judged capable of producing results for a user query whenever it had matching elements for all the keywords in the query; however, there are situations in which the keyword elements reside in the rdf graph but no sub-graph actually connects them all, and in such situations generating a query is a waste of effort. the recall value of each data source was then compared with the recall of the federation. this evaluation was done to identify whether there is a significant impact on the results from processing the query over a federation rather than over an individual data source. one of the objectives of transforming a keyword query to sparql on a federation was to give a more complete result set to the user.
if a single source is capable of giving the same result set, there is no requirement of a federation. recall is the measurement which indicates the contribution of each source over gold standard result set. fig. 4 shows that federation gives more complete results over individual sources. therefore it can be shown that federation approach is capable of producing more complete results than any of the individual sources. fig. 4. recall comparison of the federation b. performance comparison for the proposed path index based approach performance evaluation was carried out after receiving satisfying results for the quality of the proposed approach. prior to this research, all the presented approaches for this task [8], [28] have either used online graph traversal approach or natural language processing support for query translation. path index [7] was first suggested for keyword searching. its functionality was exploited for generating sparql queries from keyword queries. an index based approach was adopted related to this method used by tran et al. [8] both these approaches and searchweb have utilized a keyword index for mapping keywords with data source elements. however proposed method used a path index based approach for identifying substructures which can connect keyword elements while they followed a real time graph traversal mechanism. the graph exploration and top-k query calculation approach presented in [8], performs better compared to other available methods for finding sub-graphs such as backward search [9], bidirectional search [6], breadth first search and depth first search. tran et al. [8] has shown in their evaluation that query generating time has achieved a comparable decrement by using their graph exploration mechanism. natural language based approach suggested in [28], is totally deviated from the proposed approach. the intention of using a path index based approach for query generation was to experiment whether it can achieve a performance gain in query generation time by pushing graph traversing to the pre-processing stage. therefore, query generation time of proposed approach against searchweb approach presented by tran et al. [8] was evaluated. time taken to identify the most suitable query substructure which can be used for query generation as our matrix of performance was measured. once it is identified, either proposed approach or their approach could have been used for translating the substructure to match with sparql query syntax. time taken by searchweb approach to generate top-1 query was only considered since proposed approach only output a single query per data source. query generation time only for acm data source was compared at this section as searchweb method has only suggested for query generating for a single data source. in the next section, the performance comparison when it comes to federation scenario will be demonstrated. fig. 5 shows the performance comparison. fig. 5. searchweb vs. proposed approach for a single source figure illustrates that proposed approach has a significant performance gain compared with searchweb suggested by tran et al. [8] for the tested query set. since search web has already been outperformed other related methods, we only evaluated against searchweb. main reason behind the high query time for search performance is its real time graph traversing. even though searchweb doesn’t do a full graph traversal in query generation time, it uses graph traversal. 
first, they create a graph summary by extracting the class vertices and entity vertices from the original rdf graph; this summary graph behaves as the schema. once the matching elements are retrieved using the keyword index, searchweb embeds those matching elements into the summary graph while exploiting the "adjacent edge" property in the retrieved index records. therefore, the more keyword matching elements there are, the bigger the summary graph becomes. around 50 keyword elements match "distributed" in q2 and around 30 match "andrew" in q4; once those 30 vertices are added to the summary graph, it grows accordingly. having generated the augmented summary graph by adding those matching elements, searchweb traverses all the possible paths, starting from the shortest, in order to retrieve elements which can connect all the keyword elements. once a connecting element is found, searchweb considers all the path combinations among keyword elements, even when they share the same adjacent edge, to get the shortest path. the bigger the graph, the higher the number of combinations, and this leads to high query generation time. q7, q8 and q9 have two conditional keywords in the query; therefore the augmented graph of searchweb becomes bigger and the number of possible paths among elements increases drastically, which explains the very high query generation times for these queries. in comparison, the proposed approach shows a large performance gain as it involves no real time graph traversal: no matter how many matching elements are output from the keyword index, if they have the same template, all of them are treated as a single element from the template's point of view.

c. performance comparison for the federation
this section compares query generation times for the entire federation. the proposed approach is capable of generating sparql queries for all the sources in the federation in one go, whereas searchweb can only generate a sparql query for one source at a time. therefore queries to all the sources in the federation were generated repeatedly and the total time was used for the comparison. a large performance gain was obtained by the proposed approach in this case as well. this shows that real time graph traversal for sparql query generation and keyword searching is highly inefficient when dealing with huge data sources. this is the first research which utilizes the path index [7] for sparql query generation, and satisfactory results were obtained on its performance. fig. 6 shows the performance comparison for the federation.

fig. 6. performance comparison for the federation

d. proposed federation approach vs. digital libraries
the main reason for the emergence of the semantic web concept over the web of data is the ability of the semantic web to extract the meaning and relationships of data elements and to exploit them for giving more meaningful and relevant results for user queries. having compared the performance of the proposed approach, this section focuses on evaluating this aspect by running the same set of user queries on both digital libraries and the proposed semantic web approach. three digital libraries were selected for this evaluation. dblp, acm and semantic web dog food data dumps were used for the sample federation in this research; therefore the dblp digital library (http://dblp.uni-trier.de/), the acm digital library (http://dl.acm.org/) and the semantic web online data store (http://data.semanticweb.org/) were selected as test digital libraries for this evaluation criterion.
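the difference in cost can be pictured with a small sketch; the index contents below are hypothetical, with hit counts of 50 and 30 simply mirroring those mentioned for q2 and q4. enumerating combinations of concrete matching elements grows multiplicatively, whereas collapsing the hits by template leaves one candidate per template and no runtime traversal.

```python
from itertools import product

# hypothetical keyword-index hits: concrete element -> template it instantiates
hits = {
    "distributed": {f"pub{i}": "publication-label" for i in range(50)},
    "andrew":      {f"auth{i}": "person-name" for i in range(30)},
}

# combinations of concrete elements that a summary-graph augmentation would
# have to consider grow multiplicatively with the number of hits ...
concrete_combinations = 1
for elements in hits.values():
    concrete_combinations *= len(elements)          # 50 * 30 = 1500

# ... whereas grouping hits by template leaves one candidate per template
templates_per_keyword = [set(mapping.values()) for mapping in hits.values()]
template_combinations = len(list(product(*templates_per_keyword)))   # 1 * 1 = 1

print(concrete_combinations, template_combinations)  # 1500 1
```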
google scholar was not used for this evaluation as it consists of publications from many sources, not only from the above three. among these three digital libraries, only dblp and acm have advanced search capabilities; the semantic web dog food website does not have an advanced search capability and was therefore excluded from the evaluation. when considering the ability to execute the queries, the proposed semantic web federated approach was capable of executing all the test queries; that is, it could handle any combination of target query fields (publication title, author name, published year). in contrast, the digital libraries were mostly capable of providing direct answers only when the question asks for publication titles as results. for example, consider q4 and q6: they ask for the paper titles authored by an author named "andrew" and the paper titles published in the year 1988. in q9 the user requests the titles of publications authored by "david" in 2004. dblp and acm were able to answer those three queries correctly, but the semantic web search was not advanced enough to directly provide answers matching those conditions. queries such as q1, q2, q3, q5, q7, q8 and q10 do not request publication titles. for example, q1 asks for all the author names who have published papers with titles containing "consistency", and q3 asks for the list of years in which the author "sylvia" has publications. the digital libraries were not capable of providing either an author name list or a year list as answers for these queries. for q1 they provided a list of publications containing the word "consistency", from which the author names could only be retrieved by manual extraction. similarly, for q3 a list of publications authored by "sylvia" was received and their years had to be extracted manually. years were easier to capture in dblp, while an author list could be more easily obtained with filtering in acm. in comparison, the proposed semantic web approach was capable of providing direct answers for all of those queries as well. fig. 7 shows a comparison of the f-measures of the dblp and acm digital libraries with the proposed federated approach.

fig. 7. quality comparison of digital libraries vs. proposed approach

this shows that our semantic web based federation approach is capable of giving more relevant results for user queries than existing keyword matching digital libraries. google scholar (https://scholar.google.com/) follows a non-semantic web approach for combining all those results; however, it also cannot answer queries like q1, q2 and q5 directly, and it too focuses only on filtering publications. if the proposed approach could be applied to all the publication sources, it could perform better in answering all publication related queries, no matter whether they concern an author, publication, year or venue. this clearly exhibits the usefulness of the semantic web and the level of accuracy which can be gained from executing sparql queries over keyword based queries. in queries like q1, q2 and q5 the user needs a list of authors or years; the semantic web approach was capable of giving that relevant information to the user, whereas the digital libraries were unable to output the relevant results and users have to filter manually in order to retrieve the relevant result set.
therefore, in terms of relevancy of results, the proposed semantic web approach stands at a high level compared to digital libraries, including google scholar.

viii. discussions
a path index based keywords to sparql query transformation mechanism which targets rdf data source federations is presented in this paper. a keyword index, together with an ontology alignment based vocabulary level heterogeneity resolution approach, was suggested to identify matching elements for user keywords. a path index based approach was used for identifying suitable sub-graphs for connecting keyword elements. real time graph traversal is one of the drawbacks of existing keyword query processing approaches, as it severely affects performance because rdf data sources contain thousands to billions of nodes. this approach completely eliminates real time graph traversal for query generation by building an index over the graph data in the pre-processing stage, and it showed a significant performance gain over existing query generation approaches. promising results were achieved in the quality evaluation of this approach, and it was shown that federations are capable of giving more complete results for a user query than querying a single data source. this research also emphasizes that semantic web related keyword query processing approaches can give more relevant results for user queries than traditional keyword matching. this research can be further extended by including capabilities for handling more relaxed query formats. an approach which can accurately decide the connecting elements for the paths extracted from the path index in order to generate sub-graphs could further enhance the accuracy of this approach. a mechanism which can decide whether a sub-graph actually exists for a given set of keywords before generating the sparql query needs to be merged with this approach in the future. this research can also be extended to generate sparql queries for linked data sources by exploiting the path index. map-reduce based mechanisms can be used to enable this approach to function in distributed environments, which will improve its scalability.

acknowledgment
we would like to thank sarasi lalithasena, phd candidate of the kno.e.sis research center, wright state university, usa, for her expert advice and encouragement throughout this research.

references
[1] hristidis l. g. v. and papakonstantinou y. (2003). efficient ir-style keyword search over relational databases. vldb, pp. 850–861.
[2] hwang v. h. h. and papakonstantinou y. (2006). objectrank: a system for authority based search on databases. sigmod conference, pp. 796–798.
[3] liu w. m. f., yu c. t., and chowdhury a. (2006). effective keyword search in relational databases. sigmod conference, pp. 563–574.
[4] kimelfeld b. and sagiv y. (2006). finding and approximating top-k answers in keyword proximity search. pods, pp. 173–182.
[5] wang h., he h., yang j., and yu p. s. (2007). blinks: ranked keyword searches on graphs. proceedings of the 2007 sigmod international conference on management of data, acm.
[6] kacholia v., pandit s., and karambelkar h. (2005). bidirectional expansion for keyword search on graph databases. vldb, pp. 505–516.
[7] cappellari p., de virgilio r., maccioni a., and roantree m. (2011). a path-oriented rdf index for keyword search query processing. dexa, pp. 366–380.
[8] tran s. r. t., wang h., and cimiano p. (2009).
top-k exploration of query candidates for efficient keyword search on graph-shaped (rdf) data. icde, ieee.
[9] bhalotia c. n. s. c. g., hulgeri a., and sudarshan s. (2002). keyword searching and browsing in databases using banks. icde, pp. 431–440.
[10] tran t. and zhang l. (2013). keyword query routing. ieee transactions on knowledge and data engineering, 1(2).
[11] freitas a., curry e., oliveira j. g., and o’riain s. (2011). a distributional structured semantic space for querying rdf graph data. international journal of semantic computing, 05: 433–462.
[12] david f. s. j., euzenat j., and dos santos c. (2011). the alignment api 4.0. semantic web journal, 2(1): 3–10.
[13] jain a. k. p., hitzler p., and yeh p. (2010). ontology alignment for linked open data. the semantic web – iswc 2010, pp. 402–417, springer berlin heidelberg.
[14] li y. j., tang j., and luo q. (2009). rimom: a dynamic multistrategy ontology alignment framework. ieee transactions on knowledge and data engineering, 21: 1218–1232.
[15] tran t., wang h., and haase p. (2009). hermes: data web search on a pay-as-you-go integration infrastructure. web semantics: science, services and agents on the world wide web, 7(3): 189–203.
[16] sheth a. p. and larson j. a. (1990). federated database systems for managing distributed, heterogeneous, and autonomous databases. acm computing surveys (csur), 22(3): 183–236.
[17] makris k., gioldasis n., and bikakis n. (2010). ontology mapping and sparql rewriting for querying federated rdf data sources (short paper). on the move to meaningful internet systems, otm 2010, pp. 1108–1117, springer berlin heidelberg.
[18] correndo g., salvadores m., millard i., glaser h., and shadbolt n. (2010). sparql query rewriting for implementing data integration over linked data. proceedings of the 2010 edbt/icdt workshops, p. 4, acm.
[19] makris k., bikakis n., gioldasis n., and christodoulakis s. (2012). sparql-rw: transparent query access over mapped rdf data sources. proceedings of the 15th international conference on extending database technology, pp. 610–613, acm.
[20] gunaratna s. k. and sheth a. (2014). alignment and dataset identification of linked data in semantic web. wiley interdisciplinary reviews: data mining and knowledge discovery, 4: 139–151.
[21] david f. j. and briand h. (2006). matching directories and owl ontologies with aroma. proceedings of the 15th acm international conference on information and knowledge management, pp. 830–831, acm.
[22] wikipedia, “breadth-first search,” 2002. [online], http://en.wikipedia.org/wiki/breadth-first_search (accessed: 21 june 2014).
[23] wikipedia, “depth-first search,” 2002. [online], http://en.wikipedia.org/wiki/depth-first_search (accessed: 21 june 2014).
[24] florescu d. k. d. and manolescu i. (2000). integrating keyword search into xml query processing. computer networks, 33(1-6): 119–135.
[25] guo c. b. l., shao f., and shanmugasundaram j. (2003). xrank: ranked keyword search over xml documents. sigmod conference, pp. 16–27.
[26] yu b., li g., sollins k. r., and tung a. k. h. (2007). effective keyword-based selection of relational databases. proceedings of the 2007 acm sigmod international conference on management of data, pp. 139–150, acm.
[27] vu q. h., ooi b. c., papadias d., and tung a. k. h. (2008). a graph method for keyword based selection of the top-k databases.
proceedings of the 2008 acm sigmod international conference on management of data, pp. 915–926, acm.
[28] unger c., bühmann l., lehmann j., ngonga ngomo a. c., gerber d., and cimiano p. (2012). template-based question answering over rdf data. proceedings of the 21st international conference on world wide web, pp. 639–648, acm.

towards a structured approach for evaluating the ict contribution to development
florence nameere kivunike, love ekenberg, mats danielson, f. f. tusubira

abstract—undoubtedly information and communication technologies (ict) contribute to development; however there is a need to know how and the extent to which development occurs. moreover the evaluation of the ict contribution to development has been challenged from theoretical, ethical and methodological angles. this paper addresses some of these challenges by proposing a model that enables systematic evaluation of the ict contribution to development. the proposed model is conceptually motivated by amartya sen’s capability approach, which defines development as freedom. development is a process that involves the provision of opportunities (capabilities) from an ict resource, as well as actually exploiting the opportunities to realize development benefits. the conversion of resources to opportunities and of opportunities to development benefits is facilitated or inhibited by various contextual factors. development from the capability perspective is both people-centered and multidimensional. this requires consideration of both instrumental effectiveness and intrinsic importance. consequently five evaluation dimensions concerning social and economic development are proposed, namely research and education opportunities, healthcare, economic facilities, political freedoms and psychological wellbeing. ict4d evaluation indicators are suggested for each dimension and a multi-criteria decision analysis (mcda) structured evaluation process is proposed to guide the evaluation. the application of a structured evaluation approach is illustrated through the example of an online learning environment at a university in a developing country. future research is underway to further apply and validate the model in practice.

index terms—capability approach, development evaluation, ict contribution to development, ict indicators, ict4d, ict4d impact evaluation, multi-criteria decision analysis.

i. introduction
large information and communication technology (ict) investments, especially in developing countries, are motivated by the notion that ict actually contributes to development [1]. despite the high rates of ict diffusion and uptake over the years, the equally high failure rates continue to raise skepticism as to whether ict is actually contributing to development. this explains the increase in ict evaluation studies aimed at establishing whether and how ict contributes to development [c.f. 2, 3-7]. such evaluation facilitates the identification of benefits achieved from investments; advises future investments; enables prioritization; forecasts potential impacts; as well as facilitating accountability exercises [1:628]. the evaluations further provide an understanding of the complexities involved in the translation of an ict resource into a development benefit. over the years ict4d evaluation has moved from focusing on assessing diffusion in terms of availability, access and use of ict, to measuring benefits and sometimes challenges of ict [1]. while availability, access and use evaluation approaches are mostly performed at the macro level of analysis, the latter is mostly micro-based.
it focuses on individual or community evaluation and is achieved predominantly through qualitative in-depth descriptions. despite the increase of studies in ict4d impact evaluation, the contribution ict makes to development is still elusive [8]. this is evidenced by the existence of calls for such studies [i.e. 1, 9-11], to which the current study aims to respond. one of the concerns is how an objective and structured approach can facilitate the evaluation of the ict contribution to development. it is noted that while the qualitative evaluation approaches to ict4d offer rich in-depth explanations of how development has occurred, they are normally difficult to report, and may require longer study periods which are consequently expensive [8]. the difficulty in reporting is especially true for evaluations at the macro level, or for exercises involving several projects for which in-depth qualitative assessments may not be viable. the evaluation is also challenged by other methodological and ethical factors. principal among these is how a developmental impact can be attributed to a single intervention, since impact occurs after some time and there could be other contributing factors. although this could be addressed through systematic evaluation exercises, such approaches may not be possible in certain instances, e.g. due to costs or when the need for impact assessment is realized later in the project lifecycle. gomez and pather [8] and heeks [1] also cite the lack of well-formulated theoretical foundations to devise appropriate impact measures that guide data collection as well as analysis. in addition, since studies into the contribution of ict4d evaluation exercises are in their infancy, there are challenges regarding the availability of data. this calls for structured approaches to facilitate an objective impact evaluation process of the ict contribution to development [8, 12]. it is envisaged that the structured approach streamlines the data collection and analysis process to ensure that the method is neither too simplistic, overlooking essential details, nor too elaborate, inhibiting proper reporting. to contribute to a growing field of ict4d evaluation, this paper addresses some of these challenges by specifically proposing a model that enables systematic evaluation of the ict contribution to development based on an indicator-based approach. while the use of indicators in the evaluation of the ict contribution to development is still in its infancy, this has been demonstrated in earlier research which laid a foundation for this extended study [13]. gigler [14] also demonstrates how an indicator based approach can be used to evaluate people’s perception of the impact of the internet in the different well-being dimensions. he further demonstrates the importance of target beneficiaries in such an evaluation exercise, and that the contextual factors determine whether icts will or will not contribute to wellbeing. it is also envisaged that a structured approach would facilitate the evaluation of the ict contribution at a higher (macro/meso) level, such as national development goals or strategies at policy level, e.g. healthcare delivery, education, universal access etc.
according to walsham [15], evaluations at the strategic policy level are indicative of a move towards inclusive development rather than selective development for only a selected few. a literature review of the current state of evaluating the ict contribution follows. a discussion of the underlying conceptual foundations applied in this study, as well as the composition and interactions of the proposed model, are then presented in the next section. this is followed by an outline of a multi-criteria decision approach that illustrates how the evaluation can be performed. the paper concludes with a discussion of limitations and recommendations for future work.

ii. literature review – evaluating the ict contribution to development
there is substantial research into the evaluation of the ict contribution to development, specifically from a development perspective. the main development concepts guiding ict4d evaluation are amartya sen’s [16] capability approach and the sustainable livelihoods approach (sla) [17]. for example, hatakka and lagsten [18] apply the capability approach to assess how students use internet resources to facilitate their learning. respondents were masters’ students of an international e-government course that mostly consisted of students from developing countries. the study reveals that internet resources facilitate student learning at the educational, personal and professional levels. however, the students’ choice to exploit or benefit from the resources is restricted by a number of factors, including personal interests and motivation, the incentives to use the resources, the applied pedagogical techniques, etc. the main aim of the study was to test the capability approach as a development evaluation approach. ibrahim-dasuki et al. [3] also apply the freedoms concept of the capability approach as an evaluative space for the developmental impact the electricity pre-paid billing system has had in nigeria. they generally established that the project had not fully realised its development impact: while the pre-paid billing system had enabled the freedom of transparency through the elimination of estimated billing, officials still requested bribes to have the system installed at the consumers’ premises (pp. 43). furthermore, de’ [19] and madon [7] demonstrate how various components of the capability approach may be applied for the development appraisal of e-government projects in india. de’ [19] applies the development concept to evaluate bhoomi, a land records digitization project. while it was a successfully implemented project that met all the implementation goals, the development benefits towards meeting citizen needs were not so clear. parkinson and ramirez [20] also applied the sla for the assessment of the impact of a telecentre in colombia on the livelihoods of people within the community. they argue that the sla facilitates a broader scope of analysis and provides better analytical rigour. it enabled them to establish the kinds of risks and vulnerabilities that people faced, some of the key factors that determined their livelihoods, and how their use of the internet or other telecentre services links to their livelihood strategies. the majority of the ict4d evaluation studies cited above apply the capability approach to perform in-depth descriptive analysis.
as a point of departure and contribution to this body of knowledge, the model suggested in this study illustrates the use of a structured evaluation approach based on indicators in the evaluation of the ict contribution to development.

iii. conceptual foundations
drawing on theoretical and conceptual foundations is essential to realizing sound evaluation approaches to support ict evaluations [1, 8]. this facilitates the understanding of how technology interacts with society to achieve development. ict4d studies fall within an emergent multidisciplinary field now referred to as “development informatics” that seeks to integrate development theories within information systems, communication studies as well as computer science [21]. this fairly new field resulted from the realization that there is more to ict4d than just diffusion, adoption and use. the need to establish the real ict benefits in terms of what they are used for within various contexts called for new approaches. consequently, there is a need for sound theoretical premises as a basis for research on how ict becomes integrated into and affects people’s everyday lives, businesses, as well as national and international development goals. starting with the ict4d value chain as a guide, the focus of evaluation in terms of the ict4d implementation lifecycle in this study is identified. the capability approach is then applied to facilitate the definition and understanding of what development is and how it is realized.

a. the ict4d value chain
the ict4d value chain model [22, 23] facilitates an understanding of ict4d evaluation. it is based on the standard input-process-output model linking resources and processes to systematically analyze the stages an ict initiative traverses over time (see fig. 1). according to the value chain, the input, i.e. an ict4d intervention in combination with fulfilled prerequisites (i.e. policies and implementation skills), will result in a successful deliverable, e.g. a telecentre, e-library platform etc. these deliverables, once exploited by the target beneficiaries, result in outputs, which lead to outcomes and ultimately impact. the realization of outcomes from outputs, as well as impact from outcomes, is affected by various contextual factors such as skills, institutional barriers and cultural or personal beliefs, among others. over the years interest in the domains along the value chain has shifted from readiness, availability and uptake towards development impact [1, 24]. this shift arises from the need for ict4d initiatives to demonstrate that they actually contribute to social and economic development. however, the challenge in such evaluations is that as one moves from outputs to impact, evaluation becomes more complex since the focus shifts from the technology to the development goals. as a result, outcomes and impact cannot be attributed to a specific initiative since there are other factors or even initiatives that could have affected the resultant outcome. to address this challenge, it is argued in this paper that rather than aiming to prove causality, emphasis should be placed on the contribution an initiative has made to social and economic development [25]. this refers to the change in terms of social and economic development resulting from the presence of that intervention, within the boundaries of the contextual factors. furthermore, focusing on the contribution is appropriate in situations where baseline studies were not performed to facilitate a longitudinal evaluation of the initiatives.

fig. 1. ict4d value chain, adopted from [26]
moreover, the impact concepts, i.e. outputs, outcomes and impact as per the value chain, have been variously defined based on different approaches applied to the design and evaluation of projects or programmes in international development. some of the main and interrelated approaches are the logical framework (lfa) [27], theory of change (including contribution analysis) [28], and results based management (rbm) [29] (see table i). to some extent all these are theory-based approaches that rely on theory of change techniques to facilitate the assessment of whether and how an initiative causes or contributes to an impact. they may also serve as underlying guides to the value (or result) chain above to guide program or intervention designs or evaluations [30]. this study adopts the output, outcome and impact definitions suggested by the ict4d value chain since it is a pivotal framework in this study. in addition, the ict4d value chain definitions are similar to those suggested by the rbm approach, which is a widely accepted development evaluation approach. generally, outputs are the immediate results of the program or initiative. these can either be goods or services such as workshops held, information produced or changes in skills etc. in this study, ict4d outputs are the behavioral changes associated with technology use; heeks cites these as consisting of new information and decisions, new communication patterns, and new actions and transactions that an ict enables. moreover, outputs in telecommunications are similarly defined as information made available and retrievable by computer. outcomes (purpose), on the other hand, are the effects of outputs; in this study they are the direct benefits in terms of measurable (both quantitative and qualitative) benefits as well as costs associated with the outputs. finally, development impact refers to the ict contribution to the broader development goals: impacts are less tangible and are the long-term effects of the interventions [24, 31]. the output and outcome definitions adopted in this study are similar to the opportunities and achievements concepts that are discussed in the subsequent section. however, the value chain assumes a linear relationship between ict and development. this does not sufficiently represent the development process, since there are several aspects involved in explaining how and why development would result from an ict4d initiative [32]. for this reason, and given the need to adequately define what development is and how it is realized in a given context, there is a need to adopt and integrate a development perspective as discussed in the following section.

table i. outputs, outcomes and impact definitions

results based management [29]
output: outputs are changes in skills or abilities and capacities of individuals or institutions, or the availability of new products and services that result from the completion of activities.
outcome: outcomes represent changes in the institutional and behavioral capacities for development conditions that occur between the completion of outputs and the achievement of goals.
impact: impact implies changes in people’s lives. such changes are positive or negative long-term effects, directly or indirectly, intended or unintended.

logical framework [33]
output: these are the specific, direct deliverables of the project necessary to achieve the outcome.
outcome: the immediate impact on the project area or target group, i.e. the change or benefit to be achieved by the project.
impact: the higher-level identified situation that a project contributes towards achieving.

contribution analysis [25, 34]
output: these are goods and services produced by the program, for example checks delivered, advice given, people processed, information provided, reports produced.
outcome: outcomes cover the sequence of results (or effects) – immediate, intermediate and final outcomes – following the delivery of outputs.
impact: impacts are the final or long-term consequences, for example environment improved, stronger economy, safer streets, energy saved.

ict4d value chain [1]
output: outputs are the micro level behavioural changes associated with technology use.
outcome: outcomes are the wider costs and benefits associated with ict.
impact: development impacts are the contribution of the ict to broader development goals.

b. the capability approach
a development theory perspective facilitates the definition of what constitutes development. for this purpose, sen’s capability approach [16, 35] is adopted since it facilitates a multi-dimensional, people-centered way of defining what constitutes development. development, according to sen, is the expansion of freedoms (capabilities or opportunities) to enable people to lead the lives they value [16:18]. development is more than the provision of, or access to, a resource such as ict: it is about what ict can enable people to be or do given their contextual aspects. one of the reasons freedom is central to development is for purposes of evaluation. sen [16:4] points out that “assessment of progress has to be done primarily in terms of whether the freedoms that people have [or value] are enhanced”. basically it looks at 1) development in terms of values, e.g. being healthy, being educated or being happy; and 2) evaluating how these have been enhanced by, for example, access to the internet in a given context. the capability approach premise is that a vector of a resource is transformed into a capability set within the restriction of conversion (contextual) factors. the capability set consists of functionings – things one can be or do to obtain the life they value. simply defined, the capability set is the opportunities a development initiative offers. achieved functionings, on the other hand, are the opportunities one chooses to exploit given his/her specific context. the multidimensional nature of the approach argues for a holistic evaluation of wellbeing that is not limited to income, since wellbeing consists of aspects that income cannot satisfactorily measure [36]. examples include greater access to knowledge, better nutrition and health services, more secure livelihoods, security against crime and physical violence, political and cultural freedoms, or participation in community activities. depending on the nature and purposes of evaluation, one can choose to focus on the opportunities (capability set), the achieved functionings, or both (in this study, opportunities, capabilities and capability set denote the same construct and may be used interchangeably to refer to a set of valuable functionings that an initiative offers or a person has effective access to). the majority of applications of the capability approach focus on opportunities, arguing that while policymakers are mandated to deliver development opportunities, they to a great extent cannot decide how people choose to benefit from them [37]. the capability approach is also concerned with human diversity, which results from people’s personal as well as external factors [35].
these factors, referred to as conversion (or contextual) factors, determine people’s preferences and choices of the potential functionings. conversion factors are classified as personal – the individual characteristics such as physical disabilities, motivation, level of education, age, gender and sex; and social factors – the external legalities or societal requirements that may consist of public policies, social or cultural norms and discriminating practices. another emerging social factor here is the intermediaries, e.g. non-government agencies that seek to promote ict usage [32]. lastly, environmental aspects focus on location and accessibility to facilities, as well as technical aspects such as quality of service [18, 38]. an individual’s capability set comprises both wellbeing – the opportunities availed for a better life – as well as agency – one’s ability to choose from the availed opportunities based on personal values and circumstances [16]. agency takes into consideration the active involvement of beneficiaries in their development process, i.e. whether they choose to exploit the available facilities for the improvement of their lives or not, depending on what they value and the circumstances they are in. the following are the multiple evaluation spaces within which policies and initiatives can be evaluated: wellbeing freedom, which focuses on the capabilities or opportunities an initiative fosters; wellbeing achievement, which is the achieved functionings; agency freedom, which evaluates the freedom to achieve whatever a person decides he or she should achieve [39:203]; and finally agency achievement, which is the outcomes in terms of one’s values, including those of other people, beings and things [40:341].

iv. proposed ict4d evaluation model
as suggested by the capability approach, the realization of development from an initiative is a process that, besides the provision of the opportunities (capabilities), also involves the interaction of these capabilities with choice that is influenced by the conversion factors. for example, hatakka and de' [41] analyze a project that successfully set up a distance education system providing opportunities for people to attain formal learning. however, factors such as pedagogical training and low computer literacy were ignored, which affected people’s choices regarding the exploitation of the provided opportunities. this highlights two aspects: first, the need to perform a process analysis from capabilities to achieved functionings; and second, the need to explicitly establish the conversion factors that affect people’s choices. focusing on achieved functionings alone denies one insight into the process, which is very essential given that development initiatives are highly contextually dependent. on the other hand, focusing on capabilities alone offers a limited development evaluation that does not actually establish whether development has occurred. focusing on capabilities alone may also be perceived as techno-centric, since evaluation is only performed on the opportunities an initiative can offer and does not investigate whether these were achieved or the factors that influenced this process. in relation to the value chain outputs, outcomes and impact concepts, the ict contribution to development is evaluated in terms of the outputs’ contribution to outcomes. the underlying assumption is that exploiting (choosing to use) available opportunities (outputs) will to a great extent contribute to the development achievements (outcomes).
in essence, evaluation can be performed for the contribution the output makes, and for the outcome where applicable, which is a process evaluation of how an initiative has contributed to development. a similar assumption is held and proven for the empowerment [42] and the choice [4, 43] frameworks, both of which rely on the capability approach as a theoretical foundation. similarly, garnham [12:33] points out that “[t]he point from a capabilities perspective is the assessment of what contribution the medium makes to enhancing its users’ range of functionings…”. according to the empowerment and choice approaches, it is assumed that empowerment results in the realization of development outcomes. alsop and heinsohn [42:5] define empowerment as “enhancing an individual’s or group’s capacity to make choices and transform those choices into desired actions and outcomes”. individuals use their agency (human diversity, personal conversion factors) to explore opportunity structures (capabilities, social factors), resulting in empowerment (presence and exercise of choice) that enables development outcomes. development outcomes according to kleine [4:122] are ‘complex to describe’ but consist of choice as the primary outcome, and secondary outcomes which ‘will often be either sketches of overarching aims or limited to aspects relevant to a given context’. these secondary outcomes may consist of goals/aims that individuals or groups value within a given context or achieved functionings – a subset of the capabilities. based on the above discussion, the constructs of the proposed evaluation model include ict characteristics, conversion factors, opportunities (capabilities), and achievements (choice, personal or community goals, and achieved functionings), as shown in fig. 2. the ict characteristics that a resource enables (communication; production, processing and distribution of information) provide opportunities within the limitations of the personal, social and environmental factors. achievements are the opportunities one chooses to exploit within the restriction of conversion factors, and choice is also explicitly evaluated as one of the achievements [44:74].

fig. 2. proposed ict4d evaluation model, adapted from [35]

according to robeyns and van der veen [37:44], although governments (development partners, etc.) can provide opportunities, they cannot decide how people live their lives. it is assumed that if someone’s ability to make choices is increased or strengthened, it will enable the choice of capabilities so that one lives the life they value. outputs as per these definitions are the opportunities, while outcomes are the achievements. the achievement of certain functionings enables other opportunities; this is shown by the double-pointing arrows between outputs and outcomes. for example, sensitization on the benefits of using the internet empowers individuals to make wise decisions on how to use it.

a. ict4d evaluation criteria dimensions and indicators
according to sen, the expansion of freedom is both the means and the end of development. from sen’s [16] viewpoint, “[t]he intrinsic importance of human freedom [ends], in general, as the preeminent objective of development is strongly supplemented by the instrumental effectiveness [means] of freedoms of particular kinds to promote freedoms of other kinds” (p. xii).
an initiative should therefore be evaluated on its ability to increase people’s substantive freedoms, such as self-esteem, as well as the instrumental freedoms that contribute to and guarantee people’s substantive freedoms. consequently, alkire [45] emphasizes that in defining an evaluative space based on the approach, it is important that both instrumental effectiveness and intrinsic importance are considered. sen proposes five instrumental freedoms that enhance people’s capabilities, i.e. social opportunities, economic facilities, political freedoms, transparency guarantees and protective security. it is argued that the extent to which these are secured is indicative of the level of an individual, household or community development [3:36]. since these freedoms are interrelated and supplement each other, earlier studies proposed three (social opportunities, economic facilities and political freedoms) out of the five domains for this study [13, 46]. a fourth dimension, psychological wellbeing, is proposed as this evaluates the substantive freedoms such as choice and self-esteem [32]. depending on the nature of the initiative being assessed, all or just some of the dimensions may be applied, except psychological wellbeing, which should be evaluated for all initiatives since it affects the achievements in other dimensions (e.g. as discussed in relation to choice). as gomez and pather [8:11] argue, “attention to intangible and unquantifiable aspects (e.g. self-worth and the strengthening of social fabric) that are facilitated indirectly through use, or even the presence of icts, will provide a more complete and holistic perspective of ict impact.” alkire [47:7] further confirms that these measures “[..] might be used to provide insights into people’s values and perceptions with respect to other dimensions of interest [..]”. the dimensions are defined as follows:

1) social opportunities: the arrangements society makes available to enable an individual to live a better life. from the capability perspective, this specifically focuses on education and healthcare.
2) economic facilities: the opportunities that individuals enjoy to utilize resources for the purpose of consumption, production or exchange. this includes aspects such as productivity, employment, etc.
3) political freedoms: the opportunities people have to exercise their political rights, e.g. being able to participate in local elections, community development programmes, etc.
4) psychological wellbeing: the physical, emotional and personal development opportunities. these are mostly a result of using ict or participating in ict4d projects. examples include gaining respect from peers or an increase in self-esteem. psychological wellbeing has both substantive and instrumental value that enables people to exploit other opportunities in pursuit of development.

it is envisaged that a set of criteria for each of these dimensions can facilitate an evaluation of the ict contribution to development (see the proposal in appendix a). for each dimension, achievements (outcomes) and opportunities (outputs) are proposed. for example, it is presumed that to assess whether an initiative has improved access to formal or non-formal education (outcome/achievement) in the research and education dimension, the following opportunities (outputs – what people do) are evaluated:
- accessing information in relevant online resources, e.g. research journals and online libraries
- participating in online research collaborations, e.g. through discussion forums
- producing and publishing research outputs, e.g. journals, patents etc.

these are further granulated to define output and outcome indicators (see the illustrative example in section v). the indicators measure whether end users exploit the opportunity in terms of quality and usage. quality seeks to establish whether end users actually value the opportunity, which will determine whether it is exploited; usage, on the other hand, focuses on the level of use. using the electricity analogy, roberts points out that ict is also ubiquitous, and therefore “it is not the electricity or icts as such that make the (bulk) impact on economy and society but how they are used to transform organizations, processes and behaviours.” [48:90]. details of criteria formulation are discussed in another publication [49]. the indicators proposed in this study are mostly qualitative and do not require precise data specifications. it is envisaged that the qualitative assessment facilitates a structured approach that provides sufficient information to report the ict contribution to development. elicitation of data for this approach relies on beneficiaries’ perceptions, which can be imprecise information about how initiatives have benefited people’s wellbeing. moreover, the use of structured approaches to evaluate the ict contribution to development is also recommended as a replacement for access and usage measures, which offer little as far as defining the actual ict benefits is concerned [12].

v. applying the evaluation model
this section presents a structured approach that applies multi-criteria decision analysis (mcda) techniques to facilitate the evaluation of development initiatives.

a. an mcda approach to evaluate the ict contribution to development
the evaluation and selection of ict4d initiatives is a complex decision problem that would benefit from the application of mcda techniques [10]. besides facilitating multidimensional and multi-stakeholder assessments, mcda provides a means for handling uncertainty arising from incomplete and vague information. this is a key requirement for the evaluation of the ict contribution to development, which relies on stakeholder value judgments, perceptions and beliefs of how ict has affected people’s lives. in addition, mcda techniques offer a structured evaluation process of development outcomes as an alternative to the predominately descriptive, and often difficult to report, ict4d evaluation approaches [18]. it further relaxes the requirement of quantitative measures, which call for data that is in some cases not accessible and may be more taxing on the stakeholders. as a structured decision making process, the mcda methodology typically consists of three stages [50]: 1) information gathering or problem structuring, which involves the definition of the decision problem to be addressed as well as the criteria and alternatives where necessary; 2) modelling stakeholder preferences, where the structured decision problem, i.e. criteria and alternatives, is modelled using a decision support tool; and 3) evaluation and comparison of alternatives. while the application of mcda techniques to decision making situations in the developing country context is appropriate, it is challenged by cultural, organizational, and infrastructural barriers among other factors [51].
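to make the problem structuring stage concrete, the criteria can be pictured as a small hierarchy of dimensions, outcomes and output indicators; the sketch below uses illustrative labels loosely based on the research and education example above, not the full set of criteria in appendix a.

```python
# illustrative criteria hierarchy (dimension -> outcome -> output indicators);
# the labels are examples, not the paper's complete appendix a criteria.
criteria = {
    "research and education": {
        "improved access to formal or non-formal education": [
            "accessing information in relevant online resources",
            "participating in online research collaborations",
            "producing and publishing research outputs",
        ],
    },
    "psychological wellbeing": {
        "increased self-esteem": [
            "feeling more valued or respected by peers",
        ],
    },
}

def leaf_indicators(tree):
    """flatten the hierarchy into (dimension, outcome, output indicator) triples."""
    return [(dim, outcome, indicator)
            for dim, outcomes in tree.items()
            for outcome, indicators in outcomes.items()
            for indicator in indicators]

for triple in leaf_indicators(criteria):
    print(" / ".join(triple))
```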
examples specifically include low literacy levels, the lack of electricity, and uneven access to ict infrastructure, as well as elite resistance resulting from leaders being afraid of losing their political position [52]. this calls for appropriation of the mcda approach and process in a way that takes into account the contextual limitations in the developing country context and the specific ict4d evaluation exercise. this section illustrates how an mcda technique can be applied for the evaluation of an ict4d initiative. it specifically applies the technique using a subsection of the proposed criteria for the evaluation of the impact of an online learning environment on students’ access to learning.

b. the case: evaluating the impact of muele on students’ access to learning
1) problem definition and structuring
the makerere university e-learning environment (muele) is one of several initiatives aimed at leveraging faculty effectiveness and improving access to learning at makerere university [53]. muele is a learning management system (lms) based on moodle. it has been in existence since 2009 and shows a steady growth of users over the years: the active courses increased from 253 in 2011 to 456 in 2013, while the users increased from 20,000 in 2011 to 45,000 to date. despite this progress, muele is used mostly as a course information repository, even after lecturers obtained training in online course authoring and delivery [54]. this has been attributed to attitudes towards the adoption of e-learning, and to concerns from lecturers regarding the increase in workload resulting from large student numbers and increased course preparation time. consequently, this illustrative study seeks to establish whether muele has improved students’ access to learning. more specifically, it sought to establish whether muele contributed or did not contribute to improved (access to) learning, and to assess how the initiative performed on the different output and outcome indicators, highlighting the most significant outcomes. the criteria consist of the output and outcome indicators relevant for the evaluation of the impact of muele on access to learning. this is a subsection of the criteria suggested in appendix a, specifically aimed at evaluating improved access to formal and/or non-formal education. the criteria also include the contextual factors known to have an effect on the use of ict to support learning. the criteria are summarized in tables ii – iv in appendix b.

2) problem modeling and elicitation
the proposed criteria were organised into two decision models: the output and the outcome decision model. the output model (see fig. 3) sought to establish the perception of students on whether muele had improved their access to learning, while the outcome model sought to measure the actual improvement in student learning. the contextual factors had an influence on both models, either facilitating or restricting the improved access to learning. preference modelling and elicitation considered two aspects: 1) evaluating the relative importance of criteria (eliciting weights); and 2) evaluating the initiative performance against criteria (eliciting scores).

- weight elicitation: this is expressed through the assignment of a weight which reflects the importance of one dimension (criterion) relative to the others and can be achieved through various methods [55]. ideally, weight elicitation should be performed for each of the levels of the decision tree hierarchy.
this study applied the rank-order approach, in which criteria were ranked in order of importance from most to least important, including equal ranks, as well as the assigning of weights. rank-ordering was performed for the outcome model and the bottom-level criteria of the output model (output indicators). equal weights were assumed for the other levels of the hierarchy, i.e. outputs and output indicator categories. the weights were developed through consultation with experts in the field – lecturers and muele administrators – who assessed the relative importance of the criteria.

- eliciting scores: this involved evaluating perceptions of how muele had performed on the various criteria. responses were elicited from students who had used muele for at least a year. verbal-numerical scales, which have been applied in various domains [56-58], as well as binary (yes/no) scales were used for the elicitation. the verbal-numerical scale is a combination of verbal expressions (e.g. unlikely, strongly agree, etc.) and the corresponding numerical intervals (see table v). since elicitation involved vague and imprecise value judgements of how e-learning had improved learning, it was appropriate to adopt a verbal-numerical scale. while the verbal expressions allowed stakeholders to state their preferences vaguely, the corresponding numerical ranges were applied for representation and analysis in the decision analysis tool. studies have established that people assess in terms of words or numbers in varied ways; however, the use of a combined verbal-numerical scale is a more effective and simplified elicitation approach [58]. for this study the verbal-numerical scale in table v was adapted from budescu et al. [59], mainly because it had been empirically developed and was appropriate to illustrate the use of a structured approach in evaluating development initiatives.

fig. 3. output evaluation model
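as an illustration of how the elicited judgements can be turned into numbers (a sketch under simplifying assumptions, not the decideit computation): verbal responses can be mapped to the numerical intervals of table v below, and rank orders can be turned into surrogate numerical weights, for example rank-order-centroid weights, whereas the study itself fed the rankings into the tool as comparative statements.

```python
# sketch only: map verbal statements to numerical intervals (following table v
# below) and derive surrogate weights from a rank order using rank-order-centroid
# (roc) weights, a common approximation; the study modelled rankings as
# comparative statements in decideit rather than fixing weights like this.
VERBAL_SCALE = {                                  # interval bounds as fractions
    "virtually certain":      (0.99, 1.00),
    "very likely":            (0.90, 0.98),
    "likely":                 (0.66, 0.89),
    "neutral":                (0.33, 0.65),
    "unlikely":               (0.10, 0.32),
    "very unlikely":          (0.01, 0.09),
    "exceptionally unlikely": (0.00, 0.009),
}

def roc_weights(n):
    """rank-order-centroid weights for criteria ranked 1 (most important) .. n."""
    return [sum(1.0 / j for j in range(k, n + 1)) / n for k in range(1, n + 1)]

responses = ["likely", "very likely", "neutral"]          # hypothetical answers
print([VERBAL_SCALE[r] for r in responses])
print([round(w, 3) for w in roc_weights(3)])              # [0.611, 0.278, 0.111]
```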
table v. example of a verbal-numerical scale [59]
virtually certain: [99-100]%
very likely: [90-98]%
likely: [66-89]%
neutral: [33-65]%
unlikely: [10-32]%
very unlikely: [1-9]%
exceptionally unlikely: [0-0.9]%

since multiple responses were elicited from the students and experts, aggregation of the elicited information was required. the aggregation approach depended on the nature of the response scales; for example, the simple weighted sum approach was applied for the aggregation of the students' responses obtained from the verbal-numerical scale [50]. this involved assuming equal weights for each stakeholder and calculating the expected value. the simple weighted sum approach has been used in the aggregation of imprecise values because it has proven to be the most effective aggregation approach [50]. since the ranking and binary (yes/no) scales are ordinal, the mode was applied as the preferred measure of central tendency to obtain the aggregate value(s) for the analysis [60].

3) results: analysis and evaluation
in this study the decideit decision support tool [61-63] was used to analyse and evaluate the decision problem. decideit is based on multi-attribute value theory [64] and supports both precise and imprecise information, i.e. imprecise data in terms of interval values, comparative statements or weights, as well as precise values. the rank ordered values depicting the relative importance of criteria were modelled as comparative statements, while the student perceptions obtained through the verbal-numerical and binary (yes/no) scales were modelled as intervals and precise values respectively. evaluation was performed for each of the models (outputs and outcomes) and the results are discussed below.

i. respondent demographics
eight (8) experts, 4 male and 4 female, were consulted on the ranking of the importance of the indicators used to evaluate the impact of an e-learning environment on improved student learning. seven were lecturers, while one was an administrator in charge of muele. twenty (20) students, 17 male and 3 female, participated in the evaluation of the impact of muele on improved access to learning. with the exception of 2 students in their second year and 3 who had completed their studies, the majority (15) were in their final year of study and had used muele for an overall period of 2 to 4 years. most of the participants (10) used it 2-3 days a week, 7 used it almost every day, while 2 rarely used it and 1 used it once a week. finally, while 14 of the student participants were in the 16 to 25 age bracket, the rest were in the 26 to 35 age bracket. clearly the sample is not representative enough of the student population that uses muele; however, it was sufficient to illustrate the structured evaluation process.

ii. output model evaluation
value profiling: provides an assessment of how the outputs (evaluated in terms of output indicators) have performed as far as meeting the overall objective is concerned; it assesses the contribution or relevance of the outputs to the overall objective. in this case the quality and level of use of course material are the most significant contributors to improvements in accessing learning materials, while participation in online discussions is average. finally, satisfaction with the quality of discussion forum posts is the least contributor to improved access to learning (fig. 4). evidently, muele is mostly used as a course repository, as previously established [54].

fig. 4. performance of individual outputs on improved access to learning
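the two aggregation rules described above can likewise be sketched under the same simplifying assumptions: equal stakeholder weights, and interval responses reduced to their midpoints, which glosses over the full interval handling performed in decideit; the responses below are hypothetical.

```python
from statistics import mode

def aggregate_intervals(intervals):
    """equal-weight aggregation of interval responses for one criterion:
    average the interval midpoints (a simple weighted sum with equal weights)."""
    midpoints = [(lo + hi) / 2 for lo, hi in intervals]
    return sum(midpoints) / len(midpoints)

# hypothetical responses from three students for one output indicator
print(round(aggregate_intervals([(0.66, 0.89), (0.90, 0.98), (0.33, 0.65)]), 3))  # 0.735

# ordinal scales (rankings, yes/no) are aggregated with the mode instead
print(mode(["yes", "yes", "no"]))                                                 # yes
```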
fig. 4. performance of individual outputs on improved access to learning. tornado graphs: these facilitate the identification of the critical issues that have the highest impact on the expected value (fig. 5). the least contributing output indicators as per the value profiling analysis above, i.e. participation in the discussion forums (cr.7, cr.9, cr.10, cr.8), were the most critical aspects affecting the expected value measure. on the other hand, the high contributors, i.e. the quality and level of use of course material, had the least impact on the expected value. such information may challenge decision makers to develop strategies for the improvement of the current initiative or streamline the development of future similar initiatives. for example, establishing that participation in discussion forums is a critical aspect in realising improved learning through muele would challenge the lecturers to actively engage the students in this activity, or to investigate further why this is an important aspect. fig. 5. critical outputs in the realization of student learning. expected value graph: the expected value interval [0.68, 0.84] of the outputs measures student performance in terms of the extent by which access to muele has improved access to learning (fig. 6). this implies that, based on the outputs, muele is perceived to have a fairly high potential of improving access to learning with a very limited possibility of failure. this serves as confirmation of the potential of ict to improve access to learning in this particular context. fig. 6. expected value graph evaluating the outputs' contribution to improved access to learning. iii. outcome model evaluation: expected value graph: this depicts an expected value interval of [0.52, 0.92], with the focal point of all interval statements (the 100% contraction value) at 0.73 (fig. 7). this implies that the different outcomes derived from muele contributed to improved access to student learning within a range of [0.52, 0.92]. fig. 7. expected value graph evaluating the outcomes' contribution to improved access to learning. value profiling: the outcomes to which muele most significantly contributed were improvements in student learning, facilitation of student participation in personal learning, and a better chance of obtaining employment (fig. 8). there was an average effect on the psychological aspects, i.e. improved levels of confidence and whether people felt more valued or respected. there was, however, a low chance that muele had a significant negative impact, such as affecting concentration or self-discipline, or personal health. on the other hand, there was a high chance that access to muele increased student dependence on computers. fig. 8. performance of individual outcomes in terms of outcome indicators. tornado graphs: the contextual factors, i.e. relevant skills, limited access to computers, unreliable or slow internet connection, the ability to afford a personal computer, as well as the mandatory requirement to use muele, were the most critical aspects affecting the realisation of improved access to student learning (fig. 9). the difference in the factors affecting the realization of outputs and outcomes is essential for mid-term evaluation, helping implementers address the identified gaps and ensure the success of the initiative. fig. 9. critical outputs in the realization of student learning.
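for orientation, the aggregate expected values reported above follow the standard additive multi-attribute value form (a textbook mavt formulation, not a formula reproduced from the decideit documentation):
\[
V(a) \;=\; \sum_{i=1}^{n} w_i\, v_i(a), \qquad \sum_{i=1}^{n} w_i = 1, \qquad w_i \ge 0, \quad 0 \le v_i(a) \le 1,
\]
where $w_i$ is the weight of criterion $i$ and $v_i(a)$ the value of alternative $a$ on that criterion. with interval-valued weights and criterion values, only bounds on $V(a)$ can be computed, which is why the results above are reported as ranges such as [0.68, 0.84] and [0.52, 0.92].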
as is seen from the results above, the aim of such an analysis is not necessarily to obtain a single aggregated value explaining the overall performance of an initiative. the focus is on facilitating a structured approach to explaining various aspects, such as how an initiative performs on different outputs and outcomes, and the most critical factors affecting the realization of the overall objective. it is important to note that, while the findings in this illustrative example may not be representative of the e-learning status at makerere university, they are a good illustration of the evaluation process. for example, the realization that contextual factors are an essential aspect in meeting the initiative goal will shift focus from just providing the e-learning environment to addressing the most critical contextual factors. furthermore, the low performance of the discussion forums will probably encourage further investigation into the pedagogical requirements that would integrate forums into the students' learning process. the mcda tool provides a rich, detailed and structured assessment of the different factors, which warrants further investigation into its use as an ict4d evaluation approach. vi. conclusions and future work. this paper proposes and illustrates an mcda-structured approach for the evaluation of the ict contribution to development. the model is based on the capability approach, with aspects drawn from the ict4d value chain as a conceptual framework. a major challenge with the capability approach has always been its philosophically profound basis, which complicates attempts at its operationalization. the work presented here contributes to the operationalization of the capability approach or, more generally, to applying a development perspective to the evaluation of the ict contribution to development. however, unlike existing applications of the approach, the model suggested in this study illustrates the use of indicators in the evaluation of the ict contribution to development. moreover, the proposed indicator-based evaluation offers more in comparison to quantitative evaluations of availability and uptake. it is also multi-dimensional, evaluating more than just economic benefits. it explicitly considers the instrumental and substantive ict benefits, as well as the context in which they should be obtained. it further stresses the need to evaluate psychological wellbeing alongside the other dimensions, because this is both a means and an end in ensuring development. the approach will benefit ict4d evaluation efforts for which in-depth descriptive evaluations are not possible due to various constraints, whether budgetary, logistical, or related to insufficiency of data. it may also serve for the comparative evaluation of multiple projects. for instance, a subset of the criteria proposed in this study will also be applied in imentors (http://www.gov2u.org/projects/imentors/), an eu project developing a platform that will enable donors and development partners to review complete or existing projects to provide policy support and assist programme planning and implementation. the applicability of the model is illustrated through its use in the evaluation of an online learning environment, muele, aimed at leveraging faculty effectiveness and improving access to learning at makerere university. the illustrative example reveals that such a structured approach can facilitate a sufficient assessment of the performance of a development initiative, as well as of the most critical factors influencing the attainment of the development goals.
the model, however, does not explicitly address unintended or negative impacts, which are prevalent in any development initiative. future work will seek to address this gap, as well as to validate and test the mcda evaluation model in other ict4d contexts. references
[1] r. heeks, "do information and communication technologies (icts) contribute to development?," journal of international development, vol. 22, pp. 625-640, 2010.
[2] m. hatakka, a. andersson, and å. grönlund, "students' use of one to one laptops: a capability approach analysis," information technology & people, vol. 26, pp. 94-112, 2013.
[3] s. ibrahim-dasuki, p. abbott, and a. kashefi, "the impact of ict investments on development using the capability approach: the case of the nigerian pre-paid electricity billing system," the african journal of information systems, vol. 4, 2012.
[4] d. kleine, "the capability approach and the `medium of choice': steps towards conceptualising information and communication technologies for development," ethics and information technology, vol. 13, pp. 119-130, 2011.
[5] h. grunfeld, s. hak, and t. pin, "understanding benefits realisation of ireach from a capability approach perspective," ethics and information technology, vol. 13, pp. 151-172, 2011.
[6] j. johnstone, "technology as empowerment: a capability approach to computer ethics," ethics and information technology, vol. 9, pp. 73-87, 2007.
[7] s. madon, "evaluating the developmental impact of e-governance initiatives: an exploratory framework," the electronic journal on information systems in developing countries, vol. 20, pp. 1-13, 2004.
[8] r. gomez and s. pather, "ict evaluation: are we asking the right questions?," the electronic journal of information systems in developing countries, vol. 50, pp. 1-14, 2012.
[9] m. l. best, d. thakur, and b. e. kolko, "the contribution of user-based subsidies to the impact and sustainability of telecenters: the ecenter project in kyrgyzstan," 2009, pp. 192-200.
[10] a. blake and m. q. garzon, "boundary objects to guide sustainable technology-supported participatory development for poverty alleviation in the context of digital divides," the electronic journal of information systems in developing countries (ejisdc), vol. 51, pp. 1-25, 2012.
[11] h. e. chew, p. v. ilavarasan, and m. r. levy, "the economic impact of information and communication technologies (icts) on microenterprises in the context of development," the electronic journal on information systems in developing countries, vol. 44, pp. 1-19, 2010.
[12] n. garnham, "amartya sen's "capabilities" approach to the evaluation of welfare: its application to communication," the public, vol. 4, pp. 25-34, 1997.
[13] f. n. kivunike, l. ekenberg, m. danielson, and f. f. tusubira, "using a multi-criteria tool for selecting ict development initiatives," presented at the ist-africa 2008 conference proceedings, windhoek, namibia, 2008.
[14] b.-s. gigler. (2011). informational capabilities - the missing link for the impact of ict on development? available: http://siteresources.worldbank.org/informationandcommunicationandtechnologies/resources/d4s2p3bjorngigler.pdf
[15] g. walsham, "development informatics in a changing world: reflections from ictd2010/2012," information technologies & international development, vol. 9, pp. 49-54, 2013.
[16] a. sen, development as freedom.
oxford: oxford university press, 1999.
[17] r. chambers and g. conway. (1992). sustainable rural livelihoods: practical concepts for the 21st century. available: http://www.ids.ac.uk/publication/sustainable-rural-livelihoods-practical-concepts-for-the-21st-century
[18] m. hatakka and j. lagsten, "the capability approach as a tool for development evaluation - analyzing students' use of internet resources," information technology for development, vol. 18, pp. 23-41, 2012.
[19] r. de', "evaluation of e-government systems: project assessment vs development assessment," egov 2006, pp. 317-328, 2006.
[20] s. parkinson and r. ramirez, "using a sustainable livelihoods approach to assessing the impact of icts in development," the journal of community informatics, vol. 2, 2007.
[21] r. heeks, "using competitive advantage theory to analyze it sectors in developing countries: a software industry case analysis," information technologies and international development, vol. 3, pp. 5-34, 2007.
[22] a. adamali and b. lanvin, "e-strategies monitoring and evaluation toolkit," the world bank, washington, dc, 2005.
[23] r. heeks, unpublished.
[24] r. gomez and s. pather, "ict evaluation: are we asking the right questions?," the electronic journal on information systems in developing countries (ejisdc), vol. 50, pp. 1-14, 2012.
[25] j. mayne, "contribution analysis: coming of age?," evaluation, vol. 18, pp. 270-280, 2012.
[26] r. heeks and a. molla. (2009). impact assessment of ict-for-development projects: a compendium of approaches. the development informatics series.
[27] dfid, unpublished.
[28] i. vogel, "review of the use of 'theory of change' in international development," uk department for international development (dfid), 2012.
[29] undg, unpublished.
[30] j. mayne, "best practices in results-based management: a review of experience," united nations, 2007.
[31] t. leimbach, s. kimpeler, m. pero, i. bertschek, d. cerquera, and b. engelstaetter, "development of impact measures for e-infrastructures," european commission: information society and media directorate general, unit f3, 2012.
[32] b.-s. gigler, unpublished.
[33] dfid. (2011). dfid how to note: guidance on using the revised logical framework. available: https://www.gov.uk/government/publications/dfid-how-to-note-guidance-on-using-the-revised-logical-framework
[34] j. mayne, unpublished.
[35] i. robeyns, "the capability approach: a theoretical survey," journal of human development, vol. 6, pp. 93-117, 2005.
[36] s. deneulin, "amartya sen's capability approach to development and gaudium et spes: on political participation and structural solidarity," journal of catholic social thought, vol. 3, pp. 355-372, 2006.
[37] i. robeyns and r. j. van der veen, "sustainable quality of life: conceptual analysis for policy-relevant empirical specification," netherlands environmental assessment agency (mnp), bilthoven and amsterdam, 2007.
[38] p. alexander and l. phahlamohlaka, "amartya sen's capability approach applied to information systems research," south african computer journal, pp. 1-11, 2006.
[39] a. sen, "well-being, agency and freedom: the dewey lectures 1984," the journal of philosophy, vol. 82, pp. 169-221, 1985.
[40] d. gasper, "what is the capability approach: its core, rationale, partners and dangers," journal of socio-economics, vol. 36, pp. 335-359, 2007.
[41] m. hatakka and r. c.
de', "development, capabilities and technology: an evaluative framework," in proceedings of the 11th international conference on social implications of computers in developing countries. partners for developmentict. actors and actions, kathmandu, nepal, , 2011. [42] r. alsop, j. holland, and m. bertelsen. (2006). empowerment in practice : from analysis to implementation (first printing ed.). available: https://openknowledge.worldbank.org/bitstream/10986/6980/1/350 320empowerm1ctice01official0use1.pdf [43] d. kleine, "ict4what? using the choice framework to operationalise the capability approach to development," in 2009 international conference on information and communication technologies and development (ictd), 2009, pp. 108-117. [44] m. fleurbaey, "development, capabilities, and freedom," studies in comparative international development, vol. 37, pp. 71-77, 2013/04/23/07:45:38 2002. [45] s. alkire, "choosing dimensions: the capability approach and multidimensional poverty," in the many dimensions of poverty, kakwani, nanak, and j. silber, eds., ed: palgrave-macmillan, 2008, p. 28. [46] f. n. kivunike, l. ekenberg, m. danielson, and f. f. tusubira, "perceptions of the role of ict on quality of life in rural communities in uganda," information technology for development, vol. 17, pp. 61-80, 2011. [47] s. alkire, "title," unpublished|. [48] s. roberts, "the global information society: a statistical view," united nations, santiago, chile2008. [49] f. n. kivunike, l. ekenberg, m. danielson, and f. f. tusubira, "developing criteria for the evaluation of the ict contribution to social and economic development," in sixth annual sig globdev pre-icis workshop, milan, italy, 2013, p. 24. [50] r. t. clemen and t. reilly, making hard decisions with decisiontools, second ed. pacific grove, ca: duxbury press, 2001. [51] c. karvetsk, j. lambert, and i. linkov, "emergent conditions and multiple criteria analysis in infrastructure prioritization for developing countries," journal of multi-criteria decision analysis, vol. 16, pp. 125-137, 2009. [52] m. riabacke, s. bohman, m. danielson, l. ekenberg, w. faye, and a. larsson, "public decision making support: a developing 13 country perspective," in ist-africa 2010 durban, south africa, 2010. [53] report, "e-learning report – 2011, makerere university," makerere university2011. [54] e. k. kahiigi, "a collaborative e-learning approach: exploring a peer assignment review process at the university level in uganda," phd, department of computer and systems sciences, , stockholm university, stockhom, 2013. [55] m. riabacke, "a prescriptive approach to eliciting decision information," degree of doctor of philosophy doctoral thesis, faculty of social sciences, department of computer and systems sciences, stockholm university, kista, 2012. [56] m. d. piercey, "motivated reasoning and verbal vs. numerical probability assessment: evidence from an accounting context," organizational behavior and human decision processes, vol. 108, pp. 330-341, 2013/06/08/13:19:57 2009. [57] b. rohrmann, "verbal qualifiers for rating scales: sociolinguistic considerations and psychometric data," university of melbourne, melbourne2007. [58] c. witteman and s. renooij, "evaluation of a verbal numerical probability scale," international journal of approximate reasoning, vol. 33, pp. 117-131, 2013/06/10/08:07:26 2003. [59] d. v. budescu, s. broomell, and h.-h. 
por, "improving communication of uncertainty in the reports of the intergovernmental panel on climate change," psychological science, vol. 20, pp. 299-308, 2013/06/08/13:20:16 2009. [60] s. manikandan, "measures of central tendency: median and mode," journal of pharmacology & pharmacotherapeutics, vol. 2, pp. 214-215, 2013/08/10/12:33:44 2011. [61] m. danielson, l. ekenberg, j. idefeldt, and a. larsson, "using a software tool for public decision analysis: the case of nacka municipality," decision analysis, vol. 4, pp. 76-90, 2013/05/19/13:30:25 2007. [62] m. danielson, l. ekenberg, j. johansson, and a. larsson, "the decideit decision tool," in 3rd international symposium on imprecise probabilities and their applications, lugano, switzerland, 2003, pp. 204-217. [63] k. hansson, m. danielson, and l. ekenberg, "a framework for evaluation of flood management strategies," journal of environmental management, vol. 86, pp. 465-480, 2008. [64] j. s. dyer, "maut multiattribute utility theory " in multiple criteria decision analysis state of the art surveys, j. figueira, s. greco, and m. ehrgott, eds., ed: springer, 2005, pp. 3-24. appendix a: ict4d projects evaluation criteria dimension achievements (outcome) opportunities (outputs) (a) research & education improvement in research quality and innovations accessing information in relevant online resources e.g. research journals, online libraries participating in online research collaborations e.g. through discussion forums producing and publishing research outputs e.g. journals, patents etc. improved access to formal and/or non-formal education accessing information in relevant online resources e.g. online courses/tutorials, e-learning platform, research journals, online libraries participating in ict-enabled learning forums e.g. discussion forums producing and publishing learning outputs e.g. journals, patents etc. (b) healthcare improved access to health services accessing health-related information e.g. websites or short text messaging services that share information on good health practice, immunization, or pandemics etc. remotely consulting medical personnel e.g. through phone calls, video calls etc. improved delivery of health services accessing health management information systems e.g. drug tracking and dispensing systems, patient records management systems participation in collaborations and co operations among health workers (c) economic opportunities improved productivity accessing information from relevant resources e.g. websites or short text messaging services farming/agricultural resources, smes, small scale industries participation in relevant online communities e.g. farming blogs, content production improved income (&income generation opportunities) access to relevant information e.g. new employment opportunities, stocks, investment opportunities, market information etc. participating in relevant (ict-related) training & skills development activities e.g. content development, ict literacy, advanced techniques performing ict-related transactions e.g. e-commerce, e-tax, money transfers, remittances (d) political freedoms improved participation in local/community or national politics accessing relevant online resources e.g. e-voting, institutional, community/national websites participating in local/community or national political activities e.g. elections, debates, radio talk shows etc. improved national/institutional/community transparency accessing relevant online resources e.g. 
budgets on community/national websites, citizen online databases etc.; participating in national/community policing e.g. freely reporting fraud through hotlines, forums on websites
achievement (outcome): improved institutional/organizational efficiency
opportunities (outputs): accessing relevant platforms e.g. education management systems, human resource management systems; participating in inter-organizational/institutional networking e.g. exchange of research students; performing relevant transactions e.g. salary remittances, timetabling, production of reports etc.
dimension: (e) personal and psychological wellbeing
achievement (outcome): individual empowerment
opportunities (outputs): strengthened ability to influence personal choices; perceived improvements in self-esteem and self-confidence; feeling more valued and respected; being able to analyze and solve own problems
achievement (outcome): improvements in family relationships and social ties
opportunities (outputs): level of use of relevant media and/or applications e.g. online social media (facebook, twitter), mobile phones etc.; quality of relevant media and/or applications to interact with family and friends; having a sense of belonging related to participation in an online group
achievement (outcome): entertainment and fun
opportunities (outputs): level of access to/use of online fun activities e.g. music, movies, or games; level of access to online news updates i.e. local, sports, international news
appendix b: proposed evaluation criteria for the e-learning contribution to improved learning
table iii. outputs model: output categories and indicators
indicator category: quality of relevant online resources
operational definition: perception of quality of online resource(s) in terms of relevance and usefulness, as well as sufficiency in meeting stakeholder needs
output indicators: the course material made available through the online learning environment was very useful in my studies; i think the course material offered through the online learning environment was somewhat sufficient to satisfy my learning goals; the course material obtained through the online learning environment was always relevant to my learning goals.
indicator category: level of use of relevant online resources e.g. online courses, e-learning platform
operational definition: a qualitative measure of frequency of use of online resources
output indicators: i frequently refer to/use/apply the course material posted on the online learning environment for my learning needs; i prefer accessing/using course material posted on the online learning environment rather than the traditional classroom approach
indicator category: quality of it-enabled forum in terms of degree of activity e.g.
discussion forums
operational definition: perception of quality of it-enabled forum in terms of relevance and cooperation, as well as perception of ease of use
output indicators: the nature of content posted in the forums hosted on the online learning environment encouraged me to participate more actively in the discussion; the posts on the discussion forums somewhat satisfy my learning goals; the posted course discussions are extremely relevant for my studies
indicator category: level of participation in ict-enabled learning forums
operational definition: a qualitative measure of frequency of participation in discussions on ict-enabled learning forums
output indicators: i frequently participate in/contribute to the discussion forums relevant to my studies; i was mostly just reading messages posted in discussion forums and haven't contributed a lot to the discussions
table iii. outcomes model
outcome: level of student performance
operational definition: perception of the level of improvement in student performance
indicator statement: i think the use of muele helps me to improve my performance in this course unit
outcome: efficient and timely feedback
operational definition: are students getting feedback on their submissions in time?
indicator statement: i always obtained an efficient and timely feedback through muele
outcome: level of student(s) participation in their own learning
operational definition: degree by which students are taking personal initiative in their learning
indicator statement: using muele enables me to participate in my own learning in a better way
outcome: chances for (better) employment
operational definition: degree by which one's chances of obtaining better employment have increased
indicator statement: i think i have better chances of obtaining employment because of the skills i have obtained through the use of muele
outcome: attainment of new/advanced skills or academic credentials
operational definition: has the initiative enabled the participants to obtain new skills?
indicator statement: i was able to obtain advanced skills
outcome: ability to participate in a course from anywhere at anytime
operational definition: does the student have the ability to participate in an online course irrespective of where they are?
indicator statement: i am able to undertake (participate in) my course from anywhere at anytime
outcome: ability to make personal choices
operational definition: has participation in e-learning improved one's ability to make personal choices?
indicator statement: using muele (has) strengthened my ability to make personal choices
outcome: levels of confidence/self esteem
operational definition: degree by which participation in the initiative has improved confidence/self esteem
indicator statement: participation in an online learning environment has improved my confidence levels
outcome: earned respect from peers
operational definition: degree of increase in value and respect by peers
indicator statement: i now feel more valued and respected by my peers because of the skills i have obtained through the use of muele
outcome: changes in responsibility and demands on the student
operational definition: degree by which student responsibilities and demands have been affected
indicator statement: i feel participating in an online course has increased my responsibility and demands on me as a student
outcome: concentration and self-discipline issues
operational definition: level of increase of concentration and self-discipline issues
indicator statement: i feel my participation in the online course has negatively affected concentration and self-discipline problems in class
outcome: computer dependence
operational definition: degree of increase of computer dependence
indicator statement: the use of the e-learning environment has strongly increased my dependence on computers
outcome: health concerns
operational definition: the perceived impact of muele on personal health
indicator statement: i think the use of the e-learning environment will negatively affect my health in the long run
table iv: contextual factors affecting the realization of improved access to learning through muele
personal factors
factor: relevant skills
operational definition: whether the possession or not of relevant skills affected one's access to muele
indicator: i lacked the relevant skills to use the e-learning environment
factor: personal interest
operational definition: level of personal motivation to exploit the e-learning application
indicator: i was personally interested in using the e-learning environment to facilitate my learning
factor: afford a personal computer
operational definition: whether an individual can or cannot afford a personal computer
indicator: i could afford a personal computer, which has strongly contributed to my use of the e-learning environment
social factors
factor: mandatory to use muele
operational definition: is it compulsory to use muele to support learning?
indicator: it was mandatory to use e-learning for the course unit attended
factor: knowledge obtained through traditional face to face lectures is sufficient
operational definition: does the system offer any added value in comparison to the traditional face to face lectures?
indicator: the knowledge i obtained through our traditional face to face lectures is sufficient to meet my learning goals
environmental factors
factor: access time on the pcs
operational definition: the extent by which available access time on shared pcs is sufficient
indicator: having limited access time on the computers in the lab limited my use of the e-learning environment
factor: internet connection
operational definition: whether the quality of the internet connection affects the use of ict
indicator: the unreliable/slow internet connection frustrated my use of the e-learning environment
international journal on advances in ict for emerging regions 2021 14 (3), july 2021
a scoping review on automated software testing with special reference to android based mobile application testing
fathima naja musthafa#1, syeda mansur2, andika wibawanto3 and owais qureshi4
abstract— despite all the techniques practiced for ensuring the quality of a software product, software testing remains the most widely accepted practice. with the explosive evolution and usage of mobile applications, new developments have also been introduced in the software testing process, with the aim of gaining market presence in mobile application development by delivering high quality products.
consequently, the introduction of automated tools for testing has gained attention in the last few years. although the topic of automation in software testing has existed for a while, the introduction of new tools and techniques has gained attention recently. hence, this research work focuses on investigating and analyzing current trends in automated testing of mobile applications, choosing the android platform as a case study. with the aim of fact finding, a systematic literature review was carried out on existing studies retrieved from different databases by exploring the electronic search space. the review discusses the points raised by the chosen research questions with reference to the papers cited. the topics discussed in this review article include why and how mobile applications are tested automatically, the tools and techniques used, and the challenges involved. this work also highlights why the focus is concentrated on mobile application testing rather than on the importance of automated software testing in general. in conclusion, the paper proposes some good practices on the topic based on the existing literature reviewed and referred to throughout the study. keywords— automated testing techniques, mobile application testing tools, quality assurance, software testing i. introduction despite the work done by researchers and practitioners on numerous techniques for software quality assurance, it is widely accepted that software testing is the most practiced approach for evaluating and assessing the quality of a software product [50]. the main goal of this paper is to provide insights from successful research work carried out in software testing and in testing techniques for mobile applications, which appear to be the most significant points relevant to the topic. li and his co-authors, in their work published in 2014, state that the main objectives of software testing are: 1. a test is carried out to demonstrate the errors that are present in a product. 2. a well-defined testing approach has a higher chance of discovering errors that exist. 3. a successful test operation should always discover any future faults and regression failures. the common term used in the literature, 'mobile testing', refers to various testing strategies such as testing mobile devices, testing mobile applications, and testing mobile web applications [22]. thus, the term 'mobile application testing' in this paper refers to testing mobile applications that run on mobile platforms with the use of popular testing methods and tools that ensure quality in behaviors and functions as well as features like usability, security, connectivity and so forth. various works in the literature highlight the fact that mobile application testing is quite different from conventional software testing, as it has unique requirements which include device compatibility of the application across mobile devices with varying screen sizes as well as ui lags [35]. apart from that, since mobile applications are developed to run on mobile devices that operate on different operating systems and have different sizes and computing power resources [22], the way they are tested must also meet a standard that differs from that of conventional software products. hence, this paper clearly highlights the importance of testing mobile applications with the support of the literature in the second part of section two, 'critical evaluation of literature'.
according to the cap gemini quality report [1], the barriers to testing mobile applications have moved from tools to methods: 56% of companies do not possess the right testing process/method, 52% do not have the devices instantly available, 48% do not have test experts, 38% do not have an in-house testing environment, 37% do not possess the right testing tools, and 33% do not receive enough time to test. however, the data show that mobile testing rose rapidly in 2013 compared to 2012, where statistics show that 55% of organizations implemented new methods and tools to test functionality, performance, and security of mobile applications and devices, in contrast to 33% in 2012. the rise in percentage is optimistic. based on the strategies used to carry out the testing process, automated software testing for mobile applications is classified under different categories. although the various techniques differ in the approaches used, they do not fail to accommodate the concept of automation. correspondence: f. n. musthafa #1 (e-mail: mmfnaja@gmail.com) received: 28-12-2020 revised: 17-05-2021 accepted: 24-05-2021. f. n. musthafa#1, s. mansur2, a. wibawanto3 and o. qureshi4 are from university of malaya, malaysia. (mmfnaja@gmail.com, smansur.irene@gmail.com, andika.wibawanto@gmail.com, umerkhattab42@gmail.com) this paper is an extended version of the paper "automated software testing on mobile applications: a review with special focus on importance, tools and challenges in android platform" presented at the icter conference (2020) doi: http://doi.org/10.4038/icter.v14i3.7227 © 2021 international journal on advances in ict for emerging regions. this paper gives a detailed insight into some of the techniques used and the tools utilized in addressing the features of those techniques, under the third topic of the second section. despite the developments and research in software testing for mobile applications, challenges remain with regard to test environments and standards, modeling, and coverage criteria [22]. this paper is organized into sections. section one describes the content of the sections of this paper and highlights the key points explained in this review, followed by the second section, which discusses the research methodology adopted for this study. the third section discusses the findings and results based on the identified research questions; it explains the key phrases found to be relevant to the topic with the support of the literature referred to there. the final section is the conclusion, which discusses the key points and important conclusions arrived at during this literature review and provides major recommendations on these topics. the reference list follows the conclusion and lists all the references used for this literature review. ii. research method this study adopts the systematic literature review process proposed by kitchenham et al. on how to perform systematic literature reviews in software engineering [2]. the subsequent parts of this manuscript show how the methodology has been adopted in conducting this systematic literature review. a. research questions there are four main research questions in this review and the discussion mainly adheres to these research questions. 1.
what does automation in software testing refer to? 2. what is the importance of automated mobile application testing? 3. what tools and techniques are used in automated software testing of mobile applications with respect to android platforms? 4. what are the major challenges in automated software testing on mobile applications? b. search strategy the search for studies was conducted in the electronic search space. electronic databases were explored, targeting the research questions, using a keyword search approach. to identify suitable literature, inclusion and exclusion criteria were applied to the search results. the researchers also screened the results, by mutual consent, in order to remove redundant and irrelevant studies. c. information sources popular scientific databases were chosen to conduct the electronic search and retrieve the relevant literature for this review. the databases include ieee xplore, acm digital library and sciencedirect. additional records were also found via google scholar. d. search terms since the scope of the study is somewhat broad and lacks a fixed taxonomy, a range of search strings was used as keywords across the electronic search space. the keywords were combined with boolean operators such as "and" and "or" with the aim of minimizing irrelevant results. the keywords are "automated testing", "mobile application", "testing tools", "quality assurance", "software testing", "test automation", "testing challenges", "testing techniques", "android platform", "mobile devices" and "test strategies". e. inclusion and exclusion criteria the literature selected for the study had to meet the following criteria. 1. the study must be published within the timeframe of 2000 to 2019. 2. the study must be in english. 3. the study must include relevant information on mobile application testing covering the scope of the study. studies were excluded based on the following concerns. 1. the study does not fulfill the inclusion criteria. 2. the study does not prove to be technically rich in content. iii. literature review and key findings this section highlights the facts adopted from the reviewed literature that adhere to the research questions identified in section two of this paper. although the main focus was to investigate the current state of the art based on the relevant research questions and the key topic, this section summarizes the findings in a structured way so as to clearly relate the facts identified from different literature and to provide connectivity between the findings. also, the results are based on literature and articles published between 2000 and 2019, thus giving importance to recent developments in the topic. although the articles retrieved from the internet were verified according to their source of origin, they were also double-checked for the authenticity of their content. the subsequent subsections of this third part of the paper provide a detailed discussion of the research questions. a. software test automation software testing is one of the important phases of any software development and has been widely used in industry as a quality assurance technique for evaluating the specification, design and source code [23], [7].
since software is designed in ways that are increasingly complex, testing complex software becomes an important phase of any software development. hence, the importance of testing should not be underestimated [60]. in fact, software testing is a part of any software development and plays a major role in the cost of any software [23]. software testing is expensive and labor intensive. according to the literature, the software testing process accounts for up to 50% of software development costs, and even more for safety-critical applications [6]. the main objective of software testing is to ensure a quality software product [23]-[25], [50], [27] at the end of a development phase before it is put into deployment. this does not mean that software testing is carried out only at the end of the software development life cycle (sdlc); it may be performed at any stage where necessary, and this depends entirely on the project and the sdlc model used for the development. in the process of software testing, an outcome of a software development process is evaluated for its overall functionality and behavior using a set of test cases, to determine whether it satisfies the specified requirements or shows any faulty behavior. although the concept of software testing is sometimes stated to be about demonstrating the absence of errors [35], [27] in a software product, testing is usually defined as finding as many errors as possible, since this improves the assurance that the software being tested is reliable. thus, a set of test plans is executed to this end. the way in which these test plans are executed divides the software testing process into two major categories: manual and automated testing. in manual testing, a tester carries out a written set of test plans which contain the test cases [23]. here a tester manually executes the program to check each test case. whereas this is the approach of manual testing, automated testing automates these test activities so that the whole set of test cases is carried out automatically. garousi and mäntylä, in their work published in 2016, describe test automation as the "use of special software (separate from the software being tested) to control the execution of tests and the comparison of actual outcomes with predicted outcomes" [23]. the key points these authors highlight are "special software" and "control the execution": the testing process uses software other than the software that needs to be tested, and the whole process is automated. although various researchers define the term test automation in their own wording, none of them fails to portray the same concept. in every test activity, it is always essential to find out why an approach is selected. since software testing is one of the major phases in any software development, it is labour intensive and expensive. according to the literature, testing takes up to 50% of the total cost of any software development, and sometimes even more. given this cost, it is essential to manage it, and that is one goal of test automation. another goal of test automation, according to the literature, is minimizing human error [30]. mistakes made by human beings become errors, which tend to become faults and failures. a minimal example of what such an automated check looks like in practice is sketched below.
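to make the cited definition concrete, the following is a minimal, hedged sketch of an automated unit test written with junit 4 in java; the class under test (pricecalculator) and its method are hypothetical and serve only to illustrate how a test framework executes a test and compares the actual outcome with the predicted one.

import org.junit.Test;
import static org.junit.Assert.assertEquals;

// hypothetical class under test: applies a percentage discount to a price
class PriceCalculator {
    double discounted(double price, double percent) {
        return price - (price * percent / 100.0);
    }
}

public class PriceCalculatorTest {

    // the test framework (the "special software") runs this method automatically
    // and compares the actual outcome with the predicted outcome
    @Test
    public void discountOfTenPercentIsApplied() {
        PriceCalculator calculator = new PriceCalculator();
        double actual = calculator.discounted(200.0, 10.0);
        assertEquals(180.0, actual, 0.0001); // expected value, actual value, tolerance
    }
}

once written, such a test can be re-run automatically after every change, which is what makes regression testing (discussed next) cheaper than repeating the same checks by hand.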
another goal is making regression testing easier [30]: when automated tests are executed to find errors, they make it easier to detect the consequences of any patch work done during a bug fix. thus, this eases the problem of catching possible future errors caused by a bug fix. the software testing phase of any software development process tends to start at the beginning of development in order to avoid complexities in testing at later stages. although the approach implemented for testing fully depends on the project requirements and the software development model used, it is necessary to thoroughly study which approach is to be used for testing in order to avoid future errors and failures. thus, it is essential to identify when and what is to be tested using a particular testing strategy. a test process comprises several steps, from planning and test specification through execution up to reporting. each of these steps can be carried out using various approaches. apart from using automation in the test execution step, it can also be used at various other stages for various purposes. thus, the potential uses of automation in the testing stages include test case design, scripting, evaluation, and test result reporting [23], [27], [30], [32]. an overview of this is summarized in figure 1. figure 1: an overview of automation across the software testing process. b. importance of automated mobile application testing recent years have seen a great revolution in the mobile industry and market due to the extensive support and sophisticated tasks now performed, rather than just the simple operations of a decade ago [32], [33]. mobile platforms are used by many giant tech industries as well as companies not directly related to technology. mobiles are used as a common platform to manage tasks of everyday life and activities. since their involvement has become a norm in our lives for every mundane and essential task, any breakdown or unexpected behavior of a sophisticated mobile application can result in a great disaster for industries and enterprises. users' unpleasant experiences (crashes, bugs, etc.) while exploring a mobile app can consequently prevent them from reusing the app; according to one survey, nearly 48% of users will not try the application again [14]. this can lead to lower downloads and thus reduced revenues. therefore, to avoid such consequences after all the time, energy and money invested in a mobile app, various software testing of the
mobile application is certainly necessary. meeting graphical interface sketches, functional requirements and the flow of the app can then be achieved and may attract users. ensuring high quality is more essential than anything else. the audience for mobile applications is escalating immensely as mobile devices continue to be used and seen everywhere. marketwatch's research states that 14% of online purchases are made through mobile platforms and that this will continue to grow in the near future. one statement from paypal's senior director of global initiatives, anuj nayar, to marketwatch states that "we've seen our mobile growth rise from less than one percent of our payment volume in 2010 to more than 20 percent in 2014". this notable new height shows that more and more businesses are cashing in. needless to say, making sure an app works correctly is essential. the same hard work that is required for the product concept and for building a business also necessarily has to be done for quality control and testing of mobile applications, and that kind of testing is not something that can always be done in-house. it can be achieved by using the skills of professional mobile testers, who can identify issues before they affect and frustrate the end user, as well as architect ways to fix them before the application is rolled out. the following comparison reflects why mobile application testing is more important than testing for web and desktop platforms. it also provides insight into the aspects which explain why mobile platform applications are complex, time consuming and detailed in comparison to their counterparts.
table i. importance of mobile app testing over web/desktop system testing, with comparative summary
criterion: frequency of release. mobile: software updates are released quite frequently for the improvement of devices, security, ui lags, etc.; this affects mobile behavior in a way that old compatible apps stop working, so the testing team needs to be cautious. web/desktop: web and desktop versions are released far less often, hardly once or twice in a year or two.
criterion: usage. mobile: mobile apps have large adoption from mid to high level enterprises and are used for general purposes, which makes them complex enough to support all genres of applications, supporting up to 13000 devices as per google play console [19]. web/desktop: desktop applications are mostly used by big enterprises where there is less variety of desktop machines to support; for a web app, the only concern is on which cloud or other server the web app needs to be deployed.
criterion: communication link. mobile: mobile applications are connected through a sophisticated interlinked bridge called restful apis or web services, which make transactions happen through a mobile-friendly format known as json [56]. web/desktop: a web application is hosted on the same server where the database is deployed; thus, the chances of being vulnerable in terms of security or non-availability of data are lower than for its mobile counterpart [21].
criterion: development life cycle. mobile: built with a complex life cycle to handle all kinds of unexpected, interrupted behaviors in a more intelligent way. web/desktop: no such life cycle, since it was not developed with the intention of being a personal platform, but rather is used by everyone on the same hosted server serving millions of users.
when any app is not tested within the context of the mobile lifecycle, the possible outcomes may lead to serious consequences, such as the following. the mobile is used as a multi-tasking device, which drives the concept of background and foreground apps [18]; however, backgrounded apps brought to the foreground will often crash if state is not persisted properly. the states can be seen in figure 2, the android lifecycle (for instance, onstart/onstop for android and viewwillappear/viewwilldisappear for ios). 1. the testers should also be aware that the mobile platform's algorithm destroys an app whenever it deems necessary to free memory. if the ondestroy state is not correctly handled it might lead to unexpected behavior or loss of the user's data the next time the app is used. 2. the mobile platform also forces developers to build a cache controller to load heavy data, to avoid misinformation or non-availability of data [37]. 3.
there are often multiple ways that a lifecycle hook can be called, and the testers need to be aware of the differences in certain situations. 4. as mentioned in the comparison in table i, updates/patches may often affect the system's overall flow on older android systems. no guarantee of the onstop state being called on request is given even by the platform developers; thus, reviewing the official documentation is also a way for mobile testers to cope with unforeseen bugs. figure 2: basic android activity lifecycle. as we have noticed on our mobile phones, even when the phone's screen is off or in a partially idle/sleep state, it still pops up alerts for messages and notifications, for instance whatsapp/messenger text message notifications. this entire process runs in the background, in a so-called background thread/service, which is mostly used in applications tied to real-time services. as can be noticed in figure 3 below, most of the top 10 crashing applications of 2017 were those which use background threads for almost every task. figure 3: survey results of the top 10 crashing apps as given in [18]. in addition to background-running apps, some critical user conditions to be tested against include: 1. geographical location. 2. devices and operating systems commonly used in these geographies. 3. the most-used mobile apps running in the background. 4. network conditions. 5. interruptions occurring while using the app (calls, messages, other popups). when testers mimic such experiences while testing the app, the recommended way to assess the application is to get familiar with the application's type along with its real-time service providers. c. tools and techniques used in automated software testing of mobile applications on the android platform based on a recent survey by statcounter, the android operating system is the most popular operating system in the world [47]. test input generation tools for mobile applications usually target mobile app developers, with the primary goals of detecting existing faults in mobile apps or maximizing code coverage. the source code of the app must be open source in order to allow the tool to do its checking, and after checking is done the mobile app developer can then catch the possible errors and fix them. functional defects are not the main problem for app developers because they can test for them manually; the most important concerns are portability, malware and energy issues, which can be effectively detected by executing the code. most of the time mobile apps are in an idle state waiting for user input, such as clicks or scrolls, or for system events, such as notifications, sms, or gps location updates. an application may also need input from the users, who enter values into widgets, select from lists, and so on. because a mobile application is event-driven, testing tools treat an input from the user as an event, or break inputs into sequences of events that model user actions or user input. the sequences of events and the inputs can be generated randomly or can follow a systematic approach. a systematic approach, usually based on a model of the application, is guided by the process to limit the search space. these models can be built manually, statically or dynamically. the capture and replay technique and model-driven techniques are the state of the art for managing traditional event-based systems.
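as a rough, hedged illustration of the random input-generation strategy described above (and not the implementation of any particular tool), the following java sketch produces a pseudo-random sequence of gui events of the kind a monkey-style tester might inject; the guievent type, the event names and the screen dimensions are hypothetical.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// hypothetical representation of a single gui event (click, scroll, text input, ...)
class GuiEvent {
    final String type;
    final int x;
    final int y;
    GuiEvent(String type, int x, int y) { this.type = type; this.x = x; this.y = y; }
    @Override
    public String toString() { return type + "@(" + x + "," + y + ")"; }
}

public class RandomEventSequence {
    private static final String[] EVENT_TYPES = {"CLICK", "LONG_CLICK", "SCROLL", "SWIPE", "TEXT_INPUT"};

    // generates a pseudo-random sequence of gui events, mimicking the random
    // input-generation strategy used by monkey-style testing tools
    public static List<GuiEvent> generate(int length, int screenWidth, int screenHeight, long seed) {
        Random random = new Random(seed); // seeded so a given run can be reproduced
        List<GuiEvent> sequence = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            String type = EVENT_TYPES[random.nextInt(EVENT_TYPES.length)];
            sequence.add(new GuiEvent(type, random.nextInt(screenWidth), random.nextInt(screenHeight)));
        }
        return sequence;
    }

    public static void main(String[] args) {
        // 500 random events on a hypothetical 1080x1920 screen; a real tool would
        // inject each event into the app under test instead of printing it
        for (GuiEvent event : generate(500, 1080, 1920, 42L)) {
            System.out.println(event);
        }
    }
}

seeding the generator is a common design choice because it makes a failing run reproducible, which matters when a randomly generated sequence happens to uncover a crash.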
in the capture and replay technique [3], [4], the tester first performs manual testing by recording his or her interaction with the gui; the recording is then replayed during test automation. in the model-driven technique [58], [41], a model of the mobile application has to be created before automated testing can be done. both techniques require tester involvement and thus may not detect corner cases that human testers are unaware of. another technique, which does not need manual tester involvement, is to extract a directed graph model from the gui with a crawling technique like that used in web development [3], [42]; test sequences are produced from those graphs, but this may still fail to identify parts of the system that can be explored. the android software development kit already includes a powerful testing framework [11]. the android testing framework is an extension of the junit framework with the addition of tools to test android-specific application components. the additions address fundamental aspects of mobile application development such as android views, activities, content providers, and a specific set of assertion classes designed for them. there are many tools available outside the default android testing tool. robotium [52] is also built on the junit framework. robotium uses gui assertions, like web application testing with the selenium framework. the selenium framework is very popular, is simpler to write tests with, is mostly used for black-box testing, and is very useful for functional testing, system testing, and acceptance testing. ui automator [57] does not use junit but provides the same functionality for test engineers to build gui tests such as clicking buttons, text input, scrolling and swiping. uiautomator has the special ability to check the state of the application before and after user actions in the gui; this can be useful for black-box testing of apps through the gui. it also supports gui assertions. monkeyrunner [9] controls an android emulator from outside the android code. monkeyrunner can provide screenshots and is very useful for regression testing by comparing screenshots alongside functional testing. espresso [12] is the latest android test automation framework by google. it is a more advanced generation and builds on top of monkeyrunner; it has similar functionality but is more reliable. a subset of espresso is the espresso test recorder, which can record interaction with the device and perform assertions to verify ui elements; this recording can be rerun in the future. robolectric [52] uses the java reflection api at runtime and shadow classes to run tests outside of an emulator or real device. this tool also has the ability to run tests that directly access the android library files via the java reflection api; it replaces the bodies of android api methods at runtime using java reflection. the types of mobile app testing techniques and information on the tools adopting these techniques to test android apps are summarized as follows. 1. random techniques based on the study of choudhary et al. [17], the random testing technique is the best automated testing approach for android apps. the android monkey [8], one of the tools they studied, is the best performing of the available test input generation tools. android has the characteristic that events need to be initiated either by the user or as system events by the android framework itself; usually a system event is triggered by a specific condition.
because system events can only be triggered under such specific conditions, random testing is not very efficient for them, and most random testing techniques for android, such as [40] and [8], focus on generating only gui events. the android monkey is part of the android developer toolkit and is widely used by both developers and app market managers. it uses a brute-force mechanism that generates pseudo-random streams of user events such as clicks, touches and gestures. monkey fires off both gui and system events, based on the number of events specified by the tester, and utilizes a completely random strategy [8]. dynodroid also uses random values and sequences of events, but it adds a few heuristics to improve on android monkey's performance [40]. one of these heuristics is checking the android manifest file so that only system events relevant to the application are generated. it also keeps a history of the type and number of events used, and instead of randomly generating the next event it uses a least recently used algorithm. the tester can also manually enter specific values for particular text inputs, such as text boxes, to make the process less random. there is also another group of random testing techniques [54] which focuses on testing inter-application communication by randomly generating values for intents (intent fuzzing). intent fuzzers mainly serve the purpose of generating invalid intents to test application robustness, and of revealing vulnerabilities by generating malicious random content. several other approaches are built on random testing techniques. amalfitano et al. [4] presented a gui crawling-based approach, similar to web application testing with the selenium framework, that uses completely random inputs to generate unique test cases. hu and neamtiu [29] describe a random approach for generating gui tests that uses the android monkey for execution. random testing techniques are very efficient at generating events, but they are not suitable for generating specific inputs, and they also produce redundant events that were already covered in previous cycles. 2. model-based techniques. web application testing has given inspiration for model-based android testing. these techniques treat the app as an event-based system and systematically generate sequences of events that resemble the behaviour of the mobile application. the tools discussed below use static and dynamic analysis to build a state machine by capturing the activities of the application and the event transitions between them. mobiguitar [4] builds a model of the application by dynamically exploring the app's gui with a gui ripping technique; it builds on top of guitar [50]. the model is then traversed with a depth-first search strategy to generate test cases, and when the tool cannot detect new states during traversal it can be restarted. mobiguitar can also use a random strategy, or the tester can manually enter constant values during exploration. orbit [60] analyses the source code and manifest file to identify relevant ui events, and it also statically analyses the source code to identify state transitions between activities. this is called a grey-box model technique because it analyses not only the gui but also the source code. while orbit uses static analysis, swifthand [16] uses dynamic analysis and machine learning to build the state model of the app during testing; the machine learning is used to visit unexplored states of the application, and the model is refined dynamically during execution using the generated inputs. the main focus of swifthand is to optimize the exploration strategy in order to minimize restarts of the app during exploration.
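mobiguitar above and a3e below both traverse the extracted gui model with a depth-first search strategy to derive event sequences. the following is a minimal, tool-agnostic sketch of that exploration idea; the GuiModel interface and the string-based states and events are hypothetical simplifications, not the data structures of any particular tool.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DepthFirstExploration {

    // hypothetical model: a state is one gui screen, an event is one user action
    public interface GuiModel {
        List<String> eventsFrom(String state);      // events that can be fired in a state
        String target(String state, String event);  // state reached by firing the event
    }

    // returns event sequences (test cases) that together visit every reachable state once
    public static List<List<String>> explore(GuiModel model, String startState) {
        List<List<String>> testCases = new ArrayList<>();
        dfs(model, startState, new ArrayDeque<>(), new HashSet<>(), testCases);
        return testCases;
    }

    private static void dfs(GuiModel model, String state, Deque<String> path,
                            Set<String> visited, List<List<String>> out) {
        visited.add(state);
        boolean extended = false;
        for (String event : model.eventsFrom(state)) {
            String next = model.target(state, event);
            if (!visited.contains(next)) {
                extended = true;
                path.addLast(event);                 // descend along this event
                dfs(model, next, path, visited, out);
                path.removeLast();                   // backtrack
            }
        }
        if (!extended) {
            out.add(new ArrayList<>(path));          // leaf of the exploration: one test case
        }
    }
}

each path from the start state to a leaf of the exploration becomes one event sequence that can be replayed against the app; in the real tools, the cost of restarting the app to reach a deep state is what strategies such as swifthand's try to minimize.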
a3e [13] also uses a static analysis technique to build an app model for automated exploration of an app's activities, and a depth-first search strategy is used to reach a given state in the model; this kind of model construction is important for model-based testing. puma [27] uses dynamic analysis to build the model; the goal of this tool is more to provide an infrastructure for dynamic analysis of applications, and it is built on top of uiautomator [57]. instead of reinventing the wheel, puma uses monkey's exploration strategy, but it provides a framework that can be extended to implement any exploration strategy. most of the tools above focus on constructing models for testing, which are then covered with a depth-first search strategy to generate event sequences. model-based techniques are useful for complex applications that have a very large state space and cannot be explored with random techniques. 3. record and replay techniques. monkey recorder [9, 10] and reran [26] implement record and replay techniques for android apps. monkey recorder allows testers to record a script of gui events of an application on the device, and the recording can be saved and rerun in the future; at present monkey recorder only collects click, swipe and text-input events. reran, on the other hand, logs the event system commands of the android operating system to generate low-level event traces; because these are low-level events they depend on the hardware, such as the screen size, and cannot be rerun on other devices. the recorded traces are analyzed and turned into runnable scripts, which reran then replays [26]. record and replay techniques can be useful for stress testing and regression testing, but the scripts need to be generated manually; because of this they are usually biased towards certain features and do not capture the behaviour of the app completely, and they can only replay what has been recorded, without considering other combinations of events.
table ii. summary of techniques with their advantages and disadvantages and tools adopting the techniques
technique: random | advantages: efficiently generates events, suitable for stress testing | disadvantages: hardly generates specific inputs, generates redundant events, no stopping criterion | tools: android monkey [8], dynodroid [40] | references: [17], [40], [8], [54], [4]
technique: model-based | advantages: more effective, can reduce redundant events | disadvantages: does not consider events that alter non-gui state | tools: mamba, ssda, mobiguitar, orbit, swifthand, a3e, puma | references: [5], [57], [50], [60], [27], [68], [69]
technique: record and replay | advantages: useful for stress and regression testing | disadvantages: test scripts are generated manually | tools: monkey recorder, reran | references: [9], [10], [26], [67]
d. challenges in automated software testing on mobile applications
several studies have been conducted by many researchers on the challenges of mobile app testing and its potential research targets [15]. these studies arrive at some common major challenges. the significant points noted were (1) that mobile applications are very different from traditional ones, so that different and specialized techniques are involved in testing them, and (2) that there are many challenges, most still with no optimum solution [37].
for instance, the randomness of the testing environment greatly affects reliability, performance, security and energy consumption. in the following, some of the major challenges are discussed. 1. device fragmentation. one of the major challenges of mobile software testing is device fragmentation [32], [4], [19], [45], [1], [14], [33]. variations in hardware or os components can cause mobile applications to behave differently while running on different devices, where each application has its own unique business and data flow [1]. a study reported the existence of around 1,800 different hardware/os configurations as of 2012, given that there were then around 130 different mobile phones running android and 7 versions of the os, and presuming two firmware versions per device [49]. mobile device fragmentation is the phenomenon that occurs when older versions of an os keep running on devices while newer versions already exist. there are several mobile operating systems available, and an app performs differently on different platforms; a tester's goal should be to provide a consistent user experience across platforms. using a framework that supports multiple objects can help, as it makes it possible to isolate the functionality of a specific object and determine whether it needs to be altered for other platforms. for instance, if an app has a selection menu that needs to be presented as a scrolling list on android and as a radio-button selection list on windows phone, a testing solution that supports multiple objects is required to test both scenarios [14]. according to testing experience, test devices can be grouped into three fragmentation categories [35]: group 1: small devices with a small cpu, little ram, low resolution, older software versions and older browsers; group 2: mid-range devices with an average cpu, ram below 512 mb, good screen size and resolution, and older software versions; group 3: high-end devices with a dual- or quad-core cpu, ram above 512 mb, high screen resolution and the latest software versions. the following choices therefore add to the challenge when testing on varied combinations of devices and operating systems: whether to use manual testing or automated tools, in-house teams or outsourced partners, guided testing or exploratory testing, and emulators and simulators or remote access [35], [15]. due to compatibility issues, the different user interfaces further increase the level of challenge. the user's application experience is significantly affected by the network performance of mobile devices; multiple network technologies may be supported by each mobile operator, and some may use unfamiliar or local networking standards. to test a mobile application on all these possible networks, travelling to every network operator would be required, which can be very costly and time consuming. this network challenge can be overcome by bypassing the lower network layers and testing the application over the internet using a device emulator, saving the time and cost of travelling, but such bypassing cannot exactly imitate the effects and timing of the real network. security is another aspect of the effectiveness and validity of an application; it is mandatory to ensure that the application is secure and does not expose the user's private and sensitive data. significant hardware components in addition to the core system (for example gps, telemetry, scanners, etc.)
present a great challenge as well. since mobile applications are used by categories of people ranging from those with no it background to it experts, usability testing must cover a comprehensive range of scenarios, considered in the users' own environments [47]. 2. connectivity. apart from hardware and software issues, the functionality of an application is also affected by the performance of the carrier's network. the application is expected to work with 2g, 3g, 4g or 5g networks, infrared, bluetooth, gps, nfc (near-field communication), wimax, low signal strength and different wi-fi speeds [1]. some applications are even expected to work the same in a no-network or offline condition, with synchronization performed later [19], [14]. in addition, a single application can also be expected to sustain multiple types of connectivity simultaneously [32]. slow and unreliable wireless network connections with low bandwidth are found to be a common obstacle for mobile applications in many studies [35]. network latency (the time taken to transfer data) will be random when apps communicate across network boundaries, which results in unpredictable data transfer speeds [14]. gateways in a wireless network convey content more appropriate for specific devices while acting as data optimizers; again, the data optimization process may result in decreased performance under heavy traffic. testing should establish the network traffic level at which the performance of the mobile application is influenced by gateway capacities [14]. 3. device limitations. on some devices it may be difficult to interpret images or locate elements on the screen because of the differences in display sizes across mobile devices and their various models. limitations in processing speed, memory size (ram, secondary storage), cpu power, power management dependencies, battery life dependencies and the cumbersome input ui of mobile devices result in variations of an application's performance across different types of devices. the display capability of mobile devices supports much lower resolution than desktops, and low resolution can degrade the quality of multimedia information displayed on the screen of a mobile device [62]. testing must therefore guarantee that the application can deliver optimum performance and usability for all anticipated configurations of the hardware and software involved. mobile devices also have different application runtimes; some of the runtimes commonly available on mobile devices are the binary runtime environment for wireless (brew), java, and the embedded visual basic runtime. applications should be tested intensively for runtime-specific variations [14]. 4. input interfaces. the touch screen is the main way to input user data into a mobile application. however, device resource utilization affects the system's response time to a touch, and it may become slow in certain contexts such as entry-level hardware or a busy processor. testing techniques therefore have to be created to validate touchscreen performance under such different contexts (for instance resource handling, processor load, memory and so on) and on different mobile devices.
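one illustrative way to approximate such a measurement, assuming the androidx.test uiautomator library and a hypothetical button and result view in the app under test, is sketched below; it times how long a tap takes to produce a visible response and could be run repeatedly under different load conditions.

import androidx.test.platform.app.InstrumentationRegistry;
import androidx.test.uiautomator.By;
import androidx.test.uiautomator.UiDevice;
import androidx.test.uiautomator.UiObject2;
import androidx.test.uiautomator.Until;

public class TouchLatencySketch {

    // taps a (hypothetical) submit button and measures how long the (hypothetical)
    // result view takes to appear, as a rough proxy for touch responsiveness
    public static long measureTapLatencyMillis(String appPackage) {
        UiDevice device = UiDevice.getInstance(
                InstrumentationRegistry.getInstrumentation());
        UiObject2 button = device.findObject(By.res(appPackage, "submit_button"));
        long start = System.nanoTime();
        button.click();                                              // inject the touch event
        device.wait(Until.hasObject(By.res(appPackage, "result_view")), 5_000);
        return (System.nanoTime() - start) / 1_000_000;              // elapsed milliseconds
    }
}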
different context contributors may also provide inputs to mobile apps, i.e., users, sensors (such as noise, light, motion and image sensors) and connectivity devices (some examples of which have been mentioned earlier), and these inputs vary with the different and changing contexts the mobile device can move into. all these sources may supply a combination of inputs ranging from brightness, temperature, altitude, noise level, type of connectivity and bandwidth to even neighbouring devices, all of which vary, sometimes unpredictably, with the environment and the user's activities [49], [35]. validating whether the app will function appropriately in any environment and under any contextual influence is a conundrum and may result in a combinatorial explosion. 5. rapid application development (rad) methodology. to obtain the benefit of a faster time to market, rad environments are exploited for mobile application development. since rad tools reduce the time taken for development, builds are presented for testing much earlier. the rad methodology thus puts implicit pressure on testers to reduce the testing cycle time without compromising quality and coverage [14]. iv. discussion and conclusion a thorough review of the literature has been conducted to provide a detailed discussion of the identified research questions. the study has given priority to the android platform when discussing certain points, such as tools and techniques; this has been done to present a few points with concrete examples, as there are a number of mobile platforms in use these days and android is considered one of the most popular. although the reference platform is android, the challenges and the importance of mobile application testing when performing automated testing, as discussed in subsections a and d of the third section, are common to all mobile applications, and certain points in subsections b and c can also be taken into consideration for other platforms. as the main idea behind this study was to target practitioners of automated testing, this study should be useful for them to gain knowledge on the topic, and researchers in this field would also benefit from its findings. accordingly, the authors wish to summarize the following best practices, based on the literature review carried out for this study, for practitioners of automated software testing on mobile applications. the literature reveals that all software testing techniques are involved in ensuring a quality product; before implementing any such technique to evaluate a software product, the particular approach must be well studied to establish whether it is the right approach for the testing process. based on the way the testing process is carried out, testing is classified as automated or manual. although the potential benefits of using automation techniques are higher than those of manual testing, it is always advisable to consider when and what to automate and whether the particular approach meets the actual needs. businesses from small to large scale have adopted mobile as a means to accomplish their daily tasks professionally, but today enterprises are more concerned with applications that scale automatically as data grow, along with high availability of access across the globe.
an application's functionality, usability and consistency can all be evaluated by performing automated or manual testing. for a high-quality application, end-to-end testing is also very important, to ensure that the application downloads effectively, works seamlessly and shows no lag during screen transitions; this makes mobile application testing different from testing on other platforms, and complex and lengthy for testers because of the complex lifecycle, the detailed processes involved and the need to understand background threads and so on. in the initial phase, practitioners could commence automated testing using a tool like monkey, because it is already included in the standard android developer toolkit and has no additional requirements; monkey is also very popular and is supported by google, the owner of android. no single strategy alone seems to be effective enough to cover all behaviours; a combination is more effective. the random technique can be used for stress testing, record and replay is suitable for regression testing, and for applications that are complex and have many ui activities the model-based approach can be considered. in terms of challenges, as we have seen, the challenges in mobile application testing are huge in number and complex in nature. it is therefore necessary either to plan a test strategy that is mobile-specific, or else we may overlook crucial areas of testing such as how network connectivity (or the lack of it) distresses an application, how screen resolution and orientation changes could spoil the user's whole experience, and whether our application accomplishes what users of a particular device have come to expect; alternatively, we may opt for something like what google is researching, namely modular phones. in an effort to come up with an approach that combines most of the benefits of the other approaches, google is endeavouring to introduce a new modular phone. a modular phone with working user-interchangeable components can let users upgrade their mobile easily and efficiently, as all main components are interchangeable via modules that click in and out; this can facilitate the testing process as well. apart from the best practices suggested for practitioners, researchers in the field are encouraged to use this study for any domain-specific research related to the topic, as the study gives detailed specifications under the topics discussed. as the authors take this as the starting point for further research on the compound testing strategies ideal for automated software testing of mobile applications, future research could be expanded to the software testing phases in which automation can be implemented, and towards automating the whole software testing procedure. researchers in this field could also investigate the challenges specific to mobile application testing and whether they can be mitigated with automated tools. also, as this scoping review mainly refers to the android platform, future studies on other platforms are encouraged. acknowledgment the authors of this paper take this opportunity to thank all those from the department of software engineering, university of malaya, malaysia, who helped throughout this research work.
the authors also complement and acknowledge the reviewers and the panel of the icter2020 conference for suggestions and ideas for improvements as the concept of this paper was presented as a poster at the icter 2020, colombo, sri lanka. references [1] akour m., ahmed a., falah b., bouriat s., alemerien k. (2016), “mobile software testing: thoughts, strategies, challenges, and experimental study”, vol. 7, no. 6, 2016. [2] kitchenham, b., pretorius, r., budgen, d., pearl brereton, o., turner, m., niazi, m., & linkman, s. (2010). systematic literature reviews in software engineering – a tertiary study. information and software technology, 52(8), 792-805. doi: 10.1016/j.infsof.2010.03.006 [3] amalfitano d., fasolino a., and tramontana p. (2011), "a gui crawling-based technique for android mobile application testing". software testing, verification and validation workshops (icstw), 2011 ieee fourth international conference on, march 2011, pp. 252– 261. [4] amalfitano d., fasolino a. r., tramontana p., de carmine s., and memon a. m. (2012), "using gui ripping for automated testing of android applications". proceedings of the 27th ieee/acm international conference on automated software engineering, ser. ase 2012. essen, germany: acm, 2012, pp. 258–261. [5] amalfitano d., fasolino a., tramontana p., ta b., and memon a. (2014), "mobiguitar–a tool for automated model-based testing of mobile apps". 2014. [6] ammann p.,and offutt j. (2008), “introduction to software testing”. cambridge university press the edinburgh building, cambridge cb2 8ru, uk, published in the united states of america by cambridge university press, new york, page: 10 [7] anand s., naik m., harrold m. j., and yang h. (2012), "automated concolic testing of smart-phone apps". proceedings of the acm sigsoft 20th international symposium on the foundations of software engineering, ser. fse ’12. cary, north carolina: acm, 2012, pp. 59:1–59:11. [8] android monkey. retrieved on 5 january 2020, from: https://developer.android.com/studio/test/monkey [9] android monkey runner. retrieved on 5 january 2020, from: https://developer.android.com/studio/test/monkeyrunner. [10] monker recorder. retrieved on 06 january 2020, from : https://developer.android.com/studio/test/monkeyrunner. [11] android testing framework. retrieved on 5 january 2020, from: http://developer.android.com/guide/topics/testing/index.html. [12] “espreso”. retrieved on 7 january 2020, from https://developer.android.com/training/testing/espresso. [13] azim t. and neamtiu i. (2013), "targeted and depth-first exploration for systematic testing of android apps". proceedings of the 2013 acm sigplan international conference on object oriented programming systems languages and applications, ser. oopsla ’13. indianapolis, indiana, usa: acm, 2013, pp. 641–660. [14] baride s., dutta k. (2011), “a cloud based software testing paradigm for mobile applications”, acm sigsoft software engineering notes, vol. 36, no.3, may 2011. doi: 10.1145/1968587.1968601 [15] bhuarya p., nupur s., chatterjee a. and thakur r.s. (2016), “mobile application testing: tools and challenges”, international journal of engineering and computer science issn: 2319-7242, volume 5 issue 10, oct. 2016, pp. 18679-18681. doi: 10.18535/ijecs/v5i10.57 [16] choi w., necula g., and sen k. (2013), “guided gui testing of android apps with minimal restart and approximate learning”. proceedings of the 2013 acm sigplan international conference on object oriented programming systems languages and applications, ser. oopsla ’13. 
indianapolis, indiana, usa: acm, 2013, pp. 623–640. [17] choudhary s. r., gorla a. and orso a. (2015), "automated test input generation for : are we there yet". to appear at the 30th international conference on automated software engineering, ser. ase ’15, 2015. [18] dilger e. d, (march 13, 2018), the mystery of crashing apps on ios and android. retrieved on 11 april 2020 from: https://appleinsider.com/articles/18/03/13/the-mystery-of-crashingapps-on-ios-and-android [19] de souza, l. s. and de aquino, g. s. (2014), “mobile application development: how to estimate the effort?”, b. murgante et al. (eds.): iccsa 2014, part v, lncs 8583, pp. 63–72, 2014. springer international publishing switzerland 2014. [20] elliott d, (mar 9, 2018), “a guide to the google play console” retrieved on 5 march 2020, from: a scoping review on automated software testing with special reference to android based mobile application testing 10 july 2021 international journal on advances in ict for emerging regions https://medium.com/googleplaydev/a-guide-to-the-google-playconsole-1bdc79ca956f. [21] espresso, retrieved on 4 march 2020, from: https://google.github.io/android-testing-supportlibrary/docs/espresso/index.html. [22] enge e., (april 11, 2019), mobile vs desktop traffic in 2019, retrieved from https://www.stonetemple.com/mobile-vs-desktopusage-study/. [23] gao, j., bai, x., tsai, w. t., and uehara, t. (2014). mobile application testing: a tutorial. computer. https://doi.org/10.1109/mc.2013.445 [24] garousi, v., and mäntylä, m. v. (2016). “when and what to automate in software testing? a multi-vocal literature review”. information and software technology, 76(april), 92–117. https://doi.org/10.1016/j.infsof.2016.04.015 [25] garousi, v., and zhi, j. (2013). “a survey of software testing practices in canada”. journal of systems and software. https://doi.org/10.1016/j.jss.2012.12.051 [26] gomez l., neamtiu i., azim t., and millstein t. (2013), "reran: timing-and touch-sensitive record and replay for android". software engineering (icse), 2013 35th international conference on. ieee, 2013, pp. 72–81. [27] hao s., liu b., nath s., halfond w. g., and govindan r. (2014), "puma: programmable ui-automation for large-scale dynamic analysis of mobile apps". proceedings of the 12th annual international conference on mobile systems, applications, and services, ser. mobisys ’14. bretton woods, new hampshire, usa: acm, 2014, pp. 204–217. [28] hooda, i., and singh chhillar, r. (2015). “software test process, testing types and techniques”. international journal of computer applications, 111(13), 10–14. https://doi.org/10.5120/19597-1433. [29] hu c. and neamtiu i. (2011), "automating gui testing for android applications". proceedings of the 6th international workshop on automation of software test, ser. ast ’11. waikiki, honolulu, hi, usa: acm, 2011, pp. 77–83. [30] jensen c. s., prasad m. r., and møller a. (2013), "automated testing with targeted event sequence generation". proceedings of the 2013 international symposium on software testing and analysis, ser. issta 2013. lugano, switzerland: acm, 2013, pp. 67–77. [31] jorgensen p,c.. (2014). “software testing a craftsman’s approach”. guest editors introduction, ieee computer (vol. 47). doi: 10.1109/test.1991.519785. [32] joorabchi m. e., mesbah a. and kruchten p. (2013), “real challenges in mobile app development,” ieee international symposium., baltimore md. 2013, pp. 15-24. doi: 10.1109/esem.2013.9. [33] kaur a. 
(2015), “review of mobile applications testing with automated techniques”, international journal of advanced research in computer and communication engineering vol. 4, issue 10, october 2015. doi: 10.17148/ijarcce.2015.410114. [34] kaur a. and kaur k. (2018), “systematic literature review of mobile application development and testing effort estimation”, j. king saud univ. comput. inf. sci.. [35] kirubakaran b., karthikeyani v. (2013), "mobile application testing— challenges and solution approach through automation", proc. ieee int. conf. pattern recognit. inform. mobile eng., pp. 79-84. [36] kochhar, p. s., thung, f., nagappan, n., zimmermann, t., and lo, d. (2015). “understanding the test automation culture of app developers”. ieee 8th international conference on software testing, verification and validation, icst 2015 proceedings, april 13–17. https://doi.org/10.1109/icst.2015.7102609 [37] kong p., li l., gao, j., liu, k., bissyande t. f., and klein j. (n.d.), “automated testing of android apps: a systematic literature review”. [38] lee g., (november 23, 2014), “ios > android: view life cycle, retrieved from http://gregliest.github.io/mobile/view-controllerlifecycle/" [39] li, y. f., das, p. k., and dowe, d. l. (2014). two decades of web application testing a survey of recent advances. information systems, 43, 20–54. doi: 10.1016/j.is.2014.02.001. [40] machiry a., tahiliani r., and naik m., "dynodroid: an input generation system for android apps". proceedings of the 2013 9th joint meeting on foundations of software engineering, ser. esec/fse 2013. saint petersburg, russia: acm, 2013, pp. 224–234. [41] mahmood r., mirzaei n., and malek s. (2014), "evodroid: segmented evolutionary testing of android apps". proceedings of the 2014 acm sigsoft international symposium on foundations of software engineering, ser. fse ’14. hong kong, china: acm, november 2014. [42] mehlitz p., tkachuk o., and ujma m. (2011), "jpf-awt: model checking gui applications". proceedings of the 2011 26th ieee/acm international conference on automated software engineering, ser. ase ’11. washington, dc, usa: ieee computer society, 2011, pp. 584–587. [43] memon a., banerjee i., and nagarajan a. (2003), "gui ripping: reverse engineering of graphical user interfaces for testing". proceedings of the 10th working conference on reverse engineering, ser. wcre ’03. washington, dc, usa: ieee computer society, 2003, pp. 260–. [44] memon a. m., pollack m. e., and soa m. l. (2000), "automated test oracles for guis". proceedings of the 8th acm sigsoft international symposium on foundations of software engineering: twenty-first century applications, ser. sigsoft ’00/fse-8. san diego, california, usa: acm, 2000, pp. 30–39. [45] méndez-porras a., quesada-lópez c. and jenkins m. (2015), “automated testing of mobile applications: a systematic map and review”, conference paper, april 2015. [46] monkeyrunner. retrieved on 11 march 2019, from: http://developer.android.com/tools/help/monkeyrunnerconcepts.html. [47] mohammed z., shamlan a., hazeera., rizny a. (n.d.), "challenges in mobile application testing in the context of sri lanka". [48] most popular operating system. retrieved on 17 march 2019, from: http://gs.statcounter.com/os-market-share. [49] muccini, h., francesco, a., and esposito, p. (2012), "software testing of mobile applications: challenges and future research directions", proc. 7th int. workshop autom. softw. test, ast 2012, zurich, switzerland), ieee, pp. 29-35. [50] nguyen b. n., robbins b., banerjee i., and memon a. 
(2014), "guitar: an innovative tool for automated testing of gui-driven software". automated software engineering, vol. 21, no. 1, pp. 65–105, 2014. [51] orso, a., and rothermel, g. (2014). software testing: a research travelogue (2000–2014), 117–132. https://doi.org/10.1145/2593882.2593885 [52] robolectric. retrieved on 4 march 2019, from: http://pivotal.github.com/robolectric/. [53] robotium. retrieved on 15 march 2019, from: http://code.google.com/p/robotium/. [54] sasnauskas r. and regehr j. (2014), "intent fuzzer: crafting intents of death". proceedings of the 2014 joint international workshop on dynamic analysis (woda) and software and system performance testing, debugging, and analytics (pertea). acm, 2014, pp. 1–5. [55] segue technologies (april 15, 2015), why is mobile application testing important? retrieved on 5 march 2019, from: https://www.seguetech.com/why-mobile-application-testingimportant/. [56] stackoverflow (2015), retrieved on 8 march 2019, from: https://stackoverflow.com/questions/28969032/what-the-equivalentof-activity-life-cycle-in-ios. [57] thu e. e., aung t. n., (august 2015), “developing mobile application framework by using restful web service with json parser”, genetic and evolutionary computing: proceedings of the ninth international conference on genetic and evolutionary computing, august 26-28, 2015, yangon, myanmar volume ii. [58] ui automator. retrieved on 14 march 2020, from: http://developer.android.com/tools/testing-support-library/index.html. [59] white l. and almezen h. (2000), "generating test cases for gui responsibilities using complete interaction sequences". software reliability engineering, 2000. issre 2000. proceedings. 11th international symposium on, 2000, pp. 110–121. [60] yang w., prasad m. r., and xie t., "a grey-box approach for automated gui-model generation of mobile applications". proceedings of the 16th international conference on fundamental approaches to software engineering, ser. fase’13. rome, italy:springer-verlag, 2013, pp. 250–265. [61] zaeem, r. n., prasad, m. r.,and khurshid, s. (2014). automated generation of oracles for testing user-interaction features of mobile apps. proceedings ieee 7th international conference on software testing, verification and validation, icst 2014, 183–192. https://doi.org/10.1109/icst.2014.31 [62] zhang, d. and adipat, b. (2005), “challenges, methodologies, and issues in the usability testing of mobile applications”, international journal of human–computer 11 f. n. musthafa#1, s. mansur2, a. wibawanto3, o. qureshi4 july 2021 international journal on advances in ict for emerging regions interaction, 18(3), 293–308 copyright © 2005, lawrence erlbaum associates, inc. doi: 10.1207/s15327590ijhc1803_3. [63] franke, d., elsemann, c., kowalewski s., and weise, c. (2011). “reverse engineering of mobile application lifecycles”. 18th working conference on reverse engineering. [64] 63. l. malisa, k. kostiainen, m. och, and s. capkun, “mobile application impersonation detection using dynamic user interface extraction,” in proceedings of the eur. symp. res. comput. secur., 2016, pp. 217–237. [65] 64. j. c. j. keng, l. jiang, t. k. wee, and r. k. balan, “graph-aided directed [66] testing of android applications for checking runtime privacy behaviours”. proceedings of the ieee 11th int. workshop automat. softw. test, 2016, pp. 57–63. [67] l. clapp, o. bastani, s. anand, and a. aiken, “minimizing gui event traces”. proceedings of the acm sigsoft int. symp. found. softw. eng., 2016, pp. 422–434. [68] y.-m. 
baek and d.-h. bae, “automated model-based android gui testing using multi-level gui comparison criteria”. proceeding of the int. conf. automated softw. eng., 2016, pp. 238–249. [69] h. zhang, h. wu, and a. rountev, “automated test generation for detection of leaks in android applications”. in proceeding of ieee 11th int. workshop on automat. softw. test, 2016, pp. 64–70. [70] z. qin, y. tang, e. novak, and q. li, “mobiplay: a remote execution based record-and-replay tool for mobile applications”. proceeding of ieee/acm 38th int. conf. softw. eng., 2016, pp. 571–582. [71] k. moran, m. linares-vásquez, c. bernal-cárdenas, c. vendome and d. poshyvanyk, “automatically discovering, reporting and reproducing android application crashes”, 2016 ieee international conference on software testing, verification and validation (icst), chicago, il, 2016, pp. 33-44, doi: 10.1109/icst.2016.34.
international journal on advances in ict for emerging regions 2011 04 (01) :26 38
mobile phones and the development of social capital among small malaysian retailers
tom e. julsrud
abstract — in much literature on social capital, it has been a widely held assertion that networks of informal relations are beneficial for the development of local regions as well as larger nations. in the last decade mobile communication tools have rapidly saturated several markets in asia, promising to contribute positively to social capital development. this might have important implications for small businesses, which in many cases have limited access to pcs. yet very few studies have managed to conduct empirical studies of how mobile phones actually are used among small enterprises, and how this affects the way they handle their business relations. this paper discusses an in-depth study of small malaysian retailers' use of mobile phones to build up and sustain their business related connections. based on qualitative interviews and tracking of their mobile calls/messages we found that the mobile phone was mainly used to support stronger ties in their networks. the boundaries between family, friends and business relations are highly blurred, and friends and family are the main communication partners for mobile use. the mobile phone is also extensively used to coordinate business-internal tasks during the day and in between face-to-face meetings with suppliers.
index terms — retailers, malaysia, smes, mobile communication, social networks. i. introduction small retailers are a familiar sight wherever you go in asia. small shops selling food, beverages, t-shirts, souvenirs and much more are so numerous in most cities that it is practically impossible to avoid running into them. despite their omnipresence, the smaller retail enterprises often get little attention in public debate, and their economic impact is often neglected in the literature about smes (small and medium enterprises). yet the retail enterprises are overwhelming in their volume, in malaysia as in most asian countries. in malaysia the thousands of small retailers are not only central in providing the population with workplaces; they also play a crucial part in meeting the bulk of malaysians' needs for food, clothing, everyday services, and much more. manuscript received november 11, 2010. recommended by prof. maria lee on may 20, 2011. tom e. julsrud was with telenor research and innovation center, asia pacific (tricap), selangor, malaysia. he is now with the norwegian centre for transport research, no-0349 oslo, norway (e-mail: tej@toi.no). even though small retailers' investments in technical equipment are often minimal, they have, as most small enterprise owners in asia, proved to be early adopters of mobile phones [1-4]. today, it is close to impossible to find a malaysian retailer without a mobile phone or two within reach. mobile phones, or "hand phones" as they are usually called in this part of the world, have become part of the retailer's standard equipment alongside paper, pencils, fax machines and cash registers. in most papers dealing with communication technologies and enterprises, the use of the mobile phone is seen as part of the general uptake and use of information and communication technologies (ict). as such it has been warmly greeted by politicians as well as researchers as a way to make small enterprises more efficient and competitive [5]. during the last decade, the advent of mobiles has triggered hope that they can be used to strengthen the small enterprises' role in the economy. alongside other initiatives, increased uptake and use of communication technology is believed to be one central way to help malaysian smes become more efficient, innovative and competitive [6]. in a report from undp, it is hoped that ict will help to connect asian smes into more efficient industrial clusters: "the vast potential of ict to assist smes lies in their capacity to instantaneously connect vast networks of smes across geographical distances at very little cost" (undp 2003, p. 33). however, mobile phones may not necessarily work in the same way as computers, and in fact a large group of smes in asia does not even have access to a pc at their workplace. retailers are in general much more dependent on mobiles than they are on computers. if we are to understand the role of mobile phones for retailers in malaysia and elsewhere, a more nuanced approach to the analysis of mobile phones is necessary. this paper addresses this issue and sets out to investigate the way mobiles are used in this group of users, which is among the most numerous types of smes in asia, and more precisely the way they are used to support the social relations and networks that connect their enterprise to the wider networks of partners and collaborators. this paper thus links up to the discussion of how mobiles can support and strengthen the social capital of small retailers and of society as such [7-10].
based on a small set of cases, this study provides insight into the kind of communication that is supported through the use of mobiles, including sms as well as voice communication. the structure of the paper is as follows: in section ii we outline the role of small retailers in malaysia and explain why it is interesting to study social networks related to these companies. section iii outlines the research approach used in this study; we rely on a very general framework, proposing that social relations represent social resources for individuals and enterprises in the form of social capital, which can be developed, sustained and mobilized by the use of mobile phones and other available ict. section iv presents the methodology used, and in section v case studies of small retailers situated in the selangor district of malaysia are presented; this section has a descriptive style, drawing on qualitative as well as quantitative data. in the sixth part we discuss the findings in more detail, highlight some of the core insights, and discuss the role mobile phones seem to play in supporting the retailers' relations and (therefore also) their social capital. ii. small retail enterprises in malaysia small enterprises are not a very homogenous group, in malaysia or elsewhere. often lumped together within the overall 'small and medium sized enterprise' category (sme), this classification tends to mix companies of very different types and qualities.1 in general, however, small enterprises are expected to have fewer than twenty employees. micro enterprises are in malaysian statistical sources usually considered a subgroup including enterprises with five or fewer employees. even though the definitions of small enterprises often vary across national statistics, there is no doubt that this group represents a large bulk of enterprises in most asian economies. overall, about 99.2% of all malaysian enterprises are categorized as smes [12]. small enterprises are usually found within three broad sectors: agriculture, manufacturing and services. retailers represent the largest group within the latter category, and there are at least 150,000 retailers in malaysia today [3]. in addition, there exists a significant number of informal retailers operating outside the official statistics. the exact number for malaysia is difficult to estimate, but judging from similar numbers in other asian countries it is likely to be significant [13]. even if this sector involves a lot of companies and employees, the retail sector consists of many small units: over 85% of all retailers have fewer than five employees and over 99.2% have fewer than 20 employees.
footnote 1: different definitions are used to classify small and medium sized enterprises (smes). in malaysian statistics sme is used to describe enterprises that have fewer than 50 employees and a turnover of less than 5 million rm; medium-sized enterprises have 20-50 employees, small enterprises 5-19 employees and micro enterprises fewer than 5. criteria for turnover are sometimes also used in addition to, or as a supplement to, the number of employees (see also saleh and ndubusi). european statistics tend to include larger enterprises in definitions of micro-enterprises, often fewer than 10 employees [11].
fig. 1. overview of the number of small enterprises by category, malaysia 2005 (source: smidec).
the large number of small players makes the retail sector flexible, but also vulnerable. it is flexible because it can adapt quickly to change in accordance with shifts in the economy; small companies are easy to set up but also easy to close down if necessary. in many cases, small retailers have more than one business going on, so phasing out one does not have to be highly dramatic. however, the sector is also vulnerable, because smaller companies might have fewer resources to spend on building up the competence and knowledge necessary to increase competitiveness [14, 15]. their capacity to spend time and resources on, for instance, implementing new technical solutions is usually very limited. thus many small retail companies run the risk of being wiped out by larger retail stores offering cheaper products of higher quality [3, 6]. the fact that retailers in general are organized as multiple small entities makes interaction and collaboration with other parties particularly important. small retailers usually need to be in contact with at least a handful of suppliers or wholesalers, as well as getting continuous help from friends and family members. as we will see later in this paper, there are often other types of partners or contact persons involved as well. a particularly common constellation in asia is the family business, where the boundaries between family members and employees are highly blurred. in such businesses family ties often connect clusters of small companies that collaborate within certain branches, as when father, sons and in-laws establish similar businesses within the same sector. such enterprises are famous for putting a high value on a dense network of informal social relations ("guanxi") [1, 16, 17]. although malaysia is a multicultural nation, the chinese part of the population is dominant amongst the small retail shops. this makes the chinese business culture a part of the cases we discuss in this paper; we will come back to this in the last part of the paper. iii. sme in a network perspective a. social capital theories social networks have for a long time been recognized as a central theoretical and methodological approach in studies of small enterprises [18-20]. several recent studies have suggested that the social relations cutting across small enterprises are a vital element in the sharing of knowledge and business innovations [21-23]. such benefits are often described as social capital, i.e. a form of resources embedded in a particular set of social relations and networks operating across the boundaries of the individual enterprise. different variations of social capital theory exist, emphasizing structural, relational or cultural aspects of these relations [24-26]. a full review of the large theoretical field of social capital falls beyond the ambitions of this paper. for the purpose of this work, it is sufficient to note that according to theories of social capital, social relations represent resources that can later give returns. as a form of capital, however, relational (social) capital is less tangible and less of an individual possession than money, stocks or property.
needless to say, perhaps, this type of capital is also much more difficult to analyze and measure 2 . social capital oriented approaches to organizations are frequently linked to other theoretical positions, such as resource dependency theory [27, 28]. the resource dependency theories are a point of departure for nan lin, who explicitly locates social capital in the tradition of neocapital theory as well as in granovetter‟s description of network as important embedded resources for economic actors [28-30]. from his point of view, social capital is seen as investment that actors make in social relations with the hope and expectancy of some form of return at a later stage. these are returns that go beyond what is usually accounted for by traditional “capital”. in general it facilitates four types of returns: the flow of information, that can be used to exert influence on other agents, it helps build up social credentials and reinforces identity and recognition. for instance by forging a tie with a specific other sme, a retailer may get access to useful information about a supplier or wholesaler at a later stage. or by forging a tie with a powerful wholesaler it might build up his reputation as a trustworthy partner, increasing his social credentials and reputation. on an analytical level, one may distinguish between both instrumental and expressive returns: the first type includes economic and political returns; the second includes returns in the form of mental and physical health and other forms of life satisfaction (lin 1998). following a resource dependency approach, we may further draw a distinction between network structure, accessibility 2 a term that is related to social capital is network capital. according to larsen, urry and axhausen, this encompasses all artifacts that increase the accessibility of ties in social networks. in this perspective the mobile phone is the network capital, along with trains, cars, roads, etc. see also rettie on mobile phones as a particular form of network capital. of resources in a network, mobilization of these resources and potential returns. in contrast to some other approaches, this approach perceives an actor‟s position in a network (or a particular structure), not as social capital per se, but as factors influencing the ability to gain access to resources. the accessibility of resources, then, is analytically separated from general collective assets such as trust and social norms, even though these affect on the ability to access resources in a network (see model below). finally the accessibility is seen as distinct from the possible mobilization of these resources and the returns that they can give. fig 2. a general social capital framework figure 2 shows the rough conceptual model that has been useful to our study, based on the conception of social capital as a resource. trust is here seen as a quality of a particular dyadic tie or as a generalized norm perceived as one crucial factor that affects on the accessibility of resources in a network. in sme networks where trust is high, more resources can be expected to circulate between partners, and less effort is needed to check or secure against fraud or cheating [31, 32]. we would like to add, however, that (increased) trust also can be seen as an important return, usually strengthening positive feelings and what giddens calls “ontological security” [33]. 
3 clearly, it is possible to think about relations as resources not only for individuals, but also for the larger organization and constellations of organizations. social capital can also represent collective goods. most theories in this field agree that social capital operates on both micro and macro levels, even though the boundaries between these levels are indeed fuzzy 45 [25]. it is frequently held that smes connected in larger clusters may represent significant benefits for people working in the enterprises, the organizations, as well as for the larger region they belong to [37, 38]. still, even if social 3 the role of trust for network developments among enterprises is more closely described elsewhere. 4 definitions of social capital often try to capture the multiple levels. nahapieth and ghoshal, for instance, state that this is: “the sum of the actual and potential resources embedded within, available through, and derived from the network of relationships possessed by an individual or social unit” [24]. 5 in organizations, a distinction is sometimes made between individual and corporate social capital [36]. tom e. julsrud 29 december 2011 international journal on advances in ict for emerging regions 04 capital figures on a group or institutional level, it is the employees and managers in the enterprises who make the maintenance and reproduction of social capital possible. it therefore makes sense to analyze social capital as individual relations and networks. this paper focuses mainly on the types of relations and networks the managers and employees access through their daily usage of mobile communication. we will however direct attention to the mobiles as a tool to develop accessibility and mobilize resources. b. type of relations what kind of relations is important for small enterprises? following economically oriented studies, on an overall level, a distinction is often drawn between three general types of relations: informal social relations, market relations and hierarchical relations [39]. the difference and current transformation between these networks in modern markets is widely discussed in economic literature. for instance, recent contributors have argued that informal relations are becoming more and more important in modern organizations, at the expense of the other two, propelling a more community oriented type of organization [40]. most business relations consist of a mix of these dimensions rather than being purely market based, hierarchical or social [41]. in particular, relations in small enterprises tend to blur the distinction between the private/social, market related and hierarchical. as already mentioned, the informal relations tend to play a particularly important role for many small enterprises in asia [1]. not only do the different coordination mechanisms of market, hierarchies and informal ties play a role when targeting the small enterprises networks, but also the quality of the ties should be considered important. from a more actor oriented point of view, a branch of studies has focused on the quality and the quantity of the enterprise managers‟ individual social networks [42]. one central finding from these studies is that because relations are created by processes of ongoing interaction, their structure fluctuates and their boundaries are usually “fuzzy” [43]. and because the managers are embedded in such fluctuating networks, it is unlikely that they will make decisions about their firms in isolation of these influences. 
another central finding is that the size and constellation of the individual network effect on the efficiency of the manager and their company. for entrepreneurs, the value of having a rich network of contact persons has been documented as vital, in particular in initial stages of the development [44]. still, these personal networks are not assembled within a single type of relations, but are found in private realms of life as well as among business colleagues or partners. the core area of interest for the bulk of social network studies lies within the realm of informal social networks. such relations are in turn often categorized as stronger or weaker, where the weaker form is typical for occasional interaction between acquaintances while the stronger form is more typical for relatives and close friends 6 . much research has elaborated on the distinction between strong and weak ties and a general argument coming out of the „strength of weak tie‟ hypothesis is that weaker ties have benefits related to giving access to new information that make them particularly important for knowledge development and information access. on the other hand, stronger ties have proven to be more important for the development of trust and stability as well as the transfer of tacit knowledge [4648]. still, it is difficult to make an absolute distinction between strong and weak qualities of relations. in practice, most work-based relations fall somewhere “in between” the weak and strong ties [49, 50]. in small enterprises, most weak ties will typical go beyond the individual enterprise towards people in other businesses or organizations. stronger ties are on the other hand, more common for the internal relations of colleagues and managers. as in all organizations, however, the formal ties and the more market oriented relations will interfere with the more informal ties. in network studies, some bodies of work have been particularly occupied with analyzing the overall structure of ties, related to social capital. from a whole-network point of view, social capital is a product not only of the kind of relation but the constellation of these ties in denser or more open structures [48, 51]. the value of denser communities is sometimes contrasted with the benefits of more open and wide spanning networks connecting distant clusters. still, empirical evidence suggests that the value of particular constellations must be seen in relation also to the type of relations involved and the particular type of organizational context involved [46, 52]. following the framework suggested earlier, however, structure and structural positions are not social capital per se, but important factors impacting on the accessibility of resources. in summary then, an examination of the small businesses networks important for the development of social capital, should include relations based on informality (like friendship), on hierarchies (like manager-subordinate), on markets (buyer-seller) and any combinations of these. the intensity of these relations will vary over time, as will the particular quality of a relation. for example; a business relation can turn into a friendship and a friendship might develop into a family tie. following the social capital theoretical framework outlined above, such networks represent resources that can be mobilized for the return of instrumental and expressive returns. whether the mobile phone actually can play a role in these processes, is the topic for the subsequent parts of this paper. 
c. social networks and mobile communication

according to most network approaches, regular communication between partners is a criterion for the development of interpersonal social relations. the general point of view shared by most network analysis (and social capital theories) is that communication between humans may over time develop into relations and networks [53-56]. following this line of thought, modern ict and mobile telephones, representing "communication-enhancing devices", should have much to offer small enterprises in developing, strengthening or exploiting their social capital. compared to a landline phone, the mobile not only has the benefit of being portable; it usually also has several additional functions, like an address book, texting options, and more (depending on the model and network available). and recently, due to prepaid subscription offers, it has also become a cheaper option than the landline.

some recent studies have provided evidence that there actually is a positive link between social capital and mobile phone usage on a community level. goodman found in two (non-representative) surveys from south africa and tanzania that mobile phone users were more active in participating in local community groups than non-users. the mobile users also had a wider network of both weaker and stronger relations than non-mobile users [7]. action research with female entrepreneurs in india also seems to support the idea that the mobile can be used to develop social relations and increase social capital [57].

referring to the general model above, the mobile phone offers opportunities to strengthen enterprises' social capital in at least three different ways. first, it can be used to invest in social relations and resources. in this way it is a tool for keeping relations alive and "giving" partners and collaborators attention and support. mobile phones may make it easier and more convenient to develop and sustain social relations at a distance, and sms and mms have proven to be important as tools to sustain both affective and instrumental communication, even at work [58, 59]. second, it can be used to increase accessibility to others. just by having a phone, employees or managers can more easily be reached by millions of others, even when outside the office. much like email and social network software, a mobile phone opens up a world of latent ties, i.e. possible contacts [60]. third, it can be used to mobilize resources in a pre-existing social network. given that a manager or employee already has access to a certain number of connections, the mobile phone can make it easier to exploit these resources in a very short time.

in practice, it is difficult to distinguish strongly between these processes. a typical communication situation between partners or friends may include a mix of different purposes. however, insight into whom employees and managers in smes call or send messages to during the day can give some indication of what kinds of resources they access with their mobiles. the study presented here analyses these mobile-supported networks by quantitative as well as qualitative means.
since we do not have much information on the content of the communication, it is difficult to analyze what kinds of returns they actually get from the mobile-based contacts. our focus here is therefore on what kinds of relational resources are addressed and developed through mobile phone dialogues and messages.

iv. methodology

a. comparative case studies

this study is based on a series of case studies of small enterprises in malaysia. in general, case studies can be described as a research strategy aimed at studying a limited number of phenomena within their natural context; yin describes a case study as "an investigation of a contemporary phenomenon within its real life context, especially when the boundaries between phenomenon and context are not clearly evident" [61]. comparative case studies usually deploy a cross-case design, where several similar cases are compared [61, 62]. typically, such studies intend to explain similarities between different cases, or differences between similar cases. in either case, the cases are connected, or part of the same group, in one way or another. a distinction is usually made between studies that are case-based or variable-oriented [62]. in this paper a "mixed strategy" is used, where case-based and variable-based methods are combined [62]. first, we focus on the cases one by one. second, we go on to analyze certain variables related to social network use cutting across all cases. in this second phase, the analysis thus becomes more variable-oriented and focused on developing theory.

b. case selection

the cases in this paper were selected on the basis of their general belonging to the category of "small malaysian retailers". moreover, they are all located in the kuala lumpur district, meaning that at least the general national and regional cultural context can be expected to be fairly similar across the cases. the cases differ, however, along the dimensions of size and branch: two of the cases are smaller (below five employees), one is slightly bigger. they are also picked from two different sub-segments: food and beverage, and motor equipment and vehicles. this "slight diversity" was allowed in order to capture some of the diversity in practices within the segment "retailers". note that the three cases presented in this paper are part of a study of a larger number of sme cases from different business segments.

c. techniques and tools used

the study is based on a mix of qualitative and quantitative techniques. for the qualitative part we conducted interviews with managers and with two to three employees in all enterprises. for each enterprise three employees were selected randomly (to avoid a "managerial bias"). the interviews lasted for about 30-50 minutes and were recorded and later coded and analyzed with the use of the atlas.ti software. in addition, we asked the manager to draw his network of relations of relevance for his daily work. the quantitative part involved a short survey distributed to all employees in the companies. the questionnaire included a general information section asking about informants' age, type of work, media usage, subscription type, and work mobility. the second section was used to track the individuals' networks based on their most recent use of mobile phones. a traditional "ego-network design" was included to capture incoming and outgoing phone calls, text messages (sms) and e-mails [63, 64].
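to make the ego-network tracing concrete, the following minimal sketch (our illustration, not part of the original survey instrument) shows how a call and message log, classified as described in the following paragraph, could be tallied per relation category and per medium. the category labels and the example log entries are hypothetical.

```python
from collections import Counter

# hypothetical relation categories, loosely inspired by (but not identical to)
# the seven predefined categories used in the survey
CATEGORIES = ["family/friends", "colleagues", "suppliers", "customers",
              "business acquaintances", "public services", "other"]

# each entry: (medium, direction, category) -- an illustrative, made-up log of
# one informant's last incoming and outgoing calls and messages
call_log = [
    ("voice", "out", "family/friends"),
    ("voice", "in", "customers"),
    ("voice", "out", "suppliers"),
    ("sms", "out", "colleagues"),
    ("sms", "in", "family/friends"),
]

def tally(log):
    """count contacts per (medium, category), as in a simple traffic diagram."""
    assert all(cat in CATEGORIES for _m, _d, cat in log), "unknown category"
    return Counter((medium, category) for medium, _direction, category in log)

for (medium, category), n in sorted(tally(call_log).items()):
    print(f"{medium:5s} {category:22s} {n}")
```

aggregating such tallies across informants gives the kind of per-category traffic counts that the figures for each case report.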
rather than using a traditional name generator, however, we based the personal network on the actual incoming and outgoing mobile traffic. to do this, we first asked the informants to open the call directory in their mobile phone. then we asked them to read off and register the last ten incoming and outgoing calls and messages, and classify them in accordance with seven predefined categories. the same procedure was then followed for sms. mapping media use as relational networks has the advantage that it gives a more detailed picture than traditional "average scores", and it gives the researcher a better picture of how different media are used to connect to different nodes in a network. obviously, tracking media use based on phone logs is a more reliable way of getting data than relying on informants' memory [65]. there is always a risk that individuals do not register what they see in the log, or that they change it to make themselves appear more "central" in the enterprise. we have no reason to believe, however, that this happened in our study. analyzing social networks based on mediated interaction runs the risk of being narrowly concerned with only a few media channels, forgetting about communication taking place in other media and channels [66]. we should therefore not expect the media-based networks to reflect a "correct" picture of the social relations or communication going on in the enterprise. what they do give is an indication of what kinds of relations mobile phone calls and sms were used to support. here, we will compare the traffic diagrams with the information from the qualitative inquiry.

v. case studies

a. lhl wholesale

general: this enterprise trades dry foods from a central location in ampang, outside the capital kuala lumpur (kl). the central food storage is an outlet selling to smaller retailers and individual customers. related to the enterprise is also a small network of shops located in the region. at the central premises, orders from the smaller outlets are handled, as well as orders from drive-in customers. the enterprise comprises about 20 employees, of whom a significant proportion are foreign workers. for lhl there were basically two different kinds of customers coming to the store: individual visitors buying goods and small retailers buying larger amounts of goods. both, however, required person-to-person interaction with the staff.

work tasks: all the retailers were involved in some manual work tasks in the shop or warehouse, including packing and sending goods to customers. for lhl, this involved filling up lorries and cars with the assigned products and filling up the stock with new goods as they arrived. the manager was very central in all work, even though he delegated some responsibility to the warehouse manager and the administrative officer. this officer was in charge of contact with the large group of foreign workers and the two drivers. the manager was the only one who communicated with the suppliers. much of the day-to-day work in the store was coordinated by informal communication among the workers. typically, they would turn to each other for immediate information about where certain items were located. a large group of foreign workers operated in the warehouse, and the informal communication seemed to be more active among these workers.

key relations: lhl was, according to the manager, a "typical chinese family business". connected to the enterprise were five other outlets run by relatives of the lhl manager.
this core network of family members was essential, as much of the internal communication in the enterprise was structured around these brothers. in addition, a network of suppliers was important for lhl. this included 200-300 different producers of drinks, sweets, cigarettes, etc. the manager of lhl also had strong ties to other small enterprises in the local area and elsewhere, and sometimes he would mobilize these contacts to buy larger stocks of goods at lower prices. to the core network we should add a stable network of customers, mainly small retailers in the kl district.

media access: in lhl the mobile was used on a daily basis by all employees (see fig. 3). the manager used the mobile phone book as a "memory" when being contacted by customers. it is worth noting, however, that even though the mobile was the most used medium, there were pcs available in all the enterprises, and at least some employees used them on a daily basis. the use of computers was, however, very basic, usually limited to surfing the net and exchanging emails. telefax was used daily to communicate with suppliers and other outlets.

use of mobile across networks: looking at the phone calls conducted in the company over the last ten days, it turned out that sms and mobile calls followed very similar patterns. most calls were to family and friends and to colleagues at work (see fig. 4). the mobile was a central tool for coordinating work tasks with the other outlets and the drivers, and for coordinating work on the shop floor. probably, a lot of the calls to family members were due to the fact that the partners in the firm were all family members. this is the case for the brothers operating outlets in other parts of the kl area. more importantly, however, many employees in this company were foreign workers. an important reason for them to buy phones was to keep in contact with family members in their home country. as such, this makes the family/friends category very large. the similarity in the use of sms is striking, and an analysis of the traffic also indicates a strong reciprocity between in- and outgoing calls/messages.

fig. 3. daily media use among managers and employees in lhl (percent).
fig. 4. numbers of in- and outgoing calls and smss among managers and employees in lhl (last 10 days).

b. kang zheng

general: kang zheng is a small outlet in the subang area of kl that trades herbs and traditional grocery goods. the main business is the selling of herbal medicine, while the grocery is a side business that gives them additional income. the shop is run as a small family business, comprising a husband and wife as well as a young cambodian assistant.

work tasks: most work is conducted at the small shop, where customers come and buy herbs recommended or prescribed by chinese doctors. in addition, there is administrative work related to ordering, billing etc. the family sometimes gets help and assistance from relatives running a grocery shop in a town nearby.

key relations: the closest partners were the relatives nearby who ran a similar business, but neighbors were also counted as important relations. the manager had established a close collaboration with doctors prescribing herbs as part of the recommended treatment for patients. this had turned out to be a central part of the business for the small company. in addition, the enterprise had a set of suppliers feeding the regular shop with food and beverages.
media access: at kang zheng the mobile was used in combination with the landline available in the shop (see fig. 5). most calls were made from the mobile, and both partners had individual mobile lines. the couple had a special arrangement: the husband had a mobile phone that was used for work purposes, while the wife's mobile phone was used for private calls. a pc in the backroom was used by the husband/manager, mainly to surf the web looking for prices and products.

fig. 5. daily media use among managers and employees in kz (percent).
fig. 6. numbers of in- and outgoing calls and smss among managers and employees in kz (last 10 days).

media use across the networks: as with the previous enterprise, most calls here were also to/from friends and family members (see fig. 6). sms messages were in general sent to the same persons as were being called. relations to suppliers, however, were supported more frequently by mobile phone calls than by sms. compared to lhl, company-internal calls are largely absent, but this is obviously due to the small size of the enterprise.

c. meng chong motor

this enterprise has been in the motorcycle repair business for 18 years. it was spun off from the involvement of the owner's husband in licensing motors and road tax payments. the business involves insurance, spare parts, repairs, and the buying and selling of motorbikes. as a side business, this small retailer also handles insurance of motor vehicles (mostly for walk-in clients) and helps some foreigners apply for international licenses and send remittances to indonesia. to help in the business, one senior mechanic and one temporarily employed mechanic from myanmar are employed.

work tasks: the main work processes are motor repair, ordering spare parts, handling motor insurance, renewing road tax, and collecting money. there is no record system for an inventory of spare parts. what is usually done is to separate parts by brand, find the parts when needed, and order when there is no more stock.

key relations: this sme's network consists of three suppliers of spare parts, two insurance companies, an agent in indonesia for the remittance side business, and other shops within the area which do not have insurance products and buy from them. among the motor repair clients within the community, they have over 100 insurance customers every month. the network of customers was very local, and a lot of the customers were returning ones.

media access: the mobile was used on a daily basis, as suppliers of spare parts called every day to check for orders. as in the other cases, the mobile was the dominant medium that everyone had access to (see fig. 7). the senior mechanic received phone calls from suppliers to ask about stock (price, availability of stock). voice calls were preferred because they are faster. the senior mechanic also received calls from clients regarding motor repairs. the other mechanic used his own mobile, but mainly for personal use (e.g., calling his family in myanmar). the landline was used as a back-up device when the employee could not connect through the mobile or the reception was not clear. sometimes up to ten calls were made to coordinate and to check prices and stock with suppliers.
the fax machine was used more for insurance, road tax and invoices, and to communicate with the agent in indonesia on the remittance side business. the need for written evidence and copies for the records were central motives for sticking to the fax. two computers were used regularly by the owner, one for each of the two insurance companies they made use of. the internet was used simultaneously for the two insurance companies, although not on a daily basis.

media use across the networks: interestingly, this enterprise displays a somewhat different picture than the other two (see fig. 8). the family/friends ties still dominate the volume of calls and messages. mobiles are also used a lot to communicate with business acquaintances; this is to a large degree due to calls to contacts in the insurance companies. still, sms does not follow the pattern of the calls, but is used primarily as a tool to support customers. these are customers of the auto shop as well as of the related insurance business. the divergent pattern of texting was partly due to the need to inform customers about when to pick up their repaired motorcycles, and partly due to the need to coordinate money transfers for foreign workers in the local area.

fig. 7. daily media use among managers and employees in meng chong motor (percent).
fig. 8. numbers of in- and outgoing calls and smss among managers and employees in meng chong (last 10 days).

vi. discussion

this paper proposes a social capital framework to better understand the value of mobile phones for small retailers in malaysia. given the premise of these theories, that social relations represent resources for enterprises, the way in which the mobile is used to strengthen or support these relations is of interest. in this section we point to some of the core findings from studying mobile use among the three retailers, and then discuss some implications for the future role of mobiles in developing social capital among malaysian retailers.

a. core insights

mobile as a tool to support stronger ties: the mobile phone was, in all our retail cases, the most central tool for communication during the work day. even though all actually had access to a pc with internet at home or in the back room, this was mainly a tool they used for searching on the web and for internet banking. however, we found that the mobile was mainly used to support family relations and relations to friends. this is interesting given that we asked for calls made on the phone used for work. accordingly, the mobile seemed primarily to be a tool to support the stronger personal ties, also during work hours. this finding is in line with results emerging from similar studies conducted amongst smes in africa. a survey of micro-entrepreneurs in rwanda, using mobile phone call logs, found that as much as two-thirds of all calls were to family and friends. in a follow-up survey, it was found that calls to friends accounted for as much as 45% of all calls, and calls to family members for 26% [67, 68]. in our cases the findings can be explained by at least three central factors: firstly, by the fact that the small retailers operated in very close circles, relying heavily on family and friends. the networks of business partners were in most cases long-time relations that were based on trust rather than on formal contracts.
in some cases the colleagues at work were actually also part of their families (spouses, brothers, in-laws). this is in line with what has been written earlier on chinese family businesses as having "strong family ties, sharing and pooling of resources, efficient use of manpower, practice of thrift, use of low gearing and flexibility of operations" [1]. from our cases it appears that the retailers in malaysia are developing their smes much in accordance with these traditions. in particular for the smallest enterprises (like kang zheng), family and work are very hard to separate, and sisters, brothers and parents were as much colleagues as family members. secondly, in all our cases a significant number of employees were immigrants from neighboring countries. these workers used the mobile extensively to keep in contact with their families at home. some employees said that their main motivation for purchasing a mobile was to make long-distance phone calls to their home countries. this clearly contributed to the large number of calls to family and friends. a third reason was that many employees in these small retail enterprises paid for their own mobile phone use and used the same phone for private as well as business purposes. the employer would sometimes compensate the employee if he or she used the phone extensively for work. the result was, however, that the mobile phone had become a hub for all kinds of social relations, not only work relations.

the fact that the bulk of mobile communication was used to support ties with family and friends should not, however, be seen as an indication that the mobile is becoming a medium for non-work relations only. on the contrary, mobiles are also used heavily to support communication with colleagues and partners during work time. for the somewhat larger smes (like lhl), the internal communication was actually as high as the number of calls/sms to family and friends. thus the internal day-to-day coordination was also an important use of the mobile. among the retailers, both mobile voice and sms were used for private as well as business needs. in general, the large bulk of family calls and messages in these small firms seems to strengthen the argument that there is a strong overlap between different types of relations (market, trust, hierarchies) among small firms [39, 40].

mobile as a tool to support internal work coordination: in the larger retail and wholesale companies the mobile was widely used as a tool to coordinate work during the day. the mobile was used to continuously exchange detailed work instructions, check prices and give instructions to the two drivers. sms was here seen as particularly convenient for communicating detailed information that needed to be "documented" or remembered. as such, it had become something of a channel for coordinating internal tasks, perhaps similar to how email functions in larger and more bureaucratic organizations [69].

sms as an emerging customer-relation tool: in most cases mobile voice and sms followed similar relational patterns, i.e. sms and mobile voice were used in concert to support the same relations to family and friends, as well as to coordinate work. this is illustrated well in the lhl and kang zheng cases. however, in the meng chong case we found a strikingly divergent pattern of use for sms and mobile talk across the enterprise: sms had developed into a customer-relation tool (in the study there was a very high reciprocity between in- and outgoing calls). here sms was used to inform customers about their repaired vehicles and other customers about the insurance business.
the side business of supporting indonesians to transfer money also generated a lot of sms interaction: "at 4.15 pm i will receive a sms to notify me that the money has been already credited to the particular account at indonesia and i will sms to inform the indonesian here who sends the money." (meng chong manager). thus, sms had here developed into a central tool for handling the business interaction. we should note, firstly, that this is possible since mobile penetration among malaysians is now so high that all their customers can be expected to have access to a mobile phone. secondly, we should also note that this is probably particularly relevant for more service-oriented retailers and for retailers with regular customers.

mobile as coordinator in between face-to-face meetings: as mentioned, the smes relied heavily on a durable and local network of relations to wholesalers, suppliers and even customers. the relations to the suppliers of goods and to the enterprises' important partners were all based on face-to-face contact. the mobile was used to support these relations, not as a replacement for face-to-face meetings, but to coordinate between these meetings, for instance by taking care of the more instrumental communication in between them. take, for instance, kang zheng, the small herb dealer. the manager described that he mostly uses the mobile to coordinate prices and deliverables, and to clear out problems: "most of the time, it is ordering, sometimes when the prices are not stated in the invoice we also call them or when there is a problem with the goods we have sent." still, the relations to the partners are mainly based on regular meetings in the shop. on the relations to the chinese doctors the manager emphasizes this: "basically they are my friends; they will come here 1 to 2 times a month to share their experience in terms of the pricing, the medicine, the quality of the medicine, how to mix the medicine and herb because different herbs have different use." (kang zheng manager). the suppliers of other goods to the shop also come to visit the shop regularly; the mobile is used to regulate or prepare in between these visits: "they will come service us in two weeks, one week, or every day. every week they will come to serve. if we lack something, then we will call them" (kang zheng, wife).

this herb dealer is clearly strongly attached to their business partners and suppliers, as was the case for the other retailers. for these relations there seems to be a strong need for physical meetings. the coordination in between is, however, mainly dealt with by the use of mobile calls and messages. thus, the use of the mobile phone is closely related to physical meetings and business travelling. rather than being a substitute for travelling, however, the mobile phone was used to coordinate and keep the "positive spirit" alive in between the regular meetings. this finding resembles what has been reported in studies of the way young professionals and young adults use the mobile phone to coordinate in between physical meetings [70-72].

mobile as a relational data bank: finally, we found that in all the retail cases the mobile was used to keep track of relations to multiple customers. an important difference between a fixed-line phone and a mobile is the possibility of saving multiple numbers on the sim card or in the phone memory.
for the managers in our retail businesses this is a crucial function. the manager in lhl, for instance, told us that he had all his core customers in his mobile phone: "i rely on my mobile phone, because all the index is inside and i don't need to dial all the number, that's why i cannot lose my phone, if i lose the phone then my memory is all gone…" (manager lhl). the manager in meng chong had two phones. this helped him to sustain a certain control over family and private-oriented relations. the young couple running kang zheng, however, had another arrangement: the wife used her phone mainly for private calls and the husband took care of the professional connections. by sharing the responsibilities and using two phones, the couple divided the handling of the different parts of their networks between them.

b. the impact of mobiles on retailers' relations and networks

looking back at the resource-based model of social capital presented earlier, a distinction was made between accessibility and mobilization of relational resources. from our data it seems clear that the use of mobiles directly affected the retailers' accessibility to social relations and their resources in different ways. on the one hand, the portability of mobile phones made managers and employees more accessible to customers, as well as to each other and to their network of collaborating partners during the work day. the managers and employees in our cases had to a large extent abandoned the fixed-line phone to increase their own accessibility by using a mobile phone (the fixed-line phone was still used, but mostly for incoming calls from customers). the fast access to contacts through the mobile phone book was one interesting element here that seemed to have increased the accessibility of network resources: it made the customer register portable and easy to update. another was the use of sms to access customers (in particular in the last case). the messaging channel appeared as a new and more efficient way to relate to customers for small auto retailers. on the other hand, the smes had also increased their own access to customers and potential partners by always being reachable. for lhl, this was seen as an important benefit: "definitely, telecommunications is important to us in many ways such as, connecting with customers, connecting with all the importers, suppliers, and getting wider coverage" (manager of lhl). the incoming customer calls to the mobiles were mainly limited to mobile voice, and sms messages from customers were rare. however, partners and business contacts sometimes sent sms to get detailed information or to follow up mobile dialogues.

in our study it is hard to separate the mobilization of resources from investment or return, since that would require more detailed information about the content of the communications. these processes are, after all, intertwined in everyday interactions at work. what our cases suggest, however, is that many calls and messages were directed at accessing resources in the closer family/friend circles, rather than among more distant acquaintances or business partners. the mobile communication seemed to facilitate mobilization of resources in the networks of strong ties, rather than in the weaker connections. frequent calls to friends and family can be seen as an indicator that mobile calls and sms were used as a channel to give and receive social support.
for many of the foreign workers the mobile was clearly important as a way to keep in contact with close family members in other countries, such as cambodia and indonesia. the importance of trust for these small retailers was hard to overlook. the building up of trust relations, and the mobilization of them, seemed to be embedded in a long chain of regular meetings and face-to-face interactions. the small retailers seemed to be intertwined in what is often described as embedded relationships [30, 32]; the main components of embedded relationships, which regulate exchange partners' behaviours and expectations, are facilitation of trust, fine-grained information exchange and joint problem-solving, and uzzi defines embeddedness as "the process by which social relations shape economic action" (uzzi 1996: 674). the role of the mobile in this setting seemed to be for regular in-between calls to partners and suppliers, to sort out potential problems and misunderstandings, and to ensure that trust would not be jeopardized. thus it seemed that the mobile was important not only for mobilizing but also for sustaining trusting ties.

the daily use of mobile phones seems to help the retailers gain access to, and mobilize, social capital resources. as such, the mobile can be seen as an artifact used to sustain and mobilize social capital resources. according to the draft model presented earlier in this paper, we should expect the mobile phone to give the retailers particular instrumental and/or expressive returns. the emphasis from our informants was, as seen above, mainly on instrumental gains related to savings of time, access to new customer bases, and improvement of the information flow within the company. given the interconnectedness of business and family relations, there are reasons to believe that the mobile phone helped to produce both of these types of benefits, although the boundaries between them are blurred. a further specification of benefits and liabilities for small enterprises is beyond the scope of this paper, although it has been touched upon in other works focusing on mobile use in developing countries [73-75].

vii. conclusions

former research has to a certain degree been able to support the argument that mobile phones, alongside other types of ict, can increase the efficiency of small enterprises. on a general level, managers of small enterprises tend to report that they believe the mobile makes their businesses more profitable [76, 77]. this point of view was shared among the managers in our cases: most managers believed that mobiles were positive for their business and made them more efficient. this "positive impact" has, however, been further explored in this paper by investigating how mobiles are used to support social relations by employees and managers in three small retail enterprises. we found that the mobile was used mainly to develop and strengthen existing strong ties to friends and family as well as to local vendors and suppliers. in particular, it was used to coordinate and make formal agreements in between the regular physical meetings. thus, the hope expressed by undp earlier in this paper, that ict could "connect vast networks of smes across geographical distances", might still be out of reach when it comes to mobile use among small retailers in malaysia.
yet, the emergent use of sms to support and cultivate customer relations indicates how mobile applications may in the future go beyond these local networks to handle a wider set of relations through messaging systems. this paper has drawn on a resource-dependency based framework for analyzing social capital, focusing on the mobile phone as a tool that may be used both to increase retailers' access to new relational resources and to mobilize pre-existing connections. the cases analyzed in this paper indicate that, so far, the mobile phone has mainly been a vehicle for mobilizing pre-existing connections, mainly within the spheres of close family and friends.

acknowledgment

the author wishes to thank collaborators at telenor research and innovation center asia pacific (tricap) and the retailers who participated in this study.

references

[1] g. t. t. sin, "the management of chinese small-business enterprises in malaysia," asia pacific journal of management, vol. 4, pp. 178-186, 1987.
[2] c. castelli, "smes in malaysia: leading in mobility," 2007.
[3] a. s. saleh and n. o. ndubusi, "an evaluation of sme development in malaysia," international review of business research papers, vol. 2, pp. 1-14, 2006.
[4] k. j. kumar and a. o. thomas, "telecommunications and development. the cellular mobile "revolution" in china and india," journal of creative communication, pp. 297-309, 2006.
[5] g. l. harris, "united nations development information programme. ict 4d sme & entrepreneurship in asia-pacific," undp, 2003.
[6] o. k. ting, "smes in malaysia. pivotal points for change," 2004.
[7] j. goodman, "linking mobile phone ownership and use to social capital in rural south africa and tanzania," 3, 2005.
[8] r. ling, the mobile connection. the cell phone's impact on society. san francisco: elsevier, morgan kaufmann, 2004.
[9] j. katz and m. aakhus, "conclusions: making meaning of mobiles - a theory of apparatgeist," in perpetual contact. mobile communication, private talk, public performance, j. katz and m. aakhus, eds., cambridge: cambridge university press, 2002, pp. 301-320.
[10] k. j. arrow, "observations on social capital," in social capital: a multifaceted perspective, p. dasgupta and i. serageldin, eds., washington, d.c.: the world bank, 2000, pp. 3-5.
[11] european commission. (2009). enterprise and industry - sme definition. http://ec.europa.eu/enterprise/policies/sme/facts-figures-analysis/sme-definition/index_en.htm
[12] smidec, "sme performance report 2005," smidec, kuala lumpur, malaysia, 2006.
[13] p. vandenberg, "is asia adopting flexicurity? a survey of employment policies in six countries," international labour office, geneva, 4, 2008.
[14] h. aldrich and e. auster, "even dwarfs started small: liabilities of size and age and their strategic implications," in research in organizational behavior, vol. 8, m. staw and l. l. cummings, eds., greenwich: jai press, 1986, pp. 165-198.
[15] r. ellis, "understanding small business networking and ict: exploring face-to-face and ict-related opportunity creation mediated by social capital in east of england micro-businesses," 2010.
[16] a. millington, et al., "guanxi and supplier search mechanisms in china," human relations, vol. 59, pp. 505-531, 2006.
leung, "does a micro-macro link exist between managerial value reciprocity, social capital and firm performance? the case of smes in china," asia pacific journal of management, vol. 22, pp. 445-463, 2005. [18] n. tichy and c. fombrun, "network analysis in organizational settings," human relations, vol. 32, pp. 923-965, 1979. [19] m. kilduff and w. tsai, social networks and organizations. london: sage, 2003. [20] p. r. monge and n. s. contractor, theories of communication networks. new york: oxford university press, 2003. [21] a. greve and j. salaff, "the development of corporate social capital in complex innovation processes," research in the sociology of organizations: social capital in organizatons, vol. 18, pp. 107-134, 2001. [22] d. knoke, "organizational networks and corporate social capital," in corporate social capital and liability, r. t. leenders and s. m. gabay, eds., dordrecht: kluwer academic publ, 1999, pp. 17-42. [23] w. powell, "trust-based forms of governance," in trust in organizations.frontiers of theory and research, r. kramer and t. tyler, eds., thousand oaks: sage, 1996, pp. 51-67. [24] j. nahapiet and s. ghoshal, "social capital, intelectual capital and the organizational advantage," academy of management journal, vol. 23, 1998. [25] j. field, social capital. london, 2003. [26] a. portes, "social capital. its origins and applications in modern sociology," annual reviews in sociology, pp. 1-24, 1998. [27] d. knoke, changing organizations. business networks in the new political economy: westview press, 2001. [28] n. lin, social capital. a theory of social structure and action. new york: cambridge university press, 2001. [29] n. lin, "building a network theory of social capital," connections, vol. 22, pp. 28-51, 1999. [30] m. s. granovetter, "economic, action, social structure, and embeddedness," american journal of sociology, pp. 481-510, 1985. [31] d. m. rousseau, et al., "not so different after all: a crossdisipline view of trust," academy of management journal, vol. 3, pp. 393-404, 1998. [32] b. uzzi, "social structure and competition in interfirm networks: the paradox of embeddedness.," administrative science quarterly, pp. 35-67, 1997. [33] a. giddens, "risk, trust, reflexivity," in reflexive modernization, u. beck, et al., eds., cambridge uk: polity press, 1994, pp. 184-197. [34] r. bachmann and a. zaheer, eds., handbook of trust research. cheltenham, uk: edward elgar, 2006. [35] r. m. kramer and k. s. cook, "trust and distrust in organizations: dilemmas and approaches," in trust and distrust in organizations. dilemmas and approaches, r. m. kramer and k. s. cook, eds., new york: russel sage foundation, 2004, pp. 1-17. [36] s. m. gabbay and r. t. a. j. leenders, "social capital of organizations: from social structure to the management of corporate social capital," in social capital of organizations. vol. 18, s. m. gabbay and r. t. a. j. leenders, eds., oxford, uk: elsevier science, 2001, pp. 1-20. [37] f. fukuyama, "trust. the social virtues and the creation of prosperity," new york, vol. , p. , 1995. [38] r. putnam, bowling alone: the collapse and revival of american community. new york: simon schuster, 2000. [39] p. s. adler, "market, hierarchy, and trust: the knowledge economy and the future of capitalism," organization science, vol. 12, pp. 215234, 2001. [40] p. s. adler and c. heckscher, "towards collaborative community," in the firm as a collaborative community, c. heckscher and p. adler, eds., new york: oxford university press, 2006, pp. 11-105. [41] p. s. 
[41] p. s. adler and s.-w. kwon, "social capital: prospects for a new concept," academy of management journal, vol. 27, pp. 17-40, 2002.
[42] e. shaw, "small firm networking," international small business journal, vol. 24, pp. 5-29, 2006.
[43] b. johannison, "network strategies: management technology for entrepreneurship and change," international small business journal, vol. 5, pp. 49-63, 1986.
[44] d. krackhardt and m. kilduff, "structure, culture and simmelian ties in entrepreneurial firms," social networks, vol. 3, pp. 279-290, 2002.
[45] m. s. granovetter, "the strength of weak ties," american journal of sociology, pp. 1287-1303, 1973.
[46] m. t. hansen, et al., "so many ties, so little time: a task contingency perspective on corporate social capital," research in the sociology of organizations, pp. 21-57, 2001.
[47] d. krackhardt and r. n. stern, "informal networks and organizational crises: an experimental simulation," social psychology quarterly, vol. 51, pp. 123-140, 1988.
[48] j. coleman, "social capital in the creation of human capital," american journal of sociology, pp. 95-120, 1988.
[49] b. nardi, et al., "it's not what you know, it's who you know: work in the information age," first monday, vol. 5, http://firstmonday.org/issues/issue5_5/nardi/index.html, 2000.
[50] d. krackhardt, "the strength of strong ties: the importance of philos in organizations," in networks and organizations: structure, form and action, n. nohria and r. eccles, eds., boston: harvard university press, 1992, pp. 216-239.
[51] r. burt, brokerage and closure. an introduction to social capital. new york: oxford university press, 2005.
[52] j. podolny and j. baron, "resources and relationships, social networks and mobility in the workplace," american sociological review, pp. 673-693, 1997.
[53] c. licoppe and z. smoreda, "are social networks technologically embedded? how networks are changing today with changes in communication technology," social networks, pp. 317-335, 2004.
[54] e. m. rogers and d. l. kincaid, communication networks. toward a new paradigm for research. new york: the free press, 1981.
[55] r. mcphee and s. r. corman, "an activity-based theory of communication networks in organizations, applied to the case of a local church," communication monographs, vol. 62, pp. 132-151, 1995.
[56] c. tilly, trust and rule. cambridge: cambridge university press, 2005.
[57] l. joseph, "inter-city marketing network for women microentrepreneurs using cell phone," in i4d - information for development, http://www.i4donline.net/feb05/intercity.asp, 2005.
[58] r. ling and t. e. julsrud, "grounded genres in multimedia messaging," in the global and the local in mobile communication, k. nyiri, ed., vienna: passagen verlag, 2005, pp. 329-338.
[59] t. e. julsrud and j. w. bakke, "trust, friendship and expertise: the use of email, mobile dialogues and sms to develop and sustain social relations in a distributed work group," in the mobile communications research annual: the reconstruction of space and time through mobile communication practices, vol. 1, r. ling and s. campbell, eds., new brunswick, nj: transaction, 2008.
[60] c. haythornthwaite, "strong, weak, and latent ties and the impact of new media," the information society, vol. 18, pp. 385-401, 2002.
[61] r. yin, case study research. design and methods, third ed., vol. 5. thousand oaks: sage, 2003.
[62] m. b. miles and a. m. huberman, qualitative data analysis: an expanded sourcebook. thousand oaks: sage, 1994.
[63] j.-a. carrasco, et al., "collecting social network data to study social activity-travel behaviour: an egocentric approach," in the 85th transportation research board meeting, january 22-26, 2006, washington dc, 2006.
[64] b. wellman, "challenges in collecting personal network data: the nature of personal network analysis," field methods, vol. 19, pp. 111-115, 2007.
[65] r. h. bernhardt, et al., "informant accuracy in social network data v," social science research, pp. 30-66, 1982.
[66] t. e. julsrud, "collaboration patterns in distributed work groups: a cognitive network approach," telektronikk, pp. 60-71, 2008.
[67] j. donner, "the use of mobile phones by microentrepreneurs in kigali, rwanda: changes to social and business networks," information technology and international development, vol. 3, pp. 3-19, 2006.
[68] j. donner, "microentrepreneurs and mobiles: an exploration of the uses of mobile phones by small business owners in rwanda," information technology and international development, vol. 2, pp. 1-21, 2004.
[69] j. r. tyler, et al., "email as spectroscopy: automated discovery of community structure within organizations," the information society, vol. 21, pp. 143-153, 2005.
[70] j. larsen, et al., mobilities, networks, geographies. hampshire: ashgate, 2006.
[71] r. ling and b. yttri, "hyper-coordination via mobile phones in norway," in perpetual contact: mobile communication, private talk, public performance, j. katz and m. aakhus, eds., cambridge: cambridge university press, 2004, pp. 139-169.
[72] r. rettie, "mobile phones as social capital," mobilities, vol. 3, pp. 291-311, 2010.
[73] r. jensen, "the digital provide: information (technology), market performance and welfare in the south indian fisheries sector," quarterly journal of economics, vol. 122, pp. 879-924, 2007.
[74] j. donner and m. escobari, "a review of the research on mobile use by micro and small enterprises (mses)," in 3rd annual conference on information communication technologies and development, april 17-19, 2009, qatar education city, doha, qatar, 2009.
[75] a. jagun, et al., "the impact of mobile telephony on developing country micro-enterprise: a nigerian case study," information technologies and international development, vol. 4, pp. 47-65, 2008.
[76] j. samuel, et al., "mobile communication in south africa, tanzania and egypt: results from community and business surveys," http://www.vodafone.com/start/media_relations/news/group_press_releases/2005/press_release09_03.html, 2005.
[77] r. duncombe and r. heeks, "enterprise across the digital divide: information systems and rural microenterprise in botswana," journal of international development, vol. 14, pp. 61-74, 2002.

re-evaluation of community of inquiry model with its metacognitive presence construct

t. a. weerasinghe, k. p. hewagamage, and r. ramberg

abstract— among the discussion-content analytical tools in the field of e-learning research, the community of inquiry (coi) model is extensively applied and continuously improved by its users.
this model investigates the types of elements that are manifested through inquiry-based learning processes in online discussions: social, cognitive, teaching and metacognitive presences. these elements are essential for meaningful student interactions to take place in online learning environments. in particular, the metacognitive presence construct of the coi model reveals students' ability to self- and co-regulate learning in an online learning environment. however, the metacognitive presence construct has not been evaluated along with the other components of the model. therefore, in this paper the coi model was re-evaluated to determine its reliability in analysing discussions in online courses on information technology related subjects. the evaluation was conducted with four online courses designed and developed for a distance learning programme in sri lanka. the paper discusses the modifications that were needed to make the model more applicable for conducting discussion-content analysis in similar types of online learning environments, and reports on the results of the final evaluation. furthermore, the findings of the study imply that the theoretical framework of the coi model needs to be improved to properly enclose the metacognitive presence component. in spite of this, the study adds support to the coi model regarding its applicability and reliability in analysing online discussion content in information technology related courses.

index terms— inquiry-based learning, reliability, social presence, cognitive presence, teaching presence, metacognitive presence

i. introduction

it has become a common practice at higher educational departments conducting distance learning programmes to use e-learning to deliver instructional materials to students. in addition to the materials, an e-learning environment can provide students with platforms to engage in discussions. online discussions based on forums can be created to support both place- and time-independent communication, providing more opportunities for students to converse with other students and teachers. if a forum is kept open for discussion throughout a course, it can become a record of how students and teachers interacted with each other. at the completion of the course, this forum can serve as a rich source of information for the course coordinators and designers to analyse and understand student interactions during discussions. the content can be analysed using different types of analytical instruments in order to study factors such as student participation and interaction, as well as cognitive, metacognitive and social cues (e.g. [1] and [2]), critical thinking (e.g. [3], [4], and [5]) and group development [6].

analytical instruments are critically examined with respect to two parameters: validity and reliability. reliability is a factor determining the quality of an analytical model [7]. the reliability of an analytical model is measured with statistical techniques such as percentage agreement, cohen's kappa or pearson's correlation [8], [9]. a reliable analytical model strengthens the validity of the results of the content analysis. therefore, the results of a discussion content analysis should be preceded by an assessment of the reliability of the analytical model.
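as a hedged illustration (not part of the study's actual tooling), the sketch below shows how two of the reliability statistics named above, percentage agreement and cohen's kappa, could be computed for two coders' category assignments over the same set of messages; the example codings are hypothetical.

```python
from collections import Counter

def percentage_agreement(coder_a, coder_b):
    """proportion of units that the two coders assigned to the same category."""
    assert len(coder_a) == len(coder_b)
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(coder_a)
    p_o = percentage_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # expected agreement if both coders assigned categories independently
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
              for c in set(coder_a) | set(coder_b))
    return (p_o - p_e) / (1 - p_e)

# hypothetical codings of ten messages with cognitive presence categories
a = ["triggering", "exploration", "exploration", "integration", "exploration",
     "resolution", "integration", "exploration", "triggering", "integration"]
b = ["triggering", "exploration", "integration", "integration", "exploration",
     "resolution", "exploration", "exploration", "triggering", "integration"]

print(percentage_agreement(a, b))  # 0.8
print(cohens_kappa(a, b))          # about 0.71
```

in this made-up example the coders agree on 8 of the 10 messages, giving a percentage agreement of 0.8 and a kappa of roughly 0.71 once chance agreement is corrected for.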
since most of the issues associated with validity and reliability can be mitigated with sound analytical models having "discrete categories, and clear indicators" [10, p. 2], analysts should try to improve analytical models by appropriately modifying their lists and definitions of categories and indicators.

among the instruments used for online discussion content analysis, the model based on the community of inquiry (coi) framework is extensively used in the field of online learning (e.g. [11] and [12]), and it has been continually refined and adapted by researchers [13]. the coi framework mainly focuses on the nature of the educational transaction in an online learning environment, and it emerged in the context of asynchronous text-based online discussions [14]. the framework reflects the critical thinking processes that can exist in an online discussion. the analytical tool based on this framework is named the community of inquiry (coi) model [15], [16]. it can be used to identify social, cognitive, teaching and metacognitive presence in online discussions. while social, cognitive and teaching presence are all essential to foster meaningful interactions, cognitive presence is considered to be the most important element for identifying critical thinking activities in a coi [4]. metacognitive presence elements are closely linked to critical thinking and higher learning in a discussion [16].

reviewing research on the coi framework and its analytical model, garrison and arbaugh [15] report that there is a need to evaluate the model with all its components and to further study the relationships between the components of the coi theoretical framework. moreover, they encourage future research to evaluate the coi model and the framework in disciplines other than education. in an attempt to meet this requirement, shea et al. [17] have examined the coi model, excluding the metacognitive presence component, with minor modifications. however, their findings also highlight the necessity of improving the analytical model further, in particular to enhance the reliability of the social presence construct. in addition, a study by akyol and garrison [16] introduced a new component, named 'metacognitive presence', to the coi model, which has not been evaluated with the rest of the components of the model. metacognitive skill is considered essential for students learning in distance/online learning environments [18]. studying metacognitive presence in online discussions can reveal students' latent knowledge and regulatory skills, further enabling researchers to discern student learning processes in online learning environments [16]. this understanding can help in designing self-directed online learning activities for distance learning programmes. therefore, we decided to re-evaluate the coi model with its metacognitive presence construct. this paper reports on our research to re-evaluate, and attempt to improve, the coi model with its four components (social, cognitive, teaching and metacognitive presence) in order to make it more reliable for analysing forum discussions in online courses.

a. purpose

the coi model has become a useful tool for investigating students' interactions in inquiry-based online discussions [19], [17]. however, it needs further improvements to make it a more reliable tool for investigating students' learning in online discussions [20].
furthermore, the model and the framework have not been examined together with the new component, the metacognitive presence construct. the new construct has the potential to disclose latent knowledge and regulatory skills which cannot be identified with the cognitive presence construct alone. therefore, we wish to determine whether we can reliably use the metacognitive construct along with the rest of the components of the coi model. for this purpose, we evaluate the coi model by analysing a set of discussions selected from four online courses.

b. theoretical framework and model

from its inception, the community of inquiry (coi) framework has had three components enclosing three types of elements essential for meaningful interactions to take place in a community [4]. the three types of elements, and the components, are named social, cognitive and teaching presence. the teaching presence component represents the role of the teacher, which is often carried out with the collaboration of a number of individuals who are not teachers [21]. in [22], shea and bidjerano recommended that this framework should consider the online learner's self-regulatory learning behaviour and self-efficacy. in line with this reasoning, they suggested adding another element to the framework: learner presence. this stood independently from the rest of the components of the framework. akyol and garrison, however, did not agree with this modification, reporting that "creation of a learning presence construct would implicitly assign teaching presence to only that of the teacher" and that it "is incongruent with the premise of a community of inquiry" [16, p. 189]. instead, they introduced the metacognitive presence component, which was placed at the intersection of the teaching and cognitive presences of the coi framework (see fig. 1).

fig. 1. community of inquiry (coi) framework (adapted from [16]).

in accordance with this framework, the coi model has four coding schemes: social, cognitive, teaching and metacognitive presence. social presence is defined as "the ability of learners to project themselves socially and emotionally in a coi" [4, p. 94]. the social presence coding scheme has three categories: affective, open communication and group cohesion. garrison, anderson and archer define these categories respectively "in terms of the participants identifying with the community, communicating purposefully in a trusting environment and developing interpersonal relationships" [14, p. 7]. cognitive presence is the main component of the coi framework. it describes "the extent to which learners are able to construct and confirm meaning through sustained discourse in a critical community of inquiry" [23, p. 1]. the cognitive presence coding scheme has four categories that represent the phases of an inquiry process in a collaborative learning environment: triggering event, exploration, integration and resolution. the triggering event is the initiation phase of a critical inquiry, where an issue, dilemma or problem is identified or recognised. the next phase is exploration, where learners try to grasp the nature of the problem and move on to explore relevant information. in the integration phase, learners construct meaning from the ideas generated in the exploratory phase. the last phase of the critical inquiry model is resolution; it indicates a resolution of the dilemma or problem that caused the triggering event.
teaching presence is defined as "the design, facilitation and direction of cognitive and social processes for the purpose of realizing personally meaningful and educationally worthwhile learning outcomes" [21, p. 5]. teaching presence represents the role of teaching, which is carried out through the collaborative involvement of participants in a community [21]. this component of the analytical tool has three categories: design and organisation, facilitating discourse and direct instruction. the design and organisation category considers the role of a teacher during the designing and planning process of online learning activities, while the other two categories, facilitating discourse and direct instruction, investigate signs of teaching presence during students' engagement in learning activities.
metacognitive presence is the new component that akyol and garrison [16] introduced to the coi model. referring to contemporary research into metacognition and learning (e.g. [24] and [25]), they defined metacognition in an online learning community as "the set of higher knowledge and skills to monitor and regulate manifest cognitive processes of self and others" [16, p. 184]. moreover, motivational states for learning are considered in describing metacognitive presence in online discussions. the component has three categories: knowledge of cognition (kc), monitoring of cognition (mc) and regulation of cognition (rc). according to [16], kc "refers to awareness of self as a learner in a broad sense", where "knowledge includes entering knowledge and motivation associated with the inquiry process, academic discipline, and expectancies" (p. 184). kc characterises pre-task metacognitive states, in other words, more general aspects of metacognition that can be observed at any time. in comparison, mc and rc represent activity-based metacognitive states, which can be observed during the learning process.
ii. context
the examination of the coi framework and model was carried out by analysing eight discussion threads in the online learning environment of the bit (bachelor of information technology) degree programme (www.bit.lk) at the university of colombo school of computing (ucsc), sri lanka. the students of the degree programme did not receive any lectures or feedback from the university teachers on site or in a physical lecture room. instead, the university distributed course materials using a virtual learning environment. the online courses were designed and developed by teams of content developers and instructional designers at the e-learning centre of the ucsc. there were discussion forums for each course, which provided an opportunity for the students to discuss their concerns. a facilitator was there to assist students in the forums to find answers to their problems. the courses were offered in english, and the recommended language for the online discussions was english. even though english language competency was considered an entry requirement for the bit programme, it was not the first language of the majority of the students.
therefore, their understanding of others' responses differed from that of native english speakers, and their expressions were biased towards their mother tongue.
iii. method
the model was examined by evaluating it twice. the first evaluation was conducted using the coi model improved by shea et al. [17]. this model is composed of three separate coding schemes for investigating indications of social, cognitive and teaching presence. along with these, we used a coding scheme for identifying signs of metacognitive presence. it was developed based on the classification of metacognitive presence elements in online discussions by akyol and garrison [16]. the first evaluation was carried out by analysing four discussion threads (sample 1) randomly selected from four online courses in the bit programme. the coding schemes were improved based on the comments of the coders and the issues that arose in the first evaluation. the improved coding schemes were re-evaluated by analysing eight online discussions, consisting of two online discussions from each of the four courses (see table i). these eight included the four discussion threads (sample 1) that we had previously analysed in the first evaluation and four other randomly selected discussion threads (sample 2).
a. discussion threads
there were altogether 99 student messages (s-posts) and 17 facilitator messages (f-posts) in the selection of eight discussions. this represented more than 10% of the total number of messages in the discussion threads that had at least five messages and dealt with student inquiries about the online courses. as reported in table i, the courses covered different types of subject content.
table i. number of messages (posts) in the samples
course | course description | sample 1 (s-posts / f-posts) | sample 2 (s-posts / f-posts)
c1 | a theoretical subject with many concepts, definitions and descriptions | 16 / 0 | 9 / 1
c2 | composed of theoretical and practical subject content; included content related to mathematics and digital logic | 14 / 2 | 10 / 1
c3 | covered more practical than theoretical content; the students were expected to use open office applications | 14 / 3 | 20 / 6
c4 | composed of theoretical and practical subject content; the course is about the internet and the world wide web | 7 / 2 | 9 / 2
b. coders and coding process
we realised that our students had attempted to form their messages in english while thinking in sinhala. as a result, they had used phrases that could not be understood by a coder who did not know sinhala well. therefore, we selected the coders from sri lanka. both of them were university teachers of information technology in the country. one was in the research team and the other was an external researcher who participated only in the evaluation. the coders were instructed to follow the same coding procedure that shea et al. [17] applied in their study re-examining the coi model. additionally, during the negotiation discussions, when there were disagreements between the two coders, the reasons for the disagreement were enquired into and noted down. the coders' comments and suggestions for improving the coding schemes were also gathered.
c. unit of analysis
garrison, anderson and archer [22] stated that the message-level unit was the most appropriate unit of analysis for identifying cognitive presence in their discussions. as messages are demarcated clearly in a discussion, it is easy to consider messages as the unit of analysis if students have posted only one idea per message.
however, in our case, students formed messages including more than one idea in one message. also, they had not structured the messages into paragraphs with one idea per paragraph. therefore, we decided to use the chunk of a message as the unit of analysis. a chunk could be a complete message or a meaningful segment of a message, carrying a cue of a presence that is described in the coi model.
d. inter-rater reliability measurements
the coding decisions of the two coders were evaluated for inter-rater reliability using cohen's kappa (k) and holsti's coefficient of reliability (cr). the reason for applying two reliability measurements, k and cr, was to eliminate the weaknesses associated with individual reliability measurements and, thereby, to increase the validity of the results. cohen's kappa values were calculated using the equation in [26, p. 155]. holsti's coefficient of reliability was calculated referring to [27, p. 140]. in reference to prevailing research, rourke, anderson, garrison and archer [28] report that inter-rater reliability should be more than 0.8000 for rc and more than 0.7500 for k in order to be considered a very good agreement. the results of the present evaluation are discussed with respect to this satisfactory level of agreement.
iv. results
the evaluation of the existing coi model, with its three components (social, cognitive and teaching presence) and the coding scheme prepared on the basis of the analytical model of metacognitive presence, did not result in initial irr values at satisfactory levels. in particular, the cognitive and metacognitive presence coding schemes were less reliable than the other two coding schemes. the initial irr values of the social presence coding scheme ranged from rc = 0.5333-0.8354 and k = 0.3182-0.6393, while those of the teaching presence coding scheme ranged from rc = 0.2857-0.8000 and k = -0.1667 to 0.5714. also, a considerable number of negotiated irr values were below rc = 0.8000 and k = 0.7500. for instance, the negotiated irr of the cognitive presence coding scheme ranged from rc = 0.6522-0.7957 and k = 0.5014-0.6513. this signified that there was poor agreement (according to [29]) between the two coders. furthermore, the comments of the coders revealed that they had encountered difficulties in following the coding schemes.
a. difficulties and suggested modifications
in order to make the coding schemes valid and highly reliable, the schemes should be comprehensible and easy to use. this can be achieved by adding more meaning and clarity to the descriptions of categories and indicators [10]. doing this should ensure that the components of the model (presences) are properly described by the categories and that the indicators have sufficient detail to "reflect the essence of the categories" [20, p. 68]. with this insight, and considering the difficulties faced by the coders and their suggestions for improving the coding schemes, necessary modifications were introduced to the analytical model. the modifications made to the components of the model, and their rationales, are discussed below.
social presence coding scheme
during the first evaluation, the coders experienced difficulty in distinguishing conventional expressions from unconventional ones. also, there were indications of students expressing emotions and tone of voice by using big letters, capital letters or coloured text.
therefore, the two indicators, conventional and unconventional expressions, were combined, and the definition of the combined indicator was modified accordingly to include other signs of affective expression. the coders had long discussions that ended up in disagreements. for instance, on one occasion the coders did not come to an agreement due to the ambiguity of two indicators: 'expressing emotions' in the affective category and 'expressing appreciation' in the open communication category of the social presence coding scheme. the coders commented that it was not easy to determine whether an expression of appreciation was free of emotion. for instance, they found chunks such as 'wow!' and 'best!', which could be interpreted both as appreciations and as emotions. therefore, we decided to modify the indicator under open communication to 'encouraging or complimenting'. the coders also revealed that they were not sure what 'expressing values' exactly meant. shea et al. [17] had also noted that the highly subjective nature of 'expressing values' caused problems for the reliability of the social presence coding scheme. therefore, we decided to clarify what 'values' specifically means. further, the comments of the coders suggested that 'expressing values' could also be considered in the definition of the 'self-disclosure' indicator. therefore, we included 'expressing values' in the 'self-disclosure' indicator and modified the definition of 'self-disclosure' to consider personal values such as beliefs, vision, attitudes and interests. there were a considerable number of clues that urged us to create two other indicators for the open communication category: 'accepting mistakes', for the recognition of each other's contribution, and 'expressing willingness to support'. moreover, we modified the indicator 'asking questions' to 'requesting support', because we determined that it would be more suitable for identifying students' requests for clarifications and information, which might not be in the form of a question. the examples provided for each category indicator of the social presence coding scheme were not helpful to our team. therefore, we created examples relevant to our context and added them to clarify each category indicator.
cognitive presence coding scheme
during the evaluation, the coders found it very difficult to distinguish the chunks to be matched with the 'exploration' and 'integration' categories in the tool. they commented on the ambiguity of the information in the coding scheme and suggested that the socio-cognitive processes and the examples provided in the coding scheme should be improved. also, the coders noted that the instructions given in the coding scheme were not clear; the instructions implied that some chunks could be matched both with the 'integration' and the 'exploration' categories. these comments, and the issues that the coders encountered, were considered in improving the coding scheme, and the modifications were made by referring to the explanations available in [4], [30] and [14]. the coders encountered messages containing expressions of satisfaction after solving the problem which caused the 'triggering event'. these messages could be interpreted as clues of resolution. therefore, the coders suggested that we add another indicator, 'judging or evaluating and expressing satisfaction', to the 'resolution or application' category.
additionally, the text in the examples column was replaced with a few sample codes that were more familiar to our context (see appendix). five questions or problems were added to exemplify plausible causes of triggering events. probable replies to these questions were added as examples for exploration, integration and resolution. our intention was to make the analytical tool more easily understood by the coders. also, we removed the extra instructions provided in the descriptor column to let the coders look freely for relevant clues, segment messages and make coding decisions.
teaching presence coding scheme
the teaching presence component of the model was evaluated by analysing both the facilitator's and the students' messages in four discussion threads. this signified the importance of changing the category indicators to make the tool more applicable for identifying signs of teaching presence that emerged with the students' teaching activities during the discussions. the first category of the teaching presence construct is 'designing and organisation' (do). shea et al. [17] had six indicators for this category. the first two indicators were 'setting curriculum and communicating assessment methods to be used in the course' and 'designing methods'. the last was 'macro-level comments about the course', which was defined as 'provides rationale for assignment/topic'. these indicators were found to be irrelevant to our context because, in our courses, neither the facilitator nor the students were supposed to set the course curriculum and assignments, design learning activities or provide macro-level instructions for doing the course activities. these kinds of course planning and designing activities had been carried out by the instructional designers before the courses started. however, the facilitator could use the discussion forums to post announcements relevant to the subject under discussion. therefore, we added a new indicator, 'informing notices', to the do category.
based on the comments of the coders, the two indicators in the 'facilitating discourse' category were slightly modified to make them more suitable to the context. the modified indicators were 'encouraging, acknowledging or reinforcing student contribution' and 'drawing in participants and prompting discussions' (fd-d). the discussions in the environment did not have any time limitations or restrictions; therefore, the second indicator, fd-d, could also be considered as encouraging student participation. for this reason, in order to make the two indicators clearly comprehensible and applicable to the environment, we slightly modified the indicators to 'acknowledging or reinforcing student contribution' and 'encouraging or motivating students'.
the list of indicators in the direct instruction (di) category was modified by adding two new indicators: 'providing specific instructions' and 'encouraging doing activities'. these were determined during the first evaluation, where the coders noticed that the students had provided task-specific instructions to peers and encouraged them to try out challenging activities. moreover, based on the findings of the evaluation, two other indicators, 'offering useful illustrations' and 'making explicit references', were modified by adding 'or examples' and 'providing extra learning resources' respectively.
the existing model had another indicator, 'supplying clarifying information', which was quite similar to the two other indicators that were modified. however, the definition of the indicator described the teaching role of providing additional explanations. therefore, in order to reduce the ambiguity of the indicator and make it more meaningful, the existing indicator was replaced with 'providing additional explanations'. moreover, the comments of the coders urged the importance of replacing the existing indicators of the assessment category with three new indicators: 'judging, evaluating or checking the relevancy of message content', 'assessing another student's knowledge' and 'assessing an activity reported or presented in a message'.
metacognitive presence coding scheme
the coders suggested that we should add examples to clarify the meaning of the indicators in the coding scheme. they reported that it was difficult to understand the meaning of the term 'knowledge of cognition' and to differentiate the meanings of 'monitoring cognition' and 'regulation of cognition'. therefore, in order to make the metacognitive presence coding scheme easily comprehensible, a set of examples was added to each category. appropriate examples were selected from the sample coding in [16] and the rest were created. furthermore, we removed some descriptive terms and added two indicators to the coding scheme. according to [31], knowledge of cognition (kc) incorporates general learning strategies and tasks that describe when, why or how one can perform an activity. in line with this reasoning, akyol and garrison reported that "kc includes knowledge about cognition, cognitive strategies and tasks" [16, p. 184]. however, in order to describe the category definitions of 'monitoring cognition' and 'regulation of cognition' (rc), they used the terms 'task knowledge' and 'strategies' respectively, and that was incongruent with the description provided by pintrich. therefore, we removed the terms that created confusion and added one more indicator, 'knowledge about general learning strategies and tasks', to the kc category. also, we modified the list of indicators in the rc category by adding 'suggesting taking an action' under the list of 'applying strategies'. this addition was made after noticing that our students had posted messages including statements such as "let's discuss about open office software" and "it's better to give up". these statements could be interpreted as suggestions to take actions during the application of a strategy and thus considered as clues of rc.
b. results of the 2nd evaluation
the coding schemes with modifications were re-evaluated for their reliability. this second evaluation (eval 2) was conducted two months after the first evaluation (eval 1). the cognitive presence component of the coi framework was explained to the coders with reference to [4]. the same set of discussion threads (sample 1) was given to the coders and the same procedure was applied during the analysis. finally, a new set of discussions (sample 2) was analysed to determine whether we could use the adapted analytical model reliably to analyse online discussions in our courses. the coders analysed 99 students' messages and 17 facilitator's messages.
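before turning to the results, the following is a minimal sketch, our own illustration rather than the exact procedures referenced in [26] and [27], of how holsti's coefficient of reliability and cohen's kappa can be computed from two coders' paired, chunk-level decisions. the category labels and decisions in the example are hypothetical.

```python
from collections import Counter


def holsti_cr(coder_a: list[str], coder_b: list[str]) -> float:
    """Holsti's coefficient of reliability: CR = 2M / (N1 + N2), where M is
    the number of agreed decisions and N1, N2 are the numbers of decisions
    made by each coder (equal here, since decisions are paired chunk by chunk)."""
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 2 * agreements / (len(coder_a) + len(coder_b))


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa: k = (Po - Pe) / (1 - Pe), where Po is the observed
    agreement and Pe is the agreement expected by chance from each coder's
    marginal category frequencies."""
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b))
    return (p_o - p_e) / (1 - p_e)


# hypothetical chunk-level decisions for one discussion thread
coder_1 = ["affective", "open communication", "open communication", "group cohesion"]
coder_2 = ["affective", "open communication", "affective", "group cohesion"]
print(round(holsti_cr(coder_1, coder_2), 4))     # 0.75
print(round(cohens_kappa(coder_1, coder_2), 4))  # 0.6364
```

with paired decisions like these, cr rewards raw agreement, while kappa discounts the agreement that would be expected by chance from each coder's category frequencies, which is why the two measures are reported together in the results below.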
they identified different numbers of chunks, and altogether there were 599 coding decisions that had to be checked or discussed for negotiation. there was a significant increase even in the initial irr values of all the coding schemes (see graphs 1 and 2). the negotiated irr values of the schemes ranged from 0.9000 to 1.0000.
graph 1. irr in the rc measurement.
graph 2. irr in the k measurement.
the reliability values obtained in the evaluation conducted with the new set of discussion threads (sample 2) are shown in table ii.
table ii. irr values of the evaluation with sample 2
coding scheme | course | initial irr (rc) | initial irr (k) | negotiated irr (rc) | negotiated irr (k)
social presence | c1 | 1.0000 | 1.0000 | 1.0000 | 1.0000
social presence | c2 | 0.8814 | 0.6714 | 1.0000 | 1.0000
social presence | c3 | 0.8837 | 0.7283 | 1.0000 | 1.0000
social presence | c4 | 0.8889 | 0.7101 | 0.9744 | 0.9242
teaching presence | c1 | 0.9091 | 0.7500 | 1.0000 | 1.0000
teaching presence | c2 | 0.8750 | 0.8706 | 1.0000 | 1.0000
teaching presence | c3 | 0.9600 | 0.8870 | 1.0000 | 1.0000
teaching presence | c4 | 0.8750 | 0.6842 | 1.0000 | 1.0000
cognitive presence | c1 | 0.8235 | 0.6471 | 1.0000 | 1.0000
cognitive presence | c2 | 0.5000 | 0.3402 | 1.0000 | 1.0000
cognitive presence | c3 | 0.8679 | 0.7707 | 0.9600 | 0.9356
cognitive presence | c4 | 0.7742 | 0.6583 | 1.0000 | 1.0000
metacognitive presence | c1 | 0.7826 | 0.5702 | 0.9600 | 0.8818
metacognitive presence | c2 | 0.8182 | 0.5560 | 0.9697 | 0.9040
metacognitive presence | c3 | 0.7179 | 0.4413 | 1.0000 | 1.0000
metacognitive presence | c4 | 0.9143 | 0.8182 | 1.0000 | 1.0000
most of the irr values of the evaluation conducted with sample 2 reached levels of agreement above rc = 0.8000 and k = 0.5000. the negotiated irr values ranged from rc = 0.8600-1.0000 and k = 0.8818-1.0000 (see table ii). these can be interpreted as very good agreement between the two coders. therefore, we can assume that the modifications we made improved the schemes. moreover, all the coding schemes seemed well applicable in our context; each coding scheme supported the identification of a considerable number of clues. however, though they were encountered in only a few instances, there were coder disagreements regarding decisions related to the analysis using the social, cognitive and metacognitive presence coding schemes. in one instance, the coders had a disagreement regarding a decision pertaining to the 'open communication' and the 'affective' categories of the social presence coding scheme. the reason for this discrepancy was the ambiguity of the 'expressing emotions' indicator in the affective category and the 'expressing appreciation' indicator in the 'open communication' category. the disagreement related to the cognitive presence coding scheme was due to the lack of clarity of a message posted by a student: the two coders interpreted it in two different ways, which led one coder to match the whole message with the 'exploration' category while the other coder matched it with the 'integration' category. each category of the metacognitive presence coding scheme could capture more than 29% of the clues out of the total number of metacognitive presences that could be identified; however, there were 4% disagreements. one coder explained that those chunks could be considered as signs of bringing previously acquired knowledge into the discussion and thus could be matched with the 'knowledge of cognition' category. however, the other coder disagreed with the decision, saying that there was not enough information to consider those chunks as having signs of previously acquired knowledge.
c. relationship between the metacognitive presence construct and the other components
the metacognitive presence coding scheme captured a considerable number of chunks that could also be captured either by the teaching presence component or by the cognitive presence component of the tool.
moreover, we found that some of the chunks that had been identified using the cognitive presence or the teaching presence coding schemes were segmented further when matching them with the categories in the metacognitive presence coding scheme. also, there were chunks that did not necessarily belong to the chunks with clues of cognitive presence or teaching presence, but which had signs of metacognitive presence. for instance, in one discussion, the students considered what a closed system and an open system could be and tried to determine whether a prison could be a closed system or an open system. one student, who was found to have integrated his idea with others' opinions, added: "am i correct? ...prison: open prison, closed prison. you can't explain these two types of prison only in a closed system. i think this question makes some kind of confusion when thinking too much. it's better to give up." the last part of this message, "i think this question...", could be treated separately from the rest of the message. during the analysis using the cognitive presence coding scheme, one coder suggested that it might be considered as a resolution; she reasoned that it might be interpreted as an attempt to seek a resolution or to end the discussion. the other coder did not agree, and both decided to ignore that line in the message. however, when using the metacognitive presence coding scheme, both coders divided this line into two chunks and matched the first, "i think this...too much", with the 'monitoring of cognition' category and the second, "it's better to give up", with the 'regulation of cognition' category.
in another instance, a student provided a detailed description of binary arithmetic and reported: "...if you feel anything not clear here please contact me. it is a pleasure to help you! i too had to struggle for days to have a good understanding. so, never give up! wish you good luck!" the coders matched the chunk "i too had to struggle for days to have a good understanding!" with categories in the metacognitive and social presence coding schemes, but not with any of the categories in the teaching or cognitive presence schemes.
d. typical issues and guidelines
methodological issues in online discussion content analysis have frequently been discussed in the literature (e.g. [32], [10] and [14]). in such articles, suggestions and advice that analysts can adhere to in order to handle issues related to reliability have been proposed: for instance, incorporating multiple coders, using cohen's kappa to compensate for chance agreement, and using triangulation methods to increase the validity of the results. however, there is still a need for easily applicable instructions that can support novice analysts in achieving valid results. during our analysis, the main problem that we encountered was associated with the issue of culture. the students seemed to have formed messages in english while thinking in their mother tongue; therefore, the coders had to pay extra attention when interpreting messages in the discussion threads. further, the comments of the coders and our own experience emphasised the importance of taking the necessary steps to reduce the difficulty of analysing online discussion content in general. this resulted in the formulation of guidelines that can be followed in analysing online discussions. these guidelines are discussed below.
reformulate messages where it is essential
in some cases we had to reformulate the messages to make them more comprehensible. therefore, we suggest that the analyst of discussion threads should read all the discussion threads and try to understand the discussion before starting the analysis. while doing this, the analyst can carefully improve the clarity of the messages where it is essential.
study the context of the discussion
in order to understand a discussion, the analyst may need to know information related to the context of the discussion, which can be obtained from the online course environment where the discussion emerged. this contextual information will probably be essential for making decisions during the coding process. therefore, if discussions are in printed form, the analyst should go back to the online learning environment and study the context of each discussion before starting the analysis.
understand the inquiry process
the reason for using the coi model was to understand the student inquiry processes that emerged for the purpose of solving problems related to the subjects covered in the online courses. this understanding was mainly connected to the analysis conducted using the cognitive presence coding scheme. therefore, analysts who wish to use the analytical tool of the coi framework should have a thorough understanding of the coi framework and the inquiry process.
comprehend the coding schemes
analysts should be able to comprehend not only the inquiry process and the coi framework, but also the category definitions and indicators in each coding scheme of the coi model. this will aid the analysts in investigating the chunks more precisely and, as a result, increase the reliability of the instrument. when there is more than one coder working on the analysis, they should grasp the instructions and information in the coding schemes together and build up a mutual understanding of the coding schemes.
consider only one coding scheme at a time
the coi model has four components: social, cognitive, teaching and metacognitive presence. each of these components has three or four categories, and altogether there are fourteen categories. hence, it is not easy for a coder either to remember all the categories and their definitions or to refer back and forth to the schemes during the analysis. therefore, analysts who are interested in investigating all the elements covered in the coi model should use only one coding scheme at a time until all the discussions are analysed. this process should then be repeated with each of the remaining coding schemes.
double check the work
the coders who participated in our evaluation missed a considerable number of clues that could have been identified with the model. this emphasises the importance of rechecking the analysis with a coding scheme before going on to work with the next scheme.
v. discussion
the evaluation reported on in this paper aimed to determine whether the coi model could be used to analyse online discussion content in a learning environment prepared for distance learners in an asian country. the evaluation was conducted with a sample set of discussion threads in four online courses that covered subject content relating to information technology.
three of these courses (courses 2, 3 and 4) had practical as well as theoretical subject content, while the other course did not have any laboratory-based learning activities.
a. reliability of the analytical tool
the model was adapted considering our experience and the suggestions brought out by contemporary researchers for better reliability of the model. for instance, rourke, anderson, garrison and archer [33], who developed the social presence analytical tool, examined it by analysing discussions in two graduate-level courses, one in workplace learning and the other in distance learning. they reported that there were issues in investigating clues of expressions of emotions and humour. shea et al. [17] used an adapted version of the coi model and re-examined the model by analysing discussion content from two courses in business management at a state college in the united states. they experienced problems with the indicator 'expressions of values' and with other indicators that rourke et al. [33] had also found problematic in the social presence coding scheme. furthermore, we solved another issue encountered by our coders due to the ambiguity of two indicators, 'expressing emotions' in the affective category and 'expressing appreciation' in the open communication category. after making the necessary modifications and re-evaluating each construct of the analytical model, we could ensure that the constructs were more reliable than before. the reliability values of the adapted model were at higher levels than the reliability values found by shea et al. [17]. therefore, we believe that the coi model with our modifications can be used reliably to analyse discussions in the online courses. more specifically, the results of the current study imply that the modifications made to the coding schemes are appropriate and relevant for conducting discussion content analysis in our online courses. further research is welcome to affirm that the adapted model is also appropriate and reliable in other disciplines.
b. theoretical framework and metacognitive presence
the theoretical framework of the coi analytical model is composed of three major components: social presence, cognitive presence and teaching presence. akyol and garrison [16] introduced a new component, 'metacognitive presence', to the theoretical framework, and they placed it at the intersection of the cognitive and teaching presence components of the framework. however, in our evaluation, we found cues of metacognitive presence which could be matched neither with cognitive presence nor with teaching presence. this implies that metacognitive presence does not fall only within the intersection of teaching presence and cognitive presence; it goes beyond the boundaries of cognitive and teaching presence. therefore, the findings of the present study imply that the coi framework may need further improvements to properly enclose the metacognitive presence construct.
vi. conclusion
the present study was conducted to evaluate the coi analytical model, and the findings affirm that the model with our modifications was reliable and more applicable for analysing online discussions in our context. based on the experience of our coders, a set of guidelines was formulated to lessen the difficulties pertaining to online discussion content analysis in general. consequently, the validity of the results that analysts can obtain with this improved model, and by following the set of guidelines, will be increased.
therefore, this study adds more value to the coi model and suggests that researchers will be able to use this model as a highly useful and reliable analytical tool. hopefully, the findings of future research using this model will bring more practical and fruitful suggestions to enhance students' learning experience in online learning environments. the coding process that we employed in the present study was very time consuming. this signifies the need for future research to develop application software to automate the analytical process and thereby evaluate students' learning in online discussions. for instance, the improved coi model could be used to develop an application that automatically rates students' messages as soon as they are posted to online discussions. this would help teachers to evaluate student activities in online discussions more efficiently; consequently, students would be motivated to participate in online discussions and to actively engage in inquiry-based learning. moreover, the present study encourages future research to investigate possible improvements to the theoretical framework of the coi model so that it properly encloses the metacognitive presence construct, which is useful for revealing information related to distance and online learning environments.
acknowledgement
the study is financially supported by the swedish international development cooperation agency through the national e-learning centre project at the university of colombo, sri lanka. our thanks go to mrs. t. karunaratna, who participated in the evaluation as the external coder, and to those who provided feedback to improve the coding schemes.
references
[1] henri, f. (1992). collaborative learning through computer conferencing: the najaden papers. (ed. a. r. kaye) pp. 117-136. springer-verlag, berlin.
[2] mcdonald, j. (1998). interpersonal group dynamics and development in computer conferencing: the rest of the story. in proceedings of the 14th annual conference on distance teaching and learning (pp. 243-248). madison, wi: university of wisconsin-madison. [eric document ed422864]
[3] newman, g., webb, b., & cochrane, c. (1995). a content analysis method to measure critical thinking in face-to-face and computer supported group learning. interpersonal computing and technology, 3(2): 56-77. available: http://qcite.qub.ac.uk/handle/123456789/5895
[4] garrison, d. r., anderson, t., & archer, w. (2000). critical inquiry in a text-based environment: computer conferencing in higher education. the internet and higher education, 2(2-3): 87-105.
[5] fahy, p. j., crawford, g., ally, m., cookson, p., keller, v., & prosser, f. (2000). the development and testing of a tool for analysis of computer-mediated conferencing transcripts. the alberta journal of educational research, 66(1): 85-88.
[6] mcdonald, j., & gibson, c. c. (1998). interpersonal dynamics and group development in computer conferencing. the american journal of distance education, 12(1): 7-25.
[7] de wever, b., schellens, t., valcke, m., & van keer, h. (2006). content analysis schemes to analyze transcripts of online asynchronous discussion groups: a review. computers and education, 46(1): 6-28.
[8] krippendorff, k. (2004). content analysis: an introduction to its methodology (2nd ed.). thousand oaks, ca: sage.
[9] lombard, m., snyder-duch, j., & bracken, c. c. (2004). practical resources for assessing and reporting intercoder reliability in content analysis research projects. available: http://www.temple.edu/mmc/reliability
[10] garrison, d. r., cleveland-innes, m., koole, m., & kappelman, j. (2006). revisiting methodological issues in the analysis of transcripts: negotiated coding and reliability. the internet and higher education, 9: 1-8.
[11] arbaugh, j. b., bangert, a., & cleveland-innes, m. (2010). subject matter effects and the community of inquiry (coi) framework: an exploratory study. the internet and higher education, 13(1-2): 37-44.
[12] swan, k., & ice, p. (2010). the community of inquiry framework ten years later: introduction to the special issue. the internet and higher education, 13(1-2): 1-4.
[13] buraphadeja, v., & dawson, k. (2008). content analysis in computer-mediated communication: analyzing models for assessing critical thinking through the lens of social constructivism. american journal of distance education, 22(3): 130-145.
[14] garrison, d. r., anderson, t., & archer, w. (2010). the first decade of the community of inquiry framework: a retrospective. the internet and higher education, 13(1-2): 5-9.
[15] garrison, d. r., & arbaugh, j. b. (2007). researching the community of inquiry framework: review, issues, and future directions. the internet and higher education, 10(3): 157-172.
[16] akyol, z., & garrison, d. r. (2011). assessing metacognition in an online community of inquiry. the internet and higher education, 14(3): 183-190.
[17] shea, p., hayes, s., vickers, j., gozza-cohen, m., uzuner, s., mehta, r., & rangan, p. (2010). a re-examination of the community of inquiry framework: social network and content analysis. the internet and higher education, 13(1-2): 10-21.
[18] puzziferro, m. (2008). online technologies self-efficacy and self-regulated learning as predictors of final grade and satisfaction in college-level online courses. the american journal of distance education, 22: 72-89.
[19] mckerlich, r., & anderson, t. (2007). community of inquiry and learning in immersive environments. journal of asynchronous learning networks, 11(4). available: http://www.incubatorisland.com/htmlobj-215/coi_in_immersive_environments.pdf
[20] garrison, d. r. (2007). online community of inquiry review: social, cognitive, and teaching presence issues. journal of asynchronous learning networks, 11(1): 61-72.
[21] anderson, t., rourke, l., garrison, d. r., & archer, w. (2001). assessing teaching presence in a computer conferencing context. journal of asynchronous learning networks, 5(2): 1-17.
[22] shea, p., & bidjerano, t. (2010). learning presence: towards a theory of self-efficacy, self-regulation, and the development of a communities of inquiry in online and blended learning environments. computers & education, 55(4): 1721-1731.
[23] garrison, d. r., anderson, t., & archer, w. (2001b). critical thinking and computer conferencing: a model and tool to assess cognitive presence. available: http://auspace.athabascau.ca:8080/dspace/bitstream/2149/740/1/critical_thinking_and_computer.pdf
[24] pintrich, p. r., wolters, c., & baxter, g. (2000). issues in the measurement of metacognition. (eds. g. schraw & j. impara) pp. 43-97. lincoln, ne: buros institute of mental measurements.
[25] larkin, s. (2009). socially mediated metacognition and learning to write. thinking skills and creativity, 4(3): 149-159.
[26] neuendorf, k. a. (2002). the content analysis guidebook. california: sage publications.
[27] holsti, o. r. (1969). content analysis for the social sciences and humanities. reading, ma: addison-wesley publishing company.
[28] rourke, l., anderson, t., garrison, d. r., & archer, w. (2001). methodological issues in the content analysis of computer conference transcripts. international journal of artificial intelligence in education, 12: 8-22. available: http://hal.archives-ouvertes.fr/docs/00/19/73/19/pdf/rourke01.pdf
[29] banerjee, m., capozzoli, m., mcsweeney, l., & sinha, d. (1999). beyond kappa: a review of inter-rater agreement measures. canadian journal of statistics, 27(1): 3-23.
[30] garrison, d. r., anderson, t., & archer, w. (2001a). critical thinking, cognitive presence and computer conferencing in distance education. american journal of distance education, 15(1): 7-23.
[31] pintrich, p. r. (2002). the role of metacognitive knowledge in learning, teaching, and assessing. theory into practice, 41(4): 219-225. available: http://www.each.usp.br/cmapping/pdf/edmtxta10.pdf
[32] rourke, l., & anderson, t. (2004). validity in quantitative content analysis. educational technology research and development, 52(1): 5-18.
[33] rourke, l., anderson, t., garrison, r., & archer, w. (1999). assessing social presence in asynchronous text-based computer conferencing. journal of distance education, 14(2): 50-71. available: http://auspace.athabascau.ca:8080/dspace/handle/2149/732
appendix
table 1. coding scheme for social presence
category: affective expression (s-ae)
- sae1, using conventional or unconventional expressions to express emotions: mood or emotions expressed through symbols or words, e.g. repetitious punctuation, conspicuous capitalisation or emoticons; may include big letters and text in different colours or fonts along with the normal text to express tone of voice. examples: "...i'm so ..."; "oh! shit, i ..."; "that's not what i mean."; "i just can't stand it when ...!!!"; "anybody out there!"; "what does it mean!?!?"
- sae2, using expressions of humour: using conventional strategies such as humorous banter, teasing and joking. example: "he heee..., no need to think, what the book says is correct and i'm wrong."
- sae3, self-disclosure: disclosing details of personal life, expressing vulnerability or sharing personal beliefs, vision, attitudes and interests. examples: "where i work this is what we do..."; "i just don't understand this question."; "i believe that we have right to see the assignment marks."
category: open communication (s-oc)
- soc1, continuing a thread: using the reply feature of the software rather than starting a new thread.
- soc2, quoting from others' messages: quoting a complete message, or part of a message, posted by another. example: "what do you mean by 'excess-k representation'?"
- soc3, referring explicitly to others' messages: directing references to the contents of others' posts. example: "in your message, you talked about excel not calc."
- soc4, requesting support: requesting support or information from the students and the facilitator (includes questions). examples: "could you please help me in ...?"; "please upload that file or send me the link. i'd like to read it."
- soc5, encouraging or complimenting: encouraging others or complimenting others or the content of their messages. examples: "i really like your interpretation of the reading."; "don't worry. you will find it easy."
- soc6, expressing agreement or disagreement: agreeing or disagreeing with others or the content of their messages. example: "i was thinking the same thing. you really hit the nail on the head."
- soc7, accepting mistakes: acknowledging one's own mistakes and expressing gratitude for being made aware of them (recognition of each other's contribution). examples: "sorry, it's a typing mistake."; "oh! i have made a big mistake. thanks for showing it."
- soc8, expressing willingness to support: expressing willingness to support others, with or without a request for such help. examples: "contact me if you need further clarification."; "i can help you in java and php."
- soc9, offering personal advice: offering specific advice to other students. example: "i recommend adsl connection of .... the best thing to do first is..."
category: group cohesion (s-gc)
- sgc1, vocatives: addressing or referring to the participants by name or as sister or brother. examples: "i think john made a good point. jessica, what do you think about it?"; "no sister, that's not..."
- sgc2, addressing or referring to the group using inclusive pronouns: addressing the group as we, us, our group or as friends. examples: "our textbook refers to ..."; "i think we veered off track..."
- sgc3, salutations and greetings: communication serving purely social functions: greetings or closures. examples: "hi all"; "good luck!"
- sgc4, social sharing: sharing information unrelated to the course content but which helps maintain group cohesion. example: "we are having the most beautiful weather here. i will take some photos and send you soon."
table 2. coding scheme for metacognitive presence
category: knowledge of cognition, code m-kc (entering knowledge/motivation)
description: entering knowledge and motivation associated with the inquiry process, academic discipline and expectancies; a more general aspect of metacognition, observed at any time; pre-task reflection.
indicators: knowledge of the inquiry process; knowledge of critical thinking and problem solving; knowledge of factors that influence inquiry and thinking; knowledge of the discipline; knowledge of previous experiences; knowledge of self as a learner; knowledge about general learning strategies and tasks; entering motivational state; expectancy of success.
examples: "based on a combination of my past reading and experience, i define ...";
"i remember in my first year teaching online .... it highlighted for me the importance of engaging activities."; "we know how to work with windows... we have experience..."; "i am quite slow in reading."; "i make short notes while listening to help me understand."; "i am certain that i can understand even the most difficult text that they are going to discuss."; "we will be able to solve this..."
category: monitoring of cognition, code m-mc (assessment)
description: observed during the learning process; awareness and willingness to reflect upon the learning process; assessment of the task, of the progression of understanding and of the effort required is an important monitoring function; reflection on action; declarative; judging.
indicators: commenting on the task, problem or discussion thread; asking questions to confirm understanding; commenting about one's own and others' understanding; making judgments about the validity of content; commenting on or making judgments about the strategy applied; asking questions about progression or stalling; assessing motivational state and effort required.
examples: "i have understood blended learning to be a ..."; "i like your eloquently worded definition..."; "good points."; ".... well today i have learned something about a ..."; "i think i have been able to think of an example for almost each of the models presented in ..."; "you make an interesting point when you observe ..."; "i am not certain why this is true a priori."; "am i correct?"; "you all have done very well..."; "i am interested in reading from tom's list."
category: regulation of cognition, code m-rc (planning)
description: observed during the learning process; enactment and control of the learning process through the employment of strategies to achieve meaningful learning outcomes; reflection in action; procedural; planning.
indicators: setting goals; applying strategies (providing or asking for support, challenging self or others, asking questions to deepen thinking, asking for clarification, requesting information, self-questioning, suggesting taking an action); questioning progression and success; taking control of motivation and effort; facilitating or directing inquiry.
examples: "your thoughts?"; "i think i need to see more supporting research for the idea that ..."; "one of your solutions is ... would it be feasible within ..."; "also, i think i am going to need help in understanding"; "i am just curious about the social processes ... and how they might help learning"; "shall we give up this now and go to the next topic?"; "will we be able to finish this on time?"; "let's use the new software and share our experience."
table 3. coding scheme for teaching presence
category: design and organisation (t-do)
- tod1, informing notices: provides information related to changes or updates of the course content, or any other notices specific to the subject under discussion. examples: "a new version of this lesson is uploaded to the course. please have a look at it before answering this activity."; "you can access the lessons from here..."
- tod2, establishing time parameters: communicates important due dates and time frames for learning activities to help students keep pace with the course. example: "please post a message by friday."
- tod3, utilising the medium effectively: helps students to find appropriate places to discuss concerns and thus attempts to organise and manage the discussions properly; assists students in using lms features for learning activities and in resolving technical problems.
examples: "this has been discussed in the forum... please post your question there."; "try to address issues that others have raised when you post."
- tod4, establishing netiquette: helps students understand and practise the kinds of behaviour that are acceptable in online learning, e.g. providing documentation on polite forms of online interaction. examples: "keep your messages short."; "remember, all uppercase letters is the equivalent of shouting."
category: facilitating discourse (t-fd)
- t-fd1, identifying areas of agreement/disagreement: assists in identifying areas of agreement and disagreement on course topics in order to enhance student learning. example: "joe, mary has provided a compelling counter-example to your hypothesis. would you care to respond?"
- t-fd2, seeking to reach consensus/understanding: assists in guiding the class towards agreement about discussion topics in a way that enhances student learning. example: "i think joe and mary are saying essentially the same thing."
- t-fd3, acknowledging or reinforcing student contributions: acknowledges student participation in the course, e.g. replies in a positive and encouraging manner to student submissions. examples: "thanks for your post alex."; "thanks for your contribution."
- t-fd4, encouraging or motivating students to participate in the discussion: assists students in engaging and participating in productive dialogue. examples: "who else can provide an answer to what peter asked?"; "any thoughts on this issue?"
- t-fd5, setting the climate for learning: encourages students to explore concepts in the course, e.g. promotes the exploration of new ideas. example: "don't feel self-conscious about 'thinking out loud' on the forum. this is the place to try out ideas after all."
- t-fd6, re-focusing/re-addressing the discussion on specific issues: helps focus the discussion on relevant issues and keeps participants on topic. examples: "i think that's a dead end. i would ask you to consider..."; "is that all you have to say about this topic? what about ...?"
- t-fd7, summarising the discussion: reviews and summarises discussion contributions to highlight key concepts and relationships and to further facilitate discourse. examples: "the original question was.... joe said... mary said..."; "we concluded that... we still haven't addressed..."
category: direct instruction (t-di)
- t-di1, providing specific instructions or advice: provides task-oriented instructions or advice for learning. examples: "create a table by yourself and try to add multiple keys..."; "you should read the question carefully..."
- t-di2, offering useful examples or illustrations: explains subject content using examples and illustrations (an attempt to make course content more comprehensible). examples: "look at the following figure. it shows you how to print..."; "this example will help you to understand... let's assume that ..."
- t-di3, providing additional explanations: attempts to reduce confusion or misconceptions about course content by providing additional explanations. example: "let me provide you with some additional detail explaining how this device works."
- t-di4, making explicit references or providing extra learning resources: provides useful and relevant information for finding solutions or for further clarification. example: "this will help you to clarify your doubts; http://...."
- t-di5, encouraging doing activities: encourages or motivates students to complete learning activities and to try out challenging tasks that are required during the course. examples: "try activity 5. it's very easy."; "you can do it if you follow ..."
- t-di6, responding to technical concerns: provides direct instructions on technical questions related to the online learning environment.
example: "if you want to include a hyperlink in your message, you have to ..."
category: assessment (t-as)
- t-as1, providing constructive feedback to student posts: judges and evaluates the relevancy and usefulness of the information posted in messages and provides comments or tips for improvement. examples: "very good, nadia. you have posted some very useful information."; "are you sure what you have posted here is correct? you had better check it again, john."
table 4. coding scheme for cognitive presence
(in the examples, t1, ..., t5 denote triggering events; an entry of the form "t4 →" is a reply to the issue raised in triggering event t4.)
phase: triggering event (c-te); descriptor: evocative
socio-cognitive processes: stimulating one's curiosity; a core organising concept or problem that learners can relate to from their experience or previous studies; framing the issue and eliciting questions or problems that learners see or have experienced; assessing the state of learners' knowledge and generating unintended but constructive ideas.
- c-te1, recognising a problem: presents background information that may culminate in a question, or presents a problem or issue. examples: t1. "in section 5, page 152 of the student manual says 'solid states'... could you please explain what it means?"; t2. "i think the statement 'the internet uses tcp standards in data transmission' is correct. but in a quiz, it is considered as incorrect. can it be a mistake? please explain."
- c-te2, sense of puzzlement: questions or messages that take the discussion in a new direction. examples: t3. "some time ago, i studied what 'bit' and 'byte' are. but now, i can't remember and i am confused. can someone explain what they are?"; t4. "i wanted to print cell borders of a calc worksheet. but failed. is there anybody who has done it before?"; t5. "are touch-screen laptops better than normal laptops?"
phase: exploration (c-ex); descriptor: inquisitive
socio-cognitive processes: understanding the nature of the problem and then searching for relevant information and possible explanations; group activities and brainstorming; private activities and literature searches; managing and monitoring this phase of divergent thinking in such a way that it begins to become more focused.
- c-ex1, exploration within the online community: unsubstantiated agreement or disagreement with, or contradiction of, previous ideas. examples: t2 → "i don't agree. it is incorrect."; "i agree with you."
- c-ex2, exploration within a single message: many different ideas or themes presented in one message. example: t1 → "dictionary meaning of 'solid state' is.... in a past exam paper i found it defined as '...' but i have been taught it as '...'"
- c-ex3, information exchange: personal narratives or descriptions (not necessarily regarding personal experiences) or facts (i.e. from sources such as websites, articles, programmes, etc.); adds points but does not systematically defend, justify or develop the addition.
this online video might help you to understand how to print cell borders.”
c-ex4 suggestions for consideration: author explicitly characterises message as exploration. examples: t3 → [after bringing out some information about bit and byte] “does that seem about right?” “am i way off the mark?”
c-ex5 leaping to conclusions: offers unsupported opinions. examples: t2 → “...it’s a mistake.” t4 → “cell borders of a worksheet cannot be print.”

integration (c-in). descriptor: tentative. socio-cognitive process: focused and structured phase of making meaning; decisions are made about the integration of ideas.
c-in1 integration among group members: reference to a previous message followed by substantiated agreement or disagreement; building on, adding to others’ ideas. examples: t2 → “i don’t agree with you because...” “i agree because...” “according to what renuka noted, ... but i think ...”
c-in2 integration within a single message: justified, developed, defensible, yet tentative hypotheses. examples: t4 → “i used this free tutorial, http://.... it explains how to print worksheets with cell borders. according to that, first you have to ....”
c-in3 connecting ideas: integrating information from one or more sources – textbooks, articles, personal experience, other posts or peer contribution. examples: t5 → “as neel said, now there are laptops with touch screens. see the attached picture. but there is a problem with these laptops. read this, http://... therefore, i think...”
c-in4 creating solutions: explicit characterisation of message as a solution by participant. examples: t4 → “here is the answer; you can print cell borders like this... format>page>sheet tab>...”

resolution/application (c-ra). descriptor: committed. socio-cognitive process: reducing complexity by constructing a meaningful framework or discovering a contextually specific solution; a confirmation or testing phase that may be accomplished by direct or vicarious action; resolution of the dilemma or problem.
c-ra1 vicarious application to real world / testing solutions: providing examples of how problems were solved or evidences of successful application. examples: t4 → “how i printed a calc worksheet with cell borders was...” “it did not work at first. but when i selected some lines of text and tried again then it worked...”
c-ra2 defending solutions: defending why a problem was solved in a specific manner. examples: t4 → “here is the modified list of steps to print a worksheet with cell borders. i did a small change to the second step of mahela’s procedure, because i could not open the print dialog box by following it as it was.”
c-ra3 judging or evaluating and expressing satisfaction: judgement or evaluation followed by an expression of satisfaction after solving the problem or issue that caused the triggering event. examples: t1 → “...i understood what ‘solid state’ means. thanks.” t2, t3, t5 → “...thanks for the explanation. i got my doubt cleared / problem solved.” t4 → “...thanks a lot. i followed your instructions and printed a worksheet with cell borders.”

international journal on advances in ict for emerging regions 2020 13 (3), december 2020

a multi-layered data preparation model for health information in sudan
ahmed mustafa abd-alrhman#1, love ekenberg#2

abstract— data quality is a major challenge in almost every data project in today’s world, especially when the required data has a national or global look and feel.
however, data preparation activities dominate the effort, cost, and time consumed. nowadays, many data collection approaches continue to evolve in the era of big data to accommodate rapidly growing data flows, especially in the health sector, which contains many different levels of data types, formats, and structures. the lack of qualified and reliable data models is still an ongoing challenge. these issues are even magnified in developing countries, where there is a struggle to make advances in health systems in limited-resource environments and to adopt the advantages of ict to minimize the gaps in health information systems. this article introduces a geo-political multi-layered model for data collection and preparation. the model enables health data to be collected, prepared, and aggregated using a data attendance approach and addresses data challenges such as missing data, incompleteness, and inconsistent formats. the data collection method currently used in the health sector in sudan was analysed and data challenges were identified with respect to the geo-political structure of the country. the model produces structured datasets framed by time and geographical space that can be used to enrich analytical projects and decision making in the health sector.
keywords—data preparation, data quality, sudan, health information systems

i. introduction
in today’s world, the use of data is a common necessity in almost every domain, including individuals, groups, enterprises, and governmental entities; however, different usages and needs characterize the data type, format, and volume. data management covers an entire span of sequential processes from the generation of a particular dataset until its archiving or, in some cases, deletion; this includes the generation, collection, recording, processing, transformation, and loading of data, known as data lifecycle management (dlm) [1]. many developing countries, including sudan, suffer from data collection problems, particularly in the health sector [2-4]. in the case of sudan, the governing body for health, the federal ministry of health (fmoh), obligates health providers (private and public) to record and submit a monthly manual health report which summarizes different medical and management statistical data for each health facility. the report is structured into subsections in a unified fashion for all health service providers. each section contains a list of heading titles and blank spaces for the required data entries, recording the existence of services in the facility in addition to numerical values that represent services, tests, patients, equipment, and personnel. our analysis of these data pointed out many challenges: for instance, some reports had 35% incomplete data, other reports had more than 25 manual corrections, and for one health facility we found 5–7% missing data and inconsistencies between the operational electronic laboratory systems and the corresponding manually aggregated data in the monthly reports. overall, the results showed many challenges in data collection and management, which we categorize into three problematic themes: data, user, and management errors.
• the main data errors are (1) missing data: some columns are left entirely blank, without a trace of default values; (2) incompleteness: some patient files were found not to be fully recorded; (3) inconsistent data: for example, when comparing existing software outputs with the corresponding data elements in the report, variances were identified; and (4) unavailability: in some cases, the data was not ready at the data collection points at reporting time.
• the interaction of the user (data reporter) with the medical report in terms of collection, recording, and submission of data indicates issues such as (1) manual data correction or cancellation using stationery tools; (2) collection of data being done verbally and not documented in some cases; (3) the collected data not being supervised in some cases, as the collection department consists of one person who is responsible for all reporting activities; and (4) harmonization problems between medical and management staff in data collection.
• the management of data suffers from many obstacles, including (1) manual collection of data, which is exposed to human mistakes; (2) the absence of validation or verification mechanisms; (3) poor adherence to time constraints; for instance, data may be reported as monthly aggregations, and reports can be delayed for 10 days beyond submission dates; and (4) poor data structure, with no standard coding mechanisms and no data links, which overburdens future analysis.
the overall results indicate poor data quality, which is considered one of the biggest challenges facing those working with data preparation and management. the time spent collecting and preparing data to ensure its quality dominates total effort and represents almost 60% of the duration of data analysis projects [5]. nevertheless, applying ict in the health domain can have a positive impact on healthcare provision in developing countries, especially with the increases in network coverage and data processing capacities and the decreases in hardware costs.
manuscript received on 22 sep. 2019. recommended by dr. manjusri wickramasinghe on 20 november 2020. ahmed mustafa abd-alrhman is from sudan university for science and technology (ahmedabudi@hotmail.com). love ekenberg is from the department of computer and systems sciences, stockholm university (lovek@dsv.su.se) and the international institute for applied systems analysis (iiasa) (ekenberg@iiasa.ac.at).
however, it remains a challenge to develop data models that can satisfy data quality measures and management requirements in order to support and strengthen health information systems. data usually needs further processing in order to be ready for use, that is, transformation of the collected data into other shapes and formats to meet new requirements such as data analysis. this has become even more necessary in the case of non-electronic data forms, which add an additional preliminary step of transforming the data into an electronic version as an essential requirement for subsequent data management [6]. sarkies et al. [7] estimated that an improvement of more than 30% could be achieved by using an electronic patient recording program rather than manually recording patient data. in this article, we suggest an innovative approach to building structured data collection systems for health information in sudan and elsewhere with similar conditions.
we suggest a geo-political data hosting approach that systematically organizes the data according to demographic dimensions with political control mechanisms. the reason for choosing such a structure is to better support the hierarchy of official health authorities, which are responsible for maintaining policies, regulations, and management activities. we introduce a data model that structures data sources using geo-political settings to represent health organizations in sudan and provides a mechanism for exchanging transactional and aggregated data between health facilities and governing health bodies. we introduce a different approach to the data preparation process by moving data preparation activities to the data collection points, in contrast to traditional approaches that process and qualify the data after the data collection phase is complete. the objectives of the proposed approach mainly focus on:
• integrating data preparation techniques in the early stages, at the data entry point, to increase the efficiency and speed of the data preparation process;
• early detection of errors, isolation of low-quality data, and fixing of data errors at data entry points;
• transferring part of the extract, transform, load (etl) process to data entry points in order to improve load balance, process throughput, and data production time.
the next section discusses data management in the health sector and highlights data quality requirements and their impact on health information; after that, we describe the current status of ict implementation in sudan and identify the requirements for health data models in this developing country. the third section introduces the multi-layered data attendance collection model (mdacm) and its configuration. lastly, a discussion is presented to show the model’s ability to function in an environment with poor ict infrastructure.

ii. methodology
the intention was to understand the data collection problems inside the health facilities as well as the communication mechanisms and challenges with health authorities; we primarily used a qualitative approach. an analysis of the monthly manual reports was conducted to investigate the report structure, contents, and user interaction possibilities. the analysis included five large hospitals in the capital city, khartoum, of which two were public and three privately owned. the selection of health facilities was constrained in terms of service capacity and patient frequency. monthly reports were analysed for three months (june–august 2018) for each hospital. in addition, a digital database analysis was conducted in the health facilities that use digitalised systems, to verify the manual monthly report data and to identify any possible discrepancies. furthermore, interviews were conducted with management employees responsible for generating the monthly reports. moreover, interviews with medical staff were conducted to identify possible gaps and obstacles in the data collection processes. we also monitored the actual data collection, verification, and exchange to validate this information.

iii. literature review
a. data management challenges in the health sector
data management in the health sector is clearly crucial because of the sensitivity of the domain and its tight relationship with society, economics, security, and public health.
moreover, the importance of data preparation extends beyond data usage to decreasing operational costs for organizations and enabling better resource management. a cost-benefit analysis of data quality conducted by redman [8] estimated that error rates in the range of 1–5% can cause revenue losses of around 10% in industry. this can also be a burden in developed countries; for instance, eckerson [9] estimates the cost of poor data quality to us businesses at 600 billion usd per year. it is not only financial cost that is affected by poor data quality; the data analytics and it industries share such concerns. for instance, a survey conducted by a professional data preparation company [10] showed that data preparation costs almost 450 billion usd per year. even in such circumstances, the survey indicates that data analysts prefer to work on modelling and analytic activities rather than spending time preparing data. poor data quality can also affect health provision, as noted by yawson and ellingsen, who investigated the implementation of an electronic health record system in ghana and concluded that recording huge amounts of data in health information systems does not necessarily indicate an improvement in the quality of health provision when the quality of the data is poor [11]. this reveals that data models should be strengthened with quality measures in the data collection and acquisition phases to produce a valid and qualified dataset. collected data might contain errors, and the quality of data can be negatively impacted by, for example, incompleteness, inconsistency, and duplication, which can be a challenge for data projects in various phases. rahm and do noted the problem of data quality when loading data into data warehouses, where quality issues that occurred at the original sources are highly likely to be carried over [12]. this is reflected in the process of data cleansing, which occurs after data collection; the problem observed here is the concentration and delay of the work burden until the collection process is finished. in the same context, zhang et al. highlight the importance of data quality in data preparation [13], which can speed up the data analysis and mining processes. the authors identify the need to purify and clean the collected data, which may come from different sources with multiple issues such as incompleteness and inconsistency [13]. moreover, kwak and kim pointed out the problems of contextual missing values and the significant effect of outliers on statistical estimation [14]. aguinis et al. discussed and recommended best data preparation techniques and addressed data issues such as outlier management, data correction methods, and data transformation [15]. health data are characterized by the accumulation of huge amounts of data batched as datasets. many researchers recognize probable obstacles in managing big datasets, and many solutions and approaches have been proposed in that context. for instance, kanchi et al. identify the challenges of managing big data, including health records, and recommend optimization methods to reduce data problems by concentrating on data management practices, techniques, and infrastructure, which can have a positive impact on data quality [16]. in the same context, levant et al.
suggest that the data quality of systems can be improved significantly by addressing data problems and their corresponding remedies, in particular data correctness, completeness, precision, timeliness, and usability [17]. from that perspective, they suggest enhancing systems’ data quality by building semantically rich data models, increasing the database rules and constraints, and using a predefined process for data usability. the size, complexity, and number of data sources can also be viewed as a significant challenge, as pointed out by kristian [18].

b. data quality management
different approaches have been introduced to measure data quality. for instance, heinrich et al. introduced a metric-based approach to quantify data quality by adding data correctness and timeliness measures to meet system requirements [19]. rather than using pre-defined requirements as recommended by heinrich et al. [19], cappiello et al. introduced another approach which uses scoring processes to compare result outputs against the evaluation of pre-specified objects [20]. in addition, an assessment of data quality was introduced in 2018 using the data quality framework (dqf), creating quality properties and provenance of data with respect to the user’s experience of quality, making it possible to assess data quality and track possible data errors [21]. svetlozar introduced new approaches to combine data preparation with data visualization and clustering techniques to support decision-making processes [22]. furthermore, it has been highlighted that feedback on collected data can prevent, for example, incomplete data and can consequently be used for mapping and completion processing [23]. management of health data at the national level is gaining more global attention, especially in developing countries, by bringing forward compliance with international goals [24]. many countries, including low- and middle-income countries, have adopted national approaches to health data collection. for instance, in 2010, india started identifying the population eligible for state healthcare by using aadhaar cards with biometric verification for citizens. similarly, côte d’ivoire introduced a data-sharing mobile application to monitor epidemics nationwide in 2013 [25]. the international community encourages countries to utilize data models to improve their health information systems, and thus their health status, which enables decision makers to achieve better health provision and proper resource management, especially in severe conditions such as catastrophic events and disease outbreaks. there is global consensus on migrating from paper-based information systems to electronic format. this is especially emphasized by who, which describes health information systems as follows: “the health information system provides the underpinnings for decision-making and has four key functions: data generation, compilation, analysis and synthesis, and communication and use” [26]. ict adoption models for health information systems have been presented as systematic guidance for implementing information systems based on electronic medical records.
for instance, the capability maturity model (cmm) is used to measure the current status of ict implementation in health institutions by focusing on initiating and standardizing processes with continuous improvement [27]; the adoption of the cmm in australia can be viewed as an enhanced version, scaled into five steps to assess the increasing capability of the australian e-health system. enterprise architecture (ea), introduced earlier by spewak and hill, follows a top-down approach for instantiating and improving process cycles, starting by defining enterprise requirements, frameworks, and governance [27]. in addition, the emr adoption model focuses on the adoption of paperless emrs in health facilities through seven progressive stages, including a clinical decision support system, and has been adopted in us and canadian hospitals with different degrees of implementation [27]. an observation on such adoption models is that they require a reasonable degree of stability in the ict infrastructure, in particular processing and storage capacities, network coverage, and personnel skills, which are the major obstacles faced by developing countries compared to developed countries. in the same context, a study by the gates foundation in 2009 involving 19 developing countries showed a poor ict status, which requires these countries to adopt an alternative approach that considers the difficulties and obstacles facing the ict infrastructure. the cost of implementation and the infrastructure requirements have been identified as real challenges for developing countries in adopting ict in health information systems [27]. on the one hand, ict infrastructure does exist in these countries, but it is still poor and not widespread, especially in terms of network coverage; on the other hand, almost all developing countries have reasonable paper-based health information systems, which vary in strength and effectiveness. another obstacle is that, without the support of a reliable health information system, it is difficult to perform health management and planning nationwide [27]. many implementations of health information systems focus on electronic patient records, and significant improvements can be achieved with this methodology [28]; however, health management, planning, and policy making depend on disease aggregation datasets, whereas individual patient records mainly support medical research and patient treatment. one of the biggest challenges in building electronic health information systems in developing countries is the cost of implementation [8], which requires moving paper-based data collection systems into electronic format to enable aggregation and analysis of data for the purpose of decision making, health planning, and resource management. to tackle cost and technological obstacles in developing countries, bram et al. suggest that health data should be collected by hiring community health workers who interact directly with the population and deliver health care and consultations [28].
the approach is constructed on the relationship of trust between volunteers and their respective communities; however, it is heavily dependent on an unstable element, the human factor, which can be unreliable and subjective; moreover, coverage and data management could become issues in terms of incompleteness, inconsistency, accuracy, and traceability. data management in developing countries, including sudan, therefore faces common challenges, including poor infrastructure, the cost of implementation, and a lack of data models, and consequently poor health information systems and data of insufficient quality for use in health planning, management, and decision making.

c. current health information system in sudan
sudan is federated as 18 states, with each state divided into a number of localities. the population is around 40 million, distributed over 17,765,048 km2. four mobile operators are licensed to operate in sudan; they cover all the big cities and almost 83% of inhabited areas, with 28 million subscribers, and provide internet connectivity including 3g and 4g speeds. healthcare delivery in sudan is divided between the public (governmental) and private sectors. the health management structure is divided into four organizational levels. at the top is the federal ministry of health (fmoh); the second level is the health ministries in the 18 states, which control – at the third level – the medical units in the localities (189 localities); at the fourth level are the health facilities (almost 6,100), which are regulated by the localities. there is no central registration record for health facilities; instead, state ministries are responsible for registering and licensing each health facility within their boundaries. in addition, there are referral public hospitals in the state capitals, with an extra presence in the country’s capital, khartoum. the health information system in sudan is an old system dating back to the twentieth century. data have been collected from health facilities manually up to the current date (june 2019). this is done through a unified monthly report for all health facilities, including the private sector. the monthly form is organized as aggregated data that summarizes statistics about clinical information on diseases and patient frequencies in a given facility. each locality is responsible for aggregating the collected medical forms under its authority and reporting them monthly to the state health ministries. likewise, the state ministries summarize the localities’ aggregated data and send it to the federal ministry of health as a state report at monthly or quarterly frequencies. data collection in the sudanese health sector faces many challenges which negatively affect its quality. this includes many incidents of data incompleteness, inconsistency, inaccuracy, and delays in reporting time, although prescheduled reporting times exist. in addition, the health system suffers from fragmentation and the lack of a data integration mechanism between different administrations, even within the fmoh. data management still faces major challenges, especially in the process of data repair and follow-up. the use of ict in the health system in sudan is still an ongoing struggle and its status is poor. the fmoh does have an it department equipped with servers, storage, and a local network with fibre optic connectivity to the government’s central data centre as part of an ongoing e-gov project.
however, moving down the health organizational hierarchy, states and localities are poorly equipped with computers and networks, especially the medical units that are far from the capital, with additional challenges in network coverage and power supply. many attempts to implement electronic health information systems have failed. nevertheless, the fmoh started to use district health information software (dhis) in 2016 to aggregate health data from the locality level. implementation of the electronic system did not cover all areas; the capital state, khartoum, did not adopt the dhis because of a separate ongoing enterprise resource planning (erp) system for the state that includes a dedicated health information system. in addition, other localities face technical and managerial obstacles, for example in network coverage and power supply; health units in these localities also vary in their level of system implementation, and they still report their monthly data twice, both manually and using the dhis. the it department in the fmoh prepares an annual report that reflects the country’s health status using a dedicated electronic system. the report is sent to health officials, the ministry of finance, donors, and international organizations. the data contained in the annual report are collected from the dhis and completed with data collected from the health units that do not use this system. sudan has started to deploy a national information system to register newborns in hospitals; however, the system is very poor in terms of implementation and coverage. the system was implemented in private and public hospitals, but despite its presence, birth certificates are still issued manually and technical issues continue to emerge, including network disconnections. the health administration runs regular staff training programmes, including training on the currently deployed dhis; however, staff instability is still a challenge. on the other hand, management and medical students take university courses in using computers and basic operating software as part of their studies. although health workers have the basic knowledge to operate computers and basic software, capacity building is another challenge that needs more attention. implementation of health information systems and electronic patient records in health facilities is poor and limited. for instance, of the three biggest public hospitals in the capital, only omdurman public teaching hospital uses an electronic health information system in its daily work, while the other two hospitals have failed to implement e-health solutions. the private sector has succeeded in using electronic systems for financial, administrative, and statistical purposes, especially in the capital and big cities; however, limited attention is paid to using applications for medical purposes. no national patient record system has been identified. the overall status of ict in the health sector in sudan, as a developing country, is reasonable enough to start the adoption of a data collection model to strengthen health information and achieve continuous improvement; however, much can be done to improve the use and adoption of ict in the health sector in sudan, especially in capacity building and it infrastructure. to improve the quality of data and minimize data errors in the health sector in sudan, the
data must be prepared and managed in order to be ready for analytical activities and decision making.

iv. multi-layered data attendance collection model (mdacm)
a. design motivation
the data organization is intended to represent the geo-political organization of the country. the background here is that sudan is geographically divided into 18 states, where each state has a local government consisting of a number of local ministries. each state is further subdivided into a number of localities, each characterized by demographic properties, where services, people, and offices are located and systematised in pre-specified geographical areas governed by the locality authorities. therefore mdacm organizes and visualizes data in multiple layers, where each layer represents a governance level (see fig. 1). the proposed model is mainly concerned with medical data, and for that reason it focuses mainly on providing a hierarchical data model for health organizations in sudan. the proposed model is intended to minimize health system obstacles by using a flexible data structure design that benefits from the current ict infrastructure, that is, existing networks and processing and storage capacities. it also suggests an enabling mechanism that allows electronic data entry wherever possible while at the same time accepting data transformed from paper-based records and appropriately integrating it into the data model. a national health information system should reflect the country’s current health status. health information systems require data to be structured in two dimensions: time and geographical space. while a timestamp identifies the occurrence of the data, the geographical area articulates the incident locations; therefore, a time-space data representation can be provided for decision makers, health planners, and health administrators. in addition, the design of the data model is motivated by many requirements and challenges, such as supporting decision making, interpreting data, and identifying patterns. its primary focus is to collect and organize medical information to reflect the health status in the country; however, this implicitly requires modelling data to match the geo-political structure of the country, in particular the four governing layers that are used in sudan and represent the hierarchy of authorities. this outer design approach provides the flexibility to organize and manage health information and data flows. another aspect considered by the model is the concept of combining different objectives and criteria, in particular: (1) health information criteria, which focus on diseases and patient medical records; (2) data management criteria, which focus on data preparation, data quality, and decision making; and (3) health management criteria, which concern health provision, control of health status, and better resource management. on the other hand, health information is distributed across many sources with commonalities and disparities; thus the model should organize the data to reflect the status of different medical sections, data types, formats, and health programmes.
stakeholders in the health domain – governmental bodies, health workers, health providers, international organizations, and donors – use health information from different perspectives, and for that reason the proposed model needs to fulfil these different requirements. new advances in ict provide additional opportunities for utilizing technology to enhance data collection, processing, and exchange; however, developing countries like sudan are still suffering from a lack of capabilities such as integrated networks. in this model we argue that it is still possible to use the existing ict infrastructure to enhance the health information system in sudan by using a hybrid data transfer approach that includes online and offline data communication mechanisms.

b. governance organizational structure – country (layer 1: l1)
the federal ministry of health (fmoh) represents the first layer (layer 1: l1) and is the governing body that contains the further sub-layers (see fig. 1). the required data format and structure are created in this layer, including the disease coding system and the data template structure. this layer also represents the final destination of the data after the collection and aggregation processes are completed in the descendant state layers, reflecting the health status in a national representation format. the data in this layer can be used for national interventions, analysis, and decision making.

c. states (layer 2: l2)
the second layer (l2) represents the geo-political division of the country in terms of states. each state in sudan has a state government ruled by a governor and state ministries. not all federal ministries have corresponding state ministries; however, all states have ministries of health, which are related to the fmoh technically and report medical data to the federal ministry. the data template structure is inherited from the parent country layer (l1) and forms the data definition that is required from each state; furthermore, this definition is spread downstream to the third layer (l3) for each locality under the corresponding state. the collected data in the state layer reflects the health status in a given state and represents the resultant data aggregation from the governed subunits (localities, l3); furthermore, interventions, analysis, and decision making are limited to the state authority only.

d. localities (layer 3: l3)
the third layer (l3), the localities, is the organization of governance activities in each state, responsible for applying policies and regulations in a specified geographical area and for interacting directly with the population and business bodies, including health facilities. although localities are obligated to supervise public health and interventions, they are only loosely tied to maintaining technical health information. the state ministries are more closely related to health providers and facilities, but even in that case, the ministries of health in the states organize the health data based on localities, and thus this layer can be considered an organizational data layout at that level. the data template is inherited from the state layer (l2) to form the locality data structure definition. data in this layer is collected directly from the data sources, which are the health facilities (l4), and the aggregated data represents the health status in the specified locality.
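to make the layered organization concrete, the following is a minimal sketch of the four-layer hierarchy described above (country l1, states l2, localities l3, health facilities l4). the class names, field names, and the example instance are illustrative assumptions, not part of the proposed model’s specification.

```python
# sketch of the four-layer geo-political hierarchy (l1 country, l2 state,
# l3 locality, l4 health facility). names and the example data are illustrative.
from dataclasses import dataclass, field

@dataclass
class HealthFacility:          # layer 4 (l4): the data source
    facility_id: str
    name: str

@dataclass
class Locality:                # layer 3 (l3): aggregates its facilities
    name: str
    facilities: list[HealthFacility] = field(default_factory=list)

@dataclass
class State:                   # layer 2 (l2): aggregates its localities
    name: str
    localities: list[Locality] = field(default_factory=list)

@dataclass
class Country:                 # layer 1 (l1): fmoh, defines templates and ndcs
    name: str
    states: list[State] = field(default_factory=list)

# a tiny hypothetical instance: one state, one locality, one facility
sudan = Country("sudan", states=[
    State("khartoum", localities=[
        Locality("example locality",
                 facilities=[HealthFacility("HF-001", "example teaching hospital")]),
    ]),
])
```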
at this point, the proposed data model has defined how the data should be hosted, structured, collected, and aggregated, and the data levelling mechanism has been introduced; furthermore, the design determines the data usage, authority, and representation. using a layered hierarchy provides more flexibility if the data in each layer need to be combined with other datasets; for instance, we can combine the model data with a civil registration system, medical insurance data, or budget data, because the national data are structured in the same design. in that context, the model output will support future data compilation and integration needs and reduce the time consumed by further data preparation activities.

fig. 1: an overview of the multi-layered data attendance collection model (mdacm). legend: l: layer; dt: data template; ndc: national disease code; dalx: data aggregator in layer x; hf register: health facility register; mini-emr: minimized electronic medical record.

e. medical data management
data in the medical sector is shared and used by different stakeholders for different purposes and interests, which include patient medical records, disease information, and management data. however, data links, relations, and semantics are always challenging, especially for technical health information concerned with diseases. for example, a patient who enters a health facility for clinical consultation may be redirected to a medical laboratory for investigation of a specific disease such as malaria. a positive laboratory result for the requested disease test will support the physician’s order for medicine placed with the pharmacy, which will finally be issued by the pharmacist. in this example, three medical sections (clinic, laboratory, pharmacy) interact with the mentioned disease from different perspectives, in particular with different data formats and types; however, the common denominator is malaria, which is processed differently in each section and needs to be mapped and linked across all medical sections. thus, building a basis of disease codes becomes crucial for disease analysis and control. the relations between medical data records can be identified using standard disease codes, which unify the language between different medical stakeholders and facilitate data links and semantics even across different environments and usages; moreover, further data analysis is enhanced if the data have been properly standardized, referenced, and normalized. this should be considered in the early data design stages, and diseases therefore need to be modelled and standardized.

f. national disease code
top health management is responsible for producing and maintaining national disease code (ndc) records for each identified disease in the country. an ndc is a unique numerical or string value that defines a specific disease (see table 1), and this code is used to link the data in all data processing activities concerning this disease. in addition, health facilities integrate and report their data regarding this disease using the ndc as a key, and all data templates use the ndc in the template’s definition structure to refer to the required disease. an example of ndc records could look like the following table:

table 1: defined national disease codes (ndcs)
no. | ndc | disease name
1 | 10001 | malaria
2 | 20001 | cholera
3 | 30001 | flu
ndcs can be further nested to accommodate multiple types of a specific disease; however, we use a simple single disease code in this model.

g. medical sections organization
the health system in a country is constructed and managed by different specialized bodies inside the ministries of health; for instance, in sudan clinical data are maintained by the disease control sub-administration in the health ministry, drugs are managed by the national fund for medicine, and medical laboratories are supervised by the laboratory administration in the ministries of health. on examining medical data for diseases in particular, we can identify the relations between these medical entities concerning a given disease, for instance the diagnosis, laboratory testing, and curing of the same disease, and we notice that data from different entities can be collected and mapped to that disease. this relation gains significant importance when the data is to be collected at the national level, and the integration between different sections can enhance the data analytical process. to create such a relation, we integrate ndcs with the designated medical entities (sections) by listing the sections that are involved in each disease (see table 2); in addition, medical sections are used to link health facilities’ internal medical sections to the corresponding layered health bodies; for example, pharmacies are linked and report to the drugs and medicines administration.

table 2: integration of ndcs in medical sections
no. | ndc | medical section
1 | 10001 | clinic
2 | 10001 | medical laboratory
3 | 20001 | clinic

h. data templates
when data is collected from different data sources, it is normally represented in different data formats and structures, which puts an extra burden on data preparation activities in terms of transforming different datasets from different data sources into a unified, data-ready structure to enable data analysis operations. the time and effort consumed in this process dominate almost every analytical project; however, this problem can be optimized by instantiating predefined and constrained data templates and installing these templates in the data sources before data collection starts. in this model, the data template implementation mechanism is introduced for the data collection process and inherited by the data sources. it is designed to constrain the required data format, type, and values at the lower stages (data entry points), unlike traditional approaches which collect data and then optimize it later. this mechanism transfers and distributes the burden of data preparation activities to the starting point; moreover, data cleansing, transformation, integrity, and constraints are implemented and corrected at the data source at the time of collection, which enhances the collection process in terms of time, effort, and cost.

fig. 2: national disease code

at this point, an abstract of the data (the template) is created and a definition of the data is structured in terms of format (how), reporting dates (when), data source (who), and the required data (what), in addition to injecting the predefined reporting layer hierarchy to ensure proper data organization (where). an instance of a data template represents the reporting of data (medical data) from a specific health facility (data source) at a specific date (attendance of data).
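the ndc and medical-section records in tables 1 and 2 can be represented very simply; the sketch below shows one possible in-memory form, using only the values that appear in the tables (the function name and the dictionary layout are illustrative assumptions).

```python
# sketch of the ndc records (table 1) and their medical-section mapping (table 2)
NDC = {
    10001: "malaria",
    20001: "cholera",
    30001: "flu",
}

# which medical sections report data for each ndc (table 2)
NDC_SECTIONS = {
    10001: ["clinic", "medical laboratory"],
    20001: ["clinic"],
}

def sections_for(code: int) -> list[str]:
    """return the medical sections that report data for a given ndc."""
    return NDC_SECTIONS.get(code, [])

print(sections_for(10001))   # ['clinic', 'medical laboratory']
```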
to enhance the management process, the data template is organized into two sections: first, the management section, which is concerned with the organization and administration of the data-reporting templates, and second, the medical section, which focuses on the medical data results and values.
1) data template management section: the management section of a data template starts by providing the identification of a data source (health facility). as each health facility is registered in the system, the identification directly maps the data source location to the corresponding parent layers (locality -> state -> country); moreover, it implements an authentication mechanism (registered and authenticated facility). health information analytics and statistics use time and space dimensions in reporting data. the space (geographical area) is already inherited through the health facility identification in this model, and the time of data reporting is managed by the reporting-date property in the data template. the reporting date not only acts as a timestamp for the data but can also be used to identify missing (absent) data, in addition to the attendance status property, which provides the status of the data template. the following attendance status values are used:
• pending: the data template reporting date has not been reached.
• present: the data template date has been reached and the data has been uploaded.
• missed: the data template date has been reached and the data has not been uploaded.
constraints are applied at data entry time to check for possible data errors, for example incomplete data, format, translation, and conversion errors, and correction takes place. each row in the template is referenced by a data source identifier in addition to the date field, which enables the collection process to update the data row rather than creating a new data record. the proposed model introduces a mechanism to collect and upload data in terms of the date (timestamp) of the data. this mechanism can be used to enhance the collection process from monthly reporting (monthly aggregate data) to daily reporting and even real-time reporting by continuously updating the template instance’s specific date. the collected data is aggregated in the data sources daily, which can optimize the analysis and decision-making process in emergencies and catastrophic events that need instant feedback, for example disease outbreaks.
2) data template medical section: medical data is concerned with disease-related information and is organized in this section, first by identifying the reporting unit (medical section) within the health facility which owns the data, for example the clinic, laboratory, or pharmacy. although the medical section provides additional data aggregation criteria in the parent layers, it also enables a data traceability feature by determining which medical unit is responsible and has reported what data; moreover, it enhances the data correction process if needed. the ndc is used to constrain the collected data about a specific disease. although the main objective of the model is to collect accurate medical information about specific diseases, medical data about patients is usually represented in the form of age groups or ranges, for instance infants and adults. in addition, gender also characterizes and classifies the medical representation, especially if the data is required in medical analysis.
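the sketch below illustrates how a data template row could carry the who/what/when/where attributes and the pending/present/missed attendance status described above, together with a simple entry-time check; the class, field, and function names are illustrative assumptions rather than the model’s actual schema.

```python
# sketch of a data template row (dtr) with the attendance-status values
# described above and an illustrative entry-time constraint check.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DataTemplateRow:
    facility_id: str                 # who/where: maps up to locality -> state -> country
    reporting_date: date             # when
    ndc: int                         # what: national disease code
    medical_section: str             # reporting unit inside the facility
    values: Optional[dict] = None    # age-group x gender matrix, filled on upload

    def attendance_status(self, today: date) -> str:
        """pending / present / missed, as defined in the management section."""
        if today < self.reporting_date:
            return "pending"
        return "present" if self.values is not None else "missed"

def check_entry(row: DataTemplateRow) -> list[str]:
    """illustrative entry-time constraints: flag incomplete or invalid counts."""
    issues = []
    for key, count in (row.values or {}).items():
        if count is None:
            issues.append(f"incomplete: {key}")
        elif count < 0:
            issues.append(f"invalid value: {key}")
    return issues

row = DataTemplateRow("HF-001", date(2019, 6, 30), 10001, "medical laboratory")
print(row.attendance_status(date(2019, 7, 1)))   # 'missed' until values are uploaded
```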
for that reason, the model structures the values of medical results at two extra levels: age and gender. a matrix organization is formed by adding age groups as rows and gender types as columns. a summation of the rows provides the total number of patients for the data template. to enable more insight into the data results, optional reference values are attached to the template data; for instance, laboratory results are often supported by a normal range, which can be a standard default value and can be updated with local reference values for the health facility to justify its result values. different data sources may use different result formats to report their data. for this purpose, a translation mechanism should be implemented in the data template to map the local data format into a unified format and data types; for example, the result can be represented as [found|not found], [yes|no], [true|false], or [0|1], which all represent boolean values, and string results can be coded and mapped to numeric values to make quantitative analysis easier. at this point, the proposed model provides the necessary infrastructure for the data collection process by determining how the data will be collected and aggregated; in addition, it provides a constrained data template model for the health facilities.

i. positioning and registration of health facilities (layer 4: l4)
health facilities must be registered and licensed to provide healthcare services in sudan; while the fmoh supervises this process, it is actually carried out by the sub-state ministries. as an advantage, the model uses the current registration process to identify health facilities. registration of health facilities is integrated in the model as the last layer (layer 4: l4) and positioned under the locality layer (l3) where the health facility is physically located. the data in this layer is aggregated locally on a daily basis for each disease and represents the local statistics for a specific health facility. to properly organize and link any health facility’s organizational structure with the data model, the medical sections in the health facility should be identified and integrated with the corresponding master medical sections in the master layer (l1). this step provides a mechanism to manage subsets of data, for instance the laboratory data for a specific disease. in addition, the medical section in the health facility will inherit the full set of relevant ndcs, and the health facility will be able to exchange data with the model.

j. mini-electronic medical record (mini-emr)
in this section, we introduce a minimized version of the electronic medical record (mini-emr) to address the problems of data collection in health facilities which occur between medical and management staff, who are mutually responsible for recording and reporting health information and statistics in the health facilities. while management personnel experience problems due to incomplete data, missing data, reporting delays, and, in some cases, unavailability, the medical staff prioritize service provision over data management, which consequently affects their data recording commitments, in addition to other challenges such as a lack of medical staff, limited training skills, and a preference for using data recording time to provide health care to waiting patients.
achieving a balance between the provision of healthcare and the strengthening of the health information system is clearly a difficult trade-off, and an intermediate solution should be introduced with continuous optimization. medical staff are obligated to record clinical information while management staff are responsible for aggregating medical and management information; from that perspective, a simplified and structured medical patient record with a data flow model can enhance the data collection process by reducing the number of fields in the emr to create a smaller version. the mini-emr is structured into two sections: the first contains the patient’s personal information and management data and the second contains the medical data. in addition to a simple process flow between the two sections, the mini-emr addresses and adopts the model organization and criteria by incorporating data links and interfaces for the data aggregation process, such as dates, the medical sections, and the ndcs. an example of a mini-emr could be introduced as follows:
mini-emr data structure
management section: i. patient name; ii. gender; iii. age; iv. id (optional); v. date; vi. health facility id
medical section: i. medical section name (clinic, laboratory, pharmacy, etc.); ii. ndc; iii. description (clinical data, test name, medicine name, etc.); iv. result (diagnosis, laboratory result, drug dose, etc.)
data flow
management: i. create mini-emr; ii. record management section data; iii. send to medical section
medical: i. receive new mini-emr; ii. record medical section data; iii. send for management and aggregation
the minimized emr can be used as a starting point for implementing a full and comprehensive electronic medical record in the future, or as the health facilities develop towards full ict adoption in the health information system; meanwhile, the proposed structure enables data aggregation from health facilities (dal4) while at the same time reducing the data collection problems between medical and management staff and introducing a data sharing mechanism inside the health facility in compliance with the general data model requirements. data aggregation in the health facility layer (dal4) uses the medical section data to populate the data template row (dtr) by transforming and aggregating the patient records into a disease aggregation record using mini-emr parameters such as date, ndc, gender, and results.

k. data configuration
in order to start the data collection process, the system should be initialized and configured. the configuration process starts by determining the target collection period, for instance annual collection, which constrains the model by start and end dates. the next step is selecting the required data template from the templates list, which identifies which disease is targeted and in which medical section; for example, the objective may be to target malaria test results in laboratories annually. the model generates data template rows for all dates in the collection period, and default values are initialized for quantitative attributes, in addition to performance indicators such as the attendance status. the data template, at this point, has been created in the master layer (l1) and is subsequently instantiated in layer 2 (states), layer 3 (localities), and layer 4 (health facilities) to produce instances of datasets.
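the configuration step described in section k can be pictured as generating one template row per day of the collection period, each starting with default values and a pending attendance status. the sketch below shows this under illustrative assumptions (plain dictionaries, hypothetical field names); it is not the model’s actual implementation.

```python
# sketch of the configuration step: generate one template row per day of the
# collection period, with default values and a 'pending' attendance status.
from datetime import date, timedelta

def generate_rows(facility_id: str, ndc: int, section: str,
                  start: date, end: date) -> list[dict]:
    rows, day = [], start
    while day <= end:
        rows.append({
            "facility_id": facility_id,
            "reporting_date": day,
            "ndc": ndc,
            "medical_section": section,
            "values": None,              # default: no data uploaded yet
            "attendance": "pending",     # performance indicator initialised
        })
        day += timedelta(days=1)
    return rows

# e.g. annual collection of malaria laboratory results for one facility
rows = generate_rows("HF-001", 10001, "medical laboratory",
                     date(2019, 1, 1), date(2019, 12, 31))
print(len(rows))   # 365 daily rows awaiting upload
```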
l. data transaction
datasets are collected from health facilities according to the referenced data template. each registered health facility uses a combination of the health facility identifier and the reporting date to authenticate its reported data. the complexity of the model structure is hidden from the health facilities by mapping the registration id to the entire structure (see fig. 3), so that the main focus of the health facility is to accurately collect and report the required medical data. the data aggregator at the fourth level (dal4) is responsible for aggregating (summation, counts) the health facility’s data using the date attributes. the process is organized as follows:
• collect and aggregate data in the health facility (dal4) daily.
• use the date key to match data to the dtr and the health registration id to identify the health facility.
• identify missing and incomplete data.
• correct data errors and assign default values if needed.
• translate the result type to match the template codes.
• provide the results to the user and ask for confirmation.
• update the dtr result data and attendance status.

m. data aggregation in organizational layers
ready datasets from health facilities are used to generate the upper layers’ datasets; in particular, three additional data aggregators (das) are generated as follows:
• data is aggregated from all health facilities in a locality and the attendance performance is assigned in the localities layer (dal3).
• data is aggregated from all localities in a state and the attendance performance is assigned in the states layer (dal2).
• data is aggregated from all states and the attendance performance is assigned in the country layer (dal1).
at this point, the model provides a structured dataset for a specific disease with the ability to verify, validate, and trace the data back to the data sources; moreover, the datasets are represented in multiple dimensions, which are:
• geographical distribution (space dimension): provided by the layered structure;
• periodical distribution (time dimension): provided by articulating the data template by date ranges;
• disease-specific dimension: provided by standardizing and linking the reported data using the ndc;
• gender distribution: provided by structuring the dataset values by gender;
• age distribution: provided by using the age group categorization for datasets.

fig. 3: data transaction. legend: dtr: data template row.

v. discussion
developing countries, including sudan, are making significant steps in adopting ict in governance; however, many challenges are still outstanding in infrastructure, cost, and capacity. the health sector always faces the dilemma of dividing the budget between healthcare provision and development efforts to improve data collection, quality, and decision making. our model provides an optimization in that regard by addressing these challenges with respect to data preparation issues as well. the model provides a comprehensive solution for health data collection in the country with respect to the global consensus and guidelines on health information systems, for example by managing the disease code system and structuring records for health facilities and patient data.
v. discussion
developing countries, including sudan, are making significant steps in adopting ict in governance; however, many challenges remain outstanding in infrastructure, cost, and capacities. the health sector always faces the dilemma of dividing the budget between healthcare provision and development work to improve data collection, quality, and decision making. our model provides an optimization in that regard by addressing these challenges, with respect to data preparation issues as well. the model provides a comprehensive solution for health data collection in the country that respects the global consensus and guidelines on health information systems, for example by managing the disease code system and structuring records for health facilities and patient data.
many approaches to national data management focus on patient records, such as the aadhaar card in india and in côte d'ivoire [25]; however, models which focus on patient records rather than on the health service providers – the health facilities – may suffer from data analysis problems when it is necessary to reflect the country's geo-political status, because the human patient moves while the health facilities are fixed, and this can have a significant impact on health planning, resource management, and decision making. the ability to manage diseases, using the ndc for example, can help decision makers and health programmes in a country by providing a mechanism to monitor all diseases in one national data warehouse, with the ability to split datasets or even entire data marts while focusing on a particular disease or health programme. another contribution of the proposed model is that it combines different analysis properties early in the model design, for instance the timeline, gender, and age groups. by considering such important parameters in the data gathering phase, the analysis process can be enhanced with rich datasets that can be analysed with practical health indicators combined with geographical distribution. in addition, the model is prepared to manage historical and archival data using dates and attendance data, which can improve decision quality and data-pattern discovery. as many researchers point to the need for data quality in big data management [1, 15], developing data models that can adopt data quality measures and practices to strengthen health information systems is becoming an outstanding challenge. in this regard, the model provides a mechanism to adopt such a recommendation in data preparation by distributing the workload across the different levels, in particular the etl process, which can improve processing speed, data quality, and error correction. application of the model is flexible from the data exchange point of view. the data can be transferred using different methods, including online and offline connections, depending on the availability of networks and communications capabilities in the country. configuring data templates with dates allows data to be allocated on arrival even when connectivity is unavailable, and thus overcomes the need for pre-existing communication for immediate data exchange, which offers an aspect of cost reduction. in better economic situations, however, the model can be further improved using fast online networks without the need for significant changes.
-model implementation
implementation of the proposed model should systematically follow a top-down setup approach. first, the health authority in the country is requested to provide the required framework, policies, and procedures in order to set up the health information system for the country, for example sudan. setting up the system includes the definition of the geo-political structure of the country by dividing the country into its states and localities. a tree-view hierarchy will start to emerge, which provides a visual layout of the upcoming data and represents geographical areas with political governance.
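the geo-political hierarchy described in this first step could be represented, for illustration only, along the following lines; the class names and the registration helper are assumptions, not part of the paper's specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HealthFacility:
    facility_id: str
    name: str


@dataclass
class Locality:
    name: str
    facilities: List[HealthFacility] = field(default_factory=list)


@dataclass
class State:
    name: str
    localities: Dict[str, Locality] = field(default_factory=dict)


@dataclass
class Country:
    name: str
    states: Dict[str, State] = field(default_factory=dict)

    def register_facility(self, state: str, locality: str, hf: HealthFacility) -> None:
        """allocate a registered facility to its locality and state (layers 2-4)."""
        st = self.states.setdefault(state, State(state))
        loc = st.localities.setdefault(locality, Locality(locality))
        loc.facilities.append(hf)


# facility and place names taken from the example figures later in the paper
sudan = Country("sudan")
sudan.register_facility("khartoum", "bahry", HealthFacility("HF-001", "bahri teaching hospital"))
```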
secondly, the health authority should keep a national register of health facilities in the country, and each health facility should be allocated within the geo-political organization that was defined previously. the allocation should determine the geographical position of the health facility inside the locality layer and the state layer respectively. this step will enable health managers to visualize the health capacity and provision status in the structured area. not only is this useful for organizing health data; it also provides the benefit of giving insights about health resource allocation, gaps, and needs. at this point, the framework is implemented with four layers: country (fmoh), states, localities, and health facilities. at the bottom, the data sources (health facilities) are identified and defined as data collection points; moving up the hierarchy, three data aggregation levels are identified. moreover, data links, flows, interfaces, and trace paths are clearly visible. the third step is to define a standard disease coding system, in particular the ndc. the health authority in the country should create and code a unique list of diseases in the country in order to create a foundation for data links for health data. the ndc will be the reference key that will be instantiated, shared, and inherited between all stakeholders in the system. master medical sections should be created to match the different medical sections in health facilities, for example laboratory, medicine, and clinics (disease diagnosis). of course, health management has specialized administrations and subsections that manage each medical domain; however, a virtual representation of these sections will simplify data classification in the upcoming data management activities; moreover, it will map the actual medical section of the health facility directly to the collection model and hide the actual complex distribution of the real medical administration. in the fourth step, to start using the system, a data template should be created at the health authority level (layer 1). the data template creation will define which disease is targeted, using the ndc, for the data collection process, and this will automatically identify the targeted medical sections which were previously configured in the master medical section records; in addition, the data template will contain the time duration of the targeted data, empty result fields, and data attendance status flags. data is structured in the data template to represent different data criteria such as the patient's age intervals, gender, and result transformation references. the creation of a data template enables health administrators to define their expectations of the model output in addition to data rules and constraints. quality management measures can be created and applied, and in addition the data template (layer 1) illustrates the expected output shape of the data. in the second layer (the state layer), the data template should be instantiated by the number of states; in the context of sudan, 18 layer 2 (l2) data templates will be created, each targeting one state. in the same way, each layer 2 data template (state template) will be further instantiated by the number of localities in the designated state, and thus the layer 3 data templates are created (locality layer). each health facility should obtain a copy of the descendant layer 3 data template, which defines the data required from the health facility and its format and constraints.
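the cascade of template instances could look roughly like the following, reusing the hypothetical DataTemplate and Country classes sketched earlier in this section; the replication logic itself is an assumption made for illustration.

```python
from copy import deepcopy
from typing import Dict

# assumes the DataTemplate and Country sketches shown earlier in this section


def instantiate_templates(master: "DataTemplate", country: "Country") -> Dict[str, "DataTemplate"]:
    """copy the layer-1 master template for every state, locality, and facility."""
    instances: Dict[str, "DataTemplate"] = {}
    for state_name, state in country.states.items():                 # layer 2: one per state
        instances[state_name] = deepcopy(master)
        for loc_name, locality in state.localities.items():          # layer 3: one per locality
            instances[f"{state_name}/{loc_name}"] = deepcopy(master)
            for hf in locality.facilities:                           # layer 4: one per facility
                instances[f"{state_name}/{loc_name}/{hf.facility_id}"] = deepcopy(master)
    return instances
```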
the data template for the health facility is linked with the health facilities register and can be used as an authentication mechanism to identify the source of the data. at this point, the data templates in the model have been created and sent to the data sources, to be filled with health data. adoption of the system depends on the available ict infrastructure and resources in order to work with optimal performance. at the health facility, a daily aggregation of the data for a specific disease forms the output for a specific record in the health facility's data template – the dtr. the system is, however, flexible enough to accept many forms of collection and aggregation mechanisms for health data, depending on the ict resources currently available at each health facility. for instance, the following scenarios may occur:
• the health facility has an electronic his and the system can be directly integrated with the dtr;
• the health facility adopts the mini-emr to collect and aggregate data and upload it to the dtr;
• the health facility uses manual records, performs manual aggregation, and enters the output in the dtr;
• the health facility has no ict resources and manually aggregated data is provided to the locality layer;
• the locality layer has no ict resources and manually aggregated data is provided to the state layer;
• the state layer has no ict resources and manually aggregated data is provided to the country layer.
the data is exchanged between the layers in bidirectional mode. on the creation of the data template in the first layer (l1), instances for the state layer are created and sent down the hierarchy to the states (l2). in the same fashion, instances of the data in each l2 are created and sent to l3. each locality creates and sends data templates to all health facilities in that locality. the exchange of data can benefit from available network connectivity to automate the data transfer and provide online data exchange; however, where network coverage is unavailable, the instances are created in the corresponding layer and maintained in that layer. in addition, in the case of a network disconnection, the system can operate in offline mode and should provide a synchronization mechanism to upload or download the data when the connection is restored. depending on the available resources and technologies, the system implementation can adopt many other forms of data exchange, for example cloud computing and shared storage; however, in limited-resource environments, additional extracting and loading tools could be developed; for instance, a tool can be provided to extract and upload template data stored in spreadsheets or other file formats. data quality management should be considered in the design of the data templates in the first layer, and quality rules and constraints should also be defined and attached to the data template. this will enable the system administrators to implement shared quality policies early, to minimize the burden of the data preparation load. when collecting the data from health facilities, translation rules should be applied so that the results match the required output format of the model. next, the data aggregator is implemented inside the health facility – the data aggregator of layer 4 (dal4) – and the quality rules are checked.
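a compact sketch of that facility-level step (dal4) might look as follows; the translation table, quality rule, and dtr layout are all assumptions made for the example rather than the paper's own definitions.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple

RESULT_CODES = {"positive": "POS", "negative": "NEG"}        # translation rules (assumed)


def dal4_aggregate(dtr: Dict[Tuple[date, str], dict], facility_id: str,
                   day: date, mini_emr_entries: List[dict]) -> List[str]:
    """fold one day's mini-emr medical entries into the facility dtr; return quality warnings."""
    warnings: List[str] = []
    counts: Counter = Counter()
    for entry in mini_emr_entries:
        code = RESULT_CODES.get(entry["result"])
        if code is None:                                     # quality rule: unknown result type
            warnings.append(f"{facility_id} {day}: untranslatable result {entry['result']!r}")
            continue
        counts[(entry["ndc"], code, entry["gender"])] += 1

    for (ndc, code, gender), n in counts.items():
        row = dtr.get((day, ndc))
        if row is None:                                      # missing template row
            warnings.append(f"{facility_id} {day}: no dtr row for {ndc}")
            continue
        row.setdefault("results", {})[(code, gender)] = n
        row["attendance"] = "reported"
    return warnings


dtr = {(date(2019, 3, 1), "NDC-MAL"): {"attendance": "absent"}}
entries = [{"ndc": "NDC-MAL", "result": "positive", "gender": "f"}]
print(dal4_aggregate(dtr, "HF-001", date(2019, 3, 1), entries), dtr)
```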
in the case of data errors, the system should provide a repair mechanism or report the errors to the user for correction. the system then continues to aggregate the data in a backward (folding) fashion, from localities (dal3: from all health facilities in the locality) to states (dal2: from all localities in the state) to the country level (dal1: from all states to the health authority, fmoh). the aggregated data at all levels should maintain quality status measures which report the level of availability (attendance), accuracy, and error ratios in order to determine the degree of reliability of the collected data. after the process has been completed, the model provides hierarchical datasets that match the organizational (geo-political) structure and are ready for analysis and use. the datasets produced from the data model can have multiple usages and advantages: for instance, they can show the status of a specific disease in the country, they can be used to analyse specific interventions for a disease, and moreover they can enhance the allocation and management of resources and, of course, increase the visibility of the decision-making process. the implementation of mdacm can take many shapes depending on the ict resources available and their upgrades; moreover, a phase-based implementation can take place without the need for significant modification of the data model. in addition, the model can also operate in a limited-resources environment; in that case the model will continue to use and enhance the quality measures of data in the health information system and ensure a proper data representation of the health status in the country by treating the ict resources and their availability – even if limited – as an interim stage toward health information system optimization rather than as an impossible obstacle. the implementation of electronic health information systems at national scale is directly affected by a country's financial resources and capacities, which leads to an observable distinction between lower and higher income countries. as a consequence, many system adoption models [27] have been introduced to systematically advance the process of health system automation in developed and developing countries. one solution to minimize the time delay in this process is to use flexible data models, like the one introduced in this article. besides providing a structured mechanism toward the adoption of e-health in developing countries like sudan, the model contributes by facilitating the transformation process between medical patient records and aggregated data: while the former are required to manage patients' medical treatment and are used by medical and management staff in health facilities, the aggregated data is mainly used for public health management, resource management, and decision-making processes.
our model provides a solution to enhance the health information system by: (1) structuring an effective e-health system at country scale; (2) providing a mechanism to transform patient medical data into aggregated data for upper management; (3) facilitating the evolution of an adoption mechanism for an e-health system that considers the implementation cost, infrastructure, and capacities of lower income countries; (4) providing a starting point for data quality management to enhance the quality of the data collected in the country's health system; and lastly, (5) providing a mechanism to comply with international health recommendations such as the standardization of disease codes and the profiling of health facilities in the country [24].
-example
the example demonstrates the applicability of the proposed data model. the simulation uses mcdam for collecting and analysing the data for the large malaria outbreak in sudan from 01-jan-2019 to 31-mar-2019. the model targets all states and is based on collected laboratory medical results. first, a new data template is created and template instances are generated for all states. this is done in the first layer (l1) (see fig. 4). the empty data templates are then sub-instantiated in the next layers (l2, l3 and l4). the actual data collection process starts from the health facilities, using the proposed mini-emr model (see fig. 5), where the data is aggregated daily for each health facility (see fig. 6). after collecting the data from the health facilities, it is aggregated in a similar way from the health facilities in the locality layer (fig. 7), thereafter in the state layer (fig. 8) and finally in the master layer (fig. 9). in this simulation, a data status field is attached representing the data attendance. successful data collection for the whole specified period is indicated by a "present" status. partial collection, by contrast, is indicated by a "partial completed" status, which also states the last date of the data collection (the data date). finally, the absence of data is designated an "absent" status.
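the status field used in the example could be derived along the following lines; the helper below and its threshold logic are illustrative assumptions rather than the paper's specification.

```python
from datetime import date
from typing import List, Optional, Tuple


def attendance_status(period_days: int, reported_dates: List[date]) -> Tuple[str, Optional[date]]:
    """return ("present" | "partial completed" | "absent", last data date)."""
    if not reported_dates:
        return "absent", None
    last = max(reported_dates)
    if len(set(reported_dates)) >= period_days:
        return "present", last
    return "partial completed", last     # partially collected, report the data date


# a facility reporting only 1-12 march out of a 90-day period
status, data_date = attendance_status(90, [date(2019, 3, d) for d in range(1, 13)])
print(status, data_date)                 # partial completed 2019-03-12
```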
fig. 4: data template for malaria disease for all states in sudan from 01/01/2019 to 31/03/2019
fig. 5: use of the mini-emr to collect data for malaria disease, khartoum state, bahry locality, bahri teaching hospital, sudan, for 01/03/2019
fig. 6: aggregated data for malaria disease, khartoum state, bahry locality, bahri teaching hospital, sudan, from 01/03/2019 to 12/03/2019
fig. 7: collected data template for malaria disease, khartoum state, bahry locality, sudan, from 01/01/2019 to 31/03/2019
fig. 8: collected data template for malaria disease, khartoum state, sudan, from 01/01/2019 to 31/03/2019
fig. 9: collected data template for malaria disease for selected states in sudan from 01/01/2019 to 31/03/2019
vi. conclusions
in this article we tackled the manual data collection problem in the health sector in sudan. many challenges were discussed, including disease coding models, data collection processes, data preparation, and data quality. the proposed model provides a mechanism to collect health data in a multi-layered fashion that represents the geo-political layout of the country and thus enables data compilation that mirrors the governing structures found at national level. moreover, it provides an efficient and structured data collection method with data quality and preparation measures implemented to minimize the processing effort, cost, and time consumption in this regard. the concept of measuring data attendance is used to detect the status of the data early on and thus to improve data analytics and decision making.
references
[1] r. dubey, a. gunasekaran, s. j. childe, s. f. wamba, t. papadopoulos, "the impact of big data on world-class sustainable manufacturing", vol. 84, pp. 631-645, 2016.
[2] g. walt, j. shiffman, h. schneider, s. f. murray, r. brugha, l. gilson, "doing health policy analysis: methodological and conceptual reflections and challenges", health policy and planning, vol. 23, 2008.
[3] h. busse, e. a. aboneh and g. tefera, "learning from developing countries in strengthening health systems: an evaluation of personal and professional impact among global health volunteers at addis ababa university's tikur anbessa specialized hospital (ethiopia)", 2014.
[4] d. j. casley, d. a. lury, "data collection in developing countries", 1980.
[5] a. skoogh and j. björn, "time-consumption analysis of input data activities in discrete event simulation projects", vol. 1, 2007.
[6] r. ahmed, r. robinson, a. elsony, r. thomson, s. b. squire, r. malmborg, p. burney, … k. mortimer, "a comparison of smartphone and paper data-collection tools in the burden of obstructive lung disease (bold) study in gezira state, sudan", plos one, 2018.
[7] m. n. sarkies, k. a. bowles, e. h. skinner, d. mitchell, r. haas, m. ho, k. salter, k. may, d. markham, l. o'brien, s. plumb, t. p. haines, "data collection methods in health services research: hospital length of stay and discharge destination", applied clinical informatics, vol. 6(1), pp. 96-109, 2015.
[8] t. c. redman, data quality for the information age, norwood, ma, usa: artech house, 1996.
[9] w. w. eckerson, "data quality and the bottom line: achieving business success through a commitment to high quality data", 2002.
[10] trifacta, global organizations wasting billions of dollars on data preparation. [online]. available: https://globenewswire.com/news-release/2018/05/17/1508217/0/en/global-organizations-wasting-billions-of-dollars-on-data-preparation.html, accessed january 2019.
[11] s. d. yawson, g. ellingsen, "assessing and improving ehrs data quality through a socio-technical approach", procedia computer science, vol. 98, pp. 243-250, 2016.
[12] e. rahm, h. h. do, "data cleaning: problems and current approaches", 2000. [online]. available: http://dbs.uni-leipzig.de
[13] s. zhang, c. zhang, "data preparation for data mining", vol. 17, pp. 375-381, 2003.
[14] s. k. kwak and j. h. kim, "statistical data preparation: management of missing values and outliers", korean journal of anesthesiology, 70(4), pp. 407-411, doi:10.4097/kjae.2017.70.4.407, 2017.
[15] h. aguinis, n. s. hill and j. r. bailey, "best practices in data collection and preparation: recommendations for reviewers, editors, and authors", organizational research methods, https://doi.org/10.1177/1094428119836485, 2019.
[16] s. kanchi, s. sandilya, s. ramkrishna, s. manjrekar and a. vhadgar, "challenges and solutions in big data management", 3rd international conference on future internet of things and cloud, rome, pp. 418-426, 2015.
[17] l. omran, v. c. storey and r. y. wang, "systems approaches to improve data quality", 1995.
[18] kristian, r. filipe, c. p.
francisco, "mobility patterns, big data and transport analytics", pp. 73-106, isbn 9780128129708, https://doi.org/10.1016/b978-0-12-812970-8.00005-1, 2019.
[19] b. heinrich, m. kaiser and m. klier, "how to measure data quality? a metric-based approach", 28th international conference on information systems (icis), queen's university, montreal, canada, 2007.
[20] c. cappiello, c. cerletti, c. fratto, and b. pernici, "validating data quality actions in scoring processes", journal of data and information quality, vol. 9, 2018.
[21] m. d. angeles and g. u. francisco, "a data quality practical approach", pp. 259-274, 2009.
[22] svetlozar n., j. boris, j. nenad, s. abhishek, r. sippo rossi, "generating insights through data preparation, visualization, and analysis: framework for combining clustering and data visualization techniques for low-cardinality sequential data", vol. 125, https://doi.org/10.1016/j.dss.2019.113119, 2019.
[23] n. konstantinou and n. w. paton, "feedback driven improvement of data preparation pipelines", 2019, https://doi.org/10.1016/j.is.2019.101480.
[24] (2017) world health organization, health systems. [online]. available: http://www.who.int/topics/health_systems/en/
[25] r. wyber, s. vaillancourt, w. perry, p. mannava, t. folaranmi and l. a. celi, "big data in global health: improving health in low- and middle-income countries", vol. 93, pp. 203-208, 2015.
[26] world health organization, health metrics network framework and standards for country health information systems, january 2008. [online]. available: https://www.who.int/healthinfo/country_monitoring_evaluation/who-hmn-framework-standards-chi.pdf
[27] (2012) world health organization, management of patient information: trends and challenges in member states. [online]. available: https://apps.who.int/iris/bitstream/handle/10665/76794/9789241504645_eng.pdf;jsessionid=a98788c197b17146b2c795e57d4c89cb?sequence=1
[28] j. t. bram, b. warwick-clark, e. obeysekare, and k. mehta, "utilization and monetization of healthcare data in developing countries", pp. 59-66, doi: 10.1089/big.2014.0053, 2015.
[29] (2017) human rights & health equity office, guide to demographic data collection in health-care settings, sinai health system. [online]. available: http://torontohealthequity.ca/wp-content/uploads/2017/10/measuring-health-equity-guide-to-demographic-data-collection.pdf
[30] a. e. monge, "matching algorithms within a duplicate detection system", 2000.
[31] clinical data classification. [online]. available: http://guides.lib.uw.edu/hsl/data/findclin
[32] r. pelánek, j. řihák, j. papoušek, "impact of data collection on interpretation and evaluation of student models", the sixth international conference on learning analytics & knowledge, pp. 40-49, 2015.
[33] (2007) the world bank, the world bank strategy for hnp results annex. [online].
available: http://documents.worldbank.org/curated/en/102281468140385647/healthy-development-the-world-bank-strategy-for-health-nutrition-population-results
[34] k. bhalla, j. e. harrison, s. shahraz, l. a. fingerhut, "availability and quality of cause-of-death data for estimating the global burden of injuries", vol. 88, pp. 831-838, 2010.
[35] a. d. black, j. car, c. pagliari, c. anandan, k. cresswell, t. bokun, "the impact of ehealth on the quality and safety of health care: a systematic overview", plos med 8(1): e1000387, https://doi.org/10.1371/journal.pmed.1000387, 2011.
[36] a. e. powell, h. t. o. davies, r. g. thomson, "using routine comparative data to assess the quality of health care: understanding and avoiding common pitfalls", bmj quality & safety, 12:122-128, 2003.
[37] j. thuma, "practical approaches to data quality management in business intelligence and performance management", 2009.
[38] world health organization, creating a master health facility list. [online]. available: https://www.who.int/healthinfo/systems/who_creatingmfl_draft.pdf
[39] m. c. azubuike and j. ehiri, "health information systems in developing countries: benefits, problems, and prospects", the journal of the royal society for the promotion of health, vol. 119, 1999.
[40] a. m. abd-alrhman, l. ekenberg, "modelling health information systems during catastrophic events – a disaster management system in sudan", 2017 ist-africa week conference (ist-africa), windhoek, pp. 1-9, doi: 10.23919/istafrica.2017.8102390, 2017.

availability of e-commerce support for smes in developing countries
mahesha kapurubandara1*, robyn lawson2
1, 2 locked bag 1797, penrith south dc, nsw 1797, australia. mahesha@scm.uws.edu.au, r.lawson@uws.edu.au
* corresponding author
the international journal on advances in ict for emerging regions 2008 01 (01) : 3 11
revised: 11 september 2008; accepted: 21 july 2008
abstract: although research indicates that e-commerce offers viable and practical solutions for organizations to meet the challenges of a predominantly changing environment, the few available studies related to smes in developing countries reveal a delay or failure of smes in adopting ict and e-commerce technologies. the various factors identified as causes for this reticence can be broadly classified as internal barriers and external barriers. this paper presents a model for barriers to adoption of ict and e-commerce based on the results of an exploratory pilot study, a survey, and interviews with sme intermediary organizations. it identifies support for smes in sri lanka with regard to ict and e-commerce. it also determines a strong need for the necessary support and discusses the availability of that support. keywords: e-commerce, sme, adoption, developing countries, barriers, support.
introduction
developing countries have the potential to achieve rapid and sustainable economic and social development by building an economy based on an ict-enabled and networked sme (small and medium-sized enterprise) sector, capable of applying affordable yet effective ict solutions [1]. it is accepted that e-commerce contributes to the advancement of sme business in developing countries [2]. with the development of ict and the shift to a knowledge-based economy, e-transformation and the introduction of ict are becoming increasingly important tools for smes, both to reinvigorate corporate management and to promote growth of the national economy [1].
e-commerce technologies enable organizations to improve their business processes and communications, both within the organization and with external trading partners [3]. however, the adoption of ict and e-commerce in developing countries has fallen below expectations [2], as these countries face unique and significant challenges in adopting ict and e-commerce [4]. nevertheless, it is imperative for smes to adopt e-commerce technologies to survive in intensely competitive national and global markets. the sme sector plays a significant role in its contribution to the national economy in terms of the wealth created and the number of people employed [5]. forging ahead, smes need to accept the challenges, including the barriers, as they move towards successful adoption of available technologies, while raising awareness of relevant support activities and preserving their limited resources to avoid severe repercussions from costly mistakes. this paper contributes to the ability to understand the factors that inhibit ict and e-commerce adoption by smes in sri lanka, a developing country on its way to an e-society. believing that research findings from sri lanka will prove useful for other developing countries, it explores how best the barriers could be overcome by way of support activities. the paper first outlines current research into adoption in developing countries, discussing adoption models from previous research, and presents a framework established for use in this research. the research methodology and the results are subsequently discussed.
theoretical framework
smes in sri lanka
smes everywhere play a critical role in economic development, and sri lanka is no exception. countries use different parameters to define smes, referring to the number of employees, the amount of capital invested, or the amount of turnover [6]. in sri lanka a clear definition of an sme is absent, with government agencies using different criteria to define smes [6, 7]. the national development bank (ndb), the export development board (edb), and the industrial development board (idb) use the value of fixed assets as the criterion, whereas the department of census and statistics (dcs), small and medium enterprise development (smed), and the federation of chambers of commerce and industry (fdcci) use the number of employees [7]. following the world bank definition, for this study we consider enterprises with 10-250 employees as smes [6]. the 2004 mission statement of the international labour organization (ilo) reported that 75% of sri lanka's labour force was employed in the sme sector, underlining smes' contribution to employment and income generation. the domestic market is the main outlet for smes. smes are also sub-contracted to large exporters, with larger entrepreneurs coordinating direct exports, as is seen with coir-based products, wood, handicrafts, plants and foliage. if sri lanka wishes to ride high on the electronic highway it should provide sri lankan smes 'a ramp to the digital highway' and stimulate e-commerce.
this is supported by the government's e-sri lanka vision, championed by the information and communication technology agency of sri lanka (icta), which aims to harness ict as a lever for economic and social advancement.
barriers to ict and e-commerce adoption by smes
developing countries face formidable barriers in getting on to the electronic highway. it is nevertheless encouraging to note existing research that identifies barriers across a variety of factors grouped into several categories. a number of authors [8, 9] group such factors into three major categories: owner/manager characteristics, firm characteristics, and costs and return on investment. support for smes to adopt e-commerce technologies demands consideration of each of these categories. given the diversity among owner/managers, who are the decision makers for smes, many of the factors affecting adoption of e-commerce technologies relate to owner/manager characteristics. a significant factor here is little or no knowledge, firstly of the technologies and secondly of the benefits of such technologies; this is a major barrier to the take-up of e-commerce [10]. lack of knowledge of how to use the technology, low computer literacy, mistrust of the it industry and lack of time also hinder adoption. sme owners, concerned about a return on their investments, are reluctant to make substantial investments, particularly since short-term returns are not guaranteed [11]. other factors, such as the current level of technology usage within the organization, which relates to the characteristics of the organization, also affect adoption of e-commerce [10]. the organization for economic co-operation and development (oecd) (1998) identified lack of awareness, uncertainty about the benefits of electronic commerce, concerns about lack of human resources and skills, set-up costs and pricing issues, and concerns about security as the most significant barriers to e-commerce for smes in oecd countries. low use of e-commerce by customers and suppliers, concerns about security, concerns about legal and liability aspects, high costs of development, limited knowledge of e-commerce models and methodologies, and unconvincing benefits to the company are among the other factors [12]. smes have decidedly limited resources (financial, time, personnel). this "resource poverty" has an effect on adoption, as they cannot afford to experiment with technologies and make expensive mistakes [13].
barriers to e-commerce in developing countries
if governments believe that e-commerce can foster economic development, it is necessary to identify the inherent differences in developing countries, with their diverse economic, political, and cultural backgrounds, to understand the process of technology adoption [9]. studies of sme e-commerce issues in developed countries [14-16] indicate that the issues faced there can be totally different. organizations adopting ict and e-commerce in developing countries face problems such as: lack of telecommunications infrastructure, lack of qualified staff to develop and support e-commerce sites, lack of the skills consumers need in order to use the internet, lack of timely and reliable systems for the delivery of physical goods, low bank account and credit card penetration, low income, and low computer and internet penetration [4, 17, 18].
lack of telecommunications infrastructure includes poor internet connectivity, lack of fixed telephone lines for end-user dial-up access, and the underdeveloped state of internet service providers. disregard for e-commerce is not surprising in sri lanka, where shopping is a social activity and face-to-face contact is regarded as important. distrust of what businesses do with personal and credit card information, in countries where there may be good justification for such distrust, could become a serious obstacle to e-commerce growth [18, 19]. the absence of legal and regulatory systems inhibits the development of e-commerce in developing countries. a study of sme adoption of e-commerce in south africa found that adoption is heavily influenced by factors within the organization [12]: lack of access to computers, software/hardware and affordable telecommunications; low e-commerce use by supply chain partners; concerns with security and legal issues; low knowledge levels of management and employees; and unclear benefits from e-commerce were found to be major factors that inhibit adoption. a similar study in china found that limited diffusion of computers, the high cost of internet access and the lack of online payment processes directly inhibit e-commerce, while inadequate transportation and delivery networks, limited availability of banking services, and uncertain taxation rules inhibit it indirectly. a study in egypt [20] found that the main contributory factors to non-adoption include: awareness and education, market size, e-commerce infrastructure, telecommunications infrastructure, financial infrastructure, the legal system, the government's role, pricing structures, and social and psychological factors. a comparison of two studies in argentina and egypt suggests that the key factors affecting e-commerce adoption in developing countries are awareness, telecommunication infrastructure, and cost. the internet and e-commerce issues of smes in samoa are consistent with the studies conducted in other developing countries [21]. studies in sri lanka revealed the inhibiting factors to be: lack of knowledge and awareness of the benefits of e-commerce, the current unpreparedness of smes to adopt e-commerce as a serious business concept, insufficient exposure to it products and services, language barriers and lack of staff with it capability [7]. web-based selling was not seen as practical, as there is limited use of internet banking and web portals, as well as inadequate telecommunications infrastructure [7]. thus, the available literature reveals significant factors, dealing with internal and external barriers, that can be grouped to develop a framework for investigating adoption of e-commerce technologies. internal barriers: factors the sme can control, categorized into individual (owner/manager), organizational, and barriers related to cost or return on investment. external barriers: those that cannot be resolved by the sme organization, which is compelled to work within the constraints; inadequate telecommunication infrastructure and the legal and regulatory framework are examples. these can be further subdivided into infrastructure-related, political, social and cultural, and legal barriers. some external barriers could be addressed by clustering: sharing expenses, resources and facilities.
figure 1: conceptual model: barriers to adoption
research methodology
since empirical research in this area is limited, an exploratory investigation utilizing qualitative and quantitative evidence was considered most suitable. the research centred on smes in the colombo district, which has the highest density of companies using ict. the colombo district was the base for the investigation, with sme selection requiring an employee strength of 10-250 employees and organizations that were not totally immature but somewhat versatile in the use of ict and e-commerce. the study combined preliminary pilot interviews, a survey, and interviews with sme intermediary support organizations; according to mingers [22], the use of such multiple methods is widely accepted as providing increased richness and validity to research results, and better reflects the multidimensional nature of complex real-world problems. besides, a multi-method approach allows the benefits of both qualitative and quantitative methods to be combined, and permits empirical observations to guide and improve the survey stage of the research [23, 24]. the preliminary pilot interviews surfaced the barriers most pressing for smes; the model (figure 1) and the survey instrument were formed from the outcomes of these interviews and observations, supported by an extensive literature review. the survey and interviews with intermediary support organizations followed. face-to-face interviews were semi-structured to gather qualitative empirical data and provide flexibility [25], as they allow researchers to explore issues raised by respondents, which is generally not possible through questionnaires or telephone interviews. the research was carried out in three stages: stage 1 was the pilot exploratory study with smes, stage 2 the survey of sme organisations using a questionnaire, and stage 3 the interviews with intermediary sme organisations. the research approach has been discussed elsewhere [26] and therefore it suffices to discuss the results in the following section.
results discussion
barriers: pilot interviews: a majority (88%) of respondents ranked lack of awareness as the most significant barrier. this can be attributed to the fact that the majority of owner/managers described themselves as only basically computer literate. knowledge of available technologies, or of their suitability for effective use towards improved productivity, was negligible, and they appeared confused by the choices in software and hardware. computers were underutilized, with ad-hoc purchases and isolated implementations overshadowing any ict strategy; this is a major concern, since the owner/managers are the decision makers. next was the cost of internet access, equipment and e-commerce implementation. inadequate telecom infrastructure, chosen by 83%, was the third most frequently cited barrier, chosen mostly by those more advanced in their usage of ict, using e-mail and the internet, who were more likely to have experienced problems. an unstable economy, political uncertainty, lack of time, channel conflict, lack of information about e-commerce and lack of access to expert help were cited as barriers by 70% of respondents. analysis of survey data: more than 75% of the respondents (96% males and 4% females) were either professionally qualified or graduates. of the tables produced below, table 1 identifies the top 6 internal barriers of the 9 listed, table 2 shows external barriers, divided into cultural, infrastructure, political, social, and legal and regulatory barriers, and tables 3 and 4 illustrate the internal and external support needed.
analysis of the survey results reveals that lack of skills, lack of awareness of benefits, and return on investment prevent smes from adopting ict and e-commerce technologies; this is reinforced by "awareness and education" being ranked top for support by nearly 90% of the respondents, which is not surprising for a developing country like sri lanka trying to implement these technologies. it reflects on other internal barriers too, and awareness and education can, to a great extent, counter this barrier. since the use of ict in sri lanka is low, e-commerce faces inhibition and is not seen to suit business transactions.
table 1: internal barriers to using or extending use of ict & e-commerce
internal barriers | mean | std | n | %
staff lack required skills | 3.88 | 1.35 | 120 | 66.6
security concerns with payments over the internet | 3.64 | 1.28 | 118 | 66.9
e-commerce cannot give a financial gain | 3.64 | 1.24 | 108 | 62.0
n = number of organisations
"lack of popularity in online marketing" and "low internet penetration" rate high in the list of external barriers; improving ict diffusion in sri lanka can address this problem. 'inadequate infrastructure' impedes smes, as reinforced by their request for "improvement of national infrastructure" ranking very high among the support needed. smes in sri lanka are adversely affected by the high cost and unreliable service of infrastructure services such as electricity and telecommunications. the steps taken by the government to improve telecommunication facilities by breaking the telecom monopoly are noteworthy. policy inertia and the lack of a legal and regulatory framework also rank high and impose constraints on smes. policy reforms introduced by governments support the large export-oriented foreign direct investments, leaving smes with ad-hoc policy prescriptions and weak institutional support [27]. the government's role in an overly bureaucratic regulatory system results in delays in its deliberations and is extremely costly [27]. an appropriate legal and regulatory framework would ensure that smes operate on a level playing field. social barriers come next. a one-stop shop facility would help smes access information, technology, markets, and much-needed credit facilities; this concept, implemented for export-oriented foreign direct investments (eofdi) by the board of investments (boi), was found to be successful. senior management lacking ict knowledge, being the policy makers working towards the progress of smes, is identified as an important constraint directly impacting the operational efficiency of smes. awareness building and education with regard to ict and related technologies would help to alleviate this problem. government, academia, and industry sectors can take leadership roles in the promotion of ict by conducting awareness and training programs, technical and non-technical, catering to the needs of smes at grass-roots level. smes place a very heavy reliance on external advice and support, yet such support and advice seem unavailable. perceptions of the sme intermediaries: the intermediaries, with a consensus for awareness building programs at national level, agree that lack of awareness and lack of skills are major barriers to smes adopting technologies. training programs, workshops and seminars conducted in the local language need to be especially designed for smes at grass-roots level. the absence of a "one-stop shop" for advice and support is de-motivating and affects smes.
it is fundamental to educate the senior management of government organizations prior to providing support for smes with ict and e-commerce. smes need not only ict technologies but also quality control and standards. inter-institutional coordination, staff development, and institutional capacity are also vital. much effort seems to be replicated and wasted, with public sector, private and non-governmental sme intermediary organizations working in isolation. the government is best equipped to reach rural smes at grass-roots level. tapping and utilizing all available strengths in a more coordinated manner would prove much more productive.
analysis of barriers and support
this section discusses the extent to which the barriers are addressed by the support provided by the sme intermediary organizations. tables 5 and 6 below illustrate the barriers that were ranked as most significant, table 7 identifies the support required to alleviate the barriers, and table 8 indicates whether support is available from the sme intermediary organisations. barriers: the internal barrier "employees lack the required skills" is ranked highest in the list. the interviews with the intermediary organizations reveal that this barrier is addressed only partially: while they admit that smes need skills training from the grass-roots level, they are not in a position to deliver that support as they have neither the resources nor the mechanism to address the barrier. "security concerns with payments" ranked next on the list; support is not available from the intermediaries and they are not in a position to provide any support in this regard. the next two barriers on the list can be addressed with awareness building programs, but the intermediaries do not seem to be addressing that fully.
table 2: external barriers to using or extending use of ict & e-commerce
external barriers | mean | std | n | %
cultural barriers:
lack of popularity for online marketing and sales | 3.56 | 1.28 | 120 | 62.5
infrastructure barriers:
low internet penetration in the country | 3.78 | 1.09 | 125 | 71.2
inadequate quality and speed of lines | 3.63 | 1.06 | 130 | 70.8
inadequate infrastructure in the country | 3.52 | 1.22 | 125 | 62.4
political barriers:
unstable economic climate in the country | 3.73 | .971 | 135 | 73.3
changing regulations with each government change | 3.72 | 1.12 | 135 | 71.9
social barriers:
lack of information on e-commerce | 3.59 | 1.04 | 133 | 69.1
no one-stop shop facility | 3.50 | 1.19 | 127 | 54.3
no access to reliable expert help | 3.25 | 1.10 | 130 | 52.8
senior management in other sectors lack ict knowledge | 3.24 | 1.05 | 123 | 52.8
legal & regulatory barriers:
little support for smes from government and industry associations | 3.7 | .96 | 128 | 64.0
inadequate legal framework for business using e-commerce | 3.68 | .98 | 121 | 64.5
no simple procedures and guidelines | 3.67 | 1.10 | 128 | 65.6
lack of suitable software standards | 3.51 | 1.10 | 128 | 53.9
n = number of organisations
table 3: internal support for smes to use or extend use of ict & e-commerce
internal support | mean | std | n | %
awareness and education | 3.91 | .87 | 132 | 79.9
guidance in overcoming risks associated with implementation | 3.86 | .92 | 129 | 78.0
guidelines for appropriate hardware and software | 3.78 | .88 | 134 | 72.4
advice and direction for ict and e-commerce | 3.70 | .91 | 135 | 70.4
n = number of organisations
table 4: external support for smes to use or extend use of ict & e-commerce
external support | mean | std | n | %
improve national infrastructure | 4.04 | .76 | 130 | 84.6
provide financial assistance | 3.97 | .81 | 135 | 78.5
provide tax incentives | 3.97 | .92 | 132 | 80.3
improve ict diffusion | 3.95 | .83 | 130 | 80.8
government & industry sector to take leadership/promotion role | 3.91 | .91 | 134 | 75.4
improve collaboration among smes | 3.86 | 1.04 | 133 | 69.1
improve low bank account and credit card penetration | 3.83 | .81 | 123 | 72.4
enforce suitable software standards | 3.8 | .97 | 132 | 74.3
n = number of organisations
table 5: internal barriers
internal barriers | required support | is support available?
employees lack the required skills | training | minimal
security concerns with payments over the internet | legal framework | no
e-commerce cannot give any financial gains | awareness | minimal
e-commerce not suited to the way we do business | awareness | minimal
table 6: external barriers
external barriers | required support | is support available?
lack of popularity for online marketing & sales | awareness | no
low internet penetration in the country | awareness and infrastructure | no
poor speed & quality of line with telecommunications | infrastructure | no
inadequate infrastructure in the country | infrastructure | no
relatively high cost of internet access | infrastructure | no
unreliable power supply | infrastructure | no
unstable economic climate in the country | | no
constant change of rules and regulations | | no
the lack of available information on e-commerce | awareness | partially
no one-stop shop facility for services | one-stop shop | no
senior management lacking in ict knowledge | training | minimal
no access to reliable expert help | consultancy | minimal
little support from government/industry with policies | legal framework | no
inadequate legal framework for using e-commerce | legal framework | no
no simple procedures and guidelines | legal framework | no
lack of suitable software standards | legal framework | no
table 6 above illustrates that, in order to resolve the external barriers, support with infrastructure, awareness building, education and training, and consultancy is required. the sme intermediaries are helpless to provide the support relating to infrastructure and the legal framework, as government intervention is required. the lack of available information and the lack of popularity of e-commerce can both be addressed by appropriate awareness building programs. even though such support is available to a certain extent, the sme intermediaries are not capable of providing full support to address this barrier, due to the lack of a proper mechanism to reach smes at grass-roots level. even though the intermediaries are making an effort to generate awareness, they also seem to be hindered by finances and resources and by the lack of properly formulated strategies and coordinated programs. support: the survey results revealed the support strongly requested by the smes. tables 7 and 8 below illustrate the support that was ranked as most significant, within the organization (internal support) and outside the organization (external support). they identify the support required to assist the smes and also whether that support is available from the sme intermediary organizations.
table 7: internal support
internal support requested by sme | is support available?
guidance to overcome risks with implementing | no
awareness building/educating in ict & e-commerce | partially
assist smes with guidelines for hardware and software | no
advice & direction with regard to ict & e-commerce | no
guidance to overcome the risks of implementing e-commerce ranked highest in the list of support for smes. they need support in every aspect of the implementation of e-commerce, starting with knowledge, technical, management, and consultancy support. this support is not available from the sme intermediary organizations. assistance with hardware and software, and also advice and direction, is minimal and almost non-existent. this could also be attributed to the fact that the main focus of the intermediaries is first and foremost to elevate the standard of smes in general; adoption of e-commerce has taken a back seat in view of the other pressing problems.
table 8: external support
external support | is support available?
improving national infrastructure | no
provide some form of financial assistance to help smes | no
government & industry sector to take leadership & promotion | no
provision of tax incentives | no
improve low computer & internet penetration | no
improve low bank account & credit card penetration | no
enforce suitable software standards | no
improve collaboration among the smes | no
the table above shows the support required from outside the organization: external support. other than the item listed last, "improve collaboration among smes", the intermediaries are not capable of providing support for the other items; they need to liaise with the government to provide such help to the sme organizations. the evidence from the above tables is informative.
a few barriers, namely awareness creation and training, appear to be addressed to a certain extent, while the majority of the barriers are either disregarded or totally neglected by the participating sme intermediary organizations. it is also evident that, where such support is available, it is restricted to the urban areas. further, there appears to be a disparity between the smes' requirements and the support available from the sme intermediaries. the sme intermediaries, on the other hand, seem to have trouble meeting their own objectives. apparently, this hinders progress in the intermediaries' efforts to assist smes. perhaps this drawback can be attributed to uncoordinated efforts that lack a proper strategy, to frequent changes of government resulting in changes of rules and regulations, to a lack of interest from the authorities concerned, or even to lethargy in both the public and the private sectors towards heavy investment in what is often seen as an unstable economic environment.
challenges facing smes
the objectives of this study were to understand and determine the importance of internal and external barriers, and the support required to overcome them. the importance of the barriers shows that smes are severely hindered by external barriers, and the internal and external support required reveals that there is a strong request for it. one difference between adoption patterns in developing and developed countries concerns the support activities needed during development: support is available in developed countries, and it is a matter of finding the appropriate support for an sme encountering barriers, whereas in developing countries this support is almost non-existent. another difference centres on the external barriers identified, such as the need to improve the national telecommunications infrastructure. this research contributes by identifying the absence of a government and industry coordinated approach to providing support for smes, and the failure to address problems at a grass-roots level. in addition, an initial framework for e-transformation of smes in a developing country is proposed for trial towards validation.
next on the agenda
next, further statistical analysis of the survey data will attempt to validate the initial outcomes, test construct validity, and test assumptions. the barriers and support that predominate at various levels of sophistication need to be determined to provide unique perspectives for examining and understanding the issues. problems need to be prioritised at different levels to enable smes to better equip themselves to progress through e-transformation. finally, the initial framework will be trialled with case study organizations.
conclusion
this study provides an understanding of the challenges faced by smes in the adoption of ict and e-commerce in developing countries. assessing and determining the current levels of ict and e-commerce sophistication of smes, it examines the barriers impeding smes while identifying the support required for e-transformation. the conceptual model developed identifies internal and external limitations while assessing the support necessary to overcome the obstacles. the results of the exploratory interviews and the survey clearly indicate the necessity of providing support to smes if they are to successfully adopt ict and e-commerce. the identified barriers, both external and internal, are found to impede sme uptake of ict and e-commerce; accordingly, the support necessary to overcome or alleviate the barriers discovered also had to be recognised.
this support, in the form of suggestions, was later confirmed in a series of interviews carried out with sme intermediaries, whose task is to provide some support but who agree that many internal and external barriers prohibit the uptake of ict or e-commerce by smes. the intermediaries go further with their observations: they believe and confirm that smes lack the strength or the capacity to address these barriers on their own. the little support currently extended by intermediary organizations seems inadequate; besides, the available support programmes are incapable of meeting sme requirements. it was not surprising, therefore, to note that some smes were even unaware of the existence of the intermediaries, let alone the support programmes offered by them. apparently, only a few smes have opted to receive assistance from the intermediaries. the lack of support, identified as an important outcome of the sme intermediary interviews, is a possible factor contributing to the slow uptake of ict and e-commerce, as explained by the intermediaries themselves. their projects do not seem to be sufficiently geared towards the needs of the smes. moreover, it is apparent that the activities of such bodies are seen as uncoordinated and bureaucratic. this is in agreement with previous research [28]. in an age where information and technology combine to evolve new and emerging technologies that are speedily snapped up by the developed world for its betterment, it is sad to see the developing world trailing behind for want of the necessary financial and other support. in such a scenario, it is vital that both industry and government step in with the correct advice and support to help smes with their uptake of e-commerce. one of the major outcomes of the study presented in this paper is the necessity to review current initiatives aimed at promoting ict and e-commerce with smes and to develop strategies with a systematic focus to help smes e-transform their organisations. this information can be fed to the relevant government authorities to assist them with strategy formulation.
references
1. undp (2004). unctad e-commerce and development report 2004.
2. unctad (2001). e-commerce and development report.
3. chong s. (2007). business process management for smes: an exploratory study of implementation factors for the australian wine industry. journal of information systems and small business, 1(1-2): 41-58.
4. marshall p., sor r., and mckay j. (2000). an industry case study of the impacts of electronic commerce on car dealerships in western australia. journal of electronic commerce research, 1(1): 1-16.
5. rashid m. a. and al-qirim n. a. (2001). e-commerce technology adoption framework by new zealand small to medium size enterprises. research letters information mathematical science, 2(1): 63-70.
6. cooray m. n. r. (2003). walk through cleaner production assessment in smes: a case study from sri lanka, small to medium enterprise developers.
7. slbdc (2002). survey of electronic commerce implementation in the sme sector in sri lanka, sri lanka business development centre: colombo.
8. chau s. b. and turner p. (2001). a four phase model of ec business transformation amongst small to medium enterprises. 12th australian conference on information systems, coffs harbour, australia.
9. mehrtens j., cragg p. b. and mills a.
(2001). a model of internet adoption by smes. information and management, 39(3): 165-176. 10. iacovou c.l., benbasat i. and dexter a.s. (1995). electronic data interchange and small organizations: adoption and impact of technology. mis quarterly, 19(4): 465-485. 11. akkeren v. j. and cavaye a. l. m. (1999). factors affecting entry-level internet technology adoption by small business in australia: evidence from three cases. journal of systems and information technology, 3(2): 33-48. 12. cloete e., courtney s. and fintz j. (2002). small business acceptance and adoption of e-commerce in the western-cape province of south-africa. electronic journal of information systems in developing countries, 10(4): 1-13. 13. ebpg (2002). eeurope go digital: benchmarking national and regional e-business policies for smes. final report of the e-business policy group, european commission, enterprise directorate general, brussels, 28 june 2002. 14. corbitt b., behrendorf g. and brown-parker j. (1997). small and medium-sized enterprises and electronic commerce. the australian institute of management, 14(204-22). 15. huff s. and yoong p. (2000). smes and e-commerce: current issues and concerns. a preliminary report, international conference on e-commerce, kuala lumpur, malaysia. 16. oecd (1998). smes and electronic commerce. ministerial conference on electronic commerce, ottawa, canada. 17. anigan g. (1999). views on electronic commerce. international trade forum, 2: 23-27. 18. bingi p., mir a. and khamalah j. (2000). the challenges facing global e-commerce. information systems management, 17(4): 26-35. 19. elkin n. (2001). online privacy and security in latin america. 20. el-nawawy m.a. and ismail m.m. (1999). overcoming deterrents and impediments to electronic commerce in light of globalisation: the case of egypt. 9th annual conference of the internet society, inet 99, san jose, usa. 21. schmid b., stanoevska-slabeva k. and tschammer v. (2001). towards the e-society: e-commerce, e-business, e-government. zurich, switzerland. 22. mingers j. (2001). combining is research methods: towards a pluralist methodology. information systems research, 12(3): 240-259. 23. gable g.g., dharshana s. and taizan c. (2003). enterprise systems success: a measurement model. 24th international conference on information systems 2003. 24. gallivan m. j. (2001). organisational adoption and assimilation of complex technological innovations: development and application of a new framework. the database for advances in information systems, 32(3): 51-85. 25. sekaran u. (2000). research methods for business: a skill building approach. 3rd edition. john wiley & sons, inc. 26. kapurubandara m. and lawson r. (2007). smes in developing countries need support to address the challenges of adopting e-commerce technologies. 20th bled econference emergence: merging and emerging technologies, processes, and institutions, bled, slovenia, june 4-6. 27. task force (2002). national strategy for small and medium enterprise sector development in sri lanka. task force for small & medium enterprise sector development program, reports prepared under the ips leadership for the gosl. 28. lewis r. and cockrill a. (2002). going global, remaining local: the impact of e-commerce on small retail firms in wales. international journal of information management, 22(3): 195-209.
international journal on advances in ict for emerging regions 2011 04 (02) : 37 51 ict leap-frogging enabled by cloud computing for emerging economies: a case study on streamlining india's grain supply chains jacob tsao, shailaja venkatsubramanyan, shrikant parikh and prasenjit sarkar  abstract—cloud computing enables use of computing without user-side hardware, software and associated financial and knowledge requirements, except for the need of a simple "web-terminal" and access to the internet. emerging economies can fully capitalize on this to start a "revolution" and leap-frog developed nations by skipping completely the several major computing architectures gone through in developed nations. an analogy is the leap-frogging that has taken place with wireless telephony. we briefly introduce our research into streamlining india's grain supply chains and then discuss in detail how cloud computing can play a pivotal role. although the focus is on the portion of the supply chains between wholesalers and consumers, we also discuss how cloud computing can streamline the entire supply chains and how it offers leap-frogging opportunities for the entire society. the proposed use of web-terminals for accessing the cloud for all computing needs is new, and we point out several high-impact research subjects. index terms—cloud computing, web-terminal, wireless access, leap-frogging, grain supply chains, logistics, hubbing, branding, developing nations, india i. introduction cloud computing has been motivated in developed nations to support the new paradigms of software as a service, hardware as a service, platform as a service, etc. [1,2]. however, it may be more beneficial to emerging economies like india. this is because countries like india can leap-frog developed nations in use of computing by directly migrating to the paradigm of cloud computing and skipping the less efficient intermediate computing architectures. note that this direct migration can virtually free users from the need to own computing hardware (stand-alone or networked personal computers) and client-side software. manuscript received on march 01, 2011. accepted july 16, 2011. jacob tsao is professor of industrial and systems engineering at san jose state university. he taught and conducted research at s.p. jain institute of management and research of mumbai, india for four months during his sabbatical leave in the 2009 – 2010 academic year. (jacob.tsao@sjsu.edu) shailaja venkatsubramanyan is associate professor of management information systems at san jose state university. (shailaja.venkatsubramanyan@sjsu.edu) shrikant parikh is with s.p. jain institute of management and research. (drsparikh@gmail.com) prasenjit sarkar is with ibm almaden research center, san jose. (psarkar@almaden.ibm.com) all such intermediate architectures also require expertise and/or labor for their installation, operation, trouble-shooting, maintenance, upgrade, etc. these cost and knowledge requirements have been a barrier for the general population in an emerging economy to benefit from computing and the internet. cloud computing possesses the potential for removal of this and other barriers. although cloud-computing researchers have expressed the opinion that emerging economies can benefit from cloud computing, they tended to seek general application areas for cloud computing, e.g., government services, or suggested further development of technologies, e.g., voice-enabled web portals, that may facilitate the adoption of cloud computing by emerging economies [3].
we have had an entirely different motivation – to streamline and improve india's grain supply chains and, in the process, discovered that cloud computing is a promising technology for solving a key part of the problem [4]. a major problem in india's grain supply chains is spoilage. spoilage rates in these supply chains have been consistently estimated to be between 25% and 30%, with their developed-nation counterparts being approximately 3% [3,4]. the rates of price "mark-up," i.e., service charges over crop costs, have been estimated to be over 240%, with their counterparts in developed nations being between 50% and 100% [3,4]. moreover, food prices have been steadily growing. this points to not only the importance but also the urgency of streamlining the grain supply chains of india. improving india's grain supply chains has the potential of better food quality and lower retail prices for the consumers and higher profits for the farmers. in addition, improving profit margins for the vast number of subsistence grain farmers may reduce population migration from rural areas to large cities; such migration has been cited as a reason for the existence or expansion of some urban slums. this research was motivated to develop service, information technology and logistics concepts that can help streamline india's grain supply chains. we focus on the portion of the supply chains between the wholesalers and the consumers, a portion that accounts for a whopping 210% of the 240% overall price mark-up [3,4]. a layer of intermediaries (i.e., middlemen) exists between the wholesalers and the retailers, and they serve three major functions: checking prices offered by a large number of small wholesalers, checking the quality of grains carried by the wholesalers, and paying for the merchandise at the time of shipping on behalf of the retailers. cloud computing has the potential of automating the functions of price checking and advance payment. it can also allow retailers to access directly information about many existing "private, unwritten brands," which are currently understood only between the wholesalers and the middlemen, and hence has the potential of automating the function of quality checking as well. together with other improvements in supply-chain operations, particularly the concept of "consumer visible" branding, this layer of intermediaries can be drastically reduced in importance or even completely eliminated, hence significantly reducing supply-chain costs. in addition to cloud computing and branding, the solution we proposed in [4] to help reduce the cost of the supply chains between wholesalers and consumers has another pillar – distributed hubbing for drastically reducing the distribution cost. unlike how cloud computing is used in developed nations [1,2], the two primary benefits of cloud computing for streamlining india's grain supply chains and improving the operations of many other systems in emerging economies are about the access to computing, not about the computing that can be performed in the cloud.
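to make the mark-up figures quoted above concrete, the following back-of-the-envelope sketch (ours, not the authors') applies them to a hypothetical crop cost of 100 rupees; the simplifying assumption is that the quoted percentages are expressed relative to the crop cost paid to the farmer.

```python
# back-of-the-envelope illustration of the quoted mark-up figures, assuming
# (as a simplification) that all percentages are relative to the crop cost.

crop_cost = 100.0              # hypothetical crop cost (rupees)
overall_markup = 2.40          # ~240% total service charges over crop cost
consumer_side_markup = 2.10    # ~210 points accrue between wholesaler and consumer

retail_price = crop_cost * (1 + overall_markup)                      # ≈ 340
wholesale_price = retail_price - crop_cost * consumer_side_markup    # ≈ 130

print(f"retail price        : {retail_price:.0f}")
print(f"price at wholesaler : {wholesale_price:.0f}")
print(f"share of mark-up added downstream of the wholesaler: "
      f"{consumer_side_markup / overall_markup:.0%}")                # ≈ 88%
```

under these assumptions, roughly 88% of all service charges accrue in the wholesaler-to-consumer portion, which is why the paper concentrates on that portion of the chain.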
a feature of cloud computing is the requirement of only a "thin client," as originally promoted in the 1990s by sun microsystems. this feature is actually an unsung hero from the perspective of emerging economies. this is because the minimum requirement for access to the cloud is a "web-terminal." such web-terminals serve only the purpose of accessing the internet through a browser and require little memory and computing power. they can be very inexpensive, unlike some of the sophisticated "smart phones" serving as popular high-end cell phones and mobile internet-access devices in developed nations. in addition, there is no need for owning software, whose purchase and upgrading could be expensive for the majority of the population of an emerging economy; the charge for on-demand software may be much more affordable. this points to the first primary benefit of cloud computing: low cost. the second primary benefit is low knowledge requirement. with the simplicity of such web-terminals and with the computing hardware and software applications residing in the cloud, the user is no longer required to install, maintain, recover, or upgrade either hardware or software. this architecture can be referred to as "pure cloud computing" or simply "pure cloud". cost and knowledge requirements are two major barriers for the grain retailers to access and use computing, and such barriers led to their continued complete reliance on the intermediaries and, consequently, to continued high service charges in this portion of the supply chains. (lack of publicly accessible price information also led to continued incidences of price gouging by some merchants in this portion of the supply chains.) the proposed "pure cloud computing" can drastically lower these two requirements and provide unprecedented opportunities for the retailers to benefit from computing and the internet. this type of operational concept provides a leap-frogging opportunity for the entire indian society in general to benefit from the power of computing more quickly than otherwise by moving directly toward (public and/or private) cloud computing, bypassing the conventional steps of stand-alone computing, client-server architecture, networked computing, software as a service, hardware as a service, etc. as will be pointed out later, although such leap-frogging is quite feasible for urban and suburban areas, where access to the internet is already prevalent and hence is not a big issue, significant challenges exist for such leap-frogging to take place in rural or remote areas due to low accessibility to the internet. however, we argue that similar leap-frogging directly to cloud computing for rural and remote areas should still be the goal because it should be much less costly than going through the stage of personal computing and the other traditional computing architectures. an analogy to this ict leap-frogging is the leap-frogging that has taken place with wireless telephony. the market penetration of cell phones has reached 50% in just 10 years while its counterpart for land-line telephony is still 18% despite 50 years of deployment. in fact, much of the benefit enabled by access to computing via the internet can be reaped, at least during initial deployment of the proposed operational concept, with the almost ubiquitous cell-phone technology and with some basic services like google sms applications, particularly google sms search [5,6]. the rest of this paper is organized as follows.
section ii describes the problem of streamlining india's grain supply chains. section iii proposes a new operational concept as a solution for the problem and addresses its feasibility and benefits. it focuses on two pillars of the operational concept – "consumer visible" branding and distributed hubbing for the portion of supply chains between the wholesalers and the consumers – and the discussion is brief. section iv focuses on the other pillar, cloud computing, and addresses its feasibility and its leap-frogging benefits. section v discusses briefly how cloud computing and ict can significantly improve the operations of the rest of india's grain supply chains. section vi discusses the success and hindrance factors for the proposed improvement via cloud computing. section vii discusses the promising leap-frogging role cloud computing can play in drastically lowering barriers to use of computing for the entire society of an emerging economy. concluding remarks are given in section viii. ii. india's grain supply chains the produce supply chains of india have long been rather organized, with heavy, well-intentioned government participation through provision of physical trading facilities and legal regulations. for example, each city has an agriculture produce marketing committee (apmc) that owns and operates one wholesale market for each of several major types of produce, e.g., grains, pulses, spices and perishable produce, at which wholesalers store, display to brokers (between the wholesalers and the retailers), trade with the brokers and transship the produce. such markets of a city were originally intended for direct trading between farmers and retailers. as scales of agricultural activities grew larger and more specialized, such direct trading became difficult, if not impossible. for example, grains are grown mostly in punjab and other northern states of india and transported to meet the consumer needs of all parts of india. crops are consolidated by brokers (or directly transported by farmers) for sale to grain traders at the local apmc, called a mundi on the farmer or supplier side of the supply chains. typically, another layer of brokerage lies between these traders and milling companies, which sell their products to wholesalers through yet another layer of brokerage. between the wholesalers and the consumers are the retailers and the brokers between the wholesalers and the retailers. these supply chains are illustrated in figure 1. we found few articles addressing this severe problem in international journals or conferences. after summarizing india's current grain supply chains and how the spoilage and mark-up rates accumulate through the chain, sachan et al. [7] and sachin et al. [8] proposed, with a system dynamics approach, cost models for three supply-chain-integration alternatives for grains (namely the cooperative supply chain model, the collaborative supply chain model and contract farming). few indian journal or conference articles about grain supply chains were found, e.g., [9,10]. several indian agencies and news organizations have published research reports on these and general supply chains, e.g., [11,12,13].
a. major causes of the problem pointed out in the existing literature major root causes of the problems of high spoilage and price mark-up pointed out in the literature include [12]: "lack of adequate storage and transport infrastructure at the village level and right through the supply chain, which results in loss of output to rodents, pilferage, spoilage, etc." and "presence of a large number of intermediaries, which results in a high mark-up to the end consumer". fig. 1. india's grain supply chains (farmers → brokers → traders → brokers (cleaned wheat) → millers (flour; rice) → brokers → wholesalers → brokers → retailers → consumers). the indian government plays a big role in food supply chains. icra [12] states, "a plethora of laws, often overlapping in coverage, regulates the indian food industry". b. implemented partial solutions and current improvement efforts in response to this severe problem, efforts have been made to alleviate it. a successful effort is echoupal.com, initiated and operated by itc ltd. of india [14]. echoupal.com, accessed from personal computers through phone lines or satellites by trained farmers acting as conduits of information for farmers residing within a distance of approximately 5 kilometers, allows farmers to know the current commodity prices so that they can time the sale of their crops accordingly. itc also purchases crops from farmers but with advanced equipment, e.g., automatic weight scales, a quality-test lab, etc., and with a system of modern procurement practices, e.g., establishing quality standards based on numerical measures, quality-sensitive pricing, online price negotiation, etc. it also educates farmers on modern agricultural practices for better yield and better quality. it is important to note that much of this modernization would not have been possible without the it infrastructure of echoupal. also, coverage of echoupal expands as the it infrastructure does. while echoupal focuses on the farmer side of the supply chains, our focus is on the consumer side. a small number of grocery supermarket chains have been founded in recent years and a small portion of the grain supply chains of india has been modernized. reliance fresh is one of them. however, according to our sources, it continues to rely on wholesalers for its supply, but d-mart, another small supermarket chain, directly sources its grains from the millers or even the farmers, hence shortening the supply chains. c. our focus and improvement opportunities we focused on mumbai, india for a case study and conducted a number of site visits to interview wholesalers, sales brokers for wholesalers (i.e., purchase brokers for the retailers), retailers, managers of apmc (which is the nationwide government agency regulating the grain and produce supply chains), and even purchase brokers for the wholesalers (i.e., sales brokers for the millers), among other key participants in the supply chains. the large number of brokers involved in india's grain supply chains has been blamed for the high mark-up and spoilage rates. although some brokers have engaged in price gouging, brokers intermediating between different pairs of other participants of the supply chains do serve multiple functions in the current supply chains. take the retailers' perspective for example.
the purchase brokers for a retailer (i.e., the sales brokers for a wholesaler) are actually needed in the current system to (i) investigate the highly variable quality levels offered by a wholesaler for one grain variety through time, not to mention the variability associated with a large number of wholesalers, (ii) search or negotiate for the best prices among many different wholesalers offering the desired quality, and (iii) make advance payments to the wholesalers before collecting payments from the purchasing retailers (and bear the risk of non-payment by some retailers). the mumbai apmc consists of three separate markets dedicated to three produce types: grains/pulses, spices and fresh produce. 600 wholesalers, 1500 brokers and 3500 trucks work at the grains/pulses market and serve all 15 million residents of mumbai, india. each of these 600 wholesalers occupies only a small, rectangular storage area of approximately 1,700 square feet. neighboring wholesale spaces share two walls; samples of up to over 50 varieties of grains are displayed on tables placed at the narrow storefront, which is approximately 20 feet long and opposite the loading dock. the 600 stores occupy 20 virtually identical buildings. a typical storefront display of sample merchandise consists of small trays labeled with "private brands." note that these "brands" are currently understood only between wholesalers and the middlemen (i.e., purchase brokers of retailers). although each wholesale store is quite small, computer use is an integral part of the operation. in fact, the mumbai apmc develops software for the wholesalers. retail operations tend to be small as well; a typical neighborhood retail store specializing in staple grains and pulses may have a storefront of 8 to 12 feet. it is safe to assume that operating a typical retail store does not require much knowledge. these observations motivated our search for a low-cost and low-knowledge-requirement ict solution to help streamline the supply chains. we note that, according to the current regulations, apmc markets on the consumer side of the grain supply chains are the only locations, with few exceptions or "loopholes", where wholesale activities can be conducted and from which all the grains to be consumed by the city residents are transshipped. as cities and their populations grow larger, apmc markets outgrow their original confines, and some such markets have been relocated to city outskirts or even neighboring cities. for example, the apmc markets of mumbai, including the grains/pulses market, spice market and fresh-produce market, moved out of mumbai altogether and into the neighboring city of navi mumbai. serving the grain needs of a large city through a single out-of-the-city grain hub induces drastically more transportation and logistics than necessary. this motivates our service concept of distributed sub-hubs or sub-apmc markets to streamline the supply chains. iii. a new operational concept as mentioned earlier, the solution proposed for streamlining the supply chains has three pillars: cloud computing, distributed hubbing and "consumer visible" branding. in this section, we briefly describe an operational concept as a solution to the streamlining problem and focus on the two non-it pillars. the it pillar, i.e., cloud computing, will be discussed in detail in the next section.
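before turning to the branding and hubbing pillars, a quick sanity check of the apmc figures quoted above (our own arithmetic, not the authors') shows that the stated counts are mutually consistent:

```python
# consistency check on the mumbai apmc figures quoted in the text
wholesalers = 600
brokers = 1500
buildings = 20
residents = 15_000_000

print(wholesalers / buildings)   # 30 wholesale stalls per building (matches fig. 2)
print(residents / wholesalers)   # 25,000 residents served per wholesale stall
print(brokers / wholesalers)     # 2.5 brokers per wholesaler
```

the scale implied by these numbers, roughly 25,000 residents per tiny wholesale stall, underlines how concentrated the consumer-side distribution currently is.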
we observed that, except for basmati rice (a special variety of rice), there are few "consumer visible" brands for grains, if any at all. currently, purchasing good-quality grains requires screening out low-quality products, and such a task is performed by the retailers' purchase brokers, who poke grain bags with a sharp metal scoop to collect samples. consumer-visible branding in india is being developed to some extent; the current burgeoning supermarket chains carry their own "brands" of "loose grains," put them in barrels, and sell them by weight. this concept of consumer-visible branding, when implemented with other procedures and standardized practices, can help address the quality issues and eliminate the major function of quality screening currently served by the brokers between the retailers and the wholesalers. fig. 2. 30 wholesale spaces in one of 20 identical buildings. fig. 3. a typical display of private brands at a wholesaler. fig. 4. a cluster of three typical grain retailers in mumbai. as mentioned earlier, if information about the "private brands" marketed by the wholesalers to the middlemen can be made available to the retailers or the general public by ict, this middleman function can also be drastically reduced, if not completely eliminated. transportation accounts for a large percentage of the supply-chain cost. it is also a major source of greenhouse-gas emissions. the concept and practice in india of having only one centralized location in a city set aside to enable direct interaction between farmers and retailers dates back seven hundred years. these days, this location is the apmc. although well-intentioned, such direct interaction rarely takes place, and farmers or retailers cannot afford the time to staff a counter at such a location. in addition, since grains are mostly grown in states in northern india, there are no grain farmers in or near mumbai or most large indian cities, and there is never any interaction between grain farmers and retailers there. such a centralized location has now become, by law, the only location where grain and produce wholesale can take place and the hub at which all grain shipments from millers must terminate and from which all grain shipments to retailers must start. with only one centralized hub location for wholesalers to distribute grains to their retailer customers, the total distance traveled by the distribution trucks is several times what would be required had there been multiple sub-hub locations. in large cities like mumbai, the apmc is most likely located in the suburbs or in a neighboring city. in such a case, the total distance traveled could be even higher. concomitant with the unnecessary distance traveled are the unnecessary fuel burn, the resulting environmental impact, traffic congestion, and all the negative consequences associated with the congestion. we conducted a study to demonstrate that a distribution system consisting of four small "sub-hubs" could drastically reduce the total distance traveled by the distribution trucks and reported the findings in [4]. in the rest of this section, we briefly summarize the findings. figure 5 provides a bird's-eye view of the mumbai metropolitan area, consisting of the mumbai peninsula and navi mumbai.
fig. 5. mumbai peninsula and navi mumbai. note that the purpose of that study was not to suggest a new distribution system for mumbai. rather, it was to inform the designers of possible future grain-distribution systems of other large cities so that the transportation cost can be minimized, subject to non-logistics considerations. the total cost of transporting grains from the source to mumbai retailers consists of long-haul cost and distribution cost. we focus on transportation cost first and then address the facility costs, particularly the fixed costs associated with construction and operation of the hub or sub-hubs. grains are produced in northern states of india, with the state of punjab being a major grain-producing state and the primary supplier of grain crops to mumbai. since punjab is directly north of the mumbai metropolitan area and navi mumbai's latitude is at the mid-point between the latitudes of the northern and southern tips of the metro area, the total long-haul transportation cost associated with the current system should be approximately the same as its counterpart associated with the alternative four-sub-hub system. the distance traveled by distribution trucks was used as the proxy for the distribution cost. the results are summarized in table i.

table i. contrast between the two systems: distance traveled (truckload = 20 tons)
region          distance to sub-hub (km)   distance to apmc (km)   apmc / sub-hub
north west      134.5                      627                     4.66
north east      136.0                      417                     3.07
central         65.2                       479                     7.35
south           74.8                       473                     6.32
all 4 regions   410.5                      1996                    4.86
truckload-km    862,050                    4,191,600               4.86

note that the large ratio of 4.86 between the total distance traveled under the current one-large-hub configuration and that under the four-sub-hub configuration points to a potential for drastic reduction of the recurrent cost of distribution. as for the fixed costs, the issues of land availability and traffic intensity associated with accommodating a large hub in a big city like mumbai may be much more difficult to resolve than their counterparts associated with accommodating four small sub-hubs. we hope that this drastic numerical evidence will facilitate considerations of governmental policy changes. iv. ict leap-frogging enabled by cloud computing as mentioned earlier, a broker between a wholesaler and a retailer serves three major functions. one of them is quality checking, which was discussed in the previous section and has the potential of being automated with ict. the two other major functions served by the broker, namely checking prices offered by many different wholesalers and making advance payments to wholesalers, pertain directly to information technology (it). although india is well known for its it industry, a large portion of the general public, particularly small merchants like grain retailers, may not have the required financial means and knowledge to acquire, install, operate, maintain, trouble-shoot and upgrade it hardware and software. we believe that the modern concept of cloud computing may work particularly well for india, particularly for the grain retailers.
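as a quick cross-check of the distances reported in table i above (our own arithmetic; the per-region truckload count is inferred from the totals rather than quoted in the paper), the headline figures are mutually consistent:

```python
# reproduces the headline numbers in table i, assuming distances are in km and
# that the same number of truckloads serves each region. the truckload count is
# not stated in the paper; 862,050 / 410.5 ≈ 2,100 implies it.
regions = {            # region: (km to nearest sub-hub, km to the single apmc hub)
    "north west": (134.5, 627),
    "north east": (136.0, 417),
    "central":    (65.2, 479),
    "south":      (74.8, 473),
}
truckloads_per_region = 2100   # inferred, not quoted in the text

sub_hub_km = sum(d for d, _ in regions.values())   # 410.5
apmc_km    = sum(d for _, d in regions.values())   # 1996
print(f"one-hub vs. four-sub-hub distance ratio : {apmc_km / sub_hub_km:.2f}")        # ≈ 4.86
print(f"truckload-km, four sub-hubs             : {sub_hub_km * truckloads_per_region:,.0f}")  # 862,050
print(f"truckload-km, single apmc               : {apmc_km * truckloads_per_region:,.0f}")     # 4,191,600
```

the 4.86 ratio is simply the sum of the apmc distances divided by the sum of the sub-hub distances, so the recurrent distribution distance scales down by the same factor whatever the actual truckload volume turns out to be.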
although the concept of cloud computing originated in developed countries to deal with the issues of too much data (to be stored on personal or corporate computers), too much computing, sporadic computing needs, fast growth of computing needs, and/or to support the new paradigms of software as a service (or software on-demand), platform as a service or infrastructure as a service, it may be more beneficial to emerging economies like india. this is because countries like india can leap-frog the developed nations in use of cloud computing and avoid the financial investments and computer savvy that are required for the historically intermediate and incremental steps involving personal computing, particularly the ownership of computing hardware (stand-alone or networked personal computers) and ownership of client-side software. retailers in india or other small merchants can benefit from the wealth of information that could be posted on the internet, particularly the prices offered by various wholesalers, and from online payment. note that india is on its way to catch up with the developed nations in telephony by skipping, i.e., leap-frogging, the step of land-line telephony and capitalizing on wireless communication, via cell phones. as mentioned earlier, while market penetration of land-line telephony in india is approximately 18% at this point after decades of deployment, its counterpart for cell-phone use has risen to over 50% in just 10 years. we stress that such leap-frogging may and should take place in computing or internet use in india and other emerging economies, first in individual special economic sectors and then in the larger society as a whole. a. service characteristics the key is to require user-friendly, fast and inexpensive price-checking and payment functions. from the perspective of a retailer, the services are provided via the internet. therefore, the key business model must be at least the application service providers (asp) model, without requiring anything beyond a so-called “thin client.” in fact, what is really needed is a so-called “web-terminal,” which can be viewed, in terms of functionality, as a “dumb terminal” plus an internet browser. with such web-terminals, the grain retailers can access internet service applications via a landline or wireless communication. of course, a traditional desktop or laptop computer with internet access will serve the purpose as long as the concomitant capital and knowledge requirements pose no issues. from now on, we focus on the minimum requirement in terms of both capital cost and technical knowledge. note, however, that a desktopor laptop-based solution may work very well through sharing among individuals or through paid use of such hardware at a commercial facility, e.g., an internet café. these web-terminals have also been referred to as “webenabled” personal access devices in the us and other developed nations. actually, the devices are not required to be “personal”; the retailers can use shared web-terminals devices to access the internet, as long as the sharing is convenient, economical and secure. to avoid confusion, we will use the term “web-terminals” to summarize the type of devices that are really needed. from the service provider‟s point of view, software-as-aservice is required at the outset. as the business of such a provider grows, the provider may benefit from platform-as-aservice and infrastructure-as-a-service. 
in addition, as the service sector grows, such service systems must be scaled up, and the services may be deployed on a private, community, public or hybrid cloud [15,16], reaping the scalability benefit of cloud computing. making payment online or even through a cell phone is commonplace in india, although it is more popular among affluent and technology-savvy people. the system requirements for supporting this function are already well established and documented, so we will focus on the information needed by the retailers for their own online price checking and on their ability to recognize the private brands offered by wholesalers directly. in fact, the price is set for an individual private brand. a wholesaler brands its merchandise according to several criteria. although grain variety is a major and obvious criterion, there exist other major ones. within a particular grain variety, some consumers prefer grains grown in particular regions. surprisingly, some consumers prefer rice grains that have been in storage for a significant amount of time, as long as the quality is not compromised. a rationale is that the water content of such grains is reduced and therefore the rice when cooked is more "fluffy" and "puffy." such grains may also weigh less and may be perceived as less costly. quality is of course another major criterion. the quality of the grains arriving at the storage of a wholesaler depends not only on the quality of the same grains when leaving the milling plant but also on the transportation process, through which the grains may be subject to quality deterioration caused by excessive moisture, high temperature, and insect and pest contamination. different brands may reflect not just the varieties of the grains but also the quality levels. information about all these brand criteria and the corresponding prices can easily be coded in a database for browsing or search. we assume that the services provided require mostly display of text and hence that the transmission of text requires only small communication bandwidths. we next discuss two possible implementations, one with a land-line and the other without it but with wireless communication, regardless of whether the use is shared or personal. b. assessment of hardware and software requirements for land-line access to the internet the data-transmission rate for a typical telephone land-line in india is 46.6 kbps. it is ample for the retailers' needs for price checking and payment; even a 10-kilobyte text-only price page, for example, downloads in roughly two seconds at that rate. higher bandwidths are available and inexpensive. for example, the cost for broadband typically consists of a rs. 300 monthly charge and a rs. 3000 one-time charge ($1 ≈ rs. 46). these should not be considered expensive by most retailers. in the us and other developed nations, there do not seem to exist well-known brands/models of web-terminals that are used exclusively with land-line telephony. this may be because of the near-ubiquitous presence of desktop or laptop computers in businesses and even homes. the existence of such brands/models for wireless communication, to be discussed below, may very well have resulted from the market for "mobile internet." the basic requirements for a land-line-based web-terminal are simple.
it consists of a monitor (for user interface), a communication device (hardware and software), and a browser (for display of and request for internet information). these days, computer monitors are well known to be very low-cost and highly reliable. so are the basic communication devices. browser software is basically free. as a result, a land-line-based web-terminal should be very inexpensive and highly reliable, easing the capital and knowledge requirements to the minimum. c. assessment of hardware and software requirements for wireless access to the internet several well-known brands/models for wireless (mobile) web-terminals already exist in the us or other developed nations, although they are quite high-end when compared to the needs of the grain retailers. they include ipod, ipod touch, blackberry, geode webpad, etc. other brands/models also come with wireless voice communications, e.g., iphone. a patent has been granted for a wireless web-terminal design that uses a land-line through the base station of a cordless phone. the rate of data transmission for a typical basic cell phone in india is 10-30 kbps. it is quite adequate for the retailers to check prices and make payment. if more bandwidth is required, a 256 kbps broadband connection can be purchased with a rs. 3500 one-time charge and a rs. 300 monthly minimum. as indicated earlier, these brands/models have been developed for high-end consumers and may have much functionality that is not needed for our purpose here. the minimum requirements for a wireless web-terminal are similar to those for the land-line counterpart. one strategy is to reduce unnecessary functions of the high-end models just mentioned. however, another strategy is to capitalize on the functions already residing on a basic cell phone. one approach is to develop a "docking station" on which a basic cell phone can be placed and which is connected to a monitor for a more user-friendly interface. d. enabling non-users of it to reap the benefit of the internet the concepts proposed in this section may be applicable to other small merchants in india. they may also be applicable elsewhere. given the security issues legitimately concerning it managers of large corporations as well as the legacy-investment issues, many experts have pointed out that the first adopters of cloud computing might not be big corporations but might be small to medium enterprises. some have even pointed out that a major contribution of cloud computing may be to attract current non-users of the internet to become users. russ daniels, hp's cto for cloud services strategy, stated [15], "…we can make technology useful for a much broader group of people. it's not just consumers, but it's the people that today are nonconsumers, the people that aren't using technology because it's too complex, too expensive, too hard to get to, and that's really exciting…." franco travostino (distinguished architect of ebay) stated [15], "… i would say that entrepreneurs and small operations have to be the first beneficiaries given that clouds today don't have five-nine [99.999 percent] or seven-nine [99.99999 percent] dependability. increasingly, we should see more of the fortune 500 companies. they will be torn between using their own internal cloud within their own it confines versus a real external cloud by an external provider which they do not have control over. …" note that internet-browsing capability has been or can be added to other common electronic devices, e.g., tv.
this bodes well for web accessibility by the retailers and for more streamlined grain supply chains in india. with the internet-browsing capability of web-terminals, cell phones and the other common electronic devices discussed in this section, no conventional computers, either desktop or laptop, are required. this enables emerging economies' leap-frogging of developed nations in the sense that emerging economies can skip the expensive stages of stand-alone computing, client-server computing, software as a service, hardware as a service and infrastructure as a service and move directly to the drastically more efficient cloud computing. privacy and some other issues associated with the use of cloud computing in developed nations may not be as important in emerging economies, at least comparatively. v. cloud computing and ict for the entire chains in this section, we discuss briefly how cloud computing and ict may significantly improve the operations of the rest of india's grain supply chains and how the potential improvement may address some critical and urgent issues facing the indian society as a whole. recently, india's economic and thereby political landscape has been rocked by high food inflation, which has been on the order of 17%+ in the recent past. several reasons have been put forward for this state of affairs. a key set of reasons revolves around inefficiencies in grain-food supply chains and low levels of visibility in the supply chains. we now briefly discuss the portion of grain supply chains on the farmer's side. if the farmers have direct access to information about different prices offered by the traders/consolidators for the same commodity in the same or different apmcs and, for that matter, even have access to such information at the regional level, they can accordingly choose to sell the commodity in the 'right' market at the 'right' time and get a better price. this information can be made available directly to the farmers by providing them access to different information bases on the web. such access to information can be made possible in a ubiquitous way by cloud computing, web-terminals and the almost ubiquitous wireless connectivity. in a similar fashion, the next set of traders in the supply chains and finally the very large wholesalers will stand to benefit if they have access to details about crop planting and yield on a macro-scale. these details can include information like the approximate size of the area in which a particular crop (e.g., wheat) has been planted, the likely availability of the final crop to be sold at different apmcs in different parts of the country in different timeframes, and the progression of the commodities and their quantities through the supply chains. such information in turn can be used to project the availability of different commodities to be sold in different parts of the country in different time frames. another significance of this information is that the retailers, and the wholesalers from which the retailers source different commodities, will have a much better insight into the supply of commodities and their quantities at different points in time at different relevant locations.
this transparency, i.e., the “visibility in extreme,” will enable and encourage the stakeholders to trade and make commodities available at “fair market prices.” their trading decisions will be much more informed. the central point is, cloud computing, web terminals and wireless access enable this information to be universally available. of course the information databases need to be made available on the cloud for easy access by the stakeholders. currently the prices at which the commodities are traded are lot more arbitrary and cannot be called “well informed”. the small traders and their customers stand to lose most because of this state of affairs. moreover, these prices are highly influenced by “dynamics” of the commodity exchange (mcx). it appears that a small group of traders on the exchange can have a high degree of influence on the price at which a particular commodity is traded, which in turn impacts the retail price of the commodity, hurting the customers. a key assertion of this paper is that the unfair pricing which can be determined by improper or even speculative means can, to some extent, be neutralized by universal cloudcomputing-based agricultural databases and wireless access to them. vi. success and hindrance factors in this section, we discuss possible success and hindrance factors for implementation of the proposed use of cloud computing in india‟s grain supply chains. note that although our proposal to post the price information on the internet was motivated to replace a service currently performed by the middlemen between the wholesalers and the retailers, the information is perhaps more useful for the consumer and the apmc. we first discuss factors that would facilitate the implementation and then factors that may hinder the implementation. a. possible success factors as pointed out earlier, apmc of a city plays a pivotal role in the operation of grain supply chains. a primary goal of apmc is to ensure market efficiency in general and to prevent price gouging in particular. mr. sudhir tungar, the principal secretary of mumbai apmc, informed us during an interview of a recent success in detecting and resolving a price-gouging incident in the district of colaba in south mumbai. local retailers told their customers about a nonexistent crop shortage and charged prices that were about five times the prevailing wholesale prices charged at the mumbai apmc approximately only 36 km away. apmc sent truckloads of the merchandise to the district immediately after detection. the public apprehension immediately subsided, and the price gouging ceased. he also informed us that the large price mark-ups might also result from retailer speculation about supply volatility. had wholesale prices or the prevailing retail prices been posted on the internet, price-gouging incidents like this one would not have occurred. an informal survey of our colleagues and some students confirmed our conjecture that most grocery shoppers in india really want to know a fair price range for each major staple grain, pulse (e.g., beans) or vegetable. in addition, major success factors for echoupal include the farmers‟ strong desire to know fair price ranges for their crop and the ability of echoupal to provide the information [14]. we believe that apmcs‟ and consumers‟ desires for transparency in price information will help propel the realization of the proposed use of cloud computing via a web-terminal. 
it should be informative to discuss successful implementations of cloud-computing-based or closely related initiatives in india. echoupal was developed when the client-server architecture was the state of the art in india. more importantly, it relies on expensive communication with rural farmers, via landline or microwave, for downloading educational video and other services requiring substantial bandwidth. although client-side software may be necessary, the service can be considered cloud computing. it is conceivable that this internet-based service can be provided via a web-terminal, i.e., its resident browser, if the communication bandwidth is sufficient and advanced video technology is employed. the more important things are the value of price information to farmers and the farmers' willingness to pay for the information; they have helped propel the demand for and the success of echoupal. upton and fuller [14] detailed how and how much farmers have benefited from the transparency of the price information. we believe that their counterparts on the consumer side will help propel the implementation of the proposed use of cloud computing. the farmer lobby is very large and politically strong. therefore, anything that can help farmers improve their livelihood becomes a favorite cause for many politicians. for example, in a recent budget, the central government of india pardoned a large part of rural agricultural debts. farmers have complained about the extortionate service charges piled on their meager crop revenues for decades; so have the consumers. streamlining any portion of the grain supply chains will benefit the farmers and the consumers. therefore, we believe that, in terms of political support, the proposed cloud-computing-based operational concept on the consumer side will be popular for the society at large. as for price information on the consumer side of the supply chains, indian harvest [17] currently provides daily spot prices of various agricultural commodities, but only for several commodity trading centers in india. our goal is to develop a feasible and attractive operational concept that can be implemented in the near term to provide informational transparency on the consumer side and hence complement echoupal in streamlining india's grain supply chains. we now turn our attention to successful implementations of cloud-computing-based initiatives on the consumer side of grain supply chains in the us. this portion of the supply chains has long been fully integrated and dominated by a few large supermarket chains, e.g., safeway. the kind of wholesalers and retailers currently in operation in india have long disappeared in the us. as a result, the streamlining issue being dealt with in this paper does not exist in the us, and there are no comparable initiatives currently or in the recent past. for decades, us supermarket chains have had their own internal mis and it departments. with the deployment of advanced technologies like virtualization, the it currently supporting this portion of the supply chains can be characterized as a private cloud or an "internal cloud" managed by a centralized internal authority [1].
current us searchable/queryable agriculture databases are mostly implemented as cloud computing, with search or query requests specified and transmitted through an internet browser (without any other client-side functionality) and with the searches or queries performed in the cloud, free of charge. we give several examples for such agriculture databases to demonstrate their technical feasibility and popularity. the internet service provider agriculture.com [18] provides daily and intraday price data; agricommodityprices.com [19] provides periodically updated commodity-specific and country-specific price data. we note, however, that other databases exist but some of them have been provided mostly for research purposes, instead of for making routine purchasing decisions. an example is the database provided by the food and agriculture organization of the united nations. in particular, faostat [20] provides time-series and cross sectional data related to food and agriculture for some 200 countries, and pricestat [21] supports queries for the annual average price for a commodity, e.g., rice (paddy), in a country, e.g., india. the national agricultural statistics service (nass) of the us department of agriculture maintains a similar database, but with a focus on the us and with more details [22]. www.indexmundi.com [23] provides average monthly prices of rice and other commodities of a country, among a large amount of data constituting a country profile. most major supermarket chains in the us publish their prices on the internet (i.e., in the cloud) and some of them even take online orders for home delivery or pick-up at store. all these support our key assertion that the unfair pricing can, to some extent, be neutralized by universal, cloud-computingbased agricultural databases and wireless access to them. b. possible hindrance factors we now discuss possible factors that may hinder the implementation of the proposed cloud computing initiative. although it is possible that retailers may be unwilling to use computers or “web-terminals” to access price information posted on the internet, we believe that it is not likely. there has been no particular resistance to usage of computers in india even in rural areas particularly when there is a good reason to use it. computer training institutes imparting the basic computer skills (e.g. office, basic programming, basic maintenance skills) are ubiquitous even in c, d class cities and smaller towns because they are so popular and in demand. let us address the more general possible issue of resistance to technology adoption. there are no better examples than those involving rural populations. there are several instances where rural populations have adopted technology without much resistance, with adoption of automated teller machine (atm) being a simple example. while the fishermen of the indian state of kerala are returning from their fishing trips, they make cell-phone calls to check prices offered at different on-shore markets and decide on their landing spots accordingly. this also shows a strong desire for price information. in addition, basic information access is possible and inexpensive with no-frill mobile phones and of course with wireless–internet connected thin clients. the price of a mobile phone supporting gprs-enabled internet connection is approximately rs.3000, with an approximate gprs (general packet radio service ) connection cost of rs.100 per month (with unlimited usage). 
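to make the assertion about universal, cloud-hosted agricultural price databases concrete, the following is a minimal sketch of a browser-accessible price board of the kind envisioned here; it is our own illustration, and the wholesaler names, brand labels, prices and the /prices route are all hypothetical placeholders, not the authors' system or any existing service.

```python
# minimal sketch of a cloud-hosted, browser-accessible price board of the kind
# discussed above. everything here (wholesaler names, brand labels, prices, the
# /prices route) is a hypothetical illustration.
from flask import Flask, request

app = Flask(__name__)

# toy in-memory price list keyed by (wholesaler, private brand); rupees per quintal (made-up figures)
PRICES = {
    ("wholesaler_a", "lokwan_wheat_gold"): 2450,
    ("wholesaler_b", "lokwan_wheat_gold"): 2390,
    ("wholesaler_b", "aged_basmati_1yr"):  5600,
}

@app.route("/prices")
def prices():
    brand = request.args.get("brand")  # e.g. /prices?brand=lokwan_wheat_gold
    rows = sorted((p, w, b) for (w, b), p in PRICES.items() if brand in (None, b))
    # plain text keeps the page tiny, so even a 10-30 kbps gprs link or a basic web-terminal suffices
    return "\n".join(f"rs.{p} {w} {b}" for p, w, b in rows), 200, {"content-type": "text/plain"}

if __name__ == "__main__":
    app.run()
```

the point of the sketch is the architecture, not the code: all data and logic sit in the cloud, the retailer's device only renders a few lines of text, and the same page could be queried from a shared web-terminal, a gprs phone browser, or an internet café machine.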
in general, we do not see any real resistance to technology, per se. as long as the technology provides value and is easy to use and afford, it will be embraced by the target users. currently, cellular telephony covers over 90% of the 1 billion indian population, with over 700 million subscribers. even if the proposed cloud-computing-based initiative experiences initial resistance to use of web-terminals or computers, basic price information can be obtained and payment can be made via a regular cell phone. internet searches or queries made via cell-phone text messaging have long been supported, e.g., google sms applications [5]. google sms search [6] can be used to get information on driving directions, sports, movies, stocks, definitions, etc.; these services are free from google but message & data rates may apply. note that the required technology is regular wireless telephony, not the newer and more expensive 3g or 4g technology. note also that we are not suggesting the use of smartphones for accessing the internet. they are very expensive to begin with and are marketed for affluent and technology-savvy users. also, many applications, i.e., the "apps," require significant client-side functions. we now address the possible resistance to the proposed cloud-computing-based initiative from the wholesalers, the retailers and the middlemen. a retailer we visited informed us that his family and the family of his procurement agent (i.e., the middleman between him and the wholesalers at the mumbai apmc) have cooperated for generations. as long as he can make a reasonable profit, whether his agent makes a lot of profit or not does not really concern him. although he has not felt any urgent need to find an alternative way of securing grains, he would be happy to try out the prototype technology and user interface we planned to develop. as mentioned earlier, the proposed information transparency can also prevent retailer price gouging. it can also introduce or at least encourage fair competition among local retailers. these may not be welcomed by the middlemen and retailers alike. however, as discussed earlier, we believe that demand for informational transparency by the consumers, the farmers and apmc officials may entice or even force the changes required for streamlining the grain supply chains of india. resistance from middlemen is expected. possible replacement work for them includes quality inspection, certification or assurance for the supply chains. however, the work would not be performed for individual retailers on specific retailer purchases but perhaps for apmc or a certification body. as for the threats to existing traders (not just the middlemen between the wholesalers and the retailers) and trading relationships, these problems have been overcome before, e.g., by echoupal. itc, the sponsor of echoupal, has found new roles for commission agents and has successfully managed any potential resistance from them. vii. leap-frogging for the entire society in general, technical merits of a technology alone cannot guarantee its market success; technology deployment and management of technology is a critical issue.
although cloud computing was not invented for emerging economies, e.g., for the need of overcoming barriers against internet access by people without affluence or computer savvy, it does present many opportunities for emerging economies. the transportation sectors of developed nations have experienced a similar situation in the past two decades. advances in computing, communication and automation technologies spurred a intense level of interest in applying the technologies to improve transportation systems. a discipline called intelligent transportation systems (its) emerged. many application possibilities, called user services, were developed and studied. some initiatives originally regarded as the most promising ones failed miserably, due to inattention to deployment issues. these failures as well as the necessity to streamline many concurrent research and development efforts and to minimize risks, a framework for organizing the many deployment issues at the outset of research and development was developed [24]. developing such a framework for application of advanced computing technologies to emerging economies may be a worthy research topic. cloud computing can enable an entire society to take advantage of the economic benefits. this applies not only to the urban and suburban areas that have access to highbandwidth internet access, but also to the remainder of the developing world that have minimal connectivity to the rest of the world. in this section, we first deal with these two crosssections of society and deal with the independent issues that are relevant to each of them. we then point out the leapfrogging opportunity offered by the concept of internal cloud to replace desktop or laptop computing or skip it all together and discuss applications of cloud computing in other sectors of the indian society. a. areas with high-bandwidth internet connections societies with high-bandwidth internet connections, particularly the developed nations, use cloud computing with two specific aims: first, to reduce the cost of application development; and second, to avail of services that have costs that are lower than that built from traditional delivery means. the lower service costs result mainly from a new service delivery and charging paradigm for software applications. for urban and suburban areas of an emerging economy, the costs of accessing various software services can be further reduced due to reduced infrastructure and development costs, if the kind of web-terminals discussed in section 4 are used. note that the infrastructure costs include those incurred for needs assessment, system specification, equipment selection, purchase, installation, maintenance, upgrade, repair, security, and many other life-cycle concerns. assessing overall cost savings achievable via this leap-frogging is a critical research issue for emerging economies. this research was never needed in developed nations and hence has not been paid any attention. as discussed earlier, the lower requirements for hardware and software knowledge associated with cloud computing are another main source of benefits that emerging economies can fully capitalize on. 
although the benefit of lowered knowledge requirements on accessing the internet is much more intangible than the benefit of lower costs, ict leap-frogging enabled by cloud computing for ee: a case study on streamlining india’s grain supply chains 47 september 2011 international journal on advances in ict for emerging regions 04 assessing this benefit is also an important new research subject. lower cost of application development allows providers to bring applications to the market much faster than what was possible previously. this allows a larger segment of the population and even individuals to contribute applications to the ecosystem, while previously this was the domain of medium to large software houses. we can illustrate this by looking at the costs of two cloud providers. as summarized in table ii, google app engine provides a lot of free resource time for application developers – these free resources can be used to mitigate the cost of application development. in addition, google app engine provides development tools for free. in addition to these free resources, cloud providers such as microsoft azure and google app engine provide low development and hosting costs for cloud applications beyond the free time. see table iii for a comparison. the next logical step is to compare these costs to that incurred by those using traditional development means. assuming that medium to large software houses have consolidated their operations in a mid-size data center, researchers at the rad lab in berkeley have examined costs and found that the infrastructure costs in a large cloud data center virtualized to different application developers are five to seven times lower than that in the mid-size data centers used by the software houses. this gives a tremendous advantage to the next wave of application developers who are not burdened with the cost of maintaining these mid-size data centers. these translate to lower cost of goods for cloud services developed as a result. researchers have audited financial statements to examine claims that service-oriented architecture (soa) leads to higher profits relative to traditional software delivery models [25]. specifically, they have examined vendors that rely on the software-as-a-service (saas) pricing model, and compare their performance to other firms that still use the traditional perpetual license model. the researchers find that, relative to their peers, saas firms tend to have lower costs of goods sold as a portion of revenues. even non-traditional scientific applications get an economic benefit from running in the cloud, which could bolster scientific development in developing nations. kondo et al. [26] did a survey of grid computing applications for scientific processing and tried to determine the cost-benefits of cloud computing versus volunteer computing applications. the authors calculated overhead for platform construction, application deployment, compute rates, and completion times. given a best-case scenario, the authors found that the ratio of volunteer nodes needed to achieve the compute power of a small amazon ec2 instance is about 2.83 active volunteer hosts to 1. one potential drawback of cloud computing is that complex graphical tasks need dedicated computing power on a desktop machine with a powerful gpu. in such a scenario, these applications would not lend themselves to cloud computing and potentially increase development costs in emerging economies. 
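to make the rates discussed here concrete, the short python sketch below estimates a monthly bill for a modest application under the two pricing schemes reproduced in table iii; the usage figures are assumptions chosen only to illustrate the arithmetic, not measurements of any real workload, and storage transaction charges are ignored.

```python
# back-of-the-envelope monthly cost using the published rates from table iii;
# the usage figures are assumptions chosen only to illustrate the calculation.
RATES = {  # cpu $/hour, bandwidth out/in $/gb, storage $/gb/month
    "windows azure":     {"cpu": 0.12, "out": 0.15, "in": 0.10, "storage": 0.15},
    "google app engine": {"cpu": 0.10, "out": 0.12, "in": 0.10, "storage": 0.005},
}

# assumed monthly usage for a modest price-lookup service
USAGE = {"cpu": 200.0, "out": 50.0, "in": 20.0, "storage": 10.0}
#        cpu hours     gb out      gb in       gb stored

for provider, r in RATES.items():
    cost = (r["cpu"] * USAGE["cpu"] + r["out"] * USAGE["out"]
            + r["in"] * USAGE["in"] + r["storage"] * USAGE["storage"])
    print(f"{provider}: ${cost:.2f}/month")
# azure:      0.12*200 + 0.15*50 + 0.10*20 + 0.15*10  = $35.00
# app engine: 0.10*200 + 0.12*50 + 0.10*20 + 0.005*10 = $28.05
```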
however, even in such a scenario, the advent of multi-core gpus may enable virtualization of resources for cloud computing users and allow them to avoid traditional application development costs. lin and wang [27] have proposed a cloud computing framework for such domains, in which end users have only a relatively inexpensive thin terminal with a high-resolution screen and i/o devices.

table ii
free development resources in google app engine
category               free resource (daily rate)
cpu                    6.5 hours
bandwidth out          1 gb
bandwidth in           1 gb
storage (database)     1 gb
storage transactions   10 million

table iii
development resource costs in microsoft azure and google app engine
category               windows azure       google app engine
cpu                    $0.12/hour          $0.10/hour
bandwidth out          $0.15/gb            $0.12/gb
bandwidth in           $0.10/gb            $0.10/gb
storage (database)     $0.15/gb/month      $0.005/gb/month
storage transactions   $0.01/10,000        not available

b. areas with limited connectivity
although such leap-frogging is quite feasible for urban and suburban areas now or in the near future, where access to the internet is already prevalent and hence is not a big issue, significant challenges exist now for such leap-frogging to take place in rural or remote areas due to low accessibility to the internet. however, we argue that similar leap-frogging directly to cloud computing should still be the goal, because it should be much less costly than going through the stage of personal computing and the other traditional computing architectures. this points to a critical area for researchers: enabling direct realization of the cloud computing paradigm in this section of society. we again remind the reader of the analogy between it leap-frogging and telephony leap-frogging. with the advent of wireless-communication technology, not only is there much less need for further development of costly wired telephony, but the strategy for further developing telephony and the resulting hardware/software architecture can be revamped to fully capitalize on the wireless technology. in the realm of it, the it development strategy and architecture for rural or remote areas of an emerging economy can be revamped in a similar way. a critical research subject is how to allow efficient access to the internet or the cloud in rural or remote areas of an emerging economy without the equipment required in the conventional architectures through which cloud computing evolved in developed nations. more precisely, how can such access be enabled without pcs and laptops? note that this issue has not been tackled by the developed nations, because owning pcs, laptops, tablet computers, smart-phones and other advanced devices has not been and will most likely not be an issue there. the reader is again reminded that, in addition to costs, knowledge requirements are another major challenge for access to the cloud in rural and remote areas of an emerging economy. the examples given below tackle these issues with actual research projects in developing economies and use innovative techniques to bridge the knowledge gap. furthermore, looking at the past, one can look to the evolution of the pc industry as an example of how hardware costs and knowledge gaps can be overcome. the concept of personal computing did not truly exist until apple introduced the macintosh in 1984, a computer that got rid of command prompts in favor of a graphical user interface.
the macintosh did not require a huge learning curve like previous dos command-line driven machines. for the first time, computers became usable by normal people – not just the professionals. microsoft learnt from this new concept of graphically-driven computing, following up with windows in 1986, broadening the market so that almost everyone can afford a computer from different hardware manufacturers in developed nations. by 1995, most households in the us had a pc, and it is not unreasonable to speculate that the combination of wireless and mobile handset technologies can achieve the same for the emerging regions given that already mobile usage has crossed the 2 billion population mark. a key inhibitor to cloud computing in emerging economies is the absence of connectivity to the cloud. many developing regions around the world, especially in rural and remote areas, require low-cost network connectivity solutions. traditional approaches based on telephone, cellular, satellite or fibers have proved to be an expensive proposition especially in low population density and low-income regions. in africa, even though cellular and satellite coverage is available in rural regions, bandwidth is extremely expensive due primarily to low user densities (satellite usage cost is about us$3000 per mbps per month). wimax, another proposed solution, is currently also very expensive and has been primarily intended for carriers (like cellular). wimax is hard to deploy in the “grass roots” style typical for developing regions. wifi-based long distance (wild) networks are emerging as a low-cost connectivity solution and are increasingly being deployed in developing regions. the primary cost gains arise from the use of very high-volume off the shelf 802.11 wireless cards, of which over 140 million were made in 2005. these links exploit unlicensed spectrum, and are low power and lightweight, leading to additional cost savings. these networks are very different from the short-range multi-hop urban mesh networks. unlike mesh networks which use omnidirectional antennas to cater to short ranges (less than 1–2 km at most), wild networks comprise of point-to-point wireless links that use high-gain directional antennas (e.g. 24 dbi, 8 degree beam-width) to focus the wireless signal (for line of sight) over long distances (10–100 km). to extend this connectivity to a large population of users, the developers of wild proposed cellular phones as a medium of connectivity. cellular communications, including handsets and base stations, have become ubiquitous technologies throughout the developing and developed world. roughly three billion users spend large portions of their income on these basic communications. however, the remaining half of the world currently has limited access, in large part due to lack of network coverage. some areas do not have a high enough population density to support a traditional cellular deployment. other areas are too far from established infrastructure to make a deployment economically feasible. this leads to many rural areas where there is no network coverage at all. 
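the long link distances quoted above for wild networks are consistent with a standard free-space link-budget calculation. the sketch below uses the 24 dbi antenna gain mentioned in the text, while the transmit power, operating frequency and receiver sensitivity are assumed values typical of 802.11 equipment rather than figures from the cited deployments; real links would also need line of sight and fresnel-zone clearance.

```python
# a rough free-space link-budget check for point-to-point wild links;
# antenna gains (24 dbi) come from the text, the rest are assumptions.
import math

def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB for distance in km and frequency in MHz."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44

tx_power_dbm = 20.0         # assumed 802.11 transmit power
tx_gain_dbi = 24.0          # directional antenna gain from the text
rx_gain_dbi = 24.0
rx_sensitivity_dbm = -90.0  # assumed sensitivity at a low 802.11 data rate

for distance_km in (10, 50, 100):
    rx_power = tx_power_dbm + tx_gain_dbi + rx_gain_dbi - fspl_db(distance_km, 2400.0)
    margin = rx_power - rx_sensitivity_dbm
    print(f"{distance_km:>3} km: received {rx_power:6.1f} dBm, margin {margin:5.1f} dB")
# with omnidirectional antennas (~2 dbi each) the same links would sit roughly
# 44 db lower, which is why mesh networks stay within a couple of kilometres.
```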
to resolve this issue, the wild developers propose the village base station (vbts), which provides four main benefits:  flexible off the grid deployment due to low power requirements that enable local generation via solar or wind;  explicit support for local services within the village that can be autonomous relative to a national carrier;  novel power/coverage trade-offs based on intermittency that can provide bursts of wider coverage; and  a portfolio of data and voice services (not just gsm). vbts is essentially an outdoor pc with a software-defined radio that implements a low-power low-capacity gsm base station. long-distance wifi provides “backhaul” into the carrier. at around 20w, its power consumption is low enough to avoid diesel generators and the corresponding requirement for roads and fences. this also reduces the operating costs significantly. the base station can be deployed in the middle of the village, on a nearby hill, or in any other area with lineof-sight coverage. although much of the contribution of vbts is engineering the combination of a software radio, wifi backhaul, and local generation, there are two main societal contributions:  the development of a platform for a wide range of services  the optimization of coverage versus power consumption via variable power and intermittent coverage the connectivity has led to the development of unique solutions for the general population, one of which covers the field of education. english is the language of power in india associated with the middle and upper classes. in other developing regions, it is another language such as spanish, mandarin, or french which is not native to most of the population. the public school systems in developing regions face insurmountable difficulties. in india, for example, it has been consistently difficult to converse in english with those teachers responsible for teaching english in poor schools, where the overwhelming majority of children in the country struggle to learn. more important, public schooling is out of ict leap-frogging enabled by cloud computing for ee: a case study on streamlining india’s grain supply chains 49 september 2011 international journal on advances in ict for emerging regions 04 the reach of large numbers of children in rural areas and the urban slums who cannot attend school regularly, due to their need to work for the family in the agricultural fields or households. at the same time, cell phones are increasingly adopted in the developing world, and an increasing fraction of these phones feature multimedia capabilities for gaming and photos. these devices are a promising vehicle for out-ofschool learning to complement formal schooling. in particular, the english learning games on cell phones present an opportunity to dramatically expand the reach of english learning, by making it possible to acquire esl in out-ofschool settings that can be more convenient than school. games can make learning more engaging while incorporating good educational principles. more important, a large-scale evaluation with urban slums children in india has shown significant learning benefits from games that target mathematics. the developers of the english learning project believe that similar outcomes can be replicated with elearning games that target literacy. the challenge in evaluating any language learning project, however, is that the language acquisition is a long-term process on the learner‟s part. 
worse, with a novel technology solution that has yet to be institutionalized, there are tremendous logistical obstacles in running a pilot study over a non-trivial duration. after 3 years, in which the developers commenced with needs assessments and feasibility studies, followed by subsequent rounds of field testing interleaved with numerous iterations on our technology designs, they have established the necessary relationships with local partners for such an evaluation. another approach to connectivity in emerging economies is the shared usage of scarce networking bandwidth. computer scientists in pakistan are building a system to boost download speeds in the developing world by letting people effectively share their bandwidth. software chops up popular pages and media files, allowing users to grab them from each other, building a grassroots internet cache. in developing countries, almost all the traffic leaves the country. that's the case even when a pakistani user is browsing websites hosted in his or her own country. the packets can get routed all the way through a developed nation and then back to a developing country. so a team in pakistan is developing donatebandwidth, a system inspired by the bittorrent peerto-peer protocol that is popular for trading large music, film, and program files. with bittorrent, people's computers swap small pieces of a file during download, reducing the strain placed on the original source. donatebandwidth works in much the same way but lets people share more than just large files. when users try to access a website or download a file, a donatebandwidth program running on their machine checks first with the peer-to-peer cache to see if the data is stored there. if so, it starts downloading chunks of the file from peers running the same software, while also getting parts of the file through the usual internet connection. the software could allow people in countries that have better internet connections to donate their bandwidth to users in the developing world. donatebandwidth also manipulates an isp's cache. when running donatebandwidth, a computer starts downloading part of a file, while also sending a request for other donatebandwidth users who have access through the same isp, and whose computers have spare bandwidth, to trigger them to start downloading other parts of the same file. the file is then loaded into the isp's cache, so it can be downloaded more quickly. the project is similar to distributed computing schemes such as seti@home, which uses volunteers' spare computer power to collaboratively analyze radio signals from space, looking for signs of intelligent life. donatebandwidth permits sharing of unused internet bandwidth, which is much more valuable in the developing world, compared to computing cycles or disk space. c. potential of internal cloud for non-transaction it in developed nations, many transaction-based enterprise it applications involving large databases have already been or are being migrated from the client-server architecture to cloud computing, particularly internal cloud. the task of such migration is not very difficult because the uses of the applications are already centralized and the primary task is to implement the client-server interaction with the browser technology. however, many organizations in developed nations, and particularly those in emerging economies, whose primary it needs are not database-oriented transactions can also benefit from use of internal cloud computing. 
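as a rough illustration, the peer-first retrieval at the heart of donatebandwidth can be sketched in a few lines of python; the class and function names below are illustrative stand-ins, since the actual protocol (including the isp-cache triggering) is not described here in that level of detail.

```python
# simplified sketch of peer-first chunk retrieval in the spirit of
# donatebandwidth: check nearby peers for each chunk, fall back to the
# origin server otherwise; names and structure are ours, not the system's.
from typing import Dict, List, Optional

class PeerCache:
    """Chunks already held by nearby peers (e.g. users on the same isp)."""
    def __init__(self) -> None:
        self.chunks: Dict[int, bytes] = {}

    def get(self, index: int) -> Optional[bytes]:
        return self.chunks.get(index)

def fetch_from_origin(url: str, index: int) -> bytes:
    # placeholder for a normal ranged http request over the user's own link
    return f"<chunk {index} of {url}>".encode()

def download(url: str, n_chunks: int, peers: PeerCache) -> List[bytes]:
    """Assemble a file chunk by chunk, preferring peers over the origin."""
    result = []
    for i in range(n_chunks):
        chunk = peers.get(i)                   # served from a peer's cache
        if chunk is None:
            chunk = fetch_from_origin(url, i)  # served over the slow uplink
            peers.chunks[i] = chunk            # now available to other peers
        result.append(chunk)
    return result

peers = PeerCache()
peers.chunks[0] = b"<chunk 0 cached by a peer>"
print(len(download("http://example.org/page", 4, peers)))  # -> 4
```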
the main idea is to have all resident software applications (not supported by an external cloud) migrated to an internal cloud, i.e., centralized and virtualized servers. this way, there is no need to have a desktop computer on top of every desk. just replace every desktop computer with a webterminal. the web-terminals will need virtually no it support. all the efficiency benefit achievable through an external cloud can be achieved with such an internal cloud, except for the larger economy of scale. obviously, this internal cloud can be supplemented with access to an external cloud. for example, free software applications for office work are widely available in the internet, if they are not provided in the internal cloud. such free applications include google doc, google spreadsheet, etc. the proposed concept of leapfrogging by means of performing all computing in the cloud and accessing the cloud only through a simple web-terminal can be referred to as „pure cloud computing‟ or simply „pure cloud.‟ this new term of „pure‟ cloud is intended to articulate a new concept of cloud computing and differentiate its minimum client-side hardware and software requirements from the higher clientside requirements of any other cloud-computing implementation. „pure cloud‟ is based not only on cloud computing but even an easier, more easily deployable and therefore more widely acceptable implementation of cloud computing where the prerequisites needed to operate the clients are almost nothing. only a 50 jacob tsao, shailaja venkatsubramanyan, shrikant parikh and prasenjit sarkar international journal on advances in ict for emerging regions 04 september 2011 simple browser-based access to the cloud suffices, and it can be supported on thin, simple and low-cost web-clients. this „pure cloud‟ concept will be particularly beneficial for emerging economies. migration of the non-transaction it uses to an internal or external cloud may be beneficial for any organization in general but may be particularly beneficial for large non-profit organizations like schools and universities, hospitals, etc. when such migration is implemented in elementary through high schools, in developed nations or emerging economies, potential benefits include not only reduction of hardware, software and labor costs but also increased ability of schools to monitor or even control of proper student access to internet contents. note that video-game playing and internet browsing unrelated to classroom learning may severely distract students. other potential benefits include reduced opportunities for equipment theft or damage. more importantly, due to the significantly lower hardware, software and labor costs associated with the proposed internal cloud architecture, schools may finally be able to provide in-class access to computing and the internet. d. cloud computing for other services or countries we discuss in this sub-section cloud-computing-based initiatives in other sectors of india or in other countries. due to space limitation, we focus on the health industry. after a brief discussion of application of cloud computing in the us, we illustrate leap-frogging opportunities for emerging economies with one project proposal submitted to the pakistan government and one specific start-up company in india. it is well known that the us is experiencing a daunting healthcare crisis. innovative it has been regarded as a major improvement that can significantly reduce cost and increase quality. 
among the better known innovations is the electronic medical record (emr) or electronic health record (ehr). to facilitate coordination and even direct communication among different healthcare providers involved in the care of a patient, most of the emr implementations are based on cloud computing, particularly internal cloud. cloud-based emr presents an obvious leap-frogging opportunity for emerging economies. what has not been discussed in the developed nations is the leap-frogging opportunity to “pure cloud,” mentioned earlier about exclusive use of web-terminals, in lieu of desktop computers, for cloud access. the first author participated in a threepartner team effort in response to a request for proposals “establishing center of innovation for use of ict in healthcare sector” issued by the national ict r&d fund of the ministry of information technology of pakistan. the proposal features the use of cloud computing, including the concept of “pure cloud,” as a main theme of the proposed research. the funding decision is yet to be made. after a nearly one-year halt of operations, the national ict r&d fund recently resumed operations [28]. the third author is a co-founder of an indian company called a3 biomed technologies ltd [29]. the company is in business for providing solutions and services that will enable cardiologists to remotely monitor their patients via mobile phone from anywhere in world. the solutions are based on wireless devices connecting the patient to a server. the company provides services, based on these solutions, to healthcare organizations like nursing homes, small hospitals and hospital chains. in the healthcare industry, patient‟s privacy, information confidentiality and data security are very important. also, healthcare organizations are very possessive about their patients‟ medical records, for good reasons. as a result, each organization wants its own private server. however, many organizations are not able to support (and afford) their own physical servers on their premises and all the operational and management overhead such a private server will entail. this would impede wide adoption of the remote patient monitoring services. however, the cloud server technology comes to the rescue. a3's server software components can be installed on a cloud-based server, and each organization can have its own “private cloud” that is managed by a3 or a third party. note that, in this architecture, a cloud-based server, including the hardware infrastructure, can be dedicated for exclusive usage by one organization at an affordable cost. so, the cloud-based server technology plays a crucial role in roll-out and adoption of this life-saving remote monitoring technology. note that such remote patient monitoring is of particular relevance to countries like india where there is a general dearth of qualified doctors and patients residing in vast areas of the country do not have easy access to proper healthcare. viii. conclusions in the process of our seeking to streamline india‟s grain supply chains, we discovered cloud computing to be a promising solution. cloud computing enables use of computing without user-side hardware, software and the associated financial and knowledge requirements, except for the need of a “web-terminal” and access to the internet. we capitalized on this opportunity and proposed a solution based on cloud computing and other operational concepts for streamlining grain supply chains in india. 
design of a prototype system is underway that fulfills the functions currently provided by the middlemen between the wholesalers and retailers, adds value to the retailers, wholesalers and even consumers, and is user-friendly. in addition to distributed hubbing, “consumer visible” branding and cloud computing, which was the main focus of this paper, many other improvement opportunities exist. further integration of it with the supply chains beyond the proposed use of cloud computing on the consumer side and the expanding use of echoupal.com on the farmer side is critical, e.g., rfid, for this worthy and fertile research area. another improvement opportunity is to increase the economy of scale. significant reduction in the current number of 600 ict leap-frogging enabled by cloud computing for ee: a case study on streamlining india’s grain supply chains 51 september 2011 international journal on advances in ict for emerging regions 04 wholesalers may lead to significantly higher efficiency due to not only consolidated wholesale operations at the hub or subhubs but also coordinated distribution. it may actually play a significant role in such a possible reduction, and this and many other it studies are worthy subjects for future research. assessing overall cost savings achievable for urban and suburban areas, where access to the internet is not a big issue, via this leap-frogging is a critical research issue for emerging economies. another critical research subject is how to allow efficient access to the internet or the cloud in rural or remote areas of an emerging economy without expensive pcs and laptops. we believe that many existing operations in other industrial sectors of an emerging economy can benefit from cloud computing in similar ways. in fact, we argued that such leapfrogging can benefit the entire society of an emerging economy. acknowledgment the guest editor‟s and reviewers‟ constructive comments led to a much better paper than the original version and are gratefully acknowledged. references [1] p. mell and t. grance, “the nist definition of cloud computing,” (2011). http://csrc.nist.gov/publications/drafts/800-145/draft-sp-800145_cloud-definition.pdf; accessed on july 29, 2011. [2] rimal, b.p., choi, e., and lumb, i. (2009), “a taxonomy and survey of cloud computing systems,” proceedings of the fifth international joint conference on inc, ims and idc, p 44-51 [3] cleverley, m. (2009), “emerging markets. how ict advances might help developing nations,” communications of the acm, vol. 52, no. 9, pp. 30 -32. [4] tsao , h.-s. j., parikh, s., ghosh, a.s., pal, r., ranalkar, m., tarapore, h., and venkatsubramanyan, s. (2010), “streamlining grain supply chains of india: cloud computing and distributed hubbing for wholesale-retail logistics,”., proceedings of 2010 ieee international conference on service operations and logistics, and informatics (ieee-soli 2010), qingdao, china. [5] google, http://www.google.com/intl/en_us/mobile/sms/ . [6] google, http://www.google.com/intl/en_us/mobile/sms/search/ . [7] sachan, a., sahay, b.s., and sharma, d. (2005), “developing indian grain supply chain cost model: a system dynamics approach,” international journal of productivity and performance management;vol. 54, no. 3/4. pp.187-205. [8] sachan, a., sahay, b.s., and mohan, r. (2006) “assessing benefits of supply chain integration using system dynamics methodlogy,” international journal of services technology and management, vol. 7, no. 5/6, pp. 582-601. [9] dharni, k. and sharma, s. 
(2008),”food processing in india: opportunities and constraints,” the icfai university journal of agricultural economics, vol. v, no. 3., the icfai (institute of chartered financial analysts of india) university press, india. [10] singh, r., singh, h.p., badal, p.s., singh, o.p., kushwaha, s., and sen, c. (2009), “problems and prospects of food-retailing in the state of uttar pradesh (india),” journal of services research, vol. 8, no. 2 (october 2008-march 2009), institute for international management and technology, new delhi, india. [11] icra (2001a), report on fmcg, investment information and credit rating agency, new delhi, india, march, 2001. [12] icra (2001b), the indian fmcg sector, investment information and credit rating agency, new delhi, india, may, 2001. [13] etig (2003), changing gears: retailing in india, economic times intelligence group, mumbai, india. [14] david m. upton, virginia a. fuller (2003), “itc echoupal initiative,” hbs case 604-016, harvard business school, harvard university; also in “corporate information strategy and management” by applegate, l.m., austin, r.d. and mcfarlan, f.w. (8 th edition). [15] milojicic, d. (2008),”cloud computing: interview with russ daniels and franco travostino“, ieee internet computing, sept./oct., 2008. [16] youseff, l., butrico, m., and da silva, d. (2008), “toward a unified ontology of cloud computing,” proceedings of 2008 grid computing environments workshop. [17] indian harvest, centre for monitoring indian economy pvt. ltd. http://www.cmie.com/database/?service=database-products/sectoralservices/indian-harvest.htm, accessed on july 30, 2011. [18] agriculture.com, http://www.agriculture.com/markets/commodities, accessed on july 30, 2011. [19] agricommodityprices.com, http://www.agricommodityprices.com, accessed on july 30, 2011. [20] faostat, food and agriculture organization, united nations, http://faostat.fao.org/site/291/default.aspx, accessed on july 30, 2011 [21] pricestat, food and agriculture organization, united nations, http://faostat.fao.org/site/570/desktopdefault.aspx?pageid=570#ancor, accessed on july 30, 2011. [22] national agricultural statistics service, united states department of agriculture, http://www.nass.usda.gov/index.asp, accessed on july 30, 2011. [23] indexmundi.com, http://www.indexmundi.com/commodities, accessed on july 30, 2011. [24] tsao, h.-s. j., "a framework for evaluating deployment strategies for intelligent transportation systems", intelligent transportation systems journal (its journal), vol.6, pp. 141-173, 2001. [25] t. hall and j. luter iii. is soa superior? evidence from saas financial statements. journal of software, vol. 3, no. 5, may 2008. [26] kondo,d., javadi, b., malecot, p., cappello, f., and anderson, d.p. (2009), “cost-benefit analysis of cloud computing versus desktop grids,” 2009 ieee international symposium on parallel & distributed processing (ipdps). [27] lin, t. and wang, s. (2009), “cloudlet-screen computing: a multicore-based, cloud-computing-oriented, traditional-computingcompatible parallel computing paradigm for the masses,” proceedings 2009 ieee international conference on multimedia and expo (icme), p 1805-1808. [28] national ict r&d fund, request for proposals: “establishing center of innovation for use of ict in healthcare sector,”; http://www.ictrdf.org.pk/, accessed on july 30, 2011. [29] a3 biomed technologies, http://www.a3biomed.com/index.htm, accessed on july 30, 2011. 
international journal on advances in ict for emerging regions 2009 02 (01) : 11 - 20, december 2009
determining the psychological involvement in multimedia interactions
hiran b. ekanayake, damitha d. karunarathna and kamalanath p. hewagamage
manuscript received march 12, 2009. accepted november 20th, 2009. this research was funded by the national science foundation, sri lanka. hiran b. ekanayake is with the university of colombo school of computing, 35, reid avenue, colombo 7, sri lanka (e-mail: hbe@ucsc.cmb.ac.lk). damitha d. karunarathna and kamalanath p. hewagamage are also with the university of colombo school of computing, 35, reid avenue, colombo 7, sri lanka (e-mail: ddk@ucsc.cmb.ac.lk, kph@ucsc.cmb.ac.lk).
abstract—human computer interaction (hci) is currently aimed at the design of interactive computer applications for human use while preventing user frustration. when considering the nature of modern computer applications, such as e-learning systems and computer games, it appears that human involvement cannot be improved only by using traditional approaches, such as nice user interfaces. for pleasant human involvement, these computer applications require that the computers have the ability to naturally adapt to their users, and this requires the computers to have the ability to recognize user emotions. the currently most preferred research approach for recognizing emotions is facial expression based emotion recognition, which has many limitations. therefore, in this paper, we propose a method to determine the psychological involvement of a human during a multimedia interaction session using eye movement activity and arousal evaluation. in our approach we use a low cost hardware/software combination, which determines eye movement activity based on electrooculogram (eog) signals and the level of arousal using galvanic skin response (gsr) signals. the results obtained using six individuals show that the nature of involvement, such as optimal levels and distracted conditions, can be recognized using these affect signals.
index terms—arousal, attention, cognition, emotion, eog, eye movement activity, gsr, hci, human involvement, multimedia interactions.
i. introduction
most modern multimedia applications come up with very attractive user interfaces, and hci studies the design of user interfaces in greater detail [20].
while most applications benefit from nice user interfaces, such as iphone twitterrific, there is another category of applications where the human-machine interaction could be improved by having machines naturally adapt to their users, for instance tutoring systems. in such systems the adaptation involves the consideration of emotional information, possibly including the expression of frustration, dislike, confusion, excitement, etc. this emotional communication, along with the handling of affective information in hci, is currently studied under affective computing [35]. human emotions are believed to contain an emotional judgment about one's general state and bodily reactions [6][42]. for instance, when a driver cuts one off, he/she would experience physiological changes such as increases in heart rate and breathing rate, as well as the feeling and expression of fear and anger. the literature suggests many research approaches that can be used to recognize emotions, such as by using facial expressions [39], changes in tone of voice, affect signals [17] and electroencephalography (eeg) [7]. however, these approaches have their own limitations. humans' abilities to make decisions, make judgments and keep information in memory are all studied under cognitive science [15]. although there is no widely accepted definition of human attention, it is considered a cognitive process that helps humans to selectively concentrate on a few tasks while ignoring others, for instance concentrating on a movie played on a computer screen while ignoring what is happening outside. according to recent developments in cognitive science, emotions also play a major role in human cognition, especially in decision making and memory, such as flashbulb memory [15][42]. the research work discussed in this paper identifies human involvement in hci as a measurable psychological phenomenon and proposes several involvement types. some of these involvement types can be considered as improving hci while others are conditions of no or reduced involvement. in determining human involvement the work proposes a low-cost hardware/software approach that consists of gsr and eog sensing devices and recording software. the remaining sections of this paper are organized as follows: in the related work section, the two popular approaches to improving human involvement in multimedia interactions are presented with their positive and negative aspects. the section also gives a brief overview of mental tasks, the role of attention in coordinating mental tasks, the correlation between eye activity and visual attention, the role of emotions in humans, recognizing emotions from psychophysiological signals, especially using
the changes in skin conductance, and human involvement under cognitive emotional valences. the methodology section discusses the proposed methodology for determining psychological involvement in multimedia interactions and identifies several involvement types with their distinguishable characteristics with respect to eye activity and gsr activity. this section also includes a brief discussion of the proposed low-cost hardware/software framework for evaluating the proposed involvement cases. the experimental process and results are presented in the results section. finally, the last section concludes by commenting on the findings and suggesting future work.
ii. related work
a. improving human involvement in multimedia interactions
hci attempts to improve human involvement in computer-based multimedia interactions by improving user interfaces and the presentation of multimedia content [20]. this also includes monitoring and profiling of human behavioural patterns, such as their preferred visiting paths and selections, and personalization of multimedia content to support the preferences of individual human users [5][11][19]. although this method has the potential to assign similar patterns to similar multimedia content, for different types of content the predicted patterns may not give acceptable outputs. another drawback of this approach is that it is less sensitive to human mood changes and long-term behavioural changes. therefore, as an improvement to this approach, it has been proposed that human emotional information captured in real time can be used as feedback to make machines adapt to their users and change the presentation accordingly [36]. currently, the most prevalent method for capturing human emotional information is capturing the facial expressions of users. the challenges for this method are that the quality of a facial expression analyser depends on a number of properties, such as accurate facial image acquisition, lighting conditions, head motion, etc. [33], and masked emotional communication [36], for instance a "poker face" to mask a true confusion. in contrast, another school of researchers is developing theories to model human-like cognition and related aspects in a computer to make machines think and act like humans, with the expectation that these models can predict and decide how the machine should communicate information with humans in a human-like manner, improving the relationship. these attempts vary from artificial intelligence (ai) based techniques [9][32] to cognitive modelling techniques like act-r models [3]. although cognitive science reveals, day by day, many other functions and relationships between human cognition, emotion and physiology, it is doubtful that true human behaviour can ever be modelled in a computer when the biological and sub-symbolic nature of humans is considered [42].
b. mental tasks, attention, and eye activity
in psychology it is believed that people have only a certain amount of mental energy to devote to all the possible tasks and to all the incoming information confronting them [15]. nature has resolved this problem by giving the ability to filter out unwanted information and to focus the cognitive resources on a few tasks or events rather than on many, through a mechanism called attention [15][40]. according to the model of memory [15], the working memory is the area that contains information about currently attended tasks.
the attention plays an important role concerned with coordinating information in the working memory resulting from the outside environment as well as information from the long term memories. the kahneman’s model of attention and effort is a model that explains the relationship between mental effort for tasks and attention [15] [25]. according to this model the attention is enforced through an allocation policy which is affected by both involuntary and voluntary activities. for example, while opera lovers are more likely to concentrate during an opera session, others would feel drowsy even if they want to be awake. although there are lot more theories to explain the attention, one prevailing theory that explains the visual attention is the spotlight metaphor that compares attention to a spotlight that highlights whatever information the system has currently focused on [15][40]. according to this theory, one can attend to only one region of space at a time and shift of attention is considered as a change of previously focused tasks. spotlight theory is used in implementing visual attention in modern cognitive modelling architectures [2]. apart from the visual attention, auditory attention also plays an important role concerning attention. theories such as broadbent’s filter theory, treisman’s attenuation theory and deutsch and deutsch’s theory suggest that all incoming messages are processed up to a certain level before they are selectively attended to [15][40]. the published evidence supports that eye movements directed to a location in space are preceded by a shift of visual attention to the same location [18][21][23]. however, the attention is free to move independent of eyes. eye-tracking is one of the most active research areas studying december 2009 the international journal on advances in ict for emerging regions 02 hiran b. ekanayake, damitha d. karunarathna and kamalanath p. hewagamage 13 eye movements and these eye movements consist of saccades and fixations [12]. saccades are rapid ballistic changes in eye position that occur at a rate of about 3-4 per second and the eye is blind during these movements. the information is acquired during long fixations of about 250 milliseconds that intervene during these saccades. biomedical investigations recognize eog as a technique that can be used to measure the gaze angle and assess the dynamics of eye movements [24][26] [27][37]. in this method it is required to place the sensors on the sides of the eyes for measuring the horizontal motions of the eyes and above and below the eyes if the vertical motions of the eyes are also studied. emotions and psychophysiological signalsc. there is no universally agreed explanation for emotional responses in humans. literature suggests many reasons for emotions and many other factors having influence on it, such as limbic system activity [6][42], asymmetries between cerebral hemispheres [29], gender differences [34], mental states and dispositions, and disequilibria between the self and the world [10]. emotional responses are considered to have two levels of responses [42], cognitive judgment about one’s general state and bodily reaction. the cognitive judgment mainly contributes to the motivation of goal accomplishment, memory processing, deliberative reasoning, and learning. bodily reactions of emotions are of two forms, i.e. expressions and physiological signals. this response is considered to have two dimensions: pleasure (pleasant and unpleasant) and arousal (aroused or unaroused). 
emotion research mainly focuses on this bodily reaction in emotion recognition. facial expression analysis is one of the most heavily researched areas in recognizing emotions. paul ekman has studied the presence of basic emotional categories expressed by facial expressions across different cultures and ethnicities and identified eight facial expressions [13]: happiness, contempt, sadness, anger, surprise, fear, disgust, and neutral. it is believed that these basic emotions provide the ingredients for more complex emotions, such as guilt, pride and shame [22]. one of the major challenges for facial-expression-based approaches is people's ability to mask their true expressions if they do not want to communicate their true feelings [36]. in contrast, most emerging methods for emotion recognition are based on peripheral and central nervous system signals [7][10][29]. the sympathetic activation of the autonomic nervous system of the peripheral nervous system and the activation of the endocrine system introduce changes in heart rate, skin conductivity, blood volume pressure, respiration, and many other sympathetic organs, which can be detected using biofeedback sensing instruments. healey and picard [17] present how emotions can be recognized from these physiological signals with high accuracy. apart from the peripheral signals, eeg signals from the brain have also been shown to be usable for assessing the level of arousal [7]. gsr, or skin conductance response (scr), is another popular measure known to have a nearly linear correlation with a person's arousal level and thus with the cognitive emotional activity of that person [38][41]. therefore, gsr is used as a method for quantifying a person's emotional reaction to different stimuli presented. the literature suggests that a low level of cortical arousal is associated with relaxation, hypnosis, and the subjective experience of psyche states and unconscious manifestations, whereas a high level of cortical arousal is associated with increased power of reflection, focused concentration, increased reading speed, and increased capacity for long-term recall. skin conductivity, or gsr, is associated with this cortical arousal in that when the arousal of the cortex increases, the conductivity of the skin also increases, and when the arousal of the cortex decreases, the conductivity of the skin also decreases; this results from the "fight or flight" behaviour of the autonomic nervous system. however, the literature shows a few other responses which can have an impact on the electrical resistance of the skin. the following summarizes the causes of skin electrical activity:
• tarchanoff response is a change in dc potential across neurons of the autonomic nervous system connected to the sensory-motor strip of the cortex. it has an immediate effect (0.2 to 0.5 seconds) on the subject's level of arousal, and this effect can be detected using hand-held electrodes, because hands have a particularly large representation of nerve endings on the sensory-motor strip of the cortex.
• the "fight or flight" stress response of the autonomic nervous system comes into action as arousal increases, as a result of increased sweating due to the release of adrenaline. this is a slow response compared to the tarchanoff response.
• forebrain arousal is a complex physiological response, unique to man, affecting the resistance in the thumb and forefinger.
• changes in alpha rhythms cause blood capillaries to enlarge, and ultimately this too affects the skin resistance.
d. cognition, emotion and brain's involvement
the emotional reaction of the body has sympathetic effects on the body for "fight or flight" behaviour, as well as increased activation of the reticular activation system (ras) [4][38]. the reticular activation system is the centre of attention and motivation in the brain [1][28][30]. it is also the centre of balance for the other systems involved in learning, self-control or inhibition, and motivation. when it functions normally, it provides the neural connections that are needed for the processing and learning of information, and the ability to pay attention to the correct task. however, if over-excited, it distracts the individual through excessive startle responses, hyper-vigilance, touching everything, talking too much, and restless and hyperactive behaviours. fig. 1 shows the correlation of attention and arousal.
fig. 1. the interaction of attention and arousal.
iii. methodology
during a computer-based multimedia interaction session, optimal involvement can be expected when the human participant looks towards the computer screen and psychologically experiences the content presented by the computer. however, in real situations this expected involvement behaviour cannot be observed all the time, as disturbances can occur as a result of outside events and internal stress responses of the participant. in our research, we hypothesize that the eye movement activity measured as eog signals can be used to distinguish whether or not a participant is attending to the visual content presented on the computer. the reason for using an eog-based approach is that eog signals can be captured using low-cost hardware (costing about usd 1000), in contrast to using expensive eye tracking systems (costing about usd 10000). we expect lower magnitude eog signals resulting from saccadic eye movements when the participant's visual space is limited by the screen dimensions than when the participant is attending to the general visual space or the environment (fig. 2). since the primary task during a multimedia interaction session is to pay attention to the content that appears on the computer screen, we assume that the fixations during eye movements are mostly located within the visual space defined by the screen dimensions.
fig. 2. subject's limited visual space during multimedia interaction.
apart from using eog signals to distinguish visual focus, these signals may be used to identify inattention conditions, such as drowsy situations. there is empirical evidence that one's eye blink rate increases as one gets drowsy [31]. although eog can be used to identify visual focus, it is less effective in determining whether the participant is psychologically experiencing the multimedia content, because simply looking at the computer screen does not mean that the participant is mentally attending to the content that is seen. therefore, to identify this mental involvement we employ skin-conductivity-based measurements, or gsr. the literature points out that maximum attention or optimal involvement is gained when arousal is moderate, whereas too much or too little arousal does not give satisfactory levels of involvement.
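a schematic of how the two measurements could be combined into involvement labels, following the reasoning above: eog magnitude separates on-screen from off-screen attention, and the arousal level separates optimal involvement from under- or over-aroused states. the thresholds and the normalized gsr scale are placeholders, not values derived from the study.

```python
# schematic combination of eog magnitude and arousal level into involvement
# labels; thresholds and the 0-1 gsr scale are illustrative assumptions.
def classify_involvement(eog_magnitude: float, gsr_level: float,
                         eog_offscreen_threshold: float = 1000.0,
                         gsr_low: float = 0.3, gsr_high: float = 0.7) -> str:
    if eog_magnitude > eog_offscreen_threshold:
        return "off-screen / distracted"        # large saccades beyond the screen
    if gsr_level < gsr_low:
        return "on-screen but under-aroused (drowsy or bored)"
    if gsr_level > gsr_high:
        return "on-screen but over-aroused"
    return "optimal involvement"                 # moderate arousal, on-screen gaze

print(classify_involvement(eog_magnitude=450.0, gsr_level=0.5))  # optimal involvement
```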
in our research we propose a low-cost hardware and software solution to capture eog and gsr signals. our eog hardware unit is based on grant's sound card eeg project [8] (fig. 3). the cost for building this unit is around usd 100, whereas commercial products are ten times more expensive. the software to interface with this device is freely available on the project website, and for our solution we have used the neuroprobe. to detect eog signals it is required to place electrodes at both sides of the eyes and in the middle of the forehead. we have developed a headband mounting these electrodes, so that the electrodes can be placed easily and without much discomfort to the participant. the eog hardware then receives eog potentials, which are in the range of 10-100 microvolts, and these potentials are amplified, modulated and transmitted to the computer through the sound card. at the computer, the neuroprobe software demodulates the signal and filters out unwanted components, such as 50 hz a/c interference and electromyography (emg) signals, recovering the original eog waveforms.
fig. 3. eeg hardware unit to capture eog signals.
for capturing gsr signals we used a lego mindstorms brick based solution [16] (fig. 4). this unit costs about usd 100, whereas commercial gsr recorders cost more than usd 1000. the electrodes are wrapped around the middle and index fingers of the participant's left hand, so that the right hand is free to use for other tasks, such as controlling the mouse. although the brick can take readings at a rate of about 40 samples/second, since the transmission to the computer is through an ir link, the achievable transmission rate is about 2 samples/second. to reduce errors, we have implemented a gaussian smoother in the brick software. moreover, the readings are represented as a value between 0 and 1023, called the raw value, and the relationship between the actual skin resistance (sr) and the reading is given by reading value = 1023 * sr / (sr + 10000). we considered this accuracy sufficient because in emotion research a response window of 1 to 10 seconds is usually analysed [17].
fig. 4. lego gsr sensor unit.
finally, the eog signals received from the neuroprobe and the gsr signals received from the brick are fused by software developed by us. this software is also capable of annotating the signals based on media events, such as media transitions and user-defined events. the recorded signals are then analysed using the matlab signal processing toolbox [43].
iv. results
the experiments were conducted using six volunteers labelled a, b, c, d, e and f (fig. 5).
fig. 5. six individuals facing the experiment.
the multimedia interaction session consisted of several multimedia types, and they were labelled i01, i02, etc. table i gives a brief description of the multimedia interactions used in the experiment.
table i
multimedia interactions used in the experiment
interaction id    description
i01               a song without visual content
i02               a still and exciting picture without auditory content
i03               a video clip containing an exciting event
i05               a video lecture without exciting events
i06               a repeat of the same interaction i05
i08               a video clip containing an exciting event
i13               after the computer-based multimedia interactions, the recording was continued for a while without informing the subject
for each subject, for each interaction, gsr and eog signals were recorded.
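the two gsr conditioning steps just described, inverting the brick's reading formula to recover the skin resistance and applying a gaussian smoother, can be illustrated with a short python sketch; the kernel width and the sample values are assumptions, and the code is not the firmware actually running on the brick.

```python
# inverting reading = 1023*sr/(sr+10000) to recover sr, and a simple
# truncated-gaussian smoother of the kind described for the brick software.
import math

def raw_to_skin_resistance(reading: int) -> float:
    """Invert the sensor formula to obtain skin resistance in ohms."""
    if not 0 < reading < 1023:
        raise ValueError("reading must lie strictly between 0 and 1023")
    return 10000.0 * reading / (1023.0 - reading)

def gaussian_smooth(samples, sigma: float = 1.0):
    """Smooth a 1-d list of samples with a truncated gaussian kernel."""
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    smoothed = []
    for i in range(len(samples)):
        acc = 0.0
        for k, w in zip(range(-radius, radius + 1), kernel):
            j = min(max(i + k, 0), len(samples) - 1)  # clamp at the edges
            acc += w * samples[j]
        smoothed.append(acc)
    return smoothed

print(round(raw_to_skin_resistance(512)))             # -> 10020 ohms, about 10 kohm
print(gaussian_smooth([500, 520, 900, 510, 505]))     # spike at index 2 is spread out
```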
iv. results
the experiments were conducted using six volunteers labelled a, b, c, d, e and f (fig. 5).
fig. 5. six individuals facing the experiment.
the multimedia interaction session consisted of several multimedia types, labelled i01, i02, etc. table i gives a brief description of the multimedia interactions used in the experiment.
table i. multimedia interactions used in the experiment
interaction id | description
i01 | a song without visual content
i02 | a still and exciting picture without auditory content
i03 | a video clip containing an exciting event
i05 | a video lecture without exciting events
i06 | a repeat of the same interaction i05
i08 | a video clip containing an exciting event
i13 | after the computer-based multimedia interactions, the recording was continued for a while without informing the subject
for each subject and for each interaction, gsr and eog signals were recorded.
fig. 6 shows the gsr waveforms recorded for each subject over the interactions i01, i02 and i03. the letter g denotes a gsr waveform, and time is measured in seconds. graphs (a) and (c) in fig. 6 show some relationship between the gsr signal waveforms of the subjects. however, graph (b) does not show much change in its signal waveforms or a clear relationship between signals.
table ii. mean signal magnitudes of eog waveforms of each participant over interactions i01, i02 and i03
subject | i01 | i02 | i03
a | 556 | 620 | 510
b | 345 | 524 | 623
c | 518 | 587 | 343
d | 476 | 352 | 363
e | 590 | 487 | 563
f | 278 | 292 | 276
average | 460 | 477 | 446
fig. 7 shows samples of eog waveforms recorded for each subject over the interactions i03 and i13. the letters l and r represent left and right eye eog waveforms respectively; for instance, b05l corresponds to the left eye eog waveform of subject b for the interaction i05.
fig. 6. gsr waveforms for each individual for (a) interaction i01, (b) interaction i02 and (c) interaction i03.
in order to check the quality of visual attention over the three types of multimedia interactions i01, i02 and i03, the mean signal magnitudes of the eog waveforms were calculated for each subject and tabulated in table ii. the results in table ii show that the interaction i03 has the lowest average eog value compared to the interactions i01 and i02.
fig. 7. eog signals for each subject for (a) the on-screen interaction (i03) and (b) the off-screen interaction (i13).
table iii gives the mean values of the eog signal waveforms obtained for the interactions i03, i05, i06, i08 and i13, and compares the increase of the average eog signal magnitude for the interaction i13 with respect to the interactions i03, i05, i06 and i08 (denoted as i03..8).
table iii. a comparison of mean signal magnitudes of eog waveforms between on-screen interactions and an off-screen interaction
subject | i03 | i05 | i06 | i08 | i03..8 | i13 | increase
a | 510 | 679 | 699 | 690 | 644 | 2210 | 343%
b | 623 | 492 | 583 | 848 | 636 | 1214 | 191%
c | 343 | error | error | error | 343 | 1170 | 341%
d | 363 | 399 | 592 | 309 | 416 | 1150 | 276%
e | 563 | 869 | 827 | 578 | 710 | 1369 | 193%
f | 276 | 572 | 434 | 418 | 425 | 612 | 144%
average | 446 | 602 | 627 | 569 | 561 | 1288 | 248%
from fig. 7 and the results in table iii it is apparent that the average eog signal magnitudes for the off-screen interaction are 1.5 to 3.5 times higher than the average eog magnitudes for the on-screen interactions. fig. 8 shows the gsr waveforms recorded for each subject over the interactions i05, i08 and i13. table iv gives the means calculated for each of the gsr signals.
fig. 8. gsr waveforms for each individual for (a) interaction i05, (b) interaction i08 and (c) interaction i13.
from fig. 6 (c) and fig. 8 (b) it can be observed that, for all the participants, over the time segment 150-200 seconds of the interaction i03 and the time segment 25-35 seconds of the interaction i08 the gsr waveforms show a similar pattern. during these time segments the participants were observing the exciting events contained in those multimedia documents, and from their facial expressions it was observed that they were getting excited for a while.
from fig. 8 (a) it can be observed that for participants b and d the gsr waveforms show very similar patterns, and except for participant e all the other gsr waveforms show little variance. during this interaction the participants were observing the video lecture, and except for participant e all the others showed from their facial expressions that they were concentrating on the interaction. however, it was observed that participant e had some body movements. fig. 8 (c) shows the gsr waveforms when the participants were not attending to on-screen interactions. from table iv it is apparent that the off-screen interaction gives the lowest mean gsr signal values for all the participants compared to the on-screen interactions. the emotionally significant interactions, i.e. i03 and i08, give moderate mean gsr signal values, and the video lecture interactions, i.e. i05 and i06, give the highest mean gsr signal values.
fig. 9 shows the gsr waveforms for subjects b and d over the interactions i05 and i06. during the interaction i05 it was observed from their facial expressions that both participants b and d were concentrating on the interaction. however, when the interaction was repeated (i.e. i06), boredom (or inattention) behaviours were observed. the boredom behaviour was distinguishable from periodic rapid eye activity and frustrated facial expressions.
fig. 9. gsr waveforms for the interactions i05 and i06 of (a) participant b and (b) participant d.
fig. 10 shows instances of eog waveforms when the participant is concentrating during the interaction i05 and falling into a bored and drowsy (inattention) state during the interaction i06.
fig. 10. eog waveform instances of subject b (a) concentrating over the interaction i05, (b) active during the interaction i06, and (c) drowsy during the interaction i06.
table v gives the means and standard deviations calculated for eog signal waveforms over windows of 10 seconds, for randomly selected instances of the interaction i05 and for instances when concentration type (active) and drowsy type behaviours are present during the interaction i06. from the results in table v it can be seen that in most situations the eog mean and standard deviation values for i06 are about 25% higher than the values for the interaction i05.
v. discussion
the results in table iii have shown the effectiveness of using eog signals to differentiate users' attention to on-screen interactions from off-screen interactions. further, the results have demonstrated the appropriateness of using eog and gsr signals to distinguish human involvement with multimedia interactions. multimedia documents with emotionally significant events can result in more active involvement, and this can be identified from the resulting gsr signals having higher variances, moderate mean gsr values and changes in the gsr pattern that correlate with emotional events in the media. the gsr waveform becomes smoother and reaches a higher mean value when the human is concentrating on an interaction with no or few emotionally significant contents. however, this type of involvement can also fall into inattention if the human feels the interaction is boring. since gsr alone cannot distinguish concentration from inattention, the eog signal patterns are analysed in fixed-size windows.
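the fixed-window analysis just described can be sketched as follows. this is an illustrative python/numpy fragment only: the sampling rate, the function names, the synthetic signals and the percentage comparison are our own assumptions, chosen to mirror the 10-second window means and standard deviations reported in table v rather than to reproduce the authors' actual scripts.

```python
# illustrative sketch of the fixed-window eog analysis: mean and standard
# deviation of the (rectified) signal over non-overlapping 10-second windows.
import numpy as np

def window_stats(eog: np.ndarray, fs: float, window_s: float = 10.0):
    """return (mean, std) of |eog| for each non-overlapping window of window_s seconds."""
    n = int(window_s * fs)
    usable = (len(eog) // n) * n
    windows = np.abs(eog[:usable]).reshape(-1, n)
    return windows.mean(axis=1), windows.std(axis=1)

def relative_increase(active: np.ndarray, reference: np.ndarray) -> float:
    """percentage increase of one set of window means over another, in the spirit
    of the roughly 25% increase reported for i06 relative to i05."""
    return 100.0 * (active.mean() - reference.mean()) / reference.mean()

if __name__ == "__main__":
    fs = 100.0  # assumed sampling rate in hz
    rng = np.random.default_rng(0)
    i05 = rng.normal(0.0, 1.0, int(60 * fs))   # stand-in for a concentrating segment
    i06 = rng.normal(0.0, 1.3, int(60 * fs))   # stand-in for a drowsy/inattentive segment
    m05, _ = window_stats(i05, fs)
    m06, _ = window_stats(i06, fs)
    print(f"increase of i06 over i05: {relative_increase(m06, m05):.1f}%")
```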
the results have shown that under inattention the eog waveforms report increased mean and standard deviation values compared to concentration type involvement. off-screen involvement was easily distinguished by higher magnitude eog signal waveforms and low gsr mean values. apart from the involvement types, the results have also shown the significance of having auditory content in addition to visual content in a multimedia interaction for improving human involvement. however, for a robust conclusion more focused research work is required to identify the correlation between different media types and the resulting types of involvement.
the work reported in this paper has considered only a limited set of psychological factors. for a more complete investigation, consideration of psychological factors such as gender differences, age and cultural aspects is also required. moreover, our experiments were conducted using low cost hardware/software having many limitations. although low cost hardware/software is more realistic when practical use is considered, on the negative side it hinders the ability to identify more psychophysiological patterns with respect to human involvement in multimedia interactions. this was evident from the gsr signals recorded for participant c, where the readings did not have much variance. as a future continuation of this work, an application, for example in e-learning, can be developed with the capability to determine the user's involvement and to dynamically change the presentation to give the human a pleasant multimedia experience while avoiding negative psychological conditions such as boredom and fatigue.
acknowledgment
the authors wish to sincerely thank all those who supported and participated in the experiments.
references
[1] amdahta, reticular activating system, (http://www.deficitdeatencion.org/reticular.html, july 2008).
[2] j. r. anderson, m. matessa and s. douglass, the act-r theory and visual attention, seventeenth annual conference of the cognitive science society (pp. 61-65), hillsdale, nj: lawrence erlbaum associates, 1995.
[3] j. r. anderson and c. lebiere, the atomic components of thought, mahwah, nj: erlbaum, 1998.
[4] b. best, the anatomical basis of mind, (http://www.benbest.com/science/anatmind/anatmind.html, august 2008).
[5] p. brusilovsky, knowledgetree: a distributed architecture for adaptive e-learning, 13th international world wide web conference, 2004.
[6] r. carter, mapping the mind, university of california press, 2000.
[7] g. chanel, j. kroneeg, d. grandjean, and t. pun, emotion assessment: arousal evaluation using eeg's and peripheral physiological signals, international workshop on multimedia content representation, classification and security (mrcs), special session: multimodal signal processing, istanbul, turkey, sept 11-13, 2006.
[8] g. g. connell, sound card eeg technology, (http://www.hotamateurprograms.com/eeg.html, august 2007).
[9] d. davcev, d. cakmakov, and v. cabukovski, a multimedia cognitive-based information retrieval system, acm conference on computer science, 1993.
[10] d. n. davis, emotion as the basis for computational autonomy in cognitive agents, 14th european conference on artificial intelligence, 2000.
[11] p. dolog, n. henze, w. nejdl, and m. sintek, personalization in distributed elearning environments, www2004, the thirteenth international world wide web conference, new york, usa, acm, 2004.
[12] a. t.
duchowski, a breadth-first survey of eye tracking applications, behavior research methods, instruments, & computers (brmic), 34(4), pp. 455-470, november 2002.
[13] p. ekman, universals and cultural differences in facial expressions of emotion, in j. cole (ed.), nebraska symposium on motivation 1971, (vol. 19, pp. 207-283), lincoln, ne: university of nebraska press, 1972.
[14] electrodermal activity meter, cornell university, ece 4760, designing with microcontrollers, final projects, (http://instruct1.cit.cornell.edu/courses/ee476/finalprojects/s2006/hmm32_pjw32/index.html, april 2008).
[15] k. m. galotti, cognitive psychology: in and out of the laboratory, 3rd edition, indian reprint 2007, 2004.
[16] m. gasperi, galvanic skin response sensor, (http://www.extremenxt.com/gsr.html, september 2008).
[17] j. healey and r. picard, digital processing of affective signals, icassp '98, 1998.
[18] j. m. henderson, the relationship between visual attention and eye movements, canadian psychology, 31:4, 388-389, 1990.
[19] k. p. hewagamage and r. s. lekamarachchi, learning patterns: towards the personalization of e-learning, 5th international information technology conference, 2003.
[20] t. t. hewett, acm sigchi curricula for human-computer interaction, technical report, pages: 162, isbn: 0-89791-474-0, 1992.
[21] j. e. hoffman, visual attention and eye movements, in h. pashler (ed.), attention, london: university college london press, 119-154, 1998.
[22] e. hudlicka, beyond cognition: modeling emotion in cognitive architectures, in proceedings of the sixth international conference on cognitive modeling, 2004.
[23] d. e. irwin, information integration across eye movements, canadian psychology, 31:4, 389-390, 1990.
[24] k. najarian and r. splinter, biomedical signal and image processing, crc press, isbn: 0849320992, 2005.
[25] d. kahneman, attention and effort, englewood cliffs, nj: prentice-hall, 1973.
[26] c. kirtley, biomedical engineering be513: biomedical instrumentation 1, 2002, (http://www.univie.ac.at/cga/courses/be513/projects/, june 2006).
[27] j. malmivuo and r. plonsey, bioelectromagnetism: principles and applications of bioelectric and biomagnetic fields, oxford university press, (http://butler.cc.tut.fi/~malmivuo/bem/bembook/28/28.htm, 1995).
[28] r. w. mayer, physiological psychology, 2001, (http://online.sfsu.edu/~psych200/unit5/5cont.htm, august 2008).
[29] p. niemic, studies of emotion: a theoretical and empirical review of psychophysiological studies of emotion, journal of undergraduate research, vol. 1, no. 1 (fall 2002), pp. 15-18, 2004.
[30] omh (2006), family psychoeducation and multifamily groups, new york state office of mental health, center for information technology and evaluation research, (http://www.omh.state.ny.us/omhweb/ebp/family_psychoeducation.htm, july 2008).
[31] t. omi, f. nagai and t. komura, driver drowsiness detection focused on eyelid behaviour, 34th congress on science and technology of thailand, 2008.
[32] s. panchanathan, j. a. black jr, p. tripathi, and k. kahol, cognitive multimedia computing, ieee international symposium on information science and electrical engineering, 2003.
[33] m. pantic and l. rothkrantz, automatic analysis of facial expressions: the state of the art, ieee transactions on pattern analysis and machine intelligence, volume 22, issue 12, pp. 1424-1445, ieee, december 2000.
[34] a. pease and b. pease, why men don't listen and women can't read maps, welcome rain, isbn: 1566491568, 2001.
[35] r. w. picard, affective computing, mit press, cambridge, 1997.
[36] r. w. picard, affective computing for hci, proceedings of hci, munich, germany, 1999.
[37] r. polikar, characterization of biomedical signals, class notes in biomedical signal processing and modeling, (http://engineering.rowan.edu/~polikar/classes/ece504/), 2006.
[38] p. shepherd, the galvanic skin response (gsr) meter, (http://www.trans4mind.com/psychotechnics/gsr.html, may 2006).
[39] k. s. song, j. h. park, and s. m. jeong, enhancing e-learning interactivity via emotion recognition through facial expressions, icl 2006, 2006.
[40] e. a. styles, the psychology of attention, hove, uk: psychology press, 1997.
[41] k. takano, a. nagasaka and k. yoshino, experiment on validity of skin resistance level as an index of arousal level, sangyo igaku, 35, 257-68, 1993.
[42] p. thagard, mind: introduction to cognitive science, second edition, mit press, 2005.
[43] the mathworks, inc., signal processing toolbox user's guide, 1994-2009.

optimizing n-gram order of an n-gram based language identification algorithm for 68 written languages
chew y. choong, yoshiki mikami, c. a. marasinghe and s. t. nandasara

abstract — language identification technology is widely used in the domains of machine learning and text mining. many researchers have achieved excellent results on a few selected european languages. however, the majority of african and asian languages remain untested. the primary objective of this research is to evaluate the performance of our new n-gram based language identification algorithm on 68 written languages used in the european, african and asian regions. the secondary objective is to evaluate how n-gram orders and a mix n-gram model affect the relative performance and accuracy of language identification. the n-gram based algorithm used in this paper does not depend on n-gram frequency. instead, the algorithm is based on a boolean method to determine the output of matching target n-grams to training n-grams. the algorithm is designed to automatically detect the language, script and character encoding scheme of a written text. it is important to identify these three properties because a language can be written in different scripts and encoded with different character encoding schemes. the experimental results show that in one test the algorithm achieved up to a 99.59% correct identification rate on the selected languages. the results also show that the performance of language identification can be improved by using a mix n-gram model of bigram and trigram. the mix n-gram model consumed less disk space and computing time compared to a trigram model.
index terms — boolean method, character encoding scheme, digital language divide, language identification, mix n-gram model, n-gram, natural language processing, language, script.
manuscript received april 2, 2009. accepted october 20, 2009. this work was sponsored by the japan science and technology agency (jst) through the language observatory project (lop) and by the japanese ministry of education, culture, sports, science and technology (mext) through the asian language resource network project. chew y. choong is with the nagaoka university of technology, nagaoka, niigata, japan (e-mail: yewchoong@gmail.com). yoshiki mikami and c. a. marasinghe are also with the nagaoka university of technology, nagaoka, niigata, japan (e-mail: mikami@kjs.nagaokaut.ac.jp, ashu@kjs.nagaokaut.ac.jp). s. t. nandasara is with the university of colombo school of computing, colombo, sri lanka (e-mail: stn@ucsc.cmb.ac.lk).
i. introduction
a. digital language divide
ethnologue [1] claims that there are 6,912 living languages in the world. however, iso 639-2, the second part of the iso 639 standard, has adopted only 464 codes for the representation of the names of languages [2]. in 1999, worried about half of the world's languages facing the risk of dying out, the united nations educational, scientific and cultural organization (unesco) decided to launch and observe an international mother language day on 21 february every year to honour all mother languages and promote linguistic diversity [3]. the united nations' effort in promoting mother languages was recognized by the guinness world record when its publication of the universal declaration of human rights (udhr) was declared the "most translated document" in the world. the udhr has been translated into 329 languages as of march 2009. on the web, the google search engine allows users to refine their search based on one of the 45 languages it supports. as of november 2008, microsoft's dominant (63.67%) windows xp operating system had been released in only 44 localized language versions. all these facts lead us to conclude that access to the digital world is greatly divided by language.
b. measuring languages on the internet
in order to bridge the digital language divide, unesco has been emphasizing the concept of multilingualism and participation for all languages on the internet. unesco, at its 2005 world summit on the information society in tunis, published a report entitled "measuring linguistic diversity on the internet", comprising articles on issues of language diversity on the internet. however, unesco admitted that the volume does not present any final answer on how to measure languages on the internet [4]. the language observatory project (lop), launched in 2003, aims to provide means for assessing the usage level of each language on the internet. more specifically, the project is expected to produce a periodic statistical profile of language, script and character encoding scheme (lse) usage on the internet [5]. the lop uses a language identifier to automatically detect the lse of a web page. the algorithm described in this paper is used to construct the language identifier for lop.
c. language identification
language identification generally refers to a process that attempts to classify a text as belonging to one of a pre-defined set of known languages. it is a vital technique for natural language processing (nlp), especially in manipulating and classifying text according to language. many researchers [6] [7] [8] [9] [10] [11] [12] have achieved excellent results on language identification for a few selected european languages. however, the majority of african and asian languages remain untested. this reflects the fact that search engines have very limited support in their language-specific search ability for most african and asian languages. in this paper, a language is identified by its lse properties. all lse properties are important for precise language categorization.
for example, the script detection ability allows one to measure the number of web pages that are written in a particular script, for instance the sinhala script. furthermore, lse detection is critical for determining the correct tool for text processing at a later stage. table i shows sample texts of the uzbek language written in three different scripts and character encoding schemes. a machine translation tool must first establish the script and character encoding scheme of the source text in order to select the proper translator to translate the source text into another language.
table i. example of the uzbek language using different scripts and character encoding schemes
language | script | character encoding scheme | sample text
uzbek | arabic | utf-8 | غفقكگڭلمنء
uzbek | cyrillic | cyrillic | лмпрстўфх
uzbek | latin | iso 8859-1 | abchdefgg
d. n-gram
an n-gram can be viewed as a sub-sequence of n items from a longer sequence. the item can refer to a letter, word, syllable or any logical data type defined by the application. due to its simplicity of implementation and high accuracy in predicting the next possible item from a known sequence, the n-gram probability model is one of the most popular methods in statistical nlp. the principal idea of using n-grams for language identification is that every language contains its own unique n-grams and tends to use certain n-grams more frequently than others, hence providing a clue about the language. an n-gram of order 1 (i.e. n=1) is referred to as a monogram, an n-gram of order 2 as a bigram and an n-gram of order 3 as a trigram; higher orders are generally referred to simply as "n-grams". using "no-456" as an example, if we define the basic unit of the desired n-gram as a "character", the valid lists of character level bigrams and trigrams (each separated by a space) are as below:
bigram: no o- -4 45 56
trigram: no- o-4 -45 456
several researchers [6] [7] [8] [9] [10] reported that using a trigram model on selected european languages produced the best language identification results. however, many african and asian languages are not based on the latin alphabet that many european languages employ. thus, this study evaluates the performance of n-gram orders (n = 1, 2, ..., 6) and a special mix n-gram model for language identification on the selected languages. the rest of the paper is structured as follows. in the next section the authors briefly discuss related work. the n-gram based language identification algorithm is introduced in section iii. in section iv, the authors explain the datasets and experiments. experimental results are presented and discussed in section v. section vi concludes the paper and mentions future work.
ii. related work
the task of identifying the language of a text has been relatively well studied over the past century. a variety of approaches and methods such as the dictionary method, closed-class models [11], bayesian models [7], svm [12] and n-grams [6] [7] [8] [9] [10] [13] [14] have been used. two n-gram based algorithms are selected for detailed description. the cavnar and trenkle algorithm deserves special attention as it explains in depth how n-grams can be used for language identification. the suzuki algorithm, which is implemented in the language observatory project, is a benchmark for our algorithm.
a. cavnar and trenkle algorithm
in 1994, cavnar and trenkle reported very high (92.9-99.8%) correct classification rates on usenet newsgroup articles written in eight different languages, using rank-order statistics on n-gram profiles [8].
they reported that their system was relatively insensitive to the length of the string to be classified. in their experiment, the shortest text they used for classification was 300 bytes, while their training sets were on the order of 20 kilobytes to 120 kilobytes in length. they classified documents by calculating the distances of a test document's n-gram profile from all the training languages' n-gram profiles and then taking the language corresponding to the minimum distance. in order to perform the distance measurement they had to sort the n-grams in both the training and test profiles.
b. suzuki algorithm
the suzuki algorithm differs from conventional n-gram based methods in that its threshold for any category is uniquely predetermined [9]. for every identification task on the target text, the method must be able to respond with either "correct answer" or "unable to detect". the authors used two predetermined values to decide the answer to a language identification task: ub (closer to the value 1) and lb (not close to the value 1), with standard values of 0.95 and 0.92, respectively. the basic unit used in this algorithm is the trigram; however, the authors refer to it as a 3-byte shift-codon. in order to detect the correct language of a target text, the algorithm generates a list of shift-codons from the target text. the target's shift-codons are then compared with the lists of shift-codons in the training texts. if one of the matching rates is greater than ub, while the rest are less than lb, the algorithm reports that a "correct answer" has been found. the language of the training text with the matching rate greater than ub is assumed to be the language of the target text. by this method, the algorithm correctly identified all test data of the english, german, portuguese and romanian languages. however, it failed to correctly identify the spanish test data.
iii. methodology
the overall system flow of the language identification process is shown in fig. 1. in this process, a set of training profiles is constructed by converting training texts, in various language, script and character encoding scheme (lse) combinations, into n-grams. each generated training profile contains a set of distinct n-grams, without the frequency of occurrence of the n-grams. in the same way, the system converts the target text into a target profile. the system then measures the matching rates of n-grams between the target profile and the training profiles. the system classifies the target profile as belonging to the lse of the training profile that yields the highest matching rate.
fig. 1. system flow of the language identification process (training texts and target texts pass through the n-gram encoder to produce training profiles and target profiles; the matching rates between the target profile and all training profiles are measured, the maximum rate is found, and the lse of the training profile that gives the maximum rate is returned).
a. the matching mechanism
the process in fig. 1 labeled "measure matching rates between target profile and all training profiles" is used to calculate the matching rates between a target profile and all training profiles. unlike many other n-gram based algorithms, our algorithm does not depend on n-gram frequency. instead, the algorithm uses a boolean method to decide the output of the matching. the boolean method returns a value of 1 if the n-gram from the target profile is found among the n-grams of the training profile, and a value of 0 if there is no match. after all n-grams in the target profile have been compared to those in the training profile, the system derives the matching rate by dividing the total of the match values by the total number of distinct n-grams in the target profile (see equation (1)). the matching mechanism can be summarised in the following steps:
• let the target profile be $t$ and the number of distinct n-grams in $t$ be $n$; the list of n-grams in $t$ is $t_1, t_2, t_3, \ldots, t_n$;
• similarly, let a training profile be $\tau$ and its number of distinct n-grams be $k$; the list of n-grams in $\tau$ is $\tau_1, \tau_2, \tau_3, \ldots, \tau_k$;
• a match value $m_i$ is computed for every distinct n-gram of the target profile, and the matching rate $r$ is obtained from equation (1):

$m_i = 1$ if $t_i$ matched with some $\tau_j$ in the training profile, and $m_i = 0$ otherwise; $r = \frac{1}{n}\sum_{i=1}^{n} m_i$.  (1)

b. the base unit of the n-gram
in this algorithm, the basic unit of the n-gram is of data type "byte". the reason "byte" is selected instead of character or word is to avoid possible character encoding errors due to unexpected conversions occurring when reading a text file encoded in an abnormal encoding scheme, for example a text file created with a non-standard legacy font.
iv. data sets and experiments
there are two data sets used in the experiments. the first data set contains all the training texts, encoded in various language, script and character encoding scheme (lse) combinations; from here onward we refer to this set as the training corpus. the second data set is a collection of text documents that the authors used as target texts in the experiments; from here onward we refer to it as the validation corpus. the training corpus is mainly based on the universal declaration of human rights (udhr) texts collected from the official united nations universal declaration of human rights web site. at the time of the experiment, the training corpus contained 571 udhr text documents in various types of lse. the total size of the training corpus is 10,513,237 bytes, and the document sizes ranged from 3,977 to 126,219 bytes. the validation corpus was mainly based on web pages that the authors collected from online newspapers and media web sites. the six major online newspapers and media service providers used to construct the validation corpus were bbc news in 32 languages, voice of america news in 45 languages, wikinews in 25 languages, google news for 62 countries, deutsche welle news in 30 languages and china radio international in 45 languages. in addition, the authors referred to online news portals such as "abyz news links", "world newspapers and magazines" and "thousands of newspapers on the net" to locate a wider range of local news in many asian and african countries.
a total of 730 web pages were collected, spanning 68 languages and with a total size of 32,088,064 bytes. the document sizes ranged from 313 to 437,040 bytes. we did not normalize the sizes of those documents, in order to mimic the situation on the web. table ii shows the list of languages used in the validation corpus, grouped by region. an asterisk (*) next to a language name indicates that the language is spread across multiple regions. for example, the language abkhaz is mainly used in the caucasus area, a geopolitical region located between europe, asia and the middle east.
table ii. languages in the validation corpus, grouped by region
africa (9 languages): afrikaans, amharic, hausa, ndebele, rundi, rwanda, shona, somali, swahili
asia (27 languages): abkhaz*, aceh, arabic, armenian*, azerbaijani, burmese, chinese, dari, farsi, georgian*, hebrew, hindi, indonesian, japanese, korean, kurdish, malay, nepali, panjabi, pashto, russian*, tamil, thai, turkish*, urdu, uzbek, vietnamese
europe (32 languages): abkhaz*, albanian, armenian*, bosnian, bulgarian, catalan, czech, danish, dutch, english, estonian, finnish, french, georgian*, german, greek, hungarian, italian, latvian, lithuanian, norwegian, polish, portuguese, romanian, russian*, serbian, slovak, slovenian, spanish, swedish, turkish*, ukrainian
experiment 1: to evaluate the correct identification rate of the algorithm based on different n-gram orders
the first experiment was designed to evaluate the correct identification rate of the algorithm based on different n-gram orders. in total, six language identification tests were carried out, based on n-gram orders 1 to 6. n-gram orders greater than 6 were not considered as they consume too much processing power and time. for each n-gram order within the range, every text document in the training and validation corpus was converted into an n-gram profile. after that, the system calculated the matching rate between the target profile and every training profile. the matching rate is determined by the boolean method described in "the matching mechanism" section. after all matching rates had been determined, the system reported the language, script and character encoding scheme (lse) of the target profile, derived from the lse of the training profile that returned the highest matching rate.
experiment 2: to evaluate the efficiency of the algorithm based on a mix n-gram model
the second experiment was designed to evaluate how a mix n-gram model affects the language identification result. in this experiment, each training text was trained into a training profile using the optimized n-gram order discovered in the first experiment. the authors defined the optimized n-gram order for each lse as the smallest n that gave the most correct answers. in this model, the n-gram order used to convert the target text is dynamically altered by the system, depending on the n-gram order of the current training profile. if the current training profile is trained with n=2, the target text will be converted to n-grams using n=2.
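to make the boolean matching and the dynamic order selection concrete, the following python sketch builds byte-level profiles of distinct n-grams (no frequencies), computes the matching rate of equation (1), and classifies a target text against training profiles whose n-gram order may differ per lse, as in the mix n-gram model. this is only an illustration under those assumptions: the function names, the api and the toy training data are ours and do not correspond to the authors' implementation or corpus.

```python
# sketch of the boolean n-gram matching of equation (1) with a per-lse n-gram
# order, imitating the mix n-gram model (e.g. order 2 for some lses, 3 for the rest).

def ngram_profile(text_bytes: bytes, order: int) -> frozenset:
    """distinct byte-level n-grams of the given order (no frequency information)."""
    return frozenset(text_bytes[i:i + order]
                     for i in range(len(text_bytes) - order + 1))

def matching_rate(target_profile: frozenset, training_profile: frozenset) -> float:
    """equation (1): fraction of target n-grams found in the training profile."""
    if not target_profile:
        return 0.0
    matches = sum(1 for g in target_profile if g in training_profile)
    return matches / len(target_profile)

def identify(target_bytes: bytes, training: dict) -> str:
    """training maps an lse label to (order, training_profile); the target text is
    re-encoded with each training profile's own order before matching."""
    best_lse, best_rate = None, -1.0
    for lse, (order, profile) in training.items():
        rate = matching_rate(ngram_profile(target_bytes, order), profile)
        if rate > best_rate:
            best_lse, best_rate = lse, rate
    return best_lse

if __name__ == "__main__":
    # toy training data for illustration only, not the udhr training corpus
    training = {
        "english/latin/utf-8": (3, ngram_profile("all human beings are born free".encode("utf-8"), 3)),
        "french/latin/utf-8":  (3, ngram_profile("tous les êtres humains naissent libres".encode("utf-8"), 3)),
    }
    print(identify("human beings are free".encode("utf-8"), training))
```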
in the authors' first attempt, 12 languages, namely armenian, azerbaijani, chinese, czech, hungarian, indonesian, japanese, korean, panjabi, pashto, rwanda and slovak, were trained using n-gram order 2, and the rest of the training texts were trained using n-gram order 3. in the authors' second attempt, 5 languages, namely armenian, chinese, japanese, korean and panjabi, were trained using n-gram order 2, and the rest of the training texts were trained using n-gram order 3.
v. results and discussions
result for experiment 1
the objective of the first experiment is to evaluate the accuracy of our algorithm on the selected 63 written languages, using n-gram orders 1 to 6. fig. 2 shows the correct identification rate of language identification (y-axis) against the n-gram order (x-axis). using n-gram order 1, the correct identification rate is very low, only 6.99%. when the n-gram order is increased to 2, the correct identification rate increases to 56.30%. the algorithm achieves its best correct identification rate of 99.59% when the n-gram order is 3. beyond n-gram order 3, the system gains no improvement in the identification result; instead, the algorithm only achieves correct identification rates of 96.44%, 94.66% and 93.01% for n-gram orders 4, 5 and 6, respectively.
fig. 2. correct identification rate based on n-gram orders 1 to 6.
table iii. language identification errors in the trigram model
language | number of distinct n-grams | identified as
danish | 66 | norwegian
dari | 829 | farsi
malay | 328 | indonesian
using the trigram model, 3 out of 730 target profiles in the validation corpus were incorrectly identified by the algorithm. table iii shows that the three target profiles, namely danish, dari and malay, were each identified as another language that is very close to their language family. danish and norwegian both belong to the north germanic languages (also called scandinavian languages), a sub-group of the germanic branch of the indo-european languages. dari and farsi are practically the persian language, where dari is the local variant spoken in afghanistan, while farsi is the local variant used in modern day iran. in the case of malay and indonesian, they share the same language family, austronesian, a language family that is widely dispersed throughout the islands of southeast asia and the pacific. besides, it should be noted that the number of distinct n-grams in the danish target profile is as low as 66. the results of experiment 1 also showed that for 12 languages we were able to correctly identify all their target profiles using n-gram order 2. table iv lists the language, script and character encoding scheme (lse) of these 12 languages.
result for experiment 2
in the first test of experiment 2, the authors trained the training texts of the 12 languages using n-gram order 2, while the rest of the training texts were trained using n-gram order 3. unfortunately, this first attempt at using the mix n-gram model returned a very bad result: the overall correct identification rate for all target profiles was reduced to 46.30%. the authors manually went through every record of the identification results and discovered that, after being trained with n-gram order 2, the azerbaijani, czech, hungarian, indonesian, rwanda and slovak training profiles caused many missed identification errors for other lses' target profiles.
hence, the authors learned that languages based on arabic and latin scripts are not suitable for n-gram order 2 when they are not tested alone. in the second test of experiment 2, only armenian, chinese, japanese, korean and panjabi were trained with n-gram order 2, while the rest were trained with n-gram order 3. the language identification result of this mix n-gram model was excellent, achieving an overall correct identification rate of 99.59%.
table iv. lses that achieved the highest correct identification rate on their target profiles using n-gram order 2
language | script | encoding | correct identification rate (%)
armenian | armenian | utf8 | 100
azerbaijani | latin | utf8 | 100
chinese | traditional, simplified | big5, gb2312, utf8 | 100
czech | latin | latin, utf8 | 100
hungarian | latin | latin | 100
indonesian | latin | latin | 100
japanese | japanese | euc, jis, sjis, utf8 | 100
korean | korean | euc-kr | 100
panjabi | gurmukhī | utf8 | 100
pashto | naskh (arabic) | utf8 | 100
rwanda | latin | utf8 | 100
slovak | latin | latin, utf8 | 100
although the overall correct identification rate is the same as with the trigram model, this model achieved better performance in two categories: (1) processing time and (2) disk space. using n-gram order 2 resulted in lower training and identification times. fig. 3 shows the total computing time needed for the language identification task based on n-gram orders 1 to 6; the time grows enormously as n increases. for simplicity, we define n=2.5 to represent the mix n-gram model of bigram and trigram. in this optimized condition, the language identification task needs only 651,078 milliseconds to complete, compared to 817,844 milliseconds for the trigram model. the mix n-gram model is thus able to save up to 20.39% of the total computing time.
fig. 3. exponential growth of computing time for the language identification task.
the mix n-gram model also consumes less disk space. its total size of training profiles is 7,652 kilobytes, while the total size of the training profiles using n-gram order 3 is 8,076 kilobytes. the mix n-gram model requires 5.25% less disk space. table v shows a comparison of our algorithm and several other algorithms in the literature based on language coverage, n-gram order and overall correct identification rate. our algorithm stands out in terms of language coverage and the mixture of n-gram orders.
table v. comparison between n-gram based language identification algorithms based on language coverage, n-gram order and correct identification rate
algorithm | language coverage | n-gram order | correct identification rate (%)
dunning t. [7] | 2 languages (dutch, polish) | 2 and 3 | 92 and 99.9
cavnar and trenkle [8] | 8 languages (english, portuguese, french, german, italian, spanish, dutch, polish) | 3 | 92.9-99.8
suzuki [9] | 5 languages (portuguese, spanish, romanian, german, english) | 3 | no precise figure; problem with spanish
ölvecký [10] | 3 languages (czech, slovak, polish) | 3 | 95-99.2
this paper's algorithm | 68 languages, as listed in table ii | mixture of 2 and 3 | 99.59
vi. conclusion and future work
in this paper, we reported an n-gram based language identification algorithm and the experiments carried out to evaluate its accuracy against 68 languages used in the african, asian and european regions. we showed that the algorithm is highly efficient in classifying written text. the algorithm is unique in that the matching mechanism does not depend on n-gram frequency; instead, it depends on a boolean method to determine the output of matching target n-grams against training n-grams. as in many previous studies of n-gram methods, n-gram order 3 generated the best language identification results in the experiments. however, we discovered that the performance for five asian languages, namely armenian, chinese, japanese, korean and panjabi, improved by using n-gram order 2. an experiment based on a mix n-gram model of bigram and trigram confirmed the effectiveness of mixing n-gram orders: the total computing time consumed by the language identification task in experiment 2 was reduced by one-fifth while maintaining the same correct identification result. although the current research has demonstrated good performance, the authors believe there is still room for improvement:
• currently the validation corpus contains text documents in 68 languages. this number is relatively small compared to the 571 udhr documents collected in the training corpus. how well the algorithm can scale from the current corpus to a bigger corpus remains unknown. to confirm the true ability of the algorithm, we need to evaluate it against a larger validation corpus.
• the algorithm made three errors in language identification using the validation corpus. in all cases, the target text was identified as a language that is close to its language family. what are the best strategies to correctly identify languages that are close to each other? the authors need to find a solution for this critical issue.
• table iii showed that the numbers of distinct n-grams for the wrongly identified danish, dari and malay target profiles are quite low; the danish profile in particular contains only 66 distinct n-grams. this raises the question of what minimum number of n-grams is needed in order to correctly identify a language, and how this number varies among different languages, scripts and character encoding schemes. such a study is currently underway.
• a related issue is how the quality of the training text in general affects the language identification result. although the universal declaration of human rights is the most frequently translated document, other sources of training text could be considered in order to improve the identification result.
acknowledgment
the authors are grateful for the sponsorship of the japan science and technology agency (jst) through the language observatory project (lop) and of the japanese ministry of education, culture, sports, science and technology (mext) through the asian language resource network project, which provided the entire corpus data used in this research.
references
[1] r. g. gordon, ethnologue: languages of the world (15th ed.), sil international, dallas, 2005, isbn: 155671159x.
[2] j. w. group, codes for the representation of names of languages: alpha-2 codes, technical report, 1998, iso tc46/sc4 and iso tc37/sc2.
[3] unesco, records of the general conference, 30th session, united nations educational, scientific and cultural organization, 1999.
[4] j. paolillo, d. pimienta, d. prado, measuring linguistic diversity on the internet, united nations educational, scientific and cultural organization, 2005.
[5] y.
mikami, p. zavarsky, m. z. rozan, i. suzuki, the language observatory project (lop), in poster proceedings of the fourteenth international world wide web conference (www2005), pp. 990-991, may 2005, chiba, japan.
[6] p. f. brown, r. l. mercer, j. c. lai, class-based n-gram models of natural language, computational linguistics, 1992, vol. 18, no. 4.
[7] t. dunning, statistical identification of language, technical report mccs 94-273, new mexico state university, 1994.
[8] w. b. cavnar and j. m. trenkle, n-gram-based text categorization, in proceedings of sdair-94, 3rd annual symposium on document analysis and information retrieval, 1994, pp. 161-175.
[9] i. suzuki, y. mikami, a. ohsato, a language and character set determination method based on n-gram statistics, acm transactions on asian language information processing, 2002, vol. 1, no. 3, pp. 270-279.
[10] t. ölvecký, n-gram based statistics aimed at language identification, in mária bieliková (ed.), proceedings of iit.src 2005: student research conference in informatics and information technologies, bratislava, 2005, pp. 1-7.
[11] e. m. gold, language identification in the limit, information and control, 1967, vol. 10, no. 5, pp. 447-474.
[12] d. chen and h. bourlard, text identification in complex background using svm, proceedings of the ieee conference on computer vision and pattern recognition, 2001, pp. 621-626.
[13] bruno martins and mário j. silva, language identification in web pages, in proceedings of the 2005 acm symposium on applied computing, 2005, pp. 764-768.
[14] g. grefenstette, comparing two language identification schemes, in 3rd international conference on statistical analysis of textual data, rome, italy, 1995.

foraging in a grid world using action templates
r. a. chaminda ranasinghe1*, a. p. madurapperuma2, n. d. kodikara3
1, 3 university of colombo school of computing, colombo, sri lanka; 2 university of moratuwa, faculty of it, sri lanka
chaminda.ranasinghe@dialog.lk, ajith@itfac.mrt.ac.lk, ndk@ucsc.cmb.ac.lk
revised: 30 september 2008; accepted: 24 september 2008. * corresponding author

abstract: the emergence of behavioural and structural congruence based on simple local interactions of atomic units is a fascination to the scientific community across many disciplines. the climax of behavioural congruence and the emergence of behaviour is exemplified by the community life-style of ants. each individual ant possesses the capability to solve only part of the overall puzzle while aggressively communicating in primitive ways with its spatially related neighbours to produce emergent behaviour. the primary hypothesis of this research is that the constituent atomic actions of a complex behaviour could be successfully coordinated by a collection of collaborative and autonomous agents with the use of action templates. the aaants (adaptive autonomous agent colony interactions with network transparent services) model was conceptualised and implemented as a platform to represent the biologically inspired coordination and learning model and to test the research hypothesis. the domain of foraging in a grid-world was identified as the experimental basis to evaluate the aaants coordination model. the experiments demonstrated relative improvements in achieving behavioural congruence using the aaants model in relation to traditional monte-carlo based methods.
keywords: collective intelligence, action templates, emergent behaviour, reinforcement learning, frame representation.
introduction
the survival of an entity in the environment is directly attributed to selecting the most appropriate and refined behaviour with respect to the rapid changes in the environment.
behaviour of this nature could be called congruent with reference to the current demands of the environment. however, over a period of time, due to the changes and demands of the environment, the existing behaviour could become incongruent or obsolete. hence, behaviour should adapt and improve, or simply remain congruent with the latest changes in the environment. adaptive entities in the natural world use emergent models to achieve behavioural congruence. these models begin with an innate layer of basic incongruent atomic behaviour which, based on reinforcements and/or supervision from the environment, reaches a level of refinement more aligned with the demands of the environment. hence, dynamically and stochastically combining atomic behaviours that are either accepted or rejected based on reinforcements from the environment tends to provide a high level of behavioural congruence in natural systems. the success of the naturally occurring models in delivering an abundance of heterogeneous and congruent behaviour using the concepts of emergence, innateness and adaptation has inspired this research. the primary hypothesis of this research is that the constituent atomic actions of a complex behaviour could be successfully coordinated by a collection of collaborative and autonomous agents with the use of action templates. the domain of grid-world foraging was selected to implement the use of action templates, which were executed as a coordinated effort of a collection of autonomous software agents. the aaants model could be applied to several application domains based on the generic nature of the concept. the simulations and experiments discussed in this paper were based on the domain of grid-world navigation; however, the model could also be applied to the domains of pattern recognition, robotic movement and vision navigation. the experimental results related to robotic movement and vision navigation were excluded from the scope of this publication. the subsequent sections of the paper discuss the conceptualisation, realisation and experimentation of the aaants model within the domain of foraging in a grid-world. sections 2 and 3 state the objectives and inspirations that contributed to the motivation of the research. section 4 describes the aaants coordination model, which consists of atomic actions, action templates, behavioural concentres and sensory templates. section 5 contains a discussion of the design, implementation and execution of the grid-world experiment. the conclusion of the research with respect to its defined objectives is discussed in the last section.
motivation
the age-old ambition of creating intelligence on an artificial substrate that is anthropomorphic in nature is still considered a dream yet to be realised by humans. it was this curiosity that initiated the investigation into the behavioural complexity found in nature, which subsequently became the foundation of this research.
there are several theories, models and paradigms that have given inspiration and direction to the work carried out in this research. naturally occurring collective systems of individually simple animals, such as populations of insects and turtles, together with artificial phenomena such as traffic jams, suggest that individual complexity is not a necessity for complex intelligent behaviour of colonies of such entities [1], [2]. the community life-style of ants was an inspiration to this research. it has been estimated that the ants' success story spans several millions of years preceding the known era of human existence [3]. each individual ant possesses the capability to solve only a part of the overall puzzle while aggressively communicating in primitive ways with its spatially related neighbours to produce emergent behaviour. ant colonies have evolved means of performing collective tasks which are far beyond the capacities of their individual structures. this phenomenon is demonstrated without the ants being hard-wired together in any specific architectural pattern and without central control [1], hence void of top-down control. the consensus is that comprehension of emergent complexity in insect colonies such as ants would serve as a good foundation for the study of emergent, collective behaviour in more advanced social organisms, as well as leading to new practical methods in distributed computation [4], [5]. therefore, the key motivation was to devise an artificial learning model that could demonstrate collective intelligence analogous to that of insects. the "society of mind" theory by marvin minsky [6] was another inspiration to this research. this theory portrays the mind as a collection of mindless components that interact and compete to provide intelligent emergent behaviour. the society of agents in the mind is triggered by external sensations, where agents act individually but in a cooperative and synchronised manner. the incarnation of a complete multi-cellular being starting from a single fertilised egg seems like a heavenly secret to all of us and is certainly a motivation to this research. it is the initial set of genes in a fertilised egg that helps a simple cellular growth to be morphed into the complex combination of organs found in a complete animal. it is amazing that every cell contains a complete footprint of all the genes found in the initial cell, and each cell only represents a single instance of the overall pattern. this aspect of different cells expressing the same genes at different levels could be called a subpattern, where most patterns are in fact combinations of a small number of basic patterns [7]. hence, a gene could be compared to a conductor leading an orchestra; the conductor makes no music on its own but with the proper participants could produce a symphony of enormous beauty and complexity [8].
research objectives
congruent behaviour could be achieved through several methods. however, the persistence of congruent behaviour in relation to the dynamics of the environment, and further the sustenance of congruence over a considerable period of time, is still considered non-trivial for current artificial models of intelligence. this research takes a step in the direction of sustaining behavioural congruence using coordination methods from nature based on emergence.
the primary objective of the research is to evaluate whether bottom-up emergent methodologies could provide similar or improved results in comparison to methodologies that prescribe behaviour composition in a top-down manner, in order to achieve behavioural congruence in dynamic environments. there are several aspects to be focused on to realise this objective. the rest of the discussion primarily focuses on building a unique coordination model to realise the stated objective within the domain of grid-world foraging.
aaants coordination model based on patterns and emergence
the "aaants coordination model" was conceptualised based on inspiration from natural emergent systems. the model encompasses aspects such as identifying sensory patterns, relationships among actions and sensations, and team formation among agents for coordination. the interactions among agents act as perturbations, and the system achieves congruence with the use of reinforcements. the resulting model consists of heuristics and algorithms that could be used to implement an agent system that demonstrates emergent behaviour. similar work on sensory-motor coordination and identifying sensory patterns is found in research by rolf pfeifer et al. and stefano nolfi et al. [16], [17].
creating behavioural concentres with atomic actions and action templates
the term atomic action (aa) could be defined as an action that cannot be further subdivided into elementary actions. for example, in humans, the contraction of a homogeneous muscle could be identified as an aa. a given aa could produce different effects based on the intensity and the degree of temporal progress. if the base duration of an atomic action a is defined as t, a.t represents the elementary temporal result of executing action a. however, changes in the temporal dimension of executing the same atomic action a would produce different end results, e.g. [a.2t], [a.3t], etc. within the boundary of this research, aas are considered innate and could only be manipulated within the dimensions of time and intensity.
the action templates
the concept of the action template (at) is introduced here as the primary method of grouping aas to define behaviour. a template could be defined as a generalisation of related instances that determines or serves as a pattern. further, the concept of a template is used analogously to the concept of a class in object-oriented programming and design methodologies. a template could also be considered as a description of an aspect of a task. in line with these definitions, a definite list of aas executed concurrently and/or in sequence in relation to environmental sensations could be called an at. an at would not be of any use without being instantiated; hence, ats are instantiated by the agents and thereafter refined for a specific task. a given at could be modified by a group of agents to accomplish different tasks. concurrency is a basic fact of nature for achieving complex behaviour: survival in the environment demands concurrent threads of attention to both sensations and actuations. it should be noted that, due to the need for concurrency, the aas within a single at could be contributed by several agents.
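one possible, purely illustrative reading of these definitions is sketched below in python: atomic actions parameterised by duration and intensity, grouped into an action template whose entries carry a maximum temporal exposure and the sensations that may start or stop them. all class and field names here are our own assumptions for illustration and are not part of the aaants implementation.

```python
# illustrative data structures only; a minimal reading of the aa/at definitions
# above, not the aaants implementation.
from dataclasses import dataclass, field
from typing import List

@dataclass
class AtomicAction:
    """an indivisible action, manipulable only in time and intensity (e.g. a.t, a.2t)."""
    name: str
    base_duration: float          # t, the base temporal unit of the action
    intensity: float = 1.0

    def temporal_result(self, multiples: int = 1) -> float:
        """total execution time for k multiples of the base duration, i.e. a.(k*t)."""
        return self.base_duration * multiples

@dataclass
class TemplateEntry:
    """one aa inside a template, with its maximum temporal exposure and the
    sensations that may initiate or terminate it."""
    action: AtomicAction
    max_duration: float
    start_sensations: List[str] = field(default_factory=list)
    stop_sensations: List[str] = field(default_factory=list)

@dataclass
class ActionTemplate:
    """a definite list of aas executed concurrently and/or in sequence,
    synchronised with environmental sensations."""
    name: str
    entries: List[TemplateEntry] = field(default_factory=list)

    def instantiate(self) -> "ActionTemplate":
        # agents instantiate a template before refining it for a specific task;
        # here instantiation is just a shallow copy of the definition.
        return ActionTemplate(self.name, list(self.entries))

if __name__ == "__main__":
    step = AtomicAction("step-forward", base_duration=1.0)
    turn = AtomicAction("turn-left", base_duration=0.5)
    forage = ActionTemplate("forage-move", [
        TemplateEntry(step, max_duration=2.0, start_sensations=["food-scent"]),
        TemplateEntry(turn, max_duration=0.5, stop_sensations=["obstacle"]),
    ])
    print(forage.instantiate().name, step.temporal_result(2))  # a.2t -> 2.0
```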
the methodology used by agents to collectively execute synchronised tasks without knowledge of the overall outcome was given special emphasis during the conceptualisation stage of this research. according to keith decker et al. [9], the coordination problem of choosing and temporally ordering actions is more complex because the agent may only have an incomplete view of the entire task structure of which its actions are a part, the task structure may change dynamically, and the agent may be uncertain about the outcomes of its actions. the type and sequence of aas and their synchronisation with sensations for initiation and termination uniquely differentiate ats from each other. hence, in summary, three aspects are important to an at: the types of aas, the maximum temporal exposure of each aa, and the influence of sensations (the temporal progress of other aas within the same at could also serve as a sensation, in addition to environmental sensations) on the initiation and termination of each aa. figure 1 depicts an action template defined using several aas. with reference to this diagram, each aa is constrained with a start and a finish (e.g. actions a1, a2, a3, a4 have their respective starts and finishes defined as {s1, e1}, {s2, e2}, {s3, e3}, {s4, e4}). in addition, for each started aa instance, a timer is created for measuring the temporal progress of the action. a started action could finish due to the lapse of its allocated maximum execution time or due to a trigger from a sensation. the maximum allocated time of each aa is defined during the creation of the at. further, the initiation of actions is triggered by the temporal progress of other dependent actions within the same template and/or by sensory stimulation from the environment. an important aspect of the at concept is the methodology used for action synchronisation. an at should first be instantiated to facilitate the defined behaviour. subsequent to the initial instantiation, the first action in the sequence would be activated. however, there could be situations where several aas that belong to an at are activated simultaneously at initiation, based on the stochastic nature of the action selection mechanism. an ongoing action would publish its temporal progress within the respective domain, and other participants (agents) could use this information for coordinated participation. therefore, both the temporal progress of the other actions and the sensory information from the environment are used for action coordination. the coordination sequence improves over a period of time due to the reinforcements received after executing an instance of a template. the aas could be described as innate to an intelligent entity. however, the ats could be formed both in terms of innateness and adaptation. the innate ats would be ready to use, though with further fine-tuning through environmental supervision and/or reinforcements. the adaptive ats would be created through a stochastic process where innate aas are randomly selected to form novel behavioural structures. further, ats would be able to form hierarchical or lateral bonds with each other, again through a stochastic process, to create complex behavioural outcomes. the aaants model conceptualises both flavours of ats, but the grid-world experiment only focused on the innate ats that are refined through reinforcements.
a similar approach is taken in learning systems like alecsys [10], where the learning "brain" of an agent could be designed as the composition of many learning behavioural modules. the modules are called basic behavioural modules and are connected to sensory and motor routines that learn from external stimuli. the behavioural modules of alecsys could be made analogous in concept to the ats discussed above. simply, aas are like the bricks and templates are like the different wall types of a building, where different combinations of walls could be used to create buildings of diverse architectural complexities. the concept of the at would also be similar to some extent to behavioural assemblages [11]. according to tucker balch [11], groups of behaviours are referred to as behavioural assemblages. one way that behavioural assemblages may be used in solving complex tasks is to develop an assemblage for each subtask and to execute the assemblages in an appropriate sequence. the resulting task-solving strategy could be represented as a finite state automaton (fsa) and the technique is referred to as temporal sequencing.
figure 1: action template with a defined sequence of atomic actions
behavioural concentres
the groups of actions in an at that consist of aas are the basis for building complex behaviour. a sequence of aas (depicted in figure 1) that is executed in a coordinated manner is referred to as a behavioural act (ba). the concept of a ba is similar to the definition found in myrmecology for a collection of elementary actuations [12]. for example, in figure 1, actions a1, a2, a3 and a4 represent a ba. further, a collection of closely linked bas could be defined as a role, where a task could be differentiated as a similar sequence of bas that are coordinated. a popular method of depicting a behavioural repertory is the use of an ethogram, which incorporates the repertory of a caste, the transition probabilities of acts and the time distributions spent on each act [12]. figure 2 represents an ethogram that depicts the roles within a group of entities and the states and actions that facilitate the transitions. it should be noted that some actions (actions a5, a11 & a17 in the ethogram – figure 2) enable a role to navigate to states of another role. roles could also be described in terms of cohesion and coupling of ats. there exists high cohesion among the aas that belong to an at. further, one or many ats are required to define a given role. the ats that belong to a specific role should have higher coupling within them than with others external to the role.
figure 2: ethogram representing transitions of behavioural acts based on different roles
the action breakdown structure (abs) of the aaants coordination model is a good approach to explain the rest of the behavioural complexity of the model. the abs conceptualised within the aaants model is represented in figure 3. the structure is segmented into two primary layers of functionality based on innateness and adaptability. the actuation layer represents the raw aas that are innate in nature and less complex. as examples, the basic contraction of muscles, the release of enzymes and hormones, and the change of chemical composition in animals are analogies to these types of actions. hence, aas are the building blocks of any complex behaviour.
figure 3: conceptual action breakdown structure of the aaants coordination model
the ats represented within the coordination layer in figure 3 are responsible for grouping aas into elementary chunks of coordinated behaviour. however, these templates would be useless without being coordinated with other ats to perform more complicated roles. the aaants model introduces the concept of behavioural concentres (bc) [13] as the enabler of coordination among the ats. the concentres are created, adjusted and destroyed based on the reinforcements from the environment. it is assumed that the innate repertoire of aas should suffice for the expected behaviour of an individual. however, the absence of a particular behaviour in an individual does not imply that the relevant aas are missing. many of us possess the atomic actuations in the upper limbs to become an artist, though few of us are capable of such coordinated behaviour. further, many of us have the innate aas to play a violin, though few of us could. therefore, the bcs and ats are important in harnessing the capabilities of aas. most in-born talents such as art, music and athletics are mostly due to inherited ats. hence, the assumption is that some types of special innate ats are required to fulfil some higher-level complex behaviour. however, even with inherited ats, the lack of proper environmental adaptations to build up bcs could be called a "waste of talent" by most of us.
simulations and experiments
the primary experiment was to develop an environment to simulate the foraging activities of insects. the food collecting behaviour of insects, called foraging, is a popular domain of experimentation among researchers of collective intelligence [12]. further, experiments related to a grid-world, where agents are supposed to transit through states with the objective of finding the optimum path to a defined goal, have been popular in the artificial intelligence community for years [19], [20]. the original grid-world problem was enhanced to include foraging related aspects in the simulation. key control variables and their configurations for the different experiments are listed in table 1. there are many flavours of reinforcement learning methods, such as monte carlo (mc), dynamic programming (dp) and temporal difference (td) [14]. each of these methods has advantages and disadvantages based on the domain of application. it is considered that mc methods scale better with respect to state space size than standard, iterative techniques for solving systems of linear equations [15]. further, an mc method does not require explicit knowledge of the transition matrix of the problem domain [15]. hence, the mc method was selected as the reinforcement learning algorithm for the experiments of this research, due to the above stated uniqueness and also due to its similarity in concept to other reinforcement learning methods. therefore, the fundamental learning algorithm of the aaants learning model was based on the mc method. in all the experiments, the initial exploration probability was kept at 0.99 and was thereafter reduced linearly after each episode; the reduction rate of the exploration probability was kept constant across all the experiments. further, the same (disproportionate) reward distribution strategy was used across all experiments except grid-world experiment 1, scenario 1. the reward distribution was performed episodically while keeping state values ascending from home to destination, hence encouraging the agents to follow a path of ascending state values, similar to the effect of pheromones in ants. a small sketch of this exploration and reward scheme is given below.
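as an illustration of the exploration schedule and the episodic, ascending ("pheromone-like") reward distribution described above, the following sketch shows one possible implementation. it is a sketch under stated assumptions: only the initial value of 0.99 and the linear per-episode reduction come from the text; the decay step, the minimum exploration value, and the proportional weighting of rewards along the visited path are assumptions made here for illustration.

```c
#include <stdlib.h>

/* assumed constants: the paper fixes the initial value at 0.99 but does
   not state the decay step or a floor, so these are illustrative only   */
#define INITIAL_EPSILON 0.99
#define EPSILON_STEP    0.001
#define MIN_EPSILON     0.01

/* epsilon-greedy action choice with a decaying exploration probability */
int choose_action(double epsilon, int n_actions, int (*greedy_action)(void))
{
    double r = (double)rand() / RAND_MAX;
    if (r < epsilon)
        return rand() % n_actions;   /* explore                          */
    return greedy_action();          /* exploit current value estimates  */
}

/* episodic, ascending reward distribution over the visited path: states
   closer to the goal receive larger increments, so state values ascend
   from the nest towards the food source, like a pheromone gradient      */
void distribute_rewards(double *state_value, const int *path,
                        int path_len, double reward)
{
    for (int i = 0; i < path_len; ++i) {
        double share = reward * (double)(i + 1) / (double)path_len;
        state_value[path[i]] += share;
    }
}

/* after each episode the exploration probability is reduced linearly    */
double decay_epsilon(double epsilon)
{
    epsilon -= EPSILON_STEP;
    return (epsilon < MIN_EPSILON) ? MIN_EPSILON : epsilon;
}
```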
the simulated grid-world environment
a grid-world is an area with a restricted boundary, as depicted in figure 4. at a given instance there could be one or many participants within the grid that may perform state transitions, either to reach the destination food source (fs), which is the goal state, or else to return to the nest with the already captured food elements after reaching the goal state. each participating agent is analogous to an ant in a colony. a grid-world could be experimented with along several dimensions such as spatial, temporal and functional. in terms of spatial aspects, the total grid is divided into small squares called cells. most of the discussed experiments are based on a 10 x 5 grid, but the same experiments were performed on 20 x 30 and 30 x 40 grid environments to assess scalability. the movements within the grid are done on temporal clock cycles, and the main functions of agents are searching for and transporting food. the grid and obstacle layouts are fully configurable using the grid-world simulator front-end application.
figure 4: grid-world model for the ant foraging simulation
the participants could travel from one cell to another in a horizontal or vertical direction, but are restricted from travelling diagonally. a single participant could inhabit a cell at a time during the search stage, though several may travel together while transporting a food unit collectively. however, there could be some cells that are obstructed and impassable by the agent, to make the foraging task more realistic. a detailed discussion of the design of this experiment could be found in the ph.d. dissertation of r.a.c. ranasinghe [18]. several experiments were conducted using the grid-world simulation to evaluate the aaants coordination model:
• grid-world experiment 1 – single agent foraging.
1. scenario 1: one step look-ahead policy using the monte carlo (mc) method with proportionate reward distribution.
2. scenario 2: disproportionate reward distribution among the participating states.
• grid-world experiment 2 – cooperative foraging using the monte carlo method.
1. scenario 1: 2-agent cooperative.
2. scenario 2: 4-agent cooperative.
• grid-world experiment 3 – collective foraging based on the aaants model.
1. scenario 1: 4 agents using an action template of 2 actions with 6 ats.
2. scenario 2: 4 agents using an action template of 2 actions with 8 ats.
table 1: control variable summary across all grid-world experiments
variable | ex 1 sc 1 | ex 1 sc 2 | ex 2 sc 1 | ex 2 sc 2 | ex 3 sc 1 | ex 3 sc 2
grid size | 10 x 5 | 10 x 5 | 10 x 5 | 10 x 5 | 10 x 5 | 10 x 5
obstacle arrangement | constant | constant | constant | constant | constant | constant
characteristics of agents | constant | constant | constant | constant | constant | constant
learning algorithm | mc | mc | mc | mc | aaants | aaants
number of agents | 1 | 1 | 2 | 4 | 4 | 4
number of search threads | 1 | 1 | 2 | 4 | 1 | 1
reward distribution | equal | disproportionate | disproportionate | disproportionate | disproportionate | disproportionate
look-ahead | 1 step | 1 step | 1 step | 1 step | 2 step | 2 step
shared memory context | no | no | yes | yes | yes | yes
implicit communication | no | no | yes | yes | yes | yes
use of action templates | no | no | no | no | yes | yes
knowledge representation | individual | individual | shared | shared | shared | shared
initial state initialisation | random | random | random | random | random | random
exploration probability and rate of reduction | constant | constant | constant | constant | constant | constant
observations of the grid-world experiments
the following observations of the grid-world experiments were captured as important for assessing the hypothesis and objectives of this research.
1. when comparing the results of experiment 1, scenarios 1 & 2, it is evident that disproportionate distribution of rewards among state values results in better convergence to the optimum path (figure 5). the disproportionate distribution is analogous to the pheromone distribution of insects, where the concentration is maintained at an ascending rate when approaching the goal state. even after changing the location of the goals and obstacles in scenario 2, the algorithm was able to readjust the state values and converge to the new path within a reasonable number of episodes.
2. the objective of experiment 2 is to evaluate the effectiveness of implicit coordination methods using shared contexts on general learning algorithms such as monte carlo. both scenarios of experiment 2 demonstrated improvements when compared to the results of experiment 1, which is void of any form of coordination. several further experiments were carried out with the agent count increased from one to ten; it was noticed that the initial gradual improvements fade away after reaching an optimum threshold of agents, which was, however, variable based on the grid size.
3. among all the monte carlo based experiments (experiments 1 & 2), the 4-agent cooperative method produced the best outcome (figure 5). this was a modification of the original monte carlo method to include the cooperative aspects, with the objective that it could be compared on similar grounds with the aaants model.
4. experiment 3 introduces the full-scale features of the aaants model: the capabilities of emergence, innateness and implicit communication. in experiment 3, a key difference when compared to experiments 1 & 2 is that, though there are multiple agents, there exists only one search thread at a time.
the multiple agents coordinate different elementary actions of the at to navigate a single search node from source to destination. an at is executed based on inputs from the environment and each elementary action is contributed by a single agent. the results of experiment 3 outperform those of experiments 1 & 2, and further demonstrate that capabilities improve when the innate layer contributes several ats to survive in the environment. the most suitable at needs to be selected based on the sensations from the environment.
5. further, it was noticed that when the number of obstacles was increased within the grid-world, the aaants method converges considerably faster (within a smaller number of episodes) than the monte carlo methods. this was due to the fact that aaants uses obstacle characteristics as navigation markers during the initial exploration process. these obstacles were described as local optima and, specifically within the aaants model, referred to as hubs – special states that bridge regions of cells. for example, when there is a pattern of receiving a high reward for moving forward when a certain type of obstacle is in the neighbourhood, the agent detects these situations as hubs and adapts to executing the appropriate at whenever such situations are faced.
6. the summary of the experimental outcomes of all the experiments in the grid-world domain is tabulated in table 2. the number of episodes to converge and the number of states needed to reach the goal state reduce considerably in the aaants experiments. the final outcome is very stable in the aaants model when compared to the rest of the control experiments.
figure 5: comparison of average episodes taken to converge to the optimum path using different learning strategies
7. figure 6 depicts the results of experiments conducted on extended search spaces of 20 x 30 and 30 x 40 grid sizes. the experiment 2, 4-agent scenario was taken to represent the mc learning method, as it is the best performing of all the mc experiments. the mc method does show convergence to an optimal path; however, the overall number of episodes increases considerably when compared to the aaants learning model. both experiments related to the aaants learning model show superiority in comparison to the mc method. of the two aaants-based experiments, the method which contained the higher number of ats converges within a relatively lower number of episodes and, further, its ratio of increase is lower. it could be concluded that the aaants learning model scales better in complex environments than the mc method. the experiments conducted on the same grid sizes with an increased number of obstacles demonstrated even better results in favour of the aaants model in comparison to the mc method.
table 2: observation summary of the grid-world experiments
observation / experiment | ex 1 sc 1 | ex 1 sc 2 | ex 2 sc 1 | ex 2 sc 2 | ex 3 sc 1 | ex 3 sc 2
average number of states of the optimum path from source to destination | 15 | 10 | 9 | 9 | 8 | 8
presence of local optima | yes | yes (relatively low) | yes | yes | yes | yes
stability after converging to the optimal path | no | mostly | mostly | mostly | yes | yes
ability to reach the optimal path | no | mostly | mostly | mostly | yes | yes
minimum number of episodes to converge to the optimum path | > 100 | 50-100 | 30-50 | 27-40 | 25-30 | 20-22
ability to converge after adjusting the location of the goal subsequent to reaching convergence | no | mostly | mostly | mostly | yes | yes
ability to converge after adjusting the obstacle arrangement subsequent to reaching convergence | no | mostly | mostly | mostly | yes | yes
conclusion
the essence of emergence is that none of the contributors to the emergent behaviour is aware of the master plan. the grid-world experiments 1 and 2 are void of any form of emergence, yet show gradual improvements (within the 4 scenarios of experiments 1 and 2) related to the use of shared contexts and implicit communication among the participants. the grid-world experiment 3, however, focuses on the emergent nature of behaviour with the introduction of the full capabilities of the aaants model. the aaants model demonstrates considerable improvement over the standard monte carlo technique and performs exceptionally well, especially in larger grid sizes. further, it is concluded that dynamic changes in the environment (goal and obstacle location changes) are handled gracefully by the aaants model in comparison to the monte carlo learning model. these observations confirm the achievement of congruent behaviour in dynamic environments using the concepts of the aaants model.
figure 6: comparison of overall average episodes to converge in extended grid search spaces
the grid-world experiments confirm that behavioural acts built on innate action templates provide better convergence to the optimum behaviour than a pure adaptation strategy void of innate behaviour, which thereby confirms the respective objective set forth in the introduction. the purely adaptive experiments in the grid-world simulation demonstrate that the tests conducted without action templates take relatively more episodes to converge to the optimum path and, further, intermittently settle on local optima.
references
1. parunak v., sauter j. and clark s. (1997). toward the specification and design of industrial synthetic ecosystems, fourth international workshop on agent theories, architectures, and languages (atal'97).
2. resnick m. (1994). turtles, termites, and traffic jams: explorations in massively parallel microworlds, isbn: 0-262-18162-2.
3. holldobler b. and wilson e.o. (1994). journey to the ants: a story of scientific exploration, isbn: 0-674-48526-2.
4. babaoglu o., meling h. and montresor a. (2001). anthill: a framework for the development of agent-based peer-to-peer systems, university of bologna, italy, and norwegian university of science and technology, norway.
5. garcía c.g. (2001). artificial societies of intelligent agents.
6. minsky m. (1986). the society of mind, a touchstone book, simon and schuster publishers, new york, isbn 0-671-65713-5.
7. salazar-ciudad i., garcia-fernandez j. and sole r. v. (2000). gene networks capable of pattern formation: from induction to reaction-diffusion, complex systems research group, department of physics, fen-upc campus nord, barcelona, spain.
8. elman j.l., bates e.a., johnson m.h., karmiloff-smith a., parisi d. and plunkett k. (1999). rethinking innateness: a connectionist perspective on development, isbn: 0-262-05052-8.
9. decker k. and lesser v. (1995). designing a family of coordination algorithms, department of computer science, university of massachusetts, amherst.
10. colombetti m. and dorigo m. (1993). training agents to perform sequential behaviour.
11. balch t. (1997). learning roles: behavioural diversity in robot teams, mobile robot laboratory, college of computing, georgia institute of technology, atlanta, georgia.
12. holldobler b. and wilson e.o. (1990). the ants, the belknap press of harvard university press, cambridge, massachusetts, isbn: 0-674-04075-9.
13. ranasinghe r.a.c. and madurapperuma a.p. (2005). learning coordinated actions by recognising state patterns with hubs, eighth international conference on human and computers, university of aizu, japan.
14. sutton r.s. and barto a.g. (1998). reinforcement learning: an introduction, mit press, cambridge, ma, a bradford book.
15. barto a. and duff m. (1994). monte carlo matrix inversion and reinforcement learning, computer science department, university of massachusetts, usa.
16. pfeifer r. and scheier c. (1999). understanding intelligence, isbn: 0-262-66125-x.
17. nolfi s. and parisi d. (1999). exploiting the power of sensory-motor coordination.
18. ranasinghe r.a.c. (2008). emergence of congruent behaviour by implicit coordination of innate and adaptive layers of software agents, ph.d. dissertation, university of colombo, sri lanka.
19. arai s. and sycara k. (2000). multi-agent reinforcement learning and conflict resolution in a dynamic domain, the robotics institute, carnegie mellon university, usa.
20. abbeel p. and ng a.y. (2004). apprenticeship learning via inverse reinforcement learning, department of computer science, stanford university, stanford, usa.
the international journal on advances in ict for emerging regions 2009 02 (01) : 3 10
on the performance of the parallel implementation of the shallow water model on distributed memory architectures
k.ganeshamoorthy, d.n.ranasinghe, k.p.m.k.silva and r.wait
abstract— this paper presents a study of the impact of memory architectures, distributed memory (dm) and virtual shared memory (vsm), on the solution of parallel numerical algorithms on a multi-processor cluster. a parallel implementation of the shallow water equations to model a tsunami is chosen as the case study. data is partitioned into sub-domains, namely a four by three grid scheme and an eight by six grid scheme, which are used for the parallel implementation of this model. there are four versions of the parallel algorithm for each grid scheme: distributed memory without threads, distributed memory with threads, virtual shared memory without threads, and virtual shared memory with threads. these four parallel versions have been implemented on a high performance cluster connected to the "nordugrid". experiments are realized using the message passing interface (mpi) library, c/linda, and linux pthreads. subject to the availability of memory, the virtual shared memory version without threads performs best, and as the task is scaled up, the threaded version becomes efficient in both dm and vsm implementations.
index terms—mpi, linda, multi processors, shallow water equations, tsunami model.
i. introduction
a tsunami is a series of waves generated in a body of water by an impulsive disturbance that vertically displaces the water column [1], [2]. earthquakes, landslides, volcanic eruptions and explosions can generate a tsunami. a tsunami can savagely attack coastlines, causing devastating property damage and loss of life.
manuscript received march 6, 2009. accepted july 7th, 2009. this research was funded by the national science foundation, sri lanka (grant no. rg/2005/fr/07) and by spider, the swedish programme of ict for developing regions. k.ganeshamoorthy is with the department of computation & intelligent systems, university of colombo school of computing, 35, reid avenue, colombo-7, sri lanka. (e-mail: ganesh@webmail.cmb.ac.lk). d.n.ranasinghe and k.p.m.k.silva are also with the department of computation & intelligent systems, university of colombo school of computing, 35, reid avenue, colombo-7, sri lanka. (email: dnr@ucsc.cmb.ac.lk, mks@ucsc.cmb.ac.lk) r.wait is with the international science programme and department of information technology, uppsala university, uppsala, sweden. (richard.wait@isp.uu.se)
a tsunami is characterized as a shallow water wave. shallow water waves are different from wind generated waves, the waves many of us have seen at the beach. wind generated waves usually have a period of five to twenty seconds and a wavelength of about one hundred to two hundred meters. a tsunami can have a period in the range of ten minutes to two hours and a wavelength in excess of 500 km. it is because of their long wavelengths that tsunamis behave as shallow water waves. a wave is characterized as a shallow water wave when the ratio between the water depth and its wavelength gets very small [1].
fig. 1. typical mixed mode programming model [9].
the shallow water equations on a rotating sphere serve as a primary test problem for numerical methods used in modeling global atmospheric flows [3]. they describe the behaviour of a shallow homogeneous incompressible and inviscid fluid layer. they present the major difficulties found in the horizontal aspects of three-dimensional global atmospheric modeling. thus, they provide a first test to weed out potentially non-competitive schemes without the effort of building a complete model. however, because they do not represent the complete atmospheric system, the shallow water equations are only a first test. ultimately, schemes which look attractive based on these tests must be applied to the complete baroclinic problem. the existence of a standard test set for the shallow water equations will encourage the continued exploration of alternative numerical methods and provide the community with a mechanism for judging the merits of numerical schemes and parallel computers for atmospheric flow calculations [3]. the current state-of-the-art in tsunami modeling still requires considerable quality control, judgment, and iterative, exploratory computations before a scenario is assumed reliable.
this is why the efficient computation of many scenarios, for the creation of a database of pre-computed scenarios that have been carefully analyzed and interpreted by a knowledgeable and experienced tsunami modeler, is an essential first step in the development of a reliable and robust tsunami forecasting and hazard assessment capability. using more advanced parallel algorithms, it may become technically feasible to execute real-time model runs for guidance as an actual event unfolds. however, this is not currently justified on scientific grounds; an operational real-time model forecasting capability must await improved and more detailed characterization of earthquakes in real-time, and verification that the real-time tsunami model computations are sufficiently robust to be used in an operational, real-time model [4]. shallow water equations have been widely used to study the tsunami phenomenon [5], [3], as a model of the basic fluid dynamics of the ocean. solving partial differential equations numerically for real-life problems is computationally demanding; therefore, utilizing super-computers/clusters efficiently is important in order to achieve computational efficiency [6]. strategies to improve the accuracy and overall quality of model predictions have been, and continue to be, of great interest to numerical model developers, but in addition to accuracy, the utility of a numerical model is greatly affected by the algorithm's efficiency [7]. parallel computing provides a feasible and efficient approach to solve very large-scale prediction problems, but any redistribution of the data is a potentially time-consuming task for parallel architectures [8]. fig. 1 shows the memory hierarchy that exists in most nodes of a modern cluster environment [9], where many nodes are linked together by a high-speed network; inside each node there may be two or more processors, and for each processor, memory access is either to a high speed memory unit ("cache") or to the low speed "main memory". the aim of this paper is to study the impact of the memory architectures associated with distributed memory and virtual shared memory on the extraction of multiple levels of parallelism, for the solution of numerical algorithms. the objective here is to evaluate the effects of the programming model on the scalability of this shallow water model. computing the wave propagation in the tsunami model, where the entire ocean is the solution domain, is challenging, both due to the huge amount of computation needed and due to the fact that different physics applies in different regions [10]. the message passing interface, mpi [11], and c/linda [12] are alternative paradigms for communication between global nodes in the distributed memory environment and in the virtual shared memory environment, respectively. in the mixed mode programming model, both mpi and c/linda are used for communication between global nodes, while inside each mpi process and within each c/linda process posix threads [13] are used in order to extract further parallelism. this paper is organized as follows: the related work section presents work related to our study. the section on linear long wave theory describes the linear long wave theory used to simulate the tsunami model. in algorithm design, we describe our design principles. implementation is presented next. experimental results from both memory architectures on a high performance cluster are presented and discussed in the evaluation section.
finally, in the conclusion we make the concluding remarks.
ii. related work
a. parallel simulators for tsunami
hybrid tsunami simulators that allow different sub-domains to use different mathematical models, spatial discretizations, local meshes and serial codes have been proposed by xing cai [10]. the boussinesq water wave equations are used for this purpose, with η and φ as the primary unknowns denoting, respectively, the water surface elevation and the velocity potential; the full forms of equations (1) and (2) are given in [10]. the water depth h is assumed to be a function of the spatial coordinates x and y. in equations (1) and (2) the weak effect of dispersion and nonlinearity is controlled by the two dimensionless constants ε and α, respectively. the widely used linear shallow water equations, (3) and (4), can be derived by choosing ε = α = 0:
\[ \frac{\partial \eta}{\partial t} + \nabla\cdot\big(h\,\nabla\phi\big) = 0, \qquad (3) \]
\[ \frac{\partial \phi}{\partial t} + \eta = 0. \qquad (4) \]
the equations from (1) to (4) are used to model the tsunami. equations (1) and (2) are resolved by unstructured meshes and finite element discretization, whereas structured meshes and finite differences are commonly used for equations (3) and (4). such a parallelization strategy is most easily realized by using sub-domains, such that the entire spatial domain ω is decomposed into a set of overlapping sub-domains {ω_s}, s = 1, ..., p. in a generic setting, where a partial differential equation (pde) is expressed as l_ω(u) = f_ω, the schwarz algorithm consists of an iterative process generating u^0, u^1, u^2, ..., u^k as a series of approximate solutions. during schwarz iteration k, each sub-domain first independently updates its local solution through
\[ \mathcal{L}_{\Omega_s}\big(u_s^{k}\big) = f_{\Omega_s}^{k-1}, \qquad (5) \]
where f_{ω_s}^{k-1} refers to a right-hand side which is restricted within ω_s and depends on the latest global approximation u^{k-1}. then, the new global solution u^k is composed by sewing together the sub-domain local solutions u_1^k, u_2^k, u_3^k, ..., u_p^k. equation (5) thus opens the possibility of using different local solvers in different sub-domains. taking the idea of additive schwarz one step further, different mathematical models can be applied in different sub-domains. therefore, different serial codes may be deployed region-wise. the proposed hybrid parallel tsunami simulator is implemented using object-oriented techniques and is based on an existing advanced c++ finite element solver named class boussinesq, applicable to unstructured meshes, and a legacy f77 finite difference code applicable to uniform meshes. the resulting hybrid parallel tsunami simulator thus has full flexibility and extensibility [10].
b. parallel computation of a highly nonlinear boussinesq equation model through domain decomposition
applications of the boussinesq equations cover a broad spectrum of ocean and coastal problems of interest, from wind wave propagation in intermediate and shallow water depths to the study of tsunami wave propagation across large ocean basins. in general, implementations of the boussinesq wave model to calculate free surface wave evolution in large basins are computationally intensive, requiring large amounts of cpu time and memory. to facilitate such extensive computations, a parallel boussinesq model has been developed by sitanggang and lynett [2], using the domain decomposition technique in conjunction with mpi.
the parallel boussinesq model developed is based on its serial counterpart. the governing equations consist of the two-dimensional depth-integrated continuity equation (6) and the horizontal momentum equation (7) of the fully nonlinear boussinesq model, given in full in [2], where s = ∇·u_x, t = ∇·(hu_x) + ∂h/∂t, h is the depth and η is the free surface elevation. equations (6) and (7) differ from the equations given by wei et al. [14] in the inclusion of the time derivatives of the depth to account for the temporal bottom profile changes that occur during a landslide or earthquake, which is one of several possible sources of a tsunami. the parallel approach has three important aspects: domain decomposition, communication, and a parallel solver for the tridiagonal system of simultaneous linear equations. three different domain sizes have been considered: (500 x 500), (1000 x 1000), and (2000 x 2000). the overall performance of the model has been very good. the efficiency of the model decreases as the number of processors increases, which is apparent in the case of the (500 x 500) and (1000 x 1000) domains. the rate of the efficiency decrease is faster for the smaller domain. this is due to the ratio of arithmetic operation time to communication time decreasing faster for domains with a smaller number of nodes. the performance of the model improves as the number of grids increases, a favourable feature of a parallel model which is intended for simulation on ever-increasing domain sizes. thus, this parallel model provides a future opportunity for large wave-resolving simulations in the near shore, with global domains of many millions of grid points, covering o(100 km2) and greater basins. further, real-time simulation with boussinesq equations becomes a possibility.
c. implicit parallel fem analysis of shallow water equations
jiang chunbo et al. [15] have solved the shallow water equations (swes) as the governing equations to model a river flow. the swes are implemented on clustered workstations. for the parallel computation, the mesh is automatically partitioned using the geometric mesh partitioning method. the governing equations are then discretized implicitly to form a large sparse linear system, which is solved using a direct parallel generalized minimum residual algorithm (gmres). the shallow water equations (8) and (9) used here are obtained by integrating the conservation of mass and momentum equations, assuming a hydrostatic pressure distribution in the vertical direction:
\[ \frac{\partial \eta}{\partial t} + \frac{\partial q_j}{\partial x_j} = 0, \qquad (8) \]
\[ \frac{\partial q_i}{\partial t} + \frac{\partial}{\partial x_j}\!\left(\frac{q_i q_j}{h}\right) = -g\,h\,\frac{\partial \eta}{\partial x_i} + \frac{\tau_{si} - \tau_{bi}}{\rho} + \frac{\partial}{\partial x_j}\!\left[\nu\left(\frac{\partial q_i}{\partial x_j} + \frac{\partial q_j}{\partial x_i}\right)\right], \quad i, j = 1, 2, \qquad (9) \]
where η is the water elevation, h is the water depth, q_i = u_i h, u_i is the mean horizontal velocity, g is the gravitational acceleration, ν is the eddy viscosity, ρ is the water density, τ_si is the surface shear stress, and τ_bi is the bottom shear stress. the finite element method for triangular elements is used for the spatial discretization. three kinds of finite element meshes are used to simulate the velocity field in the two numerical examples: flow around a circular cylinder and flow in the yangtze river. mpi has been used for the communication between nodes.
the computed results agree well with the observed results. the good speedup and efficiency of the parallel computation show that parallel computing is a good method for solving large-scale problems [15].
iii. linear long wave theory
there are many different numerical methods for computing the shallow water equations on a sphere. therefore, a standard test suite of seven problems for evaluating numerical methods for the shallow water equations in spherical geometry was proposed by williamson et al. [3] and accepted by the modeling community in order to compare newly proposed methods. the shallow water equations are widely used as a prototype to study phenomena like wave-vortex interactions that occur in more complicated models of large scale atmosphere/ocean dynamics [5]. consider the sea to be a volume of incompressible water on a rotating sphere, with coriolis force fu. the horizontal coordinates are x and y, and the vertical coordinate is z, which is zero at the mean sea surface and positive upwards. the sea bed is located at z = -H, and the surface is located at z = h. the linear shallow water equations [16], [17], [18] consist of the continuity equation
\[ \frac{\partial h}{\partial t} + \frac{\partial}{\partial x}\big[u(H+h)\big] + \frac{\partial}{\partial y}\big[v(H+h)\big] = 0 \qquad (10) \]
and the conservation of horizontal momentum
\[ \frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} - fv = -g\frac{\partial h}{\partial x}, \qquad (11) \]
\[ \frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} + fu = -g\frac{\partial h}{\partial y}. \qquad (12) \]
u and v are the velocity components in the x and y directions, h is the surface elevation and g is the acceleration due to gravity. defining the equations in terms of the discharge fluxes U = u(H+h), V = v(H+h) leads to discretizations that always satisfy the conservation of mass. then the conservation of horizontal momentum can be written as
\[ \frac{\partial U}{\partial t} + \frac{\partial}{\partial x}\!\left(\frac{U^2}{H+h}\right) + \frac{\partial}{\partial y}\!\left(\frac{UV}{H+h}\right) - fV = -g(H+h)\frac{\partial h}{\partial x}, \qquad (13) \]
\[ \frac{\partial V}{\partial t} + \frac{\partial}{\partial x}\!\left(\frac{UV}{H+h}\right) + \frac{\partial}{\partial y}\!\left(\frac{V^2}{H+h}\right) + fU = -g(H+h)\frac{\partial h}{\partial y}. \qquad (14) \]
the finite difference approximations, using centered differences in space and a leap-frog time discretization, are based on a staggered grid corresponding to an arakawa c-grid [19], [20], with the continuity equation centered on the point (x_i, y_j, t_{k+1/2}) and the equations of motion centered on the points (x_{i+1/2}, y_j, t_k) and (x_i, y_{j+1/2}, t_k), respectively. writing D ≡ H + h, and using upwind differences for the convection terms to maintain stability, it follows that
\[ \frac{\partial}{\partial x}\!\left(\frac{U^2}{D}\right) \approx \frac{1}{\Delta x}\left[ \lambda_{1,1}\,\frac{\big(U^{k}_{i+3/2,\,j}\big)^2}{D_{i+3/2,\,j}} + \lambda_{1,2}\,\frac{\big(U^{k}_{i+1/2,\,j}\big)^2}{D_{i+1/2,\,j}} + \lambda_{1,3}\,\frac{\big(U^{k}_{i-1/2,\,j}\big)^2}{D_{i-1/2,\,j}} \right], \qquad (15) \]
with
\[ \lambda_{1,1}=0,\ \lambda_{1,2}=1,\ \lambda_{1,3}=-1 \quad \text{if } U^{k}_{i+1/2,\,j} \ge 0; \qquad \lambda_{1,1}=1,\ \lambda_{1,2}=-1,\ \lambda_{1,3}=0 \quad \text{if } U^{k}_{i+1/2,\,j} < 0, \]
and similarly for the other terms. the difference equations are defined on a rectangular grid in terms of spherical polar coordinates. an accurate representation of a tsunami running up the shore implies a grid spacing of no more than 100 meters in a region of about 4 km out from the shore. as this fine grid is not reasonable over the whole ocean, a succession of overlapping grids is necessary near the coast.
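to illustrate the staggered-grid leap-frog update described above, the sketch below advances the continuity equation (10) by one time-step, with the elevation h held at cell centres and the fluxes U and V on the cell faces. the array layout, the index macro and the interior-only loop bounds are assumptions made purely for illustration; open-boundary conditions and nested grids are omitted.

```c
/* one leap-frog step of the continuity equation on an arakawa c-grid:
   U[IDX(i,j)] is taken as the flux on the eastern face of cell (i,j) and
   V[IDX(i,j)] as the flux on its northern face; h holds the elevation at
   the previous half time level and is advanced in place by a full dt.   */
#define IDX(i, j, nx) ((j) * (nx) + (i))

void continuity_step(double *h, const double *U, const double *V,
                     int nx, int ny, double dt, double dx, double dy)
{
    for (int j = 1; j < ny - 1; ++j) {
        for (int i = 1; i < nx - 1; ++i) {
            double dUdx = (U[IDX(i, j, nx)] - U[IDX(i - 1, j, nx)]) / dx;
            double dVdy = (V[IDX(i, j, nx)] - V[IDX(i, j - 1, nx)]) / dy;
            h[IDX(i, j, nx)] -= dt * (dUdx + dVdy);
        }
    }
}
```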
a data decomposition scheme is applied for the parallel solution of the shallow water equations. in data decomposition, we keep the sequential formulation of the problem, but distribute the data and operations among the processors. the scalability of several data decomposition algorithms for finite difference atmospheric and ocean models has been analyzed by several authors [21], [22]. several strategies exist within the data decomposition paradigm for dividing domains into sub-domains. in the two-dimensional grid, the computational domain is decomposed in both the x and y coordinate directions. in many cases, the computation is proportional to the volume of a sub-domain and the communication is proportional to its surface area. in such cases, a logical strategy is to partition the domain in such a way that it minimizes the surface area of each sub-domain relative to its volume. this keeps the computation-to-communication ratio high. in this study, two-dimensional decomposition is chosen. this involves assigning each sub-domain to a processor and solving the equations for that sub-domain on the respective processor. the physical or computational domain used in this paper is rectangular in shape. in the domain decomposition method, the rectangular domain is divided into several smaller rectangular sub-domains, where the number of sub-domains is equal to the number of processors used. with four processors, for example, there are three possible ways of decomposing the domain into equal-area parts, as depicted in fig. 2. the best decomposition depends not only on the architecture of the system being used but, more importantly, on the memory limitations of each node, especially in commodity clusters; as such, an assignment like two by two is recommended.
fig. 3. (a) overview of the unstructured grid and assignment of worker nodes to sub-domains. (b) arrows indicate the inter-worker communication.
an important aspect in decomposing the domain is load balancing, i.e. all processors should have an equal or almost equal amount of data to be processed. if the number of grid points is divisible by the number of processors, the number of grid points in each processor is simply the ratio of the number of grid points to the number of processors. if it is not, the remainder is distributed across the first m processors, where m is the remainder. the load should be balanced in both the x and y directions.
fig. 4. linear shallow water wave profiles at three different time-steps (ts) calculated using 48 processors.
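the remainder-distribution rule described above can be written compactly as follows. this is a small illustrative sketch (the function names and interface are assumptions); it only restates the rule from the text: n/p points per processor, with the remainder m = n mod p spread over the first m processors, applied independently in the x and y directions.

```c
/* number of grid points assigned to 'rank' when n points are split over
   p processors along one coordinate direction                           */
int local_points(int n, int p, int rank)
{
    int base = n / p;
    int m    = n % p;                  /* remainder                      */
    return base + (rank < m ? 1 : 0);
}

/* starting global index of this processor's block along that direction  */
int local_offset(int n, int p, int rank)
{
    int base = n / p;
    int m    = n % p;
    return rank * base + (rank < m ? rank : m);
}
```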
with the two-dimensional decomposition, no global information is required at any particular grid point, and interprocessor communications are required only at the boundaries of the sub-domains. the inner border of a sub-domain requires the outer border of the adjacent sub-domain during a time-step because of the spatial discretization [16], [17], [18].
iv. algorithm design
in our work, the domain decomposition method is used to parallelize the tsunami model. in this method, the parallel algorithm is very similar to the serial algorithm, with some additional routines added to facilitate the communication between processors. using this method, all the processors involved in the parallel calculations basically perform the same computational operations. the only difference is in the data being processed by each processor.
fig. 2. three possible ways of decomposing a rectangular domain. the numbers in the sub-domains represent the processor number.
two grid schemes, a 12 node (4 x 3) grid system and a 48 node (8 x 6) grid system, are used to parallelize the tsunami model with a domain size of (900 x 1226). portions of the domain are assigned to each of the worker nodes, as illustrated in fig. 3, where each sub-domain is labeled with a processor (worker) number. snapshots of the free surface evolution are shown in fig. 4. data for each sub-domain is stored with each processor, including water depth, coordinates, and initial disturbance. this data is read by each of the workers in a preprocessing stage, i.e., prior to initiating the time-integration loop. since the model updates the solution explicitly using local data, each processor works independently of the others but requires data from neighbouring workers to update the solution along sub-domain boundaries. the exchange of data between processors occurs several times per time-step. there are two key features to this exchange: first, only data along the boundaries between processors is exchanged. second, each processor is only required to communicate with at most four other processors. the domain decomposition is performed with this second feature in mind, to avoid communication with more than four other processors. communication occurs between two adjacent processors during message passing. in passing the data from one processor to another, an efficient and safe communication must be developed. to efficiently exchange the data between adjacent processors, the data is first stored in contiguous memory prior to executing the sending processes. at the same time, contiguous memory blocks of the same size as used in the sending processes are created to receive the data from the sending processes. at this point, the data is ready for the sending and receiving processes.
table i: run time in seconds (wot – without thread; wt – with thread)
nodes | mpi wot | mpi wt | c/linda wot | c/linda wt
12 | 9686.54 | 11053.54 | 2743.74 | 3297.78
48 | 11934.38 | 10273.73 | 3183.43 | 2845.34
in this model, message passing occurs four times per time-step, and the mpi function "mpi_sendrecv" and the c/linda operations "in" and "out" are used to perform the message passing between global nodes. this corresponds to eight messages per time-step per processor, independent of the number of processors being used.
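as an illustration of the boundary exchange described above, the following sketch shows one east-west exchange implemented with mpi_sendrecv. the buffer names, the tag values and the packing of the boundary into contiguous memory are assumptions for illustration, not the paper's own code; a neighbour that does not exist can be passed as MPI_PROC_NULL, which makes the corresponding send/receive a no-op.

```c
#include <mpi.h>

/* exchange one boundary column with the eastern and western neighbours;
   'count' is the number of values along the shared boundary            */
void exchange_east_west(double *send_east, double *recv_west,
                        double *send_west, double *recv_east,
                        int count, int east, int west, MPI_Comm comm)
{
    MPI_Status status;

    /* send the inner border eastwards, receive the western outer border */
    MPI_Sendrecv(send_east, count, MPI_DOUBLE, east, 0,
                 recv_west, count, MPI_DOUBLE, west, 0, comm, &status);

    /* send the inner border westwards, receive the eastern outer border */
    MPI_Sendrecv(send_west, count, MPI_DOUBLE, west, 1,
                 recv_east, count, MPI_DOUBLE, east, 1, comm, &status);
}
```

a matching north-south exchange completes the four messages per direction pair, giving the eight messages per time-step per processor mentioned in the text.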
many more messages are sent during pre-processing, but these are ignored for run-time analysis purposes since time integration is by far the most time consuming element of the program. the parallel algorithm is outlined below (a sketch of the index arithmetic in step 2 is given after the implementation overview below).
1. decompose the rectangular domain into load balanced rectangular sub-domains, where the width and length of a rectangular sub-domain are approximately w/nw and l/nl respectively, w and l being the width and length of the domain, and n = nw * nl being the number of processors to be used in the (nw x nl) grid scheme.
2. specify parallel language related parameters, such as the locations, neighbours, sub-domain sizes and file names of the processors. the location of the kth processor is (rk, ck), where rk = k/gdm1 + 1 and ck = k - (rk - 1)*gdm1 + 1, with gdm1 and gdm2 being the dimensions of the grid. in the four by three grid scheme, gdm1 = 3 and gdm2 = 4, and in the eight by six scheme, gdm1 = 6 and gdm2 = 8. for k = 0, 1, 2, ..., (gdm1*gdm2 - 1), define northk, southk, eastk and westk to be the neighbours located to the north, south, east and west of the kth processor; then northk = k - gdm1 if rk > 1, southk = k + gdm1 if rk < gdm2, eastk = k + 1 if ck < gdm1, and westk = k - 1 if ck > 1; any neighbour that does not exist is null.
3. input data and initial conditions: (a) each processor, pk, reads the water depth, coordinates, and initial disturbance from the text file assigned in step 2. (b) each processor, pk, exchanges sub-domain boundary data from eastk to westk, from westk to eastk, from northk to southk, and from southk to northk, in order.
4. set the parameters and coefficients used at the open sea boundary.
5. repeat the following steps for the pre-defined number of time-steps: (a) each processor, pk, exchanges sub-domain boundary data from eastk to westk. (b) each processor, pk, exchanges sub-domain boundary data from northk to southk. (c) computation of the equation of continuity. (d) setting of the open sea boundary condition. (e) each processor, pk, exchanges sub-domain boundary data from westk to eastk. (f) each processor, pk, exchanges sub-domain boundary data from southk to northk.
6. gather the computed results among all processors.
7. compute the output on the processor whose northk and westk are both equal to null.
v. implementation
data is partitioned by sub-domains, where a four by three grid scheme and an eight by six grid scheme are used for the parallel implementation of this model. in each of the grid schemes, there are four parallel variations: (1) distributed memory without threads; (2) distributed memory with threads; (3) virtual shared memory without threads; (4) virtual shared memory with threads. the implementations use the message passing interface (mpi) library [11], c/linda [12] and the pthread library [13]. the mixed-mode programming model, which uses thread programming in the shared memory layer and message passing programming in the distributed memory layer, is a method commonly used to utilize the hierarchy of memory resources efficiently [9]. in our mixed-mode programming model, mpi is used for the data communication between the global nodes, and within each mpi process, pthreads are used. in the virtual shared memory mixed-mode approach, c/linda is used for the data communication between the global nodes, and within each c/linda process, pthreads are used. these four algorithms have been implemented on the monolith cluster of nordugrid [23], which consists of a high speed backbone interconnecting multi-processor nodes.
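referring back to step 2 of the algorithm above, the row/column location and the four neighbours of processor k can be computed as in the following sketch. the struct and function names are assumptions; the east/west rules mirror the published north/south rules, and a missing neighbour is returned as -1 (MPI_PROC_NULL would serve the same purpose in the mpi version).

```c
/* grid location (1-based) and neighbours of processor k on a
   gdm1-column by gdm2-row processor grid                                */
typedef struct {
    int r, c;                       /* row and column of processor k     */
    int north, south, east, west;   /* neighbour ranks, -1 if none       */
} proc_layout;

proc_layout locate(int k, int gdm1, int gdm2)
{
    proc_layout p;
    p.r = k / gdm1 + 1;
    p.c = k - (p.r - 1) * gdm1 + 1;
    p.north = (p.r > 1)    ? k - gdm1 : -1;
    p.south = (p.r < gdm2) ? k + gdm1 : -1;
    p.west  = (p.c > 1)    ? k - 1    : -1;
    p.east  = (p.c < gdm1) ? k + 1    : -1;
    return p;
}
```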
the monolith cluster has 396 nodes, all of which are of i686 architecture with 2.20 ghz intel(r) xeon(tm) processors, dual processor nodes, 2048 mb of main memory per node and 512 kb cache. the operating system is linux version 2.4.34-cap1-smp.
vi. evaluation
table i shows the timing values for the eight scenarios arising from the four parallel variations of the algorithm mentioned above. consider the four by three grid scheme, with both threading and non-threading. the parallel algorithms for virtual shared memory exhibit better performance than the algorithms for distributed memory. though the virtual shared memory implicitly passes messages, the replication management subsystem has been optimized in c/linda compared to mpi [10], yielding better performance. in the four by three grid scheme, in both scenarios for distributed and virtual shared memory, the non-threading algorithms exhibit better performance than the threading algorithms. one possible explanation for this is that each node keeps nine two-dimensional floating point arrays of size equal to the sub-domain size. because of the memory limitations it is not possible to declare local two-dimensional arrays for the threads, causing all threads in a node to concurrently use global arrays for their own computations. now consider the eight by six grid scheme. here too, for both threading and non-threading, the parallel algorithms for virtual shared memory exhibit better performance than the algorithms for distributed memory. in contrast to the previous instance, for both distributed and virtual shared memory, the threading algorithms exhibit better performance than the non-threading algorithms. this is because the sub-domain size allocated to each node is half the sub-domain size of the smaller, four by three, grid scheme. the local two-dimensional floating point array allocation for threads in a node is now possible, compared to the four by three grid system. therefore, since all threads of a node work independently, the threading parallel algorithm shows better performance than the non-threading parallel algorithm. among the parallel variations not using threads, in both memory architectures the four by three grid scheme shows better performance than the eight by six grid scheme. this is due to the ratio of computation time to communication overhead decreasing faster for domains of smaller size. however, when the task is scaled up, say to the eight by six grid system, owing to the smaller sub-domain sizes aligning with the available memory in the node, the threaded versions become more efficient in both the mpi and c/linda implementations. in both threading and non-threading environments, the c/linda version exhibits better performance than the mpi version.
vii. conclusion
this paper has presented eight different parallel implementations of a tsunami model based on the shallow water equations. each of these implementations uses a mixed-mode programming model, from thread based shared memory, to distributed memory and finally to virtual shared memory. owing to node memory limitations, scalability issues become paramount, and threading becomes a significant bottleneck if sufficient node memory is not available, offsetting the middleware advantages.
with sufficient node memory, however, c/linda with threads outperforms mpi with threads, indicating the effectiveness of extracting parallelism at multiple levels (virtual shared memory, distributed memory and shared memory) for this class of problems.
acknowledgment
this work was conducted at the department of computation and intelligent systems, university of colombo school of computing. this research was performed using computational resources at the national supercomputer centre, linköping university, sweden, and the computational resources of the research laboratory, university of colombo school of computing.
references
[1] wikipedia, http://en.wikipedia.org/wiki/tsunami, 2009.
[2] k. i. sitanggang and p. lynett, parallel computation of a highly nonlinear boussinesq equation model through domain decomposition, international journal for numerical methods in fluids, issn 0271-2091, vol. 49, no. 1, 2005, pp. 57-74.
[3] d. l. williamson, j. b. drake, j. j. hack, r. jakob, and p. n. swarztrauber, a standard test set for numerical approximations to the shallow water equations in spherical geometry, journal of computational physics, 102(1), pp. 211-224, 1991.
[4] t. vasily, g. frank, and m. hal, tsunami forecasting and hazard assessment capabilities, pacific disaster center (pdc) & maui high performance computing center (mhpcc) tsunami modeling, national oceanic and atmospheric administration center for tsunami research (http://nctr.pmel.noaa.gov/, 2009).
[5] j. d. paul and s. rick, shallow water equations with a complete coriolis force and topography, physics of fluids, september 2005.
[6] b. ragnhild, nested parallelism in openmp with application to adaptive mesh refinement, university of bergen, february 2003.
[7] f. s. brett and c. p. john, parallel implementation of an explicit finite-volume shallow-water model, 16th asce engineering mechanics conference, july 16-18, 2003, university of washington, seattle.
[8] s. l. johnsson and c. ho, algorithms for matrix transposition on boolean n-cube configured ensemble architectures, siam j. matrix analysis and application, vol. 9(3), july 1988, pp. 419-454.
[9] w. meng-shiou, a. srinivas, and a. k. ricky, mixed mode matrix multiplication, proceedings of the ieee international conference on cluster computing, ieee computer society, washington, dc, usa, 2002.
[10] c. xing and p. l. hans, making hybrid tsunami simulators in a parallel software framework, lncs, applied parallel computing: state of the art in scientific computing, vol. 4699/2008, pp. 686-693.
[11] mpi forum, http://www.mpi-forum.org/, http://www-unix.mcs.anl.gov/mpi/, 2009.
[12] linda user guide, scientific computing associates inc., one century tower, 265 church street, new haven, ct 06510-7010, usa, september 2005 (http://www.lindaspaces.com/about/index.html, 2009).
[13] pthreads, http://www.math.arizona.edu/swig/pthreads/threads.html, 2009.
[14] g. wei, j. t. kirby, s. t. grilli, and r. subramanya, a fully nonlinear boussinesq model for surface waves, part i: highly nonlinear unsteady waves, journal of fluid mechanics, 1995; 294: 71-92.
[15] j. chunbo, l. kai, l. ning, and z. qinghai, implicit parallel fem analysis of shallow water equations, tsinghua science & technology, vol. 10, issue 3, june 2005, pp. 364-371.
[16] http://www.gfdl.noaa.gov/fms/pubrel/m/atm_dycores/src/atmos_spectral_shallow/shallow.pdf, 2009.
[17] http://www.sea.ee/~elken/do5.pdf, 2009.
[18] http://www.misu.su.se/~goran/shallow_water/, 2009.
[19] a. arakawa and v. lamb, computational design of the basic dynamical processes of the ucla general circulation model, methods in computational physics, vol. 17, academic press, 1977, pp. 174-267.
[20] c. goto and y. ogawa, numerical method of tsunami simulation with the leap-frog scheme, dept. of civil engineering, tohoku university; translated for the tsunami inundation modelling exchange project by n. shuto, unesco intergovernmental oceanographic commission, manuals and guides, 35, 1992.
[21] s. roar, scalability of parallel grid point limited area atmospheric models i & ii, manuscript, department of mathematical sciences, norwegian university of science and technology, trondheim, norway, 1996.
[22] s. thomas, j. cote, a. staniforth, i. lie, and r. skalin, a semi-implicit semi-lagrangian shallow-water model for massively parallel processors, proceedings of the 6th ecmwf workshop on the use of parallel processors in meteorology, november 1994, pp. 407-423.
[23] the grid middleware project in the nordic countries, http://nordugrid.org

adaptive structural optimisation of neural networks
n. p. suraweera1*, d. n. ranasinghe2
1 department of physics, university of colombo, sri lanka
2 university of colombo school of computing, sri lanka
prash_sweera@yahoo.com, dnr@ucsc.cmb.ac.lk
* corresponding author
revised: 16 october 2008; accepted: 10 october 2008

abstract: structural design of an artificial neural network (ann) is a very important phase in the construction of such a network. the selection of the optimal number of hidden layers and hidden nodes has a significant impact on the performance of a neural network, though it is typically decided in an ad-hoc manner. in this paper, the structure of a neural network is adaptively optimised by determining the number of hidden layers and hidden nodes that give the optimal performance in a given problem domain. two optimisation approaches have been developed based on the particle swarm optimisation (pso) algorithm, which is an evolutionary algorithm that uses a cooperative approach. these approaches have been applied to two well known case studies in the classification domain, namely the iris data classification and the ionosphere data classification. the obtained results and the comparisons done with past research work have clearly shown that this method of optimisation is by far the best approach for adaptive structural optimisation of anns.
keywords: neural networks, particle swarm optimization, weight adjustment, hidden layer adjustment.
introduction
artificial neural networks (anns), which have been inspired by biological neural networks, are used especially in imitating many qualities seen in human beings, such as identifying objects and patterns, making decisions based on prior experiences and accumulated knowledge, and predicting future events based on past happenings. the very fact that the human brain is very efficient in carrying out these actions is mainly attributable to its complex and intricate, but very effective, neural network structure. besides the learning algorithm of a specific neural network, constructing an effective neural network structure is perhaps the single most challenging aspect in the design of an ann. this is due to the high cohesiveness between the performance of a neural network and the structure of that particular neural network. until recently the structure of a neural network was defined by intuition or based on empirical suggestions. as far as the number of hidden layers is concerned, a theoretical result by hornik et al. stated in [2],
'..a feed forward neural network with one layer is enough to approximate any continuous non linear function arbitrarily well on a compact interval, provided that a sufficient number of hidden neurons is available', may have had an influence on this way of thinking. in recent years, the particle swarm optimisation (pso) algorithm, which is a simple, easy to implement but highly effective evolutionary algorithm, has also been used for the purpose of ann evolution. to the best of our knowledge, pso has not been used thus far to evolve a full neural network structure, i.e., both the number of hidden layers and the number of nodes in a particular hidden layer, presumably due to the earlier mentioned theoretical result. however, in our research we show that it is indeed possible to arrive at an adaptively optimized number of hidden layers for the neural network which also yields improved classification results. as such, this research has strived to find an optimal structure for an ann by applying the pso algorithm to a network used in a particular problem domain. the paper is organized as follows: in section ii, a brief overview of feed-forward neural networks and particle swarm optimisation is given, section iii is related work, section iv discusses the design and implementation aspects, section v presents the results and section vi gives the conclusion and future work that can be carried out on the optimisation approaches.
overview of ann and pso
the anns considered within this research are multilayer feed-forward neural networks and the given sample problems are solved through supervised learning using back propagation.
importance of the architecture of an ann
the architectural/topological design of the ann has become one of the most important tasks in ann research and application. it is known that the architecture of an ann has a significant impact on a network's information processing capabilities. given a learning task, an ann with only a few connections and linear nodes may not be able to perform the task at all due to its limited capability, while an ann with a large number of connections and nonlinear nodes may overfit noise in the training data and fail to have good generalization ability [1]. up to now, architecture design is still very much a human expert's job. it depends heavily on expert experience and a tedious trial-and-error process. even though anns are easy to construct, finding a good ann structure is a very time consuming process [2]. as there are no fixed rules for determining the ann structure or its parameter values, a large number of anns may have to be constructed with different structures and parameters before determining an acceptable model. against this background, a logical next step is the exploration of more powerful techniques for efficiently searching the space of network architectures [3].
pso
particle swarm optimisation (pso) is a population based stochastic optimisation technique developed by james kennedy and russell eberhart in 1995, inspired by the social behavior of bird flocking or fish schooling. pso introduces a method for the optimisation of continuous nonlinear functions [4], [5]. this algorithm is simple in concept, computationally efficient and effective on a variety of problems.
pso is initialized with a group of random particles (solutions) and then searches for optima by updating generations. in every iteration, each particle is updated by following two "best" values:
1. the personal best solution (fitness) it has achieved so far (measured using a fitness function). this value is called pbest.
2. the best value obtained so far by any particle in the population. this best value is a global best and is called gbest.
apart from these values, when a particle takes a part of the population as its topological neighbors, the best value is a local best and is called lbest. after finding the above parameters, the particle updates its velocity and position with the following equations (1.1) and (1.2) [4].
v[t+1] = v[t] + c1 * rand( ) * ( pbest[t] - position[t] ) + c2 * rand( ) * ( gbest[t] - position[t] )    (1.1)
position[t+1] = position[t] + v[t+1]    (1.2)
v[t] and v[t+1] are the particle velocity before and after the update, and position[t] is the current particle (solution). pbest[t] and gbest[t] are defined as stated before. rand( ) is a random number between (0,1). c1 and c2 are learning factors (usually c1 = c2 = 2). the pso algorithm [5] can be implemented by incorporating the above equations. the swarm size is a critical parameter: too few particles might cause the algorithm to become stuck in local minima, while too many particles will slow down the algorithm. the optimal number of particles per swarm also depends on the function being optimised [6].
advantages of the pso approach
the considerable adaptability of pso to variations and hybrids is seen as a strength over other robust evolutionary optimisation mechanisms, such as genetic algorithms (ga). normally, a stochastic hill-climber risks getting stuck at local maxima, but the stochastic exploration and communication of the swarm overcomes this [7]. the interaction of the particles in the swarm creates a very good balance between straying off course and staying close to the optimal solution. the pso algorithm is easy to implement because it is expressed in very few lines of code, and requires only a specification of the problem and a few parameters in order to solve it [4]. another advantage is that pso takes real numbers as particles, hence eliminating the need for a special encoding scheme or the need to use special genetic operators. compared with other evolutionary algorithms such as ga, the pso algorithm possesses attractive properties such as memory and constructive cooperation between individuals. all particles in a pso population carry memory (in the form of the personal best value each has reached so far), whereas in a ga, if an individual is not selected the information contained by that individual is lost. because there are no selection and crossover operations in pso, each individual in an original population has a corresponding partner in a new population. it can avoid the premature convergence and stagnation seen in gas to some extent [9]. the cooperative approach followed by pso is seen as the biggest advantage over the competitive approach taken by gas since, in cooperative situations, others are depending on you to succeed, but in competitive situations, others hope to see you fail. so pso is a cooperative approach to optimisation rather than an evolutionary approach which kills off unsuccessful members of the search team. it is in the collective sharing of knowledge that solutions are found.
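as a concrete illustration of equations (1.1) and (1.2), the update for a single particle can be written in a few lines of c. this is a generic sketch of the standard update and not code from any of the works cited here; the dimension count, the initial values and the use of c's rand() are illustrative assumptions.

/* one iteration of the basic pso update (equations 1.1 and 1.2) for a
   single particle; generic sketch with illustrative values. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define DIM 5                                   /* problem dimensions, assumed      */
static const double C1 = 2.0, C2 = 2.0;         /* learning factors (c1 = c2 = 2)   */

static double unit_rand(void) { return (double)rand() / RAND_MAX; }

static void pso_update(double position[DIM], double velocity[DIM],
                       const double pbest[DIM], const double gbest[DIM])
{
    for (int j = 0; j < DIM; j++) {
        velocity[j] += C1 * unit_rand() * (pbest[j] - position[j])
                     + C2 * unit_rand() * (gbest[j] - position[j]);   /* equation 1.1 */
        position[j] += velocity[j];                                   /* equation 1.2 */
    }
}

int main(void)
{
    srand((unsigned)time(NULL));
    double pos[DIM] = {1, 2, 3, 4, 5}, vel[DIM] = {0};
    double pbest[DIM] = {2, 2, 2, 2, 2}, gbest[DIM] = {3, 3, 3, 3, 3};
    pso_update(pos, vel, pbest, gbest);
    for (int j = 0; j < DIM; j++) printf("%.3f ", pos[j]);
    printf("\n");
    return 0;
}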
related work
ann weight training using pso
adjusting weights to train a feed-forward multilayer ann has been one of the earliest applications of pso. according to kennedy and eberhart, who are the developers of the pso algorithm, a particle swarm optimizer could train nn weights as effectively as the usual error backpropagation method [4]. one of their first experiments involved training weights for a three-layer ann solving the exclusive-or (xor) problem. they have also used a particle swarm optimizer to train a neural network to classify the fisher iris data set [10]. intriguing informal indications are that the trained weights found by particle swarms sometimes generalize from a training set to a test set better than solutions found by the gradient descent method. gudise and venayagamoorthy [8] have shown that feed-forward neural network weights converge faster with pso than with the back propagation algorithm. in order to compare the training capabilities of the back propagation and pso algorithms, a non-linear quadratic equation, y = 2x^2 + 1, with data points (patterns) in the range (-1, 1), has been presented to the feed-forward neural network. based on the experimental results, the number of computations required by each algorithm shows that pso requires a smaller number of iterations to achieve the same error goal compared to back propagation. thus, pso is better for applications that require fast learning algorithms. an important observation made is that when the training points are fewer, the ann learns the nonlinear function with six times fewer computations with pso than required by back propagation. moreover, the success of back propagation depends on choosing a bias value, unlike with pso. it is also stated that the concept of pso can be incorporated into the back propagation algorithm to improve its global convergence rate. more recent work in this regard is in [18], [19].
architecture evolution together with weight training of anns
direct application of pso to evolve the structure of an ann has been done by zhang, shao and li [9]. both the architecture and the weights of anns are adaptively adjusted according to the quality of the neural network. recent similar work is also in [16], [17].
ann weight initialization
apart from complete weight training, pso has also been used to initialize the weights of anns. van den bergh [11] has shown in his paper that training performance can be improved significantly by using pso to initialize the weights, rather than random initializations. he has stated that since the weights in an ann serve as a starting position in error space, from where the optimisation algorithms proceed to find a minimum in the error space, it is clear that the precise starting position can affect the speed and accuracy with which the algorithm will find the minimum. by means of two case studies, namely the ionosphere classification problem [10] and the henon curve problem, it has been shown that using pso to initialize weights will reduce the total time needed to train multi-layer perceptron networks. but it is also mentioned that even though pso can be used to train multi-layer perceptron networks to completion, it will seldom be quicker than a mix between pso and gradient-based optimisation techniques.
other adaptive techniques
eberhart, one of the creators of the pso algorithm, and xiaohui have evolved not only the network weights but also the slopes of the sigmoidal transfer functions of hidden and output processing elements using pso [12].
the method is general, and can be applied to other transfer functions as well. flexibility is gained by allowing the slopes of the transfer function to be positive or negative. a change in sign for the slope is equivalent to a change in the signs of all input weights. since the pso process is continuous, neural network evolution is also continuous. no sudden discontinuities exist such as those that plague other evolutionary approaches.
design and implementation
initially an association was made between the parameters of the pso and the ann, in order to construct an algorithm which would evolve the architecture of the ann. since this research involves two parameters to be optimized in an ann, namely the number of hidden layers and the number of hidden nodes in each layer, these two parameters were mapped to appropriate variables of the pso algorithm.
association between pso and ann
the mapping resulted in the definition of a 1:1 relationship between the position variable of a particle in the pso swarm and the number of hidden nodes in a layer of an ann. therefore the number of hidden nodes of each hidden layer will be indirectly evolved through the velocity parameter (v) of the pso algorithm. the number of dimensions (the number of times the pso equations should be iterated) was associated with the number of hidden layers in each network. thus, when executing the loop with the pso equations, it will iterate through each hidden layer corresponding to a network, optimizing the number of hidden nodes in each layer. the global best value reflects the optimum number of hidden nodes for an optimum number of hidden layers. fig 1 illustrates the mapping between the pso algorithm and the ann.
for i = 1 to number of particles (m) do
    for j = 1 to number of dimensions (n) do        (n = number of hidden layers in the ann)
        r1 = uniform random number
        r2 = uniform random number
        v[i][j] = v[i][j] + c1*r1*(pbest[i][j] - position[i][j]) + c2*r2*(gbest[j] - position[i][j])
        position[i][j] = position[i][j] + v[i][j]   (position = number of hidden nodes in a layer)
    enddo
enddo
figure 1: mapping between pso and ann
optimisation approaches
the 'global best' approach
in this method the position matrix values (the number of hidden nodes of each hidden layer, in each network) were randomly initialized for a population of 30 particles (30 networks). this initialization was done subject to the constraints of the minimum and maximum number of hidden layers allowed in one network (the minimum number = 1, the maximum number = 5) and the number of particles in a population. since the random generation of position variables corresponding to each network allows a value to even be zero, a cleaning process was essential to proceed with the evolution. this cleaning process was implemented so that, after the initialization of the number of hidden nodes in each network, it verifies that none of the networks have zero hidden nodes (which means that there is no hidden layer) in the middle of any network. in any case, if there is a network which has this initial configuration, the cleaning process will remove the rest of the hidden layers as well (because it is infeasible to have a network which has no hidden nodes in a prior hidden layer and has hidden nodes in the latter hidden layers). after carrying out this cleaning process, the result is a population which has different numbers of hidden layers.
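the cleaning rule just described can be stated in a few lines. the sketch below is our reading of that rule, assuming a particle is encoded as one integer node count per allowed hidden layer; it is illustrative only and is not the authors' matlab implementation.

/* cleaning rule sketch: a zero node count invalidates every later layer,
   since a network cannot have an empty hidden layer followed by populated
   ones. encoding assumed: one integer node count per allowed hidden layer. */
#include <stdio.h>

#define MAX_LAYERS 5

static void clean_position(int nodes[MAX_LAYERS])
{
    int truncate = 0;
    for (int l = 0; l < MAX_LAYERS; l++) {
        if (nodes[l] <= 0)
            truncate = 1;      /* first empty layer found          */
        if (truncate)
            nodes[l] = 0;      /* drop this and all later layers   */
    }
}

int main(void)
{
    int nodes[MAX_LAYERS] = {6, 4, 0, 7, 2};   /* illustrative particle */
    clean_position(nodes);                     /* becomes 6 4 0 0 0     */
    for (int l = 0; l < MAX_LAYERS; l++)
        printf("%d ", nodes[l]);
    printf("\n");
    return 0;
}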
these networks are then trained and the performance is evaluated using the classification accuracy percentage of the ann. the global best value of the population is defined according to the highest accuracy achieved by a network. the global best variable ('gbest', which is similar to an array) contains the number of hidden nodes in each layer of the ann which has given the best ever performance. the classification accuracy percentage is then checked to evaluate whether the required performance is reached by any network in the population. if so, then the program is terminated. if not, the pso equations will be applied to the parameters of the ann, and new values will be obtained for the number of hidden nodes in each layer. this evolution of each network was done by considering its personal best performance and the global best performance, where the latter gives the best performance ever to be reached by a network in the whole population. this process can also give rise to the cancellation of hidden layers in the middle of a network. therefore the cleaning process will be carried out again. then the above mentioned process will carry on iterating until the required performance is reached by any network. the most important aspect in this method of evolution is that the one instance which has obtained the best ever performance in the whole population, over all executed iterations, is kept as a global measurement which will directly influence the evolution of all other networks in the population. this clearly demonstrates the cooperative approach followed by the pso algorithm. fig 2 illustrates the global best approach using a flow chart.
figure 2: flow chart for the 'global best' approach
since it was observed that the randomly initialized population in the above method tended to mostly consist of networks belonging to one class (e.g., networks with 5 hidden layers), it was then decided to create a uniform population (i.e., a similar number of networks from each class) in the first stage of the algorithm. the rest of the algorithm was carried out in the same order.
the 'local best' approach
in this method, the main difference from the above method was that instead of a global best value for the whole population, local best values were taken into consideration within the pso algorithm. a local best value was defined for each class (e.g., 5 local bests corresponding to the networks belonging to the 5 classes: 1 hidden layer networks, 2 hidden layer networks, etc.). therefore the evolution of each network was done by considering its personal best performance and the local best performance values. this gives rise to the modification of equation 1.1 as follows.
v[t+1] = v[t] + c1*rand( )*( pbest[t] - position[t] ) + c2*rand( )*( lbest[t] - position[t] )    (1.3)
similar to the earlier situation, pbest gives the best configuration ever to be reached by each specific network, while lbest gives the best configuration within a class (the number of hidden nodes in each layer of the network which has given the best performance for a given class of networks).
for each given class above, a local best was defined by comparing the performances among the members of a class. then each network in a class will try to achieve the specific local best corresponding to its class. therefore a network will never change its number of hidden layers during the execution of the algorithm, but will change the number of hidden nodes in its predefined hidden layers. a cleaning process was not needed within this approach due to the above reason. fig 3 illustrates the above mentioned local best approach.
figure 3: flow chart for the 'local best' approach
implementation procedure
the two approaches designed above were implemented in matlab and each method was applied on the selected application case studies.
fisher's iris data set classification
this is a multivariate data set introduced by sir ronald aylmer fisher (1936) as an example of discriminant analysis [10]. it consists of 50 samples from each of three species of iris flowers. initially, 75 sets of inputs (half of the data set) from the iris data set were fed into all networks in the population as training data. then each network was simulated using the whole data set. based on the classification, the performance measure of classification accuracy percentage was introduced into the program. the global best of the population and the personal best of each particle were identified using this performance measure.
ionosphere data classification
this deals with the classification of radar returns from the ionosphere [10]. "good" radar returns are those showing evidence of some type of structure in the ionosphere. "bad" returns are those that do not; their signals pass through the ionosphere. there are 34 continuous input variables in each data set and a total of 351 instances should be classified as either 'good' or 'bad' radar return patterns. since this data set does not have an equal number of instances belonging to each of the two classes (there are 225 'good' and 126 'bad' radar return patterns), the first 200 instances were used as the training set (the 'good' and 'bad' instances are given alternately). this data selection method was followed since past research work which has used this data set in ann classification experiments has used the same method [13].
results and evaluation
experimental results were obtained for each of the case studies with the parameters set as:
swarm (population) size = 30, c1 = c2 = 2.0
maximum allowed number of hidden layers = 5
maximum allowed number of nodes in a hidden layer = 10
number of training epochs = 200
iris data classification results
'global best' approach
table 1: results of iris data set classification by global best approach
instance | optimal no. of hidden layers | hidden nodes (1st 2nd 3rd 4th 5th) | classification accuracy (%)
1 | 2 | 3 4 0 0 0 | 97.33
2 | 2 | 9 6 0 0 0 | 97.33
3 | 2 | 7 2 0 0 0 | 97.33
4 | 2 | 5 9 0 0 0 | 97.33
5 | 3 | 6 4 4 0 0 | 97.33
6 | 2 | 4 7 0 0 0 | 97.33
7 | 2 | 5 8 0 0 0 | 97.33
8 | 3 | 6 4 7 0 0 | 97.33
9 | 2 | 5 9 0 0 0 | 97.33
10 | 3 | 6 6 5 0 0 | 97.33
the experimental results in table 1 show that 2 or 3 hidden layers can be considered as the optimal number of hidden layers. in this exercise, the classification accuracy refers to the validation set only. the highest achievable classification accuracy using this approach was 97.33% (this means that at least 4 data sets were misclassified during the classification process). an instance in the above table refers to one complete optimisation cycle, which concludes by giving the maximum accuracy. all instances in the above table have obtained a classification accuracy of 97.33%. this might be due to the fact that the iris data set is considered to be a simple classification example, as the three classes are (almost) linearly separable [14]. this experimental result can be compared with the results obtained by van den bergh and engelbrecht [14], who have used the iris data classification case study for their experiments which attempt to improve the performance of the basic pso by partitioning the input vector into several sub-vectors. they have heuristically chosen an architecture which has 1 hidden layer with 3 hidden nodes, and with this topology they have achieved only 94% classification accuracy. the 'global best' approach has achieved an accuracy of 97.3% for an ann architecture with 2 hidden layers. in another research work by eldracher [15], only 93% maximum accuracy has been obtained for the iris data set, by using a heuristically chosen, very simple network architecture with no hidden layers and a sigmoid transfer function. he has further suggested that the classification performance could be increased if a hidden layer were added to the existing network. from the results of the 'global best' approach, it can be clearly identified that a 2 hidden layered ann can obtain a higher classification accuracy. from the above results, it can also be identified that there is a range for the optimal number of hidden nodes within a network. an accuracy of 97.3% has been achieved in networks which have hidden nodes in the range of 7-17 (irrespective of the total number of hidden layers). according to the facts given by tan [2], "a multi-layer perceptron network that uses any of a wide variety of continuous nonlinear hidden-layer transfer functions requires just one hidden layer with 'an arbitrarily large number of hidden neurons' to achieve the 'universal approximation' property". therefore the above mentioned range might be very helpful when deciding a value for this 'arbitrarily large number of hidden neurons'. in order to check the validity of the above statement, a single hidden layered ann was constructed, and the classification accuracy and total execution time were recorded while varying the total number of hidden nodes in the single hidden layer.
fig 4 presents the results obtained by setting the following parameter values:
training epochs (in 1 iteration) = 200
maximum number of iterations allowed = 10
termination condition of an iteration: accuracy >= 97%
the results in fig 4 illustrate the fact that increasing the number of hidden nodes has a slight tendency to increase the accuracy, but only up to a certain limit of hidden nodes. from fig 4, it can be observed that 16 hidden nodes in a single hidden layer has given the maximum average accuracy of 95.47%. but when the number of hidden nodes was further increased, the classification accuracy level began to decrease rapidly.
figure 4: average accuracy of anns with varying number of hidden nodes
when considering the classification accuracy levels in table 1, even though all 10 instances have obtained an accuracy level of 97.33%, the ann consisting of two hidden layers with 3 and 4 hidden nodes in each layer respectively has the lowest number of weights to be trained within this classification problem. this ann has a total weight density (connection density) of 36, i.e., from the input layer to the first hidden layer 12 weights, from the first hidden layer to the second hidden layer 12 weights, and from the second hidden layer to the output layer 12 weights. this is clearly depicted in fig 5.
figure 5: connection (weight) density of the two hidden layered ann
if only the total number of weights is considered as the deciding factor which contributes to the success of an ann's performance, then it can be deduced that a single hidden layered ann which has a similar number of weights (connections) might obtain the same accuracy level. therefore, the single hidden layered ann with 5 hidden nodes (= 35 weights) should be able to obtain the same accuracy level as that of the ann shown in fig 5. but the accuracy levels shown in fig 4 clearly prove that the above deduction is false, because the single hidden layered ann with 5 hidden nodes never achieved a classification accuracy of 97.33%. therefore it is clear that apart from the weights, the number of hidden layers in an ann also has a direct impact on the performance of the ann. instead of an ann with 5 hidden nodes in one hidden layer, the ann with 16 hidden nodes in a single hidden layer has obtained a similar classification accuracy to that of the ann shown in fig 5. even though the single hidden layered ann with 16 hidden nodes has a higher weight density (112 weights), the large number of weights that need to be trained does not significantly increase the time taken to obtain its output. from this observation, it can be deduced that instead of a two hidden layered ann with 3 and 4 hidden nodes respectively, a one hidden layered ann with 16 hidden nodes (which can be considered as the 'arbitrarily large number of hidden neurons' stated by tan [2]) can obtain a similar classification accuracy.
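the weight (connection) counts quoted above follow directly from the layer widths: with 4 inputs and 3 outputs for the iris problem, the 3-4 hidden layer network has 4*3 + 3*4 + 4*3 = 36 connections, the single layer of 16 nodes has 4*16 + 16*3 = 112, and the single layer of 5 nodes has 4*5 + 5*3 = 35. a small c sketch of that arithmetic (biases ignored, matching the totals quoted in the text) is given below; it is illustrative only.

/* count inter-layer connections for a feed-forward topology given as a
   list of layer widths; biases are ignored, matching the quoted totals. */
#include <stdio.h>

static int connection_count(const int *layers, int n_layers)
{
    int total = 0;
    for (int i = 0; i + 1 < n_layers; i++)
        total += layers[i] * layers[i + 1];
    return total;
}

int main(void)
{
    int two_hidden[]     = {4, 3, 4, 3};   /* iris: 4 inputs, hidden layers of 3 and 4, 3 outputs */
    int single_sixteen[] = {4, 16, 3};
    int single_five[]    = {4, 5, 3};
    printf("%d %d %d\n",
           connection_count(two_hidden, 4),       /* 36  */
           connection_count(single_sixteen, 3),   /* 112 */
           connection_count(single_five, 3));     /* 35  */
    return 0;
}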
'local best' approach
table 2: results of iris data set classification by local best approach
instance | class | hidden nodes (1st 2nd 3rd 4th 5th) | classification accuracy (%)
1 | 1 | 8 0 0 0 0 | 96.00
1 | 2 | 3 8 0 0 0 | 97.33
1 | 3 | 2 2 3 0 0 | 97.33
1 | 4 | 2 9 2 4 0 | 97.33
1 | 5 | 7 4 6 3 7 | 97.33
2 | 1 | 2 0 0 0 0 | 96.00
2 | 2 | 7 4 0 0 0 | 97.33
2 | 3 | 5 3 8 0 0 | 96.00
2 | 4 | 8 8 7 6 0 | 97.33
2 | 5 | 2 5 9 6 4 | 97.33
3 | 1 | 9 0 0 0 0 | 96.00
3 | 2 | 8 9 0 0 0 | 97.33
3 | 3 | 2 7 8 0 0 | 94.67
3 | 4 | 2 5 2 10 0 | 97.33
3 | 5 | 10 6 7 2 5 | 97.33
4 | 1 | 3 0 0 0 0 | 94.67
4 | 2 | 9 7 0 0 0 | 96.00
4 | 3 | 9 7 5 0 0 | 97.33
4 | 4 | 8 8 6 9 0 | 97.33
4 | 5 | 8 6 3 8 9 | 97.33
5 | 1 | 5 0 0 0 0 | 93.33
5 | 2 | 8 7 0 0 0 | 96.00
5 | 3 | 10 2 4 0 0 | 97.33
5 | 4 | 10 3 5 3 0 | 97.33
5 | 5 | 7 5 8 6 5 | 97.33
the above results obtained from the 'local best' approach do not maintain consistency with the results obtained in the 'global best' approach. this could be due to the fact that the population is not subject to a change in the number of hidden layers throughout the execution lifetime. even the result that 4 or 5 hidden layers also give an accuracy of 97.3% might be directly related to this fact (since the networks do not change their number of hidden layers but only change the number of nodes in a layer, each network has the opportunity of trying out a large number of different combinations for the total number of hidden nodes within a predetermined number of hidden layers).
ionosphere data classification results
'global best' approach
table 3: results of ionosphere data (full set) classification by global best approach
instance | optimal no. of hidden layers | hidden nodes (1st 2nd 3rd 4th 5th) | classification accuracy (%)
1 | 4 | 6 5 9 1 0 | 96.87
2 | 4 | 6 4 8 9 0 | 96.01
3 | 3 | 5 6 7 0 0 | 96.58
4 | 2 | 9 6 0 0 0 | 97.72
5 | 4 | 6 7 8 9 0 | 96.58
6 | 4 | 8 1 7 7 0 | 96.01
7 | 2 | 9 1 0 0 0 | 96.58
8 | 2 | 10 9 0 0 0 | 96.01
9 | 4 | 5 5 8 8 0 | 95.16
10 | 3 | 3 8 3 0 0 | 96.58
in a previous research work which used pso to initialize ann weights [11], a maximum classification accuracy rate of 95.41% was achieved for the whole data set (training data + test data) in the ionosphere classification problem, by an ann with 9 hidden units (hidden nodes) and weights which had been initialized using the pso concept (the number of hidden layers is not specifically mentioned). on the other hand, an ann having 7 hidden nodes and whose weights were randomly initialized achieved a classification accuracy of only 94.43% [11]. but according to the experimental results shown in table 3, a maximum classification accuracy of 97.72% has been obtained by an ann whose structure was evolved using the 'global best' approach, which implements the pso algorithm to evolve the ann structure. this clearly shows the effectiveness of the 'global best' approach. table 4 gives the results obtained from the 'global best' approach for the test data set only of the ionosphere data classification problem. according to table 4, a maximum classification accuracy of 94.70% was obtained for the test data set only. according to the facts given in the reference work [13], the ionosphere test data set classification carried out by a multilayer feed-forward network using back propagation has obtained an average of 96% accuracy on the test instances. even though it is mentioned that back propagation was tested with several different numbers of hidden units (between 0 and 15), the total number of hidden layers has not been stated.
table 4: results of ionosphere data (test set) classification by global best approach
instance | optimal no. of hidden layers | hidden nodes (1st 2nd 3rd 4th 5th) | classification accuracy (%)
1 | 3 | 1 2 8 0 0 | 93.38
2 | 3 | 3 1 1 0 0 | 93.38
3 | 4 | 8 6 6 7 0 | 92.05
4 | 4 | 9 2 4 3 0 | 93.38
5 | 2 | 8 4 0 0 0 | 90.73
6 | 4 | 5 1 3 7 0 | 93.38
7 | 2 | 8 1 0 0 0 | 90.73
8 | 2 | 4 4 0 0 0 | 92.05
9 | 4 | 10 4 2 7 0 | 94.70
10 | 4 | 6 6 4 4 0 | 90.73
'local best' approach
the classification data given in table 5 confirm the results given in table 3 by presenting the fact that anns with two hidden layers or four hidden layers have a tendency to give a maximum accuracy level (94.87% is the highest accuracy achieved in this approach). as shown in the 'global best' approach, anns with a single hidden layer or with 5 hidden layers have never succeeded in achieving a maximum accuracy level.
conclusion
the results obtained from the 'global best' and the 'local best' optimisation approaches suggest that the 'global best' approach for adaptive optimisation of anns is more successful in obtaining higher accuracy levels. when considering the application case studies, the 'global best' approach has achieved a maximum classification accuracy of 97.33% for the iris data classification, and 97.72% accuracy on the full data set of the ionosphere data classification, while achieving a classification accuracy of 94.70% on the test data set of the same case study. when compared with previous research work which has been carried out on the same case studies, the above mentioned accuracy values prove to be better than nearly all of the past results. therefore it can be concluded that the 'global best' approach has the potential to obtain a structurally optimized neural network. in this research, the evolution of only the number of hidden layers and hidden nodes has been considered with regard to the adaptive optimisation of an ann. but it is well known that these are not the only parameters that can be optimized in a given ann. therefore, in the future, this research work can include the adaptive optimisation of other ann parameters like the learning rate, learning momentum and activation functions, in order to realize the goal of achieving a completely optimized ann.
table 5: results of ionosphere data (full set) classification by local best approach
instance | class | hidden nodes (1st 2nd 3rd 4th 5th) | classification accuracy (%)
1 | 1 | 9 0 0 0 0 | 92.88
1 | 2 | 7 8 0 0 0 | 94.87
1 | 3 | 5 4 3 0 0 | 93.16
1 | 4 | 3 3 9 6 0 | 94.87
1 | 5 | 5 5 2 8 8 | 93.45
2 | 1 | 3 0 0 0 0 | 92.02
2 | 2 | 3 8 0 0 0 | 93.16
2 | 3 | 10 10 3 0 0 | 94.30
2 | 4 | 7 3 6 6 0 | 94.30
2 | 5 | 7 8 4 7 2 | 93.73
3 | 1 | 4 0 0 0 0 | 92.59
3 | 2 | 9 5 0 0 0 | 93.73
3 | 3 | 8 3 2 0 0 | 94.30
3 | 4 | 4 7 4 5 0 | 92.59
3 | 5 | 6 6 8 8 4 | 93.16
4 | 1 | 8 0 0 0 0 | 93.45
4 | 2 | 7 2 0 0 0 | 94.87
4 | 3 | 5 7 7 0 0 | 93.73
4 | 4 | 9 7 8 6 0 | 93.73
4 | 5 | 7 7 6 2 10 | 93.45
5 | 1 | 4 0 0 0 0 | 92.02
5 | 2 | 5 7 0 0 0 | 92.59
5 | 3 | 9 9 3 0 0 | 93.16
5 | 4 | 7 3 3 4 0 | 91.74
5 | 5 | 8 8 2 2 10 | 94.30
acknowledgement
the authors wish to sincerely thank the anonymous reviewers and the editorial staff of the journal for their valuable comments and suggestions in improving the clarity and the presentation of the article.
references
1. yao x. (1999, september). evolving artificial neural networks, proceedings of the ieee, vol. 87, no. 9.
2. tan c. n. w. (1999). an artificial neural networks primer with financial applications examples in financial distress predictions and foreign exchange hybrid trading system, http://www.smartquant.com/references/nuralnetworks/nural28.pdf
3. balakrishnan k. and honavar v. (1995). evolutionary design of neural architectures – a preliminary taxonomy and guide to literature, tech report no. cs tr 95-01, artificial intelligence research group, iowa state university.
4. kennedy j. and eberhart r. c. (1995). particle swarm optimization, proc. ieee international conf. on neural networks, vol. 4.
5. hu x., particle swarm optimization tutorial, http://www.swarmintelligence.org/tutorials.php
6. van den bergh f., engelbrecht a. p. (2001). effects of swarm size on cooperative particle swarm optimisers, proceedings of ijcnn 2001, washington dc, usa.
7. particle swarm optimization, wikipedia, http://en.wikipedia.org/wiki/particle_swarm_optimization
8. gudise v. g., venayagamoorthy g. k. (2003). comparison of particle swarm optimization and back propagation as training algorithms for neural networks, proceedings of the ieee swarm intelligence symposium.
9. zhang c., shao h., li y. (2000). particle swarm optimisation for evolving artificial neural network, ieee international conference on systems, man and cybernetics, vol. 4.
10. uci machine learning repository, http://archive.ics.uci.edu/ml/
11. van den bergh f. (1999, september). particle swarm weight initialization in multi-layer perceptron artificial neural networks, development and practice of artificial intelligence techniques, pp. 41-45, durban, south africa.
12. eberhart r. c., hu x. (1999). human tremor analysis using particle swarm optimization, evolutionary computation.
13. johns hopkins university ionosphere database, http://www.ailab.si/orange/doc/datasets/inosphere.htm
14. van den bergh f., engelbrecht a. p., cooperative learning in neural networks using particle swarm optimizers, south african computer journal, http://www.cs.up.ac.za/cs/fvdbergh/publications/pso_splitswarm.ps.gz
15. eldracher m. (1992). classification of non-linear-separable real-world-problems using delta-rule, perceptrons, and topologically distributed encoding, proceedings of the 1992 acm/sigapp symposium on applied computing.
16. liu b., wang l., jin y., huang d. (2005). designing neural networks using hybrid particle swarm optimization, lncs 3496, pp. 391-397.
17. xian-lun t., yon-guo l., ling z. (2007). a hybrid particle swarm algorithm for the structure and parameter optimization of feedforward neural networks, lncs 4493, pp. 213-218.
18. niu b., li l. (2008). a hybrid particle swarm optimization for feed forward neural network training, lnai 5227, pp. 494-501.
19. seydi ghomsheh v., aliyari shooredhdeli m., teshnehlab m. (2007). training anfis structure with modified pso algorithm, mediterranean conference on control and automation.

international journal on advances in ict for emerging regions 2022 15 (3): december 2022

communication-affinity aware colocation and merging of containers
nishadi n. wickramanayaka, chamath i. keppitiyagama, kenneth thilakarathna

abstract— microservice architecture relies on message passing between services. inter-service communication introduces an overhead to the applications' overall performance.
this overhead depends on the runtime placement of the services bundled in containers, and it can be reduced by intelligently deploying the containers by considering the communication affinities between services. researchers have attempted to colocate microservices and merge containers based on affinities. however, container merging has not been considered up to now. this study shows that the problem of service placement in a microservice application considering communication affinities, constrained by computational resources, can be mapped to an instance of the binary knapsack problem. we propose a container colocation and merging mechanism based on a heuristic solution to the binary knapsack problem. the proposed approach reduced the communication overhead of a benchmark application by 58.5% and, as a result, execution time was reduced by approximately 13.4% as well.
keywords— affinity, binary knapsack, colocation, container networks, communication overhead, docker containers, microservices, microservice architecture, inter-service communication
correspondence: nishadi n. wickramanayaka (e-mail: nishadinuwa1995@gmail.com)
received: 24-12-2021 revised: 26-08-2022 accepted: 30-08-2022
nishadi n. wickramanayaka, chamath i. keppitiyagama and kenneth thilakarathna are from the university of colombo school of computing, sri lanka (e-mail: nishadinuwa1995@gmail.com, chamath@ucsc.cmb.ac.lk, kmt@ucsc.cmb.ac.lk).
doi: http://doi.org/10.4038/icter.v15i3.7251
© 2022 international journal on advances in ict for emerging regions
i. introduction
microservice architecture is used to develop software applications as suites of independently deployable small services that interact with each other [1]. it is often used to decompose an existing system rather than to compose a system anew using services offered by different enterprises. microservices are typically deployed in containers, with each service contained in a dedicated container [2], which are then hosted on multiple hosts. hence the microservices of an application exchange a significant amount of data, creating communication affinities. affinity is defined as a relation between two microservices [6], which in this study is given by the total amount of data exchanged between those two services over time. to place services in different containers, function calls in the monolithic application have to be converted to network calls between the containers in the microservice architecture. those network calls add an extra layer of networking with expensive operations such as packet encapsulation, decapsulation, and address translations [3], which ultimately increase the services' request/response time [2], [4]. therefore, the resulting communication overhead adversely impacts the overall performance of the microservice application (μapp) despite the benefits of the architecture. hence the main research problem this paper focuses on is the degraded performance of μapps due to the communications between the services of the application. container networks used to connect containers with each other play a major role in the communication complexities of microservice architecture. overlay networks are used for host-to-host communication when containers/services are deployed in different hosts, whereas bridge networks are used when containers/services are deployed within the same host. processes inside the same container communicate over the loopback interface. the overhead imposed by an overlay network is higher than the overhead imposed by a bridge network [3].
the loopback interface imposes the least overhead since it eliminates the intervention of the bridge network as well. therefore, the placement of two services with high communication affinities in different physical nodes makes this situation worse [6]. thus, it is evident that the container placement decisions of μapps need to be taken carefully at deployment time. as long as the microservice architecture is used to design an application, it is impossible to completely eliminate the communication overhead incurred at runtime, because even if all the services are located inside a single machine there will still be communication across address spaces. a possible way of addressing the problem is to reduce the communication overhead to a certain extent by making the deployment decisions carefully. further, if mapping the exact design decisions to runtime is not strictly necessary, then alterations may be applied to further reduce the communication overhead. the motivation behind this study is to increase the performance of a μapp by reducing the overhead of inter-service communication through carefully analysing the runtime behaviour of a μapp and its containers. therefore, our main objectives were to explore the impact of container networks on μapps, to discover the possibilities of reducing the communication overhead of a μapp without changing the design of the application, and finally to measure to what extent the performance can be increased by the proposed solution. in order to achieve these objectives, we present a novel mechanism of container colocation and container merging. container colocation is defined as moving the services with high communication affinities into a single host. colocation is constrained by the resources available on the hosts. we could map this problem to an instance of the binary knapsack problem (bkp). container merging is the process of executing services that are already colocated in a single container. through the colocation process the overlay network is reduced to a bridge network. the merging process reduces the bridge network to communications over the container's loopback interface. as a result, communication overhead is reduced, and the performance of the application is improved. also, this might reduce the number of hosts needed to execute the μapp. the rest of this paper is organized as follows. section ii presents a review of the background related to the study. section iii presents the design and implementation of the solution. results and the evaluation of the proposed approach are summarized in section iv. finally, section v concludes and outlines some future directions.
ii. background and related work
a. performance degradation of μapps and container networks
with the inclination of the industry towards cloud-based infrastructure, microservice architecture has received massive attention from the academic community.
hence an ample number of studies comparing μapps with monolithic applications show the performance penalty that results when using μapps [2], [4], [8]. the performance of microservices in container-based and virtual machine (vm) based environments has also been studied by salah et al. [9]. amaral et al. [10] evaluated the performance impact of two models of implementing microservices in a container environment: master-slave and nested-containers. they mention that the nested-container model is hardly adopted in real-world applications since there are some trade-offs in terms of network performance. suo et al. [3] have done a thorough investigation of the latency, throughput, scalability, and startup cost of various container networks on a single host and on multiple hosts in a virtualized environment. out of the four networking modes on a single host (none mode, bridge mode, container mode, host mode), the bridge mode network incurred 18% and 30% throughput loss in upload and download respectively, because giving each container its own isolated network namespace results in all inter-container communications going through the docker0 bridge. out of the four networking modes available in multi-host environments (host mode, nat, overlay network, routing), both nat and routing incurred considerable performance degradation due to the overhead of address translation and packet routing. however, the overlay network caused a high performance loss, with an 82.8% throughput drop and a 55% latency increase compared to the host mode. they explain the reason for the performance degradation of μapps in terms of the bridge network and the overlay network which are used to connect the containers. this study compared several container networks on a single host and on multiple hosts separately. they have not compared the single-host container networks with the multi-host container networks. further, they have conducted the experiment in vms, where an additional network overhead is introduced to containers through the virtualization. yang et al. [11] have attempted to bridge the above gap by deploying the containers on both vm and bare-metal environments. the results confirm the overhead of vm environments, with a throughput loss compared to bare-metal deployment. in all tests, the multi-host control group showed a significant throughput loss compared to the single-host control group. further, kratzke [12] has analysed the performance impact of the overlay network in terms of encryption for http-based and rest-like services. even though these analyses show the impact of container networks on imposing communication overhead, they have only considered inter-container communications, and none of these studies have considered intra-container communications. hence, it is identified that intra-container communications should also be taken into consideration in order to further explore ways of reducing the communication overhead.
b. container placement problem
based on the aforementioned studies, the placement of the containers has been identified as one of the major factors in creating these communication affinities. hence, it is pertinent to explore the state of the art of the container placement process in practice. however, identifying the best placement of containers is not an easy task. existing container management tools implement several common placement strategies. kubernetes [13] places a minimum number of microservices per host in the cluster [6]. this is called the spread strategy.
however, it can add latency to communication and lower the μapp's performance. also, this does not take resource optimization into consideration during the deployment. some management tools use the bin-pack strategy: deploying a μapp in a minimum number of hosts so that cluster resource wastage is avoided. besides these two commonly used strategies, the random strategy is also used, where the management tool selects a host on which to deploy a microservice randomly. all three strategies are available in docker swarm [14]. irrespective of the strategy, management tools only consider the instantaneous resource usage of a service when they place them on hosts and rarely try to find an optimal setting. however, they do not consider the communication affinities between services, resulting in microservices with high communication affinities being placed in different hosts. eventually, the large amount of network traffic that takes place between two services over the network can hinder the overall performance of the application. sampaio et al. [6] propose remap, a runtime microservices placement mechanism. they consider microservices' resource usage as well as their affinities when placing the microservices in hosts. this problem is modelled as an instance of the multi-dimensional bin-packing problem. the objective of remap is to maximize the affinity score while deploying the microservices in a minimum number of hosts during the runtime. in solving the problem, they have used first fit as a heuristic in their approach, since a runtime placement needs quick solutions. remap instruments the microservices to gather the information required to take colocation decisions during the runtime. however, we noticed the heavy cost of this instrumentation on the microservices. hence, the benefits derived through colocation are negatively affected by the instrumentation cost. further, remap cannot handle data synchronization across different hosts after migrating a stateful microservice. hence, the migration of stateful microservices may leave the μapp in an inconsistent state. in addition, the runtime migration cost may not be justifiable compared to the benefits derived from colocation. remap does not use the hints available in the configuration files about the resource usage or the relationships between microservices indicated in them. we further noticed that remap does not consider container merging at all. han et al. [15] propose a refinement framework for profiling-based microservices placement to identify and respond to workload characteristics. the resource requirements obtained through profiling are fed into a greedy-based heuristic algorithm to make microservices placement decisions. however, the main focus of this work is not a placement algorithm but a profiling-based framework for microservices deployment. hence any placement algorithm can be adapted to their framework. both of the aforementioned solutions depend on data collected at run time. however, we have identified that some essential parameters that are required to take the placement decision are already available in configuration files even before the runtime of the μapp.
vm placement problem once the containers are mapped to vms and communications between containers to the communications between vms, virtual machine placement in physical machines (pms) can be considered as the closest research area to the container placement problem. tziritas et al. [7] propose a communication-aware graph-coloring algorithm, placing the vms in the underlying system in an energyefficient manner while optimizing the network overhead due to the vm communication inter-dependencies. however, that vm selection process cannot be directly mapped into the container selection process as their study pre-defines the number of servers to place vms and container placement may not necessarily give the number of host machines to place the containers as the problem is to optimize the placement of services, thus the algorithm itself should be able to identify the minimum number of hosts to locate the services. however, it is possible to map their vm communication graph to a container/service communication graph. chen et al. [5] propose a different approach for the vm placement problem which is an affinity-aware grouping method for allocation of vms into pms based on a heuristic bin-packing algorithm. it groups the vms based on the affinities among them and then allocates those identified vm groups into a minimum number of pms using the bin packing strategy. one major limitation of this research is the generation of one large vm affinity group with total resource requests overstepping the pm resource limit. sonnek et al. [16] present a decentralized affinity-aware migration technique that incorporates heterogeneity and vm communication cost to allocate vms on the available physical resources. their technique monitors network affinity between pairs of vms periodically and triggers the migration if inter-server traffic exceeds intra-server traffic and uses a distributed bartering algorithm, to dynamically adjust vm placement such that communication overhead is minimized. since the migration also has a cost, they refrain from migrating vms if it results in only minor benefits. iii. methodology the proposed solution comprises of two phases: • container colocation: moving containers that are initially deployed in different hosts into a single host. • container merging: placing the services that are initially deployed in different containers inside a single container. containers deployed in multiple hosts are connected through overlay networks and containers in the same host communicate through a bridge network. as mentioned before, overlay networks impose a higher overhead than bridge networks [3], [11], [12]. hence during the colocation phase services with high communication affinities are identified to deploy on a single host. this is not a trivial task since the colocation is constrained by the processing resources available on the host. in this study, we propose a novel approach to solve the colocation problem by mapping it to an instance of the bkp. current approaches to reduce the communication cost use only the container colocation process [6] which replaces the overlay network with bridge network. it is evident that the elimination of this bridge network should further reduce the overhead in inter-service communication. this study introduces a novel concept of merging the colocated containers to further reduce the communication overhead by eliminating the bridge network as well. once two containers are merged, services deployed on them would execute on a single container as two processes. 
these two processes communicate over the container's loopback interface. the merging process converts the inter-container communications into intra-container communications.
the spread strategy deployment of docker swarm is considered the baseline for this study. this deployment strategy distributes services evenly among the hosts, resulting in a minimum number of services per host. thus, we consider this spread deployment of service instances as the baseline, since no optimizations are present in that strategy. the change in the number of containers and hosts throughout the colocation and merging process is shown in table i.

table i
change of the number of containers and hosts through the process

              initial deployment   after colocation   after merging
services      n                    n                  n
containers    n                    n                  m (m < n)

tan ρ > tan α (2)

this implies that there is an increase in geographical coverage as one moves from oral advertisement to internet advertisement due to the number of audiences reached. more so, it is assumed that internet advertisement covers everyone who has access to the internet, anywhere and at any time [21]. the gap between internet advertisements and oral (or rather radio-vision) advertisements in terms of audiences reached in fig. 1 signifies the gap between the smes in developing nations and the multi-nationals in these nations in terms of market share. the internet serves as a levelling ground for both multi-nationals and smes in developing countries, if the smes can pool resources together so as to get access to the internet for their marketing. therefore, it would be an advantage for smes to avail themselves of what the internet offers. the major problem for these smes is the initial capital outlay required for internet infrastructure and implementation. this is the reason for this discussion.

fig. 1. market coverage model.

iii. e-shops and developing countries
an average e-shop that markets its wares in the developed world may not function with the same aplomb in developing countries. this is due to some peculiarities found in these developing countries. it is a well-known fact that a technology that succeeds in one country may not in another. an example is minitel, which worked so well and was so popular that it became an instant success in france, yet did not quite succeed in some other countries. in most developing countries, especially in africa, there are a number of intimidating problems, ranging from the availability of infrastructure to bandwidth problems and even logistics problems. as one would have noticed, small and medium scale enterprises vary in size and capacity from country to country. in most developing nations, the majority of these enterprises are made up of just one employee (the owner, who gets help once in a while from a member of the family). this is to say that if such a person is able to put his wares on the net (on an e-shop), it will come as a big relief to him, as attending to individual clients who come to shop would no longer be a problem. such a person could gain enough time to get into other projects. some of the problems that confront the setting up of an e-shop in developing countries, which we encountered in the course of this study, include but are not limited to the following:

a. lack of national or initial infrastructure
our country of study is nigeria, and the first basic thing we encountered was the absence of basic infrastructure like telephone lines (adsl lines, vsat, fibre cables etc.) that could allow a group of smes to set up an e-shop. this factor has contributed greatly to the backwardness of e-business. at first, this venture did not make an interesting adventure to take as there is no guarantee that the e-buyers may ever get to see and appreciate the goods on offer. to this effect, the prospects of developing the site became uninspiring. eventually, an advert by a gsm operator that it gets up to a million hits per day changed the whole story. it is now a question of making the site as popular as possible in order to get people to start logging on to it. initial cost of putting up such sitesb. as earlier highlighted, the notion of smes in developing countries cannot be compared to that of developed nations. the turnover of most smes in developing countries is barely able to pay for hired hands. in fact, most of the smes are also seen as family businesses where the business is overseen by a member of the family (and this usually is for free, that is, none paid). so in essence, any extra cost that may come into the business is avoided as much as possible. the initial cost of buying the equipment and other necessary things to get the market started might become prohibitive for the economy of such an sme. in this work, we formulated an e-shopping platform where a number of smes can jointly own a website and different smes will own their webpage on the same server mainly for advertisement of goods and services. the server will be jointly owned and will be jointly administered by the group of smes. for security purpose, we advised that smes should allow web administrator employed by each to own passwords to the system. logistics and distributionc. distribution of mails or regular posts is one of the most challenging problems in developing countries. there are so many reasons for these: indiscriminate buildings that make addressing december 2009 the international journal on advances in ict for emerging regions 02 42 e. a. olajubu, b. s. afolabi and a. o. ajayi e. a. olajubu, b. s. afolabi and a. o. ajayi difficult, irregularities in the functioning of local postal services that make delivery take longer than necessary, transportation problems (e.g. bad roads, inadequate transports, etc.). payment methodsd. until recently, most developing nations did not use any other payment facility than cash. the number of residents having access to credit or debit cards was limited and as such the possibility of internet payments was therefore limited to these privileged few. therefore, in order that other people may have access to these types of shops other means of payment must be explored. lack of truste. in nigeria in particular, there is a major problem of trust among the citizens. this is a major problem as a minimum level of trust must be displayed between participants of the marketplace. a buyer must have a certain amount of trust that what the seller is selling is what he/she claimed to be selling and the seller must believe that the buyer is giving the right information about himself/herself. these are few of the challenges facing the smes in this region, now we proceed to discuss some of the fundamentals of the web model. basic web process modeliv. according to [4], the most basic process model used in web site development should be familiar to most people, as it is deductive. 
the basic model starts with the big picture and narrows down to the specific steps necessary to complete the site. in software engineering, this model is often called the software lifecycle model, because it describes the phases in the lifetime of software. for a trade to occur between buyer and seller, a certain process must occur. fig. 2 depicts the architecture adopted in this study which illustrates the basic processes involved in e-shopping between a buyer and seller. fig. 2. system architecture. a customised electronic shopping system consists of five major services: an e-buyer agent, an e-shop for made-to-specification goods, suppliers that stock or manufacture goods to be sold through the e-shop, shipping agency that ships the ordered goods to the e-buyer and a payment validation service. from the architectural model, the buyer logs on to the marketplace, through the universal resource locator (url) of the website, the buyer then searches for the product, when the product has been found, the buyer selects product using the shopping cart, the e-buyer sends details of its requirements for a particular product to the e-shop. the e-shop provides a list of suppliers that can possibly meet those requirements and for which it works as a price negotiating agent. the e-buyer forwards the product specification to the chosen supplier(s), who confirm(s) if it is possible to supply the product. the suppliers forward the costing of the order to the e-shop. the e-buyer negotiates the price with the e-shop and places the order. the supplier contacts a shipping agency about delivering the order. the shipping agency replies with the pickup date. finally the shipping agency informs the e-shop and the e-buyer about the delivery date. the e-buyer sends a confirmation on receiving the placed order. since the goods are made-to-specification, cancellation of an order is allowed only within 24 hours of placing the order. steps in the electronic shopping systema. the following illustrates the processes involved in electronic shopping systems. the e-buyer logs on to the electronic marketplace through several means such as locating the site through an internet search engine like google.com, find.com, and yahoo.com or by using the universal resource locator (url) of the e-shopping site e.g. www.buynow.com. when the e-buyer is logged on to the e-shopping website, he/she searches for products by using available category hyperlinks. some e-shopping sites also have a search engine facility that enables e-buyers to search for specific products. when the e-buyer is satisfied with a particular product, he/she selects the product into an electronic shopping cart, which is similar to the conventional cart or trolley used in departmental stores. the e-buyer specifies that the products that have been selected into the shopping cart be sent to the e-shop for processing. the e-buyer forwards the product specification to the chosen supplier(s), who confirm(s) if it is possible to supply the product. the e-shop confirms that the order has been received from the e-buyer and confirms if the products ordered are available from the manufacturer. the supplier forwards the costing of the order to the e-shop. the e-buyer negotiates the price with the december 2009 the international journal on advances in ict for emerging regions 02 e. a. olajubu, b. s. afolabi and a. o. ajayi 43 43 e-shop and places the order. the e-buyer makes his orders by using electronic procurement facilities like credit cards and debit cards. 
the procurement information supplied by the e-buyer is sent to a transaction handlers bank (automated clearing house) which validates the information supplied by the e-buyer. the transaction handlers bank checks credit card or debit card information for authentication. if the credit information is authentic, then the transaction handlers bank sends a confirmation to the e-shop that the credit has been approved. the e-shop then commences the processing of the order, by sending the e-buyer’s transaction details to the supplier. the supplier contacts a shipping agency about delivering the order. the shipping agency replies with the pickup date. steps taken to correct peculiar problems b. of smes in developing nations some of the problems with developing countries highlighted in section 3 are so peculiar with developing countries that they need to be solved locally or provide a means of contouring the problems. these problems are either solved or avoided. some of the problems earlier highlighted are taken care of by using the following: infrastructural problems are taken care of by 1. hosting the sites outside of the country (i.e. in developed countries). the uploading of the sites could be done from a cyber café or on the isp server based on negotiations. hosting of these sites outside of the country 2. may become expensive due to exchange rates and other reasons. in order to take care of this problem, a number of smes can come together to host a site so that the cost is spread and borne by all. another sme could be created to cater for the hosting and the general well being of the site. distribution and logistic problems can be taken 3. care of by the creation of collection points. this avoids the delays experienced through the use of postal services that are comatose. collection points are created in each of the major centres/ towns. the goods for each customer are delivered using the nearest collection point to him/her as there can be many collection points in each town depending on the number of customers, and the buying frequency of customers in that town. 4. until recently no debit or credit cards existed in nigeria or most of the developing countries. so the method of payments became an issue. with the arrival of debit cards, this problem was reduced but even with that, there are still a quite number of people who do not have access to debit cards. for this category of people, we suggest the use of telephone recharge cards. since nigeria is now pervaded with phone cards, the buyer can buy a telephone recharge card of equivalent value as the price of the goods he/she is buying. these credits (recharge cards) can be sent to the seller and (buyer’s) goods can then be sent to the collection point nearest to the buyer. 5. the reputation of the individuals hosting the site can serve as a means of avoiding such a problem. requirementsv. due to some specific problems encountered in developing countries, for instance, in terms of bandwidth availability, the system should be such that it can load as fast as possible with a reduction in the number of images used. we have also reduced the size of the images used. this was done in such a way as not to impair too much the quality of the images but to reduce the resolution of images which in turn will reduce size and considerably, the download time. accesses to the internet have greatly increased with the proliferation of cyber cafés that are now available in numbers in major cities. 
this also comes with its attendant problems, most especially cyber crimes. in order to aid the crime fighters as much as possible, the system tracks the ip addresses of the machines that are connected to it from which ever location. also for now, credit or debit cards will have ip address attached. this is also a cyber crime combating measure. although this is known to be limiting and inefficient, alternative measures are being explored to reduce frauds. end user requirementsa. the end user would be required to use the application from any web browser supporting html 3.2 (or later standard) and cookies, this is due to some of the security features built into it. it is also necessary to uniquely identify each user; therefore each user must register to be able to use the system. each user registration is therefore a property of the user and he/she takes responsibility of whatever action that takes place. security of sensitive information (like scratch card number etc.) is of paramount importance, so the site is built on ssl. the issues involved in this are left out of this write-up. users are able to search the entire database based on category, on each item, or a keyword. they can change items, quantities, etc., they can also view the status of their order. december 2009 the international journal on advances in ict for emerging regions 02 44 e. a. olajubu, b. s. afolabi and a. o. ajayi e. a. olajubu, b. s. afolabi and a. o. ajayi end user interaction with the shopping b. cart application this is a sequence of actions to be carried out by typical users, who visit the web site for shopping. new users coming to the site for the first time will register themselves while existing users will authenticate themselves by providing valid user identity and password. after a successful authentication procedure, the users, can browse product titles, add items to the shopping cart, view the list of items in their cart, change the quantities of items or even delete items selected earlier. when the user has made up his/her mind to buy items in the cart, he/she then checks out. checking out is the confirmation of user’s willingness to buy items in the cart. after the user has checked out, the items in the user’s cart are entered in the database, and the items are later shipped to the address of the user given at the time of registration. after checking out, the user can continue shopping or can decide to logout. users can also browse music and book titles available in the shopping cart site, without purchasing any items. sometimes users would be interested in viewing the current status of their accounts, and the status of items, (e.g. whether they have been shipped or not), purchased by them earlier. some of the screen shots of the software are provided below. fig. 3. screen shot for goods and services. fig. 3 is the screen shot for the type of services offered at the website, fig. 4 is the screen shot specifically for gsm phone vendors. also, fig. 5 presents a retailer whose goods cut across range of products and lastly fig. 6 is the screen shot for customers who intend to transact business in the website. fig. 3 presents the security information, and also the type of goods and services provided by the website. the password is expected to be owned by the web administrator, who manages the web site. fig. 4. gsm phone vendor screen shot. fig. 4 shows the webpage for mobile phone vendors. there can be multiple of pages depending on the number of gsm phone vendors involved in the e-shop. 
a webpage is meant to be owned by a vendor irrespective of the type of phones. fig. 5. vendor with variety of goods. fig. 5 displays some vendors having wares cut across different items. in our environment many retailers have wares from different manufacturers. the main idea is that each vendor has one webpage except those who can afford multiple web pages on the server. december 2009 the international journal on advances in ict for emerging regions 02 e. a. olajubu, b. s. afolabi and a. o. ajayi 45 45 fig. 6. registration point for potential customer. fig. 6 is meant for customers who intend to buy from the website. ordering goods from the website requires that the customers register formally with the website. conclusionvi. this work was carried out with the view of creating an enabling environment for the successful implementation of an e-shopping system for smes in developing countries. the following areas of e-shopping system have been successfully implemented: new users can be registered; existing users can login and check their account status, users can buy products online from our shopping cart application, users can view a complete list of products available on the shopping cart, users can search for products online, users can search the entire database for keywords, they can choose and add products to their cart, and decide later whether they would like to buy the selected products, users can change the quantities of the items or delete items from their cart, before checking out, users can view the status of products they have ordered, large number of users can use the application simultaneously. majority of the specific problems facing developing nations in terms of use of e-shops have been addressed. references m. bah, s. cissé, b. diyamett, g. diallo, f. lerise, d. [1] okali,e. okpara e, j. olawoye and c. tacoli, changing rural–urban linkages in mali, nigeria and tanzania environment&urbanization, vol 15 no 1 april 2003 pp. 13-24. a. beyene, [2] enchaning the competitiveness and productivity of small and medium scale enterprises (smes) in africa: an analysis of differential roles of national government through improved support services african development, vol. xxvii no 3 pp130-156. [3] r. kalakota and a. b. whinston, electronic commerce: a manager’s guide (reading, ma: addison-wesley, 1997). [4] t. a. powell, web design: the complete reference second edition, mcgraw-hill companies, inc, usa, 2002. [5] g. p. schneider, e-commerce, massachusetts: course technology-thomson learning, inc., canada, 2002. [6] solanki, a wide spectrum language for designing web services, software technology research laboratory, leicester, uk, 2001. [7] f. n. udechukwu, survey of small and medium scale industries and their potentials in nigeria, a seminar on small and medium industries equity investments scheme (smieis) a publication of cbn training centre, lagos, central bank of nigeria, 2003. [8] e. turban, d. king, j. lee, and d. viehland, electronic commerce: a managerial perspective, new jersey: pearson prentice hall, 2004. [9] m. ruzzier, r. b. hisrich and b. antoncic, sme internalization research: past, present, and future, journal of small business and enterprise development 13(4):476-497, 2006. [10] d. j. pare, does this site delivers? b2b e-commerce services to developing countries, information society 19:123-134, 2003. [11] w. poh-kam, global and national factors affecting diffusion of e-commerce in singapore, information society 19:19-32, 2003. [12] a. molla and r. 
heeks, exploring e-commerce businesses in developing country, information society 23:95-108, 2007. [13] w. jen-her and h. tzyh-lih, developing e-business dynamic capabilities: an analysis of e-commerce innovation from i-, m-, to u-commerce. journal of organizational computing and electronic commerce, 18: 95-111, 2008. [14] m. kapurubandara and r. lawson, availability of e-commerce support for smes in developing countries, the international journal of advances in ict for emerging regions, 01(01): 3-11, 2008. [15] http://www.digitsmith.com/e-commerce-definition.html (october 23, 2009) [16] e-business guide for smes available at http://ec.europa. eu/enterprise/e-bsn/ebusiness-solutions guide/docs/ ebusiness_guide_for_smes.pdf. (october 24, 2009) [17] a. scupola, smes e-commerce adoption: perspective from denmark and australia, journal of enterprise information management, 12(1):152-166, 2009. [18] c. l. iacovou, i. benbasat and a. s. dexter, electronic data interchange and small organizations: adoption and impact of technology, mis quarterly, 19(4):465-485, 1995. [19] k. kuan, and p. chau, a perception-based model of edi adoption in small businesses using technologyorganization-environment framework, information and management, vol.38, pp. 507-521, 2001. [20] a. scupola, the adoption of internet commerce by smes in the south of italy: an environmental, technological and organizational perspective, journal of global information technology management, 6(1): 51-71, 2003. [21] t. daugherty and b. b. reece, the adoption of persuasive internet communication in advertising and public relations curricula. journal of interactive advertising, 3(1), 2002. [22] l. arendt, barriers to ict adoption in smes: how to bridge the digital divide? journal of system and information technology, 10(2):98-108, 2008. [23] j. liebenau, g. harindranath and g. b. özcan, smes productivity and management: a research agenda for ict and business clusters, 2004. december 2009 the international journal on advances in ict for emerging regions 02 46 e. a. olajubu, b. s. afolabi and a. o. ajayi e. a. olajubu, b. s. afolabi and a. o. ajayi [24] s. chirasirimongkol and w. chutimaskul, information technology for thai leather sme development, fourth international conference on e-business, bangkok, thailand, 26.1-7, 19-20 november 2005. [25] j. yang, small and medium enterprises (sme) adjustments to information technology (it) in trade facilitation: the south korean experience, 2009. [26] hauser, a qualitative definition of sme, http://www.oecd. org/dataoecd/32/14/35501496.pdf. (september 5, 2009) [27] k. hak-su, small and medium enterprises and ict, http:// www.unapcict.org (september 6, 2009) ieee paper template in a4 (v1) international journal on advances in ict for emerging regions 2022 15 (3): december 2022 international journal on advances in ict for emerging regions improving drug combination repositioning using positive unlabelled learning and ensemble learning yashodha ruchini maralanda, pathima nusrath hameed abstract— drug repositioning is a cost-effective and timeeffective concept that enables the use of existing drugs/drug combinations for therapeutic effects. the number of drug combinations used for therapeutic effects is smaller than all possible drug combinations in the present drug databases. these databases consist of a smaller set of labelled positives and a majority of unlabelled drug combinations. 
therefore, there is a need for determining both reliable positive and reliable negative samples to develop binary classification models. since we only have labelled positives, the unlabelled data has to be separated into positives and negatives by a reliable technique. this study proposes and demonstrates the significance of using positive unlabelled learning for determining reliable positive and negative drug combinations for drug repositioning. in the proposed approach, the dataset with known positives and unlabelled samples was clustered by a deep learning-based self-organizing map. then, an ensemble learning methodology was followed by employing three classification models. the proposed pul model was compared with the frequently used approach that randomly selects negative drug pairs from unlabelled samples. a significant improvement of 19.15%, 20.56% and 20.23% in the precision, recall and f-measure, respectively, was observed for the proposed pul-based ensemble learning approach. moreover, 128 drug repositioning candidates were predicted by the proposed methodology. further, we found literature-based evidence to support five drug combinations that may be repositionable. these discoveries show that our proposed pul approach is a promising strategy for drug combination prediction for repositioning.

keywords— drug repositioning, positive unlabelled learning (pul), deep learning, self-organizing maps (som), support vector machine (svm)

correspondence: yashodha ruchini maralanda (e-mail: yashodar95@gmail.com). received: 24-08-2021, revised: 04-01-2023, accepted: 11-01-2023. yashodha ruchini maralanda and pathima nusrath hameed are from the department of computer science, university of ruhuna, sri lanka (yashodar95@gmail.com, nusrath@dcs.ruh.ac.lk). doi: http://doi.org/10.4038/icter.v15i3.7232. © 2022 international journal on advances in ict for emerging regions.

i. introduction
introducing a new drug to the market is time consuming and costly. it takes nearly seven to fifteen years to introduce a new drug to the market, and the whole process costs approximately $700-$1000 million, since a drug has to undergo a massive experimental procedure before reaching the hands of patients [1]. therefore, most pharmaceutical companies and medical research institutes are trying to find alternatives which can be used to prevent and cure human diseases. as one of the most efficient and trustworthy approaches, repurposing, or the reuse of existing drugs as treatments for other diseases that still do not have proper treatments, has been an emerging topic over the last decade. this concept is known as drug repositioning or drug repurposing. moreover, drug combination treatments are identified to be efficient in avoiding drug resistance when treating complex diseases like cancer [2]. since there exist approximately 16,000 approved drugs in the market [3], an extremely large number of drug combinations can be formed. however, only a very small number of them are confirmed by experimental research. therefore, there is a need for an accurate and predictive approach to infer useful drug combinations from the millions of possible drug combinations that remain unlabelled. existing drug combination repositioning approaches have followed binary classification [4]–[6] as well as several other approaches such as tree-based techniques [8] for repositioning of drug combination data.
in the existing binary classification approaches, the unlabelled samples were considered as negatives [4]–[6]. therefore, the results of existing studies might be unreliable and inaccurate, and may cause the loss of valuable and repositionable drug combinations. in this study, a positive unlabelled learning (pul) based approach was proposed to address this problem. it uses a deep learning-based unsupervised clustering approach followed by binary classification, enabling us to select reliable negatives for binary classification. the unsupervised clustering method was based on a deep learning model using identified drug-drug similarities. the clusters with the least significant presence of known drug combinations were considered the clusters containing negatives. our model has been compared with the frequently used binary classification approach that randomly selects samples as negatives from unlabelled data. thereby, model predictions were evaluated and the significance of the pul approach has been highlighted. to the best of our knowledge, this is the first attempt focusing on learning from positive unlabelled data for drug combination repositioning using drug-based features. section ii gives an overview of the existing literature in the domain and its limitations, stating the need for and the importance of our work. in the materials and methodology section, our dataset and our research workflow are explained in detail. then, in the results section, we illustrate the results relevant to the pul-based ensemble learning methodology and the final predictions. next, in the discussion section, we emphasize the significance of the proposed pul approach, future work, and literature-based evidence for some of the predicted results. finally, section vi provides the concluding remarks of this study.

ii. related work
drug repositioning via in-silico methods has become popular and there exist many successful efforts in this domain. the majority of them are single-drug repositioning approaches, while a considerable number have focused on repositioning of drug combinations as novel therapeutics for diseases. moreover, machine learning techniques such as support vector machine (svm), naïve bayes, logistic regression and random forest, as well as deep learning techniques including deep neural networks, convolutional neural networks and deep feed forward networks, were employed. the use of unlabelled data in binary classification involves different methods. one common paradigm is the random selection of samples from unlabelled data as labelled negatives. li et al. [4]'s study was on repositioning of drug combinations, formulated as a binary classification problem. their dataset was composed of a majority of unlabelled data and a comparably smaller set of labelled positive samples.
however, they have taken all the unlabelled samples as negative rather than selecting the plausible positives and negatives in unlabelled data and have further checked for overfitting by varying the positive and negative sample ratio from 1:12. finally, they have chosen 1:1 as the most appropriate ratio since it has produced the best result. even though their study was involved with an ensemble learning methodology, the reliability of the predictions might not be satisfiable because of the random selection of the negative sample. similarly, chen et al. [5] has used this approach of random negative selection from a set of drug combinations, which do not have a proper labelling. they have carried out a binary classification of the selected labelled positives and randomly sampled negatives via random forest based on chemical interactions between drugs, protein interactions between drug targets and the target enrichment of kegg pathways. furthermore, map reduce programming model was used together with svm and naïve bayes classifiers to identify novel drug combinations by sun et al. [6]. their negative dataset was composed of randomly paired drugs, which were belonging to the 103 single drugs that have been selected from dcdb [7]. treecombo [8] is another work, which has used a tree based approach to predict drug combinations with the use of physical and chemical properties of drugs together with gene expression levels of cell lines. use of clinical side-effects to predict drug combinations has been tested by huang et al. [9]. they have applied logistic regression and predicted drug combinations based on their clinical side effects. here, they have categorized drug combinations as safe and unsafe by using three key side effects that were identified as more contributing towards model performance. nllss [10] was another approach that has integrated known synergistic drug combinations, unlabelled combinations, drug-target interactions and drug chemical structures to predict synergistic drug combinations. moreover, they have followed a different method by involving loewe score [11] for drug combinations and they have classified data into principal drugs and adjuvant drugs based on a set of rules. kalantarmotamedi et al. [12] applied a random forest approach with transcriptional drug repositioning in order to identify synergistic drug combinations against malaria. li et al. [13] has implemented pea; an algorithm to model drug combinations using a bayesian network which was also integrated with a similarity algorithm. shi et al. [14] have used matrix factorization to predict potential drug-drug interactions (ddis) between two drugs as well as between a set of drugs by using side effect information of known drugs. moreover, they have introduced the ability of predicting the interaction between new drugs with another new drug that has no yet approved interactions. apart from machine learning, recently, deep learning has grabbed more interest in the domain of drug combination repositioning. several studies have been carried out in order to predict novel drug candidate pairs. matchmaker [15] is a supervised learning framework implemented based on a deep neural network to predict drug synergy scores referred to as loewe score. chemical structures and untreated cell line gene expression profiles of drugs were utilized with three separate sub-networks where two of them are parallel executions for separate drugs in a pair and the third subnetwork is for the whole drug pair. 
deepsynergy [16] for predicting anti-cancer drug synergy is based on a feed forward neural network which takes three inputs including chemical descriptors from two drugs and the genomic information of the cell line. the output from the network was the synergy score for the given input drugs. these synergy scores then decided whether the drug combination is positive or negative. lee et al. [17] have used deep feed-forward networks to predict ddi effects based on a set of drug features; structural similarity profiles, gene ontology term similarity profiles and target gene similarity profiles. in order to perform feature reductions, they have used autoencoders, which was proven to have improved performances rates than principle component analysis (pca). reduced profile pairs were then concatenated and fed to the network. rmsprop and adam were used as optimizers with the autoencoder and deep feed forward network respectively. autoencoders were trained twice in order to predict ddi types more accurately. li et al. [18] have presented a novel convolutional neural network based model which is capable of predicting indications for new drugs by identifying the relevant lead compounds using the drug molecular structure information and disease symptom information. under this, they have constructed similarity matrices out of the above two vectors of information and they were mapped into one grey scale image. finally, this was used as the input to a convolutional neural network model and that was executed using matlab software. here, they have used stochastic gradient descent as an optimizer. peng et al. [19] has performed a prediction of drug-drug interactions using a deep learning model. they have taken true positive and true negative drug combinations from the dataset under first approach and true positive and sampled negative drug combinations in their second approach. lee et al. [20] has involved drug pairs but it is not an approach for prediction of drug combinations as repositioning candidates. they have used deep feed-forward networks to predict drugdrug interaction effects based on a set of drug features. zhang et al. [21] have implemented an ensemble model for ddi predictions. they have followed a semi-supervised learning approach because they wanted to identify unobserved ddis, which might be available among other possible drug pairs. this is similar to identifying possible positive samples out of an unlabelled sample. improving drug combination repositioning using positive unlabelled learning and ensemble learning 74 december 2022 international journal on advances in ict for emerging regions pul has become an emerging topic since most of the natural data exist as positive and unlabelled data samples rather than having already defined positive and negative samples. there are several researches carried out under learning from pu data. sellamanickam et al. [22] have proposed a ranking based svm model (rsvm) where the positive samples obtain higher scores than the unlabelled samples. a threshold parameter was estimated to form their final classifier. liu et al [23] have followed a similar approach. they have introduced a novel computational framework for drug-drug interaction prediction with dyadic pul. they have identified the lack of a reliable method for separation of their unlabelled data into positives and negatives. therefore, they have introduced a scoring function and assigned a certain score to each data pair. 
according to the assigned scores, they have separated the data into positives and negatives by taking the top-scoring data pairs as positives while keeping the lower-scoring data pairs as negatives. the top-scoring data pairs were defined as the samples that obtain a higher score than the average score of the unlabelled data pairs. further, zhao et al. [24] have proposed a method for protein complex mining by employing svm with the use of pul. they have introduced an efficient subgraph searching method that can search complex subgraphs. first, they have tried to express the traditional training dataset with positive and negative samples as a non-traditional training set with positive and unlabelled samples. then they have tried to identify the relationship between the two classifiers that were trained with those two types of training samples. even though there are studies that have used pul, to the best of our knowledge, no study was identified that has specifically focused on learning from positive unlabelled data in the drug combination repositioning domain using drug-based features. since drug combination repositioning is an interesting and active topic, and pul is similarly an emerging field, we can identify the need for a pul study related to drug combination repositioning. the primary objective of our study is to introduce a new, reliable computational method for pul-based drug combination repositioning.

iii. materials and methodology
a. dataset
in order to demonstrate the effectiveness of the proposed approach, 183,315 drug combinations of 606 drugs, collected from li et al. [4]'s study, were used. the drug target similarity, drug indication similarity, drug structure similarity, drug expression similarity and drug module similarity of the above drug combinations were also collected from li et al. [4]'s study. they consist of jaccard coefficients representing the above similarities between drug pairs. using them, we constructed a drug combination similarity matrix with the corresponding five feature similarity scores; the resulting file had dimensions of (183,315, 5). li et al. [4]'s study was composed of 1,196 labelled positive drug combinations for the 606 drugs of interest. after separation of the labelled positives, there were 182,119 drug combinations in the unlabelled dataset (supplementary files s1 and s2).

fig. 1 workflow of the random and pul approaches

b. proposed methodology
the concept of learning from positive and unlabelled data is a setting where we have only a majority of unlabelled data and a set of already labelled positive data. even though it is as yet unlabelled, this set of unlabelled data may contain both positive and negative samples. with the pul technique, we try to identify them separately. the concept of pul has drawn the attention of researchers due to its ability to provide reliable solutions. the surge of this technique has diminished the need for fully supervised data in computational model-driven research work and has enabled the involvement of unlabelled data in such learning processes. many applications and research works have utilized this concept. unlabelled drug combinations may comprise plausible negative samples as well as repositionable drug combinations.
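to make the data preparation concrete, the following minimal sketch shows how a dataset of this shape could be assembled in python; the file name and column names are hypothetical placeholders of our own and are not taken from the original study, which only reports the matrix dimensions and the positive/unlabelled split.

```python
import pandas as pd

# hypothetical file and column names; the study distributes the data as
# supplementary files s1 and s2, whose exact layout is not shown here.
FEATURES = ["target_sim", "indication_sim", "structure_sim",
            "expression_sim", "module_sim"]          # five jaccard similarity scores

pairs = pd.read_csv("drug_pair_similarities.csv")    # one row per drug pair
X = pairs[FEATURES].to_numpy()                       # feature matrix of shape (183315, 5)

positives = pairs[pairs["label"] == 1]               # 1,196 labelled positive combinations
unlabelled = pairs[pairs["label"].isna()]            # 182,119 unlabelled combinations
print(len(positives), len(unlabelled))               # expected: 1196 182119
```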
therefore, there is a need for a proper mechanism to identify the most probable set of negative samples in order to develop a reliable classification model. we have introduced a novel pul approach for drug combination repositioning. our proposed method enables learning from positive and unlabelled drug combinations in order to identify plausible negatives as well as plausible positives within the majority of unlabelled data. we proposed pul using a deep learning and ensemble learning methodology to predict reliable drug combinations for repositioning. here, we have used two approaches to determine negative drug combinations from the unlabelled dataset: firstly, the frequently used random selection of negatives from unlabelled data, and secondly, the proposed pul using deep learning and ensemble learning. fig. 1 and fig. 2 illustrate the complete workflow of the two approaches. we demonstrate a comparison of the performance of both approaches employing the roc curve, precision-recall curve, accuracy, precision, recall and f-measure. hence, the significance of the pul approach for drug combination-based drug repositioning is emphasized. furthermore, we have identified a set of plausible positive drug combinations that can be repositioned for new/rare diseases. repositioning of these predicted drug combinations needs further research with laboratory experiments and other background analysis with expert knowledge. therefore, it needs to be carried out as a separate experiment, which becomes the second phase of our research.

c. random approach
in this approach, a randomly selected sample of unlabelled drug combinations, equal in size to the labelled positive sample, was employed. our labelled positive sample was composed of 1,196 drug combinations. hence, we took a random sample of 1,196 unlabelled drug combinations as negatives. as this was a binary classification, class labels were assigned as 1 and 0 for the positive and negative classes, respectively. classification was carried out using three classifiers: svm, a stochastic gradient descent-based classifier (sgd-classifier) and a deep neural network (dnn) classifier. according to nguyen et al. [25], a train-test split of 70:30 is effective with random sampling; therefore, we decided to use the same split for both approaches. out of the positive and negative datasets, 30% was used for model testing while the remaining 70% was taken for training the model. implementation was carried out in python, with the scikit-learn library [26] for the svm and sgd-classifier and the keras library for the deep neural network. the accuracy, precision, recall and f1-scores were then recorded.

fig. 2 workflow of the ensemble methodology for inferring drug repositioning candidates

d. positive unlabelled learning (pul) approach
the labelled positive sample was the same as in the random approach, but selection of the negative sample was carried out by learning from the positive and unlabelled drug combination data. a self-organizing map (som) was used to cluster the sample of positive and unlabelled data, and the clusters were then analysed to identify plausible negative samples from the unlabelled data. for each cluster, the probability of containing labelled positive samples was calculated; we refer to this as the positive probability.
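as defined in the next paragraph, the positive probability of a cluster is the fraction of its drug pairs that are already labelled positive, and the clusters with the lowest values are aggregated to form the pool of reliable negatives. the sketch below is a simplified illustration of that step, written against the minisom library named in section iii-f and using the map size, learning rate and iteration count reported there; the function and variable names, the random seed and the use of train_random are our own choices for illustration.

```python
import numpy as np
from minisom import MiniSom

def select_reliable_negatives(X, is_positive, n_needed=1196, seed=0):
    """cluster all (positive + unlabelled) pairs with a som and draw reliable
    negatives from the clusters with the lowest positive probability.
    X: (n_pairs, 5) jaccard feature matrix; is_positive: boolean array."""
    som = MiniSom(9, 9, X.shape[1], learning_rate=0.09, random_seed=seed)
    som.train_random(X, 8000)                 # map size / rate / iterations from section iii-f

    clusters = [som.winner(x) for x in X]     # winning node (i, j) acts as the cluster id

    # positive probability of a cluster = known positives in it / its total size
    totals, pos_counts = {}, {}
    for cid, pos in zip(clusters, is_positive):
        totals[cid] = totals.get(cid, 0) + 1
        pos_counts[cid] = pos_counts.get(cid, 0) + int(pos)
    pos_prob = {cid: pos_counts[cid] / totals[cid] for cid in totals}

    # aggregate clusters from the lowest positive probability upwards until the
    # pool of unlabelled pairs is at least as large as the labelled positive set
    pool = []
    for cid in sorted(pos_prob, key=pos_prob.get):
        pool += [i for i, c in enumerate(clusters)
                 if c == cid and not is_positive[i]]
        if len(pool) >= n_needed:
            break

    rng = np.random.default_rng(seed)
    return rng.choice(pool, size=n_needed, replace=False)   # balanced negative sample
```

in this sketch the winning som node of each drug pair serves as its cluster id, mirroring the clustering role the som plays in the proposed approach.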
(positive probabilities for each cluster are provided in supplementary file s3.) we defined the positive probability to be the ratio between the number of known drug combinations in cluster i and the total number of combinations in cluster i, where i is the cluster id. since there are 1,196 known positives, we need 1,196 reliable negatives to train the binary classifier. therefore, we sorted the clusters based on their calculated positive probability values. the unlabelled drug pairs in the clusters with the lowest positive probability are considered reliable negatives. therefore, we aggregated the clusters with lower positive probability until we observed a sample size greater than or equal to 1,196. accordingly, the three clusters with the least positive probabilities were combined to get the set of least significant drug combinations. thereby we obtained 3,115 negatives by aggregating the clusters where the positive probability is less than or equal to 0.000962. since we required balanced positive and negative samples, we randomly selected 1,196 negatives from the above-identified 3,115 negatives. after selection of a negative sample via pul, the labelled positive and negative samples were classified using the svm, sgd-classifier and dnn models. since we needed to compare the performance of the random and the pul approaches, we kept the model parameters fixed to the ones used in the random approach. similarly, 30% of the data was taken as the testing set while the remaining 70% was taken for model training. then, the accuracy, precision, recall and f1-score given by each model were recorded.

fig. 3 venn diagram to denote the distribution of the data

table 1
performance assessment of the proposed positive unlabelled learning approach and the random approach

              svm                 sgd-classifier      dnn classifier
              random    pul       random    pul       random    pul
accuracy      0.6421    0.7925    0.7103    0.8774    0.7326    0.9721
precision     0.6799    0.8413    0.8036    0.9564    0.7203    0.9806
recall        0.5628    0.7454    0.5893    0.7917    0.7328    0.9646
f1 score      0.6158    0.7904    0.6800    0.8663    0.7265    0.9725

e. ensemble learning methodology
fig. 2 illustrates the ensemble learning approach used in this study. in order to predict drug repositioning candidates from unlabelled drug combinations, an averaging ensemble learning technique was used. first, class probabilities for the unlabelled combinations were predicted using the three individual models separately. then the separate probabilities of each drug combination belonging to class 0 (negative class) or class 1 (positive class) were averaged to produce a new probability for each drug combination. the new class probabilities were the ensemble learning-based class predictions. we then predicted the best candidate drug combinations.

f. clustering and classification models
1) self-organizing maps (som): som [27] is an artificial neural network which is widely used for unsupervised learning problems. the major difference of som compared to other neural network models is the use of competitive learning. som has dimensionality reduction capabilities and the ability to identify similarities in data. it is evident that deep learning models have higher performance compared to machine learning approaches [28].
so, we decided to cluster our unlabelled dataset using a minimalistic and numpy-based implementation of som known as minisom (https://github.com/justglowing/minisom/), which is a python library that adapts well to the environment in which it is used. a two-dimensional som of size 9x9 with a learning rate of 0.09, trained for 8000 iterations, was chosen as the optimal configuration. selection of the optimal map size, learning rate and number of iterations was performed after calculating the quantization and topographic errors while varying these values appropriately [29], [30]. as the first step of optimal parameter identification, a set of initial parameters needed to be determined. hence, a learning rate of 0.5 was chosen as the initial learning rate for our model. since a large dataset is used in this study, a considerably larger map size is required; therefore, the map size of the som was decided by gradually increasing the dimensionality from 7x7. hence, the initial parameters for the learning rate and map size were defined as 0.5 and 7x7, respectively. model training was carried out multiple times with a varying number of iterations and a fixed learning rate and map size, in order to record the topographic error and quantization error for each experiment. the recorded error values for the number of iterations used in each experiment were plotted (see fig. 4). according to the elbow technique, the experiment with 8000 iterations was chosen as the optimal value. once the optimal number of iterations was identified, our next experiment was carried out to identify the optimal map size. we fixed the learning rate to 0.5 and the number of iterations to 8000, and trained the model multiple times, gradually increasing the map size in each experiment. at a map size of 9x9, we could observe a clear reduction in the topographic and quantization errors, after which the error values increase again (see fig. 5). therefore, we determined 9x9 as the optimal map size. after that, we used the above-identified map size and number of iterations to determine the optimal learning rate. we set the number of iterations and map size to 8000 and 9x9, respectively, and the training process was performed multiple times for different learning rates. finally, based on the recorded error values, a learning rate of 0.09 was determined as the optimal learning rate for our problem.

fig. 4 plot of quantization error and topographic error with a fixed learning rate of 0.5 and a fixed map size of 7x7
fig. 5 plot of quantization error and topographic error with a fixed map size of 9x9 and a fixed number of iterations of 8000

2) support vector machine (svm): svm [31] is an algorithm which finds a hyperplane in an n-dimensional space, where the number of dimensions is equal to the number of features used in the dataset. it can be applied to both binary and multi-class classification problems. since this algorithm has been widely used because of its high prediction capability, we decided to use it as a binary classifier in our work. the employed svm model used a sigmoid kernel, since the sigmoid kernel is the most appropriate for binary classification problems.
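a minimal scikit-learn sketch of this svm classifier, assuming the balanced sample of 1,196 positives and 1,196 pul-selected negatives from the previous step; the stratified split, the fixed random state and the probability=True option are illustrative additions rather than reported settings, the last being included so that class probabilities are available for the ensemble averaging described later.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# X_bal, y_bal: the balanced sample of 1,196 positives and 1,196 pul-selected
# negatives with their 1/0 class labels (assumed to come from the earlier sketches)
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.30, stratify=y_bal, random_state=0)  # 70:30 split as in section iii-c

# sigmoid kernel as in section iii-f; probability=True is an addition here so
# that class probabilities can later be averaged in the ensemble step
svm_clf = SVC(kernel="sigmoid", probability=True)
svm_clf.fit(X_train, y_train)
print("svm test accuracy:", svm_clf.score(X_test, y_test))
```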
fig. 6 plot of quantization error and topographic error with a fixed learning rate of 0.5 and a fixed number of iterations of 8000
fig. 7 receiver operating curve and precision-recall curve demonstrating the performance of the deep neural network classifier for the random and pul approaches
fig. 8 receiver operating curve and precision-recall curve demonstrating the performance of the support vector machine classifier for the random and pul approaches

3) stochastic gradient descent-based classifier (sgd-classifier): this is a linear classifier provided in scikit-learn [26] that is optimized using stochastic gradient descent (sgd). it supports loss functions and penalties used for classification. further, it is capable of minimizing the loss function defined for the model. here, we have used the log loss function, with which our model acts similarly to logistic regression (lr). however, the importance of using the sgd-classifier with log loss instead of a direct lr model is that, even though lr is not capable of directly calculating the minimum value of its loss function, with the sgd-classifier we can easily do so. therefore, the performance is comparatively better, and so we used the sgd-classifier for classification in our work. even though both the log loss and the modified_huber loss options for the loss parameter of the sgd-classifier enable class probabilities to be predicted, log loss gave the best performance in our case. therefore, we employed an sgd-classifier model with a log loss function.

4) deep neural network (dnn) classifier: the dnn model, implemented using the keras library (http://github.com/keras-team/keras), was composed of a fully connected network with three layers. since the relu activation function shows better performance in the majority of current research, it was used in the first two layers, and the sigmoid activation function was used in the output layer, since this is a binary classification problem. the dimensions of the layers were selected as 5, 12, 5 and 1 for the input layer, the two hidden layers and the output layer, respectively, so as to give a better model for the classification of our dataset. we set the loss parameter to binary_crossentropy, as it is specifically designed for binary classification problems in keras. further, we employed the adam optimizer as it is well suited to instances where there are large datasets; since our prediction dataset is large, we used the adam optimizer to improve the accuracy of predictions.

g. evaluation metrics
we divided our dataset into training and testing sets in order to validate the performance of the implemented models. 70% of the dataset was used for training and 30% was used for testing. common validation measures including accuracy, precision, recall and f1-score for the random and pul approaches were calculated using the equations below, where tp = true positives, fp = false positives, tn = true negatives and fn = false negatives.

accuracy = (tp + tn) / (tp + tn + fp + fn) (1)
precision = tp / (tp + fp) (2)
recall = tp / (tp + fn) (3)
f1 score = 2 * precision * recall / (precision + recall) (4)

furthermore, the receiver operating characteristic (roc) curve is an important measure for binary classification problems, which plots the false positive rate versus the true positive rate.
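to tie sub-sections f and g together, the following sketch builds the 5-12-5-1 keras network described above and evaluates it with the measures of equations (1)-(4) together with the points underlying the roc and pr curves; the training and test arrays are assumed to come from the earlier sketches, and the number of epochs and the batch size are illustrative guesses that are not reported here.

```python
from tensorflow import keras
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_curve, precision_recall_curve)

# 5-12-5-1 fully connected network: relu hidden layers, sigmoid output (section iii-f)
dnn = keras.Sequential([
    keras.layers.Input(shape=(5,)),
    keras.layers.Dense(12, activation="relu"),
    keras.layers.Dense(5, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
dnn.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
dnn.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)  # epochs/batch size are guesses

proba = dnn.predict(X_test).ravel()        # predicted class-1 probabilities
y_pred = (proba >= 0.5).astype(int)

# equations (1)-(4)
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1 score :", f1_score(y_test, y_pred))

# points underlying roc and precision-recall curves such as those in figs. 7-9
fpr, tpr, _ = roc_curve(y_test, proba)
prec, rec, _ = precision_recall_curve(y_test, proba)
```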
g. evaluation metrics: we divided our dataset into training and testing sets in order to validate the performance of the implemented models; 70% of the dataset was used for training and 30% for testing. common validation measures, including the accuracy, precision, recall and f1-score of the random and pul approaches, were calculated using the equations below, where tp, fp, tn and fn denote true positives, false positives, true negatives and false negatives, respectively.
accuracy = (tp + tn) / (tp + tn + fp + fn) (1)
precision = tp / (tp + fp) (2)
recall = tp / (tp + fn) (3)
f1 score = 2 * precision * recall / (precision + recall) (4)
furthermore, the receiver operating curve (roc), which plots the false positive rate against the true positive rate, is an important measure for binary classification problems, and the precision-recall (pr) curve provides further information by plotting precision against recall for different thresholds. therefore, we observed the roc and pr curves for our two approaches.
fig. 8 receiver operating curve and precision recall curve demonstrating the performance of the support vector machine classifier for the random and pul approaches
fig. 9 receiver operating curve and precision recall curve demonstrating the performance of the stochastic gradient descent-based classifier for the random and pul approaches
iv. results
in comparison to the random approach for negative sample selection, our proposed pul approach demonstrates a significant improvement in performance (see table 1). the accuracy, precision, recall and f1-score for the pul approach based on the three classifiers svm, sgd-classifier and the dnn classifier are higher than the values recorded with the random approach. for instance, the f1-score improved by 17.46%, 18.63% and 24.60% for the svm, sgd-classifier and dnn classifier, respectively, when the pul approach was used. when comparing the performance of the three classifiers based on accuracy, precision, recall and f1-score, the dnn classifier shows relatively higher performance for both the random and the pul approach (see table 1), the sgd-classifier shows the second-best performance, and the svm has relatively lower performance compared with the other two classifiers. a comparison of the roc and pr curves for the random and pul approaches based on the three models also emphasizes the higher skill of the models trained under the pul approach (see fig. 7, fig. 8, and fig. 9). the roc and pr curves are drawn in blue and orange for the pul and random approaches, respectively. the x-axis of the roc curve represents the false positive rate; if this rate is close to zero, the model predicts only a few false positives. similarly, the y-axis shows the true positive rate; if this rate is close to one, the model predicts a majority of the true positives. therefore, an roc curve that bows towards the (0, 1) coordinate of the plot is considered to have higher skill than the others. the blue roc plot of each classifier bows towards the (0, 1) coordinate more than the orange plot of the random approach; hence, the roc curves emphasize the higher skill of the models trained using the pul approach. the x-axis of the pr curve represents recall; if recall is close to one, the model predicts only a few false negatives. similarly, the y-axis shows precision; if precision is close to one, the model predicts only a few false positives. therefore, a pr curve that bows towards the (1, 1) coordinate of the plot is considered to have higher skill than the others. the blue plots of the pul approach bow towards the (1, 1) coordinate more than the orange plots of the random approach, which further emphasizes the higher skill of the models trained using the proposed pul approach. a further comparison of the three roc curves shows that the dnn classifier gives the most skilled model of the three classifiers, since its roc curve bows the most towards the (0, 1) coordinate of the plot.
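as a hedged sketch, the metrics of eqs. (1)-(4) and the roc and pr curves shown in fig. 7-9 can be produced with scikit-learn as below; y_test and y_prob are assumed names for the held-out labels and one classifier's predicted positive-class probabilities, and the toy arrays are for illustration only.

```python
# sketch of the evaluation step; replace the toy arrays with real model outputs
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_curve, precision_recall_curve,
                             roc_auc_score)

def evaluate(y_test, y_prob, threshold=0.5):
    """print eq. (1)-(4) metrics and return the roc and pr curve points."""
    y_pred = (y_prob >= threshold).astype(int)
    print("accuracy :", accuracy_score(y_test, y_pred))
    print("precision:", precision_score(y_test, y_pred))
    print("recall   :", recall_score(y_test, y_pred))
    print("f1 score :", f1_score(y_test, y_pred))
    fpr, tpr, _ = roc_curve(y_test, y_prob)                  # roc: fpr vs tpr
    precision, recall, _ = precision_recall_curve(y_test, y_prob)  # pr curve
    print("roc auc  :", roc_auc_score(y_test, y_prob))
    return (fpr, tpr), (precision, recall)

# toy example; the returned points can be plotted in blue/orange for pul/random
y_test = np.array([1, 0, 1, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.1])
evaluate(y_test, y_prob)
```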
the pr curve of the dnn classifier bows the most towards the (1, 1) coordinate, indicating the smallest numbers of false negatives and false positives, which further confirms the higher skill of the dnn classifier. we built the classifiers using the svm, sgd-classifier and dnn and then combined their individual predictions to obtain the final prediction, which may reduce the variance of the final outputs. table 2 summarizes the performance assessment of the ensemble learning approach, in which the performance measures of the three classifiers are averaged. the evaluation metrics derived by the ensemble learning method show an improvement of 20.23% in the f1-score for the pul approach over the random approach. hence, the proposed pul approach outperforms the frequently used random approach and enables reliable repositioning candidates to be predicted.
table 2 performance assessment of ensemble learning
            random   pul
accuracy    0.6950   0.8807
precision   0.7346   0.9261
recall      0.6283   0.8339
f1 score    0.6741   0.8764
it should be noted that, since we identified 1,916 known positives [3] and 3,115 negatives by clustering, 179,004 unlabelled drug combinations remain for prediction (see fig. 3). we employed the three proposed pul-based classification models as base predictors of the ensemble learning methodology to classify the unlabelled samples, using the averaging ensemble learning technique. thereby we could infer 128 drug combinations with the highest posterior probabilities, greater than 0.99, and we regard this set of 128 drug combinations as potential candidates for drug repositioning (see supplementary file s4). furthermore, we employed the proposed pul approach with the three classification models to classify the 1,919 remaining negatives identified by clustering (not used to train the classification models; see fig. 3). we assessed the predicted probabilities greater than 0.5 for class 0 (the negative class) for those 1,919 drug pairs and observed that 91.40%, 95.73% and 98.59% of them were predicted as negative drug combinations by the svm, sgd-classifier and dnn classifier, respectively. similarly, the accuracy is 98.44% when the ensemble averaging technique is applied, which is higher than that of the svm and sgd-classifiers. these observations confirm the accuracy of the negatives used and, on the other hand, depict the high accuracy of the prediction models based on the proposed pul approach, as well as the significance of the ensemble learning methodology.
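the averaging ensemble and the probability thresholds described above can be expressed as the following sketch; the probability arrays are placeholders, and only the sample counts (179,004 unlabelled combinations, 1,919 held-out negatives) and the thresholds (0.99 and 0.5) are taken from the text.

```python
# sketch of the averaging ensemble over the three pul-based classifiers
import numpy as np

def average_ensemble(prob_svm, prob_sgd, prob_dnn):
    """average the positive-class probabilities of the three base classifiers."""
    return np.mean(np.vstack([prob_svm, prob_sgd, prob_dnn]), axis=0)

# probabilities for the 179,004 unlabelled drug combinations (placeholders here)
p_unlabelled = average_ensemble(np.random.rand(179004),
                                np.random.rand(179004),
                                np.random.rand(179004))
candidates = np.where(p_unlabelled > 0.99)[0]        # high-confidence positives

# probabilities for the 1,919 clustering negatives not used in training
p_negatives = average_ensemble(np.random.rand(1919),
                               np.random.rand(1919),
                               np.random.rand(1919))
negative_accuracy = np.mean((1 - p_negatives) > 0.5)  # fraction predicted negative
print(len(candidates), negative_accuracy)
```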
v. discussions
most real-world data exist as positive and unlabelled samples, and the pharmaceutical domain is no exception. several drug combination repositioning studies have used binary classification based approaches to build novel drug repositioning models. since only labelled positives exist and there are no labelled negatives, researchers use different approaches to define their own negative samples. however, directly taking unlabelled samples as negative data might not provide accurate results, since the unlabelled data may contain unidentified positive samples, which will cause the model to make wrong predictions. the problem of lacking an exact method for identifying the most probable set of negative samples from drug combination related unlabelled data has not previously been investigated, and this study addresses that gap. we used balanced samples of positives and negatives for both the random and pul approaches to train the three classification models, because a balanced sample ratio reduces the bias of the model predictions [4]. since we observed a significant improvement when the pul approach was used, it was employed to infer plausible drug combinations. we predicted the probability of each drug combination having a positive or a negative class label using the averaging ensemble learning technique, and the label with the highest probability was assigned to the drug combination. further experiments are essential to validate the effectiveness of the 128 predicted drug combinations (see supplementary file s4), so that some of them can be experimentally proven to be repositionable drug combinations. one limitation of our approach is that it involves only one clustering technique to cluster the drug combinations. another limitation of this study is that we did not keep a benchmark dataset with which to verify the model performance, which would have allowed us to verify our results and findings. furthermore, as a future direction, we will incorporate the side effects associated with the drugs, so that drug combinations free of harmful side effects can be filtered out, further improving the reliability and accuracy of the predictions; in the current experiment, however, side effects were not taken into consideration.
a. literature-based evidence for predicted drug combinations
out of the 128 predicted candidates, we found literature-based evidence supporting that five drug combinations have already been experimentally proven as co-administered drugs. the non-steroidal anti-inflammatory drug tenoxicam was identified experimentally by moser et al. [32] as a treatment for chronic painful inflammatory conditions occurring with degenerative and extra-articular rheumatic diseases of the musculo-skeletal system, and was found to be as effective as piroxicam. similarly, the ratio of nortriptyline to amitriptyline in the plasma of patients treated with amitriptyline has been identified as useful in treating patients with depression [33]. terazosin and doxazosin is a drug combination predicted by our ensemble methodology, and the two drugs have shown experimental efficacy in the treatment of symptomatic benign prostatic hyperplasia in normotensive men [34]. ofloxacin and norfloxacin is a predicted drug combination belonging to the fluoroquinolone family that can be used as antibacterial agents; murillo et al. [35] have tested the resolution of this drug combination as a binary mixture. diltiazem and betaxolol is another drug combination predicted as effective in our study; koh et al. [36] have experimentally proven that both diltiazem and betaxolol are effective in controlling the ventricular rate in chronic atrial fibrillation when combined with digoxin.
vi. conclusion
drug combination repositioning is an emerging research focus that has gained the attention of pharmaceutical and computational researchers, and computational approaches have made a significant contribution to the development and improvement of drug repositioning. since the number of known drug combinations is significantly low compared with the number of possible drug combinations, we proposed a positive unlabelled learning based ensemble learning approach to infer reliable plausible drug combinations as repositioning candidates.
the ensemble learning approach enables aggregating the classification results of svm, sgd-classifier and dnn classification model to minimize the variance of the final predictions. further, we have shown the applicability of proposed pul approach in predicting drug repositioning candidates. the literature-based evidence shows the clinical significance of the proposed approach. references [1] wouters o. j., mckee m., and luyten j. (2020). estimated research and development investment needed to bring a new medicine to market, jama journal of the american medical association, 323(9), 844–853 [2] devita v. t. & schein, p. s. (1973). the use of drugs in combination for the treatment of cancer: rationale and results. the new england journal of medicine, 288(19), 998–1006. [3] wishart d. et al. (2018). drugbank 5.0: a major update to the drugbank database for 2018. nucleic acids research, 46(d1), d1074–d1082. [4] li j., tong x. y., zhu l. d., zhang h. y. (2020). a machine learning method for drug combination prediction. frontiers in genetics, 11, 1000. [5] chen l., li b. q., zheng m. y., zhang j., feng k. y., cai y. d. (2013). prediction of effective drug combinations by chemical interaction, protein interaction and target enrichment of kegg pathways. biomed research international, 2013, 723780. [6] sun y., xiong y., xu q., wei d. (2014). a hadoop-based method to predict potential effective drug combination. biomed research international, 2014, 196858. [7] liu y., hu b., fu c., chen x. (2010). dcdb: drug combination database, bioinformatics (oxford, england), 26(4), 587–588. [8] janizek j., celik s., lee s. (2018). explainable machine learning prediction of synergistic drug combinations for precision cancer medicine. biorxiv. [9] huang h., zhang p., qu a., sanseau, p., yang, l. (2014). systematic prediction of drug combinations based on clinical side-effects. scientific reports, 4. [10] chen x., ren b., chen m., wang q., zhang l., yan, g. (2016). nllss: predicting synergistic drug combinations based on semisupervised learning. plos computational biology, 12(7), 1-23. [11] loewe s. (1953). the problem of synergism and antagonism of combined drugs. arzneimittel-forschung, 3(6), 285–290. [12] kalantarmotamedi y., eastman r.t., guha r., bender a. (2018). a systematic and prospectively validated approach for identifying synergistic drug combinations against malaria. malaria journal, 17(1), 1-15. [13] li p et al. (2015). large-scale exploration and analysis of drug combinations. bioinformatics (oxford, england), 31(12), 2007–2016. [14] shi j. y. et al. (2018). tmfuf: a triple matrix factorization-based unified framework for predicting comprehensive drug-drug interactions of new drugs. bmc bioinformatics, 19 (14) [15] kuru h. i., tastan o., cicek e. (2021). matchmaker: a deep learning framework for drug synergy prediction. ieee/acm transactions on computational biology and bioinformatics. [16] preuer k., lewis r., hochreiter s., bender a., bulusu k. c., klambauer g. (2018). deepsynergy: predicting anti-cancer drug synergy with deep learning. bioinformatics (oxford, england), 34(9), 1538–1546. [17] lee g., park c., and ahn j. (2019). novel deep learning model for more accurate prediction of drug-drug interaction effects. bmc bioinformatics, 20 (1), 1–8. improving drug combination repositioning using positive unlabelled learning and ensemble learning 82 december 2022 international journal on advances in ict for emerging regions [18] li z. et al. (2020). 
identification of drug-disease associations using information of molecular structures and clinical symptoms via deep convolutional neural network. frontiers in chemistry, 7. [19] peng b. and ning x. (2019). deep learning for high-order drug-drug interaction prediction. acm-bcb 2019 proceedings of the 10th acm international conference on bioinformatics, computational biology and health informatics, 197–206. [20] lee g., park c., and ahn j. (2019). novel deep learning model for more accurate prediction of drug-drug interaction effects. bmc bioinformatics, 20, (1), 1–8. [21] zhang w., chen y., liu f., luo f., tian g., and li x. (2017). predicting potential drug-drug interactions by integrating chemical, biological, phenotypic and network data, bmc bioinformatics, 18 (1), 1–12. [22] sellamanickam s., garg p., and selvaraj s. k. (2011). a pairwise ranking based approach to learning with positive and unlabeled examples. international conference on information and knowledge management, proceedings, 663–672. [23] liu y. et al. (2017). computational drug discovery with dyadic positive-unlabeled learning. proceedings of the 17th siam international conference on data mining, sdm, 45–53. [24] zhao j., liang x., wang y., xu z., and liu y. (2016). protein complexes prediction via positive and unlabeled learning of the ppi networks, 13th international conference on service systems and service management, icsssm. [25] nguyen, q.h., ly, h., ho, l.s., al‐ansari, n., le, h.v., tran, v.q., prakash, i., & pham, b.t. (2021). influence of data splitting on performance of machine learning models in prediction of shear strength of soil. mathematical problems in engineering, 2021, 1-15. [26] pedregosa f. et al. (2011). scikit-learn: machine learning in python. the journal of machine learning research. 12: 2825–2830. [27] kohonen t. (1990). the self-organizing map. proceedings of the ieee. 78(9), 1464-1480. [28] aliper a., plis s., artemov a., ulloa a., mamoshina p., zhavoronkov a. (2016). deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. molecular pharmaceutics, 13(7), 2524–2530. [29] kiviluoto k. (1996). topology preservation in self-organizing maps. proceedings of ieee international conference on neural networks (icnn'96). 1, 294-299. [30] pölzlbauer g. (2004). survey and comparison of quality measures for self-organizing maps. proceedings of the fifth workshop on data analysis (wda'04), 67—82. [31] cortes c., vapnik v. (1995). support-vector networks. machine learning. 20, 273–297. [32] moser, u., waldburger, h., schwarz, h. a., & gobelet, c. a. (1989). a double-blind randomised multicentre study with tenoxicam, piroxicam and diclofenac sodium retard in the treatment of ambulant patients with osteoarthritis and extra-articular rheumatism. scandinavian journal of rheumatology, 18(s80), 71–80. [33] jungkunz, g., kuß, h., & nortriptylin-arnitriptylin-, z. (1980). on the relationship of nortriptyline : amitriptyline ratio to clinical improvement of amitriptyline treated depressive patients. pharmakopsychiatrie, neuro-psychopharmakologie. 13, 111–116. [34] kaplan, s. a., soldo, k. a., & olsson, c. a. (1995). terazosin and doxazosin in normotensive men with symptomatic prostatism: a pilot study to determine the effect of dosing regimen on efficacy and safety. european urology. 28(3), 223–228. [35] murillo j. a., alañón m. a., muñoz de la p. a., durán m.i., & jiménez g. a. (2007). 
resolution of ofloxacin-ciprofloxacin and ofloxacin-norfloxacin binary mixtures by flow-injection chemiluminescence in combination with partial least squares multivariate calibration. journal of fluorescence. 17(5), 481–491. [36] koh, k. k., song, j. h., kwon, k. s., park, h. b., baik, s. h., park, y. s., in, h. h., moon, t. h., park, g. s., cho, s. k., & kim, s. s. (1995). comparative study of efficacy and safety of low-dose diltiazem or betaxolol in combination with digoxin to control ventricular rate in chronic atrial fibrillation: randomized crossover study. international journal of cardiology. 52(2), 167–174. ieee paper template in a4 (v1) international journal on advances in ict for emerging regions 2022 15 (1): june 2022 international journal on advances in ict for emerging region user acceptance of a novelty idea bank system to reinforce ict innovations: sri lankan universityindustry perspective chaminda wijesinghe#1, henrik hansson2, love ekenberg3 abstract— information and communications technology (ict) represents an enormous opportunity to introduce significant and lasting positive changes across the developing world. several attributes determine behavioural intention to adopt the technology. some elements will stimulate end users' intention to use the technology, while others are detrimental. this study is designed to measure the relationship between various factors on individuals' perceptions of adopting an ict-based instrument to stimulate ict innovations. a model was developed by combining the technology acceptance model and the diffusion of innovation theory. a survey questionnaire was distributed among students and teachers in higher education and industry professionals working with university collaborations. a sample of 202 responses was analysed using structural equation modelling. a good insight into user acceptance and the adoption of a systematic model to reinforce ict innovations are provided in this study with the derived results. theoretical and practical implications for the factors influencing the acceptance of the system are discussed. keywords— collaboration; diffusion of innovation; ictinnovation; structural equation model, technology acceptance; knowledge management. i. introduction his study is designed to measure individuals' perceptions of adopting an instrument to stimulate ict innovations through university-industry collaborations. this instrument is intended to be a tool between the university and industry to disseminate knowledge and ideas for innovation. knowledge and innovation are considered crucial sources for sustaining the competitive advantage of organisations [1]. knowledge generates value by supporting an organisation's capability to produce innovation [2][3][4] learn and transfer best practices across boundaries [5][6]. successful companies consistently create new knowledge, disseminate it widely throughout the organisation, and quickly embody it in new technologies and products [7]. knowledge management implementations require a wide range of quite diverse tools that come into play throughout the knowledge management cycle. gressgård et al. [8] claim that using ict-based tools to promote external knowledge flow into the organisation may help realise their innovation potential. many ict-based tools have gained popularity as instruments for disseminating knowledge for innovation [9]. 
organisational knowledge and ict refer to a distinct set of constructs, and several variables need to be considered when assessing the relationship between ict and knowledge management. the technology primarily facilitates communication, collaboration, and content management for better knowledge capture, sharing, diffusion, and application [10]. diffusion is "the process by which an innovation is communicated through certain channels over time among the members of a social system" [11]. it is a particular type of communication in which messages are concerned with new ideas. the nature of diffusion demands at least some sort of heterophily between the two parties. ideally, it can be in education, social status, and the like. the ict tool is the communication channel in the present study, enabling the exchange of messages between two groups or units [11]. universities and industries can benefit from adapting proper channels and tools for their collaboration [12]. students engaging in research can be widely accepted by the industry and impact society. a need has been created for a systematic collaborative platform where industry professionals and academics can engage and share expert knowledge. in contrast, academic supervisors can empower and guide students in theoretical aspects and future directions. innovation consists of an interactive process where diverse expertise are combined through communication amongst and across organisational borders [13]. further, firms can absorb ideas from suppliers, users, and knowledge institutions, as this innovation process demands interaction with many disparate actors. adequate support mechanisms can further accelerate such interactions. it is evident that there are substantial barriers regarding uic and the motivation to interact systematically, and a significant obstacle is the lack of interconnections of this nature. a. the idea bank system the global idea bank (gib) [14] is a platform designed to support these interconnection activities. more precisely, gib is a web platform where individuals can submit, exchange, discuss, and fine-tune fresh ideas to produce new innovations. organisations may use the idea bank to collect users' feedback and enhance their ideation process. when paired with a unique innovation strategy and methodology, the idea bank creates a holistic innovation solution that allows organisations to collectively generate and elaborate ideas. to determine the worth of an idea, the idea bank uses a voting mechanism. the idea bank's underlying premise is that if a large group of individuals collaborate on a project or develop an idea, that project or idea would ultimately improve the performance of those who worked on it [15]. figure 1 illustrates the gib concept. t chaminda wijesinghe is with stockholm university and nsbm green university (bandara@dsv.su.se). henrik hansson, love ekenberg are with the stockholm university (henrik.hansson@dsv.su.se, lovek@dsv.su.se). manuscript received: 26-10-2021 revised: 29-03-2022 accepted: 17-062022 published: 30-06-2022. doi: 10.4038/icter.v15i1.7236 © 2022 international journal on advances in ict for emerging regions user acceptance of a novelty idea bank system to reinforce ict innovations 2 june 2022 international journal on advances in ict for emerging regions fig. 1 global idea bank concept. gib mainly addresses collaboration barriers between universities, governments, companies, and societies to acquire and implement valuable ideas exerted through problems. 
the authors propose the idea bank as an it platform that is much about ideals as it is about ideas. this fact fuels innovation since the essence of innovation is about changing the world according to a particular vision or ideal [7]. because of the originality of the notion toward proliferating ict innovations, the idea bank system itself may be regarded as an ict innovation. the early phase of innovation is frequently referred to as "fuzzy" [15] because it works best when a collaborative system fosters confusion, disruption, and the fortuitous discovery of ideas. an essential feature of gib is to be a backbone to develop an organisation's innovation culture supporting the fuzzy front end of innovation. every individual must create an account and log into the system. individuals may use the platform to submit items of interest to other users, search for information, comment on information, and find specialists in the organisation when needed, resulting in everyone being well informed. the platform functions as a knowledge management system in this way. it allows users to submit ideas and keeps track of these ideas by enabling others to remark on them, moulding them further, grouping them with similar ideas, and accepting some type of vote mechanism to determine their merit. when ideas in the system receive the most votes, comments, votes on comments, views, "follows," "alerts," bookmarks, and related ideas uploaded, they are automatically promoted. all ideas can be clustered depending on how they match the united nations' sustainable development goals. if an idea does not match any of these seventeen goals, a new group can be created, or it can be added to the "other" category. students who create an idea in the system and expect financial assistance can specifically mention the requirement. other collaborating partners can then support the idea's commercialisation. gib will thus be a potential solution for disseminating knowledge across different social systems and, more importantly, among students to obtain potential opportunities for innovation. however, it is essential to evaluate how users accept such systems, as there is no prior use, especially in universities and industries. b. problem description universities and industries can benefit from adapting proper channels and tools for their collaboration [12]. students engaging in research can be widely accepted by the industry and impact society. a need has been created for a systematic collaborative platform where industry professionals and academics can engage and share expert knowledge. further, firms can absorb ideas from suppliers, users, and knowledge institutions, as this innovation process demands interaction with many disparate actors. adequate support mechanisms can further accelerate such interactions. universities and industries are ideal for sharing knowledge, best practices, and ideas for innovation. however, it is evident that there are substantial barriers to university-industry collaboration (uic) and the motivation to interact systematically, and a significant obstacle is the lack of interconnections of this nature. the above-explained idea bank is a platform intended to solve interconnectivity partly. however, understanding the acceptance of the idea bank system as a tool for stimulating innovation under the purview of uic is an essential factor. the system is intended to be used by a mixed group of participants. most users will be young undergraduates, and their desires may differ from industry professionals. 
there may be deviations from the system's functional expectations between academics and industry users. can we use the idea bank system with its existing features as the communication channel between the university and industry to stimulate innovations? do such systems require more influential features to be included? the full benefit of the university-industry collaborative platform cannot be achieved unless students and industry partners can use the system. individuals' behavioural intention to use the system may depend on several important factors. the platform will be a potential solution for disseminating knowledge across different social systems and, more importantly, among students to obtain potential opportunities for innovation. however, it is essential to evaluate how users accept such systems, as there is no prior use, especially in universities and industries. therefore, this study was conducted to measure the factors influential on user acceptance of an idea bank system as a source of innovation. c. related works over the years, evaluating user acceptance of technologies has become a popular topic in numerous disciplines. these disciplines include elearning systems [16][17][18][19], mlearning [20][21] in universities, knowledge management systems [1][22][23], e-commerce and m-commerce applications [24][25][26], social networking sites [27][28], professional networking sites [29], and other educational applications such as the adoption of google in education [17][30]. the e-collaboration system was a rare application type, and dasgupta et al. [31] conducted a study about twenty years ago in a similar discipline. however, the aim of collaboration and collaboration technologies is drastically different from today's technological advancements and the nature of collaborations. hence, the scope of the study was limited to measuring user acceptance of e-collaboration technology, where technology was limited to emails and chat messages. however, a collaboration system between universities and industries is different from all the above systems. since most system users are young undergraduates, 3 chaminda wijesinghe #1 , henrik hansson 2, love ekenberg3 international journal on advances in ict for emerging regions june 2022 their intentions to use the system may be different from industry users. there is no study aimed to identify the factors influencing this nature of collaborative system between universities and industries aiming for innovation. d. theoretical background innovation and technology refer to a distinct set of constructs, and several variables need to be considered when assessing the relationship between technology acceptance and innovation. when identifying a theoretical model, understanding human behaviour and the determinants of intention are of concern. several competing theoretical models demonstrate human behaviour for the user acceptance of technologies. previous studies such as the theory of reasoned action (tra) [32], theory of planned behaviour (tpb) [33], technology acceptance model (tam) [34], extended technology acceptance models tam2 [35], tam3 [36], and unified theory of acceptance and use of technology (utaut) [37][38] demonstrate theories in the relevant domain. the tra is designed to predict people's daily life volitional behaviour and understand their psychological behaviours [32]. the theory is more oriented toward individuals' attitudes toward behaviours and subjective norms. 
subjective norms refer to attitudes toward social pressure to perform a behaviour. according to the tra, the determinants of behavioural intention (bi) to use the system are attitudes toward behaviours and subjective norms. however, a person may not perform an activity even if motivated by positive attitudes because of a lack of control over the person's actions. therefore, tra is extended to tpb [33], including perceived behavioural control as an additional variable. the tpb model's problem is that a person's attitude toward using the computer system becomes irrelevant if the computer system is not accessible to that person. tam is derived from tra [39] by eliminating the uncertain and psychometric status of subjective norms, including two essential factors, perceived ease of use (peu) and perceived usefulness (pu), to determine bi. then, the tam model is extended to tam2 by including the factors for social influence required when evaluating systems beyond the workplace. venkatesh and morris [37] introduced utaut by comparing differences and similarities in eight theories relating to technology acceptance and derived 14 constructs from these eight theories, including effort expectancy, performance expectancy, social influence, and facilitating conditions significant constructs [39]. tam has been considered a parsimonious and powerful theory by the information systems community [40]. moreover, the three constructs peu, pu, and bi used in tam are more relevant in this study than those used in other models. constructs used in other models, such as social norms, image, job relevance, output quality, usage behaviour, result demonstrability, and behavioural controls, are considered less relevant because of the nature of the current study. tam has received substantial empirical support during the past decades, and it has probably become the most widely cited model in technology acceptance studies [39]. therefore, the tam introduced by davis [34] is considered more relevant for evaluating the user acceptance of the lib. structural equation modelling (sem) is a multivariate statistical modelling technique used to analyse structural relationships [41]. furthermore, this has become a standard method for confirming or disconfirming theoretical models in quantitative studies [42]. this technique combines factor analysis as well as multiple regression analysis. it is used for analysing structural relationships between measured and latent variables (constructs). sem has several benefits over more conventional data analysis methods, such as the linear regression model. the capability to account for measurement errors when estimating effects, examine the model's fit to the data, and construct statistical models that more closely agree with the theory, have all been advantageous. this paper is organised as follows: the second section follows the introduction with a conceptual model and hypothesis' development. the research methodology is presented in the third section. in the fourth section, data analysis and findings are provided. the fifth portion presents the discussion and conclusion, while the sixth section presents recommendations for further research. ii. conceptual model and hypothesis the study's conceptual model is presented by modifying the tam with the diffusion of innovation theory (dit) [11]. tam requires external variables to support pu and peu, and integration efforts are needed to understand better technology adoption [40]. 
the dit presented by rogers [11] helps understand important innovation characteristics. among the five constructs presented in dit, relative advantage (ra), compatibility (co), and trialability (tr) were considered more relevant to the current study. moore and benbasat [43] suggested reducing the instrument based on the research objective and organisational context. in the present study, the system under investigation was a novel system, and people other than the survey participants did not use it. therefore, the complexity and observability of the other two constructs are considered less important in this context. the conjunction between tam and dit has been used in many previous studies in various disciplines, including e-learning systems and mobile apps (e.g., [16][25][44][45]). pu is defined as "the degree to which a person believes that using a particular system would enhance his or her job performance" [34]. peu refers to the degree to which a person believes that using a particular system would be free of effort [34]. bi to use is known as the learners' choice of whether to continue using the new system. this term is seen as a factor that determines the use of technology. in this study, pu refers to how a person believes that using the system would enhance ict innovation. davis [34] claims that an application perceived as easier to use than another is more likely to be accepted by users. based on this, the authors of the current study propose that. h1: users' peu is positively related to the bi to use the system. h2: users' pu is positively related to the bi to use the system. h3: peu is positively associated with pu. the dit explains how innovations are adopted within a population of potential adopters. everett rogers first developed the theory in 1962 based on the observations of 508 diffusion studies [11]. the dit theory consists of crucial elements, innovation, communication channels, time, and user acceptance of a novelty idea bank system to reinforce ict innovations 4 june 2022 international journal on advances in ict for emerging regions social systems. the characteristics of innovation perceived by individuals help explain the different adoption rates measured in 1). relative advantage (2) compatibility, 3). complexity, 4). trialability, and 5). observability [11] as mentioned previously. ra is the degree to which an innovation is perceived as better than the idea it supersedes [11]. what matters in ra is whether an individual perceives innovation as advantageous. the rate of adoption is higher when the ra of innovation is higher. compatibility refers to the degree to which an innovation is perceived as consistent with the existing values, needs, and past experiences of potential adopters [11]. an idea incompatible with the current norms and values of a social system will not be adopted rapidly as a compatible innovation. trialability refers to the extent to which people think that they need to experience innovation before deciding whether to adopt it. trailable innovation tends to have less uncertainty perceived by individuals who consider adopting it, and those individuals tend to learn through this experience. in the current study, this concept refers to how students view their use of the proposed system as having a significant impact on their innovation performance. based on the identified characteristics of dit, the authors propose that, h4: ra is positively related to the pu of the system. h5: ra is positively related to the peu of the system. 
h6: compatibility is positively related to the pu of the application. h7: trialability is positively related to peu fig. 2: combined theoretical model (tam & dit) tam is combined with dit to check the perceived usefulness and perceived ease of use of the system described by davis [34] in tam. figure 2 explains how dit is combined with the tam to derive a hybrid theoretical model with the derived hypothesis. this study contributes to the literature as a model developed with the diffusion of innovation theory combined with the technology acceptance model to evaluate users' perception of a newly developed tool for escalating ict innovations. the study also helps practitioners focus on the criteria in a university-industry collaborative tool to stimulate innovations. iii. methodology this study aims to understand some factors and how they contribute to the system's usability. we wanted a broad response from various stakeholders to achieve a reasonable idea of these factors. we designed a questionnaire distributed to 300 individuals, including 250 students in higher education, 30 academic staff members, and 20 industry representatives. the survey was conducted in two phases, with a pilot study using a convenient sample of higher education students, followed by the main study. the scope of the study was limited to measuring user acceptance of e-collaboration technology, where technology was limited to emails and chat messages. a. measurement of items the questionnaire was developed from an extensive literature review to test the constructs used in this research's theoretical model. a total of 34 questions (see appendix a) were modified to suit the context under study, including five demographic characteristics, 13 tam factors, and 16 dit factors. among the 29 items used to measure tam and dit, 21 items were selected from moore and benbasat [43] as follows: for ra, out of nine items, eight were selected, and the deselected item measured the control over individuals' work, which is considered less significant in the study; further, all three items for compatibility, all five items for trialability, and all five items for peu were extracted and modified. the remaining eight items for tam were selected as three items for pu, and two items for bi were selected and modified from venkatesh and bala [36]. two items for pu were selected from davis [34] and modified. based on the literature, one item (appendix a: bi2) was created for this study. all scales used in this study were adapted from the existing literature. the items in the constructs were measured using a fivepoint likert scale with answer choices ranging from "strongly disagree (1)" to "strongly agree (5), "previously validated scales operationalise the constructs. a short video was created to understand the system's features and distribute it with the questionnaire. a link to the system was created, and access was granted to all respondents via guest login credentials. this process was required because the system is a novel system, and the respondents were novice users b. pilot study a pilot study, a small-scale rehearsal of the larger research design, was conducted to identify potential issues and check the measurement items. a convenience sample of 50 students studying in the second year in a higher education institute in sri lanka was selected, and the questionnaire was distributed using google forms. all the students are computing undergraduates and are therefore considered highly ict literate. 
in the google form, the study's title, the purpose of the survey, the objective, and the intended audience for the pilot survey were mentioned. a text area was included for each construct, and respondents were asked to write feedback on each construct measurement item. one questionnaire item was rephrased to improve clarity based on the respondents' feedback. c. main study after the pilot study, university undergraduates in computing and software industry professionals representing university collaborations were selected to receive the refined questionnaire. the respondents were purposefully selected, because not all software companies in sri lanka have uic and not all undergraduates know about ict innovations or collaborations at the early stage of their university education. generally, in sri lanka, students engage in various industry-related activities while studying at universities. therefore, when choosing undergraduates, those in the second year and onwards were chosen because they are mature enough to understand the system's requirements related to university-industry collaborations. in the survey, a forced response option was used in the main section after the demographic data section to avoid missing values; thus, respondents could only move to the next question by answering the current question. finally, the questionnaire was distributed among software industry professionals, academics, and the above-described students, using a google form. all respondents were active in sri lanka, a south asian country. the related literature suggests that a minimum sample size of 100 to 150 is required in sem [42][47]; a minimum sample size of 150 is required when the number of constructs is less than seven, with modest (0.5) item communalities and no under-identified constructs [47]. according to barclay et al. [48] and gaskin & happell [49], one guideline for sample size is that the sample should have at least ten times more data points than the number of questionnaire items in the most complex construct in the model; that is, ten times the number of predictors, taken either from the indicators of the most complex formative construct or from the largest number of antecedent constructs leading to an endogenous construct, whichever is larger. since schumacker & lomax [42] suggest that this multiple should be ten or even twenty times, and the most complex regression in the current study involves a formative construct with eight items, the required minimum sample size is 20 times 8, which equals 160. the system is intended to be used by university students, especially after the first year, academic staff members, and industry professionals. therefore, the questionnaire was distributed among 300 survey participants, comprising 250 undergraduates enrolled in the second, third, and fourth years of computer and it-related degree disciplines, twenty industry participants who worked in ict-related industries and were familiar with university engagements, and 30 academic staff members. the respondents' involvement in university-industry collaboration was asked about in the questionnaire and was required to be answered. a total of 210 responses were received, with eight incomplete responses.
these eight respondents did not attempt the main questionnaire and were therefore excluded, and the remaining 202 responses were used for data analysis. iv. data analysis and results a. demographic analysis the participants were almost equal in terms of sex. among the 201 participants,47.3% were men, and 52.2% were women. one participant opted not to indicate their gender. the majority of participants were between the ages of 18 and 25 (86.6%), and 9.4% were 25 to 30. the next highest age category was 35–40, which was 2.5%. only 1% were from the age group 30 to 35 years, and only 0.5% were reported from the age group 40 to 45 years. the total number of responses in the age group was 202. among the three categories of participants as students in higher education, academic staff members, and industry respondents, the majority of respondents (87.6%) were students in higher education. the second and third categories were almost equal, 6.4% and 5.9%, respectively. 46.8% of participants had direct universityindustry interactions, and 28.9% of participants had no interactions. there were 24.4% reported as they may have had university-industry interactions with uncertainty. among the undergraduate respondents, 60.6% studied in the third year, and 17.1% studied in the fourth year. there were 7.3% in the second year. the remaining 15% of the participants responded to none of the above categories. students are given an internship in the second semester of the third year in the higher education institute selected for the survey. students can continue their internship or become employees after the internship. therefore, some respondents can become undergraduates, and at the same time, they can be employees. b. model analysis the model analysis was conducted with partial least squares path modelling (pls-pm), which was used in two stages: 1) assessing the validity and reliability of the measurement model and 2) assessing the structural model. the statistical package for social science (spss) amos was used to analyse the questionnaire's data. 1) assessment of the measurement model: the measurement model describes the relationship between constructs (latent variables) and their measures (observed variables) [50]. one of the primary objectives of sem is to evaluate the construct validity of the proposed measurement model [41]. the validity of the measurement model depends on the model fit for the measurement model and construct validity evidence [41]. kline [51] and hair [41] emphasise the guidelines to measure the goodness of fit and suggest reporting model fit statistics as the minimum of model chi-square (χ2) and its degree of freedom (df), root mean square error of approximation (rmsea), and comparative fit index (cfi). the initial model test values are reported as rmsea= 0.066, which indicates a good fit, gfi= 0.799 was below the cut-off value for a good fit, cfi=0.937 reported as a good fit, tli=0.928 reported as a good fit, χ2 = 747.072, df= 362, χ2 /df= 2.064 reported as a good fit. except for gfi, all the values are within the cut-off values for an accepted model fit (see table i). the considered gfi value for a good model fit should be greater than 0.9 to ensure the construct's validity. then, the model modification was conducted by specifying covariances for the error terms. the highest modification indices (mi) were paired (see figure 4), as higher values of mi indicated item redundancy [52]. table i. 
the three categories of model fit and their level of acceptance (after the model modification) indicate an acceptable model fit.
name of category    name of index   level of acceptance   level achieved
absolute fit        rmsea           rmsea < 0.08          0.066
                    gfi             gfi > 0.90            0.824
incremental fit     cfi             cfi > 0.90            0.937
                    tli             tli > 0.90            0.928
parsimonious fit    chisq/df        chisq/df < 3.0        1.872
note: rmsea - root mean square error of approximation, gfi - goodness of fit index, cfi - comparative fit index, tli - tucker-lewis index, chisq - chi-square, df - degree of freedom.
individual item reliability was evaluated by examining the loadings; a value of 0.7 or more is considered an indication of acceptable reliability [52][53]. one item of relative advantage (ra4) reported a low factor loading of 0.62, as shown in figure 3. an item with a low factor loading is deemed of little use in measuring its construct [52] and was therefore removed. figure 3 and figure 4 show the parameter values before and after the model modification, respectively. after the model modification, the reported values were rmsea = 0.066 (no change), gfi = 0.824 (improved), cfi = 0.937 (no change), tli = 0.928 (no change), χ2 = 619.6, df = 331 and χ2/df = 1.872 (improved); the gfi and χ2/df values improved after the model modification. although the gfi value does not reach 0.9, a value above 0.8 can be considered a reasonable and acceptable fit, cf. [54], [55]. every construct reported an ave value greater than 0.5 (see table ii), confirming the measurement model's convergent validity [53]. discriminant validity was assessed to ensure that the measurement model had no redundant constructs; for adequate discriminant validity, the diagonal elements in table ii should be greater than the corresponding off-diagonal elements in their rows and columns [42]. the measurement model satisfied this criterion, with all diagonal values greater than their corresponding off-diagonal values. 2) reliability of the measurement model: reliability measures were assessed to verify the latent construct reliability and internal consistency of the measurement model, using composite reliability (cr) and cronbach's alpha (ca) [42]. if the values of cr and ca are greater than or equal to 0.7, composite reliability for a construct is considered to be achieved [53], and a ca greater than 0.8 indicates a good level [56]. the estimates for cr and ca range between 0.8161 and 0.9142 (see table ii), indicating a good level of reliability. the structural model was assessed once the measurement model was satisfactory and the constructs' reliability and validity were established. 3) assessment of the structural model: the structural model specifies the relationships between constructs [50]. we use the standard of gefen et al. [47] (p. 45). table iii was developed to test the support for each hypothesis by investigating the endogenous latent variables' coefficient of determination (r2). the critical ratio (cr), or t value, has become a popular statistic for evaluating the structural model.
fig. 3 initial model showing factor loadings of the measurement model before modifications. note: tr - trialability, ra - relative advantage, co - compatibility, pu - perceived usefulness, peu - perceived ease of use, bi - behavioural intention
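as an illustration of the statistics reported in tables ii and iii, the following sketch applies the standard formulas for average variance extracted, composite reliability, cronbach's alpha and the critical ratio; the example loadings and item scores are hypothetical, and only the h1 estimate and standard error are taken from table iii.

```python
# sketch of the reliability and structural-model statistics used in this section
import numpy as np
from scipy.stats import norm

def ave(loadings):
    """average variance extracted from standardised factor loadings."""
    lam = np.asarray(loadings)
    return np.mean(lam ** 2)

def composite_reliability(loadings):
    """composite reliability (cr) from standardised factor loadings."""
    lam = np.asarray(loadings)
    return lam.sum() ** 2 / (lam.sum() ** 2 + np.sum(1 - lam ** 2))

def cronbach_alpha(items):
    """cronbach's alpha for an (n_respondents, k_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def critical_ratio(estimate, se):
    """critical ratio (t value) and two-tailed p value for a path estimate."""
    cr = estimate / se
    return cr, 2 * (1 - norm.cdf(abs(cr)))

# hypothetical loadings: cr above 0.7 and ave above 0.5 indicate acceptable reliability
print(composite_reliability([0.78, 0.81, 0.74, 0.70]), ave([0.78, 0.81, 0.74, 0.70]))
# hypothetical likert responses for a 3-item construct
print(cronbach_alpha(np.array([[4, 5, 4], [3, 4, 3], [5, 5, 4], [2, 3, 2]])))
# h1 in table iii: estimate 0.892, se 0.156 -> cr ≈ 5.72, p < 0.001
print(critical_ratio(0.892, 0.156))
```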
fig. 4 measurement model after modifications. note: tr - trialability, ra - relative advantage, co - compatibility, pu - perceived usefulness, peu - perceived ease of use, bi - behavioural intention
table ii. correlation matrix showing internal consistencies and correlations of constructs, indicating the measurement model's discriminant validity and reliability
      no of items   ra     co     tr     pu     peu    bi     ca      cr      ave
ra    8             1                                         0.902   0.9142  0.579723
co    3             0.81   1                                  0.859   0.8626  0.676857
tr    5             0.78   0.84   1                           0.894   0.8946  0.62959
pu    5             0.77   0.69   0.70   1                    0.903   0.9044  0.654916
peu   5             0.76   0.71   0.71   0.80   1             0.877   0.8766  0.587306
bi    3             0.80   0.75   0.74   0.76   0.77   1      0.817   0.8161  0.597344
note: ca - cronbach's alpha, cr - composite reliability, ave - average variance extracted; ra to bi give the correlations of constructs and ca, cr and ave give the internal consistencies.
table iii. hypothesis testing results of the structural model, showing five supportive and two unsupportive hypotheses
hypothesis   iv    relation   dv    estimate   se      cr       p-value   result
h1           peu   →          bi    0.892      0.156   5.723    0.000     supportive
h2           pu    →          bi    0.096      0.136   0.702    0.483     unsupportive
h3           ra    →          pu    0.403      0.150   2.680    0.007     supportive
h4           ra    →          peu   0.621      0.118   5.265    0.000     supportive
h5           co    →          pu    -0.095     0.135   -0.708   0.479     unsupportive
h6           peu   →          pu    0.626      0.105   5.975    0.000     supportive
h7           tr    →          peu   0.323      0.100   3.231    0.001     supportive
note: iv - independent variable; dv - dependent variable; se - standard error; cr - critical ratio; pu - perceived usefulness; peu - perceived ease of use; ra - relative advantage; co - compatibility; tr - trialability.
among all seven hypotheses, five were supportive and only two were unsupportive, based on the results depicted in table iii. three hypotheses were significant at p<0.001, and two hypotheses were significant at p<0.01. perceived ease of use has a positive effect on the behavioural intention to use the system (β=0.892, p<0.001), while perceived usefulness did not have a significant effect on behavioural intention (β=0.096, p>0.05). this result implies that the ease of using the system is more decisive for the behavioural intention to use the system than its usefulness. relative advantage has a positive effect on perceived usefulness (β=0.403, p<0.01) and perceived ease of use (β=0.621, p<0.001). compatibility had a negative, non-significant effect on perceived usefulness (β=-0.095, p>0.05). perceived usefulness was positively affected by perceived ease of use (β=0.626, p<0.001), and trialability had a positive effect on perceived ease of use (β=0.323, p<0.01). v. discussion and conclusion the tam has been used in various educational studies to evaluate the acceptance of different technologies for education. among these, studies examining the behavioural intention to use learning management systems (lms) [18][19] and other e-learning platforms [16][17][45][57], mobile learning systems [20], and e-collaboration systems [31] are considered the most relevant.
the examined information system is different from the above systems in terms of its collaborative features between educational institutes and industries for stimulating ict innovations. therefore, the study's theoretical and practical implications bring new insights to researchers, information system designers, and administrators at uic. a. theoretical implications the tam proposed by davis [34] claims that an individual's adoption of information technology depends on perceived usefulness and perceived ease of use. our results provide limited support for the original tam. first, perceived ease of use positively influences perceived usefulness and behavioural intention to use the system. second, perceived usefulness has a negative effect on the behavioural intention to use. while the first result supports the original tam, the second result contradicts the original tam. in comparing our results with previous studies of tam, behavioural intention to use e-learning systems (e.g. [16][17][45]) and learning management systems [18][19] have been influenced by perceived usefulness and perceived ease of use. however, investigating the behavioural intention to use an ecollaboration system in another study [31], the authors concluded that perceived usefulness has a negative relationship with the use of an e-collaboration system. this result contradicts the results of the original tam. however, perceived ease of use has a strong positive effect on the perceived usefulness of the system, supporting the results of the original tam. however, the e-collaboration system has not radically changed over the last few years (ibid). consequently, advanced system users may be efficient users who are familiar with the navigational structure of the system. this contradictory situation is supported in two other studies [26][28], and perceived usefulness does not influence behavioural intention due to environmental factors. the possible reason for the contradictory relationship in our study may be that participants already know the usefulness of an it-supported system for ict innovations. that is not considered an extrinsic motivational factor. the result may have also been affected because all participants had a strong background in it. however, the current study's findings are consistent with the original findings of the tam [34]. moreover, it is evident that external factors derived from the diffusion of innovation theory comply with the existing similar studies [16][25] in the literature regarding their relationship with tam. the relative advantage greatly influences the perceived ease of use and perceived usefulness. this result implies that when users regard the idea bank system as better than the traditional collaboration system or approaches, they may perceive the idea bank system to be more useful [24][44]. the factor trialability also positively supports the perceived ease of use. however, factor compatibility negatively affects perceived usefulness. however, many studies (e.g.[16][24]) have shown a positive relationship between compatibility and perceived usefulness b. practical implications the idea bank system has many features that contribute positively to ict innovation. the relative advantage of using the idea bank system compared to other systems significantly influences perceived usefulness and perceived ease of use. perceived ease of use is an assessment of the cognitive effort involved in using the system. 
in such situations, users' focus is on the interaction with the system and not on objectives external to the interaction. since the system is mainly intended to be used by young undergraduates in universities, they will be more focused on the ease of using the system than its usefulness. previous studies have shown that factors affecting user acceptance may vary between hedonic and utilitarian systems [58]. higher demand for perceived ease of use is expected in hedonic systems than perceived usefulness. heijden [58] suggested that developers include hedonic features to invoke other configurations to achieve user acceptance. systems trialability before deciding to purchase is a significant factor in the ease of using the system. this result implies that when users had more opportunities to try the idea bank system, they could view it as easier to use [45]. since the idea bank system is quite a new system, it would be good to make it available for users on a trial basis and grow the system with ideas before deciding to purchase. additionally, cognitive absorption may also be experienced with visually rich and appealing technologies when designing an information system [59]. diversity in societies and organisational structures may challenge the use of information systems. while some factors such as it proficiency and experience promote the system's ease of use, technology acceptance and intention to use may be moderated by its rules, policy, and it guidelines [60]. generalisability is commonly accepted as a quality criterion in quantitative research [61]. adopting a modelling framework (sem) that allows for a variety of statistical models in the study increases the study's credibility. the reliability of the survey is high since it was preceded by a thorough literature review to adjust the scales used in the study. respondents were carefully chosen to adequately represent academic staff, industry, and students involved in the university-industry partnership, assuring the internal validity. 9 chaminda wijesinghe #1 , henrik hansson 2, love ekenberg3 international journal on advances in ict for emerging regions june 2022 c. conclusion a good insight into the user acceptance and adopting a systematic model to rein-force ict innovations are provided in this study with its derived results. combining the technology acceptance model and diffusion of innovation theory derives varied results based on the system under investigation. the significance of the current study is the identification of factors affecting users' perceptions of using a collaborative tool for stimulating innovation. this study fills the gap in the literature by providing valuable features in such a collaborative system, primarily when young students mainly use it in universities. since the system is intended to be used primarily by young undergraduates, developers of such systems are encouraged to use hedonic features. the study presents several essential findings for researchers in a similar domain, information system designers, and university-industry partnership managers. our main conclusion is that the usefulness and behavioural intention to use the system will be determined by how far the system use is free of effort. second, the relative advantage is an excellent determinant of perceived usefulness and the perceived ease of use of the system. the free effort to use the system is an influential factor for behavioural intention to use the idea bank system. 
the authors recall that the relative advantage is the degree to which a new product is superior to an existing product. therefore, new products should be incorporated with ease of use. these results indicate that the system becomes more valuable by increasing the ease of use, and hence, the behavioural intention to use the system can be improved. finally, since there has been no prior study that has evaluated user acceptance of an idea bank system or university-industry collaborative system aiming at innovation, the results of the study are unique vi. limitations and suggestions for future works one limitation of the study is that the respondents have not used the system for an extended period to become familiar with all system functionalities. that can make respondents feel that the system's ease of use is more important than its usefulness. enthusiastic researchers in the same domain can examine hedonic features in educational systems to consistently use such information systems. appendix a questionnaire items-part i demographic characteristics gender age group status of the respondent (undergraduate/ graduate employee/ non-graduate employee) involvement in university-industry interactions if undergraduate: year of study questionnaire items-part ii system characteristics perceived usefulness pu1. using the system would improve my innovation performance. pu2. using the system would help to accomplish innovation tasks more quickly. pu3. using the system would improve the quality of innovation. pu4. using the system would enhance the effectiveness of innovation activity. pu5. i feel that the system is useful to increase innovations. perceived ease of use peu1. learning to operate the system would be easy for me. peu2. my interaction with the system is clear and understandable. peu3. i believe it would be easy to get the system to do what i want it to do. peu4. it is easy for me to remember how to perform tasks using the system. peu5. overall, i believe the system would be easy to use. behavioural intention to use. bi1. assuming i had access to the system, i intend to use it. bi2. i intend to recommend others to use this system for future work. bi3. for future work, i would use the system. relative advantage ra1. using this system enables me to accomplish innovation tasks more quickly than before. ra2. using the system improves the quality of innovation activities. ra3. using the system makes it easier to do innovation activities. ra4. the disadvantages of my using the system far outweigh the advantages. ra5. using the system improves innovation performance. ra6. overall, i find that using the system will be advantageous for innovation activities. ra7. using the system enhances the effectiveness of innovation activities. ra8. using the system increases my productivity. compatibility co1. using the system would be compatible with all aspects of innovations. co2. i think that using the system would fit well with the way i like to collaborate with industry/university. co3. using the system would fit into my work style. trialability tr1. i have a great deal of opportunities to try various applications in the system. tr2. i know where i can go to satisfactorily try out various uses of the system. tr3. the system will be available to me to test various applications adequately. tr4. before deciding whether to use any system applications, i will be able to properly try them out. tr5. i will be permitted to use the system on a trial basis long enough to see what it can do. 
i am able to experiment with the system as necessary. user acceptance of a novelty idea bank system to reinforce ict innovations 10 june 2022 international journal on advances in ict for emerging regions appendix b acronyms ave average variance extracted bi behavioral intention ca cronbach's alpha cfi comparative fit index chsq chi-square co compatibility cr composite reliability df degree of freedom dit diffusion of innovation theory dp dependent variable gfi goodness of fit index gib global idea bank ict information and communication technology it information technology iv independent variable lib local idea bank mi modification indices nib national idea bank peu perceived ease of use pls-pm partial least squares path modelling pu perceived usefulness ra relative advantage rmsea root mean square error approximation sdg sustainable development goals sem structural equation modelling se standard error spss statistical package for social science tam technology acceptance model tli tucker–lewis's index tpb theory of planned behaviour tra theory of reasoned action tr trialability uic university-industry collaboration utaut unified theory of acceptance and use of technology references [1] a. elmorshidy, "the impact of knowledge management systems on innovation: an empirical investigation in kuwait," vine j. inf. knowl. manag. syst., vol. 48, no. 3, pp. 388–403, 2018, doi: 10.1108/vjikms-12-2017-0089. [2] g. ahuja and r. katila, "technological acquisitions and the innovation performance of acquiring firms: a longitudinal study," strateg. manag. j., vol. 22, no. 3, pp. 197–220, mar. 2001, doi: https://doi.org/10.1002/smj.157. [3] j. darroch, "knowledge management, innovation and firm performance," j. knowl. manag., vol. 9, no. 3, pp. 101–115, jan. 2005, doi: 10.1108/13673270510602809. [4] k. z. zhou and c. b. li, "how knowledge affects radical innovation: knowledge base, market knowledge acquisition, and internal knowledge sharing," strateg. manag. j., vol. 33, no. 9, pp. 1090–1102, sep. 2012, doi: https://doi.org/10.1002/smj.1959. [5] m. t. hansen, "the search-transfer problem: the role of weak ties in sharing knowledge across organization subunits," adm. sci. q., vol. 44, no. 1, pp. 82–111, sep. 1999, doi: 10.2307/2667032. [6] g. patriotta, a. castellano, and m. wright, "coordinating knowledge transfer: global managers as higher-level intermediaries," j. world bus., vol. 48, no. 4, pp. 515–526, 2013, doi: 10.1016/j.jwb.2012.09.007. [7] i. nonaka, "the knowledge-creating company," harv. bus. rev., vol. nov-dec, no. 1, 1991. [8] l. j. gressgård, o. amundsen, t. m. aasen, and k. hansen, "use of information and communication technology to support employee-driven innovation in organisations: a knowledge management perspective," j. knowl. manag., vol. 18, no. 4, pp. 633–650, 2014, doi: 10.1108/jkm-01-2014-0013. [9] p. h. j. hendriks, "many rivers to cross: from ict to knowledge management systems," j. inf. technol., vol. 16, no. 2, pp. 57–72, 2001, doi: 10.1080/02683960110054799. [10] k. dalkir, "knowledge management tools," in knowledge management in theory and practice, 2011. [11] e. m. rogers, diffusion of innovations, third edition. 1983. [12] e. bellini, g. piroli, and l. pennacchio, "collaborative know-how and trust in university–industry collaborations: empirical evidence from ict firms," j. technol. transf., pp. 1–25, 2018, doi: 10.1007/s10961-018-9655-7. [13] l. johnston, s. robinson, and n. lockett, "article information :," int. j. entrep. behav. res., vol. 16, no. 6, pp. 
540–560, 2010, doi: 10.1108/jhom-09-2016-0165. [14] "global idea bank," 2019. https://ideabank.global/launch/ (accessed mar. 17, 2021). [15] c. sandström and j. björk, “idea management systems for a changing innovation landscape,” int. j. prod. dev., vol. 11, no. 3– 4, pp. 310–324, 2010, doi: 10.1504/ijpd.2010.033964. [16] w. m. al-rahmi et al., "integrating technology acceptance model with innovation diffusion theory: an empirical investigation on students' intention to use e-learning systems," ieee access, vol. 7, pp. 26797–26809, 2019, doi: 10.1109/access.2019.2899368. [17] r. cheung and d. vogel, "predicting user acceptance of collaborative technologies: an extension of the technology acceptance model for e-learning," comput. educ., vol. 63, pp. 160–175, 2013, doi: 10.1016/j.compedu.2012.12.003. [18] s. alharbi and s. drew, "using the technology acceptance model in understanding academics' behavioural intention to use learning management systems," int. j. adv. comput. sci. appl., vol. 5, no. 1, 2014, doi: 10.14569/ijacsa.2014.050120. [19] n. fathema, d. shannon, and m. ross, "expanding the technology acceptance model (tam) to examine faculty use of learning management systems (lmss) in higher education institutions.," merlot j. online learn. teach., vol. 11, no. 2, pp. 210–232, 2015. [20] n. wei and z. w. li, "telepresence and interactivity in mobile learning system: its relation with open innovation," j. open innov. technol. mark. complex., vol. 7, no. 1, pp. 1–18, 2021, doi: 10.3390/joitmc7010078. [21] l. briz-ponce and f. j. garcía-peñalvo, "an empirical assessment of a technology acceptance model for apps in medical education," j. med. syst., vol. 39, no. 11, nov. 2015, doi: 10.1007/s10916-015-0352-x. [22] m. del giudice and m. r. della peruta, "the impact of it-based knowledge management systems on internal venturing and innovation: a structural equation modeling approach to corporate performance," j. knowl. manag., vol. 20, no. 3, pp. 484–498, 2016, doi: 10.1108/jkm-07-2015-0257. [23] j. xu and m. quaddus, "development of a research model of adoption and continued use exploring the factors influencing end users' acceptance of knowledge management systems :," j. organ. end user comput., vol. 19, no. 4, 2007. [24] j. h. wu and s. c. wang, "what drives mobile commerce? an empirical evaluation of the revised technology acceptance model," inf. manag., vol. 42, no. 5, pp. 719–729, 2005, doi: 10.1016/j.im.2004.07.001. [25] s. min, k. k. f. so, and m. jeong, "consumer adoption of the uber mobile application: insights from diffusion of innovation theory and technology acceptance model," j. travel tour. mark., vol. 36, no. 7, pp. 770–783, 2019, doi: 10.1080/10548408.2018.1507866. [26] r. septiani, p. w. handayani, and f. azzahro, "factors that affecting behavioral intention in online transportation service : case study of go-jek," in procedia computer science, 2018, vol. 124, pp. 504–512, doi: 10.1016/j.procs.2017.12.183. [27] e. constantinides, c. lorenzo-romero, and m.-c. alarcón-delamo, "social networking sites as business tool: a study of user behaviour bt business process management: theory and applications," pp. 221–240, 2013, [online]. available: http://dx.doi.org/10.1007/978-3-642-28409-0_9. [28] p. tantiponganant and p. laksitamas, "an analysis of the technology acceptance model in understanding students' behavioral intention to use university's social media," proc. 2014 iiai 3rd int. conf. adv. appl. informatics, iiai-aai 2014, vol. 12, pp. 
8–12, 2014, doi: 10.1109/iiai-aai.2014.14. [29] w. a. khan afridi, h. hashim, and m. kuppusamy, "the critical 11 chaminda wijesinghe #1 , henrik hansson 2, love ekenberg3 international journal on advances in ict for emerging regions june 2022 success factors of professional networking sites'adoption by university students," test eng. manag., vol. 82, no. 1–2, pp. 653– 673, 2020. [30] r. s. al-maroof, a. m. alfaisal, and s. a. salloum, "google glass adoption in the educational environment: a case study in the gulf area," educ. inf. technol., vol. 26, no. 3, pp. 2477–2500, 2021, doi: 10.1007/s10639-020-10367-1. [31] s. dasgupta, m. granger, and n. mcgarry, "user acceptance of ecollaboration technology : an extension of the technology ...," gr. decis. negot., vol. 11, no. 2, p. 87, 2002. [32] m. fishbein and i. ajzen, belief, attitude, intention and behavior: an introduction to theory and research. addison-wesley, 1975. [33] i. ajzen, "from intentions to actions-tpb.1985.pdf," in action control: from cognition to behavior, 1985, pp. 11–39. [34] f. d. davis, "perceived usefulness, perceived ease of use, and user acceptance of information technology," mis q. manag. inf. syst., vol. 13, no. 3, pp. 319–339, 1989, doi: 10.2307/249008. [35] v. venkatesh and f. d. davis, "a theoretical extension of the technology acceptance model: four longitudinal field studies," manage. sci., vol. 46, no. 2, pp. 186–204, 2000, doi: 10.1287/mnsc.46.2.186.11926. [36] v. venkatesh and h. bala, "technology acceptance model 3 and a research agenda on interventions," decis. sci., vol. 39, no. 2, pp. 273–315, 2008, doi: 10.1111/j.1540-5915.2008.00192.x. [37] v. venkatesh, m. g. morris, g. b. davis, and f. d. davis, "user acceptance of information technology: toward a unified view," mis q., vol. 27, no. 3, pp. 425– 478, 2003, doi: 10.1006/mvre.1994.1019. [38] v. venkatesh, j. y. l. thong, and x. xu, "unified theory of acceptance and use of technology: a synthesis and the road ahead," j. assoc. inf. syst., vol. 17, no. 5, pp. 328–376, 2016, doi: 10.17705/1jais.00428. [39] h. taherdoost, "a review of technology acceptance and adoption models and theories," procedia manuf., vol. 22, pp. 960–967, 2018, doi: 10.1016/j.promfg.2018.03.137. [40] y. lee, k. a. kozar, and k. r. t. larsen, "the technology acceptance model: past, present, and future," commun. assoc. inf. syst., vol. 12, no. december, 2003, doi: 10.17705/1cais.01250. [41] j. f. hair, w. c. black, b. j. babin, and r. e. anderson, multivariate data analysis, 7th ed. 2014. [42] r. e. schumacker and r. g. lomax, a beginner's guide to structural equation modeling, 3rd edn, vol. third edit. 2010. [43] g. c. moore and i. benbasat, "development of an instrument to measure the perceptions of adopting an information technology innovation," inf. syst. res., vol. 2, no. 3, pp. 192–222, 1991, doi: 10.1287/isre.2.3.192. [44] s. c. chang and f. c. tung, "an empirical investigation of students' behavioural intentions to use the online learning course websites," br. j. educ. technol., vol. 39, no. 1, pp. 71–83, 2008, doi: 10.1111/j.1467-8535.2007.00742.x. [45] y. h. lee, y. c. hsieh, and c. n. hsu, "adding innovation diffusion theory to the technology acceptance model: supporting employees' intentions to use e-learning systems," educ. technol. soc., vol. 14, no. 4, pp. 124–137, 2011. [46] f. d. davis, "it usefulness and ease of use," mis q., vol. 13, no. 3, pp. 319–340, 1989. [47] d. gefen, d. straub, and m.-c. 
boudreau, "structural equation modeling and regression: guidelines for research practice," commun. assoc. inf. syst., vol. 4, no. october, 2000, doi: 10.17705/1cais.00407. [48] c. barclay, d., thompson, r., dan higgins, "the partial least squares (pls) approach to causal modeling: personal computer adoption and use an illustration," technol. stud., vol. 2, no. 2, pp. 285–309, 1995. [49] c. j. gaskin and b. happell, "on exploratory factor analysis: a review of recent evidence, an assessment of current practice, and recommendations for future use," int. j. nurs. stud., vol. 51, no. 3, pp. 511–521, 2013, doi: 10.1016/j.ijnurstu.2013.10.005. [50] j. c. anderson and d. w. gerbing, "some methods for respecifying measurement models to obtain unidimensional construct measurement," j. mark. res., vol. 19, no. 4, p. 453, 1982, doi: 10.2307/3151719. [51] r. b. kline, principles and practices of structural equation modelling ed. 4. 2015. [52] z. awang, "validating the measurement model: cfa the measurement model of a latent construct," in sem made simple, mpws, 2015, pp. 54–74. [53] c. fornell and d. f. larcker, "evaluating structural equation models with unobservable variables and measurement error," j. mark. res., vol. 18, no. 1, p. 39, 1981, doi: 10.2307/3151312. [54] h. baumgartner and c. homburg, "applications of structural equation modeling in marketing and consumer research: a review," int. j. res. mark., vol. 13, pp. 139–161, 1996, doi: 10.1016/j.ecolecon.2015.08.017. [55] w. j. doll, w. xia, and g. torkzadeh, "a confirmatory factor analysis of the end-user computing satisfaction instrument," mis q. manag. inf. syst., vol. 18, no. 4, pp. 453–460, 1994, doi: 10.2307/249524. [56] g. ursachi, i. a. horodnic, and a. zait, "how reliable are measurement scales? external factors with indirect influence on reliability estimators," procedia econ. financ., vol. 20, no. 15, pp. 679–686, 2015, doi: 10.1016/s2212-5671(15)00123-9. [57] i. a. c. jimenez, l. c. c. garcía, m. g. violante, f. marcolin, and e. vezzetti, "commonly used external tam variables in e-learning, agriculture and virtual reality applications," futur. internet, vol. 13, no. 1, pp. 1–21, 2021, doi: 10.3390/fi13010007. [58] h. van der heijden, “user acceptance of hedonic systems,” mis q., vol. 28, no. 4, pp. 695–704, 2004. [59] r. agarwal and j. prasad, "a conceptual and operational definition of information technology," inf. syst. res., vol. 9, no. 2, pp. 204–215, 1998. [60] p. ajibade, "technology acceptance model limitations and criticisms: exploring the practical applications and use in technology-related studies, mixed-method, and qualitative researches," libr. philos. pract., vol. 2019, 2019. [61] f. n. kerlinger lee, howard b., foundations of behavioral research. fort worth, tx: harcourt college publishers, 2000.  the international journal on advances in ict for emerging regions 2010 03 (02) : 25 -33  abstract—the eminent presence of globalization coupled with the advancement of information technology continues to penetrate and integrate societies and nations in a manner previously unknown. to stay competitive, most countries advocate and implement policies and amenities to encourage technological acceptance and adaptability. in this study, we investigate into the factors explaining the diffusion of ict and its impact on economic activities. to gauge such, we examine the influence of ict diffusion on a country’s economic orientation. 
our rationale is that greater the extent of ict adoption in societies, the greater the economic and social development. for the purpose of this study, a set of latin american countries is studied to capture the relative effect of various social, economical, and infrastructural variables into the overall ict level, where ict level is proxied by the network readiness index adopted from world economic forum. the result indicates that a country’s it expenditure, age dependency ratio, literacy rate, and urbanization draw significant attention in explaining ict diffusion in this region. index terms—age dependency ratio, latin america countries, information technology, ict readiness. i. introduction consensus among scholars and policy makers is that to understand the impact of globalization on world economies, as well as to gain an advantage in the opening of the competitive world that lowering of barriers would create, economies should not ignore the infusion effect of information technology and its diffusion within the immediate surroundings. while this phenomenon has increased trade, countries at present are relying on other strategies to attract more business and growth within their economies. via regionalization, most countries that share similar traits often are compelled to rally around each other to attract the best of their community. consequently, in this study we focus on the latin american countries, henceforth lac, to understand which indicators either increase or decrease ict in these countries. a selected number of lac was chosen based on their gross domestic product per capita purchasing power parity. the rationale is high per capita countries are most likely to manuscript received april 10, 2010. accepted august 30, 2010. aziz bakay is with the texas a&m international university, laredo, tx 78041 usa. (e-mail: azizbakay@dusty.tamiu.edu) collins e. okafor is with the texas a& m international university, laredo, tx 78041 usa. (e-mail: doncluminatti@dusty.tamiu.edu). nacasius u. ujah is with the texas a& m international university, laredo, tx 78041 usa. (nacasius.ujah@dusty.tamiu.edu) attract and spend more on information and communication technology, which also follows suit with findings by chinn and fairlie [10], bagchi and udo [4], etc. we differentiate our result by focusing on a particular region; all countries selected for this paper are developing or emerging countries to assert that possibly the degree to which these indicators play in these countries maybe different from previous findings, as argued by bagchi and edo [4]. the effects of these factors could be similar or different on developed and developing nations depending on the specific factor considered. furthermore, we find that the economic contributions of the statistically significant indicators used in this paper: gross domestic product per capita on average, for every $3,333 increase in the income per person in a country would lead to approximately 1 point increase in the network readiness score for the country. in considering ict, two terms are most prevalent: ict diffusion and digital divide (dd). here, we distinctively bridge the gap between both concepts. a. ict diffusion in pursuit of growth and sustainability, countries continue to equip themselves with tools and resources to facilitate a clear-cut information communication technology diffusion. 
diffusion here entails the implementation of essential measures that facilitate a country’s network readiness (where readiness is considered on the basis of adaptability and efficiency of countries’ productivity). information and communication technologies (icts) are identified as network technologies [36] and key factors influencing economic growth [16] [26] [27] [36]. the sector of information and communication services is considered to be vital to the development processes of societies as long as it is competitive and vibrant in structure [38]. icts diffusion has been dynamic and variant across countries [27]. the diffusion is perceived important because ict is substantially related to the electronic commerce in particular [28] and to the economic development at large [36]. the accurate usage of ict in investment skills, organizational change and innovation leads to efficient and flourishing businesses [26]. icts continue to be an important factor in the development, effectiveness, and efficiency of both human and economic advancement. this is evident in the information sharing and network building that engulf most sectors (education, health, agriculture, financial markets etc.). institutions and other business mediums continue to factors explaining ict diffusion: case study of selected latin american countries aziz bakay, collins e. okafor, and nacasius u. ujah a 26 aziz bakay, collins e. okafor, and nacasius u. ujah the international journal on advances in ict for emerging regions 03 december 2010 elevate towards more integrated and productive business environments [38]. b. digital divide digital divide (dd) is defined as the divergence of the icts worldwide and the disparity of digital opportunities within nation states and their dispersion among countries and territories (digital divide, n.d.). despite the fact that dd has been bridged to some extent, there still exist inequalities across the world. as such, the challenge in this regard has been the process of diffusing and integrating icts within societies to garner the benefits of icts towards economic development [26]. across countries, dd has been a persistent issue with reference to the speed and quality of access to icts. nevertheless, the gap across countries and regions has been contracting [16] due to the increasing number of people having access to various communication devices and services. gurstein [14] elaborated on the differences between “access to icts” and “effectively use of icts”. his argument is built upon the dd discussions on which recent summits, international meetings focus. however, according to him these meetings omitted aforementioned difference. noticing that many applaud “having access” issues in terms of icts and pay attention to “haves” and “have nots” with respect to dd, he energizes the dd discussions by commenting on “effective use of icts”. he defines it as follows: “the capacity and opportunity to successfully integrate icts into the accomplishment of self or collaboratively identified goals”. hence, the interaction of the society via ict channels is levered by gurstein’s point. therefore, the implementation of effectively using icts would create “benefit” that could be derived from the precondition of “having access” to icts. enlarging the discussions to a broader understanding of the ict usage in the essences of commerce, citizen participation, education, and government can form new ways of social development and growth. 
the measures of icts include the number of mobile subscribers, main telephone lines, internet users, and fixed broadband subscribers. the mobile phone usage has been analyzed as the most evenly distributed ict service device across countries of various income levels, whereas the fixed broadband subscribers were found to be the most unevenly distributed [17]. the international telecommunication union offers a snapshot for dd among countries. each country is given a weighted score based on various ict indicators highlighted above. table 1 [16] captures the ten largest countries in latin america based on their digital opportunity scores. the higher the score a country is assigned, the easier it is for citizens to gain access to ict components. for latin american countries, the highest score (0.57) is given to chile, followed by argentina, whereas guatemala has the lowest score (0.37) preceded by ecuador and peru (both 0.40). considering the range of these scores (from highest 0.80-korea to lowest 0.03-niger), these selected countries fall fairly in the middle section. the average of these ten countries (0.45) is slightly higher than the world average score (0.40) and higher than the world median score (0.41-shared by tunisia, georgia, panama, ukraine, egypt, tonga). source: international telecommunication union (2007) perhaps asynchronously staged are the two concepts: digital diffusion and dd, where the former expresses the degree of concentration and spread of ict while the later offers insight into the gap evident in countries regarding the circulation of icts. the two concepts refer to the same issue, which is digitalization of economies, but the terminology addresses the concept from two perspectives. therefore, ict diffusion relies on the spread of information about the existence of a new technology and learning from experience, as noted by masfield [22]. also, the greater the number of adopters, the greater the probability that other users will be contaminated, leading to the further spread of information and an accelerated diffusion speed [19]. ii. literature review the literature on dd and ict measures can be classified into two streams of research: measuring dd across countries and explaining ict measures [7]. on one hand, some studies focus on measuring and quantifying the dd, considering the evolution of the digital gap, in particular [25] [11] [3] [35]. the multi-dimensional character of the dd has led to elaborate ict indexes to summarize information about the level of digitalization, such as the information society index, the networked readiness index, the digital access index, the digital opportunity index and the dd index [33]. for instance, corrocher and ordanini [11] combine six different dimensions of digitalization in an index to obtain several patterns of digitalization in ten developed countries. bagachi [3] creates an indicator including four different technologies, such as telephone (fixed and mobile), pc and internet usage to measure and analyze the divide both globally and in different groups of table i digital opportunity scores of largest ten countries in latin america chile 0.57 argentina 0.51 brazil 0.48 mexico 0.47 venezuela 0.46 colombia 0.45 dominican republic 0.42 peru 0.40 ecuador 0.40 guatemala 0.37 average 0.47 aziz bakay, collins e. okafor, and nacasius u. ujah 27 december 2010 the international journal on advances in ict for emerging regions 03 countries. 
vicente and lo’pez use ten variables, [35] including pc, telephone lines, broadband connections, and secure servers to determine the dd among 15 european countries. on the other hand, other empirical studies concentrate on explaining the determinants of ict adoption and diffusion [15] [18] [6] [10]. some researchers use cross-sectional data for a particular group of developed countries [15] [35], developing countries [30] [37] or both [6]. several studies extend the analysis to consider cross-sectional time series for developing countries [31] [29] [13], while others include a combination of developing as well as developed countries. in table 2 we reference scholarly works that have some empirical findings related to our initial rationale for this paper. iii. methodology to measure the impact ict diffusion in the selected countries in the latin america region, we analyzed the ict readiness levels on socio-economic indicators that might affect ict diffusion either positively or negatively. these variables include unemployment rate, gross domestic product growth, gdp per capita, ict expenditure percentage of gdp, urbanization, age dependency ratio, gini index (income inequality), adult literacy rate, hdi (human development index). the variables tested are defined below in table 3. a. the conceptual model there are varying methods used to evaluate the factors that impact ict diffusion, although very few have paid attention table ii brief summaries of extant literature bagchi (2005) studied digital distance and its determinants in terms of various indicators: economic, social, ethno-linguistic and infrastructural factors becchetti and adriani (2005) empirical results indicates that the level and rate of growth in income per worker predicatively explained by the role of the ict diffusion crenshaw & robison (2006) examined the diffusion of internet usage in numerous developing countries within the context of globalization, integrating the cost of particular ict usage, education and the extent of liberalization of the country kiiski & pohjola (2002) affirms that gdp per capita and internet access cost have been the significant explaining factors as opposed to the previous studies that found the competition in telecommunication as a significant explanatory variable udo et al. (2008) incorporated a qualitative approach covering four developing countries in order to determine the differences of ict diffusion venkatesh and brown (2001) using a qualitative survey approach, investigated the determinants of pc adoption by us households whether it pertains to utilitarian, hedonic or social outcomes. table iii definition of variables ict readiness (network readiness indexnri) degree of preparation of a nation or economy to participate in and benefit from ict developments. ict readiness is proxied by network readiness index of world economic forum. human development index (hdi) a summary composite index that measures a country's average achievements in three basic aspects of human development: health, knowledge, and a decent standard of living. health is measured by life expectancy at birth; knowledge is measured by a combination of the adult literacy rate and the combined primary, secondary, and tertiary gross enrolment ratio; and standard of living by gdp per capita (ppp us$). gdp per capita (gdp) a country’s productivity in terms of the value of goods and services produced per person. it is computed by dividing the overall gdp by the country’s population. 
it can also be related to productivity and efficiency. annual gdp growth rate (agdp) annual growth rate is the annual percentage change in the total annual output of a country's economy in constant prices. gdp is the total market value of all final goods and services produced in a country in a given year, equal to total consumer, investment, and government spending. age dependency ratio (% of working-age population) age dependency ratio is the ratio of dependents--people younger than 15 or older than 64--to the working-age population-those ages 15-64. data are shown as the proportion of dependents per 100 workingage population ict expenditure percentage of gdp (ict_exp) information and communications technology expenditures include computer hardware, computer software, computer services, and communications services, and wired and wireless communications equipment. adult literacy rate (alr) derived from hdi; comprises of adult literacy rates and the combined gross enrolment ratio for primary, secondary and tertiary schooling, weighted to give adult literacy more significance in the statistic. gini index (gini) measures the degree of income inequality in a society or country. the level of income inequality addresses the issue of fair or uneven income distribution in a given country or society. urbanization (urb) a process in which people migrate from smaller village or towns to bigger cities and suburbs in search of greener pastures. this process can be connected to industrialization. as a country gets more industrialized, more jobs and other opportunities are created for its citizens. unemployment (unrate) the percentage of those in the labor force who are unemployed helps measure a country’s economic position at a particular stage of study. this economic position includes insufficient effective demand for goods and services in the economy, thereby creating unemployment as firms and organizations result to downsizing. source: world econiomic forum (wef) & world development indicator (wdi) 28 aziz bakay, collins e. okafor, and nacasius u. ujah the international journal on advances in ict for emerging regions 03 december 2010 to the diffusion of ict in developing nations, like udo et al [4]. although udo et al [4] focused on qualitative analysis to show nation-specific impact, for the purpose of this study we have chosen to use regression models to quantitatively show to what extent these variables are related to ict diffusion in the selected latin american countries. other authors have focused more on developed societies. for instance, dedrick et al [39] used nation-specific analysis to investigate the factors accounting for the divergence of ict diffusion in nine developed countries. they concluded that various factors, like economic development, education system, infrastructures, investment in ict, and government policies, resulted in the differences in ict diffusion. eindor et al [40] found that unique factors like government policies of countries were attributed to degrees of ict diffusion. fig. 1. conceptual model in our study we pay attention to the raising impact of regionalization, looking at the factors that impact ict diffusion in latin american countries. in this paper we selected the ten biggest earning countries based on their gross domestic product. all the factors selected have been considered in previous research to impact ict diffusion, however, to what extent they are prevalent in developing countries is the primary focus of our study. 
as such, figure 1 below is our conceptual model outlining our dependent and independent variables derived from the earlier studies in this subject. our variables, which are categorized into demographic and economic variables, impact the readiness and diffusion of ict in each of these countries. the ict readiness, which is used as our dependent variable, is proxied by network readiness index. this index is pulled from the global information technology report, a world economic forum publication. the scores given to countries in this index underline the level of technological readiness and innovation that are essential engines towards growth needed in countries to prevail over economic crisis as well as position themselves for sustainability. b. analytic model in this study, we use specific information from latin american countries extracted from various international non-governmental agencies like the united nations development program, world development indicators from world bank group, and global technology report from the world economic forum. the purpose, here, is to regress and test these variables and analyze the magnitudes of their impact on nri. the latin american countries used in this study are mexico, brazil, argentina, colombia, venezuela, chile, peru, dominican republic, guatemala, and ecuador. prior to regressing these variables on the ict readiness scores, we ran a correlation matrix to confirm that these variables were actually not excessively correlated with our dependent variable. the table 4 demonstrates the correlation matrix; there is an exceptionally high correlation coefficient (0.920) between urbanization and adult literacy rates and their individual variance inflation factor (vif) mean scores were high as well. therefore, in order to address this issue, we created an artificial variable consisting of these two variables so that we could hinder bias of multicollinearity in our linear models. we named this new variable as alr_urb. our reasoning of this particular variable creation is as follows: in most of the developing countries in latin america, the proportion of literate people in the urban areas is higher compared to the proportion of literate people in rural areas. hence the higher the urban force and literacy level, the more ict diffusion would take place. thus, we conclude that these two variables are measuring highly similar component of the countries, which creates the need for eliminating one of the variable or merging both variables. the technique as stipulated above is derived from the concept of vif. across the correlation table, we see that gdp per capita as measured with the purchasing power parity does have the highest positive correlation with nri (network readiness index), hdi (human development index), urb (urbanization) and, alr (adult literacy rate), yet their correlations are not considered prohibitively high. hence, we regress all, although we were cautious because some of the variables maybe highlighted correlated with each other. our present study uses longitudinal (2004 – 2008) data for these countries. in running the test, we assume that: not all variables would impact ict readiness in the aforementioned countries, which led us to create four models. thus we could test and observe to what extent these variables have impact on ict diffusion. the possibility of a lag effect exists in which a dependant variable is correlated with values of the independent (delayed) variable. scholars that have used this method include [21] and [9]. 
although their studies were not based demographic variables adult literacy rate degree of urbanization age dependency ratio economic variables gpd per capita ict expenditure unemployment rate annual gdp growth gini index human development index ict diffusion aziz bakay, collins e. okafor, and nacasius u. ujah 29 december 2010 the international journal on advances in ict for emerging regions 03 on ict topics, we utilize their approach to reduce the effect of multicollinearity as well as endogeneity. as such, we decided to test our four regression models (given below) in three stages. first, we analyzed a situation in which there is no lag on the right-hand side as well as left hand-side variables. second, we take into consideration a one year lag effect, and third, we take into consideration a two year lag effect. in all our results, we must check for the presence of variance inflation factor and test for robustness. consequently, the linear regression models used to regress the disparity and diffusion of ict among latin american countries are as follows: model 1: ict_ri = β0 + β1 gdpi + β2 ict_expi + β3 alr_urbi (1) model 2: ict_ri = β0 + β1 gdpi + β2 ict_expi + β3 alr_urb + β4 agedepi (2) model 3: ict_ri = β0 + β1 gdpi + β2 ict_expi + β3 alr_urbi + β4 agedepi + β5 unratei (3) model 4: ict_ri = β0 + β1 gdpi + β2 ict_expi + β3 alr_urbi + β4 agedepi + β5 unratei + β6 agdpi + β7 ginii (4) where; ict_ri : information and communication technologies readiness in country i. gdp : gpd per capita. ict_exp : ict expenditure percentage of gdp. alr : adult literacy rate. urb : urbanization. alr_urb : artificial variable combination of adult literacy rate and urbanization. agedep : age dependency ratio. unrate : unemployment rate. agdp : annual gdp growth rate. gini : gini index. hdi : human development index. iv. results initially three stages of regressions were performed: a no lag, a one-year lag, and a two-year lag. we found that no lag and one-year lag regressions signaled higher power in the sense of statistical and economic significance of the variables as opposed to two-year lag models. we later, eliminate the two year lag effect from our regression results. we reveal our regression model (ols) results for each model considering the presence of lag effects at a one year and a no-lag effect on the variables. below is table 5, which illustrates the descriptive statistics of variables included in our study. to analyze the data, we perform an ordinary least squares (ols) regression analysis. ols generates estimates for unknown parameters by minimizing sum of squared residuals (distances between the observed values of the independent variables and predicted values from linear approximation). the regression coefficients give us a better insight into variables’ impact on ict readiness. all regression results are robust (to reduce the effect of outliers and variations), standardized, and constantly checked for high multicollinearity (to reduce the pressure of correlation among variables). we have utilized stata software for this regression. the results took into effect missing variables; thereby reducing our data size from fifty (50) to thirty (30) observations. to control for the multicollinearity, we tested our variables with the variance inflation factor (vif). this command generates vif scores for every variable that ranges from 1.00 to infinity, as well as a mean vif score. 
a vif score higher than 10.0 is recognized as an indication of multicollinearity that could hamper the purpose of the analysis. in our various models, the mean vif scores are less than 2.5, an acceptable level of multicollinearity. table iv correlation matrix ict_r hdi gdp agdp agedep ict_r 1.000 hdi 0.153 1.000 gdp 0.423 0.850 1.000 agdp -0.212 0.128 -0.119 1.000 agedep -0.461 -0.271 -0.585 0.027 1.000 ict_exp 0.148 0.188 -0.056 -0.120 0.291 alr 0.108 0.649 0.629 0.348 -0.733 gini -0.023 -0.603 -0.621 -0.255 0.174 urb 0.083 0.701 0.642 0.475 -0.557 unrate -0.131 -0.158 -0.265 0.456 -0.140 it_exp alr gini urb unrate ict_exp 1.000 alr -0.013 1.000 gini 0.404 -0.388 1.000 urb -0.033 0.920 -0.598 1.000 unrate 0.298 0.366 0.356 0.363 1.000 table v descriptive statistics mean std. dev. min max ict_r 1.86 2.07 -1.08 3.91 hdi 0.82 0.026 0.79 0.86 gdp 9707.84 2770.67 5806.61 14495.3 3 agdp 6.80 3.29 1.77 18.28 agedep 57.64 2.52 53.62 63.18 ict_exp 4.42 0.84 3.43 6.14 alr 91.63 4.46 84.19 97.64 gini 51.73 4.49 43.44 58.66 urb 78.32 10.63 62.94 93.32 unrate 9.37 3.49 3.16 15.02 30 aziz bakay, collins e. okafor, and nacasius u. ujah the international journal on advances in ict for emerging regions 03 december 2010 our regression results are shown in table 6a and 6b. in both no lag and one-year lag regressions, gdp per capita has been found to be significant in 4 out of 8 models, and the variable has positive sign in 6 out of 8 models given below (table 6a and 6b). software produced positive coefficient for annual gdp growth having no statistical significance. gini index had negative coefficients in both regressions having no statistical significance. unemployment rate had positive coefficient in 3 out of 4 models below. although its direction confirms with the intuitive interpretation of the model, none of them is statistically significant. results suggest that socio-economic variables like, expenditure on ict (ict_exp) and age dependency ratio (agedep) of the selected latin american countries, do impact ict. both indicators are statistically significant and their coefficients are attributed vital impact. an increase in age dependency ratio (coefficients range from -0.5557 to 1.0354) consequently showed a negative impact on ict readiness in the selected lacs. all coefficients of age dependency ratio are highly significant (1% level). in oneyear lag regression, coefficient for this variable in model 4 is (-0.7326). therefore, an increase in ict expenditure (coefficients range from 0.4272 to 1.8184) positively impacted the ict readiness in the selected lacs. in oneyear lag regression, coefficient for this variable in model 4 is (1.3007) and it is significant under 1% level. the combination of adult literacy and urbanization (alr_urb) manifests level of significance (5 out of 8 models). in model 4 of the one-year lag regression, software generated (0.2308) as the coefficient of alr_urb signifying a negative association compared to slightly larger coefficient of the same variable (-0.4479) in model 4 of the no lag regression. considering the overall explanatory power of the models, r-squared scores are 0.6366 and .6084 in no lag and oneyear lag regressions respectively. as the additional variables are included in the models r-squared scores improve. to further strengthen this paper, we decided to run a stepwise regression. although this method is mostly used on the onset of knowing which variable would impact a dependant indicator, we decided to use it at the confirmatory stage. 
thus, it will serve to reaffirm that the indicators tested above were actually impacting the left-hand side variable. the two stages were also applied; hence, we generated a table for easy comparison of all regression results as shown in table 6. table 6a and 6b show results considering the no-lag effect, one year lag, two year lag. below are results of the stepwise regression (table 7). this method included the hdi index in addition to the aforementioned variables that are attributed statistical significance in the no lag model. hence our regression model changed from the initial to the table below. the stepwise regression created similar outputs compared to the previous models: larger coefficients for ict_exp, negative but larger coefficients of agedep, positive coefficient of annual gdp growth. gdp per capita, combination of adult literacy rate and urbanization, and gini index have never found to be as much contributive in our models, hence they were not assigned any coefficient. human development index has a positive and significant (1% level) coefficient. the rsquared of the stepwise regressions are higher than 0.5 in both models. therefore, all of the coefficients are significant at least 10% level which is the principle of stepwise regression that we utilized. table via no lag registration results (dependant variable is ict readiness index) model 1 model 2 model 3 model 4 gdp 0.0005*** 0.0003* 0.0003 0.0003 ict_exp 0.4272 0.8649** 0.8461* 1.8184** alr_urb -0.0887 -0.1685*** -0.1734* -0.4479*** agedep -0.5609*** -0.5557*** -1.0354*** unrate 0.0129 0.1488 agdp 0.2452 gini -0.3162 n 30 30 30 30 constant 2.8727 41.5392*** 41.4705*** 101.5646*** r-squared 0.2668 0.4959 0.4960 0.6366 significance levels: 1% (***), 5% (**), 10% (*). table vib one year lag regression results (dependant variable is ict readiness index) model 1 model 2 model 3 model 4 gdp 0.0004*** 0.0003* 0.0000 0.0000 ict_exp 0.6210* 0.8661** 1.0642** 1.3007*** alr_urb -0.0690 -0.1494** -0.0642 -0.2308* agedep -0.5505*** -0.5795*** -0.7326*** unrate -0.1766 0.0100 agdp 0.0627 gini -0.2083 n 30 30 30 30 constant 0.8652 40.2880*** 38.2972*** 68.1663*** r-squared 0.3032 0.5221 0.5456 0.6084 significance levels: 1% (***), 5% (**), 10% (*). table vii stepwise regression results (no lag, one year lag and two years lag models) no lag one yr lag gdp ict_exp 1.4604*** 1.2623*** alr -1.0417*** -0.7442*** urb 0.1391** 0.1898*** alr_urb agedep -1.4458*** -1.010*** unrate -0.1400** agdp 0.1813* gini hdi 38.9992*** constant 129.9856*** 110.0959*** r-squared 0.7303 0.7123 significance levels: 1% (***), 5% (**), 10% (*). aziz bakay, collins e. okafor, and nacasius u. ujah 31 december 2010 the international journal on advances in ict for emerging regions 03 excluding the stepwise regression model, we see that the presence of lag-effect does not really exist when regressing for ict diffusion in latin american countries. this result could be due to the following reason; literacy rate, urbanization, and age dependency ratio are not influenced by diffusion, ict could be an enhancer to these variables. however, we can state finitely that ict expenditure (has a positive correlation coefficient) and the age dependency ratio (has a negative correlation coefficient) does have strong correlation in explaining the level of ict diffusion in the selected latin american countries. v. implications using ict readiness index scores as a proxy for ict diffusion, we align policy implication from the statistically significant variables in table 6b model 4. 
although the output of table 6a results are more linear compared to table 6b when considering the r-square, we feel that socioeconomic indicators captured in this paper as proposed by scholars like bagchi [3] and becchetti and adriani [5] their effect would proceed to ict diffusion. for instance, it expenditure in the selected latin american countries would aide towards the diffusion of ict among its citizens, as such the economic substance of the statistical significant variables are enumerated from table 6b. from both tables 6a and 6b, we find that it expenditure and age dependency ratio are statistically significant. following economic substance deduction as suggested by [23], we make our posit from table 6b that an increase in one standard deviation of ict expenditure and age dependency ratio in these selected latin american countries would impact ict diffusion in these countries by 0.5874% and -0.9926% respectively. these numbers in table 8 were generated by applying the following computation adopted from [23]. [(coefficient* std)/mean] = economic significance (5) therefore, in précis we suggest that these countries should increase their investment in ict, and economically create jobs that would reduce the gap of the dependency ratio to their working population. in table 5, we find that the average unemployment rate of the ten selected latin american countries is 9.37. we can also conclude that an increase in ict diffusion would positively impact the countries by creating jobs either via ict investment or other industrial channels; this would in turn lead to a substantial improvement in age dependency ratio. while this paper does give an overall perspective to the economic situations within these economies, we cannot categorically infer which sector or industry each country should further explore. also, the amplitude for investing in ict cannot specifically address what types of ict procurement and service should these countries undertake. as such, further research in this area would need to be undertaken. vi. limitations and future research a sample size of ten countries in latin america was used in this research. these countries were chosen in terms of their gdp and populations size. the selected countries might not have been a good representation of latin america as a whole, which might be viewed as a limitation or constraint. the data collection process was hampered by the unavailability of data for certain periods. this might have slightly thrown off the absolute validity of the data in terms of the variable computation. some countries had missing data for some of the years being studied. even though, based on our analysis, the missing data do not pose a significant treat to the overall result, awareness of this constriction is vital for future research. the time periods observed for the purpose of this research are 2002-2008. for future research, an extended period would probably yield a more grounded result. for example, unforeseen events like natural disasters, terrorism and other economical and political events might significantly swerve the behavioral dynamics of a country’s stability. if so, such events should be evaluated and accounted for when compiling and deciding which countries would better represent an entire region. also, perhaps it may be best to perform a country analysis on the variables as one considers all latin american countries. focusing on each country at a time would clearly indicate which variable(s) do affect the diffusion of ict in these countries. 
for the purpose of this study, we contained our research within the boundaries of latin america; a comparison of different geographic regions would also be an interesting idea for future research. this would expand the criteria for ict infusion determinant, as differences in culture and governmental policies would greatly differ when addressing vastly divergent societies. another idea would be to use countries that share similar economic traits like developing, emerging and developed countries. the combination of countries from these economic classifications might result in an association disparity. vii. discussion and conclusion this paper examines the threads of association between the levels of ict in countries with analytical insights into the economic, social and demographic frameworks. researchers and scholars have concluded that ict level is a function of economic development and infrastructure of a country, as well as a driver for competitiveness. the table viii economic significance coefficient (ict_exp) std (ict_exp) mean (ict_r) economic significance (%) 1.3007 0.84 1.86 0.587412903 -0.7326 2.52 1.86 -0.992554839 32 aziz bakay, collins e. okafor, and nacasius u. ujah the international journal on advances in ict for emerging regions 03 december 2010 macroeconomic variables that we have operationalized in this study have helped in explaining the ict phenomenon in latin american countries. in most of the regression analysis, gdp per capita and it expenditure divided by gdp are found to be significant in explaining the variance in the ict scores of countries. these results are supporting the extant literatures like [3] [2] [7] implying the importance of gpd per capita. although our statistical significant findings are in line with the above mentioned scholars, we go further to discuss the economic contribution of significant variables. in this paper, we find that for every $3,333 increase in the income per person in a country, it would lead to approximately a one point increase in the network readiness score of the country. also, all things being equal, the expenditure on information technology as it relates to gdp or any other variable should be increased. an increase in expenditure would impact ict diffusion in latin american countries by an average of forty-eight percent, allowing infrastructural improvements and enhancing both individual and business usage of ict goods and services across the country. the degree of age dependency ratio (a social factor) is another significant variable posit a negative association with ict level. the rationale behind this is based on the following argument: the younger the labor force, the more association and assimilation with connection and communication through ict, thus, the increase of network, which could translate to more social interaction. therefore, the individual and business usage of mobile phone, pc, internet, etc. increases as the level of age dependency ratio decreases. as mentioned in the methodology section of this paper, the construct “a” was generated to control for the multicollinearity between urbanization and literacy variables. therefore, in most of the regression analyses `a` construct was found to capture the significant portion of the dependent variable variance. we have concluded that the country having more urbanization and literacy creates more access to and opportunities of ict for its citizens. acknowledgment the authors would like to thank four anonymous reviewers for their critique and suggestions. 
references [1] asfaw, m., & korrapati, r. (2006). information and communication technologies with ethopia: sociopersonal factors affecting adaptation and use. allied academies international conference. academy of information and management sciences. proceedings, 10(2), 21-27. [2] bagchi, k., hart, p. & peterson, m. f. (2004). national culture and information technology product adoption. journal of global information technology management, 7(4), 29-46. [3] bagchi, k. (2005). factors contributing to global digital divide: some empirical results. journal of global information technology management, 8(3), 47-65. [4] bagchi, k. & udo, g. (2007). empirically testing factors that drive ict adoption in africa and oecd set of nations. issues in information systems, 8(2), 45-52. [5] becchetti, l. & adriani, f. (2005). does the digital divide matter? the role of information and communication technology in crosscountry level and growth estimates. economics of innovation & new technology, 14(6), 435-453. [6] beilock, r., & dimitrova, d. v. (2003). an exploratory model of inter-country internet diffusion. telecommunications policy, 27(3–4), 237–252. [7] billon, m; marco, r; & lera-lopez, f. (2009). disparities in ict adoption: a multidimensional approach to study the cross-country digital divide. telecommunication policy, 33(10), 596-610. [8] the digital divide (n.d.) the digital divide. retrieved march 31, 2010 from bridge the digital divide website: http://www.bridgethedigitaldivide.com/digital_divide.htm [9] caetano, jose; and caleiro, antonio. (2009). is there a relationship between transparency in economic and political systems and foreign direct investment flows? icfai journal of applied economics, 8(2), 45 – 58. [10] chinn, m. d., & fairlie, r. w. (2007). the determinants of the global digital divide: a cross-country analysis of computer and internet penetration. oxford economic papers, 59(1), 16–44. [11] corrocher, n., & ordanini, a. (2002). measuring the digital divide: a framework for the analysis of cross-country differences. journal of information technology, 17(1), 9–19. [12] crenshaw, e. m. & robison, k. k. (2006). globalization and the digital divide: the roles of structural conduciveness and global connection in internet diffusion. social science quarterly, 87(1), 190-207. [13] dasgupta, s., lall, s., & wheeler, d. (2005). policy reform, economic growth and the digital divide. oxford development studies, 33(2), 229–243. [14] gurstein, m. (2003). effective use: a community informatics strategy beyond the digital divide. first monday, 8(12). retrieved from: http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/vie w/1107/1027 [15] hargittai, e. (1999). weaving the western web: explaining differences in internet connectivity among oecd countries. telecommunications policy, 23(10–11), 701–718. [16] itu (international telecommunication union) & unctad. world information society report 2007: beyond wsis. geneva: itu, 2007. [17] itu. report on world summit on information society stocktaking. geneva: itu, 2008. [18] kiiski, s. & pohjola, m. (2002). cross-country diffusion of the internet. information economics and policy, 14(2), 297-310. [19] karshenas, m., & stoneman, p. (1995). technological diffusion. in p. stoneman (ed.), handbook of the economics of innovation and technological change (pp. 265–297). oxford: blackwell. [20] kiiski, s., & pohjola, m. (2002). cross-country diffusion of the internet. information economics and policy, 14(2), 297–310. [21] lo, a. w., and mackinlay, a. 
c. (1990). when are contrarian profits due to stock market overreaction? review of financial studies, 3(2), 175 – 205. [22] mansfield, e. (1968). industrial research and technological innovation. new york: norton. [23] miller, j. e. & rodgers, y. van der m. (2008). economic importance and statistical significance: guidelines for communicating empirical research. feminist economics, 14(2), 117 – 149. [24] mutula, m.s., and brakel, p.v. (2007). ict skills readiness for the emerging global digital economy among the smes in developing economies: case study of botswana. library hi tech, 25(2): 231245. [25] oecd. working party on indicators for the information society. dsti/iccp/iis4. paris: oecd, 2005. [26] oecd. the economic impact of ict: measurement, evidence and implications. paris: oecd, 2004. [27] oecd. ict and economic growth: evidence from oecd countries, industries and firms. paris: oecd, 2003. [28] oecd. the economic and social impact of electronic commerce: preliminary findings and research agenda. paris: oecd, 1999. http://www.bridgethedigitaldivide.com/digital_divide.htm http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1107/1027 http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1107/1027 aziz bakay, collins e. okafor, and nacasius u. ujah 33 december 2010 the international journal on advances in ict for emerging regions 03 [29] oyelaran-oyeyinka, b., & lal, k. (2005). internet diffusion in subsaharan africa: a cross-country analysis. telecommunications policy, 29(7), 507–527. [30] quibria, m. g., shamsun, a. n., tschanh, t., & reyes-macasaquit, m. (2003). digital divide: determinants and policies with special reference to asia. journal of asian economics, 13(6), 811– 825. [31] tanner, e. (2003). bridging latin america’s digital divide: government policies and internet access. journalism mass communication quarterly, 80(3), 646–665. [32] udo, g.; bagchi, k. k. & kirs, p. j. (2008). diffusion of ict in developing countries: a qualitative differential analysis. journal of global information technology management, 11(1), 6-27. [33] vehovar, v., sicherl, p., hu¨sing, t., & dolnicar, v. (2006). methodological challenges of digital divide measurement. the information society, 22(5), 279–290. [34] venkatesh, v. & brown, s. a. (2001). a longitudinal investigation of personal computers in homes: adoption determinants and emerging challenges. mis quarterly, 25(1), 71-102. [35] vicente, m. r., & lo´pez, a. j. (2006). a multivariate framework for the analysis of the digital divide: evidence for the european union15. information and management, 43(6), 756–766. [36] wef (world economic forum). the global information technology report 2008–2009. geneva: wef, 2009. [37] wong, p. k. (2002). ict production and diffusion in asia: digital dividends or digital divide? information economics and policy, 14(2), 167–187. [38] world bank. information technologies and development, geneva: world bank, 2007. [39] dedrick, j.l.; goodman, s.e. & kraemer, k.l. (1995). little engines that could: computing in small energetic countries. communications of acm, 38(5), 21-26. [40] ein-dor, p.; myers, m.d. & raman, k.s. (1997). information technology in three small developed countries. journal of management information systems, 13(4), 61-89. 
ieee paper template in a4 (v1) international journal on advances in ict for emerging regions 2022 15 (3): december 2022 international journal on advances in ict for emerging regions neural machine translation for sinhala-english code-mixed text archchana kugathasan#1, sagara sumathipala abstract— multilingual societies use a mix of two or more languages when communicating. it has become a famous way of communication in social media in south asian communities. sinhala-english code-mixed texts (scmt) are known as the most popular text representation used in sri lanka in the informal context such as social media chats, comments, small talks etc. the challenges in utilizing the scmt sentences are addressed in this paper. the main focus of this study is translating code-mixed sentences written in sinhala-english to the standard sinhala language. since sinhala is a low-resource language, we were able to collect only a limited number of scmtsinhala parallel sentences. creating the parallel corpus of scmt-sinhala was a time-consuming and costly task. the proposed architecture of neural machine translation(nmt) to translate scmt text to sinhala, is built with a combination of normalization pipeline, long short term memory(lstm) units, sequence to sequence(seq2seq) and teachers forcing mechanism. the proposed model is evaluated against the current state-of-the-art models using the same experimental setup, which proves the teacher forcing algorithm combined with seq2seq and normalization improves the quality of the translation. the predicted outputs from the model are compared using the bleu (bilingual evaluation understudy) metric and our proposed model achieved a better bleu score of 33.89 in the evaluation. keywords— neural machine translation, lstm, seq2seq, sinhala-english code-mixed i. introduction code-mixing has been a practice in multilingual communities. in a given sentence, if the elements of one language such as terms, morphemes and words are mixed with the elements of another language, it is called as code-mixing. lexicon and syntactic formulation from two different languages are combined to generate a code-mixed sentence [1]. the communities which use more than one language for communication are called multilingual communities. most srilankans are multilingual people who speak sinhala-english, tamil-english, malay-english, etc. several research studies have proven that multilingual communities use online social media as the chosen platform to express their opinions and feelings [2]. posts, comments, reviews etc., are considered usergenerated texts in social media. information extraction from user-generated text has great demand when it comes to business. analysing the sentiment, extracting the entities, correspondence: archchana kugathasan (e-mail: archchanakugathasan@gmail.com) received: 20.12-2021 revised:07-11-2022 accepted: 14-11-2022 archchana kugathasan, from sri lanka institute of information technology and sagara sumathipala from university of moratuwa, sri lanka (archchanakugathasan@gmail.com, sagaras@uom.lk ). doi: http://doi.org/10.4038/icter.v15i3.7250 © 2022 international journal on advances in ict for emerging regions identifying the user interest and providing personalized content for users has become a trending protocol followed when it comes to business marketing strategies using social media [3, 4]. code-mixing has been identified as a barrier on utilizing user-generated texts for processing due to the mixing of languages. 
the need for the translation of code-mixed texts into a standard language has existed for a long time. due to the increasing usage of scmt in social media, there is now a huge demand to translate scmt into the sinhala language. the focal point of this research study is to translate a sinhala-english code-mixed (scm) sentence into a sinhala sentence. currently available translation systems are not very successful in translating code-mixed texts into a standard language [5]. a sinhala-english code-mixed sentence has the syntax of the sinhala language but borrows some vocabulary from english. figure 1 shows an example of sinhala code-mixed text, where the word 'price' is an english word and 'eka' and 'wadi' are transliterated sinhala words. transliteration is the process where a word from one language is represented using the alphabet of another language; the words 'eka' and 'wadi' are words from the sinhala language written with the english alphabet.

fig. 1 example of sinhala-english code-mixed text with language tags

translating scmt into sinhala is a formidable task. the major challenge is that the implementation of a machine translation system needs a parallel corpus [6]. this sort of dataset is typically available for standard languages, but for scmt there is no available data resource. due to this issue, an scmt-sinhala parallel corpus is built in this study. also, this paper presents a detailed analysis of scmt and proposes an approach to using and adapting the prevailing models with the goal of translating scmt into the sinhala language. the basic architecture of the proposed model is a neural network model which combines normalization, seq2seq, lstm and the teacher forcing mechanism [7]. lstm is very successful in its capability to learn temporal dependencies [8]. the seq2seq model is chosen because it can map source and target sentence sequences of different lengths to each other [9]. the teacher forcing mechanism is applied in the decoding phase of the seq2seq model to speed up training and reduce prediction errors. finally, the inference model predicts the sinhala translation for a given scmt sentence. the bleu evaluation metric is used to evaluate the model.

this is an open-access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

the rest of the paper is organized as follows: section ii surveys the groundwork of the research area, namely normalization and machine translation. section iii discusses code-mixing in sri lanka, providing details about the challenges in scm sentences and the usage of code-mixed text in sri lanka. section iv discusses the parallel corpus preparation and its features. sections v and vi include details such as the system architecture, model, experimental setting and the obtained results. section vii describes the evaluation study and discusses the results, and section viii concludes the paper.

table i
survey result: usage of sinhala-english code-mixed text

question: which communication method do you use most often when communicating through text on social media or other online platforms?
- using sinhala-english code-mixed text in social media: 85.2%
- using the native language in social media: 8.5%
- other: 6.3%

question: what is the main reason for using sinhala-english code-mixed text?
- easiness/flexibility with the keyboard: 78.0%
- interest in using sinhala-english code-mixed text: 12.2%
- other: 10.0%

question: on what kind of platforms do you use sinhala-english code-mixed text?
- social networking sites (facebook, twitter, instagram etc.): 59.8%
- chat applications (whatsapp, viber, emo etc.): 93.9%
- community blogs: 8.5%
- discussion forums: 7.3%
- other: 1.2%

ii. related work

a. normalization of code-mixed text

the rapid growth of user-generated texts in social media attracts researchers to the normalization domain. normalization of code-mixed texts could help models improve their accuracy. the first corpus for normalization was introduced by wong and xia (2008) [10]; a source channel model, which finds the most suitable translation based on probability and phonetic mapping, is used to normalize the corpus text. this model was improved by xue et al. (2011) into a multi-channel model that considers a phonetic factor, an orthographic factor, acronym expansion, and a contextual factor [11]. two approaches were proposed by mandal et al. (2018) [12] to convert phonetically transliterated text to standard roman transliteration. a sequence to sequence (seq2seq) model with an rnn (recurrent neural network) and long short term memory (lstm) is used in the first approach for the conversion; the second approach is based on string matching using levenshtein distance [13]. the first approach provided better accuracy than the second for the code-mixed text normalization task. singh et al. (2018) [14] proposed a skip-gram [15] edit distance [16] method to normalize anomalies of code-mixed text such as spelling variations and grammatical errors. the skip-gram approach builds a similarity metric from the context of a word in a given semantic space; using this metric, the most frequently used word is substituted for variations of the same word, which normalizes the data and reduces the noise. barik et al. (2019) [17] introduce a normalization approach with language identification using a crf (conditional random field) and lexical normalization that replaces oov (out of vocabulary) tokens with their standard tokens from a dictionary. lourentzou et al. (2019) [18] and dirkson et al. (2019) [19] proposed character-based and word-based normalization approaches for out of vocabulary (oov) words. arora and kansal (2019) [20] used a convolutional neural network (cnn) model with character embeddings to normalize unstructured and noisy texts from social media; a similar approach was followed by kayest and jain (2019) [21] and liu et al. (2021) [22].

b. machine translation

the importance of machine translation (mt) has increased because of the high demand for translation in overseas businesses, military services, profitable customers speaking different languages, and valuable social media content for business development.
neural machine translation(nmt) is the currently trending domain in machine translation. recurrent neural network [23], seq2seq approach [8], attention based nmt [24] are considered trending approaches for nmt neural machine translation for sinhala-english code-mixed text 62 december 2022 international journal on advances in ict for emerging regions table ii challenges identified in sinhala-english code-mixed sentences sinhala-english code-mixed sentence(scmt) sinhala sentence english sentence identified issues in scmt kama vry gd කෑම ග ොඩොක් ග ොඳයි food is very good spelling error the words ‘vry’ and ‘gd’ represents the english words ‘very’ and ‘good’. mama wathura bonawa මම වතුර ග ොනවො i drink water inconsistent phonetic transliteration the same sentence is written in different patterns. the word ‘water’ is represented as ‘vathura’,‘wathura’ and the word ‘drinking’ is represented as ‘bonawa’,‘bonawaa’. mama vathura bonawa mama vathura bonawaa 4to gaththa ඡායා රූප ගත්තා took photo the use of special characters and numeric characters the word ‘4to’, it absorbs the phonetic sound of word ‘four’ and combines it with the word ‘to’, together it represents the phonetic sound of photo. service eka hondai ස ේවාව ස ාඳයි service is good borrowing of words the sentence starts with an english ‘service’ and suddenly switches to sinhala transliterated words ‘eka’ and ‘hondai’. teacherla hamoma enna ගුරුවරුන් ැසමෝම එන්න all the teachers are requested to come integration of suffixes the word ‘teachers’ is an english word which is a singular noun and the suffix ‘la’ is in the transliterated form taken from sinhala. together the word stands for the meaning ‘teachers’ which is plural. niyama kama so ayeth kanna hithenava නියම කෑම ඒ නි ා ආසයත් කන්න හිතනවා great food, so like to eat again switching for discourse marker in this sentence, an english discourse marker 'so' is used to join the two sinhala transliterated sentences. many studies have been carried on translation based on monolingual datasets. gulcehre et al.(2015) [25] present two methods, shallow and deep fusion to combine language models with neural machine translation(nmt) techniques. sennrich et al. (2016) [26] proposed two techniques to use monolingual data for translation. to fix the encoder and attention model parameters when training, the monolingual dataset is matched with dummy inputs in the first approach. the second approach suggested is using a model trained on a parallel corpus with neural translation techniques for monolingual translation. cheng(2019) [27] proposed a semisupervised approach for monolingual machine translation by combining labelled and unlabelled corpus. labelled corpus is parallel language corpus and unlabelled corpus is monolingual corpus. there are multilingual nmt models available where a single model supports translating from multiple source languages to multiple target languages. these systems inspire knowledge translation among language pairs[28, 29], zeroshot translation(direct translation among a language pair that has never been used in the training phase) [30, 31, 32, 33] and enhance translation of low resource language pairs[34, 35]. rather than these benefits, multilingual nmt systems show poor performance [32,34] and bad translations when accommodating many languages [36]. zhanget al. (2020) [37] propose an improved nmt model where a normalization layer and linear transformation layers are used to overcome the representation issue of other multilingual nmt models. 
also, the research study [37] addresses how the output from multilingual nmt models are affected by the unavailability of the parallel corpus. a random online back translation approach(robt) is proposed to overcome the issue of unseen december 2022 international journal on advances in ict for emerging regions table iii sample sentences from the annotated corpus; an1 – annotator 1, an2 – annotator 2 sinhala-english code-mixed sentence sinhala sentence translated by human translator an1 an2 alternate translation by annotator1 alternate translation by annotator2 finalized translations by the translator gaana wadi ගාන වැඩියි fc fc n/a n/a n/a price ekata shape wenna hoda rasata kama hambenawa මිලට රියන්න ස ාඳ සට ේි කෑම ම්සෙනවා fc cr n/a මිලට රියන්න ස ාඳ ර වත් කෑම ම්සෙනවා මිලට රියන්න ස ාඳ ර වත් කෑම ම්සෙනවා calm place ekak, enjoy kranna puluwn කාම් තැනක් , එන්සජෝයි කරන්න පුළුවන් cr cr න්ුන් තැනක් , විසනෝද කරන්න පුළුවන් න්ුන් තැනක් , එන්සජෝයි කරන්න පුළුවන් න්ුන් තැනක් , විසනෝද කරන්න පුළුවන් singappooru kola kiyalai api kiyanne me gedi hedena gahata සිංගපූරු සකෝලා කියලයි අපි කියන්සන් සම් සගඩි ැසදන ග ට fc fc n/a n/a n/a parking loku aulak na. පාකින් සලාකු අප ු නැත cr fc වා න නැවැත්ීසම් සලාකු අප ු නැත n/a වා න නැවැත්ීසම් සලාකු අප ු නැත when it comes to code-mixed languages, the translation domain consists of only very few research. carrera et al. (2009) [38] introduce a qualitative study on the combined codeswitched corpus from social media. according to the study, hybrid models combined with statistical modelling [39] and the knowledge translation approach [40] achieved comparatively good translation. in the code-mixed machine translation model introduced by rijhwani et al.(2016) [6], the dominant language in a sentence is called matrix language. the non-dominant language is called an embedded language. the initial task in this model is word-level language identification and matrix language detection. then the data is applied to a current translator to translate code-mixed tweets to the language of the user’s choice. an augmentation pipeline for code-mixed text machine translation is proposed by dhar et al. (2018) [5]. they introduce a parallel corpus with code mixed hindi-english sentences as source sentences and english sentences as target sentences. the pipeline includes language identification, matrix language identification, translation to matrix language, and translation to the target language. the final output from the model would be translated monolingual sentence. the augmentation pipeline is applied with current translation models such as google’s neural machine translation system (nmts) [41], moses [42] and bing translator. each of these models provided an improved bleu score when the augmentation pipeline is added in the pre-processing phase. masoud et al. (2019) [43] introduced a back translation model for tamil-english code-switched text. baseline, monolingual and hybrid approaches are used to evaluate the system. the back-translated approach gave the highest bleu score of 25.28 for the code-switched sentences. iii. code-mixing in sri lanka kachru (1986) [44] explains the necessity of english in south asia in his research study. many former angloamerican colonies have been identified with english language varieties, which is called a deviation from standard english to the later development world. according to his observation in south asia, the english language is considered as a sign of ‘modernization’, ‘achievement’ and ‘strength’. 
he defines code mixing as a highlight of modernization, social and economic status and membership in an aristocratic society. the widest code-mixing range is identified with the english language. the main reason for code-mixing in sri lanka occurred due to the colonization of the british. sri lanka acknowledges sinhala, english and tamil as the formal languages used for official activities. sri lanka mainly has two code-mixed language categories: sinhalaenglish and tamil-english, but there is no mixing between sinhala and tamil languages. people have massively adopted internet usage in the 21st century. code-mixed texts are adapted to the vocabulary and grammar of languages used by the particular bilingual or multilingual user. the structure of code-mixed text used is depended on the individuals [45]. the sinhala language has a base of brahmi script in its ornamentation of writing. according to the unicode standard, 41 consonants, 18 vowels, and 2 half vowels altogether 61 characters are there in the latest sinhala alphabet [46]. even though there are 61 letters, the language has only 40 different sounds represented by those letters [47]. sinhala-english codemixing originated from the multilingual society of sinhala english speaking people. srilankans use scm as one of the main communication languages in social media. it has become very popular among the younger generation of the 21st century. we conducted a survey study for identifying the necessity of translation of sinhala code-mixed text. according to a recent research study on social media usage, users aged 20-29 are 32.2% of the whole social media users[56]. to identify the extent of usage of the code-mixed text in sri lankan social media, we decided this specific age group would be more appropriate to collect reliable data as they are the most active age group of social media. 82 individuals participated in this survey study who are native sinhala language speakers and aged between 20-29. according to the survey result shown in table i, 85.2% of people have stated as using code-mixed text for writing in social media rather than the native language. increased usage of scmt increases the demand for processing the scmt. the best way to use the code-mixed text is to translate the text into a standard language so the data could be easily used for machine learning tasks such as recommendations, sentiment analysis, entity extractions etc. neural machine translation for sinhala-english code-mixed text 64 december 2022 international journal on advances in ict for emerging regions in scmt, there are several challenges in representing the text: spelling errors, integration of suffixes, the usage of special and numeric characters in the text, borrowing words from another language, combining languages, switching of discourse markers and inconsistent phonetic transliteration. table ii provides a detailed description of challenges in sinhala-english code-mixed text with examples. due to different patterns of scmt, it is difficult to translate scmt without a parallel corpus. iv. corpus creation most machine translation systems need a remarkable number of parallel sentences to accomplish a good outcome. our study required creating a parallel corpus with parallel sentences of scmt and sinhala text. to achieve this goal, scm (sinhala-english code-mixed) sentences were gathered from social media. 5000 scm sentences are used to create the parallel corpus. 
after the extraction process, each scm sentence in the corpus is human translated into sinhala sentences with the help of a human translator, who is a sinhala native speaker. the translator followed the mapping proposed in the research study of kugathasan and sumathipala et al. (2020) [48] for the manual human translation process. thus, the scm sentence is the source sentence, and the sinhala sentence is the target sentence. the translated dataset is validated using the crowd sourcing method [49]. using the crowd sourcing technique in our research aims to discriminate good translations from bad ones. we split our corpus into groups of 15 where each annotator gets approximately 300 sentences and each group had a number of 2 annotators who are sinhala native speakers, bilingual and good in english. the reviewers were instructed to make sure that their sinhala translation: does not have any spelling errors, and should be grammatically correct and natural-sounding sinhala. the annotators judge the translated sinhala sentences into two categories. fully correct(fc) and change required(cr). if the sentence is labelled with cr, then an alternative sinhala translation would also be provided by the same annotator. the alternative sentence provided for each scm sentence was a more fluent and grammatically correct sinhala sentence. when there are contradictory tags by annotators for a specific translation, only the alternative translation with the cr tag is considered. when both annotators have annotated with cr tag, the best alternate provided is selected by the human translator who worked in the initial phase of creating the corpus. some annotated sample sentences from the corpus are shown in table iii. after correcting the alternatives, the corpus is updated with the corrections. we randomly choose 100 translated sentences, provided them to the linguistic experts of sinhala language, and asked them to rank the translation good or bad, considering the following factors: spelling errors, the grammatical pattern of the sentence, and meaningful translation. in the ranking process, we gained judgments from three different linguists. each translation has 3 rating labels from two categories. we used fleiss’ kappa method [50, 51] to measure the reliability of the agreement between the raters while assigning a rating for the translated sentences. the fleiss’ kappa score received for the translation of secm to sinhala is 0.88, which is almost near to full agreement for the translated sentences are correct. v. system architecture the mt model proposed in this study is an adopted and enhanced approach to the research work of sutskever et al. (2014) [8]. the model consists lstm, seq2seq, teachers forcing mechanism and a normalization pipeline to translate the code-mixed text. a. sequence to sequence(seq2seq) seq2seq approach introduced by sutskever et al.(2014) is a model with the goal of mapping the input sequence with a fixed length to an output sequence with fixed length even though the input and output lengths are different. for example, “did you eat?” in english has three words as input and its output sentence in sinhala “ඔයා කෑවද?” has two words. in this approach sequence of source sentences is matched with the sequence of the target sentence[20]. in this machine translation model, source sequence would be the input and target sequence would be the output. seq2seq model is also called as encode-decoder framework as shown in figure 3. source language is read and used as the input to the encoder. 
a context vector which can also be called the hidden state is created with the encoder by encoding the input data into a realvalued vector. word-by-word encoder reads the input sequence. meaning of the input sequence encoded into a single vector. the outputs gained from the encoder are discarded and only the hidden states have proceeded as the inputs to the decoder. the decoder takes the hidden state and the starting string ‘start’ as the input. hidden states are produced by the encoder and the input of the decoder is read word by word during decoding. in the training phase of the decoder, the seq2seq baseline model lets the predicted output from the previous timestamp as the input to the next timestamp in the decoder. but in our proposed approach we applied teacher forcing. fig. 3 seq2seq model december 2022 international journal on advances in ict for emerging regions fig. 4 system diagram of the proposed model source language is read and used as the input to the encoder. a context vector which can also be called as the hidden state is created with the encoder by encoding the input data into a real-valued vector. word-by-word encoder reads the input sequence. meaning of the input sequence encoded into a single vector. the outputs gained from the encoder are discarded and only the hidden states have proceeded as the inputs to the decoder. decoder takes the hidden state and the starting string ‘start’ as the input. hidden states are produced by the encoder and the input of the decoder is read word by word during decoding. in the training phase of the decoder, the seq2seq baseline model lets the predicted output from the previous timestamp as the input to the next timestamp in the decoder. but in our proposed approach we applied teacher forcing mechanism in the training phase of the decoder neglecting the predicted outputs from the timestamps. b. long short term memory(lstm) lstm network is chosen as the basic unit for text generation with the seq2seq model as shown in figure 3. lstm has internal technique gates that control the flow of information. gates decides the important details to keep or forget in the cell state along the long chain of sequence. gates learns what information is relevant and what to keep or throw away during the training. lstm cell has three main gates, which are the input gate, forget gate and output gate as shown in figure 5. according to the concept, when an input is given to the lstm unit, it is converted into machine-readable vectors and these sequences of vectors would be processed one by one. in the forget gate the information from the hidden state from the previous timestep(ht-1) and current input(xi) would be passed as inputs. forget gate has a sigmoid activation function which turns the values between 0 to 1. if the output value from the sigmoid is closer to 0, that information will be forgotten and if it is closer to 1, it will be stored. in the input gate previous hidden state(ht-1) from the previous timestep and current input(xi) would be passed into the sigmoid function and tanh function separately. tanh activation function turns the values in between -1 to 1 to control the network. tanh output would be multiplied with the output from the sigmoid and the sigmoid would decide which information to keep and forget. outputs gathered from forget gate and input gate would be utilized to upgrade the cell state. the next hidden state(ht) would be decided by the output gate. 
the preceding hidden state(ht-1) and the current input(xi) passed into the sigmoid function and the newly upgraded cell state would be transited through tanh function. sigmoid and tanh output decides the information that should be carried by the next hidden state. the upgraded new cell state (ct) and the hidden state(ht) would be transited to the next time step. likewise, each unit of lstm would run through these gates to store only the important details from the sequence. c. teacher forcing using the ground truth from a prior timestamp as input for the current timestamp for quick and efficient training of recurrent neural network is called as teacher forcing method [54]. teacher forcing method functions by utilizing the actual output from the previous timestamp t as input to the next timestep t+1. figure 6 shows how the decoder of seq2seq model would be trained with teacher forcing and without teacher forcing. in our proposed model to translate scmt to sinhala, teacher forcing method is applied in the decoding phase. fig. 5 architecture inside a lstm unit neural machine translation for sinhala-english code-mixed text 66 december 2022 international journal on advances in ict for emerging regions fig 6. example of decoder with the application of teacher forcing method and without teacher forcing method vi. model, experimental setting & result the initial phase of the model consists of the data preprocessing. then, the dataset is cleaned by converting the sentences into lowercase, removing emojis, removing quotes and removing unnecessary spaces. normalization is considered an important process when it comes to the translation of code-mixed text. compared to monolingual sentences, code-mixed sentences have more noisy data. dictionary-based approach and levenshtein edit distance [52] based approaches are used for the normalization task in our model. spelling error is one of the challenges in sinhala-english code-mixed. for example, ‘accident’ can be misspelt ‘accsident; accxident; acddent etc’. this happens mainly because most bilingual users are fluent only in their native language sinhala and not experts in the second language english. the first step of the normalization is the out-ofvocabulary english words from the texts are normalized using the birkbeck spelling error corpus dictionary [53], which contains 36,133 misspellings of 6,136 words gathered from various sources. slang words in the code-mixed text were identified as another barrier to the translation of the scm sentences. this issue is sorted using the slangnorm dictionary, which contains 5427 slang words. for example, words such as ‘2mrw’ and ‘3wheeler’ will be replaced with the correct form ‘tomorrow’ and ‘three wheeler’ using slangnorm dictionary. in scmt the same word is represented in different transliterated forms in various sentences in the corpus. levenshtein edit distance approach [52] is used to normalize the transliterations by substituting the high-frequency words with the corresponding low-frequency words based on the edit distance. a dictionary with a frequency list of the words in the corpus is maintained. after the normalization of the sentences, target sentences are added with a ‘start’ token at the beginning of the sentence and an ‘end’ token is added at the completion of the sentence. tokens assist the model to recognize when to begin the translation and end the translation in the decoder. the distinctive words are identified from the source and target corpus. 
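before the vocabulary is indexed (next), the edit-distance normalization step described above can be illustrated with a minimal sketch. it assumes a corpus word-frequency dictionary and interprets the substitution as mapping rare transliteration variants onto their most frequent close neighbour; the frequency values, threshold and example words are illustrative assumptions, not the paper's actual settings.

```python
# illustrative sketch only: normalize rare transliteration variants ('kaama',
# 'vathura', ...) to the most frequent corpus word within a small edit distance.
# the frequency dictionary and the distance threshold are assumptions.
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalize_token(token: str, freq: Counter, max_dist: int = 1) -> str:
    # candidates within the edit-distance threshold, including the token itself
    candidates = [w for w in freq if levenshtein(token, w) <= max_dist]
    if not candidates:
        return token
    best = max(candidates, key=lambda w: freq[w])
    return best if freq[best] > freq[token] else token

freq = Counter("kama kama kama kaama wathura vathura wathura".split())
print([normalize_token(t, freq) for t in "kaama vathura".split()])
# maps 'kaama' -> 'kama' and 'vathura' -> 'wathura' for this toy dictionary
```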
a unique number is allocated to each distinctive word to create dictionaries of words to index and vice versa. these dictionaries are used in the embedding phase of the encoder and decoder. in this research, a seq2seq model is fabricated using lstm as the basic unit. the sequence of the source sentence is matched with the sequence of the target sentence where the source sequence would be the scm sentence, and the target sequence would be the sinhala sentence. the primary hidden layer of the encoder is the embedding layer. large scattered vectors are transformed into a dense dimensional space in the embedding layer. semantic relationships will be conserved by lstm units even though the transformation happens. outputs from the encoder are repudiated and only the hidden states in the context vector are passed to the decoder. the decoder also has embedding as its primary hidden layer. hidden states passed from the encoder and the outputs given by the embedding layer in the decoder will be taken as the input of lstm layer in the decoder. teachers forcing mechanism is applied in the training part of the decoder. decoder pursues to implement a word at t+1 timestamp, considering the actual output at t timestamp, not the predicted output. this lets the model learn from the actual values rather than wrongly predicted values. lstm layer in the decoder returns internal states and output sequences. internal states are stored and used in the prediction phase. the dense layer is applied with the softmax activation, and decoder outputs are generated. the data is shuffled before training to lower the variance to make sure the model overfits less and the model is more vigorous. we allocate 70% of the dataset for training and 30% for testing. encoder and decoder inputs are in the shape of a 67 archchana kugathasan#1, sagara sumathipala international journal on advances in ict for emerging regions december 2022 table iv example of some predicted sinhala translation and bleu score. ref and pre column refers to the number of words in the reference sentence and predicted sentence, the rest of the columns shows the count of the n-gram tokens used for the calculation of modified precision no input reference prediction length modified precision ref pre 1 gram 2gram 3 gram 4gram 1 ganan wadi ගාන වැඩියි ගණන් වැඩියි 2 2 1 2 0 1 0 1 0 1 2 budu saranai dewi pihitai බුදු රණයි සදවි පිහිටයි බුදු රණයි සදවි පිහිටයි 4 4 4 4 3 3 2 2 1 1 3 place eka super clean තැන ුපිරි පිරිසදුයි තැන ුපිරි පිරිසදුයි 3 3 3 3 2 2 1 1 0 1 4 kama raha unta gana hondatama wadi eh gaanata worth na කෑම ර උනාට ගාන ස ාඳටම වැඩියි ඒ ගානට වින්සන් නෑ කෑම ර උනාට ගාන ස ාඳටම වැඩියි ඒ ගානට වින්සන් නෑ 10 10 10 10 9 9 8 8 7 7 5 price eka tikak wadi customer service eka madi staff eka thawa improve wenna one මිල ිකක් වැඩියි පාරිස ෝගික ස ේවය මදියි කාර්ය මණ්ඩලය වැඩි දියුණු කළ යුතුයි මිල ිකක් වැඩියි ැෙැයි කාර්ය මණ්ඩලය වැඩි 12 7 6 7 4 6 2 5 0 4 6 meya hithan inne i phone thiyenne photo ganna witarai kiyala සමයා හිතන් ඉන්සන් අයි ස ෝන් තිසයන්සන් ස ාසටා ගන්න විතරයි තියන්සන් කියලා සමයා තියන්සන් කියලා 11 3 3 3 1 2 0 1 0 1 7 mn recommend karana thanak මන් නිර්සේශ කරන තැනක් මන් නිර්සේශ කරන තැනක් 4 4 4 4 3 3 2 2 1 1 8 main road eka laga nisa noisy ප්රධාන පාර ළඟ නි ා ේද වැඩියි පාර ළඟ නි ා ේද වැඩියි 6 5 5 5 4 4 3 3 2 2 9 kaama echchara special naha කෑම එච්චර විසශේෂ නෑ ැ කෑම එච්චර විසශේෂ නෑ ැ 4 4 4 4 3 3 2 2 1 1 10 kama denna puluwan කෑම සදන්න පුළුවන් කෑම සදන්න පුළුවන් 3 3 3 3 2 2 1 1 0 1 2d array. 
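putting the pieces of section v together, the following is a minimal sketch of such an encoder-decoder, assuming tensorflow/keras (which the paper lists among its libraries). the batch size 10, source length 27 and target length 26 follow the shapes reported in this section, while the vocabulary sizes, embedding and lstm dimensions, optimizer and loss are illustrative assumptions; this is not the authors' exact implementation.

```python
# minimal keras sketch of the seq2seq + lstm model with teacher forcing
import tensorflow as tf
from tensorflow.keras import layers, Model

src_vocab, tgt_vocab = 8000, 8000    # assumed vocabulary sizes
emb_dim, latent_dim = 256, 256       # assumed embedding / lstm sizes
src_len, tgt_len = 27, 26            # shapes reported in the text

# encoder: embedding -> lstm; only the final hidden/cell states are kept
enc_inputs = layers.Input(shape=(src_len,))
enc_emb = layers.Embedding(src_vocab, emb_dim, mask_zero=True)(enc_inputs)
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_emb)

# decoder: teacher forcing is realised by feeding the ground-truth target
# sequence (shifted right, starting with the 'start' token) as decoder input
dec_inputs = layers.Input(shape=(tgt_len,))
dec_emb = layers.Embedding(tgt_vocab, emb_dim, mask_zero=True)(dec_inputs)
dec_lstm = layers.LSTM(latent_dim, return_sequences=True, return_state=True)
dec_out, _, _ = dec_lstm(dec_emb, initial_state=[state_h, state_c])
outputs = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
# model.fit([encoder_input_ids, decoder_input_ids], decoder_target_ids,
#           batch_size=10, epochs=...)   # targets = decoder inputs shifted left
```

at inference time the same trained layers are re-wired so that the previously predicted token, rather than the ground truth, is fed back to the decoder, as described in the prediction phase below.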
the encoder 2d array has a batch size of 10 and a maximum source sentence length of 27, so the shape of the encoder input is (10, 27). the decoder 2d array has a batch size of 10 and a maximum target sentence length of 26, so the shape of the decoder input is (10, 26). decoder outputs are in the shape of a 3d array with a batch size of 10 and a maximum target sentence length of 26. numpy, pandas, tensorflow and sacrebleu are some of the important libraries used to build the model from a technological point of view. after the training phase of the model, a prediction phase is implemented to produce the translation outputs. in the prediction phase, an input sequence from the corpus (an scm sentence) is provided to predict the sinhala translation. this phase contains an encoder-decoder framework without the teacher forcing mechanism, where the predicted output from the previous timestamp t is fed to the current timestamp t+1 instead of the actual output. figure 4 shows the system architecture of the proposed model.

vii. evaluation & discussion

we evaluated the performance of our system by comparing our model with the most commonly used translation models. we applied our dataset to the seq2seq baseline model [8] and the attention model [24] with the same experimental setting. each model was trained with and without the normalization pipeline. after training the models, we evaluated the translation outputs using the bleu [55] metric.

\[
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\left(\sum_{n=1}^{N} w_n \log p_n\right) \tag{1}
\]

\[
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r\\[4pt]
\exp\left(1 - \dfrac{r}{c}\right) & \text{if } c \le r
\end{cases} \tag{2}
\]

in the bleu score equation (1), bp is the brevity penalty, n is the number of n-grams (1-gram, 2-gram, 3-gram, 4-gram), w_n is the weight for each modified precision, and p_n is the modified precision [55]. p_n for each n-gram order up to 4-grams is calculated from the clipped count and the total number of that n-gram in the predicted sentence [55]. when the n-gram order is greater than the length of the reference sentence, the total number of n-grams is set to 1 to avoid a zero-division error. the brevity penalty (bp) in equation (2) depends on c, the count of unigrams in all the predicted sentences, and r, the most probable matching sentence length in the corpus. one hundred sinhala code-mixed sentences were selected from the corpus, and their sinhala translations were predicted using our proposed model. initially, the clipped counts [55] and the total number of each n-gram in the predicted sentence are extracted to calculate the modified precision, as shown in table iv. then the overall bleu score is calculated for those hundred sentences. finally, the same evaluation approach with the same experimental setting, as explained in section iv, is applied to the seq2seq baseline model and the attention model, with and without the normalization task. a summary of the comparison among the models is shown in table v.

table v: comparison of results received from different models

the seq2seq baseline model, with and without normalization, showed the lowest performance and achieved the lowest bleu score compared to the other two models. among the attention and teacher forcing models, the best bleu score of 33.89 was achieved by the teacher forcing model, showing that the proposed model works comparatively well with sinhala-english code-mixed text.
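as an aside, the bleu computation in equations (1)-(2) can be illustrated with a short sentence-level sketch. equal weights w_n = 0.25 up to 4-grams are assumed, as in table v; the tiny smoothing constant and the per-sentence (rather than corpus-level) aggregation are simplifications introduced here, and the paper's reported scores come from its own setup (the authors list sacrebleu among their libraries).

```python
# illustrative re-implementation of equations (1)-(2); not the paper's scorer
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(ref, pred, n):
    pred_counts = Counter(ngrams(pred, n))
    ref_counts = Counter(ngrams(ref, n))
    total = sum(pred_counts.values()) or 1            # denominator set to 1 (see text)
    clipped = sum(min(c, ref_counts[g]) for g, c in pred_counts.items())
    return max(clipped, 1e-9), total                  # 1e-9 avoids log(0) (assumption)

def bleu(reference: str, prediction: str, max_n: int = 4) -> float:
    ref, pred = reference.split(), prediction.split()
    log_p = sum(0.25 * math.log(c / t)
                for c, t in (modified_precision(ref, pred, n)
                             for n in range(1, max_n + 1)))
    c, r = len(pred), len(ref)                        # candidate / reference lengths
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))   # brevity penalty, eq. (2)
    return bp * math.exp(log_p)                       # eq. (1)

print(bleu("the food is very good", "the food is good"))   # toy example
```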
also, the comparison study with and without the normalization task demonstrated that the models performed well and provided a better bleu score when the normalization pipeline is applied to each of the models. not only the bleu scores, but the proposed model also achieved comparatively fair values for training and testing accuracies and loss as shown in figure 8. an analysis of the predicted sentences is performed to identify whether the proposed model helped to overcome the challenges pointed out in table ii. if we take the sample sentence (1) shown in table iv, (codemixed text cmt, reference textref, translated text trans): cmt : gaana wadi (1) ref : ගාන වැඩියි trans : ගණන් වැඩියි in this sentence (1) even though the trans doesn’t match the exact ref sentence, the meaning of both sentences is the same, and the prediction is correct. in the sample sentence (3) shown in table iv, cmt : place eka super clean (3) ref : තැන ුපිරි පිරිසදුයි trans : තැන ුපිරි පිරිසදුයි in sentence (3), the cmt sentence contains english words such as ‘place’, ‘super’ and ‘clean’. in trans the words are translated to sinhala. this translation shows us that borrowing words from another language issue is sorted out with our proposed translation model. in the sample sentence (9),(10) shown in table iv, cmt : kaama echchara special naha (9) trans : කෑම එච්චර විසශේෂ නෑ ැ cmt : kama denna puluwan (10) trans : කෑම සදන්න පුළුවන් model training accuracy training loss testing accuracy testing loss precision brevity penalty (bp) bleu score 1-gram 2-gram 3-gram 4-gram w1 = 0.25 w2 = 0.25 w3 = 0.25 w4 = 0.25 w1*log(p1) w2*log(p2) w3*log(p3) w4*log(p4) seq2seq baseline model without normalization 53.83 1.4032 27.92 1.76 -0.16229 -0.323259 -0.496841 -0.628076 0.6397 12.78 seq2seq baseline model + normalization 57.11 0.7753 31.97 1.75 -0.145237 -0.204693 -0.275824 -0.389159 0.573 20.77 seq2seq + attention without normalization 70.55 0.303 30.3 1.15 -0.080998 -0.162399 -0.252416 -0.369135 0.6876 28.95 seq2seq + attention + normalization 70.22 0.5023 31.05 1.05 0.0689162 -0.141996 -0.208556 -0.292517 0.6413 31.46 seq2seq + teacher forcing without normalization 71.42 0.5095 37.17 0.38 -0.066960 -0.1232 -0.181972 -0.262455 0.595 31.54 seq2seq + teacher forcing + normalization 71.57 0.4979 37.87 0.38 -0.06046 -0.1232717 -0.189274 -0.251089 0.6326 33.89 69 archchana kugathasan#1, sagara sumathipala international journal on advances in ict for emerging regions december 2022 fig 8. experimented models accuracies, loss & relevant bleu scores the sentences (9) and (10) have the same word in two different transliterations format. but in the predicted sentence both the words ‘kaama’ and ‘kama’ are correctly identified as one sinhala word ‘කෑම’. the transliteration issue has also been solved with our model. the use of special characters and numeric character issues were sorted in the normalization phase with the slangnorm dictionary. viii. conclusion the main goal of this research is to utilize the usergenerated sinha-english code-mixed sentences and convert the sentences into a standard language, so the code-mixed texts can also be used for several research and business purposes. from analyzing the challenges in scmt text, we pointed out the key issues that have been a barrier to processing the sinhala-english code-mixed text. creating a dataset for this research study was a challenging task due to the unavailability of current resources. 
the dataset created in the study was created following several processes such as manual translation with a human translator, crowdsourcing to annotate the dataset to check whether the human-translated sentences are correct and rating the translation with linguistic experts to analyze the fleiss’ kappa score. the received score of 0.88 shows almost full agreement with the translation. the corpus created in this study using proper rules and regulations could promote research based on the sinhala code-mixed domain. the proposed approach, which is a combination of the seq2seq model with the lstm unit and the teachers forcing mechanism gives a comparatively higher bleu score of 33.89 for code-mixed text translation compared to the other models. moreover, the evaluation study proves that most of the challenges identified in scm sentences can be solved using our proposed model. but somehow, a few of the challenges such as integration of suffixes, and change of discourse marker remain unsolved. this research study can be considered an initiative for sinhala-english code-mixed text translation. as the future work of this study, we are planning to solve the rest of the challenges which we were not able to solve with the current proposed model. furthermore, we would like to extend the corpus to focus on other tasks of code-mixing such as sentiment analysis, language identification, entity extraction etc. references [1] e. e. davies and a. bentahila, “contact linguistics: bilingual encounters and grammatical outcomes,” 2007. [2] k. r. chandu, m. chinnakotla, a. w. black, and m. shrivastava, “webshodh: a code mixed factoid question answering system for web,” in international conference of the cross-language evaluation forum for european languages. springer, 2017, pp. 104– 111. [3] m. yang, y. ren, and g. adomavicius, “understanding usergenerated content and customer engagement on facebook business pages,” information systems research, vol. 30, no. 3, pp. 839–855, 2019. [4] e. qualman, socialnomics: how social media transforms the way we live and do business. john wiley & sons, 2012. [5] m. dhar, v. kumar, and m. shrivastava, “enabling code-mixed translation: parallel corpus creation and mt augmentation approach,” in proceedings of the first workshop on linguistic resources for natural language processing. santa fe, new mexico, usa: association for computational linguistics, aug. 2018, pp. 131–140. [online]. available: https://www.aclweb.org/anthology/w183817 [6] s. rijhwani, r. sequiera, m. c. choudhury, and k. bali, “translating codemixed tweets: a language detection based system,” in 3rd workshop on indian language data resource and evaluationwildre3, 2016, pp. 81–82. [7] p. goyal, s. pandey, and k. jain, “deep learning for natural language processing,” new york: apress, 2018. [8] i. sutskever, o. vinyals, and q. v. le, “sequence to sequence learning with neural networks,” arxiv preprint arxiv:1409.3215, 2014. neural machine translation for sinhala-english code-mixed text 70 december 2022 international journal on advances in ict for emerging regions [9] k. cho,b.van merriënboer, c.gulcehre, d. bahdanau, f. bougares, h. schwenk, and y. bengio, “learning phrase representations using rnn encoder–decoder for statistical machine translation,” in proceedings of the 2014 conference on empirical methods in natural language processing (emnlp). doha, qatar: association for computational linguistics, oct. 2014, pp. 1724–1734. [online]. available: https://www.aclweb.org/anthology/d14-1179 [10] k.-f. wong and y. 
international journal on advances in ict for emerging regions 2011 04 (01) : 15 25

an assessment of computer awareness and literacy among entry-level university of colombo undergraduates: a case study
thilaksha h. tharanganie, w. n. wickremasinghe and g. p. lakraj

abstract — as the demand for computer-literate graduates increases at a rapid pace, possessing computer skills is an important asset for a university student, and good computer knowledge improves the quality of their study programs. this paper discusses a case where information was collected through a survey to assess the computer knowledge of entering freshmen in five faculties (science, arts, management & finance, law and medicine) of the university of colombo, sri lanka. the survey was conducted among 300 new entrants of the above faculties, and a descriptive analysis was used to identify patterns of computer usage. it was found that, among entry-level university of colombo undergraduates, the majority have used a computer (93%) and/or the internet (60%). moreover, 60% are computer aware while only 47% are computer literate. it must be noted that males in general outperformed females in computer awareness, computer literacy and internet usage. since a chi-square test confirmed that the two variables, computer awareness and computer literacy, are associated with each other, considering them jointly rather than separately was expected to yield better results. hence, the two variables were combined into one variable with four levels, and a generalized logit model was fitted to this nominal multi-category variable to find the factors affecting computer awareness and/or computer literacy. the model suggests that the new variable depends on the factors: usage of internet, monthly family income level, methods of obtaining computer knowledge, and locations of using a computer.

index terms — computer awareness, computer literacy, nominal categorical data, generalized logit model

i. introduction
technology plays an important role in accelerating economic growth and promoting development. perhaps no other single technological innovation of the second half of the 20th century has touched so many lives as the computer [1].

manuscript received may 01, 2010. recommended by dr. k.p. hewagamage on may 18, 2011. thilaksha h. tharanganie is with the department of statistics, university of colombo, 35, reid avenue, colombo-7, sri lanka. (e-mail: thilakshat@yahoo.com). w. n. wickremasinghe and g. p. lakraj are also with the department of statistics, university of colombo, 35, reid avenue, colombo-7, sri lanka. (e-mail: wnw@stat.cmb.ac.lk, pemantha@stat.cmb.ac.lk).

with the increased use of information and communication technologies in education, students entering university need basic computer skills. as students come from different socio-economic backgrounds, they have different learning experiences, capabilities, and needs. it is rarely the case that the computer skills of university freshmen are all at the same level. computer literacy is a mixture of awareness (e.g. awareness of the computer's importance), knowledge (e.g. knowledge of what computers are and how they work), and interaction (the ability to interact with computers). this view is embraced by [2], [3], and [4].
the perspective of computer literacy in [5] involves conceptual knowledge related to basic terminology (including social, ethical, legal, and global issues) and the skills necessary to perform tasks in word processing, database, spreadsheets, presentation, graphics, and basic operating system functions. a search for studies that identify the factors affecting computer awareness and computer literacy by modeling responses with suitable models did not produce favorable results. however, the review revealed a small number of studies of a similar nature. one study was conducted by temple university [6] with 259 entry-level students to determine their computer literacy at the beginning of the 1997-98 academic year. the study used a questionnaire and revealed the following: at least 60% of the entering students had access to computers; approximately 60% of the students had used email, online information services, or the world wide web; and students used word processing most frequently and database management systems software least often. in [7], results from a computer concepts assessment given to students enrolled in a computer literacy course at midwestern regional university were reported. slightly over 75% of these students scored more than the minimum acceptable college entrance score on word processing and 63.55% did so for presentation skills, but only 40% for database. only 6.13% of these students had college entrance scores that exceeded the minimum considered acceptable for all components of the test, which also included a wide range of additional topics (networking and the internet, social and ethical issues, presentation, graphics, operating systems/hardware, word processing, database, and spreadsheets). besides, all students had a vague idea of selected computer terms, with some variation by discipline. another study [8] used a written questionnaire for incoming medical students at the school of medicine, virginia commonwealth university, from 1991 to 2000. the survey's purpose was to learn the students' levels of knowledge, skill, and experience with computer technology in order to guide instructional services and facilities. the questionnaire was administered during incoming medical students' orientation or mailed to students' homes after matriculation. the average survey response rate was 81% from an average of 177 students. six major changes were introduced based on information collected from the surveys and advances in technology: distribution of cd-roms containing required computer-based instructional programs, delivery of evaluation instruments via the internet, modification of the lab to a pc-based environment, development of an electronic curriculum website, development of computerized examinations to prepare students for the computerized national board examinations, and initiation of a personal digital assistant (pda) project. this paper is based on a survey to assess the computer knowledge of entering freshmen in five faculties of the university of colombo, in order to determine if incoming students possess the basic computer knowledge and skills to begin their studies effectively.
the survey also identifies the factors affecting new entrants' computer awareness and computer literacy. thus, this study describes the computer usage of freshmen in different aspects but does not focus on the knowledge of undergraduates studying in the second to fourth years. the survey was carried out within the freshmen's first three months after entry to the university in the year 2009, and a sample of 300 was selected using stratified random sampling. the respondents were interviewed face to face using a questionnaire containing single-choice, multiple-choice, 1-3 ranking (1 = highest to 3 = lowest), and 5-point likert-scale (1 = strongly disagree to 5 = strongly agree) questions. initially a descriptive analysis was done, followed by an advanced analysis that resulted in fitting a generalized logit model.

ii. methodology

a. sampling procedure
the sampling procedure for this study contains two steps in selecting the sample. firstly, using stratified random sampling [9][10], different strata were identified; secondly, using the proportional allocation method [9][10], the sample size for each stratum was calculated. in this study, the five faculties were considered as five strata, and the sample size within each stratum was then divided considering gender frequency. the total sample size of 300 was divided among the resulting ten strata such that each stratum's sample size is proportional to its population size. a benefit of stratified sampling is that estimates will be more precise, since each stratum is more homogeneous (less variable) than the population as a whole. table i displays the sample size allocated to each stratum.

table i: sample size for each stratum using the proportional allocation method
strata                  gender   population size   sample size   % of total sample (300)
science                 male     253               39            13.00
science                 female   221               35            11.67
arts                    male     139               22            7.33
arts                    female   508               78            26.00
law                     male     34                5             1.67
law                     female   168               26            8.67
management & finance    male     166               26            8.67
management & finance    female   237               37            12.33
medicine                male     106               17            5.67
medicine                female   98                15            5.00
total                   male     697               109           36.33
total                   female   1230              191           63.67

in this sampling method, simple random sampling [9][10] was performed independently within each stratum. according to table i, a simple random sample of 39 male students had to be taken from the science faculty for the data collection; likewise, ten simple random samples of different sizes were required. since the proportional allocation method calculates the sample sizes proportionately to the population sizes, different sample sizes were obtained. if the strata were about the same size, it would be more convenient to take the same sample size in each stratum. in this study, a particular procedure was used to make certain that the samples were random and without bias. at first, contact details of male and female students of the five faculties were obtained; then the specified number of students from each stratum was selected using a random number table, and only those selected were contacted later for an interview. if a particular student was unwilling to respond, another student was contacted. interviews were conducted through direct face-to-face interaction to minimize possible questions and confusions that may arise while answering the questionnaire. the respondent filled in the questionnaire while the interviewer helped with clarifications when necessary. at the time of collecting the questionnaires, it was ensured that all required fields were answered.
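the proportional allocation step above can be illustrated with a short sketch. this is not the authors' code; it simply computes n_h = n × N_h / N for each stratum from the population sizes reported in table i, and rounding choices mean the figures may differ by a student or so from the published table.

```python
# illustrative sketch only (not the authors' code): proportional allocation of the
# total sample n = 300 across the gender-within-faculty strata of table i, i.e.
# n_h = n * N_h / N for stratum h.
population = {
    ("science", "male"): 253, ("science", "female"): 221,
    ("arts", "male"): 139, ("arts", "female"): 508,
    ("law", "male"): 34, ("law", "female"): 168,
    ("management & finance", "male"): 166, ("management & finance", "female"): 237,
    ("medicine", "male"): 106, ("medicine", "female"): 98,
}

def proportional_allocation(population_sizes, total_sample):
    """return {stratum: n_h} with n_h = round(total_sample * N_h / N)."""
    grand_total = sum(population_sizes.values())
    return {
        stratum: round(total_sample * size / grand_total)
        for stratum, size in population_sizes.items()
    }

for stratum, n_h in proportional_allocation(population, 300).items():
    print(stratum, n_h)
```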
b. computer awareness
a few definitions are available in the literature for the term 'computer awareness' [1] [11]. in [1], if a person has heard of at least one of the uses of a computer (e.g. from playing games to complicated aeronautic applications), then he/she is considered a person with computer awareness. another study [11] has used five pointers in order to measure awareness: a short history of computers, a short history of the internet, ways computers are used in society, occupations related to computer use, and computer ethics. in this study, five indicators were created with the help of the above two studies. the five pointers of [11] were adjusted so that the respondents can understand the indicators easily, and the indicators were created to match the knowledge of the new entrants of the university of colombo. hence, a person is considered "computer aware" if he/she possesses all of the following five indicators:
- knowledge about the fundamentals of computers (i.e. hardware and software computer systems, computer generations etc.)
- knowledge about the fundamentals of the internet (i.e. what the internet is, what services are offered by the internet etc.)
- knowledge related to computer concepts such as social, ethical and legal issues
- knowledge about at least three ways that computers are used in society
- knowledge about at least three occupations related to computer usage

c. computer literacy
there is no precise consensus on how to define computer literacy [1] [11] [12] [13]; the term can mean different things to different people. since the respondents of this study are university freshmen, technical definitions may not be appropriate. after a careful search of the literature, it was understood that computer literacy is defined with three types of skills: basic, intermediate, and advanced. basic and intermediate skills are being able to use basic operating system functions, a word processor, spreadsheets, presentation graphics, databases, the internet, and e-mail. advanced skills include programming, fixing software conflicts, repairing computer hardware, etc. since advanced skills demand technical knowledge beyond the level of a university freshman, only the basic and intermediate skills were considered in this study. hence, a person was considered "computer literate" if he/she possesses all of the following six skills (a small illustration of this all-indicators rule is sketched after the list):
- skills in basic hardware and basic operating system functions – identifying computer parts, powering up and powering down the computer, opening/saving files, recognizing different file types
- skills in word processing – create/save/print documents, insert tables/charts/labels/symbols, format page layout (margins, page numbers, page borders)
- skills in spreadsheets – create/save/print spreadsheets, insert tables/charts, insert functions/formulas
- skills in presentation graphics – create/save/print slide shows, insert new slide/layout/tables/charts, create animations
- skills in databases – design basic databases with queries and reports/forms
- skills in internet & e-mail – surfing the internet and sending e-mail messages
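a minimal sketch of the classification rule just described follows; the field names are hypothetical, and only the rule itself (every indicator must hold) comes from the definitions above.

```python
# minimal sketch of the all-indicators rule; the field names are hypothetical.
AWARENESS_INDICATORS = [
    "fundamentals_of_computers", "fundamentals_of_internet",
    "social_ethical_legal_issues", "three_uses_in_society", "three_related_occupations",
]
LITERACY_SKILLS = [
    "hardware_and_os", "word_processing", "spreadsheets",
    "presentation_graphics", "databases", "internet_and_email",
]

def classify(responses, required_fields):
    """a respondent qualifies only if every required indicator is true."""
    return all(responses.get(field, False) for field in required_fields)

respondent = {field: True for field in AWARENESS_INDICATORS}
respondent["social_ethical_legal_issues"] = False
print(classify(respondent, AWARENESS_INDICATORS))              # False: one indicator missing
print(classify({f: True for f in LITERACY_SKILLS}, LITERACY_SKILLS))  # True
```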
it is understood that an assessment is the best option for measuring the ability to perform these functions. however, facilities to conduct an assessment with each and every respondent were lacking: the field work of the main survey was conducted by a single interviewer so as to minimize interviewer bias and other errors; the limited time frame available for the survey made an assessment, which is a time-consuming tool, infeasible; and respondents' unwillingness to grant more time even at the completion of the questionnaire would have affected the possibility of performing an assessment successfully. respondents were therefore interviewed through direct face-to-face interaction in order to minimize possible questions and confusions while answering the questionnaire, and thus to reduce the gap between the results that would be obtained through an assessment and those obtained through a questionnaire.

d. chi-square test
the chi-square test [14] [15] provides a method for testing the association between two categorical variables in a two-way table. categorical variables [16] are those having two or more categories. in this study, the chi-square test was used to test the association between the two categorical variables computer awareness and computer literacy. the null hypothesis h0 assumes that there is no association between the variables (in other words, one variable does not influence the other), while the alternative hypothesis h1 claims that some association does exist. the chi-square test statistic is computed as

χ² = Σ_i Σ_j (O_ij − E_ij)² / E_ij   (1)

where O_ij is the observed frequency of cell (i, j) and the expected frequency is

E_ij = (row total_i × column total_j) / grand total   (2)

the chi-square test has (r−1)(c−1) degrees of freedom in a two-way table, where r represents the number of categories of the row variable and c represents the number of categories of the column variable. degrees of freedom [17] is the number of values in the final calculation that are free to vary. when the degrees of freedom equal one, an adjustment known as yates' continuity correction [18] must be employed. in this correction, a value of 0.5 is subtracted from the absolute value (irrespective of algebraic sign) of the numerator contribution of each cell, and the chi-square computational formula becomes

χ² = Σ_i Σ_j (|O_ij − E_ij| − 0.5)² / E_ij   (3)

an example of a chi-square test for a two-way table is given below, with the objective of studying the association between smoking habit and heart attack.

table ii: an example to study the association between smoking and heart attack (expected frequencies within brackets)
                      smoker        non-smoker    row total
heart attack: yes     76 (67.99)    35 (43.01)    111
heart attack: no      22 (30.01)    27 (18.99)    49
column total          98            62            160

since degrees of freedom = (r−1)(c−1) = (2−1)(2−1) = 1, equation (3) is used, and χ² = (|76 − 67.99| − 0.5)² / 67.99 + … + (|27 − 18.99| − 0.5)² / 18.99 = 7.97. this χ² value is compared with the critical value 3.84 taken from chi-square tables at the 5% significance level. since χ² (7.97) > 3.84, we reject h0 at the 5% level and conclude that there is an association between smoking habit and heart attack. in practice, statistical software is used to perform a chi-square test, and in this study the spss ® [19] statistical software was used.
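as a cross-check of the worked example in table ii, the same counts can be run through a library routine. this is only an illustrative sketch: scipy is assumed here (the study itself used spss), and small differences from the hand calculation can arise from rounding and from whether the continuity correction is applied.

```python
# illustrative cross-check of the table ii example; scipy is assumed purely for demonstration.
from scipy.stats import chi2_contingency

observed = [[76, 35],   # heart attack: yes -> (smoker, non-smoker)
            [22, 27]]   # heart attack: no  -> (smoker, non-smoker)

chi2, p_value, dof, expected = chi2_contingency(observed, correction=True)
print(f"chi-square (yates-corrected) = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
print("expected frequencies:\n", expected.round(2))
# a p-value below 0.05 leads to rejecting h0 (no association) at the 5% level;
# correction=False gives the uncorrected pearson statistic instead.
```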
e. generalized logit model
the generalized logit model [20] [21] [22] is used when the response variable is nominal and multi-category [16] (two or more categories which do not have an intrinsic order). in this study, the chi-square test showed that the variables computer awareness and computer literacy are associated with each other. hence, it would not be possible to consider them separately. thus, by joining these two variables, a new variable with four categories was constructed such that these categories are not ordered or ranked, and a generalized logit model was fitted to this new variable with the objective of finding the factors affecting it. suppose there is a nominal [16] variable with I categories. in fitting the generalized logit model, one of the categories needs to be taken as the 'baseline' so that the other categories can be compared against it. in usual practice, the last category is taken as the baseline, as the comparison will then be more meaningful. thus, when the last category (I) is the baseline and x is a factor, the generalized logit model is

log(π_i / π_I) = α_i + β_i x,  i = 1, …, I−1   (4)

where π_i is the probability of occurrence (conditional probability) of the i-th category of the response, α_i is the intercept of the i-th logit, and β_i is the parameter estimate of factor x in the i-th logit. model (4) indicates that the factor x affects the nominal variable. after fitting a generalized logit model, the next step is to compute the conditional probabilities {π_1, …, π_I} so that the vital conclusions can be obtained after examining them. if a model contains I categories, then it has (I−1) logits. hence, model (4) consists of the (I−1) logits log(π_1/π_I), log(π_2/π_I), …, log(π_{I−1}/π_I), and using the parameter estimates, the values of these logits can be calculated. finally, the conditional probabilities can be computed, since they satisfy the equation Σ_{i=1}^{I} π_i = 1.

f. analysis of single-choice and multiple-choice questions
in a single-choice question, there is only one response; in a multiple-choice question, there can be a number of responses. these responses are usually marked by a tick. an example of a multiple-choice question is as follows.
example: which locations have you used to make use of computers when you enter the university? (multiple answers possible) a. home b. internet cafe c. study institution d. school e. friends / relatives place f. other (specify) ………
in the analysis of single-choice and multiple-choice questions, first the number of responses for each factor is counted and then the percentage of each factor is obtained, as in table iii.

table iii: results of the locations of using a computer
factor                      total   percentage
home                        183     183/638 = 29%
internet cafe               56      9%
study institution           151     24%
school                      143     22%
friends / relatives place   86      13%
other                       19      3%
total                       638     100%

g. analysis of ranked responses
instead of simply choosing responses with a tick as in a single-choice or multiple-choice question, in this type of question the responses are ranked. in this study, rankings are on a 1-3 scale, with '1' for the highest rank and '3' for the lowest rank. an example is given below:
example: what are the three mostly used software packages when you enter the university? (please rank the three main factors, 1 = highest … 3 = lowest) a. ms office packages b. database management c. computer graphics d. web designing e. other (specify) ………
as the first step of the analysis, the frequencies of the three ranks are counted and then multiplied by the weights 0.5, 0.3 and 0.2, such that the highest rank (i.e. rank 1) is multiplied by the highest weight (i.e. 0.5) and so on. in practice, these weights are chosen such that they sum to one. then the total score of each factor is calculated, and finally the percentage of each factor is obtained from the total score of all the factors (a short computational sketch is given after table iv below).

table iv: results of the software packages used
factor                    rank 1 score       rank 2 score      rank 3 score     total score   percentage
a. ms office packages     259*0.5 = 129.5    6*0.3 = 1.8       5*0.2 = 1.0      132.3         132.3/191.8 = 69%
b. database management    1*0.5 = 0.5        41*0.3 = 12.3     19*0.2 = 3.8     16.6          9%
c. computer graphics      8*0.5 = 4.0        64*0.3 = 19.2     21*0.2 = 4.2     27.4          14%
d. web designing          2*0.5 = 1.0        13*0.3 = 3.9      25*0.2 = 5.0     9.9           5%
e. other                  4*0.5 = 2.0        6*0.3 = 1.8       9*0.2 = 1.8      5.6           3%
total                                                                           191.8         100%
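the weighted scoring just described can be written down in a few lines. this is only an illustrative sketch using the rank frequencies behind table iv; it reproduces the scores and percentage shares shown in the table.

```python
# illustrative sketch of the weighted scoring in section ii g, using the rank
# frequencies behind table iv.
WEIGHTS = (0.5, 0.3, 0.2)  # weights for rank 1, rank 2 and rank 3

rank_counts = {  # (rank-1, rank-2, rank-3) frequencies per software package
    "ms office packages":  (259, 6, 5),
    "database management": (1, 41, 19),
    "computer graphics":   (8, 64, 21),
    "web designing":       (2, 13, 25),
    "other":               (4, 6, 9),
}

scores = {
    factor: sum(count * weight for count, weight in zip(counts, WEIGHTS))
    for factor, counts in rank_counts.items()
}
total_score = sum(scores.values())  # 191.8
for factor, score in scores.items():
    print(f"{factor}: score = {score:.1f}, share = {100 * score / total_score:.0f}%")
```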
iii. results

a. descriptive analysis
1) computer usage: the majority of students (93%) have used computers when they enter the university. from these respondents, the survey sought to determine the reasons for computer usage. since frequency of using a computer is a single-choice question and locations of using a computer is a multiple-choice question, they were analyzed according to section ii f, while the other three factors (purposes of using a computer, software packages used and methods of obtaining computer knowledge) use ranked responses and were analyzed according to section ii g. the results are given in table v.

table v: results of the computer usage (percentages in brackets)
frequency of using a computer: several times a week (33), daily (29), once a week (22), rarely (11), once a month (5)
locations of using a computer: home (29), study institution (24), school (22), friends/relatives place (13), internet cafe (9), other (3)
purposes of using a computer: educational activities (47), leisure activities (25), surfing internet (12), for e-mails (6), office work (6), self-employment (3), other (1)
software packages used: ms office packages (69), computer graphics (14), database management (9), web designing (5), other (3)
methods of obtaining computer knowledge: computer courses (37), self study (25), school (20), another person (9), family members (8), other (1)

from the few respondents who had not used computers, reasons for not using a computer were obtained. this is a question with ranked responses and was analyzed according to section ii g. according to the results, the majority (35%) of these respondents indicated that the main reason for not using a computer is not having a computer at home, while for 32% of the respondents the main reason is financial constraints. the results of the analysis are given in table vi.

table vi: results of the computer non-usage (percentages in brackets)
reasons for not using a computer: no computer at home (35), financial constraints (32), computer usage was difficult (17), computer usage was not required (8), other (5), computer usage was not knowledgeable (3)

2) computer awareness and computer literacy: according to the definitions of computer awareness and computer literacy used in this study, an attempt was made to find out the percentages of freshmen having computer awareness and computer literacy. as stated in sections ii b and ii c, a person was considered computer aware if he/she possesses all five indicators of computer awareness, and a person was considered computer literate if he/she possesses all six indicators of computer literacy. tables vii and viii show the percentages of freshmen having computer awareness and computer literacy. the study shows that 60% of the new entrants are computer aware whereas only 47% are computer literate. it is important to note that males in general surpass females in both computer awareness and computer literacy.

table vii: results of computer awareness
           total number   number of computer aware   percentage of computer aware
male       109            71                         71/109 = 65%
female     191            108                        108/191 = 57%
total      300            179                        179/300 = 60%

table viii: results of computer literacy
           total number   number of computer literate   percentage of computer literate
male       109            55                            55/109 = 50%
female     191            85                            85/191 = 45%
total      300            140                           140/300 = 47%

3) internet usage: out of the 278 freshmen who have used a computer when they enter the university, around 60% are internet users, and they mainly use the internet for educational purposes. in addition, 30% of the students use the internet several times a week, while a significant proportion of students (26%) are rare internet users. the results of the internet usage are given in tables ix and x.

table ix: results of the internet usage
           total number   number of internet users   percentage of internet users
male       101            64                         64/101 = 63%
female     177            100                        100/177 = 56%
total      278            164                        164/278 = 59%

table x: results of the internet usage (percentages in brackets)
purposes of using internet: education & learning activities (37), for getting information (25), leisure activities (22), communication (14), office work (1), self employment (1)
frequency of using internet: several times a week (30), rarely (26), once a week (21), daily (14), once a month (9), other (0)
b. testing the association between computer awareness and computer literacy
in order to fit suitable models for the two variables computer awareness and computer literacy, it was first necessary to find out whether or not they are associated. the chi-square test [14] [15] is used for testing the association between two or more categorical variables. here, computer awareness and computer literacy are two categorical variables with two categories each:

             computer awareness        computer literacy
category 1   computer aware (yes)      computer literate (yes)
category 2   non-computer aware (no)   computer illiterate (no)

thus, in order to find the association between these two variables, the chi-square test was used. the hypothesis of the test is h0: computer awareness and computer literacy are not associated, vs h1: computer awareness and computer literacy are associated. since the degrees of freedom equal one ((r−1)*(c−1) = (2−1)*(2−1) = 1), the yates continuity correction was used (section ii d). the results of the chi-square test obtained from the spss ® [19] statistical software are given in table xi.

table xi: results of the chi-square test
                               value    df   asymp. sig. (2-sided)   exact sig. (2-sided)
pearson chi-square             21.817   1    .000
continuity correction          20.676   1    .000
likelihood ratio               22.176   1    .000
fisher's exact test                                                  .000
linear-by-linear association   21.738   1    .000
n of valid cases               278

a chi-square value of 20.676 with 1 degree of freedom (p = .000 < 0.05) illustrates that the test is significant (reject h0) at the 5% level, and there is significant evidence to confirm that the two variables are associated with each other.

c. fitting a generalized logit model
the above result (table xi) suggests that computer awareness and computer literacy have to be considered jointly rather than separately. in order to consider them jointly, a new variable was created as follows:

computer awareness   computer literacy   category of the new variable
yes                  yes                 1
yes                  no                  2
no                   yes                 3
no                   no                  4

since the four categories of this new variable are not ordered, a generalized logit model was suitable (refer to section ii e). moreover, the last category (category 4) was taken as the baseline, and the sas ® [23] statistical software was used to carry out the model selection. the forward selection procedure [24] was used in finding the best model.
this procedure starts with the null model (intercept term only); the candidate factors are added one at a time, and the factor which gives the minimum p-value is selected. these factors were identified from the questionnaire (refer to the appendix) and they are: gender, district, family member who does an it-related job, monthly family income, usage of resources, purposes of using a computer, frequency of using a computer, locations of using a computer, software packages used, methods of obtaining computer knowledge, usage of internet, uses of internet, and frequency of using internet. then, the rest of the factors were added to the selected model (containing the most significant factor) and the next most significant factor was selected. this process continues until none of the remaining factors is significant. the final best model was

log(π_i / π_4) = α_i + uin_i(j) + inc_i(k) + met_i(l) + loc_i(m),  i = 1, 2, 3   (5)

where uin = usage of internet, inc = monthly family income level, met = methods of obtaining computer knowledge, and loc = locations of using a computer; i = 1, 2, 3; j = 1 (internet user), 2 (non-internet user); k = 1 (< rs.15,000), 2 (rs.15,000 - rs.30,000), 3 (rs.30,000 - rs.50,000), and 4 (> rs.50,000); l = 1 (computer courses followed), 2 (school), 3 (self study, family members, another person, other); and m = 1 (one location), 2 (two locations), 3 (three locations), and 4 (more than three locations). as there are four levels of the new variable, model (5) consists of three logits (refer to section ii e); in model (5), log(π_i / π_4), i = 1, 2, 3, are known as the logits. the parameter estimates of these logits are provided in table xii.

table xii: parameter estimates of the three logits of model (5)
factor      level    logit 1    logit 2    logit 3
intercept            0.3553     0.1482     -1.1730
uin         j = 1    1.1199     0.1859     1.1392
uin         j = 2    0.0000     0.0000     0.0000
inc         k = 1    -1.3585    -0.7147    -0.7362
inc         k = 2    -0.8175    -0.1781    -0.5009
inc         k = 3    0.5624     -0.0752    0.0822
inc         k = 4    0.0000     0.0000     0.0000
met         l = 1    0.3578     0.1948     0.5426
met         l = 2    -0.2170    0.6758     -0.6477
met         l = 3    0.0000     0.0000     0.0000
loc         m = 1    -0.7411    -0.0666    -0.7831
loc         m = 2    -0.0318    0.2647     0.2148
loc         m = 3    0.5539     0.8020     0.9503
loc         m = 4    0.0000     0.0000     0.0000

after fitting a model, the usual practice is to test the adequacy of the model; this aspect is referred to as goodness of fit [24]. the measures used to determine the goodness of fit of the model are the likelihood ratio deviance [25] and the pearson chi-square [26]. the hypothesis for testing the adequacy of the model is h0: the model fits the data well, vs h1: the model does not fit the data well. the results of these tests obtained from sas ® [23] show a chi-square value of 558.4536 with 741 degrees of freedom (p = 1.000 > 0.05) for the deviance test and a chi-square value of 729.6255 with 741 degrees of freedom (p = 0.6101 > 0.05) for the pearson test, suggesting that model (5) fits the data well at the 5% significance level.
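for readers who want to verify the reported goodness-of-fit p-values, a short sketch follows; scipy is assumed here purely for illustration, whereas the study itself obtained these figures from sas.

```python
# illustrative sketch: converting the reported deviance and pearson statistics
# into p-values with the chi-square survival function.
from scipy.stats import chi2

df = 741
for name, statistic in (("deviance", 558.4536), ("pearson", 729.6255)):
    print(f"{name}: p = {chi2.sf(statistic, df):.4f}")
# large p-values (> 0.05) give no evidence against h0, i.e. model (5) fits the data well.
```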
conditional probabilities
after parameter estimation, the conditional probabilities for each generalized logit were calculated, and the following results were obtained.
model 1: when i = 1, model (5) becomes
log(p_1 / p_4) = α_1 + uin_1(j) + inc_1(k) + met_1(l) + loc_1(m)
which models the probability of category 1 of the new variable (having both computer awareness and computer literacy) relative to category 4 (having neither computer awareness nor computer literacy).
model 2: when i = 2, model (5) becomes
log(p_2 / p_4) = α_2 + uin_2(j) + inc_2(k) + met_2(l) + loc_2(m)
which models the probability of category 2 of the new variable (having only computer awareness) relative to category 4 (having neither computer awareness nor computer literacy).
model 3: when i = 3, model (5) becomes
log(p_3 / p_4) = α_3 + uin_3(j) + inc_3(k) + met_3(l) + loc_3(m)
which models the probability of category 3 of the new variable (having only computer literacy) relative to category 4 (having neither computer awareness nor computer literacy).
then a total of 278 sets of conditional probabilities {p1, p2, p3, p4} have to be calculated for the 278 respondents who have used a computer when they enter the university. in order to do this for the different j, k, l, and m values, the parameter estimates of the twelve terms uin_i(j), inc_i(k), met_i(l) and loc_i(m), i = 1, 2, 3, from table xii have to be substituted into the above three models. subsequently, the conditional probabilities {p1, p2, p3, p4} are found using the constraint p1 + p2 + p3 + p4 = 1. using the conditional probabilities, conclusions for respondents of each category of the new variable can be derived.
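the constraint-based computation just described can be sketched as follows for one assumed covariate pattern. this is only an illustration of the mechanics: because the published estimates in table xii are rounded, the probabilities obtained this way need not coincide exactly with those listed in table xiii.

```python
# illustrative sketch (one assumed covariate pattern): recovering {p1, p2, p3, p4}
# from the three logits of model (5) using the estimates in table xii, for an
# internet user (j = 1) with income level k = 3, knowledge-source group l = 3 and
# m = 3 locations. the baseline category 4 has logit 0.
import math

logits = [
    0.3553 + 1.1199 + 0.5624 + 0.0000 + 0.5539,   # log(p1/p4): intercept + uin + inc + met + loc
    0.1482 + 0.1859 - 0.0752 + 0.0000 + 0.8020,   # log(p2/p4)
    -1.1730 + 1.1392 + 0.0822 + 0.0000 + 0.9503,  # log(p3/p4)
]

exps = [math.exp(value) for value in logits] + [1.0]  # exp(0) = 1 for the baseline
denominator = sum(exps)                               # enforces p1 + p2 + p3 + p4 = 1
p1, p2, p3, p4 = (value / denominator for value in exps)
print([round(p, 3) for p in (p1, p2, p3, p4)])
```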
in order to describe category 1, the records which have p1 as the highest conditional probability are taken from the total of 278 records, and these highest p1 conditional probabilities are then arranged in descending order; some of them are listed in table xiii. altogether, 163 records were found whose highest conditional probability is p1. these 163 records were then examined to understand the type of respondents in category 1, and it was found that these records have a higher chance of having both computer awareness and computer literacy. moreover, it is clear that the respondents who are internet users, who have a higher (rs.30,000 or greater) monthly family income level and who use three or more locations for computing are more likely to be both computer aware and computer literate. further, they are likely to obtain computer knowledge from several sources such as computer courses, self study, family members, another person and school. there were 49 records in which the highest conditional probability is p2 (category 2, having only computer awareness). from these records, it can be said that the respondents who are mostly non-internet users, who have a monthly family income level of less than rs.30,000 and who use few (fewer than three) locations to use a computer are more likely to be in the computer-awareness-only category. additionally, these students obtain their computer knowledge from school and/or by following computer courses. for category 4 (having neither computer awareness nor computer literacy), 66 records were found. from these records, it is evident that the respondents who are mostly non-internet users, who have a monthly family income level of less than rs.15,000 and who use only one location to utilize a computer seem to be the ones with the highest probability of having neither computer awareness nor computer literacy. moreover, it appears that their main location of obtaining computer knowledge is school. it is noted from the results that no highest conditional probability was found for category 3 (having only computer literacy). this result reveals that there is little probability that a person is only computer literate without being aware of computers.

table xiii: some estimated conditional probabilities for category 1 of the new variable
no.   uin   inc   met   loc   p1        p2        p3        p4
1     1     3     3     3     0.86219   0.07190   0.17928   0.05944
2     1     3     3     4     0.86219   0.01938   0.07732   0.09716
3     1     3     1     1     0.86219   0.17270   0.09679   0.11727
4     1     1     3     2     0.85911   0.15672   0.19829   0.24578
5     1     1     3     4     0.85911   0.04849   0.11967   0.26943
6     1     2     3     4     0.82931   0.03940   0.13140   0.37431
7     1     3     3     3     0.82931   0.07190   0.17928   0.05944
8     1     3     3     4     0.82931   0.01938   0.07732   0.09716
9     1     4     3     1     0.82931   0.11008   0.11905   0.07642
10    1     3     3     1     0.82931   0.09728   0.10216   0.19170
11    1     1     3     4     0.82226   0.04849   0.11967   0.26943
12    1     3     3     4     0.80613   0.01938   0.07732   0.09716
13    1     1     3     4     0.80613   0.04849   0.11967   0.26943
14    1     3     1     4     0.80613   0.03515   0.07483   0.06071
15    1     2     1     3     0.80613   0.24021   0.26728   0.12971
16    1     4     3     1     0.71863   0.11008   0.11905   0.07642
17    1     2     3     3     0.71863   0.13672   0.28505   0.21426
18    1     2     1     2     0.71863   0.23388   0.21345   0.21614
19    1     2     1     3     0.71863   0.24021   0.26728   0.12971
20    1     3     1     4     0.71863   0.03515   0.07483   0.06071
...

iv. discussion
this research was carried out with the objective of assessing the computer background knowledge of university of colombo freshmen representing five faculties in the year 2009. preliminary analysis shows that male students entering the university have higher computer knowledge than their female counterparts. moreover, a majority of the students have used a computer (93%) and/or the internet (60%) when they enter the university. for those who have not used a computer, not having a computer at home (35%) is the main reason, closely followed by financial constraints (32%). about half of the students have used computers mainly for educational and learning activities (47%), and microsoft office packages (69%) are the most commonly used software. a majority obtained computer knowledge from computer courses (37%), while home (29%) is the most common location of use. the survey sought to determine the computer and internet usage of the new entrants of the university of colombo. according to the findings of the study, 60% are computer aware while only 47% are computer literate. it must be noted that males in general do better than females in both computer awareness and computer literacy. of the computer users, only 59% are internet users; for them, educational and learning activities (37%) are the main purpose of surfing the internet, and most of them use the internet several times a week (30%). a chi-square test was used to identify whether there is an association between computer awareness and computer literacy, in order to decide whether to model the effect of the explanatory variables separately on the two binary response variables (computer awareness and computer literacy) using two logistic models, or jointly on the two responses using a generalized logit model. the chi-square analysis showed that the two response variables are associated with each other; hence a generalized logit model was fitted for the new variable obtained by combining the levels of the two binary responses.
the generalized logit model for the combined responses suggests that the combined response depends on the factors usage of internet, monthly family income level, methods of obtaining computer knowledge, and locations of using a computer. from the research findings, it was revealed that university of colombo freshmen who are likely to be both computer aware and computer literate possess several characteristics: these respondents are internet users, their monthly family income level is high, they use more locations for using a computer, and they obtain computer knowledge from several sources such as computer courses, self study, family members, another person and school. in contrast, for the respondents who are likely to be both non-computer aware and computer illiterate, it is the other way round; i.e. most of them are non-internet users from families having a low monthly income. in addition, they choose only one location for using a computer and obtain computer knowledge from few sources, such as school and/or computer courses. the analysis further showed that new entrants of the university of colombo who are likely to be only computer aware hold the following features: they are mostly non-internet users from families having a medium level of monthly income; besides, they obtain computer knowledge from few types of sources, such as school and/or computer courses, and use one or two locations for using a computer. a significant finding was that there were no highest conditional probabilities found for respondents who are likely to be only computer literate. since these values are probabilities, even though the highest conditional probabilities were not found for this category in this study, one cannot conclude that there is no possibility of a person in general being only computer literate without being aware of computers. in conclusion, the findings provide evidence that the computer modules at the university of colombo should concentrate more on improving the computer literacy skill base of students, especially female learners. however, both groups would benefit from further instruction and practical experience in this subject matter. in order to better prepare students for university computer modules, the administrative bodies of the university should consider offering practical computer sessions and teaching students helpful tips and shortcuts for better computer fluency. further, administrative bodies can compare these results with results from future classes.
another interesting idea would be to repeat the same survey at the conclusion of the course and compare the pre- and post-results.

acknowledgement
the authors wish to sincerely thank all those who supported and participated in this survey.

references
[1] department of census and statistics – sri lanka. (2008). household computer literacy survey of sri lanka: 2006/07. accessed on december 2008 from http://www.statistics.gov.lk/recent%20publications.htm
[2] hess c. a. (1994). computer literacy: an evolving concept. school science and mathematics, 94(4): 208-214.
[3] brock f. j. and thomsen w. e. (1992). the effects of demographics on computer literacy of university freshmen. journal of research on computing in education, 24(4): 563-570.
[4] capron h. l. and johnson j. a. (2004). computers: tools for an information age complete (8th ed.). prentice hall, inc., new jersey.
[5] hindi n. m., miller d. and wenger j. (2002). computer literacy: implications for teaching a college-level course. journal of information systems education, 13(2): 143-152.
[6] patrikas e. o. and newton r. a. (1999). computer literacy. technological horizons in education journal, 27(5).
[7] meier r. (2001). cis 101 concepts test at fort hays state university. kansas core outcomes project, 10-18.
[8] schlesinger j. b. and hampton c. l. (2002). using a decade of data on medical student computer literacy for strategic planning. journal of the medical library association, 90(2): 202-209.
[9] barnett v. (1974). elements of sampling theory. (3rd ed.) hodder & stoughton, london.
[10] cochran w. g. (1977). sampling techniques. (3rd ed.) john wiley & sons inc., new jersey.
[11] k-5 technology integration scope and sequence. accessed on january 2009 from http://www.hhh.k12.ny.us/uploaded/pdfs/technology/k5_scope_seq.pdf
[12] hindi n. m., miller d. and wenger j. (2002). computer literacy: implications for teaching a college-level course. journal of information systems education, 13(2): 143-151.
[13] computer literacy. accessed on december 2008 from http://en.wikipedia.org/wiki/computer_literacy
[14] two-way tables and the chi-square test. accessed on march 2009 from http://www.stat.yale.edu/courses/1997-98/101/chisq.htm
[15] chi-square test for association using spss. accessed on march 2009 from http://statistics.laerd.com/spss-tutorials/chi-square-test-for-association-using-spss-statistics.php
[16] types of variable. accessed on march 2009 from http://statistics.laerd.com/statistical-guides/types-of-variable.php
[17] degrees of freedom (statistics). accessed on march 2009 from http://en.wikipedia.org/wiki/degrees_of_freedom_%28statistics%29
[18] module s7 – chi square. accessed on march 2009 from http://www.okstate.edu/ag/agedcm4h/academic/aged5980a/5980/newpage28.htm
[19] everitt b. s. and landau s. (2004). a handbook of statistical analyses using spss. (1st ed.) chapman & hall / crc press llc.
[20] sas publishing (2001). sas/stat® software: changes and enhancements, release 8.2. (1st ed.) pp. 117-128. sas institute inc., north carolina.
[21] agresti a. (2002). categorical data analysis. (2nd ed.) john wiley & sons inc., new jersey.
[22] gerken j. (1991). generalized logit model. journal of transportation research part b: methodological, 25(2-3): 75-88.
[23] geoff d. and everitt b. s. (2002). a handbook of statistical analysis using sas. (2nd ed.) chapman & hall, london.
[24] collett d. (2003). modelling binary data. (2nd ed.) chapman & hall, london.
[25] likelihood ratio tests. accessed on february 2009 from http://warnercnr.colostate.edu/class_info/fw663/likelihoodratiotests.pdf
[26] 1.3.5.15. chi-square goodness-of-fit test. accessed on march 2009 from http://itl.nist.gov/div898/handbook/eda/section3/eda35f.htm

appendix
questionnaire
serial number: ……
please tick or rank the appropriate boxes as required and follow the instructions carefully.

section 1 – personal details
1.1 faculty: ……
1.2 stream: ……
1.3 gender: ……
1.4 district of the school you attended during the g.c.e. (a/l): ……
1.5 district you live in when you enter the university: ……
1.6 do you have any family member who does an it (information technology) related job when you enter the university? yes / no
1.7 monthly income of your family when you enter the university (rs.): a. less than 15,000 b. 15,000 – 30,000 c. 30,000 – 50,000 d. above 50,000

section 2
2.1 which of the following resources have you used when you enter the university? (multiple answers possible) a. radio b. television c. desktop computer d. laptop computer e. cd f. printer g. scanner h. mobile phone
2.2 which of the following have you known about computers when you enter the university? (multiple answers possible) a. knowledge about the fundamentals of computers (i.e. hardware and software computer systems, computer generations etc.) b. knowledge about the fundamentals of the internet (i.e. what is the internet, what are the services offered by the internet? etc.) c. knowledge related to computer concepts (e.g. social, ethical and legal issues) d. at least three ways that computers are used in society e. at least three occupations related to computer use

section 3 – usage of a computer
3.1 have you used a computer when you enter the university? yes / no. if the answer is yes, skip to question 3.3; otherwise, move to the next question.
3.2 what are the three main reasons for not using a computer when you enter the university? (please rank the three main factors, 1-highest … 3-lowest) a. computer usage was not required b. computer usage was not knowledgeable c. computer usage was difficult d. had no computer at home e. financial constraints f. other (specify) ………… now, skip to section 5.
3.3 which of the following skills have you possessed when you enter the university? (multiple answers possible)
a. skills in basic hardware and basic operating system functions (identifying computer parts, powering up and powering down the computer, opening/saving files, recognizing different file types) b. skills in word processing (create/save/print documents, insert tables/charts/labels/symbols, format page layout: margins, page numbers, page borders) c. skills in spreadsheets (create/save/print spreadsheets, insert tables/charts, insert functions/formulas) d. skills in presentation graphics (create/save/print slide shows, insert new slide/layout/tables/charts, create animations) e. skills in databases (design basic databases with queries and reports/forms) f. skills in internet & e-mail (surfing the internet and sending e-mail messages)
3.4 what are the three main purposes of using a computer when you enter the university? (please rank the three main factors, 1-highest … 3-lowest) a. education & learning activities b. leisure activities c. surfing internet d. for e-mails e. office work f. self employment g. other (specify) ………
3.5 how often do you use a computer when you enter the university? a. daily b. several times a week c. once a week d. once a month e. rarely
3.6 which locations have you used to make use of computers when you enter the university? (multiple answers possible) a. home b. internet cafe c. study institution d. school e. friends / relatives place f. other (specify) ………
3.7 what are the three mostly used software packages when you enter the university? (please rank the three main factors, 1-highest … 3-lowest) a. ms office packages b. database management c. computer graphics d. web designing e. other (specify) ………
3.8 what are the three main methods of obtaining computer knowledge when you enter the university? (please rank the three main factors, 1-highest … 3-lowest) a. computer courses followed b. school c. self study d. family members e. another person f. other (specify) ………

section 4 – usage of internet
4.1 have you used the internet when you enter the university? yes / no. if the answer is yes, move to the next question; otherwise, skip to section 5.
4.2 what is your ability to use the internet when you enter the university? a. can use without assistance b. can use with assistance
4.3 what are the three main uses of the internet when you enter the university? (please rank the three main factors, 1-highest … 3-lowest) a. education & learning activities b. leisure activities c. for getting information d. communication e. office work f. self employment g. other (specify) ………
4.4 how often do you use the internet when you enter the university? a. daily b. several times a week c. once a week d. once a month e. rarely

section 5
5.1 comment on the following statements (strongly agree / agree / no opinion / disagree / strongly disagree):
- i think having computer knowledge when entering the university is beneficial.
- having computer knowledge makes my life easier.
5.2 give your comments on any other factors of measuring computer literacy. ……………
thank you!

the international journal on advances in ict for emerging regions 2010 03 (01) : 11 24

a data-driven approach to checking and correcting spelling errors in sinhala
asanka wasala, ruvan weerasinghe, randil pushpananda, chamila liyanage and eranga jayalatharachchi

abstract—in this paper we describe the construction of a spell checker for sinhala, the language spoken by the majority in sri lanka.
due to its morphological richness, the language is difficult to enumerate completely in a lexicon. the approach described is based on n-gram statistics and is relatively inexpensive to construct without deep linguistic knowledge. this approach is particularly useful as there are very few linguistic resources available for sinhala at present. the proposed algorithm has been shown to be able to detect and correct many of the common spelling errors of the language. results show a promising performance, achieving an average accuracy of 82%. this technique can also be applied to construct spell checkers for other phonetic languages whose linguistic resources are scarce or non-existent.

index terms—spell checking, sinhala, data driven, n-gram

i. introduction
spell checking deals with detecting misspelled words in a written text and possibly assisting users in correcting them, with the use of a dictionary or otherwise. spell checkers are well-known components of word-processing applications. in addition, spell checkers are widely used in other applications such as optical character recognition (ocr) systems, automatic speech recognition (asr) systems, computer aided language learning (call) software, machine translation (mt) systems and text-to-speech (tts) systems [1] [2]. the history of automatic spelling correction goes back to the 1960s [3]. even after decades of extensive research and development, the effectiveness of spell checkers remains a challenge today. common spelling mistakes can be classified into two broad categories: 1) non-word errors, where the word itself is invalid (i.e. not present in a valid lexicon), and 2) real-word errors, where the word is valid yet inappropriate in the context [3] [1] [2]. based on the above categorisation, the task of spelling correction can be classified into two approaches: isolated-word correction and context-sensitive error correction. non-word errors are usually recognized and corrected using non-context-sensitive spelling error correction approaches [3]; context-sensitive spelling error correction is more complex and requires advanced statistical and natural language processing (nlp) techniques. in this paper, we focus on detecting and correcting non-word errors, especially to address a prominent issue prevalent in written sinhala, casually referred to as the "na-na-la-la" dissension.

manuscript received march 30, 2010. accepted september 12, 2010. this research was partially supported by the pan localization project (http://www.panl10n.net), a grant from the international development research center (idrc), ottawa, canada, administered through the center for research in urdu language processing, national university of computer and emerging sciences, pakistan. r. a. wasala was with the language technology research laboratory, university of colombo school of computing, 35, reid avenue, colombo 07, sri lanka. he is now with the localisation research centre, department of computer science and information systems, university of limerick, limerick, ireland. (e-mail: asanka.wasala@ul.ie). a. r. weerasinghe and d. e. jayalatharachchi are with the university of colombo school of computing, 35, reid avenue, colombo 07, sri lanka. (e-mail: arw@ucsc.cmb.ac.lk, dej@ucsc.cmb.ac.lk). r. pushpananda and c. liyanage are with the language technology research laboratory, university of colombo school of computing, 35, reid avenue, colombo 07, sri lanka. (e-mail: rpn@ucsc.cmb.ac.lk, cml@ucsc.cmb.ac.lk).
a data-driven algorithm based on n-gram statistics is proposed to solve these spelling problems. in addition, the proposed algorithm is also capable of addressing common spelling errors due to the phonetic similarity of letters. at present, there is no published work on sinhala spell checking. to the best of our knowledge, this is the first implementation of a spell checker for sinhala using a data-driven approach. the rest of this paper is organized as follows: section ii summarizes the related work in this area. section iii gives an overview of the linguistic features related to sinhala spelling and describes the core spell checking algorithm implemented, while section iv presents an evaluation of the current system. section v discusses the main findings of the research. finally, the paper concludes with a summary of the current research and discusses future research directions. ii. related work spell checkers for european languages such as english [3] are well developed. the literature concerning spell checkers for indic languages such as assamese [2], bangla [4] [1], malayalam [5], marathi [6] and tamil [7] is less well developed. however, similar research in several other languages, including sinhala, is underway and needs special attention owing to morphological richness. several commercial sinhala spell checker products [18] have been announced in recent years. work on open-source spell checkers has also shown an increase recently. hunspell (the spell checker of openoffice.org, mozilla firefox & thunderbird, google chrome, mac os x and opera [19]) has support for sinhala on openoffice.org through extensions [20]. a dictionary-based spell-checker is available for mozilla firefox as an add-on [21]. furthermore, there is a sinhala search-engine [22] that uses a dictionary-based technique to automatically correct spelling mistakes in query strings. unfortunately, none of these spell-checkers have been systematically assessed or benchmarked. one of the major issues worth noting in implementing spell checkers for these languages is resource deficiency. morphological analyzers, tagged corpora and comprehensive lexica are scarce for many languages, including sinhala. moreover, due to the rich morphological features of these languages, developing entirely rule-based systems or integrating with existing open-source spell checkers such as aspell are arduous tasks. therefore, research in spell checker development for languages such as sinhala has many unresolved issues. the problem of spell checking has been addressed using edit distance based approaches, morphology and rule based approaches, dictionary-lookup and reversed dictionary lookup techniques, n-gram analysis, probabilistic methods, and neural nets [3] [9] [1] [4]. of these, morphology based approaches [5] [6] [7] [2] and reverse dictionary lookup techniques [1] [4] [5] are the most popular ones used in indic languages, most of which have to deal with rich morphology.
probabilistic or data driven approaches are scarcely reported due to the lack of resources such as corpora, although n-gram based approaches are shown to be effective in addressing spelling errors in other languages [9] [3]. iii. methodology a rigorous linguistic analysis and a literature survey were carried out to investigate the factors leading to most common non-word spelling errors in sinhala. based on this linguistic analysis and a thorough analysis of a text corpus, an algorithm is proposed to detect and correct spelling mistakes typically found in sinhala writing. a. linguistic analysis sinhala is diglossic; its spoken form is different from its written form. sinhala orthography consists of 60 graphemes and an estimated 40 phonemes [10]. the study revealed that most of the non-word spelling errors occur due to three factors: 1) the phonetic similarity of sinhala characters, 2) irregular correspondence between sinhala graphemes and phonemes, and, 3) the lack of knowledge of spelling rules. in sinhala, non-word spelling mistakes are largely due to the fact that several graphemes correspond to a single phoneme [10]. the most prominent cases are elaborated in the following sections. 1) the pronunciation and orthography of aspirated and unaspirated consonants according to disanayaka [11] the sinhala writing system contains 10 graphemes for representing aspirated consonants (ඛ /k h /, ඝ /g h /, ඡ /tʃ h /, ඣ /dʒ h /, ඨ /ʈ h /,ඪ /ɖ h /, ථ /t /, ධ /d h /, ප /p h /, බ /b h /) and 10 graphemes for representing unaspirated consonants (ක /k/, ග /g/, ච /tʃ/, ජ /dʒ/, ට /ʈ/, ඩ /ɖ/, ත /t /, ද /d /, ඳ /p/, ඵ /b/ ). the aspirated consonants occur in words borrowed from sanskrit or pali languages. however, they are generally not pronounced differently from their unaspirated counterparts [11] [10] [12]. this particular gap between the written language and the spoken language has led to some common spelling errors in sinhala. among the letters representing aspirated consonants, the letters „ඣ‟ /dʒ h /, „ඡ‟ /tʃ h / and „ප‟ /p h / are rarely used, while the rest are frequent. furthermore, it can be seen that these aspirated letters can appear at the beginning, middle or end of a word. hence, it is difficult to establish linguistic rules for the proper usage of unaspirated and aspirated letters in sinhala writing. 2) retroflex and dental letter confusion: the ‘na-nala-la’ dissension the most common spelling errors in sinhala are due to the retroflex and dental letter confusion. in spoken sinhala, several graphemes that represent corresponding retroflex consonants are actually pronounced in an intermediate alveolar-like position. the graphemes „ණ‟ and „ශ‟ represent the retroflex nasal /ɳ/ and the retroflex lateral /ɭ/ respectively. but they are pronounced in the same manner as their respective alveolar counterparts „න‟-/n/ and „ර‟-/l/ [12] [10]. when pronouncing the above consonants, not much attention is paid to the distinction of the place of articulation (i.e. all of them are pronounced as alveolar sounds), but the distinction of retroflex and dental letters (though pronounced as alveolar consonants) is stressed in the writing system. this confusion inevitably leads to spelling errors. in the literature, these errors are commonly known as “nana-la-la” (/na/-/nə/-/la/-/lə/) dissention („න-ණ ර-ශ‟ errors). linguists believe that clear guidelines or a mechanism had been present to describe the correct usage of retroflex-dental letters until the end of 13 th century [13]. 
however, due to various reasons, these guidelines no longer exist [13]. by analyzing the language, several rules can be defined to minimize the confusion between retroflex and dental letter usage. some rules can be defined by considering the phonological transformation rules applied for words derived from other languages. in addition, some more rules can be derived by analyzing the usage and the context of retroflex and dental letters (i.e. „න, ණ, ර, ශ‟). see sections 1 4 of the appendix a for linguistic rules concerning the use of the above retroflex and dental letters. rules of the former type are extremely complex. a layman lacks the requisite linguistic knowledge to apply such rules to decide whether to use retroflex or the dental letter in spelling a given a word. for example, see the rule described in appendix a 1.2: 1. intervocalic sanskrit and pali ණ /ɳ/ does not get evolved [14]. sanskrit „඾්ණ‟ /ʂɳ/ > pali „ණ්ව‟ /ɳh/ > sinhala ණ /ɳ/ [14] asanka wasala, ruvan weerasinghe, randil pushpananda, chamila liyanage and eranga jayalatharachchi 13 december 2010 the international journal on advances in ict for emerging regions 03 example: උ඾්ණ /uʂɳǝ/ > උණ්ව /uɳhǝ/ > උණු /uɳu/, උණ /uɳǝ/ all words used in this example are used in modern sinhala writing, yet, a layman would not normally know which words are borrowed „as is‟ from foreign languages (such as sanskrit or pali) and which words are of the derived kind. therefore, without this knowledge, one might fail to apply the above rule and use the dental letter „න‟ instead of retroflex letter „ණ‟ (or vice versa) in certain words causing spelling errors. 3) the pronunciation and orthography of the retroflex and palatal sibilants in sinhala, the grapheme „඾‟ that represents the retroflex sibilant /ʂ/, is pronounced as the palatal sibilant „ල‟-/ʃ/. as with the case of graphemes for representing aspirated sounds, the above two graphemes were also borrowed from sanskrit. here too, though the distinction of the place of articulation is not prominent in pronunciation, the correct grapheme has to be used in writing. it is possible to define some linguistic rules on the correct use of above graphemes (see sections 5 and 6 of the appendix a). it should be noted, however, that not all cases are covered by such linguistic rules. hence, it can be seen that the confusion between above two graphemes has lead to some spelling errors. b. error detection and correction methodology the algorithm for spelling error detection and correction is described below. it is based on n-gram statistics computed from the ucsc sinhala corpus [23]. the computed word unigram frequencies and the syllable bigram and trigram frequencies are effectively utilized in addressing the prominent “na-na-la-la” dissension as well as other spelling errors described in section iii-a above. fig. 1. core modules and the overall architecture of the spell checker our algorithm is based on the assumption that the majority of users of the language write using correct spellings. in other words, we assume that the frequency of valid words (i.e. words with correct spelling) appearing in the corpus is higher than the frequency of invalid words. the core modules and the overall architecture of the spell checker are illustrated in figure 1. the main algorithm of the sinhala spell checker is given in figure 2. each module has been implemented as a function and the algorithm corresponding to each function is given after the description of each module. 
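as a concrete illustration of how these pre-computed statistics drive the correction step, a minimal, self-contained python sketch of the unigram-first, then syllable-trigram, then syllable-bigram ranking is given below. all names, frequency values and the one-character-per-syllable splitter are invented stand-ins (the actual system uses pre-compiled counts from the ucsc sinhala corpus and proper sinhala syllabification), so this is a sketch of the idea rather than the released implementation; the paper's own pseudocode for the full driver and for each module follows.

    # toy frequency tables standing in for the pre-compiled corpus statistics (all values invented)
    UNIGRAMS = {"kuluna": 2, "kulunna": 43}
    TRIGRAMS = {"kul": 10, "ulu": 8, "lun": 27, "unn": 5, "nna": 9}
    BIGRAMS = {"ku": 50, "ul": 40, "lu": 30, "un": 26, "nn": 7, "na": 60}

    def syllables(word):
        # stand-in syllabifier: one character per "syllable"; the real system splits into sinhala syllables
        return list(word)

    def ngram_sum(word, table, n):
        # sum of the frequencies of the overlapping n-syllable chunks of the word
        syls = syllables(word)
        return sum(table.get("".join(syls[i:i + n]), 0) for i in range(len(syls) - n + 1))

    def select_best(candidates):
        # staged ranking: word unigram counts first, then syllable trigram sums, then syllable bigram sums
        scorers = [
            lambda w: UNIGRAMS.get(w, 0),
            lambda w: ngram_sum(w, TRIGRAMS, 3),
            lambda w: ngram_sum(w, BIGRAMS, 2),
        ]
        for scorer in scorers:
            best = max(candidates, key=scorer)
            if scorer(best) > 0:
                return best
        # nothing is attested at any level; keep the first candidate (callers put the source word first)
        return candidates[0]

    # the candidate list contains the source word itself plus its generated permutations
    print(select_best(["kuluna", "kulunna"]))  # -> "kulunna", chosen by its higher unigram count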
processedwordlist=preprocess(inputtext) for each word w in processedwordlist permutationlist=generatepermutations(w) bestsuggestion=selectbestsuggestion (permutationlist, w) if bestsuggestion is not equal to w then substitutionlist[w]=bestsuggestion end if end for outputtext=postprocess(inputtext, substitutionlist) display(outputtext) fig. 2. the main algorithm of the spell checker 1) pre-processing module the input to the system is unicode text. in the preprocessing module, the system first tokenizes the text stream and builds a list containing unique sinhala words found in the text. each word is then compared with an exception word list. if a word is found to be in the exception word list, it will be removed from the unique word list, hence from further processing. the exception word list contains a list of homophones and valid spelling variants. a total of 1188 words identified mainly from literature [15] are included in this exception list. several examples for homophones include {කන /kanǝ/ eat, කණ /kanǝ/ear}, {තන /t anǝ/ breast, තණ /t anǝ/grass}, spelling variants include {උළශර -/ulelǝ/ ceremony, උළරශ /ulelǝ/ ceremony} and {කු඿රතා /kusǝlǝt a:/skills, කුලරතා /kusǝlǝt a:/ skills}. homophone disambiguation requires contextual information as well as advanced natural language processing (nlp) techniques and is beyond the scope of this paper. the algorithm in its current form is only capable of processing isolated words. therefore, homophones and spelling variants are excluded from further processing. each word in the processed unique list is then passed to the permutation generation module. pre-processing algorithm function preprocess takes the input text as a parameter and returns a list of unique sinhala words found in the input text but not in the exception word list. pre-processing module permutation generation module best suggestion selection module post-processing module input text output text processed word list permutations generated for each word selected best suggestions for each word 14 asanka wasala, ruvan weerasinghe, randil pushpananda, chamila liyanage and eranga jayalatharachchi the international journal on advances in ict for emerging regions 03 december 2010 preprocess(inputtext) tokenizedwordlist = tokenize(inputtext) for each word w in tokenizedwordlist if w is sinhala and w is not in uniquewordlist and w is not in exceptionwordlist then append w to uniquewordlist end if end for return uniquewordlist fig. 3. pre-processing algorithm 2) permutation generation module as identified in the section iii-a, phonetic similarity of letters can cause spelling mistakes in a word. similar sounding groups of letters found in this exercise include {ක,ඛ}=/k/, {ග,ඝ}=/g/, {ච,ඡ}=/tʃ/, {ජ,ඣ}=/dʒ/, {ට,ඨ}=/ʈ/, {ඩ,ඪ}=/ɖ/, {ත,ථ}=/t /, {ද,ධ}=/d /, {ඳ,ප}=/p/, {ඵ,බ}=/b/, {න,ණ}=/n/, {ර,ශ}=/l/, {඿,ල,඾}=/s/ or /ʃ/ and {ඥ,ඤ}=/ɲ/. the permutation-generation module accepts a sinhala word processed by the pre-processing module and generates permutations by searching the word for above similar sounding letters and substituting them with their corresponding letters that belong to the same group. among the generated words, there can be words with correct spellings, the given (source) word itself and words with incorrect spellings. 
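a compact python sketch of this substitution step is shown below. the letter groups here are hypothetical latin stand-ins for the sinhala groups listed above (used only to avoid reproducing sinhala glyphs here), and itertools.product replaces the recursive formulation of the paper's pseudocode while producing the same candidate set; the paper's own worked example and pseudocode follow.

    from itertools import product

    # hypothetical stand-in groups; the real ones are the similar-sounding sinhala letter groups listed above
    SIMILAR_GROUPS = [{"n", "N"}, {"l", "L"}, {"s", "S", "Z"}]

    def alternatives(letter):
        # the letters a given letter may be substituted with: its whole group, or just itself
        for group in SIMILAR_GROUPS:
            if letter in group:
                return sorted(group)
        return [letter]

    def generate_permutations(word):
        # substitute every letter that has similar-sounding alternatives with each member of its group;
        # the cartesian product enumerates all resulting spellings, including the source word itself
        return ["".join(p) for p in product(*(alternatives(ch) for ch in word))]

    print(generate_permutations("sun"))
    # -> ['SuN', 'Sun', 'ZuN', 'Zun', 'suN', 'sun'], i.e. 3 * 1 * 2 = 6 candidate spellings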
for example, given the word „සුඳතර‟ /supat alǝ/ popular, the module will generate and return a list containing 24 tokens including සුඳතර, සුඳතශ, සුඳථර, සුඳථශ, සුපතර, සුපතශ, සුපථර, සුපථශ, ශුඳතර, ශුඳතශ, ශුඳථර, ශුඳථශ, ශුපතර, ශුපතශ, ශුපථර, ශුපථශ, ෂුඳතර, ෂුඳතශ, ෂුඳථර, ෂුඳථශ, ෂුපතර, ෂුපතශ, ෂුපථර and ෂුපථශ. permutation generation algorithm the function generatepermutations accepts a sinhala word processed by the preprocess function and returns a list containing all generated permutations. the permutations are generated by searching the word for similar sounding letters and substituting them with corresponding letters that belong to the same group. generatepermutations(w) similarlettergroups={{ක,ඛ},{ග,ඝ},{ච,ඡ}, {ජ,ඣ}, {ට,ඨ}, {ඩ,ඪ}, {ත,ථ}, {ද,ධ}, {ඳ,ප},{ඵ,බ},{න,ණ},{ර,ශ}, {඿,ල,඾},{ඥ,ඤ}} permutationlist=[] for each letter l in w if l found in similarlettergroup g for each similarletter in g w = replace l with similarletter in w if w not in permutationlist append w to permutationlist results = generatepermutations(w) append results to permutationlist end if end for end if return permutationlist fig. 4. permutation generation algorithm 3) best suggestion selection module this is the core module of the algorithm. this module involves the detection and correction of spelling errors. the n-gram statistics computed from the ucsc sinhala corpus is used in this module. a distinct word list along with word frequencies (word unigram frequencies), syllable trigram frequencies and syllable bigram frequencies have been precomplied and stored in a database for fast retrieval and efficient processing (see section v for details of the bigram and trigram counting algorithm). in the first step, word unigram frequencies obtained from the corpus are used to rank the words generated from the permutation generation module and to choose the best suggestion among the generated words. the word unigram frequency corresponding to each generated word is obtained from the database. the word with the highest frequency is chosen as the best suggestion. if none of the generated words are found in the corpus, i.e. the word unigram frequencies returned zero for all the generated words, syllable trigram and bigram frequencies are used to select the best suggestion in the successive steps. if the generated word consists of more than three syllables, it will be divided into overlapping sequences of three syllables. then, for each three syllable sequence, the corresponding pre-computed trigram frequencies are obtained from the database and summed up to get an overall score for the generated word. if the summed up trigram frequencies yield zero for a certain word, the word will be divided into repetitive chunks of two syllables and pre-computed syllable bigram frequencies will be summed up to get an overall score for the word. similarly, if the generated word consists of two syllables, the syllable bigram frequency is used. generated words are sorted according to the overall score obtained. the word with the highest score is chosen as the best suggestion. output of this module is the best suggestion for a given word. the functionality of the above module is explained below using examples. example #1: suggestion of the best word using word unigram frequencies. input word: කුලුන /kulunǝ/ (column) permutationlist = කුලුන, කුලුණ, කුළුන, කුළුණ, ඛුලුන, ඛුලුණ, ඛුළුන, ඛුළුණ step 1: obtaining the corresponding word unigram frequencies from the corpus. 
කුලුන 2 කුලුණ 1 කුළුන 0 කුළුණ 43 ඛුලුන 0 ඛුලුණ 0 ඛුළුන 0 ඛුළුණ 0 step 2: selecting the best suggestion (word with the highest frequency) best suggestion: කුළුණ example #2: suggestion of the best word using syllable trigram frequencies. input word = පැඛිළශණ඼ා (a word with incorrect spellings meaning falter – the word with the correct spelling is not included in the word unigram list.) asanka wasala, ruvan weerasinghe, randil pushpananda, chamila liyanage and eranga jayalatharachchi 15 december 2010 the international journal on advances in ict for emerging regions 03 permutationlist = ඳැකිළරන඼ා, ඳැකිළරණ඼ා, ඳැකිළශන඼ා, ඳැකිළශණ඼ා, ඳැඛිළරන඼ා, ඳැඛිළරණ඼ා, ඳැඛිළශන඼ා, ඳැඛිළශණ඼ා, පැකිළරන඼ා, පැකිළරණ඼ා, පැකිළශන඼ා, පැකිළශණ඼ා, පැඛිළරන඼ා, පැඛිළරණ඼ා, පැඛිළශන඼ා, පැඛිළශණ඼ා overall word score computation method for the word „පැකිළරන඼ා‟ is given in the following: step 1: decomposition of the word into repetitive three syllable sequences. පැකිළරන඼ා = පැකිළර + කිළරන + ළරන඼ා step 2: obtaining the syllable trigram frequencies for above syllable sequences from the database. පැකිළර 0 කිළරන 27 ළරන඼ා 43 step 3: adding above frequencies to obtain the overall score for the word පැකිළරන඼ා = 0+27+43 = 70 similarly, the word „ඳැකිළරන඼ා‟ yields a score of 95 when computed using syllable trigram frequencies. overall scores will be computed for the other generated words in the same manner. words are then sorted according to the computed scores and the word with the highest score is returned as the best word. the best word suggested for the above input word for instance is: ඳැකිළරන඼ා. if summed up syllable trigram frequencies yield zero for all generated words, the word will be passed to the bigram computation component. the best word selection using the bigram computation component operates in a similar manner to the trigram computation method explained above. example #3: suggestion of the best word using syllable bigram frequencies. input word = ඛළදෝපැනිඹා (a word with incorrect spellings meaning firefly – the word with the correct spelling is not included in the word unigram list.) permutation list = කළදෝඳැනිඹා, කළදෝඳැණිඹා, කළදෝපැනිඹා, කළදෝපැණිඹා, කළධෝඳැනිඹා, කළධෝඳැණිඹා, කළධෝපැනිඹා, කළධෝපැණිඹා, ඛළදෝඳැනිඹා, ඛළදෝඳැණිඹා, ඛළදෝපැනිඹා, ඛළදෝපැණිඹා, ඛළධෝඳැනිඹා, ඛළධෝඳැණිඹා, ඛළධෝපැනිඹා, ඛළධෝපැණිඹා overall word score computation method for the word „ඛළදෝපැනිඹා‟ is given in the following: step 1: decomposition of the word into repetitive two syllable sequences. ඛළදෝපැනිඹා = ඛළදෝ + ළදෝපැ + පැනි + නිඹා step 2: obtaining the syllable bigram frequencies for above letter sequences from the database ඛළදෝ 2 ළදෝපැ 0 පැනි 0 නිඹා 2630 step 3: adding above frequencies to obtain the overall score for the word ඛළදෝපැනිඹා = 2+0+0+2630=2632 similarly, the word „කළදෝඳැනිඹා‟ yields the highest score of 2875 when computed using syllable bigram frequencies. hence it is selected as the best suggestion. as the output of this module, a list (substitutionlist) containing the original words and their corresponding best words is returned (e.g. substitutionlist[„කුලුන‟] = „කුළුණ‟, substitutionlist[„පැකිළරන඼ා‟] = „ඳැකිළරන඼ා‟ etc). best suggestion selection algorithm the function selectbestsuggestion accepts a generated permutation list and selects the best suggestion from the list based on word unigram, syllable bigram or syllable trigram frequencies. 
selectbestsuggestion(permutationlist, originalword)
  ## uni-gram comparison
  highestunigramfrequency = 0
  bestword = originalword
  for each word w in permutationlist
    wordunigramfrequency = getunigramcountfromdb(w)
    if wordunigramfrequency > highestunigramfrequency then
      highestunigramfrequency = wordunigramfrequency
      bestword = w
    end if
  end for
  ## tri-gram comparison
  if bestword is equal to originalword then
    highesttrigramscore = 0
    bestword = originalword
    for each word w in permutationlist
      threesyllablechunks = []
      if length of w > 3 then
        threesyllablechunks = decomposewordintotrigrams(w)
        wordtrigramscore = 0
        for each threesyllablechunk in threesyllablechunks
          wordtrigramscore = wordtrigramscore + gettrigramcountfromdb(threesyllablechunk)
        end for
        if wordtrigramscore > highesttrigramscore then
          highesttrigramscore = wordtrigramscore
          bestword = w
        end if
      end if
    end for
  end if
  ## bi-gram comparison
  if bestword = originalword then
    highestbigramscore = 0
    bestword = originalword
    for each word w in permutationlist
      twosyllablechunks = []
      if length of w > 2 then
        twosyllablechunks = decomposewordintobigrams(w)
        wordbigramscore = 0
        for each twosyllablechunk in twosyllablechunks
          wordbigramscore = wordbigramscore + getbigramcountfromdb(twosyllablechunk)
        end for
        if wordbigramscore > highestbigramscore then
          highestbigramscore = wordbigramscore
          bestword = w
        end if
      end if
    end for
  end if
  return bestword
fig. 5. best suggestion selection algorithm
4) post processing module in this module, the input text will be scanned from the beginning and the words that are in the substitution list are replaced with the best suggestions. this methodology preserves the original formatting information of the text, including non-sinhala words, numerals, punctuation and spaces, among others. post processing algorithm: the function postprocess accepts the original input text and the substitution list as parameters. it will scan the input text for the words that are in the substitution list and replace them with the corresponding best suggestions. then, it will return the output text, which will be rendered by the display function.
postprocess(inputtext, substitutionlist)
  outputtext = inputtext
  for each word w in outputtext
    if w is in substitutionlist then
      replace w in outputtext with substitutionlist[w]
    end if
  end for
  return outputtext
fig. 6. post processing algorithm
iv. evaluation and results there is neither a standard lexicon for sinhala spell checker evaluation, nor previous work reported to compare against. therefore, in order to evaluate our system, we used 5505 words obtained from a well known printed dictionary of inherently difficult and commonly misspelled words [15] as the baseline. the first test was straightforward. each entry was passed to the system and the output of the system was compared with the original entry. the second and third tests were much more stringent. the second test involved programmatically altering the original entries of the test data set so that all the dental letters were replaced by their corresponding retroflex letters. furthermore, the unaspirated letters were replaced by their aspirated counterparts. these words were then used as the input to our spell checker, and the output was compared with the original unaltered entries.
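the programmatic alteration used in these stress tests amounts to a fixed character substitution over each dictionary entry; a possible python sketch is shown below, written over latin placeholders rather than the actual sinhala letter pairs (dental to retroflex, unaspirated to aspirated), so the mapping itself is only illustrative.

    # placeholder substitution table: in the real test each dental letter is mapped to its retroflex
    # counterpart and each unaspirated letter to its aspirated counterpart (the pairs of section iii-a)
    TEST2_TABLE = str.maketrans({"n": "N", "l": "L", "k": "K", "t": "T"})

    def perturb(entry, table=TEST2_TABLE):
        # produce the deliberately misspelled input that is fed to the spell checker
        return entry.translate(table)

    originals = ["kalana", "talala"]            # hypothetical correctly spelled dictionary entries
    altered = [perturb(w) for w in originals]
    # accuracy is then the fraction of altered entries the checker maps back to the original spelling
    print(altered)  # -> ['KaLaNa', 'TaLaLa']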
similarly, in the third test, the aspirated letters in the original entries were altered to the corresponding unaspirated counterparts and dental letters were replaced by the corresponding retroflex letters. these words were then analyzed by our speller. the output was compared with the original unaltered entries. a fourth test was carried out by obtaining 20 randomly chosen blog articles published online [24]. the articles were analyzed for spelling errors by our system. for each of the above tests, the errors detected by our system were manually analyzed by an expert. the analysis revealed that our system has wrongly identified a small number of words as invalid (false negatives). moreover, a few words which were identified as valid by our system were actually invalid (false positives). the test results are summarized in table i. the results show an overall accuracy of over 82% for the proposed algorithm. manual analysis of the words that were wrongly suggested by our system as correct revealed that these words are not found in the corpus. therefore, such words were suggested by the trigram or bigram calculation methods described in section iii-b-3. prominent observations made by further analyzing such words are given below: 1. අනුis a commonly used prefix in sinhala. it has a higher syllable bigram and syllable trigram frequency. therefore it can be erroneously substituted for අණුto suggest a word with incorrect spelling. e.g. අණුජීවිඹා  අනුජීවිඹා 2. the bigram frequency of the letter sequence -යන shows that it is one of the most frequently used phonemic combination. therefore, it can be erroneously substituted for -යණproducing a word with incorrect spelling. e.g. කාර්මීකයණඹ  කාර්මීකයනඹ table i evaluation results of subasa spell checker test no. total number of words # of correct suggestions # of incorrect suggestions accuracy (%) 1 5505 4728 777 85.89 2 5505 4616 889 83.85 3 5505 4588 917 83.34 4 3304 2501 803 75.70 asanka wasala, ruvan weerasinghe, randil pushpananda, chamila liyanage and eranga jayalatharachchi 17 december 2010 the international journal on advances in ict for emerging regions 03 3. the trigram frequency of the suffix -ර්ලන shows that it is one of the most frequently used phonemic combination in the language. therefore, it can be erroneously substituted for -ර්඾ණ. e.g. ප්රeකර්඾ණඹ  ප්රeකර්඾නඹ 4. similarly, the trigram frequency of the suffix -ං ලඹ is one of the most frequently used phonemic combinations in the language. therefore, it can be erroneously substituted for -ං ඿ඹ. e.g. කඨිනානි඿ ඿ඹ කටිනානි඿ ලඹ to compare the effectiveness of our algorithm against currently available algorithms used by popular applications, we then performed the same tests on microsoft office sinhala spell checker and sinhala ubiquitous spell checker version 3.0.1-beta-1 for openoffice.org. in all of these tests, we used the first correction the particular spell checker suggested if it identified an incorrect word. we made the assumption that these spellcheckers suggested corrections based on priorities and relevance and the first suggestion was the most appropriate correction for the relevant error. table ii summarizes the test results for the microsoft spell checker. it shows an overall accuracy of 50.35%. for naturally occurring text (blog corpus) it reports an accuracy just above 64%, which is not very useful in practice. table 3 summarizes the test results for the sinhala ubiquitous spell checker version 3.0.1-beta-1 for openoffice.org. 
these results show an average accuracy of 31.41%, a lower value compared to the microsoft office spell checker. however, it shows a better performance of over 73% for the blog text where spelling mistakes are not deliberate. from these tests it is evident that our algorithm performs better in naturally occurring text, and much better in extremely bad cases of misspelling such as those simulated for tests 1, 2 and 3. v. discussion the n-gram statistics used in this spell checker were precompiled and stored in a database. the current database contains 440022 unique words with their frequency of occurrence in the corpus (word unigrams), 166460 distinct three syllable sequences (syllable trigrams) with their frequency of occurrence and 46878 two syllable sequences (syllable bigrams) with their frequency of occurrence. our algorithm, combined with these statistics, is capable of processing virtually any given word. the algorithm used to calculate the syllable trigram frequencies is listed below: for each textfile in the text corpus tokens=tokenize(textfile) for each token in tokens chunuks =dividetokenintothreesyllablechunks() for each threesyllablechunk in chunks if threesyllablechunk is in database then occurrence=occurrence+1 else if insertintodatabase(threesyllablechunk) occurrence=1 end if end for end for end for fig. 7. syllable trigram frequency algorithm the syllable bigrams were calculated in a similar manner. in our algorithm, the complexity and efficiency lie in the permutation generation module. in this paper, we define the term ‘complexity’ as the maximum number of words that can be generated for a given word. using the same distinct word list obtained from the corpus, a few experiments were carried out to find the most complex sinhala word (additional details of these experiments are given in appendix c). the study revealed that a word of local origin, „පු඿්තකාරාධිඳතිතුභන්රා‟ – librarians, can generate up to 3072 permutations. this word can be further inflected as „පු඿්තකාරාධිඳතිතුභන්රාත්‟, increasing the number of generated words up to 6144. moreover, some lengthy borrowed words from pali such as „ඳ චුඳාදාන඿්කන්ධඹන්ළගන්‟ (6144) and „ළන඼඿ඤ්ඤාණා඿ඤ්ඤාඹතනඹාළේද‟ (9213) can generate up to 12288 permutations due to further inflections (e.g. „ඳ චුඳාදාන඿්කන්ධඹන්ළගනුත්‟). however, such words are not used in everyday writing. analysis of words with complexity higher than 6144 revealed that most such words are borrowed words that are no longer used in ordinary sinhala writing. some other words in the test set were found to be erroneous words (e.g. words with unicode conversion errors, non-delimited words etc). though it is safe to declare 6145 as the threshold for the complexity, allowing room for inflections of borrowed words and in order to shield the table ii evaluation results of microsoft office spell checker test no. total number of words # of correct suggestions # of incorrect suggestions accuracy (%) 1 5505 3453 2052 62.72 2 5505 1668 3837 30.29 3 5505 2416 3089 43.88 4 3304 2131 1171 64.49 table iii evaluation results of open office spell checker test no. 
total number of words # of correct suggestions # of incorrect suggestions accuracy (%) 1 5505 1078 4427 19.58 2 5505 720 4785 13.08 3 5505 1047 4456 19.02 4 3304 2443 861 73.94 18 asanka wasala, ruvan weerasinghe, randil pushpananda, chamila liyanage and eranga jayalatharachchi the international journal on advances in ict for emerging regions 03 december 2010 system from intentional attacks, we have set a threshold of 20000 as the maximum complexity that can be handled by the current implementation of our algorithm. the reason for this limitation is to avoid deliberate attempts to break the system by inputting a letter sequence with extremely high complexity. any input that will result in more than 20000 permutations will be left unprocessed. such words will be specially marked as „unchecked’ in the output. all modern computers are capable of handing large amounts of data in a fast and reliable manner due to increased memory capacity and high-speed parallel processing capabilities. therefore, the generation of 20000 permutations can be completed within a negligible amount of time. furthermore, the study revealed that the word length does not significantly affect the complexity of a word (see appendix c for details). the average word length (i.e. the number of unicode code points) of sinhala words was found to be 4. the locally originating maximum length word was found to be „ළජෝෝතිල්ලා඿්ත්රnයින්ළේ‟ – astrologists’ in this study. it is interesting to note that there can be extremely lengthy borrowed words (from pali or sanskrit) such as „ළනත්රnාක්඾්ටරටහාවාධිර ජභර඿ර඿්බඵ්ලරීදළද ර්රතාි‟ though such words are very rare in modern texts. vi. conclusion and future work the implementation of an n-gram based spell checker for sinhala has been discussed in this paper. by substituting phonetically similar characters in a given word, permutations are generated and sent to the best suggestion selection module. the best suggestion selection module uses three techniques for ranking the generated permutations. the three techniques are based on word unigram frequencies, syllable trigram frequencies and syllable bigram frequencies, which are pre-computed from a raw text corpus. empirical evaluation of our algorithm using four different test cases has revealed extremely promising results. a platform independent desktop application with a graphical user interface (gui) and a web-based version of the same have been developed using the python programming language to demonstrate the functionality of the proposed algorithm. the usage of the applications are described in appendix b. the accuracy of corrections suggested by the algorithm can be increased by simply adding non-existing words to the distinct word list and by increasing the unigram frequencies of words with correct spellings. it is expected to incorporate a crowd source based automated mechanism for improving the accuracy of the current spell checker. further enhancements planned include the optimization of the permutation generation module by storing and processing data using a trie [3] data structure. this will help to effectively prune a large number of generated words to only those that appear in the distinct word list. the current algorithm is only capable of addressing substitution errors. the success of the application of the reverse dictionary lookup methodology for other indic languages [1] [4] [5] has motivated us to attempt the same approach for sinhala. 
this will enable the algorithm to capture other types of spelling errors such as insertion, deletion and transposition [3] [4]. research is underway to investigate the incorporation of the n-gram score computation methodology proposed in [9] for this purpose. the algorithm applied for sinhala, can also be used to construct spell checkers for other languages in which linguistic resources are scarce or non-existent. it is of particular relevance to languages which have rich morphology and thus are difficult to completely enumerate in a lexicon. furthermore, the same algorithm can be utilized for the identification of homographs and common spelling mistakes found in sinhala. to the best of our knowledge this is the first study and evaluation of a sinhala spell checker algorithm. this study has opened up new opportunities for further research and will provide a baseline for comparison and evaluation of sinhala spell checking algorithms in future. appendix a sinhala spelling rules 1. use of retroflex ණ /ɳ/ in sinhala 1. intervocalic sanskrit and pali ණ /ɳ/ does not get evolved [14]. example: ළයෝවණ /ro:hǝɳǝ/ > ළයෝවණ /ro:hǝɳǝ/ > රුහුණු /ruhuɳu/ 2. sanskrit „඾්ණ‟ /ʂɳ/ > pali „ණ්ව‟ /ɳh/ > sinhala ණ /ɳ/ [14] example: උ඾්ණ /uʂɳǝ/ > උණ්ව /uɳhǝ/ > උණු /uɳu/, උණ /uɳǝ/ 3. retroflex ණ /ɳ/ is used in front of a retroflex consonant. retroflex consonants are ට /ʈ/, ඨ /ʈ h /, ඩ /ɖ/, ඪ /ɖ h /. examples: ඝණ්ටාය /g h aɳʈa:rǝ/, කාණ්ඩ /ka:ɳɖǝ/, චණ්ඩාර /ʧaɳɖa:lǝ/ exceptions: i. dental න /n / is used before a retroflex consonant in words borrowed from western languages [16]. examples: කවුන්ටයඹ /kaun ʈǝrǝjǝ/, කැන්ටිභ /kæn ʈimǝ/ ii. dental න /n / is used without a vowel before the letter /ʈ/ in dative case nouns [16]. examples: දරු඼න්ට /d aruvan ʈǝ/, මිනිසුන්ට /min isun ʈǝ/ asanka wasala, ruvan weerasinghe, randil pushpananda, chamila liyanage and eranga jayalatharachchi 19 december 2010 the international journal on advances in ict for emerging regions 03 iii. in sinhala, the suffix ට /-ʈǝ/ occurs in infinitives. in that case the /n / in front of it doesn‟t appear as a retroflex /ɳ/. example: රඵන්ට /laban ʈǝ/, කිඹන්ට /kijan ʈǝ/ 4. in certain constructions, germinated ණ /ɳ/ occurs. but those constructions belong to indian languages. examples: උප්ඳර඼ණ්ණ /uppǝlǝvaɳɳǝ/, කණ්ණාඩි /kaɳɳa:ɖi/ 5. retroflex ණ /ɳ/ is used after ය /r/ in nouns and adjectives [15]. examples: ඼ර්ණ /varɳǝ/, ඳහාණත /pariɳǝt ǝ/, තරුණ /t aruɳǝ/ exceptions : i. dental න /n / is used before the letter ය /r/ in nouns. examples: ළත යන් /t oran /, රෑන් /r :n /, ළය න් /ron / [16] ii. dental න /n / is used after the letter ය /r/ in verbs [15]. examples: කයන඼ා /kǝrǝn ǝva/, භයන඼ා /marǝn ǝva/, ළතෝයන඼ා /t o:rǝn ǝva/ iii. in present participles, the suffix න /-n ǝ/ occurs. it is not written as a retroflex ණ /ɳ/ in the vicinity of ය /r/. present participle noun examples: භයන /marǝn ǝ/ භයණ /marǝɳǝ/ iv. dental න /n / is used in the imperative verb suffix නු /-n u/. examples: කයනු /kǝrǝn u/, දයනු / d arǝn u/, වහානු /harin u/ 6. in certain evolved words, ණ /ɳ/ appear in original words. in certain borrowed words ය /r/ is not fully recorded. but half of it is called rakaransa. in the vicinity of rakaransa retroflex ණ /ɳ/ is written. examples: ආභන්ත්රnණ /a:man t rǝɳǝ/, ඳහාත්රnාණ /parit ra:ɳǝ/, ළේtණි /ʃre:ɳi/ 7. retroflex ණ /ɳ/ is used after the retroflex ඾ /ʂ/. examples: ත්඾්ණා /t ruʂɳa:/, ගළේ඾ණ /gave:ʂǝɳǝ/, දක්ෂිණණ /d akʂiɳǝ/ 8. in the honorific suffix ආණ /-a:ɳǝ/ and its variations always occurs ණ /ɳ/ [14]. 
example: ආණ /-a:ɳǝ/ අණු /-aɳu/ අණි /aɳi/ පිඹාණන් /pija:ɳan / ළතයණු඼න් /t erǝɳuvan / දිඹණි /d ijǝɳi/ 9. retroflex ණ /ɳ/ is used in suffixes that ණ /ɳǝ/ and ණි /-ɳi/ in intransitive past tense verbs. examples: ණ /-ɳǝ/ ණි /-ɳi/ siɳǝ v ʈǝhiɳi 10. retroflex ණ /ɳ/ is used in suffixes that ණු /ɳu/ and ණ /-ɳǝ/ in ancient intransitive verb particles. examples: ණු /-ɳu/ ණ /-ɳǝ/ ඉදුණු /id uɳu/ ඼ැටුණ /v ʈuɳǝ/ 2. use of dental න /n / in sinhala 1. sanskrit ර්ණa /rɳ/ > pali ණ්ණ /ɳɳ/ > sinhala න /n / [14] example: කර්ණa /karɳǝ/ > කණ්ණ /kaɳɳǝ/ > කන් /kan / 2. sanskrit ඍණ්ව /rhɳ/ > pali ණ්ව /ɳh/ > sinhala න /n / [14] example: ග්ව්ණාති /grhɳa:t i/ > ගණ්වාති /gaɳha:t i/ > ගනු /gan u/ 3. sanskrit ඥ /ʤɲ/ > pali ඤ /ɲ/, ඤ්ඤ /ɲɲ/ > sinhala න /n / [14] example: ඥාති /ʤɲa:t i/ > ඤාති /ɲa:t i/ > නෑ /n æ:/ 4. sanskrit නෝ /n j/, ණෝ /ɳj/ > pali ඤ්ඤ /ɲɲ/ > sinhala න /n / [14] example: පුණෝ /puɳjǝ/ > පුඤ්ඤ /puɲɲǝ/ > පින් /pin / 5. sanskrit ල්න /ʃn /, ඾්ණ /ʂɳ/ > pali ඤ්ව /ɲh/ > sinhala න /n / [14] example: ප්රeල්න /praʃn ǝ/ > ඳඤ්ව /paɲhǝ/ > ඳැන /pæn ǝ/ 6. sanskrit ණ /ɳ/ > pali න /n / > sinhala න /n / [14] example: නිර්වාාණ /n irva:ɳǝ/ > නිබ්ඵාන /n ibba:n ǝ/ > නි඼න් /n ivan / 20 asanka wasala, ruvan weerasinghe, randil pushpananda, chamila liyanage and eranga jayalatharachchi the international journal on advances in ict for emerging regions 03 december 2010 7. dental න /n / is used without a vowel in front of a dental consonant. dental consonant are ත /t /, ථ /t h /, ද /d /, ධ /d h / [16] examples: ත /t / චින්තන /ʧin t ǝn ǝ/, ලාන්ත /ʃa:n t ǝ/, කාන්තා /ka:n t a:/ ථ/t h / ග්රuන් ථ /gran t h ǝ/, භන්ථ /man t h ǝ/ ද /d / ළක න්ද /kon d ǝ/, භන්දිය /man d ira/, සින්දු /sin d u/ ධ /d h / අන්ධ /an d h ǝ/, ඿ම්ඵන්ධ /samban d h ǝ/, ඿න්ධි /san d h i/ 8. if a sinhala word contains a geminated nasal consonant it should be dental න /n /. examples: ආ඿න්න /a:san n ǝ/, ඼න්නම් /van n am/, ඼න්නි /van n i/ 9. dental න /n / is used without a vowel before ඿ /s/ [16]. examples: ඳන්඿්ල /pan sal/, කාන්සි /ka:n si/, ඼වන්ළ඿t /vahan se:/ 10. the nasal that occurs at inanimate noun roots which ends in ඉ /i/ or උ /u/ is dental න /n /. when singular suffix අ /-ǝ/ occurs at end position of a word the last consonant doubles. examples: ගිනි /gin i/, පිනි /pin i/, ඔටුනු /oʈun u/, දුනු /d un u/ 11. dental න /n / is used after ඿ /s/ or ල /ʃ/[17]. example: ඿ /s/ – උදෑ඿න /ud :sǝn ǝ/, ඼ා඿නා /va:sǝn a:/, ළ඿tනා /se:n a:/ ල /ʃ/ දර්ලන /d arʃǝn ǝ/, ප්රeකාලන /prǝka:ʃǝn ǝ/, ශූනෝ /ʃün jǝ/ 12. dental න /n / without a vowel is used in endings of noun roots. examples: ඳා඼වන් /pa:vahan /, ඼දන් /vad an / 13. dental න /n / is used after ය /r/ in compound nouns. examples: ඼වයනු඿ළයන් /vaharan usaren /, පිහානි඼න් /pirin ivan /, ඵණ්ඩායනාඹක /baɳɖa:rǝn a:jǝkǝ/ 3. use of retroflex ශ /ɭ/ in sinhala 1. sanskrit and pali ට /ʈ/, ඨ /ʈ h /, ඩ /ɖ/, ඪ /ɖ h / > sinhala ශ /ɭ/ (jayathilake, 1937) example: කූඨ /ku:ʈ h ǝ/ > කශ /kaɭǝ/ 2. pali ශ /ɭ/ > sinhala ශ /ɭ/ [17] example: ද්ඪ /d rɖ h ǝ/ > දශ්ව /d aɭhǝ/ > දශ /d aɭǝ/ 3. sanskrit and pali ණ /ɳ/ > sinhala ශ /ɭ/ [17] example: ඼ාණිජෝා /va:ɳiʤʤa:/ > ඼ණිජ්ජා /vaɳiʤʤa:/ > ළ඼ළශ඲ /veɭe n d ǝ/ 4. where both ර /l/ and ශ /ɭ/ obtain as alternatives in pali sinhala generally adopts the latter ශ /ɭ/ [14]. example: දලි්බද /d aliddǝ/, දළි්බද /d aɭid d ǝ/ > දිළිඳු /d iɭi n d u/ 5. retroflex ශ /ɭ/ is used on behalf of ය /r/ in past participles which are composed from verb roots ending in ය /r/. examples: කය /karǝ/ කශ /kaɭǝ/ භය /marǝ/ භශ /maɭǝ/ 6. prefix පිළි /piɭi-/, which is derived from a sanskrit prefix ප්රeති /prət i-/, is used with retroflex ශ /ɭ/. 
examples: පිළිගන්න඼ා /piɭigan n ǝva/, පිළිතුරු /piɭit uru/, පිළිඵ඲ /piɭiba n d ǝ/ 7. retroflex ශ /ɭ/ is used excessively before the nasalized consonants ඟ / ŋ g/, ඲ / n d /, ම / m b/. examples: ශඟ /ɭa ŋ gǝ/, ශ඲ /ɭa n d ǝ/, ළක ශම /koɭǝ m bǝ/ [15] exceptions: 1. following words use the dental ර /l/ [15]. examples: ළඳ රම /pola m bǝ/, ඿රම /salǝ m bǝ/ 4. use of dental ර /l/ in sinhala 1. sanskrit and pali ර /l/, ්ලර /ll/ > sinhala ර /l/ example: භව්ලරක /mahallǝkǝ/ > භවලු /mahalu/ [17] 2. sanskrit and pali ය /r/, න /n / > ර /l/ [14]. example: කරුණා /karuɳa:/ > කුලුණු /kaluɳu/ ඼න /van ǝ/ > ඼්ල /val/ 3. the halant form ර /l/ occurring at the end position of noun roots is always the dental ර /l/. when such words combine with the vowels, they retain the dental ර /l/. examples: කකුර /kakulǝ/ කකු්ල /kakul/, ගළඩ ර /gaɖolǝ/ ගළඩ ්ල /gaɖol/, කයර /karǝlǝ/ කය්ල /karal/ 4. when doubling the word-end consonant in the inflection of noun roots ending in ඉ /i/ or උ /u/, the dental ර /l/ is retained. examples: ඇඟිලි /æ ŋ gili/ – ඇඟි්ලර /æ ŋ gillǝ/ භවලු /mahalu/ භව්ලරා /mahalla: / asanka wasala, ruvan weerasinghe, randil pushpananda, chamila liyanage and eranga jayalatharachchi 21 december 2010 the international journal on advances in ict for emerging regions 03 5. use of retroflex ඾ /ʂ/ in sinhala 1. retroflex ඾ /ʂ/ is used after ක /k/ without a vowel. examples: අක්ෂfය /akʂǝrǝ/, දක්ෂf /d akʂǝ/, ික්ෂුʃ /b h ikʂu/ 2. retroflex ඾ /ʂ/ is used without a vowel before the letters of ට /ʈ/ and ඨ /ʈ h /. examples: අධි඾්ඨාන /ad h iʂʈ h a:n ǝ/, ළජෝt඾්ඨ /ʤje:ʂʈ h ǝ/, ධර්මි඾්ඨ /d h armiʂʈ h ǝ/ දි඾්ටිඹ /d iʂʈijǝ/, දු඾්ට /d uʂʈǝ/, ශි඾්ට /ʃiʂʈǝ/ 6. use of palatal ල /ʃ/ in sinhala 1. palatal ල /ʃ/ is used before dental න /n /. examples: දර්ලනඹ /d arʃǝn ǝjǝ/, ළ්බලනඹ /d e:ʃǝn ǝjǝ/, ප්රeල්න /praʃn ǝ/ appendix b implementation of the algorithm a platform-independent desktop application with a graphical user interface (gui) and a web-based version of the same has been developed using python programming language to demonstrate the functionality of the algorithm proposed in this paper. the desktop version of the implementation (figure b.1) automatically corrects the spellings for the input text. the left pane contains the input text and the right pane contains the corrected text. the web-based version (figure b.2) highlights the incorrect words and provides suggest corrections as a rightclick context menu. the user can replace the text there in. in addition, correct words are flagged differently to provide improved visual feedback to the user. the suggestions provide additional information as to from which n-gram statistic (word unigram, syllable trigram or syllable bigram) the suggestion was made. the web-based version has been further developed to facilitate user submissions of corrections to the system for improving the quality of the spellchecking functionality. this is available at http://www.subasa.net/. example: ඳාඨ඿ාරාචාහානිඹ „ශදරු භයණ අනුඳාතඹ ඉවර ඹෑභ‟ පිළිඵ඲ ළ්බලණඹක් අලුත්ගභ කණි඾්ඨ විදෝාරයඹ ේ඼ණාගායළදී ඳැ඼ැත්ීදඹ. appendix c a few experiments were performed to investigate the relationship between complexity, word length and corpus word frequency. we computed complexity and length of all unique words (440,022) found in the 10 million (10,132,451) word ucsc sinhala corpus. a. complexity vs. length the first experiment investigated the relationship between word length and the complexity. having removed any duplicates for word length and complexity pairs, ln (complexity) vs. word length graph was plotted. (see figure `c.1). 
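on the natural reading of the permutation module, the complexity of a word is the product of the sizes of the similar-letter groups of its letters, with letters outside any group contributing a factor of one; this reading appears consistent with quoted counts such as 3072 (2^10 × 3). a short, hedged python sketch of that computation, again over latin stand-in groups, is given below; the graphs discussed next plot this quantity against word length and corpus frequency.

    from math import prod

    # hypothetical stand-in groups; the real ones are the sinhala similar-letter groups of section iii-b
    SIMILAR_GROUPS = [{"n", "N"}, {"l", "L"}, {"s", "S", "Z"}]

    def complexity(word):
        # product of the group sizes over the letters of the word; letters with no
        # similar-sounding alternatives contribute a factor of one
        sizes = (next((len(g) for g in SIMILAR_GROUPS if ch in g), 1) for ch in word)
        return prod(sizes)

    word = "salon"
    # len() counts unicode code points, which matches the paper's definition of word length
    print(len(word), complexity(word))  # -> 5 12   (3 * 1 * 2 * 1 * 2 candidate spellings)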
this graph shows that there can be words with the same length but different complexity values. the graph was of sawtooth type and when the word length is above 30, complexities for a particular word length decreased; at word length 60, complexity was 32. this showed that there is no proportionality between complexity and word length. b. complexity vs. frequency to observe the relationship between frequency and complexity, ln(frequency) vs. ln(complexity) graph was drawn (see figure c.2), where frequency for a particular complexity meant the summation of all frequencies corresponding to the words having that same complexity. this also was a sawtooth type graph and generally when the complexity increased the frequency decreased. but there was no regular pattern of decrease and we can safely say that there is no strong relationship between the frequency and the complexity. c. frequency vs. length the third graph analysed the relationship of frequency to the word length to get a general idea about the distribution of words in the corpus. this graph plotted ln(frequency) against the word length (see figure c.3). up to word length of 4, the graph shows a drastic increase. at word length 4, the graph reaches the maximum frequency of 1647009. further increase of word length shows a downward trend of frequency. it was also observed that the average sinhala word length is 4. for word lengths beyond 25 it showed a sawtooth type behaviour and as the word-length goes over 43, it showed only a very low frequency most of the times. these extremely lengthy words were found to be erroneous words (i.e. corpus noise: typos, cleaning errors, unicode conversion errors) in the raw corpus. from all the above analyses, we can clearly say there is no relationship between the word length and the complexity. acknowledgment we are immensely grateful to professor tissa jayawardenan for reviewing the linguistic rules. we also thank our colleagues vincent halahakone, namal udalamatta and jeewanthi liyanapathirana who provided insight and expertise that greatly assisted this research. we would also like to thank the localisation research centre, university of limerick, ireland; especially reinhard schäler and karl kelly for their invaluable support. references [1] b. b. chaudhuri, "towards indian language spell-checker design," language engineering conference (lec'02), 2002, p. 139. [2] m. das, s. borgohain, j. gogoi, s. b. nair, "design and implementation of a spell checker for assamese," language engineering conference (lec'02), 2002, p. 156. http://www.subasa.net/ 22 asanka wasala, ruvan weerasinghe, randil pushpananda, chamila liyanage and eranga jayalatharachchi the international journal on advances in ict for emerging regions 03 december 2010 [3] k. kukich, techniques for automatically correcting words in text. in acm computing surveys, vol. 24, no. 4, 1992, pp. 377-439. [4] b. b. chaudhuri, "reversed word dictionary and phonetically similar word grouping based spell checker to bangla text," in proceedings of the lesal workshop, mumbai, india, 2001. fig b.1. the desktop application fig b.2. the web-based application asanka wasala, ruvan weerasinghe, randil pushpananda, chamila liyanage and eranga jayalatharachchi 23 december 2010 the international journal on advances in ict for emerging regions 03 [5] t. santhosh, k. g. varghese, r. sulochana, and r. kumar, "malayalam spell checker," in proceedings of the international conference on universal knowledge and language 2002, goa, india, 2002. [6] v. dixit, s. 
dethe, and r. k. joshi, "design and implementation of a morphology-based spellchecker for marathi, an indian language," archives of control sciences, vol. 15, no. 3, (n.d.), pp. 301-308. [7] t. dhanabalan, r. parthasarathi, and t. v. geetha, "tamil spell checker", in proceedings of 6th tamil internet 2003 conference, chennai, tamilnadu, india, 2003. [8] s. hussain, n. durrani, and s. gul. (2005). survey of language computing in asia, center for research in urdu language processing, national university of computer and emerging sciences. [online]. available: http://www.panl10n.net/english/outputs/survey/sinhala.pdf. [9] f. ahmed, e. w. d. luca, and a. nürnberger, "multispell: an ngram based language-independent spell checker," in poster session of 8th international conference on intelligent text processing and computational linguistics (cicling-2007), mexico city, mexico, 1997. [10] a. wasala, r. weerasinghe, and k. gamage, “„sinhala grapheme-tophoneme conversion and rules for schwa epenthesis,”. in proceedings of the coling/acl main conference poster sessions, 2006, pp. 890-897. [11] j. b. disanayaka, sinhala akshara vicharaya (sinhala graphology), sumitha publishers, 2006, isbn: 955-1146-44-1. [12] w. s. karunatillake, an introduction to spoken sinhala ( 3rd edn.), m. d. gunasena & co. ltd., 217, olcott mawatha, colombo 11, sri lanka, 2004, isbn 95521-0878-0. [13] j. w. gair, and w. s. karunatillake, the sinhala writing system, a guide to transliteration, sinhamedia, p.o. box 1027, trumansburg, ny 14886, 2006. [14] j. lanerolle, the uses of න-/n/,ණ-/ɳ/ and ර-/l/, ශ -/ɭ/ in sinhalese orthography, the times of ceylon company limited, colombo, 1934. [15] s. koparahewa. dictionary of sinhala spelling, s. godage and brothers, colombo 10, sri lanka, 2006, isbn 955-20-8266-8. [16] j. b. disanayaka, the usage of dental and cerebral nasals, sumitha publishers, 2007, isbn: 978-955-1146-66-5. [17] d. b. jayathilake, sinhala shabdakoshaya (sinhala dictionary), prathama bhagaya (vol 1), sri lankan branch of royal asian society, 1937. [18] http://www.mysinhala.com/features.htm, http://www.microimage.com/press/microimagedirectoctober2005.ht m and http://www.scienceland.lk/spell-checker.html [19] http://hunspell.sourceforge.net/ [20] http://wiki.services.openoffice.org/wiki/dictionaries#sinhala_.28sri_ lanka.29 [21] https://addons.mozilla.org/en-us/firefox/addon/13981/ [22] http://www.sasrutha.com/ and http://www.facebook.com/note.php?note_id=176602427414 [23] the ucsc sinhala corpus is a ten million word raw corpus compiled from various sources. http://www.ucsc.cmb.ac.lk/ltrl/?page=panl10n_p1&lang=en&style=d efault#corpus [24] the blog posts were collected from http://blogs.sinhalabloggers.com/ (a popular sinhala unicode blog syndicator) on 1 st of september, 2009. fig c.1. the graph of ln(complexity) vs. 
word length
fig c.2. the graph of ln(frequency) vs. ln(complexity)
fig c.3. the graph of ln(frequency) vs. word length
 the international journal on advances in ict for emerging regions 2010 03 (02) :34 47 
a six degrees of freedom ship simulation system for maritime education
damitha sandaruwan, nihal kodikara, chamath keppitiyagama and rexy rosa
abstract—this paper presents a six-degrees-of-freedom ship simulation system which allows simulated ship handling under complicated environmental conditions and threat scenarios. this system simulates real-time six degrees of freedom ship motions (pitch, heave, roll, surge, sway, and yaw) under user interactions and environmental conditions. the simulation system consists of a ship motion prediction system and a perception enhanced immersive virtual environment with greater ecological validity. this ship motion prediction system uses a few model parameters which can be evaluated by using standard ship maneuvering tests or determined easily from databases such as lloyd’s register. this virtual environment supports multiple-display viewing that can greatly enhance user perception. an ecological environment for a strong sensation of immersion is also developed. the virtual environment facilitates the incorporation of real world ships, geographical sceneries, different environmental conditions and a wide range of visibility and illumination effects. this simulation system can be used to demonstrate ship motions and maneuvering tactics, assign focused missions to trainees and evaluate their performance. trainees can use the simulation system to study at their own pace. the implementation and operational cost of this ship simulation system is only a fraction of that of conventional training involving real ships. i. introduction ship simulations have been used for naval training, ship hull designing, simulating military scenarios and entertainment activities such as computer games [3] [11]. general naval training applications and computer games are real-time applications and therefore these applications respond in real time to user interactions. most hull designing and military scenario simulations are non-real-time applications with greater accuracy. this type of application requires extensive computing resources because there are massive numerical equations to solve [1]. today, ship simulations have become an essential tool in maritime education [2] [3]. manuscript received march 22, 2010. accepted december 03, 2010. this research was funded by the national e-learning center, sri lanka and supported by department of electrical and electronic sri lanka navy.
damitha sandaruwan is with the university of colombo school of computing, 35, reid avenue, colombo, sri lanka. (e-mail: dsr@ucsc.cmb.ac.lk). nihal kodikara and chamath keppitiyagama are also with the university of colombo school of computing, 35, reid avenue, colombo 7, sri lanka. (e-mail: ndk@ucsc.cmb.ac.lk, chamath@ucsc.cmb.ac.lk). rexy rosa is with the department of physics, university of colombo, colombo 7, sri lanka. (e-mail: rosa@phys.cmb.ac.lk).
fig. 1: six degrees of freedom ship motions [4]
since ships have six degrees of freedom in ship motion, as shown in figure 1 [4], maritime lectures in a conventional classroom can only explain a ship’s motions one by one, but those motions occur simultaneously. consequently it is not easy to understand this phenomenon and the resultant effect of a six-degree-of-freedom force acting upon a vessel. a ship simulation system can be used to demonstrate complex ship motions of this nature. in a conventional maritime education system, it is not possible to create real scenarios such as terror attacks and rapid environmental changes for training purposes. in general, ship simulators are used to develop ship handling skills and strengthen the theoretical understanding of ship motions in naval training. these ship simulators should respond in real time and maintain substantial accuracy levels. when we train naval trainees for a specific mission, it is necessary to simulate real world ships, geographical areas and various environmental conditions. ecological validity is a very important factor for any virtual learning and training system [5]. therefore, the construction of a tiled display system and a real ship bridge gives users a real-world feeling which enhances the quality of training. in ship simulation systems, learning and teaching scenarios have to be changed frequently because ship maneuvering tactics have to be changed frequently, and those tactics depend on various factors such as geographical location, the nature of the vessel and other environmental conditions, whereas fundamental navigation and maneuvering strategies remain the same. there are commercial ship simulation systems with great ecological validity such as transas [6] and oceaniccorp [7]. these systems are extremely expensive, proprietary and closed in nature. they provide versatile and realistic functionalities and users can customize these functionalities to some extent [8]. however, if users try to simulate existing real world scenarios, they will face enormous restrictions because the real-time ship motion algorithms and the integration mechanism of 3d objects such as vessels and navigation areas are hidden. in a ship simulation system, real-time ship motion simulation algorithms perform a major task. the cpu effectiveness and accuracy of these algorithms are very important because they can affect the overall system from various perspectives. there are publicly available real-time and non-real-time ship motion simulation algorithms proposed by various researchers [9] [10] [11].
there is no perception-enhanced immersive ship simulation system for marine education available under an open source license, although there are ship motion prediction systems [10] [11], 3d virtual navigation systems based on different rendering engines [12] [13] [14], physics engines [15] [16] and 3d models [17] available under open source licenses. however, they are not capable of predicting reliable six degrees of freedom real-time ship motions; in addition, they have a variety of unknown complex parameters in their ship motion prediction algorithms. our approach is developed based on a reliable six degrees of freedom real-time ship motion prediction system. in this approach our ship motion prediction algorithms use a limited number of parameters which can be extracted from standard maneuvering tests or from databases such as lloyd's register. we have incorporated a perception enhanced tiled vision system and constructed an immersive environment with greater ecological validity. several virtual learning and teaching scenarios for maritime education have been experimented with, and we plan to make this total solution available under an open source license.

ii. related work
the motion of a floating rigid body on the ocean surface is extremely complicated and difficult to predict [18]. there are many real-time and non-real-time ship motion prediction systems proposed by various researchers and institutes, and there are different approaches to predicting ship motions [9] [10] [11]. there are a number of physics engines used in simulation systems and game development [15] [16], and aspects of those physics engines have been investigated. the accuracy and computational efficiency of their material properties, stacks, links and collision detection systems are very sound, but they are not capable of predicting real-time ship motions with respect to rudder movement, throttle movement and environmental conditions [15]. there are computer games such as ship simulator 2010 [19] and virtual sailor [20]. in these applications, users can select various ships, environmental conditions and geographical areas associated with their object libraries, and they can adjust the principal parameters and properties of those objects. however, users cannot incorporate an existing real ship or cultural objects into the virtual environment, and there is no capability to enhance the user perception with multiple display panels. some numerical algorithms are based on strip theory [21] [22]. salvesen et al. have presented a strip theory based method to predict heave, pitch, sway, roll and yaw motions as well as wave-induced vertical and horizontal forces, bending moments and torsional moments in regular waves [21]. comparisons between computed motions and experimental motions show satisfactory agreement in general. however, this numerical procedure is impractical for real-time simulation because these strip theory based ship motion predictions need hours to produce a set of accurate solutions for just one or two motions, even using modern computers. journée presents quick strip theory calculations for ship motion prediction [22]. this approach describes a strip theory based calculation method which delivers information on ship motions and added resistance within a very short computation time. a comparative validation is done, but it is more suited for ship hull design and cannot be used for real-time ship motion simulations. fossen and smogeli presented a strip theory formulation for dynamic ship motion prediction [23].
in their approach, a computationally effective nonlinear time-domain strip theory formulation for low-speed applications has been presented. the proposed model can be used for real-time ship motion prediction and it is possible to incorporate the effect of varying sea states. however, the model is suited for low speed maneuvers, and the model parameters were evaluated by using a proprietary product. some other studies have been carried out to predict ship motion utilizing the kalman filter approach [24]. triantafyllou et al. utilized kalman filter techniques with simplified computational ship models for ship motion prediction. in this study, the equations of motion derived from hydrodynamics and approximations are used with the kalman filter technique, and the influence of various parameters is evaluated. however, their method requires specific model parameters, and if the ship parameters are unavailable the method is not applicable. xiufeng et al. developed mathematical models for real-time ship motion simulation [25]. they first estimate the total forces acting on the ship and then, based on newton's law, deduce first order differential equations to model the relation between forces and accelerations. the equations are solved by using the runge-kutta method. their applications focus on handling ships in a selected area, such as inside or near harbor areas; only the physics models for surge, sway and yaw are given, and they have not discussed the possibility of simulating existing real ships with their simulation system. another work is presented by cieutat et al. [26], who proposed wave models based on the work of alain fournier and william t. reeves [27]. this approach describes a new efficient real-time model of wave propagation and shows its integration in a real maritime simulator. they determine the heave, pitch and roll of a ship by using the sea surface height under the ship. to estimate pitch and roll, the tangent plane of the sea surface is computed first; then the ship is rotated such that its orientation is aligned with the tangent plane. their ship motion prediction algorithms are too simple, and ships of different shapes will produce the same behavior if the wave condition is the same; consequently their method is not flexible enough to simulate the behaviors of different ship models. x. zhao et al. presented a high-performance ship-motion prediction method [28]. simulation results show that this method can predict ship motion with good accuracy. it is suitable for short-term motion prediction, but not for longer-term real-time ship motion prediction. ching-tang chou and li-chen fu presented a 6-degree-of-freedom real-time ship motion simulation system [29]. they focused on the construction of the physical dynamics of the ship in a virtual reality environment. the real force and torque of the ship are modeled according to the newton-euler formula, based on the calculation of the volume below the ocean surface. they presented a new algorithm which can approach the dynamics of the ship in real time. they introduced their real-time ship motion prediction algorithms only for the deep ocean; they have not represented the excess drag force due to the combination of yaw and sway, and they did not discuss the possibility of simulating an existing real ship with their virtual environment.
gatis barauskis and peter friis-hansen presented a 3-degrees-of-freedom ship dynamic response model [10] which can predict surge, sway and yaw. they focused on the cpu time required for ship motion prediction, the ability to simulate complex non-linear models and the availability of model parameters for different types of existing ships. the model consists of a non-linear speed equation, a linear first-order steering equation called the nomoto model [30] and a linear sway equation. the model responds in real time to rudder and throttle commands and is able to operate on a limited number of model parameters. apart from the main data of the ship, which can be determined rather easily from databases such as lloyd's register [31], the ship model parameters can be estimated by using standard maneuvering tests [18]. this model represents the added mass / excess drag force due to combined yaw and sway motion, but the authors have not done a validation with respect to real world existing ships. table i presents the features and remarks of their model. ueng et al. presented efficient computational models for ship motion simulation [11]. these models are used to simulate ship movements in real time. the six degrees of freedom ship motions (pitch, heave, roll, surge, sway and yaw) are divided into two categories: the first three movements are induced by sea waves, and the last three are caused by propellers, rudders and other environmental disturbances. newtonian dynamics, fluid dynamics and other theories in physics are used to develop the algorithms. this method can be used to predict real-time ship motions with high cpu effectiveness. however, the model does not represent the added mass / excess drag force due to combined yaw and sway motion, which is a considerable deviation from the real world scenario. they have not discussed a mechanism to determine unknown model parameters, and they have not done a validation with respect to real world ship scenarios. table ii presents the features and remarks of their model. there are a number of other proposed mathematical ship models, such as those of tristan perez and mogens blanke [32], tristan perez and thor i. fossen [33] and anna witkowska et al. [34]; however, the possibility of integrating those mathematical ship models with real-time applications is not discussed. if we compare the ecological validity of the above discussed ship motion simulation methods, only a few approaches go towards the development of a 3d virtual environment [11] [29]. however, none of them has tried to develop multi-display screens, physically construct a ship bridge and create an immersive environment with greater ecological validity; they implement their real-time algorithms and interact with the virtual environment by using a single display panel. however, there are other approaches to creating an immersive environment with greater ecological validity [35] [36].
the researchers have not discussed the way of simulating existing geographical areas and real naval vessels, or the development of real-time ship motion algorithms.

table i
features of gatis barauskis's ship model
long term real-time prediction: yes
prediction of surge, sway & heave: surge & sway
prediction of yaw, pitch & roll: only yaw
validation of predictions: no
responsiveness for wave: no
representation of drag forces due to combined yaw & sway motion: yes
determination of model parameters and simulation of an existing real ship: yes
integration with vr systems: not tested
integration with immersive environment: not tested

table ii
features of shyh-kuang ueng's ship model
long term real-time prediction: yes
prediction of surge, sway & heave: yes
prediction of yaw, pitch & roll: yes
validation of predictions: no
responsiveness for wave: yes
representation of drag forces due to combined yaw & sway motion: no
determination of model parameters and simulation of an existing real ship: no
integration with vr systems: yes
integration with immersive environment: not tested

alexandre g. ferreira et al. [36] presented a multiple-display visualization system that can greatly enhance user perception, especially for maritime virtual reality applications. a common approach to providing multiple synchronized display panels is to use a powerful centralized processing unit to support the rendering process on all displays. these researchers used a different approach: they proposed a distributed architecture that supports a flexible and reliable visualization system which gives users a sensation of immersion with low-end graphics workstations. the proposed system ensures the synchronization of all displayed views. xiuwen liu et al. have proposed a multi-projector tiled display system for a marine simulator [35]. their approach is to develop a low-cost multi-projector seamless tiled display system from commodity hardware and technology for virtual reality based marine simulators. the main technical problems of the display system are discussed, including geometry distortion adjustment and edge blending. the researchers, having thought about the ecological validity of the virtual environment, consequently constructed a real ship bridge and incorporated real ship bridge indicators such as radar and throttle. experimental results in the marine simulator show that this framework is very effective. there are commercial six degrees of freedom ship simulation systems with great ecological validity such as transas and oceaniccorp [6] [7]. these commercial ship simulators provide versatile and realistic ship simulation for maritime teaching, learning, assessment and research. they are capable of predicting real-time ship motions with respect to rudder movement, throttle movement and environmental conditions. however, these systems are extremely expensive, proprietary and closed; consequently, their ship motion prediction algorithms and assumptions are unknown. these simulation systems enable users to simulate ship models, cultural objects and illumination effects which are associated with their object library, and to adjust the principal parameters and properties of the objects in the object library [8]. however, users cannot incorporate an existing real ship or cultural objects into the system.
these commercial simulation systems have bundled teaching and learning scenarios such as transas "pisces 2" [37]. users can customize these bundled teaching and learning scenarios, although they cannot incorporate totally new teaching and learning materials. after studying all these approaches we found that the three-degrees-of-freedom real-time ship motion algorithms proposed by gatis barauskis and peter friis-hansen [10] and the six-degrees-of-freedom real-time ship motion algorithms proposed by shyh-kuang ueng et al. [11] are very effective from different perspectives. a combination of these two approaches can produce a more effective and productive set of real-time six degrees of freedom algorithms. consequently, our approach is to combine the two algorithms mentioned above to develop more effective and productive six degrees of freedom ship motion algorithms and to construct an immersive virtual environment based on the frameworks proposed by alexandre g. ferreira et al. [36] and xiuwen liu et al. [35]. we tested the new six-degrees-of-freedom real-time ship motion algorithms in an immersive environment with greater ecological validity, and we briefly discuss the mechanism of integrating real world geographical areas and real ships.

iii. implementation of the simulation system
a. overview of the integrated simulation system
the simulation system consists of a trainer station, a trainee station, a navigational information display system, a cylindrical tiled display system, a computational ship model and a database, as illustrated in figure 2. the trainer can select a vessel with predefined physical and mechanical properties, and he can select the geographical location, environmental conditions and threat scenarios in which the trainee is to be trained. navigational aids and other necessary indicators are generated and displayed on several display panels. virtual marine scenes with the own ship, moving or fixed targets, cultural objects and environmental effects are projected onto five computer screens, as shown in figure 3. this real-time vision system covers a more than 270° field of view, from which the trainee has the ability to maneuver the ship.

fig. 2: integrated simulation system
fig. 3: maneuvering a simulated ship

b. computational ship model
1) ship motions
the motion of a floating rigid body can be specified by newton's laws, fluid dynamics and other basic physics, but the motion of a floating rigid body on the ocean surface is extremely complicated and difficult to predict [18]. all six possible degrees of freedom (6dof) of the motion of a ship are illustrated in figure 1. surge, heave and sway are translational motions; roll, yaw and pitch are rotational motions [4]. the following assumptions were made in order to simplify the mathematical ship model for real-time use:
- the pitch, heave and roll motions are caused by sea waves. surge, sway and yaw motions are induced by internal and external forces such as rudders and propellers, and environmental conditions such as wind and sea currents.
- figures 4 and 5 illustrate the initial earth-fixed reference frame with the [x y z]^T coordinate system and the ship body-fixed reference frame with the [x_o y_o z_o]^T coordinate system [9]. the ship body-fixed axes coincide with the principal axes of inertia, and the origin of the ship body-fixed frame coincides with the center of gravity of the ship (r_g = [0 0 0]^T) [9].
- the ship is rigid and does not deform. the shape of the ship is approximated by a cuboid.
- the sea waves consist of sinusoidal waves; the waves can apply forces and moments (torque) on the ship, but the ship cannot influence the waves.
- the variation of the sea surface is less than or equal to level 8 of the beaufort sea state scale [18].

the position, orientation, forces, moments, linear velocities and angular velocities are respectively denoted by [x y z]^T, [ψ θ φ]^T, [X Y Z]^T, [K M N]^T, [u v w]^T and [p q r]^T, as shown in table iii. the position-orientation vector in the xy plane is expressed as η = [x y ψ]^T and the linear-angular velocity vector is expressed as ν = [u v r]^T. the rate of change of the position-orientation vector is expressed as \dot{\eta} = R(\psi)\,\nu [32], where

R(\psi) = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}.   (1)

table iii
six possible degrees of freedom of ship motion [4]
degree of freedom: force/moment, linear/angular velocity, position/euler angle
surge: X, u, x
sway: Y, v, y
heave: Z, w, z
yaw: N, r, ψ
pitch: M, q, θ
roll: K, p, φ

fig. 4: reference frames and coordinate systems [9]
fig. 5: reference frames and coordinate systems [9]

2) wave model
we use the multivariable ocean wave model introduced by ching-tang [38] to model the sea surface and determine the height field of the sea surface. in that model,

h(x, y, t) = \sum_{i=1}^{n} a_i \sin[k_i (x\cos\theta_i + y\sin\theta_i) - \omega_i t + \varphi_i].   (2)

the above function represents the water surface height in the z-axis direction. a is the wave amplitude; k is the wave number, defined as 2π/λ for the wave length λ; ω is the pulsation, defined as 2πf for the frequency f. a, k and f are time (t) dependent variables. θ is the angle between the x-axis and the direction of the wave. φ is the initial phase, which can be selected randomly between 0 and 2π.
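as a concrete illustration of equation (2), the following minimal python sketch evaluates the sum-of-sinusoids height field at a point; the wave component values used here are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def sea_surface_height(x, y, t, components):
    """sum-of-sinusoids height field h(x, y, t) as in eq. (2).

    each component is a tuple (a, k, omega, theta, phi): amplitude,
    wave number 2*pi/lambda, pulsation 2*pi*f, wave direction and
    initial phase."""
    h = 0.0
    for a, k, omega, theta, phi in components:
        h += a * np.sin(k * (x * np.cos(theta) + y * np.sin(theta)) - omega * t + phi)
    return h

# illustrative wave components (assumed values)
components = [
    (0.5, 2 * np.pi / 40.0, 2 * np.pi * 0.12, np.radians(30.0), 0.0),
    (0.2, 2 * np.pi / 15.0, 2 * np.pi * 0.25, np.radians(10.0), 1.3),
]
print(sea_surface_height(10.0, 5.0, 2.0, components))
```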
3) surge motion
the surge motion of the ship is described by a simplification of the non-linear speed equation as follows [9]:

m\dot{u} = T_e - X_{u|u|}\,u|u| - m\,x_{vr}\,v\,r.   (3)

in equation (3), m and T_e respectively denote the ship's mass and the single driving force (effective propeller thrust) that is transmitted from the ship engine by her propellers. the two terms X_{u|u|}u|u| and m x_{vr} v r can be regarded as damping forces, which depend on the instantaneous dynamics of the vessel: m x_{vr} v r represents the excess drag force due to combined yaw and sway motion, and X_{u|u|}u|u| corresponds to the quadratic resistance force at the forward speed u. when the ship is traveling a straight course at a constant speed u, m x_{vr} v r = 0 and T_e = X_{u|u|}u|u|; then, if we know T_e and u, it is possible to calculate X_{u|u|}. the force of the propeller is related to its revolution speed n, the diameter D and the thrust coefficient K_T, and is given by T_e = K_T n^2 D^4 [39]; x_{vr} = 0.33 [32].

4) yaw motion
the yaw motion of the ship is described by using the first order nomoto model; the transfer function from the rudder angle δ to the rate of turn r is equivalent to the time domain differential equation [9] [30]

\tau\,\dot{r} + r = K\,\delta.   (4)

K is the steady-state gain of the system and τ is the time constant; K and τ can be calculated by using maneuvering tests [18]. this model behaves accurately within the linear region of a ship's steering function, up to rudder angles of approximately 35 degrees [39] [9].

5) sway motion
the sway motion is considered based on a transfer function model with constant parameters. it is equivalent to the time domain differential equation given below [9] [30]:

\tau_v\,\dot{v} + v = K_v\,r.   (5)

K_v is the steady-state gain in sway and τ_v is the time constant in sway; they have been assumed to be K_v = 0.5L and τ_v = 0.3 [10] [9].

6) motions caused by sea waves
by solving equations (1), (3), (4) and (5) we can compute the ship's position-orientation vector in the xy horizontal plane with respect to time; that is, we can calculate x, y, u, v and ψ. then, according to archimedes' principle [40], we assume that the translational motion (heave) and the rotational motions (pitch and roll) are generated by the swell of water under the ship, and can be calculated by using the height variation of the sea surface. we assume that the shape of the ship is a cuboid, as illustrated in figure 6, and the ship body is vertically projected onto the sea surface to get an l×w bounding box, which is divided into 1m×1m cells for mathematical convenience, as illustrated in figure 7. we evaluate the height fields at the center points of each 1m×1m cell, and we assume that the ship is not actually present when the height field is calculated; we can use ching-tang's wave model [38] to calculate the height fields. we assume that the projected bounding box and its points move with the ship, so at any time we can calculate the height fields according to the ship's orientation and the wave propagation. we obtain the forces and moments that generate the heave, pitch and roll motions by calculating the height fields over the whole bounding box, the difference of the height fields between the front and rear halves of the bounding box, and the difference of the height fields between the port and starboard halves of the bounding box, respectively.

fig. 6: the shape of the ship is approximated by a cuboid
fig. 7: vertically projected ship body (l × w grid)
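the height-field sampling just described (formalized in equations (6)-(10) below) can be sketched as follows; the grid dimensions are placeholders, and sea_surface_height refers to the wave-model function sketched earlier.

```python
import numpy as np

def projected_grid_sum(x0, y0, psi, t, length, width, wave_height):
    """sample the wave height field at the centre of every 1 m x 1 m cell of
    the vertically projected ship body (a length x width bounding box that
    moves and rotates with the ship) and return the sum of the height fields."""
    total = 0.0
    for i in range(-length // 2, length // 2 + 1):
        for j in range(-width // 2, width // 2 + 1):
            r = np.hypot(i, j)
            alpha = np.arctan2(j, i)            # cell angle in the body frame
            xh = x0 + r * np.sin(psi + alpha)   # earth-fixed cell position,
            yh = y0 + r * np.cos(psi + alpha)   # following the convention of eqs. (7)-(8)
            total += wave_height(xh, yh, t)
    return total

# usage with the wave model sketched earlier (illustrative 48 m x 9 m vessel):
# H_a = projected_grid_sum(0.0, 0.0, 0.1, 2.0, 48, 9,
#                          lambda x, y, t: sea_surface_height(x, y, t, components))
```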
h k , m, w , w denote the resistance coefficient for heave motion, mass of the ship, heave acceleration and heave velocity [16]. the equations are listed below. 2 wm h k h r  (11) h rwsahh f  (12) m h f w  wtw  (13) 8) pitch motion here, we assume that pitch is determined by the difference of height field between front and rear halves of the ship. hp is the height field deference between the front and rear halves of the ship. i i hh l l i w w j jip        2 2 2 2 , (14) by using ching-tang’s wave model [38], hi,j can be calculated. then the net force for pitch (fp), pitch angular acceleration ( q ) and pitch angular velocity (q) is determined as follows: qpipkpr  (15) prwsphpf  (16) pi pf q  qtq  (17) sw,kp,rp,ip denote the sea water density, resistance coefficient for pitch motion, resistance force against pitch motion and the ship’s moment of inertia along y-axis respectively. for mathematical convenience we have already assumed the shape of the ship is cubical as figure 6 shown. then ) 22 ( 12 1 dlmpi  . (18) 9) roll motion here we assume that roll is determined by the difference of height field between the port side and starboard side halves of the ship. hr is the height field deference between the port side and starboard side halves of the ship. j j hh l l i w w j jir        2 2 2 2 , (19) then as we did in the pitch motion calculations we can calculate net force for roll motion, roll angular acceleration and roll angular velocity. 10) overview of the computational ship model the system consists of two major stages. first stage is to compute ship’s position and orientation in xy plane by using ship’s physical data, user defined dynamic properties (rudder, engine rpm) and resistance forces due to motion. the second stage is to compute heave, pitch and roll motion by using the out puts of the first stage and additionally considering the ocean wave model as illustrated in figure 8. in this system, we use constraints, coefficients and parameters like k,  , kp, kh which can be evaluated with a standard maneuvering test or figure out from a databases such as lloyd’s register [31] . 11) validate surge, sway and yaw motions our ship motion prediction algorithms are more suitable for ocean surface vehicles which approximately box type geometric formations. our surge, sway and yaw real-time ship motion prediction algorithms were validated with respect to benchmark sea trials of the ―esso ossaka tanker‖ [41]. esso osaka is a tanker with approximately box type geometric formation as shown in the figure 9. it is an oil tanker that has been used for extensive maneuvering studies in the open literature [42]. main particulars for the esso osaka are length (between perpendiculars) -325m, beam -53m, draft -21.73m, displacement4 319,400 tonnes. propulsion characteristics are propeller diameter 9.1 m and when the propeller fig. 8: computational ship model fig. 9: esso ossaka tanker [41] damitha sandaruwan, nihal kodikara, chamath keppitiyagama and rexy rosa 41 december 2010 the international journal on advances in ict for emerging regions 03 rotates at 51 rpm the ship travels at 10 knots in calm water. the propeller thrust coefficient kt is 0.004 [43] [44]. figures 10 to 12 present measured turning circle data for the esso osaka with initial velocities of 10 knots. by using the esso osaka’s main particulars, propulsion characteristics and the above real sea trial data set it is possible to determine our model parameters uu x , k and . 
however, due to the unavailability of some resistance and propulsion properties, we had to model them using approximations based on data in the open literature [43] [44]. after determining the model parameters, we simulated the same turning circle under similar conditions in our simulation system. we used matlab [45] solvers to solve the differential equations in our real-time algorithms and a 2.66 ghz intel core 2 duo imac to run our computational ship model. a comparison of the real esso osaka turning circle and the simulated turning circle is shown in figure 13. the comparison of predicted and measured ship motions gives encouraging outcomes. we predicted these ship motions under the several assumptions mentioned in section iii-b. simulating a real world scenario in a virtual world under similar conditions can lead to practical issues: we cannot create an exactly similar real world environment and conditions in the virtual world, and the rate of change of mechanical properties such as rudder and throttle is extremely difficult to imitate in a virtual world. we also approximated several parameters based on the literature. when we consider all these factors, there should be deviations between the simulated results and the real world situation.

fig. 10: trajectory for esso osaka turn at 10 knots with 35 degree rudder [43]
fig. 11: yaw rate versus time for esso osaka turn at 10 knots with 35 degree rudder [43]
fig. 12: ship speed versus time for esso osaka turn at 10 knots with 35 degree rudder [43]
fig. 13: validation with respect to "esso osaka" sea trials

12) validation of heave, pitch and roll motions
the relationship between ship motion and ocean wave characteristics is commonly measured by using response amplitude operators (rao) [46] [47]. figure 14 presents the typical heave rao for a floating body on an ocean wave [18]. figure 15 and figure 16 present the typical pitch and roll rao curves for a containership with respect to wave frequency at different ship speeds [18]. due to the unavailability of heave, pitch and roll resistance coefficients for "esso osaka" we had to assume reasonable values for them using approximations. today it is very easy to measure heave, pitch and roll for a real ship in a seaway because there are digital devices such as the octans surface motion sensor [48]; if we can measure heave, pitch and roll under known conditions, then we can determine the resistance coefficients for the heave, pitch and roll motions. we measured raos for the "esso osaka" ship model with respect to different domains; however, there is no unclassified real sea trial rao data set for "esso osaka" to compare with our ship motion predictions. we measured raos for the steered esso osaka at 6 knots, with an encountered wave direction of 30 degrees, wave amplitude 0.5 m and wave length 0.5 m. the nature of the rao curves is presented in figures 17, 18 and 19.

fig. 14: typical heave rao behavior with respect to wave frequency [18]

the comparison of the typical rao curves of a container ship and the obtained rao curves of "esso osaka" does not give any significant result; however, there are some common features in higher level comparisons. throughout these experiments we assumed that the ship's hull form is a cuboid, although the motion of a ship is affected by the geometry of the ship hull [49]. in the ocean wave model we consider only the most responsive wave component; consequently this is a deviation from the real scenario.
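as one common way to obtain an rao value from simulated time series, the following minimal sketch uses the usual definition rao = response amplitude / wave amplitude at a single encounter frequency; the signals below are hypothetical stand-ins for simulator output, not the paper's data.

```python
import numpy as np

def rao(response, wave_elevation):
    """response amplitude operator: ratio of response amplitude to wave amplitude.

    amplitudes are estimated as sqrt(2) * standard deviation, which is exact
    for a single-frequency sinusoid and a reasonable estimate otherwise."""
    resp_amp = np.sqrt(2.0) * np.std(response)
    wave_amp = np.sqrt(2.0) * np.std(wave_elevation)
    return resp_amp / wave_amp

# hypothetical example: a 0.5 m amplitude wave and the resulting heave record
t = np.linspace(0.0, 600.0, 6001)
wave = 0.5 * np.sin(0.8 * t)
heave = 0.35 * np.sin(0.8 * t - 0.4)   # stand-in for simulator output
print(rao(heave, wave))                 # ~0.7
```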
fig. 15: typical roll rao behavior with respect to wave frequency [18]
fig. 16: typical pitch rao behavior with respect to wave frequency [18]
fig. 17: observed heave rao behavior with respect to wave frequency
fig. 18: observed pitch rao behavior with respect to wave frequency
fig. 19: observed roll rao behavior with respect to wave frequency

we approximated several parameters, and when we consider all these factors there should be a deviation between the simulated results and the real world situation. according to the experimental results, the features of the proposed computational ship model are presented in table iv.

table iv
features of the proposed ship model
long term real-time prediction: yes
prediction of surge, sway & heave: yes
prediction of yaw, pitch & roll: yes
validation of predictions: yes
responsiveness for wave: yes
representation of drag forces due to combined yaw & sway motion: yes
determination of model parameters and simulation of an existing real ship: yes
integration with vr systems: yes
integration with immersive environment: yes

13) panoramic vision system
panoramic images and videos are regularly used in various virtual reality applications such as autonomous navigation and virtual walkthroughs [35] [36]. this vision system is based on a client-server architecture. it supports a real-time six degrees of freedom autonomous navigation system with a 300° field of view. the server computer sends the navigational instructions (latitude, longitude, altitude, roll, pitch and yaw) to the six client computers as shown in figure 20. in each client computer, the same virtual environment is loaded and the position and orientation values are received by a parental node. each virtual camera inherits its position and orientation from the parental node while maintaining 60 degrees with respect to the adjacent virtual cameras. to create the tiled 300° field of view (fov), each virtual camera occupies a 60° angle of view [12]. navigational instructions (latitude, longitude, altitude, roll, pitch, yaw) are sent to the virtual cameras from the master computer over the network. we developed a method to synchronize the virtual cameras: the server computer sends a data set containing latitude, longitude, altitude, roll, pitch and yaw simultaneously to each client computer through the network. each client pc places its virtual camera according to the given position and orientation and renders the scenery. after completing the rendering process, each client machine sends a message to the server computer, and the server computer then sends the next navigational instruction data set after acquiring all six messages from the client computers. this synchronization mechanism reduces the frame rate a little, but the above mentioned 2.66 ghz intel core 2 duo imacs with nvidia geforce 9400m graphics cards were able to maintain more than 25 frames per second with complex scenery.

fig. 20: structure of the vision system
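a minimal sketch of the lock-step synchronization just described, assuming the six client renderers accept json frames over tcp and reply with the text "rendered" after drawing each frame; the addresses and message format are hypothetical, not the implementation used in the simulator.

```python
import json
import socket

# hypothetical addresses for the six client renderers
CLIENTS = [("192.168.1.%d" % i, 9000) for i in range(11, 17)]

def run(frames):
    """lock-step broadcast: send one navigation frame to all clients, then wait
    for every client's acknowledgement before sending the next frame."""
    conns = [socket.create_connection(addr) for addr in CLIENTS]
    readers = [c.makefile("r") for c in conns]
    try:
        for frame in frames:  # frame: dict with latitude, longitude, altitude,
                              # roll, pitch and yaw
            payload = (json.dumps(frame) + "\n").encode()
            for c in conns:
                c.sendall(payload)
            for r in readers:
                if r.readline().strip() != "rendered":
                    raise RuntimeError("client failed to render frame")
    finally:
        for c in conns:
            c.close()
```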
c. database
in this ship simulation system we use our own computational ship models based on our algorithms, so we can incorporate into our database any computational ship model which satisfies our algorithms and constraints. we have developed several six degrees of freedom computational ship models, such as the benchmark tanker "esso osaka" and an offshore patrol vessel quite similar to the "jayasagara" class, which was locally built by the colombo dockyard [50]. we used the ogre3d rendering engine, which has the capability of realistic ocean wave rendering [51], to create a real-time interactive environment [52]. real world geographical sceneries with cultural objects, moving and fixed targets, several environmental conditions and a wide range of visibility and illumination effects such as daytime, dusk and night were incorporated into our database. we modeled a sri lankan harbor and incorporated it into our database based on the method proposed by philip paar et al. [53]. we used the british admiralty chart of "galle harbor and approaches", as shown in figure 21 [54], and google earth [55] images, as shown in figure 22, for the 3d modeling.

fig. 21: "galle harbor" navigational map
fig. 22: top view of the selected harbor

subsequently the shore line was finalized and sequences of digital pictures were taken from the sea while keeping the same distance from the shore line. the same process was repeated several times to get different image sequences at different distances from the shore line. various moving and fixed targets and cultural objects were observed in the sea around the selected harbor. the relative sizes of the observed objects were recorded with respect to a selected earth-fixed object, and digital pictures were taken from different distances. we used 3d studio max [56] to create mesh models. all naval vessels, moving or fixed targets, cultural objects and scenes of navigation areas were modeled using 3d studio max. when modeling the navigation areas and the shore line, major objects were modeled with polygonal meshes and the other objects were placed by using billboards. throughout this 3d modeling we used appropriate textures to model more realistic scenery. most of the time we captured the real texture with a digital camera and then enhanced it with image editing software; several times we used standard materials from the 3d studio max library. we used the ogremax scene exporter to export the modeled scenery from 3d studio max to ogre3d. the ogremax scene exporter [57] is a 3ds max plugin that exports 3ds max scenes to ogremax scene files. finally, the virtually created environment was compared with the real environment; the outcome is quite satisfactory, as shown in figure 23 and figure 24.

fig. 23: real harbor environment
fig. 24: virtually created harbor environment

iv. development of the immersive environment
the immersive environment is composed of four standard game pcs (3 clients and a server) and a multi-projector seamless tiled display system. it was constructed using three multimedia projectors with 2500 ansi lumens. this large polygonal screen projected realistic visuals with a wide field of view, and a real scale bridge was constructed and placed as shown in figure 25. this brings a sense of seriousness and realism to the user's perception and hence strengthens the ecological validity of the environment. discarded real equipment was integrated into the bridge.

fig. 25: bridge and projector arrangement
fig. 26: physically built ship bridge
in this case, a real throttle and wheel were interfaced to provide a more realistic virtual environment and a strong sense of immersion, as shown in figure 26. the total cost of equipment and construction of the ecological environment is less than 2,000,000 sri lankan rupees. this perception enhanced virtual environment was exhibited at several national and international level exhibitions, as shown in figure 27. a large number of people, including naval experts, sailors, navy officers, national and international researchers, game lovers and the general public, used this immersive environment. according to the feedback from naval experts who have more than 15 years of maritime experience with sri lankan navy ships, we noted that there is good agreement for the surge, sway and yaw motion simulations with respect to the real scenario. there are some divergences in the heave, pitch and roll motion simulations with respect to the rendered ocean wave pattern. however, we were able to achieve the desired perception of the naval experts by changing several model parameters related to the heave, pitch and roll motion simulation.

v. teaching and learning scenarios
there are three kinds of learning and training scenarios where the simulator can be used:
- in-class teaching of theoretical aspects of navigation and ship motion;
- training people for a particular mission and evaluating their performance;
- self learning.

in marine education, a theoretical understanding of ship motions is essential. a variety of phenomena occur simultaneously within a short time duration, including drift, turning, speed reduction, drift angle and heading changes. the trainer can use the simulation system to demonstrate those ship motions in real time. the trainer can select a ship, the geographical scene of the navigation area, cultural objects, moving/fixed targets and environmental conditions. he can demonstrate ship maneuvering tactics and the use of the navigational display consisting of radar, sonar, 2d map, engine rpm indicator and rudder angle indicator. the trainer can also demonstrate how to deal with various threat scenarios. then the trainer can define various missions with different difficulty levels for the trainee to gain and understand ship maneuvering skills. while the trainee is maneuvering the ship, the trainer can change difficulty levels and threat scenarios. after several training sessions the trainee can be given specific missions and his performance evaluated. trainees can also use the simulation system to select a ship and geographical scenery, define their own difficulty levels and study at their own pace. these learning scenarios enhance the teaching and learning experience.

a. sample learning activity
a basic learning activity was developed with the help of naval experts from the sri lanka navy according to their fundamental navigation lessons with real ships. lesson title: basic navigation. simulated ship: offshore patrol vessel (the ship's length, beam, draft, mass and maximum effective thrust are respectively 48 m, 8.6 m, 2.2 m, 35.6 × 10^4 kg and 15.0 × 10^4 n). sea state: beaufort sea state 3. the present geographical location is given. mission: cast off the ship from the present location, proceed to a given location and anchor within a given time frame. the trainee has to use the simulation system; a geographical map of the navigation area, as shown in figure 28, and a parallel ruler are provided. several learning activities were carried out with naval experts to test our simulation system and to further identify learning and teaching requirements.
vi. future work
we need to consider an existing real ship, perform maneuvering tests and measure the necessary motions by using electronic devices; subsequently we can evaluate the model parameters without approximations. this will lead to a complete six degrees of freedom validation of our ship motion simulation. we need to further develop the computational ship model to simulate various kinds of modern ships with multiple propellers. a seamless cylindrical tiled display system should be developed to further enhance the trainee's perception: image stitching and blending algorithms [58] have to be used to get the desired output, which is to be projected onto a semi-cylindrical projection screen by the multimedia projectors. then the trainee can get a more immersive maritime experience. the development of a real-time acoustic model for the current simulation system will also enhance the ecological validity of the simulated environment. in order to use this simulation system in a wide range of teaching and learning scenarios, we need to incorporate more naval vessels, moving or fixed targets, cultural objects, scenes of navigation areas and various environmental effects and conditions into the existing database. we need to conduct experiments with naval experts to further identify the trainees' and trainers' requirements; consequently, various training scenarios have to be designed in order to develop a more productive and efficient virtual learning and training environment. the development of a learning management system that could monitor trainee, trainer and training activities is essential. it seems that the accuracy of the roll, pitch and heave motion simulation does not have a great impact on several training scenarios such as traffic handling and offshore navigation. consequently, we are planning to conduct further research to develop a training oriented computational ship model for educational activities based on human perception and other important factors such as biological, physiological and sociological aspects.

fig. 27: virtual environment in an exhibition
fig. 28: geographical map of the navigation area

vii. conclusion
experimental results show that the proposed six degrees of freedom ship motion simulation algorithms have different capabilities with respect to the algorithms proposed by gatis barauskis and peter friis-hansen [10] and shyh-kuang ueng et al. [11]. specifically, we tested our ship motion prediction algorithms in an immersive environment and validated the motion predictions with respect to real world scenarios. the surge, sway and yaw motion predictions show satisfactory results; however, the roll, pitch and heave motion predictions do not give satisfactory results, although there are some encouraging observations in higher level explorations. the performed learning and training scenarios indicate that the current simulation system can be used for training scenarios such as navigation training and sea traffic handling. experience gained through this kind of ship simulation system enables students to understand the dynamic mechanism behind ship motion. it is possible to create virtual scenarios, such as the motion of enemy ships and rapid environmental changes, for training purposes which are not possible in conventional ship maneuvering training done in a real sea environment with real ships.
naval trainers can use this system for in-class-teaching of selected theoretical aspects of navigation and ship motion. in addition, trainers can train trainees for particular missions and evaluate their performances. trainee can use the system and define various tactical scenarios within different difficulty levels so that they can learn at their own pace. implementation cost of current simulation system is less than 2000usd and running cost of this ship simulation system is minimal. however, operating cost of tanker ship per day is more than 2000usd [59]. consequently, with respect to conventional training involving real ships, this ship simulation system reduces the total cost of training and increases the quality of training. experimental results show that the proposed framework is very effective for a virtual maritime learning and training purposes and it is scalable and comparable to industry standard full scale simulators with perception enhanced cylindrical vision system. we plan to make this ship simulation system available under an open source license so that anyone can customize or further develop it according to their requirements. acknowledgment our thanks go to department of electrical and electronics, sri lanka navy and national e-learning center, sri lanka for their collaborative support. references [1] biran, a b., ship hydrostatics and stability. oxford ox2 8dp : butterworth-heinemann, 2003. isbn 0 7506 4988 7. [2] ergun demirel, romesh mehta., "developing an effective maritime education and training systemtudev." accra-ghana : s.n., 2009. imla conference. [3] woolley, mark., "time for the navy to get into the game!" proceedings magazine, u.s naval institiute. 2009, vols. 135/4/1,274, april. [4] society of naval architects and marine engineers ., nomenclature for treating the motion of a submerged body through a fluid. new york : the society of naval architects and marine engineers, 1950. [5] albert a. rizzo, todd bowerly, j. galen buckwalter, dean klimchuk, roman mitura,thomas d. parsons., "a virtual reality scenario for all seasons the virtual classroom." cme 3. january, 2006. [6] transas marine limited., transas simulation products. [online] transas marine international. [cited: 06 06, 2010.] http://www.transas.com/products/simulators/. [7] oceanic consulting corporation., centre for marine simulation. [online] http://www.oceaniccorp.com/facilitydetails.asp?id=7. [8] transas marine international., transas database editor model wizard. transas. [online] [cited: may 06, 2010.] http://www.transas.com/products/simulators/sim_products/navigation al/components/tools/. [9] fossen, thor i., guidance and control of ocean vehicles. england : jhon wiley, 1996. isbn: 0471941131. [10] gatis barauskis, peter friis-hansen., "fast-time ship simulator." 2007. safety at sea 2007. [11] shyh-kuang ueng, david lin, chieh-hong liu., "a ship motion simulation system." s.l. : springer-verlang, 2008. virtual reality (2008). pp. 65–76. [12] junker, gregory., pro ogre 3d programming. s.l. : apress, 2006. isbn-13: 978-1590597101. http://www.transas.com/products/simulators/ http://www.oceaniccorp.com/facilitydetails.asp?id=7 http://www.transas.com/products/simulators/sim_products/navigational/components/tools/ http://www.transas.com/products/simulators/sim_products/navigational/components/tools/ damitha sandaruwan, nihal kodikara, chamath keppitiyagama and rexy rosa 47 december 2010 the international journal on advances in ict for emerging regions 03 [13] irrlicht realtime 3d engine. 
[online] [cited: may 06, 2010.] http://irrlicht.sourceforge.net/index.html. [14] gaming and simulation engine. [online] [cited: may 06, 2010.] http://www.delta3d.org/. [15] adrian boeing, thomas bräunl., "evaluation of real-time physics simulation systems." perth, western australia : acm press, 2007. graphite 2007. pp. 281-288. isbn: 9781595939128. [16] physx., physxinfo.com. popular physics engines comparison physx, havok and ode. [online] [cited: may 06, 2010.] http://physxinfo.com/articles/?page_id=154. [17] free downloads 3d models. [online] [cited: may 06, 2010.] http://www.3dmodelfree.com/. [18] pinkster, j.m.j. journée and jakob., introduction in ship hydromechanics. s.l. : delft university of technology, 2002. [19] ship simulator extremes. [online] [cited: may 06, 2010.] http://www.shipsim.com/. [20] virtual sailor. [online] [cited: may 06, 2010.] http://www.hangsim.com/vs/. [21] nils salvesen, e. o. tuck and o. faltinsen., ship motion and sea loads. new york : society of naval architects and marine engineers, 1970. [22] journée, j.m.j., "quick strip theory calculations in ship design." newcastle upon tyne : s.n., 1992. practical design of ships and mobile structures. vol. 1. [23] thor i. fossen, øyvind n. smogeli., "nonlinear timedomain strip theory formulation for low-speed manoeuvring and station-keeping." modelling, identification and control mic. 2004, vol. 25. [24] michael s triantafyllou, marc bodson, michael athans., "real time estimation of ship motions using kalman filtering techniques." ieee journal of oceanic engineering, january 1983, vols. oe-8, pp. 9-20. [25] zhang xiufeng, jin yicheng, yin yong, li zhihua., "ship simulation using virtual reality technique." singapore : acm, 2004. 2004 acm siggraph international conference on virtual reality continuum and its applications in industry. pp. 282 285 . isbn:1-58113-884-9. [26] j. m. cieutat, j. c. gonzato, p. guitton., "a new efficient wave model for maritime training simulator." budmerice, slovakia : ieee computer society washington, dc, usa, 2001. 17th spring conference on computer graphics. p. 202 . isbn 0-7695-1215-1. [27] alain fournier, william t. reeves., "a simple model of ocean waves." acm siggraph computer graphics, s.l. : acm new york, ny, usa, 1986, issue 4, vol. 20, pp. 75 84 . issn:00978930. [28] x zhao, r xu, c kwan., "ship-motion prediction: algorithms and simulation results." 2004. icassp '04ieee international conference on acoustics, speech, and signal processing. isbn: 07803-8484-9. [29] ching-tang chou, li-chen fu., "vr-based motion simulator for dynamics ship on six degree-of-freedom platform." roma, italy : s.n., 2007. icra international conference. sbn: 1-4244-0601-3. [30] chen, ching-yaw tzeng and ju-fen., "fundamental properties of linear ship steering dynamic models." journal of marine science and technology, 1997, vol. 7. [31] lloyd's register., lloyd's register. [online] [cited: may 06, 2010.] http://www.lr.org. [32] tristan perez, mogens blanke., mathematical ship modeling for control applications. 2002. [33] tristan perez, thor i. fossen., "kinematic models for seakeeping and manoeuvring of marine vessels." modeling, identification and control, 2007, vol. 28, pp. 1-12. issn 0332–7353. [34] anna witkowska, mirosław tomera, roman ´smierzchalski., "a backstepping approach to ship course control." nternational journal of applied mathematics and computer science, 2007, vol. 17, pp. 73-85. 
[35] xiuwen liu, cui xie, yicheng jin, yong yin., "construct low-cost multi-projector tiled display system for marine simulator." hangzhou, china : s.n., 2006. 16th international conference on artificial reality and telexistence (icat'06). pp. 688-693. isbn: 07695-2754-x. [36] alexandre g. ferreira, renato cerqueira, waldemar celes filho, marcelo gattass., "multiple display viewing architecture for virtual environments over heterogeneous networks." s.l. : ieee computer society washington, dc, usa, 1999. xii brazilian symposium on computer graphics and image processing. pp. 83 92 . isbn:0-76950481-7. [37] transas marine limited., "potential incident simulation, control and evaluation system." transas. [online] [cited: may 06, 2010.] http://www.transas.com/products/simulators/sim_products/pisces/. [38] fu, ching-tang chou and li-chen., "ships on real-time rendering dynamic ocean applied in 6-dof platform motion simulator." taichun, taiwan : s.n., 2007. cacs international conference. [39] k j rawson, e c tupper., basic ship theory. s.l. : butterworthheinemann. vol. 1. isbn 0750653965. [40] y nakayama, r f boucher., introduction to fluid mechanics. s.l. : butterworth-heinemann, 1998. isbn 0340676493. [41] auke visser's international esso tankers site., esso osaka (19731985). [online] [cited: may 06, 2010.] http://www.aukevisser.nl/inter/id427.htm. [42] the specialist committee on esso osaka., "final report and recommendations to the 23rd ittc." 2002. 23rd international towing tank conference. [43] mc taggart, kevin., improved maneuvering forces and autopilot modelling for the shipmo3d ship motion library. s.l. : defence r&d canada – atlantic, 2008. drdc atlantic tm 2008-162. [44] mctaggart, kevin., simulation of hydrodynamic forces and motions for a freely maneuvering ship in a seaway. s.l. : defence r&d canada – atlantic, 2005. drdc atlantic tm 2005-071. [45] the mathworks, inc., the mathworks. [online] [cited: may 06, 2010.] http://www.mathworks.com/. [46] tristan p´erez, mogens blanke., simulation of ship motion in seaway. technical report ee02037. [47] lewis, edward v., principles of naval architecture. s.l. : society of naval architects and marine engineers , 1989. isbn-0939773-00-7. [48] ixsea ltd., octans surface gyrocompass & motion sensor . [online] [cited: may 06, 2010.] http://www.ixsea.com/en/products/1/octans.html. [49] william b. lamport, m josefsson., "the next generation of round fit-for-purpose hull form fpsos offers advantages over traditional ship-shaped hull forms." new orleans, louisiana usa : asme, 2008. deepgulf conference. [50] jayasagara class patrol craft. [online] [cited: may 06, 2010.] http://en.wikipedia.org/wiki/jayasagara_class_patrol_craft. [51] object-oriented graphics rendering engine. [online] [cited: may 06, 2010.] http://www.ogre3d.org/. [52] philip greenwood, jesse sago, sam richmond, vivi chau., "using game engine technology to create real-time interactive environments to assist in planning and visual assessment for infrastructure." cairns, australia : s.n., 2009. 18th world imacs / modsim congress. [53] philip paar, katy appleton , malte clasen , maria gensel , simon jude , andrew lovett., "interactive visual simulation of coastal landscape change." 2008. digital earth summit on geoinformatics. [54] british admiralty charts. [online] [cited: may 06, 2010.] http://www.mdnautical.com/britishadmiraltycharts.htm. [55] satelight image source . [online] [cited: may 06, 2010.] http://earth.google.com/. [56] autodesk., the autodesk 3d studio max documentations. 
[online] [cited: may 06, 2010.] http://usa.autodesk.com/. [57] the ogremax documentations. [online] [cited: may 06, 2010.] http://www.ogremax.com/. [58] v. rankov, r. j. locke, r. j. edens, b. vojnovic., "an algorithm for image stitching and blending." proc. spie, 2005, vol. 5701, pp. 190-199. [59] aldo a. mclean, william e. biles., "a simulation approach to the evaluation of operational costs and performance in liner shipping operations." s.l. : winter simulation conference, 2008. isbn: 978-1-4244-2708-6.

international journal on advances in ict for emerging regions 2011 04 (01): 4-14

factors influencing the adoption of mobile phones among the farmers in bangladesh: theories and practices
sirajul m. islam and åke grönlund

abstract — this paper investigates the factors influencing the adoption of mobile phone technology among farmers in bangladesh. electronic services are one important measure for rural development and the mobile phone is the dominant cellular technology; hence understanding the adoption of this technology is important. the paper uses an interpretive philosophy, investigating adoption factors by means of survey data, participant observation and related studies on rural bangladesh and technology acceptance. based on a number of acceptance models from the literature, a conceptual rural technology acceptance model (rutam) was developed to analyze the arguments pertinent to a rural developing country context. the most salient modification, compared to earlier models, is that social influence plays a bigger role than technology at early stages of adoption. 'tech-service promotion' and 'tech-service attributes' are introduced as external factors which affect the behavioral intentions of an individual by means of perceived usefulness (pu) and perceived ease of use (peu).

index terms — adoption, bangladesh, farmers, ict4d, rutam, tam

i. introduction
with its more than 160 million people, bangladesh ranks as the eighth most populous country in the world [1]. out of 29 million households, 89% are situated in rural areas and 52% (15 million) account for agricultural farm households [2]. according to the world bank [3], "poverty in bangladesh is primarily a 'rural phenomenon', with 53 percent of its rural population classified as poor, comprising about 85 percent of the country's poor". the rate of adult literacy at the national level is 49%, while it is 46% in rural areas. as surveyed by the bbs-unesco [4], around 26% of the poorest and 34% of the poor people in the rural areas have formal literacy.

manuscript received july 7, 2010. recommended by prof. maria lee on january 19, 2011. sirajul m. islam is with the swedish business school at örebro university, fakultetsgatan 1, örebro 70182, sweden. (e-mail: sirajul.islam@oru.se). åke grönlund is with the swedish business school at örebro university, fakultetsgatan 1, örebro 70182, sweden. (e-mail: ake.gronlund@oru.se).
although bangladesh ranked one hundred thirty-eighth out of 154 countries in the ict development index [5] in 2007, the penetration rate of mobile phones is remarkable compared to other ict tools (e.g. pcs and the internet). recent investigations show that around 45% of the total population – nearly one in two people, or at least one per family on average – has a mobile phone [6]. as the growth of the bangladeshi economy depends on rural development, much attention needs to be paid to the agricultural sector and the farmers, who are the main, yet one of the weakest, actors in the economy [7], [8]. in the perspective of this paper, timely adoption and appropriate use of easily and widely available mobile phone technology in agricultural operations is one opportunity that may help in realizing the 'digital opportunities', enhancing rural productivity and hence contributing to reducing urban-rural inequalities. although agricultural trade and farmers have become an important target for mobile phone services, studies of technology adoption and the diffusion process in such contexts are currently scarce [9]. kwon and chidambaram [10] find that much of the variance in the studies of mobile technology use remains unexplained, and addressing this gap should be an important direction for future research. kim and garrison [11] suggest that researchers should add more constructs to the existing acceptance models related to mobile technology, as this kind of technology is constantly evolving and new factors keep emerging. against this backdrop this paper aims primarily to investigate the factors influencing the adoption of mobile phone technology among the farmers in bangladesh. the underlying purpose of doing so is to allow a better understanding of how to provide useful services to the farmers' community. to do so, we focus on the following issues: to explore earlier theories and models on technology adoption and diffusion; to develop a conceptual research model based on the literature studies; to investigate empirically the factors relevant to rural bangladesh; and to detail the model based on the empirical findings and hence suggest the factors with associated variables especially relevant to rural people in developing regions. the findings from this study contribute to a larger picture of the technology adoption process in rural settings, specifically in bangladesh. the findings may then potentially be transferable to other developing countries, particularly those that share a similar socio-economic and technological background. ii. literature review technology adoption is the decision of a group or individual to make use of an innovation. beal and bohlen [12] state that people accept new ideas through a series of complex mental processes in which adoption is the final action. rogers [13] shows technology diffusion in a global perspective to match a classical normal distribution curve which can be explained by the demographic and psychographic characteristics of the adopters. fig. 1. 
technology acceptance model (tam) [14] the technology acceptance model (tam; by davis in 1989) as shown in figure 1, initially developed for new enduser of information systems for organizations, is one of the most influential models in the study of technology use [15]. tam explains the factors influencing the behavior of an individual regarding accepting and using new technology. perceived usefulness (pu) is the key determinant of acceptance, meaning the user‟s „subjective probability that using a specific application system will increase his or her job performance within an organizational context‟ [14]. perceived ease of use (peu), is „the degree to which the user expects the target system to be free of effort‟ [14]. together, pu and peu determine the attitude (a) of a person towards using the system. finally with the influence of pu and attitude, behavioural intention (bi) influences the actual use of the system. however despite its robustness across populations, settings and technologies, davis [16] later identifies the following key limitations of tam.  static, cross-sectional, snapshot-oriented (individual level of analysis and limited span across causal chain).  emphasis on controlled, conscious processing (exclusion of automatic processing and overlooking multitasking).  limited account of social processes (knowledge collaboration and collective processes). malhotra and galletta [17] argue that tam is incomplete as it does not account for social influence in the adoption of new information systems, and therefore suggest to consider the effect of social influence on the commitment of the is user. furthermore, mathieson et al. [18] remark that tam has limitations in assuming that usage is voluntary and free of barriers that would prevent individuals from using an is. inclusion of social influence was indeed the motivation for tam-2, proposed by venkatesh and davis [19]. tam-2 provides a detailed account of the key forces of the underlying judgments of perceived usefulness, “explaining up to 60% of the variance in this important driver of usage intentions”. the unified theory of acceptance and use of technology (utaut) by venkatesh et al. [20] is a further development which combines some major theories (e.g. tam, theory of planned behavior, and innovation diffusion theory) from the is literature. the model has three direct determinants of intention to use (performance expectancy, effort expectancy, and social influence) and two direct determinants of use behavior (intention and facilitating conditions). there the intention and facilitating conditions are mediated by experience, voluntariness, gender, and age. venkatesh et al. [20] suggest that „given that utaut explains as much as 70 percent of the variance in intention, it is possible that we may be approaching the practical limits of our ability to explain individual acceptance and usage decisions in organizations‟. a. rutama conceptual research model based on the review of a number of theories pertinent to technology acceptance in general and mobile technology in particular, we have developed a conceptual research model for the research objectives as stated. this conceptual model (figure 2), which can be known as the rutam – rural technology acceptance model, incorporates most of the major and commonly used factors in a summarized fashion. a simplified version of rutam is also presented in figure 3. 
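as an aside for readers who prefer a compact notation, the tam relationships described above (pu and peu shaping attitude, and attitude together with pu shaping behavioral intention, which in turn drives actual use) are conventionally operationalized as a set of linear structural equations of roughly the following form; this generic, textbook-style formulation is offered here only as an illustration, it is not an estimation reported in this paper, and the coefficients and error terms are placeholders:
a = \beta_1 \, pu + \beta_2 \, peu + \varepsilon_1
bi = \beta_3 \, a + \beta_4 \, pu + \varepsilon_2
use = \beta_5 \, bi + \varepsilon_3
in tam-style studies such coefficients are typically estimated with regression or structural equation modelling on likert-scale survey items; rutam, as presented below, is a conceptual model and no such estimation is carried out in this paper.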
so far, the rural area technology acceptance and diffusion of innovation model (rutadim) proposed by lu and swatman [21] is the only model focusing on a rural context, but it was devised in a developed-country setting. rutadim was developed specifically to investigate acceptance of mobile technology and the likely diffusion of a project called mobicert in rural areas of south australia. although rutadim has not been tested beyond the context in which it was developed, we have considered the two new external variables it proposes, 'rural connectivity' and 'access and response time', under the 'facilitating conditions' in rutam. fig. 2. the rural technology acceptance model (rutam). fig. 3. simplified version of rutam. it can be noticed that rutam is strongly influenced by the original tam. the prevailing models express different views of the relations among the factors we adopted. following the most recent models, we adopted a "social" approach in rutam, assuming that social influence is more important than technology itself. this is contrary to the original tam but consistent with most later models. following that assumption, we tentatively distinguished between external and individual factors influencing pu and peu. next we briefly describe the conceptual model rutam and the factors to be analyzed under each of the proposed constructs, based on the literature pertinent to the use of new technology (e.g. mobile phones) and systems. 1) facilitating conditions venkatesh et al. [20] define facilitating conditions (fc) as 'the degree to which an individual believes that an organizational and technical infrastructure exists to support the use of a system'. seneler et al. [22] describe fc as the support given to users while interacting with the technologies, such as learning the technology from a friend. jain and hundal [23] argue that the choice of service provider is affected by facilitating factors such as network coverage, service quality, and easy availability of subscriptions and bill payment centers. the following variables are commonly found relevant to mobile phone technology and can broadly be categorized as 'facilitating conditions': rural connectivity and access time [21]; technological infrastructure [24], [25]; quality and availability of support services [23], [26]; market structure and mechanism [27], [28]; tax policy and distribution channel [25]; and modes of payment [24], [25], [29]. 2) tech-service attributes tech-service attributes (ta) refer to the properties or characteristics of a certain technology, system, or service that distinguish it from other technologies, systems or services. adesina and baidu-forson [9] find that farmers' perceptions of technology characteristics significantly affect their adoption decisions. dishaw and strong [30] also argue that users' perceptions about ease of use and usefulness are likely to be developed from rational assessments of the characteristics of the technology and the tasks for which it could be used. 
therefore, the variables related to the ta category are: service characteristics [31]; cost of handsets and services [21], [26], [32]; technology characteristics such as interface and network capabilities [33]; interface characteristics [23], [33]; and brand reputation and flexible technology (e.g. cdma or gsm) [23], [34], [35]. 3) tech-service promotion while awareness is the individual's extent of alertness and ability to draw inferences in a certain time and space towards an object or situation, influence is the process of creating this awareness. kalish [36] characterizes awareness as one of the steps towards adoption and subsequently defines it as "the stage of being informed about the product search attributes" (p. 1569). kotler and armstrong [37] place awareness as the prerequisite of knowledge, liking, preference, conviction and purchase. doss [38] finds that lack of awareness is one of the main reasons for farmers not adopting new technology. cook [39] therefore suggests that suppliers must promote their initiatives in order to create awareness among the users. 4) social influence according to the theory of reasoned action [40], [41], the behavioral intention of a person is influenced by subjective norms, which in turn are influenced by the significance of referents' perceptions (or normative beliefs) and the motivation to comply with those referents. stiff and mongeau [42] find that the influence of social norms on individuals' behavioral intentions is in some cases stronger than the influence of attitudes. sometimes, the perception of societal norms may prevent a person from behaving in accordance with his/her personal attitudes. in a rural context, jain and hundal [23] find that "the rural people [of india, authors' remark] had been found more influenced by the neighbors' usage […] and media has been regarded as the negligible impact on the choice of buying a mobile phone". in addition to neighbors, some other sources of influence are also evident in the literature, such as relatives, friends, and seniors or influential persons in the community [26], [31], [43]. 5) demographic factors there is a good number of studies describing the importance of the demographic context in the use and adoption of new technology. according to those studies, the variables that are important in this category are: age [10], [23], [33]; gender [44]; culture and ethnicity [33], [45], [46], [47]; income and household [24], [48], [49]; occupation [10], [48]; and education [24], [49]. age is one of the most discussed demographic factors in the technology adoption literature. however, mallenius [50] suggests that the "keyword should not be age, but rather, functional capacity", which addresses the capacity to use mobile devices and services. richardson et al. 
[51] find, in a study on grameen telecom's village phone programme in bangladesh, that “higher expenditures for better service are more likely to come from younger phone users aged 20 to 30, an age group that would more likely be receptive to a wider range of phone services, including card phones” . similarly, the jain and hundal [23] study among the rural people of india reveals that the majority of the users (62 %) of mobile phones are within the age group of 20 to 40. considering the impact of culture on human behavior, phillips et al. [52] argue that cultural affinity has a positive influence on technology adoption through perceived ease of technology use and therefore it is highly correlated with demand for products and services. on the other hand, biljon and renaud [47] find that “mobile phone uses have a unique set of cultural dimensions …… that do not necessarily correspond to the culture that exist in human-human relations”. dimaggio and cohen [49] explain the positive correlation between the level of income and timing of adoption of new technology. he finds that availability of a technology infrastructure shapes inequality by place of location (urban verses rural) that makes income more important. similarly, kalba [24] argues that adoption of certain technology attributes or alternatives (e.g. fixed vs mobile connection and postpaid vs pre-paid services) depends on the level of household income over time. furthermore, the rate of income depends on the type of occupation [10], [48] and therefore it is an important factor for the urgency and relevance of adopting a technology at a given time and within a specific cultural framework. education and income are closely related [53]; the more educated a person is, the greater is the likelihood of a high income. also, more educated people are better able to learn and use new technology [49] and hence they are more likely to be innovative. with respect to farmers, fuglie and kascak [54] find that diffusion of new technology among this community is relatively slow due to their low education level. yet, the jain and hundal [23] study on rural india exhibits that a majority of the mobile adopters have education level „below metric [10 th class]‟. 6) individual characteristics sultan and chan [55] argue that individual characteristics are more significant than technology properties in the technology adoption process in general. on the other hand, wei and zhang [56] find that in the rural context psychological factors in adopting a new technology and mobile phones in particular are less significant than behavioral factors. such a phenomenon in a rural setting is probably the social influence on the adoption process which is stronger than individual characteristics [31]. gatignon and robertson [57] suggest that information processing capability is a factor that separates the adopters from the non-adopters. this capability is framed by the individual‟s extent of observability or awareness [48] [58], innovativeness [32], [59] and past adoption or usages experience [34]. compatibility or apprehensiveness [32] [48], which is also an important determining factor, depends not only on a person‟s age, education and income but also on the relevance of the new technology with the task or job in a given time and place. 7) perceived usefulness (pu) and perceived ease of use (peu) perceived usefulness (pu) and perceived ease of use (peu) are the most cited factors that influence the attitude and behavioral intentions of a person [14]. 
these two factors are also most significant in mobile service usages [31]. with regard to the usage of cellular phones, kwon and chidambaram [10] find that peu has significant effects on users' extrinsic and intrinsic motivations, while apprehensiveness has a negative effect on intrinsic motivations. gefen and straub [15] argue that the importance of peu is related to the nature of the task an individual is facing. peu, therefore, directly affects the adoption of a device, such as a mobile phone, only when the person‟s primary task is to be done via such device. it is therefore also suggested that peu is affected by actual use of the phone (i.e. after adoption), though the effect diminishes with the frequency of usages over time [60]. the following is a list of common factors related to pu as cited by many studies on mobile technologies:  perceived usefulness [26], [60]  job relevance [11], [21], [61] http://www.telecommons.com/villagephone/graphs/users-ageandwilltopaymore.gif 8 sirajul m. islam and åke grönlund international journal on advances in ict for emerging regions 04 june 2011  mobility [31], [43]  new possibilities [62]  enjoyment [31], [62]  convenient/time saver [63]  productivity (save money and make more money) [63]  indispensible for business [63] 8) behavioral intention and use attitude, as a significant factor in the process of adoption, is found in the original studies of tra [40] and tam [14], but has been excluded from many other studies, even the later versions of tam. where social norms and perceived usefulness are strong, a person‟s innate unfavorable attitude may disappear and behavioral intentions will become more consistent with the social trends in a certain time and context [31], [42]. as subsequent action for adoption is concerned, sarker and wells [33] find continuity of use over time and resource commitment as the two outcomes, while some other studies describe these two as „actual use‟ [26], [60], [61]. iii. methods this is an interpretive case study [64] aiming to investigate the factors influencing adoption of mobile phone technology among the farmers in bangladesh for a broader purpose of offering a better understanding of how to provide useful information services to the rural communities in the developing regions as part of the process of overall rural development. this approach is particularly relevant to this study as it is “aimed at producing an understanding of the context of the information system [i.e. using and adopting mobile phones, authors‟ remark], and the process whereby the information system influences and is influenced by the context” [64]. furthermore, the inductive thinking process in interpretive research provides a hypothesis with a goal „not only to conclude a study but to develop ideas for further study‟ [65]. this paper investigates the situation by means of a mix of qualitative and quantitative data [65] where the researchers are, as required, directly involved in the process of collecting and analyzing the data. in this case one of the researchers was a „passionate participant‟ [66] being a resident closely acquainted with the farmers in bangladesh, while the other one was a “distant observer”. this “participant observation” approach with a “sense of detachment” [67], helped to achieve comprehensive insights about the social settings of the farmers in bangladesh. a. data collection secondary data was collected by means of a literature search and by analyzing the contexts and existing theories as advised by walsham [64]. 
in this case, both academic and general search engines were used. a “snowball” approach [67] for locating relevant papers was also employed by checking the lists of references of the relevant papers found. for the empirical part of the study, we conducted several surveys over the period between november 2006 and june 2009 in rural bangladesh, primarily to understand the agricultural market information systems and the use of mobile phones by the rural inhabitants, particularly the farmers. table i summarizes the surveys. the first survey [8] was aimed at an overall understanding about farmers and agricultural marketing channels of bangladesh and at evaluating the effectiveness of government-initiated webbased agricultural market information systems (www.dam.gov.bd). the other two were done immediately before and after the test run of a mobile phone based agriculture market information system (amis) designed for the farmers in a number of remote villages in a northern district of bangladesh the questionnaires used had sections on demographics, personal situation, farming situation including methods and produce, information and market needs and habits, and views and preferences regarding media and communication technology. there were structured as well as open-ended questions. to allow comparison, several questions were identical across the surveys. as many farmers are illiterate, the questionnaires were completed by the interviewers. in addition to the questionnaires, data was collected by the first author by means of observations, interviews and conversations with the farmers, and by video and face to face discussions with the relevant actors in natural and formal settings. table i research surveys in bangladesh (november 2006 – june 2009) period sample size (n) respondents methods research focus nov.2006 – feb.2007 1050 (350 from each category) farmers, wholesalers and retailers from 13 (out of 64) districts of bangladesh semi-structured interviews for the supply side and structured questionnaires for the demand stakeholders. government sponsored web-base market information systems and marketing channel nov.2007 – feb.2008 420 farmers from 50 villages of 13 (out of 64) districts of bangladesh structured and open-ended questions. demographics, personal and farming situation including methods and produce, information and market, media and communication technology dec.2008 – jan.2009 210 farmers (who had mobile phones) within the geographical zone of the pilot project on mobile phone based amis structured and open-ended questions all components of the research model sirajul m. islam and åke grönlund 9 june 2011 international journal on advances in ict for emerging regions 04 b. data analysis primary data were categorized according to the research objectives in general and the insights derived from the literature review. the categorized data were used to examine the existing concepts by a simple frequency analysis (i.e. percentage) and to establish some arguments based on the discussions, open-ended questions and observations. the comprehensive literature review and series of data collection efforts until the point of theoretical saturation in our study suffice the iterative and comparative characteristics of the qualitative research [68], [69]. orlikowski [69] states that the resultant framework from the process of theoretical saturation would be empirically valid and should “generalize the patterns across the sites”. 
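purely as an illustration of the simple frequency analysis (i.e. percentages of categorical answers) mentioned above, the computation can be sketched in a few lines of python; the question and the answer values below are hypothetical and do not come from the authors' survey data or tools:
from collections import Counter

def frequency_table(responses):
    # count each distinct answer and express it as a share of all responses
    counts = Counter(responses)
    total = sum(counts.values())
    return [(answer, n, round(100.0 * n / total, 1))
            for answer, n in counts.most_common()]

# hypothetical answers to one closed survey question
answers = ["nokia", "nokia", "samsung", "nokia", "motorola", "nokia"]
for answer, n, pct in frequency_table(answers):
    print(answer, n, "respondents,", pct, "%")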
the literature review process followed the guidelines of webster and watson [70] and oates [67] which are designed to lead to the proposal of a conceptual model that synthesizes and extends existing research. concerning the technology adoption models, starting from rogers‟ innovation diffusion theory (idt) [13] and davis‟ technology acceptance model (tam) [11], data from the collected papers were compiled in accordance with commonly cited factors of adoption and most prominent variables, such as perceived ease of use (peu) and perceived usefulness (pu). sorted data was categorized in some “concept matrixes” [70], and subsequently explained by including concepts and aspects from relevant articles and surveys. finally, the findings were summarized in a conceptual graphical model, called rutam, where each of associated variables is explained and rationalized by the theoretical and practical groundings. iv. bangladesh case study: results and discussion in the following, we present and discuss our empirical findings, observations, and related studies on the farmers in bangladesh using the factors as discussed in our conceptual model. a. facilitating conditions and technology-service attributes according to bangladesh telecommunication regulatory commission (btrc) [6], mobile phone density in bangladesh is now around 45%, to be compared with a mere 1% in 2003. this rapid growth started in 1997 with the abolishing of the monopoly enjoyed by a company, citycell, which uses cdma technology. under this monopoly, the cost of a mobile subscription was more than usd 1500 and network coverage was limited to only three metropolitan areas – dhaka, chittagong and rajshahi. grameenphone (gp, “the village phone”) came into the market with their gsm technology right after the abolishment of the monopoly and within six years of operation it became the first operator in bangladesh to reach one million subscribers. the fast expanding network, cheap fees and subscriptions, flexible technology and payment options (e.g. gsm vs cdma, prepaid vs postpaid service) and offering many value added services (vas) have rapidly made gp a market leader. subsequently, the heavy competitive pressure among the six market players has led to reduced subscription cost to only around usd 25 a reduction of more than 98 % since 1997. table ii shows technology attributes of the mobile phones used by the farmers. it indicates that rural people primarily prefer the network operators and handset providers respectively who have better networks and affordable prices with positive reputations. the nokia 1000 series dominated the market. in fact, the ultrabasic model nokia 1100 is one of the most popular cell phones in the world as well [71]. our survey (table ii) also found that most of the phones were owned during the period between 2006 and 2007. nationwide statistics also shows that the mobile penetration rate soared by around 250% from 6% in 2005 to 15% in 2006 [6]. there could be three possible reasons behind such growth; (a) government‟s reduction of import tax on mobile phone handsets from tk. 1,500 (us$25) to tk. 300 (us$5) in mid 2005, (b) reduction of tax on sim/rim from tk. 1,300 (us$20) to tk. 800 (us$12) in 2006, and (c) initiation of competitive airtime tariffs with flexible payment options (e.g. prepaid service). table ii technology attributes of the mobile phones used category features responses, n= 210, year 2009 handset brand nokia 80% (of which 95% is ultrabasic 1000 series.) 
reasons for buying a particular brand affordable price 40 % user friendly interface 30 % long-lasting battery capacity 12 % reputation 8 % easy availability 8 % operators subscribed grameenphone (gp) 95% (out of a choice of six operators). subscription period 2006 and 2007 60% subscription type pre-paid 100% main reasons of subscribing gp better network (stability) 40 % affordable price 35 % reputation, vas 25 % service facilities received from the operators flexible payment options. 80% customer services on demand 80% ample availability of retailers for refilling talk-time and the ability to sign up for a new subscription locally 75% 10 sirajul m. islam and åke grönlund international journal on advances in ict for emerging regions 04 june 2011 b. tech-service promotion and social influence the presence of adequate awareness is one critical success factor for acceptance. for example, we found in our 2007 survey [8] that less than 1 % of the farmers had heard about the existence of the government sponsored web-based agriculture market information service (www.dam.gov.bd). whether or not the service was useful to farmers or not, clearly government had failed to promote its existence among the farmers. table iii exhibits the sources of awareness prior to subscribing to mobile services, which clearly points to a strong social influence. although the choices of handsets and operators are distinct to each other, they are typically marketed together and often purchased at the same time. the above findings suggest that for both handsets and operators, social influence is more important than the media influence. we also found that 32 % of the farmers (table vi) in our sample have an education level above class ten, which matches the percentage of farmers being influenced by media while subscribing and buying the phones. in fact, it has been observed that media influence is strong among the early adopters who are apparently more of risk takers. in reality in rural bangladesh, this group also acts as referents to others having less or no education. as discussed in the literature review, „individual factors‟ is an important influencing part of mobile phone adoption in a rural context. in fact, how the external factors are impacting upon a person depends on the individual factors which are comprised of individual and demographic characteristics, and extent and type of social influences. c. demographic factors 1) age table iv shows that people between 19 and 30 are the most prone to mobile phone usage in 2009, but the age group 3150 is not far behind and in fact they were the most frequent phone owners in 2008. the age group 19-30 is generally seen as the most important target group. 2) gender according to the un-fao [72], there are around 49% females and 52% males in bangladesh. women have a nearly 50% lower adult literacy rate than men and constitute around 46% of the farming population. the female share of non-agricultural wage employment in 2002 was 25 % [73]. there is a lack of literature describing to what extent women are mobile phone subscribers and users. the authors of this paper did not survey the gender ratio of mobile phone use in rural villages, but a general observation is that it is mainly dominated by men. in fact, we did not find any female farmers having mobile phones in our sampling. there is so far an only one study hultberg [44] on bangladesh that shows that men make 70% of the mobile calls, which indicate a distinct gender difference. 
3) culture almost 90% of bangladesh's population shares the same language, culture and religion. therefore, we did not find any noticeable characteristics or activities that can distinguish the adopters from the non-adopters in terms of their prevailing human culture. an interpretation of this observation could be the cultural homogeneity at the national level and the socio-economic homogeneity at the rural level. according to khan et al. [74], "except for the [small] tribal areas in the chittagong hill tracts, bangladesh is a homogeneous country in which all rural areas are generally similar". 4) income and households table v indicates that, not surprisingly, the income level in our sample was very low and most people are poor according to the national statistics [2]. according to this table, 72% have a monthly household income of less than usd 84. this suggests that mobile technology in rural bangladesh has spread significantly even in the lower income groups. it further suggests that, except as concerns the choices of certain technology attributes (fixed vs mobile, postpaid vs prepaid, basic vs. advanced features, etc.) and the timing of adoption, income is not an important factor for the adoption and use of mobile phones in rural bangladesh.
table iii. sources of awareness prior to subscribing to mobile services (n = 210, year 2009): social norms (community use) 65%; reasons for choosing the brand of handset (not mutually exclusive): human influence (friends, relatives, neighbors, and other early adopters) 75%, media influence (radio, tv and newspapers) 32%; reasons for choosing the operator (not mutually exclusive): human influence (friends, relatives, neighbors, and other early adopters) 58%, media influence (radio, tv and newspapers) 53%.
table iv. survey results – age of those who have mobile phones (2009, n = 210; 2008, n = 155): 18 and below 7% (2009), 0% (2008); 19–30 47% (2009), 25% (2008); 31–50 40% (2009), 47% (2008); above 50 6% (2009), 28% (2008).
table v. survey results – farmers' monthly household income among those who have mobile phones (2009, n = 210; 2008, n = 135; 1 us$ = approximately 70 bangladeshi taka): us$ 84 and below 72% (2009), 62% (2008); us$ 85–145 22% (2009), 27% (2008); us$ 146–215 6% (2009), 2% (2008); above us$ 215 0% (2009), 9% (2008).
5) education according to the survey (table vi), a majority of the farmers (68%) have an education level below matric (class 10) or secondary schooling taught in the native language (bangla), and 25% have no formal education. it has been observed that those who do not have any formal literacy are able to access the mobile phone interface by memorizing signs or symbols instead of letters. this at least suffices for performing basic operation of the phones. 6) individual characteristics a person who has high self-efficacy achieves compatibility towards adopting a new technology over time by exerting the required effort. in general, however, such self-efficacy among the farmers in bangladesh is apparently low and therefore the influence of social norms is relatively higher. according to our 2009 survey, around 85% of the farmers opined that use of the mobile phone had become a part of their daily lives. this opinion, and our overall observation, suggests that a particular buying behavior (i.e. extravagance) exists which contradicts the traditional pattern of (positive) correlation between income and consumption. 
people even manage to buy a mobile phone by taking loans from others or by saving money at the cost of sacrificing consumption of other essential goods. examining farmers‟ intentions prior to deciding whether or not to buy a mobile phone, our 2009 survey (n=210) shows that farmers exhibited moderate innovativeness and a low level of individual uniqueness. they also had a need for cognition, and dependence on visualization rather than verbalization prior to using and adopting anything. a study by islam et al. [7] also found that “education, family size, farm size, annual income, farming and living expenditure, innovativeness, communication exposure, organizational participation and, aspiration were positively correlated with their use of information system”. 7) perceived usefulness and perceived ease of use although the numbers presented in tables vii and viii are on post-adoption data, they provide some indications for pu and peu of the users. according to table vii, almost all farmers opined that use of mobile phones makes their daily lives easier mainly due to its mobility characteristics. it helps overcoming barriers of time and location and improves productivity. on the other hand, table viii indicates that the english language is a major problem in accessing the interface and contents of the services. however, when they receive sms, they use to seek help from their family members, neighbors or friends to interpret it. this indicates that due to strong social influence and perceived usefulness, linguistic and other operational barriers do not discourage people from using a service they deem to be useful. in fact, over time, users would be able to overcome the basic barriers that they initially thought of and faced in. 8) behavioral intention and use table ix shows that the affective attitudes [75] in the form of trialability [13] or experiments [33], [60] apparently have little impact on the behavioral intentions of the farmers. most of the users do not intend to change either operator or handset. however, some of the subscribers have intentions to change their handset probably either for enjoying better features as they already have achieved some operational skills or to buy brand new sets since many of them initially bought second hand sets from early adopters. regarding the calling habits, a majority of the respondents make at least five calls a day, which can be considered as „frequent users‟ given the low economic status of the group. around 85% of these calls take place among relatives and friends, while the rest are mainly used for business purposes. d. key findings table vi survey result – farmers‟ education level education level (those have mobile phones) 2009, n=210 no formal education 25% 1 – 10th class 43% secondary school 15% higher secondary 14% graduate 3% table vii farmers‟ perceived usefulness (pu) farmers’ perceived usefulness (pu) for mobile phone usages agreed n=210, yr = 2009 makes the life easier 97% useful in daily lives 95% mobility (overcoming time and location barriers) 83% productivity (by saving money and making more money) 80% table viii farmers‟ perceived ease of use (peu) statements responses it is so interesting to access the sms and internet features disagreed= 57 % (n=210, year = 2009) features of mobile phone seem uncomfortable to use english & sms = all making call : none (year = 2009) do you access value added services? 
no = 75% (n=155, year = 2008) table ix farmers‟ mobile phones usages behavior statements responses i will switch from my present phone operator soon (n= 210, year 2009) disagree = 80 % i will change my existing mobile handset soon disagree = 72 % on average, how many calls do you make in a day? 1 to 5 calls = 47 % 6 & above = 53 % on average, how many calls do you receive in a day? 1 to 5 calls = 27 % 6 & above = 73 % 12 sirajul m. islam and åke grönlund international journal on advances in ict for emerging regions 04 june 2011 the following section details the key findings by populating the conceptual model (figure 4) with the specific variables that our surveys, in combination with the arguments of related studies have found. as for external factors, market structure, infrastructure and tax policy of the government are the three major facilitating conditions. our evidence shows that mobile phone penetration rate increased dramatically during the time when the government abolished the monopoly in the telecommunication sector and reduced taxes on subscriptions and handsets. the network infrastructure (e.g. available base stations) is also a matter as it is directly linked to the „network effect (or externality)‟. quick and wide expansion of the gp network created not only a positive effect on the customer base, but also created strategic advantages over other players in the market. tech-service promotion is an external factor influencing the process of individual awareness. this in fact is the promotion of products and services on part of the contents, products and service providers. although, media influence among the farmers is not so great due to the low availability of media (tv, radio and newspapers) and low literacy level, it affects indirectly through referents as part of social influence process. in fact, social influence turned out more powerful than promotional factors in our study. farmers want to share the experience and rely mainly on educated family members, friends and neighbors who either are early adopters or have knowledge about the products and services. the tech-service attributes include network stability, cost of subscription for services, bill payment options, user friendliness of the handsets and brand reputations. this factor influences directly both on the pu and peu of an individual. while facilitating conditions and tech-service promotion are more of indirect factors, tech-service attributes affect directly the decision making process of an individual. tech-service promotions and tech-service attributes are two new additions to those found in the literature review of technology acceptance models. the level of compatibility with the product and services, the extent of awareness in regard to the surrounding environment (e.g. what is going on around them, who is using what and why etc.), need for visualization rather than relying on words of mouth, and a tendency towards extravagant buying are the individual level characteristics of the farmers in bangladesh. however, more or less, these characteristics are shaped and affected by a person‟s age, gender, occupation and education. we find that variances in age, gender and education are matters in regard to the nature and the extent of use and acceptance. in particular, we observed that the age group 19 30 is the most frequent users. the gender bias is strong; users are predominantly male. the survey also reveals that education has effect only on the choices of services, not on the decision of buying a phone. fig. 4. 
factors with associated variables in rutam influencing the adoption of mobile phones among the farmers in bangladesh mobility, better means of connectivity with others, productivity in terms of saving money and increasing profits, and a sense of enjoyment are the major considerations that influence the perceived usefulness. although lack of literacy seems to be a barrier to access to the interface, it does not seem to deter an individual from using the services as and when required. we also observed that even users who do not have electricity used the mobile phones. they used to charge their handsets at neighbors‟ houses or a nearby rural market. this suggests a strong social influence and high perceived usefulness. it further indicates that while peu is indeed a factor, pu is more important (as shown by dotted in figure 4) finally we find that despite the presence of socio-economic constraints, farmers are frequent users and have a low tendency to switch from one product or service to another. facilitating conditions -market structure -infrastructure -tax policy tech-service promotion service operators devices individual characteristics awareness compatibility need for visualization extravagance demographic age gender education social influence family friends neighbors early adopters e x te rn a l f a c to rs in d iv id u a l f a c to rs tech-service attributes network stability cost of subscription & payment options cost of devices interface usability perceived usefulness (pu) mobility connectivity productivity enjoyment perceived ease of use (peu) behavioral intention use high frequency long-tern use sirajul m. islam and åke grönlund 13 june 2011 international journal on advances in ict for emerging regions 04 v. conclusion this paper has explored earlier theories and models on technology adoption and diffusion and summarized them into a conceptual research model, which has not been done before so comprehensively. we have detailed and rationalized the factors by means of empirical data and studies related to rural bangladesh. the conceptual model populated with some factors as presented here can be useful for policy makers, service and technology designers and marketers, and researchers having particular interest to serve rural communities in developing regions. the inclusions of two new external factors – „tech-service promotion‟ and „tech-service attributes‟ – may be of special interest for the researchers devoted to technology acceptance and diffusion models. one limitation is our study is of course that we have not provided any formal testing of the rutam. we have provided empirical findings to validate the contents and the logic of it based on relatively a small sample size. in fact, the present version of rutam is a hypothesis which can be considered as the first step of extending the prevailing tam, specially fitted for rural people in poor countries. in this case, a formal testing in a larger sample would be one direction of future research. strictly speaking we are not generalizing the results, but we still believe that the findings can be used, with caution, in other countries having similar socio-economic and technological contexts. references [1] united nations department of economic and social affairs (undesa), “world population prospects: selected demographic indicators: population – 2010”. population division, un, new york, 2009. http://esa.un.org/unpd/wpp2008/tab-sorting_population.html. [2] bangladesh bureau of statistics. statistical yearbook. 
bangladesh bureau of statistics (bbs), government of bangladesh, dhaka, 2009 [3] bangladesh: priorities for agriculture and rural development. world bank, 2010. http://go.worldbank.org/770vr4diu0. [4] bangladesh: literacy assessment survey-2008. bangladesh bureau of statistics & unesco, 2008. http://www.unescodhaka.org/downloaddocument/literacy-assessment-survey-2008. [5] measuring the information society: ict development index 2009. ituinternational telecommunication union , geneva: itu, 2009 [6] mobile phone subscribers in bangladesh. government of bangladesh. bangladesh telecommunication regulatory commission (btrc). dhaka, 2010. http://btrc.gov.bd/newsandevents/mobile_phone_subscribers.php. [7] m.s. islam, m. e. uddin, and m.u. rashid, “use of knowledge system in the rural community in improving livelihood status of the farmers under rdrs”. journal of agriculture & rural development, vol. 5, no. 1&2, pp. 167-172, 2007. [8] m. s. islam and å. grönlund, “agriculture market information eservice in bangladesh: a stakeholder-oriented case analysis”. lncs series, 4656/2007:167-178, germany: springer, 2007. [9] a.a. adesina and j. baidu-forson, “farmers perceptions and adoption of new agricultural technology: evidence from analysis in burkina faso and guinea”. west africa agricultural economics, vol. 13, pp. 1-9, 1995. [10] h.s. kwon and l. a. chidambaram, “test of the technology acceptance model, the case of cellular telephone adoption”. proceedings of the 33rd hawaii international conference on system sciences, usa, 2000. [11] s. kim and g. garrison, “investigating mobile wireless technology adoption: an extension of the technology acceptance model”. information systems frontiers, vol. 11, no. 3, pp. 323-333, 2008. [12] g.m. beal and j. m., bohlen, "the diffusion process". special report, agriculture extension service, iowa state college, no. 18, pp. 56–77, 1956. [13] e. rogers, the diffusion of innovations. first & fourth editions, new york: free press, 1960 & 1995. [14] f. d. davis, “perceived usefulness, perceived ease of use, and user acceptance of information technology”. mis quarterly, vol. 13, no. 3, pp. 319-340, 1989. [15] d. gefen and d. straub, “the relative importance of perceived ease of use in is adoption: a study of e-commerce adoption”. journal of the association of information, vol. 1, no. 8, 2000. [16] f. d. davis, “acceptance of information technology: research progress, current controversies, and emerging paradigms”. workshop on hci research in mis, walton college of business, university of arkansas, december 8, 2007. [17] y. malhotra and d. f. galletta, ”extending the technology acceptance model to account for social influence: theoretical bases and empirical validation”. proceedings of 32 nd hicss, pp.6-19, 1999. [18] k. mathieson, e. peacock, and w.w. chin, “extending the technology acceptance model: the influence of perceived user resources”. acm sigmis database archive, vol. 32 , no. 3, special issue on adoption, diffusion, and infusion of it , pp. 86 – 112, 2001. [19] v. venkatesh and f.d. davis, “a theoretical extension of the technology acceptance model: four longitudinal field studies”. management science, vol. 46, no. 2, pp. 186-204, 2000. [20] n. venkatesh, m. g. morris, g. b. davis, and f.d. davis, “user acceptance of information technology: toward a unified view”. mis quarterly, vol. 27, no. 3, pp. 425-478, 2003. [21] n. lu and p.m.c. swatman, “the mobicert project: integrating australian organic primary producers into the grocery supply chain”. 
journal of manufacturing technology management, vol. 20, no. 6, pp. 887-905, 2009. [22] c.o. seneler, n. basoglu, and t.u. daim, “a taxonomy for technology adoption: a human computer interaction perspective”. picmet 2008 proceedings, south africa, 2008. [23] a. jain and b. s. hundal, “factors influencing mobile services adoption in rural india”. asia pacific journal of rural development, vol. 17, no. 1, pp. 17-28, 2007. [24] k. kalba, “the adoption of mobile phones in emerging markets: global diffusion and the rural challenge”. international journal of communication, vol. 2, pp. 631-661, 2008. [25] s. sangwan and l. f. pau, “diffusion of mobile phones in china”. erim report series, ers-2005-056-lis, 2005. [26] j. v. biljon and p. kotzé,”modeling the factors that influence mobile phone adoption”. proceedings of the 2007 annual research conference of the south african icsit on it research in developing countries, south africa, pp. 152 – 161, 2007. [27] p. a. geroski, “models of technology diffusion”. research policy, no. 29, pp. 603-626, 2000. [28] b. hobijn and d. comin, “cross-country technology adoption: making the theories face the facts”. frb ny staff report, no. 169, june, 2003. [29] c. sinha, “effect of mobile telephony on empowering rural communities in developing countries”. irfd conference on digital divide, tunisia, 2005. [30] m. t. dishaw and d. m. strong, “extending the technology acceptance model with task-technology fit constructs”. information & management, vol. 36, no. 1, pp. 9-21, 1999. [31] b. kargin and n. basoglu, “factors affecting the adoption of mobile services”. picmet proceedings, portland, usa, 5-9 august, 2007. [32] y. li, z., t. fu , and h. li, “evaluating factors affecting the adoption of mobile commerce in agriculture : an empirical study”. new zealand journal of agricultural research, vol. 50, no. 5, pp. 12131218, 2007. [33] s. sarker and j. d. wells, “understanding mobile handheld device use and adoption”. communications of the acm, vol. 46, no. 12, pp. 35–40, 2003. http://esa.un.org/unpd/wpp2008/tab-sorting_population.html http://go.worldbank.org/770vr4diu0 http://www.unescodhaka.org/download-document/literacy-assessment-survey-2008 http://www.unescodhaka.org/download-document/literacy-assessment-survey-2008 http://btrc.gov.bd/newsandevents/mobile_phone_subscribers.php http://www.springerlink.com.db.ub.oru.se/content/103798/?p=625c5763952549ebb990c995fcc07240&pi=0 14 sirajul m. islam and åke grönlund international journal on advances in ict for emerging regions 04 june 2011 [34] v. venkatesh, “determinants of perceived ease of use: integrating control, intrinsic motivation, and emotion into the technology acceptance model”. information systems research, vol. 11, no. 4, pp. 342-365, 2000. [35] h. karjaluoto, j. karvonen, m. kesti, t. koivumäki, m. manninen, j. pakola, a. ristola, and j. salo, “factors affecting consumer choice of mobile phones: two studies from finland”. journal of euromarketing, vol. 14, no. 3, 2005. [36] s. kalish, “a new product adoption model with price, advertising, and uncertainty”. management science, vol. 31, no. 12, pp. 15691585, 1985. [37] p. kotler and g. armstrong, principles of marketing. englewood cliffs, nj: prentice hall, 1994. [38] c. r. doss, “understanding farm level technology adoption: lessons learned from cimmyt‟s micro surveys in eastern africa”. cimmyt economics working paper, 03-07. mexico, d.f.: cimmyt, 2003. [39] r. cook, awareness and influence in health and social care: how you can really make a difference. 
radcliffe publishing ltd. pp. 256, 2006. [40] m. fishbein, and i. ajzen, belief, attitude, intention and behavior: an introduction to theory and research. ma : addison-wesley, 1975. [41] m. qingfei, j. shaobo, and q. gang, “mobile commerce user acceptance study in china: a revised utaut model”. tsinghua science and technology, vol. 13, no. 3, 2008. [42] j. b. stiff and p. a. mongeau, persuasive communication (2nd ed.). usa: guilford press, 2003. [43] c. c. wong and p. l. hiew, “diffusion of mobile entertainment in malaysia: drivers and barriers”. proceedings of world academy of science, vol. 5, 2005. [44] l. hultberg, “women empowerment in bangladesh: a study of the village pay phone program”. thesis: media and communication studies, school of education and communication (hlk) jönköping university, spring term , 2008. [45] j. isham, “the effect of social capital on technology adoption: evidence from rural tanzania”. iris center working paper, no. 235, 2000. [46] g. hans, “towards a sociological theory of the mobile phone”. in: sociology in switzerland: sociology of the mobile phone. online publications, zuerich, march 2004 (release 3.0). http://socio.ch/mobile/t_geser1.html. [47] j. v. biljon and k. renaud, “a qualitative study of the applicability of technology, acceptance models to senior mobile phone users, advances in conceptual modeling – challenges and opportunities”. lecture notes in computer science, germany: springer, no. 5232, pp. 228-237, 2008. [48] a. vishwanath and g. m. goldhaber, “an examination of the factors contributing to adoption decisions among late-diffused technology products”. new media & society, vol. 5, no. 4, pp. 547-572, 2003. [49] p. dimaggio and j. cohen, “information inequality and network externalities: a comparative study of the diffusion of television and the internet”. the economic sociology of capitalism, working paper, no. 31, 2003. [50] s. mallenius, m. rossi, and v. k. tuunainen, “factors affecting the adoption and use of mobile devices and services by elderly peopleresults from a pilot study”. proceeding of 6th annual global mobility roundtable, usa, 2007. [51] d. richardson, r. ramirez, and m. haq, “grameen telecom's village phone programme: a multi-media case study”. cida, canada, 2000. [52] lisa a. phillips, r. calantone, ming-tung lee, "international technology adoption: behavior structure, demand certainty and culture". journal of business & industrial marketing, vol. 9, no: 2, pp.16 – 28, 1994. [53] l. g. schiffman and l.l. kanuk, consumer behavior (9th ed). usa: prentice-hall, 2004. [54] k. o. fuglie and c. a. kascak, “adoption and diffusion of naturalresource-conserving agricultural technology”. review of agricultural economics, vol. 23, no. 2, pp. 386-403, 2001. [55] f. sultan and l. chan, “object-oriented computing in software companies”. ieee transaction on engineering management, vol. 47, no. 1, 2000. [56] l. wei and m. zhang, “the adoption and use of mobile phone in rural china: a case study of hubei, china”, telematics and informatics archive, vol. 25, no. 3, pp. 169-186, 2008. [57] h. gatignon and t.s. robertson, “technology diffusion: an empirical test of competitive effects”, the journal of marketing, vol. 53, pp. 35-49, 1989. [58] s. l. huff and m. c. munro, “managing micro proliferation”. journal of information systems management, vol. 6, no. 4, pp. 72-75, 1989. [59] j. lu, j.e. yao, and c. s. yu, “personal innovativeness, social influences and adoption of wireless internet services via mobile technology”. 
journal of strategic information systems, vol. 14, no. 3, 2005. [60] k. renaud and j. v. biljon, “a qualitative study of the applicability of technology, acceptance models to senior mobile phone users, advances in conceptual modeling – challenges and opportunities”. lecture notes in computer science, berlin / heidelberg : springer, vol. 5232. saicsit, 2008. [61] y. lu, z. deng, and b. wang, “an empirical study on chinese enterprises” adoption of mobile services, 1-4244-1312-5/07/, ieee, 2007. [62] c. carlsson, k. hyvönen, p. repo, and p. walden, “adoption of mobile services across different technologies”. 18th bled econference, slovenia, june 6-8, 2005. [63] j. donner, “the social and economic implications of mobile telephony in rwanda: an ownership/access typology”. in p. glotz, s. bertschi & c. locke (eds.), thumb culture: the meaning of mobile phones for society, pp. 37-52, 2005. [64] g. walsham, “interpretive case studies in is research: nature and method”. european journal of information systems, vol. 4, pp. 74-81, 1995. [65] r. k. yin, case study research: design and methods, revised ed., thousand oaks, usa: sage, 2003. [66] a. d. andrade, “interpretive research aiming at theory building: adopting and adapting the case study design”. the qualitative report, vol. 14, no. 1, pp. 42-60, 2009 [67] b. j. oates, researching information systems and computing, london: sage publications, 2006. [68] b. g. glaser and a. l. strauss, the discovery of grounded theory, new york, aldine, 1967. [69] w. j. orlikowski, “case tools as organizational change: investigating incremental and radical changes in systems development”. mis quarterly, vol. 17, no. 3, 1993. [70] j. webster and r. watson, “analyzing the past to prepare for the future”. mis quarterly, vol. 26, no. 2, 2002. [71] k. banks, “mobile phones and the digital divide”. pc world, idg news service, july 29, 2008. available: http://www.pcworld.com/article/149075/mobile_phones_and_the_dig ital_divide.html. [72] asia's women in agriculture, environment and rural production: bangladesh. sustainable development department, food and agriculture organization of the united nations, 1997. [73] bangladesh: female share of non-agricultural wage employment. the united nations university (unu)-globalis, the global virtual university, 2003 available : http://globalis.gvu.unu.edu/indicator_detail.cfm?indicatorid=62&co untry=bd [74] a. r. khan, r. w. rochat, f. a. jahan, and s. f. begum, “induced abortion in a rural area of bangladesh”. studies in family planning, vol. 17, no. 2, pp. 95-99, 1986. [75] e. r. hilgard, “the trilogy of mind: cognition, affection, and conation”. journal of the history of behavioral sciences, vol. 16, pp. 107-117, 1980. 
international journal on advances in ict for emerging regions 2022 15 (3): december 2022
a framework for covid-19 pandemic intervention modelling and analysis for policy formation support in botswana
love ekenberg, tobias fasth, nadejda komendantova, vasilis koulolias, aron larsson, adriana mihai, mats danielson
abstract — the purpose of this research was to develop a methodological framework that can be applied to policy formation in situations with a high level of uncertainty and heterogeneity of opinions among the involved stakeholders about risk mitigation and management, such as covid-19 pandemic risk. in this paper, we present such a framework and its application to policy decision-making in botswana for mitigating the covid-19 pandemic. the purpose of the proposed model is twofold: firstly, to supply decision-makers with reliable and usable epidemiologic modelling, since measures to contain the spread of the covid-19 virus were initially to a large extent based on various epidemiologic risk assessments. secondly, given that some sets of measures adopted in other parts of the world progressively imposed high or even very high social and economic costs on the countries which adopted them, we provide a multi-criteria decision support model which can be used to weigh different policy approaches to combat the virus spread, taking into consideration local impact assessments across a variety of societal areas. we describe how the formulation of a national covid-19 strategy and policy in botswana in 2020 was aided by using ict decision support models as a vital information source. we then present the virus spread simulation model and its results, which are connected to a multi-criteria decision support model. finally, we discuss how the framework can be further developed for the needs of botswana to optimise hazard management options in the case of handling covid-19 and other pandemic scenarios. the significant research contribution is in advancing the research frontier regarding a methodology for including the heterogeneity of views and the identification of compromise solutions in policy-relevant discourses under a high degree of uncertainty.
keywords — covid-19; intervention modelling; simulation; multiple criteria decision analysis
correspondence: mats danielson (e-mail: mats.danielson@su.se). received: 20-12-2021, revised: 02-09-2022, accepted: 22-09-2022. love ekenberg, tobias fasth, vasilis koulolias, aron larsson, and mats danielson are from stockholm university, sweden (lovek@dsv.su.se, fasth@dsv.su.se, vasilisk@dsv.su.se, aron.larsson@miun.se, mats.danielson@su.se); nadejda komendantova and adriana mihai are from the international institute of applied systems analysis, laxenburg, austria (komendan@iiasa.ac.at, mihai@iiasa.ac.at). doi: http://doi.org/10.4038/icter.v15i3.7238. © 2022 international journal on advances in ict for emerging regions. this is an open-access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
i. introduction
even though the first covid-19 vaccines have been approved for emergency use since early 2021 in the developed world, and were approved for regular use in late 2021, there are still no clear predictions of how long
the pandemic will last (with new mutations, such as the current, as of july 2022, dominating omicron ba.4 and ba.5, emerging continuously) and when updated vaccines will be approved and available on markets globally. the short-, medium-, and long-term costs associated with extreme mitigation measures have become a matter of debate and discussion. such discussions constantly put pressure on countries and governments to relax their measures, regardless of the number of new covid-19 cases. these costs are enormous and associated with unemployment, low productivity in affected industries, limited trade and mobility, rising inequality, and increased risk of poverty as well as threats to food security and risk of hunger in a number of developing countries. for example, the closing of schools alone had adverse consequences including interrupted learning, gaps in childcare, high economic costs, and the risk of increasing dropout rates, among others. for supporting the national policy formation in botswana, a two-stage ict model was employed. the first stage consisted of a virus spread model based on the available evidence on covid-19 epidemiological factors, as well as on the impacts of some mitigation measures. the response measures to the pandemic had to be analysed at specific local levels, to be seen in relation to the demographic, social, and economic conditions and practices, healthcare systems capacity, and stakeholders’ needs. there are various possibilities to combine measures into strategies to see their different effects in reducing the rate of virus transmissibility, which is discussed in section iv. the second stage was a multi-criteria decision analysis model applied to the scenarios and mitigations generated during the first stage. these two stages combined made up the decision support model for policy formation. the entire covid-19 pandemic situation shows that mankind has been largely unprepared for it [1]. quite obviously, there was no vaccine readily available at the onset, nor was there any real preparedness in terms of research as is done regularly for the seasonal flu [2]. also, we did not have reliable information about critical measures to protect people from the virus and society from its spread, or at least to reduce its exposure and vulnerability. decision-makers had to operate under conditions of severe uncertainty about the case fatality rate, the spreading of the virus, the timing of infectiousness, and the number of asymptomatic cases ‒ just to mention a few uncertainties [3]. a critical problem in assessing the risk is that the evidence about the case fatality rate is still contradictory, because we do not know precisely the number of people who are infected, which is the denominator [4].
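to illustrate the denominator problem with hypothetical numbers (this is an explanatory sketch only, not data from botswana or from the cited studies):

```python
# illustrative only: how the unknown denominator drives apparent fatality rates
deaths = 100
confirmed_cases = 5_000            # infections detected through testing
ascertainment = 0.25               # assumed share of infections actually detected (hypothetical)
true_infections = confirmed_cases / ascertainment

cfr = deaths / confirmed_cases     # case fatality rate, based on confirmed cases only
ifr = deaths / true_infections     # infection fatality rate, based on all infections

print(f"cfr = {cfr:.1%}, ifr = {ifr:.1%}")   # cfr = 2.0%, ifr = 0.5%
```

with only a quarter of infections detected, the apparent fatality rate is four times higher than the underlying one, which is why early risk assessments varied so widely.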
as a result of this and many other “known and unknown” factors in the covid-19 outbreak [5], public authorities had to make decisions based on “quantitative evidence” and expert scientific advice. these include advice on possible future scenarios, on assessment of the sanitary system carrying capacity (especially of intensive care units), on expected public adoption of more or less restrictive measures, and on the evolution of national public debates about the issue [6]. under conditions of large uncertainty connected with the handling of the covid-19 pandemic, several cognitive and behavioural biases might have played a role in the decision-making processes. these biases are connected with risk perceptions under conditions of ambiguity [7]. behavioural economics counts more than 180 different biases. we assume that the following biases might have been relevant to the perception of pandemic emergency situations: the availability cascade [8], i.e., individuals adopt a new insight because other people have adopted it; the availability heuristic, the mixture of frequency and the ease with which examples come to mind [9]; bandwagon effects and information cascades, where individual adoption is strongly correlated with the proportion of people who have already adopted an idea, combined with an enormous amount of available information [10]; the base rate fallacy [11]; probability neglect [12]; exaggerated expectations; framing [13]; group thinking in general [14]; and many others. an obvious component is also the problem of bounded rationality, in which individuals are restricted in their willingness to collect information and are unable to identify a perceived optimal solution. as a response, they make decisions only after they have significantly simplified the decision space, and must therefore be content with a certain (again perceived) acceptable level of performance. they search in this sense for a satisfactory solution, but they focus only on a limited set of options from the available alternatives [15‒16]. in the case of this bias, there is still a perceived rational claim that the benefits of time-saving will overshadow the costs of any potential reduction in the quality of the decision. when comparing actions of disaster risk reduction, there is also the issue of representativeness heuristics, where representativeness is defined as the degree to which an event is similar in essential characteristics to its parent population and reflects the salient features of the process by which it is generated [17]. an example of how biases influence decision-making processes is the difference in perceptions between the asian and western disaster risk reduction authorities. previous experiences, such as the sars epidemic or seasonal flu, influenced perceptions of covid-19 lethality. many asians perceived the covid-19 risk as deadlier because of the sars epidemic, which the region had experienced recently. at the same time, the eu disaster risk reduction authorities first perceived covid-19 as less deadly because of their frequent experience with the seasonal flu.
this shows the influence of representativeness and availability heuristics, as well as of anchoring bias, which is known mainly in relation to negotiation processes. then there is an unavoidable component of dread risk [18] (compare, e.g., with hazardous technologies) connected with the judgments of people about unknown risks and their “perceived lack of control, dread, catastrophic potential, fatal consequences, and the inequitable distribution of risks and benefits” [19]. risk perceptions, influenced by biases, affect decisions regarding risk mitigation and management, including precautionary measures. the types of precautionary measures that can be enacted by countries at different points in time, depending on the severity of a situation, include: advice to adopt individual hygienic/precautionary measures, limiting large events/mass gatherings, limiting medium events, closing/reducing the opening time of economic activities (restaurants, bars), closing schools and/or universities, adopting border restrictions, adopting travel restrictions, domestic lockdowns, and compulsory quarantine controlled by the military. these measures are progressively limiting more and more of the individual freedoms and have progressively higher economic and societal costs, undertaken with the aim of preserving citizen health. in their decisions on which measures to enact, many countries acted in an apparently uncoordinated manner, at least at the beginning of the pandemic spread. even if it is clear that covid-19 does not respect national borders, the measures undertaken by bordering countries have been partially inconsistent. for example, the decision to close or not to close children’s primary care facilities (e.g. kindergarten) has been justified with completely different logic and rationalities. as of march 13, 2020, switzerland activated a “state of necessity” and italy was already in a “state of emergency”. this means that italy was at a higher level on the risk evaluation ladder. notably, the two countries share a border and the region lombardy (the most affected area in italy) is bordering with the swiss canton ticino. on march 13, 2020, switzerland [20] decided not to close kindergartens to protect the most vulnerable groups of the population (namely the elderly/grandparents), who would otherwise be in charge of taking care of the children. a few kilometres away in northern italy, the kindergartens were closed, for the exact same reason: to protect the most vulnerable [21]. austria and switzerland also share borders and have a similar population size (8.8 million in austria and 8.5 million in switzerland). on sunday, march 15, 2020, austria activated a state of emergency and decided on domestic lockdown, with 800 confirmed covid-19 cases. on the same day, switzerland had 2,300 confirmed cases but much lower levels of restrictions [22]. in (non-neighbouring) albania, a lockdown was decided on the same day, with 40 confirmed cases and one death in tirana. decisions on whether or not to impose lockdowns were not taken only based on the number of confirmed cases, and the effects of these inconsistencies in decision-making are to a large extent unforeseeable. the analysis of public responses is a difficult topic for multiple reasons including constant changes in national policy measures over time and a lack of clarity about the drivers of those changes. 
in other words, rationalities and justifications have been clearly changing, moving, e.g., from “no panic” to “balancing health and economic aspects" to “health first” depending on the number of cases and several other factors (most of them unknown). when do countries decide to enact the aforementioned progressive measures? how do they take into consideration the economic side-effects of mortality, for instance [23], given that the pandemic economic situation is comparable to or even worse than the 2008 financial crisis [24]? for how long can countries sustain skyrocketing levels of unemployment? there are large sacrifices in direct and indirect capital and gdp per life saved (and especially lifea framework for covid-19 pandemic intervention modelling and analysis for policy formation support in botswana 48 december 2022 international journal on advances in ict for emerging regions years saved) which could lead to unbearable financial consequences for a long time to come. instead of making decisions in a state of panic, a rational analysis should be employed which takes into consideration the actual risks compared with each other. if the risks are indeed high and underestimated by some public authorities, then the typical first priority considered when deciding is moral justice. the obligation, in this case, would be to protect the elderly, the most vulnerable, and the only way to do that is to limit other people’s exposure as much as possible. however, without a clear estimation of the risk associated with the sars-cov-2 outbreak, there is no indication that this would be the best measure to be adopted by public authorities in their respective countries, not the least because the consequences are unknown and public reactions to unanticipated costs can appear. measures need to adequately estimate how much it would cost to reduce the risk and to what extent they can reduce it. measures need to take into consideration the individual perception and behaviour in the wake of risk, the factors which can influence the said perception including media reports and framing, as well as the emotions stirred by representations and by the level of uncertainty. ii. background to botswana’s response to the covid-19 pandemic there are various population and health systems issues that are specific to sub-saharan africa concerning covid-19. first, the demographic structure is different from the rest of the world. the median age of the population is 19.7 years compared to 38.4 in china and 43.1 in europe. early experiences from asia and europe showed that people over 60 years of age and those with significant health problems were the most vulnerable to covid-19. although africa’s relative youth may have been considered a protective factor, the precise trajectory of how the epidemic would evolve was unclear at the time of modelling. the second factor that had to be taken into consideration was the high prevalence of hiv, tb, malnutrition, and anaemia. there were some indications that last year’s peak of malaria and covid-19 could have coincided. hiv, tb, malaria, malnutrition, and anaemia are likely to increase the severity when contracting covid-19. thirdly, the measures of social distancing may not be easy to impose in africa. weekly attendance of religious services among adults is over 80% in some african countries. for example, senegal had protests when visits to mosques were banned while conversely, tanzania came under scrutiny when it was announced that places of worship would not be closed. 
another important fact that reflects the response to covid-19 is that in addition to the burden of infectious diseases like hiv, tb, and malaria, africa is facing the burden of non-communicable diseases such as diabetes, hypertension, cardiovascular and renal diseases as well as cancer and severe accidents. as a result, already stretched health systems did not have the possibility to handle the burden of covid-19 as well. the capacity of both medical personnel and material is the lowest in the world. the ratio of medical doctors and nurses per 10,000 inhabitants is less than five in the majority of african countries, which is far below the ratio in developed countries. the capacity to treat critically ill patients with multi-organ failures has posed a challenge to many developed countries. the number of icu beds and ventilators, as well as the possibilities for renal replacement therapy, is among the major challenges for most of the countries affected by the sars-cov-2 virus. ongoing issues with insufficient numbers of health workers who in the midst of the fight with covid-19 were exposed to the infection and eventually became victims of the virus, together with a lack of adequate ppe, have made headlines all over the world. a. covid-19 impact on sub-saharan africa – with a focus on botswana there is a need to reorganise and prioritise health systems in order to increase the critical care capacity focusing on good triage and keeping scarce icu beds only for critical cases. most sub-saharan countries are having the same challenges when it comes to health systems. botswana, despite being among the fastest-growing african economies, is sharing the same problems as other countries in the region. ever since it gained independence in 1966, botswana has had exponential growth mainly based on revenue from the mining industry but also from tourism and meat production. in the past few years, it became clear that there is a great need to diversify the economy and stimulate local production in order to create more jobs and reduce dependence on its neighbours, mainly south africa. another challenge botswana has is a scarce population of just over 2 million and a huge territory the size of france or texas. some parts of botswana that belong to the kgalagadi desert have a very low population density, less than 10 per km2. since the early nineties, botswana has been fighting the hiv/aids epidemic. in 2000, it has been officially declared that botswana is among the countries that have the highest prevalence of hiv in the world. life expectancy has dropped from 57 to 36 years. this has forced the government to virtually declare war on the epidemic and start offering arv treatment to all citizens and non-citizens in need. ever since the government got involved in arv treatment, it became a success story for the whole continent. the burden of the hiv epidemic has mobilised a lot of human and material resources. the early onset of the covid-19 epidemic did not bypass botswana entirely. the first three cases were detected at the end of march 2020 and consisted of citizens who had travelled to uk and thailand. immediately after the detection of the first cases, vigorous contact tracing started which resulted in further detection of new cases with local transmission. one month later, on april 27, 2020, there were 22 cases with one case of death of an elderly lady with other comorbidities. 
closely following the situation in neighbouring countries like south africa, where at the beginning of march 2020 there were several hundreds of patients including local transmissions, the botswana government decided to take similar radical steps and locked down the country on april 2, 2020 (using constitutional emergency powers). the borders were closed and people coming from outside were quarantined. strict measures of social distancing, restriction of movements, hand hygiene, and sanitising were introduced. only essential services were functioning. the situation in the late spring of 2020 was as follows: facing a prolonged period of isolation, the inability to travel and to live a social life for people who have many ties with families in rural areas, would have been taking its toll and causing serious psycho-social problems. the economy would have been deeply affected with most of the businesses closed 49 ekenberg, l. et al. international journal on advances in ict for emerging regions december 2022 down and many workers being retrenched. the government introduced measures of social support and food baskets for families in need. banks had given breaks for loan repayments and owners eased on lease agreements. despite all these measures, the economy had been affected and there was rising concern about what would happen in the long run. in this situation, a decision was made to try to model the virus spread and societal effects from various mitigation strategies. in essence, botswana, like other sub-saharan countries, has had its own fair share of pre-existing socio-economic and health system issues that have been straining an already stretched health system. however, keeping in mind a lack of icu beds, skilled personnel, and resources, there was and still is a fear that things can get out of hand. the inability to treat very sick patients with multi-organ failures remains a large problem that is common even in developed countries. like in other developing countries, the consensus is that the focus should be on prevention through social distancing, hand hygiene, and restriction of movement. initially, it seemed that the measures of lockdown were working. however, the government, through the taskforce team formed by the president, immediately started working on public health measures that were meant to control and curb the epidemic. up until june 24, 2020, there were fewer than 100 confirmed cases and only one confirmed death. the first pandemic wave was essentially bypassed by botswana. during the autumn of 2020, the numbers started to rise with 14,025 confirmed cases at the end of the year and 40 reported deaths. fear was expressed that the epidemic would go out of hand, especially looking at the experience of other countries. other diseases like hiv, tb, and malaria have been flaring especially in the north-east part of the country and have been straining the health system. a lack of icu beds, ventilators, and machines for renal replacement has been worrisome for all sectors of society. in 2021, especially the third pandemic wave has taken its toll on botswana. the sharpest rise in reported cases was between mid-july and mid-august with over 50,000 new cases reported during that 30-day period and with a sharp rise in reported virus-related deaths following the period. in november 2021, there was a plateau after the third wave with a total of around 190,000 reported cases and 2,400 deaths. this is in relation to a total population of 2.35 million in the country. b. 
assessing covid-19 impact in botswana a proportionate response to the risk situation needs to take into consideration the social and economic impacts as well, which may be unprecedented, as estimated by the southern african development community (sadc), due to financial and healthcare system limitations. multiple factors and stakeholders need to be included in opting for a set of measures. the africa center for strategic studies had estimated for botswana a risk factor of 18 for the spread of covid-19, which was among the lowest on the continent (the factor ranged from 37 for south sudan down to 13‒16 for some island nations) [25]. this risk factor resulted from mapping the relative levels of vulnerability considering a country’s international exposure, its public health system, the density and total population of urban areas, the population age, the level of government transparency, press freedom, conflict magnitude, and forced displacement, concluding that “with early identification and isolation of cases, [it] may be better able to minimise the worst effects of this pandemic”. elsewhere [26], it was estimated that botswana was among the countries with a non-negligible risk, exposed exclusively to the potential risk from airports in the fujian province; since then, however, in april 2020, the first community transmitted cases were registered. furthermore, the world health organization estimated the country’s readiness status in february 2020 as being “adequate” [27] and in a situation update for the who african region, it was recommended for countries with under 100 confirmed infections that “measures to contain or at least delay the spread of the outbreak need to be intensified; including active case finding, testing and isolation of cases, contact tracing, physical distancing and promotion of good personal hygiene practices” [28]. for other african countries, other analyses have suggested more severe measures, such as for instance the london school of hygiene & tropical medicine (lshtm) recommending for nigeria a strategy that would combine the above who measures with “lockdowns of two months’ duration, where socio-economically feasible” [29] to delay the epidemic and gain time for planning and resource mobilisation. the effects of a lockdown compared to other mitigation measures were unclear; in south africa, for instance, it was noticeable that its epidemic trajectory had started to flatten before the lockdown effects came into place, but it was to be determined whether the slowed rate of infections was due to lower testing, missing cases in poorer communities or travel and public gatherings restrictions put in place before the lockdown. it has been argued [30] that given the lack of certainty about the effectiveness of a lockdown at specific local levels, direct involvement of the local communities in african countries in the decision over measures would be desirable, as they can provide the needed contextual knowledge and specific issues which must be included so as to preserve basic livelihoods through the chosen local strategy. the measures initially taken in botswana were implementing a suppression strategy, in a fashion similar to other countries. on april 2, 2020, the president declared a state of emergency and a national lockdown. 
this was described as a “mass quarantine strategy for suppressing or mitigating” the epidemic, with extreme social distancing imposed, including closing borders, closing schools and universities, suspension of public gatherings of more than 10 people, suspension of public transport services including long-distance buses and trains, and restricted movement. as complementary measures, the government also announced plans for both enhancing community testing and introducing an electronic permit application for contact tracing. there were a few immediately visible effects of the lockdown, aside from the epidemiological ones. on the downside, industry sectors were affected to an extent that was difficult to estimate, for instance, triggering a covid-19 pandemic relief fund with a capitalisation of two billion pula from the government [31]. this was to be distributed according to four strategic objectives: for wage subsidies for business sectors with a few exceptions for the industries which continue their activity, for stabilising businesses by offering, for instance, government loan guarantees and making tax concessions, then for ensuring strategic reserves and for promoting opportunities for the sectors which can upscale their local production. the educational system was also highly affected, as the ministry of basic education signalled. it was estimated a framework for covid-19 pandemic intervention modelling and analysis for policy formation support in botswana 50 december 2022 international journal on advances in ict for emerging regions that learners might find themselves in the situation of repeating their classes in 2021‒2022. the closure of schools is making the duration of the current school year insufficient to meet the minimum requirements for the number of school days in a school year. botswana could have had possible advantages in the local demographics and population distribution, as mentioned above. the population of the country is relatively young. as evidence from other countries shows, the most severe covid-19 cases are among the elderly groups of the population. due to the demographic situation in botswana, the total mortality rate in this country could be lower than in china or western europe. the population density in botswana is also lower than in europe or china. but the healthcare system was insufficiently prepared to provide the necessary equipment and care. it was estimated that botswana has approximately 100 icu fully equipped beds and 2,000 overall available hospital beds. there are also associated comorbidities such as malaria, hiv/aids, and tuberculosis which can influence the number of potentially severe cases. each alternative of a set of covid-19 risk mitigation measures has implications for socioeconomic development in the country. therefore, the needs of various social groups and stakeholders should be considered while drafting policy measures and action plans for future pandemic risk mitigation and management. as the evidence on current pandemic risk management shows, there has been an astonishing lack of coordinated actions. also, no vision was developed for handling a similar or even a more serious event in the future. there are no entirely value-neutral policy plans. opting for the most popular vision and choosing a seemingly reasonable path ultimately requires tackling medical and financial considerations, as well as differing societal preferences together, rather than as separate issues. 
understanding of preferences from various stakeholder groups such as policymakers, industry, young community, civil society, and academia, contributes significantly to social acceptance of risk mitigation measures. guided by the hypothesis that contributions to such a development and preferences amongst societal stakeholders are just as important as medical or regulatory issues, a complete decision support model should address benefits and costs, perceptions and preferences, potentially arising conflicts between stakeholder groups and political requirements of different mitigation pathways. iii. available measures the measures to contain the spread of the covid-19 virus have been largely based on various epidemiologic risk assessments, which were made primarily by centres of disease control and prevention in europe and the us, and by the world health organization. these assessments established scenarios starting from the number of confirmed infections in a country, with every scenario having a series of recommendations on containment measures to use in order to limit the spread of the virus. aside from the increased healthcare and treatment efficiency efforts, these nonpharmaceutical interventions are layered progressively, starting from more low-cost measures to isolating individuals confirmed positive with the virus to, eventually, more invasive and costly social distancing measures. countries have taken different approaches as to which set of measures to introduce and when. some countries, such as japan, did initially mainly focus on contact tracing and testing, recommending people restrict their travels and teach and work from home. sweden chose to cancel public events and restrict public transport but did not close primary schools or workplaces while recommending people to keep a social distance. south korea had a similar approach, but with a more intensive contact tracing using digital systems. interestingly, taiwan, in spite of its proximity to china, had one of the lowest stringency levels [32] since they did not close down schools, workplaces, or public transport, and did mostly focus on tracing and isolating measures. taiwan’s experience with the 2003 sars epidemic could account for a series of quick decisions involving traveller screening, wide distribution of masks, hand sanitizers, and thermometers [33], as well as investing approx. usd 6.8 million into the manufacturing sector to create 60 new mask production lines. however, there was a dominant approach that seems to have been preferred by several countries including romania, austria, denmark, norway, germany, italy, and many others. this approach adopted extreme social distancing measures going from case quarantine and public gathering bans to partial lockdowns. furthermore, the approach included closing schools, public transport, and many workplaces, only allowing people to leave their homes for specific purposes, with a tighter curfew imposed on the elderly. these measures have been defended for their short-term capacity to reduce the rate of transmissibility and to flatten the epidemic curve as much as possible in order to primarily keep the hospital systems from getting overburdened. there are several challenges with modelling the effects of risk mitigation measures. one challenge is connected with epidemiologic models which do not take into consideration demographics, distribution of population, age groups, and their interaction patterns, such as the classic seir model. 
furthermore, there is limited evidence included in currently used models [6] on how each measure reduces the rate of transmissibility. it has already been argued that “the incremental effect of adding another restrictive measure is only minimal and must be contrasted with the unintended negative effects that accompany it” [34]. we are beginning to know more about the effectiveness of some measures. for instance, combining case quarantine with other public health measures is shown to be more effective than relying on case quarantine alone. there is also some evidence that wearing masks [35] reduces transmissibility and is most effective when compliance is high, at the same time substantially reducing both the death toll and the economic impact. wearing them at a rate of 96% could alone flatten an epidemic growing at a rate of 0.3/day by bringing the r factor (virus reproduction) down from an original value of 3.68 to 1.00 or less (an illustrative calculation is sketched at the end of this section). when combined with contact tracing, the two effects multiply positively [36]. but what about other measures? how effective is it to close schools or borders, or to restrict certain workplace activities? how much can a country build up its healthcare system during the restriction period? there is little point in restricting businesses if corresponding measures are not taken; in that case, the costs of lockdowns could be much higher than the costs of taking less extreme suppression measures. since there are no clear predictions of how long the pandemic will last and when vaccines will be approved and available on markets globally, the societal costs associated with these extreme measures have become a matter of debate and discussion. these costs are enormous, associated with unemployment, low productivity in affected industries, limited trade and mobility, rising inequality, and increased risk of poverty as well as threats to food security and risk of hunger in a number of developing countries. however, few countries base their decisions to adopt a set of measures on adequate economic simulations, and the macro-level projections made by the imf [37] and oecd [38] estimate that the lockdowns will affect one-third of the developed countries’ gdps. this makes economic mitigation approximate at best and vulnerable to the many unknown side-effects which might not be possible to model at a country level. the response measures to the pandemic have to be analysed at specific local levels, to be seen in relation to the demographic, social, and economic conditions and practices, healthcare systems capacity, and stakeholder needs. botswana’s early suppression was an advantage from this point of view, as its low numbers of infections and deaths make way for a variety of pandemic hazard scenarios that can be considered for the future. there are various possibilities to combine measures so as to see their different effects in reducing the rate of transmissibility, while also looking at their different consequences under other criteria, including indirect deaths in different groups, inhibited work capacity in the short and long term, social costs, fear, democracy, and human rights aspects.
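as a rough illustration of the mask-compliance claim above, one simple approximation (our own simplification, not the model used in [35‒36]) assumes that a mask reduces both the wearer’s emission and inhalation by a factor m, so that with adoption p the effective reproduction number becomes r_eff ≈ r0·(1 − p·m)²:

```python
# illustrative sketch only: a simplified mask model, not the analysis of [35-36].
# assumes masks cut both emission and inhalation by a factor `efficacy` for wearers.
r0 = 3.68          # reproduction number without masks (value quoted in the text)
adoption = 0.96    # share of the population wearing masks
efficacy = 0.50    # assumed per-wearer reduction in transmission (hypothetical)

r_eff = r0 * (1 - adoption * efficacy) ** 2
print(f"effective reproduction number: {r_eff:.2f}")   # ~1.00
```

under these assumed numbers, roughly 50% effective masks worn by 96% of people are just enough to bring the reproduction number from 3.68 down to about 1.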
the assumptions which serve as a basis for the predictions are that there is no change in behaviour and that preventive measures were put in place at one specific point in time. it is also assumed that the overall effect of preventive measures is known. the effects are estimated from the observed increase in doubling time after preventive measures are put in place. the predictions are highly sensitive to the doubling times with and without preventive measures, sensitive to the basic reproduction number r0, but less sensitive to the estimates used for time calibration: the observed number of case fatalities, the typical time between infection and death, and the infection fatality risk [39]. a more complete framework could include more phases than the scenario generation using strategies that constitutes the basis for an initial analysis; that scenario generation was the focus of the work discussed in this paper, where we applied varieties of the seir model for modelling the effects of various risk mitigation measures. other activities could consider other criteria and a more complete multi-criteria decision analysis (mcda) methodology to identify the preferences of various stakeholder groups. the stakeholders’ preferences should then be collected with the help of focus group discussions and decision-making experiments. another important issue is to validate the results and develop policy recommendations through various methods of participatory governance. this could include key informant interviews (face-to-face or telephone), in collaboration with local actors, to collect a set of narratives from relevant stakeholders (developers, the public, public authorities at a local scale, local experts, ngos, as well as enterprises). using qualitative data analysis software, such as nvivo and atlas.ti, we could analyse the narratives to identify dominant and oppressed discourses in both sectors. applying the lens of the theory of plural rationalities, we could identify differences in views about the covid-19 issues in botswana, as well as areas of conflict. the interviews and a literature review help to adjust the weights and the valuations and to have a more thorough discussion regarding the preferences and input data in the model.
a. a seir model for botswana
in epidemiology, so-called seir (or sir) models are very commonly used to represent the spread of disease in a population. based on our review of comparative studies of various simulation models [63], and our conclusions about the available measures within the botswana context described in section iii, we opted to use the seir model. the seir model provided the necessary flexibility to assess spread including the selection bias in testing, which can contribute towards better accuracy on unreported cases, which in turn established the necessary projections for measures on the untested infectious part of the population. furthermore, the seir model with its visualisation and modelling extensions has been used in several countries and states that have similar socio-economic environments and health care systems to botswana. the population is divided into four compartments (three in the sir variant): susceptible (s), exposed (e), infected (i), and recovered (r), and in some models also dead (d). in these models, a system of coupled differential equations governs the flows between the different compartments over time: people who are exposed to the virus move from s to e, become infectious and move from e to i, and people who recover (or die) move from i to r.
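to make the compartment flows concrete, the following is a minimal, self-contained sketch of an unstructured seir model with daily euler steps; the parameter values are placeholders for illustration and are not the calibrated, age-structured inputs used in the anylogic model described below.

```python
# minimal seir sketch (illustrative only; placeholder parameters, not the
# calibrated, age-structured inputs of the anylogic model described below)
def seir(population=2_350_000, beta=0.52, incubation_days=5.0,
         infectious_days=7.0, i0=10, days=365):
    sigma = 1.0 / incubation_days        # rate of leaving the exposed compartment (e -> i)
    gamma = 1.0 / infectious_days        # removal rate (i -> r); implied r0 = beta / gamma
    s, e, i, r = population - i0, 0.0, float(i0), 0.0
    trajectory = []
    for _ in range(days):                # simple daily euler steps
        new_exposed    = beta * s * i / population   # s -> e
        new_infectious = sigma * e                   # e -> i
        new_removed    = gamma * i                   # i -> r
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        trajectory.append((s, e, i, r))
    return trajectory

peak_infectious = max(state[2] for state in seir())
print(f"peak number of simultaneously infectious people: {peak_infectious:,.0f}")
```

the age groups, risk groups, seasonality, and contact matrix described below would enter through the transmission term, turning the scalar beta into an age-structured, time-dependent quantity.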
system dynamics is a natural choice for implementing models simulating transmission processes, since the methodology presupposes a holistic approach and focuses on how the parts in the system affect each other with reinforcing or balancing feedback loops [40‒41]). a common seir model operates on the following parameters: individual mortality; disease spread rate; recovery rate and the mean infection time, rate of movement from the exposed class to the infectious class and the mean latency period, and the basic reproduction r0 [42]. to model an age-specific spread of covid-19 in the population, the four compartments (s, e, i, and r) are divided into three age groups 0‒14 year old, 15‒64 year old, and 65+ year old. these age groups are then further divided into a nonrisk group and a risk group. the transmission rate of the virus is governed by a timedependent infectiousness (seasonality), age-group specific infectivity, a severity-specific reduction (undetected, mild, severe, and critical) of infectivity, an age-group specific 3x3 contact matrix, and two physical distancing measures: quarantine and contact reduction. the seasonality of the virus reflects that the virus is more infectious during a specified period. the seasonality is based on three parameters: the period, peak day (the day of the year with the highest infectiousness), and the amplitude of the seasonality. see fig. 1 for an overview of the model and its parameters. during planning for intervention measures against outbreaks of pandemics, various computer-based support tools are commonly used. for instance, in sweden, the national board of health and welfare has supported research and development of a decision support tool to complement the individual-based, total population model microsim [43]. the a framework for covid-19 pandemic intervention modelling and analysis for policy formation support in botswana 52 december 2022 international journal on advances in ict for emerging regions primary requirements for tools of this kind were that they should support scenario analysis, i.e. to run “what-if” experiments and that the tools should be implemented quickly and be easy to adapt. during the latter decade, various simulation environments have emerged, such as anylogic, enabling swift usage of generic seir modelling which has been employed in some recent studies, including studies of the corona sars-cov-2, mers, and the zika virus [44‒45]. there are several studies investigating specific performance aspects of interventions against pandemics but they are most often limited to a single scenario, as well as seldom being designed to explicitly acknowledge the inherent uncertainties in both simulation results and scenario likelihoods. we have previously applied a dynamic multicriteria decision analysis approach to synthesise outcome predictions from multiple models and explicitly elicit and imbed stakeholder preferences into decision recommendations [46]. utilising such an approach, dynamic, comprehensive, and transparent decision-making is supported. for the epidemiologic model adjusted to botswana’s case, the input data requires the following country-specific information: 1. population size in country/region/city divided into age groups. 2. households, age distribution, health sector capacity. 3. morbidity in the population per age group (for instance presence of risk factors in each age group). 4. number of confirmed cases per day, divided per age group and case severity (like intensive care and hospitalisation). 5. 
number of tested people, number of positive cases, and deaths from covid-19. 6. for increased granularity, number of average contacts (with other people) per day for each age group and with other age groups (for instance, school kids mostly interact with other school kids). 7. current medical system capacity (no. of icu beds, ventilators, medication, testing capacity) and estimated ability to increase it (how much and in what timeframe). 8. population access to personal hygiene, including water, soap, disinfectants, and face masks. we could, e.g., consider one of three alternative sets of measures to contain the spread of the sars-cov-2 virus, as they had at the time of this modelling been (1) already implemented by countries, as described above in the available measures section and monitored by the oxford covid-19 government response tracker (oxcgrt) [47] (2) modelled by the imperial college london team of ferguson et al. [6] and (3) modelled in other african countries. after the data collection phase, it should be established which set of measures could be applied, considering the available level of specificity for each set, as well as botswana’s characteristics and capacity. for instance, a measure aiming to reduce the elders’ social contact might not be effective in protecting the vulnerable categories in botswana, as the population is generally younger than in the uk or italy. a measure relying on the extensive use of face masks again depends on the availability of such on the market and on the country’s capacity to invest in their rapid production, as well as on their affordability once they are on the market. a realistic set of measures should of course be chosen for an integrated model. therefore, the alternative sets of measures considered are the following: measure set (1): level 1: only pharmaceutical measures and case isolation level 2: measures from level 1 and personal protective measures (stay home when sick, wash hands, observe prudent respiratory etiquette, clean frequently touched surfaces daily, use face masks), mild social distancing measures (large public gatherings banned, work from home where possible, social distancing recommended, possible social network-based distancing strategies [48]) level 3: measures from level 2, but with social distancing imposed, including a partial lockdown – schools, universities, restaurants and large shopping centres are closed. people can still go out for their basic necessities, work, use public transport ‒ partial lockdown (based on austrian and romanian models) [49] level 4: full lockdown, when everything is closed and people are not allowed to go out or are not allowed to go out after a certain time of the day, such as having a curfew after 6 pm (based on models from some cities in romania and russia as well as in jordan) measure set (2) [50]: level 1: an unmitigated epidemic – a scenario in which no action is taken. level 2: mitigation including population-level social distancing –aiming at a uniform reduction in the rate at which individuals contact one another, short of complete suppression. level 3: mitigation including enhanced social distancing of the elderly – as level 2 but with individuals aged 70 years old and above reducing their social contact rates by 60%. 
level 4: suppression – exploring different epidemiological triggers (deaths per 100,000 inhabitants) for the implementation of wide-scale intensive social distancing (modelled as a 75% reduction in interpersonal contact rates) with the aim to rapidly suppress transmission and minimise near-term cases and deaths. measure set (3): level 1: sectors permitted: all sectors open. transport restrictions: all modes of transport allowed, with stringent hygiene conditions in place. movement restrictions: interprovincial movement allowed, with restrictions on international travel. level 2: sectors permitted: construction, all other retail, all other manufacturing, mining, all other government services, installation, repairs and maintenance, domestic work and cleaning services, and informal waste-pickers. transport restrictions: domestic air travel restored, car rental services restored. movement restrictions: movement between provinces at level 1 restrictions. 53 ekenberg, l. et al. international journal on advances in ict for emerging regions december 2022 level 3: sectors permitted: licensing and permitting services, deeds offices and other government services designated by the minister of public service and administration, take-away restaurants and online food delivery, retail within restricted hours, clothing retail, hardware stores, stationery, personal electronics, and office equipment production and retail, books, and educational products, e-commerce and delivery services, clothing and textiles manufacturing (at 50% capacity), automotive manufacturing, chemicals, bottling, cement and steel, machinery and equipment, global business services, construction, and maintenance. transport restrictions: bus services, taxi services, e-hailing, and private motor vehicles may operate at all times of the day, with limitations on vehicle capacity and stringent hygiene requirements, limited passenger rail restored, with stringent hygiene conditions in place, limited domestic air travel, with a restriction on the number of flights per day and authorisation based on the reason for travel. movement restrictions: no inter-provincial movement of people, except for transportation of goods under exceptional circumstances (e.g. funerals). level 4: sectors permitted: all essential services, plus food retail stores already permitted to be open permitted may sell a full line of products within the existing stock, all agriculture (horticulture, export agriculture including wool and wine, floriculture and horticulture, and related processing), forestry, pulp and paper, mining (open cast mines at 100% capacity, all other mines at 50%), all financial and professional services, global business services for export markets, postal and telecommunications services, fibre optic and it services, formal waste recycling (glass, plastic, paper and metal). transport restrictions: bus services, taxi services, e-hailing and private motor vehicles may operate at all times of the day, with limitations on vehicle capacity and stringent hygiene requirements. movement restrictions: no inter-provincial movement of people, except for transportation of goods under exceptional circumstances (e.g. funerals). level 5: sectors permitted: only essential services. transport restrictions: bus services, taxi services, e-hailing and private motor vehicles may operate at restricted times, with limitations on vehicle capacity and stringent hygiene requirements. 
movement restrictions: no interprovincial movement of people, except for transportation of goods under exceptional circumstances (such as funerals). b. results from the seir model it should be emphasised that the model we have used in this framework is quite simple despite there being a large number of models around. there are nevertheless strong reasons to keep as much as possible as simple as possible. the more input parameters we have, the more diffuse everything becomes if we cannot make them local due to the already enormous state space. the big challenge here is rather to get the input data realistic since there are still many critical uncertainties with covid-19 and models with higher complexity than the training and validation data should be used very sparingly as decision bases. in the example simulation (in anylogic 8) below, the input parameters were the following (see fig. 1): infected (days): number of days an individual is infected and infectious. exposed (days): number of days between an individual gets infected and becomes infectious. infectivity 0‒14: a parameter used to calibrate the risk of people in age group 0‒14 getting infected. infectivity 15‒64: a parameter used to calibrate the risk of people in age group 15‒64 getting infected. infectivity 65+: a parameter used to calibrate the risk of people in age group 65+ getting infected. amplitude: the amplitude of the seasonality. peak day: the day with the highest infectiousness during the year (in days from january 1) [51]. infectivity (% of infectiousness) the reduction in % of infectiousness for undetected, mild, severe, and critical cases. population: the total population. % of the total population 0‒14: age group 0‒14’s share of the total population. 15‒64: age group 15‒64’s share of the total population. 65+: age group 65+’s share of the total population. 0‒14 rg: the share of people in the age group 0‒14 who belong to a risk group. 15‒64 rg: the share of people in the age group 15‒ 64 who belong to a risk group. 65+ rg: the share of people in the age group 65+ who belong to a risk group. quarantine (% of days infected) the % of the infected period for undetected, mild, severe, or a critical case in quarantine. severity profile the share of each age group who are undetected, mild, severe, or critically infected. period (1 or 2): checkbox used to enable the policy. year (2020 or 2021): the year the policy should be enabled. start day: the start day of the policy (day of the year). end day: the end day of the policy (day of the year). a framework for covid-19 pandemic intervention modelling and analysis for policy formation support in botswana 54 december 2022 international journal on advances in ict for emerging regions fig. 1 input values to the seir simulation model: general parameters, infectivity, demographics and risk groups, social distancing, quarantine days and severity profiles 55 ekenberg, l. et al. international journal on advances in ict for emerging regions december 2022 the results from the basic assumptions are provided in fig. 2 below. this is, however, based on an incomplete data set that must be adjusted and adapted to different regions, in particular since sars-cov-2 does not seem to behave like, e.g., seasonal influenza, but is acting more local in comparison. the particular conditions in botswana cannot really be compared in a simple way, and the micro and meso perspectives must play an important role. fig. 
2 output from the seir simulation model for botswana: covid-19 undetected and detected incidences c. further socioeconomic modelling aspects for the socioeconomic analysis information about households, age distribution, health sector capacity and other input factors are needed: 1. a complete economic input-output table (should be in a format similar to tables from eurostat) for the economy of botswana. the sector classification can be different. 2. national accounting data (including sector accounts with non-financial balance sheets and government statistics). 3. some data that can be used as a proxy for the sectorial demand shock due to covid-19 (the number of unemployed due to the lockdown as a proxy). 4. population access to the internet, divided into age groups and occupation if possible (to see where and if remote work can be used). 5. educational system data, including lost school time, test score outcomes, how many are affected, what kinds of long-term effects and what the mitigation plans are and known effects. 6. population at risk of poverty and informal economy size. 7. baseline criminality rates (thefts and domestic violence in particular). 8. mitigation measures that have been in effect and others that are being considered. 9. communication strategy for covid-19 information. additionally, business demographics data would be useful but is not absolutely essential. an initial rough evaluation of the number of fatalities, costs and effects of 3‒4 categories of mitigation scenarios could be a starting point. this can be an initial step to produce an estimate of how many lives in botswana can be saved and what will be the direct shortand long-term costs of risk mitigation measures. a multi-criteria decision analysis should include collected data following a criteria setup that is subject to refinement when gathering more available evidence: 10. epidemiological and healthcare systems: direct fatalities, indirect fatalities; 11. economic aspects: short-term costs, unemployment, taxes, specific industries affected, growing industries; 12. social and behavioural aspects: criminality rates, domestic violence, mental health, education and training, social division, trust in government; 13. environmental: climate change, pollution; 14. long-term resilience: remote work and education, improving prevention and hazard response, social inclusion and coping with loneliness; 15. political: risk of shortand long-term abuses, citizen dissatisfaction. a framework for covid-19 pandemic intervention modelling and analysis for policy formation support in botswana 56 december 2022 international journal on advances in ict for emerging regions d. multi-criteria decision modelling and analysis a multi-criteria decision analysis (mcda) framework should be supported by elaborated decision analytical tools and processes: a framework for elicitation of stakeholder preferences, a decision engine for strategy evaluation, a set of processes for negotiation, a set of decision rule mechanisms, processes for combining these items and various types of implementations of the above. these components apply to decision components, such as: agenda settings and overall processes, stakeholders, goals, strategies/policies/sub-strategies/part-policies, etc., consequences/effects, qualifications and sometimes quantifications of the components, negotiation protocols and decision rules and processes. 
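purely as an illustration of how such decision components could be organised before elicitation and evaluation, the sketch below groups the criteria listed in the previous subsection and a few candidate strategies from section iii; the structure itself is our own hypothetical example, not the internal representation used by any particular tool.

```python
# illustrative organisation of decision components; names follow the text,
# the structure itself is a hypothetical sketch
criteria = {
    "epidemiological and healthcare": ["direct fatalities", "indirect fatalities"],
    "economic": ["short-term costs", "unemployment", "taxes",
                 "specific industries affected", "growing industries"],
    "social and behavioural": ["criminality rates", "domestic violence", "mental health",
                               "education and training", "social division",
                               "trust in government"],
    "environmental": ["climate change", "pollution"],
    "long-term resilience": ["remote work and education",
                             "improving prevention and hazard response",
                             "social inclusion and coping with loneliness"],
    "political": ["risk of short- and long-term abuses", "citizen dissatisfaction"],
}

# candidate strategies (alternatives) drawn from the measure sets in section iii
strategies = ["measure set 1, level 2", "measure set 1, level 3",
              "measure set 2, level 3", "measure set 3, level 4"]
```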
a multitude of methods for analysing and solving decision problems with multiple criteria and stakeholders has been suggested during the last decades. a common approach is to make preference assessments by specifying a set of attributes that represents the relevant aspects of the possible outcomes of a decision. value functions are then defined over the alternatives for each attribute, and a weight function is defined over the attribute set. one option is to simply define the weight function by fixed numbers on a normalised scale, define value functions that map the alternatives onto fixed values as well, and then aggregate these values to calculate the overall score of each alternative (a formula and code sketch of these ideas is given at the end of this discussion). one of the problems with the additive model, as with other standard multiple-criteria models, is that numerically precise information is seldom available, and most decision-makers experience difficulties in entering realistic information when analysing decision problems, since they are faced with eliciting exact weights with an unreasonable exactness that does not exist. the common lack of reasonably complete information increases this problem significantly. several attempts have been made to resolve this issue. methods allowing for less demanding ways of ordering the criteria, such as rank orderings or interval approaches for determining criteria weights and values of alternatives, have been suggested, but the evaluation of these models is sometimes quite complicated and difficult for decision-makers to understand and accept. some main categories of approaches to remedy the precision problem are based on capacities, sets of probability measures, upper and lower probabilities, interval probabilities (and sometimes utilities), evidence and possibility theories, as well as fuzzy measures. the latter category seems to be used only to a limited extent in real-life decision analyses, since it usually requires a significant mathematical background on the part of the decision-maker; another reason is that computational complexity can be problematic if the fuzzy aggregation mechanisms are not significantly simplified. for the evaluations in the decision support model, a method and software for integrated multi-attribute evaluation under risk, subject to incomplete or imperfect information, should be used. the software used for our purposes originates from earlier work on evaluating decision situations using imprecise utilities, probabilities and weights, as well as qualitative estimates between these components, derived from convex sets of weight, utility and probability measures. to avoid some aggregation problems when handling set membership functions and the like, we introduced higher-order distributions for better discrimination between the possible outcomes [52]. for the decision structure, we use a common decision tree formalism but refrain from using precise numbers. to alleviate the problem of overlapping results, we suggest a new evaluation method based on the resulting belief mass over the output intervals, without introducing further complicating aspects into the decision situation. during the process, we consider the entire range of values the alternatives take across all criteria, as well as how plausible it is that an alternative outranks the remaining ones, which provides a robustness measure.
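as a point of reference for the discussion above, the standard additive model can be written as follows; this is a sketch of the textbook formulation, not of the decideit evaluation machinery:

V(a_j) \;=\; \sum_{i=1}^{n} w_i \, v_i(a_j), \qquad \sum_{i=1}^{n} w_i = 1, \quad w_i \ge 0, \quad v_i(a_j) \in [0,1]

where w_i is the weight of criterion i and v_i(a_j) the value of alternative a_j under that criterion; the alternative with the largest aggregated score V(a_j) is preferred. under interval statements, w_i and v_i(a_j) are only constrained to lie in intervals, and the evaluation has to consider the whole set of feasible scores.

the following python sketch then illustrates one common surrogate-weight family (rank order centroid weights) together with a simple interval-based sensitivity check. the strategies, criteria values and interval radius are hypothetical placeholders, and the sketch is not the algorithm used in decideit or in [60]; it only makes the rank-ordering and interval ideas tangible.

# sketch: rank-order centroid (roc) surrogate weights plus a simple interval sensitivity check.
# strategies, criteria values and the interval radius are hypothetical; this is not the decideit algorithm.

def roc_weights(n):
    """surrogate weights for n criteria ranked from most to least important: w_i = (1/n) * sum_{k=i..n} 1/k."""
    return [sum(1.0 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]

def additive_score(values, weights):
    """standard additive aggregation: sum over criteria of weight * value."""
    return sum(w * v for w, v in zip(weights, values))

def interval_score(values, weights, radius=0.05):
    """lowest and highest additive score when every criterion value may vary by +/- radius."""
    lows = [max(0.0, v - radius) for v in values]
    highs = [min(1.0, v + radius) for v in values]
    return additive_score(lows, weights), additive_score(highs, weights)

if __name__ == "__main__":
    # criteria ranked by importance, e.g. epidemiological > economic > social > environmental > resilience > political
    weights = roc_weights(6)
    # hypothetical 0..1 values of three mitigation strategies under the six criteria
    strategies = {
        "light restrictions": [0.40, 0.80, 0.70, 0.60, 0.50, 0.70],
        "targeted lockdown":  [0.70, 0.55, 0.55, 0.60, 0.60, 0.55],
        "full lockdown":      [0.90, 0.25, 0.35, 0.65, 0.55, 0.40],
    }
    for name, values in strategies.items():
        low, high = interval_score(values, weights)
        point = additive_score(values, weights)
        print(f"{name:20s} point={point:.3f}  interval=[{low:.3f}, {high:.3f}]")
    # overlapping score intervals indicate that the ranking is not robust under the assumed imprecision.

here only the criterion values are perturbed; placing intervals (or higher-order distributions) over the weights themselves and computing belief masses over the resulting score intervals, as described above, requires substantially more machinery than this sketch.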
because of the complexity of these calculations, we use the state-of-the-art multi-criteria software tool decideit 3.0 for the analysis, which allows for imprecision of the kinds that exist in this case. decideit is based on patented algorithms [53], and several versions have been successfully used in a variety of decision situations, such as large-scale energy planning [54], allocation planning [55], demining [56], financial risks [57], gold mining [58] and many others [59]. as mentioned above, a problem with most models for criteria rank ordering is that numerically precise information is seldom available. we have solved this in part by introducing surrogate weights [60]. this, however, is only part of the solution, since the elicitation can still be uncertain and the surrogate weights might not be a fully adequate representation of the preferences involved, which, of course, is a risk with all kinds of aggregation. to allow for analyses of how robust the result is to changes in the input data, we also introduced intervals around the surrogate weights as well as around the values of the options. thus, in this elicitation problem, the possibly incomplete information was handled by allowing the use of intervals [61], where ranges of possible values are represented by intervals in combination with surrogate weights. using the weighted aggregation principle, we combined the multiple criteria and stakeholder preferences with the valuation of the different options under the criteria surrogate weights. the results of the process were (i) a detailed analysis of each option's performance compared with the others, and (ii) a sensitivity analysis to assess the robustness of the result. during the process, we considered the entire range of values the alternatives take across all criteria, as well as how plausible it was that an alternative would outrank the remaining ones, and this provided a robustness measure.

e. multi-criteria decision modelling and analysis

by a co-creation process, we mean an adaptive and inclusive approach to participatory governance, based on the engagement and involvement of various stakeholder groups. it recognises human factors such as individual patterns of decision-making as well as cognitive and behavioural biases, institutional structures, perceptions of risks, benefits and costs of various policy interventions, and a need for compromise-oriented solutions that bring together a heterogeneity of views and a variety of voices. the involvement of stakeholders in decision-making processes and model development is essential for conforming to stakeholder requirements. for this, a number of techniques may be employed, providing a value-oriented prioritisation to meet the demands and the environment of the stakeholders better than other techniques. we could then employ a preference-based approach, relying on techniques and models from the decision-analytic field aimed at eliciting users' values by studying their preferences and gathering preferential data from several stakeholders or prospective users, in order to reach a selection of features providing maximum value while staying within the resources available [62].

v. conclusions

our approach is situated within the wider field of the social shaping of technology, a basic premise being that the transformation of technologies and technical systems is not determined by any scientific, technological or economic rationality.
rather, a wide range of social, political and institutional factors interact in a systemic fashion to influence their development, turning them into socially transformed information systems that assist us in making precise decisions. compared to other research of the pre-covid era, our research advances the methodology of risk governance under conditions of severe uncertainty by including a multidisciplinary aspect and developing compromise-oriented solutions that apply inputs from various sciences. the previous pre-covid studies were based either on an epidemiological background or on social or policy studies; our methodological framework allows these areas to be unified. it also addresses long-term, long-lasting risks, for which scientific evidence is seldom available, since the majority of disaster risk reduction work focuses on risks with immediate impacts. the result of an extended governmental project would define a blueprint for how decisions are made, implemented and scaled up, and for how data become information and their underlying technologies lead to novel delivery of the right decisions for the public. by default, this approach respects the culture and new knowledge on how hybrid decision-related services can be introduced into a wider private-public partnership ecosystem. the social impact of a more innovative project could provide a more harmonised, mutually efficient interaction between the administration and the greater public via the proposed technical means, while addressing:
- real needs from real users addressed;
- a better fit between problem and solution;
- larger support for the proposed measures and more sustainable adoption;
- new ideas and opportunities spotted, debated and created.

such a project would garner insight into how to optimise hazard management options in relation to covid-19-induced hazards. stakeholders would become more aware of the availability of different management options regarding each of the hazards pertinent to their communities, as well as of the impact of their preferences on risk management and on the broader society. this would probably facilitate improvements in resilience also with regard to future extreme hazard events, particularly in a multi-hazard context, deliver effective solutions for a multi-stakeholder planning approach, and strengthen policy coherence by identifying management options, thereby contributing to a more resilient region. the management options can be communicated to stakeholders, which can also be used to gather feedback on how they perceive these options and to determine the possible opportunities and constraints from their viewpoint. the participatory approach of engaging different stakeholders would help to ensure their buy-in and encourage them to take the final results on board. societal actors at all levels can acquire rich and deep insights into how their actions and the actions of others contribute to the escalation or mitigation of extreme hazards. a common understanding of future challenges should be shared among different stakeholders. recommendations on how to develop optimal hazard management can help shed light on similar challenges faced now and in the future. a limitation of our model is that it does not support adequate calculations of trade-offs between different criteria. furthermore, transparency must be considered when handling critical situations.
should there be other types of mitigation measures, and even social constructs, so that underprivileged groups could be better protected? furthermore, when imposing hard mitigations, countries suffer from the socioeconomic effects of pandemics, increasing poverty and inequality. this must be discussed in advance among broader populations. we have already started to develop such trade-off support features, some of which have been implemented in the tool helision (https://www.preference.nu/helision/), providing graphical support for such analyses. further research includes developing automated and interactive questionnaires so that respondents will be able to see the results of their answers more directly and refine them in real time. another line of research is to further develop interactive support for users to state preference structures in a way that is even better aligned with their "real" preferences, which they might not even be aware of in advance. this might be done through sequences of questions for internal consistency checks.

references

1. world health organization, "virtual press conference on covid-19, 11 march 2020", url: https://www.who.int/docs/default-source/coronaviruse/transcripts/who-audio-emergencies-coronavirus-press-conference-full-and-final-11mar2020.pdf?sfvrsn=cb432bb3_2
2. f. amanat and f. krammer, "sars-cov-2 vaccines: status report", immunity, vol. 52, issue 4, pp. 583-589, 2020.
3. k. g. andersen, a. rambaut, w. i. lipkin, e. c. holmes and r. f. garry, "the proximal origin of sars-cov-2", nature medicine, 26, pp. 450-452, 2020.
4. the research blog of iiasa nexus, "explaining the covid-19 outbreak and mitigation measures", 10 march 2020, url: https://blog.iiasa.ac.at/2020/03/10/explaining-the-covid-19-outbreak-and-mitigation-measures/
5. s. roberts, "embracing the uncertainties", the new york times, 7 april 2020, url: https://www.nytimes.com/2020/04/07/science/coronavirus-uncertainty-scientific-trust.html
6. n. m. ferguson, d. laydon, g. nedjati-gilani et al., "report 9: impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand", imperial college london, 16 march 2020, url: https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/imperial-college-covid19-npi-modelling-16-03-2020.pdf
7. d. ellsberg, "risk, ambiguity, and the savage axioms", quarterly journal of economics, 75(4), pp. 643–669, 1961.
8. t. kuran and c. sunstein, "availability cascades and risk regulation", stanford law review, vol. 51, no. 4, 1999.
9. a. tversky and d. kahneman, "availability: a heuristic for judging frequency and probability", cognitive psychology, vol. 5, issue 2, pp. 207-232, 1973.
10. r. b. morton, d. mueller, l. page and b. torgler, "exit polls, turnout, and bandwagon voting: evidence from a natural experiment", european economic review, 77, pp. 65–81, 2015, doi:10.1016/j.euroecorev.2015.03.012.
11. m. bar-hillel, "the base-rate fallacy in probability judgments", acta psychologica, 44(3), pp. 211–233, 1980, doi:10.1016/0001-6918(80)90046-3.
12. c. sunstein, "probability neglect: emotions, worst cases, and law", chicago law & economics, olin working paper no. 138, 2001.
13. a. tversky and d. kahneman, "the framing of decisions and the psychology of choice", science, 211(4481), pp. 453–458, 1981.
14. m. e. turner and a. r. pratkanis, "twenty-five years of groupthink theory and research: lessons from the evaluation of a theory", organizational behavior and human decision processes, 73(2–3), pp. 105–115, 1998.
15. c. e. lindblom, "the handling of norms in policy analysis", in m. abramovitz et al. (eds.), the allocation of economic resources: essays in honor of bernard francis haley, stanford, california: stanford university press, 1959.
16. h. simon, "models of bounded rationality: behavioral economics and business organization", vol. 2, mit press, cambridge, ma, 1982.
17. d. kahneman and a. tversky, "prospect theory: an analysis of decision under risk", econometrica, 47, pp. 263‒291, 1979.
18. b. chu, "what is 'dread risk' – and will it be a legacy of coronavirus?", independent, 16 june 2020, url: https://www.independent.co.uk/news/business/comment/dread-risk-coronavirus-legacy-psychology-probability-economy-a9568696.html
19. p. slovic, "perception of risk", science, 236(4799), pp. 280–285, 1987.
20. état de vaud, "hotline et informations sur le coronavirus", url: www.vd.ch/coronavirus
21. gazzetta ufficiale della repubblica italiana, anno 161, numero 62, 9 march 2020, url: https://www.gazzettaufficiale.it/eli/gu/2020/03/09/62/sg/pdf
22. johns hopkins university of medicine, coronavirus resource center, url: https://coronavirus.jhu.edu/map.html
23. a. france-presse, "financial crisis caused 500,000 extra cancer deaths, according to lancet study", the telegraph, 26 may 2016, url: https://www.telegraph.co.uk/news/2016/05/25/financial-crisis-caused-500000-extra-cancer-death-according-to-l/
24. oecd, "the territorial impact of covid-19: managing the crisis and recovery across levels of government", 10 may 2021, url: https://www.oecd.org/coronavirus/policy-responses/the-territorial-impact-of-covid-19-managing-the-crisis-and-recovery-across-levels-of-government-a2c6abaf/
25. africa center for strategic studies, "mapping risk factors for the spread of covid-19 in africa", 3 april 2020 (updated 13 may 2020), url: https://africacenter.org/spotlight/mapping-risk-factors-spread-covid-19-africa/
26. m. gilbert, g. pullano, f. pinotti, e. valdano, c. poletto, p-y. boëlle, e. d'ortenzio, y. yazdanpanah, s. p. eholie, m. altmann, b. gutierrez, m. u. g. kraemer and v. colizza, "preparedness and vulnerability of african countries against importations of covid-19: a modelling study", the lancet, vol. 395, issue 10227, 14 march 2020, url: https://www.thelancet.com/journals/lancet/article/piis0140-6736(20)30411-6/fulltext
27. world health organization, "covid-19 in africa: from readiness to response", url: http://whotogo-whoafroccmaster.newsweaver.com/journalenglishnewsletter/g65c7ca8gui
28. world health organization, "covid-19 situation update for the who african region: external situation report 8", 22 april 2020, url: https://apps.who.int/iris/bitstream/handle/10665/331840/sitrep_covid-19_whoafro_20200422-eng.pdf
29. london school of hygiene & tropical medicine, "strategies combining self-isolation, moderate physical distancing and shielding likely most effective covid-19 response for african countries", 21 april 2020, url: https://www.lshtm.ac.uk/newsevents/news/2020/strategies-combining-self-isolation-moderate-physical-distancing-and-shielding
30. bbc, "coronavirus: why lockdowns may not be the answer in africa", 15 april 2020, url: https://www.bbc.com/news/world-africa-52268320
31. republic of botswana, "covid-19 relief fund", url: https://www.gov.bw/covid-19-relief-fund
32. university of oxford, "oxford covid-19 government response tracker", url: https://covidtracker.bsg.ox.ac.uk/stringency-scatter
33. i. scher, "taiwan has only 77 coronavirus cases. its response to the crisis shows that swift action and widespread healthcare can prevent an outbreak", insider, url: https://www.businessinsider.com/coronavirus-taiwan-case-study-rapid-response-containment-2020-3
34. b. nussbaumer-streit, v. mayr, a. iulia dobrescu, a. chapman, e. persad, i. klerings, g. wagner, u. siebert, c. christof, c. zachariah and g. gartlehner, "quarantine alone or in combination with other public health measures to control covid‐19: a rapid review", cochrane database of systematic reviews, 2020, url: https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.cd013574/full
35. j. howard, a. huang, z. li, z. tufekci, v. zdimal, h-m. van der westhuizen, a. von delft, a. price, l. fridman, l-h. tang, v. tang, g. l. watson, c. e. bax, r. shaikh, f. questier, d. hernandez, l. f. chu, c. m. ramirez and a. w. rimoin, "face masks against covid-19: an evidence review", preprints, 10 april 2020, url: https://www.preprints.org/manuscript/202004.0203/v1
36. l. tian, x. li, f. qi, q-y. tang, v. tang, j. liu, z. li, x. cheng, x. li, y. shi, h. liu and l-h. tang, "calibrated intervention and containment of the covid-19 pandemic", arxiv, url: https://arxiv.org/abs/2003.07353
37. international monetary fund, "world economic outlook, april 2020: the great lockdown", url: https://www.imf.org/en/publications/weo/issues/2020/04/14/weo-april-2020
38. oecd, "oecd updates g20 summit on outlook for global economy", 27 march 2020 (updated 15 april 2020), url: http://www.oecd.org/newsroom/oecd-updates-g20-summit-on-outlook-for-global-economy.htm
39. t. britton, "basic estimation-prediction techniques for covid-19, and a prediction for stockholm", medrxiv, 15 april 2020, url: https://www.medrxiv.org/content/10.1101/2020.04.15.20066050v1.full.pdf
40. j. w. forrester, "principles of systems", cambridge, ma: productivity press, 1968.
41. j. d. sterman, "system dynamics modeling: tools for learning in a complex world", california management review, 43(4), 2001.
42. m. y. li and j. s. muldowney, "global stability for the seir model in epidemiology", mathematical biosciences, vol. 125, pp. 155‒164, 1995.
43. l. brouwers, m. camitz, b. cakici, k. mäkilä and p. saretok, "microsim: modeling the swedish population", arxiv:0902.0901, 2009, url: http://arxiv.org/abs/0902.0901
44. p. shi, y. dong, h. yan, c. zhao, x. li, w. liu, m. he, s. tang and s. xi, "impact of temperature on the dynamics of the covid-19 outbreak in china", science of the total environment, 728, 2020.
45. j. jang and i. ahn, "simulation of infectious disease spreading based on agent based model in south korea", advanced science and technology letters, vol. 128, pp. 53‒58, 2016.
46. t. fasth, a. talantsev, l. brouwers and a. larsson, "a dynamic decision analysis process for evaluating pandemic influenza intervention strategies", ispor value in health, vol. 20(9), 2017.
47. university of oxford, covid-19 government response tracker, url: https://www.bsg.ox.ac.uk/research/research-projects/coronavirus-government-response-tracker
48. p. block, m. hoffman, i. j. raabe, j. beam dowd, c. rahal, r. kashyap and m. c. mills, "social network-based distancing strategies to flatten the covid-19 curve in a post-lockdown world", arxiv:2004.07052, 2020, url: https://arxiv.org/ftp/arxiv/papers/2004/2004.07052.pdf
49. radio românia internaţional, "romania enters total lockdown", 25 march 2020, url: https://www.rri.ro/en_gb/romania_enters_total_lockdown-2614312
50. p. g. t. walker, c. whittaker, o. watson et al., "the global impact of covid-19 and strategies for mitigation and suppression", imperial college london, 2020, url: https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/imperial-college-covid19-global-impact-26-03-2020v2.pdf
51. r. a. neher, r. dyrdak, v. druelle, e. b. hodcroft and j. albert, "impact of seasonal forcing on a potential sars-cov-2 pandemic", medrxiv, 13 february 2020, url: https://www.medrxiv.org/content/10.1101/2020.02.13.20022806v1.full.pdf
52. m. danielson and l. ekenberg, "an improvement to swing techniques for elicitation in mcdm methods", knowledge-based systems, 2019.
53. m. danielson and l. ekenberg, "a support system for decision analysis", google patents, wo2005117531a2, url: https://patents.google.com/patent/wo2005117531a2/en
54. n. komendantova, l. ekenberg, l. marashdeh, a. al-salaymeh, m. danielson and j. linnerooth-bayer, "are energy security concerns dominating environmental concerns? evidence from stakeholder participation processes on energy transition in jordan", climate, 2018.
55. a. larsson, t. fasth, m. wärnhjelm, l. ekenberg and m. danielson, "policy analysis on the fly with an online multi-criteria cardinal ranking tool", journal of multi-criteria decision analysis, pp. 1–12, 2018, https://doi.org/10.1002/mcda.1634
56. l. ekenberg, t. fasth and a. larsson, "hazards and quality control in humanitarian demining", international journal of quality & reliability management, 35(4), pp. 897–913, 2018, doi:10.1108/ijqrm-01-2016-0012.
57. m. danielson and l. ekenberg, "efficient and sustainable risk management in large project portfolios", proceedings of bir 2018 (17th international conference on perspectives in business informatics research), springer, 2018.
58. a. mihai, a. marincea and l. ekenberg, "a mcdm analysis of the roşia montană gold mining project", sustainability, vol. 7, pp. 7261–7288, doi:10.3390/su7067261, 2015.
59. l. ekenberg, k. hansson, m. danielson and g. cars, "deliberation, representation, and equity: research approaches, tools, and algorithms for participatory processes", 384 pp., isbn 978-1-78374-304-9, open book publishers, 2017.
60. m. danielson and l. ekenberg, "an improvement to swing techniques for elicitation in mcdm methods", knowledge-based systems, 2019, https://doi.org/10.1016/j.knosys.2019.01.001
61. m. danielson, l. ekenberg and a. larsson, "evaluating multi-criteria decisions under strong uncertainty", to appear in a. de almeida, l. ekenberg, p. scarf, e. zio and m. j. zuo (eds.), multicriteria decision models and optimization for risk, reliability, and maintenance decision analysis: recent advances, springer, 2022.
62. m. danielson, l. ekenberg, n. komendantova, a. al-salaymeh and l. marashdeh, "a participatory mcda approach to energy transition policy formation", to appear in a. de almeida, l. ekenberg, p. scarf, e. zio and m. j. zuo (eds.), multicriteria decision models and optimization for risk, reliability, and maintenance decision analysis: recent advances, springer, 2022.
63. s. purkayastha, r. bhattacharyya, r. bhaduri, r. kundu, x. gu, m. salvatore, d. ray, s. mishra and b. mukherjee, "a comparison of five epidemiological models for transmission of sars-cov-2 in india", bmc (part of springer), 2021.