Microsoft Word - brAIN 3-4 Gabriela.doc 47 Browsing Semantic Data in Slovakia Ján Mojžiš Institute of Informatics, SAS, Bratislava, Slovakia upsyjamo@savba.sk Michal Laclavík Magnetic Media Online, New York, USA laclavik@magnetic.com Abstract Semantic data browsing is important task for open and governmental data in behalf of public control. There are many projects and solutions regarding semantic data browsing and navigation, but despite the fact, in Slovakia, the availability of such data is poor. It is a shame, because projects like National Action Plan of Open Government and the site data.gov.sk are already operating for several years. In this work we would like to point out key aspects of semantic data and detail the Slovak market of semantic data. We design and propose our solution of semantic data browsing, evaluate the implementation in our AGECRT NET tool. Keywords: semantic data, open data, navigation, visualization, RDF 1. Introduction Data on the Web are published in various formats. Either for common users, who tends to use popular web browsers in order to browse online data. In such case, browsers use HTTP protocol to browse HTML syntax formatted data. Other kind of data is semantic data. Here we can include semantic web formats like RDF. Actually, it is only a standard and the concrete implementation is then RDF/XML format. As for HTML data, also for RDF data, there are browsers. They are different browsers, but in general, serve the same goal of browsing. Nowadays, there is an initiative which press on publishers (the ones who put the data online) and encourage them to format their data in one of RDF formats. The benefits are apparent; machine readable format, clear syntax, suitable for batch and bulk machine processing. Here, the data is separated from the content. For comparison, the data in HTML format have the data mixed with the content. RDF format contains no content, but the data alone. We have found that most simple RDF format is RDF/N-Triple. The USA governmental project data.gov hosts datasets of different departments and states of the USA. However, while the USA governmental project Data.gov hosts a decent volume of 189,998 datasets, its Slovak counterpart data.gov.sk appears just as a poor cousin, offering “only” 624 datasets. Beside of that, the potential for Slovakia is promising, yet it seems to have completely missed the chance. Like U.S. governmental data.gov website service, which was launched in late May 2009, Slovak counterpart, data.gov.sk, is controlled by the Open Government Partnership Action Plan of the Slovak Republic (OGSK)1. The intention is common for both projects, to publish governmental open data. Yet data.gov.sk is missing several vital governmental datasets. In comparison to data.gov.sk, which grew from 47 datasets2 to more than 188,000 in just 6 years3, its Slovak counterpart, data.gov.sk, available from 2012, contains only 624 datasets. Because data.gov.sk was proposed as part of OGSK1, all governmental institutions (including cities) should participate with publishing. To 1 Open Government Initiative Action Plan of the Slovak Republic (2012). Retrieved from http://www.otvorenavlada.gov.sk/data/files/1853_ogp-action-plan-slovakia.pdf 2 Data.gov Tunns Six! (2015). Retrieved from https://gsablogs.gsa.gov/gsablog/2015/05/26/data-gov-turns-six/ 3 Data.gov datasets. (2015). Retrieved from http://catalog.data.gov/dataset BRAIN. Broad Research in Artificial Intelligence and Neuroscience Volume 6, Issues 3-4, December 2015, ISSN 2067-3957 (online), ISSN 2068 - 0473 (print) 48 illustrate the clumsiness of official institutions with data.gov.sk, there is a city of Prešov, a bright example of initiative for open data, has its datasets published with data.gov.sk. Here we can find street names dataset, school-like institutions or statistics about citizen counts. That would be great, but Prešov is the only city, actually publishing with data.gov.sk, to the day of writing this work. In Slovakia, there are 8 countries, each with 1 regional capital (including a city of Prešov), so 7 regional capital cities are still absent in data.gov.sk. Regarding OGSK, the measure of its fulfillment can be verified using either report by independent research institution (Kurian, 2013) or that by the government itself4, although reports seems a bit outdated (last reports are dated to 2013, based on year period, 2012/2013, completely missing 2013/2014 and recent). But based on (Kurian, 2013), prior to the publication, only 8 out o 22 commitments was completed. To compare with Czech Republic, a close neighbor to Slovakia, Czech government supports SPARQL endpoints for structured data querying through their governmental portal portal.gov.cz5 as well as they support bulk download. Slovak Insurance Agency (SIA) is publishing its debtors lists composed of natural persons in CSV format6, which is a kind of structured, machine-readable format. Although SIA is governmental, its dataset is not included in data.gov.sk. Interestingly, another governmental institute, the Všeobecna Zdravotná Posťovňa (VsZP), governmental health institution, also publishes its debtors online, but in comparison to SIA, this dataset, actually, is listed under data.gov.sk7. What is important to say, that there are no legal obstacles in laws in Slovakia, the ones, which would prevent institutions from publishing with data.gov.sk. Processing of personal data in Slovakia is regulated by an Act No. 122/2013 Coll. on Protection of Personal Data and on Changing and Amending of other acts, resulting from amendments and additions executed by the Act. No. 84/2014 Coll. Here, Section 10, Paragraph 3, Point e clearly states, that (personal) information made public by controller may be processed legally and that information is, for example, also SIA’s debtors list. Therefore, it should have been included in data.gov.sk. Problems and delay, regarding the process of data publication, is often caused by departments alone. From OGSK we can see that there is an action plan of data publishing, which involves various departments and kinds of data8. A worth of notice, however, is that there are no debtors datasets listed neither for VsZP nor SIA. But from this report, it is clear, that the Slovak Business Register (SBR), which lists all legal and natural persons, who actually do (legal) business in Slovakia, its dataset publication is only partial and, it is still under the subject of negotiation. Moreover, the only format that user is able to get out of SBR is HTML, which complicates the steps of further machine processing, for instance, person titles are cast into the name field together with surname and the address can be noted in many different ways (with postal no., without it, city first, or street first, street numbering variability, abbreviations and etc.). Even that the data gets published and bulk download is present, the data itself is in inappropriate format. For example Financial department of Slovakia also publish debtors 4 Report on Action Plan fulfillment for years 2012 and 2013. (2013). Retrieved from http://www.otvorenavlada.gov.sk/hodnotenie-plnenia-uloh-z-akcneho-planu-ogp-pre-roky-2012-a-2013/ 5 Dataset of Czech Administration of Social Insurance. (2014). Retrieved from https://portal.gov.cz/app/RejData/rec.jsp?id=1646070&id_rej=97898&y=2015&m=11&doctype=idx 6 Debtors list of Social Insurance Agency in Slovakia, in CSV format. (2015). Retrieved from http://www.socpoist.sk/index/open_file.php?file=dlznici/2015-11-06_SP_dlznici_CSV.zip 7 Health Insurance Agency Všeobecká zdravotná poisťovňa, listed under data.gov.sk. (2015). Retrieved from http://data.gov.sk/dataset?_organization_limit=0&organization=vseobecna-zdravotna-poistovna 8 Dataset listing of Slovak government. (2012). Retrieved from www.otvorenavlada.gov.sk/data/files/2651_statna-sprava-datasety.xls J. Mojžiš, M. Laclavík - Browsing Semantic Data in Slovakia 49 lists9, but offer only PDF formatted files on its website page. So do commercial health insurance agency UNION10, which also has its debtors, but again, only PDF is available. We wonder such wastefulness, because an effort was clearly present; export to PDF. But how much more work it would take to provide CSV for instance? For the field of statistics information publishing, there is Statistical Office of the Slovak Republic (SLOVSTAT)11. It collects (local) macro-economical, social and other statistics, very similar to Eurostat12. But in comparison to Eurostat, it does not offer the bulk downloading function, so whole datasets can only be browsed manually via their on-line web interface. Eurostat datasets, however, could easily be downloaded and then analyzed and processed. The commercial, non-profit or government-independent organizations or projects in Slovakia, are partially serve as complements to, somewhat paralyzed, governmental institutions, which rather do their bureaucracy, strictly following various acts and are backside facing towards citizens. We can name foaf.sk, a project intended to online SBR data browsing and the collection of personal and business data out of public available datasets (like SIA and VsZP). Other projects include vorsr.sk, Fair-Play Alliance13, Transparency International Slovakia14 and, perhaps, Slovak Open Data Initiative15. We shall discuss them further. In this paper we would like to point out the situation with semantic data availability in Slovakia and suggest improvements. As it seems, that official governmental institutions tends rather to not publish their data openly, user needs to navigate through inappropriate HTML formatted pages and other projects emerges lame, we propose our solution to semantic data browsing. But the lack of vital SPARQL endpoints and bulk downloads forcing us into, normally avoidable, workarounds. Here inaccuracy may occur during information extraction, while parsing HTML via scripts or regular expressions. In any case, we are able to browse the data of SBR, refine original structure of data as much as possible and create graphs. We are able to provide graph visualizations in advance. SBR dataset is the first we have picked, because the lacking of sufficient solutions is noticeable. As we have already stated, SBR contain all legal and natural persons, who do business in Slovakia and in comparison to the Trade Register of the Slovak Republic, it contains connections essential for social network creation. 2. Related Work In this section we discuss projects, solutions and services which aim to semantic data browsing improvement, either supplying relationship visualization, data structure refining or information extraction, kind of added value. Thus, we exclude data.gov.sk, SBR, VsZP, SIA and other governmental institutions. Dokulil and Katreniaková suggest visualize for navigation in RDF data based on user’s mental map. They are able to reduce graph by removing nodes or a whole subtree. Animation is available during restructuring operation (Dokulil & Katreniaková, 2008). 9 Debtors list of Finance Department of Slovak Republic in PDF format. (2015). Retrieved from https://www.financnasprava.sk/sk/elektronicke-sluzby/verejne-sluzby/zoznamy/zoznam-danovych-dlznikov 10 Debtors list of Health insurance agency UNION in PDF format. (2015). Retrieved from http://www.union.sk/documents/51150/Zoznam_dlznikov_FO 11 http://slovak.statistics.sk/ 12 Bulk download of data published by Eurostat. (2015). Retrieved from http://ec.europa.eu/eurostat/data/bulkdownload 13 Fair-Play Alliance Slovakia. (2015). Retrieved from http://www.fair-play.sk/ 14 Transparency International Slovakia. (2015). Retrieved from http://www.transparency.sk/en 15 Open Data Initiative. (2015). Retrieved from http://opendata.sk/liferay/o-nas BRAIN. Broad Research in Artificial Intelligence and Neuroscience Volume 6, Issues 3-4, December 2015, ISSN 2067-3957 (online), ISSN 2068 - 0473 (print) 50 Slovak project foaf.sk is intended to SBR data browsing primary. With the help of regular expressions and with the public-available data of SBR, authors extracted information and created graph. As we have mentored above, SBR does not publish its data in any structured format, nor does it offer bulk download and the data is missing on data.gov.sk also. Authors thus need to parse the HTML, filter-out the content and refine the data on their own. They have handled the duplicate entries and missing person’s unique identifiers like personal number (regarding personal privacy see aforementioned 122/2013), so they had used as much as name and address. In their task of relationship discovery, they use spread activation and offer full-text search with Sphinx tool (Suchal & Vojtek, 2009). Graph visualization is powered by Adobe Flash. Foaf.sk, originally founded as a non-profit initiative of individuals, proud of graph visualization, later sold-off, we have found, that now (Nov/5/2015), it is missing its original visualization, stagnating and, perhaps, slowly started to decline. Only a brief report about reconstruction and plans for visualization is provided, but 5 months already, there is a no change. M. Laclavík proposes a set of tools for information extraction (Laclavík et al., 2012), annotation and graph creation (Laclavík et al., 2010).. Ontea is a text annotation tool with the use of ontology and regular expressions. It can work in a cluster with MapReduce, extracts information from text, creating objects and, finally, semantic trees. The result is a graph, which can be searched with their next tool gSemSearch (former Email Social Network Search Prototype). To perform a relationship discovery task, gSemSearch is using spread activation (similar to foaf.sk). Spread activation is computed upon their database store SGDB, where they also perform graph traversal. In his former works (Laclavík at el., 2014), (Laclavík et al., 2011), (Laclavík et al., 2012), (Laclavík et al., 2011), he further use this concept and he performs distributed parallel computing with MapReduce only in initial stage, where he extracts and annotate entities needed for graph creation. This project, however, seems dead, as the graph is not visualized and to this day (Nov/5/2015) the official gSemSearch tool website is unavailable16. Another SBR browsing project and service is vorsr.sk, a virtualization of SBR with a great potential. Now, only a shell, an encapsulation or SBR, there is only Google search box and just a few direct links to SBR official website. In Czech Republic, for comparison, there is one visualization website, to serve as a kind of visual SBR browser, obchodni- rejstrik.podnikani.cz, with which, users can, still, see visualizations and find is appealing and functioning. Fair-Play Alliance (FPA) and Transparency International Slovakia (TIS) are non- governmental organizations, aimed mainly to the fight against corruption. They publish various reports and offer data browsing. Although FPA provides data browsing through their Datanest service17, search is supported and the menu is rich, again, no bulk download is possible and so neither is querying with SPARQL. The results of TIS are important and their purpose is reasonable. Manly they publish particular reports about corruption, statistics and audits. They serve as public control, but they do not focus on structured data availability, like SPARQL or structured data bulk download. An initiative for Open Data is a project, “a group of people, which wants to carry through a plan for open and modern open government.” On their site18, however, it is 16 GSemSearch tool, search online for relations in Enron Email corpus. (2013). Retrieved from http://try.ui.sav.sk:7070/enron/gSemSearch.html. Address now unavailable (2015) 17 Lawyer employee register by Fair-Play Initiative. (2015). Retrieved from http://datanest.fair- play.sk/datasets/56#/data 18 Open Data Initiative. (2015). Retrieved from http://opendata.sk/liferay/o-nas J. Mojžiš, M. Laclavík - Browsing Semantic Data in Slovakia 51 markedly outdated with last news dated from 2012. Again, in comparison to opendata.cz in Czech, Slovak counterpart is rather dull with several links even dead. And this initiative shows a bit of fragmentation, being distributed, beside its official website opendata.sk, across group Facebook19 and another website Utópia20, also. On one of their branching sites, Utópia, we find more recent updates, but the effect of the initiative is rather unfilled. Also, the capital city of Bratislava supports various projects among which, a project about dataset publishing in standardized structured format was pending. But, unfortunately, this project was not supported21. Despite that, the datasets are now available22, but again, neither structured data nor bulk downloading is provided. Similar to data.gov.sk is Czech counterpart opendata.cz, which is but an independent initiative for open data publishing, with but a few datasets. Abroad, we can find RelFinder. It is, perhaps, one of the most promising and known relationship discovery tools. The main difference from foaf.sk is its support of virtually any SPARQL endpoint (Linked Data publisher). RelFinder is capable of visualization as well. It is designed and created by P. Heim as a search and graph visualization tool. The solution is implemented as a web service and powered by Adobe Flex. The service allows user to specify SPARQL endpoint. RelFinder allows user to search for 2 vertexes for which the relationships are about to be discovered. To simplify search process, an auto-complete functionality is available, providing user with the list of matched entries. On the official website of the RelFinder23, there is an implementation and user can test the service and search for relationships. Among the visualization advantage, the data is fetched and presented online, no data is stored therefore the data is never outdated. Also the ability to support virtually any SPARQL endpoint makes RelFinder a universal tool. For instance, when searched for ”Bill Gates” and ”Virginia”, the main relationship is returned as ”United States”. RelFinder offers filtering as well (for example based on connectivity). A filtering is available; it hides vertexes which are simply too far above threshold. On the graph visualization, edges and vertexes are colored (based on filters selected in tabs). Layout is maintained with force-directed algorithm. Its code is published under GNU and hosted by Google Code24. Similarly to the web browsers, Semantic Web browsers (intended to browse Linked Data), browse the semantic data online. Semantic Web browser can connect various endpoints, does support SPARQL query language and combines various visualization techniques (facets, graph visualizations). Among many browsers, those well-known are DBpedia Mobile (Becker & Bizer, 2008), Exhibit (Huynh, Karger, & Miller, 2007), MSpace (Smith et al., 2005) or Tabulator (Berners-Lee et al., 2006) and RelFinder. They help with displaying often offering various visualization techniques like lenses (Furnas, 1986). But because in Slovakia, the situation with semantic data is poor, the use of linked data browsers in Slovak environment is discouraging. So, to conclude this section, many projects are launched and were executed already, but results are very poor, unification, which should involve data.gov.sk, is nowhere, rather fragmentation and distribution of the data is present, some websites are offline, others were never launched. Often a project, which was successfully launched actually, is just a one-shot 19 Facebook group of Open Data Initiative Slovakia. (2015). Retrieved from http://datanest.fair- play.sk/datasets/56#/data 20 Utópia, a branch of Open Data Initiative Slovakia. (2015). Retrieved from https://utopia.sk/liferay/home 21 City of Bratislava: Project about selected datasets publication; unsupported. (2014). Retrieved form http://pr.banm.sk/liferay/datasetybanm 22 Datasets of city of Bratislava; only in PDF format. (2015). Retrieved from http://zverejnovanie.banm.sk/ 23 RelFinder demo application. (2015). Retrieved from http://www.visualdataweb.org/relfinder/relfinder.php 24 RelFinder source code hosted on Google Code. (2015). Retrieved from http://code.google.com/p/relfinder/ BRAIN. Broad Research in Artificial Intelligence and Neuroscience Volume 6, Issues 3-4, December 2015, ISSN 2067-3957 (online), ISSN 2068 - 0473 (print) 52 item, dissolving and eventually found dead. Together with a fact, that many governmental institutions still provide non-machine readable, structured data, the report on OGSK is a disappointment and Semantic Web standards like Notation 3 25 or RDF/XML Syntax Specification26 were published at least 10 years ago, we see the current status of semantic data publishing as a shame. And the potential there is, we are in the European Union, ready for Eurofunds collecting, we have action plans (OGSK), commissions and initiatives (Open Government Partnership OGP1), there are academic and research institutions like Slovak Academy of Sciences27 and Slovak University of Technology28 and more, yet results are just partial and effects are surely delayed. In order to evolve, we should always compare to such successful projects like data.gov, commercial-like obchodni-rejstrik.podnikani.cz or governmental portal of Czech Republic portal.gov.cz. 3. Our solution With the aim primary on unstructured information extraction and refining, relationship discovery and visualization, we propose our solution for SBR in the first place. The reason for this is, primary, that HTML formatted results of SBR are very jerky and uncertainty regarding the structure of information is very high. Readers can also be pointed by J. Suchal and P. Vojtek (2009), that care should be taken towards type errors. We discuss that later. In this work, we try to fill-up the gap of visualization and, somehow limited data access offered by SBR, adapting to the problems disclaimed above. We suggest a new client- side paradigm, which does not depend on a particular website like foaf.sk. Figure 1 describes the schema briefly and the key elements are parsers with other tools on the top and structured formats, for datastore, on the bottom. Figure 1. Schema of client-side application for semantic browsing. 25 Notation 3, W3C. (2011). Retrieved from http://www.w3.org/TeamSubmission/n3/ 26 RDF/XML Syntax specification. (2004). Retrieved from http://www.w3.org/TR/REC-rdf-syntax/ 27 Slovak Academy of Sciences. (2015). Retrieved from http://www.sav.sk/?lang=en 28 Slovak University of Technology in Bratislava. (2015). Retrieved from http://www.stuba.sk/english.html?page_id=132 J. Mojžiš, M. Laclavík - Browsing Semantic Data in Slovakia 53 We do not consider incremental machine querying as the best choice, because it both burdens the server for the duration of the process and it creates copies of records, which should then be updated periodically and it also induce possible errors. However, as Suchal & Vojtek (2009), we see no alternation to regular expression usage, because HTML syntax is unavoidable part of the solution, when refining data of SBR. As our solution calculates with modules (wrapped as tools in Figure 1), we design solution modularly with plug-in possibility of other modules. Other modules may process different data sources from different hosts, but they should return results in common format. Each module should implement its own protocol, because of variability and particular purpose to serve. Each module should function as thread-safe implementation, using main thread idle time in order to keep user interaction seamless. Structured format may be any standard structured format like RDF/XML or N-Triple, or particular modules can use their own format, in which case a description of such format must have been given clearly. Client interacts with the application through interface, using functions of modules, loading or saving the data and performing tasks like visualization. A choice to store the data was made to ease of later usage and enable data sharing. Because the data may still be unavailable in machine-readable form (like SBR) and there was an effort to gather it (parsing, refining), we find this option useful. Although we are aware that the data may became outdated after a while. 4. Implementation In order to implement our solution we choose Java programming language as platform independent developing backend. The language and especially developing environment (Eclipse, Netbeans) are also platform independent and at minimum, Linux is supported, where .Net is only partially available. We design our GUI with the Swing library and for visualization, we use Jung graph library29, which is very flexible and offers rich customizability. Regular expressions are already built inside Java, so the only problem is actual writing of expressions. For the purpose of quick testing, we have developed a testing webpage, where we can evaluate regular expressions prior their implementation in the module. Because we focus on SBR and HTML output, we have found, that best suitable regular expression can be derived from “[^<>]”. Hence the negation tells the engine, that neither < nor > could be encountered during string evaluation. In SBR, there are common symbols like dash, parenthesis, numbers as well as diacritical marks like carons or acutes. 5. Data To evaluate our solution, implemented in AGECRT NET tool, we used SBR browsing module of AGECRT NET and have connected to SBR dataset. Although we do not have direct numbers, SBR contains around 86.500 firm records. SBR website does not offer statistics regarding their records count, but on its main page30, only update timestamps are displayed. Indirectly, however, it is possible to create an imagination of the size based on firm record identifiers. As J. Suchal states (Suchal, & Vojtek, 2009), they have harvested SBR dataset using continuous incrementing of (aforementioned) identifiers. We do not know if Suchal was aware of that, but identifiers (and thus firm records) are distributed across, at least, 8 different datasets. Parameters of GET URI include SID, which varies from 1 up to 7, so each firm ID must be queried with each different SID in order to ensure, that all firm records are searched. 29 Java Universal Network/Graph Framework, JUNG. (2010). Retrieved from http://jung.sourceforge.net/ 30 Slovak Business Register. (2015). Retrieved from http://www.orsr.sk/Default.asp?lan=en BRAIN. Broad Research in Artificial Intelligence and Neuroscience Volume 6, Issues 3-4, December 2015, ISSN 2067-3957 (online), ISSN 2068 - 0473 (print) 54 SBR contains firm records which are composed of firm detailing items. Those items include Partners, Management body, Stockholders or Supervisory board. Additionally, there are items on Branch of the enterprise or Restructuring trustees. Items can be either for natural persons or legal persons and other firms. With the use of such items from firm details, a social network (of firms and persons) may be created. However, unlike Facebook, there are no direct person-to-person edges, rather person-firm and firm-firm type edges. There are 2 possible query types, we use, in general. Search for a firm. Here a firm name and address may be typed or its identification number supplied. Or search for a person, only name and surname are allowed. No address information is supported. Firm records are divided into 2 sets. One is actual, containing information actual to a generated timestamp. Another is full, marking historical information and changes also. 6. Results We have searched for firm “Váhostav”, which is a rather big firm in Slovakia, with many press articles published about31. On Figure 2 there is a browsing window, for SBR data results, displaying tabular structure, which was refined from SBR dataset by continuous querying. Resulting graph is rather complex. There are 175 vertices and 201 edges, which were created directly, containing 158 persons and 17 companies. As we see on Figure 3, some filtering methods are required in order to create suitable overview of relations. Figure 4 thus shows the visualization of the same graph, but with tens of vertices merged. Now it contains 22 persons and 17 companies. A merging was performed for clarification and is built inside the visualization module. A simple condition says that a merging is performed if persons are unique, thus if a person is connected only to 1 firm. More formally, person vertices are merged, if their vertex degree equals 1 (each). Figure 2. Results for firm name “Váhostav” are in table. Each row defines a firm with its name, identification number and address. Then a connection is specified (whether it be a person or another firm). 31 Press publications about firm Váhostav (in Slovak). (2015). Retrieved from https://www.google.sk/search?q=v%C3%A1hostav&oq=v%C3%A1hostav#q=site:sme.sk+v%C3%A1hostav J. Mojžiš, M. Laclavík - Browsing Semantic Data in Slovakia 55 Figure 3. Graph created directly from the results. The usefulness of such visualization has its key points regarding connections. Thanks to SBR browsing module, we were able to get 22 firm records for “Váhostav” query. Between any 2 companies, connections may be (and often are) not bidirectional, so, in order to navigate through connections, we have refined all 22 records. Although, even being filtered, graph is still complex. And it is possible to further navigate and search for outgoing connections, for example firm “MERLIN TRADE, a.s.” on Fig.4 contains item on “Ján Kato”, which is already included in our graph and connected to “VÁHOSTAV-SK-DEVELOPEMENT” on bottom left side and “VÁHOSTAV-SK, a.s.” in the center. Edge coloring and drawing is helpful with overlapped edges. For methods of visualization, including coloring, we refer to studies of H. Omote and K. Sugiyama (2006), and I. Herman, G. Melanon, and M. S. Marshall (2000) or our study on graph clutter filtering and connectivity distance (Mojzis & Laclavik, 2014). We also use edge coloring and drawing methods to ease edge following and to help with overlapping. Person “Ján Kato” is connected with 2 different colors. It is thus possible to follow edge from its start to end. Or “Oľga Kmecíková”, who is connected with 4 edges, each BRAIN. Broad Research in Artificial Intelligence and Neuroscience Volume 6, Issues 3-4, December 2015, ISSN 2067-3957 (online), ISSN 2068 - 0473 (print) 56 one colored with different color and pattern. Edge coloring and patterning is helpful as we have already proposed in our previous work (Mojzis & Laclavik, 2014). Figure 4. After merging, overview is more transparent. Tens of persons were merged together into clusters in order to clarify the visualization. Firms and persons are recognized based on their icons. Comparing to Suchal (Suchal & Vojtek, 2009), we use city name as additional identifier even during visualization. We do not share his suggestion “If two persons with the same name, but different address occur inside one firm, consider that as one person”. There is a record of “Váhostav-SK, a.s.” with 2 persons, namely both with the same name “Juraj Široký” but with different addresses. Although the street is the same, numbering is different (8137/137 vs. 137). This is rather a complex problem and, as it, is very suitable for the field of machine-learning. Despite Suchal, we actually recognize persons based on their addresses and also (academic) titles. It is uncommon, that 2 different persons, named equally, live in the same street and city, if it is father and son. Yet they may have 2 different titles. Regarding different addresses, we encourage to use firm’s actual record as a complement, in order to repair typed errors and address changes, because changes and error fixes are performed by the competent staff of SBR. We do not dare to mark 2 person items as one and the same person. It is up to SBR, that they keep their person items updated every time a person gets re-housed. They however, use personal numbers and ID card numbers and other data, not available to J. Mojžiš, M. Laclavík - Browsing Semantic Data in Slovakia 57 public, which they may rely on. Because that, we have to tolerate that inaccuracies in personal items are unavoidable. However, we prefer rather to not recognize a relationship, as if the relation should be falsely marked, due to 2 to 1 person conversion. Although we use clustering with person merging, this operation is completely reversible and does not affect connections at all. Personal information is keep intact. 7. Conclusion An offer of semantic data market is poor and non-governmental or commerce sector does not improve it either. It is an unfortunate situation, because the potential is big and the hole is noticeable. Whether we look in Czech Republic, we find governmental SPARQL endpoints and datasets32 or non-governmental project for business register browsing with obchodni-rejstrik.podnikani.cz. And availability of open data is vital on behalf of public control. Data.gov is only partially implemented and, as the report says, only 8 out of 22 submitted tasks were finished. Government-independent initiatives and projects are, also, present in Slovakia, but their effect is marginal and they are often one-time events. Other projects we find slowly declining (including foaf.sk) and unavailability of the data is still very intensive. It was a great step forward to publish debtors, thanks to SIA and VsZP. Even in the case the data is provided online, like Financial department of Slovakia or commercial health insurance agency UNION do, only PDF formatted file with tables is available. Basically, we divide open data publishing in Slovakia into 2 categories. First is category of clear and open data, machine readable. This category is very important and the data highly usable. But it is the smallest out of 2 groups. Second, although being published, the data itself is in inappropriate format (like PDF). Yet so small market of open data in Slovakia is divided and fragmented once more. As if it was not enough already, that data.gov.sk does not cover all of the few available machine readable sources. By proposal of this paper, we try to fill up the gap of semantic tools availability, currently present in Slovakia. Because data sources are not always machine readable, we advance further the concept similar to Suchal and Vojtek (2009) and use regular expressions. But instead of online based service, we propose client based solution. The advantages of availability are preserved intact, but no additional costs, with funding online service, are present. Additionally, when application is about to be updated, it is up on a user, whether he wishes to do so. The modular concept is counting with the possibility of new modules creation and inclusion. Proved already functioning and useable, we would like to recommend its usage. Currently, there are negotiations in process, regarding offline dataset of SBR. We consult with support member of SBR and SOAP33 query is under construction. Should we have succeeded, we publish and detail about how to obtain offline dataset of SBR. Unfortunately, prior to publishing of this work, we do not have any more information. We are almost sure, that it is possible to successfully connect to their SOAP server in order to get dataset. Acknowledgement This work is partially supported by following grants: CLAN (APVV-0809-11) and VEGA (2/0154/16). 32 Dataset of Czech Administration of Social Insurance. (2014). Retrieved from https://portal.gov.cz/app/RejData/rec.jsp?id=1646070&id_rej=97898&y=2015&m=11&doctype=idx 33 XML Soap. (2015). Retrieved from http://www.w3schools.com/xml/xml_soap.asp BRAIN. Broad Research in Artificial Intelligence and Neuroscience Volume 6, Issues 3-4, December 2015, ISSN 2067-3957 (online), ISSN 2068 - 0473 (print) 58 References Becker, Ch. & Bizer, Ch. (2008). DBpedia Mobile: A Location-Enabled Linked Data Browser. LDOW, 369, 2008 Berners-Lee, T., Chen, Y., Chilton, L., Connolly, D., Dhanaraj, R., Hollenbach et al. (2006). Tabulator: Exploring and analyzing linked data on the semantic web. In Proceedings of the 3rd International Semantic Web User Interaction Workshop, volume 2006. Athens, Georgia, 2006. Dokulil, J. & Katreniaková, J. (2008). Navigation in RDF data. In Information Visualization, 2008. IV'08. 12th International Conference (pp. 26-31). Institute of Electrical and Electronics Engineers. Furnas, G. W. (1986). Generalized fisheye views (Vol. 17, No. 4, pp. 16-23). ACM. Herman, I., Melanon, G., & Marshall, M. S. (2000). Graph visualization and navigation in information visualization: A survey. Visualization and Computer Graphics, Institute of Electrical and Electronics Engineers Transactions on, 6(1), 24-43. Huynh, D., Karger, D., & Miller, R. (2007). Exhibit: lightweight structured data publishing. In Proceedings of the 16th international conference on World Wide Web, pages 737– 746. ACM, 2007. Kurian M. Independent Reporting Mechanism SLOVAKIA: Progress Report 2012-13. (2013). Retrieved from http://www.opengovpartnership.org/files/slovakia-ogp-irm- public-comment-engpdf-0/download Laclavík, M., Dlugolinský, Š., & Ciglan, M. (2014). Discovering relations by entity search in lightweight semantic text graphs. Computing and Informatics, 33:877–906, 2014. Laclavík, M., Dlugolinský, Š., Kvassay, M., & Hluchý, (2011). L. Email social network extraction and search. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology-Volume 03, pages 373–376. Institute of Electrical and Electronics Engineers Computer Society, 2011. Laclavík, M., Dlugolinský, Š., Kvassay, M., & Hluchý, L. (2010). Use of E-mail Social Networks for Enterprise Benefit. In Web Intelligence/IAT Workshops, pages 67–70. Citeseer, 2010. Laclavík, M., Dlugolinský, Š., Šeleng, M., Ciglan, M., & Hluchý, L. (2012). Emails as graph: relation discovery in email archive. In Proceedings of the 21st international conference companion on World Wide Web, pages 841–846. ACM, 2012. Laclavík, M., Šeleng, M., Ciglan, M., & Hluchý, L. (2012). Ontea: Platform for pattern based automated semantic annotation. Computing and Informatics, 28(4):555–579, 2012. Laclavík, M., Šeleng, M., Ciglan, M., Dlugolinský, Š., & Hluchý, L. (2011). gSemSearch: Objavovanie relácií v kolekciách textových a grafových dát. In 6th Workshop on Intelligent and Knowledge Oriented Technologies: WIKT, pages 1–5, 2011. Mojzis, J. & Laclavik, M. (2014). Graph clutter filtering based on connectivity distance and visibility. In Science and Information Conference (SAI), 2014 (pp. 153-158). Institute of Electrical and Electronics Engineers. Omote, H. & Sugiyama, K. (2006). Method for drawing intersecting clustered graphs and its application to web ontology language. In Proceedings of the 2006 Asia-Pacific Symposium on Information Visualisation-Volume 60 (pp. 89-92). Australian Computer Society, Inc. J. Mojžiš, M. Laclavík - Browsing Semantic Data in Slovakia 59 Smith, D., Owens, A., Russell, A., Harris, C., Wilson, M. et al. (2005). The evolving mSpace platform: leveraging the Semantic Web on the Trail of the Memex. In Proceedings of the sixteenth ACM conference on Hypertext and hypermedia, pages 174–183. ACM, 2005. Suchal, J. & Vojtek, P. (2009). Navigácia v sociálnej sieti obchodného registra SR. DATAKON, Srní, Czech Republic.