Mathematical Problems of Computer Science 41, 114-121, 2014.

An Automated System for Correction of Rules of Transformation from Universal Networking Language into Natural Language

Aram A. Avetisyan and Levon R. Hakobyan
Institute for Informatics and Automation Problems of NAS RA
e-mail: a.avetisyan@undlfoundation.org, levon.r.hakobyan@gmail.com

This work was supported by the State Committee of Science, MES RA, within the frame of the research project SCS 13-B321.

Abstract

An approach to the automated correction of UNL grammar rules is implemented. The development of rule sets could be made more effective if tools were available that can find the rules responsible for possible errors and make suggestions about their correction. Several definitions for the formalization of the transformation process are given, and an algorithm providing such functionality is created. The implementation of the algorithm in the UNDL Platform and some examples are described.

Keywords: UNL, Machine translation, Transformation rules, Grammar.

1. Introduction

UNL (Universal Networking Language) is a meta-language representing semantic information. Its main purpose is to store "the meaning" of natural language texts in a language-independent format. Each sentence in UNL is an oriented linked graph [1, 2, 3]. Any sentence in UNL is a semantic network believed to be logically precise, humanly readable and computationally tractable. In the UNL approach, the information conveyed by natural language is represented sentence by sentence as a graph composed of a set of oriented, binary, labeled links (referred to as "relations") between nodes (the "Universal Words", or simply "UWs"), which stand for concepts. UWs can also be annotated with "attributes" representing modality [1, 2].

Today, one of the main systems for UNL processing is the UNDL Platform, a web-based application developed by the UNDL Foundation [3]. The transformation from natural language to UNL is called analysis, and the transformation from UNL to natural language is called generation. In the UNDL Platform, the analyzer (i.e., the algorithm which implements the analysis) was developed within the IAN project, and the generator (i.e., the algorithm which implements the generation) within the EUGENE project [3].

Transformation applications use dictionaries and grammar rule sets for the corresponding natural language. Therefore, one of the main goals of the UNL project is the development of grammar rule sets. However, the process of creating and editing such rules can be a big challenge for users. In this article, we introduce an approach to the automation of the rule correction process, illustrated with real examples. We introduce some new definitions, describe the corresponding algorithm for searching for responsible rules, and give basic information about its implementation in the UNDL Platform system.

2. The Problem Setting

Transformations in the UNDL Platform system are implemented by consecutively applying transformation rules to the UNL sentence. In the case of the analyzer, the result of this process is a transformation from the linear structure to the semantic graph; in the case of the generator, it is a transformation from the semantic graph to the linear structure. The transformation from UNL to natural language has the following input parameters: a dictionary, a UNL sentence and a grammar rule set. There are two types of grammar rules: transformation rules (t-rules) and disambiguation rules (d-rules) [4].
A transformation rule is a pair (α, β), where the left side α is a condition statement and the right side β is an action to be performed over α. A disambiguation rule is a pair (α, P), where the left side α is a statement and the right side P is an integer from 0 to 255 indicating the priority coefficient taken into account for the application of α. Without loss of generality, let us assume that there are no d-rules and attach priorities to the corresponding t-rules.

The transformation process consists of several iterations (steps). On each step one t-rule is applied, and the process continues until there are no rules that can be applied to the sentence. On each step, the transformation algorithm loops through all t-rules; if a t-rule with a satisfied condition is found, that rule is applied to the current sentence, i.e., the action of that t-rule is performed on the current sentence.

When the algorithm finishes, the user gets a transformation result and trace-data. The trace-data contains information about all steps of the transformation process applied to the sentence; in fact, it is a description of the transformation process. However, because of the huge amount of information, this description is not easy for users to work with. If the user finds mistakes in the transformation result, he has to look for the t-rule that caused the mistake to appear. This process is not very effective for the user, and it becomes even less effective if the user also wants to find dependencies between rules and to decide how the application of some rules affects the choice of rules on subsequent steps. Therefore, one of the main goals of the UNL project is the creation of additional tools that help users improve the grammar rule sets.

As mentioned above, the result of the transformation from UNL to natural language is a linear structure with nodes. Therefore, possible mistakes are contained in one or more nodes, and the user can mark the corresponding nodes whose transformation should be corrected. Proceeding from the above, we can define our goal as the development of functionality that helps to find the rules which caused the specified node's state. To find an appropriate solution we should first introduce some definitions concerning a rule's responsibility for the corresponding structure of the transformation result.

In this paper, we consider a method for the development of such functionality. We also introduce an algorithm for searching for responsible rules and give basic information about its implementation in the UNDL Platform system. Our approach is developed for transformations from UNL to natural language, but with some modifications it could also be used for transformations from natural language to UNL.
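To make the rule application loop concrete, the following minimal sketch illustrates it in Java. This is our own illustration, not the UNDL Platform source code: the Rule interface, the use of Optional and the representation of the sentence as a plain string (rather than a graph) are simplifying assumptions.

import java.util.List;
import java.util.Optional;

public class TransformationLoop {

    /** A t-rule: a condition (match) and an action (execute). Hypothetical interface. */
    interface Rule {
        Optional<String> match(String sentence);          // returns the matched fragment, if any
        String execute(String sentence, String matched);  // returns the transformed sentence
    }

    /** Applies t-rules step by step until no rule condition is satisfied. */
    static String transform(String sentence, List<Rule> tRules) {
        boolean applied = true;
        while (applied) {                                  // stop when no rule is applicable
            applied = false;
            for (Rule rule : tRules) {                     // rules are assumed ordered by priority
                Optional<String> match = rule.match(sentence);
                if (match.isPresent()) {                   // the condition α is satisfied
                    sentence = rule.execute(sentence, match.get()); // apply the action β
                    applied = true;
                    break;                                 // exactly one rule application per step
                }
            }
        }
        return sentence;
    }
}

The break statement reflects the fact that exactly one t-rule is applied on every step, after which the search for an applicable rule starts again from the beginning of the rule set.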
3. One Possible Solution of the Problem

As we mentioned above, during the transformation from UNL to natural language, a transformation from the semantic graph to a linear graph takes place. As a result, the user gets a linear graph which contains one node for each word of the sentence in the natural language. So every mistake is contained in the nodes obtained during the transformation of the UNL sentence, which makes it possible to mark the nodes containing mistakes. We assume that the source UNL sentence and the dictionaries are correct. Considering this, we claim that the correctness of the transformation result depends entirely on the transformation rules.

3.1. Basic Definitions

First, we give some basic definitions of the roles of rules in the transformation process. As we have mentioned, the UNL sentence is a graph, so its sub-graphs may be considered.

A rule is called affected on a current node or relation if that node or relation has been created or modified by applying the rule, or if the rule's application has triggered the creation or modification of the mentioned node or relation on future steps.

A rule is called responsible for some sub-graph if its application has resulted in the appearance of the corresponding sub-graph in the transformation process.

A rule is called indirectly responsible for some sub-graph if its application has triggered, on future steps, a rule responsible for this sub-graph.

Of course, several rules can be responsible for a single sub-graph. Taking these definitions into consideration, we can rephrase our goal as follows: to provide functionality for searching for all rules responsible for the current nodes.

3.2. Approach

Our approach consists of two parts: a basic part and modules. The basic part is an algorithm that reviews the steps of the transformation process and marks the affected steps. The modules part consists of semi-independent logical modules; in the current implementation, a module is a function. Each of these modules is intended for a specific type of errors; for example, there is a module for syntactic errors.

The basic algorithm reviews all transformation steps, marks those which have affected the current nodes, decides which module should be called and prepares the data to be passed to that module, depending on the user input. On the second stage the module starts processing: it analyzes all rules marked by the basic algorithm as affected on the current node and marks the rules responsible for the current nodes.

3.3. Implementation

In our implementation, the transformation function already saves some critical information about the transformation steps during the transformation process. This feature lets us start analyzing the steps immediately after the transformation finishes. In the UNDL Platform, the steps of the implemented transformation process are organized as follows.

The match function is used to check whether the sentence satisfies a rule condition. It returns an object of the MatchSet class; this object is empty if there is no sub-graph satisfying the condition. If the match function finds a sub-graph which satisfies the condition, showing that the rule is applicable to the current sentence, it returns a MatchSet object containing a copy of that sub-graph. If there is a corresponding sub-graph, the rule application takes place. Rule application is implemented by the execute function, which returns an object of the OutputSet class containing a copy of the sub-graph obtained by the rule application. In fact, a MatchSet object contains data about the nodes and relations of the sub-graph which satisfies the condition, and an OutputSet object contains data about the nodes and relations obtained by the rule application.

We have developed the Tracer class, which stores all the critical information needed for searching for responsible rules. On every step, our algorithm saves the MatchSet and OutputSet objects, the applied rule and the current sentence obtained by the rule application in an object of the TraceNode class. The Tracer class contains a Path object, which is implemented as a hash table: the key is the number of the applied step and the corresponding value is a TraceNode object. The step number and the TraceNode object created on that step are added to the Path object on every step of the transformation process. So, after the transformation, the Tracer object stores all the information about the transformation steps that we need.

In order to find the affected rules, unique identifiers of the corresponding nodes are allocated, and the AnalyzePath function is called with these identifiers as parameters. The AnalyzePath function looks through all TraceNode objects stored in the Path hash table, from the last to the first. The OutputSet and the MatchSet of the corresponding TraceNode are analyzed on every step: a rule can be claimed as affected on a node if the identifier of that node is contained in the OutputSet or the MatchSet. For every step with an affected rule, an object of the MarkedItem class is created; the MarkedItem stores a link to the current TraceNode and some additional information. The set of MarkedItems is then passed to the corresponding module. So, the basic algorithm finds all rules affected on the current node.
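The structures and the backward scan described above can be summarized by the following self-contained sketch. It is our simplification, not the UNDL Platform code: MatchSet and OutputSet are reduced to plain sets of node identifiers, steps are assumed to be numbered from 1 upwards, and the additional information stored in MarkedItem is omitted.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TracerSketch {

    /** Data saved on one transformation step. */
    static class TraceNode {
        final int ruleId;                 // identifier of the applied rule
        final Set<String> matchSetIds;    // node ids of the sub-graph that satisfied the condition
        final Set<String> outputSetIds;   // node ids of the sub-graph produced by the rule
        TraceNode(int ruleId, Set<String> matchSetIds, Set<String> outputSetIds) {
            this.ruleId = ruleId;
            this.matchSetIds = matchSetIds;
            this.outputSetIds = outputSetIds;
        }
    }

    /** A step marked by the basic algorithm as affected on the selected node. */
    static class MarkedItem {
        final int step;
        final TraceNode trace;
        MarkedItem(int step, TraceNode trace) { this.step = step; this.trace = trace; }
    }

    /** Path: step number -> trace data saved on that step. */
    final Map<Integer, TraceNode> path = new HashMap<>();

    /** Backward scan: from the last step to the first, mark every step whose
     *  MatchSet or OutputSet contains the identifier of the selected node. */
    List<MarkedItem> analyzePath(String nodeId) {
        List<MarkedItem> marked = new ArrayList<>();
        for (int step = path.size(); step >= 1; step--) {
            TraceNode trace = path.get(step);
            if (trace.matchSetIds.contains(nodeId) || trace.outputSetIds.contains(nodeId)) {
                marked.add(new MarkedItem(step, trace)); // the rule of this step is affected on the node
            }
        }
        return marked;                                   // passed to the error-specific module
    }
}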
On the second stage, a module for deep error processing starts. This module loops through all the steps marked by the basic algorithm and analyzes them. It also analyzes the OutputSet and the MatchSet, but the processing logic depends on the error type.

Rule sets usually contain rules used for some kind of sentence post-processing: they remove unused brackets and add whitespace to the sentence. The application of such rules usually has no semantic meaning and no significant impact on the NL sentence obtained by the transformation process. Therefore, our algorithm provides the ability to ignore such post-processing rules.

3.4. Computational Complexity

Computational complexity is very important for us, because the UNDL Platform is a web application and its computation time is bounded by the server response time. So we should show that the upper bound of the computation time of our algorithm is acceptable.

Let M be the maximal count of all nodes and relations over all steps, and let N be the count of steps in the transformation process. We claim that the estimated algorithm time is O(2(N · 2M)). Indeed, the maximal count of steps analyzed on a stage, for both the first and the second stage, is equal to N, and the maximal possible count of reviewed nodes and relations (taking into consideration that both the OutputSet and the MatchSet are reviewed) is equal to 2M. From here we get an upper bound equal to N · 2M for each stage of the algorithm; therefore, the upper bound for the whole algorithm is equal to 2(N · 2M).
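Written out as a formula (T here denotes the total number of elementary node and relation checks, a notation introduced only for this remark), the estimate above amounts to a bound that is linear in both N and M:

\[
T \le N \cdot 2M + N \cdot 2M = 2\,(N \cdot 2M) = 4NM = O(N \cdot M),
\]

where the first term corresponds to the basic algorithm and the second to the error-processing module.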
4. Examples

4.1. The Basic Algorithm

During our development and testing process we used the Test Drive corpora developed by the UNDL Foundation [5]. We used the following sentence with the "PRE#1" identifier:

[S: PRE#1]
{org}
the book on the table
{/org}
{unl}
plc(book:01.@def, table:02.@def.@on)
{/unl}
[/S]

After the transformation we get the result "the book on the table" (it is equal to the original), which has the following linear representation:

<SHEAD>
#L("the":06, " ":09)
#L(" ":09, "book":01)
#L("book":01, " ":0A)
#L(" ":0A, "on":04)
#L("on":04, " ":0B)
#L(" ":0B, "the":08)
#L("the":08, " ":0C)
#L(" ":0C, "table":02.@on)
<STAIL>

We mark the ":08" node in order to get all the rules responsible for that node and run our basic algorithm. As a result, we get the set of rules marked as responsible for the ":08" node, shown in Table 1. Once we have this information, the corresponding module can be called to analyze the marked rules more deeply. During this process two rules were ignored as post-processing rules; they are shown in Table 2.

Table 1
Rule Id | Rule | Depth | MatchSet | OutputSet
781 | NP(%x;%y):=(%y,+>BLK)(%x,+>BLK); | 29 | NP:03(table:02.@on, the:08) | #L:03(the:08, table:02.@on)
711 | NS(%x;%y):=NP(%x;%y); | 25 | NS:03(table:02.@on, the:08) | NP:03(table:02.@on, the:08)
685 | XP(%x,N;%y):=NS(%x;%y); | 22 | XP:03(table:02.@on, the:08) | NS:03(table:02.@on, the:08)
650 | /[ACDIJNPV]S/(%x;%y,^spec):=XP(%x;%y,+spec); | 16 | NS:03(table:02.@on, the:08) | XP:03(table:02.@on, the:08)
625 | (%x,N,@def):=(NS(%x,-@def;%y,[the],+LEX=D,+POS=ART),+LEX=N); book.@def > NS(book,the) | 8 | [table:02.@def.@on] | [sc:03(NS:03(table:02.@on, the:08))]

Table 2
Rule Id | Rule
816 | ((%x)(%y)):=(%x)(%y);
817 | (%x,>BLK)(%y,^BLK,^PUT,^STAIL):=(%x,->BLK)(" ",+BLK)(%y);

4.2. An Error-Specific Module

Here we consider an example of one simple module which could serve as a prototype for the creation of more complex modules. We used the following sentence with the "VER#1" identifier from the Test Drive corpora: "He arrived". Let us assume that we get the transformation "He arrives", i.e., there is an error in the last letter. The user marks that error and inputs the right letter "d". The basic algorithm then returns the following subset of steps to be analyzed by the module:

Table 3
Rule Id | Rule
419 | (@past,^att=@past):=(-@past,+att=@past,+WHEN);
45 | (%x,V,@past):=(%x,-@past,-VBL,+ATE=PAS);
166 | (%x,^inflected,FLX):=(%x,!FLX,+inflected);

While analyzing the last step (the algorithm goes from the last step to the first), the algorithm concludes, by comparing the MatchSet and the OutputSet, that the "s" was added on that step. After that, the algorithm analyzes the rule and concludes that on the last step the letter "s" was added by applying the "FLX" of the current node because of the "ATE=PAS" attribute. Therefore, this is the reason for the current error, and this rule is the directly responsible one.

After that, the algorithm starts to analyze the other rules of the subset in order to find the indirectly responsible rules. Now it looks for the steps which are responsible for the "ATE=PAS" attribute of the current node and/or for the MatchSet of the last reviewed step. The algorithm concludes that the second step is indirectly responsible, because the "ATE=PAS" attribute was added on that step and the MatchSet of the last reviewed step is equal to the OutputSet of the current step.

The remaining step is analyzed in the same way. On the previously reviewed step the "ATE=PAS" attribute was added because of the "att=@past" attribute, so on this step the algorithm looks for the origin of the "@past" attribute and for the corresponding equality between the OutputSet of the currently reviewed step and the MatchSet of the previously reviewed step. After that, because there are no other steps, the algorithm concludes that there are two indirectly responsible rules and one directly responsible rule, as shown in Table 4.

Table 4
Rule Id | Rule | Type of Responsibility
419 | (@past,^att=@past):=(-@past,+att=@past,+WHEN); | Indirect
45 | (%x,V,@past):=(%x,-@past,-VBL,+ATE=PAS); | Indirect
166 | (%x,^inflected,FLX):=(%x,!FLX,+inflected); | Direct

Finally, we get the recommendation to replace the "PAS:=0>"s";" part of the corresponding dictionary entry attribute with the "PAS:=0>"d";" string.

As we can see, the modules for error processing entirely depend on the type of the error. In general, on every step a module should analyze the MatchSet and the OutputSet of the current step, the MatchSet of the previous step, the condition and the action of the rule, and the attributes of the corresponding nodes. The modules should be intelligent enough to be able to algorithmically analyze the structures and the goals of the rules.
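The responsibility analysis performed by this module can be sketched as follows. The sketch is ours and deliberately simplified: attributes are reduced to plain strings, each marked step is assumed to carry the sets of attributes its rule added and required, and the check that the OutputSet of one step equals the MatchSet of the next reviewed step is omitted for brevity.

import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class InflectionErrorModule {

    enum Responsibility { DIRECT, INDIRECT }

    /** One marked step together with the attributes its rule added and required. */
    static class Step {
        final int ruleId;
        final Set<String> addedAttributes;   // e.g. "ATE=PAS" for rule 45
        final Set<String> usedAttributes;    // attributes required by the rule condition
        Step(int ruleId, Set<String> added, Set<String> used) {
            this.ruleId = ruleId;
            this.addedAttributes = added;
            this.usedAttributes = used;
        }
    }

    /** The last marked step is directly responsible; an earlier step is indirectly
     *  responsible if it introduced an attribute required later in the chain. */
    static Map<Integer, Responsibility> assignResponsibility(List<Step> markedSteps) {
        Map<Integer, Responsibility> result = new LinkedHashMap<>();
        if (markedSteps.isEmpty()) {
            return result;
        }
        Step last = markedSteps.get(markedSteps.size() - 1);
        result.put(last.ruleId, Responsibility.DIRECT);

        // Attributes whose origin still has to be explained.
        Set<String> wanted = new HashSet<>(last.usedAttributes);
        for (int i = markedSteps.size() - 2; i >= 0; i--) {
            Step step = markedSteps.get(i);
            Set<String> provided = new HashSet<>(step.addedAttributes);
            provided.retainAll(wanted);                    // attributes this step introduced
            if (!provided.isEmpty()) {
                result.put(step.ruleId, Responsibility.INDIRECT);
                wanted.removeAll(provided);
                wanted.addAll(step.usedAttributes);        // now explain this rule's own condition
            }
        }
        return result;
    }
}

In the real module the comparison of the OutputSet of one step with the MatchSet of the next reviewed step does part of this chaining work; here the attribute sets alone stand in for it.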
5. Conclusion

Today, the development of methods for generating transformation rules is one of the important parts of the UNL project. The main goal of such methods is the creation of tools able to increase the efficiency of the rule development process. Several definitions for the formalization of some aspects of the transformation process have been introduced above. The algorithm for searching for responsible rules and the structures used in the current implementation within the UNDL Platform project have been described, and examples of an error-processing module and of the basic algorithm have been given.

We propose here the first implementation of the described method. It should be improved during testing and during the analysis of new cases of error processing. Further research will be done in order to create new algorithms and to improve the existing ones for a more precise automation of the error analysis process.

References

[1] H. Uchida, M. Zhu and T. Della Senta, A Gift for a Millennium, IAS/UNU, 1999.
[2] R. Martins, UNL Wordnet, Mackenzie University, 2008.
[3] UNDL Foundation web site, [Online]. Available: http://www.undlfoundation.org/
[4] DeConverter Specification, Version 2.6, UNL Center/UNDL Foundation, 2002; EnConverter Specifications, Version 3.3, UNL Center, Tokyo, 2001.
[5] Test Drive Corpus, UNDL Foundation, [Online]. Available: http://www.unlweb.net/wiki/EUGENE#Test_drive

Submitted 14.12.2013, accepted 22.02.2014.

An Automated System for Correction of Rules of Transformation from the UNL Language into Natural Language (summary in Armenian)

A. Avetisyan and L. Hakobyan

Summary

The article describes an automated method for the correction of UNL grammar rules. The rule development process can be more effective if appropriate tools are used which make it possible to find the rules responsible for possible errors and to obtain recommendations on a possible method of their correction. The article gives some definitions for formalizing the transformation process and describes the corresponding algorithm. It also describes the implementation of that algorithm in the UNDL Platform system and gives separate examples of its application.

An Automated System for Correction of Rules of Transformation from the UNL Language into a Natural Language (summary in Russian)

A. Avetisyan and L. Hakobyan

Abstract

The article describes a method for the automated correction of UNL grammar rules. The rule development process can be made more effective by using tools that allow finding the rules responsible for possible errors and obtaining recommendations on a method of their correction. The article gives some definitions aimed at formalizing the transformation process and presents the corresponding algorithm.
The article also describes the implementation of the algorithm in the UNDL Platform system and some examples of its use.