Microsoft Word - 4-2269_s1


Engineering, Technology & Applied Science Research Vol. 8, No. 6, 2018, 3512-3514 3512  
  

www.etasr.com Chopra et al.: Improving Translation Quality By Using Ensemble Approach 

 
Improving Translation Quality By Using Ensemble 

Approach 
 

Deepti Chopra 

Department of Computer Science 

Banasthali University 

Newai, India 

deeptichopra11@yahoo.co.in 

Nisheeth Joshi  

Department of Computer Science 

Banasthali University 

Newai, India 

jnisheeth@banasthali.in 

Iti Mathur 

Department of Computer Science 

Banasthali University 

Newai, India 

mathur.iti@rediffmail.com 
 

Abstract—Machine translation (MT) has been a topic of great 

research during the last sixty years, but, improving its quality is 

still considered an open problem. In the current paper, we will 

discuss improvements in MT quality by the use of the ensemble 

approach. We performed MT from English to Hindi using 6 MT 

different engines described in this paper. We found that the 

quality of MT is improved by using a combination of various 

approaches as compared to the simple baseline approach for 

performing MT from source to target text. 

Keywords-machine translation; named entity translation; 

natural language processing; source text rewriting 

I. INTRODUCTION  

Machine translation is a natural language processing (NLP) 
application. It is defined as the process of conversion of text 
from one language to another and is still considered an open 
problem. Tasks related to MT began soon after the World War 
II, when translation was performed with the help of electronic 
bilingual dictionaries and manually designed lexical rules [1]. 
To make advancements in the field of MT, US Government 
established a committee called Automatic Language Processing 
Advisory Committee (ALPAC). ALPAC members concluded 
that MT was not very accurate, and was more expensive than 
human translation. So, they suggested investing in basic 
research in NLP. Recently, more powerful computers were 
developed that could handle the huge amount of the MT related 
data. Today, when we look back into past, we may realize that 
the ALPAC report led to progress in the field of NLP in the 
long term. Several NLP based resources have been developed 
and helped developers to solve MT based problems. Today, a 
large number of companies and institutions have been 
motivated by the profitability of MT as a business and this has 
led them to invest in MT based projects. 

II. CLASSIFICATION OF MT SYSTEMS 

On the basis of degree of human interaction, MT systems 
can be classified into three types [2]: (a) machine-aided human 
translation (MAHT), (b) human-aided machine translation 
(HAMT) and (c) fully automatic machine translation (FAMT). 
MAHT is implemented by many commercial systems. FAMT 
systems are mostly free and can be found on the internet. 
According to the levels of linguistic analysis, MT may be 

classified into three types: (a) direct, (b) transfer and (c) 
interlingua. The levels of linguistic analysis can be seen in [3]. 

In direct approach, phrase by phrase or word by word 
translation takes place without undergoing any other additional 
representation [4]. The advantage of direct MT approach is that 
translation can be understood with little effort. Its 
disadvantages are that it can be built only for specific language 
pairs and it is expensive to build in case of multilingual 
scenarios. Also, some of the meanings of the source text might 
get lost in translation when using it. Example based MT 
(EBMT) systems and statistical MT (SMT) systems are based 
on direct approach. In EBMT systems, bilingual corpus or 
parallel texts are used. They are implemented using case based 
reasoning methodology of machine learning. Statistical 
machine translation (SMT) systems make use of Bayes 
decision rule and statistical decision theory in order to reduce 
the number of decision errors. Rule based MT (RBMT) 
systems are based on transfer based approach. RBMT systems 
can either be syntactic transfer based MT systems or semantic 
transfer based MT systems, in which the source text is firstly 
converted to source abstract representation which is then 
converted to target abstract representation using linguistic 
rules. The target abstract representation is then finally 
transformed into target text. In syntactic transfer based systems, 
source parse tree is constructed from the source text which is 
then transformed into target parse tree. The target parse tree is 
then converted into target sentence. In semantic transfer based 
MT systems, the source text is converted into source semantic 
abstract representation which is then converted into target 
semantic abstract representation. Target semantic abstract 
representation is then transformed to syntactic structure which 
is finally converted to target text. An advantage of transfer 
based approach is that it can handle ambiguities that are 
transferred from one language to another. One disadvantage of 
the transfer based approach is that the original meaning of the 
source text may get lost during translation. In interlingua based 
MT systems, a language independent based abstract 
representation is constructed from the source sentence which is 
then converted to target sentence. Source sentences in different 
languages having the same meaning have the same abstract 
representation in interlingua based MT systems. This 
minimizes transfer generation burden. 


Engineering, Technology & Applied Science Research Vol. 8, No. 6, 2018, 3512-3514 3513  
  

www.etasr.com Chopra et al.: Improving Translation Quality By Using Ensemble Approach 

 
III. PROBLEMS IN MT REGARDING INDIAN LANGUAGES 

Indian languages are free word order and morphologically 
rich languages. Some of the problems faced in MT in Indian 
languages are presented below. 

A. Complex Sentences Are Not Translated Correctly 

Source sentences that are complex are usually translated 
incorrectly. For example, consider this source text: “The Taj 
Mahal is one of the wonders of the world located on the south 
bank of the Yamuna river in the Indian city of Agra”. The 
above mentioned complex source text can be simplified or 
rewritten to obtain the following simplified source text: “The 
Taj Mahal is one of the wonders of the world. It is located on 
the south bank of the Yamuna river in the Indian city of Agra”. 

B. Named Entities Are Not Identified Correctly.  

Named Entity recognition (NER) should be performed prior 
to MT. So, that named entities are correctly identified and 
spelled (translated or transliterated) correctly.  

IV. METHODOLOGY 

MT Systems are constructed using different combinations 
of ensemble techniques that include classifier based approach, 
source text rewriting and named entity translation. MT systems 
that we have designed are summarized in Table I. 

TABLE I.  MT SYSTEMS FOR ENGLISH TO HINDI TRANSLATION 

Engine No. Scheme 

M1 English-Hindi baseline system 

M2 Classifier based approach incorporated in English-Hindi SMT 

M3 
Source text rewriting approach incorporated in English-Hindi 

SMT 

M4 
English name entity translation system incorporated in English-

Hindi SMT 

M5 
Classifier based approach coupled with English name entity 

translation system and incorporated in English-Hindi SMT 

M6 
Source text rewriting approach coupled with English name entity 
translation system and incorporated in English-Hindi SMT 

 
We have used a testing file consisting of 1100 sentences in 

English. In M4, named entity recognition using Stanford NER 
is used to detect named entities from the English text. These 
named entities are translated into Hindi and sentences 
comprising of named entities in Hindi and non-named entities 
in English are produced. These English-Hindi mixed sentences 
are sent to statistical MT for complete translation into Hindi. In 
M6, at first, sentence reordering is performed using classifier 
based approach. These reordered sentences are sent to Stanford 
NER for named entity recognition. These named entities are 
translated into Hindi and then complete translation into Hindi is 
performed using statistical MT. For human evaluation, we used 
HEval evaluation metric [5]. The language linguistic features 
that have been included in human evaluation metrics are: 

• Translation of gender and number of nouns. 

• Translation of tense in the sentence.  

• Translation of voice in the sentence. 

• Identification of proper noun(s). 

• Use of adjectives and adverbs corresponding to nouns and 
verbs. 

• Selection of proper words/synonyms (lexical choice).  

• Sequence of phrases and clauses in the translation.  

• Use of punctuation marks in the translation.  

• Fluency of translated text and translator’s proficiency.  

• Maintaining the semantics of the source sentence in the 
translation.  

• Evaluating the translation of source sentence (with respect 
to syntax and intended meaning). 

In order to assess the quality of translation, a five point 
scale is employed as shown in Table II. 

TABLE II.  DESCRIPTION OF 5 POINT SCALE IN HUMAN EVALUATION 

Score Meaning 

4 Ideal 

3 Perfect 

2 Acceptable 

1 Partially Acceptable 

0 Not Acceptable 

 
The overall score is computed for all the linguistic features 
using (1): 

Overall	Score =
∑����	��	���	�������

�∗(�����	��.��	������� ��	�������!)
 (1) 

This score is also compared with adequacy and fluency 
score. Adequacy and fluency are represented in Tables III and 
IV respectively. 

TABLE III.  DESCRIPTION OF ADEQUACY ON 5 POINT SCALE 

Score Meaning 

5 Complete Information 

4 Most Information  

3 Much Information 

2 Little Information 

1 None 

TABLE IV.  DESCRIPTION OF FLUENCY ON 5 POINT SCALE 

Score Meaning 

5 Ideal 

4 Good 

3 Non Native  

2 Disfluent 

1 Incomprehensible 

 
V. RESULTS 

We have used 1100 sentences for the mentioned 6 MT 
engines and these sentences were distributed among 10 
documents having 110 sentences each. The combined 
document total score for all 6 MT Engines is shown in Table V. 
The value in bold represents the highest overall score attained 
by the MT Engine. Out of 10 documents, M6 has attained the 
highest overall score in 8 documents. The overall accuracy of 
MT systems is shown in Figure 1. M6 has attained the highest 
overall accuracy of 0.913. 


Engineering, Technology & Applied Science Research Vol. 8, No. 6, 2018, 3512-3514 3514  
  

www.etasr.com Chopra et al.: Improving Translation Quality By Using Ensemble Approach 

 
TABLE V.  DOCUMENT WISE OVERALL SCORE OF MT ENGINES 

 M1 M2 M3 M4 M5 M6 

DOC 1 0.42456 0.53451 0.79728 0.67322 0.8027 0.91302 

DOC 2 0.47223 0.64324 0.8685 0.6827 0.79626 0.87742 

DOC 3 0.45623 0.57187 0.8732 0.64068 0.7837 0.89334 

DOC 4 0.54163 0.65432 0.88698 0.7396 0.80064 0.8899 

DOC 5 0.43274 0.57435 0.89732 0.73058 0.8045 0.91114 

DOC 6 0.56847 0.62341 0.96182 0.74444 0.84218 0.96932 

DOC 7 0.57324 0.62156 0.95778 0.71096 0.8334 0.91216 

DOC 8 0.54628 0.68942 0.9392 0.73568 0.84058 0.9183 

DOC 9 0.53404 0.75842 0.92698 0.7437 0.84706 0.9294 

DOC 10 0.54231 0.74351 0.93215 0.72451 0.82568 0.91245 

 
Fig. 1.  Overall MT systems accuracy. 

VI. CONCLUSION 

In this research paper, we showed that using ensemble 
techniques, the quality of English to Hindi MT improves. We 
have designed 6 MT systems and performed our experiment on 
1100 English sentences. The MT engine designed using source 
text rewriting approach coupled with English name entity 
translation system and incorporated in English-Hindi SMT has 
shown the highest overall accuracy of 0.913. 

REFERENCES 

[1] R. Srivastava, R. A. Bhat, “Transliteration systems across indian 
languages using parallel corpora”, 27th Pacific Asia Conference on 
Language, Information, and Computation, pp. 390-398, Taiwan, 
November 21-24, 2013 

[2] V. H. Yngve, “The machine and the man”, Mechanical Translation, Vol. 
1, No. 2, pp. 20-22, 1954 

[3] B. Vauquois, “A survey of formal grammars and algorithms for 
recognition and transformation in machine translation”, IFIP Congress 
(2), Vol. 68, pp. 1114-1122, UK, August 5-10, 1968 

[4] W. Weaver, “Translation”, in: Machine Translation of Languages, pp. 
15-23, 1955 

[5] N. Joshi, I. Mathur, H. Darbari, A. Kumar, “HEval: Yet another human 
evaluation metric”, International Journal on Natural Language 
Computing, Vol. 2, No. 5, pp. 21-36, 2013 

 
0,509173

0,641461

0,904121

0,712607
0,81767

0,912645

0

0,2

0,4

0,6

0,8

1

1 2 3 4 5 6

O
v

e
ra

ll
 A

c
c
u

ra
c
y

MT Engines