Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844
Vol. V (2010), No. 3, pp. 301-313

SRoL - Web-based Resources for Languages and Language Technology
e-Learning

S.M. Feraru, H.N. Teodorescu, M.D. Zbancioc

Silvia Monica Feraru
Institute for Computer Science of the Romanian Academy, Iaşi, Romania
E-mail: mferaru@etti.tuiasi.ro

Horia-Nicolai Teodorescu
Institute for Computer Science of the Romanian Academy, Iaşi, and
Technical University "Gheorghe Asachi" of Iaşi, Romania
E-mail: hteodor@etti.tuiasi.ro

Marius Dan Zbancioc
Institute for Computer Science of the Romanian Academy, Iaşi, and
Technical University "Gheorghe Asachi" of Iaşi, Romania
E-mail: zmarius@etti.tuiasi.ro

Abstract: The SRoL Web-based spoken language repository and tool collection includes
thousands of voice recordings grouped in sections such as "Basic sounds of the
Romanian language", "Emotional voices", "Specific language processes", "Pathological
voices", "Comparison of natural and synthetic speech", and "Gnathophonics and
gnathosonics". The recordings are annotated and documented according to proprietary
methodology and protocols. Moreover, the site includes extended documentation
on the Romanian language, on speech technology, and on tools for voice analysis
produced by the SRoL team. The resources are part of the CLARIN European
Network for Language Resources. The resources and tools are useful in virtual
learning for phonetics of the Romanian language, speech technology, and medical
subjects related to voice. We report on several applications in language learning and
voice technology classes. Here, we emphasize the use of the SRoL resources
in education for medicine and speech rehabilitation.
Keywords: spoken language resources, voice education, gnathosony, gnathophony,
education, speech rehabilitation.

1 Introduction

In a world where the Web and Internet communication are pervasive, the computer is more than a
study topic for everyone; it is a ubiquitous tool. Computers serve for more than doing computations:
they are now among the most used means of communication and interaction, the very basis of any
educational system. As a consequence, computer-based education is an obvious choice whenever
distance separates the learner and the teacher. In a general sense, computer-based education and
Internet-based virtual education are today an undeniable fact of life on every academic campus [28],
[29]. While computers and the network are the means, spoken language remains the prevalent
support of communication in the teaching-learning process. Hence the natural need to address e-learning
and virtual learning of languages, phonetics, voice pathology, and other aspects related to voice and
spoken language.

In view of the above, over a period of about five years we built a web site that offers the
possibility of teaching and learning various aspects of the Romanian language, based on an annotated
corpus freely accessible on the Internet. The corpus is complemented with in-depth phonetic
and linguistic analyses, as well as with specific tools accessible to users everywhere through the
web [16], [17], [18], [19], [25]. This instrument has a high level of dimensionality and aims to cover
numerous aspects of the language that are not typical features in language corpora. This makes this
"corpus-tool" a unique instrument of its kind in the domain today [22].

Copyright © 2006-2010 by CCC Publications

In recent years, we developed an emotional speech database that can help in speech education
and re-education, in diagnosis and treatment, and in computer-aided language learning;
examples of related published results are [5], [16], [22].

Voice and language e-education is a topic addressed by many research and educational groups.
Solomon [13] studied the possibilities and issues of learning with and about computers in schools or
in other learning environments. The Eric Education Resources Page shows the importance of computer-assisted
education of speech and voice [24]. In addition, web-based educational resources and
training have received attention during the last decade. Ake Olofsson [10] offers a simple method of
compensating for word decoding problems by using a computer that pronounces the words that
cannot be read. Olofsson developed a program for the IBM-PC/AT and a Scandinavian multilingual
text-to-speech unit that children can use to read a text file on the monitor and to request, using a mouse, the
pronunciation of any word from that text [10].

Computer-assisted language learning software supports the interaction between student and computer
through speech, sound effects, animation, and video. However, the student's input is
typically restricted to the mouse and keyboard. An active interaction through spoken language enhances
computer-based educational tools [1]. In computer-assisted language learning, speech recognition offers
the possibility of active participation through oral reading and conversation. The CALL system
reported in [1] includes recordings spoken by native speakers. The user can compare
the quality of her pronunciation with the model recordings.

In another direction of research, Warschauer [23] examines the uses of online communication for
language teaching. He found that interest in this domain grows day by day and proposed a
conceptual framework for understanding the role of computer-assisted interaction [23]. Lundberg
considers the computer a remediation tool in the education of students with reading disabilities, such as
dyslexic students, who can benefit from computer training in correct reading and spelling of words [9].

A speech database is a collection of sound files, structured according to its purpose. The
SRoL resource (corpus) is located at www.etc.tuiasi.ro/sibm/romanian_spoken_language/index.htm.
The initiator conceived SRoL as an Internet-based "dictionary of sounds and words" for
the Romanian language, supplemented with specific manifestations of voice (including pathologies) and
various tools. The SRoL database includes files with vowels, consonants, diphthongs, sentences with
emotional states, linguistic particularities of the Romanian language, dialectal voices, and gnathosonic
and gnathophonic sounds. It is the first Internet-based annotated database of emotional speech for the
Romanian language and contains more than 1500 recordings in different formats (.wav, .ogg, .txt,
22 kHz sampling rate, 24-bit or 16-bit precision). The phonetic recordings in SRoL, which refer to an
annotated emotional speech corpus (database), are registered with ORDA.

2 The SRoL resources and the SRoL web site

The SRoL corpus evolved from a small research and educational speech database around 1995 (see
Annex 1). It currently includes several sections, all freely available on the web. The main sections are:

i) Standard pronunciation of vowels, diphthongs, words and short sentences in Romanian; the recordings
in this section are appropriate for learning correct pronunciation in Romanian, as well as for statistical
research on Romanian phonetics;

ii) Special syntactic constructs (linguistic peculiarities), like double subject and apposition; this sec-
tion is research-oriented;

iii) Emotional voices;


iv) Analytic comparison between the synthetic and natural speech [27];

v) Dialectal utterances;

vi) A small archive of gnathosonic/gnathophonic sounds (included in the general "Archive of Sounds").

Beyond the main sections, the SRoL site includes an introductory section on the phonetics of the
Romanian language, descriptions of the recording protocols and of the methodology, analysis
tools (free software), extended research documentation, a video application, references, and a list
of potentially useful links. The SRoL team developed signal processing instruments for the
extraction of patterns from voice signals and for computing the fundamental frequency (pitch) traces
and the traces of the formants F1, F2, F3. Besides the executable programs, the site offers descriptions
for each of these tools. Those descriptions are intended for "general use", offering elementary
explanations and relevant references for a better understanding [4], [22].

In this paper, we provide details about applications of the SRoL corpus, available at
http://www.etc.tuiasi.ro/sibm/romanian_spoken_language/index.htm.

3 SRoL as support for learning the Romanian language

One of the goals of the SRoL web site is to provide a free Romanian database for students, researchers,
linguists, and teachers, for teaching, learning, and analyzing the sounds of the Romanian language.
The database includes the pronunciation corpus and related documentation. Among others, the database
contains sections with:

- recordings of syllables and words pronounced in various contexts, such as accentuated words, interrogative
sentences, exclamations, various emotions conveyed by the speaker, etc. This part of the database
is intended as a source for concatenative synthesizers and as a benchmark for voice recognition systems
(isolated words) based on statistical models of language and speech, such as [26];

- files of sounds, syllables and words pronounced by persons with various pathologies; this section
may be useful in medical and phonological research;

- files with professional voices ("perfect" pronunciations), as well as non-professional voices, the
"voices of the people in the street". For the moment, we concentrate on voices from the Iaşi region (East
Romania) and the middle area of Moldova.

Learning and teaching languages require well documented audio-visual tools that exemplify and
fully explain pronunciation for a large variety of voices and of contextual and emotional states. While former
methods, like tape recordings and audio disks, have been helpful, multimedia Internet-based tools
offer greatly increased capabilities. SRoL represents such a tool for the Romanian language. Not
only is it the first for the Romanian language, but its multidimensionality also makes it
novel in concept for language learning and teaching in general.

As an example of use, consider the case of a foreign student who wants to improve her Romanian
pronunciation by comparing the prosody of her voice with the prosody of native speakers. The student
utters a sentence (from those included in the site), then opens WASP™ or another similar tool and
displays the energy and fundamental frequency of her voice. She then compares these prosodic features
to those of native speakers and tries to improve her prosody until she produces correct prosodic
patterns. The student can also compare formant values and try to improve the formants of the vowels she
pronounces.

This instrument is useful for learning to improve speech communication, as well as for human-computer
speech interaction, for security, for medical applications, for video games and interactive TV,
for teachers, in the study of the Romanian language, etc.


4 Applications in medical education and re-education of speech

Application fields such as language learning, professional voice education, and voice rehabilitation and
re-education for medical conditions have different requirements and are based on different methods.
Education in medicine (ORL, phoniatrics, dentistry) and in logopedics are further
fields of potential application of speech resources. In addition, voice analysis for diagnosis is a domain that
has seen significant progress in recent decades. Voice education is needed whenever a voice pathology,
including some neurologic and psychiatric disorders, or a pathology of the vocal tract occurs. Several
groups have addressed the voice re-education topic [9], [10].

4.1 SRoL resources for minor voice pathology

So far, we have included in SRoL words pronounced by persons with minor pathologies, such as trembling
voice. We have demonstrated in our research that splitting the signal into frequency bands that correspond
to the peaks of the F-F formants and respectively to the peaks of F-F formants helps improving
the discrimination process in a significant way. The use of fractal dimensions in assessing the jitter or
shimmer in voice produces mixed results [21]. When other fractal dimensions are added, the rate of recognition of
the tremor segments in voice improves, but it is still low [21]. The voice pathology section of the database
is useful in medical and phonological research. Also for use in medical education, the site comprises a
gnathosonic and gnathophonic corpus.

4.2 SRoL resources for gnathosony

Gnathosonic analysis refers to the analysis of sounds produced during occlusion, due to the
closing of the mandible over the maxillary at some stage in masticatory-like movements. Watt (cited
in [7], [8], [11], [12]) initiated the analysis of these sounds, with application to the diagnosis of the state
of the stomato-gnathic apparatus, during the 1960s and 1970s. The method has attracted some interest, but it
is not yet a current method in clinical practice.

The shape of the envelope of an occlusal sound is determined by the number of occlusal contacts
and by the dynamics of the terminal part of the occlusion, namely by the dynamics of the sliding of
the teeth, from the first contact until the equilibrium position in occlusion. A characterization of the
waveform should take into account the need to correlate the sound with the medically relevant processes
of contact and sliding. A limit in the occlusal sound analysis has been the complexity and the variability
of shapes of the sound wave. The envelope of a single contact sound is characterized by the rise and
fall times, value of the maximum, duration of the maximum, and total duration. The rise and fall curves
follow exponential laws, whose constants are of interest in the classification of the occlusal dynamics.
For gnathosonic purposes, the sound signal s(t) generated by occlusion (teeth impact when closing the
mouth, as in mastication) and discretized as s[n] is first filtered by an elementary high-pass, differential
filter, s[n + 1] ← s[n + 1] − s[n]. Then, the signal is filtered with a nonlinear filter introduced in [14].
The filter first extracts the rough envelopes, averages them, applies median filters to them, sums the two
resulting envelopes, and then applies an averaging filter to the sum [14]:

$$u_{\inf}[n] = \min_{k=-K_1,\dots,K_1} s[n+k], \qquad u_{\sup}[n] = \max_{k=-K_1,\dots,K_1} s[n+k]$$

$$v_{\inf}[n] = \frac{1}{2K_2+1} \sum_{k=-K_2}^{K_2} u_{\inf}[n+k], \qquad v_{\sup}[n] = \frac{1}{2K_2+1} \sum_{k=-K_2}^{K_2} u_{\sup}[n+k].$$

The next stage in the filtering is median filtering on a moving window,

$$z_{\inf}[n] = \operatorname{median}_{k=-K_3,\dots,K_3} \{ v_{\inf}[n+k] \}, \qquad z_{\sup}[n] = \operatorname{median}_{k=-K_3,\dots,K_3} \{ v_{\sup}[n+k] \},$$

and the two envelopes are summed, in the sense

$$y[n] = \frac{|z_{\inf}[n]| + z_{\sup}[n]}{2}, \qquad e[n] = \frac{1}{2p+1} \sum_{k=-p}^{p} y[n+k].$$

Here $K_1$, $K_2$, and $K_3$ denote the half-widths of the respective moving windows.

We used a window of width 6 (p = ) for the last averaging. The widths of the windows in the above
operations depend on the signal sampling frequency used in the recording process. The envelope of the
signal is determined by taking the maximal respectively minimal value in a moving window, according
to a procedure similar to the one explained for the filtering process. The envelope, e(t), is itself low-pass
filtered and then used for determining the occlusal sound parameters. The heuristic procedure applied
to determine the duration of the occlusal sounds by forming "binary" impulses during the valid occlusal
sound is:

if (e[n] > c) and B(e[n − ], ..., e[n + ]) > c then h[n] = .,
else h[n] = ,

where B is a binary function (taking only 0 and 1 values) defined by

B = [max (e[n − ], ..., e[n − ]) > c] & [max (e[n + ], ..., e[n + ]) > c] .

The constants were chosen semi-empirically, as a function of the amplitude of the signal, c.. ∼ As where
As is the average amplitude of the signal after filtering (actually, we used the average amplitude of the
sum of the envelopes), and the window width, 14, is determined by tests. We used the values c = .,
c = ., c = ., which correspond to the average signal A = ., determined as explained. For
a normalized amplitude A, A = , the constants are about c = ., c = ., c = .. The detection
procedure can be further improved by reducing the false positives by imposing that the skewness of the
impulse is larger than +.; typical values for the skewness are larger than ., showing that the rise of
the impulse is significantly faster than the decreasing part.
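
The envelope-extraction steps described in this section (first-difference filtering, moving minimum and maximum, averaging, median filtering, combination of the two envelopes, and final smoothing) can be sketched in a few lines of Python. This is only an illustrative sketch, not the SRoL implementation: the function names and all window half-widths (`k_minmax`, `k_mean`, `k_median`, `p`) are assumptions chosen for the example, since the appropriate widths depend on the sampling frequency. The sketch also computes the first difference from the unmodified samples and clips the windows at the signal edges, which are further assumptions.

```python
from statistics import median


def moving(values, half_width, reducer):
    """Apply `reducer` to a centered moving window, clipped at the edges."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(reducer(values[lo:hi]))
    return out


def mean(window):
    """Arithmetic mean of a non-empty window."""
    return sum(window) / len(window)


def occlusal_envelope(s, k_minmax=3, k_mean=3, k_median=3, p=3):
    """Envelope e[n] of an occlusal sound (all half-widths are assumed values)."""
    # Elementary high-pass (first-difference) filter.
    d = [0.0] + [s[i + 1] - s[i] for i in range(len(s) - 1)]
    # Rough lower and upper envelopes: moving minimum and maximum.
    u_inf = moving(d, k_minmax, min)
    u_sup = moving(d, k_minmax, max)
    # Moving averages followed by moving medians.
    z_inf = moving(moving(u_inf, k_mean, mean), k_median, median)
    z_sup = moving(moving(u_sup, k_mean, mean), k_median, median)
    # Combine the two envelopes, then smooth with a final moving average.
    y = [(abs(a) + b) / 2 for a, b in zip(z_inf, z_sup)]
    return moving(y, p, mean)
```

Applied to a short synthetic "click" (a decaying alternating burst surrounded by silence), the returned envelope is zero far from the burst and peaks near its onset; thresholding such an envelope is the starting point for the duration heuristic described above.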

5 Research support in gnathophonics and gnathosonics

In previous research, we identified several ways in which the pathology of the stomato-gnathic system
influences speech:

i) The lack of the frontal dentition, namely of the upper teeth, may dramatically change the spectrum
of the fricative consonants.

ii) The lack of the upper teeth may significantly modify the spectrum of the dento-alveolar sounds
t, d, n, and l. (Notice that these sounds are rather alveolar in English, while in some other languages,
like Spanish and Romanian, they may be dental. Therefore, the influence of the dentition on phonation
is language-dependent.)

iii) Limited mobility and pain in the temporo-mandibular joint (TMJ) impede the production
of fast transient vowels, especially in the diphthongs where the second vowel is pronounced with a widely
opened mouth, like oa, ea, ua.

iv) The uncertainty in uttering due to a forcing in the TMJ, or to a poor neuro-muscular control
may produce a tremor of the voice (fast amplitude changes, errors in the attacks, i.e. error in transitory
regimes etc.).

v) Neurological pathology of the buccal cavity may impair the accuracy of the pronunciation,
including deficient starting of words.

vi) Defective mobile prostheses may produce extra sounds, especially when the mouth is opened quickly
for pronunciation; moreover, they may produce clicks before the utterances.


vii) Prostheses of the upper teeth that do not provide a physiological "V"-shaped space between
the teeth impede the pronunciation of the fricatives, for example f.

viii) The fricative consonants and the labial vowels are especially affected by the state of the
dentition.

The s consonant uniformly occupies a large spectrum for a healthy dental apparatus, while it has a
multi-band spectrum when the upper front teeth are missing or have deficiencies. The pronunciation of
s and v may become close to that of f. For subjects with mobile prostheses, we noticed an uncertainty at
the start of the utterance.

The difference ratio in amplitude spectra is a parameter defined as:

$$\Delta S = \sum_{k} \frac{|S_1(f_k) - S_2(f_k)|}{S_1(f_k) + S_2(f_k)}$$

where $f_k$ is the k-th frequency in the FFT (Fast Fourier Transform) power spectrum of the two sounds
and $S_1$, $S_2$ are the average power spectra of the two sounds. For two similar sounds uttered by the same
speaker, a difference larger than 50% means that the sounds are clearly distinguishable, while a difference
smaller than 10% means that the sounds are indistinguishable. For example, if the average spectra for
two sustained utterances of f and v have a ∆ S index of 40%, they will be distinguished by a listener,
while if ∆ S = %, they will be confused. We proposed the sustained consonant differential analysis
as a method to further assess the impairment of speech production due to dentition. For this test, two
similarly produced sounds are generated in a sustained mode and their spectra contrasted. For example,
the sounds f and v are both at least partly fricative (v can be a semi-vowel, only partly fricative) that
may be poorly produced due to imperfect dentition or neurological control. We conclude this section by
stressing that gnathophonic testing should become a standard test for the dentist in the near future. The
knowledge in the field is only emerging today, and fully developed commercial tools are still lacking, but
the importance of the domain cannot be refuted [7], [8], [11], [12]. The proposed tests are non-invasive,
objective, and purely instrumental, hence their importance in the evaluation of the health state of the
buccal system. These methods can easily be extended to remote, web-based diagnosis.
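
The difference ratio between two average power spectra, as defined earlier in this section, can be illustrated with a short Python sketch. The function name `difference_ratio` and the `normalize` option are assumptions made for this example. The formula sums the per-bin ratios; to express the result as a fraction between 0 and 1 (so that it can be read as a percentage), the sketch optionally divides by the number of non-empty bins, which is an assumed normalization rather than something stated in the text.

```python
def difference_ratio(s1, s2, normalize=True):
    """Difference ratio between two power spectra sampled on the same bins."""
    if len(s1) != len(s2):
        raise ValueError("spectra must share the same frequency grid")
    total = 0.0
    used = 0
    for a, b in zip(s1, s2):
        if a + b > 0:  # skip empty bins to avoid dividing by zero
            total += abs(a - b) / (a + b)
            used += 1
    if not normalize:
        return total  # plain sum over the bins, as in the formula
    # Assumed normalization: average per-bin ratio, in the range [0, 1].
    return total / used if used else 0.0
```

With this normalization, two identical spectra give 0, and two spectra with no overlapping energy give 1 (that is, 100%).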

In figure 1, we show examples of gnathophonic (a) and gnathosonic (b) recordings (for the speaker
19743m). Figure 1(a) shows recordings of the Romanian words "vata", "fata", "var",
intended to make evident the similarities and differences in the pronunciations (Fourier spectra) of the consonants f
and v in the same context (beginning of the word, same _CVC structure, with the same vowels and consonants,
where _ denotes the beginning of the word). This is one of the specific choices of words proposed
by the second author to determine when dentition defects produce confusion between the uttered f and v sounds.
By analyzing such recordings available at SRoL, students can learn how to differentiate the normal and
pathological states.

Figure 1: Gnathophonic (a) and gnathosonic (b) recording with details, tool GoldWave™


6 Applications in teaching the voice signal technology classes

Signal technology classes are taught around the world, especially for master's degrees in computer
science and electrical engineering, as well as in some departments of linguistics and in a few medical
centers. Some universities and educational institutions have developed their own databases and tools for speech
processing. For example, the Center for Spoken Language Understanding (CSLU) makes available language
databases for speech and hearing science. These resources are important for analyzing
speech, for diagnosing and treating speech and language problems, for training students, and so on. The
tools and the corpora are distributed to over 2000 sites in 65 countries [2]. In education, these tools help
students learn about speech, learn a new language, learn through interactive media systems, or become
accustomed to hearing normal and abnormal voice signals.

The second author currently uses the SRoL corpus in teaching and laboratory activities in the class
"Speech Technology" given for the master's degree in "Computational Linguistics" at the Faculty of Computer
Science, "Al.I. Cuza" University of Iaşi. Details on the use in Voice Technology classes of some
topics from SRoL are described in [4]. At the international EUROLAN 2007 summer school, the second
author used the SRoL site to present "Traces of emotion, intentions and meaning in spoken Roma-
nian" (http://eurolan.info.uaic.ro/html/profs/HNTeodorescu.html). The second author taught the specific
methodology aspects, results obtained on the characterization of emotions in speech, possibilities of
recognition of emotions and intentions in speech, and the relationship between specific meanings and
the prosody in specific constructions in the Romanian language. The lesson exemplified applications
of analysis of the speech emotional prosody to social, psycho-social, educational, and psycho-medical
topics.

7 Software tools: pitch (F0) extractor

The extraction of the fundamental frequency F0 values combines four different methods: i) the autocorrelation
method (analysis in the time domain); ii) the Average Magnitude Difference Function method,
AMDF (analysis in the time domain); iii) the Harmonic Product Spectrum method, HPS (based on spectral
analysis); iv) the cepstral method (analysis in the quefrency domain), also applied in searching for the higher
formants.

The autocorrelation method is a classical method for pitch detection in the time domain. The method
is based on the quasi-periodicity of the voice signal: the autocorrelation has a local maximum at the delay that
corresponds to the signal period. In the case of the AMDF method, the local minimal values are detected, and
these values provide the necessary information to compute the fundamental period T0:

$$C_k = \frac{1}{N} \sum_{n=1}^{N} x_n \cdot x_{n+k}, \qquad k = 1, \dots, W$$

$$D_k = \frac{1}{N} \sum_{n=1}^{N} |x_n - x_{n+k}|, \qquad k = 1, \dots, W$$

Here, Ck is the autocorrelation coefficient and Dk is the difference function coefficient for a delay k, xn is the n-th
sample of the signal, N is the number of correlation coefficients, and W is the width of the analysis window.
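
As an illustration of the two time-domain methods, the following Python sketch estimates the pitch period of a frame by taking the delay that maximizes the autocorrelation and the delay that minimizes the average magnitude difference. It is a minimal sketch, not the SRoL tool: the function names, the search range `k_min`..`k_max`, and the synthetic test frame are assumptions made for the example, and the AMDF is written with absolute differences, following the name of the method.

```python
import math


def autocorrelation(x, k, n_terms):
    """Autocorrelation coefficient C_k over the first n_terms samples."""
    return sum(x[n] * x[n + k] for n in range(n_terms)) / n_terms


def amdf(x, k, n_terms):
    """Average magnitude difference D_k over the first n_terms samples."""
    return sum(abs(x[n] - x[n + k]) for n in range(n_terms)) / n_terms


def pitch_period(x, k_min, k_max):
    """Return the period (in samples) found by the ACF and by the AMDF."""
    n_terms = len(x) - k_max  # keeps x[n + k] inside the frame
    lags = range(k_min, k_max + 1)
    by_acf = max(lags, key=lambda k: autocorrelation(x, k, n_terms))
    by_amdf = min(lags, key=lambda k: amdf(x, k, n_terms))
    return by_acf, by_amdf


# Synthetic frame (assumed values): 200 Hz tone plus its second harmonic at 8 kHz.
fs, f0 = 8000, 200
frame = [math.sin(2 * math.pi * f0 * n / fs)
         + 0.5 * math.sin(4 * math.pi * f0 * n / fs) for n in range(460)]
acf_lag, amdf_lag = pitch_period(frame, k_min=20, k_max=60)
print(fs / acf_lag, fs / amdf_lag)  # F0 estimates in Hz
```

Restricting the search range matters in practice: if the range includes a multiple of the true period, both criteria may select it, which is one source of the subharmonic errors that the correction stage has to handle.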

The HPS method (Harmonic Product Spectrum) is based on the property that the spectrum of a periodic
signal with fundamental frequency F0 has maximal spectral values at the multiples of this frequency,
2F0, 3F0, 4F0, ... (the harmonics of the fundamental). When the spectrum is rescaled by the factors 1/2,
1/3, 1/4, ... through the decimation operation, the resulting spectra all have
a spectral maximum at the fundamental frequency F0; multiplying them strongly attenuates
the other maximal values of the spectrum.


$$H^{k}_{n} = H_{k \cdot n} \quad \text{(decimation)} \qquad \text{or} \qquad H^{k}_{n} = \frac{1}{k} \sum_{i=0}^{k-1} H_{k \cdot n + i}$$

The cepstral method relies on the separation of the spectrum of the sound generator, Hg (which provides
the information regarding the fundamental frequency), from the spectrum of the vocal signal filter,
Hf (which describes the model of the resonating cavities). In the cepstral formula, the multiplication
of the spectra of the excitation signal and of the transfer function is transformed, using logarithms, into
an addition:

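$$H(\omega) = \mathrm{FFT}(s) = H_g(\omega) \cdot H_f(\omega)$$

$$\text{cepstrum} = \mathrm{IFFT}(\log|\mathrm{FFT}(s)|) = \mathrm{IFFT}(\log|H_g(\omega) \cdot H_f(\omega)|)$$

$$\text{cepstrum} = \mathrm{IFFT}(\log|H_g(\omega)|) + \mathrm{IFFT}(\log|H_f(\omega)|)$$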

where FFT is the Fast Fourier Transform, and IFFT is the inverse FFT.
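
The separation performed by the cepstrum can be illustrated with a small Python sketch. It is a minimal example under stated assumptions, not the SRoL code: the frame is synthetic (two excitation pulses 30 samples apart, shaped by a short decaying filter), the transforms are naive DFTs suitable only for short frames, and the function names and the quefrency search range are chosen for the example.

```python
import cmath
import math


def log_magnitude_spectrum(x):
    """log|DFT(x)| for all bins, using a naive DFT (short frames only)."""
    n = len(x)
    return [math.log(abs(sum(x[t] * cmath.exp(-2j * math.pi * b * t / n)
                             for t in range(n))))
            for b in range(n)]


def cepstral_peak(x, q_min, q_max):
    """Quefrency (in samples) with the largest real-cepstrum value in the range."""
    n = len(x)
    log_mag = log_magnitude_spectrum(x)

    def cepstrum_at(q):
        # Inverse DFT of a real, even sequence: only the cosine terms remain.
        return sum(log_mag[b] * math.cos(2 * math.pi * b * q / n)
                   for b in range(n)) / n

    return max(range(q_min, q_max + 1), key=cepstrum_at)


# Synthetic frame (assumed values): excitation pulses 30 samples apart,
# convolved with a short decaying "vocal tract" response.
n_samples, period = 256, 30
excitation = [0.0] * n_samples
excitation[0], excitation[period] = 1.0, 0.5
h = [0.6 ** i for i in range(10)]
frame = [sum(h[i] * excitation[t - i] for i in range(len(h)) if t - i >= 0)
         for t in range(n_samples)]
peak = cepstral_peak(frame, q_min=20, q_max=50)
print(peak)  # quefrency of the excitation period, in samples
```

The contribution of the filter is concentrated at low quefrencies, so searching above a minimum quefrency isolates the peak produced by the excitation period; the lower part of the cepstrum is what a formant search would use.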
The results of the F extraction methods are compared in a decisional block, and a selection algorithm

is used if there are significant differences. Another algorithm compares a current value with a number of
neighboring values in order to select the nearest one, moreover compares the current values with mean
values of F.

The error correction of the F0 extractors is performed through three methods:

- comparing the "neighbors": using the results provided by the same F0 extractor, if a difference
between two consecutive values greater than a specified threshold (usually 10-20%) is detected,
the corresponding samples are considered errors;

- if the absolute difference between the current value of F0 and the average fundamental
frequency is greater than twice the standard deviation, then we consider the value erroneous;

- if the current value of F0 is below 60% or above 150% of the average value of F0, then we consider
that the corresponding value is incorrect.
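
The three error-detection rules can be sketched in Python as follows. This is an illustrative sketch, not the SRoL implementation: the function name, the default threshold of 20% for the neighbor rule, the use of the sample standard deviation, and the choice to report the three rules separately are assumptions made for the example.

```python
from statistics import mean, stdev


def flag_f0_errors(f0, jump=0.2, low=0.6, high=1.5, n_sigma=2.0):
    """Flag suspicious F0 values (Hz) according to three separate rules."""
    avg = mean(f0)
    sigma = stdev(f0) if len(f0) > 1 else 0.0
    flags = {"neighbor": [], "deviation": [], "band": []}
    for i, value in enumerate(f0):
        # Rule 1: relative jump with respect to the previous value.
        flags["neighbor"].append(
            i > 0 and abs(value - f0[i - 1]) > jump * f0[i - 1])
        # Rule 2: more than n_sigma standard deviations away from the mean.
        flags["deviation"].append(abs(value - avg) > n_sigma * sigma)
        # Rule 3: outside the [low, high] band around the mean.
        flags["band"].append(value < low * avg or value > high * avg)
    return flags
```

For a track such as 120, 122, 119, 240, 121, 118, 120, 123 Hz, the second and third rules flag only the 240 Hz value, while the neighbor rule flags both the jump up to 240 Hz and the jump back down, so the way the rules are combined is a design choice left open here.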

The threshold values were determined empirically, and the final correction is accomplished by applying
all three correction methods described. The decision block receives the F0 values provided by
the detection methods (AMDF method, autocorrelation method, HPS method, and cepstral
method). To achieve the best possible pitch detection, the output values are weighted according to the
performance of each F0 extractor. We assign smaller weights to the methods with a higher probability of
providing incorrect outputs. False detections of the fundamental frequency often consist in selecting
the first subharmonic or the first harmonic of F0. When these "false" detections are not repaired by the
correction module, we have two options:

- comparing the outputs of different F0 detection methods for the same window of analysis;
- comparing the outputs with a number of previous final results provided by the decision block.
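
A decision block of this kind can be sketched in Python as a weighted fusion with a simple octave check. The sketch below is only one possible reading of the description: the function name, the weights, the 15% tolerance, and the use of the median of the previous final results as reference are all assumptions made for the example.

```python
def decide_f0(estimates, weights, previous=None, tolerance=0.15):
    """Fuse the F0 estimates (Hz) of several extractors into a single value."""
    if previous:
        # Reference taken from previous final results of the decision block.
        ordered = sorted(previous)
        reference = ordered[len(ordered) // 2]
    else:
        # Otherwise compare the methods among themselves (weighted mean).
        reference = sum(w * f for f, w in zip(estimates, weights)) / sum(weights)
    corrected = []
    for f in estimates:
        # Undo a likely octave error: first harmonic (2 * F0) or
        # first subharmonic (F0 / 2) relative to the reference.
        for factor in (0.5, 2.0):
            if abs(f * factor - reference) <= tolerance * reference:
                f = f * factor
                break
        corrected.append(f)
    return sum(w * f for f, w in zip(corrected, weights)) / sum(weights)
```

For instance, with estimates of 121, 119, 240, and 60 Hz and weights 0.3, 0.3, 0.2, and 0.2, the doubled and halved values are folded back before averaging, and the fused result is close to 120 Hz both with and without previous results around 120 Hz.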

8 Discussion

Our team has a long standing experience with using novel technologies in teaching, lasting for three
decades [3], [7], [15], [20]. We applied that experience to the SRoL e-teaching and e-learning resource.

The SRoL resource is a vast annotated corpus of speech files complemented by tutorials, papers,
and additional files, as well as by tools for speech processing. If used by an experienced student
or teacher, it may become a powerful tool for teaching and learning Romanian pronunciation,
speech technology, and voice pathology and re-education. The SRoL voice resource is
useful in many domains, including phonology, applied computer science, and medicine. Students and
researchers may use this freely accessible site for learning the pronunciation of the Romanian language, for
making comparative studies between Romanian and other languages, for the development of synthetic voice
systems, and for other linguistic, phonetic, socio-linguistic or medical applications.

This database is structured according to precise criteria, and is documented and annotated according
to a well-defined methodology. The site has more than 1500 recordings of syllables, words, and sentences
with various tonalities and pronounced with various emotional states. The database contains recordings
of professional and normal voices, from the North-East region of Romania, without dialectal accent.

The SRoL resources have been recognized by several bodies, beyond the scientific publications that
included our papers on SRoL. The CLARIN European Network of Language Resources accepted SRoL as
a member; ORDA (the Romanian Office for Authorship Rights) registered the original recordings; and
SRoL received a gold medal and media attention at the INVENTICA 2009 fair for inventions and
creativity. Also, the website of the Embassy of France in Romania briefly described in its Bulletin the
SRoL site and its use in education (http://www.bulletins-electroniques.com/actualites/58811.htm). The
Technical University "Gheorghe Asachi" of Iaşi intends to use SRoL in helping foreign students enrolled
at this university to learn the correct Romanian pronunciation.

We hope the SRoL resources will be used in all the universities in Romania by foreign students
who learn the Romanian language, as well as in other academic settings and as an online tool by foreign
students and teachers. We welcome any request for help and educational advice from all those who wish
to use SRoL and the language-related web resources in virtual e-teaching and for e-learning.

9 Conclusions and future work

The SRoL annotated speech corpus constitutes the first extensive educational and research web
speech corpus for the Romanian language. We believe it also constitutes a speech repository unique
in many respects, including the first international language and sound resources for gnathophony and
gnathosony, the first resources for comparative study of appositions and double subject constructions,
as well as specific features such as the rigorous methodology we used for documenting the records.

The objectives for the next two years are to increase the speech database by about 1000 annotated
recordings and to significantly extend the medical-oriented section of the resources. Also, we intend to
add more tools for speech processing, including statistical tools on the GRID.

Acknowledgements

The authors have been partly supported by the Romanian Academy, moreover the second author has
been partly supported by a grant of the Ministry of Education and Science of Romania, during 2005-
2006.

NOTICES
1. A partial version [6] of this paper was presented at the ICVL 2009 conference and received the
INTEL Special Award for Education (2009).
2. The authors' contributions: the gnathophonic and gnathosonic research was performed by
the second author, who also wrote the corresponding sections of the paper (Sections 2, 4, 5, 6, and 8) and
contributed to writing the other sections; the first author helped with further recordings and with their
inclusion on the web page.



Bibliography

[1] K. Cameron, Computer Assisted Language Learning (CALL) Media, Design, and Applications,
Taylor & Francis, ISBN: 902651543X,
http://www.google.com/books?id=dO_sNQlWhrsC&printsec=frontcover&dq=related,ISBN0940753030&hl=ro&source=gbs_similarbooks_s&cad=1.

[2] R.A. Cole, Tools for Research and Education in Speech Science, Proc. Int. Conf. for Physics Stu-
dents, 1999, www.cslu.ogi.edu/toolkit/pubs/pdf/cole_ICPS_99.pdf.

[3] F. De Coulon, E. Forte, D. Mlynek, H.N. Teodorescu, St. Suceveanu, Subject State Analysis by
Computer in CAE, Proc. Int. Conf. on Intelligent Technologies in Human-Related Sciences, Leon,
Spain, Vol. 2, pp. 243-250, 1996.

[4] D. Cristea, H.N. Teodorescu, D.I. Tufis, Student Projects in Language and Speech Processing, 4th
Conf. on Language Resources and Evaluation, Lisbon, Portugal, Workshop on Language Resources:
Integration and Development in E-learning and in Teaching Computational Linguistics, pp. 17-22,
2004, http://nats-www.informatik.uni-hamburg.de/view/Main/AcceptedPapers.

[5] M. Feraru, H.N. Teodorescu, The Emotional Speech Section of the Romanian Spoken Language
Archive, Proc. 5th European Conf. on Intelligent Systems and Technologies, Iaşi, Romania, ISBN
978973730497, 2008.

[6] M.S. Feraru, H.N. Teodorescu, SRoL - Web-based Resources and Tools used for Language and Lan-
guage Technology e-Learning, Virtual Learning - Virtual Reality, Proc. 4th International Conference
on Virtual Learning, ICVL 2009, Bucharest University Press, ISSN: 1844-8933, Section Models &
Methodologies, pp. 119-127, 2009.

[7] W. Hedzelek, T. Hornowski, Gnathosonic Study of Occlusion in Patients Wearing Complete Den-
tures, Eur J Prosthodont Restor Dent., Vol. 5, No. 3, pp. 119-23, 1997.

[8] W. Hedzelek, T. Hornowski, The Analysis of Frequency of Occlusal Sounds in Patients with Peri-
odontal Diseases and Gnathic Dysfunction, J Oral Rehabil., Vol. 25, No. 2, pp. 139-45, 1998.

[9] I. Lundberg, The Computer as a Tool of Remediation in the Education of Students with Reading
Disabilities: A Theory-Based Approach, Learning Disability Quarterly, Technology for Persons with
Learning Disabilities, Vol. 18, No. 2, pp. 89-99, 1995 http://www.jstor.org/pss/1511197.

[10] A. Olofsson, Synthetic Speech and Computer Aided Reading for Reading Disabled Chil-
dren, Journal: Reading and Writing, Vol. 4, No. 2, pp. 165-178, ISSN: 09224777, 1992
(http://www.springerlink.com/content/j521536n135x2864/).

[11] J.F. Prinz, Computer Aided Gnathosonic Analysis: Distinguishing Between Single and Multiple
Tooth Impact Sounds, J Oral Rehabil., Vol. 27, No. 8, pp. 682-689, 2000.

[12] J.F. Prinz, K.W. Ng, Characterization of Sounds Emanating from the Human Temporomandibular
Joints, Arch Oral Biol., Vol. 41, No. 7, pp. 631-639, 1996.

[13] C. Solomon, Computer Environments for Children - A Reflection of Theories of Learning and
Education, 1988,
www.google.com/books?id=EonPZ9A81kkC&printsec=frontcover&hl=ro&source=gbs_v2_summary_r&cad=0.

[14] H.N. Teodorescu, Occlusal Sound Analysis Revisited, Proc. 3rd Int. Conf. MEDSIP 2006, Advances
in Medical, Signal and Information Processing, ISBN: 0863416586, Glasgow, UK, 17-19 July 2006.



[15] H.N. Teodorescu, Computer Semiotics: Understanding Meanings and Parallel Languages (Refer-
eed invited paper) T. Yamakawa, G. Matsumoto (Eds.), Proc. Int. Conf. IIZUKA’98, World Scientific
Publ., pp. 279-283, 1998.

[16] H.N. Teodorescu, M. Feraru, Classification in Gnathophonics - Preliminary Results, The Second
Symposium on Electrical and Electronics Engineering, Galati University Press, pp. 525-530, ISBN
1842-8046, 2008.

[17] H.N. Teodorescu, M. Feraru, Micro-corpus de Sunete Gnatosonice si Gnatofonice, Pistol, Cristea,
Tufis (Eds.) Resurse lingvistice si instrumente pentru prelucrarea limbii romane, Ed. Universitatii
"Al.I. Cuza" Iaşi, ISBN 978-973-703-297-3, pp. 21-30, 2007.

[18] H.N. Teodorescu, M. Feraru, D. Trandabat, Studies on the Prosody of the Romanian Language: The
Emotional Prosody and the Prosody of Double-Subject Sentences, C. Burileanu, H-N. Teodorescu,
(Eds.) Advances in Spoken Language Technology, The Publishing House of the Romanian Academy,
Bucharest, Romania, ISBN 978-973-27-1516-1, pp. 171-182, 2007b.

[19] H.N. Teodorescu, M. Zbancioc, E. Mihailescu, Speech Technology and Bio-Medical Engineering
Teaching Based on the Web - A New Tool and Case Study, Int. Conf. on Interactive Computer Aided
Learning, Villach, Austria, 2006.

[20] H.N. Teodorescu, A. Kandel, B. Paschall, Teaching Modern Chapters in Automata Theory and For-
mal Languages, (abstract in booklet of the Symposium.) Symp. 21 Century Teaching Technologies,
Univ. South Florida, Tampa, USA 2000.

[21] H.N. Teodorescu, R. Ganea, M. Feraru, A. Burlui, Assessment of Voice Quality Based on Nonlinear
Dynamic Analysis, Proc. of The 15th Int. Conf. on Control Syst. & Computer Sci., Bucharest,
Romania, pp. 536-542, ISBN 9738449898, 2005.

[22] H.N. Teodorescu, D. Trandabat, M. Feraru, M. Zbancioc, R. Luca, A Corpus of the Sounds in the
Romanian Spoken Language for Language-Related Education, In: C.P. Pascual (Ed.), Revisiting Language
Learning Resources, Cambridge Scholars Pub. (CSP), UK, Ch. 6, ISBN 1847181562, pp. 73-89,
2007.

[23] M. Warschauer, Computer-Mediated Collaborative Learning: Theory and Practice, The Modern
Language Journal, Vol. 81, No. 4, Special Issue - Interaction, Collaboration, and Cooperation -
Learning Languages and Preparing Language Teachers (Winter, 1997), pp. 470-481,
http://www.jstor.org/pss/328890.

[24] B.W. Wise, R.K. Olson, Computer Speech and the Remediation of Reading and Spelling Problems,
J. Special Education Technology, Vol. 12, No. 3, pp. 207-220, 1994.

[25] M. Zbancioc, Tools for the Archive of the Romanian Language Sounds Project, 4th European Conf.
on Intelligent Systems and Technologies, Iaşi, Romania, ISBN 973-730-265-6, 2006.

[26] Kenko Ota, Emmanuel Duflos, Philippe Vanheeghe, Masuzo Yanagida, Bayesian Inference for
Speech Density Estimation by the Dirichlet Process Mixture, Studies in Informatics and Control
Journal, Bucharest, Romania, ISSN 1220-1776, Vol. 16, No. 3, 2007.

[27] Florin Grigoras, Horia-Nicolai Teodorescu, Vasile Apopei, Nonlinear Analysis and Synthesis Of
Speech, Studies in Informatics and Control Journal, Bucharest, Romania, ISSN 1220-1776, Vol. 7,
No. 1, 1998.



[28] Tom Page, Gisli Thorsteinsson, Andrei Niculescu, Management of Knowledge in a Problem Based
Learning Environment, Studies in Informatics and Control Journal - With Emphasis on Useful Ap-
plications of Advanced Technology, Bucharest, Romania, Vol. 18, No. 1, 2009.

[29] Antonios Andreatos, Virtual Communities and their Importance for Informal Learning,
International Journal of Computers, Communications and Control - IJCCC, Romania, ISSN 1841-9836,
Vol. II, No. 1, pp. 39-47, 2007.

Annex 1. Development stages of SRoL
The presently named SRoL corpus started around 1995 as a small research and educational database
including examples of recordings with vowels and a few typical words in Romanian, as well as a few
recordings of pathological voices. It was linked to the class of Image and Speech Processing given
by the second author at the "Gheorghe Asachi" Technical University of Iaşi, Romania. Former students
(who are now professors in several Romanian universities) contributed to that incipient voice database;
credit for recordings and other help with that database is due to Radu Ciorap and Irinel Pletea, now
professors, among others. The database was further developed for educational purposes in relation to
the class of Speech Technology given by the second author in the Faculty of Computer Science of "Al.
I. Cuza" University in Iaşi.

The third stage of development started in 2004, when the second author decided to significantly
enlarge the speech database and move it to the web, partly with the help of two grants that helped form
a team in the Institute for Computer Science of the Romanian Academy and in the "Gheorghe Asachi"
Technical University of Iaşi. The first author joined the team, at that time as a new Ph.D. student. Since
the second author initiated the Project "The Sounds of Romanian Language" (SRoL) five years ago, the
team has grown to 8 researchers. The SRoL Web-based spoken language repository and tool collection
as it is today was developed over several years through the collaboration of groups from the Institute
for Computer Science of the Romanian Academy, the CERFS Excellence Center in the "Gheorghe Asachi"
Technical University of Iaşi, and staff of the discipline of Language Technology, Computer Science
Faculty, "Al.I. Cuza" University.

Annex 2. Typical shapes of gnathosonic signals
The sketches below represent the envelopes of typical gnathosonic signals, corresponding to normal,
merged double contact, and isolated double contact signals. These sounds are easily categorized by
automatic means.

Figure 2: Typical envelopes of occlusal sounds (from [14])
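
To make the idea of categorizing occlusal sounds by their envelope concrete, a minimal sketch follows. It is not the method of [14] or of the SRoL tools: the rectify-and-smooth envelope, the two threshold ratios, the helper names (click, envelope, classify_envelope) and the synthetic test pulses are all assumptions chosen for illustration. Only the three category names are taken from the text above.

```python
import math


def click(n, onset, amplitude=1.0, decay=30.0):
    """Hypothetical stand-in for one tooth contact: a decaying pulse whose
    sign alternates from sample to sample (not a recorded occlusal sound)."""
    return [
        amplitude * math.exp(-(i - onset) / decay) * (1 if i % 2 == 0 else -1)
        if i >= onset else 0.0
        for i in range(n)
    ]


def envelope(signal, window=9):
    """Rectify the signal and smooth it with a centered moving average."""
    rectified = [abs(x) for x in signal]
    half = window // 2
    env = []
    for i in range(len(rectified)):
        lo, hi = max(0, i - half), min(len(rectified), i + half + 1)
        env.append(sum(rectified[lo:hi]) / (hi - lo))
    return env


def classify_envelope(env, active_ratio=0.5, silence_ratio=0.1):
    """Assumed decision rule: count the bursts that exceed active_ratio * peak,
    then inspect how deep the envelope drops between the first two bursts."""
    peak = max(env)
    if peak <= 0:
        return "no contact"
    high = active_ratio * peak
    bursts, start = [], None
    for i, value in enumerate(env):
        if value >= high and start is None:
            start = i                      # a burst begins
        elif value < high and start is not None:
            bursts.append((start, i))      # the burst ends just before i
            start = None
    if start is not None:
        bursts.append((start, len(env)))
    if len(bursts) < 2:
        return "normal"
    valley = min(env[bursts[0][1]:bursts[1][0]])
    if valley <= silence_ratio * peak:
        return "isolated double contact"   # envelope returns close to silence
    return "merged double contact"         # second contact overlaps the first


def two_clicks(n, first, second):
    """Sum of two synthetic contacts, the second slightly weaker (assumption)."""
    return [a + b for a, b in zip(click(n, first), click(n, second, 0.9))]


if __name__ == "__main__":
    print(classify_envelope(envelope(click(400, 50))))
    print(classify_envelope(envelope(two_clicks(400, 50, 80))))
    print(classify_envelope(envelope(two_clicks(400, 50, 250))))
```

With these synthetic inputs and the default thresholds, the three calls yield the normal, merged double contact, and isolated double contact labels, in that order. Real recordings would need thresholds tuned on annotated data, and a second contact much weaker than the first would be missed by this simple rule.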



Silvia Monica Feraru (November 21, 1977) received an MSc degree in Biomedical Engineering
(2004) and a PhD in Electronics (2009) from the "Gheorghe Asachi" Technical University of Iaşi.
She is now a research assistant at the Institute for Computer Science of the Romanian Academy,
Iaşi branch. She received the INTEL Special Award for Education (2009) at the International
Conference on Virtual Learning, ICVL 2009. Her current research interests include vocal signal
processing, cognitive processes, and various aspects of artificial intelligence. She has
(co-)authored more than 21 conference, journal, or book chapter papers.

Horia-Nicolai Teodorescu (November 14, 1951) received an MS in Electronics from the
"POLITEHNICA" University, Bucharest, in 1975, and a Ph.D. in Applied Physics - Electronics,
under the supervision of the late Prof. Emil Luca, from the Technical University of Iaşi in 1981.
Currently, he is a professor at the "Gheorghe Asachi" Technical University of Iaşi and the
director of the Institute for Computer Science of the Romanian Academy, Iaşi. He is a
corresponding member of the Romanian Academy. He has authored or co-authored about 300
journal and conference papers, holds 24 national and international patents, and has received
numerous national and international awards and prizes. He is a Senior Member of the IEEE.

Marius-Dan Zbancioc (August 15, 1975) is a teaching assistant at the "Gheorghe Asachi" Technical
University of Iaşi and a researcher at the Institute for Computer Science of the Romanian Academy,
Iaşi branch. His current research interests include signal processing, expert systems, fuzzy
systems, and several aspects of artificial intelligence. He has (co-)authored 3 books and 39 papers.