Estonian Academy Publishers (Eesti Teaduste Akadeemia Kirjastus)
Published since 1965

Linguistica Uralica
ISSN 1736-7506 (Electronic)
ISSN 0868-4731 (Print)
Analysis and Modelling of Temporal Characteristics of Speech for Estonian Text-to-Speech Synthesis; pp. 91–97
DOI: 10.3176/lu.2005.2.02

Authors
Meelis Mihkla, Jüri Kuusik
Abstract

A text-to-speech system must be capable of generating sounds and pauses whose durations do not noticeably differ from those of natural speech. Currently, the prosodic modelling of Estonian text-to-speech synthesis is largely based on generalized measurements of speech units in isolated words and sentences; as a result, the synthesized speech is often monotonous and lacks fluency. This work makes a first attempt to improve the naturalness of the synthesiser's output speech with the help of statistical duration models of fluent speech. The source material consisted of (a) prose read aloud by a professional actor, and (b) news broadcasts read by announcers. On the basis of this material, the variability of the durations of pauses and boundary lengthenings was investigated. It turns out that, for text read at a normal speech rate, speech pauses can be classified and the classification can be applied in speech synthesis. An attempt was also made to establish whether, and to what extent, the syntactic parsing of a text is related to the prosodic parsing of speech. A generalized regression analysis revealed which features are essential for predicting sound durations in speech, and a statistically optimal model was developed. Curiously, the quantity degree of a foot, despite being the cornerstone of Estonian word prosody, was not a significant feature for predicting the duration of a sound on the basis of this material. The results of the modelling were then compared with the expert opinions of some Estonian phoneticians.
