Journal of Computer Sciences and Applications
ISSN (Print): 2328-7268; ISSN (Online): 2328-725X; Editor-in-chief: Minhua Ma, Patricia Goncalves
Open Access
Journal of Computer Sciences and Applications. 2013, 1(4), 61-74
DOI: 10.12691/jcsa-1-4-2

Prosodic Boundary Prediction for Greek Speech Synthesis

Panagiotis Zervas¹

¹Department of Music Technology & Acoustics, Technological Educational Institute of Crete, Rethymnon Branch, Greece

Pub. Date: May 19, 2013

Cite this paper:
Panagiotis Zervas. Prosodic Boundary Prediction for Greek Speech Synthesis. Journal of Computer Sciences and Applications. 2013; 1(4):61-74. doi: 10.12691/jcsa-1-4-2


In this article, we evaluate features and algorithms for the task of prosodic boundary prediction for Greek. For this purpose, a prosodic corpus of general-domain text was constructed. The contribution of each feature was evaluated and ranked using two filtering methods: information gain ranking and correlation-based feature selection. The resulting datasets were applied to the C4.5 decision tree, the one-nearest-neighbour instance-based learner (IB1) and Bayesian learning methods. Analysis of the models' performance led us to construct a practically optimal feature set, whose predictive effectiveness was evaluated on two prosodic databases. In terms of total accuracy and F-measure, the evaluation results established the effectiveness of the decision tree in learning rules for prosodic boundary prediction.
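The abstract relies on two standard quantities: information gain, used to rank features, and F-measure, used to score boundary predictions. The sketch below shows how each is computed, assuming discrete features and a binary break/no-break label at each word juncture. The feature names (`punctuation`, `pos_tag`) and the six-juncture toy data are hypothetical illustrations, not taken from the paper's corpus, and the code is not the author's implementation. Correlation-based feature selection, which scores feature subsets rather than single features, is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(C; F) = H(C) - H(C | F) for one discrete feature F and class C."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        # Labels of the junctures where the feature takes this value.
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

def f_measure(gold, predicted, positive="break"):
    """Harmonic mean of precision and recall for the positive (break) class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: one entry per word juncture.
labels = ["break", "no_break", "no_break", "break", "no_break", "no_break"]
punct = ["comma", "none", "none", "period", "none", "none"]
pos = ["noun", "det", "verb", "noun", "det", "noun"]
features = {"punctuation": punct, "pos_tag": pos}

# Information gain ranking: sort features by IG, highest first.
ranking = sorted(features, key=lambda name: information_gain(features[name], labels),
                 reverse=True)
print(ranking)  # ['punctuation', 'pos_tag']

# Score a hypothetical classifier's output against the gold labels.
predicted = ["break", "no_break", "break", "break", "no_break", "no_break"]
print(round(f_measure(labels, predicted), 3))  # 0.8
```

Here punctuation perfectly separates the toy classes, so its information gain equals the full label entropy (about 0.918 bits), while the part-of-speech feature leaves some uncertainty and scores lower. The prediction contains one false break and no missed breaks, giving precision 2/3, recall 1 and an F-measure of 0.8.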

Keywords: prosody, phrase breaks, ToBI, C4.5, IB1, Bayesian learning

This work is licensed under a Creative Commons Attribution 4.0 International License.



