Abstract |
This thesis addresses the modelling of speech prosody for speech synthesis and presents MeLos, a complete system for the analysis and modelling of speech prosody, “the music of speech”. The objective is to model the strategies, alternatives, and speaking style of a speaker for natural, expressive, and varied speech synthesis. The study makes original contributions, with special attention paid to the combination of theoretical linguistic and statistical modelling to provide a complete speech prosody system. A unified discrete/continuous context-dependent HMM is presented to model the symbolic and acoustic characteristics of speech prosody: 1) A rich description of the text characteristics, based on a linguistic processing chain that includes surface and deep syntactic parsing, is proposed to refine the modelling of speech prosody in context. 2) Segmental HMMs and Dempster-Shafer fusion are used to balance linguistic and metric constraints in the production of a pause. 3) A trajectory model is proposed based on the stylization and simultaneous modelling of short- and long-term F0 variations over various temporal domains. The proposed system is used to model the strategies, alternatives, and speaking style of a speaker, and is extended to model the speaking styles of an arbitrary number of speakers using shared-context-dependent modelling and speaker normalization techniques. |