%0 Journal Article
%A Obin, Nicolas
%A Lanchantin, Pierre
%T Symbolic Modeling of Prosody: From Linguistics to Statistics
%D 2015
%B IEEE/ACM Transactions on Audio, Speech and Language Processing
%V 23
%N 3
%P 588-599
%F Obin15a
%K text-to-speech synthesis
%K speech prosody
%K speaking style
%K prosodic events
%K surface/deep syntactic parsing
%K hierarchical HMMs
%K segmental HMMs
%K Dempster-Shafer fusion
%X The assignment of prosodic events (accent and phrasing) from the text is crucial in text-to-speech synthesis systems. This paper addresses the combination of linguistic and metric constraints for the assignment of prosodic events in text-to-speech synthesis. First, a linguistic processing chain is used to provide a rich linguistic description of a text. Then, a novel statistical representation based on a hierarchical HMM (HHMM) is used to model the prosodic structure of a text: the root layer represents the text, each intermediate layer a sequence of intermediate phrases, the pre-terminal layer the sequence of accents, and the terminal layer the sequence of linguistic contexts. For each intermediate layer, a segmental HMM and information fusion are used to fuse the linguistic and metric constraints for the segmentation of a text into phrases. A set of experiments conducted on multi-speaker databases with various speaking styles shows that the rich linguistic representation drastically improves the assignment of prosodic events, and that the fusion of linguistic and metric constraints significantly improves over standard methods for the segmentation of a text into phrases. These constitute substantial advances that can be further used to model the speech prosody of a speaker, a speaking style, and emotions for text-to-speech synthesis.
%1 1
%2 1
%U http://architexte.ircam.fr/textes/Obin15a/