Recherche

Recherche simple

Recherche avancée

Panier électronique

Votre panier ne contient aucune notice

Connexion à la base

Identification

(Identifiez-vous pour accéder aux fonctions de mise à jour. Utilisez votre login-password de courrier électronique)

Entrepôt OAI-PMH

Soumettre une requête

	Consulter la notice détaillée
	Version complète en ligne
	Version complète en ligne accessible uniquement depuis l'Ircam
	Ajouter la notice au panier
	Retirer la notice du panier

English version

(full translation not yet available)

Liste complète des articles

Consultation des notices

Vue détaillée

Vue Refer

Vue Labintel

Vue BibTeX

%0 Journal Article
%A Obin, Nicolas
%A Lanchantin, Pierre
%T Symbolic Modeling of Prosody: From Linguistics to Statistics
%D 2015
%B IEEE/ACM Transactions on Audio, Speech and Language Processing
%V 3
%N 23
%P 588-599
%F Obin15a
%K text-to-speech synthesis
%K speech prosody
%K speaking style
%K prosodic events
%K surface/deep syntactic parsing
%K hierarchical HMMs
%K segmental HMMs
%K Dempster-Shafer fusion
%X The assignment of prosodic events (accent and phrasing) from the text is crucial in text-to-speech synthesis systems. This paper addresses the combination of linguistic and metric constraints for the assignment of prosodic events in textto- speech synthesis. First, a linguistic processing chain is used to provide a rich linguistic description of a text. Then, a novel statistical representation based on a hierarchical HMM (HHMM) is used to model the prosodic structure of a text: the root layer represents the text, each intermediate layer a sequence of intermediate phrases, the pre-terminal layer the sequence of accents, and the terminal layer the sequence of linguistic contexts. For each intermediate layer, a segmental HMM and information fusion are used to fuse the linguistic and metric constraints for the segmentation of a text into phrases. A set of experiments conducted on multi-speaker databases with various speaking styles reports that: the rich linguistic representation improves drastically the assignment of prosodic events, and the fusion of linguistic and metric constraints significantly improves over standard methods for the segmentation of a text into phrases. These constitute substantial advances that can be further used to model the speech prosody of a speaker, a speaking style, and emotions for text-to-speech synthesis.
%1 1
%2 1
%U http://architexte.ircam.fr/textes/Obin15a/