TOWARDS ARTICULATORY CONTROL OF TALKING HEADS IN HUMANOID ROBOTICS USING A GENETIC-FUZZY IMITATION LEARNING ALGORITHM

Mumolo, Enzo; Nolich, M.
2007

Abstract

In human heads there is a strong structural linkage between the vocal tract and facial behavior during speech. For a robotic talking head to behave in a human-like way, this linkage should be emulated. One way to do so is to estimate the articulatory features of a given utterance and use them to control a talking head. In this paper, we describe an algorithm that estimates the articulatory features of a spoken sentence using a novel computational model of human vocalization based on a set of fuzzy rules and genetic optimization. The places of articulation are treated as fuzzy sets whose degrees of membership are the values of the articulatory features. The fuzzy rules represent the relationships between places of articulation and speech acoustic parameters, and the genetic algorithm estimates the degrees of membership of the places of articulation according to an optimization criterion, thereby performing imitation learning. We verify the model with audio-visual subjective tests of animated talking heads, which show that the algorithm produces correct results. In particular, subjective listening tests of sentences artificially generated from the articulatory description yielded an average phonetic accuracy slightly under 80%. By analyzing large amounts of natural speech, the algorithm can learn the places of articulation of all phonemes of a given speaker. The estimated places of articulation are then used to control talking heads in humanoid robotics.
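
The abstract does not give the paper's feature set, rule base, or genetic operators, so the following is a minimal sketch of the idea in Python: membership degrees in [0, 1] for a handful of hypothetical places of articulation are evolved by a simple genetic algorithm until a surrogate acoustic model (a stand-in for the paper's fuzzy rule base) reproduces the observed acoustic parameters. The place names, the linear rule matrix, and all GA settings are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical places of articulation; the paper's actual feature set is
# not given in the abstract, so these names are illustrative only.
PLACES = ["bilabial", "alveolar", "velar", "open", "front", "rounded"]

rng = np.random.default_rng(0)

# Stand-in for the fuzzy rule base: a fixed linear map from membership
# degrees to acoustic parameters. The real model uses fuzzy IF-THEN rules
# relating places of articulation to speech acoustics.
RULE_MATRIX = rng.normal(size=(4, len(PLACES)))


def acoustic_model(memberships: np.ndarray) -> np.ndarray:
    """Map membership degrees (one per place of articulation) to acoustic
    parameters. A crude linear surrogate for the fuzzy inference step."""
    return RULE_MATRIX @ memberships


def fitness(memberships: np.ndarray, target: np.ndarray) -> float:
    """Imitation-learning criterion: negative distance between the acoustics
    produced by the candidate articulation and the observed acoustics."""
    return -float(np.linalg.norm(acoustic_model(memberships) - target))


def genetic_estimate(target, pop_size=60, generations=200, mut_sigma=0.05):
    """Estimate membership degrees in [0, 1] with a simple GA:
    tournament selection, uniform crossover, Gaussian mutation, elitism."""
    pop = rng.uniform(0.0, 1.0, size=(pop_size, len(PLACES)))
    for _ in range(generations):
        scores = np.array([fitness(ind, target) for ind in pop])
        new_pop = [pop[int(np.argmax(scores))]]        # elitism: keep the best
        while len(new_pop) < pop_size:
            a, b = rng.integers(pop_size, size=2)      # tournament, parent 1
            p1 = pop[a] if scores[a] > scores[b] else pop[b]
            a, b = rng.integers(pop_size, size=2)      # tournament, parent 2
            p2 = pop[a] if scores[a] > scores[b] else pop[b]
            mask = rng.random(len(PLACES)) < 0.5       # uniform crossover
            child = np.where(mask, p1, p2)
            child += rng.normal(0.0, mut_sigma, size=len(PLACES))
            new_pop.append(np.clip(child, 0.0, 1.0))   # memberships stay in [0, 1]
        pop = np.stack(new_pop)
    scores = np.array([fitness(ind, target) for ind in pop])
    return pop[int(np.argmax(scores))]


if __name__ == "__main__":
    # Pretend the "observed" acoustics came from a true articulation,
    # then check that the GA recovers something close to it.
    true_memberships = rng.uniform(0.0, 1.0, size=len(PLACES))
    observed = acoustic_model(true_memberships)
    estimate = genetic_estimate(observed)
    for name, m in zip(PLACES, estimate):
        print(f"{name:>9s}: {m:.2f}")
```

In the paper itself, the fitness would instead compare acoustic parameters extracted from the spoken utterance with those predicted by the fuzzy rules; the elitist step above simply ensures the best candidate articulation survives across generations.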

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2500959
Notice

The registration of this record is under verification; it has not yet been validated in ArTS.

Citations
  • PMC: not available
  • Scopus: 5
  • Web of Science (ISI): 2