Experimental analysis of fixed multipulse excitation patterns in PC synthesis

Mumolo, Enzo; Pierucci, P.; Prina Ricotti, L.; Ragazzini, S.

Text-to-speech systems based on the concatenation of acoustic units coded with the linear prediction method require that the synthesis filter be excited by an artificial source signal. For voiced speech sounds, the periodic source is represented by pulses spaced at the required pitch period. The signal reproduced with this source has a fuzzy and tense quality. A very effective way to reproduce the speech signal with high quality using linear prediction is the multipulse method. It has been shown experimentally that fixed multipulse patterns, chosen from the analysis of one speaker's voice, yield a significant improvement of the synthetic signal, even when these patterns are used to reproduce other utterances of the same speaker, or even other speakers. This result is not easy to explain, because perception tests on signals synthesized with sources having particular amplitude and phase spectra show that the amplitude of the source harmonics is a critical parameter for correct reproduction. It is also important that these components reproduce a natural phase distribution, which varies with frequency and time. Fixed multipulse sequences, however, do not satisfy these conditions. In this paper, a hypothesis about the reasons for the improvement obtained with multipulse patterns is verified, and the characteristics of a real-time text-to-speech implementation using a multipulse source are briefly considered.
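
To make the two voiced sources described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes an all-pole synthesis filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The function names, the filter coefficients, and the pulse pattern are hypothetical values chosen only for illustration; none of them come from the paper.

```python
import numpy as np

def pulse_train(n_samples, period):
    """Single-pulse source: one unit pulse at the start of each pitch period."""
    e = np.zeros(n_samples)
    e[::period] = 1.0
    return e

def multipulse_train(n_samples, period, pattern):
    """Repeat one fixed multipulse pattern in every pitch period.

    `pattern` maps a sample offset inside the period to a pulse amplitude.
    """
    e = np.zeros(n_samples)
    for start in range(0, n_samples, period):
        for offset, amp in pattern.items():
            if offset < period and start + offset < n_samples:
                e[start + offset] += amp
    return e

def lpc_synthesize(excitation, a):
    """All-pole synthesis: s[n] = e[n] - sum_{k=1..p} a[k-1] * s[n-k]."""
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc -= a[k - 1] * s[n - k]
        s[n] = acc
    return s

# Hypothetical values for illustration only (not taken from the paper).
a = [-1.2, 0.8]                        # stable 2nd-order filter, pole radius ~0.894
pattern = {0: 1.0, 3: -0.4, 7: 0.25}   # fixed pattern: offset -> amplitude
period = 80                            # e.g. 100 Hz pitch at 8 kHz sampling

single = lpc_synthesize(pulse_train(400, period), a)
multi = lpc_synthesize(multipulse_train(400, period, pattern), a)
```

The only difference between `single` and `multi` is the source. The filter is identical, which mirrors the setting in the abstract where the same fixed pattern is reused across utterances and speakers.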

Experimental analysis of fixed multipulse excitation patterns in PC synthesis / Mumolo, E.; Pierucci, P.; Prina Ricotti, L.; Ragazzini, S. - Print. - (1988), pp. 585-592. (7th FASE Symposium, Speech 88, Edinburgh, 22-26 August 1988).