Journal: International Journal of Advanced Research In Computer Science and Software Engineering
Print ISSN: 2277-6451
Electronic ISSN: 2277-128X
Publication year: 2013
Volume: 3
Issue: 7
Publisher: S.S. Mishra
Abstract: Voice conversion transforms the speaker characteristics of speech uttered by one speaker, called the source speaker, to generate speech that has the voice characteristics of a desired speaker, called the target speaker. Voice conversion is used in many applications, including dubbing, speech quality enhancement, text-to-speech synthesizers, online games, multimedia, music, cross-language speaker conversion, restoration of old audio tapes, cellular applications, and low bit-rate speech coding. Various models are used for voice conversion, such as the Hidden Markov Model (HMM), Artificial Neural Network (ANN), Dynamic Time Warping (DTW), and Vector Quantization (VQ). The quality of the transformed speech and the identity it conveys depend on the accuracy of the transformation function derived from the given training data. Estimating the transformation function requires properly aligned passages spoken by the source and target speakers. Exact alignment of the corresponding speech units in the source and target passages is mandatory for accurate estimation, because the durations of speech units (i.e. phonemes or sub-phonemes) may have quite different distributions among speakers. Generally, DTW and VQ are used for this purpose. The objective of this paper is to compare the effectiveness of DTW-based and VQ-based estimation of the transformation function. The analysis of the results shows that DTW provides about five percent more reduction in the transformed target distances of the speech. This means that the DTW-based technique is relatively better for the estimation of the transformation
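The alignment step the abstract describes can be illustrated with a minimal DTW sketch. This is not the paper's implementation: the function name `dtw_align`, the Euclidean frame distance, and the toy one-dimensional "feature vectors" are all assumptions made for illustration, since the abstract does not specify the features or the local distance used.

```python
import math


def dtw_align(source, target):
    """Align two non-empty sequences of feature vectors with classic DTW.

    Hypothetical helper for illustration. Returns (total_cost, path), where
    path is a list of (source_index, target_index) pairs.
    """
    n, m = len(source), len(target)
    # cost[i][j]: minimal accumulated distance aligning source[:i] with target[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local distance: Euclidean distance between frames
            d = math.dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # match both frames
                                 cost[i - 1][j],      # repeat target frame
                                 cost[i][j - 1])      # repeat source frame

    # Backtrack from the end to recover which frames were paired
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Toy example: the target holds the middle "unit" for one extra frame
source_frames = [[0.0], [1.0], [2.0]]
target_frames = [[0.0], [1.0], [1.0], [2.0]]
total_cost, path = dtw_align(source_frames, target_frames)
print(total_cost, path)
```

In this toy case the single source frame with value 1.0 is paired with both target frames of value 1.0, which is the kind of duration mismatch between speakers that the alignment is meant to absorb before a transformation function is estimated from the paired frames.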