Description: Tool for automatically reducing the duration of a speech in English, adapted to a speaker's voice characteristics

Main Author:	Alarcón Pedroza, Lebis Armando
Other Authors:	Gutiérrez Erazo, José Luis
Format:	info:eu-repo/semantics/bachelorThesis
Language:	spa
Published:	Universidad de San Buenaventura - Cali 2016
Subjects:	Digital signals; Speech processing; Pattern recognition; Neural networks (Computers); Machine learning (Artificial intelligence); Speech; Digital audio
Online Access:	http://hdl.handle.net/10819/3106

Summary:	The development of the tool was divided into three phases: manual labeling of important audio segments, extraction of audio parameters, and system training. In the labeling phase, a web application was implemented to speed up the process. Feature extraction was performed with the MIRTOOLBOX library, and the classifiers and interface were implemented in MATLAB. Five classifiers were compared: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, artificial neural networks (ANNs), and support vector machines (SVMs). The best accuracy was obtained with ANNs (79.19%) and SVMs (81.21%). The reduction percentage was measured on three new audio recordings, giving an average reduction of 27.34% with ANNs and 24.50% with SVMs. Comprehension tests using a reduced audio recording created by the tool found an information loss of 16.67%. It was concluded that the prosodic and spectral parameters provide sufficient data to classify relative importance. It was also found that combining prosodic and spectral parameters in the same data set yields better accuracy.
