Tool for automatically reducing the duration of a speech in English, adapted to the voice characteristics of a speaker
Main Author: | Alarcón Pedroza, Lebis Armando |
---|---|
Other Authors: | Gutiérrez Erazo, José Luis |
Format: | info:eu-repo/semantics/bachelorThesis |
Language: | spa |
Published: | Universidad de San Buenaventura - Cali, 2016 |
Subjects: | |
Online Access: | http://hdl.handle.net/10819/3106 |
Summary: |
The development of the tool was divided into three phases: manual labeling of important audio segments, extraction of audio parameters, and system training. For the labeling phase, a web application was implemented to speed up the process. Feature extraction was performed with the MIRTOOLBOX library, and the classifiers and the interface were implemented in MATLAB. Five classifiers were compared: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, artificial neural networks (ANNs), and support vector machines (SVMs). The best accuracy was obtained with ANNs (79.19%) and SVMs (81.21%). The reduction percentage was then measured on three new audio recordings, giving an average reduction of 27.34% with ANNs and 24.50% with SVMs. In addition, comprehension tests were performed using a reduced audio recording created by the tool, and an information loss of 16.67% was found. It was concluded that the prosodic and spectral parameters provide sufficient data to classify segments by relative importance. It was also found that combining the prosodic and spectral parameters in the same data set provides better accuracy. |
---|