XIV Reunión de Otoño de Potencia, Electrónica y Computación, ROPEC 2012, INTERNACIONAL. Colima, Colima, México. November 7-9, 2012


Organizing Committee

General Coordinator

M.I. Isidro Ignacio Lazaro Castillo – Universidad Michoacana de San Nicolás de Hidalgo

Chair

Dr. Juan Anzurez Marín – Universidad Michoacana de San Nicolás de Hidalgo

Co-Chair

M.C. José Luis Álvarez Flores – Universidad de Colima Campus Coquimatlán

Dr. Jaime Cerda Jacobo – Universidad Michoacana de San Nicolás de Hidalgo

Ing. Elías Valencia Valencia – Universidad de Colima Campus Coquimatlán

Treasurer

Dr. Jaime Cerda Jacobo – Universidad Michoacana de San Nicolás de Hidalgo

C.P. Héctor Francisco Cárdenas Castañeda – Universidad de Colima Campus Coquimatlán

Course Coordinators

M.C. Tiberio Venegas Trujillo – Universidad de Colima Campus Coquimatlán

Dra. Elisa Espinosa Juárez – Universidad Michoacana de San Nicolás de Hidalgo

M.I. Isidro Ignacio Lazaro Castillo – Universidad Michoacana de San Nicolás de Hidalgo

Publicity

M.C. Ricardo Fuentes Covarrubias – Universidad de Colima Campus Coquimatlán

Dr. Norberto García Barriga – Universidad Michoacana de San Nicolás de Hidalgo

Dr. Mario Graff Guerrero – Universidad Michoacana de San Nicolás de Hidalgo

Website

M.C. José Alberto Álvarez Martín – SAC de la Sección Centro Occidente

Industrial Exhibition Coordinators

M.I. Isidro Ignacio Lazaro Castillo – Universidad Michoacana de San Nicolás de Hidalgo

M.I. José Rodolfo Madrigal Sánchez – Universidad de Colima Campus Coquimatlán

M.C. Genaro Chacón Hernández – Equipo Industrial y Servicios SA


Registration

M.C. José Alberto Álvarez Martín – SAC de la Sección Centro Occidente

Lic. Norma Guadalupe Cano Hernández – Universidad de Colima Campus Coquimatlán

Uriel Sandoval – President of the UMSNH Student Branch

Cesar L. Melchor Hernández – Student Branches Representative

Javier Hernández Santillán – President of the Communications Chapter of the Instituto Tecnológico de Morelia Student Branch

Review Committee Coordinators

Dr. Jaime Cerda Jacobo – Universidad Michoacana de San Nicolás de Hidalgo

M.I. Isidro Ignacio Lazaro Castillo – Universidad Michoacana de San Nicolás de Hidalgo

Conference Proceedings

Dr. Antonio Ramos Paz – Universidad Michoacana de San Nicolás de Hidalgo

Recognitions

M.I. Salvador Ramírez Zavala – Universidad Michoacana de San Nicolás de Hidalgo

Moderator Coordinator

Dr. Víctor Hugo Castillo Topete – Universidad de Colima Campus Coquimatlán

Dr. Mario Graff Guerrero – Universidad Michoacana de San Nicolás de Hidalgo

Social Events Coordinator

Dr. Sergio Llamas Zamorano – Universidad de Colima Campus Coquimatlán

M.C. José Alberto Álvarez Martín – SAC de la Sección Centro Occidente

Uriel Sandoval – President of the UMSNH Student Branch

Support Committee

Chapters in the Centro Occidente Section of the following societies:

o IEEE Engineering in Medicine & Biology Society (EMBS)
o IEEE Power Electronics Society (PELS)
o IEEE Power & Energy Society (PES)
o IEEE Control Systems (CS)
o IEEE Industry Applications Society (IA)
o IEEE Communications Society (COM)
o IEEE Computer Science (C)
o IEEE Women in Engineering Society (WIE)
o Joint Chapter Computational Intelligence Society (CIS) & Robotics and Automation (RA)


Student Chapters in the Centro Occidente Section of the following societies:

o IEEE Power & Energy Society (PES) – UMSNH Student Branch
o IEEE Industry Applications Society (IA) – UMSNH Student Branch
o Women in Engineering (WIE) Affinity Group – UMSNH Student Branch

Student Branches in the Centro Occidente Section:

o Student Branch of the Universidad de Colima, Campus Coquimatlán
o Student Branch of the Universidad Michoacana de San Nicolás de Hidalgo
o Student Branch of the Instituto Tecnológico de Morelia
o Student Branch of the Universidad de Colima, Campus Manzanillo
o Student Branch of the Instituto Tecnológico de la Piedad
o Student Branch of the Instituto Tecnológico de Estudios Superiores de Zamora


Review Committee

Chair

Dr. Juan Anzurez Marín – Universidad Michoacana de San Nicolás de Hidalgo

Co-Chair

M.C. José Luis Álvarez Flores – Universidad de Colima Campus Coquimatlán

Dr. Jaime Cerda Jacobo – Universidad Michoacana de San Nicolás de Hidalgo

Ing. Elías Valencia Valencia – Universidad de Colima Campus Coquimatlán

Members of the International Review Committee

| Reviewer | Institution | Country |
| --- | --- | --- |
| Dr. Angel Gutiérrez | Montclair State University | United States |
| Dr. Fredy H. Martínez S. | Universidad Distrital | Colombia |
| Dr. Gilberto Reynoso Meza | Universidad Politécnica de Valencia | Spain |
| Dr. Gilberto Zamora | VisionQuest Biomedical | United States |
| Dr. José Alberto Benítez Gómez | IEEE Sección Paraguay | Paraguay |
| Dr. José Luis Pitarch Pérez | Universidad Politécnica de Valencia | Spain |
| Dr. Norberto Lerendegui | Instituto Tecnológico de Buenos Aires | Argentina |
| Dr. Sergio Bravo Solorio | University of Warwick | United Kingdom |


Oral Presentations

| Title | ID | Authors | Page |
| --- | --- | --- | --- |
| Modelado de VSI con modulación senoidal modificada para sistema de aire acondicionado | 2 | Jose Lopez-Villalobos, Roxana Garcia and Jose Valderrama-Chairez | 1 |
| Realización práctica de los Turbo códigos de alto desempeño con un TMS320C6416. | 4 | Iván Fabián-Luna, Gerardo Abel Laguna-Sánchez and Ricardo Marcelín-Jiménez | 7 |
| Diseño de Experimentos en el Estudio de Pérdidas en Tanques de Transformadores de Distribución | 6 | Jose Perez, Juan Olivares, Rafael Escarela, Salvador Magdaleno, Victor Jimenez and Eduardo Campero | 13 |
| Planta Robusta de Levitación Neumática para Investigación y Formación en Control y Visión Artificial | 10 | Fredy Martinez, Diego Bello, Leidy Garcia and Diego Acero | 19 |
| Análisis de Vulnerabilidad de Redes Eléctricas Mediante Medidas de Centralidad | 11 | Francisco Gutierrez and Emilio Barocio | 25 |
| Inverse Optimal Trajectory Tracking for Discrete-Time Nonlinear Systems: Application to the Boost Converter | 12 | Fernando Ornelas, J. Jesus Rico-Melgoza, Guillermo C. Zuñiga and Gabriel Casarrubias | 31 |
| Irradiación solar en superficies inclinadas para el dimensionamiento de un sistema fotovoltaico | 17 | Orlando Alfredo Orta Salomon, Everardo Efrén Granda Gutiérrez, Juan Carlos Díaz Guillén, Sandra Ivonne Perez Aguilar and Rodrigo Cuevas Tenango | 37 |
| Circuito Neuromorfico WTA: Fundamentos y Consideraciones de Diseño en Silicio | 18 | Cesar Rodolfo Acosta Méndez, Federico Sandoval Ibarra and Juan Luis Del Valle Padilla | 44 |
| Análisis de Ruido impulsivo en sistemas de comunicaciones por líneas de energía eléctrica generado por cargas inductivas. | 20 | Monica Avalos, Roberto Linares and Hector Caltenco | 50 |
| Identificación de Parámetros en Sistemas Lineales Invariantes en el Tiempo Usando Polinómios de Chebyshev | 21 | Garibaldi Pineda García, Isidro I. Lázaro Castillo and J. Alberto Álvarez Martín | 56 |
| Evaluación de medida de costo combinada, AD + census, con técnicas de agregación adaptable, para el cálculo de mapas de disparidad en imágenes estereoscópicas | 26 | Salvador Ibarra Delgado, Manuel Hernandez Calviño, Jose Ignacio Benavides Benitez and Jorge Flores Troncoso | 62 |
| Cálculo de Pérdidas y Distribución de Campo en el Núcleo de un Reactor de Potencia | 27 | Salvador Magdaleno, Enrique Melgoza, Juan Olivares, Rodrigo Ocon and Eduardo Campero | 68 |
| Máquina Recicladora de Pilas | 31 | Noé Bonilla Lopez, Oscar Arturo Sánchez Guerrero and Rogelio Manuel Higuera González | 73 |
| Arquitectura abierta del protocolo de comunicación SPI en un modulo VLSI | 32 | Reyna Itzel García López, Ramón Chávez Bracamontes, Manuel Bandala Sánchez and Marco Antonio Gurrola Navarro | 81 |
| Comparación de un GA con distintos operadores genéticos y un GA celular para la síntesis de seguidores de voltaje | 37 | Mónica Macías, Esteban Tlelo, Miguel Aurelio Duarte and Georgina Flores | 87 |
| Contrasting LPCs and Cepstrum Coefficients in Speaker Authentication Tasks | 40 | Salomé Pérez and Vinicio Carrera | 93 |
| ALGORITMOS PSO y PESO APLICADOS AL PROBLEMA DE INESTABILIDAD DINÁMICA EN SISTEMAS MULTIAGENTES NÓMADAS | 41 | Alejandro Sosa, Victor Zamudio, Rosario Baltazar, Carlos Lino, Miguel Angel Casillas and Marco Sotelo | 98 |
| Control Robusto para Seguimiento de Trayectorias de Posición para Motores de CD | 43 | Francisco Beltrán Carbajal, Eusebio Guzmán Serrano and Antonio Valderrábano González | 104 |


Contrasting LPCs and Cepstrum Coefficients in Speaker Authentication Tasks

María Salomé Pérez, Student Member, IEEE, and Enrique V. Carrera, Member, IEEE

Abstract—Voice authentication tasks can be improved by using better speech parameterization techniques. It has been proven that the short-time analysis of speech signals provides unique feature vectors by extracting important voice parameters. Among other feature vectors, the family of Linear Predictive Coefficients (LPCs) and Cepstrum Coefficients (CCs) are the most used for describing speech signals because of their good performance and simplicity. Thus, this paper compares the effectiveness of LPCs against CCs when applied to voice authentication tasks. Besides contrasting Mel-frequency and LPC-based CCs in three different authentication scenarios, an important contribution of this work is quantifying the effect of additional time-domain features in the vectors previously mentioned.

Index Terms—Voice authentication, linear predictive coefficients, cepstrum coefficients.

I. INTRODUCTION

Nowadays, technology is being used for restricting access to resources through user authentication. Although there are several authentication techniques, biometric-based authentication is a promising alternative [1]. Biometric-based authentication determines unique physical, behavioral or adhered human characteristics (e.g., fingerprint verification, retinal scans, facial analysis, voice recognition) [2]. Hence, biometric-based authentication has some key advantages over other authentication methods, since biometric characteristics are not easily forgotten, like a password, or lost, like a key.

The majority of biometric authentication techniques require sophisticated equipment and the physical presence of the person being authenticated. However, a person's voiceprint can be as unique as any other biometric characteristic. Thus, taking advantage of appropriate processing techniques, voice authentication could be simple (i.e., no extra hardware or software is required), less personally intrusive, and the authentication itself can be done even remotely [3].

Developments in the voice-authentication field have generated diverse processing techniques to support proper speaker recognition. A convenient and well understood processing technique is to extract important information from the speech signal by computing a vector of feature coefficients which is then classified. There are many effective feature extraction algorithms available in the literature [2]. For instance, the short-term spectrum of the speech signal is the best-known alternative for representing certain voice features. However, there are other popular approximations to the short-term spectrum, such as linear prediction coding and Mel-frequency cepstrum coefficients [4].

M. S. Perez studies Electronics and Telecommunications Engineering at the Ecuadorian Armed Forces University, P. O. Box 17-15-231B, Sangolqui, Ecuador.

E. V. Carrera is with the Department of Electrical Engineering, Ecuadorian Armed Forces University, P. O. Box 17-15-231B, Sangolqui, Ecuador.

Based on that, this paper focuses on evaluating the usage of Linear Predictive Coefficients (LPCs) and Cepstrum Coefficients (CCs) in voice authentication tasks. In the case of CCs, two variations of them are analyzed: Mel-frequency Cepstrum Coefficients (mel-CCs) and LPC-based Cepstral Coefficients (lpc-CCs). We are interested in analyzing these feature vectors due to their relatively good performance and their not-so-complicated structures when compared to other similar alternatives. In addition, note that LPCs model the vocal tract of the speaker while CCs simulate the human ear structure [5].

Throughout our evaluation, LPC and CC feature vectors are also combined with other time-domain measurements (i.e., energy, zero crossing rate, kurtosis) in order to present an extensive parameter space study. Moreover, since the resulting feature vectors are used to authenticate people by means of a classifier based on artificial neural networks (ANNs) [6], the adaptation of these vectors as valid ANN inputs is evaluated through two normalization techniques: min-max and z-score [7].

Finally, in order to understand the main strengths and limitations of speaker authentication tasks under different scenarios, results for three common authentication activities are analyzed: (i) the correct recognition of groups of people (e.g., gender identification), (ii) the recognition of one person among a group of strangers, and (iii) individual authentication of every person in a group.

II. BACKGROUND

A good parametric representation of the speech signal can be obtained in a sufficiently short interval, since the voice can be considered a wide-sense stationary process over a period of about 30 ms [8]. This short-term representation is one of the most used techniques to characterize voice information through feature vectors.

A. Linear Prediction

Linear prediction coding is normally used due to its powerful speech generation model, which is quite suitable for voiced sounds and still acceptable for unvoiced ones [9]. In summary, linear prediction provides a spectral description of short signal segments, considering the speech as the response of an all-pole filter. This all-pole filter attempts to model the vocal tract [10]. Filter coefficients characterize the parameters of this auto-regressive model. In addition, these filter coefficients can be computed very efficiently and form the LPC feature vector.
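As an illustration only, one common way of computing the coefficients of the all-pole model is the autocorrelation method solved with the Levinson-Durbin recursion. The paper does not state which algorithm was used, so the function names, the biased autocorrelation estimate and the predictor sign convention below are assumptions of this sketch.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the (unnormalized) autocorrelation of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations of linear prediction.

    Assumed convention: x_hat[n] = sum_{k=1..order} a[k-1] * x[n-k].
    Returns (a, err), where err is the final prediction-error energy.
    """
    a = []
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("degenerate frame: prediction error is zero")
        # Reflection coefficient for model order m.
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        k_m = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[k] - k_m * a[m - 2 - k] for k in range(m - 1)] + [k_m]
        err *= (1.0 - k_m * k_m)
    return a, err


def lpc(frame, order=10):
    """LPC feature vector of one frame (order 10 matches the best case reported)."""
    a, _ = levinson_durbin(autocorrelation(frame, order), order)
    return a
```

The recursion needs on the order of order² operations per frame, which is consistent with the remark that the coefficients can be computed very efficiently.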

Reunión de Otoño de Potencia, Electrónica y Computación

ROPEC' 2012 INTERNACIONAL

Artículo aceptado para ser presentado como ponencial oral 93 ISBN: 978-607-95476-6-0


Fig. 1. Block diagram for our voice authentication system.

B. Short-Time Cepstrum

Short-time spectrum analysis techniques also make use of a derivation known as the cepstrum. The cepstrum is nothing more than the inverse Fourier transform of the log magnitude of a signal spectrum [9]. This definition was motivated by the fact that the logarithm of the Fourier spectrum contains an echo and an additive periodical component that could be useful for pitch detection [10]. Depending on the spectrum representation, many different types of CCs could be obtained [9].
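The definition above can be written compactly as follows. The symbols are assumptions introduced here: x[n] is one windowed speech frame, X(e^{jω}) its discrete-time Fourier transform, and c[n] the real cepstrum. The second line is the usual N-point DFT approximation used in practice.

```latex
c[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\left| X(e^{j\omega}) \right| e^{j\omega n} \, d\omega ,
\qquad
c[n] \approx \frac{1}{N} \sum_{k=0}^{N-1} \log\left| X[k] \right| e^{j 2\pi k n / N}
```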

1) LPC-based Cepstral Coefficients: LPC-based Cepstral Coefficients model a time evolving signal as an ordered set of coefficients representing the signal spectral envelope. This characterization is obtained by processing LPCs through the inverse Fourier transform [9].

2) Mel-frequency Cepstrum Coefficients: In order to imitate the human ear structure based on a frequency analysis performed in the inner ear, a new type of cepstrum representation has come to be widely used. This representation is known as the Mel-frequency Cepstrum Coefficients [10]. The basic idea behind mel-CCs is to obtain a filter bank with logarithmic band spacing. In most implementations, the short-time Fourier analysis is done first, and then the DFT values are grouped together in selected bands weighted by a triangular function.
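The two cepstral variants can be sketched as follows, under stated assumptions. For lpc-CCs, the sketch uses the standard recursion that converts the coefficients of an all-pole model 1/(1 − Σ a_k z^−k) into cepstral coefficients without an explicit Fourier transform; the gain term c_0 is omitted. For mel-CCs, it only computes the band edges and triangular weights of the filter bank, using the widely used mapping mel = 2595·log10(1 + f/700), which the paper does not specify. All function names are hypothetical.

```python
import math


def lpc_to_cepstrum(a, num_coeffs):
    """Cepstral coefficients c[1..num_coeffs] of the all-pole model.

    Assumed convention: H(z) = G / (1 - sum_k a[k-1] * z**-k).
    Recursion: c_n = a_n + sum_{k=1..n-1} (k/n) * c_k * a_{n-k}.
    """
    p = len(a)
    c = []
    for n in range(1, num_coeffs + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if 1 <= n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def hz_to_mel(f_hz):
    """Map frequency in Hz to the mel scale (assumed 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(num_filters, f_low, f_high):
    """Edge frequencies (Hz) of num_filters triangles equally spaced in mel.

    Filter i spans edges[i] .. edges[i + 2] and peaks at edges[i + 1].
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_filters + 2)]


def triangular_weight(f, left, center, right):
    """Weight applied to a DFT bin at frequency f by one triangular filter."""
    if left < f <= center:
        return (f - left) / (center - left)
    if center < f < right:
        return (right - f) / (right - center)
    return 0.0


# 20 channels over 0..4 kHz, i.e. the Nyquist band of 8 kHz speech.
edges = mel_band_edges(20, 0.0, 4000.0)
```

Because the spacing is uniform in mel, the bands are narrow at low frequencies and progressively wider toward 4 kHz, which is the logarithmic band spacing described in the text.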

C. Time-Domain Features

The LPCs and the CCs are usually referred to as static features, since they only contain information from a given frame [6]. In order to enhance this representation, it is usual to introduce dynamic features such as the energy contained in the frame [9]. Another possible factor for enhanced speech characterization is kurtosis, which represents the speech signal waveform by means of a numerical value, measuring the relative concentration (flatness or peakedness) of a real-valued random variable when related to the normal distribution [11].
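A minimal sketch of the time-domain measurements named in this paper (energy, zero crossing rate, kurtosis) is given below. The exact definitions used by the authors are not stated, so the sum-of-squares energy, the sign-change count and the non-excess (Pearson) kurtosis, which equals 3 for a normal distribution, are assumptions.

```python
def frame_energy(frame):
    """Energy of one frame as the sum of squared samples."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    if len(frame) < 2:
        raise ValueError("need at least two samples")
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def kurtosis(frame):
    """Fourth central moment divided by the squared variance."""
    n = len(frame)
    mean = sum(frame) / n
    m2 = sum((x - mean) ** 2 for x in frame) / n
    m4 = sum((x - mean) ** 4 for x in frame) / n
    if m2 == 0.0:
        raise ValueError("constant frame: kurtosis is undefined")
    return m4 / (m2 * m2)
```

Each function returns a single number per frame, so appending one of them to a 10-coefficient LPC vector yields an 11-element feature vector such as 'LPC+E'.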

D. Artificial Neural Networks

ANNs are sets of relatively simple non-linear adaptive processing elements, arranged in a structure that resembles the processing of biological neurons. Several parallel processing layers are interconnected, and their connection weights are adjusted to perform specific functions such as classification or prediction.

The most common algorithm used for training ANNs is known as back-propagation [12]. This algorithm looks for a local minimum of the error function while adjusting the connection weights of the network. Since back-propagation is a supervised learning method, it requires the desired output for each input. In this way, the difference between the desired output and the current output of the network is the error function to minimize.
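In its usual textbook form, this minimization can be written as below. The squared-error criterion, the learning rate η and the symbols d_k (desired output), y_k (current network output) and w_ij (connection weight) are assumptions introduced for illustration; the paper does not give the exact error function.

```latex
E = \frac{1}{2} \sum_{k} \left( d_k - y_k \right)^2 ,
\qquad
w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}}
```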

III. IMPLEMENTATION

The general scheme proposed for evaluating LPCs against CCs in voice authentication tasks is shown in Fig. 1. Here, solid lines represent the normal testing process, while the dashed lines correspond to the initial off-line training process.

The current implementation was done on the Matlab® R2012a platform (footnote 1). In particular, the data acquisition, signal processing, DSP object modeling and neural network toolboxes of Matlab are used. Details about each block follow in the next paragraphs.

• Signal acquisition. Speech signals are acquired through an external omnidirectional microphone in a WAV audio format using a sampling frequency of 8 kHz and 16 bits per sample.

• Pre-processing. The whole acquired input is segmented into frames of 240 samples (i.e., 30 ms). Consecutive frames are overlapped by a 15-ms time-shift factor. Labels for each frame are set according to identification parameters established in the system interface (e.g., user ID, gender).

• Feature extraction. Each voice frame is used to compute vectors of LPCs, lpc-CCs and mel-CCs, besides other time-domain features such as energy and kurtosis. Lpc-CCs are obtained using the Matlab function LPCToCepstral. On the other hand, mel-CCs are computed following the algorithm detailed in the background section, where 20 is the number of filter bank channels used to provide a spectral representation of each signal frame [10].

• Normalization. Since the back-propagation algorithm used as classifier requires input values in the interval [−1,+1] in order to maximize its performance (because of its sigmoid transfer function) [12], normalization is particularly useful to scale feature vectors within the mentioned range. Min-max and z-score normalization techniques are analyzed [7].

Footnote 1: http://www.mathworks.com/


Fig. 2. Gender identification considering LPCs.

Fig. 3. Gender identification considering LPCs and strangers.

Fig. 4. One person recognition considering LPCs.

• Data storage. In order to create training datasets for the classifier, MAT data files are used to store normalized feature vectors and their labels (i.e., the targets of the classifier). In these files, each row corresponds to a different value calculated at the feature extraction stage, and each column represents a different 240-sample voice frame.

• Classifier. An ANN-based classifier executing the back-propagation algorithm is used as the main decision structure. The implemented ANN has 3 layers, where the inner layer contains 10 processing elements and the size of the input/output layers depends on each specific evaluation. The final decision is chosen according to a simple criterion of maximum output value, since each network output corresponds to a unique label [10]. In this way, recognition success and mismatch are easily explored through a confusion matrix.
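The pre-processing and normalization blocks described in this section can be sketched as follows. This is an illustration in Python, not the authors' Matlab code: the function names are hypothetical, the 15-ms shift is taken as 120 samples at 8 kHz, and the population standard deviation in the z-score is an assumption. Each normalization is applied to the values of one feature across all frames.

```python
import math


def split_into_frames(samples, frame_len=240, hop=120):
    """Cut a signal into 240-sample frames shifted by 120 samples (15 ms)."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def min_max_normalize(values, lo=-1.0, hi=1.0):
    """Linearly rescale one feature so its values span [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant feature: place every value at the middle of the range.
        return [(lo + hi) / 2.0 for _ in values]
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]


def z_score_normalize(values):
    """Shift one feature to zero mean and scale it to unit variance.

    Unlike min-max, the result is not strictly bounded to [-1, +1].
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

For example, a 600-sample recording yields four frames starting at samples 0, 120, 240 and 360; trailing samples that do not fill a complete frame are discarded in this sketch.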

IV. RESULTS

With the purpose of evaluating recognition accuracy in voice authentication tasks, utterances from 16 people (8 women and 8 men) were collected. A codeword composed of 4 Spanish female names, with an average of six characters per word and without phonetic separation among them, was introduced as input to the pre-processing stage. In addition, ANN training was performed with 70% of those samples and the remaining 30% was used for testing purposes. Since ANNs have non-deterministic behavior, presented results correspond to an average of a hundred trials.
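The evaluation protocol, together with the maximum-output decision rule and confusion matrix mentioned for the classifier, can be sketched as below. The helper names and the seeded shuffle are assumptions; the paper does not describe how samples were assigned to the 70% and 30% partitions.

```python
import random


def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def decide(outputs):
    """Return the index of the largest network output (one output per label)."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total
```

Repeating the split and training with a different seed for each of the hundred trials, and averaging the resulting accuracies, would correspond to the averaging described above.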

Thus, the following sections discuss the recognition accuracy reached by LPCs and CCs with the min-max and z-score normalization techniques. In order to understand the main limitations and strengths of voice authentication under different scenarios, three different recognition experiments are evaluated.

A. Gender Identification

Our first experiment uses two labels for training the ANN outputs: male and female. The input feature coefficients correspond to groups of 2, 4, 6, 8 and 10 people with the same number of men and women. Fig. 2 shows the recognition accuracy reached with different numbers of LPCs and their combination with some time-domain factors. Note that the figure only shows results for 2, 6 and 10 people. Similarly, Fig. 3 shows the recognition accuracy for different parameter combinations but including strangers. Strangers are samples from people that are not part of the training process, but those samples were introduced during the ANN testing process. The number of strangers was set to half the trained population for each authentication experiment.

The number of coefficients was varied from one to ten, although Figs. 2 and 3 only show accuracy for 3, 6 and 10 coefficients. In addition, recognition accuracies for the normalization methods z-score (“zS”) and min-max (“MM”) are included.

According to these results, we can see that the maximum recognition accuracy for gender identification is obtained using 10th-order LPCs combined with the value of energy (“E”). The recognition accuracy reaches 90% for basic gender identification and up to 80% for the same task but including strangers. Moreover, recognition levels of up to 80% could also be achieved using LPCs combined with kurtosis (“K”).


Fig. 5. People authentication considering LPCs.

Fig. 6. People authentication considering LPCs and strangers.

Fig. 7. Gender identification using z-score.

Fig. 8. Gender identification with strangers using z-score.

Fig. 9. One person recognition using z-score.

These accuracy rates are obtained using the z-score normalization technique. Results for min-max normalization are not as good as those for z-score normalization, but they also show high recognition accuracy for variations such as ‘LPC+E’ and ‘LPC+E+K’.

B. One Person Recognition

This experiment aims to distinguish a selected male speaker from the rest of the group. In this case, the ANN has only two possible outputs: recognize or reject the person. Fig. 4 summarizes results for different combinations of LPCs. As in the previous experiment, results for 3, 6 and 10 coefficients, and 2, 6 and 10 people are included.

Results of recognition accuracy show that 10th-order LPCs combined with the value of energy are enough to distinguish one person from the rest of the group. In this experiment, ‘LPC+E’ produces accuracy rates above 90%. Furthermore, high recognition levels (up to 90%) can be obtained using LPCs combined with kurtosis and energy. The z-score normalization also behaves better than the min-max technique.

Furthermore, ROC curves for the present experiment were generated from the target and non-target samples to represent the relation between the classifier's true and false positives. From the analysis we conclude that the classifier exhibits high sensitivity but poor specificity.
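For reference, sensitivity, specificity and the points of a ROC curve can be computed from classifier scores as in the sketch below. The scores, labels and function names are hypothetical; no data from the paper is reproduced.

```python
def sensitivity_specificity(scores, is_target, threshold):
    """Accept a sample when its score reaches the threshold.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = fn = tn = fp = 0
    for score, target in zip(scores, is_target):
        accepted = score >= threshold
        if target and accepted:
            tp += 1
        elif target:
            fn += 1
        elif accepted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both target and non-target samples")
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, is_target):
    """(false positive rate, true positive rate) for each distinct threshold."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, is_target, threshold)
        points.append((1.0 - spec, sens))
    return points
```

A classifier with high sensitivity but poor specificity, as reported here, accepts most target samples but also accepts many non-target ones, so its operating point lies toward the upper right of the ROC plane.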

C. People Authentication

Our last experiment tries to identify every single person within a group of people by assigning a separate output to each person being identified by the ANN. Similar to previous experiments, the size of the group is varied from 2 to 10 people, showing results for 3, 6 and 10 LPCs.

Fig. 5 shows that the best recognition accuracy is reached by combining LPCs with kurtosis and energy. Similar behavior is observed when strangers are included during the testing process (Fig. 6). Recognition accuracy reaches 100% in the case of authentication without strangers and values up to 90% when strangers are introduced. Again, all these results are obtained using the z-score normalization.

Fig. 10. People authentication using z-score.

Fig. 11. People authentication with strangers using z-score.

D. LPCs vs. CCs

In order to compare CCs against LPCs in speaker authentication tasks, lpc-CCs and mel-CCs are tested as input features in the three previous experiments. The number of coefficients was varied from 2 to 10 and those values were combined with the same time-domain factors already discussed.

Figs. 7 to 11 show the recognition accuracy for LPCs, lpc-CCs and mel-CCs considering a group of 10 people and z-score normalization. The best results for 3, 6 and 10 coefficients are shown in these figures.

All figures show clearly that LPCs are the most appropriate features for authentication tasks. On the other hand, mel-CC characterization is the least suitable option for gender identification and people authentication. Although lpc-CCs can also be considered for authentication tasks, LPCs plus energy as the feature vector, together with z-score normalization, are the best approach for all the proposed authentication tasks.

V. CONCLUSIONS

From the previous analysis, we conclude that linear prediction coefficients are quite effective for speaker authentication tasks. In particular, 10 LPCs plus time-domain energy is the combination that reaches the highest recognition accuracy.

Furthermore, LPCs present better accuracy than cepstrum coefficients. Although lpc-CCs could also be considered for authentication tasks, especially when the goal is identifying a person from a group of people, computing simplicity provides a reason for considering LPC characterization as the best alternative for voice authentication.

Finally, the z-score normalization seems to be the most adequate method for processing voice feature vectors as inputs to ANN classifiers.

REFERENCES

[1] M. Wagner, Advanced Topics in Biometrics. University of Canberra, Australia, August 2010, ch. 17, p. 16.

[2] W. Thanhikam, Y. Satirasombat, and C. Charoenlarpnopparut, “Voice authentication system: Lpc and mel-cepstrum bases with vector quantization.” Sirindhorn International Institute of Technology, Thammasat University, Thailand, Tech. Rep., 2011.

[3] S. Bengio and J. Mariethoz, “A statistical significance test for person authentication.” IDIAP, Switzerland, Tech. Rep., 2010.

[4] G. Antoniol, V. F. Rollo, and G. Venturi, “Linear predictive coding and cepstrum coefficients for mining time variant information from software repositories.” International Workshop on Mining Software Repositories, Missouri, USA, 2005.

[5] K. S. R. Murty and B. Yegnanarayana, “Combining evidence from residual phase and mfcc features for speaker recognition.” IEEE Signal Processing Letters, vol. 13, no. 1, 2006.

[6] T. Ganchev, N. Fakotakis, and G. Kokkinakis, “Comparative evaluation of various mfcc implementations on the speaker verification task.” Wire Communications Laboratory, University of Patras, Greece, Tech. Rep., 2005.

[7] J. Han, M. Kamber, and J. Pei, Data Mining - Concepts and Techniques, 3rd ed. Morgan Kaufmann Publishers, an imprint of Elsevier, 2012.

[8] Z. Fang, Z. Guoliang, and S. Zhanjiang, “Comparison of different implementations of mfcc.” J. Computation Science & Technology, Tsinghua University, China, vol. 16, no. 6, November 2001.

[9] A. M. Peinado and J. C. Segura, Speech Recognition over Digital Channels - Robustness and Standards. John Wiley & Sons, Ltd, 2006.

[10] L. R. Rabiner and R. W. Schafer, “Introduction to digital speech processing.” Foundations and Trends in Signal Processing, vol. 1, pp. 1–194, 2007, Rutgers University, University of California, Hewlett-Packard Laboratories, USA.

[11] S. Perez and E. V. Carrera, “Simple speech recognition using kurtosis.” Science and Technology Magazine, vol. 7, pp. 62–69, 2012.

[12] I. McLoughlin, Applied Speech and Audio Processing. Cambridge University Press, 2009.

María Salomé Pérez was born in Quito, Ecuador in 1990. She received her high-school diploma from Federico Engels School, Ecuador, in 2008. She also received the Cisco-CCNA v4.0 Certification in 2011. Currently, she is studying Electronics and Telecommunications Engineering at the Ecuadorian Armed Forces University, Ecuador. Her research interests include artificial intelligence and biometric recognition.

Enrique V. Carrera (M’99) received his BE degree in electronics engineering from the Ecuadorian Armed Forces University, Ecuador, in 1992. He also received in 1996 his MS degree in electrical engineering from the Pontifical Catholic University of Rio de Janeiro, Brazil. In 1999, he received his DSc degree in systems engineering from the Federal University of Rio de Janeiro, Brazil. He joined the Department of Computer Science at Rutgers University, USA, as post-doctoral associate from 2000 to 2004. Currently, he is Associate Professor at the Ecuadorian Armed Forces University, Ecuador. His research interests include distributed systems, ubiquitous computing, and computational intelligence.
