8/3/2019 Gabor presentation
http://slidepdf.com/reader/full/gabor-presentation 1/28
Easy Does It: Robust Spectro-Temporal Many-Stream ASR without Fine Tuning of Streams
Ravuri, Morgan, UC Berkeley
Presented by JJ
Motivation
• Physiological experiments in different mammal species: a large percentage of neurons in the primary auditory cortex (A1) respond differently to upward- versus downward-moving ripples in the spectrogram of the input (Depireux et al., 2001).
• Spectro-temporal receptive fields (STRFs): individual neurons are sensitive to specific spectro-temporal modulation frequencies in the incoming sound signal.
Introduction
• Cortically inspired time-frequency (TF) features capture the spectral and temporal modulations relevant to speech recognition and discrimination.
• Spectro-temporal features are derived by filtering spectrograms with particular filters.
• In this work, Gabor filters are applied to the auditory spectrogram.
Example
Example Gabor Filters
[Figure: example Gabor filter, formed from a Gaussian envelope and a complex sinusoid s(n, k)]
1D Gabor
[Figure: 1D Gabor filter, a Gaussian envelope multiplied by a complex sinusoid s(n, k)]
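The slides show the Gabor filters only as figures. For reference, one common textbook form is written out below. The symbols (centres n_0 and k_0, envelope widths σ_n and σ_k, modulation frequencies ω_n and ω_k) are assumptions, not taken from the slides; n indexes time frames and k indexes frequency channels. The 2D filter extends the 1D one along the frequency axis.

```latex
% 1D Gabor: Gaussian envelope times complex sinusoid
g(n) = w(n)\, s(n), \qquad
w(n) = \exp\!\left(-\frac{(n - n_0)^2}{2\sigma_n^2}\right), \qquad
s(n) = \exp\!\big(i\,\omega_n (n - n_0)\big)

% 2D Gabor over time frames n and frequency channels k
g(n, k) = w(n, k)\, s(n, k), \qquad
w(n, k) = \exp\!\left(-\frac{(n - n_0)^2}{2\sigma_n^2} - \frac{(k - k_0)^2}{2\sigma_k^2}\right), \qquad
s(n, k) = \exp\!\big(i\,\omega_n (n - n_0) + i\,\omega_k (k - k_0)\big)
```

When ω_n and ω_k are both non-zero, the sign of ω_k relative to ω_n determines whether the filter follows upward- or downward-moving ripples in the spectrogram.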
2D Gabor
[Figure: 2D Gabor filter, a Gaussian envelope multiplied by a complex sinusoid s(n, k)]
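A minimal NumPy sketch of building one 2D Gabor kernel and filtering a spectrogram with it, assuming the Gaussian-envelope form with the kernel centred on its own grid (n_0 = k_0 = 0). The function names, grid sizes and parameter values are hypothetical, chosen only for illustration. Filtering is done as a zero-padded cross-correlation that keeps the spectrogram's size.

```python
import numpy as np

def gabor_2d(size_n, size_k, omega_n, omega_k, sigma_n, sigma_k):
    """Complex 2D Gabor kernel g(n, k) = w(n, k) * s(n, k), centred on the grid.

    n indexes time frames (rows), k indexes frequency channels (columns).
    """
    n = np.arange(size_n) - (size_n - 1) / 2.0
    k = np.arange(size_k) - (size_k - 1) / 2.0
    N, K = np.meshgrid(n, k, indexing="ij")  # shape (size_n, size_k)
    envelope = np.exp(-N**2 / (2 * sigma_n**2) - K**2 / (2 * sigma_k**2))
    carrier = np.exp(1j * (omega_n * N + omega_k * K))
    return envelope * carrier

def filter_spectrogram(spec, kernel):
    """Cross-correlate a (frames x channels) spectrogram with a kernel.

    The input is zero padded so the output has the same shape as `spec`.
    """
    kn, kk = kernel.shape
    pn, pk = kn // 2, kk // 2
    padded = np.pad(spec, ((pn, kn - 1 - pn), (pk, kk - 1 - pk)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    return np.einsum("ijab,ab->ij", windows, kernel)

# Hypothetical example: a random 40-frame, 23-channel "spectrogram".
g = gabor_2d(9, 7, omega_n=0.5, omega_k=0.3, sigma_n=2.0, sigma_k=1.5)
spec = np.random.default_rng(0).random((40, 23))
features = filter_spectrogram(spec, g).real  # keep the real part as features
print(features.shape)
```

Keeping the real part (or the magnitude) of the filtered output is one common way to obtain real-valued features; which choice the authors made is not stated on these slides.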
Their Gabor Filters
[Figure: their Gabor filter definition, annotated with its parameters and dummy indices]
Tons of Combinations!
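To see why the parameter space grows quickly, here is a small sketch that enumerates distinct (ω_n, ω_k) pairs from two modulation-frequency grids. The grid values are hypothetical, in arbitrary units, and are not the values used by Ravuri and Morgan. Mirrored pairs are added only when both frequencies are non-zero, because only then does the sign of ω_k pick a ripple direction.

```python
from itertools import product

# Hypothetical grids (arbitrary units), NOT the paper's settings.
TEMPORAL = [0.0, 0.25, 0.5, 1.0, 2.0]  # omega_n candidates
SPECTRAL = [0.0, 0.1, 0.2, 0.4]        # omega_k candidates

def unique_filters(temporal, spectral):
    """Enumerate distinct (omega_n, omega_k) pairs for a Gabor filter bank.

    With both frequencies non-zero, the sign of omega_k selects upward-
    versus downward-moving ripples, so both signs are kept.
    """
    filters = set()
    for wn, wk in product(temporal, spectral):
        filters.add((wn, wk))
        if wn != 0.0 and wk != 0.0:
            filters.add((wn, -wk))  # opposite ripple direction
    return sorted(filters)

bank = unique_filters(TEMPORAL, SPECTRAL)
# 5 * 4 = 20 base pairs plus 4 * 3 = 12 mirrored pairs
print(len(bank))
```

Every additional choice (for example, envelope widths or filter sizes) multiplies this count again.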
System
[Figure: system diagram with blocks Stream … Stream, Merge MLP outputs, PCA, MFCC, and Output]
System
• MLP (multilayer perceptron)
• The structure of each MLP depends on the feature type and the corpus:
Number of           Spectral            Cepstral
input units         567                 351
frames of context   9                   9
hidden units        160 (Aurora 2)      160 (Aurora 2)
                    500 (Numbers95)     500 (Numbers95)
output units        56                  56
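The table's input sizes follow from the context window: 567 = 9 × 63 and 351 = 9 × 39, which implies 63 values per frame for the spectral streams and 39 for the cepstral ones. The sketch below stacks 9 frames of context and runs one forward pass through a single-hidden-layer MLP with 56 outputs. The sigmoid hidden units, softmax output, edge padding and random weights are assumptions for illustration; trained weights would come from the authors' training setup, which is not shown here.

```python
import numpy as np

def stack_context(feats, context=9):
    """Stack each frame with its neighbours: (T, d) -> (T, context * d).

    Edge frames are repeated so every frame gets a full window
    (an assumed padding scheme; context should be odd).
    """
    half = context // 2
    padded = np.pad(feats, ((half, half), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(feats)] for i in range(context)])

def mlp_posteriors(x, W1, b1, W2, b2):
    """One hidden layer: sigmoid hidden units, softmax over phone classes."""
    hidden = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    logits = hidden @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Spectral stream sized for Aurora 2: 9 x 63 = 567 inputs, 160 hidden, 56 outputs.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 63))  # placeholder features for 50 frames
x = stack_context(frames, context=9)
W1, b1 = rng.standard_normal((567, 160)) * 0.01, np.zeros(160)  # random stand-ins
W2, b2 = rng.standard_normal((160, 56)) * 0.01, np.zeros(56)
posteriors = mlp_posteriors(x, W1, b1, W2, b2)
print(x.shape, posteriors.shape)
```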
System
• The output of each stream's MLP is an estimate of the posterior probability distribution over phones.
• These phone posterior estimates are then combined across streams using inverse-entropy weighting.
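A minimal sketch of inverse-entropy combination, assuming per-frame weights proportional to the inverse of each stream's posterior entropy and normalized over streams. The slides do not give the exact weighting variant the authors used, so treat this as one plausible form. The idea is that a stream with a peaked (low-entropy) posterior is treated as more reliable and gets more weight.

```python
import numpy as np

def inverse_entropy_merge(posteriors, eps=1e-10):
    """Merge per-stream phone posteriors with inverse-entropy weights.

    posteriors: array (S, T, C) for S streams, T frames, C phone classes.
    Returns an array (T, C) of merged posteriors.
    """
    p = np.clip(posteriors, eps, 1.0)
    entropy = -(p * np.log(p)).sum(axis=2)          # (S, T)
    inv = 1.0 / np.maximum(entropy, eps)            # avoid division by zero
    weights = inv / inv.sum(axis=0, keepdims=True)  # normalize over streams, per frame
    merged = (weights[:, :, None] * posteriors).sum(axis=0)
    return merged / merged.sum(axis=1, keepdims=True)

# One confident stream and one uninformative stream, for a single frame.
confident = np.array([[0.9, 0.05, 0.05]])
uniform = np.full((1, 3), 1.0 / 3.0)
merged = inverse_entropy_merge(np.stack([confident, uniform]))
print(merged)
```

In this example the merged posterior for the first phone lands well above the plain average of 0.9 and 1/3, because the confident stream dominates the weights.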
System
• Then apply the Karhunen-Loève (KL) transform, i.e. principal components analysis, to the log probabilities of the merged MLP outputs.
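The KL transform step can be sketched as PCA on log posteriors: estimate the covariance, keep the eigenvectors with the largest eigenvalues, and project. This sketch assumes the transform is estimated on training data and then applied unchanged to test data; the 56-to-32 sizes match the slides, while the function names and the random Dirichlet "posteriors" are placeholders.

```python
import numpy as np

def fit_klt(log_post, n_components=32):
    """Estimate a KL transform (PCA basis) from log posteriors of shape (N, C)."""
    mean = log_post.mean(axis=0)
    cov = np.cov(log_post - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]          # basis: (C, n_components)

def apply_klt(log_post, mean, basis):
    """Project centred log posteriors onto the retained components."""
    return (log_post - mean) @ basis

# Placeholder merged posteriors: 500 frames over 56 phone classes.
rng = np.random.default_rng(1)
merged = rng.dirichlet(np.ones(56), size=500)
log_post = np.log(merged + 1e-10)  # small floor avoids log(0)
mean, basis = fit_klt(log_post, n_components=32)
reduced = apply_klt(log_post, mean, basis)
print(reduced.shape)
```

Because the projection uses eigenvectors of the covariance, the resulting 32 dimensions are mutually decorrelated, which is the "orthogonalized" property listed on the next slide.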
System
• Apply the KL transform to the log probabilities of the merged MLP outputs:
  • reduced to 32 dimensions
  • orthogonalized
• The features are mean- and variance-normalized per utterance
• Finally, they are appended to the MFCC features
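A minimal sketch of the last two steps, per-utterance mean and variance normalization followed by appending to the MFCCs. It assumes frame-synchronous features, so each utterance has the same number of frames in both streams. With 39 MFCC dimensions and 32 KLT dimensions the result is 71-dimensional, matching the figure labels. Function names and random inputs are hypothetical.

```python
import numpy as np

def utterance_mvn(feats, eps=1e-8):
    """Normalize each dimension of one utterance (T, d) to zero mean, unit variance."""
    mu = feats.mean(axis=0)
    sd = feats.std(axis=0)
    return (feats - mu) / np.maximum(sd, eps)

def append_to_mfcc(mfcc, tandem):
    """Concatenate per-frame MFCCs (T, 39) with MLP-derived features (T, 32)."""
    if mfcc.shape[0] != tandem.shape[0]:
        raise ValueError("MFCC and MLP-derived features must have the same frame count")
    return np.hstack([mfcc, tandem])

# Hypothetical 120-frame utterance.
rng = np.random.default_rng(2)
tandem = utterance_mvn(rng.standard_normal((120, 32)) * 3 + 5)
mfcc = rng.standard_normal((120, 39))
features = append_to_mfcc(mfcc, tandem)
print(features.shape)  # 39 + 32 = 71 columns
```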
System
• The resulting 71D features (39D MFCC + 32D MLP-derived) are passed to the HMM
Experiments
Database
• Aurora 2 (0–20 dB SNR)
• Numbers95
  • consists of various numeric portions extracted from telephone dialogues
  • vocabulary size of 32 words
  • the training set contains 3590 utterances of clean data, roughly 3 hours in total
  • 2 test sets, each containing 1227 utterances:
    • the first contains only clean data
    • the second contains the same utterances with noise added at five SNRs (20 dB, 15 dB, 10 dB, 5 dB, and 0 dB)
  • noise type: additive
Baseline
• 39-dimensional MFCC
• 4-stream system
• 28-stream system
Uni-modulation system
• 150 streams
• spectral only, and spectral/cepstral
Metric: Word Error Rate (WER)
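WER is the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard Levenshtein alignment over words; it is a generic illustration, not the scoring tool used in the paper. Because insertions count, WER can exceed 1.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("two" -> "too") and one deletion ("four"): 2 errors / 4 words.
print(word_error_rate("one two three four", "one too three"))
```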
Results
[Figure: results on Aurora 2 and Numbers95]
Results: Discussion 1
[Figure: results on Aurora 2 and Numbers95]
Results: Discussion 2
[Figure: results on Aurora 2 and Numbers95]
Results: Discussion 3
[Figure: results on Aurora 2 and Numbers95]
Future Work
• Go beyond additive noise
• A different TF feature might not work as well
• Log-mel filterbank? Or power-based features like PNCC?
• How should the MLP outputs be combined? Is inverse entropy the best choice?