
Principal Component Analysis

Courtesy: University of Louisville, CVIP Lab


    PCA

    PCA is

    A backbone of modern data analysis.

A black box that is widely used but poorly understood.

OK, let's dispel the magic behind this black box.


    PCA - Overview

    It is a mathematical tool from applied linear

    algebra.

It is a simple, non-parametric method of extracting relevant information from confusing data sets.

It provides a roadmap for how to reduce a complex data set to a lower dimension.


    What do we need under our BELT?!!!

Basics of statistical measures, e.g. variance and covariance.

    Basics of linear algebra:

- Matrices

- Vector space

- Basis

- Eigenvectors and eigenvalues


    Variance

A measure of the spread of the data in a data set with mean X̄.

    Variance is claimed to be the original statistical

    measure of spread of data.



Covariance

Variance - a measure of the deviation from the mean for points in one dimension, e.g., heights.

Covariance - a measure of how much each of the dimensions varies from the mean with respect to each other.

Covariance is measured between 2 dimensions to see if there is a relationship between the 2 dimensions, e.g., number of hours studied and grade obtained.

The covariance between one dimension and itself is the variance:

var(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n - 1}

cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
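As a quick check of these formulas, here is a minimal numpy sketch; the hours/marks numbers below are made up for illustration, not the data behind the 104.53 value mentioned on the next slide.

```python
import numpy as np

def sample_var(x):
    """Sample variance with the (n - 1) denominator, as in the formula above."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    return np.sum((x - xbar) * (x - xbar)) / (len(x) - 1)

def sample_cov(x, y):
    """Sample covariance between two dimensions, (n - 1) denominator."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Hypothetical data: hours studied vs. marks obtained
hours = [9, 15, 25, 14, 10, 18, 0, 16, 5, 19]
marks = [39, 56, 93, 61, 50, 75, 32, 85, 42, 70]
print(sample_var(hours))          # matches np.var(hours, ddof=1)
print(sample_cov(hours, marks))   # matches np.cov(hours, marks)[0, 1]
```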


    Covariance

    What is the interpretation of covariance calculations?

    Say you have a 2-dimensional data set

    -X: number of hours studied for a subject

    -Y: marks obtained in that subject

And assume the covariance value (between X and Y) is: 104.53

    What does this value mean?



Covariance

The exact value is not as important as its sign.

A positive value of covariance indicates that both dimensions increase or decrease together, e.g., as the number of hours studied increases, the grades in that subject also increase.

A negative value indicates that while one increases the other decreases, or vice-versa, e.g., active social life vs. performance in ECE Dept.

If the covariance is zero, the two dimensions are uncorrelated with each other, e.g., heights of students vs. grades obtained in a subject.


    Covariance

Why bother with calculating (expensive) covariance when we could just plot the 2 values to see their relationship?

Covariance calculations are used to find relationships between dimensions in high-dimensional data sets (usually greater than 3) where visualization is difficult.



    Covariance Matrix

Representing covariance among dimensions as a matrix, e.g., for 3 dimensions:

Properties:

- Diagonal: variances of the variables

- cov(X,Y) = cov(Y,X), hence the matrix is symmetrical about the diagonal (upper triangular)

- m-dimensional data will result in an m × m covariance matrix
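For instance, with numpy and some hypothetical 3-dimensional data, the full covariance matrix can be formed in one call and shows exactly these properties:

```python
import numpy as np

# Hypothetical data: 3 dimensions (rows) x 10 samples (columns)
rng = np.random.default_rng(0)
data = rng.normal(size=(3, 10))

# np.cov treats each row as one variable and uses the (n - 1) denominator
C = np.cov(data)
print(C.shape)                      # (3, 3): m x m for m = 3 dimensions
print(np.allclose(C, C.T))          # True: symmetric about the diagonal
print(np.allclose(np.diag(C),       # diagonal entries are the variances
                  np.var(data, axis=1, ddof=1)))
```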


    Transformation Matrices

Scale the vector (3,2) by a value of 2 to get (6,4), then multiply by the square transformation matrix. We see that the result is still scaled by 4.

WHY?

A vector consists of both length and direction. Scaling a vector only changes its length and not its direction. This is an important observation in the transformation of matrices, leading to the formation of eigenvectors and eigenvalues.

Irrespective of how much we scale (3,2) by, the solution (under the given transformation matrix) is always a multiple of 4.
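The transformation matrix itself appears only as a figure on the slide; a commonly used stand-in with exactly this behavior is A = [[2, 3], [2, 1]], for which (3, 2) is an eigenvector with eigenvalue 4:

```python
import numpy as np

# Hypothetical stand-in for the slide's transformation matrix (not shown in the text):
# (3, 2) is an eigenvector of A with eigenvalue 4.
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

v = np.array([3.0, 2.0])
print(A @ v)            # [12.  8.]  -> 4 * (3, 2)
print(A @ (2 * v))      # [24. 16.]  -> still 4 * (6, 4): scaling v does not change this
print(A @ (10 * v))     # [120. 80.] -> any multiple of (3, 2) is again scaled by 4
```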


    Eigenvalue Problem

The eigenvalue problem is any problem having the following form:

A · v = λ · v

A: m × m matrix
v: m × 1 non-zero vector
λ: scalar

Any value of λ for which this equation has a solution is called an eigenvalue of A, and the vector v which corresponds to this value is called an eigenvector of A.


    Calculating Eigenvectors & Eigenvalues

Simple matrix algebra shows that:

A · v = λ · v
A · v - λ · I · v = 0
(A - λ · I) · v = 0

Finding the roots of |A - λ · I| will give the eigenvalues, and for each of these eigenvalues there will be an eigenvector.
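In practice the eigenvalues and eigenvectors are computed numerically; a minimal numpy sketch, reusing the hypothetical matrix from the earlier example, is:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])      # hypothetical matrix from the earlier sketch

# np.linalg.eig returns the eigenvalues and unit-length eigenvectors (as columns)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)              # e.g. [ 4. -1.] (order is not guaranteed)

# Check A . v = lambda . v for each eigenvalue/eigenvector pair
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # True, True
```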



    Example of a problem

We collected m parameters about 100 students:

    -Height

    -Weight

    -Hair color

    -Average grade

    -

We want to find the most important parameters that best describe a student.


    Example of a problem

Each student has a vector of data which describes him, of length m:

- (180, 70, purple, 84, ...)

We have n = 100 such vectors. Let's put them in one matrix, where each column is one student vector.

So we have an m × n matrix. This will be the input of our problem.


    Example of a problem

Every student is a vector that lies in an m-dimensional vector space spanned by an orthonormal basis.

All data/measurement vectors in this space are a linear combination of this set of unit-length basis vectors.


    Questions

How do we describe the most important features using math?

- Variance

How do we represent our data so that the most important features can be extracted easily?

- Change of basis


    Redundancy

Multiple sensors record the same dynamic information.

Consider a range of possible plots between two arbitrary measurement types r1 and r2.

Panel (a) depicts two recordings with no redundancy, i.e. they are uncorrelated, e.g. a person's height and his GPA.

However, in panel (c) both recordings appear to be strongly related, i.e. one can be expressed in terms of the other.


    Covariance Matrix

S_X = \frac{1}{n-1} X X^T, where X is the m × n matrix of zero-mean measurements (each row is one measurement type).

The ij-th element of S_X is the dot product between the vector of the i-th measurement type and the vector of the j-th measurement type.

- S_X is a square symmetric m × m matrix.

- The diagonal terms of S_X are the variances of particular measurement types.

- The off-diagonal terms of S_X are the covariances between measurement types.


Covariance Matrix

Computing S_X quantifies the correlations between all possible pairs of measurements. Between one pair of measurements, a large covariance corresponds to a situation like panel (c), while zero covariance corresponds to entirely uncorrelated data as in panel (a).
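A minimal numpy sketch of this computation (hypothetical data; rows are measurement types, columns are samples, consistent with the (n - 1) denominator used earlier):

```python
import numpy as np

# Hypothetical data: m = 2 measurement types, n = 10 samples (columns)
rng = np.random.default_rng(1)
X = rng.normal(size=(2, 10))

# Subtract the mean of each measurement type (each row)
X = X - X.mean(axis=1, keepdims=True)

n = X.shape[1]
S_X = (X @ X.T) / (n - 1)             # m x m matrix of dot products between row vectors

print(np.allclose(S_X, np.cov(X)))    # True: matches numpy's built-in covariance
```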


    PCA Process - STEP 1

Subtract the mean from each of the dimensions.

This produces a data set whose mean is zero.

Subtracting the mean makes the variance and covariance calculation easier by simplifying their equations.

The variance and covariance values are not affected by the mean value.

Suppose we have two measurement types X1 and X2, hence m = 2, and ten samples each, hence n = 10.
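Steps 2 and 3 (forming the covariance matrix and computing its eigenvectors, shown on the original slides as worked figures) are included here for continuity; a minimal numpy sketch of Step 1 onward, with hypothetical X1/X2 values rather than the slide's data, might look like:

```python
import numpy as np

# Hypothetical samples for the two measurement types X1 and X2 (m = 2, n = 10)
rng = np.random.default_rng(2)
X1 = rng.normal(loc=2.0, scale=1.0, size=10)
X2 = 0.8 * X1 + rng.normal(scale=0.3, size=10)   # correlated with X1 on purpose

# STEP 1: subtract the mean from each dimension -> zero-mean data, shape (m, n)
data = np.vstack([X1 - X1.mean(), X2 - X2.mean()])

# STEP 2: covariance matrix of the zero-mean data
S = np.cov(data)

# STEP 3: eigenvalues and eigenvectors of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eig(S)
print(eigenvalues)       # two eigenvalues
print(eigenvectors)      # columns are the corresponding unit-length eigenvectors
```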


    PCA Process - STEP 4

Reduce dimensionality and form the feature vector.

The eigenvector with the highest eigenvalue is the principal component of the data set.

In our example, the eigenvector with the largest eigenvalue is the one that points down the middle of the data.

Once eigenvectors are found from the covariance matrix, the next step is to order them by eigenvalue, highest to lowest. This gives the components in order of significance.


    PCA Process - STEP 4

Now, if you'd like, you can decide to ignore the components of lesser significance.

You do lose some information, but if the eigenvalues are small, you don't lose much:

- m dimensions in your data

- calculate m eigenvectors and eigenvalues

- choose only the first r eigenvectors

- final data set has only r dimensions.
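Continuing the earlier sketch, ordering the eigenvectors and keeping only the first r of them might look like this (r = 1 here purely for illustration):

```python
import numpy as np

# Rebuild the small running example so this snippet runs on its own
rng = np.random.default_rng(2)
X1 = rng.normal(loc=2.0, scale=1.0, size=10)
X2 = 0.8 * X1 + rng.normal(scale=0.3, size=10)
data = np.vstack([X1 - X1.mean(), X2 - X2.mean()])
eigenvalues, eigenvectors = np.linalg.eig(np.cov(data))

# STEP 4: order components by eigenvalue, highest first, and keep the first r
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

r = 1                                   # e.g. keep a single principal component
feature_vector = eigenvectors[:, :r]    # shape (m, r): chosen eigenvectors as columns
print(feature_vector)
```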


    PCA Process - STEP 4

When the eigenvalues λ_i are sorted in descending order, the proportion of variance explained by the first r principal components is:

\frac{\sum_{i=1}^{r} \lambda_i}{\sum_{i=1}^{m} \lambda_i} = \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_r}{\lambda_1 + \lambda_2 + \cdots + \lambda_p + \cdots + \lambda_m}

If the dimensions are highly correlated, there will be a small number of eigenvectors with large eigenvalues and r will be much smaller than m.

If the dimensions are not correlated, r will be as large as m and PCA does not help.
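As a quick numerical check, the same ratio can be computed directly from sorted eigenvalues (made-up values below):

```python
import numpy as np

# Hypothetical sorted eigenvalues of some covariance matrix (descending order)
eigenvalues = np.array([4.2, 0.9, 0.3, 0.1])

r = 2
explained = eigenvalues[:r].sum() / eigenvalues.sum()
print(f"first {r} components explain {explained:.1%} of the variance")
```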


    PCA Process - STEP 5

FinalData is the final data set, with data items in columns and dimensions along rows.

What does this give us?

The original data solely in terms of the vectors we chose.

We have changed our data from being in terms of the axes X1 and X2 to now being in terms of our 2 eigenvectors.
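The projection itself (FinalData = RowFeatureVector × RowZeroMeanData, as recalled on the reconstruction slide below) is a single matrix product; continuing the running sketch with hypothetical data:

```python
import numpy as np

# Rebuild the small running example so this snippet runs on its own
rng = np.random.default_rng(2)
X1 = rng.normal(loc=2.0, scale=1.0, size=10)
X2 = 0.8 * X1 + rng.normal(scale=0.3, size=10)
row_zero_mean_data = np.vstack([X1 - X1.mean(), X2 - X2.mean()])   # shape (m, n)

eigenvalues, eigenvectors = np.linalg.eig(np.cov(row_zero_mean_data))
order = np.argsort(eigenvalues)[::-1]
row_feature_vector = eigenvectors[:, order].T    # chosen eigenvectors as rows

# STEP 5: FinalData = RowFeatureVector x RowZeroMeanData
final_data = row_feature_vector @ row_zero_mean_data
print(final_data.shape)     # (2, 10): data items in columns, new dimensions in rows
```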


    PCA Process - STEP 5

FinalData (transpose: dimensions along columns)

newX1          newX2
0.827870186    0.175115307
1.77758033     0.142857227
0.992197494    0.384374989
0.274210416    0.130417207
1.67580142     0.209498461
0.912949103    0.175282444
0.0991094375   0.349824698
1.14457216     0.0464172582
0.438046137    0.0177646297
1.22382956     0.162675287


    Reconstruction of Original Data

Recall that: FinalData = RowFeatureVector × RowZeroMeanData

Then:

RowZeroMeanData = RowFeatureVector⁻¹ × FinalData

And thus:

RowOriginalData = (RowFeatureVector⁻¹ × FinalData) + OriginalMean

If we use unit eigenvectors, the inverse is the same as the transpose (hence, easier).
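Continuing the running sketch, the reconstruction (with unit eigenvectors, so the inverse is just the transpose) is another matrix product:

```python
import numpy as np

# Rebuild the running example so this snippet is self-contained
rng = np.random.default_rng(2)
X1 = rng.normal(loc=2.0, scale=1.0, size=10)
X2 = 0.8 * X1 + rng.normal(scale=0.3, size=10)
original = np.vstack([X1, X2])
original_mean = original.mean(axis=1, keepdims=True)
row_zero_mean_data = original - original_mean

eigenvalues, eigenvectors = np.linalg.eig(np.cov(row_zero_mean_data))
row_feature_vector = eigenvectors[:, np.argsort(eigenvalues)[::-1]].T
final_data = row_feature_vector @ row_zero_mean_data

# Unit eigenvectors: inverse == transpose, so reconstruction is just another product
row_original_data = row_feature_vector.T @ final_data + original_mean
print(np.allclose(row_original_data, original))   # True when all eigenvectors are kept
```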



    Next

We will practice with each other how to use this powerful mathematical tool in one of the computer vision applications, 2D face recognition. So stay excited!