
Acta Applicandae Mathematicae, Volume 80, Issue 2, pp. 199-220, January 2004

    Riemannian geometry of Grassmann manifolds

    with a view on algorithmic computation

P.-A. Absil, R. Mahony, R. Sepulchre

    Last revised: 14 Dec 2003

    PREPRINT

Abstract. We give simple formulas for the canonical metric, gradient, Lie derivative, Riemannian connection, parallel translation, geodesics and distance on the Grassmann manifold of p-planes in R^n. In these formulas, p-planes are represented as the column space of n × p matrices. The Newton method on abstract Riemannian manifolds proposed by S. T. Smith is made explicit on the Grassmann manifold. Two applications, computing an invariant subspace of a matrix and the mean of subspaces, are worked out.

Key words. Grassmann manifold, noncompact Stiefel manifold, principal fiber bundle, Levi-Civita connection, parallel transportation, geodesic, Newton method, invariant subspace, mean of subspaces.

AMS subject classification: 65J05 General theory (Numerical analysis in abstract spaces); 53C05 Connections, general theory; 14M15 Grassmannians, Schubert varieties, flag manifolds.

    1 Introduction

The majority of available numerical techniques for optimization and nonlinear equations assume an underlying Euclidean space. Yet many computational problems are posed on non-Euclidean spaces. Several authors [Gab82, Smi94, Udr94, Mah96, MM02] have proposed abstract algorithms that exploit the underlying geometry (e.g. symmetric, homogeneous, Riemannian) of manifolds on which problems are cast, but the conversion of these abstract geometric algorithms into numerical procedures in practical situations is often a nontrivial task that critically relies on an adequate representation of the manifold.

The present paper contributes to addressing this issue in the case where the relevant non-Euclidean space is the set of fixed-dimensional subspaces of a given Euclidean space. This non-Euclidean space is commonly called the Grassmann manifold. Our motivation for

School of Computational Science and Information Technology, Florida State University, Tallahassee FL 32306-4120. Part of this work was done while the author was a Research Fellow with the Belgian National Fund for Scientific Research (Aspirant du F.N.R.S.) at the University of Liège. URL: www.montefiore.ulg.ac.be/absil

Department of Engineering, Australian National University, ACT, 0200, Australia.

Department of Electrical Engineering and Computer Science, Université de Liège, Bât. B28 Systèmes, Grande Traverse 10, B-4000 Liège, Belgium. URL: www.montefiore.ulg.ac.be/systems


considering the Grassmann manifold comes from the number of applications that can be formulated as finding zeros of fields defined on the Grassmann manifold. Examples include invariant subspace computation and subspace tracking; see e.g. [Dem87, CG90] and references therein.

A simple and robust manner of representing a subspace in computer memory is in the form of a matrix array of double-precision data whose columns span the subspace. Using this representation technique, we produce formulas for fundamental Riemannian-geometric objects on the Grassmann manifold endowed with its canonical metric: gradient, Riemannian connection, parallel translation, geodesics and distance. The formulas for the Riemannian connection and geodesics directly yield a matrix expression for a Newton method on Grassmann, and we illustrate the applicability of this Newton method on two computational problems cast on the Grassmann manifold.

The classical Newton method for computing a zero of a function F : R^n → R^n can be formulated as follows [DS83, Lue69]: Solve the Newton equation

DF(x)[η] = -F(x)   (1)

for the unknown η ∈ R^n and compute the update

x_+ := x + η.   (2)
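A minimal runnable sketch of (1)-(2) may help fix ideas; the function names and the toy system below are ours, not the paper's:

```python
import numpy as np

def newton(F, DF, x, n_iter=20, tol=1e-12):
    """Classical Newton iteration in R^n: solve the Newton equation
    DF(x)[eta] = -F(x) for eta, then update x <- x + eta."""
    for _ in range(n_iter):
        eta = np.linalg.solve(DF(x), -F(x))   # Newton equation (1)
        x = x + eta                           # update (2)
        if np.linalg.norm(eta) < tol:
            break
    return x

# toy example: intersect the circle x^2 + y^2 = 4 with the line x = y
F = lambda z: np.array([z[0]**2 + z[1]**2 - 4.0, z[0] - z[1]])
DF = lambda z: np.array([[2 * z[0], 2 * z[1]], [1.0, -1.0]])
root = newton(F, DF, np.array([1.0, 2.0]))
```

Starting from (1, 2), the iterates converge quadratically to the intersection point in the positive quadrant.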

When F is defined on a non-Euclidean manifold, a possible approach is to choose local coordinates and use the Newton method as in (1)-(2). However, the successive iterates on the manifold will depend on the chosen coordinate system. Smith [Smi93, Smi94] proposes a coordinate-independent Newton method for computing a zero of a smooth one-form ξ on an abstract complete Riemannian manifold M. He suggests to solve the Newton equation

∇_η ξ = -ξ_x   (3)

for the unknown η ∈ T_x M, where ∇ denotes the Riemannian connection (also called Levi-Civita connection) on M, and to update along the geodesic as x_+ := Exp_x η. It can be proven that if x is chosen suitably close to a point x* in M such that ξ_{x*} = 0 and ∇ξ_{x*} : T_{x*}M → T_{x*}M is nondegenerate, then the algorithm converges quadratically to x*. We will refer to this iteration as the Riemann-Newton method.

In practical cases it may not be obvious how to particularize the Riemann-Newton method into a concrete algorithm. Given a Riemannian manifold M and an initial point x on M, one may pick a coordinate system containing x, compute the metric tensor in these coordinates, deduce the Christoffel symbols and obtain a tensorial equation for (3), but this procedure is often exceedingly complicated and computationally inefficient. One can also recognize that the Riemann-Newton method is equivalent to the classical Newton method in normal coordinates at x [MM02], but obtaining a tractable expression for these coordinates is often elusive.

On the Grassmann manifold, a formula for the Riemannian connection was given by Machado and Salavessa in [MS85]. They identify the Grassmann manifold with the set of projectors onto subspaces of R^n, embed the set of projectors in the set of linear maps from R^n to R^n (which is a Euclidean space), and endow this set with the Hilbert-Schmidt inner product. The induced metric on the Grassmann manifold is then the essentially unique O_n-invariant metric mentioned above. The embedding of the Grassmann manifold in a Euclidean space


allows the authors to compute the Riemannian connection by taking the derivative in the Euclidean space and projecting the result into the tangent space of the embedded manifold. They obtain a formula for the Riemannian connection in terms of projectors.

Edelman, Arias and Smith [EAS98] have proposed an expression of the Riemann-Newton method on the Grassmann manifold in the particular case where ξ is the differential df of a real function f on M. Their approach avoids the derivation of a formula for the Riemannian connection on Grassmann. Instead, they obtain a formula for the Hessian of f by polarizing the second derivative of f along the geodesics.

In the present paper, we derive an easy-to-use formula for the Riemannian connection ∇_η ξ where ξ and η are arbitrary smooth vector fields on the Grassmann manifold of p-dimensional subspaces of R^n. This formula, expressed in terms of n × p matrices, intuitively relates to the geometry of the Grassmann manifold expressed as a set of equivalence classes of n × p matrices. Once the formula for the Riemannian connection is available, expressions for parallel transport and geodesics directly follow. Expressing the Riemann-Newton method on the Grassmann manifold for concrete vector fields reduces to a directional derivative in R^{n×p} followed by a projection.

We work out an example where the zeros of ξ are the p-dimensional right invariant subspaces of an arbitrary n × n matrix A. This generalizes an application considered in [EAS98] where ξ was the gradient of a generalized scalar Rayleigh quotient of a matrix A = A^T. The Newton method for our ξ converges locally quadratically to the nondegenerate zeros of ξ. We show that the rate of convergence is cubic if and only if the targeted zero of ξ is also a left invariant subspace of A. In a second example, the zero of ξ is the mean of a collection of p-dimensional subspaces of R^n. We illustrate on a numerical experiment the fast convergence of the Newton algorithm to the mean subspace.

The present paper only requires from the reader an elementary background in Riemannian geometry (tangent vectors, gradient, parallel transport, geodesics, distance), which can be read e.g. from Boothby [Boo75], do Carmo [dC92] or the introductory chapter of [Cha93]. The relevant definitions are summarily recalled in the text. Concepts of reductive homogeneous spaces and symmetric spaces (see [Boo75, Nom54, KN63, Hel78] and particularly sections II.4, IV.3, IV.A and X.2 in the latter) are not needed, but they can help to get insight into the problem. Although some elementary concepts of principal fiber bundle theory [KN63] are used, no specific background is needed.

The paper is organized as follows. In Section 2, the linear subspaces of R^n are identified with equivalence classes of matrices and the manifold structure of Grassmann is defined. Section 3 defines a Riemannian structure on Grassmann. Formulas are given for Lie brackets, the Riemannian connection, parallel transport, geodesics and distance between subspaces. The Grassmann-Newton algorithm is made explicit in Section 4 and practical applications are worked out in detail in Section 5.

    2 The Grassmann manifold

The goal of this section is to recall relevant facts about the Grassmann manifolds. More details can be read from [Won67, Boo75, DM90, HM94, FGP94].

Let n be a positive integer and let p be a positive integer not greater than n. The set of p-dimensional linear subspaces of R^n ("linear" will be omitted in the sequel) is termed the Grassmann manifold, denoted here by Grass(p, n).


An element 𝒴 of Grass(p, n), i.e. a p-dimensional subspace of R^n, can be specified by a basis, i.e. a set of p vectors y_1, . . . , y_p such that 𝒴 is the set of all their linear combinations. When the y's are ordered as the columns of an n-by-p matrix Y, then Y is said to span 𝒴, 𝒴 is said to be the column space (or range, or image, or span) of Y, and we write 𝒴 = span(Y). The span of an n-by-p matrix Y is an element of Grass(p, n) if and only if Y has full rank. The set of such matrices is termed the noncompact Stiefel manifold¹

ST(p, n) := {Y ∈ R^{n×p} : rank(Y) = p}.

Given 𝒴 ∈ Grass(p, n), the choice of a Y in ST(p, n) such that Y spans 𝒴 is not unique: there are infinitely many possibilities. Given a matrix Y in ST(p, n), the set of the matrices in ST(p, n) that have the same span as Y is

Y GL_p := {Y M : M ∈ GL_p}   (4)

where GL_p denotes the set of the p-by-p invertible matrices. This identifies Grass(p, n) with the quotient space ST(p, n)/GL_p := {Y GL_p : Y ∈ ST(p, n)}. In fiber bundle theory, the quadruple (GL_p, ST(p, n), π, Grass(p, n)) is called a principal GL_p fiber bundle, with total space ST(p, n), base space Grass(p, n) = ST(p, n)/GL_p, group action

ST(p, n) × GL_p ∋ (Y, M) ↦ Y M ∈ ST(p, n)

and projection map π : ST(p, n) ∋ Y ↦ span(Y) ∈ Grass(p, n).
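The identification of Grass(p, n) with the quotient ST(p, n)/GL_p is easy to probe numerically: Y and Y M determine the same orthogonal projector onto their common span, hence the same subspace. A small NumPy check (the helper name proj is ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2
Y = rng.standard_normal((n, p))    # an element of ST(p, n) (full rank almost surely)
M = rng.standard_normal((p, p))    # an element of GL_p (invertible almost surely)

def proj(Y):
    """Orthogonal projector onto span(Y); it depends only on the class Y GL_p."""
    return Y @ np.linalg.solve(Y.T @ Y, Y.T)

# Y and Y M are different matrices but represent the same point of Grass(p, n)
assert not np.allclose(Y, Y @ M)
assert np.allclose(proj(Y), proj(Y @ M))
```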

See e.g. [KN63] for the general theory of principal fiber bundles and [FGP94] for a detailed treatment of the Grassmann case. In this paper, we use the notations span(Y) and π(Y) to denote the column space of Y.

To each subspace 𝒴 corresponds an equivalence class (4) of n-by-p matrices that span 𝒴, and each equivalence class contains infinitely many elements. It is however possible to locally single out a unique matrix in (almost) each equivalence class, by means of cross sections. Here we will consider affine cross sections, which are defined as follows (see the illustration in Figure 1). Let W ∈ ST(p, n). The matrix W defines an affine cross section

S_W := {Y ∈ ST(p, n) : W^T(Y - W) = 0}   (5)

orthogonal to the fiber W GL_p. Let Y ∈ ST(p, n). If W^T Y is invertible, then the equivalence class Y GL_p (i.e. the set of matrices with the same span as Y) intersects the cross section S_W at the single point Y(W^T Y)^{-1} W^T W. If W^T Y is not invertible, which means that the span of W contains a direction orthogonal to the span of Y, then the intersection between the fiber Y GL_p and the section S_W is empty. Let

U_W := {span(Y) : W^T Y is invertible}   (6)

be the set of subspaces whose representing fiber Y GL_p intersects the section S_W. The mapping

σ_W : U_W ∋ span(Y) ↦ Y(W^T Y)^{-1} W^T W ∈ S_W,   (7)

¹The (compact) Stiefel manifold is the set of orthonormal n × p matrices.
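The cross section mapping (7) can be verified numerically: its value lies on the affine section (5) and spans the same subspace as Y. A NumPy sketch (variable names ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
W = rng.standard_normal((n, p))
Y = rng.standard_normal((n, p))       # W^T Y is invertible almost surely

# sigma_W(span(Y)) = Y (W^T Y)^{-1} W^T W, the unique point of Y GL_p on S_W
sigma = Y @ np.linalg.solve(W.T @ Y, W.T @ W)

proj = lambda Z: Z @ np.linalg.solve(Z.T @ Z, Z.T)
assert np.allclose(W.T @ (sigma - W), 0)   # sigma lies on the section S_W, eq. (5)
assert np.allclose(proj(sigma), proj(Y))   # sigma spans the same subspace as Y
```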


Figure 1: An illustration of Grass(p, n) as the quotient ST(p, n)/GL_p for the case p = 1, n = 2. Each point, the origin excepted, is an element of ST(p, n) = R^2 \ {0}. Each line is an equivalence class of elements of ST(p, n) that have the same span, so each line corresponds to an element of Grass(p, n). The affine subspace S_W is an affine cross section as defined in (5). The relation (10) satisfied by the horizontal lift of a tangent vector ξ ∈ T_𝒲 Grass(p, n) is also illustrated. This picture can help to get insight into the general case. One has nonetheless to be careful when drawing conclusions from this picture. For example, in general there does not exist a submanifold of R^{n×p} that is orthogonal to the fibers Y GL_p at each point, although it is obviously the case when p = 1 (any centered sphere in R^n will do).

which we will call the cross section mapping, realizes a bijection between the subset U_W of Grass(p, n) and the affine subspace S_W of ST(p, n). The classical manifold structure of Grass(p, n) is the one that, for all W ∈ ST(p, n), makes σ_W a diffeomorphism between U_W and S_W (embedded in the Euclidean space R^{n×p}) [FGP94]. Parameterizations of Grass(p, n) are then given by

R^{(n-p)×p} ∋ K ↦ π(W + W_⊥ K) = span(W + W_⊥ K) ∈ U_W,

where W_⊥ is any element of ST(n - p, n) such that W^T W_⊥ = 0.

3 Riemannian structure on Grass(p, n) = ST(p, n)/GL_p

The goal of this section is to define a Riemannian metric on Grass(p, n) and then derive formulas for the associated gradient, connection and geodesics. For an introduction to Riemannian geometry, see e.g. [Boo75], [dC92] or the introductory chapter of [Cha93].

    Tangent vectors

A tangent vector ξ of Grass(p, n) at 𝒲 can be thought of as an elementary variation of the p-dimensional subspace 𝒲 (see [Boo75, dC92] for a more formal definition of a tangent vector). Here we give a way to represent ξ by a matrix. The principle is to decompose variations of a basis W of 𝒲 into a component that does not modify the span and a component that does modify the span. The latter represents a tangent vector of Grass(p, n) at 𝒲.

Let W ∈ ST(p, n). The tangent space to ST(p, n) at W, denoted T_W ST(p, n), is trivial: ST(p, n) is an open subset of R^{n×p}, so ST(p, n) and R^{n×p} are identical in a neighbourhood


Figure 2: This picture illustrates for the case p = 2, n = 3 how Y'(0) represents an elementary variation ξ of the subspace 𝒴(0) spanned by the columns of Y(0). Consider Y(0) = [y_1(0)|y_2(0)] and Y'(0) = [x_1|x_2]. By the horizontality condition Y(0)^T Y'(0) = 0, both x_1 and x_2 are normal to the space 𝒴(0) spanned by Y(0). Let y_1(t) = y_1(0) + t x_1, y_2(t) = y_2(0) + t x_2 and let 𝒴(t) be the subspace spanned by y_1(t) and y_2(t). Then ξ is the velocity of t ↦ 𝒴(t) at t = 0.

of W, and therefore T_W ST(p, n) = T_W R^{n×p}, which is just a copy of R^{n×p}. The vertical space V_W is by definition the tangent space to the fiber W GL_p, namely

V_W = W R^{p×p} = {Wm : m ∈ R^{p×p}}.

Its elements are the elementary variations of W that do not modify its span. We define the horizontal space H_W as

H_W := T_W S_W = {W_⊥ K : K ∈ R^{(n-p)×p}}.   (8)

One readily verifies that H_W verifies the characteristic properties of horizontal spaces in principal fiber bundles [KN63, FGP94]. In particular, T_W ST(p, n) = V_W ⊕ H_W. Note that with our choice of H_W,

V^T H = 0 for all V ∈ V_W and H ∈ H_W.

Let ξ be a tangent vector to Grass(p, n) at 𝒲 and let W span 𝒲. According to the theory of principal fiber bundles [KN63], there exists one and only one horizontal vector ξ_W that represents ξ, in the sense that ξ_W projects to ξ via the span operation, i.e. dπ(W) ξ_W = ξ. See Figure 1 for a graphical interpretation. It is easy to check that

ξ_W = dσ_W(𝒲)[ξ]   (9)

where σ_W is the cross section mapping defined in (7). Indeed, it is horizontal and projects to ξ via π since π ∘ σ_W is locally the identity. The representation ξ_W is called the horizontal lift of ξ ∈ T_𝒲 Grass(p, n) at W. The next proposition characterizes how the horizontal lift varies along the equivalence class W GL_p.

Proposition 3.1 Let 𝒲 ∈ Grass(p, n), let W span 𝒲 and let ξ ∈ T_𝒲 Grass(p, n). Let ξ_W denote the horizontal lift of ξ at W. Then for all M ∈ GL_p,

ξ_{WM} = ξ_W M.   (10)

Proof. This comes from (9) and the property σ_{WM}(𝒴) = σ_W(𝒴) M. □

The homogeneity property (10) and the horizontality of ξ_W are characteristic of horizontal lifts.
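The vertical/horizontal decomposition and the homogeneity property (10) can be checked with a few lines of NumPy (a sketch with our own variable names):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 2
W = rng.standard_normal((n, p))
Z = rng.standard_normal((n, p))        # an arbitrary variation of the basis W

Pperp = np.eye(n) - W @ np.linalg.solve(W.T @ W, W.T)   # projector (12)
H = Pperp @ Z                          # horizontal part of Z
V = Z - H                              # vertical part of Z

assert np.allclose(W.T @ H, 0)         # H is horizontal: W^T H = 0
m = np.linalg.solve(W.T @ W, W.T @ V)
assert np.allclose(V, W @ m)           # V = W m lies in the vertical space V_W

# homogeneity (10): moving from W along H and from W M along H M traces the
# same curve of subspaces, since (W + t H) M = W M + t (H M)
M = rng.standard_normal((p, p))
proj = lambda Y: Y @ np.linalg.solve(Y.T @ Y, Y.T)
assert np.allclose(proj(W + 0.3 * H), proj(W @ M + 0.3 * (H @ M)))
```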


We now introduce notation for derivatives. Let f be a smooth function between two linear spaces. We denote by

Df(x)[y] := d/dt f(x + ty)|_{t=0}

the directional derivative of f at x in the direction of y. Let f be a smooth real-valued function defined on Grass(p, n) in a neighbourhood of 𝒲. We will use the notation f(W) to denote f(span(W)). The derivative of f in the direction of the tangent vector ξ at 𝒲, denoted by ξf, can be computed as

ξf = Df(W)[ξ_W]

where W spans 𝒲.

    Lie derivative

A tangent vector field ξ on Grass(p, n) assigns to each 𝒴 ∈ Grass(p, n) an element ξ_𝒴 ∈ T_𝒴 Grass(p, n).

Proposition 3.2 (Lie bracket) Let ξ and η be smooth tangent vector fields on Grass(p, n). Let ξ_W denote the horizontal lift of ξ at W as defined in (9). Then

[ξ, η]_W = Π_{W⊥} [ξ_W, η_W]   (11)

where

Π_{W⊥} := I - W(W^T W)^{-1} W^T   (12)

denotes the projection into the orthogonal complement of the span of W and

[ξ_W, η_W] = Dη_·(W)[ξ_W] - Dξ_·(W)[η_W]

denotes the Lie bracket in R^{n×p}.

That is, the horizontal lift of the Lie bracket of two tangent vector fields on the Grassmann manifold is equal to the horizontal projection of the Lie bracket of the horizontal lifts of the two tangent vector fields.

Proof. Let W ∈ ST(p, n) be fixed. We prove formula (11) by making computations in the coordinate chart (U_W, σ_W). In order to simplify notations, let Ŷ := σ_W(𝒴) and ξ̂_Ŷ := dσ_W(𝒴)[ξ_𝒴]. Note that ξ̂_W = ξ_W and η̂_W = η_W. One has

[ξ, η]_W = Dη̂_·(W)[ξ̂_W] - Dξ̂_·(W)[η̂_W].

After some manipulations using (5) and (7), it comes

ξ̂_Ŷ = d/dt σ_W(span(Ŷ + ξ_Ŷ t))|_{t=0} = ξ_Ŷ - Ŷ(W^T W)^{-1} W^T ξ_Ŷ.

Then, using W^T ξ_W = 0,

Dη̂_·(W)[ξ_W] = Dη_·(W)[ξ_W] - W(W^T W)^{-1} W^T Dη_·(W)[ξ_W].

The term Dξ̂_·(W)[η_W] is directly deduced by interchanging ξ and η, and the result is proved. □


    Metric

We consider the following metric on Grass(p, n):

⟨ξ, η⟩_𝒴 := trace((Y^T Y)^{-1} ξ_Y^T η_Y)   (13)

where Y spans 𝒴. It is easily checked that the expression (13) does not depend on the choice of the basis Y that spans 𝒴. This metric is the only one (up to multiplication by a constant) to be invariant under the action of O_n on R^n. Indeed

trace(((QY)^T QY)^{-1} (Qξ_Y)^T Qη_Y) = trace((Y^T Y)^{-1} ξ_Y^T η_Y)

for all Q ∈ O_n, and uniqueness is proved in [Lei61]. We will see later that the definition (13) induces a natural notion of distance between subspaces.
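Both invariance claims about (13), independence of the representing basis and O_n-invariance, are quick to confirm numerically (the helper name metric is ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 2
Y = rng.standard_normal((n, p))
Pperp = np.eye(n) - Y @ np.linalg.solve(Y.T @ Y, Y.T)
xi = Pperp @ rng.standard_normal((n, p))    # horizontal lift of a tangent vector
eta = Pperp @ rng.standard_normal((n, p))   # horizontal lift of another one

def metric(Y, xiY, etaY):
    """Canonical metric (13) evaluated through horizontal lifts at the basis Y."""
    return np.trace(np.linalg.solve(Y.T @ Y, xiY.T @ etaY))

g = metric(Y, xi, eta)
# change of basis Y -> Y M: the lifts transform as xi_{YM} = xi_Y M, by (10)
M = rng.standard_normal((p, p))
assert np.isclose(g, metric(Y @ M, xi @ M, eta @ M))
# O_n-invariance: rotate everything by an orthogonal Q
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
assert np.isclose(g, metric(Q @ Y, Q @ xi, Q @ eta))
```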

    Gradient

On an abstract Riemannian manifold M, the gradient of a smooth real function f at a point x of M, denoted by grad f(x), is roughly speaking the steepest-ascent vector of f in the sense of the Riemannian metric. More rigorously, grad f(x) is the element of T_x M satisfying ⟨grad f(x), ξ⟩ = ξf for all ξ ∈ T_x M. On the Grassmann manifold Grass(p, n) endowed with the metric (13), one checks that

(grad f)_Y = Π_{Y⊥} grad f(Y) Y^T Y   (14)

where Π_{Y⊥} is the orthogonal projection (12) into the orthogonal complement of 𝒴, f(Y) = f(span(Y)) and grad f(Y) is the Euclidean gradient of f at Y, given by (grad f(Y))_{ij} = ∂f/∂Y_{ij}(Y). The Euclidean gradient is characterized by

Df(Y)[Z] = trace(Z^T grad f(Y)) for all Z ∈ R^{n×p},   (15)

which can ease its computation in some cases.
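Formula (14) can be exercised on the Rayleigh quotient f(Y) = trace((Y^T Y)^{-1} Y^T A Y) that reappears later in the paper. For A = A^T, a short computation of ours (conventions for the Rayleigh quotient differ by constant factors across the literature) gives the horizontal lift (grad f)_Y = 2 Π_{Y⊥} A Y, which the sketch below checks against the defining property ⟨grad f, ξ⟩ = ξf by finite differences:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 6, 2
A = rng.standard_normal((n, n)); A = A + A.T      # symmetric for simplicity
f = lambda Y: np.trace(np.linalg.solve(Y.T @ Y, Y.T @ A @ Y))   # basis-independent

Y = rng.standard_normal((n, p))
Pperp = np.eye(n) - Y @ np.linalg.solve(Y.T @ Y, Y.T)
grad_lift = 2 * Pperp @ A @ Y                     # lift of grad f, our derivation

xi = Pperp @ rng.standard_normal((n, p))          # horizontal lift of a tangent xi
t = 1e-5
df = (f(Y + t * xi) - f(Y - t * xi)) / (2 * t)    # xi f, by central differences
inner = np.trace(np.linalg.solve(Y.T @ Y, grad_lift.T @ xi))   # <grad f, xi>, (13)
assert np.isclose(df, inner, rtol=1e-4, atol=1e-6)
```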

    Riemannian connection

Let ξ, η be two tangent vector fields on Grass(p, n). There is no predefined way of computing the derivative of η in the direction of ξ because there is no predefined way of comparing the different tangent spaces T_𝒴 Grass(p, n) as 𝒴 varies. However, there is a preferred definition for the directional derivative, called the Riemannian connection (or Levi-Civita connection), defined as follows [Boo75, dC92].

Definition 3.3 (Riemannian connection) Let M be a Riemannian manifold and let its metric be denoted by ⟨·, ·⟩. Let x ∈ M. The Riemannian connection ∇ on M has the following properties: for all smooth real functions f, g on M, all ζ, χ in T_x M and all smooth vector fields ξ, η:

1. ∇_{fζ+gχ} ξ = f ∇_ζ ξ + g ∇_χ ξ

2. ∇_ζ (fξ + gη) = f ∇_ζ ξ + g ∇_ζ η + (ζf)ξ + (ζg)η

3. ∇_ξ η - ∇_η ξ = [ξ, η]

4. ζ⟨ξ, η⟩ = ⟨∇_ζ ξ, η⟩ + ⟨ξ, ∇_ζ η⟩.


Properties 1 and 2 define connections in general. Property 3 states that the connection is torsion-free, and property 4 specifies that the metric tensor is invariant by the connection. A famous theorem of Riemannian geometry states that there is one and only one connection verifying these four properties. If M is a submanifold of a Euclidean space, then the Riemannian connection ∇_ξ η consists in taking the derivative of η in the ambient Euclidean space in the direction of ξ and projecting the result into the tangent space of the manifold. As we show in the next theorem, the Riemannian connection on the Grassmann manifold, expressed in terms of horizontal lifts, works in a similar way.

Theorem 3.4 (Riemannian connection) Let 𝒴 ∈ Grass(p, n) and let Y ∈ ST(p, n) span 𝒴. Let ξ ∈ T_𝒴 Grass(p, n), and let η be a smooth tangent vector field defined in a neighbourhood of 𝒴. Let η_· : W ↦ η_W be the horizontal lift of η as defined in (9). Let Grass(p, n) be endowed with the O_n-invariant Riemannian metric (13) and let ∇ denote the associated Riemannian connection. Then

(∇_ξ η)_Y = Π_{Y⊥} ξη_Y   (16)

where Π_{Y⊥} is the projection (12) into the orthogonal complement of 𝒴 and

ξη_Y := Dη_·(Y)[ξ_Y] := d/dt η_{(Y + ξ_Y t)}|_{t=0}

is the directional derivative of η_· in the direction of ξ_Y in the Euclidean space R^{n×p}.

This theorem says that the horizontal lift of the covariant derivative of a vector field η on the Grassmannian in the direction of ξ is equal to the horizontal projection of the derivative of the horizontal lift of η in the direction of the horizontal lift of ξ.

Proof. One has to prove that (16) satisfies the four characteristic properties of the Riemannian connection. The first two properties concern linearity in ξ and η and are easily checked. The torsion-free property is direct from (16) and (11). The fourth property, invariance of the metric, holds for (16) since

ξ⟨η, ζ⟩ = D[𝒴 ↦ trace((Y^T Y)^{-1} η_Y^T ζ_Y)](W)[ξ_W]
= trace((W^T W)^{-1} (Dη_·(W)[ξ_W]^T ζ_W + η_W^T Dζ_·(W)[ξ_W]))
= ⟨∇_ξ η, ζ⟩ + ⟨η, ∇_ξ ζ⟩. □

    Parallel transport

Let t ↦ 𝒴(t) be a smooth curve on Grass(p, n). Let η be a tangent vector field defined along the curve 𝒴(·). Then η is said to be parallel transported along 𝒴(·) if

∇_{𝒴'(t)} η = 0   (17)

for all t, where 𝒴'(t) denotes the tangent vector to 𝒴(·) at t.

We will need the following classical result of fiber bundle theory [KN63]. A curve t ↦ Y(t) on ST(p, n) is termed horizontal if Y'(t) is horizontal for all t, i.e. Y'(t) ∈ H_{Y(t)}. Let t ↦ 𝒴(t) be a smooth curve on Grass(p, n) and let Y_0 ∈ ST(p, n) span 𝒴(0). Then there exists a unique horizontal curve t ↦ Y(t) on ST(p, n) such that Y(0) = Y_0 and 𝒴(t) = span(Y(t)). The curve Y(·) is called the horizontal lift of 𝒴(·) through Y_0.


Proposition 3.5 (parallel transport) Let t ↦ 𝒴(t) be a smooth curve on Grass(p, n). Let η be a tangent vector field on Grass(p, n) defined along 𝒴(·). Let t ↦ Y(t) be a horizontal lift of t ↦ 𝒴(t). Let η_{Y(t)} denote the horizontal lift of η at Y(t) as defined in (9). Then η is parallel transported along the curve 𝒴(·) if and only if

d/dt η_{Y(t)} + Y(t)(Y(t)^T Y(t))^{-1} Y'(t)^T η_{Y(t)} = 0   (18)

where Y'(t) := d/dτ Y(τ)|_{τ=t}.

In other words, the parallel transport of η along 𝒴(·) is obtained by infinitesimally removing the vertical component (the second term in the left-hand side of (18) is vertical) of the horizontal lift of η along a horizontal lift of 𝒴(·).

Proof. Let t ↦ 𝒴(t), η and t ↦ Y(t) be as in the statement of the proposition. Then η_{Y(t)} is the horizontal lift of η at Y(t) and

(∇_{𝒴'(t)} η)_{Y(t)} = Π_{Y(t)⊥} η'_{Y(t)}

by (16), where η'_{Y(t)} := d/dτ η_{Y(τ)}|_{τ=t}. So ∇_{𝒴'(t)} η = 0 if and only if Π_{Y(t)⊥} η'_{Y(t)} = 0, i.e. η'_{Y(t)} ∈ V_{Y(t)}, i.e. η'_{Y(t)} = Y(t)M(t) for some M(t). Since η is horizontal, one has Y^T η_Y = 0, thus

Y'^T η_Y + Y^T η'_Y = 0

and therefore M = -(Y^T Y)^{-1} Y'^T η_Y. □

It is interesting to notice that (18) is not symmetric in Y' and η. This is apparently in contradiction with the symmetry of the Riemannian connection, but one should bear in mind that Y' and η_Y are not expressions of 𝒴' and η in a fixed coordinate chart, so (18) need not be symmetric.
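The transport equation (18) can be integrated numerically. The sketch below is our code; the base curve is a geodesic taken from formula (19) with an orthonormal Y0 (so Y(t)^T Y(t) = I), and the asserts check two consequences of parallelism: η stays horizontal and its norm ⟨η, η⟩^{1/2} is preserved:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 6, 2
Y0, _ = np.linalg.qr(rng.standard_normal((n, p)))
Pp = np.eye(n) - Y0 @ Y0.T
H = Pp @ rng.standard_normal((n, p))      # direction of the base geodesic
eta = Pp @ rng.standard_normal((n, p))    # lift at Y0 of the vector to transport

# horizontal lift of the geodesic, from (19) with Y0 orthonormal
U, s, Vt = np.linalg.svd(H, full_matrices=False)
Yt = lambda t: (Y0 @ Vt.T * np.cos(s * t) + U * np.sin(s * t)) @ Vt
dYt = lambda t: (-Y0 @ Vt.T * (s * np.sin(s * t)) + U * (s * np.cos(s * t))) @ Vt

# integrate (18), eta' = -Y(t) (Y(t)^T Y(t))^{-1} Y'(t)^T eta, by the midpoint rule
rhs = lambda t, e: -Yt(t) @ (dYt(t).T @ e)
dt, t, norm0 = 1e-3, 0.0, np.linalg.norm(eta)
for _ in range(1000):
    k1 = rhs(t, eta)
    eta = eta + dt * rhs(t + dt / 2, eta + dt / 2 * k1)
    t += dt

assert np.allclose(Yt(1.0).T @ eta, 0, atol=1e-4)          # still horizontal
assert np.isclose(np.linalg.norm(eta), norm0, atol=1e-4)   # norm is preserved
```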

    Geodesics

We now give a formula for the geodesic t ↦ 𝒴(t) with initial point 𝒴(0) = 𝒴_0 and initial velocity 𝒴'(0) ∈ T_{𝒴_0} Grass(p, n). The geodesic is characterized by ∇_{𝒴'} 𝒴' = 0, which says that the tangent vector to 𝒴(·) is parallel transported along 𝒴(·). This expresses the idea that 𝒴(t) goes straight on at constant pace.

Theorem 3.6 (geodesics) Let t ↦ 𝒴(t) be the geodesic on Grass(p, n) with Riemannian metric (13) from 𝒴_0 with initial velocity ξ ∈ T_{𝒴_0} Grass(p, n). Let Y_0 span 𝒴_0, let ξ_{Y_0} be the horizontal lift of ξ, and let ξ_{Y_0}(Y_0^T Y_0)^{-1/2} = UΣV^T be a thin singular value decomposition, i.e. U is n × p orthonormal, V is p × p orthonormal and Σ is p × p diagonal with nonnegative elements. Then

𝒴(t) = span( Y_0(Y_0^T Y_0)^{-1/2} V cos Σt + U sin Σt ).   (19)

This expression obviously simplifies when Y_0 is chosen orthonormal. The exponential of ξ, denoted by Exp ξ, is by definition 𝒴(t = 1).

Note that this formula is not new except for the fact that a nonorthonormal Y_0 is allowed. In practice, however, one will prefer to orthonormalize Y_0 and use the simplified expression. Edelman, Arias and Smith [EAS98] obtained the orthonormal version of the geodesic formula using the symmetric space structure of Grass(p, n) = O_n/(O_p × O_{n-p}).
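A direct implementation of (19) in the simplified orthonormal case is a few lines of NumPy (the function name and the trailing factor V^T are ours; that factor leaves the span unchanged and makes Y(0) = Y0 exactly):

```python
import numpy as np

def grass_geodesic(Y0, H, t):
    """Representative at time t of the geodesic (19) from span(Y0), with Y0
    orthonormal and horizontal initial velocity H (i.e. Y0^T H = 0)."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)   # thin SVD H = U diag(s) V^T
    return (Y0 @ Vt.T * np.cos(s * t) + U * np.sin(s * t)) @ Vt

rng = np.random.default_rng(5)
n, p = 6, 2
Y0, _ = np.linalg.qr(rng.standard_normal((n, p)))
H = (np.eye(n) - Y0 @ Y0.T) @ rng.standard_normal((n, p))   # Y0^T H = 0

assert np.allclose(grass_geodesic(Y0, H, 0.0), Y0)          # starts at Y0
Y1 = grass_geodesic(Y0, H, 1.0)                             # representative of Exp(H)
assert np.allclose(Y1.T @ Y1, np.eye(p))                    # stays orthonormal
```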


Proof. Let 𝒴(t), Y_0, UΣV^T be as in the statement of the theorem. Let t ↦ Y(t) be the unique horizontal lift of 𝒴(·) through Y_0, so that (𝒴'(t))_{Y(t)} = Y'(t). Then the formula for parallel transport (18), applied to η := 𝒴', yields

Y''(t) + Y(t)(Y(t)^T Y(t))^{-1} Y'(t)^T Y'(t) = 0.   (20)

Since Y(·) is horizontal, one has

Y(t)^T Y'(t) = 0,   (21)

which is compatible with (20). This implies that Y(t)^T Y(t) is constant. Moreover, equations (20) and (21) imply that d/dt (Y'(t)^T Y'(t)) = 0, so Y'(t)^T Y'(t) is constant. Consider the thin SVD Y'(0)(Y^T Y)^{-1/2} = UΣV^T. From (20), one obtains

Y''(t)(Y^T Y)^{-1/2} + Y(t)(Y^T Y)^{-1/2} (Y^T Y)^{-1/2}(Y'^T Y')(Y^T Y)^{-1/2} = 0,
Y''(t)(Y^T Y)^{-1/2} + Y(t)(Y^T Y)^{-1/2} V Σ^2 V^T = 0,
Y''(t)(Y^T Y)^{-1/2} V + Y(t)(Y^T Y)^{-1/2} V Σ^2 = 0,

which yields

Y(t)(Y^T Y)^{-1/2} V = Y_0(Y_0^T Y_0)^{-1/2} V cos Σt + Y'(0)(Y_0^T Y_0)^{-1/2} V Σ^{-1} sin Σt

and the result follows. □

As an aside, Theorem 3.6 shows that the Grassmann manifold is complete, i.e. the geodesics can be extended indefinitely [Boo75].

    Distance between subspaces

The geodesics can be locally interpreted as curves of shortest length [Boo75]. This motivates the following notion of distance between two subspaces.

Let 𝒳 and 𝒴 belong to Grass(p, n) and let X, Y be orthonormal bases for 𝒳, 𝒴, respectively. Let 𝒴 ∈ U_X (6), i.e. X^T Y is invertible. Let Π_{X⊥} Y(X^T Y)^{-1} = UΣV^T be an SVD. Let Θ = atan Σ. Then the geodesic

t ↦ Exp tξ = span(XV cos Θt + U sin Θt),

where ξ_X = UΘV^T, is the shortest curve on Grass(p, n) from 𝒳 to 𝒴. The elements θ_i of Θ are called the principal angles between 𝒳 and 𝒴. The columns of XV and those of (XV cos Θ + U sin Θ) are the corresponding principal vectors. The geodesic distance on Grass(p, n) induced by the metric (13) is

dist(𝒳, 𝒴) = ⟨ξ, ξ⟩^{1/2} = (θ_1^2 + . . . + θ_p^2)^{1/2}.

Other definitions of distance on Grassmann are given in [EAS98, §4.3]. A classical one is the projection 2-norm ‖XX^T - YY^T‖_2 = sin θ_max, where θ_max is the largest principal angle [Ste73, GV96]. An algorithm for computing the principal angles and vectors is given in [BG73, GV96].
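The principal angles and the resulting distance take a few lines of NumPy. The function below uses the classical characterization of [BG73] that the cos θ_i are the singular values of X^T Y, and the check compares it with the Θ = atan Σ construction above (function names ours):

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles between span(X) and span(Y); X, Y orthonormal n x p."""
    s = np.linalg.svd(X.T @ Y, compute_uv=False)   # s_i = cos(theta_i)  [BG73]
    return np.arccos(np.clip(s, 0.0, 1.0))

def grass_dist(X, Y):
    """Geodesic distance induced by (13): sqrt(theta_1^2 + ... + theta_p^2)."""
    return np.linalg.norm(principal_angles(X, Y))

rng = np.random.default_rng(6)
n, p = 6, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

# same angles via the construction in the text: Theta = atan(Sigma), where
# Pi_{X perp} Y (X^T Y)^{-1} = U Sigma V^T
T = (np.eye(n) - X @ X.T) @ Y @ np.linalg.inv(X.T @ Y)
theta = np.arctan(np.linalg.svd(T, compute_uv=False))
assert np.allclose(np.sort(theta), np.sort(principal_angles(X, Y)))
```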


    Discussion

This completes our study of the Riemannian structure of the Grassmann manifold Grass(p, n) using bases, i.e. elements of ST(p, n), to represent its elements. We are now ready to give in the next section a formulation of the Riemann-Newton method on the Grassmann manifold. Following Smith [Smi94], the function F in (1) becomes a tangent vector field ξ (Smith works with one-forms, but this is equivalent because the Riemannian connection leaves the metric invariant [MM02]). The directional derivative D in (1) is replaced by the Riemannian connection ∇, for which we have given a formula in Theorem 3.4. As far as we know, this formula has never been published, and as we shall see it makes the derivation of the Newton algorithm very simple for some vector fields ξ. The update (2) is performed along the geodesic (Theorem 3.6) generated by the Newton vector. Convergence of the algorithms can be assessed using the notion of distance defined above.

    4 Newton iteration on the Grassmann manifold

A number of authors have proposed and developed a general theory of the Newton iteration on Riemannian manifolds [Gab82, Smi93, Smi94, Udr94, MM02]. In particular, Smith [Smi94] proposes an algorithm for abstract Riemannian manifolds which amounts to the following.

Algorithm 4.1 (Riemann-Newton) Let M be a Riemannian manifold, ∇ be the Levi-Civita connection on M, and ξ be a smooth vector field on M. The Newton iteration on M for computing a zero of ξ consists in iterating the mapping x ↦ x_+ defined by:

1. Solve the Newton equation

∇_η ξ = -ξ(x)   (22)

for η ∈ T_x M.

2. Compute the update x_+ := Exp_x η, where Exp denotes the Riemannian exponential mapping.

The Riemann-Newton iteration, expressed in the so-called normal coordinates at x (normal coordinates use the inverse exponential as a coordinate chart [Boo75]), reduces to the classical Newton method (1)-(2) [MM02]. It converges locally quadratically to the nondegenerate zeros of ξ, i.e. the points x* such that ξ(x*) = 0 and ∇ξ_{x*} : T_{x*}M → T_{x*}M is invertible (see proof in Section A).

On the Grassmann manifold, the Riemann-Newton iteration yields the following algorithm.

Theorem 4.2 (Grassmann-Newton) Let the Grassmann manifold Grass(p, n) be endowed with the O_n-invariant metric (13). Let ξ be a smooth vector field on Grass(p, n). Let ξ_· denote the horizontal lift of ξ as defined in (9). Then the Riemann-Newton method (Algorithm 4.1) on Grass(p, n) for ξ consists in iterating the mapping 𝒴 ↦ 𝒴_+ defined by:

1. Pick a basis Y that spans 𝒴 and solve the equation

Π_{Y⊥} Dξ_·(Y)[η_Y] = -ξ_Y   (23)

for the unknown η_Y in the horizontal space H_Y = {Y_⊥ K : K ∈ R^{(n-p)×p}}.

2. Compute an SVD η_Y = UΣV^T and perform the update

𝒴_+ := span(Y V cos Σ + U sin Σ).   (24)


Proof. Equation (23) is the horizontal lift of equation (22), where the formula (16) for the Riemannian connection has been used. Equation (24) is the exponential update given in formula (19). □

It often happens that ξ is the gradient (14) of a cost function f, ξ = grad f, in which case the Newton iteration searches for a stationary point of f. In this case, the Newton equation (23) reads

Π_{Y⊥} D((grad f)_·)(Y)[η_Y] = -Π_{Y⊥} grad f(Y)

where the formula (14) has been used for the Grassmann gradient. This equation can be interpreted as the Newton equation in R^{n×p},

D(grad f(·))(Y)[η] = -grad f(Y),

projected onto the horizontal space (8). The projection operation cancels out the directions along the equivalence class Y GL_p, which intuitively makes sense since they do not generate variations of the span of Y.

It is often the case that ξ admits the expression

ξ_Y = Π_{Y⊥} F(Y)   (25)

where F is a homogeneous function, i.e. F(Y M) = F(Y)M. In this case, the Newton equation (23) becomes

Π_{Y⊥} DF(Y)[η_Y] - η_Y (Y^T Y)^{-1} Y^T F(Y) = -Π_{Y⊥} F(Y)   (26)

where we have taken into account that Y^T η_Y = 0 since η_Y is horizontal.

    5 Practical applications of the Newton method

In this section, we illustrate the applicability of the Grassmann-Newton method (Theorem 4.2) on two problems that can be cast as computing a zero of a tangent vector field on the Grassmann manifold.

    Invariant subspace computation

Let A be an n × n matrix and let

ξ_Y := Π_{Y⊥} A Y   (27)

where Π_{Y⊥} denotes the projector (12) into the orthogonal complement of the span of Y. This expression is homogeneous and horizontal, therefore it is a well-defined horizontal lift and defines a tangent vector field ξ on Grassmann. Moreover, ξ(𝒴) = 0 if and only if 𝒴 is an invariant subspace of A. Obtaining the Newton equation (23) for ξ defined in (27) is now extremely simple: the simplification (25) holds with F(Y) = AY, and (26) immediately yields

Π_{Y⊥}(A η_Y - η_Y (Y^T Y)^{-1} Y^T A Y) = -Π_{Y⊥} A Y   (28)

which has to be solved for η_Y in the horizontal space H_Y (8). The resulting iteration, (28)-(24), converges locally to the nondegenerate zeros of ξ, which are the spectral² right invariant

²A right invariant subspace 𝒴 of A is termed spectral if, given [Y|Y_⊥] orthogonal such that Y spans 𝒴, Y^T A Y and Y_⊥^T A Y_⊥ have no eigenvalue in common [RR02].

    13

  • 8/13/2019 AMS04-tesis

    14/20

    subspaces ofA; see [Abs03] for details. The rate of convergence is quadratic. It is cubic ifand only if the zero ofis also a left invariant subspace ofA (see Section B). This happensin particular when A = AT.
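When Y is chosen orthonormal, multiplying (28) on the left by Y⊥^T reduces it to the Sylvester equation (Y⊥^T A Y⊥)K − K(Y^T A Y) = −Y⊥^T A Y with η_Y = Y⊥K. The sketch below combines this reduction with the update (24); the function names are ours, and SciPy's `solve_sylvester` is assumed available.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def newton_step(A, Y):
    """One Grassmann-Newton step (28)-(24) for the invariant-subspace field (27).

    A : (n, n) matrix;  Y : (n, p) orthonormal basis of the current iterate.
    Returns an orthonormal basis of the updated subspace.
    """
    n, p = Y.shape
    Q, _ = np.linalg.qr(Y, mode='complete')
    Yp = Q[:, p:]                          # orthonormal basis of span(Y)^perp
    M = Yp.T @ A @ Yp                      # (n-p) x (n-p) block
    N = Y.T @ A @ Y                        # p x p block
    C = Yp.T @ A @ Y                       # residual block
    K = solve_sylvester(M, -N, -C)         # M K - K N = -C, i.e. equation (28)
    H = Yp @ K                             # horizontal Newton vector eta_Y
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    Y_new = Y @ Vt.T @ np.diag(np.cos(s)) + U @ np.diag(np.sin(s))   # update (24)
    return np.linalg.qr(Y_new)[0]
```

The Sylvester equation is solvable whenever M and N have disjoint spectra, which is exactly the spectral condition of the footnote holding at the current iterate.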

Edelman, Arias and Smith [EAS98] consider the Newton method on the Grassmann manifold for the Rayleigh quotient cost function f(Y) := trace((Y^T Y)^{−1} Y^T A Y), assuming A = A^T. They obtain the same equation (28), which is not surprising since it can be shown, using (14), that our ξ is the gradient of their f.

Equation (28) also connects with a method proposed by Chatelin [Cha84] for refining invariant subspace estimates. She considers the equation AY = YB, whose solutions Y ∈ ST(p, n) span invariant subspaces of A, and imposes a normalization condition Z^T Y = I on Y, where Z is a given n×p matrix. This normalization condition can be interpreted as restricting Y to a cross section (5). Then she applies the classical Newton method for finding solutions of AY = YB in the cross section and obtains an equation similar to (28). The equations are in fact identical if the matrix Z is chosen to span the current iterate. Following Chatelin's approach, the projective update 𝒴₊ = span(Y + η_Y) is used instead of the geodesic update (24).

The algorithm with projective update is also related to the Grassmannian Rayleigh quotient iteration (GRQI) proposed in [AMSV02]. The two methods are identical when p = 1 [SE02]. They differ when p > 1, but they both compute eigenspaces of A = A^T with a cubic rate of convergence. For arbitrary A, a two-sided version of GRQI is proposed in [AV02] that also computes the eigenspaces with a cubic rate of convergence.

Methods for solving (28) are given in [Dem87] and [LE02]. Lundström and Eldén [LE02] give an algorithm that makes it possible to solve the equation without explicitly computing the interaction matrix Y⊥^T A Y⊥. The global behaviour of the iteration is studied in [ASVM04], and heuristics are proposed that enlarge the basins of attraction of the invariant subspaces.

    Mean of subspaces

Let 𝒴_i, i = 1, …, m, be a collection of p-dimensional subspaces of ℝⁿ. We consider the problem of computing the mean of the subspaces 𝒴_i. Since Grass(p, n) is complete, if the subspaces 𝒴_i are clustered sufficiently close together then there is a unique 𝒳 that minimizes

    V(𝒳) := Σ_{i=1}^m dist²(𝒳, 𝒴_i).

This 𝒳 is called the Karcher mean of the m subspaces [Kar77, Ken90].
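In computations, dist(𝒳, 𝒴) can be obtained from the canonical (principal) angles θ_j, whose cosines are the singular values of X^T Y for orthonormal bases X, Y [BG73]; with the canonical metric, dist is the 2-norm of the vector of canonical angles. A sketch (names ours; bases assumed orthonormal):

```python
import numpy as np

def principal_angles(X, Y):
    """Canonical angles between span(X) and span(Y); X, Y have orthonormal columns."""
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))   # clip guards singular values rounding above 1

def grassmann_dist(X, Y):
    """Geodesic distance: 2-norm of the vector of canonical angles."""
    return np.linalg.norm(principal_angles(X, Y))
```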

A steepest descent algorithm is proposed in [Woo02] for computing the Karcher mean of a cluster of points on a Riemannian manifold. Since it is a steepest descent algorithm, its convergence rate is only linear.

The Karcher mean verifies Σ_{i=1}^m ξ_i = 0, where ξ_i := Exp⁻¹_𝒳 𝒴_i. This suggests taking

    ξ(𝒳) := Σ_{i=1}^m Exp⁻¹_𝒳 𝒴_i

and applying the Riemann-Newton algorithm. On the Grassmann manifold, however, this idea does not work well because of the complexity of the relation between 𝒴_i and ξ_i; see Section 3. Therefore, we use another definition of the mean, in which

    (ξ_i)_X = Π_{X⊥} Π_{Y_i} X,

where X spans 𝒳, Y_i spans 𝒴_i, Π_{X⊥} = I − X(X^TX)^{−1}X^T is the orthogonal projector onto the orthogonal complement of the span of X, and Π_Y = Y(Y^TY)^{−1}Y^T is the orthogonal projector onto the span of Y. While the Karcher mean minimizes Σ_{i=1}^m Σ_{j=1}^p θ²_{i,j}, where θ_{i,j} is the jth canonical angle between 𝒳 and 𝒴_i, our modified mean minimizes Σ_{i=1}^m Σ_{j=1}^p sin² θ_{i,j}. Both definitions are asymptotically equivalent for small principal angles. Our definition yields

    ξ_X = Σ_{i=1}^m Π_{X⊥} Π_{Y_i} X,

and one readily obtains, using (26), the following expression for the Newton equation

    Σ_{i=1}^m Π_{X⊥}(Π_{Y_i} η_X − η_X (X^TX)^{−1} X^T Π_{Y_i} X) = −Σ_{i=1}^m Π_{X⊥} Π_{Y_i} X,

which has to be solved for η_X in the horizontal space H_X = {X⊥K : K ∈ ℝ^{(n−p)×p}}.
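With X orthonormal (X^TX = I), multiplying the Newton equation on the left by X⊥^T turns it into the linear matrix equation Σᵢ (Mᵢ K − K Nᵢ) = −C, with Mᵢ = X⊥^T Π_{Yᵢ} X⊥, Nᵢ = X^T Π_{Yᵢ} X, C = Σᵢ X⊥^T Π_{Yᵢ} X and η_X = X⊥K, which can be solved by Kronecker-product vectorization. A sketch (names ours; all subspace bases assumed orthonormal):

```python
import numpy as np

def mean_newton_step(X, Ys):
    """One Newton step for the modified mean field xi_X = sum_i P_{X,perp} P_{Y_i} X.

    X  : (n, p) orthonormal basis of the current iterate.
    Ys : list of (n, p) orthonormal bases of the subspaces to average.
    Returns an orthonormal basis of the updated iterate.
    """
    n, p = X.shape
    Q, _ = np.linalg.qr(X, mode='complete')
    Xp = Q[:, p:]                                    # orthonormal basis of span(X)^perp
    L = np.zeros(((n - p) * p, (n - p) * p))
    C = np.zeros((n - p, p))
    for Y in Ys:
        PY = Y @ Y.T                                 # orthogonal projector onto span(Y)
        Mi = Xp.T @ PY @ Xp
        Ni = X.T @ PY @ X
        # vec(Mi K - K Ni) = (I (x) Mi - Ni^T (x) I) vec(K), column-major vec
        L += np.kron(np.eye(p), Mi) - np.kron(Ni.T, np.eye(n - p))
        C += Xp.T @ PY @ X
    K = np.linalg.solve(L, -C.reshape(-1, order='F')).reshape(n - p, p, order='F')
    H = Xp @ K                                       # horizontal Newton direction eta_X
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return np.linalg.qr(X @ Vt.T @ np.diag(np.cos(s)) + U @ np.diag(np.sin(s)))[0]
```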

We have tested the resulting Newton iteration in the following situation. We draw m samples K_i ∈ ℝ^{(n−p)×p}, where the elements of each K_i are i.i.d. normal random variables with mean zero and standard deviation 10⁻², and define 𝒴_i = span([I_p; K_i]). The initial iterate is 𝒳 := 𝒴_1, and we define 𝒴_0 = span([I_p; 0]). Experimental results are shown below.

                         Newton                       X₊ = Exp_X(−grad V/m)
    Iterate    ‖Σᵢ ξᵢ‖       dist(𝒳, 𝒴₀)     ‖Σᵢ ξᵢ‖       dist(𝒳, 𝒴₀)
    0         2.4561e+01    2.9322e-01     2.4561e+01    2.9322e-01
    1         1.6710e+00    3.1707e-02     1.9783e+01    2.1867e-01
    2         5.7656e-04    2.0594e-02     1.6803e+01    1.6953e-01
    3         2.4207e-14    2.0596e-02     1.4544e+01    1.4911e-01
    4         8.1182e-16    2.0596e-02     1.2718e+01    1.2154e-01
    ⋮
    300                                    5.6525e-13    2.0596e-02

    6 Conclusion

We have considered the Grassmann manifold Grass(p, n) of p-planes in ℝⁿ as the base space of a GL_p-principal fiber bundle with the noncompact Stiefel manifold ST(p, n) as total space. Using the essentially unique O_n-invariant metric on Grass(p, n), we have derived a formula for the Levi-Civita connection in terms of horizontal lifts. Moreover, formulas have been given for the Lie bracket, parallel translation, geodesics and the distance between p-planes. Finally, these results have been applied to a detailed derivation of the Newton method on the Grassmann manifold. The Grassmann-Newton method has been illustrated on two examples.

    A Quadratic convergence of Riemann-Newton

For completeness we include a proof of quadratic convergence of the Riemann-Newton iteration (Algorithm 4.1). Our proof significantly differs from the proof previously reported in the literature [Smi94]. This proof also prepares the discussion of the cubic convergence cases in Appendix B.

Let ξ be a smooth vector field on a Riemannian manifold M and let ∇ denote the Riemannian connection. Let z ∈ M be a nondegenerate zero of the smooth vector field ξ (i.e., ξ_z = 0 and the linear operator χ ↦ ∇_χ ξ : T_zM → T_zM is invertible). Let N_z be a normal neighbourhood of z, sufficiently small so that any two points of N_z can be joined by a unique geodesic [Boo75]. Let τ_{xy} denote the parallel transport along the unique geodesic between x and y. Let the tangent vector ζ ∈ T_xM be defined by Exp_x ζ = z. Define the vector field ζ̂ on N_z adapted to the tangent vector ζ ∈ T_xM by ζ̂_y = τ_{xy} ζ. Applying Taylor's formula to ξ along the geodesic t ↦ Exp_x tζ yields [Smi94]

    0 = τ_{zx} ξ_z = ξ_x + ∇_ζ ξ + ½ ∇²_ζ ξ + O(‖ζ‖³)    (29)

where ∇²_ζ ξ := ∇_ζ (∇_ζ̂ ξ). Subtracting the Newton equation (22) from Taylor's formula (29) yields

    ∇_{(ζ−η)} ξ = −½ ∇²_ζ ξ + O(‖ζ‖³).    (30)

Since ξ is a smooth vector field and z is a nondegenerate zero of ξ, and reducing the size of N_z if necessary, one has

    c₁‖χ‖ ≤ ‖∇_χ ξ‖,    ‖∇²_χ ξ‖ ≤ c₃‖χ‖²

for all y ∈ N_z and χ ∈ T_yM. Using these results in (30) yields, with c₂ := c₃/2,

    c₁‖ζ − η‖ ≤ c₂‖ζ‖² + O(‖ζ‖³).    (31)

From now on the proof significantly differs from the one in [Smi94]. We will show next that, reducing again the size of N_z if necessary, there exists a constant c₄ such that

    dist(Exp_y χ, Exp_y ψ) ≤ c₄ ‖χ − ψ‖    (32)

for all y ∈ N_z and all χ, ψ ∈ T_yM small enough for Exp_y χ and Exp_y ψ to be in N_z. Then it follows immediately from (31) and (32) that

    dist(x₊, z) = dist(Exp_x η, Exp_x ζ) ≤ c₄‖η − ζ‖ ≤ (c₄ c₂/c₁)‖ζ‖² + O(‖ζ‖³) = O(dist(x, z)²),

and this is quadratic convergence.

To show (32), we work in local coordinates covering N_z and use tensorial notations (see e.g. [Boo75]), so e.g. uⁱ denotes the ith coordinate of u ∈ M. Consider the geodesic equation üⁱ + Γⁱ_{jk}(u) u̇ʲ u̇ᵏ = 0, where Γ stands for the (smooth) Christoffel symbols, and denote the solution by γⁱ[t, u(0), u̇(0)]. Then (Exp_y χ)ⁱ = γⁱ[1, y, χ], (Exp_y ψ)ⁱ = γⁱ[1, y, ψ], and the curve c : τ ↦ cⁱ(τ) := γⁱ[1, y, χ + τ(ψ − χ)] verifies c(0) = Exp_y χ and c(1) = Exp_y ψ. Then

    dist(Exp_y χ, Exp_y ψ) ≤ ∫₀¹ √( g_{ij}[c(τ)] ċⁱ(τ) ċʲ(τ) ) dτ    (33)
      = ∫₀¹ √( g_{ij}[c(τ)] ∂γⁱ/∂u̇ᵏ[1, y, χ + τ(ψ−χ)] ∂γʲ/∂u̇ˡ[1, y, χ + τ(ψ−χ)] (ψᵏ − χᵏ)(ψˡ − χˡ) ) dτ
      ≤ c √( Σ_k (ψᵏ − χᵏ)² )    (34)
      ≤ c₄ √( g_{kl}[y] (ψᵏ − χᵏ)(ψˡ − χˡ) )    (35)
      = c₄ ‖χ − ψ‖.

Equation (33) gives the length of the curve c([0, 1]), for which dist(Exp_y χ, Exp_y ψ) is a lower bound. Inequality (34) comes from the fact that the metric tensor g_{ij} and the derivatives of γ are smooth functions defined on a compact set, thus bounded. Inequality (35) holds because g_{ij} is nondegenerate and smooth on a compact set.


    B Cubic convergence of Riemann-Newton

We use the notations of the previous section.

If ∇²_χ ξ = 0 for all tangent vectors χ ∈ T_zM, then the rate of convergence of the Riemann-Newton method (Algorithm 4.1) is cubic. Indeed, by the smoothness of ξ, and defining ζ_x such that Exp_x ζ_x = z, one has ‖∇²_{ζ_x} ξ‖ = O(‖ζ_x‖³), and replacing this into (30) gives the result.

For the sake of illustration, we consider the particular case where M is the Grassmann manifold Grass(p, n) of p-planes in ℝⁿ, A is an n×n real matrix and the tangent vector field ξ is defined by the horizontal lift (27)

    ξ_Y = Π_{Y⊥} A Y

where Π_{Y⊥} := (I − Y Y^T). Let 𝒵 ∈ Grass(p, n) verify ξ_𝒵 = 0, which happens if and only if 𝒵 is a right invariant subspace of A. We show that ∇²_χ ξ = 0 for all χ ∈ T_𝒵 Grass(p, n) if and only if 𝒵 is a left invariant subspace of A (which happens e.g. when A = A^T).

Let Z be an orthonormal basis for 𝒵, i.e. span(Z) = 𝒵 and Z^TZ = I. Let χ_Z = UΣV^T be a thin singular value decomposition of the horizontal lift χ_Z of χ ∈ T_𝒵 Grass(p, n). Then the curve

    Y(t) = ZV cos Σt + U sin Σt

is horizontal and projects through the span operation to the Grassmann geodesic Exp_𝒵 tχ. Since by definition the tangent vector of a geodesic is parallel transported along the geodesic, the adaptation χ̂ of χ verifies

    χ̂_{Y(t)} = Ẏ(t) = UΣ cos Σt − ZVΣ sin Σt.

Then one obtains successively

    (∇_χ̂ ξ)_{Y(t)} = Π_{Y(t)⊥} Dξ(Y(t))[Ẏ(t)]
      = Π_{Y(t)⊥} (d/dt)(Π_{Y(t)⊥} A Y(t))
      = Π_{Y(t)⊥} A Ẏ(t) − Π_{Y(t)⊥} Ẏ(t) Y(t)^T A Y(t)

and

    ∇²_χ ξ = (∇_χ̂ ∇_χ̂ ξ)_Z
      = Π_{Z⊥} (d/dt)(∇_χ̂ ξ)_{Y(t)} |_{t=0}
      = Π_{Z⊥} A Ÿ(0) − Π_{Z⊥} Ẏ(0) Y(0)^T A Ẏ(0) − Π_{Z⊥} Ÿ(0) Y(0)^T A Y(0) − Π_{Z⊥} Ẏ(0)(Ẏ(0)^T A Y(0) + Y(0)^T A Ẏ(0))
      = −2 UΣ V^T Z^T A UΣ

where we have used Π_{Z⊥} Y(0) = 0, Z^T U = 0 and U^T A Z = 0 (the latter because Π_{Z⊥} A Z = 0). This last expression vanishes for all χ ∈ T_𝒵 Grass(p, n) if and only if U^T A^T Z = 0 for all U such that U^T Z = 0, i.e. 𝒵 is a left invariant subspace of A.
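The dichotomy can be observed numerically. The following compact sketch (names ours) runs iteration (28)-(24) on a symmetric A, for which every right invariant subspace is also left invariant, and tracks the error to the exact invariant subspace; it decays roughly as e ↦ e³ per step until rounding error is reached.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def newton_step(A, Y):
    """One iteration (28)-(24); Y has orthonormal columns (cf. Section 5)."""
    n, p = Y.shape
    Q, _ = np.linalg.qr(Y, mode='complete')
    Yp = Q[:, p:]                                   # orthonormal basis of span(Y)^perp
    K = solve_sylvester(Yp.T @ A @ Yp, -(Y.T @ A @ Y), -(Yp.T @ A @ Y))
    U, s, Vt = np.linalg.svd(Yp @ K, full_matrices=False)
    return np.linalg.qr(Y @ Vt.T @ np.diag(np.cos(s)) + U @ np.diag(np.sin(s)))[0]

def subspace_gap(Y, W):
    """Sine of the largest canonical angle between span(Y) and span(W)."""
    return np.linalg.norm(Y @ Y.T - W @ W.T, 2)

rng = np.random.default_rng(42)
n, p = 8, 2
S = rng.standard_normal((n, n))
A = S + S.T                                         # symmetric: left = right invariant subspaces
_, V = np.linalg.eigh(A)
W = V[:, -p:]                                       # an exact (spectral) invariant subspace
Y = np.linalg.qr(W + 1e-3 * rng.standard_normal((n, p)))[0]
errs = [subspace_gap(Y, W)]
for _ in range(3):
    Y = newton_step(A, Y)
    errs.append(subspace_gap(Y, W))
```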

    Acknowledgements

The first author thanks P. Lecomte (Université de Liège) and U. Helmke and S. Yoshizawa (Universität Würzburg) for useful discussions.


This paper presents research partially supported by the Belgian Programme on Interuniversity Poles of Attraction, initiated by the Belgian State, Prime Minister's Office for Science, Technology and Culture. Part of this work was performed while the first author was a guest at the Mathematisches Institut der Universität Würzburg under a grant from the European Nonlinear Control Network. The hospitality of the members of the institute is gratefully acknowledged. The work was completed while the first and last authors were visiting the Department of Mechanical and Aerospace Engineering at Princeton University. The hospitality of the members of the department, especially Prof. N. Leonard, is gratefully acknowledged. The last author thanks N. Leonard and E. Sontag for partial financial support under US Air Force Grants F49620-01-1-0063 and F49620-01-1-0382.

    References

[Abs03] P.-A. Absil, Invariant subspace computation: A geometric approach, Ph.D. thesis, Faculté des Sciences Appliquées, Université de Liège, Secrétariat de la FSA, Chemin des Chevreuils 1 (Bât. B52), 4000 Liège, Belgium, 2003.

[AMSV02] P.-A. Absil, R. Mahony, R. Sepulchre, and P. Van Dooren, A Grassmann-Rayleigh quotient iteration for computing invariant subspaces, SIAM Review 44 (2002), no. 1, 57–73.

[ASVM04] P.-A. Absil, R. Sepulchre, P. Van Dooren, and R. Mahony, Cubically convergent iterations for invariant subspace computation, to appear in SIAM J. Matrix Anal. Appl., 2004.

[AV02] P.-A. Absil and P. Van Dooren, Two-sided Grassmann-Rayleigh quotient iteration, submitted to SIAM J. Matrix Anal. Appl., October 2002.

[BG73] Å. Björck and G. H. Golub, Numerical methods for computing angles between linear subspaces, Math. Comp. 27 (1973), 579–594.

[Boo75] W. M. Boothby, An introduction to differentiable manifolds and Riemannian geometry, Academic Press, 1975.

[CG90] P. Comon and G. H. Golub, Tracking a few extreme singular values and vectors in signal processing, Proceedings of the IEEE 78 (1990), no. 8, 1327–1343.

[Cha84] F. Chatelin, Simultaneous Newton's iteration for the eigenproblem, Computing, Suppl. 5 (1984), 67–74.

[Cha93] I. Chavel, Riemannian geometry - A modern introduction, Cambridge University Press, 1993.

[dC92] M. P. do Carmo, Riemannian geometry, Birkhäuser, 1992.

[Dem87] J. W. Demmel, Three methods for refining estimates of invariant subspaces, Computing 38 (1987), 43–57.

[DM90] B. F. Doolin and C. F. Martin, Introduction to differential geometry for engineers, Monographs and Textbooks in Pure and Applied Mathematics, vol. 136, Marcel Dekker, Inc., New York, 1990.


[DS83] J. E. Dennis and R. B. Schnabel, Numerical methods for unconstrained optimization and nonlinear equations, Prentice Hall Series in Computational Mathematics, Prentice Hall, Englewood Cliffs, NJ, 1983.

[EAS98] A. Edelman, T. A. Arias, and S. T. Smith, The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal. Appl. 20 (1998), no. 2, 303–353.

[FGP94] J. Ferrer, M. I. García, and F. Puerta, Differentiable families of subspaces, Linear Algebra Appl. 199 (1994), 229–252.

[Gab82] D. Gabay, Minimizing a differentiable function over a differential manifold, Journal of Optimization Theory and Applications 37 (1982), no. 2, 177–219.

[GV96] G. H. Golub and C. F. Van Loan, Matrix computations, third edition, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, 1996.

[Hel78] S. Helgason, Differential geometry, Lie groups, and symmetric spaces, Pure and Applied Mathematics, no. 80, Academic Press, Oxford, 1978.

[HM94] U. Helmke and J. B. Moore, Optimization and dynamical systems, Springer, 1994.

[Kar77] H. Karcher, Riemannian center of mass and mollifier smoothing, Comm. Pure Appl. Math. 30 (1977), no. 5, 509–541.

[Ken90] W. S. Kendall, Probability, convexity, and harmonic maps with small image. I. Uniqueness and fine existence, Proc. London Math. Soc. 61 (1990), no. 2, 371–406.

[KN63] S. Kobayashi and K. Nomizu, Foundations of differential geometry, volumes 1 and 2, John Wiley & Sons, 1963.

[LE02] E. Lundström and L. Eldén, Adaptive eigenvalue computations using Newton's method on the Grassmann manifold, SIAM J. Matrix Anal. Appl. 23 (2002), no. 3, 819–839.

[Lei61] K. Leichtweiss, Zur Riemannschen Geometrie in Grassmannschen Mannigfaltigkeiten, Math. Z. 76 (1961), 334–366.

[Lue69] D. G. Luenberger, Optimization by vector space methods, John Wiley & Sons, Inc., 1969.

[Mah96] R. E. Mahony, The constrained Newton method on a Lie group and the symmetric eigenvalue problem, Linear Algebra Appl. 248 (1996), 67–89.

[MM02] R. Mahony and J. H. Manton, The geometry of the Newton method on non-compact Lie groups, J. Global Optim. 23 (2002), no. 3, 309–327.

[MS85] A. Machado and I. Salavessa, Grassmannian manifolds as subsets of Euclidean spaces, Res. Notes in Math. 131 (1985), 85–102.

[Nom54] K. Nomizu, Invariant affine connections on homogeneous spaces, Amer. J. Math. 76 (1954), 33–65.


[RR02] A. C. M. Ran and L. Rodman, A class of robustness problems in matrix analysis, Interpolation Theory, Systems Theory and Related Topics, The Harry Dym Anniversary Volume (D. Alpay, I. Gohberg, and V. Vinnikov, eds.), Operator Theory: Advances and Applications, vol. 134, Birkhäuser, 2002, pp. 337–383.

[SE02] V. Simoncini and L. Eldén, Inexact Rayleigh quotient-type methods for eigenvalue computations, BIT 42 (2002), no. 1, 159–182.

[Smi93] S. T. Smith, Geometric optimization methods for adaptive filtering, Ph.D. thesis, Division of Applied Sciences, Harvard University, Cambridge, Massachusetts, 1993.

[Smi94] S. T. Smith, Optimization techniques on Riemannian manifolds, Hamiltonian and gradient flows, algorithms and control (Anthony Bloch, ed.), Fields Institute Communications, vol. 3, American Mathematical Society, 1994, pp. 113–136.

[Ste73] G. W. Stewart, Error and perturbation bounds for subspaces associated with certain eigenvalue problems, SIAM Review 15 (1973), no. 4, 727–764.

[Udr94] C. Udrişte, Convex functions and optimization methods on Riemannian manifolds, Kluwer Academic Publishers, 1994.

[Won67] Y.-C. Wong, Differential geometry of Grassmann manifolds, Proc. Nat. Acad. Sci. U.S.A. 57 (1967), 589–594.

[Woo02] R. P. Woods, Characterizing volume and surface deformations in an atlas framework, submitted to NeuroImage, 2002.