Part VII: Bayesian Statistics
[R. Trotta 0803.4089] [R. Trotta 1701.01467]
[Amendola & Tsujikawa ch. 13]
Miguel Quartin
Instituto de Física, UFRJ
Astrofísica, Relativ. e Cosmologia (ARCOS)
Graduate Cosmology Course – 2019/1
Topics
Short review of probability theory
Bayes' Theorem
The Likelihood method
Model Selection
[optional] Fisher Matrix
Probabilities
Classical interpretation of probability: the limit of relative frequencies over infinitely many realizations. Probability: "the number of times the event occurs over the total number of trials, in the limit of an infinite series of equiprobable repetitions."
Let's define 2 random (stochastic) variables x and y (e.g. numbers on a die roll):
p(X) → probability of getting the result x = X
p(X, Y) or p(X ∩ Y) → prob of getting the results x = X AND y = Y
p(X | Y) or p(X ; Y) → prob of x = X given the fact that y = Y
p(X ∪ Y) → prob of getting the results x = X OR y = Y
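The frequentist definition above can be made concrete with a short simulation (a minimal sketch with illustrative numbers, not from the slides): the relative frequency of rolling a 6 on a fair die approaches 1/6 as the number of trials grows.

```python
import random

random.seed(1)

# Frequentist probability: relative frequency of an event in the limit of
# infinitely many equiprobable trials. Event here: rolling a 6 on a fair die.
def relative_frequency(n_trials):
    hits = sum(1 for _ in range(n_trials) if random.randint(1, 6) == 6)
    return hits / n_trials

for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n))   # approaches 1/6 ≈ 0.1667
```

Note how slowly the frequency converges; this is exactly the "how many realizations are needed?" question raised later in this section.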
Probabilities (2)
Some properties:
Joint probabilities are symmetric: p(X, Y) = p(Y, X)
Joint prob of independent events: p(X, Y) = p(X) p(Y)
Joint prob of dependent events: p(X, Y) = p(X | Y) p(Y)
Disjoint prob of mutually exclusive events: p(X ∪ Y) = p(X) + p(Y)
In particular, summing over all mutually exclusive outcomes: Σ_X p(X) = 1
Probabilities (3)
Let's discuss the conditional probability property: p(A | B) = p(A, B) / p(B)
Suppose A refers to "person that studies physics" and B to "person that plays piano"
Suppose also that we know that: p(B) = 1/100 and p(A, B) = 1/1000
In other words, out of 1000 random people, 10 will play the piano and 1 will play the piano AND be a physicist
So, if someone plays piano, they have a 1/10 chance of also being a physicist
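The piano/physicist numbers from the slide can be checked in two lines:

```python
# Numbers from the slide: B = "plays piano", A = "studies physics".
p_B = 1 / 100     # 10 out of 1000 people play the piano
p_AB = 1 / 1000   # 1 out of 1000 plays the piano AND studies physics

# Conditional probability: p(A | B) = p(A, B) / p(B)
p_A_given_B = p_AB / p_B
print(p_A_given_B)   # 0.1 -> a piano player has a 1-in-10 chance of being a physicist
```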
Bayes' Theorem
Note that the probability of A given B is not the prob of B given A. E.g.: the probability of winning the lottery given that you played twice in your life is not the same as the probability that you played twice in your life given that you won the lottery!
From the symmetry of the joint probabilities we get

p(A | B) = p(B | A) p(A) / p(B)

This is the Bayes Theorem of conditional probabilities
Interpretation
The classical "frequentist" interpretation of probability (infinite-realization limit of relative frequencies) has limitations:
It is circular → it assumes that the repeated trials have the same probability of outcomes
It cannot deal with unrepeatable situations [e.g. (i) the probability that I will die in a car accident; (ii) the prob that the Big Bang happened the way it did]
"What is the probability that it rained in Manaus during D. Pedro II's 43rd birthday?"
How to correct for finite realizations? How many realizations are needed for the frequencies to approximate the probabilities? To which accuracy does this approximation hold?
Interpretation (2)
The Bayesian interpretation is based on Bayes' Theorem
Re-interpret the theorem not in terms of regular random variables but in terms of data (D) and theory (T)
Inverse statistical problem: what is the probability that theory T is correct given that we measured the data D?

p(T | D) = p(D | T) p(T) / p(D)

The "theory" might be a model (such as ΛCDM or DGP) or just the parameter values of an assumed model (such as Ωm0 and ΩΛ0, assuming ΛCDM).
Interpretation (3)
Bayesian analysis has some philosophical implications:
The best theory will be the most probable theory
Bayesian analysis carries a mathematically precise formulation of Occam's Razor: "if 2 hypotheses are equally likely, the hypothesis with the fewest assumptions should be selected"
This was the only strong reason before the 18th century to choose Copernicus' model over Ptolemy's
Karl Popper: we prefer simpler theories to more complex ones "because their empirical content is greater; and because they are better testable" → simple theories are more easily falsifiable
The Likelihood Method
Let's define the data as a vector x and the parameters as a vector θ
We write Bayes' theorem as

P(θ | x) = L(x | θ) p(θ) / g(x)

P → posterior probability
p(θ) → prior probability
L → likelihood function
g(x) → as we will see, usually just a normalization factor
We are interested in the posterior
In the literature people sometimes refer to the posterior also as the "likelihood"
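The posterior of a single parameter can be computed directly on a grid. A minimal sketch, with illustrative numbers (Gaussian data with unknown mean and a flat prior — none of these values are from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
x = rng.normal(0.5, sigma, size=20)        # toy data: Gaussian, unknown mean theta

theta = np.linspace(-2.0, 3.0, 1001)       # parameter grid
dtheta = theta[1] - theta[0]

# ln L(x | theta) for independent Gaussian data
loglike = -0.5 * np.sum((x[None, :] - theta[:, None]) ** 2, axis=1) / sigma**2
prior = np.ones_like(theta)                # flat prior p(theta) = const

# posterior P(theta | x) ∝ L(x | theta) p(theta); g(x) drops out on normalization
post = np.exp(loglike - loglike.max()) * prior
post /= post.sum() * dtheta

theta_best = theta[np.argmax(post)]        # here the best fit is the sample mean
```

Subtracting `loglike.max()` before exponentiating is a standard trick to avoid numerical underflow.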
The Likelihood Method (2)
The posterior is a probability, so it has to be normalized to unity:

∫ P(θ | x) dθ = 1  →  g(x) = ∫ L(x | θ) p(θ) dθ

This integral is called the evidence
g(x) does not depend on the parameters, so it is useless for parameter determination
But it is very useful to choose between models
Prior and Prejudice
Priors are inevitable in the likelihood (posterior) method
Frequentists don't like it → subjective prior knowledge
Bayesians have to learn to love it → after all, we always know something before the analysis
E.g.: we can use p(Ωm0 < 0) = 0, as it does not make sense to have negative matter density
We can add information from a previous experiment. E.g. experiment A measured h = 0.72 ± 0.08, so we can use, say, the Gaussian prior

p(h) ∝ exp[−(h − 0.72)² / (2 × 0.08²)]
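Combining that Gaussian prior with a new measurement is a simple product on a grid. A sketch: the prior h = 0.72 ± 0.08 is from the slide, while the "new measurement" h = 0.68 ± 0.04 is a hypothetical number chosen for illustration.

```python
import numpy as np

h = np.linspace(0.3, 1.1, 4001)
dh = h[1] - h[0]

prior = np.exp(-0.5 * ((h - 0.72) / 0.08) ** 2)   # experiment A (from the slide)
like  = np.exp(-0.5 * ((h - 0.68) / 0.04) ** 2)   # hypothetical new measurement

post = prior * like                 # posterior ∝ likelihood × prior
post /= post.sum() * dh             # normalize on the grid

h_mean = (h * post).sum() * dh
h_std  = np.sqrt(((h - h_mean) ** 2 * post).sum() * dh)
print(h_mean, h_std)                # inverse-variance-weighted combination
```

For two Gaussians the result is the familiar inverse-variance-weighted mean: here 0.688 ± 0.036, pulled toward the tighter measurement.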
Prior and Prejudice (2)
You are free to choose your prior → but the choice must be explicit
You HAVE to choose a prior → p(θ) = 1 *is* a particular prior, which under a parameter change will no longer be constant. E.g.: p(t) = 1 ≠ p(z) = 1 ≠ p(log t) = 1 …
E.g. 2: a measurement of ΩΛ0 assumes the strong prior that the model is ΛCDM
Priors may be subjective, but the analysis is objective
Priors are an advantage of Bayes → no inference can be made without assumptions
Data can show that the priors were "wrong"
The Likelihood Method (3)
If we are not interested in model selection we can neglect the function g(x)
The posterior P must then be normalized
The best-fit parameter values are the ones that maximize P
The n% confidence region R of the parameters is the region around the best fit for which

∫_R P(θ | x) dθ = n / 100

The confidence region in general is not symmetric
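A confidence region can be built numerically by accumulating grid points from the posterior peak downward until the requested probability mass is enclosed. A 1D sketch with an illustrative unit-Gaussian posterior:

```python
import numpy as np

theta = np.linspace(-4, 4, 8001)
dt = theta[1] - theta[0]
post = np.exp(-0.5 * theta**2)           # example posterior: standard normal
post /= post.sum() * dt                  # normalize

# Sort grid points by posterior height, accumulate mass until 68.3%
order = np.argsort(post)[::-1]
mass = np.cumsum(post[order]) * dt
inside = order[: np.searchsorted(mass, 0.683) + 1]
lo, hi = theta[inside].min(), theta[inside].max()
print(lo, hi)                            # ≈ [-1, 1] for a unit Gaussian
```

For an asymmetric posterior this "highest posterior density" construction automatically produces an asymmetric region, as noted in the slide.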
The Likelihood Method (4)
If the likelihood (i.e. the posterior) has many parameters, it is interesting to know what information it has on each parameter (or each pair) independently of the others
We must do a weighted sum over the other parameters:

P(θ₁ | x) = ∫ P(θ₁, θ₂, …, θₙ | x) dθ₂ ⋯ dθₙ

This is referred to as marginalization over a parameter
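On a grid, marginalization is just a sum along the unwanted axes. A sketch with an illustrative correlated 2D Gaussian posterior:

```python
import numpy as np

a = np.linspace(-3, 3, 301)
b = np.linspace(-3, 3, 301)
A, B = np.meshgrid(a, b, indexing="ij")
da, db = a[1] - a[0], b[1] - b[0]

# Example posterior: bivariate Gaussian with correlation rho = 0.5
rho = 0.5
post2d = np.exp(-0.5 * (A**2 - 2 * rho * A * B + B**2) / (1 - rho**2))
post2d /= post2d.sum() * da * db         # normalize the 2D posterior

post_a = post2d.sum(axis=1) * db         # marginal: P(a) = ∫ P(a, b) db
print(post_a.sum() * da)                 # 1.0 -> the marginal is normalized
```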
The Likelihood Method (5)
It is customary to use the following confidence regions: 68.3%, 95.4% and 99.73%. The reason is that for Gaussian posteriors these are the 1, 2 and 3 standard deviations.
We therefore often refer to these regions, for simplicity, as just the 1σ, 2σ and 3σ regions
Say, if for Ωm0 the best fit is 0.3 and the 68% confidence region (1σ) is [0.1, 0.4], we write Ωm0 = 0.3 (+0.1, −0.2)
Note that here the 2σ region will not simply be twice the 1σ one
The Likelihood Method (6)
[Figure: example posterior contours, showing the best fit point and the 1σ and 2σ confidence regions]
How to Build the Likelihood?
The likelihood is a function of the data → its functional form depends on the instrument used to collect the data
Usually instruments have (approximately) either Gaussian or Poisson noise
It is common to assume Gaussian noise by default
If the likelihood is Gaussian in the data and the data are independent (uncorrelated errors) we have

L(x | θ) ∝ ∏ᵢ exp[−(xᵢ − μᵢ(θ))² / (2σᵢ²)]

where μᵢ(θ) is the theoretical prediction for data point i
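In practice one works with −2 ln L, i.e. the χ². A minimal sketch for independent Gaussian data with a toy constant model (all numbers illustrative):

```python
import numpy as np

# chi2(theta) = sum_i (x_i - mu_i(theta))^2 / sigma_i^2,  ln L = -chi2/2 + const
def chi2(theta, x, sigma, model):
    return np.sum((x - model(theta)) ** 2 / sigma**2)

# Toy data with uncorrelated errors; the model is a single constant mu_i = theta
x = np.array([1.1, 0.9, 1.3, 0.8])
sigma = np.array([0.2, 0.2, 0.3, 0.1])
model = lambda theta: theta * np.ones_like(x)

# For a constant model the ML solution is the inverse-variance-weighted mean
w = 1 / sigma**2
theta_ml = np.sum(w * x) / np.sum(w)
print(theta_ml, chi2(theta_ml, x, sigma, model))
```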
Model Selection
We now want to address the more general problem: how to tell which of 2 competing theories is statistically better given some data?
Frequentist approach: compare the reduced χ² (i.e. the χ² per degree of freedom – d.o.f.) of the data in the 2 theories
The χ²-distribution with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal (i.e. Gaussian) random variables
The p.d.f. is given by (although we do not use it explicitly)

f(x; k) = x^(k/2 − 1) e^(−x/2) / [2^(k/2) Γ(k/2)]
Model Selection (2)
This is the distribution of the χ² if the likelihood of the data were exactly Gaussian, L ∝ exp(−χ²/2)
In a nutshell, it is the sum of squares of the "distance, in units of standard deviations, between data points and theoretical curve"
We refer to the total χ² as the sum

χ² = Σᵢ (xᵢ − μᵢ(θ))² / σᵢ²

Frequentist mantra: good models have χ²/d.o.f. ≈ 1
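The mantra can be checked by simulation (a sketch with illustrative numbers): fit a constant model to data actually drawn from that model and verify that the reduced χ² comes out near 1.

```python
import numpy as np

rng = np.random.default_rng(42)

# Data drawn from the true model (a constant = 2.0) with sigma = 0.5 noise
n = 1000
sigma = 0.5
x = rng.normal(2.0, sigma, size=n)

theta_fit = x.mean()                       # ML fit of the constant model
chi2 = np.sum((x - theta_fit) ** 2 / sigma**2)
dof = n - 1                                # n data points minus 1 fitted parameter
print(chi2 / dof)                          # close to 1 for a good model
```

A reduced χ² far above 1 signals a bad model (or underestimated errors); far below 1, overestimated errors or overfitting.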
Model Selection (3)
The Bayesian equivalent to the χ² comparison is the Bayes ratio → ratio of evidences of models "1" and "2"
For a model "M" the evidence is

E(M) = ∫ L(x | θ, M) p(θ | M) dθ

B12 > 1 → model 1 is favored by the data (and vice-versa)
If you have an a priori reason to favor a model → generalize the above to include model priors
The Bayes factor between 2 models is just

B12 = E(M1) / E(M2)
Model Selection (4)
The Bayes factor has several advantages over the simple χ²
If the data is poor and a particular parameter of one model is unconstrained by it, the model is not penalized
E.g.: a given dark energy model has a parameter related to, say, cluster abundance at z = 2, for which data is poor. This is good, because poor data ≠ poor model!
Mathematically → the posterior is approx. flat in this parameter; assuming (as usual) that the priors are independent and normalized, the unconstrained parameter integrates out of the evidence, leaving it unchanged
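This factorization can be verified numerically (a sketch with illustrative toy distributions): adding a parameter the likelihood does not depend on, with an independent normalized prior, leaves the evidence integral unchanged.

```python
import numpy as np

theta1 = np.linspace(-5, 5, 2001)
dt1 = theta1[1] - theta1[0]
theta2 = np.arange(0.0, 1.0, 0.002)        # 500 points; flat prior sums to 1
dt2 = 0.002

like = np.exp(-0.5 * (theta1 - 1.0) ** 2)                # depends on theta1 only
prior1 = np.exp(-0.5 * theta1**2) / np.sqrt(2 * np.pi)   # normalized N(0, 1)
prior2 = np.ones_like(theta2)                            # flat, normalized on [0, 1]

E_1param = np.sum(like * prior1) * dt1
E_2param = np.sum(np.outer(like * prior1, prior2)) * dt1 * dt2
print(E_1param, E_2param)   # equal: the extra, unconstrained parameter costs nothing
```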
Model Selection (5)
To get a better intuition, we can study the simple case of Gaussian likelihoods + Gaussian priors → analytical E(x)
Assuming uncorrelated parameters, the posterior P(θ) is then obtained analytically by integrating over the data [note f ↔ L]
Model Selection (6)
The evidence is then given, schematically for a single parameter (with likelihood width σ, prior width Σ ≫ σ, maximum-likelihood point θ̂ and prior center θₚ), by

E ≈ f_max (σ/Σ) exp[−(θ̂ − θₚ)² / (2Σ²)]

Let's analyze the 3 distinct terms above:
f_max is the max likelihood → how well the model fits the data
σ/Σ is always < 1 → penalizes extra parameters constrained by the data → Occam's Razor factor
exp[…] → penalizes cases where the prior best fit is very different from the posterior best fit
Jeffreys' Scale
As we have seen: B12 > 1 → model 1 is favored by the data (and vice-versa)
There is no absolute rule for how big B12 must be to conclude whether one model must be replaced by another
A simple rule of thumb, though, is to use a simple scale to guide the discussion. Jeffreys' scale is often used; in the version of Trotta 0803.4089:
|ln B12| < 1 → inconclusive
1 ≤ |ln B12| < 2.5 → weak evidence
2.5 ≤ |ln B12| < 5 → moderate evidence
|ln B12| ≥ 5 → strong evidence
Fisher Matrix
In a nutshell → the Fisher Matrix method is an approximation for the computation of the posterior, under the assumption that it is Gaussian in the parameters
Advantages:
very fast to compute (either analytically or numerically)
gives directly the (elliptical) confidence-level contours
Disadvantages:
gives wrong results when non-Gaussianity is strong
no intrinsic flags to warn you when non-Gaussianity is strong
numerical derivatives can be noisy
For a 4-page quick-start guide, see arXiv:0906.4123
For more detail, see Amendola & Tsujikawa, Sect. 13.3
Fisher Matrix (2)
We write the posterior as a multivariate Gaussian:

P(θ | x) ∝ exp[−½ (θ − θ̂)ᵀ F (θ − θ̂)]

The matrix F is called the Fisher (or information) matrix
To compute F, we Taylor expand the posterior near its peak – the maximum likelihood (ML) point θ̂
We need to compute this point first, but this is simple:
When doing forecasts for future experiments, we know the ML beforehand (it is our fiducial model)
For real data → multi-dim. minimization algorithms are fast
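Numerically, F is minus the Hessian of the log-posterior at the peak, F_ij = −∂²ln P/∂θᵢ∂θⱼ, which can be obtained with finite differences. A sketch using a toy quadratic log-posterior whose Fisher matrix is known exactly:

```python
import numpy as np

# Toy posterior: 2D Gaussian with a known inverse covariance (= Fisher matrix)
F_true = np.array([[4.0, 1.0],
                   [1.0, 2.0]])

def logpost(th):
    return -0.5 * th @ F_true @ th         # peak (ML point) at theta = (0, 0)

def fisher(logp, th0, h=1e-4):
    """F_ij = -d^2 logp / dtheta_i dtheta_j at th0, via central differences."""
    n = len(th0)
    F = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            F[i, j] = -(logp(th0 + ei + ej) - logp(th0 + ei - ej)
                        - logp(th0 - ei + ej) + logp(th0 - ei - ej)) / (4 * h * h)
    return F

F_num = fisher(logpost, np.zeros(2))
print(F_num)                               # recovers F_true
```

For a real likelihood the step size h needs care — this is the "numerical derivatives can be noisy" caveat listed above.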
Properties of the Fisher Matrix
Once we have F, the covariance matrix is simply its inverse: C = F⁻¹
For 2 parameters:

C = [[σ₁², σ₁₂], [σ₁₂, σ₂²]]

The ellipse axis lengths (α a and α b) and rotation angle are given by the eigenvalues and eigenvectors of C:

a² = (σ₁² + σ₂²)/2 + √[(σ₁² − σ₂²)²/4 + σ₁₂²]
b² = (σ₁² + σ₂²)/2 − √[(σ₁² − σ₂²)²/4 + σ₁₂²]
tan 2φ = 2σ₁₂ / (σ₁² − σ₂²)
Properties of the Fisher Matrix (2)
Marginalization over a parameter → simply remove the line & column of that parameter from C = F⁻¹ and invert the new, reduced C
Fixing a parameter to its best fit → simply remove the line & column of that parameter from F
Adding datasets → simply add: F_tot = F₁ + F₂
Changing variables → simple Jacobian transformation: F′ = Jᵀ F J
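The difference between marginalizing and fixing is easy to see in code. A sketch with an illustrative 2-parameter Fisher matrix:

```python
import numpy as np

F = np.array([[4.0, 1.0],     # toy 2-parameter Fisher matrix
              [1.0, 2.0]])
C = np.linalg.inv(F)          # covariance matrix C = F^(-1)

# Marginalizing over parameter 2: take the (1,1) element of C = F^(-1)
sigma1_marg = np.sqrt(C[0, 0])

# Fixing parameter 2 at its best fit: drop its row/column from F, then invert
sigma1_fixed = np.sqrt(1.0 / F[0, 0])

print(sigma1_marg, sigma1_fixed)   # fixing always gives the smaller error

# Adding an independent dataset: Fisher matrices simply add
F_tot = F + F
```

Fixing discards the uncertainty of the correlated parameter, so it always yields an error at most as large as marginalization.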
Covariance Matrix
When data is correlated, we need to compute the covariance matrix Σ:

Σᵢⱼ = ⟨(xᵢ − μᵢ)(xⱼ − μⱼ)⟩
Covariance Matrix (2)
The cov matrix is related to the correlation matrix:

Rᵢⱼ = Σᵢⱼ / (σᵢ σⱼ),  with σᵢ ≡ √Σᵢᵢ

It corresponds to the cov matrix of the standardized random variable set xᵢ/σᵢ
We can also define the cross-covariance between 2 vectors:

cov(x, y) = ⟨(x − μₓ)(y − μᵧ)ᵀ⟩

Some properties of Σ: it is positive-semidefinite and symmetric
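These definitions are easy to exercise on samples. A sketch with an illustrative 2D Gaussian:

```python
import numpy as np

rng = np.random.default_rng(3)

# Draw correlated samples and estimate covariance and correlation matrices
true_cov = np.array([[1.0, 0.6],
                     [0.6, 2.0]])
samples = rng.multivariate_normal([0, 0], true_cov, size=200_000)

Sigma = np.cov(samples, rowvar=False)      # estimated covariance matrix
s = np.sqrt(np.diag(Sigma))
R = Sigma / np.outer(s, s)                 # correlation: R_ij = Sigma_ij/(s_i s_j)
print(R)                                   # off-diagonal ≈ 0.6/sqrt(2) ≈ 0.424
```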
Covariance Matrix (3)
To compute the cov matrix, we need to compute expected values (means) over many realizations
Sometimes it can be computed analytically
But more often it cannot, and one has to rely on simulations of mock data
Mock data, or mock catalogs, are collections of random realizations of data according to some distribution
Many mock (Monte Carlo) catalogs have to be generated in order to estimate the cov matrix with good precision
Never forget the golden rule: statistical errors decrease as √N
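The golden rule can be demonstrated directly (a toy sketch, all numbers illustrative): the scatter of a variance estimated from N mock realizations shrinks as 1/√N.

```python
import numpy as np

rng = np.random.default_rng(7)

def estimate_var(n_mocks):
    # each "mock": the mean of 100 unit-variance numbers (true variance = 0.01)
    means = rng.normal(0.0, 1.0, size=(n_mocks, 100)).mean(axis=1)
    return means.var(ddof=1)

def spread(n_mocks, repeats=200):
    # scatter of the variance estimate itself, over independent repeats
    return np.std([estimate_var(n_mocks) for _ in range(repeats)])

ratio = spread(2500) / spread(100)
print(ratio)        # ≈ sqrt(100 / 2500) = 0.2
```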
Example: 2-pt correlation function
Let's study one particular example involving the 2-point correlation function in astronomy
We want to study how a given class of objects is distributed in the sky
Let's focus on galaxies, for instance
Given a random galaxy at some location, the 2-point correlation function ξ(r) describes the excess probability that another galaxy will be found within a given (scalar) distance r, compared to a uniform distribution
Because gravity attracts objects they tend to cluster together, so we expect ξ(r) to decrease as r increases
Example: 2-pt correlation function
In principle, ξ depends on the vector r, but if our data is assumed to be statistically homogeneous (same statistics everywhere), then ξ only depends on r = |r|. Let's assume this
If we define the number of galaxies dN in a small volume dV, located at distance r from a given galaxy, and the average density as ρ₀, we have

dN = ρ₀ [1 + ξ(r)] dV

It's an excess probability → we have an integral constraint
Example: 2-pt correlation function
If the correlation ξ is positive (negative), there are more (fewer) particles than in a uniform distribution
For a given catalog, unless the volume has a very simple geometry (say, a perfect sphere), it is impossible to compute the correlation function or its cov matrix analytically
We can estimate ξ in a given catalog with the following estimator, where DD means the number of galaxy pairs at a distance (r, r + Δr) in the data, and RR the same in a random, uniform catalog with the same volume:

ξ̂(r) = DD(r) / RR(r) − 1
Example: 2-pt correlation function
We compute the numbers DD(r) and RR(r) for all pairs of objects (here, galaxies)
We do this for a number of distance bins
In each bin, we need to estimate the error bars
Since the same objects enter different bins, the data is highly correlated → we need to compute the cov matrix
We need to generate many random data catalogs!
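The procedure above can be sketched in a few lines. This is a deliberately minimal 1D illustration of the DD/RR − 1 estimator (real analyses work in 3D with edge corrections and better estimators); with an unclustered "data" catalog the result should be consistent with zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def pair_counts(points, bins):
    # separations of all unique pairs, histogrammed in distance bins
    d = np.abs(points[:, None] - points[None, :])
    d = d[np.triu_indices(len(points), k=1)]
    return np.histogram(d, bins=bins)[0].astype(float)

bins = np.linspace(0, 1, 11)                   # 10 distance bins
data = rng.uniform(0, 10, size=2000)           # unclustered toy "data" catalog
rand = rng.uniform(0, 10, size=2000)           # random catalog, same "volume"

DD = pair_counts(data, bins)
RR = pair_counts(rand, bins)
xi = DD / RR - 1.0
print(xi)     # consistent with zero: uniform data shows no excess clustering
```

Repeating this with many random catalogs, and binning the same points into overlapping scales, is exactly what generates the correlated errors the slide warns about.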
Comparison of Different Methods
When we have a posterior with non-linear dependence on the parameters (i.e., it is not Gaussian in the parameters, even if it is Gaussian in the data), the Fisher Matrix approach might yield incorrect results
We then have several options to compute it. Let's assume we have N parameters. The most common are:
Grid analysis → compute the posterior numerically on an N-dimensional tensor, which is the exterior product of the different vectors of values for each parameter
Must guess the parameter ranges, or apply trial & error
Run first a very coarse-grained tensor, then refine
Very simple to code and implement
Comparison of Different Methods (2)
MCMC analysis → usually based on the Metropolis-Hastings algorithm
Goal: probe the N-dimensional space in a "non-rectangular" way → concentrate on the high-posterior region → more efficient search
We will study it later
Fisher Matrix → do the analysis anyway and hope for the best
Nested Sampling analysis → see 1306.2144 & 1506.00171
Comparison of techniques:
Fisher Matrix → fast and simple approximation, but maybe very wrong
MCMC → complexity is N log(N), but the code requires tuning
Grid algorithm → complexity grows as Exp(N) [only OK for up to ~6 params]
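Although MCMC is covered later in the course, a minimal Metropolis-Hastings sketch shows the idea. All numbers here are illustrative (a 1D Gaussian posterior, a hand-picked proposal width); real codes need careful tuning of the step size and convergence checks.

```python
import numpy as np

rng = np.random.default_rng(1)

def logpost(th):
    return -0.5 * (th - 1.0) ** 2 / 0.5**2     # toy posterior: mean 1.0, sigma 0.5

chain = [0.0]
for _ in range(50_000):
    prop = chain[-1] + rng.normal(0, 0.5)      # symmetric random-walk proposal
    # accept with probability min(1, P(prop) / P(current))
    if np.log(rng.uniform()) < logpost(prop) - logpost(chain[-1]):
        chain.append(prop)
    else:
        chain.append(chain[-1])                # reject: repeat the current point

samples = np.array(chain[5000:])               # discard burn-in
print(samples.mean(), samples.std())           # ≈ 1.0 and 0.5
```

The chain spends time in each region in proportion to the posterior, so the samples themselves are the (already marginalized) answer — no N-dimensional grid needed.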