Genetic variation in the genome
A G A G T T C T G C T C G A G G G T T A T G C G C G
A G A G T T C T G C T C G A G G G T T A T G C G C G
A G A G T T C T G C T C G A G G G T T A T G C G C G
A G A G T T C T G C T C G A G G G T T A T G C G C G
International HapMap Project (http://www.hapmap.org)
Biomedical applications
1. Genotype data available for different ethnic groups
2. Selection of tagSNPs for association studies -> potential for whole-genome association studies
3. Assessment of statistical significance and interpretation of results
4. Study of less common alleles
5. Study of structural variation
6. Pharmacogenomics
Genetic variation databases
• Online Mendelian Inheritance in Man (OMIM): catalog of human genetic and genomic disorders
  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
• International HapMap Project
  http://www.hapmap.org
• Personalized genomes: J. Watson’s genome
  http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/jwsequence/
• Entrez dbSNP
  http://www.ncbi.nlm.nih.gov/projects/SNP/
• Database of Genotype and Phenotype
  http://view.ncbi.nlm.nih.gov/dbgap
• MamPol
  http://mampol.uab.cat
• DPDB
  http://dpdb.uab.cat
• Human Genome Variation Database: human genetic & phenotypic diversity database
  http://hgvbase.cgb.ki.se/
Association studies: phenotypic effect of SNPs
[Figure: genotypes at SNP1, SNP2, SNP3, ... (e.g. A/A, A/C, G/C, C/C, G/T, T/T) in the sequences of individual 1, individual 2, ... are related to the phenotype, either disease status (Disease 1 vs. healthy) or a quantitative trait i, to estimate the phenotypic effect of each SNP; example: cervical cancer.]
Biobanks: large-scale cohort studies
•deCODE (Iceland) •Estonia •Germany •Canada •Japan •China •USA
Association Studies
•Study design
•Statistical analyses
1st phase: Design
Study designs
2nd phase: Statistical analysis
Statistical analysis methods
Statistical analyses in association studies: steps
1. Data validation
2. Genetic description
   1. Unidimensional (SNP by SNP)
   2. Multidimensional
3. Test for genotype-phenotype association
   1. SNP by SNP
   2. Multi-SNP / haplotype / tagSNP
   3. Power assessment
4. Predictive model
2nd phase: statistical analyses in association studies
Step 1. Data validation (error sources: sampling, genotyping)
• Checking with SNPref
• Hardy-Weinberg proportions (separately for controls and cases)
• Consistency among samples
• Stratification (genetic markers)
Hardy-Weinberg test
SNP rs1137933: genotype frequencies
For a diallelic SNP with alleles A and a at relative frequencies p and q, the Hardy-Weinberg genotype proportions of AA, Aa and aa are $p^2$, $2pq$ and $q^2$.
Three statistics can be used:
(i) one based on the Pearson (χ²) test statistic
(ii) one based on the likelihood ratio test statistic (G test)
(iii) an exact test
Example of Hardy-Weinberg test (observed counts, with expected counts under Hardy-Weinberg proportions in parentheses)
          CT            CC            TT
Control   38 (50.1)     76 (70.0)     15 (8.9)
Case      105 (95.2)    122 (126.9)   13 (17.9)
SNP rs1137933, control genotypes (observed, with expected counts in parentheses)
          CT            CC            TT
Control   38 (50.1)     76 (70.0)     15 (8.9)
Pearson (χ²) test statistic: $X^2 = \sum_i (O_i - E_i)^2 / E_i$
with $p = f(C) = f(CC) + f(CT)/2$ and $q = 1 - p$
Genotype              CT       CC       TT       Total
Number, observed      38       76       15       129 = N
Frequency, expected   2pq      p²       q²       1.00
Number, expected      2pqN     p²N      q²N      N
Number, expected      50.1     70.0     8.9      129
Likelihood ratio (G) test statistic: $G = 2 \sum_i O_i \ln(O_i / E_i)$
SNP rs1137933
Control (129 genotypes): p1 = f(C) = 0.736, p2 = f(T) = 0.264
Chi-square (1 df) = 7.5**, p = 0.00617
G (likelihood ratio) (1 df) = 7.06**, p = 0.00788
Case (240 genotypes): p1 = f(C) = 0.727, p2 = f(T) = 0.273
Chi-square (1 df) = 2.52 ns, p = 0.11241
G (likelihood ratio) (1 df) = 2.63 ns, p = 0.10486
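The chi-square and G calculations above can be reproduced with a short Python sketch. It uses only the standard library and assumes genotype counts are passed in the slides' column order (heterozygotes CT, homozygotes CC, homozygotes TT) and that both alleles are observed; the function names are hypothetical. For 1 degree of freedom the chi-square upper-tail probability has the closed form erfc(√(x/2)), which avoids a SciPy dependency. The exact test (iii) is not implemented here.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


def hwe_test(n_het, n_hom1, n_hom2):
    """Pearson chi-square and G tests of Hardy-Weinberg proportions for one diallelic SNP.

    Counts follow the slides' column order: heterozygotes (CT), homozygotes for
    allele 1 (CC), homozygotes for allele 2 (TT). Assumes both alleles are observed.
    """
    n = n_het + n_hom1 + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # p = f(CC) + f(CT)/2
    q = 1 - p
    observed = [n_het, n_hom1, n_hom2]
    expected = [2 * p * q * n, p * p * n, q * q * n]  # 2pqN, p^2 N, q^2 N
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # G = 2 * sum O ln(O/E); empty genotype classes contribute 0
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return {
        "p": p,
        "expected": expected,
        "chi2": chi2,
        "p_chi2": chi2_sf_1df(chi2),
        "G": g,
        "p_G": chi2_sf_1df(g),
    }


if __name__ == "__main__":
    for label, counts in [("Control", (38, 76, 15)), ("Case", (105, 122, 13))]:
        r = hwe_test(*counts)
        print(f"{label}: f(C)={r['p']:.3f} chi2={r['chi2']:.2f} (p={r['p_chi2']:.5f}) "
              f"G={r['G']:.2f} (p={r['p_G']:.5f})")
```

With the control counts this gives f(C) ≈ 0.736, χ² ≈ 7.50 and G ≈ 7.06, in line with the slide.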
Genetic description: SNP by SNP (SNP rs1137933)
Genotype frequencies
          CT            CC            TT
Control   38 (29.5%)    76 (58.9%)    15 (11.6%)
Case      105 (43.8%)   122 (50.8%)   13 (5.4%)
Allele frequencies
          C             T
Control   190 (73.6%)   68 (26.4%)
Case      349 (72.7%)   131 (27.3%)
Haplotype inference
Haplotype 1 acgtagcatcgtatgcgttagacgggggggtagcaccagtacag
Haplotype 2 acgtagcatcgtatgcgttagacgggggggtagcaccagtacag
Haplotype 3 acgtagcatcgtatgcgttagacgggggggtagcaccagtacag
Haplotype 4 acgtagcatcgtttgcgttagacgggggggtagcaccagtacag
Haplotype 5 acgtagcatcgtttgcgttagacgggggggtagcaccagtacag
Haplotype 6 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag
Haplotype 7 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag
Haplotype 8 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag
Haplotype 9 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag
Genetic description: multi-SNP
An unphased genotype a/t g/c is compatible with two haplotype pairs:
a) a-g / t-c
b) a-c / t-g
Genotypes -> possible haplotypes -> haplotype frequency estimates
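The phase ambiguity above generalizes: an unphased genotype with k heterozygous SNPs is compatible with 2^(k-1) haplotype pairs. The sketch below, with a hypothetical helper `possible_haplotype_pairs`, only enumerates those pairs; it does not estimate haplotype frequencies, which is normally done statistically, for example with an expectation-maximization (EM) algorithm.

```python
from itertools import product


def possible_haplotype_pairs(genotype):
    """Distinct unordered haplotype pairs compatible with an unphased multi-SNP genotype.

    genotype: list of (allele_1, allele_2) tuples, one per SNP, with unknown phase.
    """
    pairs = set()
    # For every SNP, choose which allele sits on haplotype 1; the other goes on haplotype 2.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(alleles[c] for alleles, c in zip(genotype, choice))
        h2 = "".join(alleles[1 - c] for alleles, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))  # unordered pair, so sort before storing
    return sorted(pairs)


if __name__ == "__main__":
    # Genotype a/t at the first SNP and g/c at the second, as in the example above
    for h1, h2 in possible_haplotype_pairs([("a", "t"), ("g", "c")]):
        print(h1, "/", h2)
```

For the a/t g/c genotype it returns the two pairs a-c / t-g and a-g / t-c; homozygous SNPs do not add any ambiguity.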
Haplotype frequency estimates (alleles in SNP order rs1042522, rs12951053, rs8064946, rs6541003, rs4846049, rs4646421, rs4986885, rs915907, rs4147567, rs2266633)
Haplotype   Alleles                Total
1           G A G G T C G C G G    0.1056
2           G A G A G C G C G G    0.0767
3           G A G A G C G C A G    0.0485
4           G A G A G C G A G G    0.0423
5           G A C G G T G A A A    0.0378
6           G A C A G T A A A A    0.0282
7           G A G G G C G C A G    0.0276
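Linkage disequilibrium measure (Lewontin's D')
          B1                  B2                  Total
A1        p11 = p1q1 + D      p12 = p1q2 - D      p1
A2        p21 = p2q1 - D      p22 = p2q2 + D      p2
Total     q1                  q2                  1
$D' = D / D_{\max}$, where $D_{\max} = \min(p_1 q_2,\ p_2 q_1)$ if $D > 0$ and $D_{\max} = \min(p_1 q_1,\ p_2 q_2)$ if $D < 0$, so that $-1 \le D' \le 1$
$r = D / \sqrt{p_1 p_2 q_1 q_2}$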
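The D, D' and r definitions above can be computed from one haplotype frequency and the two allele frequencies. This is a minimal sketch with a hypothetical function name; the input frequencies in the usage line are made-up values, not data from the slides.

```python
import math


def ld_measures(p11, p1, q1):
    """D, Lewontin's D' and r for two diallelic loci A (A1/A2) and B (B1/B2).

    p11: frequency of the A1B1 haplotype; p1 = f(A1); q1 = f(B1).
    """
    p2, q2 = 1 - p1, 1 - q1
    d = p11 - p1 * q1  # from p11 = p1*q1 + D
    # D_max depends on the sign of D
    d_max = min(p1 * q2, p2 * q1) if d >= 0 else min(p1 * q1, p2 * q2)
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p1 * p2 * q1 * q2
    r = d / math.sqrt(denom) if denom > 0 else 0.0  # note: D, not D', in the numerator
    return d, d_prime, r


if __name__ == "__main__":
    # Hypothetical frequencies: f(A1B1) = 0.5, f(A1) = 0.6, f(B1) = 0.6
    d, d_prime, r = ld_measures(0.5, 0.6, 0.6)
    print(f"D = {d:.3f}, D' = {d_prime:.3f}, r = {r:.3f}, r^2 = {r * r:.3f}")
```

Complete association (only A1B1 and A2B2 haplotypes present) gives D' = r = 1; D' reaches ±1 whenever one of the four haplotypes is absent, while r also depends on how similar the allele frequencies are.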
Genetic description: multi-SNP
[Figure: linkage disequilibrium representation showing linkage blocks, a recombination hotspot, associated sites and tagSNPs.]
Genotype-phenotype association -> guilty by association
Case-control study (cases vs. controls)
SNP          Cases           Controls        Interpretation
SNP1 (G/C)   40% G, 60% C    40% G, 60% C    Neutral SNP
SNP2 (A/T)   100% A, 0% T    0% A, 100% T    Mendelian SNP
SNP3 (T/G)   80% T, 20% G    60% T, 40% G    QTL SNP
SNPn         ...             ...
Test for association (SNP by SNP): chi-square independence test, SNP rs1137933
Genotypic
          CT            CC            TT
Control   38 (29.5%)    76 (58.9%)    15 (11.6%)
Case      105 (43.8%)   122 (50.8%)   13 (5.4%)
Chi-square (2 df) = 9.71**, p = 0.00779
G (likelihood ratio) (2 df) = 9.67**, p = 0.00795
Allelic
          C             T
Control   190 (73.6%)   68 (26.4%)
Case      349 (72.7%)   131 (27.3%)
Chi-square (1 df) = 0.07, p = 0.79134
G (likelihood ratio) (1 df) = 0.07, p = 0.79134
Odds ratio (OR) = 1.05
Risk ratio (RR) = 1.02
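The chi-square independence tests above compare observed counts with the counts expected from the row and column totals, using (rows - 1) × (columns - 1) degrees of freedom. A minimal standard-library sketch follows; the function names are hypothetical, and the p-value helper only covers 1 and 2 degrees of freedom, where closed forms exist (a library routine such as SciPy's would be used in general). With the slides' counts it gives χ² ≈ 9.71 (2 df, p ≈ 0.0078) for the genotypic table and χ² ≈ 0.07 for the allelic table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for 1 and 2 df only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only covers 1 or 2 degrees of freedom")


def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2_sf(chi2, df)


if __name__ == "__main__":
    genotypic = [[38, 76, 15], [105, 122, 13]]  # rows: control, case; columns: CT, CC, TT
    allelic = [[190, 68], [349, 131]]           # rows: control, case; columns: C, T
    for name, table in [("Genotypic", genotypic), ("Allelic", allelic)]:
        chi2, df, p = chi2_independence(table)
        print(f"{name}: chi2({df} df) = {chi2:.2f}, p = {p:.5f}")
```

The same function applies unchanged to each stratum when controlling for another variable, since each stratum is just a smaller contingency table.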
Odds ratio
The odds of an event are the probability that it occurs divided by the probability that it does not, p / (1 − p), where p is the probability of the event.
For example, if a disease has a 1 in 5 probability of occurring for a given genotype (0.2 or 20%), its odds are 0.2 / (1 − 0.2) = 0.2 / 0.8 = 0.25.
The odds ratio is the ratio of the odds of an event in one group to the odds of the same event in another group. These groups might be cases and controls, or any other dichotomous classification. If the probabilities of the event in the two groups are p (first group) and q (second group), the odds ratio is
$\mathrm{OR} = \dfrac{p / (1 - p)}{q / (1 - q)}$
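The odds and odds-ratio definitions above translate directly into two small functions (hypothetical names). The probability 0.1 for the second group in the usage lines is an illustrative value, not taken from the slides.

```python
def odds(prob):
    """Odds of an event with probability prob: prob / (1 - prob)."""
    return prob / (1 - prob)


def odds_ratio(p, q):
    """Odds ratio of the event in group 1 (probability p) relative to group 2 (probability q)."""
    return odds(p) / odds(q)


if __name__ == "__main__":
    print(odds(0.2))             # disease probability of 1 in 5 for a genotype
    print(odds_ratio(0.2, 0.1))  # hypothetical second group with probability 0.1
```

An odds ratio of 1 means the event is equally likely in both groups; here 0.25 / (0.1 / 0.9) = 2.25.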
Odds ratio from a 2 × 2 table
                  Cases    Controls   Total
SNP1 allele 1     a        b          a+b
SNP1 allele 2     c        d          c+d
Total             a+c      b+d        N
The ratio a/c is the observed odds of exposure in the case group, and b/d is the odds of exposure in the control group, so OR = (a/c) / (b/d) = ad / bc.
OR = 2.2 -> 2.2:1
The odds of the effect (disease) are 2.2 times higher when the other variable (the SNP allele) is present than when it is absent.
Relative risk (RR, risk ratio): RR = incidence rate in the exposed / incidence rate in the unexposed
                  Cases    Controls   Total
SNP1 allele 1     a        b          a+b
SNP1 allele 2     c        d          c+d
Total             a+c      b+d        N
RR = [a / (a+b)] / [c / (c+d)]
Relative risk: example
                  Cases    Controls   Total
SNP1 allele 1     210      250        460
SNP1 allele 2     100      300        400
Total             310      550        860
Relative risk = (210/460) / (100/400) = 1.83
Odds ratio = (210/100) / (250/300) = 2.52
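A minimal sketch of both formulas, using the a, b, c, d layout of the 2 × 2 tables above (the function name is hypothetical). It follows the slides' definitions and reproduces RR ≈ 1.83 and OR ≈ 2.52 for the example. Taking T as allele 1 and the rs1137933 allele counts (a = 131 and b = 68 T alleles in cases and controls, c = 349 and d = 190 C alleles) it gives RR ≈ 1.02 and OR ≈ 1.05, the values reported for that SNP.

```python
def relative_risk_and_odds_ratio(a, b, c, d):
    """RR and OR from a 2x2 table laid out as on the slides.

    Rows: SNP1 allele 1 (a, b) and allele 2 (c, d); columns: cases (a, c), controls (b, d).
    """
    rr = (a / (a + b)) / (c / (c + d))  # proportion of cases in row 1 / in row 2
    odds_ratio = (a / c) / (b / d)      # odds of exposure in cases / in controls = ad / bc
    return rr, odds_ratio


if __name__ == "__main__":
    rr, or_ = relative_risk_and_odds_ratio(210, 250, 100, 300)
    print(f"Example table: RR = {rr:.2f}, OR = {or_:.2f}")
    rr, or_ = relative_risk_and_odds_ratio(131, 68, 349, 190)
    print(f"rs1137933 alleles: RR = {rr:.2f}, OR = {or_:.2f}")
```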
Genotypic test controlling for other independent variables (stratified by sex: ♀, ♂)
SNP rs1137933
Stratum 1
          CC            CT            TT
Control   15 (62.5%)    6 (25%)       3 (12.5%)
Case      16 (51.6%)    13 (41.9%)    2 (6.5%)
Chi-square (2 df) = 1.95, p = 0.37719
G (likelihood ratio) (2 df) = 1.98, p = 0.37158
Stratum 2
          CC            CT            TT
Control   61 (58.1%)    32 (30.5%)    12 (11.4%)
Case      106 (50.7%)   92 (44.0%)    11 (5.3%)
Chi-square (2 df) = 7.59*, p = 0.02248
G (likelihood ratio) (2 df) = 7.5*, p = 0.02352
Test for association (multi-SNP)
Test for association between haplotypes and the response (disease), or between tagSNPs and the response
Estimated haplotype frequencies in cases and controls (alleles in SNP order rs1042522, rs12951053, rs8064946, rs6541003, rs4846049, rs4646421, rs4986885, rs915907, rs4147567, rs2266633)
Haplotype   Alleles                Case     Control
1           G A G G T C G C G G    0.1056   0.1168
2           G A G A G C G C G G    0.0767   0.0657
3           G A G A G C G C A G    0.0485   0.0405
4           G A G A G C G A G G    0.0423   0.0345
5           G A C G G T G A A A    0.0378   0.0275
6           G A C A G T A A A A    0.0282   0.0185
7           G A G G G C G C A G    0.0276   0.0134
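As a descriptive first look at the case and control haplotype frequencies above, each haplotype can be compared against all other haplotypes with a frequency-based odds ratio. This sketch (hypothetical function name) is not a significance test: a proper haplotype association test needs haplotype counts or a likelihood-ratio test on the estimated frequencies, which the slides do not give.

```python
def haplotype_odds_ratio(freq_case, freq_control):
    """Odds of a haplotype (vs. all other haplotypes) in cases relative to controls."""
    return (freq_case / (1 - freq_case)) / (freq_control / (1 - freq_control))


# Estimated haplotype frequencies for haplotypes 1-7, from the case and control tables
CASE = [0.1056, 0.0767, 0.0485, 0.0423, 0.0378, 0.0282, 0.0276]
CONTROL = [0.1168, 0.0657, 0.0405, 0.0345, 0.0275, 0.0185, 0.0134]

if __name__ == "__main__":
    for number, (f_case, f_control) in enumerate(zip(CASE, CONTROL), start=1):
        print(f"Haplotype {number}: OR = {haplotype_odds_ratio(f_case, f_control):.2f}")
```

Haplotype 1 is slightly less frequent in cases (OR below 1), while haplotype 7 is about twice as frequent in cases in odds terms.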
Logistic regression
Logistic regression is a statistical regression model for binary dependent variables. It can be viewed as a generalized linear model that uses the logit function as its link function, with binomially distributed errors.
The model is expressed as
$\operatorname{logit}(p_i) = \ln\left(\dfrac{p_i}{1 - p_i}\right) = \alpha + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i}$, for $i = 1, \ldots, n$, where $p_i = P(Y_i = 1)$.
The logarithm of the odds (the probability divided by one minus the probability) of the outcome is modelled as a linear function of the explanatory variables $X_1$ to $X_k$. Equivalently, it can be written as
$p_i = \dfrac{1}{1 + e^{-(\alpha + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i})}}$
Each estimated β is interpreted through $e^{\beta}$, the multiplicative effect on the odds of a one-unit increase in the corresponding explanatory variable. For a dichotomous explanatory variable such as sex, $e^{\beta}$ (the antilog of β) is the estimated odds ratio of having the outcome when comparing males and females. The parameters α, β1, ..., βk are usually estimated by maximum likelihood.
Logistic regression is a predictive tool
If the logit coefficient β1 = 2.303, the corresponding odds ratio $e^{\beta_1}$ is 10: when the independent variable increases by one unit, the odds that the dependent variable equals 1 increase by a factor of 10, with the other variables held constant.
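As a sketch of the maximum-likelihood fit described above, the model with one explanatory variable can be fitted by Newton-Raphson in plain Python. The recoding of rs1137933 into T carriers (CT or TT, x = 1) versus CC homozygotes (x = 0) is an assumption made here for illustration; the slides do not specify a genetic model, and the function names are hypothetical. With a single binary predictor, $e^{\beta}$ equals the 2 × 2 cross-product odds ratio (118 × 76) / (53 × 122) ≈ 1.39, which makes a convenient check.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Maximum-likelihood fit of logit P(y = 1) = alpha + beta * x by Newton-Raphson."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (alpha, beta) += information^-1 * score
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


def carrier_data():
    """rs1137933 genotypes recoded (assumption) as T carriers (CT or TT) = 1 vs. CC = 0."""
    xs, ys = [], []
    # (x, y = 1 for case / 0 for control, number of individuals)
    for x, y, count in [(1, 0, 38 + 15), (0, 0, 76), (1, 1, 105 + 13), (0, 1, 122)]:
        xs += [x] * count
        ys += [y] * count
    return xs, ys


if __name__ == "__main__":
    alpha, beta = fit_logistic(*carrier_data())
    print(f"beta = {beta:.3f}, odds ratio e^beta = {math.exp(beta):.3f}")
```

With more covariates (for example sex, to control for it as in the stratified analysis) the same idea extends to a k-parameter Newton step, which statistical packages implement with convergence checks instead of a fixed iteration count.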
Links
•http://bioinfo.iconcologia.net/SNPstats (Web tool for association studies)
•http://www.mep.ki.se/genestat/tl/genass_ldmap (Tutorial for association studies)
•http://linkage.rockefeller.edu/soft (Software for genetic analysis)
•http://www.broad.mit.edu/personal/jcbarret/haploview (Haploview)
•http://www.genome.gov/26525384 (Catalog of published GWA studies)
•http://geneticassociationdb.nih.gov (Database of genetic association studies of human diseases)
Association studies: web resource
http://bioinfo.iconcologia.net/index.php?module=Snpstats
Genetic association -> guilty by association
Patients vs. controls
SNP          Patients        Controls
SNP1 (G/C)   40% G, 60% C    40% G, 60% C
SNP2 (A/T)   100% A, 0% T    0% A, 100% T
SNP3 (T/G)   80% T, 20% G    60% T, 40% G
SNPn         ...             ...
Today we can analyze thousands of SNPs for association, potentially revealing the genetic basis of diseases.
Translation of genetic-phenotypic information into clinical practice
D.R. Bentley. 2004 Nature 429: 440-445