Key Points
- Genetic association analysis is a popular approach for identifying genetic variation that correlates with phenotypic variation, such as susceptibility to complex disease.
- Association studies have a chequered history. Many published studies cannot be reproduced, or be substantiated by linkage data.
- Genetic association occurs as a result of linkage disequilibrium (LD). But LD levels vary within the genome and between populations, making it difficult to predict the best sample populations for a particular study.
- The most popular sampling strategy is the case-control study. Selection of the control population is key to the success of this approach, and small sample sizes or poorly matched controls are sources of error in association studies.
- Prospective study designs can avoid the errors of case-control studies, but require large sample sizes. Family-based studies are also useful in overcoming errors due to population stratification.
- Multiple testing of the same population, or population subgroups, is another source of error.
- With the availability of the human genome sequence, and new methods for genotyping single nucleotide polymorphisms, association studies will become increasingly popular. Applications will include whole-genome screens and regional LD mapping.
- More rigorous study design, independent replication of data and careful attention to the effects of multiple testing are among the recommendations that will improve the value of association data in the future.
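The points on case-control sampling and multiple testing can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the article: it assumes a simple allelic 2x2 chi-square test and a Bonferroni correction, and the allele counts and the number of markers tested (20) are hypothetical.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the two alleles.
    Returns the statistic and its p-value.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("every row and column needs a non-zero total")
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # For 1 degree of freedom, the upper tail of chi-square equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical allele counts: 1,000 case chromosomes and 1,000 control chromosomes.
chi2, p = allelic_chi_square(case_a=240, case_b=760, ctrl_a=200, ctrl_b=800)

# Assumed scenario: the same sample was tested at 20 markers.
nominal = p < 0.05
corrected = p < bonferroni_threshold(0.05, n_tests=20)

print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
print(f"significant at 0.05: {nominal}; after Bonferroni for 20 tests: {corrected}")
```

With these invented counts the association passes the nominal 0.05 threshold but not the corrected one, which is the kind of result that can look convincing when the number of tests performed goes unreported. Bonferroni is used here only because it is the simplest correction; it is conservative when markers are correlated through LD.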
Abstract
Assessing the association between DNA variants and disease has been used widely to identify regions of the genome and candidate genes that contribute to disease. However, there are numerous examples of associations that cannot be replicated, which has led to scepticism about the utility of the approach for common conditions. With the discovery of massive numbers of genetic markers and the development of better tools for genotyping, association studies will inevitably proliferate. Now is the time to consider critically the design of such studies, to avoid the mistakes of the past and to maximize their potential to identify new components of disease.
Acknowledgements
This work was supported by the Wellcome Trust and in part by a grant from the NIH (to L.R.C.). We wish to thank Dr Joe Terwilliger for critical review of this manuscript.
Glossary
- POWER
  The probability of correctly rejecting the null hypothesis when it is truly false. For association studies, the power can be considered as the probability of correctly detecting a genuine association.
- GENETIC DRIFT
  The random fluctuation in allele frequencies as genes are transmitted from one generation to the next.
- POPULATION ADMIXTURE
  A population in which multiple subgroups are included. Admixture often refers to intermarriage/reproduction from different groups of individuals, but most simply is used to denote a population of subgroups having different allele frequencies (see population stratification).
- PROSPECTIVE COHORT
  Longitudinal study of individuals initially assessed for exposure to certain risk factors and then followed over time to evaluate the progression towards specific outcomes (often disease).
- LOCUS HETEROGENEITY
  The appearance of phenotypically similar characteristics resulting from mutations at different genetic loci. Differences in effect size or in replication between studies and samples are often ascribed to different loci leading to the same disease.
- POPULATION STRATIFICATION
  The presence of multiple subgroups with different allele frequencies within a population. The different underlying allele frequencies in sampled subgroups might be independent of the disease within each group, and they can lead to erroneous conclusions of linkage disequilibrium or disease relevance.
- TYPE I ERROR
  The probability of rejecting the null hypothesis when it is true. For association studies, Type I errors are manifest as false-positive reports of phenotype–genotype correlation.
- RISK RATIO
  A measure of association effect reflecting the probability of disease in people with a particular allele or genotype versus the probability of disease in those who do not have the particular genotype.
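The glossary entries for risk ratio and population stratification can be illustrated together. The following is a minimal sketch with invented numbers, not data from the article: two hypothetical subgroups differ in both allele frequency and baseline disease risk, and the allele has no effect within either subgroup.

```python
def risk_ratio(cases_with, total_with, cases_without, total_without):
    """Probability of disease in carriers divided by that in non-carriers."""
    if total_with <= 0 or total_without <= 0 or cases_without <= 0:
        raise ValueError("group totals and non-carrier cases must be positive")
    risk_with = cases_with / total_with
    risk_without = cases_without / total_without
    return risk_with / risk_without


# Hypothetical subgroup A: allele common (800 of 1,000 carry it), disease risk 10%
# in carriers and non-carriers alike.
rr_a = risk_ratio(cases_with=80, total_with=800, cases_without=20, total_without=200)

# Hypothetical subgroup B: allele rare (200 of 1,000 carry it), disease risk 2%
# in carriers and non-carriers alike.
rr_b = risk_ratio(cases_with=4, total_with=200, cases_without=16, total_without=800)

# Pooling the two subgroups without accounting for their origin.
rr_pooled = risk_ratio(
    cases_with=80 + 4,
    total_with=800 + 200,
    cases_without=20 + 16,
    total_without=200 + 800,
)

print(f"subgroup A: {rr_a:.2f}, subgroup B: {rr_b:.2f}, pooled: {rr_pooled:.2f}")
```

Within each subgroup the risk ratio is 1, yet the pooled sample gives a ratio above 2, because carriers are drawn mostly from the higher-risk subgroup. This is the sense in which stratification "can lead to erroneous conclusions" even when the allele is independent of disease within each group.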
Cite this article
Cardon, L., Bell, J. Association study designs for complex diseases. Nat Rev Genet 2, 91–99 (2001). https://doi.org/10.1038/35052543
