# The impact of complex informative missingness on the validity of the transmission/disequilibrium test (TDT)

*BMC Proceedings* **volume 1**, Article number: S26 (2007)

## Abstract

The transmission/disequilibrium test was introduced to test for linkage and association between a marker and a putative disease locus using case-parent triads. Several extensions have been proposed to accommodate incomplete triads. Some strategies assumed that parental genotypes were missing completely at random, and some methods allowed informative missingness for parental genotypes. However, all of these tests assumed that offspring genotypes were missing completely at random and concluded that the transmission/disequilibrium test remained valid when incomplete triads were excluded from the analysis. In this article, the conditional distribution of ascertained triads allowing informative missingness for offspring genotypes, as well as for parental genotypes, was derived, and several tests under such scenarios were evaluated. In simulations, independent triads from the Genetic Analysis Workshop 15 simulated data (Problem 3) were ascertained. When offspring genotypes were missing informatively, simulation results revealed inflated type I error and/or reduced power for the transmission/disequilibrium test excluding incomplete triads.

## Background

Recently, family-based association studies have drawn substantial attention in genetic studies as a way to avoid spurious association due to population admixture. The transmission/disequilibrium test (TDT) by Spielman et al. [1] was proposed to test for linkage and association between a marker and a disease locus using ascertained case-parent triads. However, parental genotypes may be unavailable due to refusals or other unknown causes. Assuming that only one parental genotype is available and the other one is missing completely at random (MCAR), Clayton [2] and Weinberg [3] proposed likelihood ratio tests, and Sun et al. [4] introduced the 1-TDT, a TDT for the case in which only one parent is available, to incorporate such dyads (affected offspring with one parental genotype). Later, the expectation maximization algorithm based haplotype relative risk (EM-HRR) proposed by Guo et al. [5] extended the haplotype relative risk (HRR) test [6] to accommodate both dyads and monads (affected offspring without parental genotypes). However, when missingness cannot be ignored (i.e., the missingness pattern of parental genotypes is related to the disease under study), the assumption of MCAR is violated and these tests may be invalid.

When parental genotypes are missing informatively, Allen et al. [7] and Chen [8] proposed likelihood ratio tests to assure the validity of testing for association between a candidate gene and a disease. However, the cost of accounting for informative missingness is reduced power. When the missing pattern was indeed completely at random, one can see that Allen et al.'s strategy could be less powerful than the 1-TDT [7]. This is also true for Chen's method (see Table 4 [8]). The power of Chen's score statistic with 1 degree of freedom is less than that of the TDT using only intact triads for a common (rare) allele under the dominant (recessive) disease model, as is the score statistic with 2 degrees of freedom for both rare and common variant alleles under the multiplicative inheritance. This means that the inclusion of dyads reduces the power of the score test in these cases.

Regardless of the missing pattern assumed for parental genotypes, the above-mentioned methods assumed that offspring genotypes were MCAR. In the following, the conditional distribution of ascertained triads that allows informative missingness for offspring genotypes, as well as for parental genotypes, is derived, and several tests under such scenarios are evaluated.

## Methods

### Distribution of ascertained triads

First, it was assumed that the data consisted of genotypes of bi-allelic markers such as single-nucleotide polymorphisms (SNPs). Therefore, there are exactly two alleles, B_{1} and B_{2}, at the marker locus. The distribution of complete triads was derived as follows. Let G_{o}, G_{f}, and G_{m} be the offspring's, father's, and mother's genotypes, respectively. Let G_{of} and G_{om} be the offspring alleles inherited from the father and mother, respectively. Here, imprinting was not considered. The four possible joint probabilities of a given parental genotype and of that parent transmitting a given allele to the offspring, all conditional on the offspring being affected, are:

*μ* = Pr{[G_{f} = (B_{1}B_{1}) & G_{of} = (B_{1})] or [G_{m} = (B_{1}B_{1}) & G_{om} = (B_{1})]|affected offspring}

*ν* = Pr{[G_{f} = (B_{1}B_{2}) & G_{of} = (B_{1})] or [G_{m} = (B_{1}B_{2}) & G_{om} = (B_{1})]|affected offspring}

*ζ* = Pr{[G_{f} = (B_{1}B_{2}) & G_{of} = (B_{2})] or [G_{m} = (B_{1}B_{2}) & G_{om} = (B_{2})]|affected offspring}

*τ* = Pr{[G_{f} = (B_{2}B_{2}) & G_{of} = (B_{2})] or [G_{m} = (B_{2}B_{2}) & G_{om} = (B_{2})]|affected offspring}.

When the disease model is recessive, Ott (Table 2, [9]) showed that *μ* = (*s* + *δ*/*r*)*s*, *ν* = (*s* + *δ*/*r*)(1 - *s*) - *θδ*/*r*, *ζ* = (1 - *s* - *δ*/*r*)*s* + *θδ*/*r* and *τ* = (1 - *s* - *δ*/*r*)(1 - *s*), where *r* is the allele frequency of the recessive disease allele *a*, and *s* is the allele frequency of marker allele B_{1}. The parameter *θ* denotes the recombination fraction, and *δ* = *p*(*aB*_{1}) - *p*(*a*)*p*(*B*_{1}) denotes the disequilibrium coefficient between the marker and the disease locus.
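The four expressions above can be evaluated directly. The following is a minimal sketch, assuming only the recessive-model formulas quoted from Ott; the function name `ott_recessive_params` and the numerical inputs are hypothetical choices for illustration, and `zeta` denotes the probability that a heterozygous parent transmits B_{2}.

```python
def ott_recessive_params(s, r, delta, theta):
    """Return (mu, nu, zeta, tau) under the recessive disease model.

    s     : frequency of marker allele B1
    r     : frequency of the recessive disease allele
    delta : disequilibrium coefficient between marker and disease locus
    theta : recombination fraction
    """
    a = s + delta / r                      # recurring term (s + delta/r)
    mu = a * s                             # B1B1 parent, transmits B1
    nu = a * (1 - s) - theta * delta / r   # B1B2 parent, transmits B1
    zeta = (1 - a) * s + theta * delta / r # B1B2 parent, transmits B2
    tau = (1 - a) * (1 - s)                # B2B2 parent, transmits B2
    return mu, nu, zeta, tau


# Illustrative (hypothetical) inputs: tight linkage with association.
mu, nu, zeta, tau = ott_recessive_params(s=0.4, r=0.2, delta=0.05, theta=0.0)
print(mu + nu + zeta + tau)  # the four probabilities sum to one
print(nu - zeta)             # equals (1 - 2*theta) * delta / r
```

The difference between the two heterozygous-parent terms is (1 - 2*θ*)*δ*/*r*, so it vanishes when there is no linkage (*θ* = 1/2) or no association (*δ* = 0), which is the null hypothesis the TDT targets.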

Let I_{f}, I_{m} and I_{o} be binary indicators of missing genotype information for the father, mother, and offspring, respectively. For example, I_{f} = 1 if the father's genotype is missing and 0 otherwise. Let P_{o11}, P_{o12}, and P_{o22} denote missing rates for offspring with B_{1}B_{1}, B_{1}B_{2}, and B_{2}B_{2} genotypes, respectively. Similarly, let P_{f11}, P_{f12}, and P_{f22} (P_{m11}, P_{m12}, and P_{m22}) denote missing rates for fathers (mothers) with B_{1}B_{1}, B_{1}B_{2}, and B_{2}B_{2} genotypes, respectively. Note that we do not assume any pattern for the nine missingness parameters, i.e., missingness of a given parental genotype can be dependent or independent of the other parent's and/or offspring's genotype. Assuming random mating, one can calculate the conditional probability of ascertaining a complete triad with the father, mother, and affected offspring's genotypes being B_{1}B_{1}, B_{1}B_{2}, and B_{1}B_{2}, respectively, as

Pr(I_{f} = 0 & G_{f} = (B_{1}B_{1}); I_{m} = 0 & G_{m} = (B_{1}B_{2}); I_{o} = 0 & G_{o} = (B_{1}B_{2})|affected offspring) = *μ* × *ζ* × (1 - P_{f11}) × (1 - P_{m12}) × (1 - P_{o12}).

The distribution of the remaining ascertained triads can be derived in a similar manner and is displayed in Table 1. P_{k}^{i, j} and M_{k}^{i, j} are the conditional probability and observed count for each type of triad, where *k* = "0", "1", or "2" represents the total number of B_{1} alleles transmitted to the offspring, and *i*, *j* = "0", "1", or "2" represent the total number of B_{1} alleles for fathers and mothers, respectively.

### Validity of the TDT under various missing patterns

As shown in Table 1, the conditional probability of a heterozygous parent transmitting the B_{1} allele to the affected offspring was calculated as ${T}_{1}=\frac{{P}_{2}^{2,1}}{2}+\frac{{P}_{2}^{1,2}}{2}+{P}_{2}^{1,1}+\frac{{P}_{1}^{1,1}}{2}+\frac{{P}_{1}^{1,0}}{2}+\frac{{P}_{1}^{0,1}}{2}$, and that of transmitting the B_{2} allele as ${T}_{2}=\frac{{P}_{0}^{1,0}}{2}+\frac{{P}_{0}^{0,1}}{2}+{P}_{0}^{1,1}+\frac{{P}_{1}^{1,1}}{2}+\frac{{P}_{1}^{2,1}}{2}+\frac{{P}_{1}^{1,2}}{2}$. When there is no linkage or no association, *T*_{1} = *T*_{2} if and only if offspring genotypes are missing completely at random (*P*_{o11} = *P*_{o12} = *P*_{o22}). Therefore, when offspring genotypes are missing informatively (at least two of *P*_{o11}, *P*_{o12}, and *P*_{o22} are not equal), *T*_{1} ≠ *T*_{2}, and the TDT applied after excluding incomplete triads is not a valid test for linkage and association. The same holds for the HRR proposed by Falk and Rubinstein [6], which is a valid test for association in the presence of linkage.
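The comparison of *T*_{1} and *T*_{2} can be checked numerically. The sketch below is an illustration under stated assumptions, not a reproduction of Table 1: it works under the null with *δ* = 0 (so each parent's genotype-and-transmission probabilities reduce to Hardy-Weinberg terms), treats the two parents as independent, and multiplies by the three observation probabilities as in the worked triad above. The function names and the missing rates are hypothetical.

```python
from itertools import product


def parent_states(s):
    """(B1 count in parental genotype, B1 count transmitted, probability)
    under the null delta = 0, where s is the frequency of allele B1."""
    return [
        (2, 1, s * s),          # B1B1 parent transmits B1 (mu)
        (1, 1, s * (1 - s)),    # B1B2 parent transmits B1 (nu)
        (1, 0, (1 - s) * s),    # B1B2 parent transmits B2 (zeta)
        (0, 0, (1 - s) ** 2),   # B2B2 parent transmits B2 (tau)
    ]


def transmission_probs(s, miss_f, miss_m, miss_o):
    """Return (T1, T2) over complete triads.

    miss_f, miss_m, miss_o map a genotype's B1 count (2, 1, 0) to the
    missing rate for father, mother, and offspring, respectively.
    """
    t1 = t2 = 0.0
    for (gf, af, pf), (gm, am, pm) in product(parent_states(s), repeat=2):
        go = af + am  # offspring genotype as a B1 count
        p_complete = (pf * pm * (1 - miss_f[gf])
                      * (1 - miss_m[gm]) * (1 - miss_o[go]))
        # Each heterozygous parent contributes half of the triad probability
        # to T1 (B1 transmitted) or T2 (B2 transmitted).
        for genotype, allele in ((gf, af), (gm, am)):
            if genotype == 1:
                if allele == 1:
                    t1 += p_complete / 2
                else:
                    t2 += p_complete / 2
    return t1, t2


parents = {2: 0.3, 1: 0.1, 0: 0.0}            # informative parental missingness
offspring_mcar = {2: 0.2, 1: 0.2, 0: 0.2}     # offspring MCAR
offspring_inform = {2: 0.5, 1: 0.2, 0: 0.0}   # offspring informative

print(transmission_probs(0.4, parents, parents, offspring_mcar))
print(transmission_probs(0.4, parents, parents, offspring_inform))
```

With equal offspring missing rates the two values coincide even though parental missingness depends on genotype; with unequal offspring rates they differ, in this example because B_{1}B_{1} offspring are dropped more often, so transmissions of B_{1} are under-represented among complete triads.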

### Simulations

Unrelated nuclear families, each with two affected siblings and complete parental genotypes, were drawn from the Genetic Analysis Workshop 15 simulated data (Problem 3). Of the 100 replicates provided, the first 10 were pooled. To ensure independence among ascertained triads, we randomly selected only one affected offspring from each nuclear family to form the new population for simulations. To reflect realistically complex disease models, missing status was assigned to the affected offspring and their parents. The missing patterns considered were the recessive, dominant, and additive genetic effect models for both major and minor alleles, as indicated in the second column of Table 2. Therefore, only a proportion of families with an affected offspring were eligible for ascertainment, and the total number of families ascertained, including triads, dyads, and monads, was 200.

"SNP6_150" on chromosome 6 and "SNP15_55" on chromosome 15 were used in power and type I error simulations, respectively. Several other SNPs were also considered; they gave similar results, which are not shown here. For SNP6_150 (SNP15_55), genotype frequencies are 0.41 (0.31) for the major homozygote, 0.46 (0.50) for the heterozygote, and 0.13 (0.19) for the minor homozygote. A total of 1000 repetitions were conducted for power and type I error simulations. The TDT and HRR were applied to the subset of complete triads. The 1-TDT [4] and EM-HRR [5] were both applied to the subset of complete triads and dyads.
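To illustrate how a type I error rate for the TDT on complete triads can be estimated by simulation, the following sketch uses synthetic data rather than the Genetic Analysis Workshop 15 data. It assumes Hardy-Weinberg proportions, no association between marker and disease, parents that are always genotyped, and the standard McNemar-type TDT statistic (*b* - *c*)^2/(*b* + *c*) referred to a chi-square distribution with 1 degree of freedom. All function names, the allele frequency, and the missing rates are hypothetical.

```python
import math
import random


def tdt_statistic(b, c):
    """TDT statistic from b (heterozygous parents transmitting B1) and
    c (heterozygous parents transmitting B2)."""
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


def simulate_rejection_rate(miss_o, n_families=200, s=0.5, reps=500,
                            alpha=0.05, seed=1):
    """Fraction of null replicates in which the TDT rejects at level alpha.

    miss_o maps the offspring's B1 count (2, 1, 0) to its missing rate;
    triads with a missing offspring genotype are excluded.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        b = c = 0
        for _ in range(n_families):
            het, transmitted = [], []
            for _parent in range(2):
                alleles = [rng.random() < s, rng.random() < s]  # True = B1
                het.append(alleles[0] != alleles[1])
                transmitted.append(rng.choice(alleles))  # Mendelian transmission
            if rng.random() < miss_o[sum(transmitted)]:
                continue  # offspring genotype missing: incomplete triad
            for is_het, allele in zip(het, transmitted):
                if is_het:
                    if allele:
                        b += 1
                    else:
                        c += 1
        if chi2_1df_pvalue(tdt_statistic(b, c)) < alpha:
            rejections += 1
    return rejections / reps


print(simulate_rejection_rate({2: 0.2, 1: 0.2, 0: 0.2}))  # offspring MCAR
print(simulate_rejection_rate({2: 0.6, 1: 0.2, 0: 0.0}))  # offspring informative
```

Under these assumptions the first rate should stay near the nominal level, while the second should be clearly inflated, because genotype-dependent loss of offspring distorts the transmission counts among the triads that remain.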

## Results

In Table 2, the first column indicates the model of missingness (1, MCAR for all genotypes; 2, informative missingness for parental genotypes and MCAR for offspring genotypes; 3, informative missingness for all genotypes). The three brackets in the second column display missing rates for the father, mother and offspring, respectively. The results in the first seven rows indicate that, when offspring genotypes are MCAR, the TDT and HRR are valid tests at the 5% nominal level, as seen in Guo et al. [10]. However, the 1-TDT and EM-HRR were invalid, with type I error inflated above the nominal level, when parental genotypes were missing informatively (rows 2–7), which matches the results in Allen et al. [7] and Chen [8]. In addition to previous findings, we also discovered that the power of the 1-TDT and EM-HRR can be not only inflated (rows 2–4) but also reduced (rows 5–7) compared with the MCAR scenario (row 1), provided that the missing rate for genotype "11" is preferentially higher or lower.

The remaining missing patterns (rows 8–13) are those in which genotypes of all family members are missing informatively. When incomplete triads are excluded from the analysis, the TDT and HRR are no longer valid for testing linkage and association. However, incorporation of dyads and monads reduced such biases. We also found that the power of the TDT and HRR excluding incomplete triads can be either reduced (rows 8–10) or inflated (rows 11–13) compared with the MCAR scenario (row 1) when the missing rate for genotype "11" is preferentially higher or lower.

## Discussion

The TDT was introduced to test for linkage and association between a marker and a putative disease locus using case-parent triads. Assuming that offspring genotypes are missing completely at random, the TDT excluding incomplete triads is considered a valid test even when parental genotypes are missing informatively. However, if a specific genotype is preferentially missing for parents, the same is likely to occur for the affected offspring.

In this article, the conditional distribution of ascertained triads allowing informative missingness for offspring genotypes as well as their parental genotypes was derived. Through mathematical calculations, we prove that the TDT and HRR do not provide a valid test for linkage and association under such a missing pattern. In addition, we confirmed our conclusion based on computer simulations, since we observed inflated type I error and/or reduced power for the TDT and HRR under such scenarios. Therefore, if the missing pattern for offspring genotypes is not confirmed to be completely at random, a significant result from the TDT or HRR using only complete triads does not assure true association between the marker and a putative disease locus.

## References

1. Spielman RS, McGinnis RE, Ewens WJ: Transmission test for linkage disequilibrium: the insulin gene region and insulin dependent diabetes mellitus. Am J Hum Genet. 1993, 52: 506-516.

2. Clayton D: A generalization of the transmission/disequilibrium test for uncertain haplotype transmission. Am J Hum Genet. 1999, 65: 1170-1177. 10.1086/302577.

3. Weinberg CR: Allowing for missing parents in genetic studies of case-parent triads. Am J Hum Genet. 1999, 64: 1186-1193. 10.1086/302337.

4. Sun F, Flanders W, Yang Q, Khoury J: Transmission disequilibrium test (TDT) with only one parent is available: the 1-TDT. Am J Epidemiol. 1999, 150: 97-104.

5. Guo CY, DeStefano AL, Lunetta KL, Dupuis J, Cupples LA: Expectation maximization algorithm based haplotype relative risk (EM-HRR): test of linkage disequilibrium using incomplete case-parent trios. Hum Hered. 2005, 59: 125-135. 10.1159/000085571.

6. Falk CT, Rubinstein P: Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Ann Hum Genet. 1987, 51: 227-233. 10.1111/j.1469-1809.1987.tb00875.x.

7. Allen AS, Rathouz PJ, Satten GA: Informative missingness in genetic association studies: case-parent designs. Am J Hum Genet. 2003, 72: 671-680. 10.1086/368276.

8. Chen YH: New approach to association testing in case-parent designs under informative parental missingness. Genet Epidemiol. 2004, 27: 131-140. 10.1002/gepi.20004.

9. Ott J: Statistical properties of the haplotype relative risk. Genet Epidemiol. 1989, 6: 127-130. 10.1002/gepi.1370060124.

10. Guo CY, Cui J, Cupples LA: Impact of non-ignorable missingness on genetic tests of linkage and/or association using case-parents trios. BMC Genet. 2005, 6 (Suppl 1): S90. 10.1186/1471-2156-6-S1-S90.

## Acknowledgements

This work was supported by the National Heart, Lung and Blood Institute's Framingham Heart Study (Contract No. N01-HC-25195).

We thank Dr. Bickeböller, Dr. Goddard and three anonymous reviewers for their insightful comments and suggestions.

This article has been published as part of *BMC Proceedings* Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.

## Additional information

### Competing interests

The author(s) declare that they have no competing interests.

### Keywords

- Parental Genotype
- Genetic Analysis Workshop
- B2B2 Genotype
- Inflated Type
- Affected Offspring