| SU1 |
100 data sets of 90 unrelated individuals simulated with constant recombination rate across the region, constant population size, and random mating. Each of the 100 data sets consisted of 1 Mb of sequence. |
| SU2 |
Same as SU1, but with the addition of a variable recombination rate across the region. |
| SU3 |
Same as SU2, except a model of demography consistent with white Americans was used. |
| SU4 |
Same as SU3, with 2% missing data (missing at random). |
| SU-100kb |
Since some studies may be concerned only with the performance of phasing algorithms on sequences shorter than 1 Mb, we simulated a set of data sets identical to set SU3, except that the sequences were only 100 kb in length. Each of these 100-kb data sets was created by subsampling a set of 1,180 simulated haplotypes. The remaining 1,000 haplotypes were used to estimate the true population haplotype frequencies. This allowed a comparison of each method's ability to predict the haplotype frequencies in a small region of interest. |
| ST1 |
100 data sets of 30 trios simulated with constant recombination rate across the region, constant population size, and random mating. Each of the 100 data sets consisted of 1 Mb of sequence. |
| ST2 |
Same as ST1, but with the addition of a variable recombination rate across the region. |
| ST3 |
Same as ST2, except a model of demography consistent with white Americans was used. |
| ST4 |
Same as ST3, with 2% missing data (missing at random). |
| RU |
We used the HapMap CEU sample to create artificial data sets of unrelated individuals by simply removing the children from each of the trios. Since the phase of a large number of heterozygous genotypes is known from the trios, we can use these phase-known sites to assess the performance of the algorithms for unrelated data. One hundred 1-Mb regions were selected at random from the CEU sample and processed in this way. |
| RT-CEU |
100 data sets consisting of 30 HapMap CEU trios across 1 Mb of sequence. For each data set, we created 30 new data sets, each with a different trio altered so that the transmission status of the alleles in one of the parents is switched. By switching only one trio at a time to create a new data set, the majority of the genotypes are unaltered, and a minimum amount of new missing data is introduced. In each region, the error rates for the different algorithms were calculated using only the phase estimates in the altered trios. |
| RT-YRI |
Same as RT-CEU, except 30 HapMap YRI trios were used. |
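The SU-100kb construction above (1,180 simulated haplotypes, of which 1,000 are held back to estimate population haplotype frequencies) can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function names, the random seed, and the pairing of consecutive sampled haplotypes into individuals are assumptions, and the toy pool of random 5-SNP haplotypes stands in for real simulated data.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Hap = Tuple[int, ...]  # one haplotype, alleles coded 0/1


def split_haplotypes(pool: List[Hap], n_individuals: int = 90, seed: int = 0):
    """Split a haplotype pool into a sample of individuals and a reference set."""
    rng = random.Random(seed)
    order = list(range(len(pool)))
    rng.shuffle(order)
    n_sampled = 2 * n_individuals  # two haplotypes per individual
    sampled = [pool[i] for i in order[:n_sampled]]
    reference = [pool[i] for i in order[n_sampled:]]
    # Assumed pairing: consecutive sampled haplotypes form one individual.
    individuals = [(sampled[i], sampled[i + 1]) for i in range(0, n_sampled, 2)]
    return individuals, reference


def haplotype_frequencies(reference: List[Hap]) -> Dict[Hap, float]:
    """Relative frequency of each distinct haplotype in the reference set."""
    counts = Counter(reference)
    total = len(reference)
    return {hap: n / total for hap, n in counts.items()}


# Toy pool: 1,180 random haplotypes over 5 SNPs (illustrative only).
toy_rng = random.Random(1)
pool = [tuple(toy_rng.randint(0, 1) for _ in range(5)) for _ in range(1180)]
individuals, reference = split_haplotypes(pool)
freqs = haplotype_frequencies(reference)
print(len(individuals), len(reference))
```

With 90 individuals the split leaves 1,180 - 180 = 1,000 haplotypes in the reference set, matching the counts quoted for SU-100kb.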
| Input files | Answer files |
| SU1/genos.haps.1 | SU1/genos.haps.res.1 |
| ST1/pgenos.haps.1, ST1/cgenos.haps.1 | ST1/genos.haps.res.1 |
| RT-CEU1/pgenos.haps.1, RT-CEU/cgenos.haps.1.5 | RT-CEU/genos.haps.res.1.5 |
| HAP |
Lin S, Cutler DJ, Zwick ME, Chakravarti A (2002) Haplotype inference in random population samples. Am J Hum Genet 71:1129-1137 |
| HAP2 |
Eskin E, Halperin E, Karp R (2003) Efficient reconstruction of haplotype structure via perfect phylogeny. J Bioinform Comput Biol 1:1-20 |
| PLEM |
Qin ZS, Niu T, Liu JS (2002) Partition-ligation expectation-maximization algorithm for haplotype inference with single-nucleotide polymorphisms. Am J Hum Genet 71:1242-1247 |
| wphase |
Unpublished algorithm by Nick Patterson, Broad Institute of MIT and Harvard (see Marchini et al. (2006) for more details). |
| PHASE v2.1 |
Stephens M, Donnelly P (2003) A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 73:1162-1169 |
| fastPHASE |
As yet unpublished algorithm written by Paul Scheet and Matthew Stephens. |
| Switch |
Switch error is the percentage of possible switches in haplotype orientation that are needed to recover the correct phase in an individual or trio (Lin et al. 2004). |
| IHP |
Incorrect Haplotype Percentage (IHP) is the percentage of ambiguous individuals whose haplotype estimates are not completely correct (Stephens et al. 2001). It is worth noting that, as the length of the considered region increases, all methods will find it harder to correctly infer entire haplotypes. Thus, this measure will increase with genetic distance and eventually reach 100%, once the region becomes long enough. |
| IGP |
Incorrect Genotype Percentage (IGP) is the number of genotypes (ambiguous heterozygotes and missing genotypes) whose phase was incorrectly inferred, expressed as a percentage of the total number of genotypes. To calculate this measure, we first aligned the estimated haplotypes with the true haplotypes, to minimize the number of sites at which there were phase differences. For the trio data, this alignment is fixed by the known transmission status of alleles at nonambiguous sites. For the real data sets in which the truth for the missing data was not known, we removed such sites from consideration in both the numerator and the denominator. We believe the utility of this measure lies in its comparison with levels of genotyping error and missing data. |
| Missing Error |
Missing error is the percentage of incorrectly inferred missing data. To calculate this measure, we first aligned the estimated haplotypes with the true haplotypes, to minimize the number of sites at which there were phase differences. This alignment ignored the sites at which there was missing data. We then compared the estimated and true haplotypes at the sites of missing data, counted the number of incorrectly imputed alleles, and expressed this as a percentage of the total number of missing data. |
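The switch error, IHP, and IGP definitions above can be illustrated with a short sketch for unrelated individuals without missing data. It is not the evaluation code behind the tables below. It assumes alleles are coded 0/1, that estimated haplotypes are consistent with the true genotypes, that an "ambiguous" individual is one with at least two heterozygous sites, and that totals are pooled over individuals before taking percentages. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Hap = Sequence[int]      # one haplotype, alleles coded 0/1
Pair = Tuple[Hap, Hap]   # the two haplotypes of one individual


def het_sites(pair: Pair) -> List[int]:
    """Indices at which the two haplotypes differ (heterozygous sites)."""
    a, b = pair
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def count_switches(true: Pair, est: Pair) -> Tuple[int, int]:
    """Return (switches needed, possible switches) for one individual."""
    sites = het_sites(true)
    # True where the first estimated haplotype carries the first true allele.
    same = [est[0][i] == true[0][i] for i in sites]
    switches = sum(1 for p, q in zip(same, same[1:]) if p != q)
    return switches, max(len(sites) - 1, 0)


def phase_mismatches(true: Pair, est: Pair) -> int:
    """Het sites phased wrongly under the better of the two alignments."""
    sites = het_sites(true)
    wrong = sum(1 for i in sites if est[0][i] != true[0][i])
    return min(wrong, len(sites) - wrong)


def pct(numerator: int, denominator: int) -> float:
    """Percentage, or NaN when nothing was counted."""
    return 100.0 * numerator / denominator if denominator else float("nan")


def summarise(truth: List[Pair], estimates: List[Pair]) -> Tuple[float, float, float]:
    """Return (switch error, IHP, IGP) as percentages pooled over individuals."""
    switches = possible = ambiguous = incorrect = mismatched = genotypes = 0
    for true, est in zip(truth, estimates):
        s, p = count_switches(true, est)
        wrong = phase_mismatches(true, est)
        switches += s
        possible += p
        mismatched += wrong
        genotypes += len(true[0])
        if len(het_sites(true)) >= 2:  # assumed meaning of "ambiguous"
            ambiguous += 1
            incorrect += int(wrong > 0)
    return pct(switches, possible), pct(incorrect, ambiguous), pct(mismatched, genotypes)


# Three toy individuals over 5 SNPs (illustrative data only).
truth = [((0, 0, 1, 1, 0), (1, 1, 0, 0, 0)),
         ((0, 1, 0, 0, 1), (1, 0, 0, 0, 1)),
         ((0, 0, 0, 0, 0), (1, 0, 0, 0, 0))]
estimates = [((0, 0, 0, 0, 0), (1, 1, 1, 1, 0)),   # one switch between sites 2 and 3
             ((1, 0, 0, 0, 1), (0, 1, 0, 0, 1)),   # correct, haplotypes listed in swapped order
             ((0, 0, 0, 0, 0), (1, 0, 0, 0, 0))]   # single het site, not ambiguous
result = summarise(truth, estimates)
print(result)
```

In the toy data one of four possible switches is needed, one of two ambiguous individuals is wrong, and two of fifteen genotypes are mis-phased, which shows why IHP is much harsher than the other two measures for the same estimate.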
| SU1 | Switch | IHP | IGP |
| PHASE v2.1 | 2.41 | 35.46 | 2.54 |
| wphase | 3.65 | 48.02 | 3.50 |
| fastPHASE | 4.47 | 65.25 | 5.65 |
| HAP | 6.53 | 88.62 | 7.86 |
| HAP2 | 6.92 | 73.50 | 7.11 |
| PLEM | 8.98 | 61.13 | 5.81 |
| SU2 | Switch | IHP | IGP |
| PHASE v2.1 | 2.21 | 40.39 | 2.45 |
| wphase | 3.66 | 55.51 | 4.29 |
| fastPHASE | 6.92 | 88.68 | 7.84 |
| HAP | 9.75 | 97.15 | 9.47 |
| PLEM | 13.18 | 83.42 | 9.47 |
| HAP2 | 15.14 | 99.00 | 11.03 |
| SU3 | Switch | IHP | IGP |
| PHASE v2.1 | 4.79 | 59.08 | 5.08 |
| fastPHASE | 5.64 | 76.22 | 7.00 |
| wphase | 6.62 | 66.42 | 5.79 |
| HAP | 7.13 | 90.06 | 8.47 |
| HAP2 | 8.21 | 85.10 | 8.59 |
| PLEM | 11.02 | 81.42 | 8.21 |
| SU4 | Switch | IHP | IGP | Missing Error |
| PHASE v2.1 | 5.04 | 60.80 | 5.24 | 7.29 |
| fastPHASE | 5.75 | 75.98 | 6.75 | 9.06 |
| wphase | 6.60 | 67.97 | 5.98 | 10.14 |
| HAP | 7.44 | 90.64 | 8.39 | 11.59 |
| HAP2 | 8.73 | 87.06 | 8.65 | 15.02 |
| PLEM | 10.86 | 81.49 | 8.03 | 19.36 |
| SU-100kb | Switch | IHP | IGP |
| PHASE v2.1 | 4.16 | 17.17 | 1.53 |
| wphase | 5.08 | 19.44 | 1.77 |
| HAP | 5.39 | 21.83 | 1.94 |
| HAP2 | 5.44 | 22.22 | 2.05 |
| PLEM | 7.92 | 24.69 | 2.34 |
| ST1 | Switch | IHP | IGP |
| PHASE v2.1 | 0.74 | 5.55 | 0.06 |
| wphase | 0.98 | 6.52 | 0.08 |
| HAP | 2.14 | 12.79 | 0.17 |
| HAP2 | 2.58 | 17.16 | 0.23 |
| PLEM | 3.03 | 18.59 | 0.24 |
| ST2 | Switch | IHP | IGP |
| PHASE v2.1 | 0.22 | 1.89 | 0.02 |
| wphase | 0.22 | 1.89 | 0.02 |
| HAP | 1.52 | 11.44 | 0.11 |
| PLEM | 2.88 | 21.15 | 0.20 |
| HAP2 | 5.97 | 36.23 | 0.43 |
| ST3 | Switch | IHP | IGP |
| PHASE v2.1 | 1.36 | 10.36 | 0.13 |
| wphase | 2.23 | 14.21 | 0.20 |
| HAP | 2.40 | 17.01 | 0.21 |
| HAP2 | 2.95 | 20.76 | 0.27 |
| PLEM | 3.81 | 24.80 | 0.33 |
| ST4 | Switch | IHP | IGP | Missing Error |
| PHASE v2.1 | 1.48 | 10.30 | 0.13 | 1.46 |
| wphase | 2.34 | 14.67 | 0.20 | 1.89 |
| HAP | 2.63 | 17.83 | 0.22 | 4.36 |
| HAP2 | 3.17 | 21.31 | 0.30 | 5.26 |
| PLEM | 4.12 | 25.06 | 0.36 | 3.38 |
| RU | Switch | IHP | IGP |
| PHASE v2.1 | 8.41 | 77.66 | 2.69 |
| fastPHASE | 9.21 | 83.57 | 3.02 |
| HAP | 10.72 | 87.96 | 3.26 |
| HAP2 | 12.56 | 87.67 | 3.39 |
| RT-CEU | Switch | IHP | IGP |
| PHASE v2.1 | 0.53 | 6.20 | 0.05 |
| HAP2 | 2.05 | 20.42 | 0.33 |
| HAP | 2.95 | 20.78 | 0.40 |
| RT-YRI | Switch | IHP | IGP |
| PHASE v2.1 | 2.16 | 15.7 | 0.16 |
| HAP | 4.44 | 29.25 | 0.33 |