##############################
## Simulated data - sim.tgz ##
##############################

There are 3 main parts to the simulated data.

Part I : 100 datasets, 1Mb regions, 30 trios, 4 scenarios

  1: vanilla coalescent (recombination rate varies across datasets) - (sub-directories trialg1)
  2: 1 + hotspot model - (sub-directories trialg2)
  3: 2 + demography - (sub-directories trialg3)
  4: (a) Another realisation of scenario 3 - (sub-directories trialg4)
     (b) 4(a) + missing data

Part II : Same as Part I but with 90 unrelated individuals rather than 30 trios - (sub-directories trialh1-4)

Part III : 100 datasets, 100 Kb regions, 90 unrelated individuals, coalescent + hotspot + demography, no missing data. This part is designed to compare the methods on small regions and will be of considerable interest to the many people who collect data at this scale. There is likely to be less difference between the methods at this scale - (sub-directory trialjj)

NB : Please remember that for trialg4 and trialh4 I need answers to the datasets both with and without missing data.

The directories contain the following files

pos.info:i     [i=1-100] - positions of the loci in each dataset
pgenos.haps:i  [i=1-100] - the parents' genotypes (for the trio datasets trialg1-4)
cgenos.haps:i  [i=1-100] - the children's genotypes (for the trio datasets trialg1-4)
genos.haps:i   [i=1-100] - the individual genotypes (for the unrelated datasets trialh1-4, trialjj)

################################################
## Real data - alt_univ.tgz and real_univ.tgz ##
################################################

alt_univ.tgz contains datasets constructed from HapMap data for 100 regions. For each region, 30 datasets were constructed, each with one "alternative universe" (AU) child. For each region, the performance of the methods will be assessed using only the trios with an AU child. Thus there are 3000 datasets to phase in this directory.

The basic naming format is similar to before

pgenos.haps.i    [i=1-100] - the parents' genotypes
posinfo.haps.i   [i=1-100] - positions of the loci in each dataset

but also

cgenos.haps.i.j  [i=1-100, j=1-30] - the ith dataset with the jth child replaced by an alternative universe child

real_univ.tgz contains 100 regions without children (so just the pgenos.haps.i and posinfo.i files)
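
As a rough check on the alt_univ.tgz naming scheme above, the expected file names can be generated and compared against an unpacked directory. This is only a sketch: it assumes the names follow the patterns listed above exactly, and the function names (alt_univ_names, missing_files) are made up for illustration and are not part of the data release.

```python
from pathlib import Path

N_REGIONS = 100   # i = 1..100, as listed above
N_CHILDREN = 30   # j = 1..30, one alternative universe child per dataset


def alt_univ_names(n_regions=N_REGIONS, n_children=N_CHILDREN):
    """Return the expected file names, grouped by kind (hypothetical helper)."""
    regions = range(1, n_regions + 1)
    return {
        "parents": [f"pgenos.haps.{i}" for i in regions],
        "positions": [f"posinfo.haps.{i}" for i in regions],
        "children": [
            f"cgenos.haps.{i}.{j}"
            for i in regions
            for j in range(1, n_children + 1)
        ],
    }


def missing_files(directory, names):
    """List the expected names that are not present in `directory` (hypothetical helper)."""
    root = Path(directory)
    return [n for group in names.values() for n in group if not (root / n).exists()]


names = alt_univ_names()
# 100 regions x 30 AU children gives the 3000 datasets to phase
print(len(names["children"]))
```

Calling missing_files with the path of the unpacked alt_univ directory (whatever it is on your system) returns an empty list if every expected file is there. If the archive uses slightly different names, adjust the patterns in alt_univ_names accordingly.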