#+title: Genetic drift
#+seq_todo: TODO | DONE
#+property: cache no
* A simple model of evolution
Evolution is the result of changes in the genetic composition of
populations over time. One of the simplest models of evolution is as
follows. There is a population of N individuals, among which there
are two genetic types: red and blue. Here is the initial
generation of the population (N=10).
#+begin_src ditaa :file drift-1-gen.png :cmdline -r :exports none
/----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+
Generation 1 |cRED| |cBLU| |cBLU| |cBLU| |cRED| |cRED| |cBLU| |cRED| |cRED| |cRED|
| | | | | | | | | | | | | | | | | | | |
+----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/
#+end_src
#+results:
There is no mutation, no selection and no sex; the next generation
is made up by randomly choosing 10 individuals from the previous
generation. A single individual can be chosen more than once,
or not at all; the number of times an individual is chosen
corresponds to the number of progeny it has in the next
generation. Even without mutation or natural selection the
proportions of red and blue types will change, because different
individuals will have different numbers of offspring, by chance.
So the first two generations might look like this.
#+begin_src ditaa :file drift-2-gen.png :cmdline -r :exports none
/----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+
Generation 1 |cRED| |cBLU| |cBLU| |cBLU| |cRED| |cRED| |cBLU| |cRED| |cRED| |cRED|
| | | | | | | | | | | | | | | | | | | |
+----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/
/----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+
Generation 2 |cBLU| |cBLU| |cRED| |cRED| |cRED| |cBLU| |cRED| |cRED| |cBLU| |cBLU|
| | | | | | | | | | | | | | | | | | | |
+----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/
#+end_src
#+results:
This is a form of evolution called "genetic drift". It is inevitable,
although if the population is very large it will have less effect.
Let X_t be the number of red individuals in generation t, and let p_t
be the proportion of red individuals. So X_1 is 6 and p_1 is 0.6. To
choose the number of red individuals in generation 2 we make 10
choices, each time having probability 6/10 of getting a red
individual. So X_2 is a /binomial/ random variable, with 10 trials and
success probability 0.6. In general, the random process is described
by the following transition probabilities.
#+begin_src latex :file transprob.png :exports results
\begin{equation}
\Pr(X_t=j|X_{t-1}=i) = \frac{j(j-1)}{2}\Big(\frac{i}{N}\Big)^j\Big(\frac{N-i}{N}\Big)^{n-j}
\end{equation}
#+end_src
#+results:
We can simulate the evolution over many generations in R. This code
simulates the change in frequency in a single population over 100
generations. We'll make the population larger (N=1000) but still
start off with 60% red individuals.
#+source: simpledrift(N=1000, X1=600, ngens=100)
#+begin_src R :file simpledrift.png :exports code
p <- numeric(ngens)
p[1] <- X1/N
for(g in 2:ngens)
p[g] <- rbinom(1, size=N, prob=p[g-1]) / N
plot(p, type="l", ylim=c(0,1), xlab="Generation", ylab="Proportion red")
#+end_src
#+results: simpledrift
But how variable is this process? To answer this we need to repeat
the simulation many times (i.e. simulate many identical but
independent populations). We could do that as follows
#+begin_src R :session t
drift.slow <- function(N, X1, ngens, nreps) {
p <- matrix(NA, nrow=ngens, ncol=nreps)
p[1,] <- X1/N
for(rep in 1:nreps) {
for(g in 2:ngens)
p[g,rep] <- rbinom(1, size=N, prob=p[g-1,rep]) / N
}
p
}
#+end_src
But this is not a good implementation. One should make use of
"vectorisation", which makes the simulation much more efficient when
there are many replicates. Note the way that rbinom simulates
all replicates at once, but still one generation at a time.
#+begin_src R :session t
drift.faster <- function(N, X1, ngens, nreps) {
p <- matrix(NA, nrow=ngens, ncol=nreps)
p[1,] <- X1/N
for(gen in 2:ngens)
p[gen,] <- rbinom(n=nreps, size=N, prob=p[gen-1,]) / N
p
}
#+end_src
To run the simulation:
#+source: drift(N=1000, X1=600, nreps=10, ngens=100)
#+begin_src R :session t :file repdrift.png :exports code
p <- drift.faster(N, X1, ngens, nreps)
matplot(p, type="l", ylim=c(0,1), lty=1)
#+end_src
#+results: drift
And let's quickly see how much of a speed difference the vectorisation
makes.
#+source: compare-times(N=1000, X1=600, nreps=1000, ngens=100)
#+begin_src R :session t :colnames t :results output
functions <- c(drift.slow=drift.slow, drift.faster=drift.faster)
times <- sapply(functions, function(f) as.numeric(system.time(f(N, X1, ngens, nreps))[1]))
print(times)
cat(sprintf("\nFactor speed-up = %.1f\n", times[1] / times[2]))
#+end_src
#+results: compare-times
: drift.slow drift.faster
: 6.053 0.204
:
: Factor speed-up = 29.7
* Footnotes...
* Config :noexport:...
* Tasks :noexport:...