Tuesday, 5 August 2014

Summary: Genetics, math and statistics

Variance components

Variance and variance analysis have been discussed earlier in this blog, so for now let's just remind ourselves that variance expresses the variation of the data: how widely the values in the current dataset are spread around their mean. Variance is expressed in the square of the data's units, so the choice of unit affects the size of the variance (for example 1.5 kg vs 1500 g).
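To make the unit effect concrete, here is a small Python sketch. The weights are made-up numbers, not data from this blog. It computes the variance of the same three weights once in kilograms and once in grams.

```python
from statistics import pvariance

# Hypothetical weights, first in kilograms and then converted to grams.
weights_kg = [1.5, 2.0, 2.5]
weights_g = [w * 1000 for w in weights_kg]

var_kg = pvariance(weights_kg)  # variance in kg squared
var_g = pvariance(weights_g)    # variance in g squared

# The conversion factor (1000) enters the variance squared.
ratio = var_g / var_kg
print(var_kg, var_g, ratio)
```

The ratio comes out at about one million, i.e. 1000 squared, even though the animals are exactly the same.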

Variance components are just the different factors that create variation between measurements. 
Variance components in animal breeding can be viewed from two perspectives: genetic vs environmental contribution (which together make up the phenotypic variance), or dam/sire vs residual variance. Additive, maternal and dominance effects together form the genetic contribution. Common and general environment form the environmental contribution. Therefore

Var(P) = Var(A) + Var(M) + Var(D) + Var(e) + Var(eg).

All these contributions are built from variance from the dam, the sire and the residual. The chart below shows how the different components (pillars) are built. Note that sire variance (Var(s)) affects only additive genetic variance, of which it constitutes 25 %. Therefore Var(s) = 0.25 * Var(A), and Var(A) = 4 * Var(s). Dam variance likewise contains 25 % of the additive genetic variance and 25 % of the dominance variance, but 100 % of the common environmental and maternal variance.

Only some of the components above are inherited from parent to offspring. The rest are often ignored, so the formula can be simplified into

Var(P) = Var(A) + Var(M) + Var(e).

One interesting aspect about variation is that when the reliability of the breeding value, rTI, increases, the variance of estimated breeding values increases as well. This is because the higher the reliability, the better we see the differences between the animals, and the more variation we get. However, when the reliability increases, the variance of the true breeding values decreases between animals with the same estimated breeding value. This is of course because the increased reliability brings our estimation closer to the true breeding value. One has to consider variance in its context.
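As a rough illustration of both effects, here is a small Python sketch. It assumes that rTI is the correlation between the true breeding value and its estimate, so that Var(EBV) = rTI² * Var(A), and that the variance of true breeding values among animals sharing the same estimate is (1 - rTI²) * Var(A). These are standard selection-index relationships that are not spelled out in this post, and the function name and numbers are invented for the example.

```python
def ebv_variances(var_a, r_ti):
    """Return (variance of estimated breeding values,
    variance of true breeding values at a given estimate).

    Assumes r_ti is the correlation between true and estimated
    breeding value; this is an assumption of the sketch.
    """
    var_ebv = r_ti ** 2 * var_a
    var_true_given_ebv = (1 - r_ti ** 2) * var_a
    return var_ebv, var_true_given_ebv


# Hypothetical additive genetic variance of 100 units squared.
low = ebv_variances(100.0, 0.5)
high = ebv_variances(100.0, 0.9)
print(low, high)
```

With the higher rTI the first number grows and the second one shrinks, while the two always add up to Var(A).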

Heritability h² is often written as additive genetic variance divided by phenotypic variance, i.e. Var(A) / Var(P). Considering the previous formulas, heritability can be deduced from sire variance: h² = 4 * (Var(s) / Var(P)).
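A minimal Python sketch of this calculation follows. The variance values are hypothetical, and the function name is only for illustration.

```python
def heritability_from_sire(var_sire, var_phenotypic):
    """Estimate heritability as 4 * Var(s) / Var(P)."""
    var_additive = 4 * var_sire  # Var(A) = 4 * Var(s)
    return var_additive / var_phenotypic


# Hypothetical estimates: sire variance 5, phenotypic variance 100.
h2 = heritability_from_sire(5.0, 100.0)
print(h2)
```

Here the sire variance of 5 corresponds to an additive genetic variance of 20, which is one fifth of the phenotypic variance.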


Change of gene frequency under selection against the homozygous genotype is

q1 = (q - s*q²) / (1 - s*q²),

where q1 is the frequency of the selected gene after one generation, q is its original frequency (the square root of q²), s is the coefficient of selection and q² is the original frequency of the homozygous genotype as per the Hardy-Weinberg equation (q² + 2pq + p² = 1).
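Here is a minimal Python sketch of one generation of selection. It assumes that selection acts against the homozygous genotype with frequency q² and uses the textbook recurrence q1 = (q - s*q²) / (1 - s*q²). The function name and example values are illustrative only.

```python
def next_gene_frequency(q, s):
    """Gene frequency after one generation of selection.

    q is the current gene frequency and s the coefficient of
    selection against the homozygous genotype (frequency q**2).
    """
    return (q - s * q ** 2) / (1 - s * q ** 2)


q1_full = next_gene_frequency(0.5, 1.0)  # homozygotes never breed
q1_mild = next_gene_frequency(0.5, 0.1)  # mild selection
q1_none = next_gene_frequency(0.5, 0.0)  # no selection at all
print(q1_full, q1_mild, q1_none)
```

Without selection the frequency stays where it was, and the stronger the selection, the faster the frequency drops.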

Number of generations required
The number of generations required to achieve a certain breeding objective is calculated thusly:

t = 1/qt - 1/q0,

where t is the number of generations, qt is the gene frequency after t generations and q0 is the original gene frequency. This follows from qt = q0 / (1 + tq0).
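Solving qt = q0 / (1 + tq0) for t gives t = 1/qt - 1/q0. The Python sketch below uses that rearrangement and checks it against the original relationship. The function names and the example frequencies are assumptions made for the illustration.

```python
def frequency_after(q0, t):
    """Gene frequency after t generations: q0 / (1 + t * q0)."""
    return q0 / (1 + t * q0)


def generations_required(q0, qt):
    """Generations needed to go from frequency q0 down to qt."""
    return 1 / qt - 1 / q0


# Hypothetical goal: bring the gene frequency from 0.5 down to 0.1.
t = generations_required(0.5, 0.1)
check = frequency_after(0.5, t)
print(t, check)
```

Note how slow this is at low frequencies: halving the frequency again, from 0.1 to 0.05, takes another 10 generations by the same formula.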


Coefficient of selection, s
Coefficient of selection is the proportionate reduction in the gametic contribution of a genotype compared to the standard genotype. It shows how much less animals of a certain genotype, usually the less favorable one, contribute to the next generation when selection takes place. For example, how many fewer gametes unpolled animals contribute compared to polled ones when polled animals are selected for breeding.
The contribution of the favorable genotype is 1 and the coefficient is s, so the contribution of the less favorable genotype is 1 - s.
If s = 0.1, then the contribution of the favorable genotype is 1 and the contribution (and the fitness) of the less favorable one is 1 - 0.1 = 0.9. In practice, for every 100 zygotes produced by the favorable genotype, 90 are produced by the less favorable one.
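The same arithmetic as a tiny Python sketch. The function name is made up for this example.

```python
def relative_contributions(s):
    """Return the contributions (favorable, less favorable)
    for a coefficient of selection s."""
    return 1.0, 1.0 - s


favorable, less_favorable = relative_contributions(0.1)

# Expected zygotes from the less favorable genotype for every
# 100 zygotes from the favorable one.
zygotes_per_100 = 100 * less_favorable / favorable
print(favorable, less_favorable, zygotes_per_100)
```

With s = 1 the less favorable genotype contributes nothing at all, and with s = 0 both genotypes contribute equally.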
