干细胞之家 (Stem Cell Home) - China's first-stop portal for the stem cell industry

Title: Model Selection and the Molecular Clock

Author: 春天的风筝    Time: 2009-4-23 09:05    Title: Model Selection and the Molecular Clock

Oliver G. Pybus is in the Department of Zoology, University of Oxford, Oxford, United Kingdom. E-mail: oliver.pybus@zoo.ox.ac.uk

There are no mathematical equations in On the Origin of Species. A good thing too, you might think, and it is undoubtedly true that Darwin's clear and flowing narrative style helped ensure the popularity of his writings. Modern research in evolutionary biology can make for less easy reading. Much of it concerns the development of an expanding arsenal of mathematical and statistical techniques, necessary to do battle with the relentless onslaught of gene and genome sequences. Of course, the discrete, ordered nature of genetic information and the stochastic character of Mendelian inheritance have naturally lent themselves to numerical analysis. Consequently, the mathematical foundations of evolutionary genetics have, somewhat unusually for biology, tended to precede the data to which they are applied. The Genetical Theory of Natural Selection by R. A. Fisher, published only fifty years after Darwin's death, is full of equations [1].

The simplest weapon in the armoury of evolutionary genetics is genetic distance, a measure of the number of evolutionary changes between sequences from different organisms. Genetic distances can be calculated for a pair of sequences by simply counting the number of nucleotides or amino acids that differ between them. Unfortunately, this approach underestimates the amount of evolutionary change because it does not account for the fact that each site may change more than once during evolutionary history. Statistical tools, called nucleotide or amino acid substitution models, are therefore used to estimate genetic distances between sequences. There is a bewildering hierarchy of substitution models available, each making a different and specific set of assumptions about the evolutionary process of sequence change [2]. The simplest models assume that all types of mutation are equivalent and that all sites in a sequence change at the same rate. More complex models loosen these assumptions, allowing heterogeneity in the process of sequence change, but they can be reliably applied to larger datasets only. The task of deciding amongst these competing models is known as statistical model selection and can be thought of as a trade-off between model accuracy and model complexity. The degree to which a model fits the data at hand (accuracy) is always improved by adding more parameters (complexity), but since the amount of data remains constant the statistical uncertainty about each parameter increases. In addition, the biological meaning of each parameter becomes harder to decipher so the explanatory power of the model decreases (Figure 1). Thus the chosen model should have enough parameters to adequately explain the data, but no more. Once an appropriate model is chosen, genetic distances are combined using other statistical techniques to generate a phylogenetic tree of the sequences being studied [2]. The lengths of the branches in the phylogeny thus represent estimated numbers of sequence changes (Figure 2A).
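
The undercounting described above can be illustrated with the simplest substitution model. The sketch below compares the raw proportion of differing sites with the Jukes-Cantor (1969) corrected distance, which assumes that all changes are equally likely and all sites evolve at the same rate. The two sequences and the function names are invented for illustration; they are not taken from the article.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)

def jc69_distance(p):
    """Jukes-Cantor corrected distance, in expected substitutions per site."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 correction is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)

# Hypothetical aligned sequences, 20 sites long, differing at 3 sites.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTTCGTACGA"

p = p_distance(seq_a, seq_b)
d = jc69_distance(p)
print(f"observed proportion of differences: {p:.3f}")
print(f"JC69 corrected distance:            {d:.3f}")
```

The corrected distance is always at least as large as the raw proportion, and the gap widens as sequences diverge, because more sites are expected to have changed more than once.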

Figure 1. (A) A hypothetical dataset consisting of thirteen points plotted on two axes. (B) A simple model, represented by a straight line through the points. This model has few parameters but does not fit the data particularly well. (C) A very complex model, which fits the data almost perfectly but has too many parameters. The estimated parameters tell us little about the biological process that gave rise to the data. (D) A model with an intermediate number of parameters represented by a curve. This fits the data well but still has relatively few parameters and therefore has greater explanatory power.
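
The accuracy-versus-complexity trade-off can be made concrete with an information criterion. The sketch below uses the Akaike information criterion (AIC = 2k - 2 ln L), which the article does not name; it is assumed here as one common choice. It compares a one-parameter model (JC69) with a two-parameter model (K80, which separates transitions from transversions) on a single pair of sequences. The site counts are hypothetical, and the closed-form likelihoods hold only when the observed proportions are within each model's attainable range.

```python
import math

PURINES = {"A", "G"}

def count_site_patterns(seq_a, seq_b):
    """Count identical sites, transitions and transversions (A, C, G, T only)."""
    same = ts = tv = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            same += 1
        elif (a in PURINES) == (b in PURINES):
            ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1
    return same, ts, tv

def _xlogy(count, prob):
    # Contribution of one class of sites; zero counts contribute nothing.
    return count * math.log(prob) if count else 0.0

def max_loglik_jc69(same, ts, tv):
    """Maximised pairwise log-likelihood under JC69 (assumes p < 0.75)."""
    n = same + ts + tv
    p = (ts + tv) / n
    # 4 identical site patterns and 12 differing ones, equally likely within class.
    return _xlogy(same, (1 - p) / 4) + _xlogy(ts + tv, p / 12)

def max_loglik_k80(same, ts, tv):
    """Maximised pairwise log-likelihood under K80 (assumes moderate divergence)."""
    n = same + ts + tv
    P, Q = ts / n, tv / n
    # 4 identical, 4 transition and 8 transversion site patterns.
    return _xlogy(same, (1 - P - Q) / 4) + _xlogy(ts, P / 4) + _xlogy(tv, Q / 8)

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

# Hypothetical (identical, transition, transversion) counts for 1000 sites.
for label, counts in [("transition-rich", (900, 80, 20)), ("balanced", (900, 33, 67))]:
    aic_jc = aic(max_loglik_jc69(*counts), 1)
    aic_k80 = aic(max_loglik_k80(*counts), 2)
    best = "K80" if aic_k80 < aic_jc else "JC69"
    print(f"{label}: AIC JC69 = {aic_jc:.1f}, AIC K80 = {aic_k80:.1f}, prefer {best}")
```

K80 never fits worse than JC69, since JC69 is a special case of it, but the extra parameter is only worth its penalty when transitions and transversions really do occur at different rates.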

Figure 2. The circles represent common ancestors: the common ancestor of A, B, and C (black circle), the common ancestor of D and E (white circle), and the overall common ancestor of all five (grey circle). (A) A phylogeny generated using the no-clock model of evolution. Such phylogenies are "unrooted"; that is, the position of the overall common ancestor cannot be identified. Branch lengths represent genetic distance, not time. (B) A phylogeny generated using the relaxed-clock model. The overall common ancestor is identified; hence the phylogeny is "rooted". Branch lengths represent time. The thickness of each branch indicates the rate of evolution of that branch. (C) A phylogeny generated using the strict-clock model. This is the same as the relaxed-clock case, except that the rate of evolution is identical for every branch.

However, genetic distances are rather crude indicators of evolutionary history. A small genetic distance between two sequences may suggest a recent common ancestor, but is also consistent with a slower rate of sequence change and a more ancient common ancestor (i.e., genetic distance = evolutionary rate × time). Genetic distances alone are therefore of little use if, for example, we wish to know the age of the common ancestor of mammals, or the rate at which bacterial antibiotic resistance genes evolve. Such questions can be answered only if independent information about rates or divergence times is found. Often paleontology or biogeography can provide a date for one or more points in a phylogeny, which are then used to "calibrate" the timescale for the rest of the phylogeny [3,4]. Less commonly, sequences sampled at different times can provide an estimated rate of evolution; this requires either a very fast evolutionary rate (e.g., rapidly evolving RNA viruses [5]) or widely spaced sampling times (e.g., ancient DNA from sub-fossil samples [6]). Whatever the source of the independent information, it is usual to calibrate a phylogeny by assuming that all its branches evolve at the same rate, i.e., that there is a constant but stochastic "molecular clock" of sequence change. The concept of the molecular clock originated in the early 1960s and has since been used widely, more as a result of its downright usefulness than its biological accuracy, as it is clear that rates of evolution can and do vary considerably among species [4,7]. Evolutionary rates depend on a combination of factors: generation time, population size, metabolic rate, the efficacy of DNA repair, and the degree to which mutations are beneficial or deleterious, all of which may vary among species. As the geneticist Steve Jones recently remarked, evolutionary biologists seem to use the molecular clock "with our fingers crossed" [8].
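
A strict-clock calibration amounts to simple arithmetic, sketched below with invented numbers. One assumption is added to the article's shorthand: the distance between two present-day sequences accumulates along both lineages since their common ancestor, so it equals 2 × rate × time. The distances and the calibration date are hypothetical.

```python
def rate_from_calibration(distance, divergence_time):
    """Rate per site per unit time along one lineage, from a dated split.

    A pairwise distance spans two lineages, hence the factor of two.
    """
    return distance / (2 * divergence_time)

def divergence_time(distance, rate):
    """Time back to the common ancestor, assuming a strict clock."""
    return distance / (2 * rate)

# Hypothetical calibration: a pair separated by 0.20 substitutions per site
# whose common ancestor is dated to 50 million years ago by a fossil.
rate = rate_from_calibration(distance=0.20, divergence_time=50.0)

# Date an undated pair separated by 0.08 substitutions per site.
age = divergence_time(distance=0.08, rate=rate)

print(f"calibrated rate: {rate:.4f} substitutions per site per Myr")
print(f"estimated age of undated split: {age:.1f} Myr")
```

The second estimate is only as good as the assumption that both pairs evolved at the same rate, which is exactly the assumption the molecular clock makes.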

The article by Alexei Drummond, Andrew Rambaut, and colleagues in this issue of PLoS Biology [9] gives us reason to uncross our fingers a little. The paper describes a new "relaxed" approach to the estimation of phylogeny divergence times. A relaxed molecular clock is a phylogenetic technique that allows the rate of sequence evolution to vary among groups of organisms, or more generally, among different parts of a phylogeny (Figure 2B). The use of a single rate across the whole phylogeny is termed a "strict" clock (Figure 2C). Such methods have developed steadily in the past ten years (e.g., [10,11]) and can now be applied to large datasets due to the continued increase in computer processing speed. A common aspect of previous relaxed-clock approaches is that they considered closely related organisms to have similar rates of evolution. On a phylogeny this means that neighbouring branches have more similar rates than distant branches, a property termed "autocorrelation". The idea that rates of sequence evolution can be "inherited" in this way played an important role in the history and development of evolutionary theory [7,12], and it is well known that viruses, bacteria, and animals evolve at hugely different rates, but the assumption has never been comprehensively tested. Drummond et al.'s new method allows phylogeny branches to vary in rate, but it does not assume these rates are correlated among adjacent branches (Figure 2). Thus their relaxed clock is slightly more laid-back than its predecessors, and crucially it can estimate the level of autocorrelation in each dataset. A further advantage of their approach is that it simultaneously estimates both phylogeny shape and rate variation among phylogeny branches, two tasks that previously had to be performed separately.
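
The difference between a strict clock and an uncorrelated relaxed clock can be sketched as a small simulation. This is not Drummond et al.'s inference method, only a forward simulation of the assumption behind it: each branch draws its own rate independently of its neighbours. The lognormal distribution, the branch durations, and the parameter values are choices made for this sketch.

```python
import math
import random

def strict_clock_lengths(durations, rate):
    """Branch lengths (substitutions per site) when every branch shares one rate."""
    return [rate * t for t in durations]

def uncorrelated_lognormal_lengths(durations, mean_rate, sigma, rng):
    """Branch lengths when each branch draws an independent lognormal rate.

    mu is shifted so that the expected rate equals mean_rate.
    """
    mu = math.log(mean_rate) - sigma ** 2 / 2
    return [rng.lognormvariate(mu, sigma) * t for t in durations]

# Hypothetical branch durations in millions of years.
durations = [5.0, 5.0, 10.0, 3.0, 7.0]
rng = random.Random(1)  # fixed seed so the run is repeatable

strict = strict_clock_lengths(durations, rate=0.01)
relaxed = uncorrelated_lognormal_lengths(durations, mean_rate=0.01, sigma=0.5, rng=rng)

for t, s, r in zip(durations, strict, relaxed):
    print(f"duration {t:5.1f} Myr  strict {s:.4f}  relaxed {r:.4f}")
```

Setting sigma to zero recovers the strict clock, which is one way to see the strict clock as a special case within the same family of models.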

We should note that Drummond et al.'s paper emphasises the fact that molecular clocks exist as a family of statistical models, analogous to the hierarchy of substitution models discussed earlier, among which the most appropriate model should be chosen. When constructing a phylogeny, many researchers opt not to "enforce" a molecular clock, perhaps believing that they are avoiding having to make any possibly unrealistic assumptions about evolutionary rates. In truth, this "no-clock" approach is equivalent to using an evolutionary model that assumes no limit to the variation in evolutionary rate among branches. In fact, it has a separate evolutionary rate parameter for each branch in the phylogeny. If, as is often the case, rate variation among organisms is not great, then the no-clock model will have an unnecessarily large number of parameters, leading to an increase in statistical uncertainty and, in some circumstances, poorer estimates of phylogeny shape. Drummond et al. analysed five large datasets, containing sequences from bacteria, yeast, plants, animals, and primates, and found that in every case their relaxed-clock model identified the "true" phylogeny slightly more often than the no-clock model. Importantly, the relaxed-clock estimates were more certain than those of the no-clock model, as expected given the greater number of parameters in the latter model [9]. A key area of future research will be to investigate these results using statistical model selection theory.
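
The parameter counts behind this argument follow from standard tree arithmetic, sketched below. An unrooted bifurcating tree of n sequences has 2n - 3 branches, each with its own free length under the no-clock model. A strict-clock tree is described by the ages of its n - 1 internal nodes, measured in expected substitutions so that the single rate is absorbed. The function name is hypothetical, and the relaxed clock is left out because its count depends on how the rate distribution is parameterised.

```python
def branch_parameters(n_taxa):
    """Free branch-length parameters for a fully resolved tree of n_taxa sequences."""
    if n_taxa < 3:
        raise ValueError("need at least three sequences")
    return {
        "no_clock": 2 * n_taxa - 3,   # one free length per branch of the unrooted tree
        "strict_clock": n_taxa - 1,   # one age per internal node of the rooted tree
    }

for n in (5, 50, 500):
    counts = branch_parameters(n)
    print(f"{n:4d} sequences: no-clock {counts['no_clock']:4d}, "
          f"strict clock {counts['strict_clock']:4d}")
```

For the five sequences of Figure 2 this gives seven parameters without a clock and four with a strict clock; the no-clock count approaches twice the strict-clock count as the tree grows.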

It is perhaps surprising that gene sequences contain sufficient information to estimate as complex a process as evolutionary rate variation among organisms. But it is well known that if phylogenies are constructed using the no-clock model, then the genetic distances of sequences to a shared common ancestor are unequal (Figure 2A). Since sequences are sampled at the same time (on an evolutionary scale), the times to the common ancestor will be identical for each; hence the variation in genetic distance directly reflects the variation in evolutionary rate since the common ancestor. This valuable information about the evolutionary process is ignored whenever the no-clock model is used, despite it being used for many years in the relative rates test, a statistical test used to detect evolutionary rate variation [7].
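
A relative rates test can be sketched in a few lines. The version below follows the simple counting form proposed by Tajima (1993), chosen here for brevity; the article does not say which variant it has in mind. Given two ingroup sequences and an outgroup, it counts the sites where only one ingroup lineage has changed and asks whether the two counts differ by more than chance would suggest. The sequences are invented, and the chi-square approximation is crude when the counts are small.

```python
import math

def tajima_relative_rate(seq_a, seq_b, outgroup):
    """Return (changes unique to A, changes unique to B, chi-square, p-value)."""
    m_a = m_b = 0
    for a, b, o in zip(seq_a, seq_b, outgroup):
        if a != b:
            if b == o:
                m_a += 1  # only lineage A differs from the outgroup
            elif a == o:
                m_b += 1  # only lineage B differs from the outgroup
    if m_a + m_b == 0:
        return m_a, m_b, 0.0, 1.0
    chi2 = (m_a - m_b) ** 2 / (m_a + m_b)
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m_a, m_b, chi2, p_value

# Hypothetical 40-site alignment: lineage A carries 12 unique changes, B carries 2.
outgroup = "A" * 40
seq_a = "C" * 12 + "A" * 28
seq_b = "A" * 38 + "G" * 2

m_a, m_b, chi2, p_value = tajima_relative_rate(seq_a, seq_b, outgroup)
print(f"unique to A: {m_a}, unique to B: {m_b}, chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Because both ingroup sequences have had the same amount of time since their common ancestor, an excess of changes on one lineage points to a faster rate there, which is the information the paragraph above says a no-clock analysis leaves unused.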

It is likely that the widespread adoption of relaxed-clock models in phylogenetics will act as a stepping-stone to even more intricate models of sequence change. Work has already begun on combining evolutionary rate variation among organisms with rate variation among genomic sites, so that particular sets of sites are able to evolve quicker or slower on different sets of branches [13]. This could be important if certain parts of a gene are under selection in some species but not others. This complex situation, known as heterotachy, is currently the subject of debate amongst phylogeneticists, as it is unclear whether model-based statistical approaches are better than "model-free" parsimony methods that appear not to make assumptions about the evolutionary process [13,14]. In many ways, this debate echoes the relaxed-clock and no-clock comparison discussed above. It is quite possible that in this case, too, the model-free parsimony methods are making implicit assumptions about the nature of rate variation among sites and lineages, but the underlying process is so complicated that it will take time for these assumptions to be fully understood. The complexity of heterotachy will also require larger datasets than are currently used in phylogenetics. But in the midst of a revolution in high-speed genomics, it is not sequence data we are short of, but tools for statistical analysis, and the equations on which they are based.

Acknowledgments

Funding. OGP was funded by the Royal Society.

References

[1] Fisher RA (1999) The genetical theory of natural selection. A complete variorum edition. Oxford: Oxford University Press. 354 p.
[2] Felsenstein J (2004) Inferring phylogenies. Sunderland (Massachusetts): Sinauer Associates. 664 p.
[3] Zuckerkandl E, Pauling L (1962) Molecular disease, evolution and genic heterogeneity. In: Kasha M, Pullman B, eds. Horizons in biochemistry. New York: Academic Press. 604 p.
[4] Bromham L, Penny D (2003) The modern molecular clock. Nat Rev Genet 4:216–224.
[5] Buonagurio DA, Nakada S, Parvin JD, Krystal M, Palese P, et al. (1986) Evolution of human influenza A viruses over 50 years: Rapid, uniform rate of change in NS gene. Science 232:980–982.
[6] Lambert DM, Ritchie PA, Millar CD, Holland B, Drummond AJ, et al. (2002) Rates of evolution in ancient DNA from Adelie penguins. Science 295:2270–2273.
[7] Kumar S (2005) Molecular clocks: Four decades of evolution. Nat Rev Genet 6:654–662.
[8] Jones S (2006) In our time: Human evolution. BBC Radio 4. First broadcast 16 February 2006. Available: http://www.bbc.co.uk/radio4/history/inourtime/inourtime_20060216.shtml. Accessed 20 March 2006.
[9] Drummond AJ, Ho SYW, Phillips MJ, Rambaut A (2006) Relaxed phylogenetics and dating with confidence. PLoS Biol 4:e88 DOI: 10.1371/journal.pbio.0040088.
[10] Sanderson MJ (1997) A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol Biol Evol 14:1218–1231.
[11] Thorne JL, Kishino H, Painter IS (1998) Estimating the rate of evolution of the rate of evolution. Mol Biol Evol 15:1647–1657.
[12] Gillespie JH (1991) The causes of molecular evolution. New York: Oxford University Press. 352 p.
[13] Kolaczkowski B, Thornton JW (2004) Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431:980–984.
[14] Steel M (2005) Should phylogenetics be trying to "fit an elephant"? Trends Genet 21:307–309.

(Oliver G. Pybus)



