Big Data and Future Medicine

Databases have grown dramatically, from the kilobyte scale (KB, 1024 bytes) and megabyte scale (MB, 1024 KB) to today's routine gigabyte scale (GB, 1024 MB), and on to terabytes (TB, 1024 GB) and petabytes (PB, 1024 TB). Data at this scale are increasingly difficult to process effectively with the organizational and operational tools of a conventional single-core computer, or even of a parallel machine. A typical example is the artificial intelligence (AI) Go program AlphaGo, which gave up exhaustive search ("exhaustive search is infeasible") and instead combines deep neural networks with tree search, improved through self-play.[1]
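To give a feel for why exhaustive search is out of reach, the short sketch below estimates the size of a game tree as breadth raised to the power of depth. The breadth and depth values are the approximate figures quoted in reference [1] for chess and Go; the function name `search_space_log10` is only illustrative and does not come from any library.

```python
import math

def search_space_log10(breadth: int, depth: int) -> float:
    """Return log10 of breadth**depth, the approximate number of move sequences."""
    return depth * math.log10(breadth)

# Approximate (breadth, depth) values as quoted in reference [1].
# breadth = legal moves per position, depth = typical game length in moves.
GAMES = {"chess": (35, 80), "Go": (250, 150)}

for name, (b, d) in GAMES.items():
    # Work in log space so the result stays a small float instead of a huge integer.
    print(f"{name}: about 10^{search_space_log10(b, d):.0f} move sequences")
```

Under these assumptions the Go tree contains roughly 10^360 move sequences, which no increase in storage or processor count can enumerate.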

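As databases have grown, data have come to be applied conveniently and effectively in every corner of society, marking humanity's true entry into the information society. The great advance of large databases and AI allows us to accumulate information in all domains and to apply that comprehensive information quickly and practically, and this is fundamentally changing how knowledge is stored and used. In the past it was hard for one person to memorize a body of knowledge, and harder still to apply it in an integrated way, which meant long hours searching through books and libraries. Because any individual's store of knowledge was limited, mastering even one narrow field was an achievement. Now the experience of preparing piles of reading notes, abstracts, and data cards before writing or designing has become a memory of a past era.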

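Big data and AI let us collect and organize, within a practical amount of time, knowledge at every level of a discipline, and update and analyze input data rapidly and in bulk. This way of building and applying knowledge is entirely different from what came before, and it signals that the time has come to re-examine and rebuild the existing human knowledge system.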

Medicine faces the same challenge and opportunity. The current model of medicine, built up by characterizing one disease after another, has shown that it cannot maintain health promptly and accurately when viewed against the full range of information about the human body. This disease-oriented model has two problems. First, a disease name usually does not indicate the correct etiology, because the same named disease can arise from entirely different causes. Second, it does not emphasize the integrated effects of pathogenic factors on the whole living system. Type 2 diabetes, for example, can be caused by abnormal metabolic input and output or by infection.[2, 3] In addition, many risk factors act on different organs and parts of the body and contribute to the eventual pathology of type 2 diabetes.[4] Disease is therefore better understood as a dynamic pathophysiological and pathoanatomical process that is caused by various harmful factors and differs between individuals. Systems built on big data are well suited to capturing such dynamic, individualized pathological change. Everything from the concept of disease to its management therefore needs to be re-examined, and future medicine should take advantage of progress in AI and database systems.

Because big data and AI can accumulate and organize relevant knowledge in all domains, from the essence of living systems to the promotion of human health, we can organize the data around the fundamental principles of life and plan medicine anew. Stimulus and response (sensitivity) is one of the most basic properties of life.[5] A stimulus is an environmental change acting on the organism; a response is the structural and functional change the organism shows to that stimulus. Accordingly, we can redefine disease as follows: disease is the pathological response of the body when harmful stimuli, taken together, are dominant. This pathological response is the dynamic process by which the body moves from homeostasis to decompensation.

This definition of disease also describes the relation between environment and genes well. Genes are the information template of living organisms. Information about the structure of life, together with information about past stimuli and responses closely tied to life, is stored as the sequence of bases in DNA. In a feedforward manner, this genetic information plans in advance the formation of the organism and its basic functions, so that the newborn organism can adapt to both its internal and external environments. Later in life, in a feedback manner, regulation of gene structure and expression takes part in setting stimulus-response activity, and long-term stimuli after birth can be stored in genetic and epigenetic ways. Interestingly, this storage function does not appear to distinguish good from bad: as long as a stimulus persists and is closely related to life activity, genes appear able to change in sequence and expression and thereby affect the body's structure and function.

Taking type 2 diabetes (T2D) again as an example, several laboratories report that a long-term parental high-fat diet not only impairs the parents' own metabolism but also raises the risk of metabolic impairment, including weakened beta-cell function, in offspring fed a normal diet.[6-8] The Mansuy group (2014) reports that traumatic stress in early life alters microRNA expression and behavioral and metabolic responses in mice and in their progeny, and that injecting sperm RNAs from the traumatized males into fertilized wild-type oocytes reproduced the behavioral and metabolic alterations in the resulting offspring.[9] Conversely, research shows that a healthy diet or exercise training benefits both healthy and T2D individuals and also benefits their offspring.[10]

Although there is no direct laboratory evidence that stimuli change the DNA base sequence, it is generally recognized that gene variation correlates strongly with long-term disease status and differs markedly between individuals.[11] One report also shows that individual human genome sequences contain a large number of undocumented genetic regions, which can be identified by very deep sequencing and de novo assembly.[12]

Given this redefinition of disease in terms of stimulus and response, the future model of medicine should be planning medicine. Planning medicine uses medical principles as guidance to organize information on beneficial and harmful health-related stimuli, in all domains and at each hierarchical level of the living system, into relational databases. Each individual's structural and functional information is then compared with the database, and correlation and causal analyses are used to extract the information relevant to that person's development and to refine an individualized plan. With the help of a professional medical planner, each person develops a plan to keep and strengthen beneficial stimuli and to remove and avoid harmful ones. Such planning should prevent many diseases and, where a disease cannot be avoided, delay its onset and slow its progression. This would fundamentally raise the level of health and reduce the medical burden on individuals and society.
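The comparison of an individual against a stimulus database can be pictured with a very small sketch. It is only an illustration under strong assumptions: the names `Stimulus`, `REFERENCE_DB`, and `draft_plan` are hypothetical, the four entries are taken from the examples discussed in this article rather than from any real database, and a simple lookup stands in for the correlation and causal analyses, which are not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class Stimulus:
    name: str
    beneficial: bool  # True = keep and strengthen, False = remove and avoid

# Hypothetical reference database; entries mirror examples from the article only.
REFERENCE_DB = {
    s.name: s
    for s in [
        Stimulus("healthy diet", True),
        Stimulus("exercise training", True),
        Stimulus("long-term high-fat diet", False),
        Stimulus("early-life traumatic stress", False),
    ]
}

def draft_plan(exposures: set[str]) -> dict[str, list[str]]:
    """Sort a person's current exposures into keep / avoid / unknown,
    and list beneficial stimuli in the database that are not yet present."""
    plan: dict[str, list[str]] = {"keep": [], "avoid": [], "adopt": [], "unknown": []}
    for name in sorted(exposures):
        entry = REFERENCE_DB.get(name)
        if entry is None:
            plan["unknown"].append(name)  # not in the database: needs review
        elif entry.beneficial:
            plan["keep"].append(name)
        else:
            plan["avoid"].append(name)
    plan["adopt"] = sorted(
        n for n, s in REFERENCE_DB.items() if s.beneficial and n not in exposures
    )
    return plan

if __name__ == "__main__":
    print(draft_plan({"exercise training", "long-term high-fat diet", "night shifts"}))
```

In practice the medical planner, not a lookup table, would weigh each stimulus against the individual's own structural and functional data; the sketch only shows the shape of the keep, avoid, and adopt decisions.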

On current understanding, an individual's medical plan can be divided into three periods of intervention:

1.      The embryonic planning period: As mentioned earlier, both the parents' original genetic information and information arising from their general behaviors can be passed on to the next generation. This period has two stages. In the first, the parents' stored genetic information is analyzed and the parents adjust their living behaviors, building on strengths and correcting weaknesses, so that they reach physical, mental, and social well-being before an embryo is formed. In the second, during pregnancy, the structural and functional information of the mother and the embryo is analyzed, and mother and embryo follow an optimal development plan under the guidance of the medical planner.

2.      The developmental (childhood) planning period: This period is the key to lifelong effects. The child progresses from the passive responses of the embryonic stage to active behavioral responses. Because adverse stimuli not only directly impair the body's structures and functions but also cause adaptive changes in gene activity, it is crucial to train children to actively develop healthy behaviors that avoid risk factors and maintain good homeostasis. Plasticity is much greater in children than in adults, and infancy and early childhood contain critical periods for the development of each structure and function; once a critical period is missed, later maintenance or correction is very difficult. In this period the child can also carry forward the parents' advantageous information, improve on their unfavorable tendencies, and develop advantages of his or her own, which in turn prepares the child as a future parent of the next generation.

3.      The adulthood planning period: After a child has successfully developed healthy behaviors, subsequent medical intervention is relatively easy. In adulthood the main tasks are to turn active medical planning into habit, maintain those healthy behaviors, and adjust or develop new behaviors according to updated information from the database.

References

1. Silver D, Huang A, Maddison CJ, et al. (2016) Mastering the game of Go with deep neural networks and tree search. Nature. 529:484-9.

2. Lionetti L, Mollica MP, Lombardi A, et al. (2009) From chronic overnutrition to insulin resistance: The role of fat-storing capacity and inflammation. Nutr Metab Cardiovasc Dis. 19:146-152.

3. King GL. (2008) The role of inflammatory cytokines in diabetes and its complications. J Periodontol. 79(8 Suppl):1527-34.

4. Leahy JL. (2005) Pathogenesis of type 2 diabetes mellitus. Arch Med Res. 36:197-209.

5. Mason KA, Losos JB, Singer SR. (2017) The Science of Biology. In: Biology. 11th edition. pp. 1-16. McGraw-Hill Education, New York.

6. Dunn GA, Bale TL. (2009) Maternal high-fat diet promotes body length increases and insulin insensitivity in second-generation mice. Endocrinology. 150:4999-5009.

7. Ng SF, Lin RC, Laybutt DR, et al. (2010) Chronic high-fat diet in fathers programs β-cell dysfunction in female rat offspring. Nature. 467:963-6.

8. de Castro Barbosa T, Ingerslev LR, Alm PS, et al. (2015) High-fat diet reprograms the epigenome of rat spermatozoa and transgenerationally affects metabolism of the offspring. Mol Metab. 5(3):184-97.

9. Gapp K, Jawaid A, Sarkies P, et al. (2014) Implication of sperm RNAs in transgenerational inheritance of the effects of early trauma in mice. Nat Neurosci. 17:667-9.

10. Barrès R, Zierath JR. (2016) The role of diet and exercise in the transgenerational epigenetic landscape of T2DM. Nat Rev Endocrinol. 12:441-51.

11. McCarthy MI. (2015) Genomic medicine at the heart of diabetes management. Diabetologia. 58:1725-9.

12. Chen R, Mias GI, Li-Pook-Than J, et al. (2012) Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell. 148:1293-307.
