Discovering Dependencies via Algorithmic Mutual Information: A Case Study in DNA Sequence Comparisons
Authors: Aleksandar Milosavljević
Affiliation: (1) Genome Structure Group, Center for Mechanistic Biology and Biotechnology, Argonne National Laboratory, Argonne, Illinois 60439-4833; (2) Present address: CuraGen Corporation, 322 East Main Street, Branford, CT 06405
Abstract:
Algorithmic mutual information is a central concept in algorithmic information theory and may be measured as the difference between the independent and joint minimal encoding lengths of objects; it is also a central concept in Chaitin's fascinating mathematical definition of life. We explore the applicability of algorithmic mutual information as a tool for discovering dependencies in biology. In order to determine the significance of discovered dependencies, we extend the newly proposed algorithmic significance method. The main theorem of the extended method states that d bits of algorithmic mutual information imply dependency at the significance level 2^(-d+O(1)). We apply a heuristic version of the method to one of the main problems in DNA and protein sequence comparison: the problem of deciding whether observed similarity between sequences should be explained by their relatedness or by the mere presence of some shared internal structure, e.g., shared internal repetitive patterns. We take advantage of the fact that mutual information factors out sequence similarity that is due to shared internal structure and thus enables the discovery of truly related sequences. In addition to providing a general framework for sequence comparisons, we also propose an efficient way to compare sequences based on their subword composition that does not require any a priori assumption about k-tuple length.
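The abstract's measure — mutual information as the difference between independent and joint minimal encoding lengths — can be illustrated with a small sketch. Minimal encoding length is uncomputable, so the sketch below substitutes an off-the-shelf compressor (zlib) as a crude, computable proxy; the function names and the use of zlib are assumptions for illustration, not the paper's actual encoding scheme.

```python
import zlib

def approx_encoding_length(s: bytes) -> int:
    # Compressed length in bytes, used here as a computable
    # stand-in for the (uncomputable) minimal encoding length C(s).
    return len(zlib.compress(s, 9))

def approx_mutual_information(x: bytes, y: bytes) -> int:
    # Mutual information estimated as the difference between
    # independent and joint encoding lengths:
    #   I(x : y) ~ C(x) + C(y) - C(xy)
    # Shared internal repeats compress away in C(x) and C(y)
    # individually, so they do not inflate the estimate.
    return (approx_encoding_length(x)
            + approx_encoding_length(y)
            - approx_encoding_length(x + y))

def significance_bound(d_bits: int) -> float:
    # Per the extended method's main theorem, d bits of algorithmic
    # mutual information imply dependency at significance level
    # 2^(-d), up to the additive O(1) constant (ignored here).
    return 2.0 ** (-d_bits)

if __name__ == "__main__":
    related = b"ACGTACGGTTACCA" * 50
    copy = b"ACGTACGGTTACCA" * 50
    print(approx_mutual_information(related, copy))
    print(significance_bound(10))
```

With a real compressor the estimate is only as good as the compressor's model, but the qualitative behavior matches the abstract: a sequence shares far more information with a related copy than the joint encoding of two structurally similar but unrelated sequences would suggest.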
Keywords: Minimal length encoding  DNA sequence analysis  Machine discovery  Algorithmic mutual information  Algorithmic significance
This article is indexed in SpringerLink and other databases.