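Biclustering Nonlinearly Correlated Time Series Gene Expression Data
Cite this article:Yan Leiming, Sun Zhihui, Wu Yingjie, Zhang Baili. Biclustering Nonlinearly Correlated Time Series Gene Expression Data[J]. Journal of Computer Research and Development, 2008, 45(11).
Authors:Yan Leiming  Sun Zhihui  Wu Yingjie  Zhang Baili
Affiliation:School of Computer Science and Engineering, Southeast University, Nanjing 210096
Abstract:To cluster nonlinearly correlated data objects, quadratic mutual information from generalized information theory is introduced as the similarity measure. Matrix theory is used to reduce the cost of computing the quadratic mutual information, and, combined with a sliding-window technique, a nonlinear correlation model for time series data is built. On this basis, MI-TSB, a deterministic biclustering algorithm for time series gene expression data, is proposed. The algorithm converts the time series into abstract character sequences and inserts them into an MI-generalized suffix tree, which avoids enumerating all combinations and so indexes all clustering results quickly. Experimental results show that MI-TSB has good running performance and successfully clusters nonlinearly correlated objects; gene annotation of the clustering results with Gene Ontology also confirms their biological significance.

Keywords:quadratic mutual information  nonlinear correlation  biclustering  bioinformatics  gene expression data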

Biclustering Nonlinearly Correlated Time Series Gene Expression Data
Yan Leiming,Sun Zhihui,Wu Yingjie,Zhang Baili.Biclustering Nonlinearly Correlated Time Series Gene Expression Data[J].Journal of Computer Research and Development,2008,45(11).
Authors:Yan Leiming  Sun Zhihui  Wu Yingjie  Zhang Baili
Affiliation:Yan Leiming, Sun Zhihui, Wu Yingjie, Zhang Baili (School of Computer Science & Engineering, Southeast University, Nanjing 210096)
Abstract:Biclustering algorithms focus on clustering correlated patterns in subspaces. However, most current biclustering algorithms address only linearly correlated patterns or certain linearly similar patterns, leaving untouched the nonlinearly correlated patterns that are often hidden in many real data sets. In this paper, a novel biclustering algorithm called MI-TSB is proposed to find and report all nonlinearly correlated patterns in time series gene expression data. It first deduc...
Keywords:quadratic mutual information  nonlinear correlation  biclustering  bioinformatics  gene expression data
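
The similarity measure named in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes the Euclidean-distance form of quadratic mutual information over discretized symbols, QMI(X;Y) = Σ (p(x,y) − p(x)p(y))², and an equal-frequency discretization. The function names `discretize`, `quadratic_mutual_information`, and `windowed_qmi` are hypothetical. The matrix-based speed-up and the MI-generalized suffix tree mentioned in the abstract are not reproduced here.

```python
import numpy as np


def discretize(series, n_bins=3):
    """Map real values to integer symbols 0..n_bins-1.

    Equal-frequency binning is an assumption made for this sketch;
    the abstract does not say how the character sequences are built.
    """
    series = np.asarray(series, dtype=float)
    # Interior quantiles serve as bin edges (n_bins - 1 of them).
    edges = np.quantile(series, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(edges, series, side="right")


def quadratic_mutual_information(x_sym, y_sym, n_bins=3):
    """Euclidean-distance QMI between two symbol sequences of equal length."""
    joint = np.zeros((n_bins, n_bins))
    # Count co-occurrences of (x symbol, y symbol) pairs.
    np.add.at(joint, (x_sym, y_sym), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1)  # marginal distribution of x
    py = joint.sum(axis=0)  # marginal distribution of y
    # Squared distance between the joint and the product of marginals.
    return float(np.sum((joint - np.outer(px, py)) ** 2))


def windowed_qmi(a, b, window, n_bins=3):
    """QMI of two time series over every sliding window of the given length."""
    a_sym = discretize(a, n_bins)
    b_sym = discretize(b, n_bins)
    return [
        quadratic_mutual_information(
            a_sym[s:s + window], b_sym[s:s + window], n_bins
        )
        for s in range(len(a_sym) - window + 1)
    ]


# A nonlinear relation: y = x^2 on a symmetric range has near-zero
# Pearson correlation, yet the QMI of the discretized series is positive.
x = np.linspace(-1.0, 1.0, 30)
y = x ** 2
pearson = np.corrcoef(x, y)[0, 1]
qmi = quadratic_mutual_information(discretize(x), discretize(y))
```

In this sketch a QMI of zero means the joint symbol distribution equals the product of its marginals, so larger values indicate dependence that a linear correlation coefficient can miss.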
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.