Similar Literature
 Found 20 similar documents; search took 250 ms
1.
The MEME algorithm extends the expectation maximization (EM) algorithm for identifying motifs in unaligned biopolymer sequences. The aim of MEME is to discover new motifs in a set of biopolymer sequences where little or nothing is known in advance about any motifs that may be present. MEME innovations expand the range of problems which can be solved using EM and increase the chance of finding good solutions. First, subsequences which actually occur in the biopolymer sequences are used as starting points for the EM algorithm to increase the probability of finding globally optimal motifs. Second, the assumption that each sequence contains exactly one occurrence of the shared motif is removed. This allows multiple appearances of a motif to occur in any sequence and permits the algorithm to ignore sequences with no appearance of the shared motif, increasing its resistance to noisy data. Third, a method for probabilistically erasing shared motifs after they are found is incorporated so that several distinct motifs can be found in the same set of sequences, both when different motifs appear in different sequences and when a single sequence may contain multiple motifs. Experiments show that MEME can discover both the CRP and LexA binding sites from a set of sequences which contain one or both sites, and that MEME can discover both the –10 and –35 promoter regions in a set of E. coli sequences.
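As a rough illustration of the baseline that MEME extends, the sketch below runs plain EM under the one-occurrence-per-sequence assumption and borrows only the first idea from the abstract: every subsequence that actually occurs in the data is tried as a starting point. It is not MEME itself. The function names, the uniform background, the seed strength of 0.7, and the pseudocount are all assumptions made for this example.

```python
import math

ALPHABET = "ACGT"

def seed_pwm(seed, strength=0.7):
    """Starting motif model built from a subsequence that occurs in the data."""
    rest = (1.0 - strength) / (len(ALPHABET) - 1)
    return [{a: (strength if a == c else rest) for a in ALPHABET} for c in seed]

def em_oops(seqs, width, pwm, iters=50, pseudo=0.1):
    """EM assuming exactly one motif occurrence per sequence and a uniform background."""
    for _ in range(iters):
        counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
        starts, loglik = [], 0.0
        for s in seqs:
            # E-step: likelihood of each start position under the current model.
            like = [math.prod(pwm[k][s[j + k]] for k in range(width))
                    for j in range(len(s) - width + 1)]
            total = sum(like)
            loglik += math.log(total)  # log-likelihood up to an additive constant
            starts.append(max(range(len(like)), key=like.__getitem__))
            # M-step accumulation: expected letter counts per motif column.
            for j, p in enumerate(like):
                for k in range(width):
                    counts[k][s[j + k]] += p / total
        pwm = [{a: col[a] / sum(col.values()) for a in ALPHABET} for col in counts]
    # starts and loglik come from the final E-step.
    return pwm, starts, loglik

def discover(seqs, width):
    """Run EM from every distinct subsequence and keep the most likely result."""
    seeds = sorted({s[j:j + width] for s in seqs for j in range(len(s) - width + 1)})
    return max((em_oops(seqs, width, seed_pwm(sd)) for sd in seeds), key=lambda r: r[2])
```

Calling `discover(seqs, 7)` on a handful of short DNA strings returns the fitted column probabilities, the most likely start position in each sequence, and the relative log-likelihood. Allowing zero or several occurrences per sequence and erasing found motifs, as the abstract describes, would need a richer model than this sketch.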

2.
The goal of motif discovery algorithms is to efficiently find unknown recurring patterns. In this paper, we focus on motif discovery in time series. Most available algorithms cannot utilize domain knowledge in any way, which results in quadratic or at least super-linear time and space complexity. We define the Constrained Motif Discovery problem, which enables domain knowledge to be used in the motif discovery process. The paper then provides two algorithms, called MCFull and MCInc, for efficiently solving the constrained motif discovery problem. We also show that most unconstrained motif discovery problems can be converted into constrained ones using a change-point detection algorithm. A novel change-point detection algorithm called the Robust Singular Spectrum Transform (RSST) is then introduced and compared to the traditional Singular Spectrum Transform using synthetic and real-world data sets. The results show that RSST achieves higher specificity and is better suited to finding constraints that convert unconstrained motif discovery problems into constrained ones that can be solved using MCFull and MCInc. We then compare the combination of RSST and MCFull or MCInc with two state-of-the-art motif discovery algorithms on a large set of synthetic time series. The results show that the proposed algorithms provide a four- to ten-fold increase in speed compared with the unconstrained motif discovery algorithms studied, without any loss of accuracy. RSST+MCFull is then used in a real-world human-robot interaction experiment to enable the robot to learn free hand gestures, actions, and their associations by watching humans and other robots interacting.

3.
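With the development of bioinformatics, motif discovery has become a method for extracting useful biological information from biological sequences. This paper introduces some concepts related to motifs and discusses the expectation maximization (EM) algorithm, which is the foundation of the MEME motif discovery algorithm. Because MEME is built on EM, the paper then introduces MEME, discusses in detail some of its basic issues such as time complexity and performance, and describes its limitations and the areas that need improvement. Practice shows that MEME is a good motif discovery algorithm: it can identify single or multiple motifs in protein or DNA sequences and is highly flexible.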

4.

In this paper, recent algorithms are proposed to address the motif finding problem. The proposed algorithms are cuckoo search, modified cuckoo search, and a hybrid of gravitational search and particle swarm optimization. Motif finding is the task of identifying meaningful motifs in large DNA sequences. DNA motif finding is important because it plays a significant role in understanding the mechanisms of gene regulation. Existing motif finding programs show low accuracy and cannot be used to find motifs in different types of datasets. The nature-inspired algorithms are tested first on synthetic datasets and then on real benchmark datasets. The results show that the hybrid of the gravitational search and particle swarm algorithms provides higher precision and recall values, with an average F-score improvement of up to 0.24 compared to other existing algorithms and tools, and that cuckoo search and modified cuckoo search are able to locate motifs in DNA sequences successfully.

5.
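Comparative Analysis and Optimization of Motif Prediction Based on Different Algorithms   Cited by: 2 (self-citations: 1, other citations: 1)
张斐, 谭军, 谢竞博. 《计算机工程》 (Computer Engineering), 2009, 35(22): 94-96
This paper studies the main prediction models for transcription factor binding sites (TFBSs) and their prediction algorithms, and uses three representative regulatory-element prediction algorithms, MEME, Gibbs sampling, and Weeder, to predict motifs in the Arabidopsis thaliana genome. The comparison shows that the Gibbs sampling and Weeder algorithms are more efficient at predicting long and short motifs. The MEME algorithm is analyzed in detail, an optimization method that combines different algorithms to find motifs is proposed, and experiments verify that the method effectively improves prediction efficiency.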

6.
7.
Context: The Next Release Problem involves determining the set of requirements to implement in the next release of a software project. When the problem was first formulated in 2001, Integer Linear Programming, an exact method, was found to be impractical because of large execution times. Since then, the problem has mainly been addressed by employing metaheuristic techniques. Objective: In this paper, we investigate if the single-objective and bi-objective Next Release Problem can be solved exactly and how to better approximate the results when exact resolution is costly. Methods: We revisit Integer Linear Programming for the single-objective version of the problem. In addition, we integrate it within the Epsilon-constraint method to address the bi-objective problem. We also investigate how the Pareto front of the bi-objective problem can be approximated through an anytime deterministic Integer Linear Programming-based algorithm when results are required within strict runtime constraints. Comparisons are carried out against NSGA-II. Experiments are performed on a combination of synthetic and real-world datasets. Findings: We show that a modern Integer Linear Programming solver is now a viable method for this problem. Large single-objective instances and small bi-objective instances can be solved exactly very quickly. On large bi-objective instances, execution times can be significant when calculating the complete Pareto front. However, good approximations can be found effectively. Conclusion: This study suggests that (1) approximation algorithms can be discarded in favor of the exact method for the single-objective instances and small bi-objective instances, (2) the Integer Linear Programming-based approximate algorithm outperforms the NSGA-II genetic approach on large bi-objective instances, and (3) the run times for both methods are low enough to be used in real-world situations.
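The Epsilon-constraint idea mentioned in the abstract can be sketched on a toy instance: repeatedly maximize total value subject to a cost bound, then tighten the bound just below the cost of the solution found. The sketch below is only an illustration under simplifying assumptions: an exhaustive search stands in for the Integer Linear Programming solver, costs are positive integers, and requirement dependencies are ignored. The function names and data are hypothetical.

```python
from itertools import combinations

def best_under_budget(values, costs, eps):
    """Stand-in for the ILP solve: max total value with total cost <= eps (exhaustive)."""
    n = len(values)
    best = (0, 0, ())  # (value, cost, chosen requirement indices); empty release
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            cost = sum(costs[i] for i in subset)
            value = sum(values[i] for i in subset)
            # Prefer higher value; break ties by lower cost.
            if cost <= eps and (value, -cost) > (best[0], -best[1]):
                best = (value, cost, subset)
    return best

def pareto_front(values, costs):
    """Epsilon-constraint sweep: tighten the cost bound below each solution found."""
    front = []
    eps = sum(costs)
    while eps >= 0:
        value, cost, subset = best_under_budget(values, costs, eps)
        front.append((value, cost, subset))
        eps = cost - 1  # assumes integer costs
    return front
```

Each sweep step returns the cheapest among the most valuable feasible releases, so every point on the returned list is non-dominated; replacing `best_under_budget` with a real ILP model is where a solver would plug in.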

8.
Discovering approximately recurrent motifs (ARMs) in timeseries is an active area of research in data mining. Exact motif discovery is defined as the problem of efficiently finding the most similar pairs of timeseries subsequences and can be used as a basis for discovering ARMs. The most efficient algorithm for solving this problem is the MK algorithm, which was designed to find a single pair of timeseries subsequences with maximum similarity at a known length. This paper provides three extensions of the MK algorithm that allow it to find the top K similar subsequences at multiple lengths using both the Euclidean distance metric and a scale-invariant normalized version of it. The proposed algorithms are then applied to both synthetic data and real-world data with a focus on discovery of ARMs in human motion trajectories.
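To make the exact motif discovery problem concrete, the following sketch finds the most similar pair of subsequences by brute force, using a z-normalized (scale-invariant) Euclidean distance. It is the quadratic baseline, not the MK algorithm or the paper's extensions. The function names and the rule of skipping overlapping windows are assumptions for this example.

```python
import math

def znorm(x):
    """Shift to zero mean and scale to unit variance (constant windows map to zeros)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / std if std > 0 else 0.0 for v in x]

def exact_motif_pair(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping length-m windows."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, -1, -1)
    for i in range(len(windows)):
        for j in range(i + m, len(windows)):  # skip trivial (overlapping) matches
            d = math.dist(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best
```

Because every window is normalized before comparison, two windows with the same shape but different amplitude or offset are treated as a match.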

9.
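Finding motifs in DNA sequences is an important computational problem in bioinformatics, and many models and algorithms have been proposed for it. Because of the complexity of DNA sequence data, many motifs carry weak signals that are harder to extract than strong-signal motifs. At present, the planted (l, d) motif problem (PMP) and the extended planted (l, d) motif problem (EMP) are the problem models best suited to modeling weak-signal motif finding. This paper summarizes and analyzes the basic methods, strategies, and motif models for motif finding and points out the strengths and weaknesses of each strategy and model. On this basis, it analyzes and experimentally evaluates the main existing weak-signal motif finding algorithms based on the planted motif problem model, provides a reference for choosing computational methods for finding weak motif signals, and discusses open problems and development trends in this direction.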
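In the usual formulation of the planted (l, d) motif problem, the goal is to find an l-mer that occurs in every input sequence with at most d mismatches. The sketch below states that definition directly as an exhaustive search over all 4^l candidates, which is only practical for very small l. It is not one of the algorithms evaluated in the paper, and the function names are hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(seq, motif, d):
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(seq[i:i + l], motif) <= d for i in range(len(seq) - l + 1))

def planted_motifs(seqs, l, d):
    """All l-mers that occur in every sequence with at most d mismatches."""
    candidates = ("".join(c) for c in product("ACGT", repeat=l))
    return [m for m in candidates if all(occurs_within(s, m, d) for s in seqs)]
```

Note that the motif itself need not appear exactly in any sequence, which is what makes weak-signal instances hard for methods that only examine substrings of the input.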

10.
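张斐. 《微机发展》, 2011, (10): 171-175
This paper studies how to evaluate the results of algorithms that predict protein family motifs, with the aim of formulating a new evaluation strategy based on an analysis and optimization of traditional algorithm prediction problems. The main method is to compare and analyze the prediction results of the MEME and PKG algorithms, compute the sensitivity and specificity of the motifs within the same family and compare their corresponding ROC curves, determine the true motifs, and thereby obtain the best motif model for the protein family. Experimental results show that this evaluation strategy can be used to evaluate algorithmic predictions of protein family motifs, and that the best motifs so determined can also be used to search databases to predict other motifs in the protein family.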

11.
The identification of overrepresented motifs in a collection of biological sequences continues to be a relevant and challenging problem in computational biology. Currently popular methods of motif discovery are based on statistical learning theory. In this paper, a machine-learning approach to the motif discovery problem is explored. The approach is based on a Self-Organizing Map (SOM) where the output layer neuron weight vectors are replaced by position weight matrices. This approach can be used to characterise features present in a set of sequences, and thus can be used as an aid in overrepresented motif discovery. The SOM approach to motif discovery is demonstrated using biological sequence datasets, both real and simulated.

12.
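余胜, 曾接贤, 谢莉. 《计算机工程》 (Computer Engineering), 2012, 38(24): 216-219
To extract and describe image features effectively and improve image retrieval performance, an image retrieval algorithm based on the fusion of texture, color, and shape features is proposed. The edges of the color image are detected and transformed to obtain a motif image. The motif image is traversed to obtain a motif co-occurrence matrix, and the gradient of each motif is computed to obtain a motif gradient histogram. The color image is quantized to a 64-color space to obtain the corresponding color histogram. These three feature descriptors are used to describe the image and are applied to image retrieval. Experimental results show that, compared with the BCTF and MCM algorithms, the proposed algorithm achieves higher recall and precision with lower computational complexity.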

13.
Protein structural motif detection has important applications in structural genomics. Compared with sequence motifs, structural motifs are more sensitive in revealing the evolutionary relationships among proteins. A variety of algorithms have been proposed to attack this problem. However, they are either heuristic without a theoretical performance guarantee, or inefficient due to employing exhaustive search strategies. This paper studies a reasonably restricted version of this problem: the compact structural motif problem. We prove that this restricted version is still NP-hard, and we present a polynomial-time approximation scheme to solve it. This is the first approximation algorithm with a guaranteed ratio for the protein structural motif problem.

14.
Motif patterns consisting of sequences of intermixed solid and don’t-care characters have been introduced and studied in connection with pattern discovery problems of computational biology and other domains. In order to alleviate the exponential growth of such motifs, notions of maximal saturation and irredundancy have been formulated, whereby more or less compact subsets of the set of all motifs can be extracted, that are capable of expressing all others by suitable combinations. In this paper, we introduce the notion of maximal irredundant motifs in a two-dimensional array and develop initial properties and a combinatorial argument that poses a linear bound on the total number of such motifs. The remainder of the paper presents approaches to the discovery of irredundant motifs both by offline and incremental algorithms.

15.
16.
We present an algorithm for maintaining the biconnected components of a graph during a sequence of edge insertions and deletions. It requires linear storage and preprocessing time. The amortized running time for insertions and for deletions is $O(m^{2/3})$, where $m$ is the number of edges in the graph. Any query of the form ‘Are the vertices $u$ and $v$ biconnected?’ can be answered in time $O(1)$. This is the first sublinear algorithm for this problem. We can also output all articulation points separating any two vertices efficiently. If the input is a plane graph, the amortized running time for insertions and deletions drops to $O(\sqrt{n} \log n)$ and the query time is $O(\log^2 n)$, where $n$ is the number of vertices in the graph. The best previously known solution takes time $O(n^{2/3})$ per update or query.
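For orientation, articulation points can be found from scratch with a single depth-first search using low-link values, as sketched below. This is the static linear-time baseline that would have to be rerun after every update, not the dynamic data structure the abstract describes. The sketch assumes a simple undirected graph (no parallel edges) with vertices numbered from 0, and the function name is hypothetical.

```python
def articulation_points(n, edges):
    """Articulation points of a simple undirected graph via DFS low-link values."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc = [-1] * n   # discovery time, -1 means unvisited
    low = [0] * n     # earliest discovery time reachable from the subtree
    cut = set()
    timer = 0

    def dfs(u, parent):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if disc[v] != -1:
                low[u] = min(low[u], disc[v])  # back edge
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root u separates v's subtree if it cannot reach above u.
                if parent != -1 and low[v] >= disc[u]:
                    cut.add(u)
        # The DFS root is an articulation point iff it has two or more children.
        if parent == -1 and children > 1:
            cut.add(u)

    for s in range(n):
        if disc[s] == -1:
            dfs(s, -1)
    return cut
```

The recursion depth grows with the length of the longest DFS path, so very large graphs would need an iterative version.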

17.
18.
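Motifs play an important role in regulating gene expression at the transcriptional and post-transcriptional levels. Many motif discovery algorithms and corresponding software tools now exist, but studies and reports that evaluate the performance of the various algorithms and tools together are rare. This paper introduces the classification of the algorithms and three common motif discovery algorithms, Wordup, MM, and Gibbs sampling, and compares and analyzes the performance of 13 motif-finding programs, including AlignACE, MEME, MotifSampler, and Weeder. From the study of biological significance and the performance comparison, it can be concluded that Weeder performs comparatively well among the programs because it alone takes the conserved core positions of the motif into account; most algorithms consider only simple and …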

19.
20.
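A Genetic Optimization Algorithm for the (l, d)-Motif Discovery Problem   Cited by: 1 (self-citations: 0, other citations: 1)
Transcription factor binding site identification plays an important role in the regulation of gene expression. This paper proposes GOBMD (Genetic Optimization with Bayesian Model for Motif Discovery), a genetic optimization algorithm for motif discovery driven by a Bayesian model. GOBMD first uses a projection process based on position-weighted hashing to project the l-mers in the input sequences into a k-dimensional (k …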
