首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We study several fundamental problems arising from biological sequence analysis. Given a sequence of real numbers, we present two linear-time algorithms, one for locating the “longest” sum-constrained segment, and the other for locating the “shortest” sum-constrained segment. These two algorithms are based on the same framework and run in an online manner, hence they are capable of handling data stream inputs. Our algorithms also yield online linear-time solutions for finding the longest and shortest average-constrained segments by a simple reduction.  相似文献   

2.
给定一棵树,树上的每个节点被赋予一对数值,它们分别表示节点的值和权重。基于权重约束的最大密度路径算法用于搜索树上的最大密度路径,即最大密度路径上所有节点的值之和与节点的权重之和的比值是所有路径中最大的。通过研究发现,现有的基于权重约束的最大密度路径算法有一定的局限性。文中提出了突破该局限性的可行性方案,进而设计并改进了基于权重约束的最大密度路径算法。  相似文献   

3.
Fast Algorithms for the Density Finding Problem   总被引:1,自引:0,他引:1  
We study the problem of finding a specific density subsequence of a sequence arising from the analysis of biomolecular sequences. Given a sequence A=(a 1,w 1),(a 2,w 2),…,(a n ,w n ) of n ordered pairs (a i ,w i ) of numbers a i and width w i >0 for each 1≤in, two nonnegative numbers , u with u and a number δ, the Density Finding Problem is to find the consecutive subsequence A(i *,j *) over all O(n 2) consecutive subsequences A(i,j) with width constraint satisfying w(i,j)=∑ r=i j w r u such that its density is closest to δ. The extensively studied Maximum-Density Segment Problem is a special case of the Density Finding Problem with δ=∞. We show that the Density Finding Problem has a lower bound Ω(nlog n) in the algebraic decision tree model of computation. We give an algorithm for the Density Finding Problem that runs in optimal O(nlog n) time and O(nlog n) space for the case when there is no upper bound on the width of the sequence, i.e., u=w(1,n). For the general case, we give an algorithm that runs in O(nlog 2 m) time and O(n+mlog m) space, where and w min=min  r=1 n w r . As a byproduct, we give another O(n) time and space algorithm for the Maximum-Density Segment Problem. Grants NSC95-2221-E-001-016-MY3, NSC-94-2422-H-001-0001, and NSC-95-2752-E-002-005-PAE, and by the Taiwan Information Security Center (TWISC) under the Grants NSC NSC95-2218-E-001-001, NSC95-3114-P-001-002-Y, NSC94-3114-P-001-003-Y and NSC 94-3114-P-011-001.  相似文献   

4.
时间序列数据的特征表示方法是时间序列数据挖掘任务的关键技术,符号聚合近似表示(SAX)是特征表示方法中比较常用的一种。针对SAX算法在各序列段表示符号一致时无法区分时间序列间的相似性这一缺陷,提出了一种基于始末距离的时间序列符号聚合近似表示方法(SAX_SM)。由于时间序列有很强的形态趋势,因此文中提出的方法选用起点和终点来表示各个序列段的形态特征,并使用各序列段的形态特征和表示符号来近似表示时间序列数据,以将其从高维空间映射到低维空间;然后,针对起点和终点构建始末距离来计算两序列段间的形态距离;最后, 结合 始末距离和符号距离定义一种新的距离度量方式,以更客观地度量时间序列间的相似性。理论分析表明,该距离度量满足下界定理。在20组UCR时间序列数据集上的实验表明,所提SAX_SM方法在13个数据集中获得了最高的分类准确率(包含并列最大的),而SAX只在6个数据集中获得了最高的分类准确率(包含并列最大的),因此SAX_SM具有比SAX更优的分类效果。  相似文献   

5.
郭强  肖尧  顾建江  李浩 《系统仿真技术》2012,8(2):158-164,168
盾构法隧道施工技术作为地下空间开发中的核心技术,已在能源、交通等领域的隧道建设中得到广泛的应用.管片拼装机是盾构掘进过程中,将开挖好的隧道表面用预制混凝土管片拼装成圆形隧道的机构.管片拼装的质量将直接对地下水土的渗透和地表沉降产生影响.管片拼装机行走横梁、提升横梁及管片座的强度将直接影响拼装机的功能及拼装过程的安全性.在明确管片拼装机工作过程中危险位置的基础上,使用有限元分析的手段分析管片拼装机固定环至行走横梁最远端、旋转环转动至90.及180°、提升横梁伸出时拼装机关键零部件行走横梁、提升横梁及管片座的力学性能.  相似文献   

6.
Mining Navigation Patterns Using a Sequence Alignment Method   总被引:2,自引:0,他引:2  
In this article, a new method is illustrated for mining navigation patterns on a web site. Instead of clustering patterns by means of a Euclidean distance measure, in this approach users are partitioned into clusters using a non-Euclidean distance measure called the Sequence Alignment Method (SAM). This method partitions navigation patterns according to the order in which web pages are requested and handles the problem of clustering sequences of different lengths. The performance of the algorithm is compared with the results of a method based on Euclidean distance measures. SAM is validated by means of user-traffic data of two different web sites. Empirical results show that SAM identifies sequences with similar behavioral patterns not only with regard to content, but also considering the order of pages visited in a sequence.  相似文献   

7.
浅谈数据挖掘技术   总被引:3,自引:0,他引:3  
本文叙述了数据挖掘技术的定义,说明了数据挖掘技术的基本方法,分析了数据挖掘技术面临的问题。  相似文献   

8.
胡新荣 《计算机工程》2006,32(3):191-192,202
提出了一种改进的投影分割算法,其基本思想是将数列平滑的方法用于曲线投影上,该方法避免了传统分割算法的缺点,不仅有效地去除了噪声,而且相对于直线检测分割方法计算量减小,又比简单的投影方法分割效果明显。  相似文献   

9.
皮鞋内腔的自动测量是当今国内外都没有解决的难题。国外通常采用接触式测量,用传感器与皮鞋内腔接触获得数据。这种测量方法存在精度不高的缺点。重庆大学ICT中心在多年ICT研究的基础上提出了用工业CT来解决长期困扰皮鞋工业的难题。皮鞋内腔CT测量仪获得皮鞋的断面图像以后,由于工业CT通过断层扫描获得的图像通常会产生伪影见图1,它是图像中与被检物的物理参数分布没有对应关系的部分,因此图像质量不是非常理想,要对重建出来的图像进行准确的分割,才能够根据分割后的图像进行测量,进而根据多个断面的  相似文献   

10.
线段是一种组成几何体的基本元素,蕴含着非常丰富的几何信息。从图像中提取完整、连续且具有语义信息的线段对恢复场景的几何结构具有重要意义。该文提出了一种多分辨率线段提取方法,并对线段进行语义分析以区分轮廓线段和纹理线段。该方法首先运用多分辨率思想进行线段提取,然后结合深度神经网络技术对线段进行语义分析,最后对线段进行聚类合并得到最终结果。在线段连续性和完整性方面,该文提出的方法与当前常用的线段提取方法相比具有明显优势;在语义分析准确性方面,该文提出的方法在测试集上的像素精度高达 97.82%。  相似文献   

11.
Parametric optimization of sequence alignment   总被引:1,自引:0,他引:1  
Theoptimal alignment or theweighted minimum edit distance between two DNA or amino acid sequences for a given set of weights is computed by classical dynamic programming techniques, and is widely used in molecular biology. However, in DNA and amino acid sequences there is considerable disagreement about how to weight matches, mismatches, insertions/deletions (indels or spaces), and gaps.Parametric sequence alignment is the problem of computing the optimal-valued alignment between two sequences as afunction of variable weights for matches, mismatches, spaces, and gaps. The goal is to partition the parameter space into regions (which are necessarily convex) such that in each region one alignment is optimal throughout and such that the regions are maximal for this property. In this paper we are primarily concerned with the structure of this convex decomposition, and secondarily with the complexity of computing the decomposition. The most striking results are the following: For the special case where only matches, mismatches, and spaces are counted, and where spaces are counted throughout the alignment, we show that the decomposition is surprisingly simple: all regions are infinite; there are at most n2/3 regions; the lines that bound the regions are all of the form =c + (c + 0.5); and the entire decomposition can be found inO(knm) time, wherek is the actual number of regions, andn相似文献   

12.
提出C语言程序关键词序列表示法,以消除由于个性化编程所造成的代码的不同,以证明程序是相同的.  相似文献   

13.
In this paper we describe and implement a parallel algorithm to find approximate solutions for the Closest String Problem (CSP). The CSP, also known as Motif Finding problem, has applications in Coding Theory and Computational Biology. The CSP is NP-hard which motivates us to think about heuristics to solve large instances. Several approximation algorithms have been designed for the CSP, but all of them have a poor performance guarantee. Recently some researchers have shown empirically that integer programming techniques can be successfully used to solve moderate-size instances (10–30 strings each of which is 300–800 characters long) of the CSP. However, real-world instances are larger than those tested. In this paper we show how a simple heuristic can be used to find near-optimal solutions to that problem. We implemented a parallel version of this heuristic and report computational experiments on large-scale instances. These results show the effectiveness of our approach.  相似文献   

14.
Given a taxonomy of events and a dataset of sequences of these events, we study the problem of finding efficient and effective ways to produce a compact representation of the sequences. We model sequences with Markov models whose states correspond to nodes in the provided taxonomy, and each state represents the events in the subtree under the corresponding node. By lumping observed events to states that correspond to internal nodes in the taxonomy, we allow more compact models that are easier to understand and visualize, at the expense of a decrease in the data likelihood. We formally define and characterize our problem, and we propose a scalable search method for finding a good trade-off between two conflicting goals: maximizing the data likelihood, and minimizing the model complexity. We implement these ideas in Taxomo, a taxonomy-driven modeler, which we apply in two different domains, query-log mining and mining of moving-object trajectories. The empirical evaluation confirms the feasibility and usefulness of our approach.  相似文献   

15.
基于纹理谱的纹理分割方法   总被引:15,自引:0,他引:15       下载免费PDF全文
纹理分析是图象处理中的一个重要领域。本文提出一种基于纹理谱特征分割纹理图象的方法。它首次将纹理谱特征与区域生长算法结合起来,从而实现了无监督的纹理分割。纹理谱特征具有对方向性敏感等优点,基于纹理谱的纹理图象分割取得了良好效果。  相似文献   

16.
FlexRay通信协议的总线周期优化   总被引:2,自引:1,他引:1  
FlexRay是一种具有高带宽、确定性和可靠性的车载网络通信协议,能够满足未来先进汽车高速控制的需要。为了更好地在汽车上使用FlexRay总线协议,根据特定的应用场合,对总线上消息的时间分析和总线周期的优化配置十分重要。对FlexRay周期调度进行了研究,基于二维装箱法和NSGA-II算法提出对在静态段和动态段传输的FlexRay消息进行时间分析的方法,最后将其整合成一个总线优化配置算法。通过优化算法对FlexRay总线周期进行配置,显著提高了 FlexRay总线的带宽利用率,在实际应用中增强了系统的时间  相似文献   

17.
In comparative genomics, the first step of sequence analysis is usually to decompose two or more genomes into syntenic blocks that are segments of homologous chromosomes. For the reliable recovery of syntenic blocks, noise and ambiguities in the genomic maps need to be removed first. Maximal Strip Recovery (MSR) is an optimization problem proposed by Zheng, Zhu, and Sankoff for reliably recovering syntenic blocks from genomic maps in the midst of noise and ambiguities. Given d genomic maps as sequences of gene markers, the objective of MSR-d is to find d subsequences, one subsequence of each genomic map, such that the total length of syntenic blocks in these subsequences is maximized. For any constant d≥2, a polynomial-time 2d-approximation for MSR-d was previously known. In this paper, we show that for any d≥2, MSR-d is APX-hard, even for the most basic version of the problem in which all gene markers are distinct and appear in positive orientation in each genomic map. Moreover, we provide the first explicit lower bounds on approximating MSR-d for all d≥2. In particular, we show that MSR-d is NP-hard to approximate within Ω(d/logd). From the other direction, we show that the previous 2d-approximation for MSR-d can be optimized into a polynomial-time algorithm even if d is not a constant but is part of the input. We then extend our inapproximability results to several related problems including CMSR-d, δ-gap-MSR-d, and δ-gap-CMSR-d.  相似文献   

18.
Term equations involving individual and sequence variables and sequence function symbols are studied. Function symbols can have either fixed or flexible arity. A sequence variable can be instantiated by any finite sequence of terms. A sequence function abbreviates a finite sequence of functions all having the same argument lists. It is proved that solvability of systems of equations of this form is decidable. A new unification procedure that enumerates a complete almost minimal set of solutions is presented, together with variations for special cases. The procedure terminates if the solution set is finite. Applications in various areas of artificial intelligence, symbolic computation, and programming are discussed.  相似文献   

19.
混合积判断线段相交的方法分析   总被引:2,自引:0,他引:2  
判断线段相交是许多算法都要解决的问题,介绍了一种利用混合积来判断线段相交的方法,并给出了这种方法的严密数学基础和基于V isual C++的编程实现过程。对其中所用到的定理给予严密的数学证明。该方法形式简单,运算量较小,适用范围广,乃判断线段相交的理想方法之一。  相似文献   

20.
DNA序列分割作为DNA序列分析中的一部分正受到越来越多人的关注,引入数据挖掘技术是提高DNA序列分割有效性的一个重要途径。全面综述了目前数据挖掘技术在DNA序列分割中的应用,最后指出了尚待解决的问题。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号