首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which, if implemented naïvely, can take time and space up to O(2|Σ|4|V|n2) and O(2|Σ|4|V|n), respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O(3|V|n2) time and O(2|V|n) space in the worst case. If |V|=O(logn) we propose another algorithm with time complexity O(2|V|log|V|n2). The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of 2|Σ|=400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.  相似文献   

2.
Constraints have played a central role in cp because they capture key substructures of a problem and efficiently exploit them to boost inference. This paper intends to do the same thing for search, proposing constraint-centered heuristics which guide the exploration of the search space toward areas that are likely to contain a high number of solutions. We first propose new search heuristics based on solution counting information at the level of individual constraints. We then describe efficient algorithms to evaluate the number of solutions of two important families of constraints: occurrence counting constraints, such as alldifferent, and sequencing constraints, such as regular. In both cases we take advantage of existing filtering algorithms to speed up the evaluation. Experimental results on benchmark problems show the effectiveness of our approach. An earlier version of this paper appeared as [24].  相似文献   

3.
We study 4 problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem using either an improved tabulation technique of an existing algorithm or by combining known algorithms in a new way.  相似文献   

4.
Signature-based intrusion detection is required to inspect network traffic at wire-speed. Matching packet payloads against patterns specified with regular expression is a computation intensive task. Hence, the design of hardware accelerator to speed up regular expression matching has been an active research area. A systematic approach to detect regular expression is based on finite automaton. The space-time trade-off between deterministic finite automaton (DFA) and non-deterministic finite automaton (NFA) is well-known. DFA can offer constant throughput but it may suffer from the state explosion problem. Hence, implementation of DFA for large pattern sets on embedded device with limited on-chip memory may not be viable. NFA requires linear space but the throughput can be very low. Implementations of NFA with hardwired circuits can overcome the speed deficiency by exploiting the massive parallelism offered by dedicated hardware circuitries, but this approach does not support efficient dynamic updates. In this paper, we shall present a memory-based architecture for the implementation of NFA to speed up regular expression matching for signature-based intrusion detection. The proposed method supports dynamic updates and offers constant throughput so that it can be used to supplement the existing DFA-based methods in handling large pattern sets.  相似文献   

5.
Previous studies on mining sequential patterns have focused on temporal patterns specified by some form of propositional temporal logic. However, there are some interesting sequential patterns, such as the multi-sequential patterns, whose specification needs a more expressive formalism, the first-order temporal logic. Multi-sequential patterns appear in different application contexts, for instance in spatial census data mining, which is the target application of the study developed in this paper. We extend a well-known user-controlled tool, based on regular expressions constraints, to the multi-sequential pattern context. This specification tool enables the incorporation of user focus into the mining process. We present MSP-Miner, an Apriori-based algorithm to discover all frequent multi-sequential patterns satisfying a user-specified regular expression constraint.  相似文献   

6.
Martin Richards 《Software》1979,9(7):527-534
This paper describes a simple compiler and interpreter for a finite state machine recognizer of patterns represented by regular expressions. The algorithm is designed to be compact and to require little work space.  相似文献   

7.
Distributed constraint satisfaction problems (DisCSPs) are composed of agents, each holding its own variables, that are connected by constraints to variables of other agents. Due to the distributed nature of the problem, message delay can have unexpected effects on the behavior of distributed search algorithms on DisCSPs. This has been recently shown in experimental studies of asynchronous backtracking algorithms (Bejar et al., Artif. Intell., 161:117–148, 2005; Silaghi and Faltings, Artif. Intell., 161:25–54, 2005). To evaluate the impact of message delay on the run of DisCSP search algorithms, a model for distributed performance measures is presented. The model counts the number of non concurrent constraints checks, to arrive at a solution, as a non concurrent measure of distributed computation. A simpler version measures distributed computation cost by the non-concurrent number of steps of computation. An algorithm for computing these distributed measures of computational effort is described. The realization of the model for measuring performance of distributed search algorithms is a simulator which includes the cost of message delays. Two families of distributed search algorithms on DisCSPs are investigated. Algorithms that run a single search process, and multiple search processes algorithms. The two families of algorithms are described and associated with existing algorithms. The performance of three representative algorithms of these two families is measured on randomly generated instances of DisCSPs with delayed messages. The delay of messages is found to have a strong negative effect on single search process algorithms, whether synchronous or asynchronous. Multi search process algorithms, on the other hand, are affected very lightly by message delay.  相似文献   

8.
The reconstruction of founder genetic sequences of a population is a relevant issue in evolutionary biology research. The problem consists in finding a biologically plausible set of genetic sequences (founders), which can be recombined to obtain the genetic sequences of the individuals of a given population. The reconstruction of these sequences can be modelled as a combinatorial optimisation problem in which one has to find a set of genetic sequences such that the individuals of the population under study can be obtained by recombining founder sequences minimising the number of recombinations. This problem is called the founder sequence reconstruction problem. Solving this problem can contribute to research in understanding the origins of specific genotypic traits. In this paper, we present large neighbourhood search algorithms to tackle this problem. The proposed algorithms combine a stochastic local search with a branch-and-bound algorithm devoted to neighbourhood exploration. The developed algorithms are thoroughly evaluated on three different benchmark sets and they establish the new state of the art for realistic problem instances.  相似文献   

9.
We present parallel algorithms for constructing and maintaining balancedm-way search trees. These parallel algorithms have time complexity O(1) for ann processors configuration. The formal correctness of the algorithms is given in detail.  相似文献   

10.
在化学化工研究中,Markush结构越来越多地用于申请专利保护和报告新发明成果。随着化学化工专利数目的大量增加,对化学化工专利文献的存储和检索的需求日益殷切。为了建立可检索的化学化工专利数据库,迫切需要有效地解决化学化工专利文献中所涉及的Markush结构和其他非确定结构的计算机表示与检索问题。而检索算法是与结构表达紧密联系的。本文在总结前人工作基础上,提出了广义的非确定性化学结构(包括Markush结构)的计算机表达方法。与之相对应的非确定结构的结构识别算法已经实现并在优化中。  相似文献   

11.
12.
Optimal algorithms for the online time series search problem   总被引:1,自引:0,他引:1  
In the problem of online time series search introduced by El-Yaniv et al. (2001) [1], a player observes prices one by one over time and shall select exactly one of the prices on its arrival without the knowledge of future prices, aiming to maximize the selected price. In this paper, we extend the problem by introducing profit function. Considering two cases where the search duration is either known or unknown beforehand, we propose two optimal deterministic algorithms respectively. The models and results in this paper generalize those of El-Yaniv et al. (2001) [1].  相似文献   

13.
一种十字形运动搜索算法   总被引:2,自引:0,他引:2  
近几年,虽然运动估计算法有了很多种的快速算法,但是,运动搜索巨大的计算量依然是视频压缩速率的瓶颈,本文针对运动矢量的分布特点,提出了一种新的运动搜索算法,算法不仅结构简单,而且测试结果表明,该算法比原有DS9(dia-mondsearch)算法在搜索点数和图像质量方面有较大的提高,最好时的搜索点数只有DS算法的3/4。  相似文献   

14.
Database systems are becoming increasingly popular for answering queries. Partial-match search queries are an important class of queries in such a system. Several storage structures have been proposed to answer these queries efficiently. The BD tree is an example of such a storage structure. A previous study indicated that the k-d tree performance is better than that of the BD tree for partial-match search queries. A recent paper reported some improved algorithms. However, it is unclear whether the improved algorithms show the BD tree in a favourable light for partial-match search queries. This paper explores the performance of these algorithms and compares their performance to that of the k-d tree. Since the BD tree construction process uses some heuristics to make it a better balanced tree, this paper also evaluates the effect of these heuristics on the partial-match search algorithms. The major conclusions of this study are that the BD tree performance for partial-match search is better than that of the k-d tree when an improved algorithm is used for partial-match search, and only the DZ expression rearrangement heuristic has substantial effect on partial-match search performance.  相似文献   

15.
This paper presents a new polygonal approximation method using ant colony search algorithm. The problem is represented by a directed graph such that the objective of the original problem becomes to find the shortest closed circuit on the graph under the problem-specific constraints. A number of artificial ants are distributed on the graph and communicate with one another through the pheromone trails which are a form of the long-term memory guiding the future exploration of the graph. The important properties of the proposed method are thoroughly investigated. The performance of the proposed method as compared to those of the genetic-based and the tabu search-based approaches is very promising.  相似文献   

16.
Backtrack algorithms are applicable to a wide variety of problems. An efficient but readable version of such an algorithm is presented and its use in the problem of finding the maximal common subgraph of two graphs is described. Techniques available in this application area for ordering and pruning the backtrack search are discussed. This algorithm has been used successfully as a component of a program for analysing chemical reactions and enumerating the bond changes which have taken place.  相似文献   

17.
In this paper we consider the problem of computing an mRNA sequence of maximal similarity for a given mRNA of secondary structure constraints, introduced by Backofen et al. in [R. Backofen, N.S. Narayanaswamy, F. Swidan, On the complexity of protein similarity search under mRNA structure constraints, in: Proceedings of the Annual Symposium of Theoretical Aspects of Computer Science, in: LNCS, vol. 2285, Springer, 2002, pp. 274-286] denoted as the MRSO problem. The problem is known to be NP-complete for planar associated implied structure graphs of vertex degree at most 3. In [G. Blin, G. Fertin, D. Hermelin, S. Vialette, Fixed-parameter algorithms for protein similarity search under mRNA structure constraints, in: Proceedings of Graph-Theoretical Concepts in Computer Science, in: LNCS, vol. 3787, Springer, 2005, pp. 271-282] a first polynomial dynamic programming algorithms for MRSO on implied structure graphs with maximum vertex degree 3 of bounded cut-width is shown.We give a simple but much more general polynomial dynamic programming solution for the MRSO problem for associated implied structure graphs of bounded clique-width. Our result implies that MRSO is polynomial for graphs of bounded tree-width, co-graphs, P4-sparse graphs, and distance hereditary graphs. Further we conclude that the problem of comparing two solutions for MRSO is hard for the class , which is defined as the set of problems which can be solved in polynomial time with a number of parallel queries to an oracle in NP.  相似文献   

18.
Training neural networks (NNs) is a complex task of great importance in the supervised learning area. However, performance of the NNs is mostly dependent on the success of training process, and therefore the training algorithm. This paper addresses the application of harmony search algorithms for the supervised training of feed-forward (FF) type NNs, which are frequently used for classification problems. In this paper, five different variants of harmony search algorithm are studied by giving special attention to Self-adaptive Global Best Harmony Search (SGHS) algorithm. A structure suitable to data representation of NNs is adapted to SGHS algorithm. The technique is empirically tested and verified by training NNs on six benchmark classification problems and a real-world problem. Among these benchmark problems two of them have binary classes and remaining four are n-ary classification problems. Real-world problem is related to the classification of most frequently encountered quality defect in a major textile company in Turkey. Overall training time, sum of squared errors, training and testing accuracies of SGHS algorithm, is compared with the other harmony search algorithms and the most widely used standard back-propagation (BP) algorithm. The experiments presented that the SGHS algorithm lends itself very well to the training of NNs and also highly competitive with the compared methods in terms of classification accuracy.  相似文献   

19.
A new improved forward floating selection (IFFS) algorithm for selecting a subset of features is presented. Our proposed algorithm improves the state-of-the-art sequential forward floating selection algorithm. The improvement is to add an additional search step called “replacing the weak feature” to check whether removing any feature in the currently selected feature subset and adding a new one at each sequential step can improve the current feature subset. Our method provides the optimal or quasi-optimal (close to optimal) solutions for many selected subsets and requires significantly less computational load than optimal feature selection algorithms. Our experimental results for four different databases demonstrate that our algorithm consistently selects better subsets than other suboptimal feature selection algorithms do, especially when the original number of features of the database is large.  相似文献   

20.
Problems with local ambiguities in handwritten mathematical expressions (MEs) are often resolved at global level. Therefore, keeping local ambiguities is desirable for high accuracy, with a hope that they may be resolved by later global analyses. We propose a layered search framework for handwritten ME recognition. From given handwritten input strokes, ME structures are expanded by adding symbol hypotheses one by one, representing ambiguities of symbol identities and spatial relationships as numbers of branches in the expansion. We also propose a novel heuristic predicting how likely the set of remaining input strokes forms valid spatial relationships with the current partially interpreted structure. Further complexity reduction is achieved by delaying the symbol identity decision. The elegance of our approach is that the search result would be unchanged even if we prune out unpromising branches of the search. Therefore, we can examine a much larger number of local hypotheses with a limited amount of computing resource in making global level decisions. The experimental evaluation shows promising results of the efficiency of the proposed approach and the performance of our system, which results from the system's capacity to examine a large number of possibilities.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号