Similar documents
1.
We investigate the complexity and expressive power of a spatial logic for reasoning about graphs. This logic was previously introduced by Cardelli, Gardner and Ghelli, and provides the simplest setting in which to explore such results for spatial logics. We study several forms of the logic: the logic with and without recursion, and with either an exponential or a linear version of the basic composition operator. We study the combined complexity and the expressive power of the four combinations. We prove that, without recursion, the linear and exponential versions of the logic correspond to significant fragments of first-order (FO) and monadic second-order (MSO) logic; the two versions are actually equivalent to FO and MSO on graphs representing strings. However, when the two versions are enriched with μ-style recursion, their expressive power increases sharply. Both are able to express PSPACE-complete problems, although their combined complexity and data complexity still belong to PSPACE.

2.
In a strongly typed system supporting user-defined data abstractions, the designer of a data abstraction ought to choose its operations carefully. If the chosen operation set is not expressive enough, it may be impossible or inconvenient to implement certain useful functions on the values of the data abstraction. In this paper, two properties of the operation set of a data abstraction, expressive completeness and expressive richness, are defined to characterize formally the expressive power of the operation set. For an expressively complete data abstraction, the operation set is powerful enough to implement, in principle, all computable properties of the values, whereas for an expressively rich data abstraction, the operation set can be used to implement the properties in a ‘simple and natural’ fashion. It is shown that if the equality predicate on the values of a data abstraction can be implemented in terms of its operations, then the data abstraction is expressively complete. For expressive richness, we identify a finite set of functions that represent certain basic kinds of manipulations of the values, and require them to be implementable in terms of the operation set as ‘straight line’ programs. The relation between these formal properties and the intuitive notions is considered. We argue that it is important to consider both expressive completeness and expressive richness when designing the operation set of a data abstraction. Practical applications of the expressiveness properties introduced are also discussed.

3.
Median strings     
The concept of ‘median string’ over a set of strings is introduced in this paper. The set median is defined as the element of the set that has the smallest sum of distances (with respect to an arbitrary measure) from the other elements, and the (generalized) median is a hypothetical, artificially constructed string that has the smallest sum of distances from all elements of the given set. The applicability of median strings to the recognition of garbled strings is demonstrated; in most cases they yield higher accuracy than the multiple-similarity and k-nearest-neighbour methods.
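As a rough illustration of the set median defined above, the sketch below picks, from a finite set, the string with the smallest sum of distances to all members. The abstract allows an arbitrary distance measure; unit-cost Levenshtein distance is assumed here, and the function names are hypothetical. The generalized median, which may lie outside the set, is not computed.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def set_median(strings: Sequence[str]) -> str:
    """Element of `strings` with the smallest sum of distances to all elements.

    Including the element itself adds 0, so this equals the sum over the
    other elements. Ties go to the earliest element.
    """
    if not strings:
        raise ValueError("set median of an empty set is undefined")
    return min(strings, key=lambda s: sum(levenshtein(s, t) for t in strings))
```

For example, for ["kitten", "sitten", "sitting", "mitten"] the distance sums are 5, 4, 8 and 5, so "sitten" is the set median.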

4.
The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet Σ. In the parameterized pattern matching model, a consistent renaming of symbols from Σ is allowed in a match. The parameterized matching paradigm has proven useful in software engineering, computer vision, and other applications. In classical pattern matching, both the text and the pattern are strings. Applications such as searching in XML or in hypertext require searching for strings in non-linear structures such as trees or graphs. There has been work in the literature on exact and approximate parameterized matching, as well as on exact and approximate string matching on non-linear structures. In this paper we explore parameterized matching in non-linear structures. We prove that exact parameterized matching on trees can be computed in linear time for alphabets in an O(n)-size integer range, and in O(n log m) time in general, where n is the tree size and m the pattern length. These bounds are optimal in the comparison model. We also show that exact parameterized matching on directed acyclic graphs (DAGs) is NP-complete.
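The notion of a parameterized match can be sketched for the simplest case of two plain strings (not the trees or DAGs the paper addresses): a pattern p-matches a text window when some one-to-one renaming of symbols maps the pattern onto the window. The sketch assumes every symbol is renamable and uses a naive O(nm) scan; it is not the paper's tree algorithm, and the function names are hypothetical.

```python
def p_match(pattern: str, window: str) -> bool:
    """True if a one-to-one renaming of symbols maps `pattern` onto `window`."""
    if len(pattern) != len(window):
        return False
    forward: dict[str, str] = {}   # pattern symbol -> window symbol
    backward: dict[str, str] = {}  # window symbol -> pattern symbol
    for p, w in zip(pattern, window):
        # setdefault records the first pairing; a later conflict breaks consistency
        if forward.setdefault(p, w) != w or backward.setdefault(w, p) != p:
            return False
    return True


def p_occurrences(pattern: str, text: str) -> list[int]:
    """Start positions of all parameterized matches (naive O(nm) scan)."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if p_match(pattern, text[i:i + m])]
```

For instance, searching for "aab" in "xxyzzw" reports positions 0 and 3: the windows "xxy" and "zzw" have the same shape as the pattern, while "xyz" and "yzz" do not. The backward map enforces that two different pattern symbols are never renamed to the same text symbol.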

5.
A string similarity join finds similar pairs between two collections of strings. Many applications, e.g., data integration and cleaning, can benefit significantly from an efficient string similarity join algorithm. In this paper, we study string similarity joins with edit-distance constraints. Existing methods usually employ a filter-and-refine framework and suffer from the following limitations: (1) they are inefficient for data sets with short strings (average string length not larger than 30); (2) they involve large indexes; (3) supporting dynamic updates of the data sets is expensive. To address these problems, we propose a novel method called trie-join, which can generate results efficiently with small indexes. We use a trie structure to index the strings and exploit it to find similar string pairs efficiently based on subtrie pruning. We devise efficient trie-join algorithms and pruning techniques to achieve high performance. Our method can easily be extended to support dynamic updates of data sets efficiently. We conducted extensive experiments on four real data sets. Experimental results show that our algorithms outperform state-of-the-art methods by an order of magnitude on data sets with short strings.

6.
An arc-annotated string is a string of characters, called bases, augmented with a set of pairs, called arcs, each connecting two bases. Given arc-annotated strings P and Q, the arc-preserving subsequence problem is to determine whether P can be obtained from Q by deleting bases from Q. Whenever a base is deleted, any arc with an endpoint in that base is also deleted. Arc-annotated strings where the arcs are “nested” are a natural model of RNA molecules that captures both their primary and secondary structure. The arc-preserving subsequence problem for nested arc-annotated strings is a basic primitive for investigating the function of RNA molecules. Gramm et al. (ACM Trans. Algorithms 2(1): 44–65, 2006) gave an algorithm for this problem using O(nm) time and space, where m and n are the lengths of P and Q, respectively. In this paper we present a new algorithm using O(nm) time and O(n + m) space, thereby matching the previous time bound while reducing the space from quadratic to linear. This is essential for processing large RNA molecules, where space is likely to be a bottleneck. To obtain our result we introduce several novel ideas that may be of independent interest for related problems on arc-annotated strings.

7.
Situational data integration with data services and nested table
Situational data integration is often ad hoc, involves the active participation of business users, and requires just-in-time treatment. Agility and end-user programming are important. The paper presents a spreadsheet-like programming environment called Mashroom, which offers the agility and expressive power required to support situational data integration by non-professional users. In Mashroom, various data sources are encapsulated as data services, with nested tables as their unified data model for both internal processing and external use. Users can operate on the nested tables interactively. Mashroom also supports the basic control-flow patterns. The expressive power of Mashroom is analyzed and proved to be richer than that of N1NF relational algebra. All XQuery expressions can be mapped to Mashroom operations and formulas. Experiments reveal the potential of Mashroom for situational data integration.

8.
This paper proposes a sorting hardware module that can directly handle variable-length character strings. It gives a pipelined heap sort algorithm for a set of variable-length character strings and a VLSI architecture that implements this algorithm. The hardware consists of a specially designed single-chip module and an external memory bank. This special chip module is called a V-Sort Engine Core. The number of words in the external memory bank should be larger than the total length of the strings to be sorted. A hardware module that can sort at most 2^L strings uses a V-Sort Engine Core consisting of L levels. The i-th level of a V-Sort Engine Core has a logic cell and a memory bank with 2^i words. Each word consists of three fields and a mark bit, i.e., level number, character, and path number. A triple (j, c, i) of these field values denotes the (j+1)-st character c of the i-th input string. Concurrent execution of the external memory bank and all level logic cells of the V-Sort Engine Core allows the hardware module to receive a sequence of strings sequentially, character by character, and to begin the sequential output of the sorted result immediately after receiving the last input character. It requires no extra time beyond that needed for sequential data transfer to and from the module.

9.
Many pattern recognition algorithms are based on nearest-neighbour search and use the well-known edit distance, whose primitive edit costs are usually fixed in advance. In this article, we aim at learning an unbiased stochastic edit distance in the form of a finite-state transducer from a corpus of (input, output) pairs of strings. Contrary to other standard methods, which generally use the Expectation-Maximisation algorithm, our algorithm learns a transducer independently of the marginal probability distribution of the input strings. Such an unbiased approach requires optimising the parameters of a conditional transducer instead of a joint one. We apply our new model to handwritten digit recognition and show, through a large series of experiments, that it always outperforms the standard edit distance.

10.
Over the past few years, several works have derived string kernels from probability distributions. For instance, the Fisher kernel uses a generative model M (e.g., a hidden Markov model) and compares two strings according to how they are generated by M. On the other hand, marginalized kernels allow the computation of the joint similarity between two instances by summing conditional probabilities. In this paper, we adapt this approach to edit-distance-based conditional distributions and present a way to learn a new string edit kernel. We show that the practical computation of such a kernel between two strings x and x′ built from an alphabet Σ requires (i) learning edit probabilities in the form of the parameters of a stochastic state machine and (ii) calculating an infinite sum over Σ* by resorting to the intersection of probabilistic automata, as done for rational kernels. We show on a handwritten character recognition task that our new kernel outperforms not only state-of-the-art string kernels and string edit kernels but also the standard edit distance used by a neighbourhood-based classifier.

11.
The edit distance problem is a classical, fundamental problem in computer science in general, and in combinatorial pattern matching in particular. The standard dynamic programming solution for this problem computes the edit distance between a pair of strings of total length O(N) in O(N^2) time. To date, this quadratic upper bound has never been substantially improved for general strings. However, there are known techniques for breaking this bound when the strings are known to compress well under a particular compression scheme. The basic idea is to first compress the strings and then compute the edit distance between the compressed strings. As it turns out, practically all known o(N^2) edit-distance algorithms work, in some sense, under this same paradigm. It is therefore natural to ask whether there is a single edit-distance algorithm that works for strings compressed under any compression scheme. A rephrasing of this question is to ask whether a single algorithm can exploit the compressibility properties of strings under any compression method, even if each string is compressed using a different compression. In this paper we set out to answer this question by using straight-line programs. These provide a generic platform for representing many popular compression schemes, including the LZ family, Run-Length Encoding, Byte-Pair Encoding, and dictionary methods. For two strings of total length N having straight-line program representations of total size n, we present an algorithm running in O(nN lg(N/n)) time for computing the edit distance of these two strings under any rational scoring function, and an O(n^(2/3) N^(4/3)) time algorithm for arbitrary scoring functions. Our new result, while providing a speed-up for compressible strings, does not exceed the quadratic time bound even in the worst case.
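For reference, the standard quadratic dynamic program that the abstract takes as its baseline can be sketched as follows; it runs in O(|a|·|b|) time. The per-operation costs are illustrative parameters (an assumption, not the paper's scoring-function interface), and the compressed, straight-line-program algorithm itself is not shown.

```python
def edit_distance(a: str, b: str, insert: int = 1, delete: int = 1,
                  substitute: int = 1) -> int:
    """Classic O(len(a) * len(b)) dynamic program for weighted edit distance."""
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn the prefix a[:i] into the prefix b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * delete      # delete all of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * insert      # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            d[i][j] = min(d[i - 1][j] + delete,
                          d[i][j - 1] + insert,
                          d[i - 1][j - 1] + (0 if same else substitute))
    return d[n][m]
```

With unit costs, edit_distance("kitten", "sitting") is 3; the full (n+1)×(m+1) table is what makes the baseline quadratic in both time and space.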

12.
A shuffle of two strings is formed by interleaving their characters into a new string while keeping the characters of each string in order. A string is a square if it is a shuffle of two identical strings. There is a known polynomial-time dynamic programming algorithm to determine whether a given string z is a shuffle of two given strings x and y; however, it has been an open question whether there is a polynomial-time algorithm to determine whether a given string z is a square. We resolve this question by proving that the problem is NP-complete via a many-one reduction from 3-Partition.
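The polynomial-time shuffle test mentioned above is commonly written as the dynamic program sketched below (table and function names are hypothetical). For contrast, a brute-force square test is included: it tries every way of splitting the positions in half and is exponential, which is consistent with, but not part of, the NP-completeness result.

```python
from itertools import combinations


def is_shuffle(x: str, y: str, z: str) -> bool:
    """Polynomial DP: can z be formed by interleaving x and y in order?"""
    n, m = len(x), len(y)
    if n + m != len(z):
        return False
    # ok[i][j]: z[:i + j] is a shuffle of x[:i] and y[:j]
    ok = [[False] * (m + 1) for _ in range(n + 1)]
    ok[0][0] = True
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and ok[i - 1][j] and x[i - 1] == z[i + j - 1]:
                ok[i][j] = True   # last character of z[:i + j] taken from x
            if j > 0 and ok[i][j - 1] and y[j - 1] == z[i + j - 1]:
                ok[i][j] = True   # last character of z[:i + j] taken from y
    return ok[n][m]


def is_square(z: str) -> bool:
    """Brute force (exponential): is z a shuffle of some string with itself?"""
    if len(z) % 2:
        return False
    half = len(z) // 2
    for chosen in combinations(range(len(z)), half):
        picked = set(chosen)
        first = "".join(z[k] for k in chosen)
        second = "".join(z[k] for k in range(len(z)) if k not in picked)
        if first == second:
            return True
    return False
```

For example, "aabb" is a square (a shuffle of "ab" with "ab"), while "abba" is not.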

13.
Abstract State Machines (ASMs) were introduced by Yuri Gurevich in 1985 as “a computation model that is more powerful and more universal than standard computation models”. ASMs have gained much attention as a specification method, which is extremely flexible because any mathematical structure may serve as a state. Gurevich characterized the expressive power of ASMs in terms of intuitively convincing postulates.

14.
We present an efficient data structure for finding the longest prefix of a query string q in a dynamic database of strings. When the database strings are prefixes of IP addresses, this is the IP-lookup problem. Our data structure is I/O efficient. It supports a query with a string q using $O(\log_{B}(n)+\frac{|q|}{B})$ I/O operations, where B is the size of a disk block. It also supports insertion and deletion of a string q with the same number of I/Os. The data structure requires O(n/B) blocks, and the running time of each operation is $O(B\log_{B}(n)+|q|)$.
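The query semantics (the longest database string that is a prefix of q) can be illustrated with an ordinary in-memory trie. This is only a sketch: it is not the I/O-efficient, block-based structure of the abstract, the class and method names are hypothetical, and bit strings stand in for IP prefixes.

```python
from typing import Optional


class _Node:
    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children: dict[str, "_Node"] = {}
        self.terminal = False  # True if a database string ends here


class PrefixTrie:
    """Dynamic set of strings supporting longest-prefix queries."""

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, s: str) -> None:
        node = self.root
        for ch in s:
            node = node.children.setdefault(ch, _Node())
        node.terminal = True

    def delete(self, s: str) -> None:
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return  # s was never inserted
        node.terminal = False  # empty branches are not pruned in this sketch

    def longest_prefix(self, q: str) -> Optional[str]:
        """Longest stored string that is a prefix of q, or None."""
        node = self.root
        best = "" if node.terminal else None
        for i, ch in enumerate(q):
            node = node.children.get(ch)
            if node is None:
                break
            if node.terminal:
                best = q[: i + 1]
        return best
```

With the prefixes "10", "1011" and "0" stored, the query "101101" returns "1011"; after deleting "1011" the same query falls back to "10". Each operation walks at most |q| nodes, but node accesses are not grouped into disk blocks as in the paper.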

15.
We consider a variant of the classical Longest Common Subsequence problem called Doubly-Constrained Longest Common Subsequence (DC-LCS). Given two strings s1 and s2 over an alphabet Σ, a set Cs of strings, and a function Co: Σ → ℕ, the DC-LCS problem consists of finding the longest common subsequence s of s1 and s2 such that s is a supersequence of all the strings in Cs and the number of occurrences in s of each symbol σ ∈ Σ is upper bounded by Co(σ). The DC-LCS problem provides a clear mathematical formulation of a sequence comparison problem in computational biology and generalizes two other constrained variants of the LCS problem previously introduced in the literature: the Constrained LCS and the Repetition-Free LCS. We present two results for the DC-LCS problem. First, we give a fixed-parameter algorithm, where the parameter is the length of the solution, which also applies to the more specialized problems. Second, we prove a parameterized hardness result for the Constrained LCS problem when the parameters are the number of constraint strings (|Cs|) and the size of the alphabet Σ. This hardness result also implies the parameterized hardness of the DC-LCS problem (with the same parameters) and its NP-hardness when the size of the alphabet is constant.
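To make the three DC-LCS conditions concrete, the sketch below checks whether a candidate string satisfies them and then finds a longest feasible one by exhaustive search over subsequences of s1. The search is exponential and only meant for tiny inputs; it is not the fixed-parameter algorithm of the abstract. The bound Co is modelled as a Python callable, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Iterable, Optional


def is_subsequence(sub: str, s: str) -> bool:
    """True if sub can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in sub)  # `in` advances the shared iterator


def is_dc_feasible(s: str, s1: str, s2: str, constraints: Iterable[str],
                   bound: Callable[[str], int]) -> bool:
    """Check the DC-LCS conditions for a candidate s."""
    if not (is_subsequence(s, s1) and is_subsequence(s, s2)):
        return False  # must be a common subsequence of s1 and s2
    if not all(is_subsequence(c, s) for c in constraints):
        return False  # must be a supersequence of every constraint string
    counts = Counter(s)
    return all(counts[ch] <= bound(ch) for ch in counts)  # occurrence bounds Co


def dc_lcs_bruteforce(s1: str, s2: str, constraints: list[str],
                      bound: Callable[[str], int]) -> Optional[str]:
    """Longest feasible string, trying subsequences of s1 from longest down."""
    for length in range(len(s1), -1, -1):
        for positions in combinations(range(len(s1)), length):
            candidate = "".join(s1[i] for i in positions)
            if is_dc_feasible(candidate, s1, s2, constraints, bound):
                return candidate
    return None
```

For s1 = "aab" and s2 = "aba" with no constraint strings and every symbol bounded by 1, the answer is "ab"; requiring the constraint string "aa" and raising the bound to 2 changes it to "aa".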

16.
The consensus (string) problem is that of finding a representative string, called a consensus, of a given set S of strings. In this paper we deal with consensus problems that consider both distance sum and radius, where the distance sum is the sum of the (Hamming) distances from the strings in S to the consensus and the radius is the longest (Hamming) distance from a string in S to the consensus. Although there have been results considering either distance sum or radius, to the best of our knowledge there have been none considering both. We present the first algorithms for two consensus problems considering both distance sum and radius for three strings. One problem is to find an optimal consensus minimizing both distance sum and radius. The other is to find a bounded consensus such that the distance sum is at most s and the radius is at most r, for given constants s and r. Our algorithms are based on a characterization of the lower bounds of distance sum and radius, which lets them solve the problems efficiently: both run in linear time.
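The two objectives can be made concrete with a small brute-force sketch for equal-length strings under Hamming distance. It is exponential in the string length and is not the linear-time algorithm of the abstract; the names are hypothetical. It relies on one simple observation: a character that appears in no input string at a given position can be replaced by one that does without increasing any distance, so only column characters need to be tried.

```python
from itertools import product
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def sum_and_radius(candidate: str, strings: Sequence[str]) -> tuple[int, int]:
    """(distance sum, radius) of a candidate consensus under Hamming distance."""
    distances = [hamming(candidate, s) for s in strings]
    return sum(distances), max(distances)


def bounded_consensus(strings: Sequence[str], s_max: int,
                      r_max: int) -> Optional[str]:
    """Some consensus with distance sum <= s_max and radius <= r_max, or None."""
    if len({len(s) for s in strings}) != 1:
        raise ValueError("need a non-empty set of equal-length strings")
    # A character absent from a column can be swapped for one present in it
    # without increasing any distance, so only column characters are tried.
    columns = [sorted(set(col)) for col in zip(*strings)]
    for chars in product(*columns):
        candidate = "".join(chars)
        total, radius = sum_and_radius(candidate, strings)
        if total <= s_max and radius <= r_max:
            return candidate
    return None
```

For S = {"aaa", "abb", "bab"}, the candidate "aaa" has distance sum 4 and radius 2, whereas "aab" achieves sum 3 and radius 1; no consensus has a distance sum below 3.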

17.
It is a trivial observation that every decidable set has strings of length n with Kolmogorov complexity log n + O(1) if it has any strings of length n at all. Things become much more interesting when one asks whether a similar property holds for resource-bounded Kolmogorov complexity. This is the question considered here: can a feasible set A avoid accepting strings of low resource-bounded Kolmogorov complexity while still accepting some (or many) strings of length n? More specifically, this paper deals with two notions of resource-bounded Kolmogorov complexity: Kt and KNt. The measure Kt was defined by Levin more than three decades ago and has been studied extensively since then. The measure KNt is a nondeterministic analog of Kt. For all strings x, Kt(x) ≥ KNt(x); the two measures are polynomially related if and only if NEXP ⊆ EXP/poly (Allender et al. in J. Comput. Syst. Sci. 77:14–40, 2011). Many longstanding open questions in complexity theory boil down to the question of whether there are sets in P that avoid all strings of low Kt complexity. For example, the EXP vs. ZPP question is equivalent to (one version of) the question of whether avoiding simple strings is difficult: EXP = ZPP if and only if there exist ε > 0 and a “dense” set in P having no strings x with Kt(x) ≤ |x|^ε (Allender et al. in SIAM J. Comput. 35:1467–1493, 2006). Surprisingly, we are able to show unconditionally that avoiding simple strings (in the sense of KNt complexity) is difficult: every dense set in NP ∩ coNP contains infinitely many strings x such that KNt(x) ≤ |x|^ε for every ε > 0. The proof does not relativize. As an application, we show that if E = NE, then accepting paths for nondeterministic exponential-time machines can be found somewhat more quickly than the brute-force upper bound, provided there are many accepting paths.

18.
We give an account of the basic determinants of the courses of computation of the Infinite Time Turing Machine model of Hamkins and Kidder, a model of computation that allows transfinitely many steps of computation and therefore may accept and output infinite strings of bits. We provide, inter alia, a Normal Form Theorem and a characterisation of which ordinals start gaps in the halting times of such machines.

19.
We consider the problem of finding short strings that contain all permutations of order k over an alphabet of size n, with k ≤ n. We show constructively that k(n − 2) + 3 is an upper bound on the length of the shortest such strings, for n ≥ k ≥ 10. Consequently, for n ≥ 10, the shortest strings that contain all permutations of order n have length at most n^2 − 2n + 3. These two new upper bounds improve the previously known upper bounds by one.

20.
In this paper we introduce a biologically inspired distributed computing model called networks of evolutionary processors with parallel string rewriting rules (NEPPS), a variation of the hybrid networks of evolutionary processors introduced by Martin-Vide et al. Such a network contains simple processors located at the nodes of a virtual graph. Each processor holds strings (each string in multiple copies) and string rewriting rules. The rules are applied to the strings in parallel. After the strings have been rewritten, they are communicated among the processors through filters. We show that DES (the Data Encryption Standard), the most widely used cryptosystem, can theoretically be broken using NEPPS: we prove that, given an arbitrary <plain-text, cipher-text> pair, one can recover the DES key in a constant number of steps.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP filing No. 09084417