Similar Documents
20 similar documents found (search time: 15 ms)
2.
Approximate string matching is an important research direction within the field of pattern matching. The compressed suffix array is an index structure widely used in string matching, data compression, and related areas, offering fast retrieval and broad applicability. Building on the compressed suffix array, we propose a data structure suited to approximate string matching search algorithms, and on this basis we present a matching search algorithm. Experimental results show that, compared with existing algorithms, the proposed algorithm has a computational advantage for small alphabets.
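The abstract above concerns an index-based method; as a generic illustration of the approximate string matching problem itself, here is a minimal dynamic-programming sketch (Sellers' algorithm), not the compressed-suffix-array structure the paper proposes:

```python
def approx_find(pattern, text, k):
    """Report end positions in text where pattern occurs with <= k edits.

    Classic column-wise dynamic programming (Sellers' algorithm); the
    paper's compressed-suffix-array index is not reproduced here.
    """
    m = len(pattern)
    col = list(range(m + 1))           # edit distances against an empty text
    hits = []
    for j, c in enumerate(text):
        prev_diag, col[0] = col[0], 0  # matches may start anywhere: row 0 stays 0
        for i in range(1, m + 1):
            cur = min(col[i - 1] + 1,                      # insertion
                      col[i] + 1,                          # deletion
                      prev_diag + (pattern[i - 1] != c))   # substitution/match
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            hits.append(j)             # a match ends at position j
    return hits
```

For example, `approx_find("abc", "abd", 1)` reports hits at positions 1 and 2, since both "ab" and "abd" are within one edit of the pattern.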

3.
Peter Fenwick. Software, 2001, 31(9): 815–833
We present a string matching or pattern matching method which is especially useful when a single block of text must be searched repeatedly for different patterns. The method combines linking the text according to digrams, searching on the least‐frequent digram, and probing selected characters as a preliminary filter before full pattern comparison. Tests on real alphabetic data show that the number of character comparisons may be decreased by two orders of magnitude compared with Knuth–Morris–Pratt and similar searching, but with an initialization overhead comparable to five to ten conventional searches. Copyright © 2001 John Wiley & Sons, Ltd.
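The full method described above also links the text and probes selected characters; a minimal sketch of just the core idea, verifying the pattern only at positions of its least-frequent digram, might look like this (function names are illustrative, not from the paper):

```python
from collections import defaultdict

def build_digram_index(text):
    """One-time preprocessing: record every position of each digram."""
    index = defaultdict(list)
    for i in range(len(text) - 1):
        index[text[i:i + 2]].append(i)
    return index

def digram_search(pattern, text, index):
    """Verify the pattern only where its least-frequent digram occurs."""
    if len(pattern) < 2:
        return [i for i in range(len(text)) if text.startswith(pattern, i)]
    # choose the digram of the pattern that is rarest in the text
    offset, digram = min(
        ((k, pattern[k:k + 2]) for k in range(len(pattern) - 1)),
        key=lambda kd: len(index[kd[1]]))
    return [p - offset for p in index[digram]
            if p >= offset and text.startswith(pattern, p - offset)]
```

The one-time index cost is what the abstract describes as an overhead comparable to a handful of conventional searches; each subsequent search touches only the candidate positions.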

4.
Jonathan D. Cohen. Software, 1998, 28(15): 1605–1635
A method of full-text scanning for matches in a large dictionary of keywords is described, suitable for Selective Dissemination of Information (SDI). The method is applicable to large dictionaries (say 10^4 to 10^5 entries), and to arbitrary byte streams for both patterns and data samples. The approach involves a sequence of tests, beginning with Boyer–Moore–Horspool skipping on digrams, followed by a succession of hash tests, and completed by trie searching, the combination of which is quite fast. Background information is provided, the algorithm and its implementation are described in detail, and experimental results are presented. In particular, tests suggest that the proposed method outperforms the algorithms of Aho–Corasick and Commentz–Walter when implementing large dictionaries. © 1998 John Wiley & Sons, Ltd.
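The first stage of the pipeline above is Boyer–Moore–Horspool skipping. A single-pattern Horspool sketch of that skipping idea follows; the paper itself applies the skip to digrams across a large dictionary, with hash and trie stages afterwards:

```python
def horspool(pattern, text):
    """Boyer-Moore-Horspool: skip ahead using the window's last character."""
    m = len(pattern)
    # shift table: distance from each character's last occurrence to the end;
    # characters absent from pattern[:-1] allow a full shift of m
    shift = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    hits, j = [], 0
    while j + m <= len(text):
        if text[j:j + m] == pattern:
            hits.append(j)
        j += shift.get(text[j + m - 1], m)
    return hits
```

Because the shift depends only on the character under the window's right end, most windows are skipped without any comparison of the full pattern.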

5.
Gonzalo Navarro. Software, 2001, 31(13): 1265–1312
We present nrgrep (‘non‐deterministic reverse grep’), a new pattern‐matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and Gnu grep, nrgrep is based on a single and uniform concept: the bit‐parallel simulation of a non‐deterministic suffix automaton. As a result, nrgrep can find from simple patterns to regular expressions, exactly or allowing errors in the matches, with an efficiency that degrades smoothly as the complexity of the searched pattern increases. Another concept that is fully integrated into nrgrep and that contributes to this smoothness is the selection of adequate subpatterns for fast scanning, which is also absent in many current tools. We show that the efficiency of nrgrep is similar to that of the fastest existing string‐matching tools for the simplest patterns, and is by far unmatched for more complex patterns. Copyright © 2001 John Wiley & Sons, Ltd.
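The bit-parallel simulation underlying nrgrep can be illustrated, in much simplified form, by the classic Shift-And algorithm for exact matching, which keeps all active states of the nondeterministic pattern automaton in a single machine word (nrgrep itself simulates a suffix automaton and also handles classes and errors):

```python
def shift_and(pattern, text):
    """Bit-parallel Shift-And: bit i of D is set iff pattern[:i+1]
    matches a suffix of the text read so far."""
    m = len(pattern)
    # B[c]: bitmask of the positions in the pattern that hold character c
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << i)
    D, accept, hits = 0, 1 << (m - 1), []
    for j, c in enumerate(text):
        # advance every active state by one character, then filter by c
        D = ((D << 1) | 1) & B.get(c, 0)
        if D & accept:
            hits.append(j - m + 1)   # match starts here
    return hits
```

Each text character costs a constant number of word operations regardless of how many automaton states are active, which is the source of the smooth degradation the abstract describes.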

6.
We study different efficient implementations of an Aho–Corasick pattern matching automaton when searching for patterns in Unicode text. Much of the previous research has been based on the assumption of a relatively small alphabet, for example the 7‐bit ASCII. Our aim is to examine the differences in performance arising from the use of a large alphabet, such as Unicode that is widely used today. The main concern is the representation of the transition function of the pattern matching automaton. We examine and compare array, linked list, hashing, balanced tree, perfect hashing, hybrid, triple‐array, and double‐array representations. For perfect hashing, we present an algorithm that constructs the hash tables in expected linear time and linear space. We implement the Aho–Corasick automaton in Java using the different transition function representations, and we evaluate their performance. Triple‐array and double‐array performed best in our experiments, with perfect hashing, hashing, and balanced tree coming next. We discovered that the array implementation has a slow preprocessing time when using the Unicode alphabet. It seems that the use of a large alphabet can slow down the preprocessing time of the automaton considerably depending on the transition function representation used. Copyright © 2006 John Wiley & Sons, Ltd.
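A dict-of-dicts transition table corresponds roughly to the hashing representation compared above, and stays compact even for a Unicode-sized alphabet since only existing transitions are stored. A minimal Aho–Corasick sketch with that representation (the paper's Java implementations and triple/double-array variants are not reproduced):

```python
from collections import deque

def build_ac(patterns):
    """Aho-Corasick automaton; transitions as per-state dicts (hashing)."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:                 # build the keyword trie
        s = 0
        for c in pat:
            if c not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(pat)
    queue = deque(goto[0].values())      # BFS to fill failure links
    while queue:
        s = queue.popleft()
        for c, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            g = goto[f].get(c, 0)
            fail[t] = g if g != t else 0
            out[t] |= out[fail[t]]       # inherit outputs of the fail state
    return goto, fail, out

def ac_search(text, goto, fail, out):
    s, hits = 0, []
    for j, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        for pat in out[s]:
            hits.append((j - len(pat) + 1, pat))
    return hits
```

With a dense array representation, each state would instead allocate one slot per alphabet symbol, which is exactly what makes preprocessing slow over Unicode, as the experiments above report.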

7.
We present a software tool called seft which balances the convenience of search tools such as grep with the functionality of full‐text index‐based information retrieval. Based on a novel retrieval heuristic which uses term locality as a guide to relevance, seft combines the freedom of natural language queries with the benefits of a ranked answer list and easy inspection of retrieval results. While not as fast as grep‐style tools, seft provides a valuable facility for impromptu personal information retrieval tasks. Copyright © 2004 John Wiley & Sons, Ltd.

8.
We investigate the category of Eilenberg–Moore algebras for the Giry monad associated with stochastic relations over Polish spaces with continuous maps as morphisms. The algebras are identified as the positive convex structures on the base space. The forgetful functor assigning to a positive convex structure its underlying Polish space has the stochastic powerdomain as its left adjoint.

9.
In this study, we introduce a web information fusion tool, the web warehouse, which is suitable for web mining and knowledge discovery. To formulate a web warehouse, a four-layer web warehouse architecture for decision support is first proposed. According to the layered web warehouse framework architecture, an extraction–fusion–mapping–loading (EFML) process model for web warehouse construction is then constructed. In this process model, a series of web services, including wrapper, mediation, ontology, and mapping services, are used. In particular, two kinds of mediators are introduced to fuse heterogeneous web information. Finally, a simple case study is presented to illustrate the construction process of a web warehouse.

10.
In this paper, we consider the design and implementation of SPARE Parts, a C++ toolkit for pattern matching. SPARE Parts (in particular, the 2003 version presented in this article) is the second generation string pattern matching toolkit by the authors. The toolkit, the first generic program for keyword pattern matching, contains implementations of the well‐known Knuth–Morris–Pratt, Boyer–Moore, Aho–Corasick and Commentz–Walter algorithms (and their variants). The toolkit is freely available for non‐commercial use. Copyright © 2004 John Wiley & Sons, Ltd.

11.
The Cochran–Armitage test is a widely used test for trend among binomial proportions in a dose–response relationship. This test requires preassigned fixed dose scores. Equally spaced scores are usually suggested when the dose-dependent shape of the binomial proportions is a priori unknown. Another approach is the construction of a maximin efficiency robust test. We recommend a new test that combines Cochran–Armitage tests based on different score sets, taking the maximum of the test statistics. Simulation results suggest that this combined test is superior to both the single test with equally spaced scores and the maximin efficiency robust test. The methods are applied to data from a toxicity study and a tumorigenicity study with a stratified design.
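The Cochran–Armitage statistic and the recommended maximum-over-scores combination can be sketched as follows (unstratified case only; the paper's stratified analysis and the null distribution of the maximum are not reproduced):

```python
from math import sqrt

def cochran_armitage_z(events, totals, scores):
    """Cochran-Armitage trend statistic Z for binomial proportions.

    events[i] successes out of totals[i] trials at dose score scores[i];
    Z is asymptotically standard normal under the no-trend hypothesis.
    """
    N = sum(totals)
    p = sum(events) / N                            # pooled proportion
    t = sum(s * (x - n * p) for x, n, s in zip(events, totals, scores))
    var = p * (1 - p) * (
        sum(n * s * s for n, s in zip(totals, scores))
        - sum(n * s for n, s in zip(totals, scores)) ** 2 / N)
    return t / sqrt(var)

def max_combined(events, totals, score_sets):
    """Combined test of the abstract: maximum of the Cochran-Armitage
    statistics over several candidate score sets."""
    return max(cochran_armitage_z(events, totals, s) for s in score_sets)
```

With identical proportions in every dose group the statistic is exactly zero, and a monotone increase in proportions yields a positive Z.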

14.
Controllability for a class of simple Wiener–Hammerstein systems is considered. Necessary and sufficient conditions for dead-beat and complete controllability for these systems are presented. The controllability tests consist of two easy-to-check tests for the subsystems.

15.
We propose a method that allows users to define flow features in the form of patterns represented as sparse sets of stream line segments. Our approach finds similar occurrences in the same or other time steps. Related approaches define patterns using dense, local stencils or support only single segments. Our patterns are defined sparsely and can have a significant extent, i.e., they are integration‐based and not local. This allows for a greater flexibility in defining features of interest. Similarity is measured using intrinsic curve properties only, which enables invariance to location, orientation, and scale. Our method starts with splitting stream lines using globally consistent segmentation criteria. It strives to maintain the visually apparent features of the flow as a collection of stream line segments. Most importantly, it provides similar segmentations for similar flow structures. For user‐defined patterns of curve segments, our algorithm finds similar ones that are invariant to similarity transformations. We showcase the utility of our method using different 2D and 3D flow fields.

17.
We use knowledge‐based theory to develop and test a model of client–vendor knowledge transfer at the level of the individual offshore information systems engineer. We define knowledge transfer in this context in terms of mechanisms by which an offshore engineer employed by a vendor can (a) gain understanding of their onshore client; and (b) utilize their knowledge for the benefit of the client. Over large geographic, cultural and institutional distances, effective knowledge transfer is difficult to achieve, although it is central to the success of many offshore outsourcing contracts. Our empirical test consists of a survey of vendor software engineers physically located in India but working on development projects for clients in Europe and the United States. The findings support predictions regarding engineer exposure to explicit and tacit knowledge: We find client–vendor knowledge transfer to the offshore vendor engineer to be positively associated with formal training and client embedment. We also test whether an offshore vendor engineer's inappropriate reliance on informal discussions in the offshore location hinders effective client–vendor knowledge transfer. Our result for this is mixed. Finally, we show differences between offshore engineers who have had previous onshore experience and those who have not. Client embedment is a potent driver of knowledge transfer when the offshore engineer has had previous onshore placement, while it acts to reduce inappropriate reliance on informal discussions for those that have not had an onshore placement.

18.
A major challenge in data‐driven biomedical research lies in the collection and representation of data provenance information to ensure that findings are reproducible. In order to communicate and reproduce multi‐step analysis workflows executed on datasets that contain data for dozens or hundreds of samples, it is crucial to be able to visualize the provenance graph at different levels of aggregation. Most existing approaches are based on node‐link diagrams, which do not scale to the complexity of typical data provenance graphs. In our proposed approach, we reduce the complexity of the graph using hierarchical and motif‐based aggregation. Based on user action and graph attributes, a modular degree‐of‐interest (DoI) function is applied to expand parts of the graph that are relevant to the user. This interest‐driven adaptive approach to provenance visualization allows users to review and communicate complex multi‐step analyses, which can be based on hundreds of files that are processed by numerous workflows. We have integrated our approach into an analysis platform that captures extensive data provenance information, and demonstrate its effectiveness by means of a biomedical usage scenario.

19.
In this paper, a new observer‐based controller is proposed for a photovoltaic DC–DC buck converter; both photovoltaic (PV) voltage and current regulation are considered. In order to deal with the complex and nonlinear PV mathematical model and adapt it to the control purpose, a hybrid PV current observer model is proposed; three modes are defined and the stability of the observer is discussed using the hybrid dynamical system (HDS) approach. The observer‐based controller is designed for both voltage and current regulation of the PV system; closed-loop stability of the full system is demonstrated through Lyapunov analysis. Experimental results are also presented, showing the feasibility of the proposed observer‐based controller.

20.
Advancements in computer technology have allowed the development of human-appearing and -behaving virtual agents. This study examined if increased richness and anthropomorphism in interface design lead to computers being more influential during a decision-making task with a human partner. In addition, user experiences of the communication format, communication process, and the task partner were evaluated for their association with various features of virtual agents. Study participants completed the Desert Survival Problem (DSP) and were then randomly assigned to one of five different computer partners or to a human partner (who was a study confederate). Participants discussed each of the items in the DSP with their partners and were then asked to complete the DSP again. Results showed that computers were more influential than human partners but that the latter were rated more positively on social dimensions of communication than the former. Exploratory analysis of user assessments revealed that some features of human–computer interaction (e.g. utility and feeling understood) were associated with increases in anthropomorphic features of the interface. Discussion focuses on the relation between user perceptions, design features, and task outcomes.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.) · 京ICP备09084417号