共查询到20条相似文献,搜索用时 15 毫秒
1.
The construction of full-text indexes on very large text collections is nowadays a hot problem. The suffix array [32] is one of the most attractive full-text indexing data structures due to its simplicity, space efficiency and powerful/ fast search operations supported. In this paper we analyze, both theoretically and experimentally, the I/ O complexity and the working space of six algorithms for constructing large suffix arrays. Three of them are state-of-the-art, the other three algorithms are our new proposals. We perform a set of experiments based on three different data sets (English texts, amino-acid sequences and random texts) and give a precise hierarchy of these algorithms according to their working-space versus construction-time tradeoff. Given the current trends in model design [12], [32] and disk technology [29], [30], we pose particular attention to differentiate between ``random'' and ``contiguous'' disk accesses, in order to explain reasonably some practical I/ O phenomena which are related to the experimental behavior of these algorithms and that would otherwise be meaningless in the light of other simpler external-memory models. We also address two other issues. The former is concerned with the problem of building word indexes; we show that our results can be successfully applied to this case too, without any loss in efficiency and without compromising the simplicity of programming to achieve a uniform, simple and efficient approach to both the two indexing models. The latter issue is related to the intriguing and apparently counterintuitive ``contradiction'' between the effective practical performance of the well-known Baeza-Yates—Gonnet—Snider algorithm [17], verified in our experiments, and its unappealing worst-case behavior. We devise a new external-memory algorithm that follows the basic philosophy underlying that algorithm but in a significantly different manner, thus resulting in a novel approach which combines good worst-case bounds with efficient practical performance. 相似文献
2.
Roberto Grossi 《Theoretical computer science》2011,412(27):2964-2973
Suffix arrays are a key data structure for solving a run of problems on texts and sequences, from data compression and information retrieval to biological sequence analysis and pattern discovery. In their simplest version, they can just be seen as a permutation of the elements in {1,2,…,n}, encoding the sorted sequence of suffixes from a given text of length n, under the lexicographic order. Yet, they are on a par with ubiquitous and sophisticated suffix trees. Over the years, many interesting combinatorial properties have been devised for this special class of permutations: for instance, they can implicitly encode extra information, and they are a well characterized subset of the n! permutations. This paper gives a short tutorial on suffix arrays and their compressed version to explore and review some of their algorithmic features, discussing the space issues related to their usage in text indexing, combinatorial pattern matching, and data compression. 相似文献
3.
From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction 总被引:6,自引:0,他引:6
We review the linear-time suffix tree constructions by Weiner, McCreight, and Ukkonen. We use the terminology of the most
recent algorithm, Ukkonen's on-line construction, to explain its historic predecessors. This reveals relationships much closer
than one would expect, since the three algorithms are based on rather different intuitive ideas. Moreover, it completely explains
the differences between these algorithms in terms of simplicity, efficiency, and implementation complexity.
Received February 12, 1995; revised January 28, 1996. 相似文献
4.
We find, in polynomial time, a schedule for a complete binary tree directed acyclic graph (dag) with n unit execution time tasks on a linear array whose makespan is optimal within a factor of 1+o(1) . Further, given a binary tree dag T with n tasks and height h , we find, in polynomial time, a schedule for T on a linear array whose makespan is optimal within a factor of 5 + o(1) .
On the other hand, we prove that explicit lower and upper bounds on the makespan of optimal schedules of binary tree dags
on linear arrays differ at least by a factor of 1+ . We also find, in polynomial time, schedules for bounded tree dags with n unit execution time tasks, degree d , and height on a linear array which are optimal within a factor of 1+o(1) , this time under the assumption of links with unlimited bandwidth.
Finally, we compute an improved upper bound on the makespan of an optimal schedule for a tree dag on the architecture independent
model of Papadimitriou and Yannakakis [14], provided that its height is not too large.
Received January 21, 1997; revised June 5, 1997. 相似文献
5.
In this paper, a double hinge-beam with a variable quadratic flexure moment is described. This smart beam can undergo a significant change of deflection. A shape memory alloy is used to obtain the moment of the change of flexure. Structure shape can be changed because of elastic energy stored in the beam by prestrain can be modified. The actuator is used to change the stress distribution of the beam. This concept is more efficient than a classical method, where all the energy is provided directly by the actuator. A thermomechanical model of the beam is proposed, and experimental data demonstrates its accuracy. 相似文献
6.
7.
探讨了智能组卷系统中试题库设计和研发中,图形和文本的混合存储以及输出技术这一问题的解决,在很多应用中所采用的方案是将图形信息和文本信息分开处理,或者存储图形的存放路径,使用上述方法得到的排版结果均不够理想,而且存储繁琐,针对这一问题,利用Visual basic开发了专门的图文数据共同存储输出程序,并实现了预期的效果。 相似文献
8.
This paper presents a Distributed Shared Array runtime system to support Java-compliant multithreaded programming on clusters of symmetric multiprocessors (SMPs). As a hybrid of message passing and shared address space programming models, the DSA programming model allows programmers to explicitly control data distribution so as to take advantage of the deep memory hierarchy, while relieving them from error-prone orchestration of communication and synchronization at run-time. The DSA system is developed as an integral component of mobility support middleware for grid computing so that DSA-based virtual machines can be reconfigured to adapt to the varying resource supplies or demand over the course of a computation. The DSA runtime system also features a directory-based cache coherence protocol in support of replication of user-defined sharing granularity and a communication proxy mechanism for reducing network contention. We demonstrate the programmability of the model in a number of parallel applications and evaluate its performance on a cluster of SMP servers, in particular, the impact of the coherence granularity. 相似文献
9.
表达式求值是程序设计语言编译中的一个最基本问题。与人们习惯的中缀表示的表达式相比,后缀表达式不存在括号,没有优先级的差别,表达式中各个运算是按照运算符出现的顺序进行的。因此非常适合串行工作的计算机处理方式。该文首先对这两种表达式表示方法进行了分析比较,然后通过具体分析实现这两种表达式求值的算法来论证表达式后缀表示优于中缀表示。最后简要谈一下中缀表达式到后缀表达式的转换。 相似文献
10.
Haibo Chen Susan Grant-Muller Lorenzo Mussone Frank Montgomery 《Neural computing & applications》2001,10(3):277-286
In this paper we present an application of hybrid neural network approaches and an assessment of the effects of missing data
on motorway traffic flow forecasting. Two hybrid approaches are developed using a Self-Organising Map (SOM) to initially classify
traffic into different states. The first hybrid approach includes four Auto-Regressive Integrated Moving Average (ARIMA) models,
whilst the second uses two Multi-Layer Perception (MLP) models. It was found that the SOM/ARIMA hybrid approach out-performs
all individual ARIMA models, whilst the SOM/MLP hybrid approach achieves superior forecasting performance to all models used
in this study, including three na?ve models. The effects of different proportions of missing data on Neural Network (NN) performance
when forecasting traffic flow are assessed and several initial substitution options to replace missing data are discussed.
Over-all, it is shown that ARIMA models are more sensitive to the percentage of missing data than neural networks in this
context. 相似文献
11.
Constructing Bernstein-Bezier triangular interpolating curve surface interpolating a series of arbitrary disordered data points is of considerable importance for the design and modeling of surfaces with a variety of continuity information. In this article. a kind of simple and reliable algorithm that can process complex field triangular grid generating is presented, and a group of formulae for determining triangular curved surface with wholly C1 continuity are given. It can process arbitrary non-convex boundary and can be used to construct surfaces inner holes. 相似文献
12.
One of the most important aspects of the (statistical) analysis of imprecise data is the usage of a suitable distance on the family of all compact, convex fuzzy sets, which is not too hard to calculate and which reflects the intuitive meaning of fuzzy sets. On the basis of expressing the metric of Bertoluzza et al. [C. Bertoluzza, N. Corral, A. Salas, On a new class of distances between fuzzy numbers, Mathware Soft Comput. 2 (1995) 71-84] in terms of the mid points and spreads of the corresponding intervals we construct new families of metrics on the family of all d-dimensional compact convex sets as well as on the family of all d-dimensional compact convex fuzzy sets. It is shown that these metrics not only fulfill many good properties, but also that they are easy to calculate and easy to manage for statistical purposes, and therefore, useful from the practical point of view. 相似文献
13.
14.
在比较和分析分布式环境中数据传输的必要性及各种策略的基础上,提出数据构造方略的相关因素及其ODS模型,介绍了定时主动/被动及非定时数据传输规则;详细描述了在该规则指导下的正常传输、定时传输以及后台传输等三种执行策略.该技术将数据仓库和分布式数据库有机结合,在很大程度上提升了现有分布式数据库管理系统的功能.从而对分布式数据库技术的应用和推广产生了积极的作用. 相似文献
15.
任务空间概念模型(CMMS)研究 总被引:5,自引:3,他引:5
任务空间概念模型(Conceptual Models of the Mission Space,CMMS)是美国建模与仿真主计划提出的三大目标之一,与HLA、DS共同组成M&S的技术框架,它是对真实世界的、独立于仿真的首次抽象,即有关真实世界军事行动过程(功能、任务、行动和状态等)、实体(人员、单位、机制和系统)、交互(控制、转换、信息)的概念性描述.该文通过对任务空间概念模型的定义、组成、过程及位置等理论的阐述,使大家对CMMS有一定的认识.然后通过对CMMS与系统仿真开发过程中的几个关键因素之间的关系进行的分析,使大家进一步认识CMMS作为作战人员的真实世界和仿真开发人员的综合世界在仿真开发过程中所起到的桥梁作用,它对建模与仿真的互操作与重用有着重要的意义. 相似文献
16.
分析了实验仪器设备利用率不高的原因,论证了实验室全面开放管理、疏通设备流通渠道和加强实验工作者的管理与提高仪器设备利用率的关系,提出了提高实验仪器设备利用率的途径,简述了实验室开放管理的基本函意。 相似文献
17.
18.
19.
当前医疗卫生信息化已成为国家医疗体系建设至关重要的组成部分。结合体系现状,电信运营商将集合终端商、软件商的能力和资源,逐步发展成为区域医疗健康信息化建设的集成者、整合者,为医疗健康信息化的建设和使用提供一体化解决方案。通过充分整合产业链,电信运营商可以实现对卫生信息化平台的统一管理和部署,便于信息安全管理的同时,降低建设成本,最终实现为广大群众服务的目的。 相似文献
20.
计算机多媒体技术与教学的结合使得教学有了突飞猛进的发展。为了激发学生学习兴趣,提高教学效率,以及满足当前教学的需求,计算机的多项技术在教学中发挥着重要作用。通过介绍多媒体技术的相关特性,多媒体的常用数据准备软件和视频点播,结合教育技术学中的理论,阐述多媒体技术在教育领域中所发挥的作用。详细介绍了几种常用数据准备软件。说明了在教学中多媒体技术的重要性。 相似文献