Similar literature
 Found 20 similar documents; search took 31 ms
1.
Database systems are becoming increasingly popular for answering queries. Partial-match search queries are an important class of queries in such a system. Several storage structures have been proposed to answer these queries efficiently. The BD tree is an example of such a storage structure. A previous study indicated that the k-d tree performance is better than that of the BD tree for partial-match search queries. A recent paper reported some improved algorithms. However, it is unclear whether the improved algorithms show the BD tree in a favourable light for partial-match search queries. This paper explores the performance of these algorithms and compares their performance to that of the k-d tree. Since the BD tree construction process uses some heuristics to make it a better balanced tree, this paper also evaluates the effect of these heuristics on the partial-match search algorithms. The major conclusions of this study are that the BD tree performance for partial-match search is better than that of the k-d tree when an improved algorithm is used for partial-match search, and that only the DZ expression rearrangement heuristic has a substantial effect on partial-match search performance.

2.
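A method for optimizing multi-source retrieval fusion results using particle swarm optimization   Total citations: 1, self-citations: 0, citations by others: 1
Automatically integrating the results returned by multiple search engines remains a poorly solved difficulty in web information retrieval, and it is a key technical factor in the effectiveness of metasearch engines. Building on experiments with several algorithms for fusing multi-source search results, this paper proposes a discrete particle swarm optimization algorithm that further optimizes the output of other fusion ranking algorithms. The algorithm not only outperforms, overall, the fusion ranking algorithms whose output it takes as preprocessed input, but also adapts better to different queries, and it does not need to consider factors such as the quality weights of the results returned by each independent source or the overlap rate among them. Compared with the fusion algorithms that supply its input, it improves the accuracy of identifying relevant documents by about 20% and reduces the standard deviation of accuracy across query topics by about 50%.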

3.
Synopsis structures and approximate query answering have become increasingly important in DSS/OLAP applications with stringent response time requirements. Range queries are an important class of problems in this domain; they have a wide variety of applications and have been studied in the context of histograms. However, wavelets have been shown to be quite useful in several scenarios, and in fact their multi-resolution structure makes them especially appealing for hierarchical domains. Furthermore, the fact that the Haar wavelet basis has a linear time algorithm for the computation of coefficients has made the Haar basis one of the important and widely used synopsis structures. Very recently, optimal algorithms were proposed for the wavelet synopsis construction problem for equality/point queries. In this paper we investigate the problem of optimum Haar wavelet synopsis construction for range queries with workloads. We provide optimum algorithms as well as approximation heuristics and demonstrate the effectiveness of these algorithms with our extensive experimental evaluation using synthetic and real-life data sets. Research was supported in part by the Alfred P. Sloan Research Fellowship and NSF awards CCF-0430376, CCF-0644119. Research was supported by the Ministry of Information and Communication, Korea, under the College Information Technology Research Center Support Program, grant number IITA-2006-C1090-0603-0031.
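
The linear-time computation of Haar coefficients mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's synopsis construction algorithm; it only shows the standard pairwise averaging and differencing pass. The function name `haar_decompose`, the unnormalized coefficient convention, and the power-of-two length requirement are assumptions made for this example.

```python
def haar_decompose(values):
    """Return unnormalized Haar coefficients for `values`.

    Output layout (an assumption of this sketch): the overall average
    first, then detail coefficients from the coarsest level to the finest.
    The input length must be a positive power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")

    out = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        # Pairwise averages become the input of the next, coarser level.
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        # Pairwise half-differences are the detail coefficients of this level.
        details = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:half] = averages
        out[half:length] = details
        length = half
    return out


print(haar_decompose([2, 2, 0, 2, 3, 5, 4, 4]))
```

Each level touches half as many values as the previous one, so the total work is n + n/2 + n/4 + ..., which is linear in n. A synopsis would then keep only a subset of these coefficients; how to choose that subset for range-query workloads is the subject of the paper and is not attempted here.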

4.
The important challenge of evaluating XPath queries over XML streams has sparked much interest in the past few years. A number of algorithms have been proposed, supporting wider fragments of the query language, and exhibiting better performance and memory utilization. Nevertheless, all the algorithms known to date use a prohibitively large amount of memory for certain types of queries. A natural question then is whether this memory bottleneck is inherent or just an artifact of the proposed algorithms. In this paper we initiate the first systematic and theoretical study of lower bounds on the amount of memory required to evaluate XPath queries over XML streams. We present a general lower bound technique, which given a query, specifies the minimum amount of memory that any algorithm evaluating the query on a stream would need to incur. The lower bounds are stated in terms of new graph-theoretic properties of queries. The proofs are based on tools from communication complexity. We then exploit insights learned from the lower bounds to obtain a new algorithm for XPath evaluation on streams. The algorithm uses space close to the optimum. Our algorithm deviates from the standard paradigm of using automata or transducers, thereby avoiding the need to store large transition tables.

5.
A Memetic Approach to the Nurse Rostering Problem   Total citations: 3, self-citations: 0, citations by others: 3
Constructing timetables of work for personnel in healthcare institutions is known to be a highly constrained and difficult problem to solve. In this paper, we discuss a commercial system, together with the model it uses, for this rostering problem. We show that tabu search heuristics can be made effective, particularly for obtaining reasonably good solutions quickly for smaller rostering problems. We discuss the robustness issues, which arise in practice, for tabu search heuristics. This paper introduces a range of new memetic approaches for the problem, which use a steepest descent improvement heuristic within a genetic algorithm framework. We provide empirical evidence to demonstrate the best features of a memetic algorithm for the rostering problem, particularly the nature of an effective recombination operator, and show that these memetic approaches can handle initialisation parameters and a range of instances more robustly than tabu search algorithms, at the expense of longer solution times. Having presented tabu search and memetic approaches (both with benefits and drawbacks) we finally present an algorithm that is a hybrid of both approaches. This technique produces better solutions than either of the earlier approaches and it is relatively unaffected by initialisation and parameter changes, combining some of the best features of each approach to create a hybrid which is greater than the sum of its component algorithms.

6.
7.
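Skyline Query Processing   Total citations: 7, self-citations: 1, citations by others: 7
Wei Xiaojuan (魏小娟), Yang Jing (杨婧), Li Cuiping (李翠平), Chen Hong (陈红). Journal of Software (《软件学报》), 2008, 19(6): 1386-1400
This paper classifies and surveys current Skyline query methods. It first introduces the background of the Skyline query processing problem, then presents in-memory algorithms for Skyline query processing, and classifies existing external-memory Skyline query processing methods into index-based and non-index-based approaches, with a performance evaluation after each group of algorithms. It then introduces the concept of SKYCUBE, a model for processing multiple Skyline queries over different subspaces, together with the related research. It also covers strategies for Skyline query processing in different application environments and extensions of the problem, and finally summarizes several directions for future research on Skyline query processing.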
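
As a rough illustration of what an in-memory Skyline algorithm computes, the sketch below keeps a window of points that no other point dominates. It follows the general block-nested-loop idea and is not taken from the survey. The names `dominates` and `skyline`, and the convention that smaller values are better in every dimension, are assumptions of this example.

```python
def dominates(a, b):
    """True if point `a` dominates `b`: no worse in every dimension and
    strictly better in at least one (smaller is better, by assumption)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that are not dominated by any other point."""
    window = []
    for p in points:
        # Discard p if something already in the window dominates it.
        if any(dominates(q, p) for q in window):
            continue
        # Otherwise p survives and evicts every window point it dominates.
        window = [q for q in window if not dominates(p, q)]
        window.append(p)
    return window


# Hypothetical hotels described as (price, distance to the beach).
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2), (50, 9)]
print(skyline(hotels))
```

This version compares each point against the whole window, so it is quadratic in the worst case; the index-based and external-memory methods covered by the survey exist to avoid exactly that cost on large data sets.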

8.
9.
This paper addresses a novel distributed assembly permutation flowshop scheduling problem that has important applications in modern supply chains and manufacturing systems. The problem considers a number of identical factories, each one consisting of a flowshop for part-processing plus an assembly line for product-processing. The objective is to minimize the makespan. To suit different requirements on CPU time and solution quality, we present a mixed integer linear model, three constructive heuristics, two variable neighborhood search methods, and an iterated greedy algorithm. Important problem-specific knowledge is obtained to enhance the effectiveness of the algorithms. Accelerations for evaluating solutions are proposed to save computational effort. The parameters and operators of the algorithms are calibrated and analyzed using a design of experiments. To validate the algorithms, we present a total of 16 adaptations of other well-known and recent heuristics, variable neighborhood search algorithms, and meta-heuristics for the problem and carry out a comprehensive set of computational and statistical experiments with a total of 810 instances. The results show that the proposed algorithms are very effective and efficient at solving the problem under consideration, as they outperform the existing methods by a significant margin.

10.
Generating action sequences to achieve a set of goals is a computationally difficult task. When multiple goals are present, the problem is even worse. Although many solutions to this problem have been discussed in the literature, practical solutions focus on the use of restricted mechanisms for planning or the application of domain dependent heuristics for providing rapid solutions (i.e., domain-dependent planning). One previously proposed technique for handling multiple goals efficiently is to design a planner or even a set of planners (usually domain-dependent) that can be used to generate separate plans for each goal. The outputs are typically either restricted to be independent and then concatenated into a single global plan, or else they are merged together using complex heuristic techniques. In this paper we explore a set of limitations, less restrictive than the assumption of independence, that still allow for the efficient merging of separate plans using straightforward algorithmic techniques.
In particular, we demonstrate that for cases where separate plans can be individually generated, we can define a set of limitations on the allowable interactions between goals that allow efficient plan merging to occur. We propose a set of restrictions that are satisfied across a significant class of planning domains. We present algorithms that are efficient for special cases of multiple plan merging, propose a heuristic search algorithm that performs well in a more general case (where alternative partially ordered plans have been generated for each goal), and describe an empirical study that demonstrates the efficiency of this search algorithm.

11.
In many machine learning settings, labeled examples are difficult to collect while unlabeled data are abundant. Also, for some binary classification problems, positive examples which are elements of the target concept are available. Can these additional data be used to improve accuracy of supervised learning algorithms? We investigate in this paper the design of learning algorithms from positive and unlabeled data only. Many machine learning and data mining algorithms, such as decision tree induction algorithms and naive Bayes algorithms, use examples only to evaluate statistical queries (SQ-like algorithms). Kearns designed the statistical query learning model in order to describe these algorithms. Here, we design an algorithm scheme which transforms any SQ-like algorithm into an algorithm based on positive statistical queries (estimates of probabilities over the set of positive instances) and instance statistical queries (estimates of probabilities over the instance space). We prove that any class learnable in the statistical query learning model is learnable from positive statistical queries and instance statistical queries only, if a lower bound on the weight of any target concept f can be estimated in polynomial time. Then, we design a decision tree induction algorithm POSC4.5, based on C4.5, that uses only positive and unlabeled examples and we give experimental results for this algorithm. In the case of imbalanced classes in the sense that one of the two classes (say the positive class) is heavily underrepresented compared to the other class, the learning problem remains open. This problem is challenging because it is encountered in many real-world applications.

12.
We consider the problem of determining a route of a search resource to visually search multiple areas in which targets are expected to be located. It is assumed that the probability a target exists in each area is given as a result of target detection operations and that the probability decreases as time passes. It is necessary to search the areas using a search resource, and identify the exact locations of the targets. We propose heuristic algorithms including a simulated annealing (SA) algorithm for the search sequencing problem. Since the search sequence must be determined as quickly as possible so as not to delay the search, heuristics for search sequencing should not take too much time. We introduce a new neighborhood generation method and a new parameter for an easier control of the overall computation time in the SA algorithm. A series of computational experiments is performed for evaluating the suggested algorithms, and results are reported.

13.
This paper addresses the minimization of the mean absolute deviation from a common due date in a two-machine flowshop scheduling problem. We present heuristics that use an algorithm, based on proposed properties, which obtains an optimal schedule for a given job sequence. A new set of benchmark problems is presented with the purpose of evaluating the heuristics. Computational experiments show that the developed heuristics outperform results found in the literature for problems with up to 500 jobs.
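
To make the objective in this abstract concrete, the sketch below evaluates the mean absolute deviation from a common due date for one fixed job sequence in a two-machine flowshop. It is only an evaluation routine under a simplifying assumption: every job starts as early as possible, with no inserted idle time. The paper's algorithm for finding an optimal schedule for a given sequence may time jobs differently, so this is not that algorithm. The function and parameter names are hypothetical.

```python
def mean_absolute_deviation(sequence, p1, p2, due_date):
    """Mean |C_j - d| over jobs, where C_j is the completion time of job j
    on machine 2 and d is the common due date.

    Assumption of this sketch: each operation starts as early as possible
    (no deliberately inserted idle time).
    """
    if not sequence:
        raise ValueError("sequence must contain at least one job")
    t1 = 0  # time at which machine 1 becomes free
    t2 = 0  # time at which machine 2 becomes free
    total = 0
    for job in sequence:
        t1 += p1[job]                 # job finishes on machine 1
        t2 = max(t1, t2) + p2[job]    # then runs on machine 2 when both are ready
        total += abs(t2 - due_date)
    return total / len(sequence)


# Three hypothetical jobs: processing times on machine 1 and machine 2.
p1 = [2, 1, 3]
p2 = [2, 4, 1]
print(mean_absolute_deviation([0, 1, 2], p1, p2, due_date=8))
```

A heuristic for the problem would call a routine like this repeatedly while searching over job sequences, which is why a fast evaluation for a given sequence matters.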

14.
Designing data warehouses   Total citations: 9, self-citations: 0, citations by others: 9
A Data Warehouse (DW) is a database that collects and stores data from multiple remote and heterogeneous information sources. When a query is posed, it is evaluated locally, without accessing the original information sources. In this paper we deal with the issue of designing a DW, in the context of the relational model, by selecting a set of views to materialize in the DW. First, we briefly present a theoretical framework for the DW design problem, which concerns the selection of a set of views that (a) fit in the space allocated to the DW, (b) answer all the queries of interest, and (c) minimize the total query evaluation and view maintenance cost. We then formalize the DW design problem as a state space search problem by taking into account multiquery optimization over the maintenance queries (i.e., queries that compute changes to the materialized views) and the use of auxiliary views for reducing the view maintenance cost. Finally, incremental algorithms and heuristics for pruning the search space are presented.

15.
In this paper, we study the problem of scheduling tasks on a distributed system, with the aim of simultaneously minimizing energy consumption and makespan subject to the deadline constraints and the tasks’ memory requirements. A total of eight heuristics are introduced to solve the task scheduling problem. The set of heuristics includes six greedy algorithms and two nature-inspired genetic algorithms. The heuristics are extensively simulated and compared using a simulation test-bed that utilizes a wide range of task heterogeneity and a variety of problem sizes. When evaluating the heuristics, we analyze the energy consumption, makespan, and execution time of each heuristic. The main benefit of this study is to allow readers to select an appropriate heuristic for a given scenario.

16.
Storing spatial data and processing spatial queries are important tasks for modern databases. The execution efficiency of a spatial query depends on the underlying index structure. The R-tree is a well-known spatial index structure. Currently there exist various versions of the R-tree, and one of the most common differences between them is the node splitting algorithm. The problem of node splitting in a one-dimensional R-tree may seem too trivial to be considered separately. One-dimensional intervals can be split on the basis of their sorting. Some of the node splitting algorithms for R-trees with two or more dimensions include a one-dimensional split as a component. However, on closer inspection, existing algorithms for one-dimensional split do not perform ideally in some complicated cases. This paper introduces a novel one-dimensional node splitting algorithm based on two sortings that can handle such complicated cases better. This paper also introduces a node splitting algorithm for R-trees with two or more dimensions that is based on the one-dimensional algorithm mentioned above. The tests show significantly better behavior of the proposed algorithms in the case of highly overlapping data.

17.
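Data warehouses hold large volumes of data, and online analytical processing (OLAP) involves complex queries over that data. Both storing and querying it pose many difficulties. An effective solution is to first partition the data into blocks that are easy to handle, then process each block separately, and finally merge the results of the individual blocks. This paper compares several commonly used merge algorithms and discusses buffer allocation during merging.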
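
The partition, process, and merge pattern described in this abstract can be sketched with sorting as the per-block processing step. This is an in-memory toy, not one of the merge algorithms the paper compares, and it ignores the buffer allocation question entirely. The function name `sort_in_blocks` and the `block_size` parameter are assumptions of this example; `heapq.merge` is the standard-library k-way merge of sorted inputs.

```python
import heapq


def sort_in_blocks(records, block_size):
    """Sort `records` by splitting them into blocks, sorting each block
    independently, and then k-way merging the sorted blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Step 1 and 2: partition into manageable blocks and process (sort) each.
    blocks = [
        sorted(records[start:start + block_size])
        for start in range(0, len(records), block_size)
    ]
    # Step 3: merge the per-block results into a single ordered output.
    return list(heapq.merge(*blocks))


print(sort_in_blocks([9, 4, 7, 1, 8, 2, 6], block_size=3))
```

In a real data warehouse the blocks would live on disk and only a small buffer per block would be held in memory during the merge, which is where the buffer allocation problem discussed in the paper arises.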

18.
The problem of multidimensional file partitioning (MDFP) arises in large databases that are subject to frequent range queries on one or more attributes. In an MDFP scheme, the search attribute space is partitioned into cells, which are mapped to physical disk locations. This mapping preserves the order of the search attribute values so that range queries can be answered most efficiently, while maintaining good performance for other types of queries. Recently, MDFP schemes have been suggested to include both dynamic and static file organizations. Optimal and heuristic MDFP algorithms are developed for the static case. The results of extensive computational experiments show that the proposed heuristics perform better than known static ones. It is also shown that incorporating a static algorithm into a dynamic MDFP such as a grid file at conversion and/or periodical reorganization points significantly improves the resulting storage utilization of the data file and decreases the size of the directory file.

19.
In a traffic-aware route search (TARS), the user provides start and target locations and sets of search terms. The goal is to find the fastest route from the start location to the target via geographic entities (points of interest) that correspond to the search terms, while taking into account variations in the travel speed due to changes in traffic conditions, and the possibility that some visited entities will not satisfy the search requirements. A TARS query may include temporal constraints and order constraints that restrict the order in which entities are visited. Since TARS generalizes the Traveling-Salesperson Problem, it is an NP-hard problem. Thus, a polynomial-time algorithm for evaluating TARS queries is unlikely to exist. Hence, we present in this paper three heuristics to answer TARS queries: a local greedy approach, a global greedy approach, and an algorithm that computes a linear approximation to the travel speeds, formulates the problem as a Mixed Integer Linear Programming (MILP) problem and uses a solver to find a solution. We provide an experimental evaluation based on actual traffic data and show that using a MILP solver to find a solution is effective and can be done within a limited running time in many real-life scenarios. The local-greedy approach is the least effective in finding a fast route; however, it has the best running time and it is the most scalable.

20.
This paper introduces the family traveling salesperson problem (FTSP), a variant of the generalized traveling salesman problem. In the FTSP, a subset of nodes must be visited for each node cluster in the graph. The objective is to minimize the distance traveled. We describe an integer programming formulation for the FTSP and show that the commercial grade integer programming solver CPLEX 11 can only solve small instances of the problem in reasonable running time. We propose two randomized heuristics for finding optimal and near-optimal solutions of this problem. These heuristics are a biased random-key genetic algorithm and a GRASP with evolutionary path-relinking. Computational results comparing both heuristics are presented in this study.
