Similar literature
1.
Optimizing top-k selection queries over multimedia repositories   Cited by: 2 (self-citations: 0, other citations: 2)
Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, which indicates how well the object matches the selection condition (ranking). Furthermore, unlike in the relational model, users may just want the k top-ranked objects for their selection queries for a relatively small k. In addition to the differences in the query model, another peculiarity of multimedia repositories is that they may allow access to the attributes of each object only through indexes. We investigate how to optimize the processing of top-k selection queries over multimedia repositories. The access characteristics of the repositories and the above query model lead to novel issues in query optimization. In particular, the choice of the indexes used to search the repository strongly influences the cost of processing the filtering condition. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we present an efficient algorithm that solves the problem optimally with respect to our cost model and execution space when the predicates in the query are independent. We also show that the problem of optimizing top-k selection queries can be viewed, in many cases, as that of evaluating more traditional selection conditions. Thus, both problems can be viewed together as an extended filtering problem to which techniques of query processing and optimization may be adapted.

2.
Index selection for relational databases is an important issue which has been researched quite extensively [1–5]. Index selection algorithms in the literature consider at most one index as a candidate for each attribute of a relation. However, more than one type of index, each with different storage space requirements, may be present as candidates for an attribute. Also, it may not be possible to eliminate locally all but one of the candidate indexes for an attribute, because the candidates differ in their benefits and storage space requirements. Thus, the algorithms available in the literature for optimal index selection cannot be used when there are multiple candidates for each attribute, and a global optimization algorithm is needed in which at most one index is selected from the set of candidate indexes for an attribute. The problem of index selection in the presence of multiple candidate indexes for each attribute (which we call the multiple choice index selection problem) has not been addressed in the literature. In this paper, we present the multiple choice index selection problem, show that it is NP-hard, and present an algorithm which gives an approximately optimal solution within a user-specified error bound in logarithmic time order.

3.
This paper deals with relational databases which are extended in the sense that fuzzily known values are allowed for attributes. Precise as well as partial (imprecise, uncertain) knowledge concerning the value of the attributes is represented by means of [0,1]-valued possibility distributions in Zadeh's sense. Thus, we have to manipulate ordinary relations on Cartesian products of sets of fuzzy subsets rather than fuzzy relations. Besides, vague queries whose contents are also represented by possibility distributions can be taken into account. The basic operations of relational algebra (union, intersection, Cartesian product, projection, and selection) are extended in order to deal with partial information and vague queries. Approximate equalities and inequalities modeled by fuzzy relations can also be taken into account in the selection operation. Then, the main features of a query language based on the extended relational algebra are presented. An illustrative example is provided. This approach, which enables a very general treatment of relational databases with fuzzy attribute values, makes extensive use of dual possibility and necessity measures.

4.
Users of information systems would like to express flexible queries over the data, possibly retrieving imperfect items when the perfect ones, which exactly match the selection conditions, are not available. Most commercial DBMSs are still based on SQL for querying. Therefore, adding some flexibility to SQL can help users improve their interaction with these systems without requiring them to learn a completely new language. Based on fuzzy set theory and the α-cut operation on fuzzy numbers, this paper presents generic fuzzy queries against classical relational databases and develops the translation of these fuzzy queries. In a generic fuzzy query, the query condition consists of complex fuzzy terms as the operands and complex fuzzy relations as the operators. Using the thresholds that the user chooses for the fuzzy query, the user's fuzzy queries can be translated into precise queries for classical relational databases.
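The α-cut translation described in this abstract can be illustrated with a minimal sketch. It assumes a trapezoidal fuzzy number and a single fuzzy term; the names `alpha_cut`, `to_sql`, and `ABOUT_30`, as well as the `employee` table and `age` column, are hypothetical and are not taken from the paper, whose generic fuzzy terms and relations are more complex.

```python
def alpha_cut(a, b, c, d, alpha):
    """Return the interval [lo, hi] of a trapezoidal fuzzy number (a, b, c, d)
    at threshold alpha. Membership rises linearly on [a, b], equals 1 on
    [b, c], and falls linearly on [c, d]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    lo = a + alpha * (b - a)
    hi = d - alpha * (d - c)
    return lo, hi


def to_sql(table, column, fuzzy_term, alpha):
    """Translate 'column IS fuzzy_term' into a crisp SQL range query."""
    lo, hi = alpha_cut(*fuzzy_term, alpha)
    return f"SELECT * FROM {table} WHERE {column} BETWEEN {lo:g} AND {hi:g}"


# Hypothetical fuzzy term "about 30 years old".
ABOUT_30 = (20, 28, 32, 40)

print(to_sql("employee", "age", ABOUT_30, 0.5))
# SELECT * FROM employee WHERE age BETWEEN 24 AND 36
```

A higher threshold narrows the interval: at alpha = 1 only the core [28, 32] of the hypothetical term remains.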

5.
An efficient means of accessing indexed hierarchical databases using a relational query language is presented. The purpose is to achieve an effective sharing of heterogeneous distributed databases. Translation of hierarchical data to an equivalent relational data definition, translation of a relational query language statement to an equivalent program that can be processed by a hierarchical database management system, and automatic selection of secondary indexes of hierarchical databases are investigated. A major portion of the result has been implemented, and the performance of the implemented system is analyzed. The performance of the system is satisfactory for a wide range of test data and test queries. It is shown that the use of the secondary index significantly enhances the efficiency of accessing hierarchical databases.

6.
A problem of considerable interest in the design of a database is the selection of indexes. In this paper, we present a probabilistic model of transactions (queries, updates, insertions, and deletions) to a file. An evaluation function, which is based on the cost saving (in terms of the number of page accesses) attributable to the use of an index set, is then developed. The maximization of this function would yield an optimal set of indexes. Unfortunately, algorithms known to solve this maximization problem require time exponential in the total number of attributes in the file. Consequently, we develop the theoretical basis which leads to an algorithm that obtains a near-optimal solution to the index selection problem in polynomial time. The theoretical result consists of showing that the index selection problem can be solved by solving a properly chosen instance of the knapsack problem. A theoretical bound for the amount by which the solution obtained by this algorithm deviates from the true optimum is provided. This result is then interpreted in the light of evidence gathered through experiments.
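To make the knapsack connection concrete, here is a minimal sketch of index selection as a 0/1 knapsack. It assumes, purely for illustration, that each candidate index has an integer storage cost and an independent saving in page accesses and that the capacity is a storage budget; the abstract does not specify how the knapsack instance is constructed, and the function `select_indexes` and all index names are hypothetical.

```python
def select_indexes(candidates, budget):
    """0/1 knapsack over candidate indexes (illustrative assumption:
    savings of different indexes are independent and additive).

    candidates: list of (name, storage_pages, saved_page_accesses)
    budget:     total storage pages available (non-negative integer)
    Returns (total_saving, tuple_of_chosen_names).
    """
    # best[w] holds the best (saving, chosen) using at most w pages.
    best = [(0, ())] * (budget + 1)
    for name, size, saving in candidates:
        # Walk capacities downward so each index is chosen at most once.
        for w in range(budget, size - 1, -1):
            total = best[w - size][0] + saving
            if total > best[w][0]:
                best[w] = (total, best[w - size][1] + (name,))
    return best[budget]


# Hypothetical candidates: (name, storage in pages, page accesses saved).
candidates = [
    ("idx_name", 4, 40),
    ("idx_dept", 3, 50),
    ("idx_salary", 2, 30),
    ("idx_city", 5, 55),
]

print(select_indexes(candidates, 7))
# (90, ('idx_name', 'idx_dept'))
```

The dynamic program runs in time proportional to the number of candidates times the budget, in contrast to enumerating all index subsets.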

7.
System EROS is a physical design tool for CODASYL database systems which covers a large spectrum of decision variables, notably location mode, set implementation, set order, and search keys. System EROS is based on a model where the CODASYL physical database design problem is formulated as an extension of the index selection problem in the relational database environment. Optimization algorithms for index selection are extended to solve the more complex problem of selecting a good physical access path configuration for CODASYL databases. The proposed approach represents a unified solution to the physical database design problem for both CODASYL and relational systems.

8.
Currently, relational databases are widely used, while object-oriented databases are emerging as a new generation of database technology. This paper presents a methodology to provide effective sharing of information in object-oriented databases and relational databases. The object-oriented data model is selected as a common data model to build an integrated view of the diverse databases. An object-oriented query language is used as a standard query language. A method is developed to transform a relational data definition to an equivalent object-oriented data definition and to integrate local data definitions. Two distributed query processing methods are derived. One is for general queries and the other for a special class of restricted queries. Using the methods developed, it is possible to access distributed object-oriented databases and relational databases such that the locations and the structural differences of the databases are transparent to users.

9.
Reliability of answers to queries in relational databases   Cited by: 1 (self-citations: 0, other citations: 1)
The author studies the problem of determining the reliability of answers to queries in a relational database system, where the information in the database comes from various sources with varying degrees of reliability. An extended relational model is proposed in which each tuple in a relation is associated with an information source vector which identifies the information source(s) that contributed to that tuple. The author shows how relational algebra operations can be extended, and implemented using information source vectors, to calculate the vector corresponding to each tuple in the answer to a query, and hence to identify the information source(s) contributing to each tuple in the answer. This also enables the database system to calculate the reliability of each tuple in the answer to a query as a function of the reliability of the information sources.

10.
Designing data warehouses   Cited by: 9 (self-citations: 0, other citations: 9)
A Data Warehouse (DW) is a database that collects and stores data from multiple remote and heterogeneous information sources. When a query is posed, it is evaluated locally, without accessing the original information sources. In this paper we deal with the issue of designing a DW, in the context of the relational model, by selecting a set of views to materialize in the DW. First, we briefly present a theoretical framework for the DW design problem, which concerns the selection of a set of views that (a) fit in the space allocated to the DW, (b) answer all the queries of interest, and (c) minimize the total query evaluation and view maintenance cost. We then formalize the DW design problem as a state space search problem by taking into account multiquery optimization over the maintenance queries (i.e., queries that compute changes to the materialized views) and the use of auxiliary views for reducing the view maintenance cost. Finally, incremental algorithms and heuristics for pruning the search space are presented.

11.
A solution to the problem of supporting relational database operations despite domain mismatch is presented. Mismatched domains occur when information must be obtained from databases that were developed independently. Domain differences are resolved by mapping conflicting attributes to common domains by means of a mechanism of virtual attributes and then applying a set of extended relational operations to the resulting values. When one-to-one mappings cannot be established between domains, the values that result from attribute mappings may be partial. A set of extended relational operators is defined that formalizes operations over partial values and thus manipulates the incomplete information that results from resolving domain mismatch.

12.
The index selection problem (ISP) is an important optimization problem in the physical design of databases. The aim of this paper is to show that ISP, although NP-hard, can in practice be solved effectively through well-designed algorithms. We formulate ISP as a 0-1 integer linear program and describe an exact branch-and-bound algorithm based on the linear programming relaxation of the model. The performance of the algorithm is enhanced by means of procedures to reduce the size of the candidate index set. We also describe heuristic algorithms based on the solution of a suitably defined knapsack subproblem and on Lagrangian decomposition. Computational results on several classes of test problems are given. We report the exact solution of large-scale ISP instances involving several hundred indexes and queries. We also evaluate one of the heuristic algorithms we propose on very large-scale instances involving several thousand indexes and queries and show that it consistently produces very tight approximate (and sometimes provably optimal) solutions. Finally, we discuss possible extensions and future directions of research.

13.
The development and investigation of efficient methods for parallel processing of very large databases using a columnar data representation designed for computer clusters is discussed. An approach that combines the advantages of relational and column-oriented DBMSs is proposed. A new type of distributed column index, fragmented according to the domain-interval principle, is introduced. The column indexes are auxiliary structures that are permanently stored in the distributed main memory of a computer cluster. To match the elements of a column index to the tuples of the original relation, surrogate keys are used. Resource-hungry relational operations are performed on the corresponding column indexes rather than on the original relations of the database. As a result, a precomputation table is obtained. Using this table, the DBMS reconstructs the resulting relation. For basic relational operations on column indexes, methods for their parallel decomposition that do not require massive data exchanges between the processor nodes are proposed. This approach improves the performance of OLAP-class queries by hundreds of times.

14.
We present a general rank-aware model of data which supports handling of similarity in relational databases. The model is based on the assumption that in many cases it is desirable to replace equalities on values in data tables by similarity relations expressing degrees to which the values are similar. In this context, we study various phenomena which emerge in the model, including similarity-based queries and similarity-based data dependencies. The central notion in our model is that of a ranked data table over domains with similarities, which is our counterpart to the notion of a relation on a relation scheme from the classical relational model. Compared to other approaches which cover related problems, we do not propose a similarity-based or ranking module on top of the classical relational model. Instead, we generalize the very core of the model by replacing the classical, two-valued logic upon which the classical model is built by a more general logic involving a scale of truth degrees that, in addition to the classical truth degrees 0 and 1, contains intermediate truth degrees. While the classical truth degrees 0 and 1 represent nonequality and equality of values, and subsequently mismatch and match of queries, the intermediate truth degrees in the new model represent similarity of values and partial match of queries. Moreover, the truth functions of many-valued logical connectives in the new model serve to aggregate degrees of similarity. The presented approach is conceptually clean, logically sound, and retains most properties of the classical model while enabling us to employ new types of queries and data dependencies. Most importantly, similarity is not handled in an ad hoc way or by putting a "similarity module" atop the classical model in our approach. Rather, it is consistently viewed as a notion that generalizes and replaces equality in the very core of the relational model.
We present fundamentals of the formal model and two equivalent query systems which are analogues of the classical relational algebra and domain relational calculus with range declarations. In the sequel to this paper, we deal with similarity-based dependencies.

15.
Traditional database search uses pattern matching in the comparison process. For a query with some search words, tuples are selected only if the words of the tuples exactly match the query words. In this paper, we propose a new method for evaluating relational ranking queries (or top-N queries) with text attributes. This method defines semantic distance functions and uses semantic matching between words in database search. The aim is to fetch not only tuples that match the query exactly but also tuples that are close to the query according to the semantic distances. The basic idea of the method is to create an index based on WordNet to expand the tuple words semantically. The candidate results for a query are retrieved by the index and a simple SQL selection statement, and then the top-N answers are obtained. Extensive experiments are carried out to measure the performance of this new strategy for the evaluation of ranking queries over relational databases.

16.
RFID middleware collects and filters RFID streaming data to process application requests, which are called continuous queries because they are executed continuously during tag movement. Several approaches to building an index on queries rather than data records, called a query index, have been proposed to evaluate continuous queries over streaming data. EPCglobal proposed an Event Cycle Specification (ECSpec) model, which is a de facto standard query interface for RFID applications. Continuous queries based on ECSpec consist of a large number of segments that represent the query conditions. The problem with using any of the existing query indexes on these continuous queries is that it takes a long time to build the index, because a large number of segments must be inserted into the index. To solve this problem, we propose a transform method that converts a group of segments into compressed data. We also propose an efficient query index scheme for the transformed space. Compared with existing query indexes, the proposed index performs better on various datasets.

17.
When planning a database, the problem of index selection is of particular interest. The authors examine a transaction model that includes queries, updates, insertions, and deletions, and they define a function that calculates the total transaction cost when an index set is used. Their aim is to minimize the cost function in order to identify the optimal set. The algorithms proposed in other studies require time exponential in the number of attributes in order to solve the problem. The authors propose a heuristic algorithm, based on some properties of the cost function, that produces an almost optimal set in polynomial time. In many cases, the cost function properties make it possible to prove that the solution obtained is the optimal one.

18.
Data warehouses are very large databases usually designed using the star schema. Queries defined on data warehouses are generally complex due to the join operations involved. The performance of star schema queries in data warehouses is highly critical, and its optimization is hard in general. Several query performance optimization methods exist, such as indexes and table partitioning. In this paper, we propose a new approach based on binary particle swarm optimization for solving the bitmap join index selection problem in data warehouses. This approach selects the optimal set of bitmap join indexes based on a mathematical cost model. Several experiments are performed to demonstrate the effectiveness of the proposed method on the bitmap join index selection problem. Further testing of the method is performed using a cost function specific to the database environment. Binary particle swarm optimization is found to be more effective than both the genetic algorithm and data-mining-based approaches.

19.
Imprecise data exist in databases due to their unavailability or to data/schema incompatibilities in a multidatabase system. Partial values have been used to represent imprecise data. Manipulation of partial values is therefore necessary to process queries involving imprecise data. In this article, we study the problem of eliminating redundant partial values that result from a projection on an attribute with partial values. The redundancy of partial values is defined through the interpretation of a set of partial values. This problem is equivalent to searching for a minimal semantically equivalent subset of a set of partial values. A semantically equivalent subset contains exactly the same information as the original set. We derive a set of useful properties and apply a graph matching technique to develop an efficient algorithm for finding such a minimal subset and thereby eliminating redundant partial values. By this process, we not only provide a concise answer to the user, but also reduce the communication cost when partial values must be transmitted from one site to another in a distributed environment. Moreover, further manipulation of the partial values can be simplified. This work is also extended to the case of multi-attribute projections.

20.
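Chen Jingshuang, Chen Ke, Shou Lidan, Jiang Dawei, Chen Gang. Journal of Software (软件学报), 2022, 33(12): 4688-4703
By learning the data distribution, learned indexes can accurately predict the position at which data is accessed, significantly reducing the memory footprint of the index while maintaining efficient and stable query performance. Existing learned indexes are mainly optimized for read-only queries and provide insufficient support for insertions and updates. To address these challenges, we design ALERT, a workload-adaptive learned index based on the Radix Tree. ALERT uses a Radix Tree to manage variable-length segments, and within each segment it predicts positions with a linear interpolation model that has a maximum error bound. ALERT also uses an efficient insert buffer to reduce the cost of data insertions and updates. Two adaptive reorganization optimizations are proposed, one for point queries and one for range queries, which sense the workload and dynamically adjust the organization of the insert buffer. Experiments show that, compared with popular learned indexes, ALERT reduces build time by 81% on average and memory usage by 75% on average, and reduces insertion latency by 50% on average while retaining excellent read performance. In addition, adaptive reorganization allows ALERT to sense the characteristics of the query workload effectively, reducing query latency by 15% on average compared with running without it.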
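The in-segment prediction step mentioned in this abstract (a linear interpolation model with a maximum error bound) can be sketched roughly as follows. This is only an illustration of the general technique and not ALERT's implementation: the class `Segment` and its methods are hypothetical, the error bound is simply measured at build time, and the Radix Tree, insert buffer, and adaptive reorganization are omitted.

```python
import bisect


class Segment:
    """One segment of sorted, distinct numeric keys with a linear
    interpolation model (hypothetical sketch, not the ALERT code)."""

    def __init__(self, keys):
        self.keys = keys
        first, last = keys[0], keys[-1]
        # Slope of the line through the first and last (key, position) points.
        self.slope = (len(keys) - 1) / (last - first) if last != first else 0.0
        # Largest distance between predicted and true position in the segment.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(keys))

    def _predict(self, key):
        # Interpolate, then clamp the prediction to a valid position.
        pos = round((key - self.keys[0]) * self.slope)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        pos = self._predict(key)
        lo = max(0, pos - self.max_err)
        hi = min(len(self.keys), pos + self.max_err + 1)
        # Binary search only inside the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None


keys = [1, 2, 4, 8, 16, 32, 64]
seg = Segment(keys)
print(seg.max_err)      # 3
print(seg.lookup(16))   # 4
print(seg.lookup(5))    # None
```

Because every stored key lies within `max_err` positions of its prediction, the search window is guaranteed to contain the key if it exists; a smaller error bound means a narrower window and a cheaper lookup.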
