Found 20 similar documents (search time: 656 ms)
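As Internet and database technologies continue to develop and converge, more and more users need to access online databases to obtain information, and doing so requires them to query the data stored there. Users are therefore expected to know some Structured Query Language (SQL) and to understand the database schema. In practice, however, most users neither know SQL nor understand the schema. This gives rise to a natural user requirement: database support for keyword-based queries. This article gives a brief analysis and discussion of keyword-based querying in databases.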

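Online analytical processing (OLAP) queries are complex ad hoc queries over large volumes of data, and they usually involve grouping and aggregation. This paper analyzes the characteristics of the star-schema storage structure and of data updates in relational data warehouses. It treats an entity relation as a global relation in a distributed database, fragmented according to the size of the in-memory sort buffer, and performs distributed aggregation for the grouping operation. An improved MuSA algorithm is presented that effectively improves performance.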

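Applying Distributed Indexing to Improve Query Performance on Massive Data   (cited 1 time in total: 0 self-citations, 1 citation by others)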
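In telecom precision marketing and ad hoc query services, there are many random-query scenarios against one or a few wide tables (more than 50 fields). With the traditional processing model (querying the database directly), query response time can be optimized to between a few seconds and tens of seconds when the data volume is modest (fewer than 10 million records). Once the data reaches tens of millions, hundreds of millions, or even more than a billion records, however, no amount of optimization or change of indexing mechanism lets this model meet the requirement of concurrent queries answered within seconds. The new processing model solves this problem by introducing a distributed Solr index layer. The index layer builds an index over the database records in advance, and queries go directly to the index layer rather than to the database, which greatly improves query performance. In a comparison of the two models in the same environment, on a wide-table query scenario with 50 million records and 20 concurrent requests per second, all queries under the traditional model timed out and failed, whereas all queries using the distributed index layer succeeded and returned within 2 seconds.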

Précis queries represent a novel way of accessing data, which combines ideas and techniques from the fields of databases and information retrieval. They are free-form, keyword-based queries on top of relational databases that generate entire multi-relation databases, which are logical subsets of the original ones. A logical subset contains not only items directly related to the given query keywords but also items implicitly related to them in various ways, with the purpose of providing to the user much greater insight into the original data. In this paper, we lay the foundations for the concept of logical database subsets that are generated from précis queries under a generalized perspective that removes several restrictions of previous work. In particular, we extend the semantics of précis queries considering that they may contain multiple terms combined through the AND, OR, and NOT operators. On the basis of these extended semantics, we define the concept of a logical database subset, we identify the one that is most relevant to a given query, and we provide algorithms for its generation. Finally, we present an extensive set of experimental results that demonstrate the efficiency and benefits of our approach.

Users are rarely familiar with the content of a data source they are querying, and therefore cannot avoid using keywords that do not exist in the data source. Traditional systems may respond with an empty result, causing dissatisfaction, while the data source in effect holds semantically related content. In this paper we study this no-but-semantic-match problem on XML keyword search and propose a solution which enables us to present the top-k semantically related results to the user. Our solution involves two steps: (a) extracting semantically related candidate queries from the original query and (b) processing candidate queries and retrieving the top-k semantically related results. Candidate queries are generated by replacement of non-mapped keywords with candidate keywords obtained from an ontological knowledge base. Candidate results are scored using their cohesiveness and their similarity to the original query. Since the number of queries to process can be large, with each result having to be analyzed, we propose pruning techniques to retrieve the top-k results efficiently. We develop two query processing algorithms based on our pruning techniques. Further, we exploit a property of the candidate queries to propose a technique for processing multiple queries in batch, which improves the performance substantially. Extensive experiments on two real datasets verify the effectiveness and efficiency of the proposed approaches.
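
The candidate-query step (a) described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function name `candidate_queries`, the `related_terms` dictionary standing in for the ontological knowledge base, and the choice to return no candidates when a keyword has no usable substitute are all assumptions made for illustration.

```python
from itertools import product

def candidate_queries(query, data_vocabulary, related_terms):
    """Build candidate queries by swapping keywords that do not occur in the
    data source for related keywords that do occur in it.

    query           -- list of keywords typed by the user
    data_vocabulary -- set of keywords present in the data source
    related_terms   -- hypothetical stand-in for an ontology lookup:
                       keyword -> list of semantically related keywords
    """
    options = []
    for keyword in query:
        if keyword in data_vocabulary:
            # Mapped keyword: keep it unchanged.
            options.append([keyword])
        else:
            # Non-mapped keyword: keep only substitutes that exist in the data.
            substitutes = [t for t in related_terms.get(keyword, [])
                           if t in data_vocabulary]
            if not substitutes:
                return []  # assumption: give up if a keyword cannot be replaced
            options.append(substitutes)
    # Every combination of per-keyword choices is one candidate query.
    return [list(choice) for choice in product(*options)]
```

For example, with a vocabulary of `{"paper", "author", "xml"}` and `related_terms = {"article": ["paper", "essay"]}`, the query `["article", "xml"]` yields the single candidate `["paper", "xml"]`. Scoring and pruning of candidates, which the abstract also mentions, are outside this sketch.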

Because of users’ growing utilization of unclear and imprecise keywords when characterizing their information need, it has become necessary to expand their original search queries with additional words that best capture their actual intent. The selection of the terms that are suitable for use as additional words is in general dependent on the degree of relatedness between each candidate expansion term and the query keywords. In this paper, we propose two criteria for evaluating the degree of relatedness between a candidate expansion word and the query keywords: (1) co-occurrence frequency, where more importance is attributed to terms occurring in the largest possible number of documents where the query keywords appear; (2) proximity, where more importance is assigned to terms having a short distance from the query terms within documents. We also employ the strength Pareto fitness assignment in order to satisfy both criteria simultaneously. The results of our numerical experiments on MEDLINE, the online medical information database, show that the proposed approach significantly enhances the retrieval performance as compared to the baseline.
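
The two relatedness criteria named in this abstract can be made concrete with a small Python sketch. The exact definitions used in the paper are not given here, so the ones below are assumptions: documents are token lists, co-occurrence counts documents that contain every query keyword together with the candidate term, and proximity is the smallest token distance between the term and any query keyword. The strength Pareto fitness assignment that combines the two criteria is not implemented.

```python
def cooccurrence(term, query, docs):
    """Count documents that contain the term and every query keyword
    (assumed definition; higher means more related)."""
    return sum(1 for doc in docs
               if term in doc and all(q in doc for q in query))

def proximity(term, query, docs):
    """Smallest token distance between the term and any query keyword
    within a single document (assumed definition; lower means more related).
    Returns None when the term never shares a document with a query keyword."""
    best = None
    for doc in docs:
        term_positions = [i for i, w in enumerate(doc) if w == term]
        query_positions = [i for i, w in enumerate(doc) if w in query]
        for i in term_positions:
            for j in query_positions:
                distance = abs(i - j)
                if best is None or distance < best:
                    best = distance
    return best
```

A candidate term that scores well on one criterion may score poorly on the other, which is why the abstract treats the selection as a two-objective problem.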

We consider the problem of designing schemas for deductive databases. The design problem is to construct a database schema that supports, at minimal expected cost, a given set of database transactions. Our results include a formal definition of both a deductive database schema and a schema transformation. A schema transformation is used in the design process to transform one schema into another, with the goal of reducing the expected database costs. Our design methodology defines the concept of a schema transformation within the context of the clause-based deductive database model. The IDB of the schema that results from the design process includes clauses sufficient for a theorem prover to map queries stated against the original schema into queries against the (more cost effective) resulting schema. This allows users to interact exclusively with the initial schema, while the schema that results from the design process specifies the actual structure of the implemented database. In other words, the initial schema serves as the logical schema for the database, and the result of the design process serves as its physical schema.

Generally speaking, to implement Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Since the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database is larger than the hardware capacity, the items are loaded into the hardware separately. The time complexity of those steps that need to load candidate itemsets or database items into the hardware is in proportion to the number of candidate itemsets multiplied by the number of items in the database. Too many candidate itemsets and a large database would create a performance bottleneck. In this paper, we propose a HAsh-based and Pipelined (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. We apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information for reducing the number of candidate itemsets and items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. At the same time, trimming information is collected from each transaction. In addition, itemsets are generated from transactions and hashed into a hash table. The useful trimming information and the hash table enable us to reduce the number of items in the database and the number of candidate itemsets. Therefore, we can effectively reduce the frequency of loading the database into the hardware. As such, HAPPI solves the bottleneck problem in Apriori-based hardware schemes. We also derive some properties to investigate the performance of this hardware implementation. As shown by the experiment results, HAPPI significantly outperforms the previous hardware approach and the software algorithm in terms of execution time.
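
For readers unfamiliar with the baseline, the following is a minimal software sketch of classic level-wise Apriori in Python. It is not the HAPPI architecture: there is no pipelining, hashing, or transaction trimming here, and the function name `apriori` and its parameters are illustrative. It does show why the cost grows with the number of candidate itemsets multiplied by the size of the database, since every candidate is compared against every transaction.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    transactions -- iterable of item collections
    min_support  -- minimum number of transactions that must contain an itemset
    """
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Counting step: each candidate is checked against each transaction.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
    return frequent
```

The counting step is the part a hardware scheme would need to load candidates and transactions for, which is where the abstract locates the bottleneck.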

The Web as a global information space is developing from a Web of documents to a Web of data. This development opens new ways for addressing complex information needs. Search is no longer limited to matching keywords against documents, but instead complex information needs can be expressed in a structured way, with precise answers as results. In this paper, we present Hermes, an infrastructure for data Web search that addresses a number of challenges involved in realizing search on the data Web. To provide an end-user oriented interface, we support expressive user information needs by translating keywords into structured queries. We integrate heterogeneous Web data sources with automatically computed mappings. Schema-level mappings are exploited in constructing structured queries against the integrated schema. These structured queries are decomposed into queries against the local Web data sources, which are then processed in a distributed way. Finally, heterogeneous result sets are combined using an algorithm called map join, making use of data-level mappings. In evaluation experiments with real life data sets from the data Web, we show the practicability and scalability of the Hermes infrastructure.

Keyword search can provide users an easy method to query large and complex databases without any knowledge of structured query languages or underlying database schema. Most of the existing studies have focused on generating candidate structured queries relevant to keywords. Due to the large size of generated queries, the execution costs may be prohibitive. However, existing studies lack the idea of a generalized method to optimize the plan of the large set of generated queries. In this paper, we introduce a graph-theoretic optimization approach. We propose a general graph model, Weighted Operator Graph, to address the costs of keyword query evaluation plans. The proposed model is flexible to integrate all of the cost-based plans in a uniform way. We define a Keyword Query Optimization Problem based on a theoretical cost model as a graph-theoretic problem and show it to be an NP-hard problem. We propose a greedy heuristic, Maximum Propagation, that reduces the size of the intermediate result as early as possible. The proposed algorithm allows us to achieve efficiency in terms of query evaluation costs. Experimental studies on both synthetic and real data sets show that our work outperforms the existing work.

Schema mappings are high-level specifications that describe the relationship between database schemas. They are an important tool in several areas of database research, notably in data integration and data exchange. However, a concrete theory of schema mapping optimization including the formulation of optimality criteria and the construction of algorithms for computing optimal schema mappings is completely lacking to date. The goal of this work is to fill this gap. We start by presenting a system of rewrite rules to minimize sets of source-to-target tuple-generating dependencies. Moreover, we show that the result of this minimization is unique up to variable renaming. Hence, our optimization also yields a schema mapping normalization. By appropriately extending our rewrite rule system, we also provide a normalization of schema mappings containing equality-generating target dependencies. An important application of such a normalization is in the area of defining the semantics of query answering in data exchange, since several definitions in this area depend on the concrete syntactic representation of the mappings. This is, in particular, the case for queries with negated atoms and for aggregate queries. The normalization of schema mappings allows us to eliminate the effect of the concrete syntactic representation of the mapping from the semantics of query answering. We discuss in detail how our results can be fruitfully applied to aggregate queries.

The concept of a database skeleton which reflects both the user's conception of the real world and the system's understanding of the interrelationships among database entities is described. It consists of a conceptual schema (conceptual graphs) and a relational schema (information graph). With the aid of the database skeleton, fuzzy queries can be translated and disambiguated by analyzing the queries using the conceptual graphs of a database skeleton. The query language XQL is introduced, and the XQL translator is described in some detail.

Li Lingli, Wang Hongzhi, Gao Hong, Li Jianzhong. Journal of Software (《软件学报》), 2012, 23(6): 1561-1577
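Keywords make it possible to query XML data without knowing its schema. In current keyword query processing over XML data streams, scoring functions often cannot satisfy the differing needs of all users. This paper proposes a skyline-based Top-K keyword query over XML data streams. For this kind of query, there is no need to consider the complex factors that affect the relevance of a result to the query; the skyline alone is used to select the results most relevant to the query. Two effective skyline-based algorithms for processing Top-K keyword queries over XML data streams are proposed, covering single-query and multi-query processing. Extensive experiments verify the effectiveness and scalability of the two algorithms. The experiments show that the efficiency of the proposed algorithms is almost unaffected by parameters such as the number of keywords, the number of query results, and the number of queries, and that running time is roughly linear in document size.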
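
The skyline idea behind this abstract can be sketched in a few lines of Python. This is a generic, quadratic-time skyline over in-memory score tuples, not the streaming algorithms of the paper. It assumes each result is described by a tuple of scores where higher is better on every dimension; which dimensions the paper actually uses is not stated in the abstract.

```python
def dominates(a, b):
    """a dominates b if a is at least as good on every dimension
    and strictly better on at least one (higher is better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def skyline(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]
```

With `points = [(3, 1), (2, 2), (1, 3), (1, 1), (2, 1)]`, the skyline is `[(3, 1), (2, 2), (1, 3)]`: `(1, 1)` is dominated by `(2, 2)` and `(2, 1)` by `(3, 1)`. No weighting of the dimensions is required, which is the property the abstract relies on to avoid a single scoring function.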

Coupled transformation occurs when multiple software artifacts must be transformed in such a way that they remain consistent with each other. For instance, when a database schema is adapted in the context of system maintenance, the persistent data residing in the system's database needs to be migrated to conform to the adapted schema. Also, queries embedded in the application code and any declared referential constraints must be adapted to take the schema changes into account. As another example, in XML-to-relational data mapping, a hierarchical XML Schema is mapped to a relational SQL schema with appropriate referential constraints, and the XML documents and queries are converted into relational data and relational queries. The 2LT project is aimed at providing a formal basis for coupled transformation. This formal basis is found in data refinement theory, point-free program calculation, and strategic term rewriting. We formalize the coupled transformation of a data type by an algebra of information-preserving data refinement steps, each witnessed by appropriate data conversion functions. Refinement steps are modeled by so-called two-level rewrite rules on type expressions that synthesize conversion functions between redex and reduct while rewriting. Strategy combinators are used to compose two-level rewrite rules into complete rewrite systems. Point-free program calculation is applied to optimize synthesized conversion functions, to migrate queries, and to normalize data type constraints. In this paper, we provide an overview of the challenges met by the 2LT project and we give a sketch of the solutions offered.

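By using synonym keywords of relation names as index keys for schema information and by vertically partitioning relation tuples, this paper designs a method for indexing schemas and data with a structured overlay network. Based on these two levels of indexing, an algorithm supporting complex multi-attribute queries is proposed. Qualitative analysis and comparison show that the method comes closer to the ideal goals of P2P data management than related work.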

Current microarray databases use different terminologies and structures and thereby limit the sharing of data and collating of results between laboratories. Consequently, an effective integrated microarray data model is required. One important process to develop such an integrated database is schema matching. In this paper, we propose an effective schema matching approach called MDSM, to syntactically and semantically map attributes of different microarray schemas. The contribution from this work will be used later to create microarray global schemas. Since microarray data is complex, we use microarray ontology to improve the measuring accuracy of the similarity between attributes. The similarity relations can be represented as weighted bipartite graphs. We determine the best schema matching by computing the optimal matching in a bipartite graph using the Hungarian optimisation method. Experimental results show that our schema matching approach is effective and flexible to use in different kinds of database models, such as database schemas, XML schemas, and web site maps. Finally, a case study on an existing public microarray schema is carried out using the proposed method.
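
The optimal-matching step in this abstract can be illustrated with a small Python sketch. Note the substitution: the abstract uses the Hungarian method, whereas the code below finds the same optimum by brute force over all permutations, which is only practical for a handful of attributes. The function name `best_matching` and the square similarity matrix are assumptions for illustration.

```python
from itertools import permutations

def best_matching(similarity):
    """Find the one-to-one attribute matching with the highest total similarity.

    similarity[i][j] is the similarity between source attribute i and target
    attribute j (square matrix assumed). Returns (assignment, total), where
    assignment[i] is the target index matched to source attribute i.
    Brute force in O(n!) time; the Hungarian method solves this in O(n^3).
    """
    n = len(similarity)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(similarity[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm), best_score
```

For the matrix `[[9, 8], [8, 1]]`, picking the largest entry first would pair attribute 0 with target 0 and total 10, while the optimal matching returned here is `[1, 0]` with total 16. This is why a global optimum over the bipartite graph is preferred to a greedy choice.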

This research investigates an approach to query processing in a multidatabase system that uses an object-oriented model to capture the semantics of other data models. The object-oriented model is used to construct a global schema, defining an integrated view of the different schemas in the environment. The model is also used as a self-describing model to build a meta-database for storing information about the global schema. A unique aspect of this work is that the object-oriented model is used to describe the different data models of the multidatabase environment, thereby extending the meta-database with semantic information about the local schemas. With the global and local schemas all represented in an object-oriented form, structural mappings between the global schema and each local schema are then easily supported. An object algebra then provides a query language for expressing global queries, using the structural mappings to translate object algebra queries into SQL queries over local relational schemas. The advantage of using an object algebra is that the object-oriented database can be viewed as a blackboard for temporary storage of local data and for establishing relationships between different databases. The object algebra can be used to directly retrieve temporarily-stored data from the object-oriented database or to transparently retrieve data from local sources using the translation process described in this paper.

A polymorphic object algebra for an object-oriented database model is introduced. Types of schema modification that follow naturally from this model are described. It is shown to what extent queries return identical or equivalent results when the objects in the database are modified to conform to a modified schema.

In this paper, we present a new method for fuzzy query processing in relational database systems based on automatic clustering techniques and weighting concepts. The proposed method allows the query conditions and the weights of query items of users' fuzzy SQL queries to be described by linguistic terms represented by fuzzy numbers. Because the proposed fuzzy query processing method allows the users to construct their fuzzy queries more conveniently, the existing relational database systems will be more intelligent and more flexible to the users.
