Similar documents
 20 similar documents found (search time: 15 ms)
1.
A methodology to retrieve text documents from multiple databases   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, the contents of databases are indicated approximately by database representatives. Databases are ranked using their representatives with respect to the given query. We provide a necessary and sufficient condition to rank the databases optimally. In order to satisfy this condition, we provide three estimation methods. One estimation method is intended for short queries; the other two are for all queries. Second, we provide an algorithm, OptDocRetrv, to retrieve documents from the databases according to their rank and in a particular way. We show that if the databases containing the n most similar documents for a given query are ranked ahead of other databases, our methodology will guarantee the retrieval of the n most similar documents for the query. When the number of databases is large, we propose to organize database representatives into a hierarchy and employ a best-search algorithm to search the hierarchy. It is shown that the effectiveness of the best-search algorithm is the same as that of evaluating the user query against all database representatives.
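The two-step idea in this abstract (rank the databases, then retrieve from them in rank order) can be illustrated with a minimal sketch. This is not the paper's OptDocRetrv algorithm, whose details the abstract does not give. The function name `top_n_across_databases`, the `estimates` dictionary (a stand-in for database representatives), and the early-stop rule are all assumptions made for illustration; the early stop is only safe if each estimate is a true upper bound on the best similarity in its database.

```python
import heapq


def top_n_across_databases(databases, estimates, query_sim, n):
    """Return the n most similar documents and the databases actually searched.

    databases: dict mapping a database name to a list of documents.
    estimates: dict mapping a database name to an assumed upper bound on the
               best query similarity inside that database (hypothetical
               stand-in for a database representative).
    query_sim: function mapping a document to its similarity with the query.
    """
    # Step 1: rank databases by their estimated usefulness for the query.
    order = sorted(databases, key=lambda name: estimates[name], reverse=True)

    best = []      # min-heap of (similarity, document), at most n entries
    visited = []   # databases that were actually searched
    for name in order:
        # Step 2: stop once no remaining database can beat the current n-th best.
        if len(best) == n and estimates[name] <= best[0][0]:
            break
        visited.append(name)
        for doc in databases[name]:
            item = (query_sim(doc), doc)
            if len(best) < n:
                heapq.heappush(best, item)
            elif item > best[0]:
                heapq.heapreplace(best, item)
    return sorted(best, reverse=True), visited
```

With accurate upper bounds, low-ranked databases are never contacted, which is the practical benefit of ranking the databases before retrieving from them.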

2.
Incompleteness due to missing attribute values (aka “null values”) is very common in autonomous web databases, on which user accesses are usually supported through mediators. Traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes, even if they wind up being relevant to a user query. Ideally we would like the mediator to retrieve such possible answers and gauge their relevance by assessing their likelihood of being pertinent answers to the query. The autonomous nature of web databases poses several challenges in realizing this objective. Such challenges include the restricted access privileges imposed on the data, the limited support for query patterns, and the bounded pool of database and network resources in the web environment. We introduce a novel query rewriting and optimization framework, QPIAD, that tackles these challenges. Our technique involves reformulating the user query based on mined correlations among the database attributes. The reformulated queries are aimed at retrieving the relevant possible answers in addition to the certain answers. QPIAD is able to gauge the relevance of such queries, allowing tradeoffs in reducing the costs of database query processing and answer transmission. To support this framework, we develop methods for mining attribute correlations (in terms of Approximate Functional Dependencies), value distributions (in the form of Naïve Bayes Classifiers), and selectivity estimates. We present empirical studies to demonstrate that our approach is able to effectively retrieve relevant possible answers with high precision, high recall, and manageable cost.

3.
Recently, considerable interest has been shown in the automation of database design. The paper discusses the query facility for an experimental prototype of a database management system (the SPUR system) based on the universal relation concept, which removes some logical database design details from the human designer. The basic motivation for the study as a whole is to make databases easier to develop and use. The query language of the SPUR system is described and its power and correctness are explored through the use of case studies. The feasibility of implementing such a query language is established, although the user, perhaps not an end-user in this case, needs to have a good understanding of database terminology to use the system effectively.

4.
Shi Ke 《Computer Engineering》2008,34(8):66-68
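To integrate database resources in grid environments and enable grid applications to access existing databases, a service-based database access and integration system (GridDBAdmin) is proposed. GridDBAdmin provides users with a virtual global logical database view and lets them access multiple databases simultaneously using existing SQL. The system consists of a metadata service and a grid virtual database service. The metadata service discovers the databases that contain the data the user needs; the grid virtual database service provides the global logical view and, through a distributed query mechanism, decomposes the user's SQL request across the specific databases and merges the results. A prototype system developed with the Globus and OGSA-DAI toolkits was tested and produced good results.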

5.
In the big data era, people cannot afford increasingly complex computation because of constrained computing resources. The high reliability, strong processing capacity, and large storage space of cloud computing let resource-constrained clients carry out heavy computation tasks remotely with the help of a cloud server. In this paper, a new algorithm for secure outsourcing of high-degree polynomials is proposed. We introduce a camouflage technique by which the real polynomial is disguised from the untrusted cloud server. In addition, the input and output are not revealed during the computation, and the client can easily verify the returned result. The application of the secure outsourcing algorithm in a keyword search system is also studied. A verification technique for keyword search is built on the outsourcing algorithm. The client can easily verify whether the server faithfully performs the search over the whole ciphertext space. If the server does not perform the search and returns “null” to the client to indicate that there are no files with the query keyword, the client can easily verify whether there are some related files in the ciphertext database.

6.
A query is said to be secure against inference attacks by a user if there exists no database instance for which the user can infer the result of the query, using only authorized queries to the user. In this paper, first, the security problem against inference attacks on object-oriented databases is formalized. The definition of inference attacks is based on equational logic. Secondly, the security problem is shown to be undecidable, and a decidable sufficient condition for a given query to be secure under a given schema is proposed. The idea of the sufficient condition is to over-estimate inference attacks using over-estimated results of static type inference. The third contribution is to propose subclasses of schemas and queries for which the security problem becomes decidable. Lastly, the decidability of the security problem is shown to be incomparable with the static type inferability, although the tightness of the over-estimation of the inference attacks is affected to a large degree by that of the static type inference.

7.
Approximating query answering on RDF databases   Total citations: 1, self-citations: 0, citations by others: 1
Database users may be frustrated when no answers are returned for a query they pose on the database. In this paper, we study the problem of relaxing queries on RDF databases in order to acquire approximate answers. We address two problems in efficient query relaxation. First, to ensure the quality of answers, we compute the similarities between relaxed queries with regard to the user query and use them to score the potential relevant answers. Second, for obtaining top-k answers, we develop two algorithms. One is based on the best-first strategy, in which relaxed queries are executed in ranking order. The other, a batch-based algorithm, executes the relaxed queries as a batch and avoids unnecessary execution cost. Finally, we implement and experimentally evaluate our approaches.
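The best-first strategy mentioned above can be pictured with a small sketch: relaxed queries are kept in a priority queue ordered by their similarity to the original query and executed in that order until k distinct answers are collected. The abstract does not specify how relaxations are generated or scored, so the inputs here (a precomputed list of `(similarity, query)` pairs and an `execute` callback) and the function name `top_k_by_relaxation` are hypothetical.

```python
import heapq


def top_k_by_relaxation(relaxed_queries, execute, k):
    """Collect up to k distinct answers, running the most similar relaxations first.

    relaxed_queries: iterable of (similarity, query) pairs (assumed precomputed).
    execute: function mapping a query to a list of hashable answers.
    Returns a list of (answer, similarity of the query that produced it).
    """
    # heapq is a min-heap, so negate the similarity; the index breaks ties.
    heap = [(-sim, i, q) for i, (sim, q) in enumerate(relaxed_queries)]
    heapq.heapify(heap)

    answers, seen = [], set()
    while heap and len(answers) < k:
        neg_sim, _, query = heapq.heappop(heap)
        for answer in execute(query):
            if answer in seen:
                continue  # already returned by a more similar relaxed query
            seen.add(answer)
            answers.append((answer, -neg_sim))
            if len(answers) == k:
                break
    return answers
```

Because queries are executed in descending similarity order, each answer is scored by the most similar relaxed query that returns it, and execution stops as soon as k answers are available.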

8.
Starting from a member of an image database designated the "query image," traditional image retrieval techniques, for example, search by visual similarity, allow one to locate additional instances of a target category residing in the database. However, in many cases, the query image or, more generally, the target category, resides only in the mind of the user as a set of subjective visual patterns, psychological impressions, or "mental pictures." Consequently, since image databases available today are often unstructured and lack reliable semantic annotations, it is often not obvious how to initiate a search session; this is the "page zero problem." We propose a new statistical framework based on relevance feedback to locate an instance of a semantic category in an unstructured image database with no semantic annotation. A search session is initiated from a random sample of images. At each retrieval round, the user is asked to select one image from among a set of displayed images: the one that is closest, in the user's opinion, to the target class. The matching is then "mental." Performance is measured by the number of iterations necessary to display an image which satisfies the user, at which point standard techniques can be employed to display other instances. Our core contribution is a Bayesian formulation which scales to large databases. The two key components are a response model which accounts for the user's subjective perception of similarity and a display algorithm which seeks to maximize the flow of information. Experiments with real users and two databases of 20,000 and 60,000 images demonstrate the efficiency of the search process.

9.
Security is an important issue that must be considered as a fundamental requirement in information systems development, and particularly in database design. Therefore security, as a further quality property of software, must be tackled at all stages of the development. The most widespread secure database model is the multilevel model, which permits the classification of information according to its confidentiality, and considers mandatory access control. Nevertheless, the problem is that no database design methodologies currently exist that consider security (and therefore secure database models) across the entire life cycle, particularly at the earliest stages. Therefore it is not possible to design secure databases appropriately. Our aim is to solve this problem by proposing a methodology for the design of secure databases. In addition to this methodology, we have defined some models that allow us to include security information in the database model, and a constraint language to define security constraints. As a result, we can specify a fine-grained classification of the information, defining with a high degree of accuracy which properties each user has to own in order to be able to access each piece of information. The methodology consists of four stages: requirements gathering; database analysis; multilevel relational logical design; and specific logical design. The first three stages define activities to analyze and design a secure database, thus producing a general secure database model. The last stage is made up of activities that adapt the general secure data model to one of the most popular secure database management systems: Oracle9i Label Security. This methodology has been used in a real case by the Data Processing Center of Provincial Government. In order to support the methodology, we have implemented an extension of Rational Rose, including and managing security information and constraints in the first stages of the methodology.

10.
Interest in multimedia database management systems has grown rapidly due to the need for the storage of huge volumes of multimedia data in computer systems. An important building block of a multimedia database system is the query processor, and a query optimizer embedded in the query processor is needed to answer user queries efficiently. The query optimization problem has been widely studied for conventional database systems; however, it is a new research area for multimedia database systems. Due to the differences in query processing strategies, query optimization techniques used in multimedia database systems are different from those used in traditional databases. In this paper, a query optimization strategy is proposed for processing spatio-temporal queries in video database systems. The proposed strategy includes reordering algorithms to be applied on the query execution tree. The performance results obtained by testing the reordering algorithms on different query sets are also presented.

11.
We describe two scenarios of user tasks in which access to multimedia data plays a significant role. Because current multimedia databases cannot support these tasks, we introduce three new requirements on multimedia databases: multimedia objects should be active objects, querying is an interaction process, and query processing uses multiple representations. We discuss three techniques to handle multimedia objects as active objects. Also, we introduce a promising database architecture to meet the new user requirements. Agents within the database handle objects' representations, and a search engine on top of a conventional database handles relevance feedback and multiple representations.

12.
《Information Systems》2002,27(1):1-19
Inclusion dependencies together with functional dependencies form the most important data dependencies used in practice. Inclusion dependencies are important for various database applications such as database design and maintenance, semantic query optimization and efficient view maintenance of data warehouses. Existing approaches for discovering inclusion dependencies consist in producing the whole set of inclusion dependencies holding in a database, leaving the task of selecting the interesting ones to an expert user. In this paper, we take another look at the problem of discovering inclusion dependencies. We exploit the logical navigation, inherently available in relational databases through workloads of SQL statements, as a guess to automatically find out only interesting inclusion dependencies. This assumption leads us to devise a tractable algorithm for discovering interesting inclusion dependencies. Within this framework, approximate dependencies, i.e. inclusion dependencies which almost hold, are also considered. As an example, we present a novel application, namely self-tuning the logical database design, where the discovered inclusion dependencies can be used effectively.
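To make "inclusion dependencies which almost hold" concrete, the sketch below tests a single-column inclusion dependency R[A] ⊆ S[B] and measures how far it is from holding. The error measure used here (the fraction of distinct left-hand-side values missing from the right-hand side) is one common choice and is an assumption; the abstract does not say which measure the paper uses, and the function names are hypothetical.

```python
def inclusion_error(lhs_values, rhs_values):
    """Fraction of distinct lhs values that do not appear among the rhs values.

    0.0 means the inclusion dependency lhs ⊆ rhs holds exactly.
    """
    lhs = set(lhs_values)
    if not lhs:
        return 0.0  # an empty column is trivially included
    rhs = set(rhs_values)
    return len(lhs - rhs) / len(lhs)


def holds_approximately(lhs_values, rhs_values, epsilon=0.0):
    """True if the dependency holds up to the given error threshold."""
    return inclusion_error(lhs_values, rhs_values) <= epsilon
```

For example, if an `orders.customer_id` column contains one value that has no match in `customers.id`, the exact dependency fails, but it may still hold approximately for a suitable `epsilon`.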

13.
Query rewriting for SWIFT (First) answers   Total citations: 2, self-citations: 0, citations by others: 2
Traditionally, the answer to a database query is construed as the set of all tuples that meet the criteria stated. Strict adherence to this notion in query evaluation is, however, increasingly unsatisfactory because decision makers are more prone to adopting an exploratory strategy for information search which we call “getting some answers quickly, and perhaps more later”. From a decision-maker's perspective, such a strategy is optimal for coping with information overload and makes economic sense (when used in conjunction with a micropayment mechanism). These new requirements present new opportunities for database query optimization. In this paper, we propose a progressive query processing strategy that exploits this behavior to conserve system resources and to minimize query response time and user waiting time. This is accomplished by the heuristic decomposition of user queries into subqueries that can be evaluated on demand. To illustrate the practicality of the proposed methods, we describe the architecture of a prototype system that provides a nonintrusive implementation of our approach. Finally, we present experimental results obtained from an empirical study conducted using an Oracle server that demonstrate the benefits of the progressive query processing strategy.

14.
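Query optimization for real-time main-memory databases based on genetic algorithms   Total citations: 3, self-citations: 0, citations by others: 3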
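Query processing for the various transaction types is one of the key points in implementing a real-time database. Because existing relational query processing is not suitable for real-time databases, a real-time database system must have its own query processor. To this end, drawing on the embedded real-time database system ERTDBMS currently under development, this paper presents RTQP, a query processing system for real-time databases. After a general discussion of real-time database query processing, it focuses on memory cost and genetic algorithms: similar to relational systems, RTQP provides algorithms for memory-saving query processing in an MMDB environment, together with a query optimization scheme that combines genetic algorithms with real-time database rules.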

15.
The performance of a database system depends, to a large extent, on the storage structure selected to represent the logical schema of the database. A comprehensive model for the physical design of network model databases is presented. It evaluates the retrieval time for each user query, database updating cost, storage requirements and total cost of the system in terms of design parameters. A linear 0–1 goal programming model, because of its multicriteria nature, has been selected here as a solution procedure. It finds the optimal location mode of each database record type based on the priority and weights assigned to the conflicting design objectives: short retrieval time for a user query, low database updating cost, small storage requirements and low total cost of the system. The designer can interactively change the values of the design parameters, priorities and weights to perform tradeoff analysis. The model has been tested in the design of a department store database.

16.
Précis queries represent a novel way of accessing data, which combines ideas and techniques from the fields of databases and information retrieval. They are free-form, keyword-based queries on top of relational databases that generate entire multi-relation databases, which are logical subsets of the original ones. A logical subset contains not only items directly related to the given query keywords but also items implicitly related to them in various ways, with the purpose of providing the user with much greater insight into the original data. In this paper, we lay the foundations for the concept of logical database subsets that are generated from précis queries under a generalized perspective that removes several restrictions of previous work. In particular, we extend the semantics of précis queries considering that they may contain multiple terms combined through the AND, OR, and NOT operators. On the basis of these extended semantics, we define the concept of a logical database subset, we identify the one that is most relevant to a given query, and we provide algorithms for its generation. Finally, we present an extensive set of experimental results that demonstrate the efficiency and benefits of our approach.

17.
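To address the inability of current secure database services based on data encryption to balance data processing performance against data privacy protection, a new privacy protection method based on distributed secure database services is proposed. It introduces a technique for automatically detecting quasi-identifier attribute sets, achieves vertical partitioning of the data by encrypting some sensitive attributes and splitting the quasi-identifier attribute sets, and implements distributed query processing through metadata-based query decomposition. Experimental results show that the method balances the tension between query performance and privacy protection well.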

18.
Privacy has become a major concern for the users of location-based services (LBSs) and researchers have focused on protecting user privacy for different location-based queries. In this paper, we propose techniques to protect location privacy of users for trip planning (TP) queries, a novel type of query in spatial databases. A TP query enables a user to plan a trip with the minimum travel distance, where the trip starts from a source location, goes through a sequence of points of interest (POIs) (e.g., restaurant, shopping center), and ends at a destination location. Due to privacy concerns, users may not wish to disclose their exact locations to the location-based service provider (LSP). In this paper, we present the first comprehensive solution for processing TP queries without disclosing a user’s actual source and destination locations to the LSP. Our system protects the user’s privacy by sending either a false location or a cloaked location of the user to the LSP but provides exact results of the TP queries. We develop a novel technique to refine the search space as an elliptical region using geometric properties, which is the key idea behind the efficiency of our algorithms. To further reduce the processing overhead while computing a trip from a large POI database, we present an approximation algorithm for privacy-preserving TP queries. Extensive experiments show that the proposed algorithms evaluate TP queries in real time with the desired level of location privacy.
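One geometric fact that makes an elliptical search region natural: if a trip from source s to destination d has Euclidean length at most L, then by the triangle inequality every POI p on it satisfies dist(s, p) + dist(p, d) ≤ L, which is exactly the ellipse with foci s and d. The sketch below prunes POIs with that test. It is an illustration of this property only; how the paper actually derives the bound, and how it handles false or cloaked locations, is not stated in the abstract, and the function names are hypothetical.

```python
import math


def in_ellipse(poi, source, dest, max_trip_length):
    """True if poi lies inside the ellipse with foci source and dest.

    Points outside cannot be on any trip from source to dest whose total
    Euclidean length is at most max_trip_length.
    """
    return math.dist(source, poi) + math.dist(poi, dest) <= max_trip_length


def prune_pois(pois, source, dest, max_trip_length):
    """Keep only the POIs that could still be part of a short-enough trip."""
    return [p for p in pois if in_ellipse(p, source, dest, max_trip_length)]
```

A known upper bound on the optimal trip length (for example, the length of any already-found valid trip) can serve as `max_trip_length`, so the candidate set shrinks as better trips are discovered.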

19.
Describes an approach for multiparadigmatic visual access integration of different interaction paradigms. The user is provided with an adaptive interface augmented by a user model, supporting different visual representations of both data and queries. The visual representations are characterized on the basis of the chosen visual formalisms, namely forms, diagrams and icons. To access different databases, a unified data model called the “graph model” is used as a common underlying formalism to which databases, expressed in the most popular data models, can be mapped. Graph model databases are queried through the adaptive interface. The semantics of the query operations is formally defined in terms of graphical primitives. Such a formal approach permits us to define the concept of an “atomic query”, which is the minimal portion of a query that can be transferred from one interaction paradigm to another and processed by the system. Since certain interaction modalities and visual representations are more suitable for certain user classes, the system can suggest to the user the most appropriate interaction modality as well as the visual representation, according to the user model. Some results on user model construction are presented.

20.
Similar-shape retrieval in shape data management   Total citations: 1, self-citations: 0, citations by others: 1
Mehrotra R., Gary J.E. 《Computer》1995,28(9):57-62
Addresses the problem of similar-shape retrieval, where shapes or images in a shape database that satisfy specified shape-similarity constraints with respect to the query shape or image must be retrieved from the database. In its simplest form, the similar-shape retrieval problem can be stated as, “retrieve or select all shapes or images that are visually similar to the query shape or the query image's shape”. We focus on databases of 2D shapes or, equivalently, databases of images of flat or almost flat objects. (We use the terms “object” and “shape” interchangeably.) Two common types of 2D objects are rigid objects, which have a single rigid component called a link, and articulated objects, which have two or more rigid components joined by movable (rotating or sliding) joints. An ideal similar-shape retrieval technique must be general enough to handle images of articulated as well as rigid objects. It must be flexible enough to handle simple query images, which have isolated shapes, and complex query images, which have partially visible, overlapping or touching objects. We discuss the central issues in similar-shape retrieval and explain how these issues are resolved in a shape retrieval scheme called FIBSSR (Feature Index-Based Similar-Shape Retrieval). This new similar-shape retrieval system effectively models real-world applications.
