Similar literature
Found 20 similar documents (search time: 15 ms)
1.
Matching dependencies (MDs) are used to declaratively specify the identification (or matching) of certain attribute values in pairs of database tuples when some similarity conditions on other values are satisfied. Their enforcement can be seen as a natural generalization of entity resolution. In what we call the pure case of MD enforcement, an arbitrary value from the underlying data domain can be used for the value in common that is used for a matching. However, the overall number of changes of attribute values is expected to be kept to a minimum. We investigate this case in terms of semantics and the properties of data cleaning through the enforcement of MDs. We characterize the intended clean instances, and also the clean answers to queries, as those that are invariant under the cleaning process. The complexity of computing clean instances and clean query answering is investigated. Tractable and intractable cases depending on the MDs are identified and characterized.

2.
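Research on Aggregate Query Matching for Semantic Caching   Total citations: 1, self-citations: 1, citations by others: 0
To improve query efficiency in massive database systems, this work focuses on aggregate query techniques in such systems and extends semantic caching, which is usually applied to queries over small databases, to aggregate queries over massive databases. It first studies a formal description of semantic caching for aggregate queries. On that basis, it discusses the conditions under which the cache can be used to process a query, classifies the kinds of query matching, and proposes and implements a containment-match decision algorithm and an intersection-match decision algorithm. Finally, it reports the corresponding experimental results. Application in a large real-world engineering project shows that these decision algorithms are effective.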
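The distinction between containment matching and intersection matching can be illustrated with a minimal sketch. It assumes, purely for illustration, that a query and a cached region are each described by a closed numeric range on a single attribute; the function name `classify_match` and the range encoding are hypothetical and are not taken from the paper.

```python
def classify_match(query, cached):
    """Classify how a query range relates to a cached range.

    Both arguments are (low, high) pairs describing closed intervals
    on one attribute. This encoding is an assumption of the sketch.
    """
    q_lo, q_hi = query
    c_lo, c_hi = cached
    # Containment match: the cache covers the whole query range,
    # so the query could be answered from the cache alone.
    if c_lo <= q_lo and q_hi <= c_hi:
        return "containing"
    # Intersection match: only part of the query range is cached,
    # so the remainder would still have to be fetched from the database.
    if q_lo <= c_hi and c_lo <= q_hi:
        return "intersecting"
    # No overlap: the cache cannot help with this query.
    return "none"
```

For example, a query over `(10, 20)` against a cached range `(0, 50)` is a containment match, while a query over `(40, 60)` only intersects the cache.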

3.
Relaxation as a platform for cooperative answering   Total citations: 2, self-citations: 1, citations by others: 1
Responses to queries posed by a user of a database do not always contain the information desired. Database answers to a query, although they may be logically correct, can sometimes be misleading. Research in the area of cooperative answering for databases and deductive databases seeks to rectify these problems. We introduce a cooperative method called relaxation for expanding deductive database and logic programming queries. The relaxation method expands the scope of a query by relaxing the constraints implicit in the query. This allows the database to return answers related to the original query as well as the literal answers themselves. These additional answers may be of interest to the user. In Section 1 we introduce the problem and method. In Section 2 we give some background on the research done in cooperative answering. Section 3 discusses the relaxation method, a potential control strategy, and uses. Section 4 looks at a semantic counterpart to this notion. In Section 5 we explore some of the control and efficiency issues. We enumerate open issues in Section 6, and conclude in Section 7.
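A rough sketch of the idea of relaxing a query constraint follows. It assumes a small hand-written taxonomy (`PARENT`) and fact table (`FACTS`), both hypothetical, and relaxes a query by replacing its term with a more general one; the paper's own method works on deductive database and logic programming queries and is not reproduced here.

```python
# Hypothetical taxonomy: each term maps to its more general parent.
PARENT = {"sedan": "car", "suv": "car", "car": "vehicle", "truck": "vehicle"}

# Hypothetical facts: (owner, kind of vehicle owned).
FACTS = [("ann", "sedan"), ("bob", "suv"), ("cy", "truck")]


def descendants(term):
    """Return the term together with every term below it in the taxonomy."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def answer(term):
    """Owners whose vehicle kind falls under the given term."""
    covered = descendants(term)
    return sorted(owner for owner, kind in FACTS if kind in covered)


def relaxed_answers(term, steps=1):
    """Return (literal answers, related answers found by generalizing)."""
    literal = answer(term)
    related = []
    current = term
    for _ in range(steps):
        if current not in PARENT:
            break  # already at the top of the taxonomy
        current = PARENT[current]
        for owner in answer(current):
            if owner not in literal and owner not in related:
                related.append(owner)
    return literal, related
```

Calling `relaxed_answers("sedan")` keeps the literal answers separate from those obtained only after relaxation, so a caller can present the related answers as optional extras.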

4.
Aggregate question answering essentially returns answers for given questions by obtaining query graphs with unique dependencies between values and corresponding objects. Word order dependency, as the key to uniquely identify dependency of the query graph, reflects the dependencies between the words in the question. However, due to the semantic gap caused by the expression difference between questions encoded with word vectors and query graphs represented with logical formal elements, it is not trivial to match the correct query graph for the question. Most existing approaches design more expressive query graphs for complex questions and rank them just by directly calculating their similarities, ignoring the semantic gap between them. In this paper, we propose a novel Structure-sensitive Semantic Matching (SSM) approach that learns aligned representations of dependencies in questions and query graphs to eliminate their gap. First, we propose a cross-structure matching module to bridge the gap between two modalities (i.e., textual question and query graph). Then, we propose an entropy-based gated AQG filter to remove the structural noise caused by the uncertainty of dependencies. Finally, we present a two-channel query graph representation that fuses the semantics of abstract structure and grounding content of the query graph explicitly. Experimental results show that SSM can learn aligned representations of questions and query graphs to eliminate the gaps between their dependencies, and improves by up to 12% (F1 score) on aggregation questions of two benchmark datasets.

5.
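Schema matching is a difficult problem in areas such as schema integration, data warehousing, e-commerce, and semantic querying. It mainly uses information about the elements themselves (such as element names and data types), data instance information (the data in the schema), and structural information (the relationships among schema elements) to mine element semantics and obtain correct mappings. This paper presents a new method that combines data instance information with structural information to assist matching. The method first computes the degree of partial functional dependency between schema elements (schema structure information) from the data instances of the schema, then builds a dependency graph among schema elements from the partial functional dependencies, then computes the structural similarity between elements from the dependency graph, and finally obtains the mappings between schema elements. Because it uses more structural information to assist matching, the method outperforms other methods that use only full functional dependency structure information. Experiments show that the method is better than existing methods on precision, recall, and the overall measure.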
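The first step, estimating how strongly one attribute determines another from instance data, can be sketched as follows. The abstract does not give the paper's formula, so this sketch assumes one common measure: the fraction of rows that remain when, for each value of the determining attribute, only the most frequent dependent value is kept. The function name `dependency_degree` is hypothetical.

```python
from collections import Counter, defaultdict


def dependency_degree(rows, x, y):
    """Estimate how strongly attribute x determines attribute y.

    Returns a value in (0, 1]; 1.0 means x -> y holds as a full
    functional dependency on these rows. The measure is an assumption
    of this sketch, not the definition used in the paper.
    """
    if not rows:
        return 1.0  # an empty instance violates nothing
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[x]][row[y]] += 1
    # For each x-value, keep only the rows carrying its majority y-value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return kept / len(rows)
```

On four rows where one zip code appears with two spellings of the same city, `dependency_degree(rows, "zip", "city")` drops below 1.0, while the reverse direction can still be a full dependency. Degrees like these could then serve as edge weights in a dependency graph.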

6.
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to the query answering problem in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem. We give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that we call universal. We show that a universal solution has no more and no less data than required for data exchange and that it represents the entire space of possible solutions. We then identify fairly general, yet practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute a canonical universal solution efficiently. We adopt the notion of the “certain answers” in indefinite databases for the semantics for query answering in data exchange. We investigate the computational complexity of computing the certain answers in this context and also address other algorithmic issues that arise in data exchange. In particular, we study the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution, and we explore the boundary of what queries can and cannot be answered this way, in a data exchange setting.

7.
We consider two issues in polynomial-time exact learning of concepts using membership and equivalence queries: (1) errors or omissions in answers to membership queries, and (2) learning finite variants of concepts drawn from a learnable class. To study (1), we introduce two new kinds of membership queries: limited membership queries and malicious membership queries. Each is allowed to give incorrect responses on a maliciously chosen set of strings in the domain. Instead of answering correctly about a string, a limited membership query may give a special "I don't know" answer, while a malicious membership query may give the wrong answer. A new parameter L is used to bound the length of an encoding of the set of strings that receive such incorrect answers. Equivalence queries are answered correctly, and learning algorithms are allowed time polynomial in the usual parameters and L. Any class of concepts learnable in polynomial time using equivalence and malicious membership queries is learnable in polynomial time using equivalence and limited membership queries; the converse is an open problem. For the classes of monotone monomials and monotone k-term DNF formulas, we present polynomial-time learning algorithms using limited membership queries alone. We present polynomial-time learning algorithms for the class of monotone DNF formulas using equivalence and limited membership queries, and using equivalence and malicious membership queries. To study (2), we consider classes of concepts that are polynomially closed under finite exceptions and a natural operation to add exception tables to a class of concepts. Applying this operation, we obtain the class of monotone DNF formulas with finite exceptions. We give a polynomial-time algorithm to learn the class of monotone DNF formulas with finite exceptions using equivalence and membership queries.
We also give a general transformation showing that any class of concepts that is polynomially closed under finite exceptions and is learnable in polynomial time using standard membership and equivalence queries is also polynomial-time learnable using malicious membership and equivalence queries. Corollaries include the polynomial-time learnability of the following classes using malicious membership and equivalence queries: deterministic finite acceptors, boolean decision trees, and monotone DNF formulas with finite exceptions.

8.
9.
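To address the limitations of the existing evidence theory (DS) method for deep Web interface integration, a deep Web schema matching method based on concept words and a semantic heterogeneity model is designed. The concept word model is preprocessed by extracting concept words, and grouped attributes are identified and combined, so that complex m:n matching is reduced to simple 1:1 matching, which speeds up system execution. Attribute instances are introduced into the semantic heterogeneity model, turning the problem of mining semantically heterogeneous synonymous attributes into the problem of computing, jointly evaluating, and selecting the similarity values of the features shared between attributes. Experimental results show that the method substantially improves on the DS method in matching efficiency and accuracy.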

10.
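Answer selection is one of the key tasks in automatic question answering systems. Its main goal is to rank candidate answers by their similarity to the question and return the more relevant answers to the user. It can be viewed as a text-pair matching problem. This paper uses deep learning models such as word embeddings, bidirectional LSTM, and 2D neural networks to extract semantic matching features from question-answer pairs, combines them with traditional NLP features, and proposes an answer selection model that fuses deep matching features. Experiments on the Qatar Living community question answering dataset show that the model fusing deep matching features achieves a MAP about 5% higher than the model based on traditional features.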

11.
Incompleteness due to missing attribute values (aka “null values”) is very common in autonomous web databases, on which user accesses are usually supported through mediators. Traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes, even if they wind up being relevant to a user query. Ideally we would like the mediator to retrieve such possible answers and gauge their relevance by assessing their likelihood of being pertinent answers to the query. The autonomous nature of web databases poses several challenges in realizing this objective. Such challenges include the restricted access privileges imposed on the data, the limited support for query patterns, and the bounded pool of database and network resources in the web environment. We introduce a novel query rewriting and optimization framework QPIAD that tackles these challenges. Our technique involves reformulating the user query based on mined correlations among the database attributes. The reformulated queries are aimed at retrieving the relevant possible answers in addition to the certain answers. QPIAD is able to gauge the relevance of such queries allowing tradeoffs in reducing the costs of database query processing and answer transmission. To support this framework, we develop methods for mining attribute correlations (in terms of Approximate Functional Dependencies), value distributions (in the form of Naïve Bayes Classifiers), and selectivity estimates. We present empirical studies to demonstrate that our approach is able to effectively retrieve relevant possible answers with high precision, high recall, and manageable cost.

12.
In this paper we study queries over relational databases with integrity constraints (ICs). The main problem we analyze is OWA query answering, i.e., query answering over a database with ICs under the open-world assumption. The kinds of ICs that we consider are inclusion dependencies and functional dependencies, in particular key dependencies; the query languages we consider are conjunctive queries and unions of conjunctive queries. We present results about the decidability of OWA query answering under ICs. In particular, we study OWA query answering both over finite databases and over unrestricted databases, and identify the cases in which such a problem is finitely controllable, i.e., when OWA query answering over finite databases coincides with OWA query answering over unrestricted databases. Moreover, we are able to easily turn the above results into new results about implication of ICs and query containment under ICs, due to the deep relationship between OWA query answering and these two classical problems in database theory. In particular, we close two long-standing open problems in query containment, since we prove finite controllability of containment of conjunctive queries both under arbitrary inclusion dependencies and under key and foreign key dependencies. The results of our investigation are very relevant in many research areas which have recently dealt with databases under an incomplete information assumption: e.g., data integration, data exchange, view-based information access, ontology-based information systems, and peer data management systems.

13.
Consistent query answering is the problem of characterizing and computing the semantically correct answers to queries from a database that may not satisfy certain integrity constraints. Consistent answers are characterized as those answers that are invariant under all minimally repaired versions of the original database. We study the problem of repairing databases with respect to denial constraints by fixing integer numerical values taken by attributes. We introduce a quantitative definition of database repair, and investigate the complexity of several decision and optimization problems. Among them are the Database Repair Problem (DRP), which is deciding the existence of repairs within a given distance to the original instance, and CQA, which is deciding consistency of answers to simple and aggregate conjunctive queries under different semantics. We provide sharp complexity bounds, identifying relevant tractable and intractable cases. We also develop approximation algorithms for the latter. Among other results, we establish: (a) The -hardness of CQA. (b) That DRP is MAXSNP-hard, but has a good approximation. (c) The intractability of CQA for aggregate queries for one database atom denials (plus built-ins), and also that it has a good approximation.
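The characterization of consistent answers as those that hold in every minimal repair can be sketched on a toy case. The sketch below swaps in a simpler repair notion than the one studied in the abstract: it assumes a single key constraint and repairs by keeping exactly one tuple per key value, rather than by adjusting numerical values. The names `repairs` and `consistent_answers` are hypothetical, and the enumeration is exponential, so this is for illustration only.

```python
from itertools import product


def repairs(rows, key):
    """Enumerate repairs under a key constraint on `key`.

    Assumption of this sketch: a repair keeps exactly one of the
    conflicting tuples for each key value.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return [list(choice) for choice in product(*groups.values())]


def consistent_answers(rows, key, query):
    """Answers returned by `query` on every repair of `rows`."""
    answer_sets = [set(query(repair)) for repair in repairs(rows, key)]
    return set.intersection(*answer_sets)
```

With two conflicting salary tuples for the same employee, a query returning full `(emp, salary)` pairs yields only the tuples that are untouched by the conflict, while a query that projects onto `emp` can still return that employee if every repair agrees on it.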

14.
When data sources are virtually integrated, there is no common and centralized method to maintain global consistency, so inconsistencies with regard to global integrity constraints are very likely to occur. In this paper, we consider the problem of defining and computing consistent query answers when queries are posed to virtual XML data integration systems, which are specified following the local-as-view approach. We propose a powerful XML constraint model to define global constraints, which can express keys and functional dependencies, and which also extends the newly introduced conditional functional dependencies to XML. We provide an approach to defining XML views, which supports not only edge-path mappings but also data-value bindings to express the join operator. We give formal definitions of repair and consistent query answers in the XML data integration setting. Given a query on the global system, we present a two-step method to compute consistent query answers. First, the given query is transformed using the global constraints, such that running the transformed query on the original global system generates exactly the consistent query answers. Because the global instance is not materialized, the query on the global instance is then rewritten in the form of queries on the underlying data sources by reversing rules in view definitions. We illustrate that the XPath query transformations can be implemented in XQuery. Finally, we implement prototypes of our method and evaluate our algorithms in the experiments.

15.
In recent years, researchers have begun to study inductive databases, a new generation of databases for leveraging decision support applications. In this context, the user interacts with the DBMS using advanced, constraint-based languages for data mining where constraints have been specifically introduced to increase the relevance of the results and, at the same time, to reduce their volume. In this paper we study the problem of mining frequent itemsets using an inductive database. We propose a technique for query answering which consists in rewriting the query in terms of union and intersection of the result sets of other queries, previously executed and materialized. Unfortunately, the exploitation of past queries is not always applicable. We then present sufficient conditions for the optimization to apply and show that these conditions are strictly connected with the presence of functional dependencies between the attributes involved in the queries. We show some experiments on an initial prototype of an optimizer which demonstrate that this approach to query answering is viable and that in many practical cases it drastically reduces the query execution time.

16.
Traditional database search uses pattern matching in the comparison process. For a query with some search words, tuples are selected only if the words of the tuples exactly match the query words. In this paper, we propose a new method for evaluating relational ranking queries (or top-N queries) with text attributes. This method defines semantic distance functions and utilizes semantic matching between words in database search. The aim is to fetch not only tuples that exactly match the query but also tuples that are close to it according to semantic distances. The basic idea of the method is to create an index based on WordNet to expand the tuple words semantically. The candidate results for a query are retrieved by the index and a simple SQL selection statement, and then top-N answers are obtained. Extensive experiments are carried out to measure the performance of this new strategy for the evaluation of ranking queries over relational databases.
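A minimal sketch of ranking by semantic distance follows. It does not use WordNet; instead it assumes a small hand-written table `NEIGHBORS` giving, for a query word, the semantically close words and their distances. That table, the distances in it, and the function name `top_n` are all hypothetical stand-ins for the index described in the abstract.

```python
import heapq

# Hypothetical semantic index: query word -> {close word: distance}.
# A real system would derive this from a lexical resource.
NEIGHBORS = {
    "car": {"car": 0, "automobile": 1, "vehicle": 2},
}


def top_n(rows, attr, word, n):
    """Return the n rows whose `attr` value is semantically closest to `word`."""
    # Unknown query words fall back to exact matching only.
    distances = NEIGHBORS.get(word, {word: 0})
    # The row position breaks ties, so rows themselves are never compared.
    scored = [
        (distances[row[attr]], position, row)
        for position, row in enumerate(rows)
        if row[attr] in distances
    ]
    return [row for _, _, row in heapq.nsmallest(n, scored)]
```

Rows that match exactly come first because their distance is zero, and rows whose word is not in the expanded set are never candidates, which mirrors the candidate retrieval step before the final ranking.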

17.
A common task of Web users is querying structured information from Web pages. To realize this scenario, we propose a novel query processor for systematically discovering instances of semantic relations in Web search results and joining these relation instances into complex result tuples with conjunctive queries. Our query processor transforms a structured user query into keyword queries that are submitted to a search engine, forwards search results to a relation extractor, and then combines relations into complex result tuples. The processor automatically learns discriminative and effective keywords for different types of semantic relations. Thereby, our query processor leverages the index of a search engine to query potentially billions of pages. Unfortunately, relation extractors may fail to return a relation for a result tuple. Moreover, user-defined data sources may not return at least k complete result tuples. Therefore we propose an adaptive routing model based on information theory for retrieving missing attributes of incomplete result tuples. The model determines the most promising next incomplete tuple and attribute type for returning any-k complete result tuples at any point during the query execution process. We report a thorough experimental evaluation over multiple relation extractors. Our query processor returns complete result tuples while processing only very few Web pages.

18.
Neighbor knowledge construction is the foundation for the development of cooperative query answering systems capable of searching for close match or approximate answers when exact match answers are not available. This paper presents a technique for developing neighbor hierarchies at the attribute level. The proposed technique is called the evolved pattern-based knowledge induction (ePKI) technique and allows construction of neighbor hierarchies for nonunique attributes based upon confidences, popularities, and clustering correlations of inferential relationships among attribute values. The technique is applicable for both categorical and numerical (discrete and continuous) attribute values. Attribute value neighbor hierarchies generated by the ePKI technique allow a cooperative query answering system to search for approximate answers by relaxing each individual query condition separately. Consequently, users can search for approximate answers even when the exact match answers do not exist in the database (i.e., searching for existing similar parts as part of the implementation of the concepts of rapid prototyping). Several experiments were conducted to assess the performance of the ePKI in constructing attribute-level neighbor hierarchies. Results indicate that the ePKI technique produces accurate neighbor hierarchies when strong inferential relationships appear among data.

19.
Data exchange is the problem of transforming data that is structured under a source schema into data structured under another schema, called the target schema, so that both the source and target data satisfy the relationship between the schemas. Many applications such as planning, scheduling, medical and fraud detection systems, require data exchange in the context of temporal data. Even though the formal framework of data exchange for relational database systems is well-established, it does not immediately carry over to the settings of temporal data, which necessitates reasoning over unbounded periods of time. In this work, we study data exchange for temporal data. We first motivate the need for two views of temporal data: the concrete view, which depicts how temporal data is compactly represented and on which the implementations are based, and the abstract view, which defines the semantics of temporal data as a sequence of snapshots. We first extend the chase procedure for the abstract view to have a conceptual basis for the data exchange for temporal databases. Considering non-temporal source-to-target tuple generating dependencies and equality generating dependencies, the chase algorithm can be applied on each snapshot independently. Then we define a chase procedure (called c-chase) on concrete instances and show the result of c-chase on a concrete instance is semantically aligned with the result of chase on the corresponding abstract instance. In order to interpret intervals as constants while checking if a dependency or a query is satisfied by a concrete database, we will normalize the instance with respect to the dependency or the query. To obtain the semantic alignment, the nulls (which are introduced by data exchange and model incompleteness) in the concrete view are annotated with temporal information. Furthermore, we show that the result of the concrete chase provides a foundation for query answering.
We define naïve evaluation on the result of the c-chase and show it produces certain answers.

20.
Reachability queries play a vital role in many graph analysis tasks. Previous research has proposed many methods to efficiently answer reachability queries between vertex pairs. Since many real graphs are labeled graphs, there is strong demand for Label-Constrained Reachability (LCR) queries, in which the constraint includes a set of labels in addition to the vertex pair. Recent research has proposed several methods for answering LCR queries that require some of the labels specified in the constraint to appear on the path. Besides a label set, the query constraint may be an ordered sequence of labels, giving OLCR (Ordered-Label-Constrained Reachability) queries, which retrieve paths matching a sequence of labels. Currently, no solutions are available for OLCR. Here, we propose DHL, a novel Bloom-filter-based indexing technique for answering OLCR queries. DHL can be used to check reachability between vertex pairs. If the answer is not no, a constrained DFS is then performed. So, we employ DHL followed by a constrained DFS to answer OLCR queries. We show that DHL has a bounded false positive rate and that it saves substantial indexing time and space. Extensive experiments on 10 real-life graphs and 12 synthetic graphs demonstrate that DHL achieves about 4.8–22.5 times smaller index space and 4.6–114 times less index construction time than two state-of-the-art techniques for LCR queries, while achieving comparable query response time. The results also show that our algorithm can answer OLCR queries effectively.
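The filter-then-verify pattern described here can be sketched as follows. This is not DHL itself, whose structure the abstract does not specify. The sketch assumes a much simpler index: one small bit mask per vertex that summarizes, through a single hash function, the labels of all edges reachable from that vertex. A missing bit proves that no matching path exists; otherwise a depth-first search constrained to the label sequence decides the query. All names (`label_bit`, `build_index`, `olcr`) and the choice of 64 bits are hypothetical.

```python
import zlib
from collections import defaultdict

BITS = 64  # filter width; an arbitrary choice for this sketch


def label_bit(label):
    """Map a label to one bit position (a single-hash Bloom-style filter)."""
    return 1 << (zlib.crc32(label.encode()) % BITS)


def build_index(edges):
    """For each vertex, OR together the bits of all labels reachable from it."""
    mask = defaultdict(int)
    changed = True
    while changed:  # iterate to a fixed point so cycles are handled
        changed = False
        for u, label, v in edges:
            new = mask[u] | label_bit(label) | mask[v]
            if new != mask[u]:
                mask[u] = new
                changed = True
    return mask


def olcr(edges, mask, source, target, labels):
    """Is there a path source -> target whose edge labels spell `labels`?"""
    need = 0
    for label in labels:
        need |= label_bit(label)
    if mask[source] & need != need:
        return False  # definite no: some required label is unreachable
    adjacency = defaultdict(list)
    for u, label, v in edges:
        adjacency[u].append((label, v))
    seen = set()
    stack = [(source, 0)]  # (vertex, number of labels matched so far)
    while stack:
        vertex, matched = stack.pop()
        if matched == len(labels):
            if vertex == target:
                return True
            continue
        for label, nxt in adjacency[vertex]:
            state = (nxt, matched + 1)
            if label == labels[matched] and state not in seen:
                seen.add(state)
                stack.append(state)
    return False
```

Because the filter can only produce false positives, a passing check never replaces the search; it merely lets many negative queries return without traversing the graph.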
