首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
王兵  ;刘彩虹 《微机发展》2008,(7):176-180
随着Internet信息的迅速增长,许多Web信息已经被各种各样的可搜索在线数据库所深化,并被隐藏在Web查询接口下面。传统的搜索引擎由于技术原因不能索引这些信息——DeepWeb信息。由于DeepWeb惟一“入口点”是查询接口,为使查询接口自动产生有意义有查询,给出了DeepWeb信息集成系统框架,提出了基于数据类型的搜索驱动的用户查询转换方法,基于此设计并实现了一个针对中文DeepWeb信息集成原型系统。通过在实际DeepWeb站点上的实验证明了此方法是非常有效的。  相似文献   

2.
Deep Web查询接口是Web数据库的接口,其对于Deep Web数据库集成至关重要。本文根据网页表单的结构特征定义查询接口;针对非提交查询法,给出界定Deep Web查询接口的一些规则;提出提交查询法,根据链接属性的特点进行判断,找到包含查询接口的页面;采用决策树C4.5算法进行分类,并用Java语言实现Deep Web查询接口系统。  相似文献   

3.
随着越来越多的信息隐藏在Deep Web中,针对用户查询找出最相关的Web数据库成为亟待解决的问题。提出了一种基于Web数据库主题分布的方法用于Deep Web数据集成中的Web数据库选择。获取主题覆盖度形式的Web数据库内容描述,而后利用选定的Web数据库获取查询主题,最终由查询主题和主题分布矩阵来选择Web数据库。在真实Web数据库上的实验结果表明,该方法既取得了较高的查询召回率,也可有效降低数据库内容描述建立的代价。  相似文献   

4.
A two-phase framework for quality-aware Web service selection   总被引:1,自引:0,他引:1  
Service-oriented computing is gaining momentum as the next technological tool to leverage the huge investments in Web application development. The expected large number of Web services poses a set of new challenges for efficiently accessing these services. We propose an integrated service query framework that facilitates users in accessing their desired services. The framework incorporates a service query model and a two-phase optimization strategy. The query model defines service communities that are used to organize the large and heterogeneous service space. The service communities allow users to use declarative queries to retrieve their desired services without worrying about the underlying technical details. The two-phase optimization strategy automatically generates feasible service execution plans and selects the plan with the best user-desired quality. In particular, we present an evolutionary algorithm that is able to “co-evolve” multiple feasible execution plans simultaneously and allows them to compete with each other to generate the best plan. We conduct a set of experiments to assess the performance of the proposed algorithms.  相似文献   

5.
We introduce a new abstract model of database query processing, finite cursor machines, that incorporates certain data streaming aspects. The model describes quite faithfully what happens in so-called “one-pass” and “two-pass query processing”. Technically, the model is described in the framework of abstract state machines. Our main results are upper and lower bounds for processing relational algebra queries in this model, specifically, queries of the semijoin fragment of the relational algebra.  相似文献   

6.
将返回结果受限的DeepWeb数据源中预测查询结果大小并且抽取的问题转化为概念覆盖问题。首先证明由属性及属性组合产生的集合划分之间为容差关系,进而又证明其构成一个完全格,并且与概念格同态。使用概念间的偏序关系来刻画属性间的相关性,使用概念内涵为查询属性,概念外延为返回结果的预测,基于外延的势剪枝后的概念格为搜索空间,最终提出一种基于格空间的DeepWeb数据抽取算法。实验由可控实验和实际应用实验组成,结果证明该算法理论正确性和现实应用的可行性及有效性。  相似文献   

7.
With the extensive use of XML in applications over the Web, efficient query processing over streaming XML has become a core challenge due to one-pass processing and limited resources. Taking advantage of Hole-Filler model for XML fragments, this paper proposes a hybrid structure (FQ-Index) for both the queries and fragments, and proposes an XML fragment processing algorithm to evaluate forward XPath queries over streamed XML fragments. Two optimization rules, dependence pruning and prefix pruning are also developed. Dependence pruning scheme prunes off the dependent operations caused by fragmentation and transforms the queries for XML tag into queries for XML fragments, while prefix pruning scheme prunes off the “redundant” prefix along the path according to the tag structure. The effectiveness of the techniques developed is illustrated with a detailed set of experiments.  相似文献   

8.
Traditional information systems return answers after a user submits a complete query. Users often feel “left in the dark” when they have limited knowledge about the underlying data and have to use a try-and-see approach for finding information. A recent trend of supporting autocomplete in these systems is a first step toward solving this problem. In this paper, we study a new information-access paradigm, called “type-ahead search” in which the system searches the underlying data “on the fly” as the user types in query keywords. It extends autocomplete interfaces by allowing keywords to appear at different places in the underlying data. This framework allows users to explore data as they type, even in the presence of minor errors. We study research challenges in this framework for large amounts of data. Since each keystroke of the user could invoke a query on the backend, we need efficient algorithms to process each query within milliseconds. We develop various incremental-search algorithms for both single-keyword queries and multi-keyword queries, using previously computed and cached results in order to achieve a high interactive speed. We develop novel techniques to support fuzzy search by allowing mismatches between query keywords and answers. We have deployed several real prototypes using these techniques. One of them has been deployed to support type-ahead search on the UC Irvine people directory, which has been used regularly and well received by users due to its friendly interface and high efficiency.  相似文献   

9.
Web数据集成系统基于QC模型的物化视图选择   总被引:2,自引:0,他引:2  
在Web数据集成系统中,物化视图能够有效地减少网络传输代价,提高系统的查询效率.如何选择查询进行物化,使得选中的查询满足集成层的空间限制,同时获取最大物化收益,成为集成系统中一个迫切需要解决的问题.传统方法没有考虑到海量XML查询之间的包含关系,其选择的物化视图中可能包含冗余的信息.针对上述问题,提出了①Web数据集成系统中海量查询集合的QC(query containment)模型,该模型能够捕捉查询之间最常见的包含关系;②基于QC模型的物化视图选择算法,算法考虑了物化视图选择相关的主要因素,包括查询提交的频率、空间代价、查询重写能力和查询结果的完备性,提出了查询位图的物化视图组织方式,从而获取更加合理的物化视图选择方案.实验结果证明了该方法的有效性.  相似文献   

10.
In multimedia retrieval, a query is typically interactively refined towards the “optimal” answers by exploiting user feedback. However, in existing work, in each iteration, the refined query is re-evaluated. This is not only inefficient but fails to exploit the answers that may be common between iterations. Furthermore, it may also take too many iterations to get the “optimal” answers. In this paper, we introduce a new approach called OptRFS (optimizing relevance feedback search by query prediction) for iterative relevance feedback search. OptRFS aims to take users to view the “optimal” results as fast as possible. It optimizes relevance feedback search by both shortening the searching time during each iteration and reducing the number of iterations. OptRFS predicts the potential candidates for the next iteration and maintains this small set for efficient sequential scan. By doing so, repeated candidate accesses (i.e., random accesses) can be saved, hence reducing the searching time for the next iteration. In addition, efficient scan on the overlap before the next search starts also tightens the search space with smaller pruning radius. As a step forward, OptRFS also predicts the “optimal” query, which corresponds to “optimal” answers, based on the early executed iterations’ queries. By doing so, some intermediate iterations can be saved, hence reducing the total number of iterations. By taking the correlations among the early executed iterations into consideration, OptRFS investigates linear regression, exponential smoothing and linear exponential smoothing to predict the next refined query so as to decide the overlap of candidates between two consecutive iterations. Considering the special features of relevance feedback, OptRFS further introduces adaptive linear exponential smoothing to self-adjust the parameters for more accurate prediction. We implemented OptRFS and our experimental study on real life data sets show that it can reduce the total cost of relevance feedback search significantly. Some interesting features of relevance feedback search are also discovered and discussed.  相似文献   

11.
In this paper we propose a query expansion and user profile enrichment approach to improve the performance of recommender systems operating on a folksonomy, storing and classifying the tags used to label a set of available resources. Our approach builds and maintains a profile for each user. When he submits a query (consisting of a set of tags) on this folksonomy to retrieve a set of resources of his interest, it automatically finds further “authoritative” tags to enrich his query and proposes them to him. All “authoritative” tags considered interesting by the user are exploited to refine his query and, along with those tags directly specified by him, are stored in his profile in such a way to enrich it. The expansion of user queries and the enrichment of user profiles allow any content-based recommender system operating on the folksonomy to retrieve and suggest a high number of resources matching with user needs and desires. Moreover, enriched user profiles can guide any collaborative filtering recommender system to proactively discover and suggest to a user many resources relevant to him, even if he has not explicitly searched for them.  相似文献   

12.
We describe a framework for supporting arbitrarily complex SQL queries with “uncertain” predicates. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is query evaluation. We describe an optimization algorithm that can compute efficiently most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.  相似文献   

13.
Skyline queries are used with data extensive applications, such as mobile location-based services, to support multi-criteria decision-making and to prune the data space by returning the most “interesting” data points. Most interesting data points are the points, which are not dominated by any other point. Spatial network skyline query is a subset of the skyline query problem where data points are nodes in a road network and the attributes of the data points are network distance relative to a set of query points. Spatial network skyline query’s problem is the need to calculate the attributes with an expensive distance calculation operation. Previous works (Deng et al. Proceedings of the 23th international conference on data engineering, 796–805, 2007), Sharifzadeh et al. Proceedings of the 32nd international conference on very large databases, 751–762, 2009) that addressed this problem involved extensive network distance calculation between the query points and data points. A new algorithm that requires a remarkably less number of network distance calculations is proposed in this work. Our approach uses a progressive nearest neighbor algorithm to minimize the set of candidates then evaluates those candidates by only comparing them to a subset of discovered skyline points. Experiments showed the effectiveness of our algorithm compared to previous works.  相似文献   

14.
Being decades of study, the usability of database systems have received more attention in recent years. Now it is especially able to explain missing objects in a query result, which is called “why-not” questions, and is the focus of concern. This paper studies the problem of answering whynot questions on KNN queries. In our real life, many users would like to use KNN queries to investigate the surrounding circumstances. Nevertheless, they often feel disappointed when finding the result not including their expected objects. In this paper, we use the query refinement approach to resolve the problem. Given the original KNN query and a set of missing objects as input, our algorithm offer a refined KNN query that includes the missing objects to the user. The experimental results demonstrate the efficiency of our proposed optimizations and algorithms.  相似文献   

15.
在Deep Web页面的背后隐藏着海量的可以通过结构化的查询接口进行访问的数据源。将这些数据源按所属领域进行组织划分,是DeepWeb数据集成中的一个关键步骤。已有的划分方法主要是基于查询接口模式和提交查询返回结果,存在查询接口特征难以完全抽取和提交数据库查询效率不高等问题。提出了一种结合网页文本信息,基于频繁项集的聚类方法,根据数据源查询接口所在页面的标题、关键词和提示文本,将数据源按照领域进行聚类,有效解决了传统方法中依赖查询接口特征以及文本模型的高维性问题。实验结果表明该方法是可行的,具有较高的效率。  相似文献   

16.
Reinforcement learning (RL) attracts much attention as a technique for realizing computational intelligence such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL to practical use. The difficulty includes the problem of designing suitable state and action spaces for an agent. Previously, we proposed an adaptive state space construction method which is called a “state space filter,” and an adaptive action space construction method which is called “switching RL,” after the other space has been fixed. In this article, we reconstitute these two construction methods as one method by treating the former and the latter as a combined method for mimicking an infant’s perceptual development. In this method, perceptual differentiation progresses as an infant become older and more experienced, and the infant’s motor development, in which gross motor skills develop before fine motor skills, also progresses. The proposed method is based on introducing and referring to “entropy.” In addition, a computational experiment was conducted using a so-called “path planning problem” with continuous state and action spaces. As a result, the validity of the proposed method has been confirmed.  相似文献   

17.
Given a query workload, a database and a set of constraints, the view-selection problem is to select views to materialize so that the constraints are satisfied and the views can be used to compute the queries in the workload efficiently. A typical constraint, which we consider in the present work, is to require that the views can be stored in a given amount of disk space. Depending on features of SQL queries (e.g., the DISTINCT keyword) and on whether the database relations on which the queries are applied are sets or bags, the queries may be computed under set semantics, bag-set semantics, or bag semantics. In this paper we study the complexity of the view-selection problem for conjunctive queries and views under these semantics. We show that bag semantics is the “easiest to handle” (we show that in this case the decision version of view selection is in NP), whereas under set and bag-set semantics we assume further restrictions on the query workload (we only allow queries without self-joins in the workload) to achieve the same complexity. Moreover, while under bag and bag-set semantics filtering views (i.e., subgoals that can be dropped from the rewriting without impacting equivalence to the query) are practically not needed, under set semantics filtering views can reduce significantly the query-evaluation costs. We show that under set semantics the decision version of the view-selection problem remains in NP only if filtering views are not allowed in the rewritings. Finally, we investigate whether the cgalg algorithm for view selection introduced in Chirkova and Genesereth (Linearly bounded reformulations of conjunctive databases, pp. 987–1001, 2000) is suitable in our setting. We prove that this algorithm is sound for all cases we examine here, and that it is complete under bag semantics for workloads of arbitrary conjunctive queries and under bag-set semantics for workloads of conjunctive queries without self-joins. Rada Chirkova’s work on this material has been supported by the National Science Foundation under Grant No. 0307072. The project is co-funded by the European Social Fund (75%) and National Resources (25%)- Operational Program for Educational and Vocational Training II (EPEAEK II) and particularly the program PYTHAGORAS. A preliminary version of this paper appears in F. Afrati, R. Chirkova, M. Gergatsoulis, V. Pavlaki. Designing Views to Efficiently Answer Real SQL Queries. In Proc. of SARA 2005, LNAI Vol. 3607, pages 332-346, Springer-Verlag, 2005.  相似文献   

18.
Approximate query mapping: Accounting for translation closeness   总被引:2,自引:0,他引:2  
In this paper we present a mechanism for approximately translating Boolean query constraints across heterogeneous information sources. Achieving the best translation is challenging because sources support different constraints for formulating queries, and often these constraints cannot be precisely translated. For instance, a query [score>8] might be “perfectly” translated as [rating>0.8] at some site, but can only be approximated as [grade=A] at another. Unlike other work, our general framework adopts a customizable “closeness” metric for the translation that combines both precision and recall. Our results show that for query translation we need to handle interdependencies among both query conjuncts as well as disjuncts. As the basis, we identify the essential requirements of a rule system for users to encode the mappings for atomic semantic units. Our algorithm then translates complex queries by rewriting them in terms of the semantic units. We show that, under practical assumptions, our algorithm generates the best approximate translations with respect to the closeness metric of choice. We also present a case study to show how our technique may be applied in practice. Received: 15 October 2000 / Accepted: 15 April 2001 Published online: 28 June 2001  相似文献   

19.
In this paper we introduce a general framework for casting fully dynamic transitive closure into the problem of reevaluating polynomials over matrices. With this technique, we improve the best known bounds for fully dynamic transitive closure. In particular, we devise a deterministic algorithm for general directed graphs that achieves O(n 2) amortized time for updates, while preserving unit worst-case cost for queries. In case of deletions only, our algorithm performs updates faster in O(n) amortized time. We observe that fully dynamic transitive closure algorithms with O(1) query time maintain explicitly the transitive closure of the input graph, in order to answer each query with exactly one lookup (on its adjacency matrix). Since an update may change as many as Ω(n 2) entries of this matrix, no better bounds are possible for this class of algorithms. This work has been partially supported by the Sixth Framework Programme of the EU under contract number 507613 (Network of Excellence “EuroNGI: Designing and Engineering of the Next Generation Internet”), and number 001907 (“DELIS: Dynamically Evolving, Large Scale Information Systems”), and by the Italian Ministry of University and Research (Project “ALGO-NEXT: Algorithms for the Next Generation Internet and Web: Methodologies, Design and Experiments”). Portions of this paper have been presented at the 41st Annual Symp. on Foundations of Computer Science, 2000.  相似文献   

20.
The RDF-3X engine for scalable management of RDF data   总被引:1,自引:0,他引:1  
RDF is a data model for schema-free structured information that is gaining momentum in the context of Semantic-Web data, life sciences, and also Web 2.0 platforms. The “pay-as-you-go” nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with streamlined indexing and query processing. The physical design is identical for all RDF-3X databases regardless of their workloads, and completely eliminates the need for index tuning by exhaustive indexes for all permutations of subject-property-object triples and their binary and unary projections. These indexes are highly compressed, and the query processor can aggressively leverage fast merge joins with excellent performance of processor caches. The query optimizer is able to choose optimal join orders even for complex queries, with a cost model that includes statistical synopses for entire join paths. Although RDF-3X is optimized for queries, it also provides good support for efficient online updates by means of a staging architecture: direct updates to the main database indexes are deferred, and instead applied to compact differential indexes which are later merged into the main indexes in a batched manner. Experimental studies with several large-scale datasets with more than 50 million RDF triples and benchmark queries that include pattern matching, manyway star-joins, and long path-joins demonstrate that RDF-3X can outperform the previously best alternatives by one or two orders of magnitude.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号