Similar literature
 Found 19 similar documents; search took 140 ms
1.
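Top-k queries return the k objects a user prefers most from massive data. Much research has addressed the performance of top-k queries, and in recent years explaining top-k query results has drawn wide attention. Because users cannot specify their preferences precisely, they may question a top-k result: "If even object p appears in the top-k result, why does my expected object m not appear?" To answer such questions, a top-k query refinement algorithm based on user feedback is proposed. The algorithm first defines an evaluation model function that measures how much the initial top-k query changes; based on this function, it obtains a set of candidate weights by sampling and, for each candidate weight, uses a progressive top-k algorithm to derive a new optimized query. Finally, the efficiency of the proposed algorithm is verified on synthetic data.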
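
The sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the linear scoring function, the penalty `alpha * delta_k + (1 - alpha) * delta_w`, and the names `rank_of` and `refine_query` are all assumptions made for illustration, and a plain rank computation stands in for the progressive top-k step.

```python
import random

def rank_of(target, objects, weights):
    """1-based rank of `target` under a linear score (higher is better)."""
    def score(name):
        return sum(w * x for w, x in zip(weights, objects[name]))
    target_score = score(target)
    return 1 + sum(score(o) > target_score for o in objects if o != target)

def refine_query(objects, w0, k0, missing, samples=200, alpha=0.5, seed=0):
    """Return (penalty, weights, k) so that `missing` is in the top-k.

    Hypothetical penalty: a weighted sum of the normalized growth in k
    and the Euclidean distance between the new and original weights.
    """
    rng = random.Random(seed)
    # Candidate 0 keeps the original weights and only enlarges k.
    candidates = [list(w0)]
    for _ in range(samples):
        raw = [rng.random() + 1e-12 for _ in w0]
        total = sum(raw)
        candidates.append([r / total for r in raw])  # weights sum to 1
    best = None
    for w in candidates:
        # Smallest k (not below k0) that admits the missing object.
        k = max(k0, rank_of(missing, objects, w))
        delta_k = (k - k0) / max(1, len(objects) - k0)
        delta_w = sum((a - b) ** 2 for a, b in zip(w, w0)) ** 0.5
        penalty = alpha * delta_k + (1 - alpha) * delta_w
        if best is None or penalty < best[0]:
            best = (penalty, w, k)
    return best
```

Because the original weights are always one of the candidates, the returned query is never worse, under this assumed penalty, than simply enlarging k until m appears.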

2.
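In many applications, the Skyline query is a very important query type: from a potentially huge data space it returns the tuples of interest to the user that are not dominated by any other tuple. However, a Skyline query cannot control the number of results returned. This paper addresses a new top-k Skyline query problem: the query returns the k Skyline tuples with the highest dominance scores, thereby controlling the number of results returned to the user. Analysis shows that most existing algorithms fail to use the dominance score as a measure for limiting the number of Skyline results. A new table-scan-based algorithm, RSTS (ranked Skyline with table scan), is proposed to compute top-k Skyline results efficiently on massive data. RSTS first presorts the table so that the tuples of the presorted table are arranged in the order of a round-robin scan over the sorted lists. RSTS has two phases. Phase 1 obtains candidate tuples by a sequential scan of the presorted table. Phase 2 computes the dominance scores of the candidate tuples and returns the result. RSTS is proved to have the early-termination property, and a theoretical analysis of its scan depth is given. A pruning operation on candidate tuples is proposed; the theoretical pruning effect shows that the vast majority of Skyline results can be discarded directly. Experimental results show that RSTS computes top-k Skyline results efficiently.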
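
To make the query semantics concrete, here is a brute-force sketch of a top-k Skyline query ranked by dominance score. It is a quadratic-time baseline, not RSTS: it has no presorting, early termination, or pruning. It assumes smaller values are better on every dimension; the function names are hypothetical.

```python
def dominates(a, b):
    """a dominates b: no worse on every dimension, strictly better on one.

    Assumption for this sketch: smaller values are better.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def top_k_skyline(points, k):
    """Return the k skyline points with the highest dominance scores.

    The dominance score of a point is the number of points it dominates.
    Results are (score, point) pairs, highest score first.
    """
    # Skyline: points not dominated by any other point.
    skyline = [p for p in points if not any(dominates(q, p) for q in points)]
    scored = [(sum(dominates(p, q) for q in points), p) for p in skyline]
    # Sort by score descending; break ties by the point itself.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]
```

For the points (1,5), (2,2), (5,1), (3,3), (4,4), (6,6), the skyline is {(1,5), (2,2), (5,1)}, and (2,2) ranks first because it dominates three points.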

3.
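In distributed data stream scenarios, dynamically maintaining the top-k set while reducing communication overhead as much as possible is very important. The usual approach is to transfer large amounts of data from the distributed nodes to a central node and compute the top-k set there. The communication overhead of this approach is very high, and in many settings it is simply infeasible. An efficient approximate algorithm, top-k', is proposed for dynamically maintaining the top-k set in a distributed environment. For a top-k query, the algorithm dynamically maintains the k' (K_max ≥ k' ≥ k) highest-scoring tuples, from which the k highest-scoring tuples can be selected and returned. Experiments show that top-k' significantly reduces the communication cost between the nodes and the central coordinator node.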

4.
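A probabilistic threshold top-k query over an uncertain database computes, for each tuple, the total probability that it ranks within the top k, and returns the tuples whose total probability is no less than p. Existing query semantics, however, do not treat the tuples within an x-tuple as a whole. To address this, a new query semantics, the probabilistic threshold x-top-k query, is defined and a query processing algorithm is given. Under this semantics, dynamic programming is used to compute the total probability that each tuple in an x-tuple ranks in the top k; these probabilities are aggregated and a probabilistic threshold top-k query is then performed, optimized with pruning methods such as the observation method and the maximum upper bound. Experimental results show that the algorithm returns the correct result set after scanning on average 60% of the whole dataset, which demonstrates high query processing efficiency.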
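
The dynamic-programming step can be sketched for the simplest case. The code below assumes that all tuples exist independently and have distinct scores; it does not model the mutual exclusion inside an x-tuple, the aggregation step, or the pruning rules from the abstract. The names `topk_probabilities` and `threshold_topk` are hypothetical.

```python
def topk_probabilities(tuples, k):
    """Probability that each tuple ranks in the top k (k >= 1).

    `tuples` is a list of (id, score, existence_probability).
    Assumption: tuples are independent and scores are distinct.
    """
    ordered = sorted(tuples, key=lambda t: -t[1])
    # dp[j] = probability that exactly j higher-scored tuples exist (j < k).
    dp = [1.0] + [0.0] * (k - 1)
    result = {}
    for tuple_id, _score, p in ordered:
        # The tuple is in the top k if it exists and fewer than k
        # higher-scored tuples exist.
        result[tuple_id] = p * sum(dp)
        # Fold this tuple into the distribution for the next one.
        dp = [dp[j] * (1 - p) + (dp[j - 1] * p if j > 0 else 0.0)
              for j in range(k)]
    return result

def threshold_topk(tuples, k, p_min):
    """Ids whose top-k probability is at least p_min, in sorted order."""
    probs = topk_probabilities(tuples, k)
    return sorted(t for t, pr in probs.items() if pr >= p_min)
```

With tuples a (score 10, probability 0.5), b (8, 1.0), and c (6, 0.8) and k = 2, tuple c reaches the top 2 only when a is absent, so its probability is 0.8 × 0.5 = 0.4 and a threshold of 0.5 returns a and b.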

5.
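To address the many-answers problem of Web database queries, this paper proposes a top-k ranking method for query results based on contextual preferences. It first proposes a contextual preference model with a preference degree, i_1i_2,d|X, which states that under context condition X, the user prefers item i_1 over item i_2 with degree d (0.5≤d≤1); such contextual preferences with degrees are obtained by association rule mining over the query history. Based on contextual preferences, a top-k ranking method for query results is proposed, together with the corresponding algorithms for creating tuple orderings, clustering, and top-k ranking. Experimental results show that the proposed contextual preference model has strong expressive power for preferences, and that the top-k ranking method satisfies user needs and preferences well and executes efficiently.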

6.
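The k representative skyline query is a class of query derived from the traditional skyline query. Given a multidimensional dataset D, a skyline query finds all objects in D that are not dominated by any other object and returns them to the user, so that users can choose high-quality objects according to their own preferences. However, the number of skyline objects is usually large, and users must choose among a large amount of data, so neither the speed nor the quality of selection can be guaranteed. Compared with the traditional skyline query, the k representative skyline query selects the k most "representative" objects among all skyline objects and returns them to the user, which effectively solves this problem. Given a sliding window W and a continuous query q, q monitors the data in the window. When the window slides, q returns the k objects in the window with the largest combined dominance area. The core idea of existing algorithms is to monitor the set of skyline objects in the current window in real time and to update the k representative skyline whenever that set is updated. However, monitoring the skyline set of the window in real time is usually computationally expensive. Moreover, when the skyline set is large, selecting the k representative skyline from it is equally expensive, so existing algorithms cannot be used in high-speed stream environments. To address these problems, the ρ-approximate k representative skyline query is proposed. To support this query, the query processing framework PAKRS (predict-based approximate k representative skyline) is proposed. First, PAKRS uses the characteristics of the high-speed stream to partition the current window; according to the partitioning ...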

7.
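Cao Lixin (曹立新), Gao Hong (高宏). Chinese Journal of Computers (《计算机学报》), 2011, 34(10): 1926-1935
A top-k join query returns the k join results that interest the user most. Top-k join has recently become an important research topic, with applications in Web databases, information extraction, and data mining. Top-k join queries also arise in practice in star-schema data warehouses; for example, a decision maker may want only the k star-join results of greatest interest. However, existing top-k join algorithms are not suited to star schemas....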

8.
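Grid indexes are simple to build and are often used in data stream systems to compute top-k and skyline queries. However, a grid index is coarse-grained, and query processing may visit a large number of non-top-k nodes. To improve the precision of top-k queries computed with a grid index, this paper proposes a grid indexing method based on the properties of the reverse dominance point set of a data point, which shrinks the set visited by a query to the "k-maximum computation area (k-MCA)" of the grid index, effectively reducing both the storage of the grid index and the computational cost of queries. A k-MCA index structure and a k-MCA maintenance and update algorithm suited to data stream computation are also given. Both theoretical analysis and experimental results confirm the effectiveness of the method.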

9.
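Yu Minmin (俞闽敏), Chen Ningjiang (陈宁江). Computer Science (《计算机科学》), 2012, 39(6): 151-154, 174
Existing top-k query semantics for uncertain data return only the single answer with the highest aggregated probability across the possible worlds, which does not serve users' differing query needs well. To address this, an indicator reflecting query needs, the "requirement extension degree", is introduced; based on it, the requirement-extension-based query semantics for uncertain data, RU-Topk, is defined, and a query algorithm under the new semantics is proposed. Experiments show that the RU-Topk algorithm has a small average per-query running time and achieves higher query efficiency while meeting user needs.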

10.
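Keyword search over relational databases lets users query a relational database conveniently without knowing a structured query language or the database schema. Given a keyword query, existing methods follow the primary key/foreign key links in the database to obtain sets of tuples that contain the keywords. In many practical applications, however, aggregated results over tuple sets are more valuable to users. This work studies top-k aggregate keyword queries over relational databases and proposes a recursion-based algorithm for enumerating aggregate cells, recursion-based full search (RFS). To obtain better query performance, a new ranking method, a two-dimensional index, and a fast search algorithm, output-based quick search (OQS), are designed, so that the top-k aggregate cells can be enumerated efficiently. Extensive experiments on different datasets show that the OQS algorithm delivers good query performance.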

11.
Preference query processing is important for a wide range of applications involving distributed databases, such as network monitoring, web-based systems, and market analysis. In such applications, data objects are generated frequently and massively, which presents an important and challenging problem of continuous query processing over distributed data stream environments. A top-k dominating query, which has been receiving much research attention recently, returns the k data objects that dominate the highest number of data objects in a given dataset, and due to its dominance-based ranking function, we can easily obtain superior data objects. An emerging requirement in distributed stream environments is an efficient technique for continuously monitoring top-k dominating data objects. Despite this, no study has addressed this problem. In this paper, therefore, we address the problem of continuous top-k dominating query processing over distributed data stream environments. We present two algorithms that monitor the exact top-k dominating data and efficiently eliminate data objects that cannot qualify for the result, which reduces both communication and computation costs. In addition to these algorithms, we present an approximate algorithm that further reduces both communication and computation costs. Extensive experiments on both synthetic and real data have demonstrated the efficiency and scalability of our algorithms.

12.
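Most current P2P systems provide only file sharing and lack data management capabilities. Building on keyword search over relational databases, this paper proposes a new framework for sharing databases in a P2P environment, in which the database at each node is regarded as a document collection; users need not consider the schema structure information of the databases, which simplifies the mapping between the database schemas of different nodes and better suits the decentralized, dynamic nature of P2P. The histogram-based hierarchical Top-k query algorithm is extended to database management systems in a P2P environment, so that queries over document collections and over databases are unified and handled uniformly. During query processing the histograms are updated automatically, and neighbor nodes adjust themselves according to query results, making the system adaptive. Experimental results show that keyword-based database sharing breaks with the traditional database sharing model and simplifies data access, while the histogram-based Top-k query algorithm improves query efficiency.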

13.
The answer to a top-k query is an ordered set of tuples, where the ordering is based on how closely each tuple matches the query. In the context of middleware systems, new algorithms to answer top-k queries have been recently proposed. Among these, the threshold algorithm (TA) is the most well-known instance due to its simplicity and memory requirements. TA is based on an early-termination condition and can evaluate top-k queries without examining all the tuples. This top-k query model is prevalent not only over middleware systems, but also over plain relational data. In this work, we analyze the challenges that must be addressed to adapt TA to a relational database system. We show that, depending on the available indices, many alternative TA strategies can be used to answer a given query. Choosing the best alternative requires a cost model that can be seamlessly integrated with that of current optimizers. In this work, we address these challenges and conduct an extensive experimental evaluation of the resulting techniques by characterizing which scenarios can take advantage of TA-like algorithms to answer top-k queries in relational database systems.
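
For readers unfamiliar with TA, the following is a minimal in-memory sketch of the classic threshold algorithm with its early-termination condition. It is not the relational adaptation studied in this paper. It assumes a non-empty input, a sum as the monotone aggregation function, and one descending sorted list per attribute; the function name and the returned scan depth are illustrative choices.

```python
import heapq

def threshold_algorithm(objects, k):
    """Classic TA over in-memory sorted lists (a sketch, k >= 1).

    `objects` maps id -> tuple of attribute scores; the overall score
    is assumed to be the sum of the attributes.
    Returns (top-k as (score, id) pairs, depth scanned in each list).
    """
    m = len(next(iter(objects.values())))
    # One list per attribute, sorted by that attribute, best first.
    lists = [sorted(objects, key=lambda o: -objects[o][i]) for i in range(m)]
    seen = set()
    top = []  # min-heap holding the k best (score, id) pairs so far
    for depth in range(len(objects)):
        last = []
        for i in range(m):
            oid = lists[i][depth]            # sorted access
            last.append(objects[oid][i])
            if oid not in seen:
                seen.add(oid)
                total = sum(objects[oid])    # random access to all attributes
                heapq.heappush(top, (total, oid))
                if len(top) > k:
                    heapq.heappop(top)
        # No unseen object can score above the sum of the last values read.
        threshold = sum(last)
        if len(top) == k and top[0][0] >= threshold:
            return sorted(top, reverse=True), depth + 1
    return sorted(top, reverse=True), len(objects)
```

On five objects a=(9,8), b=(8,9), c=(1,2), d=(2,1), e=(3,3) with k = 2, the scan stops after two rows of each list: the threshold has dropped to 16 while both a and b score 17.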

14.
Due to the recent massive data generation, preference queries are becoming increasingly important for users because such queries retrieve only a small number of preferable data objects from a huge multi-dimensional dataset. A top-k dominating query, which retrieves the k data objects dominating the highest number of data objects in a given dataset, is particularly important in supporting multi-criteria decision making because this query can find interesting data objects in an intuitive way, exploiting the advantages of top-k and skyline queries. Although efficient algorithms for top-k dominating queries have been studied over centralized databases, there are no studies that deal with top-k dominating queries in distributed environments. Data management is becoming increasingly distributed, so it is necessary to support processing of top-k dominating queries in distributed environments. In this paper, we address, for the first time, the challenging problem of processing top-k dominating queries in distributed networks and propose a method for efficient top-k dominating data retrieval, which avoids redundant communication cost and latency. Furthermore, we also propose an approximate version of our proposed method, which further reduces communication cost. Extensive experiments on both synthetic and real data have demonstrated the efficiency and effectiveness of our proposed methods.

15.
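Efficient Top-k Queries in a Pure Peer-to-Peer Environment   Total citations: 21, self-citations: 2, citations by others: 19
He Yingjie (何盈捷), Wang Shan (王珊), Du Xiaoyong (杜小勇). Journal of Software (《软件学报》), 2005, 16(4): 540-552
Most current Peer-to-Peer (P2P) systems support only search by file identifier; users cannot search by file content. Top-k queries are widely used in search engines and have been hugely successful. However, because a P2P system is dynamic and decentralized, performing top-k queries in a pure P2P environment is challenging. A histogram-based hierarchical top-k query algorithm is proposed. First, distributed top-k querying is implemented hierarchically: the merging and sorting of results are spread across the nodes of the P2P network, making full use of the network's resources. Second, a histogram is built for each node from the results it returns; the histogram is used to estimate an upper bound on the scores the node may hold and to select nodes accordingly, which improves query efficiency. Experiments show that top-k queries improve query effectiveness, while the histograms improve query efficiency.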
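
The hierarchical merging idea can be sketched as a recursion over a tree of peers: each node merges its own results with its children's top-k lists and passes only k results upward. This is an illustration, not the paper's algorithm. The optional `upper_bound` dictionary stands in for the histogram estimate and is assumed here to be a true upper bound on every score in a child's subtree; all names are hypothetical.

```python
import heapq

def hierarchical_top_k(node, k, local_scores, children, upper_bound=None):
    """Top-k (score, doc) pairs in the subtree rooted at `node` (k >= 1).

    local_scores: node -> list of (score, doc) held at that node.
    children:     node -> list of child nodes.
    upper_bound:  optional node -> bound on any score in that subtree
                  (a stand-in for the histogram-based estimate).
    """
    best = heapq.nlargest(k, local_scores.get(node, []))
    for child in children.get(node, []):
        bound = float("inf") if upper_bound is None else upper_bound.get(
            child, float("inf"))
        # Skip a child that cannot beat the current k-th best score.
        if len(best) == k and bound < best[-1][0]:
            continue
        child_best = hierarchical_top_k(
            child, k, local_scores, children, upper_bound)
        # Merge at this node and keep only k results to send upward.
        best = heapq.nlargest(k, best + child_best)
    return best
```

Each node forwards at most k results to its parent, so merge work and traffic are spread over the tree instead of concentrating at the query originator.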

16.
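A Top-k Query Processing Algorithm for Uncertain Data in P2P Environments   Total citations: 1, self-citations: 0, citations by others: 1
With the steady development of P2P technology in recent years, Top-k query processing techniques for P2P environments have become increasingly mature. Meanwhile, uncertain data has received wide attention across database research, which has sparked interest in academia and industry in developing new techniques for managing uncertain data. Top-k query processing over uncertain data in P2P environments has therefore become a new challenge. This paper studies Top-k query processing techniques for uncertain data in P2P environments. It first defines Top-k queries over uncertain datasets; then, using the Chord topology as an example, it presents a Top-k query processing algorithm for uncertain data in P2P environments and, building on order-preserving hashing, proposes an upper-bound-based pruning strategy and an improved routing pruning strategy; finally, extensive experiments verify the performance of the proposed algorithms.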

17.
The Group Nearest Neighbor (GNN) search is an important approach for expert and intelligent systems, e.g., Geographic Information Systems (GIS) and Decision Support Systems (DSS). However, traditional GNN search starts from users’ perspective and selects the locations or objects that users like. Such applications fail to help the managers since they do not provide managerial insights. In this paper, we focus on solving the problem from the managers’ perspective. In particular, we propose a novel GNN query, namely, the reverse top-k group nearest neighbor (RkGNN) query, which returns k groups of data objects so that each group has the query object q as its group nearest neighbor (GNN). This query is an important tool for decision support, e.g., location-based service, product data analysis, trip planning, and disaster management, because it provides data analysts an intuitive way of finding significant groups of data objects with respect to q. Despite its importance, this kind of query has not received adequate attention from the research community, and it is a challenging task to efficiently answer RkGNN queries. To this end, we first formalize the reverse top-k group nearest neighbor query in both monochromatic and bichromatic cases, and then propose effective pruning methods, i.e., sorting and threshold pruning, MBR property pruning, and window pruning, to reduce the search space during RkGNN query processing. Furthermore, we improve the performance by employing the reuse heap technique. As an extension to the RkGNN query, we also study an interesting variant of the RkGNN query, namely a constrained reverse top-k group nearest neighbor (CRkGN) query. Extensive experiments using synthetic and real datasets demonstrate the efficiency and effectiveness of our approaches.

18.
In mobile ad hoc peer-to-peer (M-P2P) networks, since nodes are highly resource constrained, it is effective to retrieve data items using a top-k query, in which data items are ordered by the score of a particular attribute and the query-issuing node acquires the data items with the k highest scores. However, when network partitioning occurs, the query-issuing node cannot connect to some nodes holding data items included in the top-k query result, and thus the accuracy of the query result decreases. To solve this problem, data replication is a promising approach. However, if each node sends back its own data items (replicas) in response to a query without considering replicas held by others, the same data items are sent back to the query-issuing node more than once through long paths, which increases traffic. In this paper, we propose a top-k query processing method considering data replication in M-P2P networks. This method suppresses duplicate transmissions of the same data items through long paths. Moreover, an intermediate node stops transmitting a query message on demand.

19.
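Keyword search over relational databases and uncertain data processing have long been two separate research directions. This work studies the problem of retrieving uncertain data by keyword: it defines the basic model and semantics of uncertain keyword queries and proposes an algorithm for top-k keyword search over uncertain databases with attribute-level uncertainty. Given a user-specified k, the algorithm computes and returns the k highest-scoring results; its scoring function for query results takes into account both the relevance of a result to the keywords and the probability of the result under possible-worlds semantics. The algorithm is optimized, significantly reducing its computational complexity. Finally, experiments demonstrate the efficiency and practicality of the algorithm.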
