Similar literature
 20 similar documents found
1.
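Skyline computation is an important operation in multi-criteria decision making, data mining, and database visualization. While a moving object is in motion, the uncertainty of its position makes the dominance relationships among local data points unstable, which in turn affects the global probabilistic skyline set. This paper studies continuous probabilistic skyline query updates over uncertain moving objects in a distributed environment and proposes CDPS-UMO, an efficient continuous probabilistic skyline query algorithm that reduces communication overhead. The algorithm tracks changes to the local probabilistic skyline points at each local node. An effective sorting method and a feedback mechanism are proposed that greatly reduce communication overhead and computation cost. A baseline algorithm, naive, is also presented and compared experimentally with CDPS-UMO; the experimental results demonstrate the effectiveness of the algorithm.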
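
As a rough illustration of the dominance relationship and the skyline probability that the abstract above relies on, the sketch below computes the probability that one uncertain object belongs to the skyline. It is not CDPS-UMO: it assumes a centralized setting, discrete uncertain objects given as lists of (point, probability) instances, independence between objects, and "smaller is better" in every dimension. The names `dominates` and `skyline_probability` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
Instance = Tuple[Point, float]  # (coordinates, probability of this instance)


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in at least one.

    Assumption: smaller values are better.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(target: List[Instance], others: List[List[Instance]]) -> float:
    """Probability that `target` is not dominated by any other uncertain object.

    Assumption: objects are mutually independent, and the instance
    probabilities of each object sum to at most 1.
    """
    total = 0.0
    for point, prob in target:
        not_dominated = 1.0
        for obj in others:
            # Probability mass of this object's instances that dominate `point`.
            dominating_mass = sum(p for q, p in obj if dominates(q, point))
            not_dominated *= 1.0 - dominating_mass
        total += prob * not_dominated
    return total
```

When an instance moves, only the dominance tests involving that instance change, which is why tracking local changes (as the abstract describes) can avoid recomputing every probability from scratch.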

2.
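Frequent-item queries are an important technique in network monitoring, network intrusion detection, association rule mining, and related areas. The technique has been studied in depth for static uncertain data. However, as data increasingly exhibit streaming characteristics and uncertainty, querying over uncertain data streams has become a new research topic. Based on the sliding-window model commonly used for data streams, this paper proposes sTopK-UFI, an efficient probabilistic Top-K frequent-item query algorithm. Instead of recomputing the query answer on every window update, the algorithm updates it incrementally from existing results, which reduces query cost. In addition, the algorithm uses the data currently in the window to predict which elements may become frequent in the future, uses the Poisson distribution to compute upper and lower bounds on the probability that an element becomes frequent, and derives a corresponding filtering strategy that significantly reduces the amount of data to be examined and improves query efficiency. Experimental results show that the proposed algorithm effectively reduces the candidate set, shrinks the search space, and improves query performance over uncertain data streams.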

3.
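To support efficient top-k queries over continuous uncertain XML, the CProTJFast algorithm is proposed. Based on the P-document model, the algorithm extends the PEDewey (probabilistic extended Dewey) encoding to support nodes of continuous distribution type, filters nodes using a lower bound on path probability, and defines a filtering strategy for continuous probability density functions, so that nodes that cannot contribute to the result are filtered out before the probabilities of continuous nodes are computed. Experimental results show that CProTJFast with the continuous-node filtering strategy effectively improves the efficiency of top-k queries over continuous uncertain XML.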

4.
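Existing methods cannot effectively handle the obstructed k aggregate nearest neighbor query over uncertain data. To address this, a probabilistic obstacle k aggregate nearest neighbor query (POkANN) method based on the uncertain Voronoi diagram is proposed. The method has three stages: query point set processing, filtering, and refinement. In the processing stage, the center q of the minimum covering circle of the query point set is computed in preparation for pruning. The filtering stage provides a different filtering algorithm for each of three aggregate functions, removing data points that cannot be part of the result to obtain a candidate set. The refinement stage stores the k data points in the candidate set whose probability exceeds a given threshold in the result set and returns it to the user. Theoretical analysis and experiments show that the proposed method has clear advantages for probabilistic obstacle k aggregate nearest neighbor queries.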

5.
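In the real world, obstacles are common and affect both the visibility of spatial objects from a query point and their distance to it. The visible k nearest neighbor query finds the k visible objects closest to the query point and is an important class of query in spatio-temporal query processing. Because of measurement device error, limits on communication overhead, and other factors, uncertainty in the positions of spatial objects is widespread. This paper performs visible k nearest neighbor queries over uncertain objects and proposes the probabilistic visible k nearest neighbor (PVkNN) query, which finds the k nodes with the highest probability of being the nearest neighbor of the query point. To execute this query efficiently, the paper proposes a k-bound pruning method, a tightening filter based on visible centroids, and a pruning strategy for invisible objects, which together filter out non-qualifying objects from a spatial perspective. To avoid computing the exact probability of every object in the candidate set, a refinement method based on upper and lower probability bounds is proposed from a probabilistic perspective, and approximate sampling is used to obtain the proportion of the visible region, enabling efficient computation of PVkNN. Experiments on real and synthetic datasets verify the efficiency and accuracy of the algorithm.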

6.
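《计算机工程》 (Computer Engineering), 2017, (2): 79-84
Existing top-k query algorithms are mainly used in centralized relational databases; when applied to distributed networks they incur heavy communication overhead, which makes them inefficient. To address this, an improved top-k query algorithm is proposed. It uses a preprocessed index table to prune irrelevant data in the distributed network, then builds a candidate subset that contains the correct top-k results and performs the top-k query on it. Experimental results show that, compared with the Fagin and Naive top-k query algorithms, the improved algorithm yields more accurate query results, shorter running time, and lower network overhead.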

7.
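A probabilistic threshold top-k query over an uncertain database computes, for each tuple, the total probability that it ranks among the top k, and returns the tuples whose total probability is at least p. However, existing query semantics do not treat the tuples within an x-tuple as a whole. To address this, a new query semantics, the probabilistic threshold x-top-k query, is defined, and a query processing algorithm is given. Under this semantics, dynamic programming is used to obtain the top-k probability of each tuple within an x-tuple; these probabilities are aggregated and a probabilistic threshold top-k query is then performed, with pruning methods such as observation and maximum upper bound applied for optimization. Experimental results show that the algorithm returns the correct result set after scanning on average 60% of the full dataset, which demonstrates its high query processing efficiency.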
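
The dynamic-programming step mentioned in the abstract above can be sketched for a simplified case. This is not the paper's x-top-k algorithm: it assumes every tuple exists independently (no x-tuple exclusion rules), that tuples are already sorted by score in descending order, and it uses hypothetical names (`topk_probabilities`, `threshold_topk`). A tuple is in the top k exactly when it exists and fewer than k higher-ranked tuples exist.

```python
from typing import List


def topk_probabilities(probs: List[float], k: int) -> List[float]:
    """Top-k probability of each tuple.

    probs[i] is the existence probability of the i-th tuple, with tuples
    sorted by score in descending order. Assumption: tuples are independent.
    """
    # count[j] = probability that exactly j of the tuples seen so far exist,
    # kept only for j < k (larger counts can never help a later tuple).
    count = [1.0] + [0.0] * (k - 1)
    result = []
    for p in probs:
        # The tuple exists and fewer than k higher-ranked tuples exist.
        result.append(p * sum(count))
        # Fold this tuple into the counts, from high j to low j so that
        # count[j - 1] still holds the value from before this tuple.
        for j in range(k - 1, 0, -1):
            count[j] = count[j] * (1.0 - p) + count[j - 1] * p
        count[0] *= 1.0 - p
    return result


def threshold_topk(probs: List[float], k: int, threshold: float) -> List[int]:
    """Indices of tuples whose top-k probability is at least `threshold`."""
    return [i for i, pr in enumerate(topk_probabilities(probs, k)) if pr >= threshold]
```

Because `sum(count)` can only shrink as more tuples are scanned, the top-k probability of any later tuple is bounded by the current `sum(count)`; a scan can therefore stop once that bound falls below the threshold, which is one plausible reading of how an upper-bound pruning rule avoids scanning the whole dataset.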

8.
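孙永佼, 袁野, 王国仁. 《计算机学报》 (Chinese Journal of Computers), 2011, 34(11): 2155-2164
Top-k queries in distributed environments have been widely studied. Owing to imprecise instruments, network delay, and similar causes, most distributed data carry uncertainty. This paper proposes an effective top-k query processing method for uncertain data horizontally distributed over a P2P network. First, a Quad-tree is used to build a distributed index over the uncertain data, and a spatial pruning algorithm based on the index is proposed. Then, based on the local top-k probability and the global...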

9.
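This paper analyzes the MapReduce distributed programming technique for big data platforms and the join algorithms used to implement data queries. For the SSB data model, it proposes a multi-table star-join optimization technique based on distributed caching. Using predicate vectors, the data dependencies of the intermediate joins with dimension tables are converted into bitmap-index filtering on the table, which reduces the large network overhead caused by those dependencies. Distributed caching makes full use of the memory of the processing nodes, optimizing network transfer and reducing query cost.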

10.
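孙平平, 刘方爱. 《微机发展》 (Microcomputer Development), 2011, (10): 70-72, 76
Uncertain data are pervasive in many applications, such as sensor networks, P2P systems, mobile computing, and RFID (Radio Frequency IDentification). Researchers have proposed a variety of data models for uncertain databases, all of whose core ideas derive from the possible world model. Because the possible world model can generate far more possible world instances than the size of the uncertain database, this paper proposes RPW-kBest, an algorithm that reduces the set of possible worlds. It filters using probability and evaluation conditions, discarding as much data that does not affect the query result as possible, so that query processing completes within the smallest search space and storage overhead is reduced. Experimental results show that the algorithm obtains correct query results while significantly improving query efficiency and reducing memory usage.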
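
To make the blow-up of possible world instances concrete, the following sketch enumerates every possible world of a small uncertain relation. It is not RPW-kBest; it assumes each tuple exists independently with a given probability, and `possible_worlds` is a hypothetical helper name.

```python
from itertools import product
from typing import Iterator, List, Tuple


def possible_worlds(tuples: List[Tuple[str, float]]) -> Iterator[Tuple[Tuple[str, ...], float]]:
    """Yield (world, probability) for every subset of the uncertain tuples.

    tuples: list of (name, existence probability).
    Assumption: tuples exist independently of one another.
    """
    for mask in product([False, True], repeat=len(tuples)):
        prob = 1.0
        world = []
        for present, (name, p) in zip(mask, tuples):
            # Multiply by p if the tuple is in this world, otherwise by 1 - p.
            prob *= p if present else 1.0 - p
            if present:
                world.append(name)
        yield tuple(world), prob


if __name__ == "__main__":
    relation = [("t1", 0.9), ("t2", 0.5), ("t3", 0.2)]
    worlds = list(possible_worlds(relation))
    # n independent tuples produce 2**n worlds whose probabilities sum to 1.
    print(len(worlds), sum(p for _, p in worlds))
```

With n independent tuples the enumeration visits 2**n worlds, so exhaustive evaluation quickly becomes infeasible; this is the motivation for discarding tuples that cannot affect the query result before any worlds are materialized.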

11.
Although top-k queries over uncertain data in centralized databases have been studied widely in recent years, they remain a challenging issue in distributed environments. In distributed environments such as Peer-to-Peer (P2P) systems and sensor networks, data objects carry inherent uncertainty due to imprecise measurements and network delays. It is therefore necessary to study how to efficiently retrieve the top-k uncertain data objects over distributed environments with minimum network overhead. In this paper, we propose a novel approach to processing uncertain top-k queries in large-scale P2P networks, where datasets are horizontally partitioned over peers. In our approach, each peer constructs an Uncertain Quad-Tree (UQ-Tree) index for its local uncertain data, while the P2P network constructs a global index by summarizing the local indexes. Based on the global index, we propose a spatial-pruning algorithm to reduce communication costs and a distributed-pruning algorithm to reduce computation costs. Extensive experiments are conducted to verify the effectiveness and efficiency of the proposed methods in terms of communication costs and response time.

12.
Approximate Distributed K-Means Clustering over a Peer-to-Peer Network   Cited by: 4 (self-citations: 0, citations by others: 4)
Data-intensive Peer-to-Peer (P2P) networks are finding an increasing number of applications. Data mining in such P2P environments is a natural extension. However, common monolithic data mining architectures do not fit well in such environments, since they typically require centralizing the distributed data, which is usually not practical in a large P2P network. Distributed data mining algorithms that avoid large-scale synchronization or data centralization offer an alternative. This paper considers the distributed K-means clustering problem, where the data and computing resources are distributed over a large P2P network. It offers two algorithms that produce an approximation of the result produced by the standard centralized K-means clustering algorithm. The first is designed to operate in a dynamic P2P network and can produce clusterings by “local” synchronization only. The second algorithm uses uniformly sampled peers and provides analytical guarantees regarding the accuracy of clustering on a P2P network. Empirical results show that both algorithms perform well compared to their centralized counterparts at a modest communication cost.
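
The idea of "local" synchronization can be illustrated with a single K-means iteration in which a peer combines its own cluster statistics only with those of its immediate neighbors. This is a minimal sketch under stated assumptions, not the paper's algorithm: data are one-dimensional, all peers start from the same centroids, and the function names `local_stats` and `merge_with_neighbors` are hypothetical.

```python
from typing import List, Tuple

Stats = Tuple[List[float], List[int]]  # per-centroid (sum of points, count of points)


def local_stats(points: List[float], centroids: List[float]) -> Stats:
    """Assign each local point to its nearest centroid and accumulate sums and counts."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x in points:
        j = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        sums[j] += x
        counts[j] += 1
    return sums, counts


def merge_with_neighbors(own: Stats, neighbor_stats: List[Stats],
                         old_centroids: List[float]) -> List[float]:
    """New centroids from this peer's statistics plus its neighbors' statistics.

    Only (sum, count) pairs cross the network, never raw data points.
    A centroid with no assigned points keeps its previous value.
    """
    sums, counts = list(own[0]), list(own[1])
    for n_sums, n_counts in neighbor_stats:
        for j in range(len(sums)):
            sums[j] += n_sums[j]
            counts[j] += n_counts[j]
    return [sums[j] / counts[j] if counts[j] else old_centroids[j]
            for j in range(len(sums))]
```

Exchanging sums and counts instead of points keeps each message proportional to the number of clusters rather than the size of the local dataset, which is consistent with the modest communication cost the abstract reports.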

13.
Distributed classification in large-scale P2P networks has gained relevance in recent years and supports applications such as distributed intrusion detection in P2P monitoring environments, online match-making, personalized information retrieval, distributed document classification in a P2P media repository, and P2P recommender systems, to mention a few. However, classification in a P2P network is a challenging task because of constraints such as the infeasibility of centralizing data, scarce communication bandwidth, scalability, synchronization, and peer dynamism. Moreover, because they do not consider the data distributions and topological scenarios of real-world P2P systems, most existing distributed classification approaches fall short in predictive and network-cost performance. In this paper, we investigate a collaborative classification method (TRedSVM) based on Support Vector Machines (SVM) in scale-free P2P networks. In particular, we demonstrate how to construct an SVM classifier in real-world P2P networks, which exhibit an inherently skewed distribution of node links and, eventually, of data. The proposed method propagates the most influential instances of SVM models to the vast majority of scarcely connected peers in a controlled way that improves their local classification accuracy and, at the same time, keeps the communication cost low throughout the network. Besides using benchmark machine learning data sets for extensive experimental evaluations, we have evaluated the proposed method on music genre classification to show its performance in a real application scenario. Additionally, performance analysis is carried out with respect to centralized approaches, data replication in P2P networks, and the cost-accuracy trade-off. TRedSVM outperforms baseline approaches to model propagation by improving the overall classification performance substantially at the cost of a tolerable increase in communication.

14.
In comparison to centralized database systems, distributed database systems have certain advantages depending on the manner in which data are redundantly distributed. These advantages are improvement in response time, better data availability, reduction in transmission cost, etc.

15.
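Data visualization based on data mining is an effective means of presenting large volumes of data to users. In EFCS-Grid, cluster analysis is performed with a k-means clustering algorithm over specified attributes, and the clustering results are then presented to the user. Through experiments, this paper measures and analyzes the time cost of server-side cluster analysis under multiple users and the total data processing time of the EFCS-Grid system under different loads. It finds that cluster analysis accounts for a significant share of the system's data processing and that system performance degrades sharply as data volume and the number of concurrent users grow. Therefore, drawing on the P2P architecture, this paper proposes a distributed strategy for cluster analysis and divides data processing into a data synthesis layer and a data analysis layer. The data synthesis layer integrates the data and ensures that the synthesized data meet the user's schema requirements; then, on the basis of a common schema, first-pass and second-pass cluster analysis are performed. By exploiting the distributed computing capacity of P2P, this relieves the centralized processing bottleneck and improves data processing efficiency within the grid.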

16.
In recent years there has been significant interest in peer-to-peer (P2P) environments in the data management community. However, almost all work so far has focused on exact query processing in current P2P data systems, and the autonomy of peers has not been considered sufficiently. In addition, the system cost is very high because shared data are published per document rather than per document set. In this paper, abstract indices (AbIx) are presented to implement content-based approximate queries in centralized, distributed, and structured P2P data systems. They can be used to search as few peers as possible while obtaining as many results satisfying users' queries as possible, with a high degree of peer autonomy guaranteed. Abstract indices also have low system cost, improve query processing speed, and support very frequent updates and set-based information publishing. To verify the effectiveness of abstract indices, a simulator with 10,000 peers and over 3 million documents is built, and several metrics are proposed. The experimental results show that abstract indices work well in various P2P data systems.

17.
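Many two-tier C/S information systems use a centralized database for data storage. After long periods of operation, the massive accumulated data greatly degrade query speed and network communication performance. By applying aspect-oriented software development, this paper proposes an effective method for improving legacy systems. The method replaces the original centralized database with a distributed database and improves the performance of the legacy system without modifying its code. The method is illustrated with an example.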

18.
Distributed parameter networked control systems are distributed parameter systems controlled through a network over which the control loops are closed. In this paper, the problem of guaranteed cost and state feedback controller design is investigated for a class of distributed parameter networked control systems. With network factors such as transmission delays and data packet dropouts taken into account, the distributed parameter networked control system is modeled as a linear closed-loop system with time-varying delay and uncertain parameters. By selecting an appropriate Lyapunov-Krasovskii functional and using a linear matrix inequality (LMI) approach, the controller is designed to render the system stable while keeping the cost function below a certain upper bound. In addition, a numerical simulation is included to demonstrate the theoretical results.

19.
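Current research on RFID complex event processing focuses mainly on centralized processing. Centralized RFID complex event processing has many limitations when handling massive RFID data, chiefly high network communication cost and low processing efficiency. To address these problems, this paper studies key algorithms for RFID complex event processing in a distributed environment. It adopts a pull-type data communication model to reduce communication cost and, on that basis, proposes two distributed RFID complex event processing algorithms. Experimental results show that the proposed distributed algorithms are more effective than centralized complex event processing algorithms.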

20.
Collaborative applications are characterized by high levels of data sharing. Optimistic replication has been suggested as a mechanism to enable highly concurrent access to the shared data while providing full application-defined consistency guarantees. There is now a growing number of emerging cooperative applications suited to Peer-to-Peer (P2P) networks. However, to enable the deployment of such applications in P2P networks, a mechanism is required to handle their high degree of data sharing in a dynamic, scalable, and available way. Previous work on optimistic replication has mainly concentrated on centralized systems. Centralized approaches are inappropriate for a P2P setting because of their limited availability and their vulnerability to failures and network partitions. In this paper, we focus on a reconciliation algorithm designed to be deployed in large-scale cooperative applications, such as P2P Wiki. The main contribution of this paper is a distributed reconciliation algorithm designed for P2P networks (P2P-reconciler). Other important contributions are: a basic cost model for computing communication costs in a DHT overlay network; a strategy for computing the cost of each reconciliation step taking the cost model into account; and an algorithm that dynamically selects the best nodes for each reconciliation step. Furthermore, since P2P networks are built independently of the underlying topology, which may cause high latencies and large overheads that degrade performance, we also propose a topology-aware variant of our P2P-reconciler algorithm and show the important gains from using it. Our P2P-reconciler solution enables high levels of concurrency thanks to semantic reconciliation and yields high availability and excellent scalability, with acceptable performance and limited overhead.
