Similar documents
1.
Ullah Ihsan, Youn Hee Yong. The Journal of Supercomputing, 2020, 76(12): 10009-10035
Wireless sensor networks are effective for data aggregation and transmission in IoT environments. Here, the sensor data often contain a significant amount of noise or...

2.
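Because the traditional K-means clustering algorithm can handle only numerical data, this paper extends K-means to the categorical domain and proposes a clustering method for categorical data. Using the distribution information associated with each value of each categorical attribute, and combining the vertical and horizontal distributions of the data, the method evaluates the dissimilarity between a data object and a cluster and defines a new distance measure. The method can uncover the intrinsic relationships between different values of the same attribute and effectively measures the dissimilarity between objects. The algorithm was validated on UCI data sets, and the experimental results show that it achieves good clustering performance.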

3.
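To improve data aggregation efficiency in wireless sensor networks, reduce node energy consumption, and extend network lifetime, an efficient data aggregation strategy based on cluster potential fields (CPF-EDAS) is proposed. The strategy performs data aggregation by introducing a sequential potential field within each cluster and a hybrid potential field between the cluster heads and the sink, and each cluster head quickly builds its route from local information. Experimental results show that CPF-EDAS clearly outperforms DP. The scheme suits not only static routing but also dynamic environments such as fire monitoring, scales better, and can be used in a variety of large-scale wireless sensor networks.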

4.
5.
Clustering uncertain data streams has recently become one of the most challenging tasks in data management because of the strict space and time requirements of processing tuples arriving at high speed and the difficulty that arises from handling uncertain data. The prior work on clustering data streams focuses on devising complicated synopsis data structures to summarize data streams into a small number of micro-clusters so that important statistics can be computed conveniently, such as Clustering Feature (CF) (Zhang et al. in Proceedings of ACM SIGMOD, pp 103–114, 1996) for deterministic data and Error-based Clustering Feature (ECF) (Aggarwal and Yu in Proceedings of ICDE, 2008) for uncertain data. However, ECF can only handle attribute-level uncertainty, while existential uncertainty, the other kind of uncertainty, has not been addressed yet. In this paper, we propose a novel data structure, Uncertain Feature (UF), to summarize data streams with both kinds of uncertainties: UF is space-efficient, has additive and subtractive properties, and can compute complicated statistics easily. Our first attempt aims at enhancing the previous streaming approaches to handle the sliding-window model by using UF instead of old synopses, inclusive of CluStream (Aggarwal et al. in Proceedings of VLDB, 2003) and UMicro (Aggarwal and Yu in Proceedings of ICDE, 2008). We show that such methods cannot achieve high efficiency. Our second attempt aims at devising a novel algorithm, cluUS, to handle the sliding-window model by using UF structure. Detailed analysis and thorough experimental reports on synthetic and real data sets confirm the advantages of our proposed method.
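The additive and subtractive properties mentioned in this abstract can be illustrated with the classic Clustering Feature triple (point count, linear sum, squared sum). The sketch below is not the paper's Uncertain Feature, whose fields the abstract does not give; the class and method names are hypothetical, and it only shows why such synopses suit a sliding window.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusteringFeature:
    """Classic CF triple (an assumption here, not the paper's UF structure)."""
    n: int                    # number of points summarized
    linear_sum: List[float]   # per-dimension sum of coordinates
    square_sum: List[float]   # per-dimension sum of squared coordinates

    @classmethod
    def from_point(cls, point: List[float]) -> "ClusteringFeature":
        return cls(1, list(point), [x * x for x in point])

    def __add__(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # Additive property: merging two summaries is component-wise addition.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            [a + b for a, b in zip(self.square_sum, other.square_sum)],
        )

    def __sub__(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # Subtractive property: drop tuples that slid out of the window.
        return ClusteringFeature(
            self.n - other.n,
            [a - b for a, b in zip(self.linear_sum, other.linear_sum)],
            [a - b for a, b in zip(self.square_sum, other.square_sum)],
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]


p1 = ClusteringFeature.from_point([1.0, 2.0])
p2 = ClusteringFeature.from_point([3.0, 4.0])
p3 = ClusteringFeature.from_point([5.0, 6.0])
window = p1 + p2 + p3      # summarize three tuples
window = window - p1       # the oldest tuple expires
print(window.n, window.centroid())
```

Because both operations touch only the fixed-size summary, expiring old tuples never requires rescanning the stream.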

6.
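Xiong Liqiong, Guo Fan, Yu Min. Journal of Computer Applications (计算机应用), 2008, 28(4): 896-898
A method is proposed for aggregating intrusion detection system (IDS) alerts based on a genetic clustering algorithm. The dissimilarity between alert attributes is mapped onto the range [0.0, 1.0], and the dissimilarity between every pair of alerts is represented by a dissimilarity matrix. The adaptive optimization of the genetic algorithm is used to select good cluster centers, and similar alerts are clustered according to the dissimilarity matrix. On this basis, the alerts within each cluster are aggregated using an agglomerative hierarchical method. Experimental results show that the method effectively reduces duplicate alerts.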
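As a rough illustration of the dissimilarity matrix described here, the sketch below scores two alerts by the fraction of attributes on which they differ, which keeps every entry in [0.0, 1.0]. The attribute names and the simple-mismatch rule are assumptions; the abstract does not say how per-attribute dissimilarity is computed, and the genetic and hierarchical steps are not shown.

```python
from typing import Dict, List

Alert = Dict[str, str]  # hypothetical alert record: attribute name -> value


def alert_dissimilarity(a: Alert, b: Alert) -> float:
    """Fraction of attributes that differ; assumes both alerts share the same keys."""
    return sum(1 for key in a if a[key] != b.get(key)) / len(a)


def dissimilarity_matrix(alerts: List[Alert]) -> List[List[float]]:
    """Symmetric matrix with zeros on the diagonal and entries in [0.0, 1.0]."""
    return [[alert_dissimilarity(x, y) for y in alerts] for x in alerts]


alerts = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "port": "80", "sig": "scan"},
    {"src": "10.0.0.1", "dst": "10.0.0.9", "port": "443", "sig": "scan"},
    {"src": "10.0.0.7", "dst": "10.0.0.3", "port": "22", "sig": "brute"},
]
for row in dissimilarity_matrix(alerts):
    print(row)
```

A clustering step would then group alerts whose pairwise entries are small, such as the first two alerts above.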

7.
Fingerprint identification has been a great challenge because of the complexity of searching the database. This paper proposes an efficient fingerprint search algorithm based on database clustering, which narrows down the search space of fine matching. The fingerprint is non-uniformly partitioned by a circular tessellation to compute a multi-scale orientation field as the main search feature. The average ridge distance is employed as an auxiliary feature. A modified K-means clustering technique is proposed to partition the orientation feature space into clusters. Based on the database clustering, a hierarchical query processing scheme is proposed to facilitate efficient fingerprint search, which not only greatly speeds up the search process but also improves the retrieval accuracy. The experimental results show the effectiveness and superiority of the proposed fingerprint search algorithm.

8.
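To address the high dimensionality and complexity of clustering microarray gene expression data, a density-based parallel clustering algorithm is proposed. On a distributed-memory system under the APRAM model, two computations of time complexity O(■), the Euclidean distance matrix and the density function, bring the time complexity of the clustering process to O(■), so the cost of one additional computation buys a lower time complexity for the clustering process. Experiments on an 8-node cluster show that the algorithm achieves a higher parallel speedup than comparable algorithms and speeds up the clustering of high-dimensional biological data.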

9.
Multidimensional aggregation is a dominant operation on data warehouses for on-line analytical processing (OLAP). Many efficient algorithms to compute multidimensional aggregation on relational-database-based data warehouses have been developed. However, to our knowledge, there is nothing to date in the literature about aggregation algorithms on multidimensional data warehouses that store datasets in multidimensional arrays rather than in tables. This paper presents a set of multidimensional aggregation algorithms on very large and compressed multidimensional data warehouses. These algorithms operate directly on compressed datasets in multidimensional data warehouses without the need to first decompress them. They are applicable to a variety of data compression methods. The algorithms have different performance behavior as a function of dataset parameters, sizes of outputs and main memory availability. The algorithms are described and analyzed with respect to the I/O and CPU costs. A decision procedure to select the most efficient algorithm, given an aggregation request, is also proposed. The analytical and experimental results show that the algorithms are more efficient than the traditional aggregation algorithms.

10.
Efficient aggregation algorithms for compressed data warehouses (cited 9 times: 0 self-citations, 9 by others)
Aggregation and cube are important operations for online analytical processing (OLAP). Many efficient algorithms to compute aggregation and cube for relational OLAP have been developed. Some work has been done on efficiently computing cube for multidimensional data warehouses that store data sets in multidimensional arrays rather than in tables. However, to our knowledge, there is nothing to date in the literature describing aggregation algorithms on compressed data warehouses for multidimensional OLAP. This paper presents a set of aggregation algorithms on compressed data warehouses for multidimensional OLAP. These algorithms operate directly on compressed data sets, which are compressed by the mapping-complete compression methods, without the need to first decompress them. The algorithms have different performance behaviors as a function of the data set parameters, sizes of outputs and main memory availability. The algorithms are described and the I/O and CPU cost functions are presented in this paper. A decision procedure to select the most efficient algorithm for a given aggregation request is also proposed. The analysis and experimental results show that the algorithms have better performance on sparse data than the previous aggregation algorithms.
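The idea of aggregating directly on compressed data, without decompressing first, can be sketched with run-length encoding. Run-length encoding is only a stand-in chosen for brevity; it is not claimed to be one of the paper's mapping-complete methods, and the function names are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[float, int]  # (value, number of consecutive repetitions)


def rle_compress(values: List[float]) -> List[Run]:
    """Collapse consecutive equal values into (value, count) runs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_sum(runs: List[Run]) -> float:
    """SUM computed on the runs themselves: one multiply-add per run."""
    return sum(value * count for value, count in runs)


def rle_count(runs: List[Run]) -> int:
    """COUNT computed on the runs themselves."""
    return sum(count for _, count in runs)


cells = [0, 0, 0, 5, 5, 0, 0, 7]   # a sparse slice of a data cube
runs = rle_compress(cells)
print(runs, rle_sum(runs), rle_count(runs))
```

The work is proportional to the number of runs rather than the number of cells, which is one intuition for why sparse data benefit most.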

11.
Neural Computing and Applications - Traditional subspace clustering methods [such as sparse subspace clustering (SSC), least squares representation (LSR) and smooth representation clustering]...

12.
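Research on a cluster-based data aggregation scheme for wireless sensor networks (cited 1 time: 0 self-citations, 1 by others)
Zhang Qiang, Lu Xiao, Cui Xiaochen. Chinese Journal of Sensors and Actuators (传感技术学报), 2010, 23(12): 1778-1782
Data aggregation is currently a research hotspot in wireless sensor networks and an important energy-saving technique. Building on a clustered network topology, a new data aggregation scheme is proposed. Data aggregation is performed separately at cluster member nodes and at cluster heads: member nodes use relative information entropy to reduce the amount of data they send, while each cluster head maintains a feedback comparison value that, whenever the head receives data from a member node or obtains data from its own sensor module, is used to decide whether to forward the data. Simulation experiments comparing the scheme with the LEACH protocol show that it effectively reduces the number of packets transmitted in the network, lowers node energy consumption, and significantly extends network lifetime.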
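One way to picture the cluster head's feedback comparison value is as the last forwarded reading, with a new reading forwarded only when it differs enough from that value. The absolute-difference rule, the threshold, and the class name below are assumptions; the abstract does not specify the actual comparison, and the entropy-based step at member nodes is not shown.

```python
from typing import List, Optional


class ClusterHead:
    """Hypothetical cluster head that suppresses near-duplicate readings."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold                     # assumed tuning parameter
        self.feedback_value: Optional[float] = None    # last forwarded reading

    def receive(self, reading: float) -> bool:
        """Return True if the reading should be forwarded toward the sink."""
        if (self.feedback_value is None
                or abs(reading - self.feedback_value) >= self.threshold):
            self.feedback_value = reading   # update the comparison value
            return True
        return False                        # too close to what was already sent


head = ClusterHead(threshold=0.5)
readings = [20.0, 20.1, 20.3, 20.6, 20.7, 21.2]
forwarded: List[float] = [r for r in readings if head.receive(r)]
print(forwarded)
```

Fewer forwarded packets translate directly into less radio activity, which is where the energy saving would come from.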

13.
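To handle uncertain data in wireless sensor networks, an efficient algorithm for processing uncertain data in wireless sensor networks is proposed. Probabilistic clustering is performed according to the probability density distribution of the uncertain data, and Hilbert encoding is used to map multidimensional data into a one-dimensional space; the uncertain data are then clustered by HPDBSCAN, an algorithm for uncertain data based on a Hilbert-R tree index. Experimental results show that HPDBSCAN preprocesses the data well and is better suited to clustering uncertain data than other clustering algorithms.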
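The Hilbert encoding step can be sketched for the two-dimensional case with the standard iterative conversion from grid coordinates to a position along the Hilbert curve. This shows only the mapping to one dimension; the Hilbert-R tree index and HPDBSCAN itself are not reproduced, and the function name and the power-of-two grid size are assumptions.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its Hilbert curve position."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant, in curve order
        if ry == 0:                    # rotate/flip so the sub-curve lines up
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Cells of a 4x4 grid listed in Hilbert order: neighbors on the curve are
# also neighbors on the grid, which is why the encoding preserves locality.
n = 4
cells = sorted(((x, y) for x in range(n) for y in range(n)),
               key=lambda c: hilbert_index(n, c[0], c[1]))
print(cells)
```

Sorting points by this index keeps spatially close points close in the one-dimensional order, which is the property a Hilbert-based index relies on.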

14.
15.
Energy is a scarce resource in Wireless Sensor Networks (WSN). Some studies show that more than 70% of energy is consumed in data transmission in WSN. Since most of the time, the sensed information is redundant due to geographically collocated sensors, most of this energy can be saved through data aggregation. Furthermore, data aggregation improves bandwidth usage and reduces collisions due to interference. Unfortunately, while aggregation eliminates redundancy, it makes data integrity verification more complicated since the received data is unique.

16.
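Distributed clustering analysis of large data sets (cited 1 time: 0 self-citations, 1 by others)
To improve clustering efficiency, a distributed clustering algorithm for large data sets is proposed. Rather than clustering all the data at once, the method randomly divides the large data set into several subsets, clusters every subset simultaneously, and finally merges the resulting clusters. Experimental results show that in most cases the method yields the same result as traditional one-pass clustering while greatly increasing clustering speed.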
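The split, cluster, and merge pipeline can be sketched on one-dimensional data as follows. The abstract names neither the per-subset clustering algorithm nor the merge rule, so a simple greedy "leader" clustering with a fixed radius stands in for both; all names and parameters are hypothetical, and the subsets are processed sequentially here although they could run on separate workers.

```python
import random
from typing import List, Tuple

Cluster = Tuple[float, int]  # (centroid, number of points it represents)


def leader_cluster(items: List[Cluster], radius: float) -> List[Cluster]:
    """Greedy clustering: join the nearest centroid within `radius`, else start a new cluster."""
    clusters: List[Cluster] = []
    for value, weight in items:
        near = [i for i, (c, _) in enumerate(clusters) if abs(value - c) <= radius]
        if not near:
            clusters.append((value, weight))
            continue
        i = min(near, key=lambda j: abs(value - clusters[j][0]))
        c, w = clusters[i]
        clusters[i] = ((c * w + value * weight) / (w + weight), w + weight)
    return clusters


def split_cluster_merge(points: List[float], parts: int,
                        radius: float, seed: int = 0) -> List[Cluster]:
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)                  # random partition
    subsets = [shuffled[i::parts] for i in range(parts)]
    local = [leader_cluster([(p, 1) for p in s], radius)   # cluster each subset
             for s in subsets]
    merged_input = [c for clusters in local for c in clusters]
    return sorted(leader_cluster(merged_input, radius))    # merge local clusters


points = [1.0, 1.1, 0.9, 1.2, 0.8, 10.0, 10.1, 9.9, 10.2, 9.8]
print(split_cluster_merge(points, parts=3, radius=1.0))
```

Only the small per-subset summaries reach the merge step, so the final pass handles far fewer items than the original data set.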

17.
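Eliminating duplicate records is a core problem in data cleaning, but how to do so effectively has long been a research challenge. An effective method is proposed that verifies duplicate records by establishing a clustering-feedback pattern specification. Based on an analysis of the associations between the classes produced by clustering, the concepts of clustering pattern and feedback pattern and their implementation are first introduced; the clustering-feedback pattern specification for data cleaning is then given; finally, a project case study verifies the method's effectiveness.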

18.
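Parallel hierarchical clustering algorithm based on data preprocessing* (cited 3 times: 0 self-citations, 3 by others)
Hierarchical clustering has very important applications in image processing, intrusion detection, bioinformatics, and other areas, and is one of the research hotspots in data mining. Because current SIMD-model parallel hierarchical clustering algorithms perform poorly on massive data, an adaptive parallel hierarchical clustering algorithm based on data preprocessing is proposed that clusters n input data points in O((λn)^2/p) time, where 1 ≤ p ≤ n/log n and 0.1 ≤ λ ≤ 0.3. A performance comparison of the proposed algorithm with results in the existing literature shows that it clearly improves on them.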

19.
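Trajectory clustering algorithms mostly use spatial features as the similarity measure and lack a measure of temporal features. To address this, a trajectory clustering algorithm based on spatio-temporal patterns is proposed. Built on the partition-and-group framework, the algorithm first extracts trajectory feature points with a curve edge detection method, then divides each trajectory into sub-trajectory segments at those feature points, and finally clusters the segments with a density-based clustering algorithm according to their spatio-temporal similarity. Experimental results show that the feature points extracted by the proposed algorithm describe the trajectory structure fairly accurately while remaining concise, and that the similarity measure based on spatio-temporal features, by accounting for both the spatial and temporal features of trajectories, yields better clustering results.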

20.
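To improve the utility of data after anonymization, a weighted certainty penalty model is given as a measure of data utility, and two data anonymization algorithms based on local clustering are proposed. Evaluation on real data shows that the algorithms substantially reduce the information loss caused by generalization when achieving anonymity protection.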
