Similar literature
20 similar documents found (search time: 15 ms)
1.
A cluster ensemble first generates a large library of different clustering solutions and then combines them into a more accurate consensus clustering. It is commonly accepted that, for a cluster ensemble to work well, the member partitions should differ from each other while the quality of each partition remains at an acceptable level. Many strategies have been used to generate diverse base partitions. As in ensemble classification, many studies focus on generating different partitions of the original dataset, i.e., clustering on different subsets (e.g., obtained by random sampling) or clustering in different feature spaces (e.g., obtained by random projection). However, little attention has been paid to the diversity and quality of the partitions generated by these two approaches. In this paper, we propose a novel cluster generation method based on random sampling, which uses the nearest neighbor method to fill in the cluster labels of the samples that were not drawn (abbreviated as RS-NN). We evaluate its performance in comparison with k-means ensemble, a typical random projection method (Random Feature Subset, abbreviated as FS), and another random sampling method (Random Sampling based on Nearest Centroid, abbreviated as RS-NC). Experimental results indicate that the FS method always generates more diverse partitions, while the RS-NC method generates high-quality partitions. Our proposed method, RS-NN, generates base partitions with a good balance between quality and diversity and achieves significant improvement over the alternative methods. Furthermore, to introduce more diversity, we propose a dual random sampling method that combines the RS-NN and FS methods. The proposed method achieves higher diversity with good quality on most datasets.
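The RS-NN idea in the abstract above (cluster a random subset, then label every remaining point by its nearest sampled neighbor) might be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function names, the tiny pure-Python k-means, the squared Euclidean distance, and the `sample_frac` parameter are all assumptions made for this example.

```python
import random

def _dist2(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=20):
    # Minimal Lloyd-style k-means (hypothetical helper for this sketch).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its closest centroid.
        labels = [min(range(k), key=lambda c: _dist2(p, centroids[c]))
                  for p in points]
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels

def rs_nn_partition(data, k, sample_frac, rng):
    # 1) Draw a random subset of indices without replacement.
    n = len(data)
    sampled = rng.sample(range(n), max(k, int(sample_frac * n)))
    # 2) Cluster only the sampled points.
    sub_labels = kmeans([data[i] for i in sampled], k, rng)
    label_of = dict(zip(sampled, sub_labels))
    # 3) Points left out of the sample inherit the label of their
    #    nearest sampled neighbor.
    labels = []
    for i in range(n):
        if i in label_of:
            labels.append(label_of[i])
        else:
            nearest = min(sampled, key=lambda j: _dist2(data[i], data[j]))
            labels.append(label_of[nearest])
    return labels
```

Calling `rs_nn_partition` repeatedly with different random states would give a set of base partitions that an ensemble step could then combine; the consensus step itself is outside this sketch.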

2.
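This paper proposes an improved random sampling algorithm and analyzes its time and space complexity. The results show that the improved algorithm outperforms existing random sampling algorithms overall. Finally, an application of the improved algorithm to systematic (equal-interval) sampling is given.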

3.
In this work, a new algorithm for drawing a weighted random sample of size m from a population of n weighted items, where m ≤ n, is presented. The algorithm can generate a weighted random sample in one pass over unknown populations.
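One well-known way to draw a weighted sample without replacement in a single pass is to give each item the key u^(1/w), with u uniform on [0, 1) and w the item's weight, and keep the m items with the largest keys. The abstract does not spell out its algorithm, so the sketch below is an assumption about one scheme that fits the description; the function name and the (item, weight) stream format are invented for illustration.

```python
import heapq
import random

def weighted_reservoir_sample(stream, m, rng=random):
    # Min-heap of (key, arrival_index, item); the smallest key sits at heap[0].
    heap = []
    for count, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # items with non-positive weight can never be selected
        key = rng.random() ** (1.0 / weight)
        entry = (key, count, item)  # count breaks ties without comparing items
        if len(heap) < m:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # The new key beats the current minimum: evict it.
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]
```

The population size never needs to be known in advance, and memory stays at m entries regardless of stream length.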

4.
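To address the limits that the speed and cost of A/D converters and downstream signal-processing hardware impose on high-frequency-band signals, a signal-processing method is proposed that obtains Nyquist-rate samples from undersampling. The signal is split into two channels, each of which is undersampled; a new signal is constructed from the two channels to yield virtual Nyquist-rate samples, and a Fourier transform is then applied. With a single signal, the frequency estimate is obtained directly. With multiple signals, the frequency estimate of each signal is obtained together with spurious frequency points caused by frequency "crossing"; these spurious points can be eliminated by processing the Fourier transform results of the two channels. Simulation experiments verify the effectiveness of the method.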

5.
An algorithm for drawing a random sample of size M from a population of size N (M < N) has been proposed. The algorithm has a time complexity of O(M log₂ M) and a space complexity of O(M).

6.
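Building on reference [1], this paper further studies the undersampling problem. It gives formulas for choosing the sampling frequency in two special cases, three methods for choosing the sampling frequency in the general case, and several application examples.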

7.
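In today's "information explosion" society, massive data poses new challenges for data mining. As data mining moves to cloud computing platforms for parallelization, this paper studies parallel random sampling as a way to further reduce the volume of data to be processed. It proposes a MapReduce parallel sampling algorithm that cleans dirty data and performs equal-probability sampling in a single scan. The algorithm is implemented on the Hadoop platform and compared with an ordinary random sampling method; the results show that it is highly time-efficient and effective in practice. The work lays a foundation for future research on sampling in data mining and for advancing data mining on massive data.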
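A single-scan, equal-probability sample that also drops dirty records can be sketched in a map/reduce style by tagging each clean record with a uniform random number and keeping the k smallest tags. This is only one plausible design and not necessarily the paper's algorithm; `map_sample`, `reduce_sample`, and the `is_clean` predicate are hypothetical names, and the code runs in a single process rather than on Hadoop.

```python
import heapq
import random

def map_sample(records, k, is_clean, rng):
    # Mapper: one pass over a partition. Dirty records are skipped, and each
    # clean record receives an independent uniform random tag.
    tagged = ((rng.random(), r) for r in records if is_clean(r))
    # Only the k smallest tags can survive the global merge.
    return heapq.nsmallest(k, tagged, key=lambda t: t[0])

def reduce_sample(partials, k):
    # Reducer: merge the per-partition candidates and keep the k smallest
    # tags overall. Because tags are i.i.d. uniform, every clean record is
    # equally likely to end up in the final sample.
    merged = heapq.nsmallest(
        k, (t for part in partials for t in part), key=lambda t: t[0])
    return [r for _, r in merged]
```

Each mapper ships at most k candidates to the reducer, so the data exchanged between stages is bounded by k times the number of partitions.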

8.
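Because prior information about the data distribution, parameters, and class labels is lacking, the correctness of some base clusterings cannot be guaranteed, which degrades the performance of the cluster ensemble. Moreover, different base clustering decisions contribute to the ensemble to different degrees, so treating them equally limits the improvement of the ensemble result. To solve this problem, a selective K-means cluster ensemble algorithm based on random sampling (RS-KMCE) is proposed. The random sampling strategy in the algorithm keeps the selection of base clustering decisions from falling into a local minimum, and a composite evaluation value defined from diversity and correctness helps the algorithm converge quickly to a good subset of base clusterings, improving ensemble performance. Experimental results on 2 synthetic datasets and 4 UCI datasets show that RS-KMCE outperforms the K-means algorithm, the K-means cluster ensemble (KMCE), and the Bagging-based selective K-means cluster ensemble (BA-KMCE).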

9.
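After summarizing existing flow field visualization methods and analyzing their key techniques, this paper proposes a flow field visualization method based on weighted random sampling. A screen-space quadtree partition is used to define a probability model for selecting seed points along the horizontal and vertical screen coordinates, which provides weighted guidance for the random selection of seed points. Streamlines are drawn dynamically using the HTML5 Canvas feature, and a particle system is used to manage streamline memory, yielding an integrated and effective flow field visualization method. Comparative experiments and an integrated application on a 3D GIS platform show that the method displays the overall flow field effectively while also supporting LOD-style visualization of flow field details.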

10.
In this paper, a new linear delayed delta operator switched system model is proposed to describe networked control systems with packet dropout and network-induced delays. The plant is a continuous-time system, which is sampled with time-varying random sampling periods. A general delta domain Lyapunov stability criterion is given for delta operator switched systems with time delays. Sufficient conditions for asymptotic stability of closed-loop networked control systems with both packet dropout and network-induced delays are presented in terms of linear matrix inequalities (LMIs). A verification theorem is given to show the solvability of the stabilization conditions by solving a finite class of LMIs. The case in which data packets arrive instantly and the case of invariant sampling periods in delta operator systems are each treated as well. Three numerical examples are given to illustrate the effectiveness and potential of the developed techniques. Copyright © 2010 John Wiley & Sons, Ltd.

11.
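Genetic algorithms applied to the maximum clique problem suffer from a limited ability to maintain population diversity, premature convergence, long running times, and low success rates. To address these defects, the crossover operation is redesigned using random sampling, chromosome concentration is defined by analogy with immune mechanisms, and a clonal selection strategy is designed, resulting in a random sampling immune genetic algorithm for the maximum clique problem. Simulation examples show that the new algorithm improves on solution quality, convergence speed, and other measures, that it is no worse than classical search algorithms such as DLS-MC and QUALEX, and that it finds better solutions on some instances.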

12.
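For randomly deployed nodes, the algorithm uses the position information of each node and its neighbors and works in reverse: it checks the coverage of points spread out from the node along an Archimedean spiral to decide whether the node is redundant. At the same time, it minimizes the communication radius and sensing radius of the nodes and uses a sleep mechanism to save energy while keeping the monitored area covered. Simulation experiments verify the feasibility and efficiency of the algorithm.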

13.
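Improved random sampling algorithms over sliding windows on weighted data streams
By improving a weighted sampling algorithm and combining it with the basic window technique, two random sampling algorithms are proposed for continuously updated sliding windows over weighted data streams: the WRSB algorithm and the IWRSB algorithm. When a new tuple arrives, a key is computed from the tuple's weight, and the size of the key determines whether the tuple enters the sample set and which tuple in the sample set is replaced. In addition, a system buffer stores some of the most recently arrived tuples with large keys as backups for expired tuples, so that the algorithms can handle tuple expiration effectively. Theoretical analysis and experimental results show that both algorithms handle random sampling over continuously updated sliding windows on weighted data streams effectively, and that IWRSB performs better of the two.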

14.
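Data mining is an important method in big data service computing and matters greatly for optimizing service computing. As a typical data mining method, random forest has relatively high accuracy and is therefore widely used. To handle big data problems in service computing more accurately and efficiently, further improving the accuracy and efficiency of random forest has become an important research topic. By changing the sample size of the training set and the sampling method, and analyzing both balanced and imbalanced sample sets, it is found that with these two improvements the generalization error on balanced sample sets decreases by 12% to 20% within the optimized range. Changing the sampling method alone shortens the running time of the algorithm, improving efficiency by 10% to 40%, and efficiency also improves markedly on imbalanced data. Both theory and experiments show that the random forest improvement based on combined sampling without replacement raises the accuracy on balanced samples, making this data mining method better suited to big data analysis and processing in service computing.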

15.
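Retailer replenishment control policy under stochastic demand and stochastic replenishment intervals
Zhang Chuan, Pan Dehui. 《控制与决策》 (Control and Decision), 2007, 22(7): 805-807
This paper studies the replenishment control policy of retailers in a distribution system. Each retailer in the system can decide its own order-up-to level independently. The retailer's demand rate is a random variable that follows a Poisson distribution. The distribution center delivers to the retailers cyclically, and the delivery interval is a random variable. All unmet demand is treated as lost sales, so the retailer pays both inventory holding costs and shortage costs. The expected profit function is given, and the order-up-to level policy that maximizes the retailer's expected profit is derived for the case in which the delivery interval is a uniformly distributed random variable.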

16.
This paper addresses the problem of optimal control of constrained linear systems when fast sampling rates are utilised. We show that there exists a well-defined limit as the sampling rate increases. An immediate consequence of this result is the existence of a finite sampling period such that the achieved performance is arbitrarily close to the limiting performance.

17.
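In the era of the Internet of Things, sensors are increasingly widely used in fields such as environmental monitoring, but further gains in monitoring performance are limited by the sensors' own energy, communication, and hardware resources. Compared with sensors that sample at a fixed frequency, sensors with variable-frequency sampling have an advantage in monitoring performance. A policy model is proposed for sensor application scenarios, and on this basis a frequency control algorithm (DisTros) is designed for nodes in an abnormal monitoring state and analyzed by simulation with MATLAB/Simulink. The two sub-algorithms of DisTros apply respectively to two scenarios: fast-changing and slow-changing monitored objects. The simulation results show that the algorithm ensures timely monitoring in the fast-changing scenario and dense monitoring in the slow-changing scenario.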

18.
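After discussing the drawbacks of periodic data sampling algorithms, this paper proposes a sampling algorithm that adapts to the round-trip time (RTT). The algorithm uses the rate of change of the delay as the main basis for dynamically controlling the sampling frequency, and it adjusts the sampling interval automatically according to how quickly the network delay changes. Experimental analysis shows that the algorithm is simple to implement, tracks changes in network RTT effectively, reduces the overall sampling workload, and lightens the extra load that network measurement places on the network.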
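The control rule described above (sample more often when the delay changes quickly, less often when it is steady) could be sketched as below. The abstract gives no formulas, so the relative change-rate definition, the threshold, the halve/grow factors, and the interval bounds are all hypothetical values chosen only to make the idea concrete.

```python
def next_interval(interval, prev_rtt, rtt, *,
                  threshold=0.2, min_interval=1.0, max_interval=60.0):
    # Relative change of the RTT between two consecutive probes
    # (an assumed definition of "rate of change of the delay").
    change_rate = abs(rtt - prev_rtt) / prev_rtt if prev_rtt > 0 else 0.0
    if change_rate > threshold:
        interval /= 2      # delay is changing fast: probe more often
    else:
        interval *= 1.5    # delay is steady: back off to reduce probe load
    # Keep the interval within the configured bounds (seconds).
    return max(min_interval, min(max_interval, interval))
```

For instance, with a current interval of 10 s, an RTT jump from 100 ms to 150 ms halves the interval to 5 s, while a move from 100 ms to 105 ms stretches it to 15 s.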

19.
We consider a queue in which the arrival process, the service time process, and the service rate process are regenerative processes. We provide conditions for its stability, rates of convergence, finiteness of moments, and functional limit theorems. This queue can model a queue serving ABR and UBR traffic in an ATM switch; a multiple access channel with a TDMA or CDMA protocol and fading; a queue holding best-effort or controlled and guaranteed traffic in a router in the integrated service architecture (ISA) of the IP-based Internet; and a scheduler in the router of a differentiated service architecture. In the process we also provide results for a queue with a leaky-bucket-controlled bandwidth scheduler, which are of independent interest. We extend these results to feed-forward networks of queues. We also obtain the results when the arrival rate to the queue can be feedback controlled based on the congestion information in the queue (as in ABR service in ATM networks or in real-time applications controlled by the RTCP protocol in the Internet).

20.
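A packet sampling algorithm based on packet rate adaptation
To address the drawback that the NetFlow sampling probability must be configured manually, a packet sampling algorithm that adapts to the packet rate is proposed. The algorithm measures the packet rate and, using a predefined measurement error, adjusts the sampling probability adaptively as the packet rate changes, so that the measurement error is kept under control with limited resources. Experimental comparisons on real Internet data show that, compared with the traditional NetFlow algorithm, the method is easy to implement, keeps the measurement error controllable, is efficient and accurate, and saves resources.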
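To make the link between packet rate, error bound, and sampling probability concrete, consider a simple model that is assumed here and not taken from the paper: each of n packets in a measurement interval is sampled independently with probability p, and the packet count is estimated as (number sampled) / p. The relative standard error of that estimate is sqrt((1 - p) / (n p)), so keeping it at or below a target ε requires p ≥ 1 / (1 + n ε²). The function name and parameters below are illustrative.

```python
def sampling_probability(packets_per_interval, rel_error):
    # With no traffic observed there is nothing to thin out, so sample everything.
    if packets_per_interval <= 0:
        return 1.0
    # Smallest p that keeps the relative standard error of the packet-count
    # estimate at or below rel_error under independent (Bernoulli) sampling.
    return 1.0 / (1.0 + packets_per_interval * rel_error ** 2)
```

Under this model the required probability falls as the measured packet rate rises, which is the adaptive behavior the abstract describes: a busy link can be sampled sparsely at the same error level.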
