Similar Documents
20 similar documents found.
1.
In classification, noise may deteriorate system performance and increase the complexity of the models built. To mitigate its consequences, several approaches have been proposed in the literature. Among them, noise filtering, which removes noisy examples from the training data, is one of the most widely used techniques. This paper proposes a new noise filtering method that combines several filtering strategies in order to increase the accuracy of the classification algorithms used after the filtering process. The filtering is based on the fusion of the predictions of several classifiers used to detect the presence of noise. We translate the idea behind multiple classifier systems, where the information gathered from different models is combined, to noise filtering: we consider a combination of classifiers, instead of a single one, to detect noise. Additionally, the proposed method follows an iterative noise filtering scheme that avoids using detected noisy examples in each new iteration of the filtering process. Finally, we introduce a noisy score to control the filtering sensitivity, so that the number of noisy examples removed in each iteration can be adapted to the needs of the practitioner. The first two strategies (the use of multiple classifiers and iterative filtering) improve the filtering accuracy, whereas the last one (the noisy score) controls how conservative the filter is when removing potentially noisy examples. The validity of the proposed method is studied in an exhaustive experimental study. We compare the new filtering method against several state-of-the-art methods for dealing with datasets with class noise and study their efficacy on three classifiers with different sensitivities to noise.
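A minimal sketch of the three strategies above (multi-classifier voting, iterative filtering, and a tunable noise score), assuming scikit-learn; the classifier pool, the 0.5 threshold, and the injected label noise are illustrative choices, not the paper's exact configuration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
flipped = rng.choice(len(y), size=50, replace=False)   # inject 10% label noise
y[flipped] = 1 - y[flipped]

classifiers = [LogisticRegression(max_iter=1000),
               DecisionTreeClassifier(random_state=0),
               KNeighborsClassifier()]
threshold = 0.5           # "noisy score": fraction of classifiers that must disagree
keep = np.arange(len(y))  # indices still considered clean

for _ in range(3):        # iterative filtering: re-filter the surviving examples
    votes = np.zeros(len(keep))
    for clf in classifiers:
        # out-of-sample predictions, so each example is judged by models
        # that did not train on it
        pred = cross_val_predict(clf, X[keep], y[keep], cv=5)
        votes += (pred != y[keep])
    score = votes / len(classifiers)
    clean = score < threshold      # flagged examples are dropped...
    if clean.all():
        break
    keep = keep[clean]             # ...and excluded from later iterations

print(f"kept {len(keep)} of {len(y)} examples")
```

Raising the threshold makes the filter more conservative (fewer removals); lowering it approaches a consensus-style filter.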

2.
Crowdsourcing is an emerging method for collecting labels for datasets. Although it is inexpensive, the quality of the collected labels cannot be guaranteed, and the labels become even less reliable when objective factors cause crowd workers to perform poorly. This paper therefore proposes a feature-augmentation-based method for improving crowdsourcing quality (the FA-method). The basic idea is as follows: first, a small portion of the data is labeled by experts; a model trained on the crowd-labeled dataset then predicts the labels of the expert set, and the predictions are appended to the expert set as a new feature. A model trained on this feature-augmented expert set is used to estimate the probability that each instance is noisy, together with an upper bound on the number of noisy instances, in order to filter out the subset of potentially mislabeled data; the same feature-augmentation step is applied again to the filtered high-quality set to further correct noisy labels. Validation on 8 UCI datasets shows that, compared with existing crowdsourcing labeling methods that combine noise identification and correction, the proposed method performs well both when the number of repeated labels is small and when labeling quality is low.
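A compact sketch of the feature-augmentation step, assuming scikit-learn models, simulated crowd noise, and a fixed noise-count cap; the names (crowd_model, noise_score, n_noise_cap) are hypothetical, not from the paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Crowd-labeled pool (labels may be noisy) plus a small expert-labeled set.
X, y_true = make_classification(n_samples=600, random_state=1)
rng = np.random.default_rng(1)
expert_idx = rng.choice(len(y_true), size=60, replace=False)
crowd_idx = np.setdiff1d(np.arange(len(y_true)), expert_idx)
y_crowd = y_true.copy()
flip = rng.choice(crowd_idx, size=100, replace=False)  # simulate unreliable workers
y_crowd[flip] = 1 - y_crowd[flip]

# Step 1: a model trained on crowd labels predicts the expert set; its
# prediction becomes an extra feature column ("feature augmentation").
crowd_model = RandomForestClassifier(random_state=1).fit(X[crowd_idx], y_crowd[crowd_idx])
X_expert_aug = np.column_stack([X[expert_idx],
                                crowd_model.predict_proba(X[expert_idx])[:, 1]])

# Step 2: a model trained on the augmented expert set scores every crowd
# instance; the least plausible labels are treated as potential noise.
expert_model = RandomForestClassifier(random_state=1).fit(X_expert_aug, y_true[expert_idx])
X_crowd_aug = np.column_stack([X[crowd_idx],
                               crowd_model.predict_proba(X[crowd_idx])[:, 1]])
proba = expert_model.predict_proba(X_crowd_aug)
noise_score = 1 - proba[np.arange(len(crowd_idx)), y_crowd[crowd_idx]]
n_noise_cap = 100                          # assumed upper bound on noisy labels
suspects = np.argsort(noise_score)[-n_noise_cap:]
print("flagged", len(suspects), "potentially mislabeled crowd instances")
```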

3.
To cleanse mislabeled examples from a training dataset for efficient and effective induction, most existing approaches adopt a major-set-oriented scheme: the training dataset is separated into two parts (a major set and a minor set), and classifiers learned from the major set are used to identify noise in the minor set. The drawbacks of such a scheme are twofold: (1) when the underlying data volume keeps growing, it can be physically impossible or too time-consuming to load the major set into memory for inductive learning; and (2) for multiple or distributed datasets, it can be technically infeasible or forbidden (for security or privacy reasons) to download data from other sites. Therefore, these approaches have severe limitations in conducting effective global data cleansing on large, distributed datasets. In this paper, we propose a solution that bridges local and global analysis for noise cleansing. More specifically, the proposed effort identifies and eliminates mislabeled data items from large or distributed datasets through local analysis and global incorporation. For this purpose, we make use of distributed datasets, or partition a large dataset into subsets, each of which is regarded as a local subset small enough to be processed by an induction algorithm at one time to construct a local model for noise identification. We construct good rules from each subset and use the good rules to evaluate the whole dataset. For a given instance I_k, two error count variables record the number of times it has been identified as noise by all data subsets; instances with higher error counts have a higher probability of being mislabeled examples. Two threshold schemes, majority and non-objection, are used to identify and eliminate the noisy examples. Experimental results and comparative studies on both real-world and synthetic datasets are reported to evaluate the effectiveness and efficiency of the proposed approach. A preliminary version of this paper was published in the Proceedings of the 20th International Conference on Machine Learning, Washington D.C., USA, 2003, pp. 920-927.
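A small sketch of the local-analysis/global-incorporation scheme, assuming scikit-learn; shallow decision trees stand in for the "good rules" learned from each local subset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=900, random_state=2)
n_subsets = 3
subsets = np.array_split(np.random.default_rng(2).permutation(len(y)), n_subsets)

# Each local subset yields a model (here a small tree approximating the
# "good rules"); every local model then evaluates the WHOLE dataset.
error_count = np.zeros(len(y), dtype=int)
for idx in subsets:
    local = DecisionTreeClassifier(max_depth=5, random_state=2).fit(X[idx], y[idx])
    error_count += (local.predict(X) != y)

majority = error_count > n_subsets / 2    # flagged by more than half the models
non_objection = error_count == n_subsets  # flagged by every model (stricter)
print("majority flags:", majority.sum(), " non-objection flags:", non_objection.sum())
```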

4.
Yeqiu, Jianming, Ling, Yahagi. Neurocomputing, 2009, 72(13-15): 2884.
In this paper, a new type of multineural network filter (MNNF) is presented that is trained for restoration and enhancement of digital radiological images. In medical radiographs, noise is categorized as quantum mottle, which is related to the incident X-ray exposure, and artificial noise, which is caused by the grid, etc. The MNNF consists of several neural network filters (NNFs). A novel analysis method is proposed to make the characteristics of the trained MNNF clear. In the proposed method, a characteristics judgement system decides which NNF to execute, based on the standard deviation of the pixels in the input region. The new approach was tested on nine clinical medical X-ray images and five synthesized noisy X-ray images. In all cases, the proposed MNNF produced better results in terms of peak signal-to-noise ratio (PSNR), mean-to-standard-deviation ratio (MSR) and contrast-to-noise ratio (CNR) measures than the original NNF, a linear inverse filter and a nonlinear median filter.
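A rough sketch of the dispatch rule, with classical median and Gaussian filters standing in for the trained NNFs (the paper trains neural network sub-filters; the patch size and standard-deviation threshold here are assumptions):

```python
import numpy as np
from scipy.ndimage import median_filter, gaussian_filter

# Toy "X-ray" image: a smooth signal plus additive noise.
rng = np.random.default_rng(3)
img = np.outer(np.hanning(128), np.hanning(128)) + rng.normal(0, 0.05, (128, 128))

patch, out = 8, np.empty_like(img)
for i in range(0, 128, patch):
    for j in range(0, 128, patch):
        region = img[i:i + patch, j:j + patch]
        # Judgement rule: the local standard deviation decides which
        # sub-filter handles the region (threshold is illustrative).
        if region.std() > 0.08:
            out[i:i + patch, j:j + patch] = median_filter(region, size=3)
        else:
            out[i:i + patch, j:j + patch] = gaussian_filter(region, sigma=1)
```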

5.
To address the high dimensionality, small sample size, high redundancy, and high noise of microarray gene expression data, a classification algorithm based on FCBF feature selection and ensemble optimization learning, FICS-EKELM, is proposed. First, the fast correlation-based filter (FCBF) removes irrelevant features and noise and identifies a feature set highly correlated with the class labels. Second, sampling is used to generate multiple training subsets, and on each subset an improved crow search algorithm simultaneously performs optimal feature-subset selection and parameter optimization of a kernel extreme learning machine (KELM) classifier. An ensemble classification model built from these base classifiers then classifies the target data. In addition, multi-threaded parallelism on a multi-core platform further improves computational efficiency. Experimental results on six gene datasets show that the algorithm achieves better classification with fewer feature genes, and that its results are significantly better than those of existing and similar methods, making it an effective method for high-dimensional data classification.

6.
Weakly supervised relation extraction uses existing relation-entity pairs to automatically acquire training data from a text corpus, effectively alleviating the shortage of training data. To address the noise, insufficient features, and class imbalance of weakly supervised training data, which lead to poor relation extraction performance, this paper proposes the NF-Tri-training (Tri-training with Noise Filtering) weakly supervised relation extraction algorithm. It uses undersampling to resolve class imbalance, iteratively learns new samples from unlabeled data based on Tri-training to improve the generalization ability of the classifiers, and employs data-editing techniques to identify and remove mislabeled samples from the initial training data and from those produced in each iteration. Experimental results on a dataset collected from Hudong Baike show that the NF-Tri-training algorithm effectively improves the performance of the relation classifier.

7.
The accuracy of machine learners is affected by the quality of the data on which the learners are induced. In this paper, the quality of the training dataset is improved by removing instances detected as noisy by the Partitioning Filter. The fit dataset is first split into subsets, and different base learners are induced on each of these splits. The predictions are combined in such a way that an instance is identified as noisy if it is misclassified by a certain number of base learners. Two versions of the Partitioning Filter are used: the Multiple-Partitioning Filter and the Iterative-Partitioning Filter. The number of instances removed by the filters is tuned via the filter's voting scheme and the number of iterations. The primary aim of this study is to compare the predictive performance of the final models built on the filtered and unfiltered training datasets. A case study of software measurement data from a high-assurance software project is performed. It is shown that the predictive performance of models built on the filtered fit datasets and evaluated on a noisy test dataset is generally better than that of models built on the noisy (unfiltered) fit dataset. However, predictive performance based on certain aggressive filters is affected by the presence of noise in the evaluation dataset.

8.
Instance selection aims at filtering out noisy data (or outliers) from a given training set, which not only reduces the need for storage space but can also ensure that a classifier trained on the reduced set provides similar or better performance than the baseline classifier trained on the original set. However, since there are numerous instance selection algorithms, there is no single winner that is best across problem domains; instance selection performance is algorithm and dataset dependent. One main reason for this is that it is very hard to define what the outliers are across different datasets. It should be noted that, with a specific instance selection algorithm, over-selection may occur: too many 'good' data samples are filtered out, leaving a classifier that performs worse than the baseline. In this paper, we introduce a dual classification (DuC) approach, which aims to deal with this potential drawback of over-selection. Specifically, after performing instance selection over a given training set, two classifiers are trained using the 'good' and 'noisy' sets, respectively, identified by the instance selection algorithm. Each test sample is then compared for similarity with the data in the good and noisy sets, and this comparison routes the test sample to one of the two classifiers. The experiments are conducted using 50 small-scale and 4 large-scale datasets, and the results demonstrate the superior performance of the proposed DuC approach over the baseline instance selection approach.
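A minimal sketch of the DuC routing idea, assuming scikit-learn; out-of-sample disagreement stands in for the instance selection algorithm, and 1-nearest-neighbor distance serves as the similarity measure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

X, y = make_classification(n_samples=600, flip_y=0.1, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

# Instance selection stand-in: out-of-sample disagreement marks "noisy" data.
pred = cross_val_predict(KNeighborsClassifier(), X_tr, y_tr, cv=5)
good, noisy = pred == y_tr, pred != y_tr

clf_good = KNeighborsClassifier().fit(X_tr[good], y_tr[good])
clf_noisy = KNeighborsClassifier().fit(X_tr[noisy], y_tr[noisy])
nn_good = NearestNeighbors(n_neighbors=1).fit(X_tr[good])
nn_noisy = NearestNeighbors(n_neighbors=1).fit(X_tr[noisy])

# Route each test sample to the classifier whose training set it resembles.
d_good = nn_good.kneighbors(X_te)[0].ravel()
d_noisy = nn_noisy.kneighbors(X_te)[0].ravel()
y_hat = np.where(d_good <= d_noisy, clf_good.predict(X_te), clf_noisy.predict(X_te))
print("accuracy:", (y_hat == y_te).mean())
```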

9.
Data in the intrusion detection domain are typically high-dimensional and nonlinear, and contain large amounts of noise, redundancy, and continuous attributes, so general pattern classification methods cannot handle them effectively. To further improve intrusion detection, an ensemble intrusion detection algorithm based on neighborhood rough sets is proposed. Bagging is used to generate multiple training subsets with large diversity. Given the continuous nature of intrusion detection data, neighborhood rough set models with different radii are applied to each training subset for attribute reduction, removing redundancy and noise to improve the classification performance of the attribute subsets while further increasing the diversity of the training subsets. SVMs are trained as base classifiers, and the detection accuracy of each base classifier is used as its weight in a weighted ensemble. Simulation results on the KDD99 dataset show that the algorithm effectively improves the accuracy and efficiency of intrusion detection and exhibits good generalization and stability.
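A simplified sketch of the accuracy-weighted ensemble, assuming scikit-learn; univariate feature selection with varying k stands in for neighborhood rough set attribute reduction with varying radii:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, n_features=40, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)
rng = np.random.default_rng(5)

models, weights = [], []
for k in (10, 15, 20):                    # varying k mimics varying radii
    boot = rng.choice(len(y_tr), size=len(y_tr), replace=True)  # Bagging
    m = make_pipeline(SelectKBest(f_classif, k=k), SVC(probability=True))
    m.fit(X_tr[boot], y_tr[boot])
    models.append(m)
    weights.append(m.score(X_tr, y_tr))   # weight = detection accuracy (proxy)

# Accuracy-weighted soft vote over the base SVMs.
proba = sum(w * m.predict_proba(X_te) for w, m in zip(weights, models))
print("ensemble accuracy:", (proba.argmax(1) == y_te).mean())
```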

10.
This paper proposes applying machine learning techniques to predict students' performance on two real-world educational data-sets. The first data-set is used to predict the response of students with autism while they learn a specific task, whereas the second is used to predict students' failure at a secondary school. The two data-sets suffer from two major problems that can negatively impact the ability of classification models to predict the correct label: class imbalance and class noise. A series of experiments has been carried out to improve the quality of the training data, and hence improve prediction results. In this paper, we propose two noise filter methods to eliminate noisy instances of the majority class located inside the borderline area. Our methods combine the SMOTE over-sampling technique with a thresholding technique to balance the training data and choose the best boundary between classes. We then apply a noise detection approach to identify the noisy instances. We have used the two data-sets to assess the efficacy of class-imbalance approaches as well as both proposed methods. Results for different classifiers show that the AUC scores improve significantly when the two proposed methods are combined with existing class-imbalance techniques.
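A minimal sketch of the SMOTE-plus-thresholding idea, assuming scikit-learn and imbalanced-learn; the threshold grid and the borderline rule for majority-class instances are illustrative simplifications:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], flip_y=0.05,
                           random_state=6)

# Balance the classes with SMOTE, then pick the decision threshold that best
# separates the classes on out-of-sample probabilities.
X_bal, y_bal = SMOTE(random_state=6).fit_resample(X, y)
proba = cross_val_predict(RandomForestClassifier(random_state=6),
                          X_bal, y_bal, cv=5, method="predict_proba")[:, 1]
thresholds = np.linspace(0.1, 0.9, 17)
best_t = max(thresholds, key=lambda t: ((proba >= t) == y_bal).mean())

# Noise detection: majority-class instances on the wrong side of the chosen
# boundary sit in the borderline area and are removed.
noisy = (y_bal == 0) & (proba >= best_t)
X_clean, y_clean = X_bal[~noisy], y_bal[~noisy]
print(f"threshold={best_t:.2f}, removed {noisy.sum()} borderline majority instances")
```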

11.
Bayesian model averaging (BMA) can resolve the overfitting problem by explicitly incorporating the model uncertainty into the analysis procedure. Hence, it can be used to improve the generalization performance of Bayesian network classifiers. Until now, BMA of Bayesian network classifiers has only been performed in some restricted forms, e.g., the model is averaged given a single node-order, because of its heavy computational burden. However, it can be hard to obtain a good node-order when the available training dataset is sparse. To alleviate this problem, we propose BMA of Bayesian network classifiers over several distinct node-orders obtained using the Markov chain Monte Carlo sampling technique. The proposed method was examined using two synthetic problems and four real-life datasets. First, we show that the proposed method is especially effective when the given dataset is very sparse. The classification accuracy of averaging over multiple node-orders was higher in most cases than that achieved using a single node-order in our experiments. We also present experimental results for test datasets with unobserved variables, where the quality of the averaged node-order is more important. Through these experiments, we show that the difference in classification performance between the cases of multiple node-orders and single node-order is related to the level of noise, confirming the relative benefit of averaging over multiple node-orders for incomplete data. We conclude that BMA of Bayesian network classifiers over multiple node-orders has an apparent advantage when the given dataset is sparse and noisy, despite the method's heavy computational cost.

12.
In signal peptide prediction, signal peptide sequences vary in length and are diverse in amino acid composition, so previous methods usually rely on a sliding window, which leads to information loss and data imbalance. To improve prediction for the minority class, the training data are preprocessed: the majority-class samples are partitioned, and each resulting group is merged with the minority-class samples to form several data subsets. Under two protein encoding schemes, probabilistic neural networks are used to build multiple classifiers on these subsets, and weighted voting combines the classifiers to predict signal peptides. Experiments on the widely used Neilsen dataset show that the method is effective.

13.
Noisy and large data sets are extremely difficult to handle and especially to predict. Time series prediction is a problem frequently addressed by researchers in many engineering fields. This paper presents a hybrid approach to handling a large and noisy data set: a Self-Organizing Map (SOM) combined with multiple recurrent neural networks (RNNs) is trained to predict the components of such a data set. The SOM incrementally constructs a set of clusters, each represented by a subset of the data that is used to train one recurrent neural network. Backpropagation through time is used to train the set of recurrent neural networks. To show the performance of the proposed approach, a problem of instruction address prefetching is treated.

14.
An ensemble classification method for microarray data
To address the low classification accuracy of existing ensemble classification methods for microarray data, a Bagging-PCA-SVM method is proposed. The method first applies Bootstrap resampling to the training set to form a large number of training subsets; feature selection and principal component analysis are then performed on each subset to remove noisy and redundant genes; finally, support vector machines are used as classifiers, and majority voting predicts the class of each sample. Tests on three datasets demonstrate the effectiveness and feasibility of the method.
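A minimal sketch of Bagging-PCA-SVM, assuming scikit-learn; the per-subset feature-selection step from the abstract is folded into PCA here, and the member count and component count are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in for microarray data: many features, few samples.
X, y = make_classification(n_samples=100, n_features=500, n_informative=20,
                           random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)
rng = np.random.default_rng(7)

preds = []
for _ in range(15):                                     # Bootstrap resampling
    boot = rng.choice(len(y_tr), size=len(y_tr), replace=True)
    model = make_pipeline(PCA(n_components=10), SVC())  # PCA removes noise/redundancy
    model.fit(X_tr[boot], y_tr[boot])
    preds.append(model.predict(X_te))

# Majority vote across the bagged PCA+SVM members (binary labels: mean > 0.5).
y_hat = (np.mean(preds, axis=0) > 0.5).astype(int)
print("accuracy:", (y_hat == y_te).mean())
```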

15.
For large-scale machine learning problems containing noise and outliers, the non-convex Ramp loss function is adopted to suppress the influence of noisy and corrupted data, and a fast learning method for non-convex linear support vector machines based on stochastic optimization is proposed, effectively improving training speed and prediction accuracy. Experimental results show that the method reduces learning time; on the MNIST dataset, training time is four orders of magnitude lower than that of conventional learning methods. It also improves prediction speed to some extent and effectively enhances the classifier's generalization on noisy datasets.
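A worked sketch of stochastic subgradient descent on the ramp loss, assuming labels in {-1, +1}; the clipping point s, learning rate, and regularization strength are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification

X, y01 = make_classification(n_samples=2000, n_features=20, flip_y=0.15,
                             random_state=8)
y = 2 * y01 - 1                  # labels in {-1, +1}

# Ramp loss R_s(z) = min(1 - s, max(0, 1 - z)) with z = y * w.x: the hinge
# loss clipped at 1 - s, so badly misclassified (likely noisy) points stop
# contributing gradient.
s, lr, lam = -1.0, 0.01, 1e-4
rng = np.random.default_rng(8)
w = np.zeros(X.shape[1])

for epoch in range(10):
    for i in rng.permutation(len(y)):     # stochastic: one point at a time
        z = y[i] * (X[i] @ w)
        grad = lam * w                    # L2 regularization
        if s < z < 1:                     # only the un-clipped region
            grad -= y[i] * X[i]           # contributes a hinge subgradient
        w -= lr * grad

print("training accuracy:", (np.sign(X @ w) == y).mean())
```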

16.

Empirical studies on ensemble learning that combines multiple classifiers have shown that it is an effective technique for improving the accuracy and stability of a single classifier. In this paper, we propose a novel method for dynamically building diversified sparse ensembles. We first apply canonical correlation analysis (CCA) to model the relationship between the input data variables and the outputs of the base classifiers. The canonical (projected) classifier outputs and input training data variables are encoded globally through a multi-linear CCA projection, decreasing the impact of noisy input data and incorrect classifiers to a minimum degree in this global view. Secondly, based on the projection, a sparse regression method combined with a classifier diversity measure is used to prune the ensemble down to representative classifiers. We evaluate the proposed approach on several datasets, such as UCI and handwritten digit recognition. Experimental results show that the proposed approach achieves better accuracy than other ensemble methods such as QFWEC, Simple Vote Rule, Random Forest, Drep and Adaboost.
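A loose sketch of the CCA-plus-sparse-regression pruning, assuming scikit-learn; CCA's predict re-expresses the base-classifier outputs through the shared subspace, and Lasso's zero coefficients prune members (the diversity measure from the paper is omitted):

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=9)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=9)
rng = np.random.default_rng(9)

# A pool of base classifiers trained on bootstrap samples.
pool = []
for _ in range(20):
    b = rng.choice(len(y_tr), len(y_tr), replace=True)
    pool.append(DecisionTreeClassifier(max_depth=3, random_state=9).fit(X_tr[b], y_tr[b]))
O = np.column_stack([c.predict_proba(X_val)[:, 1] for c in pool])  # output matrix

# CCA links input variables and classifier outputs; re-expressing the outputs
# through the shared subspace damps noisy inputs and erratic classifiers.
cca = CCA(n_components=5).fit(X_val, O)
O_smooth = cca.predict(X_val)

# Sparse regression over the smoothed outputs: classifiers whose columns get
# zero weight are pruned from the ensemble.
coef = Lasso(alpha=0.01).fit(O_smooth, y_val).coef_
selected = np.flatnonzero(np.abs(coef) > 1e-6)
print(f"kept {len(selected)} of {len(pool)} classifiers")
```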


17.
To address the "curse of dimensionality" and the imbalanced classification problem in web spam detection, a binary classifier algorithm based on immune clonal feature selection and undersampling (US) ensembles is proposed. First, undersampling draws from the majority class several sample sets whose sizes are close to that of the minority class, and each is merged with the minority-class samples to form multiple balanced training subsets. An immune clonal algorithm is then designed to select multiple optimal feature subsets, and the balanced subsets are projected onto these optimal feature subsets to generate multiple views of the balanced data. Finally, random forest (RF) classifiers classify the test samples, and simple voting determines the final class of each test sample. Experimental results on the WEBSPAM UK-2006 dataset show that, applied to web spam detection, the ensemble classifier improves accuracy, F1-measure, and AUC by more than 11% compared with the random forest algorithm and its Bagging and AdaBoost ensemble variants; compared with the best published results, it improves the F1-measure by 2% and achieves the best AUC.

18.
In this work a novel technique for building ensembles of classifiers for spectrogram classification is presented. We propose a simple approach for classifying signals from a large database of plant echoes. These echoes are highly complex stochastic signals; nevertheless, their spectrograms contain enough information to extract a good set of features for training the proposed ensemble of classifiers. The proposed ensemble is a modified version of a recent feature-transform-based ensemble method, the Input Decimated Ensemble. In the proposed variant, different subsets of randomly extracted training patterns are used to create a set of different Neighborhood Preserving Embedding (NPE) subspace projections. These feature transformations are applied to the whole dataset, and a set of decision trees is trained on the transformed spaces. Finally, the scores of this set of classifiers are combined by the sum rule. Experiments carried out on a previously proposed dataset show the superiority of this method with respect to other approaches: it outperforms the previously proposed combination of principal component analysis and support vector machine (SVM) on the tested dataset. Moreover, we show that the fusion of the proposed ensemble and the SVM-based system outperforms both stand-alone methods.
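A rough sketch of the input-decimated subspace ensemble, assuming scikit-learn; LocallyLinearEmbedding (the nonlinear counterpart of NPE, which scikit-learn does not ship) stands in for the NPE projections, and the digits dataset stands in for spectrogram features:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=10)
rng = np.random.default_rng(10)

scores = np.zeros((len(X_te), 10))
for _ in range(5):
    # Input decimation: each member fits its neighborhood-preserving
    # projection on a random subset of the training patterns...
    idx = rng.choice(len(X_tr), size=len(X_tr) // 2, replace=False)
    emb = LocallyLinearEmbedding(n_components=10, n_neighbors=12).fit(X_tr[idx])
    # ...and the transformation is then applied to the whole dataset.
    tree = DecisionTreeClassifier(random_state=10).fit(emb.transform(X_tr), y_tr)
    scores += tree.predict_proba(emb.transform(X_te))   # sum rule

print("ensemble accuracy:", (scores.argmax(1) == y_te).mean())
```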

19.
To address the feature misalignment problem in object detection, the concept of multi-configuration feature bags is proposed to characterize the different misalignments that the same feature may exhibit. When learning the object classifier, a Boosting algorithm selects the most discriminative feature bags, each corresponding to a single feature and its misalignment configurations, and the object classifier is a linear combination of the optimal feature-bag classifiers. Furthermore, multiple-instance learning is introduced to effectively evaluate the discriminative power of the feature bags and to learn the feature-bag classifiers. Experiments on face datasets show that, by accounting for feature misalignment, the proposed algorithm achieves better detection performance than conventional methods. Compared with a fixed bag-generation scheme, multi-configuration feature bags better model feature misalignment, improving the detection rate while yielding a smaller detector.

20.
An independent component analysis (ICA) based least squares support vector machine (LS-SVM) is proposed for multi-step-ahead independent prediction of time series. ICA estimates the independent components (ICs) of the predictor variables, and the time series is reconstructed from the noise-free ICs. The k-nearest-neighbor (k-NN) method reduces the size of the training set, a new distance function is proposed to reduce the computational complexity of LS-SVM training, and constraints are used to post-process the predicted values. Comparative prediction experiments on several time series using the ICA-based LS-SVM, a standard LS-SVM, and a back-propagation neural network (BP-ANN) show that the prediction performance of the ICA-based LS-SVM is superior to that of the standard LS-SVM and the BP-ANN.
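A minimal sketch under stated assumptions: FastICA denoises a lag-embedded predictor matrix, and kernel ridge regression (which shares the LS-SVM's squared-loss, L2-regularized form) stands in for the LS-SVM; the k-NN training-set reduction and the custom distance function are omitted:

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.kernel_ridge import KernelRidge

# Noisy toy series and its lag-embedded predictor matrix.
rng = np.random.default_rng(11)
t = np.arange(1200)
series = np.sin(0.05 * t) + 0.3 * np.sin(0.21 * t) + rng.normal(0, 0.2, t.size)
lags = 8
X = np.column_stack([series[i:i - lags] for i in range(lags)])
y = series[lags:]

# ICA on the lagged predictors; the lowest-energy sources are treated as
# noise and zeroed before reconstructing the predictors.
ica = FastICA(n_components=lags, random_state=11)
S = ica.fit_transform(X)
energy = (S ** 2).mean(axis=0) * (ica.mixing_ ** 2).sum(axis=0)
S[:, np.argsort(energy)[:2]] = 0          # drop the two weakest components
X_clean = ica.inverse_transform(S)

# Kernel ridge regression as the LS-SVM stand-in, fit on the denoised inputs.
model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5).fit(X_clean[:1000], y[:1000])
pred = model.predict(X_clean[1000:])
print("test MSE:", ((pred - y[1000:]) ** 2).mean())
```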

