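Concept drift is a common problem in mining dynamic data streams. However, pseudo concept drift, caused by mixed-in noise or by training samples that are too small, produces the same symptom as real concept drift: unstable fluctuations in the model's online test performance. The two are therefore easily confused, which leads to false concept-drift alarms. To address this confusion between real and pseudo concept drift in data streams, a concept drift detection method based on online performance test (CDPT) is proposed. The method divides the most recently obtained dataset evenly into groups and runs online learning on each group's sub-dataset. It records the classification-accuracy vector that each sub-dataset produces during training and testing, computes the accuracy drop between adjacent learning time units, and uses a threshold on the decline in test accuracy to obtain effective fluctuation points. Next, the effective fluctuation points from the different groups are integrated by cross-checking. This removes the detection interference caused by model instability when training samples are too small during online learning on the stream, and yields consistent fluctuation points according to the consistency of the accuracy fluctuations. Finally, by tracking the online-learning classification accuracy, the method obtains the change in test accuracy at reference points in the neighborhood of each consistent fluctuation point. It then compares the magnitude of the accuracy decline and the convergence behavior of the model at those reference points to identify which consistent fluctuation points are real concept-drift points. Experimental results show that the method effectively identifies real concept drift during online learning on data streams, avoids the negative impact of overly small training samples or of noise in the stream on the detection results, and improves the model's generalization performance.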
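
The three CDPT stages (per-group effective fluctuation points, cross-checked consistent points, neighborhood check for real drift) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the `delta` drop threshold, the `min_votes` cross-check rule, and especially the "persistent drop over a window" test used to separate real from pseudo drift are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Set


def effective_points(acc: Sequence[float], delta: float) -> Set[int]:
    """Time units t at which test accuracy fell by at least delta from t-1."""
    return {t for t in range(1, len(acc)) if acc[t - 1] - acc[t] >= delta}


def consistent_points(groups: Sequence[Sequence[float]], delta: float,
                      min_votes: int) -> List[int]:
    """Cross-check the groups: keep points flagged in at least min_votes groups."""
    votes: Dict[int, int] = {}
    for acc in groups:
        for t in effective_points(acc, delta):
            votes[t] = votes.get(t, 0) + 1
    return sorted(t for t, v in votes.items() if v >= min_votes)


def real_drifts(overall: Sequence[float], candidates: Sequence[int],
                delta: float, window: int) -> List[int]:
    """Assumed persistence rule: a candidate is a real drift if the mean accuracy
    over the window starting at t stays at least delta below the mean over the
    window before t; a dip that recovers is treated as pseudo drift."""
    drifts = []
    for t in candidates:
        before = overall[max(0, t - window):t]
        after = overall[t:t + window]
        if before and after:
            drop = sum(before) / len(before) - sum(after) / len(after)
            if drop >= delta:
                drifts.append(t)
    return drifts


# Group 3 is noisy: its dips at t=1 and t=3 are not shared by the other groups.
groups = [
    [0.90, 0.90, 0.70, 0.70, 0.70, 0.70],
    [0.90, 0.91, 0.72, 0.71, 0.70, 0.70],
    [0.90, 0.60, 0.90, 0.72, 0.70, 0.71],
]
candidates = consistent_points(groups, delta=0.1, min_votes=2)
print(candidates)  # [2]
overall = [0.90, 0.90, 0.70, 0.70, 0.70, 0.70]
print(real_drifts(overall, candidates, delta=0.1, window=3))  # [2]
```

Requiring a vote from several groups is what filters out the isolated dips in group 3; whether CDPT uses a majority, a unanimous vote, or another cross-check rule is not stated in the abstract.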

Yuan Quan, Guo Jiangfan. Journal of Computer Applications (计算机应用), 2018, 38(6): 1591-1595
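To address concept drift and noise in data streams, a new incremental-learning ensemble classification algorithm for data streams is proposed. First, a noise-filtering mechanism removes noise. Then, a hypothesis-testing method detects concept drift, and a weighted ensemble model is built with incremental C4.5 decision trees as base classifiers. Finally, instances are learned incrementally and the classification model is updated dynamically as they arrive. Experimental results show that the ensemble classifier detects concept drift with 95%-97% accuracy and keeps its noise resistance on data streams above 90%. The algorithm achieves high classification accuracy and performs well in both drift-detection accuracy and noise resistance.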
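
The abstract does not say which hypothesis test is used. As one common choice, assumed here purely for illustration, drift can be flagged with a one-sided two-proportion z-test that asks whether the classifier's error rate in a recent window is significantly higher than in a reference window. The function name and the default critical value are hypothetical.

```python
import math


def error_rate_drift(ref_errors: int, ref_n: int, new_errors: int, new_n: int,
                     z_crit: float = 1.645) -> bool:
    """One-sided two-proportion z-test: True if the error rate in the recent
    window is significantly higher than in the reference window."""
    p_ref = ref_errors / ref_n
    p_new = new_errors / new_n
    pooled = (ref_errors + new_errors) / (ref_n + new_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / ref_n + 1 / new_n))
    if se == 0:  # both windows all-correct or all-wrong: no evidence of change
        return False
    z = (p_new - p_ref) / se
    return z > z_crit  # 1.645 is roughly a 5% one-sided significance level


print(error_rate_drift(10, 100, 30, 100))  # True: error rose from 10% to 30%
print(error_rate_drift(10, 100, 12, 100))  # False: 10% vs 12% is within noise
```

Only an increase in error counts as drift here, which matches the usual goal of triggering a model update when performance degrades; a two-sided test would also react to improvements.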

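To address the detection of missing small fittings on power transmission lines, this work studies inference acceleration for the missing-small-fitting detection algorithm using multi-task learning. The missing-small-fitting detection tasks share one Swin Transformer [26] backbone connected to multiple MLP task heads for multi-task training and multi-task inference. Accuracy and performance are compared between single-task and multi-task learning, and seamlessly adding extension tasks to the multi-task model is also verified. Experimental results show that, compared with single-task learning, multi-task inference for missing small fittings on transmission lines is more than 2 times faster and uses over 22% less GPU memory. The extension-task experiments confirm that extension tasks are effective and make task configuration more flexible.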
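
Why a shared backbone speeds up inference can be shown with a toy, framework-free sketch: the expensive backbone runs once per input and every task head reuses its features, whereas separate single-task models each run their own backbone. The class, the toy backbone, and the task names are hypothetical stand-ins for the Swin Transformer and MLP heads; the sketch only counts backbone passes and makes no claim about the reported 2x figure.

```python
from typing import Callable, Dict, List

Vector = List[float]


class MultiTaskModel:
    """Hypothetical sketch: one shared backbone feeding several task heads."""

    def __init__(self, backbone: Callable[[Vector], Vector]) -> None:
        self.backbone = backbone
        self.heads: Dict[str, Callable[[Vector], float]] = {}
        self.backbone_calls = 0  # counts the expensive forward passes

    def add_head(self, name: str, head: Callable[[Vector], float]) -> None:
        # Extending the model with a new task only registers another head;
        # the backbone is reused unchanged.
        self.heads[name] = head

    def predict(self, x: Vector) -> Dict[str, float]:
        self.backbone_calls += 1
        features = self.backbone(x)  # computed once, shared by all heads
        return {name: head(features) for name, head in self.heads.items()}


def toy_backbone(x: Vector) -> Vector:
    # Stand-in for the expensive feature extractor.
    return [2.0 * v for v in x]


multi = MultiTaskModel(toy_backbone)
multi.add_head("task_a", sum)
multi.add_head("task_b", max)
multi.add_head("task_c", min)
print(multi.predict([1.0, 2.0, 3.0]))  # {'task_a': 12.0, 'task_b': 6.0, 'task_c': 2.0}
print(multi.backbone_calls)            # 1

# Single-task baseline: one full model per task, so one backbone pass per task.
singles = []
for name, head in [("task_a", sum), ("task_b", max), ("task_c", min)]:
    m = MultiTaskModel(toy_backbone)
    m.add_head(name, head)
    m.predict([1.0, 2.0, 3.0])
    singles.append(m)
print(sum(m.backbone_calls for m in singles))  # 3
```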

Software defect detection aims to automatically identify defective software modules for efficient software testing, in order to improve the quality of a software system. Although many machine learning methods have been successfully applied to the task, most of them fail to consider two practical yet important issues in software defect detection. First, it is rather difficult to collect a large amount of labeled training data for learning a well-performing model; second, a software system usually contains far fewer defective modules than defect-free modules, so learning has to be conducted over an imbalanced data set. In this paper, we address these two practical issues simultaneously by proposing a novel semi-supervised learning approach named Rocus. This method exploits the abundant unlabeled examples to improve detection accuracy and employs under-sampling to tackle the class-imbalance problem in the learning process. Experimental results on real-world software defect detection tasks show that Rocus is effective for software defect detection. It performs better than a semi-supervised learning method that ignores the class-imbalance nature of the task and a class-imbalance learning method that does not make effective use of unlabeled data.
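
The under-sampling step mentioned above can be illustrated with plain random under-sampling of the majority (defect-free) class. This is a generic sketch of the technique, not Rocus itself: the semi-supervised use of unlabeled modules is not modeled, and the function name and balancing-to-the-smallest-class rule are assumptions.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def random_undersample(xs: Sequence[X], ys: Sequence[Hashable],
                       seed: int = 0) -> Tuple[List[X], List[Hashable]]:
    """Randomly drop examples from larger classes so that every class keeps
    as many examples as the smallest class (e.g. the defective modules)."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[X]] = {}
    for x, y in zip(xs, ys):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(items) for items in by_class.values())
    out_x: List[X] = []
    out_y: List[Hashable] = []
    for label, items in by_class.items():
        for x in rng.sample(items, n_min):  # sample without replacement
            out_x.append(x)
            out_y.append(label)
    return out_x, out_y


# 100 modules, of which only 10 are defective (label 1).
modules = list(range(100))
labels = [1 if i < 10 else 0 for i in modules]
bal_x, bal_y = random_undersample(modules, labels)
print(bal_y.count(1), bal_y.count(0))  # 10 10
```

Every defective module survives because the minority class is sampled in full; only defect-free modules are discarded, which is the usual trade-off of under-sampling (less majority-class information in exchange for a balanced training set).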

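To address the difficult training, slow detection, and hard-to-obtain training data of existing deep learning methods, a new solution to the legend detection problem is proposed. An efficient convolutional neural network serves as the backbone, and, exploiting the facts that legends have a fixed aspect ratio and are independent individual objects, a new SiameseSSD detection framework performs object detection. The framework consists of a Siamese subnetwork for feature extraction and an improved SSD subnetwork for classification and regression. The model is trained with data augmentation and a special image-pairing algorithm, and high-resolution construction drawings are handled by solving the one-shot problem and adjusting the network structure and detection method. Experimental results on a construction-drawing dataset show that this legend detection method offers a new approach to one-shot learning tasks, reaching 91.3% accuracy at a detection speed of 61 frames/s. It has some advantages over other existing object detection methods and nearly meets the requirements of practical engineering work.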
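
The Siamese idea behind one-shot legend matching can be sketched as follows: the template and each candidate patch pass through the same embedding function (shared weights), and candidates whose embeddings are close enough to the template's are reported. The identity "embedding", the cosine-similarity score, and the 0.95 threshold are placeholders chosen for the example; the SSD classification and regression heads are not modeled.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def siamese_match(template: Vector, candidates: Sequence[Vector],
                  embed: Callable[[Vector], Vector], threshold: float) -> List[int]:
    """Indices of candidate patches whose embedding is close to the template's.
    Both branches call the same embed function, i.e. they share weights."""
    t = embed(template)
    return [i for i, c in enumerate(candidates) if cosine(t, embed(c)) >= threshold]


def identity_embed(patch: Vector) -> Vector:
    # Placeholder for a trained CNN backbone.
    return list(patch)


template = [1.0, 0.0, 1.0, 0.0]          # the single example of a legend symbol
patches = [[2.0, 0.0, 2.0, 0.0],         # same pattern, brighter
           [0.0, 1.0, 0.0, 1.0],         # different pattern
           [1.0, 0.0, 1.0, 0.1]]         # same pattern, slight noise
print(siamese_match(template, patches, identity_embed, threshold=0.95))  # [0, 2]
```

Because only one labeled template per legend is needed at query time, new legend types can be matched without retraining, which is the appeal of the Siamese design for one-shot tasks.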

On-line learning systems which use incoming batches of training examples to induce rules for a classification task, such as credit card fraud detection, may have to deal with concept drift, whereby some of the underlying class definitions change over time. Identifying drift against a background of noise and maintaining the accuracy of the learned rules are challenging tasks. We propose a methodology for handling these problems based on the assessment of relevance of a time-stamp attribute (TSAR). In place of the time-windowing of examples that current approaches tend to use, we employ a new purging mechanism that removes examples that are no longer valid but retains valid examples regardless of age. This allows the example base to grow, thus facilitating good classification. We describe one particular TSAR algorithm, CD3, which utilises ID3 with post pruning. We report on trials showing that CD3 copes very well with a variety of batch-drift scenarios.
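
A simplified view of time-stamp relevance and purging: tag each example as "old" or "new", measure how much the time stamp tells us about the class within a region of the instance space, and purge old examples only where that information gain is high. In CD3 the regions come from an ID3 tree with post pruning; here they are given directly as a hypothetical region key, and the `min_gain` threshold is an assumption, so this is a sketch of the idea rather than CD3 itself.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

# (region, timestamp, class): region is a hypothetical stand-in for the part of
# the instance space covered by one rule; timestamp is "old" or "new".
Example = Tuple[Hashable, str, str]


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def timestamp_gain(examples: Sequence[Example]) -> float:
    """Information gain of the timestamp attribute about the class label."""
    labels = [y for _, _, y in examples]
    gain = entropy(labels)
    for ts in ("old", "new"):
        subset = [y for _, t, y in examples if t == ts]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain


def purge(examples: Sequence[Example], min_gain: float) -> List[Example]:
    """Drop old examples only in regions where the timestamp is relevant
    (a sign of drift); elsewhere keep every example regardless of age."""
    regions: Dict[Hashable, List[Example]] = {}
    for ex in examples:
        regions.setdefault(ex[0], []).append(ex)
    kept: List[Example] = []
    for exs in regions.values():
        if timestamp_gain(exs) >= min_gain:
            kept.extend(ex for ex in exs if ex[1] == "new")
        else:
            kept.extend(exs)
    return kept


batch = ([("A", "old", "legit")] * 4 + [("A", "new", "fraud")] * 4
         + [("B", "old", "legit")] * 2 + [("B", "new", "legit")] * 2)
kept = purge(batch, min_gain=0.5)
print(len(kept))  # 8: region A's 4 old examples are purged, region B is untouched
```

Unlike a fixed time window, region B keeps its old examples because their class labels still agree with the new batch, so the example base can keep growing where the concept is stable.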

