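After analyzing the shortcomings of the traditional support vector machine (SVM) in learning from imbalanced data, an improved SVM algorithm is proposed. It uses the adaptive synthetic (ADASYN) sampling technique to partially resample the dataset and increase the number of minority-class samples; assigns different weights to different sample points to reduce the influence of noise on the training result; and trains with a cost-sensitive SVM to mitigate the shift of the hyperplane caused by imbalanced data. Six imbalanced datasets from the UCI repository were selected for testing. The experimental results show that the improved SVM algorithm outperforms the other algorithms on every dataset and achieves a good balance between minority-class accuracy and majority-class accuracy.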
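
The abstract does not say how the ADASYN step is configured, so the following is only a minimal sketch of the allocation rule from standard ADASYN: each minority sample receives synthetic samples in proportion to the share of majority-class points among its k nearest neighbors. The function name `adasyn_allocation`, the toy points, and the parameters `k` and `beta` are illustrative assumptions; `beta` below 1 would correspond to resampling only part of the gap between the classes.

```python
def adasyn_allocation(minority, majority, k=3, beta=1.0):
    """Number of synthetic samples to generate for each minority sample."""
    # Total number of synthetic samples; beta < 1 closes only part of the gap.
    total = round((len(majority) - len(minority)) * beta)
    # Minority points come first, so index i in `labeled` is minority[i].
    labeled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    ratios = []
    for i, x in enumerate(minority):
        others = [item for j, item in enumerate(labeled) if j != i]
        others.sort(key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])))
        # Share of majority-class points among the k nearest neighbors.
        ratios.append(sum(y for _, y in others[:k]) / k)
    norm = sum(ratios)
    if norm == 0:
        # Assumption: no minority sample is near the majority class, so none is generated.
        return [0] * len(minority)
    # Rounded counts may not sum exactly to `total` in general.
    return [round(r / norm * total) for r in ratios]

# Hypothetical toy data: three minority points far from the majority class
# and one minority point surrounded by majority points.
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
majority = [(5.0, 5.1), (5.1, 5.0), (4.9, 5.0), (5.0, 4.9), (8.0, 8.0),
            (8.1, 8.0), (8.0, 8.1), (9.0, 9.0), (9.1, 9.0), (9.0, 9.1)]
allocation = adasyn_allocation(minority, majority)
```

In this toy case the minority point that sits among majority points receives the largest share, which is the adaptive behavior that distinguishes ADASYN from uniform oversampling.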

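How to automatically identify, classify, and grade the sensitive attributes (fields) of code-obfuscated structured datasets in production environments has become a bottleneck in privacy protection for structured data. An algorithm for automatic identification and grading of sensitive attributes in structured datasets is proposed. It defines attribute sensitivity using information entropy and, through clustering on sensitivity and mining association rules among attributes, identifies the sensitive attributes of an arbitrary structured dataset and quantifies their sensitivity. By analyzing the mutual-information correlation and association rules among attributes within a sensitive-attribute cluster, it groups sensitive attributes and quantifies their average sensitivity, thereby achieving classification and grading of sensitive attributes. Experiments show that the algorithm can identify, classify, and grade the sensitive attributes of arbitrary structured datasets with higher efficiency and precision. Comparative analysis shows that the algorithm performs identification and grading of sensitive attributes at the same time, requires no prior knowledge of attribute features or a dictionary of sensitive features, and takes into account both the correlation and the association relationships among attributes.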
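
The abstract states that attribute sensitivity is defined through information entropy but does not give the formula. As a hedged illustration of the underlying quantity only, the sketch below computes the Shannon entropy of each column in a small table; the table, the column names, and the function name `shannon_entropy` are hypothetical, and how the paper maps entropy to a sensitivity score is not reproduced here.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical toy table: one list of values per attribute (column).
table = {
    "gender": ["F", "M", "F", "M"],
    "country": ["CN", "CN", "CN", "CN"],
    "id": ["a1", "b2", "c3", "d4"],
}
entropies = {name: shannon_entropy(col) for name, col in table.items()}
```

A constant column carries no information (entropy 0), while a column of unique identifiers reaches the maximum for its length, which is why entropy is a natural starting point for ranking attributes.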

Classification of imbalanced datasets based on improved SMOTE   Total citations: 1, self-citations: 0, citations by others: 1
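To address the shortcomings of SMOTE (Synthetic Minority Over-sampling Technique) when synthesizing new minority-class samples, an improved SMOTE algorithm (SSMOTE) is proposed. The key to the algorithm is introducing the concept of support and roulette-wheel selection into SMOTE and making full use of the distribution information of nearest neighbors from the other class, which gives fine-grained control over the quality and quantity of synthesized minority-class samples. SSMOTE is combined with the KNN (K-Nearest Neighbor) algorithm to handle classification of imbalanced datasets. Extensive comparative experiments on UCI datasets against related algorithms from other important papers show that SSMOTE performs well in the overall synthesis quality of new samples and effectively improves the classification performance of KNN on imbalanced datasets.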
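
For context, the baseline that SSMOTE modifies can be sketched as follows. This is plain SMOTE only (interpolation between a minority sample and one of its nearest minority neighbors); the support measure and roulette-wheel selection of SSMOTE are not described in enough detail in the abstract to reproduce. The function name `smote`, its parameters, and the toy points are assumptions made for illustration.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbors (plain SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of `base`, excluding `base` itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

# Hypothetical minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=5)
```

Every synthetic point lies on a segment between two existing minority samples, which is the property the abstract refers to when it discusses controlling where and how many new samples are synthesized.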

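Objective: Scene classification is an important research topic in remote sensing, but most work targets high-resolution remote sensing imagery. High-resolution imagery carries little spectral information, so its ability to discriminate scenes is limited. Hyperspectral imagery contains richer spectral information and has strong discriminative power for ground objects, but a hyperspectral dataset for scene-level image classification is still lacking. To provide data support for hyperspectral scene understanding, this paper constructs a hyperspectral remote sensing dataset for scene classification (HSRS-SC). Method: HSRS-SC is derived from airborne data of the Heihe eco-hydrological remote sensing experiment and is the largest hyperspectral scene classification dataset known to date; it was produced through calibration-coefficient correction, atmospheric correction, and other processing. HSRS-SC is divided into 5 categories with 1,385 images in total, has a high spatial resolution (1 m) and a wide wavelength range (380 to 1,050 nm), and contains rich spatial and spectral information about ground objects. Result: To provide baseline results, experiments were organized with AlexNet, VGGNet-16, and GoogLeNet under three schemes. Scheme 1 extracts scene features using only the visible bands. Schemes 2 and 3 fuse visible and near-infrared band information by summation and concatenation, respectively. The results show that making effective use of the different bands of hyperspectral imagery helps improve classification performance, with a highest classification accuracy of 93.20%. To further explore the advantages of hyperspectral scenes, full-spectrum scene classification experiments were carried out. Under both training-sample settings, hyperspectral scenes achieved higher accuracy than RGB images. Conclusion: HSRS-SC reflects detailed ground-object information and can provide good data support for semantic scene understanding. This paper uses only part of the visible and near-infrared band information, so the rich spectral information of hyperspectral scenes has not yet been fully exploited. Future work can study hyperspectral scene feature learning and classification on HSRS-SC.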

We present a model for simulating casualties in virtual environments for real-time medical training. It allows a user to choose diagnostic and therapeutic actions to carry out on a simulated casualty who will manifest appropriate physiological, behavioral, and physical responses. Currently, the user or a "stealth instructor" can specify one or more injuries that the casualty has sustained. The model responds by continuously determining the state of the casualty, responding appropriately to medical assessment and treatment procedures. So far, we have modeled four medical conditions and over 20 procedures. The model has been designed to handle the addition of other injuries and medical procedures.

The field of dataset shift has received a growing amount of interest in the last few years. The fact that most real-world applications have to cope with some form of shift makes its study highly relevant. The literature on the topic is mostly scattered, and different authors use different names to refer to the same concepts, or use the same name for different concepts. With this work, we attempt to present a unifying framework through the review and comparison of some of the most important works in the literature.

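When SVM handles the class imbalance problem, its classification results often lean toward the majority class. To address this, a staged support vector machine based on clustering weights (WSVM) is proposed that considers both between-class and within-class imbalance. In preprocessing, the K-means algorithm is used to obtain a weight for each sample in the majority class. In classification, the first stage uses the weights to select, from the boundary regions of the clusters within the majority class, as many samples as there are minority-class samples; the second stage performs an initial classification on the selected samples and the minority-class samples; the third stage uses the unselected majority-class samples to tune the initial classifier, and the final classifier is obtained when the stopping condition is met. Extensive experiments on UCI datasets show that WSVM outperforms traditional classification algorithms in both minority-class recognition rate and overall classifier performance.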

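Inspired by the cascade structure, a new method for classifying imbalanced datasets is proposed: a Bagging classification method based on a cascade structure. At each level the method removes part of the majority-class samples so that the dataset gradually becomes balanced, applies undersampling to obtain the training set, and trains a classifier with the Bagging algorithm; finally, the classifiers trained at all levels are combined into a new classifier. Experimental results on 10 UCI datasets show that the method outperforms Bagging and AdaBoost in recall and F-value.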

In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, using evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes that may reveal the presence of specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop includes a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.

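Objective: As the number of surveillance cameras grows, intelligent analysis based on multiple cameras becomes important. To this end, a new pedestrian matching method based on feature transformation and dataset partitioning is proposed. Method: First, a new algorithm is proposed that eliminates feature differences using a transformation matrix. In a high-dimensional space, the algorithm uses the transformation matrix to make one feature vector approach another, thereby eliminating differences between features of the same pedestrian. In addition, a new algorithm is proposed that partitions the features in a pedestrian dataset into blocks so that the pedestrian features in each block have similar properties and can share one transformation matrix, which better eliminates the feature differences of the same pedestrian across cameras. Result: Pedestrian matching experiments were designed on the VIPeR dataset and a complex street-scene dataset. The results show that the method has a relatively high matching accuracy; the Rank-1 matching accuracy on VIPeR (50% training, 50% testing) is 22%. Controlled experiments were also designed for the feature-transformation and dataset-partitioning modules, and the results show that both modules improve the results. Conclusion: The new pedestrian matching method uses an appropriate feature transformation to bring the features of the same pedestrian under multiple cameras closer together. The experimental results show that the method matches pedestrians across cameras well.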

Object classification in video is an important factor for improving the reliability of various automatic applications in video surveillance systems, as well as a fundamental feature for advanced applications, such as scene understanding. Despite extensive research, existing methods exhibit relatively moderate classification accuracy when tested on a large variety of real-world scenarios, or do not obey the real-time constraints of video surveillance systems. Moreover, their performance is further degraded in multi-class classification problems. We explore multi-class object classification for real-time video surveillance systems and propose an approach for classifying objects in both low and high resolution images (human height varies from a few to tens of pixels) in varied real-world scenarios. Firstly, we present several features that jointly leverage the distinction between various classes. Secondly, we provide a feature-selection procedure based on entropy gain, which screens out superfluous features. Experiments, using various classification techniques, were performed on a large and varied database consisting of ∼29,000 object instances extracted from 140 different real-world indoor and outdoor, near-field and far-field scenes having various camera viewpoints, which capture a large variety of object appearances under real-world environmental conditions. The insight raised from the experiments is threefold: the efficiency of our feature set in discriminating between classes, the performance improvement when using the feature selection method, and the high classification accuracy obtained on our real-time system on both DSP (TMS320C6415-6E3, 600 MHz) and PC (Quad Core Intel® Xeon® E5310, 2 × 4 MB Cache, 1.60 GHz, 1066 MHz) platforms.
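
The feature-selection step based on entropy gain can be illustrated with a small sketch. It assumes discrete feature values and an arbitrary selection threshold, neither of which is specified in the abstract; the feature names, labels, and the helper `information_gain` are hypothetical and do not reflect the authors' actual feature set.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """H(labels) minus the expected entropy of labels given the feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data with two binary features.
labels = ["person", "person", "car", "car"]
features = {
    "tall_shape": [1, 1, 0, 0],  # separates the classes perfectly
    "bright": [1, 0, 1, 0],      # carries no class information
}
gains = {name: information_gain(col, labels) for name, col in features.items()}
selected = [name for name, g in gains.items() if g > 0.1]  # threshold is arbitrary
```

Features whose gain is near zero tell the classifier nothing about the class and are the "superfluous" ones such a procedure would screen out.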

Touchless interaction has received considerable attention in recent years because it removes the barrier of physical contact. Several approaches are available for mid-air interaction. However, most of these techniques cause discomfort when the interaction method is not direct manipulation. In this paper, gestures based on unimanual and bimanual interaction with different tools for exploring CT volume datasets are designed to perform tasks similar to those in realistic applications. A focus + context approach based on GPU volume ray casting with a trapezoid-shaped transfer function is used for visualization, and a level-of-detail technique is adopted to accelerate interactive rendering. Experiments comparing the effectiveness and intuitiveness of the proposed interaction approach with others show that it performs better, with shorter completion times. Moreover, bimanual interaction saves time when performing continuous exploration tasks.

Computational Visual Media - In this paper, we introduce an image dataset for fine-grained classification of dog breeds: the Tsinghua Dogs Dataset. It is currently the largest dataset for...

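To address the unsatisfactory classification performance of the traditional SVM algorithm on imbalanced data, an oversampling method based on the classification hyperplane and SMOTE (HB_SMOTE) is proposed. The method first applies the WSVM algorithm to the original training set to find the classification hyperplane, and then, according to certain criteria, removes negative-class samples that are misclassified, that lie close to the hyperplane, or that lie far from the hyperplane. Experimental results on UCI datasets show that, compared with resampling methods such as RU_SMOTE, HB_SMOTE achieves high classification accuracy on both positive-class and negative-class samples.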

A method for recursive training of neural networks for classification is proposed. It searches for the discriminant functions corresponding to several small local minima of the error function. The novelty of the proposed method lies in the transformation of the data into new training data with a deflated minimum of the error function and iteration to obtain the next solution. A simulation study and a character recognition application indicate that the proposed method has the potential to escape from local minima and to direct the local optimizer to new solutions.

Surface defect recognition is important for improving the surface quality of end products. Many methods in this area are based on convolutional neural networks (CNNs) because a CNN can extract features automatically. The extracted features determine recognition performance, so it is important for CNN-based methods to extract effective and sufficient features. However, feature extraction needs a large-scale dataset, which is hard to obtain. To save the cost of collecting samples and still extract effective features, ensemble methods were proposed that make full use of the features extracted by the CNN in order to guarantee good performance with limited samples. However, these methods are confined to a single sample: they extract multi-level features from one individual sample but ignore the vast information in the dataset. Because the information in one sample is limited, this paper turns its attention to the training dataset and attempts to mine the multi-level information in the dataset for prediction. The proposed method, named Prototype vectors fusion-based CNN (ProtoCNN), utilizes the prototype information in the training dataset. During training, it trains a VGG11 as the base model while generating prototype vectors for each defect class in multiple feature layers of VGG11. During prediction, the prototype vectors are fused to predict unknown samples. Experiments on three well-known datasets (NEU-CLS, a wood dataset, and a textile dataset) indicate that ProtoCNN outperforms conventional ensemble models and other models for surface defect recognition. On these datasets, ProtoCNN achieves accuracies of 99.86%, 90.01%, and 81.28% respectively, improvements of 1.05%, 4.07%, and 19.53% over its base model. Finally, this paper analyzes the effectiveness and practicality of prototype vectors, showing that ProtoCNN is practical for real-world application.
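
The abstract does not say how the prototype vectors are computed or fused across VGG11 layers. As a rough, single-level illustration of the general idea, the sketch below assumes a prototype is the mean feature vector of a class and that prediction picks the nearest prototype; both choices, the function names, and the toy feature vectors are assumptions, not the authors' method.

```python
import math
from collections import defaultdict

def class_prototypes(features, labels):
    """Mean feature vector per class (one assumed definition of a prototype)."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(prototypes, vec):
    """Label of the prototype closest to `vec` in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], vec))

# Hypothetical 2-D features standing in for the output of one CNN layer.
features = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["scratch", "scratch", "pit", "pit"]
prototypes = class_prototypes(features, labels)
prediction = predict(prototypes, [1.0, 1.0])
```

A multi-level variant would repeat this per feature layer and combine the per-layer distances or votes, but the combination rule would have to come from the paper itself.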

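The large-scale training problem of support vector machines is studied in depth, and an SMO-like block incremental algorithm is proposed. The algorithm learns each input data block in turn using two procedures, increase and decrease, which avoids the sharply growing computational cost of traditional SVM learning algorithms on large-scale datasets. Theoretical analysis shows that the new algorithm converges to an approximately optimal solution. Experimental results on the KDD dataset show that the algorithm achieves a nearly linear training rate, and that its generalization performance and number of support vectors are close to those of LIBSVM.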

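By jointly solving multiple concept-drift problems and fully exploiting the useful information contained in related concept-drift problems, shared vector chain supported vector machines (SVC-SVM) perform well on multi-task concept-drift classification. However, concept-drift problems in practice usually involve large volumes of data, and the high computational cost limits the applicability of SVC-SVM. To address this weakness, and drawing on the near-linear time complexity of core vector machines, shared vector chain core vector machines (SVC-CVM) are proposed for large-scale multi-task concept-drift data. SVC-CVM has asymptotically linear time complexity while inheriting the good performance that SVC-SVM obtains by jointly solving multiple concept-drift problems. Experiments verify the effectiveness and speed of the method on large-scale multi-task concept-drift datasets.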

Multimedia Tools and Applications - The new challenge in image processing is in processing submarine coral reef images. The coral reef disease classification from such submarine coral reef images...

Minimum classification error training for online handwriting recognition   Total citations: 1, self-citations: 0, citations by others: 1
This paper describes an application of the minimum classification error (MCE) criterion to the problem of recognizing online unconstrained-style characters and words. We describe an HMM-based, character and word-level MCE training aimed at minimizing the character or word error rate while enabling flexibility in writing style through the use of multiple allographs per character. Experiments on a writer-independent character recognition task covering alpha-numerical characters and keyboard symbols show that the MCE criterion achieves more than 30 percent character error rate reduction compared to the baseline maximum likelihood-based system. Word recognition results, on vocabularies of 5k to 10k, show that MCE training achieves around 17 percent word error rate reduction when compared to the baseline maximum likelihood system.
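
To make the MCE criterion concrete, here is a minimal sketch of the commonly used smoothed loss for a single sample: a misclassification measure compares the correct class score with a soft maximum over competing scores, and a sigmoid turns it into a differentiable approximation of the 0-1 error. The abstract does not give the exact form used with the HMM scores, so the function name `mce_loss`, the smoothing parameters `eta` and `gamma`, and the toy scores are assumptions.

```python
import math

def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Smoothed 0-1 loss of the MCE criterion for one sample.

    scores  -- discriminant scores per class (e.g. log-likelihoods)
    correct -- index of the true class
    """
    competitors = [s for j, s in enumerate(scores) if j != correct]
    # Soft maximum over the competing class scores.
    anti = math.log(sum(math.exp(eta * s) for s in competitors) / len(competitors)) / eta
    d = anti - scores[correct]  # positive when competitors outscore the true class
    return 1.0 / (1.0 + math.exp(-gamma * d))

# Hypothetical scores for a three-class problem where class 0 is correct.
loss_when_right = mce_loss([5.0, 1.0, 0.0], correct=0)
loss_when_wrong = mce_loss([0.0, 5.0, 1.0], correct=0)
```

Because the loss is differentiable in the scores, its gradient can be pushed back into the model parameters, which is what lets training target the error rate directly rather than the likelihood.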

