Similar literature
 18 similar documents found
1.
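To improve classification accuracy on imbalanced datasets, a resampling algorithm based on the spatial nearest-neighbor relationships of samples is proposed. The method first assigns each minority-class sample a safety level according to its spatial nearest-neighbor relationships and uses that level to guide oversampling with the synthetic minority oversampling technique (SMOTE). It then computes the local density of the majority-class samples from their spatial nearest-neighbor relationships and undersamples the regions where majority samples are dense. Together, these two steps balance the dataset under test and keep the data size under control to prevent overfitting, so that the two classes are classified in a balanced way. Training and test sets are generated by ten-fold cross-validation; after the training set is resampled, a kernel extreme learning machine is trained as the classifier and validated on the test set. Experiments on imbalanced UCI datasets and on measured circuit fault diagnosis data show that the proposed method outperforms other resampling algorithms overall.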
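As a rough illustration of the oversampling step in entry 1, the sketch below shows plain SMOTE interpolation only: a synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbors. The safety-level guidance and the density-based undersampling described in the abstract are not reproduced here, and the function name `smote_sample` and its parameters are hypothetical.

```python
import math
import random


def smote_sample(minority, index, k=5, rng=random):
    """Synthesize one point between minority[index] and one of its k nearest minority neighbors."""
    base = minority[index]
    # Rank the other minority points by Euclidean distance to the base point.
    others = [p for i, p in enumerate(minority) if i != index]
    others.sort(key=lambda p: math.dist(base, p))
    neighbor = rng.choice(others[:k])
    # Interpolate at a random position on the segment base -> neighbor.
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]


# Toy minority class (illustrative values, not data from the paper).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0]]
synthetic = smote_sample(minority, index=0, k=2, rng=random.Random(0))
```

With `k=2`, the distant point `[5.0, 5.0]` is never chosen as a neighbor, so the synthetic sample stays inside the local cluster around the base point.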

2.
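Yang Jianhong, Yang Debin, Xu Ke, Xu Jinwu. 《钢铁》 (Iron and Steel), 2005, 40(12): 37-40
All pixels of a cold-rolled strip surface defect image are treated as a feature vector in a high-dimensional space, which is reduced with a supervised nonlinear dimensionality reduction method before the defects are classified. The method resolves the difficulties of feature extraction and feature selection in the automatic classification of cold-rolled strip surface defects, avoids an excessively high classifier feature dimension, and can be used for online recognition and clustering of dynamic data. The dimensionality reduction method was combined with a K-nearest-neighbor classifier and a support vector machine and tested on a defect sample dataset collected on site; the results show that the performance of both classifiers improved substantially after the dimensionality reduction preprocessing.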

3.
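Intrusion detection can essentially be described as classifying data samples as correctly as possible; the key issues are feature selection and the choice of pattern recognition method. Data samples are classified with a combination of SVM classifiers and, together with protocol analysis, an intrusion detection system model based on protocol analysis and multi-class SVM is proposed. The model was tested on the KDD CUP 99 dataset. The test results show that the proposed method effectively improves intrusion detection efficiency and reduces the false negative and false positive rates.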

4.
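To address dataset imbalance in diagnosing internal pump leakage in hydraulic systems, a two-stage method is proposed. A variational encoder first synthesizes minority-class samples until the minority fault samples match the normal samples in number. A fault classification model is then trained with focal loss, which strengthens the classifier's ability to diagnose hard-to-classify samples. Ablation experiments verify that the proposed method handles imbalanced datasets effectively.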
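The second stage of entry 4 relies on focal loss, which scales the cross-entropy of each sample by $(1 - p_t)^{\gamma}$ so that well-classified samples contribute little. The sketch below is the commonly used binary form, not necessarily the exact variant in the paper; the function name and the default `gamma=2.0` are assumptions.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=None):
    """Binary focal loss for predicted probability p of class 1 and label y in {0, 1}."""
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Optional class-balancing weight: alpha for class 1, 1 - alpha for class 0.
    weight = 1.0 if alpha is None else (alpha if y == 1 else 1.0 - alpha)
    # (1 - p_t) ** gamma shrinks the loss of easy samples (p_t close to 1).
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct: strongly down-weighted
hard = focal_loss(0.3, 1)   # poorly classified: keeps most of its loss
```

Setting `gamma=0` recovers ordinary cross-entropy, which makes the effect of the modulating factor easy to check.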

5.
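A digital color image matting algorithm based on the Mahalanobis distance and fuzzy C-means clustering is proposed. The algorithm first regularizes the red, green, and blue components of the color image pixels. It then selects a suitable mask in the background of the regularized image as the sample set and computes the Mahalanobis distance between each pixel and the sample set. Next, it classifies the computed Mahalanobis distances with fuzzy C-means clustering, and finally it applies hole filling to improve matting quality. In comparative experiments on eight color digital images, the algorithm performs matting automatically and gives better results than the Mahalanobis distance algorithm, the Grow-Cut algorithm, and the regularized linear regression algorithm.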

6.
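Identifying rock and mineral thin sections is a highly specialized task; manual identification is prone to unavoidable subjective errors and is very inefficient. Deep learning image recognition can identify rock and mineral thin sections efficiently, but training deep learning models requires large amounts of labeled data, so using limited labeled data efficiently is important. With a multi-label classification approach, a classifier is first trained on the labeled dataset, then used to generate pseudo-labels for a large number of unlabeled thin sections, and finally the model is retrained on the labeled training data together with all the unlabeled data. The results show that multi-label classification is feasible for identifying the structure and minerals of rock and mineral thin sections, and that training the model with semi-supervised learning improves its generalization ability without extensive manual labeling.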

7.
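To improve on the single support vector machine (SVM) furnace temperature prediction model used in intelligent control expert systems for blast furnace ironmaking, a multiple-SVM model based on fuzzy C-means (FCM) clustering is proposed. FCM is first used to partition the model's training set into clusters, and an SVM is then trained on each cluster to build a corresponding submodel. Each submodel predicts the same test sample, and the predictions are combined in a weighted sum whose weights are the membership degrees of the test sample's input in each cluster, giving the final prediction. Analysis of data collected online shows that the FCM-based multiple-SVM model improves on the single SVM model in several aspects of prediction performance, with a hit rate of 86% over 100 consecutive heats.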
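The combination step in entry 7 can be sketched as follows, assuming the standard FCM membership formula $u_i = 1 / \sum_j (d_i / d_j)^{2/(m-1)}$, where $d_i$ is the distance from the sample to cluster center $i$. The cluster centers and the two constant "submodels" are hypothetical stand-ins for the trained SVMs, which the abstract does not specify.

```python
import math


def fcm_memberships(x, centers, m=2.0):
    """Standard FCM membership of point x in each cluster center (fuzzifier m > 1)."""
    dists = [math.dist(x, c) for c in centers]
    # A point that coincides with a center belongs fully to that cluster.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


def weighted_prediction(x, centers, submodels, m=2.0):
    """Combine submodel outputs using the memberships of x as weights."""
    weights = fcm_memberships(x, centers, m)
    return sum(w * model(x) for w, model in zip(weights, submodels))


# Hypothetical setup: two cluster centers and two stand-in submodels.
centers = [[0.0, 0.0], [4.0, 0.0]]
submodels = [lambda x: 1.0, lambda x: 3.0]
memberships = fcm_memberships([1.0, 0.0], centers)
prediction = weighted_prediction([1.0, 0.0], centers, submodels)
```

Because the memberships sum to one, the final prediction is a convex combination of the submodel outputs, dominated by the submodel whose cluster is closest to the sample.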

8.
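A relatively simple and effective method for partitioning the community structure of complex networks is proposed. The algorithm partitions the community structure using the transition matrix P of the complex network together with K-means clustering, and uses a statistic to select the optimal clustering result. It achieves high accuracy in detecting communities in artificial networks with pronounced community structure.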
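The abstract of entry 8 does not define the transition matrix P. A common choice, assumed here, is the random-walk transition matrix obtained by dividing each row of the adjacency matrix by the node's degree; the function name is hypothetical.

```python
def transition_matrix(adjacency):
    """Row-normalize an adjacency matrix: P[i][j] = A[i][j] / degree(i)."""
    matrix = []
    for row in adjacency:
        degree = sum(row)
        # Isolated nodes have no outgoing transitions; keep an all-zero row.
        matrix.append([a / degree if degree else 0.0 for a in row])
    return matrix


# A three-node path graph 0 - 1 - 2 (illustrative only).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
P = transition_matrix(A)
```

Under this assumption, each row of `P` is a probability distribution over a node's neighbors, and the rows could serve as feature vectors for a K-means step.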

9.
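To address the "uniform effect" that arises when the classical K-means algorithm clusters imbalanced data, a clustering algorithm for imbalanced data based on nearest neighbor (CABON) is proposed. CABON first performs an initial clustering of the data objects and defines an undetermined-class set to identify the data objects in the initial result whose cluster membership needs further verification. It then provides a dynamic adjustment mechanism for the undetermined-class set: using the nearest-neighbor idea, it reassigns the objects in this set one by one, from the edge of the set toward its center, to the cluster of their nearest neighbor. This yields the final clustering result and avoids the influence of the "uniform effect". The algorithm is compared experimentally with K-means, the imbalanced K-means clustering method with multiple centers (MC_IK), and coefficient of variation clustering for non-uniform data (CVCN) on synthetic and real datasets. The results show that CABON effectively reduces the "uniform effect" produced by K-means on imbalanced data and clusters markedly better than K-means, MC_IK, and CVCN.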

10.
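Grouping discontinuities is the basis of stability analysis in rock mass engineering. To this end, a spectral clustering algorithm is used to partition discontinuities into dominant sets according to their orientation data. Compared with the widely used K-means clustering, the algorithm can converge to the global optimum. The squared sine of the acute angle between discontinuity normal vectors is chosen as the similarity measure between discontinuities, and spectral clustering is applied to solve the optimization; the Silhouette index is introduced to evaluate clustering validity and determine the optimal number of groups. Results on artificially generated discontinuity data verify the reliability of the method. Finally, the algorithm is applied to partition the dominant sets of rock mass discontinuities at the Sanshandao gold mine, giving a satisfactory grouping and a reliable data basis for further rock mass stability analysis.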
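The measure named in entry 10 can be computed directly from the normals, since $\sin^2\theta = 1 - \cos^2\theta = 1 - (\mathbf{n}_1 \cdot \mathbf{n}_2)^2$ for unit vectors. The sketch below normalizes its inputs first; the function name is hypothetical, and how the paper turns this value into a spectral clustering affinity is not stated in the abstract.

```python
import math


def sin2_angle(n1, n2):
    """Squared sine of the angle between two normals: 0 if parallel, 1 if perpendicular."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm1 = math.sqrt(sum(a * a for a in n1))
    norm2 = math.sqrt(sum(b * b for b in n2))
    cos = dot / (norm1 * norm2)
    # sin^2 = 1 - cos^2, so reversing a normal's direction leaves the value unchanged.
    return max(0.0, 1.0 - cos * cos)


same_plane = sin2_angle([0, 0, 1], [0, 0, -1])   # opposite normals, same plane
orthogonal = sin2_angle([0, 0, 1], [1, 0, 0])    # perpendicular planes
```

Squaring the cosine makes the measure insensitive to the sign of a normal, which matters because a plane's upward and downward normals describe the same discontinuity.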

11.
Machine learning techniques can be used to extract knowledge from data stored in medical databases. In our application, various machine learning algorithms were used to extract diagnostic knowledge which may be used to support the diagnosis of sport injuries. The applied methods include variants of the Assistant algorithm for top-down induction of decision trees, and variants of the Bayesian classifier. The available dataset was insufficient for reliable diagnosis of all sport injuries considered by the system. Consequently, expert-defined diagnostic rules were added and used as pre-classifiers or as generators of additional training instances for diagnoses for which only few training examples were available. Experimental results show that the classification accuracy and the explanation capability of the naive Bayesian classifier with the fuzzy discretization of numerical attributes were superior to other methods and estimated as the most appropriate for practical use.

12.
This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson, 2002). Focus is given to the multivariate normal distribution, and 9 separate decompositions (i.e., class structures) of the covariance matrix are investigated. To provide a link to the current literature, comparisons are made with K-means clustering in 3 detailed Monte Carlo studies. The findings have implications for applied researchers in that mixture-model clustering techniques performed best when the covariance structure and number of clusters were known. However, as the information about the shape and number of clusters became unknown, degraded performance was observed for both K-means clustering and mixture-model clustering. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

13.
For strip steel surface defect samples, a multi-class classification method is proposed based on enhanced least squares twin support vector machines (ELS-TWSVMs) and a binary tree. First, a pruning-region sample-center method with an adjustable pruning scale is used to prune the data samples, which reduces the classifier's training and testing time. Second, ELS-TWSVM is proposed to classify the data samples. By introducing an error variable contribution parameter and a weight parameter, ELS-TWSVM restrains the impact of noise samples and achieves better classification accuracy. Finally, multi-class classification algorithms are proposed by combining ELS-TWSVM with a complete binary tree. Experiments were conducted on two-dimensional datasets and strip steel surface defect datasets. They show that the multi-class ELS-TWSVM methods achieve higher classification speed and accuracy on large-scale, unbalanced datasets containing noise samples.

14.
A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the true cluster structure. The author presents a heuristic procedure that selects an appropriate subset from among the set of all candidate clustering variables. Specifically, this procedure attempts to select only those variables that contribute to the definition of true cluster structure while eliminating variables that can hide (or mask) that true structure. Experimental testing of the proposed variable-selection procedure reveals that it is extremely successful at accomplishing this goal. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

15.
STUDY DESIGN: Data were collected from 183 subjects who were randomly assigned to the training and test groups. During testing of the classification system, knowledge of the low back pain condition or motion characteristics of the patients in the test group was not made available to the system. OBJECTIVES: To determine specific characteristics of trunk motion associated with different categories of spinal disorders and to determine whether a neural network analysis system can be effective in distinguishing patterns. SUMMARY OF BACKGROUND DATA: Numerous studies have established the difficulty of evaluating lower back pain. Imaging techniques are expensive and ineffective in many cases. A technique for evaluation of lower back pain was developed on the basis of analysis of such dynamic motion features as shape, velocity, and symmetry of movements, using a neural network classification system. METHODS: Dynamic motion data were collected from 183 subjects using a triaxial goniometer. Features of the movement were extracted and provided as input to a two-stage neural network classifier governed by a radial basis function architecture. After training, the output of the classifier was compared with Québec Task Force pain classifications obtained for the patients. Linear and nonlinear classification techniques were compared. RESULTS: The system could determine low back pain classification from motion characteristics. The neural network classifier produced the best results with up to 85% accuracy on novel "validation" data. CONCLUSIONS: A neural network based on kinematic data is an excellent predictive model for classification of lower back pain. Such a system could markedly improve the management of lower back pain in the individual patient.

16.
In many countries, the most widely used method for timing plan selection and implementation is the time-of-day (TOD) method. In TOD mode, a few traffic patterns that exist in the historical volume data are recognized and used to find the signal timing plans needed to achieve optimum performance of the intersections during the day. Traffic engineers usually determine TOD breakpoints by analyzing 1 or 2 days' worth of traffic data and relying on their engineering judgment. The current statistical methods, such as hierarchical and K-means clustering methods, determine TOD breakpoints but introduce a large number of transitions. This paper proposes adopting the Z-score of the traffic flow and time variable in the K-means clustering to reduce the number of transitions. The optimum number of breakpoints is chosen based on a microscopic simulation model considering a set of performance measures. By using simulation and the K-means algorithm, it was found that five clusters are the optimum for a major arterial in Al-Khobar, Saudi Arabia. As an alternative to the simulation-based approach, a subtractive algorithm-based K-means technique is introduced to determine the optimum number of TODs. Through simulation, it was found that both approaches result in almost the same values of measure of effectiveness (MOE). The two proposed approaches seem promising for similar studies in other regions, and both of them can be extended to different types of roads. The paper also suggests a procedure for considering the cyclic nature of the daily traffic in the clustering effort.
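The preprocessing idea in entry 16 can be illustrated with a small sketch: time and traffic volume are each converted to Z-scores, $z = (x - \mu) / \sigma$, so that both variables enter K-means on a comparable scale. The hourly volumes below are invented for illustration, and using the population standard deviation is an assumption; the abstract does not give these details.

```python
import statistics


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical hourly volumes (vehicles/hour); not data from the paper.
hours = [6, 7, 8, 9, 10, 11]
volumes = [200, 650, 900, 700, 450, 400]

# Each observation becomes a (z_time, z_volume) pair for the clustering step.
features = list(zip(z_scores(hours), z_scores(volumes)))
```

Including the standardized time variable keeps observations that are close in time close in feature space, which is one plausible reason clusters would form contiguous periods with fewer transitions.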

17.
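Wu Sen, Feng Xiaodong, Yang Jie, Zhang Xiaonan. 《工程科学学报》 (Chinese Journal of Engineering), 2014, 36(10): 1411-1419
Building fast and effective clustering methods for large-scale text data is a current focus of data mining research and applications. To preserve clustering quality while improving clustering efficiency, a text clustering algorithm based on searching for "mutually minimum-similarity text pairs" and a distributed parallel computing model are proposed. First, a text similarity measure is proposed based on the vector space model. Second, the bisecting cluster centers are selected by searching for "mutually minimum-similarity text pairs", and a bisecting K-means clustering algorithm is proposed that optimizes the cluster centroids with a single partition. Finally, a parallel clustering model for large-scale text, aimed at cloud computing applications, is designed on the MapReduce framework. Experiments with real text data on the Hadoop platform show that, compared with the original bisecting K-means, the proposed algorithm achieves comparable clustering quality with a clear efficiency advantage, and that the parallel clustering model scales well across data sizes and numbers of computing nodes.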

18.
A novel method of Bayesian learning with automatic relevance determination prior is presented that provides a powerful approach to problems of classification based on data features, for example, classifying soil liquefaction potential based on soil and seismic shaking parameters, automatically classifying the damage states of a structure after severe loading based on features of its dynamic response, and real-time classification of earthquakes based on seismic signals. After introduction of the theory, the method is illustrated by applying it to an earthquake record dataset from nine earthquakes to build an efficient real-time algorithm for near-source versus far-source classification of incoming seismic ground motion signals. This classification is needed in the development of early warning systems for large earthquakes. It is shown that the proposed methodology is promising since it provides a classifier with higher correct classification rates and better generalization performance than a previous Bayesian learning method with a fixed prior distribution that was applied to the same classification problem.
