Similar literature
 Found 20 similar documents; search took 453 ms
1.
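Feature selection is an important data preprocessing technique in machine learning and data mining; it aims to maximize classification accuracy while minimizing the number of features in the selected subset. When particle swarm optimization is used to search for the optimal subset in high-dimensional datasets, it tends to fall into local optima and is computationally expensive, which lowers classification accuracy. To address this problem, a feature selection algorithm for high-dimensional data based on multifactorial particle swarm optimization is proposed. An evolutionary multitasking framework is introduced, together with a strategy for generating a two-task model; knowledge transfer between tasks strengthens communication within the population and increases population diversity, which mitigates the tendency to fall into local optima. An initialization strategy based on sparse representation is also designed: it constructs sparse initial solutions at the start of the algorithm, reducing the computational overhead as the population moves toward the optimal solution set. Experimental results on 6 public high-dimensional medical datasets show that the proposed algorithm performs the classification task effectively and achieves good accuracy.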

2.
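Intrusion detection algorithms are widely used in network security. However, existing machine-learning-based intrusion detection algorithms output only the predicted label and lack a mechanism for evaluating the confidence of a prediction, so the reliability of the result is hard to guarantee. A highly reliable intrusion detection algorithm based on conformal prediction is proposed. Conformal prediction is integrated into traditional machine learning algorithms to obtain classification labels together with the corresponding confidence and credibility values, improving the reliability of network data classification. Network data are preprocessed by numericalization, standardization and dimensionality reduction; nonconformity score formulas are designed within the conformal prediction framework to match the characteristics of each traditional machine learning algorithm; and a smoothing factor is introduced into the p-value formula so that p-values are computed in a smoother way, improving the stability of the algorithm. Experimental results show that, compared with using SVM, DT and DT-SVM alone, the algorithm improves classification accuracy by 11.1, 4.6 and 3.7 percentage points respectively on the KDD CUP99 dataset, and by 4.0, 2.5 and 1.3 percentage points respectively on the AWID dataset, ensuring highly reliable intrusion detection results.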
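The abstract does not give its modified p-value formula, so the following is only a minimal sketch of the standard smoothed conformal p-value, which may differ from the paper's version. The function name `smoothed_p_value` and the parameter `tau` are illustrative assumptions; higher nonconformity scores are assumed to mean "stranger" examples.

```python
import random


def smoothed_p_value(calibration_scores, new_score, tau=None, rng=random):
    """Standard smoothed conformal p-value (an assumption, not the paper's formula).

    calibration_scores: nonconformity scores of previously seen examples.
    new_score: nonconformity score of the example under test.
    tau: tie-breaking weight in [0, 1]; drawn uniformly at random if omitted.
    """
    if tau is None:
        tau = rng.random()
    # Examples that look strictly stranger than the new one.
    greater = sum(1 for s in calibration_scores if s > new_score)
    # Ties are weighted by tau; the new example always ties with itself (+1).
    equal = sum(1 for s in calibration_scores if s == new_score) + 1
    return (greater + tau * equal) / (len(calibration_scores) + 1)
```

With `tau=1.0` this reduces to the usual non-smoothed p-value; a candidate label would typically be kept when its p-value exceeds the chosen significance level.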

3.
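喻飞, 赵志勇, 魏波. Computer Science (《计算机科学》), 2016, 43(9): 269-273
The Factorization Machine (FM) algorithm is a machine learning algorithm based on matrix factorization that can be used for regression, classification and ranking problems. The parameters of the FM model are usually solved with gradient-based optimization; however, when few samples are available, this optimization converges slowly and easily falls into local optima. Differential Evolution (DE) is a heuristic global optimization algorithm with properties such as fast convergence. To speed up the training of the FM model, DE is used to compute the FM model parameters, giving the DE-FM algorithm. Experimental results on the Diabetes and HorseColic datasets and on the music classification dataset Music show that the DE-based factorization machine algorithm DE-FM improves both training speed and accuracy.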
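To make the model concrete, here is a minimal sketch of the second-order FM prediction function, using the well-known reformulation of the pairwise term that avoids an explicit double loop over feature pairs. The names `fm_predict`, `w0`, `w` and `V` are illustrative assumptions, and the DE-based parameter search described in the abstract is not shown.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction for one sample.

    x:  list of n feature values.
    w0: global bias.
    w:  list of n linear weights.
    V:  n x k list of lists; row i is the latent vector of feature i.
    """
    n, k = len(x), len(V[0])
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        # sum_{i<j} v_if v_jf x_i x_j = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise
```

A population-based optimizer such as DE would treat the concatenation of `w0`, `w` and `V` as one candidate vector and use the training loss of `fm_predict` as its fitness.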

4.
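The regularized extreme learning machine (RELM) has been applied successfully in many fields thanks to its ease of implementation and fast training. This paper introduces RELM into intrusion detection, designs a beetle swarm optimization (BSO) algorithm, and, to address the potential defects caused by the random initialization of RELM parameters, proposes a network intrusion detection algorithm based on beetle swarm optimization and an improved regularized extreme learning machine (BSO-IRELM). LU decomposition is used to solve for the output weight matrix of RELM, further shortening its training time, while BSO jointly optimizes the weights and thresholds of RELM. To keep BSO from falling into local optima, Tent-map opposition-based learning, Lévy-flight group learning and a dynamic mutation strategy are introduced to improve optimization performance. Experimental results show that, on UCI machine learning datasets, BSO-IRELM improves classification performance markedly over RELM, IRELM, GA-IRELM, PSO-IRELM and other algorithms. Finally, BSO-IRELM is applied to the network intrusion detection dataset NSL-KDD and compared with BP (back propagation), LR (logistic regression), RBF (radial basis function), AB (AdaBoost), SVM (support vector machine), RELM and IRELM; the results show that BSO-IRELM has a clear advantage in accuracy, precision, true positive rate and false positive rate.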
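For orientation, the regularized least-squares problem that RELM solves for its output weights is commonly written as below. The symbols are standard notation assumed here rather than taken from the abstract: H is the hidden-layer output matrix, T the target matrix, β the output weight matrix and C the regularization coefficient. The last line sketches how an LU factorization could replace an explicit matrix inverse; the paper's exact procedure may differ.

```latex
\min_{\beta}\ \tfrac{1}{2}\lVert \beta \rVert^{2} + \tfrac{C}{2}\lVert H\beta - T \rVert^{2}
\quad\Longrightarrow\quad
\Bigl(H^{\top}H + \tfrac{1}{C} I\Bigr)\beta = H^{\top}T

% Solve without inverting: factor A = H^{\top}H + \tfrac{1}{C} I as A = LU, then
L\,Y = H^{\top}T, \qquad U\,\beta = Y
```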

5.
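Considering the curve singularities of face images and the computational complexity caused by high-dimensional image data, a face recognition method combining the Curvelet transform with LPP is proposed. Face images are first reduced in dimension with the Curvelet transform, then projected into the optimal subspace with LPP, and finally classified with a support vector machine. Experimental results show that the method achieves better recognition than both the wavelet transform combined with LPP and LPP alone.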

6.
A support vector machine (SVM) is a mathematical tool based on the structural risk minimization principle. It seeks a hyperplane in a high-dimensional feature space to solve problems that are not linearly separable. SVM has been applied within the remote sensing community to multispectral and hyperspectral imagery analysis. However, the standard SVM has some technical disadvantages: for instance, the solution of an SVM learning problem is scale sensitive, and training is time-consuming. A novel Potential SVM (P-SVM) algorithm has been proposed to overcome these shortcomings and has shown some improvements. In this letter, the P-SVM algorithm is introduced into the classification of multispectral and high-spatial-resolution remotely sensed data, and it is applied to ASTER imagery and ADS40 imagery respectively. Experimental results indicate that P-SVM is competitive with the standard SVM algorithm in classification accuracy on remotely sensed data, and that it needs less time.

7.
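Locality preserving projection (LPP) is a relatively new dimensionality reduction technique, but it is an unsupervised learning algorithm and does not perform very well on classification problems. Based on adaptive nearest neighbors and combined with LPP, a supervised locality preserving projection algorithm (ANNLPP) is proposed. By modifying the weight matrix in the LPP algorithm, the method adds class information while reducing dimensionality, making it a supervised learning algorithm. Two-dimensional data visualization and face recognition experiments on UMIST and ORL show that the method gives good dimensionality reduction for classification problems.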

8.
In classification problems, active learning techniques are often adopted to find the most informative samples for labeling in order to reduce human labeling effort. Among them, the active learning support vector machine (SVM) is one of the most representative approaches, in which the model parameter is usually fixed at a default value during the whole learning process. However, the appropriate model parameter is closely related to the training set, so a dynamically chosen parameter is desirable for satisfactory learning performance. To address this issue, we propose a novel algorithm, called active learning SVM with regularization path, which can fit the entire solution path of the SVM for every value of the model parameter. In this algorithm, we first trace the entire solution path of the current classifier to find a series of candidate model parameters, and then use unlabeled samples to select the best model parameter. In addition, in the initial phase of training, we construct the training sample set with an improved K-medoids clustering algorithm. Experimental results on real-world data sets show the effectiveness of the proposed algorithm for image classification problems.

9.
In this paper, we propose a new tensor-based representation algorithm for image classification. The algorithm is realized by learning a parameter tensor for image tensors. One novelty is that the parameter tensor is learned according to the Tucker tensor decomposition, as the multiplication of a core tensor with a group of matrices, one for each order, which allows the algorithm to preserve the spatial information of the image. We further extend the proposed tensor algorithm to a semi-supervised framework in order to utilize both labeled and unlabeled images. The objective function can be solved by alternating optimization, where at each iteration we solve a typical ridge regression problem to obtain the closed-form solution of the parameter along the corresponding order. Experimental results on gray and color image datasets show that our method outperforms several classification approaches. In particular, we find that our method achieves high-quality classification performance when only a few labeled training samples are provided.

10.
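Similarity measures based on local or global information cannot fully extract network structure information, and methods based on network representation learning cannot measure the non-existence of links. To address both problems, the Net2Vec-CLP framework is proposed, which combines a node vectorization method with a machine learning classification algorithm. A random walk with restart is used to obtain node context sequences, and the source network information is converted into a vector representation; on this basis a labeled dataset is generated, and an SVM model with a sigmoid kernel mapping performs binary classification. Experimental results show that the algorithm improves AUC by 2.47% over Node2Vec on the Facebook dataset and also has an observable advantage on other datasets. In addition, because it incorporates binary classification, the method can explicitly measure data for which no link exists.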
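As a rough illustration of how node context sequences might be generated, the sketch below implements a plain random walk with restart on an adjacency-list graph. The function name, the `restart_prob` default of 0.15 and the dictionary graph format are all assumptions made for illustration; the abstract does not specify Net2Vec-CLP's actual sampling details.

```python
import random


def random_walk_with_restart(adj, start, length, restart_prob=0.15, rng=None):
    """Return a node sequence of the given length beginning at `start`.

    adj: dict mapping each node to a list of its neighbors.
    restart_prob: probability of jumping back to `start` at each step
                  (hypothetical default, not from the source).
    """
    rng = rng or random.Random()
    walk = [start]
    current = start
    while len(walk) < length:
        neighbors = adj.get(current, [])
        # Jump back to the start node on a restart or at a dead end.
        if not neighbors or rng.random() < restart_prob:
            current = start
        else:
            current = rng.choice(neighbors)
        walk.append(current)
    return walk
```

Sequences produced this way could then be fed to a skip-gram style embedding model, in the same way sentences are, to obtain node vectors.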

11.
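A multi-label classification algorithm using association rule mining   Total citations: 2, self-citations: 0, citations by others: 2
刘军煜, 贾修一. Journal of Software (《软件学报》), 2017, 28(11): 2865-2878
Multi-label learning is widespread in real life and is a research hotspot in machine learning. In the multi-label learning framework, each object consists of a single instance but may belong to several class labels at once, and the labels are correlated with one another, so mining the correlations among labels is important for multi-label learning. First, the classical association rule algorithm is improved: a frequent itemset mining algorithm based on matrix divide-and-conquer is proposed, and its correctness in mining frequent itemsets is proved. The algorithm is then applied to the multi-label learning framework, yielding multi-label classification algorithms based on global association rule mining and on local association rule mining. Finally, the proposed algorithms are compared experimentally with existing multi-label algorithms; the results show that they achieve better performance under 5 different evaluation criteria.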

12.
Frequent itemset mining is a data mining technique for discovering frequent patterns, used in prediction, association rule mining, classification, etc. The Apriori algorithm is an iterative algorithm for finding frequent itemsets in a transactional dataset. It scans the complete dataset in each iteration to generate the frequent itemsets of each cardinality, which works well for small data but is not feasible for big data. The MapReduce framework provides a distributed environment for running Apriori on big transactional data. However, MapReduce is not well suited to iterative processes, and performance degrades. We introduce a novel algorithm named Hybrid Frequent Itemset Mining (HFIM), which uses the vertical layout of the dataset to avoid scanning the dataset in each iteration. The vertical dataset carries the information needed to find the support of each itemset. Moreover, we include some enhancements to reduce the number of candidate itemsets. The proposed algorithm is implemented on the Spark framework, which incorporates the concept of resilient distributed datasets and performs in-memory processing to optimize execution time. We compare the performance of HFIM with another Spark-based implementation of the Apriori algorithm on various datasets. Experimental results show that HFIM performs better in terms of execution time and space consumption.
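The idea of a vertical layout can be shown with a small single-machine sketch: each item maps to the set of transaction IDs that contain it, so the support of an itemset is the size of an intersection and no rescan of the raw transactions is needed. This is a generic level-wise miner written for illustration, not the HFIM algorithm or its Spark implementation; the names `to_vertical` and `frequent_itemsets` are assumptions.

```python
from itertools import combinations


def to_vertical(transactions):
    """Map each item to the set of transaction IDs (tidset) that contain it."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    vertical = to_vertical(transactions)
    # Level 1: frequent single items with their tidsets.
    current = {frozenset([item]): tids
               for item, tids in vertical.items() if len(tids) >= min_support}
    result = {}
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})
        next_level = {}
        # Join pairs of frequent k-itemsets that differ in exactly one item.
        for (a, tids_a), (b, tids_b) in combinations(current.items(), 2):
            candidate = a | b
            if len(candidate) == len(a) + 1 and candidate not in next_level:
                tids = tids_a & tids_b  # support comes from tidset intersection
                if len(tids) >= min_support:
                    next_level[candidate] = tids
        current = next_level
    return result
```

In a distributed setting the tidsets would be partitioned across workers, but the support computation by intersection stays the same.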

13.
Dealing with high-dimensional data has always been a major problem in many pattern recognition and machine learning applications. The trace ratio criterion is applicable to many dimensionality reduction methods because it directly reflects the Euclidean distances between data points within and between classes. In this paper, we analyze the trace ratio problem and propose a new efficient algorithm to find the optimal solution. Based on the proposed algorithm, we derive an orthogonally constrained semi-supervised learning framework. The new algorithm incorporates unlabeled data into the training procedure so that it preserves the discriminative structure as well as the geometrical structure embedded in the original dataset. Many existing semi-supervised dimensionality reduction methods, such as SDA, Lap-LDA, SSDR and SSMMC, can be improved under this framework, which can also be used to formulate a corresponding kernel framework for handling nonlinear problems. Theoretical analysis indicates that there are certain relationships between the linear and nonlinear methods. Finally, extensive simulations on synthetic and real-world datasets are presented to show the effectiveness of our algorithms. The results demonstrate that the proposed algorithm outperforms other state-of-the-art algorithms.
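For readers unfamiliar with the criterion, the trace ratio problem is usually stated as follows. The notation is the conventional one and is assumed here, not quoted from the paper: S_b and S_w are the between-class and within-class scatter matrices, and W is a projection matrix with orthonormal columns.

```latex
W^{*} = \arg\max_{W^{\top}W = I}\
\frac{\operatorname{tr}\!\left(W^{\top} S_b W\right)}{\operatorname{tr}\!\left(W^{\top} S_w W\right)}
```

The numerator sums squared Euclidean distances between projected class means and the overall mean, and the denominator sums squared distances of projected points to their own class mean, which is the sense in which the criterion reflects between-class and within-class distances.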

14.
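A classification approach based on multiple evolutionary neural networks   Total citations: 9, self-citations: 0, citations by others: 9
商琳, 王金根, 姚望舒, 陈世福. Journal of Software (《软件学报》), 2005, 16(9): 1577-1583
Classification is an important topic in data mining and machine learning. A classification approach based on multiple evolutionary neural networks, CABEN (classification approach based on evolutionary neural networks), is proposed. An improved evolution strategy and the Levenberg-Marquardt method are used to train several three-layer feedforward neural networks simultaneously. Once the individual classification models are trained, the data to be recognized are fed to each of them, and the final classification is decided by absolute majority voting. Experimental results show that the method classifies data well and, compared with traditional neural network methods as well as Bayesian and decision tree methods, ...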
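The final combination step can be sketched as follows, under the common reading of absolute majority voting: a label wins only if more than half of the classifiers choose it, and otherwise no decision is returned. The function name and the choice to return `None` on rejection are assumptions for illustration, not details from the paper.

```python
from collections import Counter


def absolute_majority_vote(predictions):
    """Return the label chosen by more than half of the voters, else None."""
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > len(predictions) / 2 else None
```

For example, three networks voting `["a", "a", "b"]` yield `"a"`, while `["a", "b", "c"]` yields no decision.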

15.
Existing machine-learning approaches to thermal comfort prediction have achieved great success on large datasets in sustainable Industry 4.0 environments. However, the industrial Internet of Things (IoT) environment generates small-scale datasets, each of which may contain a large amount of workers' private data. This challenges current prediction approaches, because running a large number of iterations on small datasets can result in overfitting. Moreover, workers' privacy has been a public concern in recent years. Therefore, there must be a trade-off between developing accurate thermal comfort prediction models and preserving workers' privacy. To tackle this challenge, we present a privacy-preserving machine learning technique, federated learning (FL), and propose an FL-based neural network algorithm (Fed-NN) for thermal comfort prediction. Fed-NN departs from current centralized machine learning approaches: a universal learning model is updated through a secured parameter aggregation process instead of sharing raw data among different industrial IoT environments. In addition, we design a branch selection protocol to reduce the communication overhead of federated learning. Experimental studies on a real dataset reveal the robustness, accuracy and stability of our algorithm in comparison with other machine learning algorithms while taking privacy into consideration.
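The aggregation idea can be illustrated with a FedAvg-style weighted average, in which each site sends only its model parameters and the server averages them in proportion to local dataset size. This is a generic sketch under that assumption; it does not reproduce Fed-NN's secured aggregation or its branch selection protocol, and the function name is hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average flat parameter vectors, weighted by each client's sample count.

    client_weights: list of equally long lists of floats, one per client.
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

Raw sensor readings never leave the client in this scheme; only the parameter vectors are exchanged.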

16.
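An improved PCA algorithm with local structure preservation   Total citations: 1, self-citations: 0, citations by others: 1
Locality preserving projection (LPP) is a local structure preserving algorithm that keeps each data point and its neighbors as close as possible in the projection space. Drawing on the geometric idea of LPP, this paper proposes an improved PCA algorithm with local structure preservation, locality preserving PCA (LP-PCA). The algorithm constructs the adjacency graph of the dataset and its complement graph, and treats neighboring and non-neighboring points differently. While capturing the global structure of the dataset, it effectively preserves the local structure. Experiments on simulated and real-world datasets verify the effectiveness of the algorithm.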

17.
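陆宇, 赵凌云, 白斌雯, 姜震. Journal of Computer Applications (《计算机应用》), 2022, 42(12): 3750-3755
Algorithms for imbalanced classification are a research hotspot in machine learning. Among them, oversampling adds minority-class samples by repeated sampling or synthetic generation in order to rebalance the dataset. However, most current oversampling methods work from the original sample distribution and struggle to reveal more of the distributional characteristics of the dataset. To solve this problem, first, an improved semi-supervised clustering algorithm is proposed to mine the distributional characteristics of the data. Second, based on the semi-supervised clustering result, unlabeled data with high confidence (pseudo-labeled samples) are selected from clusters belonging to the minority class and added to the original training set; besides rebalancing the dataset, this lets the distributional characteristics obtained by semi-supervised clustering assist imbalanced classification. Finally, the results of semi-supervised clustering and classification are fused to predict the final class labels, further improving imbalanced classification performance. With G-mean and area under the curve (AUC) as evaluation metrics, the proposed algorithm is compared on 10 public datasets with 7 imbalanced classification algorithms based on oversampling or undersampling, including TU and CDSMOTE. Experimental results show that, compared with TU and CDSMOTE, the proposed algorithm improves AUC by 6.7% and 3.9% on average respectively and G-mean by 7.6% and 2.1% on average respectively, and achieves the highest average result on both metrics among all compared algorithms. The proposed algorithm therefore improves imbalanced classification performance effectively.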
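Since G-mean is less familiar than AUC, here is a short sketch of how it is usually computed for binary problems: the geometric mean of sensitivity (recall on the minority or positive class) and specificity (recall on the negative class). The function name and the `positive` parameter are illustrative assumptions.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)
```

A classifier that ignores the minority class entirely scores 0 on this metric even when its plain accuracy is high, which is why it suits imbalanced data.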

18.
An automatic gender recognition algorithm based on machine learning methods is proposed. It consists of two stages: adaptive feature extraction and support vector machine classification. Both the training technique of the proposed algorithm and the experimental results obtained on a large image dataset are presented.

19.
Hybridization of fuzzy GBML approaches for pattern classification problems   Total citations: 4, self-citations: 0, citations by others: 4
We propose a hybrid algorithm of two fuzzy genetics-based machine learning approaches (i.e., Michigan and Pittsburgh) for designing fuzzy rule-based classification systems. First, we examine the ability of each approach to efficiently find fuzzy rule-based systems with high classification accuracy. It is clearly demonstrated that each approach has its own advantages and disadvantages. Next, we combine the two approaches into a single hybrid algorithm. Our hybrid algorithm is based on the Pittsburgh approach, where a set of fuzzy rules is handled as an individual. Genetic operations for generating new fuzzy rules in the Michigan approach are used as a kind of heuristic mutation for partially modifying each rule set. Then, we compare our hybrid algorithm with the Michigan and Pittsburgh approaches. Experimental results show that our hybrid algorithm has higher search ability. The necessity of a heuristic specification method for antecedent fuzzy sets is also demonstrated by computational experiments on high-dimensional problems. Finally, we examine the generalization ability of fuzzy rule-based classification systems designed by our hybrid algorithm.

20.
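In recent years, deep learning algorithms have increasingly been applied to object detection. Targeting vehicles in real traffic scenes, this paper applies the Faster R-CNN framework, a representative deep learning object classification approach, together with the vehicle data in ImageNet, and converts object detection in the scene into a binary classification problem in order to detect and recognize vehicles. Compared with traditional machine learning object detection algorithms, deep-learning-based detection has clear advantages in detection accuracy and execution efficiency. Analysis of the experimental results shows that the method achieves significant improvements in both recognition accuracy and speed.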
