Similar Literature
Found 19 similar documents (search time: 193 ms)
1.
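Many kinds of malware now use domain generation algorithms (DGAs) to generate large numbers of random domain names and then attempt to establish communication with a C&C server in order to launch attacks. Existing detection methods follow one of two routes. Some construct hand-crafted features from the randomness of DGA domain names and learn classification patterns with machine learning, but building the features by hand is time-consuming and laborious and the false-positive rate is high. Others use deep learning techniques such as LSTM and GRU to learn the sequential relationships in DGA domain names, but their detection accuracy is low on DGA domains with low randomness. This paper proposes a scheme for extracting general features of domain names, builds a dataset covering 41 DGA domain families, and designs a detection algorithm based on both hand-crafted and deep features, which improves the model's generalization ability and increases the number of DGA families that can be identified. Experimental results show that the DGA domain detection algorithm based on hand-crafted and deep features achieves higher accuracy and better generalization than traditional deep learning methods.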
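As an illustration of what "hand-crafted features based on randomness" can look like, here is a minimal Python sketch. The abstract does not list the paper's actual features, so the four features below (length, character entropy, digit ratio, vowel ratio) and the function names are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def char_entropy(label: str) -> float:
    """Shannon entropy (bits per character) of the characters in a label."""
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def handcrafted_features(domain: str) -> dict:
    """Hypothetical randomness features for the left-most label of a domain."""
    label = domain.lower().split(".")[0]
    n = max(len(label), 1)  # avoid division by zero for an empty label
    return {
        "length": len(label),
        "entropy": char_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / n,
    }
```

A feature vector like this could be concatenated with the output of a sequence model, which is one plausible reading of "hand-crafted plus deep features"; the abstract does not say how the paper combines them.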

2.
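To mine the non-IID (non-independent and identically distributed) relationships in data and to address the inaccurate classification results of the traditional KNN algorithm, an improved KNN algorithm for numerical data under non-IID conditions is proposed. A coupled similarity matrix is obtained with the Pearson correlation coefficient formula, and the class membership of each sample is computed from this matrix. Feature weights are computed following the idea of the ReliefF algorithm. The class decision rule is then updated according to the class memberships of the training samples and the feature weights, and the class of the sample to be classified is determined. Validation on multiple UCI datasets shows that the algorithm effectively improves classification accuracy.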
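The Pearson correlation coefficient mentioned above can be sketched in a few lines of Python. This shows only the standard coefficient and a plain pairwise matrix over feature columns; how the paper turns such values into its coupled similarity matrix is not stated in the abstract, and the zero returned for constant columns is a convention assumed here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumed convention: a constant column correlates with nothing
    return cov / (norm_x * norm_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations between feature columns."""
    return [[pearson(a, b) for b in columns] for a in columns]
```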

3.
谢娟英  吴肇中 《软件学报》2022,33(4):1338-1353
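The feature selection algorithm FSIP (feature selection based on information gain and Pearson correlation coefficient) requires manual involvement when choosing the feature subset. To address this problem, a fully adaptive 2D feature selection algorithm based on a discernibility matrix is proposed: DFSIP (disc...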
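For readers unfamiliar with the information gain criterion that FSIP builds on, the following Python sketch computes it for a discrete feature. It is the textbook definition only; it is not the FSIP or DFSIP procedure, whose details are cut off in the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that separates the classes perfectly recovers the full label entropy, while a feature independent of the labels has a gain of zero.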

4.
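To accurately diagnose the fault categories of wind power systems, a new diagnosis method is proposed based on PWKNN, an improved weighted k-nearest-neighbor algorithm optimized by particle swarm optimization. PWKNN adjusts weights to reflect the importance of features and uses a distance judgment strategy to compute the identical probability of multi-label classification. Particle swarm optimization (PSO) is used to optimize the weights and the parameter k of PWKNN. Feature extraction is used to train the classifier, and feature selection with the Pearson correlation coefficient eliminates irrelevant features, thereby reducing the classifier's output time. Tests on four classification states of a 300 W wind turbine, compared against traditional classifiers, show that PWKNN achieves higher classification accuracy. Feature selection reduces the average number of features from 16 to 2.8 and the output time by 61%.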

5.
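An Empirical Study of Machine-Learning-Based Sentiment Classification of Chinese Microblogs   Total citations: 3 (self-citations: 0, citations by others: 3)
An empirical study of microblog sentiment classification is conducted using three machine learning algorithms, three feature selection algorithms, and three feature-term weighting methods. The experimental results show that, across the different feature weighting methods, support vector machines (SVM) and the Naive Bayes classifier each have their own advantages, and that the information gain (IG) feature selection method performs clearly better than the other methods. Taking all three factors into account, combining SVM, IG, and TF-IDF (Term Frequency-Inverse Document Frequency) as the feature-term weight gives the best sentiment classification performance on microblogs. For the movie domain, the generality of classification models between microblog reviews and ordinary reviews is compared; the results show that sentiment classification performance depends on the style of the reviews.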
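The TF-IDF term weight named above can be sketched as follows in Python. TF-IDF has several variants and the abstract does not say which one the study used; this sketch assumes relative term frequency and an unsmoothed natural-log IDF, and it assumes the documents are already tokenized (Chinese microblog text would need word segmentation first).

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights
```

With this variant, a term that occurs in every document gets weight zero, so only terms that distinguish documents contribute to the feature vector passed to a classifier such as SVM.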

6.
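Botnets now widely use DGAs to evade detection. Mainstream detection algorithms based on hand-written rules cannot identify newly generated DGA domain names, and detection algorithms based on machine learning lack evolving training data. To address these problems, a method is proposed that defines a domain-name encoder and decoder based on ASCII encoding and combines them with a generative adversarial network to build a domain-name character generator that predicts and generates DGA variant samples. Experimental results show that, when the generated data are used for classifier training and performance evaluation, the DGA domain variants produced by this method can stand in for real DGA samples, which verifies the validity of the generated data and shows that they can be used to train and evaluate DGA domain detectors.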

7.
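With the rapid growth of Internet finance and electronic payment services, the personal credit problems they give rise to are also increasing day by day. Personal credit prediction is essentially an imbalanced sequential binary classification problem whose data are large in scale, high in dimensionality, and extremely imbalanced in distribution. To distinguish applicants' credit status efficiently, this paper proposes PL-SmoteBoost, a personal credit prediction method based on feature optimization and ensemble learning. The method builds the prediction model within the Boosting ensemble framework. First, the Pearson correlation coefficient is used for an initial analysis of the data to remove redundant data. Lasso then selects a subset of features to reduce the dimensionality and the risks of high-dimensional data. Next, the SMOTE oversampling method performs linear interpolation on the minority class of the dimension-reduced data to address class imbalance. Finally, to verify the effectiveness of the algorithm, commonly used binary classification algorithms serve as baselines; the algorithm is tested on high-dimensional imbalanced datasets downloaded from Kaggle and Microsoft's open database, with AUC as the evaluation metric, and the experimental results are analyzed with statistical tests. The results show that PL-SmoteBoost has a significant advantage over the other algorithms.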
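The SMOTE step described above, linear interpolation on the minority class, can be sketched in plain Python. This is a simplified illustration of the interpolation idea, not the paper's pipeline: the function names, the default `k=2`, and the fixed seed are assumptions, and the minority set must contain at least two samples.

```python
import random


def nearest_neighbors(x, others, k):
    """The k points in `others` closest to x (squared Euclidean distance)."""
    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(x, p))
    return sorted(others, key=dist2)[:k]


def smote_sample(x, neighbor, rng):
    """A synthetic point on the line segment between x and one of its neighbors."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]


def oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbor = rng.choice(nearest_neighbors(x, others, k))
        synthetic.append(smote_sample(x, neighbor, rng))
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, the new data stay inside the region the minority class already occupies.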

8.
李政仪  冯贵玉  赵龙 《计算机应用》2012,32(9):2588-2591
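Face features extracted by the scale-invariant feature transform (SIFT) algorithm have a certain robustness, but suffer from excessively high data dimensionality and computational complexity. To address this, a face recognition algorithm based on direct locality preserving projections and the scale-invariant feature transform (DLPP-SIFT) is proposed. SIFT is first used for feature extraction; the subspace method locality preserving projections (LPP) is then applied for dimensionality reduction, and the feature matrix is obtained by direct diagonalization, which solves the singularity problem of LPP. Experiments on the ORL and FERET face databases show that DLPP-SIFT significantly reduces computational complexity and feature matching time, and is more robust than SIFT, principal component analysis (PCA)-SIFT, and LPP-SIFT.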

9.
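Existing feature selection algorithms for big data classification have low accuracy, which degrades the accuracy of the subsequent classification algorithm. To address this, a big data feature selection algorithm based on inertia-weight orthogonal opposition-based learning (OOL) and the firefly algorithm (FA) is proposed. It draws on the global search ability of FA and on the improvements OOL brings to convergence speed and convergence accuracy to select data features quickly and accurately, and then uses a structure-aware convolutional neural network to classify the big data features accurately. Experiments on a big dataset with 66 million samples and 2000 attributes show that the proposed algorithm has a clear advantage in classification accuracy.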

10.
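Dissolved gas analysis (DGA) is a typical fault diagnosis method for oil-filled power equipment and is widely used for fault detection and condition assessment of power transformers. However, its diagnostic accuracy is low, owing to the reliability of the sample data and the effectiveness of the diagnosis model. This paper proposes a Box-plot-SA-BP model. First, box-plot data detection removes abnormal data to address data quality. Then a self-attention (SA) mechanism captures the relationships among multi-parameter sample data and extracts more stable and reliable features. Finally, a BP network multi-class model is designed to diagnose transformer faults. Comparative experiments demonstrate the good performance of the Box-plot-SA-BP model, which has high application value.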
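The box-plot outlier removal step can be illustrated with the usual interquartile-range rule. The sketch below is an assumption about what "box-plot data detection" means here: it uses the common 1.5 x IQR whisker on a single column of values, whereas the paper works with multi-parameter samples and may use different thresholds.

```python
import statistics


def boxplot_fences(values, whisker=1.5):
    """Lower and upper fences of a box plot: Q1 - w*IQR and Q3 + w*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def drop_outliers(values, whisker=1.5):
    """Keep only the values that fall inside the box-plot fences."""
    low, high = boxplot_fences(values, whisker)
    return [v for v in values if low <= v <= high]
```

For a multi-parameter dataset, one simple option is to apply the rule to each gas concentration column and discard any sample flagged in at least one column.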

11.
Dissolved gas analysis (DGA) data are uncertain because they are affected by transformer capacity and fault location, which lowers the accuracy of DGA-based transformer fault diagnosis models. We therefore propose a hybrid feature selection method based on fuzzy information entropy, which selects suitable DGA feature parameters according to the feature information shared between each parameter and the fault type, in order to reduce the influence of DGA data uncertainty on fault diagnosis accuracy. First, feature relevance and redundancy functions are constructed based on fuzzy information entropy theory. Second, these functions are taken as the optimization objectives of a binary chaotic multi-objective particle swarm optimization algorithm (B-CMOPSO), which searches for feature subsets in the feature space composed of 46 DGA feature parameters. Then, the optimal feature subset is selected based on the simulation accuracy of ELM, SVM, Adaboost.M1, and BPNN on the feature subsets. Finally, 30 simulation experiments are carried out to compare the method with several multi-objective optimization algorithms, common filter methods, and common DGA feature combinations, and the rationality of the proposed method is verified with the t-test. The results show that the mean accuracies of the four classifiers improve by up to 18.95%, 20.77%, 19.85%, and 21.27%, respectively, compared with common DGA feature combinations, indicating that the optimal feature subset preserves more feature information and can effectively reduce the influence of DGA data uncertainty on diagnostic accuracy.

12.
林梦雷  刘景华  王晨曦  林耀进 《计算机科学》2017,44(10):289-295, 317
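In multi-label learning, feature selection is an effective means of dealing with the high dimensionality of multi-label data. Each label separates the samples to a different degree, which may provide useful information for multi-label learning. Based on this assumption, a multi-label feature selection algorithm based on label weights is proposed. The algorithm first weights the labels using the classification margin of the samples over the whole feature space, and then takes a feature's ability to discriminate samples under the whole label set as the feature weight, which measures the feature's importance to the label set. Finally, features are sorted in descending order of feature weight to obtain a new feature ranking. Experimental results on six multi-label datasets and four evaluation metrics show that the proposed algorithm outperforms several currently popular multi-label feature selection algorithms.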

13.
The technique of machinery fault diagnosis has been greatly enhanced in recent years by the application of many pattern classification methods. However, these classification methods suffer from the “curse of dimensionality” when applied to high-dimensional fault diagnosis data. To solve this problem, this paper proposes a hybrid model that combines multiple feature selection models to select the most significant input features from all potentially relevant features. Among the models, eight filter models are used to pre-rank the candidate features: data variance, Pearson correlation coefficient, the Relief algorithm, Fisher score, class separability, chi-squared, information gain, and gain ratio. These variable ranking models measure features from various perspectives and lead to different ranking results. Based on the effect of the ranking results on Radial Basis Function (RBF) classification, a weighted voting scheme is then introduced to re-rank the features. Furthermore, two wrapper models, a Binary Search (BS) model and a Sequential Backward Search (SBS) model, are used to minimize the number of relevant features. To demonstrate the potential for applying the method to machinery fault diagnosis, two case studies are discussed. The experimental results support the conclusion that this method is useful for revealing fault-related frequency features.

14.
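Bearings are one of the main components of mechanical equipment and also one of its main sources of failure. Bearing faults are a key concern for mechanical equipment, whose operation is directly affected by a faulty bearing. To address the low efficiency of bearing fault diagnosis with the traditional convolutional neural network algorithm, this paper proposes an optimized method based on signal feature extraction and convolutional neural networks. First, time-domain and frequency-domain signal features are extracted from the raw data signals to obtain effective fault feature values. A convolutional neural network then performs fault diagnosis on the extracted feature values to complete fault classification. The method is validated on the rolling-bearing vibration acceleration signals from Case Western Reserve University in the United States: the average fault diagnosis accuracy is 74.37% with an accuracy variance of 0.0001, whereas the traditional convolutional neural network algorithm achieves an average accuracy of 65.6% with a variance of 0.0019. The experimental results show that, compared with the traditional convolutional neural network, the proposed method significantly improves bearing fault diagnosis accuracy, is more stable, takes less computation time, and has better overall performance.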

15.
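Feature weight learning is one of the key problems for feature-weighted K-nearest-neighbor algorithms, and many heuristic learning methods have traditionally been proposed. In recent years, as evolutionary computation has been widely applied in pattern recognition and data mining, weight learning and distance learning methods based on evolutionary computation have received increasing attention. For the weight learning problem of the feature-weighted K-nearest-neighbor algorithm, this study proposes PSOKNN, an algorithm that learns the weights with PSO. Experimental comparison with traditional KNN, GAKNN, and ReliefKNN shows that the method effectively finds suitable feature weights, achieves good classification accuracy, and eliminates redundant or irrelevant features.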
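To make "feature-weighted K-nearest-neighbor" concrete, here is a minimal Python sketch of the classifier that a weight-learning method such as PSOKNN would tune. The weights are supplied by hand in this sketch; the PSO search that learns them is not shown, and the function names and the weighted Euclidean form of the distance are assumptions.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def weighted_knn_predict(train_x, train_y, query, weights, k=3):
    """Majority vote among the k training points nearest to the query."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: weighted_distance(pair[0], query, weights),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Setting a feature's weight to zero removes it from the distance entirely, which is how a weight-learning procedure can discard redundant or irrelevant features.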

16.
翁楦乔  文成林 《控制工程》2022,29(1):175-181
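Traditional methods have difficulty using large amounts of time-series data and unlabeled data for power grid fault diagnosis. To address this, an intelligent power grid fault diagnosis method based on deep feature clustering and recurrent neural networks (RNN) is proposed. The method first builds a feature extractor from a convolutional neural network to extract high-level features of the time-series data, and then performs semi-supervised clustering on the extracted features to obtain labels for the unlabeled samples, so that the fault category of each unlabeled sample can be determined and the sample put to use; then...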

17.
Recently, there has been interest in developing diagnosis methods that combine model-based and data-driven diagnosis. In both approaches, selecting the relevant measurements or extracting important features from historical data is a key determinant of the success of the algorithm. Deep learning methods have recently been effective in automating the feature selection process. Autoencoders have been shown to be an effective neural network configuration for extracting features from complex data; however, they may also learn irrelevant features. End-to-end classification neural networks have also been used for diagnosis, but, like autoencoders, they may learn unimportant features, which makes the diagnostic inference scheme inefficient. To rapidly extract significant fault features, this paper employs end-to-end networks and develops a new feature extraction method based on importance analysis and knowledge distilling. First, a set of cumbersome neural network models is trained to predict faults, and some of their internal values are defined as features. Then an occlusion-based importance analysis method is developed to select the most relevant input variables and learned features. Finally, a simple student neural network model is designed based on the previous analysis results, and an improved knowledge distilling method is proposed to train the student model. Because of the way the cumbersome networks are trained, only fault features are learned, with the importance analysis further pruning the relevant feature set. These features can be rapidly generated by the student model. We discuss the algorithms and then apply our method to two typical dynamic systems, a communication system and a 10-tank system, to demonstrate the proposed approach.
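The general idea behind occlusion-based importance analysis can be sketched in Python as follows. This is a generic illustration, not the paper's method: the `predict` callable stands in for a trained network with a scalar output, and replacing an input with a fixed `baseline` value is an assumed occlusion strategy.

```python
def occlusion_importance(predict, sample, baseline=0.0):
    """Score each input by how much the prediction changes when it is occluded.

    predict:  callable mapping a list of inputs to a scalar output
    sample:   list of input values for one example
    baseline: value substituted for the occluded input (assumed choice)
    """
    reference = predict(sample)
    scores = []
    for i in range(len(sample)):
        occluded = list(sample)
        occluded[i] = baseline
        scores.append(abs(reference - predict(occluded)))
    return scores
```

Inputs whose scores stay near zero across many samples are candidates for pruning before a smaller student model is designed.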

18.
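The sequential information bottleneck algorithm for categorical data (CD-sIB) assumes that every attribute feature of the data contributes equally to the binarization transformation, which impairs the result of the transformation. This paper proposes a weighted binarization transformation method to reflect the characteristics of non-co-occurrence data. By highlighting the representative attributes of non-co-occurrence data and suppressing non-representative (redundant) attributes, the method obtains the best co-occurrence representation. Two weighting principles for non-co-occurrence data are proposed, namely applicability to randomly distributed data and an unsupervised computation method, and a weighted binarization transformation algorithm is constructed based on the concept of weighting granularity. Experimental results show that the clustering accuracy of the proposed algorithm is better than that of other algorithms.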

19.
Newly assembled automobile transmissions have particular failure characteristics, and a strict quality testing procedure on the assembly line is important for transmission quality. In this paper, we introduce a new automatic fault detection method for automobile transmissions. A fault diagnosis expert system for newly assembled transmissions is presented, along with the related methods of knowledge representation, feature extraction, and fault classification. Order spectrum analysis is used to analyze the vibration signal of the transmission. After the initial set of feature vectors is obtained, an improved genetic search strategy is used to select fault features so as to reduce the dimension of the feature vector set. The selected feature vectors are fed into a BP neural network for fault identification and classification of the newly assembled transmission. A large amount of data was collected from an industrial site and analyzed, and the proposed algorithm was verified to be effective and accurate.
