Similar documents
 Found 20 similar documents; search took 31 ms
1.
This paper proposes a hybrid genetic-based functional link artificial neural network (HFLANN) with simultaneous optimization of input features for solving classification problems in data mining. The approach uses a genetic algorithm to choose an optimal subset of input features by eliminating features with little or no predictive information, which also increases the comprehensibility of the resulting HFLANN. By functionally expanding the selected features, HFLANN handles the nonlinear nature of problems, a difficulty commonly encountered in single-layer neural networks. The simplicity of the architecture and the low computational complexity of the network encourage its use for classification tasks in data mining. Further, the issue of statistical tests for comparing algorithms on multiple datasets, which is even more essential to typical machine learning and data mining studies, has been all but ignored. In this work, we recommend a set of simple yet safe and robust parametric and nonparametric tests for statistical comparisons of HFLANN with FLANN and RBF classifiers over multiple datasets, supported by extensive simulation studies.
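The functional expansion step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which basis functions are used, so the trigonometric basis, the `order` parameter, the 0/1 `mask` standing in for a genetic-algorithm chromosome, and the function names `functional_expansion` and `flann_output` are all hypothetical.

```python
import math


def functional_expansion(x, mask, order=2):
    """Expand the selected features of one input pattern.

    x     : list of floats (one input pattern)
    mask  : list of 0/1 flags; 1 means the feature is kept
            (hypothetical stand-in for a GA chromosome)
    order : number of sin/cos harmonics per kept feature (assumed value)
    """
    expanded = []
    for value, keep in zip(x, mask):
        if not keep:
            continue  # a dropped feature contributes nothing
        expanded.append(value)
        for k in range(1, order + 1):
            expanded.append(math.sin(k * math.pi * value))
            expanded.append(math.cos(k * math.pi * value))
    return expanded


def flann_output(x, mask, weights, bias=0.0, order=2):
    """Single-layer output: tanh of a weighted sum over the expanded features."""
    phi = functional_expansion(x, mask, order)
    if len(phi) != len(weights):
        raise ValueError("weights must match the expanded feature length")
    return math.tanh(sum(w * p for w, p in zip(weights, phi)) + bias)
```

Each kept feature yields `1 + 2 * order` expanded inputs, so the network stays single-layer while the inputs it sees are nonlinear in the original features.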

2.
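Most existing argumentation mining work models a single dataset and ignores how features may vary across datasets, which leads to poor generalization. This paper therefore proposes an argumentation mining method based on multi-task learning that jointly learns the argumentation mining tasks of multiple datasets. The input-layer representations of the tasks are first fused; a convolutional neural network and a highway network then yield shared word-level and character-level parameters; these are combined with task-specific features and fed into a stacked bidirectional long short-term memory network, which is trained in parallel using the correlations among the argumentation mining tasks; finally, a conditional random field produces the sequence labeling result. Experiments on six datasets from different domains show that the method improves Macro-F1, which verifies its effectiveness.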

3.
Data classification is an important topic in data mining because of its wide applications. A number of methods have been proposed based on well-known learning models such as decision trees and neural networks. Although data classification has been widely discussed, relatively few studies have explored temporal data classification. Most existing studies focus on improving classification accuracy using statistical models, neural networks, or distance-based methods, but they cannot explain the classification results to users. In many research settings, such as microarray gene expression, users prefer interpretable classification information over a classifier that offers only high accuracy. In this paper, we propose a novel pattern-based data mining method, classify-by-sequence (CBS), for classifying large temporal datasets. The main methodology behind CBS is the integration of sequential pattern mining with probabilistic induction. CBS is simple to implement, and its pattern-based architecture supplies clear classification information to users. In experimental evaluation, CBS delivered highly accurate classification results on two real time-series datasets. In addition, we designed a simulator to evaluate the performance of CBS on datasets with different characteristics. The experimental results show that CBS can discover hidden patterns and classify data effectively by using the mined sequential patterns.

4.
陈郑淏  冯翱  何嘉 《计算机应用》2019,39(7):1936-1941
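To address the loss of feature semantic information and the weak representation of sequential features in traditional two-dimensional convolutional models for sentiment classification, a hybrid model based on a one-dimensional convolutional neural network (CNN) and a recurrent neural network (RNN) is proposed. First, one-dimensional convolution replaces two-dimensional convolution to retain richer local semantic features; next, a pooling layer reduces dimensionality and passes the result to a recurrent layer, which integrates the temporal relations among the features; finally, a softmax layer performs sentiment classification. Experiments on several standard English datasets show that on the SST and MR datasets the proposed model improves classification accuracy by 1 to 3 percentage points over traditional statistical methods and end-to-end deep learning methods, and an analysis of the network components confirms that introducing one-dimensional convolution and the recurrent network helps improve classification accuracy.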

5.
In this paper, we develop a granular input space for neural networks, especially for multilayer perceptrons (MLPs). Unlike a conventional neural network, a neural network with granular input is an augmentation built on a well-trained numeric neural network. We explore an efficient way of forming granular input variables so that the corresponding granular outputs of the network achieve the highest values of the specificity (and support) criteria. When a neural network is augmented by distributing information granularity across its input variables, the output of the network shows different levels of sensitivity to different input variables. Capturing the relationship between the input variables and the output is of great help in mining knowledge from the data, and in this way the important features of the data can be found easily. Information granules are treated as an essential design asset in this construct. They are quantified by a level of granularity, which is given by an expert. The detailed optimization of the allocation of information granularity is realized by an improved partheno genetic algorithm (IPGA). The proposed algorithm is shown to be effective in numeric studies on synthetic data and on data from the machine learning and StatLib repositories. Moreover, the experimental studies offer deep insight into the specificity of the input features.

6.
The credit card industry has grown rapidly in recent years, and the credit departments of banks therefore collect huge volumes of consumer credit data. Credit scoring managers often evaluate a consumer’s credit by intuitive experience. With the support of a credit classification model, however, the manager can evaluate an applicant’s credit score accurately. Support Vector Machine (SVM) classification is currently an active research area and has solved classification problems successfully in many domains. This study used three strategies to construct hybrid SVM-based credit scoring models that evaluate an applicant’s credit score from the applicant’s input features. Two credit datasets from the UCI database were selected as experimental data to demonstrate the accuracy of the SVM classifier. Compared with neural networks, genetic programming, and decision tree classifiers, the SVM classifier achieved identical classification accuracy with relatively few input features. Additionally, by combining genetic algorithms with the SVM classifier, the proposed hybrid GA-SVM strategy can perform feature selection and model parameter optimization simultaneously. Experimental results show that SVM is a promising addition to existing data mining methods.

7.
王娇  王雄  熊智华 《计算机工程》2006,32(5):183-185
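Based on the characteristics of the acetone refining process, a neural-network-based classification mining method for acetone product quality is proposed. First, methods for screening independent variables in data mining are discussed, including correlation analysis, Fisher index analysis, principal component regression, and partial least squares regression. Combining the results of these methods, the many process factors that affect acetone refining are ranked by importance, and the important independent variables are selected accordingly. The selected variables are then used as inputs to build a neural-network-based product quality classifier. Experimental results show that the trained neural network classifier performs well in classifying acetone product quality.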

8.
Credit scoring with a data mining approach based on support vector machines   Total citations: 3, self-citations: 0, citations by others: 3
The credit card industry has grown rapidly in recent years, and the credit departments of banks therefore collect huge volumes of consumer credit data. Credit scoring managers often evaluate a consumer’s credit by intuitive experience. With the support of a credit classification model, however, the manager can evaluate an applicant’s credit score accurately. Support Vector Machine (SVM) classification is currently an active research area and has solved classification problems successfully in many domains. This study used three strategies to construct hybrid SVM-based credit scoring models that evaluate an applicant’s credit score from the applicant’s input features. Two credit datasets from the UCI database were selected as experimental data to demonstrate the accuracy of the SVM classifier. Compared with neural networks, genetic programming, and decision tree classifiers, the SVM classifier achieved identical classification accuracy with relatively few input features. Additionally, by combining genetic algorithms with the SVM classifier, the proposed hybrid GA-SVM strategy can perform feature selection and model parameter optimization simultaneously. Experimental results show that SVM is a promising addition to existing data mining methods.

9.
A hybrid approach of neural network and memory-based learning to data mining   Total citations: 4, self-citations: 0, citations by others: 4
We propose a hybrid prediction system that combines a neural network with memory-based learning. Neural networks (NN) and memory-based reasoning (MBR) are frequently applied to data mining with various objectives, and they share advantages over other learning strategies. Both can be applied directly to classification and regression without additional transformation mechanisms, and both are strong at learning the dynamic behavior of a system over time. Unfortunately, they also have shortcomings when applied to data mining tasks. Although the neural network is considered one of the most powerful and universal predictors, its knowledge representation is unreadable to humans, and this "black box" property restricts its application to data mining problems that require a proper explanation of the prediction. MBR, on the other hand, suffers from the feature-weighting problem: when MBR measures the distance between cases, some input features should be treated as more important than others, so feature weighting should be carried out before prediction to provide information on feature importance. In our hybrid system of NN and MBR, the feature weight set, which is calculated from the trained neural network, plays the core role in connecting the two learning strategies, and an explanation for a prediction can be given by retrieving and presenting the most similar examples from the case base. Moreover, the proposed system has advantages in typical data mining concerns such as scalability to large datasets, high dimensionality, and adaptability to dynamic situations. Experimental results show that the hybrid system has high potential for solving data mining problems.
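The way feature weights change which cases count as "most similar" can be sketched with a weighted nearest-neighbour lookup. This is a minimal illustration, not the authors' system: the abstract does not say how the weights are derived from the trained network, so `weights` is simply passed in, and the names `weighted_distance` and `mbr_predict` as well as the choice of a weighted Euclidean distance are assumptions.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; a larger weight makes a feature matter more."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def mbr_predict(query, cases, labels, weights, k=3):
    """Return the majority label of the k nearest cases plus those cases.

    The retrieved cases double as the explanation shown to the user.
    `weights` is assumed to come from some external step (for example a
    trained network); how it is computed is not covered here.
    """
    ranked = sorted(
        range(len(cases)),
        key=lambda i: weighted_distance(query, cases[i], weights),
    )
    neighbours = ranked[:k]
    vote = Counter(labels[i] for i in neighbours).most_common(1)[0][0]
    return vote, [cases[i] for i in neighbours]
```

With the same case base, changing the weights can change both the predicted label and the examples returned as the explanation, which is why the weight set is the link between the two learners.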

10.
An ensemble of evolving neural networks, which combines neural networks with genetic algorithms, is developed for classification problems in data mining. The network meets data mining requirements such as smart architecture, user interaction, and performance. The evolving neural network has a smart architecture in that it can select inputs from the environment and control its own topology. A built-in objective function offers user interaction for customized classification. The bagging technique, which trains multiple networks on portions of the training set, is applied to the ensemble of evolving neural networks to improve classification performance. The ensemble is tested on various data sets and performs better than both classical neural networks and simple ensemble methods.

11.
Machine-vision-based inspection systems are receiving considerable attention for quality control applications. This work presents a novel approach to classifying wood knot defects for automated inspection. The technique uses the gray level co-occurrence matrix and Laws' texture energy measures as texture feature extractors and a feed-forward back-propagation neural network as the classifier, and it compares features based on the gray level co-occurrence matrix with features based on Laws' texture energy measures. First, contrast, correlation, energy, and homogeneity are used as input parameters to a feed-forward back-propagation neural network to predict wood defects; then the energy calculated from energy maps based on Laws' texture energy measures is used as the input feature to a feed-forward back-propagation neural network. When features based on Laws' texture energy measures are used as input, the mean square error (MSE) on the training data is 0.0718 and the overall average classification accuracy is 90.5%; with input features based on the gray level co-occurrence matrix, the MSE on the training data is 0.10728 and the overall average classification accuracy is 84.3%. The technique shows promising results for classifying wood defects with a feed-forward back-propagation neural network.

12.
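Data mining on data with temporal attributes is called temporal data mining and is used to discover knowledge about how data behave over time. When the data change irregularly, as stock trading data do, valuable regularities and rules are hard to find. Neural networks are parallel, fault-tolerant, implementable in hardware, and capable of self-learning, so they can serve as a method for stock classification and prediction. By combining stock data with a temporal type to convert it into temporal stock data, a classification method based on a temporal neural network model is proposed. Ten years of stock data collected from a number of listed companies are analyzed, and a temporal stock data neural network classifier is built to classify and predict stocks. Experiments show that the classifier achieves higher classification accuracy than the unimproved neural network and the support vector machine. The results show that this temporal neural network model is effective for classifying and predicting multiple stocks and can be applied to classification and prediction in the stock market.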

13.
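Objective: Deep learning has been widely applied to target recognition in synthetic aperture radar (SAR) images, but most work is based on the standard operating conditions of the MSTAR dataset. Applying deep learning to targets of the same class that include variants, such as the T72 subclasses, remains challenging because the differences between targets are small. Starting from the goal of preserving the input features of SAR images as fully as possible, this paper designs a deep convolutional neural network architecture suited to recognizing SAR variant targets. Method: The network consists mainly of a multi-scale spatial feature extraction module and the dense blocks and transition layers of DenseNet. The multi-scale feature extraction module sits at the bottom of the network and uses convolution kernels of sizes 1×1, 3×3, 5×5, 7×7, and 9×9 to extract rich spatial features while retaining the input image information. To pass the input image information on to later layers more effectively, the subsequent layers are designed from DenseNet dense blocks and transition layers. After the training samples are augmented, the effects of input image resolution, target translation, and different noise levels on recognition accuracy are analyzed, and the model is compared under standard operating conditions with deep models used for SAR image target recognition. Result: Experiments show that the designed model achieves 95.48% recognition accuracy when classifying 8 T72 variant targets, and average recognition accuracies of 94.61% and 86.36% under target translation and under different noise levels, respectively. When the model is trained and tested with data augmentation on 10 target classes (without and with variants), it reaches recognition accuracies of 99.38% and 98.81%, respectively, slightly better than the other model architectures compared. Conclusion: The proposed model makes full use of the input image and of the features output by each convolutional layer and learns the fine differences between target images. It is suited to recognizing SAR variant targets and also achieves high recognition accuracy under standard operating conditions.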

14.
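In an era of information explosion and rapid network growth, network attacks and threats are increasing, and malicious traffic identification plays a very important role in network security. Deep learning has shown superior performance in image processing and natural language processing, so many studies have applied it to traffic classification. When deep learning is applied to traffic identification, some studies truncate or zero-pad the raw traffic data; truncation tends to lose part of the traffic information, and zero-padding tends to introduce information that is useless for model training. To address this problem, this paper proposes an Indefinite Length Convolutional Neural Network (ILCNN) for malicious traffic classification. The model accepts input of indefinite length: it takes raw traffic data that is neither truncated nor zero-padded, uses a pooling operation to convert variable-length feature vectors into fixed-length feature vectors, and finally classifies the malicious traffic. Experiments on the CICIDS-2017 dataset show that the ILCNN model reaches an F1-score of 0.999208. Compared with existing work on malicious traffic classification, the proposed ILCNN improves both F1-score and accuracy.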
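The idea of turning a variable-length input into a fixed-length feature vector by pooling can be sketched in a few lines. This is a toy illustration under stated assumptions, not the ILCNN architecture: the abstract only says "a pooling operation", so global max pooling is an assumed choice, and the names `conv1d`, `global_max_pool`, and `encode`, the byte scaling, and the example kernels are all hypothetical.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation; output length depends on the input length."""
    if len(signal) < len(kernel):
        raise ValueError("signal is shorter than the kernel")
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


def global_max_pool(feature_maps):
    """Keep one value per channel, whatever the channel length is."""
    return [max(channel) for channel in feature_maps]


def encode(packet_bytes, kernels):
    """Map raw bytes of any length to a vector with len(kernels) entries."""
    signal = [b / 255.0 for b in packet_bytes]  # scale bytes to [0, 1]
    return global_max_pool([conv1d(signal, k) for k in kernels])
```

Because the pooled vector has one entry per kernel, flows of 3 bytes and of 3000 bytes both produce a vector of the same size, so no truncation or zero-padding is needed before the classifier.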

15.
Artificial neural networks often achieve high classification accuracy, but they are regarded as black boxes because they cannot explain their decisions. This paper proposes a new rule extraction algorithm, RxREN, to overcome this drawback. Following a pedagogical approach, the algorithm extracts rules from trained neural networks for datasets with mixed-mode attributes. It relies on a reverse engineering technique to prune insignificant input neurons and to discover the technological principles of each significant input neuron of the network in classification. The novelty of the algorithm lies in the simplicity of the extracted rules, whose conditions involve both discrete and continuous attributes. Experiments on six real datasets (iris, wbc, hepatitis, pid, ionosphere, and creditg) show that the algorithm is efficient at extracting the smallest set of rules, with higher classification accuracy than the rule sets generated by other neural network rule extraction methods.

16.
Artificial neural networks (ANNs) are used to classify rare vegetation communities from remotely sensed data. Training a neural network requires the user to specify the network structure and set the learning parameters. Heuristics proposed by a number of researchers for determining the optimum values of the network parameters are compared on datasets. Training and test samples were collected for each of the 12 class types. After preliminary statistical tests on the training samples, two modifications of the classification scheme were defined: the first produced a scheme of 7 classes, and the second a scheme of 5 classes. Test results show that using ANNs on the basis of the 5-class scheme yields higher classification accuracy than either alternative. The visual analysis of the classification results using geoinformation technologies is described in detail. The text was submitted by the authors in English.

17.
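Previous algorithms for communication network fault classification did not consider the latent features in alarm and fault data, which led to low classification accuracy. A communication network fault classification algorithm based on data mining is therefore proposed. First, based on an understanding of the data background and characteristics, feature construction is used to mine latent features, which are added to the original data. Next, the feature importance evaluation function of the LightGBM algorithm is used to assess the importance of every feature in the new dataset, and unimportant features are removed according to their importance values. Finally, an ensemble learning model classifies faults on the filtered dataset. Experimental results show that the data-mining-based communication network fault classification algorithm achieves better accuracy.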

18.
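Text representation and text feature extraction are the core problems that text classification must solve. Accordingly, a text classification model based on an improved continuous bag-of-words (CBOW) model and ABiGRU is proposed. The model uses the word vectors trained by the improved CBOW model as the word embedding layer; the convolution and pooling layers of a convolutional neural network, followed by a bidirectional gated recurrent unit (BiGRU) network with an attention mechanism, then extract the text features. The text feature vector is fed to a softmax classifier for classification. Text classification experiments on three corpora show that the proposed method outperforms other text classification algorithms.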

19.
王刚  王本年 《微机发展》2008,18(2):119-121
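A fuzzy neural network is a neural network whose input signals are fuzzy quantities; it is the product of combining fuzzy systems with neural networks and brings together the advantages of both. The genetic algorithm is an adaptive, global-optimization probabilistic search algorithm. An algorithm that fuses the fuzzy neural network with the genetic algorithm is studied: before the fuzzy neural network is applied to data mining, the genetic algorithm trains the membership functions so that the fuzzy neural network learns better; after the fuzzy neural network has learned, the relevant rules are extracted and the genetic algorithm is applied again to prune them, which improves data mining efficiency. Experiments show that, compared with traditional methods, the method mines data faster and more accurately and extracts more precise inference rules.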

20.
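Objective: Compared with traditional classification methods, deep-learning-based hyperspectral image classification can extract deeper features of hyperspectral images. To address the simple network structures and insufficient feature extraction of existing deep learning classification methods, a data augmentation method that stacks spatially transformed pixel information is proposed to solve the shortage of training samples, and a hyperspectral image classification model based on a dual-channel 3D convolutional neural network with different scales is proposed to extract the essential spectral-spatial features of hyperspectral images. Method: Each pixel of the hyperspectral image and its neighboring pixels are rotated, subjected to row and column transformations, and so on, which enriches the latent spatial information of the center pixel and augments the dataset. The augmented pixel blocks are fed into the dual-channel 3D convolutional neural network with different scales to learn deep features of the training set and achieve higher classification accuracy. Result: Averaged over 5 repeated experiments, with 10% of the samples randomly selected for training and 8-fold data augmentation, overall classification accuracy reaches 98.34% on the Indian Pines dataset and 99.63% on the Pavia University dataset. The running times of different algorithms are also compared: while maintaining classification accuracy, the proposed algorithm runs faster than the compared algorithms, which ensures the stability and efficiency of the classification model. Conclusion: The proposed hyperspectral image classification model based on a dual-channel convolutional neural network both solves the shortage of training samples and combines the spectral and spatial features of hyperspectral images, improving classification accuracy.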
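The patch-level augmentation described in the abstract above (rotations and row/column transformations around a center pixel) can be sketched for a single 2-D band. The abstract does not list the exact transformations, so the choice below, four 90-degree rotations each with and without a row flip, is an assumption that happens to give 8 variants; the function names `rotate90`, `flip_rows`, and `augment` are hypothetical, and a real hyperspectral patch would carry a spectral axis as well.

```python
def rotate90(patch):
    """Rotate a 2-D patch (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def flip_rows(patch):
    """Reverse the row order (a vertical flip)."""
    return [list(row) for row in patch[::-1]]


def augment(patch):
    """Return 8 variants: 4 rotations, each with and without a row flip.

    For an odd-sized square patch the center pixel stays at the center in
    every variant, so the label of the center pixel is unchanged.
    """
    variants = []
    current = [list(row) for row in patch]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_rows(current))
        current = rotate90(current)
    return variants
```

Each labeled patch then contributes 8 training samples that share the same center-pixel label but present its neighborhood in different orientations.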

