Similar Documents
10 similar documents found (search time: 140 ms)
1.
An Empirical Study of Machine-Learning-Based Sentiment Classification of Chinese Microblogs   Cited by: 3 (self-citations: 0, others: 3)
This paper reports an empirical study of microblog sentiment classification using three machine learning algorithms, three feature selection algorithms, and three feature weighting schemes. The results show that, depending on the feature weighting scheme, support vector machines (SVM) and Naive Bayes each have their strengths, while information gain (IG) clearly outperforms the other feature selection methods. Considering all three factors together, the combination of SVM, IG, and TF-IDF (Term Frequency-Inverse Document Frequency) feature weights yields the best sentiment classification performance on microblogs. For the movie domain, the transferability of classification models between microblog comments and ordinary reviews was also compared; the results indicate that sentiment classification performance depends on the style of the reviews.
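As a rough sketch of the information gain (IG) criterion the study favors, the snippet below computes the entropy reduction from splitting a corpus on the presence of a term. The toy corpus, labels, and terms are illustrative only, not the paper's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of a term: reduction in label entropy from splitting the
    corpus on presence vs. absence of the term."""
    parts = ([l for d, l in zip(docs, labels) if term in d],
             [l for d, l in zip(docs, labels) if term not in d])
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in parts if p)
    return entropy(labels) - conditional

# Toy corpus: bag-of-words documents with sentiment labels.
docs = [{"good", "movie"}, {"bad", "plot"}, {"good", "acting"}, {"bad", "movie"}]
labels = ["pos", "neg", "pos", "neg"]
```

Here "good" and "bad" perfectly separate the classes (IG = 1 bit), while "movie" carries no class information (IG = 0), which is exactly why an IG filter would keep the former and drop the latter.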

2.
A neural network ensemble based on rough-set reducts is proposed to decrease the computational complexity of conventional ensemble feature selection algorithms. First, a dynamic reduction technique that combines a genetic algorithm with resampling is used to obtain reducts with good generalization ability. Second, multiple BP neural networks, each built on a different reduct, serve as base classifiers. Following the idea of selective ensembles, the ensemble with the best generalization ability is found by a search strategy. Finally, classification is performed by combining the predictions of the component networks through voting. The method was verified in experiments on a remote sensing image and five UCI datasets. Compared with conventional ensemble feature selection algorithms, it costs less time, has lower computational complexity, and achieves satisfactory classification accuracy.
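The selective-ensemble step can be sketched independently of the networks themselves: given each base classifier's predictions on a validation set, search for the subset whose majority vote generalizes best. The exhaustive search and toy predictions below are a stand-in for the paper's search strategy, not its implementation.

```python
from collections import Counter
from itertools import combinations

def vote(predictions):
    """Column-wise majority vote over per-classifier prediction lists."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def select_ensemble(all_preds, truth, size):
    """Pick the subset of base classifiers whose voted predictions score
    best on held-out labels (exhaustive search for illustration)."""
    best = max(combinations(range(len(all_preds)), size),
               key=lambda idx: accuracy(vote([all_preds[i] for i in idx]), truth))
    return list(best)

# Toy validation-set predictions of four base classifiers (one per reduct).
preds = [[1, 0, 1, 1], [1, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
truth = [1, 0, 1, 1]
```

In practice the exhaustive search would be replaced by the heuristic search the abstract mentions, since the number of subsets grows exponentially with the ensemble size.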

3.
To reduce the computational complexity of ensemble feature selection methods, a neural network ensemble classification method based on rough-set reduction is proposed. The method first obtains stable attribute reducts with strong generalization ability through a dynamic reduction technique that combines genetic-algorithm-based reduction with resampling. Then, BP networks built on different reducts serve as the base classifiers to be ensembled, and, following the idea of selective ensembles, a search strategy is used to find the ensemble network with the best generalization performance. Finally, classification is carried out by majority voting over the neural network ensemble. The method was verified in a classification experiment on 7-band Landsat remote sensing imagery of a study area. Because rough-set reduction filters out a large number of feature subsets with poor classification performance, the method incurs less time overhead and lower computational complexity than traditional ensemble feature selection methods while delivering satisfactory classification performance.

5.
Feature subset selection aimed at reducing the dependency of feature selection techniques and obtaining a high-quality minimal feature subset from a real-world domain is the main task of this research. To this end, two types of feature representation are first presented: unigram-based and part-of-speech-based feature sets. Second, five feature ranking methods are employed to create feature vectors. Finally, we propose two methods for integrating feature vectors and feature subsets. An ordinal-based integration of different feature vectors (OIFV) is proposed to obtain a new feature vector that depends on the order of features in the old vectors. A frequency-based integration of different feature subsets (FIFS) is then proposed, using the most effective features obtained from hybrid filter and wrapper methods in the feature selection task. In addition, four well-known text classification algorithms serve as classifiers in the wrapper method for selecting the feature subsets. A wide range of comparative experiments was carried out on five widely used sentiment analysis datasets. The experiments demonstrate that the proposed methods can effectively improve the performance of sentiment classification, and that the proposed part-of-speech patterns yield higher classification accuracy than unigram-based features.
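The ordinal integration idea, combining several ranked feature lists into one vector based on feature positions, can be sketched as a Borda-style rank merge. This is only an interpretation of OIFV's order-based integration; the feature names and rankings below are made up.

```python
def ordinal_merge(rankings):
    """Merge several ranked feature lists by summed rank position
    (lower total = better); ties broken alphabetically."""
    totals = {}
    for ranking in rankings:
        for pos, feat in enumerate(ranking):
            totals[feat] = totals.get(feat, 0) + pos
    return sorted(totals, key=lambda f: (totals[f], f))

# Hypothetical outputs of two different feature-ranking methods.
r1 = ["great", "awful", "plot", "film"]
r2 = ["awful", "great", "film", "plot"]
```

Features ranked highly by several methods float to the top of the merged list, which is the point of integrating rankers with different biases.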

6.
This work combines word embeddings and a convolutional neural network (CNN) for sentiment classification. First, the Skip-Gram model is used to train a word embedding for each word in the dataset; the embeddings of the words in each sample are then stacked into a two-dimensional feature matrix that serves as the CNN input, and the input features are also updated as parameters during each training iteration. Second, a network architecture with convolution kernels of three different sizes is designed to automatically extract multiple kinds of local abstract features. Compared with traditional machine learning methods, the proposed word-embedding-and-CNN sentiment classification model improves classification accuracy by 5.04%.
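The core operation, sliding kernels of several widths over the stacked embedding matrix and max-pooling each one over time, can be shown without a deep learning framework. The hand-written kernels and two-dimensional "embeddings" below are illustrative; a real model would learn them.

```python
def conv_max(seq, kernel):
    """Valid 1-D convolution of word-vector rows with one (width x dim)
    kernel, followed by max-over-time pooling."""
    width, dim = len(kernel), len(kernel[0])
    return max(
        sum(seq[i + j][k] * kernel[j][k] for j in range(width) for k in range(dim))
        for i in range(len(seq) - width + 1)
    )

def multi_kernel_features(seq, kernels):
    """One pooled feature per kernel; kernels of several widths mirror
    the multiple-kernel-size design described above."""
    return [conv_max(seq, k) for k in kernels]

# Toy 2-dimensional "embeddings" for a three-word sentence.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [[[1.0, 1.0]],               # width 1: unigram-like filter
           [[1.0, 0.0], [0.0, 1.0]]]   # width 2: bigram-like filter
```

Each kernel width captures local patterns over a different n-gram span, and the pooled features would feed a final classification layer.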

7.
Sentiment classification is a challenging research focus in natural language processing; this paper studies semi-supervised text sentiment classification. Traditional Co-training-based semi-supervised sentiment classification requires texts with a large set of useful attributes, its training process has linear time complexity, and it is unsuitable for imbalanced corpora. This paper proposes a semi-supervised sentiment classification method based on a voting ensemble of multiple classifiers: a set of diverse sub-classifiers is built by choosing different training sets, feature parameters, and classification methods, and in each round simple voting selects the highest-confidence samples, doubling the training set and updating the training model. The method lets sub-classifiers share useful attribute sets, has logarithmic time complexity, and can be applied to imbalanced corpora. Experimental results show that the method performs well on sentiment classification across different languages, domains, and corpus sizes, on both balanced and imbalanced corpora.
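One round of the voting scheme can be sketched as follows: the committee's confidence in a sample is the fraction of sub-classifiers agreeing on the majority label, and only the most confident samples are pseudo-labeled. The toy committee below is a placeholder for the real sub-classifiers.

```python
from collections import Counter

def committee_vote(votes):
    """Majority label of the sub-classifier committee and its confidence
    (fraction of sub-classifiers that agree)."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)

def self_train_round(unlabeled, predict_all, threshold=1.0):
    """One round: pseudo-label the unlabeled samples the committee is most
    confident about; the caller then adds them to the training set and
    retrains before the next round."""
    selected = []
    for x in unlabeled:
        label, confidence = committee_vote(predict_all(x))
        if confidence >= threshold:
            selected.append((x, label))
    return selected

# A toy committee of three sub-classifiers encoded as one function.
predict_all = lambda x: ["pos", "pos", "pos"] if x > 0 else ["pos", "neg", "neg"]
```

Requiring unanimous agreement (threshold 1.0) keeps only samples where diverse sub-classifiers concur, which is what makes the expanded training set trustworthy.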

8.
Fang Ding, Wang Gang. 《计算机系统应用》 (Computer Systems &amp; Applications), 2012, 21(7): 177-181, 248
With the rapid development of Web 2.0, more and more users are willing to share their opinions and experiences on the Internet. Such review information is growing so fast that manual methods can no longer cope with collecting and processing the massive amount of online information, so computer-based text sentiment classification has emerged, and one of the key research goals is improving classification accuracy. Since ensemble learning is an effective way to improve classification accuracy and has shown performance superior to single classifiers in many fields, this paper proposes a text sentiment classification method based on ensemble learning. Experimental results show that three common ensemble methods, Bagging, Boosting, and Random Subspace, all improve the accuracy of the base classifiers, and that across different base classifiers Random Subspace is statistically better than Bagging and Boosting. These results further confirm the effectiveness of ensemble learning for text sentiment classification.
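Of the three ensemble methods compared, Random Subspace is the least commonly sketched: each base classifier trains on its own random subset of the feature space, and test-time predictions are combined by voting. A minimal sketch of the subspace-drawing step (the feature names are placeholders):

```python
import random

def draw_subspaces(features, n_models, size, seed=0):
    """Random Subspace: draw one random feature subset per base
    classifier; each classifier trains only on its own subset."""
    rng = random.Random(seed)
    return [sorted(rng.sample(features, size)) for _ in range(n_models)]

def project(doc, subspace):
    """Restrict a {feature: weight} document vector to one subspace,
    zero-filling features the document lacks."""
    return {f: doc.get(f, 0.0) for f in subspace}
```

Because each base classifier sees a different slice of the (typically very high-dimensional) text feature space, their errors decorrelate, which is where the ensemble's accuracy gain comes from.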

9.
Choosing appropriate classification algorithms for a given data set is important and useful in practice, but also challenging. This paper proposes a method for recommending classification algorithms. First, the feature vectors of data sets are extracted using a novel method and the performance of classification algorithms on those data sets is evaluated. Then the feature vector of a new data set is extracted and its k nearest data sets are identified, and the classification algorithms of those nearest data sets are recommended for the new data set. The proposed data set feature extraction method uses structural and statistical information to characterize data sets, which differs markedly from existing methods. To evaluate the proposed recommendation method and the feature extraction method, extensive experiments with 17 classification algorithms of different types, three types of data set characterization methods, and all possible numbers of nearest data sets were conducted on 84 publicly available UCI data sets. The results indicate that the proposed method is effective and can be used in practice.
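The recommendation step reduces to nearest-neighbor search in the space of data set meta-features. A minimal sketch, with hypothetical meta-feature vectors and per-dataset best algorithms:

```python
import math

def recommend(new_vec, known, k=3):
    """known: (dataset_feature_vector, best_algorithm) pairs. Return the
    algorithms of the k data sets whose meta-feature vectors lie nearest
    (Euclidean distance) to the new data set's vector."""
    nearest = sorted(known, key=lambda pair: math.dist(new_vec, pair[0]))[:k]
    return [algo for _, algo in nearest]

# Hypothetical meta-feature vectors and per-dataset best algorithms.
known = [([0.0, 0.0], "svm"), ([1.0, 0.0], "naive_bayes"), ([5.0, 5.0], "knn")]
```

The paper's actual contribution lies in how the meta-feature vectors are built (structural plus statistical information); this sketch only shows how they are consumed once extracted.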

10.
Nowadays, multi-label classification methods are of increasing interest in areas such as text categorization, image annotation, and protein function classification. Because of the correlation among labels, traditional single-label classification methods are not directly applicable to the multi-label classification problem. This paper presents two novel multi-label classification algorithms based on variable-precision neighborhood rough sets: multi-label classification using rough sets (MLRS) and MLRS using local correlation (MLRS-LC). The proposed algorithms consider two important factors that affect prediction accuracy: the correlation among the labels, and the uncertainty in the mapping between the feature space and the label space. MLRS takes a global view of label correlation, while MLRS-LC handles label correlation at the local level. Given a new instance, MLRS determines its location and computes the label probabilities accordingly; MLRS-LC first identifies the instance's topic and then computes the class membership probabilities within that topic. A series of experiments on seven multi-label datasets shows that MLRS and MLRS-LC achieve promising performance compared with several well-known multi-label learning algorithms.
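A central quantity in neighborhood-rough-set classification is the probability of each label among the training instances inside a new instance's neighborhood, which a variable-precision scheme then thresholds. A toy sketch of that quantity (the 1-D instances and label sets are invented, and this is not the paper's full algorithm):

```python
import math

def neighborhood_label_probs(x, train, radius):
    """train: (feature_vector, label_set) pairs. Estimate the probability
    of each label among training instances whose vectors lie within
    `radius` of x."""
    neighborhood = [labels for v, labels in train if math.dist(x, v) <= radius]
    counts = {}
    for labels in neighborhood:
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
    return {label: c / len(neighborhood) for label, c in counts.items()}

# Toy 1-D instances with multi-label annotations.
train = [([0.0], {"sports", "news"}), ([0.5], {"sports"}), ([5.0], {"music"})]
```

A label whose neighborhood probability clears the precision threshold would be assigned to the new instance; labels far below it would be rejected, and the borderline cases are where the rough-set boundary region lives.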


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号