
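Text Classification of Modified Hybrid Feature Selection Based on Semantic Enhancement
Cite this article: GAO Jie-yun, ZHAO Feng-yu, LIU Ya. Text Classification of Modified Hybrid Feature Selection Based on Semantic Enhancement[J]. Computer Technology and Development, 2021(1).
Authors: GAO Jie-yun, ZHAO Feng-yu, LIU Ya
Affiliation: School of Optoelectronic Information and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Funding: National Natural Science Foundation of China (61803264)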
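Abstract: A core problem in text classification is extracting the key features that characterize a text and capturing the mapping from those features to categories. The traditional bag-of-words model is simple because it treats every word as a feature, but its computational cost grows with the number of features and with the number of text-feature relationships, and it ignores the semantic relationships among the features themselves, which are what would capture the correlation between texts and features. To address this problem, we propose an enhanced hybrid feature selection method that first reduces dimensionality through hybrid feature selection and then uses word embeddings to semantically enhance low-frequency words. To verify the effect of enhanced hybrid feature selection on text classification, we designed two experiments in which an LSTM model is trained and tested for classification. Experiments on 71,825 crawled news texts show that the semantics-based enhanced hybrid feature selection method improves classification efficiency while maintaining classification accuracy.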
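The abstract does not specify which filters make up the hybrid feature selection or exactly how word vectors are used to enhance low-frequency words, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a document-frequency filter followed by chi-square ranking as the two selection stages, and an enhancement rule that maps each low-frequency word to its most similar retained feature by cosine similarity. The function names (`hybrid_select`, `enhance`), the parameters `min_df`, `top_k` and `threshold`, and the toy vectors are all hypothetical; the LSTM classifier is omitted.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, cls):
    """Chi-square score of `term` for class `cls` from a 2x2 contingency table."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)      # term, class
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)      # term, other
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)  # no term, class
    d_ = n - a - b - c                                                      # no term, other
    denom = (a + c) * (b + d_) * (a + b) * (c + d_)
    return 0.0 if denom == 0 else n * (a * d_ - b * c) ** 2 / denom


def hybrid_select(docs, labels, min_df=2, top_k=1000):
    """Assumed two-stage selection: drop rare terms (document frequency < min_df),
    then keep the top_k remaining terms by their best per-class chi-square score."""
    df = Counter(t for d in docs for t in set(d))
    frequent = [t for t, f in df.items() if f >= min_df]
    low_freq = {t for t, f in df.items() if f < min_df}
    classes = set(labels)
    score = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in frequent}
    # Ties are broken alphabetically so the result is deterministic.
    selected = sorted(frequent, key=lambda t: (-score[t], t))[:top_k]
    return selected, low_freq


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def enhance(doc, selected, low_freq, vectors, threshold=0.8):
    """Count selected features in `doc`; each low-frequency word that has a word
    vector adds one count to its most similar selected feature, provided the
    cosine similarity reaches `threshold` (a hypothetical enhancement rule)."""
    counts = Counter(t for t in doc if t in selected)
    candidates = [f for f in selected if f in vectors]
    for w in doc:
        if w in low_freq and w in vectors and candidates:
            best = max(candidates, key=lambda f: cosine(vectors[w], vectors[f]))
            if cosine(vectors[w], vectors[best]) >= threshold:
                counts[best] += 1
    return [counts[f] for f in selected]


# Toy usage: four labelled "news" texts and hand-made 4-D word vectors.
docs = [
    ["stock", "market", "rise"],
    ["stock", "market", "fall"],
    ["match", "team", "goal"],
    ["match", "team", "win"],
]
labels = ["finance", "finance", "sports", "sports"]
vectors = {
    "stock": [1, 0, 0, 0], "market": [0, 1, 0, 0],
    "match": [0, 0, 1, 0], "team": [0, 0, 0, 1],
    "rise": [0.1, 0.9, 0, 0], "goal": [0, 0, 0.9, 0.1],
}
selected, low_freq = hybrid_select(docs, labels, min_df=2, top_k=4)
features = enhance(["stock", "rise", "goal"], selected, low_freq, vectors)
print(selected, features)  # ['market', 'match', 'stock', 'team'] [1, 1, 1, 0]
```

Here the rare words "rise" and "goal" are discarded by the frequency filter, yet they still contribute to the final vector through their nearest retained features ("market" and "match"). This keeps the feature space small while recovering some of the signal that pure dimensionality reduction would lose, which is the trade-off the abstract describes. In a full pipeline such vectors, or the corresponding token sequences, would then be passed to the LSTM classifier.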

Keywords: hybrid feature selection; semantic analysis; word embedding; text classification; LSTM

This article is indexed in the VIP (维普) database, among others.