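Robust Text Classification Using Covariate Adjustment to Control Confounding Factors
Cite this article: DONG Yuan-Yuan. Robust text classification using covariate adjustment to control confounding factors[J]. Computer Systems & Applications, 2020, 29(3): 155-160
Author: DONG Yuan-Yuan (董园园)
Affiliation: Qilu Normal University, Jinan 250013
Funding: Shandong Provincial Social Science Planning Research Project (17CTYJ03)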
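Abstract: Many current text classification methods rarely control for confounding variables, and their classification accuracy is not robust to changes in the data distribution. To address this problem, a text classification method based on covariate adjustment is proposed. First, it is assumed that the confounding factors (variables) in text classification are observable during the training stage but not during the testing stage. Then, with the classifier conditioned on the confounding factors observed during training, the prediction stage sums over the confounding factors. Finally, based on Pearl's covariate adjustment, the effect of the text features and the class variable on classifier accuracy is examined while the confounding factors are controlled. The performance of the proposed method is evaluated on a microblog (Weibo) dataset and the IMDB dataset. Experimental results show that, compared with other methods, the proposed method achieves higher classification accuracy when handling confounding relationships and is robust to confounding variables.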
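For reference, Pearl's covariate (back-door) adjustment, on which these steps rely, is usually written as below, with x the text features, y the class label and z the confounding variable (this notation is ours and may differ from the paper's full text):

```latex
P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z)
```

P(y | x, z) is learned while z is observable (training), P(z) can be estimated from training frequencies, and at prediction time z is summed out, so it does not need to be observed. The formula holds under the standard assumption that z satisfies the back-door criterion relative to (x, y).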

Keywords: covariate adjustment  confounding variables  text classification  text features  robustness
Received: 2019-04-26
Revised: 2019-05-21

Robust Text Categorization Using Covariates to Control Confounding Factors
DONG Yuan-Yuan. Robust Text Categorization Using Covariates to Control Confounding Factors[J]. Computer Systems & Applications, 2020, 29(3): 155-160
Authors: DONG Yuan-Yuan
Affiliation: Qilu Normal University, Jinan 250013, China
Abstract: Many document (text) categorization methods seldom control for confounding variables, and their accuracy is not robust to changes in the data distribution. To address this problem, a text categorization method based on covariate adjustment is proposed. First, it is assumed that the confounding factors (variables) in text categorization can be observed in the training stage but not in the testing stage. Then, with the classifier conditioned on the confounding factors observed during training, the prediction stage sums over the confounding factors. Finally, based on Pearl's covariate adjustment, the effect of the text features and the class variable on classifier accuracy is examined while the confounding factors are controlled. The performance of the proposed method is verified on a microblog dataset and the IMDB dataset. The experimental results show that the proposed method achieves higher classification accuracy and greater robustness against confounding variables than other methods.
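The train-with-confounder, sum-it-out-at-prediction procedure described in the abstract can be illustrated with a small sketch of the adjustment P(y | do(x)) = Σ_z P(y | x, z) P(z). This is not the paper's implementation. Assumptions: P(y | x, z) is a multinomial naive Bayes model with Laplace smoothing, chosen only for brevity; P(z) is estimated from training frequencies; the toy documents, the confounder values "A"/"B", and the function names `train`, `posterior_given_z` and `predict_adjusted` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(docs, alpha=1.0):
    """Estimate P(z), P(y | z) and P(w | y, z) by counting.

    docs: list of (tokens, z, y) triples. The confounder z is only
    required here, at training time (hypothetical data layout).
    """
    n = len(docs)
    z_counts = Counter(z for _, z, _ in docs)
    yz_counts = Counter((y, z) for _, z, y in docs)
    word_counts = defaultdict(Counter)  # (y, z) -> token counts
    vocab = set()
    for tokens, z, y in docs:
        word_counts[(y, z)].update(tokens)
        vocab.update(tokens)
    return {
        "p_z": {z: c / n for z, c in z_counts.items()},
        "z_counts": z_counts,
        "yz_counts": yz_counts,
        "word_counts": dict(word_counts),
        "vocab": vocab,
        "labels": sorted({y for _, _, y in docs}),
        "alpha": alpha,
    }

def posterior_given_z(model, tokens, z):
    """P(y | x, z): naive Bayes (assumed model) fit within stratum z."""
    a, V, labels = model["alpha"], len(model["vocab"]), model["labels"]
    log_scores = {}
    for y in labels:
        # Laplace-smoothed P(y | z)
        prior = (model["yz_counts"][(y, z)] + a) / (model["z_counts"][z] + a * len(labels))
        wc = model["word_counts"].get((y, z), Counter())
        total = sum(wc.values())
        score = math.log(prior)
        for w in tokens:
            if w in model["vocab"]:  # ignore words never seen in training
                score += math.log((wc[w] + a) / (total + a * V))
        log_scores[y] = score
    # Normalise in log space for numerical stability.
    m = max(log_scores.values())
    unnorm = {y: math.exp(s - m) for y, s in log_scores.items()}
    norm = sum(unnorm.values())
    return {y: v / norm for y, v in unnorm.items()}

def predict_adjusted(model, tokens):
    """Covariate adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z).

    z is summed out, so it does not have to be observed at test time.
    """
    probs = {y: 0.0 for y in model["labels"]}
    for z, pz in model["p_z"].items():
        for y, p in posterior_given_z(model, tokens, z).items():
            probs[y] += pz * p
    return probs

# Hypothetical toy data: z might be, e.g., the source of a post.
train_docs = [
    (["great", "movie"], "A", "pos"),
    (["great", "fun"], "A", "pos"),
    (["boring", "plot"], "A", "neg"),
    (["boring", "slow"], "B", "neg"),
    (["awful", "slow"], "B", "neg"),
    (["great", "acting"], "B", "pos"),
]
model = train(train_docs)
print(predict_adjusted(model, ["great"]))
```

Because z enters only through the training counts and the weights P(z), prediction needs nothing but the document's tokens. A discriminative classifier such as logistic regression could replace the naive Bayes component; the summation over z would stay the same.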
Keywords: covariate adjustment  confounding variables  text classification  text features  robustness
This article is indexed in Wanfang Data and other databases.