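A Chinese Product Attribute Extraction Algorithm Based on a Mutual Self-Expanding Mode
Cite this article: Yu Mingzhen (于明朕), Na Risa (那日萨). A Chinese product attribute extraction algorithm based on a mutual self-expanding mode [J]. Application Research of Computers (计算机应用研究), 2017, 34(4).
Authors: Yu Mingzhen (于明朕)  Na Risa (那日萨)
Affiliation: Faculty of Management and Economics, Dalian University of Technology; Faculty of Management and Economics, Dalian University of Technology
Funding: General Program of the National Natural Science Foundation of China; Humanities and Social Sciences Research Planning Fund of the Ministry of Education
Abstract: To extract product attribute words from Chinese online reviews, this paper proposes a semi-supervised learning method based on a mutual self-expanding mode. With little manual involvement, seed attribute words are obtained by mining frequent itemsets with the FP-Growth algorithm, and new attribute words are then discovered through incremental iteration. In each iteration, the confidence of the extracted words and of the extraction patterns is computed, which safeguards the accuracy of the algorithm and avoids topic drift. Finally, compound extracted words are obtained through similar extraction patterns, which greatly reduces attribute-word mining errors caused by word segmentation and part-of-speech tagging mistakes and trades a small loss in precision for higher recall. Experimental results show that the algorithm reaches an F-score of 78.97% for product attribute extraction, better than other similar extraction algorithms in the literature.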

Keywords: online reviews  product attribute extraction  mutual self-expansion  FP-Growth algorithm  confidence
Received: 2016/3/21
Revised: 2016/5/6
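
The seed step in the abstract (frequent itemsets over review text yield candidate seed attribute words) can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: each review sentence is already reduced to a list of nouns, the toy reviews and the 0.4 support threshold are invented, and brute-force counting stands in for FP-Growth. For a given threshold, FP-Growth finds the same frequent itemsets but does so without enumerating candidates.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items.

    Brute-force support counting, used here as a stand-in for FP-Growth.
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore repeats inside one sentence
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    # Keep only itemsets whose relative frequency reaches the threshold.
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical toy data: nouns left in each review sentence after
# word segmentation and part-of-speech tagging.
reviews = [
    ["screen", "battery"],
    ["screen", "price"],
    ["battery", "screen"],
    ["camera"],
    ["screen", "battery", "price"],
]

seed_candidates = frequent_itemsets(reviews, min_support=0.4)
for itemset, support in sorted(seed_candidates.items()):
    print(itemset, round(support, 2))
```

In this toy run "camera" falls below the threshold and is dropped, while "screen", "battery", and "price" survive as seed candidates. The abstract mentions a small amount of manual involvement at this stage, which the sketch leaves out.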

A feature extraction method based on mutual self-expanding mode
Affiliation:Faculty of Management and Economics, Dalian University of Technology; Faculty of Management and Economics, Dalian University of Technology
Abstract:This paper proposes a method based on mutual self-expansion for extracting product features from Chinese product reviews. With little manual work, the method finds seed features using FP-Growth and then discovers new features through an incremental iterative procedure. During the iteration, the confidence coefficients of the extracted words and the extraction modes ensure high precision and prevent topic drift. Finally, the method finds combined extracted words through similar extraction modes, which reduces many feature extraction mistakes caused by errors in word segmentation and part-of-speech tagging, and achieves higher recall at the cost of a small loss in precision. The experimental results indicate that the F-score of the proposed method for product feature extraction reaches 78.97%, better than the other similar extraction methods in the literature.
Keywords:online comment  product features extraction  mutual self-expanding  FP-Growth method  confidence coefficient
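
The iterative step in the abstract, in which extracted words and extraction modes reinforce each other under a confidence check, can be sketched as a small bootstrapping loop. The abstract does not give the confidence formulas, so the ones below are assumptions chosen for illustration: a mode's confidence is the share of its extracted words that are already accepted, and a word's confidence is the summed confidence of the accepted modes that extract it, divided by the number of modes that extract it. The mode names, word pairs, and thresholds are hypothetical.

```python
def bootstrap(pairs, seeds, mode_threshold=0.5, word_threshold=0.5, max_rounds=10):
    """Grow a word set and a mode set from seeds until nothing changes.

    pairs: iterable of (mode, word) co-occurrences found in the corpus.
    The confidence formulas are illustrative assumptions, not the paper's.
    """
    words_by_mode, modes_by_word = {}, {}
    for mode, word in pairs:
        words_by_mode.setdefault(mode, set()).add(word)
        modes_by_word.setdefault(word, set()).add(mode)

    words, modes = set(seeds), set()
    for _ in range(max_rounds):
        # Mode confidence: fraction of the mode's words already accepted.
        mode_conf = {m: len(ws & words) / len(ws) for m, ws in words_by_mode.items()}
        new_modes = {m for m, c in mode_conf.items() if c >= mode_threshold}

        # Word confidence: accepted-mode confidence averaged over all
        # modes extracting the word, so low-quality modes pull it down.
        new_words = set(words)
        for word, ms in modes_by_word.items():
            if word in words:
                continue
            conf = sum(mode_conf[m] for m in ms if m in new_modes) / len(ms)
            if conf >= word_threshold:
                new_words.add(word)

        if new_words == words and new_modes == modes:
            break  # fixed point reached
        words, modes = new_words, new_modes
    return words, modes


# Hypothetical (mode, word) pairs; "X" marks the slot filled by the word.
pairs = [
    ("X_is_clear", "screen"), ("X_is_clear", "display"),
    ("X_lasts_long", "battery"), ("X_lasts_long", "standby"),
    ("bought_X", "phone"), ("bought_X", "yesterday"), ("bought_X", "gift"),
]
attr_words, kept_modes = bootstrap(pairs, seeds={"screen", "battery"})
print(sorted(attr_words))
print(sorted(kept_modes))
```

In this toy run the generic mode `bought_X` never reaches the threshold, so the words it extracts are not admitted. This is the kind of topic-drift control the abstract attributes to the confidence check.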