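Identification of the same product feature based on multi-dimension similarity and sentiment word expansion
Cite this article: HU Longmao, HU Xuegang. Identification of the same product feature based on multi-dimension similarity and sentiment word expansion[J]. 山东大学学报(工学版) (Journal of Shandong University, Engineering Science), 2020, 50(2): 50-59.
Authors: HU Longmao, HU Xuegang
Affiliations: School of Computer and Information, Hefei University of Technology, Hefei 230601, Anhui, China; Anhui Finance and Trade Vocational College, Hefei 230601, Anhui, China; School of Computer and Information, Hefei University of Technology, Hefei 230601, Anhui, China
Funding: National Natural Science Foundation of China (61673152); Key Natural Science Research Project of Anhui Higher Education Institutions (KJ2017A858)
Abstract: Existing methods for recognizing identical product features are constrained by limited dictionary coverage or corpus size. To address this shortcoming, a recognition method based on multi-dimensional similarity and sentiment word expansion is proposed. A bi-directional long short-term memory and conditional random field (Bi-LSTM-CRF) model extracts expanded sentiment words for the product features. The morpheme similarity, Tongyici Cilin (synonym thesaurus) similarity, and term frequency-inverse document frequency (TF-IDF) cosine similarity of the feature words are then combined, and the K-medoids clustering algorithm is used to recognize identical product features. Experimental results show that, on the mobile phone and laptop datasets, the method reaches maximum adjusted Rand index values of 0.579 and 0.5959, respectively, and minimum entropy values of 0.7826 and 0.7457, respectively. Both results are better than those of the three baseline methods: adjusted Jaccard similarity combined with morphemes, Word2Vec similarity, and Word2Vec similarity based on bisecting K-means.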

Keywords: product feature; sentiment word expansion; Bi-LSTM-CRF; multi-dimension; similarity calculation
Received: 2019-07-17
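
The similarity fusion and clustering steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: morpheme similarity is approximated by the Dice overlap of the characters in two words, the TF-IDF vectors are built from the sentiment words that co-occur with each feature, the two similarities are fused with equal weights (`w_morph`, `w_tfidf`), the thesaurus-based similarity is left out because it needs an external lexicon, and K-medoids starts from the first `k` items. The sample feature and sentiment words are hypothetical.

```python
import math
from collections import Counter


def morpheme_similarity(a, b):
    """Dice overlap of the characters in two words (assumed stand-in for morpheme similarity)."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa and sb else 0.0


def tfidf_vectors(contexts):
    """Map each feature word to a TF-IDF vector over its sentiment words (smoothed IDF)."""
    n = len(contexts)
    df = Counter(w for words in contexts.values() for w in set(words))
    vectors = {}
    for feature, words in contexts.items():
        tf = Counter(words)
        total = len(words) or 1  # avoid dividing by zero for a feature without context
        vectors[feature] = {
            w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1.0)
            for w, c in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def k_medoids(items, dist, k, max_iter=100):
    """Alternate between assigning items to the nearest medoid and re-picking medoids."""
    medoids = list(items[:k])  # assumption: deterministic first-k initialization
    clusters = {}
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for x in items:
            clusters[min(medoids, key=lambda m: dist(x, m))].append(x)
        # New medoid = member with the smallest total distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist(c, o) for o in members)) if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters


def cluster_features(contexts, k, w_morph=0.5, w_tfidf=0.5):
    """Cluster feature words using a weighted sum of the two similarities (weights assumed)."""
    vectors = tfidf_vectors(contexts)

    def dist(a, b):
        sim = w_morph * morpheme_similarity(a, b) + w_tfidf * cosine(vectors[a], vectors[b])
        return 1.0 - sim

    return k_medoids(list(contexts), dist, k)


# Hypothetical feature words with the sentiment words observed next to them.
contexts = {
    "屏幕": ["清晰", "大", "清晰"],    # screen: clear, large, clear
    "显示屏": ["清晰", "大"],          # display: clear, large
    "电池": ["耐用", "持久"],          # battery: durable, long-lasting
    "电量": ["耐用", "持久", "足"],    # battery charge: durable, long-lasting, ample
}
for medoid, members in cluster_features(contexts, k=2).items():
    print(medoid, members)
```

A thesaurus-based term could be added to `dist` in the same way, with its own weight, once a lookup for the lexicon is available.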

Identification of the same product feature based on multi-dimension similarity and sentiment word expansion
Longmao HU, Xuegang HU. Identification of the same product feature based on multi-dimension similarity and sentiment word expansion[J]. Journal of Shandong University (Engineering Science), 2020, 50(2): 50-59.
Authors: Longmao HU, Xuegang HU
Affiliation: 1. School of Computer and Information, Hefei University of Technology, Hefei 230601, Anhui, China; 2. Anhui Finance and Trade Vocational College, Hefei 230601, Anhui, China
Abstract: Existing methods for identifying identical product features are limited by insufficient dictionary coverage or corpus size. To address this, an identification method based on multi-dimensional similarity and sentiment word expansion is proposed. A bi-directional long short-term memory and conditional random field (Bi-LSTM-CRF) model extracts expanded sentiment words for each product feature; the morpheme similarity, Cilin (synonym thesaurus) similarity, and term frequency-inverse document frequency (TF-IDF) cosine similarity of the feature words are then combined, and the K-medoids clustering algorithm groups words that denote the same product feature. Experimental results show that, on the mobile phone and laptop datasets, the maximum adjusted Rand index (ARI) reaches 0.579 and 0.5959, respectively, and the minimum entropy reaches 0.7826 and 0.7457, respectively. The proposed method outperforms three baselines: adjusted Jaccard similarity combined with morphemes, Word2Vec similarity, and Word2Vec similarity based on bisecting K-means.
Keywords: product feature; sentiment word expansion; Bi-LSTM-CRF; multi-dimension; similarity calculation
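
The two evaluation measures reported in the abstract, the adjusted Rand index (higher is better) and entropy (lower is better), can be computed from gold-standard and predicted cluster labels roughly as follows. This is a sketch under stated assumptions rather than the paper's evaluation code: the ARI follows the standard pair-counting definition, while the entropy shown here is the size-weighted average of per-cluster class entropy with base-2 logarithms, which may differ from the paper in log base or normalization. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb, log2


def adjusted_rand_index(truth, pred):
    """Pair-counting adjusted Rand index between two labelings of the same items."""
    n = len(truth)
    if n < 2:
        return 1.0  # degenerate case: no pairs to compare
    sum_ij = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both labelings are trivial (e.g. a single cluster each)
    return (sum_ij - expected) / (max_index - expected)


def clustering_entropy(truth, pred):
    """Size-weighted average of the class entropy inside each predicted cluster."""
    n = len(truth)
    total = 0.0
    for cluster, size in Counter(pred).items():
        classes = Counter(t for t, p in zip(truth, pred) if p == cluster)
        h = -sum((c / size) * log2(c / size) for c in classes.values())
        total += (size / n) * h
    return total


# Toy labels: four feature words, two true groups, two predicted clusters.
truth = ["screen", "screen", "battery", "battery"]
pred = [0, 0, 1, 1]
print(adjusted_rand_index(truth, pred), clustering_entropy(truth, pred))
```

A perfect clustering gives an ARI of 1 and an entropy of 0, so the reported pattern of "maximum ARI, minimum entropy" marks the best-performing configuration.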
This article is indexed in CNKI, Wanfang Data, and other databases.