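To identify spam product reviews, this paper starts from the basic idea that most reviews posted by review spammers are spam, and proposes a method that decides whether a reviewer is a spammer from his or her reviewing behavior. Three common spammer behaviors are analyzed and defined: posting spam reviews on products of the same category, on products of the same brand, and on products from the same seller. While modeling these three behaviors, the paper also proposes a method that computes a reviewer's spam score from the number of reviews carrying repeated excessively high or excessively low ratings. The experimental data comprise all reviews written by reviewers who had posted in the photography and video equipment section of Dangdang. The results were evaluated by manual judgment and by computing NDCG (normalized discounted cumulative gain), and show that the method is accurate and effective.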
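
The abstract does not give the exact spam-score formula, so the sketch below is only one plausible reading under stated assumptions: a review counts toward a reviewer's score if it carries an extreme rating (assumed here to be 1 or 5 stars) that the reviewer repeats at least twice within the same category, brand, or seller, and the score is the fraction of such reviews. The `ndcg` helper shows how the resulting reviewer ranking can be checked against human relevance judgments. The record layout and the names `spam_scores`, `dcg`, and `ndcg` are hypothetical.

```python
import math

# Hypothetical review record: (reviewer, category, brand, seller, rating on a 1-5 scale)
def spam_scores(reviews, extremes=(1, 5), min_repeat=2):
    """Illustrative spam score: the fraction of a reviewer's reviews that carry an
    extreme rating repeated at least `min_repeat` times within the same category,
    brand, or seller."""
    by_reviewer = {}
    for r in reviews:
        by_reviewer.setdefault(r[0], []).append(r)
    scores = {}
    for reviewer, rs in by_reviewer.items():
        flagged = set()
        for key in (1, 2, 3):  # 1 = category, 2 = brand, 3 = seller
            groups = {}
            for i, r in enumerate(rs):
                if r[4] in extremes:
                    groups.setdefault((r[key], r[4]), []).append(i)
            for idxs in groups.values():
                if len(idxs) >= min_repeat:
                    flagged.update(idxs)
        scores[reviewer] = len(flagged) / len(rs)
    return scores

def dcg(rels):
    # Linear-gain DCG; some papers use 2**rel - 1 as the gain instead.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

def ndcg(rels):
    """NDCG of a ranking, given the human-judged relevance of each ranked item."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

Reviewers would be sorted by descending score, and `ndcg` applied to the judges' relevance labels in that order.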

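After purchasing a product, consumers often post reviews that contain positive or negative evaluations of it. Some unscrupulous merchants post fake reviews to promote their own products or disparage competitors' products for illegitimate gain, so fake product reviews need to be identified. Because review text is the most readily available data for analysis, this paper surveys research on text-based fake review detection, which falls mainly into traditional machine learning methods and deep learning methods.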

A Survey of Product Review Spam Detection Research
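Spam product reviews on the Internet confuse and mislead potential consumers. The goal of product review spam detection is to find and remove spam reviews from review text, keeping genuine product reviews for users to consult. The paper first compares product review spam with other common kinds of Internet spam, and compares review spam detection with related work such as review quality assessment and review sentiment analysis. It then summarizes and analyzes related work from two perspectives: datasets and detection methods. Finally, it raises several open problems in review spam detection that merit further attention.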

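Statistical features of words are widely used in natural language processing. To study how statistical features affect the accuracy of keyword extraction and text classification, this paper analyzes eight common statistical features and examines their role in sentiment analysis through sentiment-word extraction and product review classification. The sentiment-word extraction experiments show that combining statistical features with part of speech yields an extraction accuracy of 76.4%, significantly higher than algorithms based on statistical features alone or on part of speech alone. The product review classification tests show that, compared with traditional word-based sentiment classification, classification based on statistical features improves accuracy by 10.8%. Building the text vector space model from the eight statistical features, instead of from individual words, reduces the dimensionality of text vectors and achieves a compression effect similar to latent semantic analysis (LSA/SVD). It lowers algorithmic complexity while maintaining classification accuracy, and can therefore replace the traditional vector space model.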
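
The abstract does not list the eight statistical features, so the following is a hypothetical illustration of the idea only. Each word is described by a few corpus statistics (here normalized term frequency, document-frequency ratio, IDF, and TF-IDF, all chosen as assumptions), and a document is represented by their averages. The vector length stays fixed at four however large the vocabulary grows, which is the kind of dimensionality reduction the abstract describes.

```python
import math
from collections import Counter

def document_frequencies(corpus):
    """Number of documents (token lists) containing each word."""
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    return df

def doc_vector(doc, df, n_docs):
    """Fixed-length (4-d) document vector: the mean over the document's distinct
    words of normalized TF, DF ratio, IDF and TF-IDF. Every word is assumed to
    occur in the corpus used to build `df`."""
    tf = Counter(doc)
    rows = []
    for word, count in tf.items():
        tf_norm = count / len(doc)
        idf = math.log(n_docs / df[word])
        rows.append((tf_norm, df[word] / n_docs, idf, tf_norm * idf))
    return [sum(col) / len(rows) for col in zip(*rows)]
```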

Sentiment Analysis of Product Reviews Based on LSTM
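With the development of e-commerce, a large volume of product review text has been produced. Because product reviews are short texts, lexicon-based sentiment classification depends heavily on sentiment lexicon resources, while machine learning methods require complex manual feature design and extraction. This paper uses a Long Short-Term Memory (LSTM) text classification algorithm for sentiment polarity analysis. Review texts are first segmented into words and converted with Word2vec into word vectors that a computer can process. These vectors are fed into an LSTM network, with Dropout added to prevent overfitting, to obtain the final classification model. Experiments show that, for deep-learning-based sentiment polarity analysis of product reviews, exploiting the distinctive memory properties of the LSTM network achieves very good results, with accuracy above 99%.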
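
For reference, the standard LSTM cell equations are given below, where $x_t$ is the Word2vec vector of the $t$-th token. The abstract does not state which LSTM variant, output layer, or dropout placement the authors used, so classifying from the last hidden state $h_T$ and applying inverted dropout to it are assumptions of this sketch.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)\\
\tilde{h}_T &= \frac{m \odot h_T}{1-p}, \quad m_j \sim \mathrm{Bernoulli}(1-p) && \text{dropout, training only}\\
\hat{y} &= \operatorname{softmax}(W_y \tilde{h}_T + b_y)
\end{aligned}
```

At inference time the dropout mask is dropped and $\tilde{h}_T = h_T$; dividing by $1-p$ during training keeps the expected activation unchanged between the two modes.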

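With the development of the mobile Internet, subjective short texts such as product reviews have grown rapidly, and their sheer volume makes manual management increasingly difficult. This paper performs sentiment analysis on product reviews. Because reviews are short texts, it proposes two word-vector-based feature extraction methods, word vector summation and weighted word vectors, to extract deeper features from short texts. A comparison of sentiment analysis model performance demonstrates the effectiveness of the proposed methods. Sentiment analysis can thus classify massive numbers of product reviews, a task that is impractical to do manually, and help users quickly obtain useful information.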
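
A minimal sketch of the two feature constructions, assuming pre-trained word vectors are available as a plain dictionary and that per-word weights (for example TF-IDF values) are supplied by the caller. The abstract does not specify the weighting scheme, and the function names are hypothetical.

```python
def sum_vectors(tokens, emb):
    """Word vector summation: add the vectors of all in-vocabulary tokens."""
    dim = len(next(iter(emb.values())))
    out = [0.0] * dim
    for t in tokens:
        if t in emb:  # skip out-of-vocabulary tokens
            out = [a + b for a, b in zip(out, emb[t])]
    return out

def weighted_average(tokens, emb, weight):
    """Weighted word vectors: weighted mean, with weights defaulting to 1.0."""
    dim = len(next(iter(emb.values())))
    out, total = [0.0] * dim, 0.0
    for t in tokens:
        if t in emb:
            w = weight.get(t, 1.0)
            out = [a + w * b for a, b in zip(out, emb[t])]
            total += w
    return [a / total for a in out] if total > 0 else out
```

The summed vector grows with review length, while the weighted mean does not, so the second form is easier to compare across reviews of very different lengths.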

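As e-commerce rating systems mature, online reviews play an important role in guiding consumers' purchases. However, consumers cannot easily find, among large numbers of reviews, the product attributes they directly care about (e.g., the attribute "battery" of a mobile phone) and the evaluations of those attributes (e.g., "the battery capacity is large"). Knowledge-base construction and traditional machine learning methods require manually summarizing complex features and rules to extract attributes and their evaluations. This paper instead combines word embeddings with a bidirectional long short-term memory network (Bi-LSTM) and a conditional random field (CRF). Because attributes in reviews are mostly nouns and attribute evaluations are mostly adjectives, part-of-speech features are added to the Bi-LSTM+CRF model. This enables automatic extraction of product attributes and attribute evaluations without hand-written rules, with better cross-domain generality. Tests on three product domains (cameras, menswear, and child safety seats) yield a macro precision of 86.74% and a macro recall of 85.89%.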
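
In a Bi-LSTM+CRF tagger, the Bi-LSTM produces a score for every tag at every token, and the CRF layer picks the best-scoring tag sequence with Viterbi decoding. The sketch below shows only that decoding step, under several assumptions: a BIO tag set for attributes (`ASP`) and evaluations (`OPI`), emission scores given as dictionaries, and a hard constraint (an `I-X` tag must follow `B-X` or `I-X`) in place of the learned transition matrix a real CRF would use. The part-of-speech features described in the abstract would enter upstream, as extra inputs to the Bi-LSTM.

```python
TAGS = ["O", "B-ASP", "I-ASP", "B-OPI", "I-OPI"]
PENALTY = -1e4  # score for forbidden transitions (stands in for learned CRF transitions)

def transition(prev, cur):
    """0 if prev -> cur is allowed under BIO rules, else a large penalty."""
    if cur.startswith("I-") and prev not in ("B-" + cur[2:], cur):
        return PENALTY
    return 0.0

def viterbi(emissions, tags=TAGS):
    """emissions: one {tag: score} dict per token, e.g. Bi-LSTM outputs.
    Returns the highest-scoring tag sequence."""
    # A sequence may not start with an I- tag.
    score = {t: emissions[0][t] + (PENALTY if t.startswith("I-") else 0.0) for t in tags}
    backpointers = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: score[p] + transition(p, cur))
            new_score[cur] = score[prev] + transition(prev, cur) + em[cur]
            ptr[cur] = prev
        score = new_score
        backpointers.append(ptr)
    best = max(tags, key=score.get)
    path = [best]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Even if the first token's emission slightly favors `O`, a strong `I-ASP` score on the next token pulls the decoder to `B-ASP, I-ASP`, because `O -> I-ASP` is forbidden.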

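Reviews are important subjective information that reflects the value of things. From the user's perspective, this paper proposes an automatic method for assessing the value of product reviews based on global user intent. It first defines a simple value scheme ("useful" vs. "spam" reviews) as an exploratory first step. On this basis, an SVM classifier serves as the basic platform for value classification (a binary classification problem), and the study focuses on three types of features that affect review value: 1) attribute popularity; 2) content credibility; 3) user sentiment and opinion. These three types of user-intent features are added to text-structure features to judge review value, and the method is tested on a large-scale product review corpus. Experiments show that introducing user-intent features substantially improves automatic review valuation.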
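
To illustrate the SVM setup, here is a minimal linear SVM trained by Pegasos-style subgradient descent on the hinge loss, in pure Python. The three feature columns stand in for attribute popularity, content credibility, and sentiment strength; their values are invented for illustration, and the paper's actual feature definitions, kernel, and solver are not given in the abstract. For simplicity the bias is folded into the weight vector and regularized along with it.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.
    Labels must be +1 ("useful") or -1 ("spam"); a constant 1.0 feature acts as the bias."""
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(a * b for a, b in zip(w, xi))
            w = [(1 - eta * lam) * a for a in w]  # shrink (regularization step)
            if margin < 1:  # hinge loss active: move towards the correct side
                w = [a + eta * yi * b for a, b in zip(w, xi)]
    return w

def predict(w, x):
    score = sum(a * b for a, b in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In practice a library solver (for example a LinearSVC-style implementation) would replace this loop; the sketch only makes the objective concrete.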

A Survey of Quality Detection and Control for User Reviews
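With the development of network technology, more and more user-generated content appears in online applications. User reviews, which are rich in users' opinions, play an increasingly important role in the online environment. According to a 2011 survey by the U.S. company Cone, 64% of users consult existing reviews before making a purchase. Providing users with accurate, concise, and genuine reviews is therefore an urgent and important task. This paper surveys international research progress on review quality detection and control in three areas: review quality assessment, review summarization, and review spam detection, covering research content, techniques, and methods. Building on this survey, it looks ahead at the field and suggests possible research directions.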

Jin Xianghong (金相宏), Li Lin (李琳), Zhong Luo (钟珞). Computer Science (《计算机科学》), 2017, 44(10): 254-258
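With the rapid development of e-commerce, online shopping is increasingly accepted by consumers, and the resulting product reviews influence their purchase decisions. Product reviews are users' evaluations of goods posted on shopping sites, yet analysis shows that they are full of spam reviews. Identifying spam reviews has therefore become one of the key problems e-commerce must solve to improve service quality. Based on the main characteristics of spam reviews, this paper proposes LDA-SP (LDA-Sentiment Polarity), a spam review identification method. It first uses the LDA topic model to filter out content-type spam reviews, and then applies sentiment analysis to identify deceptive spam reviews. Accuracy experiments on a large set of review data from online stores show that LDA-SP identifies spam reviews more accurately than the traditional LDA topic model or sentiment polarity analysis alone. It can detect spam reviews effectively, making product review information more objective and accurate and giving e-commerce users useful reference information.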
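
The two-stage LDA-SP pipeline can be sketched as follows, assuming each review's topic distribution has already been inferred by an LDA model (not shown) and is compared with the product's topic profile. Stage one flags reviews whose topic distribution is far from the product's (Hellinger distance, an assumed choice) as content spam. Stage two flags reviews whose lexicon-based polarity contradicts their star rating as deceptive spam. The tiny lexicons, the 0.5 threshold, and the rating-to-polarity mapping are all hypothetical.

```python
import math

# Hypothetical toy sentiment lexicons; a real system would use a full lexicon.
POSITIVE = {"good", "clear", "satisfied"}
NEGATIVE = {"poor", "blurry", "disappointed"}

def hellinger(p, q):
    """Distance between two topic distributions, in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def polarity(tokens):
    """+1, -1 or 0 from a simple lexicon count."""
    s = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (s > 0) - (s < 0)

def classify_review(review_topics, product_topics, tokens, rating, max_dist=0.5):
    # Stage 1: off-topic content (topic distribution far from the product's) -> content spam.
    if hellinger(review_topics, product_topics) > max_dist:
        return "content_spam"
    # Stage 2: text polarity contradicting the star rating -> deceptive spam.
    expected = 1 if rating >= 4 else (-1 if rating <= 2 else 0)
    pol = polarity(tokens)
    if expected and pol and pol != expected:
        return "deceptive_spam"
    return "genuine"
```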

Online reviews significantly influence decision-making in many aspects of society, and the integrity of online evaluations matters to both consumers and vendors. This concern calls for effective fake review detection techniques. The goal of this study is to identify fraudulent text reviews. Shill reviews are compared with genuine reviews on sentiment and readability features, using semi-supervised language processing methods on the labeled, balanced Deceptive Opinion dataset. We analyze the textual features available in online reviews by combining sentiment mining with readability measures. The research improves fake review screening by using several transformer models: Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT (RoBERTa), XLNet (built on Transformer-XL), and XLM-RoBERTa (Cross-lingual Language Model RoBERTa). The proposed approach extracts and classifies features from product reviews to make review filtering more effective. The investigation shows that transformer models improve spam review filtering performance compared with existing machine learning and deep learning models.
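
As an example of a readability feature that can be paired with sentiment features, the sketch below computes the Flesch Reading Ease score, 206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words). The abstract does not say which readability measures were used, so this choice is an assumption, and the vowel-group syllable counter is a rough heuristic rather than a dictionary lookup.

```python
import re

def count_syllables(word):
    """Heuristic: count vowel groups, dropping a silent final 'e' (but not '-le')."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1 and not word.endswith("le"):
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) / len(sentences) - 84.6 * syllables / len(words)
```

Higher scores mean easier text; the score would be one column in a review's feature vector alongside sentiment scores.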

E-commerce websites are now a favourite way to shop comfortably at home without the burden of going to market. Their success depends on the reviews written by consumers who used particular products and then shared their experiences with them. These reviews also affect customers' buying decisions, which is why the posting of fake reviews is increasing. A product's brand competitors, or the company itself, may engage in posting fraudulent reviews to gain more profit. Such fraudulent reviews are spam reviews that distort the choices of prospective consumers, and many customers are misled by them. A person who writes fake reviews is called a spammer. Identifying spammers indirectly helps determine whether reviews are spam, so review spammer detection is a serious concern for e-commerce businesses. To help researchers in this active area, we present state-of-the-art approaches for review spammer detection. This paper gives a comprehensive survey of existing spammer detection approaches, describing the features used for individual and group spammer detection and summarizing datasets with details of their reviews, products, and reviewers. Its main aim is to provide a basic, comprehensive, and comparative study of current research on detecting review spammers with machine learning techniques, and to suggest future directions. It also provides a concise summary of published research to help researchers in this area develop new techniques.
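
Individual spammer detection typically relies on behavioral features computed per reviewer. The sketch below computes three features of that kind: maximum reviews per day, ratio of extreme 1- or 5-star ratings, and average normalized deviation from each product's mean rating. The record format and function name are hypothetical, and the surveyed papers define such features in varying ways.

```python
from collections import Counter
from statistics import mean

def reviewer_features(reviews, product_avg):
    """reviews: (date, product_id, rating) tuples for ONE reviewer, ratings on a 1-5 scale.
    product_avg: product_id -> mean rating over all reviewers."""
    per_day = Counter(date for date, _, _ in reviews)
    return {
        "max_reviews_per_day": max(per_day.values()),
        "extreme_rating_ratio": sum(r in (1, 5) for _, _, r in reviews) / len(reviews),
        # Divide by 4, the largest possible gap on a 1-5 scale, to normalise to [0, 1].
        "avg_rating_deviation": mean(abs(r - product_avg[p]) / 4 for _, p, r in reviews),
    }
```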

Research on Feature Extraction and Selection for Spam Image Discrimination*
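This paper summarizes the state of research on feature extraction and feature selection for spam image discrimination. It compares the main features used in spam image discrimination in terms of discriminability, robustness, and extraction efficiency, and analyzes their strengths and weaknesses. Drawing on classification learning algorithms and simulation results, it compares the main existing feature selection and analysis methods. This provides a useful reference for further research on feature extraction and selection and for improving the performance and efficiency of spam image classifiers.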

The availability of millions of products and services on e-commerce sites makes it difficult to find the product best suited to one's requirements, because so many alternatives exist. The most popular and useful way around this is to follow the reviews of others who have already tried those products, posted on opinion-oriented social media. Almost all e-commerce sites let users share their views on and experiences with the products and services they have used. Customer reviews are increasingly used by individuals, manufacturers, and retailers for purchase and business decisions. Because reviews are not scrutinized, anybody can write anything anonymously, which leads to review spam. Moreover, driven by the desire for profit and/or publicity, spammers produce synthesized reviews to promote some products or brands and demote competitors' products or brands. Deceptive review spam has grown considerably over time. In this work, we apply both supervised and unsupervised techniques to identify review spam. The most effective feature sets were assembled for model building, and sentiment analysis was also incorporated into the detection process. To obtain the best performance, several well-known classifiers were applied to a labeled dataset. For unlabeled data, clustering was applied after computing the attributes needed for spam detection. In addition, spam reviewers are quite likely also responsible for content pollution in multimedia social networks, because many users now post reviews using their social network logins. Finally, the work can be extended to find suspicious accounts responsible for posting fake multimedia content to the respective social networks.
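
For the unlabeled data, the abstract says clustering was applied after computing spam-related attributes but does not name the algorithm. Below is a minimal k-means sketch. It assumes each reviewer or review is already a numeric attribute vector (for example rating deviation and content similarity) and uses the first k points as initial centroids for determinism; both the attributes and the initialization are assumptions.

```python
def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm) on lists of floats."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Which cluster corresponds to spam must still be decided afterwards, for instance by inspecting which centroid has the higher rating deviation.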

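Starting from the current state of image spam, this paper analyzes image spam, contrasts it with text spam, and summarizes its characteristics. It also introduces commonly used image spam filtering methods, such as OCR (optical character recognition) and fingerprinting, analyzes their advantages and disadvantages, and offers constructive suggestions to address those disadvantages. Finally, it briefly describes the latest anti-spam research results and discusses how spam is likely to evolve.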
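
Fingerprinting compares compact hashes of images so that near-duplicate spam images sent in bulk can be matched even after small edits. The abstract does not specify a fingerprint algorithm. The sketch below uses an average hash (aHash) as one simple example, assuming the image has already been converted to grayscale and resized to a small grid such as 8x8.

```python
def average_hash(gray):
    """gray: 2-D list of grayscale values, assumed already resized (e.g. to 8x8).
    Each bit is 1 where the pixel is brighter than the image mean."""
    flat = [v for row in gray for v in row]
    avg = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > avg else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; small distances suggest near-duplicate images."""
    return bin(a ^ b).count("1")
```

A filter would keep the hashes of known spam images and flag an incoming image whose Hamming distance to any of them falls below a chosen threshold.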

Fusing and mining opinions from reviews posted on websites or social networks has become a popular research topic in recent years, as a way to analyze public opinion on a specific topic or product. Existing research has focused on extracting, classifying, and summarizing opinions from reviews on news websites, forums, and blogs. An important issue that has not been well studied is the degree of relevance between a review and its corresponding article. Prior work simply divides reviews into two classes, spam and non-spam, neglecting that non-spam reviews may be relevant to the article to different degrees. In this paper, we propose the notion of “Review Pertinence” to study this degree of relevance. Unlike usual methods, we measure the pertinence of a review by considering not only the similarity between the review and its corresponding article but also the correlation among reviews. Experimental results on real datasets collected from a number of popular portal sites show that our method clearly outperforms three baseline methods at ranking reviews by pertinence. Our method can therefore be applied in practice to retrieve reviews efficiently for opinion fusion and mining and to filter review spam.
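
A minimal sketch of a pertinence score that combines both signals named in the abstract: similarity to the article and correlation with the other reviews. Bag-of-words cosine similarity and the linear weighting `alpha` are assumptions made here for illustration. The paper's actual similarity measures and combination rule are not stated in the abstract.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two token lists under a bag-of-words model."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def pertinence(reviews, article, alpha=0.7):
    """alpha weights similarity to the article; 1 - alpha weights the mean
    similarity to the other reviews of the same article."""
    scores = []
    for i, r in enumerate(reviews):
        others = [cosine(r, o) for j, o in enumerate(reviews) if j != i]
        peer = sum(others) / len(others) if others else 0.0
        scores.append(alpha * cosine(r, article) + (1 - alpha) * peer)
    return scores
```

Sorting reviews by descending score yields the pertinence ranking; an off-topic advertisement shares no words with the article or the other reviews and ends up at the bottom.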

Today's e-commerce depends heavily on the growing volume of online customer reviews posted on opinion-sharing websites. Unfortunately, this has tempted spammers to target these websites in order to promote or demote products. Various opinion spam detection methods have been proposed to provide reliable resources for customers, manufacturers, and researchers. However, supervised approaches suffer from imbalanced data because spam reviews are scarce in datasets. Rating-deviation-based filtering systems are easily fooled by smart spammers, and content-based methods are very expensive, and most of them have not yet been tested on real data. The aim of this paper is to propose a robust review spam detection system that makes efficient use of rating deviation, content-based factors, and reviewer activeness. To overcome the drawbacks above, all these factors are jointly investigated within suspicious time intervals, which a pattern recognition technique captures from the time series of reviews. The proposed method could be a valuable asset in online spam filtering systems. As a standalone system, it could also be used in data mining and knowledge discovery tasks to purify product review datasets. Such systems can benefit from the method's time efficiency and high accuracy. Empirical analyses on a real dataset show that the proposed approach successfully detects spam reviews. Comparison with two current common methods indicates that our method achieves higher detection accuracy (F-score: 0.86) while removing the need for specific metadata fields and reducing the heavy computation required for investigation.
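
The abstract captures suspicious time intervals with a pattern recognition technique it does not name. As a simple stand-in, the sketch below bins one product's reviews by time window and flags windows whose review count exceeds the mean by more than k standard deviations. Rating deviation, content factors, and reviewer activeness would then be examined only inside the flagged windows. The function name, the day-index input, and the threshold `k` are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

def suspicious_windows(days, window=1, k=2.0):
    """days: integer day index of each review of ONE product.
    Returns the start day of every window whose review count exceeds mean + k * std."""
    counts = Counter(d // window for d in days)
    first, last = min(counts), max(counts)
    series = [counts.get(i, 0) for i in range(first, last + 1)]  # include empty windows
    mu, sd = mean(series), pstdev(series)
    return [(first + i) * window for i, c in enumerate(series) if c > mu + k * sd]
```

Restricting the expensive content and behavior checks to flagged windows is one way to obtain the reduced computation the abstract mentions.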

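With the development of the Internet, users tend to consult online reviews before shopping, traveling, or dining, and afterwards post reviews of their own to express their opinions. Online reviews are therefore increasingly valuable, and their strong influence on users' decisions has given rise to fake reviews. A fake review is one posted because of financial interest, personal bias, or similar factors that does not reflect the product's true characteristics. Fake reviews imitate the language of genuine reviews, so consumers find them hard to recognize. Researchers in China and abroad have applied natural language processing techniques to the problem of fake review detection. From a feature-engineering perspective, detection methods fall into three categories: methods based on linguistic and behavioral features, methods based on graph structure, and methods based on representation learning. This paper describes the general detection workflow, summarizes the features commonly used by the three categories, compares the strengths and weaknesses of the methods, and introduces commonly used datasets. Finally, it discusses future research directions.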

A Survey of Image Spam Filtering Techniques*
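Starting from the difficulties of detecting image spam from image features, this paper summarizes the image features currently used to identify spam and groups them into eight categories, including file attributes and image attributes. It analyzes theoretically, and compares empirically, five classification algorithms already used for image spam classification: support vector machines, decision trees, maximum entropy models, Dempster-Shafer (DS) evidence theory, and Bayesian algorithms. Finally, it outlines future research directions for image spam filtering.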

