Similar Literature
1.
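Accurately retrieving people's views on important topics and trending events from the blogosphere matters for applications such as market research and online public-opinion discovery and early warning. The goal of blog opinion retrieval is to retrieve blog-post units that are not only relevant to a specific query topic but also contain comments on that topic, and to rank them by opinion strength. This paper first presents the framework of a blog opinion retrieval system, then describes a topical blog-post retrieval model and a method for computing blog opinion weights. Experimental results show that the system can effectively retrieve blog posts that contain subjective opinions about the query topic and has good practical value.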

2.
Research on Blog Opinion Retrieval Based on a Probabilistic Inference Model   Total citations: 2 (self-citations: 0, by others: 2)
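In recent years, blogs have drawn growing attention from the public and from industry as an emerging, mass-market medium for publishing news. Through mutual citation and recommendation, blogs form a vast blogosphere in which people can freely express views on all kinds of real-life issues, voice their feelings, and review new products on the market. Accurately retrieving people's views on important topics and trending events from the blogosphere matters for applications such as market research and online public-opinion discovery and early warning. The goal of blog opinion retrieval is to retrieve posts that are both topically relevant to a given query and contain comments related to that query. To this end, the paper applies a probabilistic inference model to blog opinion retrieval and proposes a corresponding retrieval algorithm. The algorithm merges topical relevance scoring and opinion scoring into a single probabilistic inference framework: it computes the topical relevance of the topic descriptions in a post to the query, measures how strongly the opinion words express an orientation toward the query topic, and fuses the two scores into a final overall score. Experiments show that the algorithm effectively identifies opinions related to a given query in the blogosphere and achieves good results.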
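The abstract does not give the model's equations. As a rough illustration only, the sketch below assumes an inference-network-style combination: each opinion word within a fixed window of a query term is treated as independent evidence, the evidence is merged with a noisy-OR, and the result is multiplied (an AND-like operator) by a topical relevance probability. The window size, the lexicon strengths and all function names are hypothetical.

```python
def noisy_or(probs):
    """Noisy-OR: probability that at least one independent piece of evidence fires."""
    p_none = 1.0
    for p in probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

def opinion_belief(tokens, query_terms, opinion_lexicon, window=5):
    """Hypothetical belief that a post expresses an opinion about the query.
    Every opinion word within `window` tokens of some query term contributes
    its lexicon strength (in [0, 1]) as one piece of evidence."""
    query_pos = [i for i, t in enumerate(tokens) if t in query_terms]
    evidence = []
    for i, t in enumerate(tokens):
        strength = opinion_lexicon.get(t)
        if strength is not None and any(abs(i - j) <= window for j in query_pos):
            evidence.append(strength)
    return noisy_or(evidence)

def combined_score(relevance_prob, tokens, query_terms, opinion_lexicon):
    """A post must be both on-topic and opinionated, so the two beliefs are multiplied."""
    return relevance_prob * opinion_belief(tokens, query_terms, opinion_lexicon)
```

For the token list `"the new phone is great but battery is poor".split()` with query `{"phone"}`, only "great" lies within five tokens of "phone", so the opinion belief is 0.8 and a relevance probability of 0.5 yields a combined score of 0.4.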

3.
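PageRank has several shortcomings when used to rank blog posts: topic drift, a bias toward old posts at the expense of new ones, and query-relevant posts that fail to appear near the top. To address these, a multi-feature fusion ranking algorithm for blog posts is proposed. Building on an analysis of the structural features of blogs, the method scores each post from the content similarity and structural similarity of two linked posts, the temporal freshness of the post, and the popularity of its blogger. Experimental results show that the algorithm outperforms traditional blog-post ranking algorithms.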
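The abstract names the four features but not how they are combined. A minimal sketch, assuming a weighted linear fusion in which freshness decays exponentially with a 30-day half-life; the weights, the half-life and the `PostFeatures` type are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class PostFeatures:
    content_sim: float         # content similarity to the linked post, in [0, 1]
    structure_sim: float       # structural similarity of the two linked posts, in [0, 1]
    age_days: float            # days since the post was published
    blogger_popularity: float  # normalized popularity of the blogger, in [0, 1]

def freshness(age_days, half_life_days=30.0):
    """Exponential decay: 1.0 for a brand-new post, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)

def post_score(f, weights=(0.4, 0.2, 0.2, 0.2), half_life_days=30.0):
    """Hypothetical weighted sum of the four features named in the abstract."""
    w_c, w_s, w_t, w_p = weights
    return (w_c * f.content_sim
            + w_s * f.structure_sim
            + w_t * freshness(f.age_days, half_life_days)
            + w_p * f.blogger_popularity)
```

With all other features equal, a one-day-old post outscores a year-old one, which is the behavior the abstract says plain PageRank lacks.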

4.
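Sentiment Orientation Analysis of Chinese Blog Posts Based on Semantic Understanding   Total citations: 3 (self-citations: 0, by others: 3)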
He Fengying. Journal of Computer Applications (计算机应用), 2011, 31(8): 2130-2133
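As a popular carrier of information and culture, blogs are accepted by more and more people, and sentiment orientation analysis of blog text has gradually become a hot topic in information mining. Most current research on text orientation analysis focuses on ordinary text and news comments. Targeting the characteristics of blog text, this paper proposes a blog-text orientation classification method based on semantic understanding. First, a basic Chinese sentiment lexicon is built with the HowNet sentiment word set as its baseline, and the sentiment weight of each word is computed with a Chinese word-similarity method. The paper also analyzes how semantic-level adverbs occur and how they affect orientation judgments. Finally, the result is corrected according to the blogger's language style to produce the sentiment classification of the post. Experiments show that the method effectively determines the sentiment orientation of blog text.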
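A minimal sketch of lexicon-based polarity scoring with adverb handling, under stated assumptions: the tiny lexicons below are hypothetical stand-ins (English glosses for readability) for the HowNet-derived Chinese lexicon and similarity-based weights, and the blogger "style correction" is modeled as a hypothetical additive bias, since the abstract does not say how it is applied.

```python
# Hypothetical toy lexicons. A real system would use segmented Chinese tokens
# and weights computed from HowNet via word similarity.
SENTIMENT = {"good": 1.0, "like": 0.8, "bad": -1.0, "disappointed": -0.9}
DEGREE = {"very": 2.0, "quite": 1.5, "slightly": 0.5}
NEGATION = {"not", "no"}

def post_polarity(tokens, style_bias=0.0):
    """Sum lexicon weights, each scaled by the degree adverbs and negations that
    precede it; then add a hypothetical per-blogger style correction."""
    score, modifier = 0.0, 1.0
    for tok in tokens:
        if tok in DEGREE:
            modifier *= DEGREE[tok]
        elif tok in NEGATION:
            modifier = -modifier
        elif tok in SENTIMENT:
            score += modifier * SENTIMENT[tok]
            modifier = 1.0  # a pending modifier applies only to the next sentiment word
    return score + style_bias

def classify(tokens, style_bias=0.0):
    s = post_polarity(tokens, style_bias)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```

For example, `post_polarity(["very", "good"])` is 2.0 and `classify(["slightly", "disappointed"])` is `"negative"`.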

5.
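Existing link-analysis-based hot-topic detection methods suffer from low accuracy, vulnerability to spam links, and topic drift. Exploiting the fact that cluster (community) structures in complex networks are highly topic-coherent, this paper proposes a hot-topic detection algorithm that combines link analysis with firefly-algorithm clustering of blog posts. Blog-post pages are treated as nodes and links whose content is identical or related to the post as edges; page weights are assessed comprehensively from attributes of the post and its blogger to build a blog topic model. The firefly algorithm then clusters the posts to obtain cluster centers, which are sorted in descending order of page weight to produce a hot-topic ranking. Experimental results show that the method discovers more blog hot topics with higher precision.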
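The clustering step can be sketched with a generic firefly algorithm. This is not the paper's implementation: it assumes each post is already embedded as a 2-D point, each firefly encodes a full set of k cluster centers, brightness is the negative sum of squared errors (SSE), and all parameter values are hypothetical. The later ranking of clusters by page weight is omitted.

```python
import math
import random

def sse(centers, points):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
               for px, py in points)

def firefly_cluster(points, k, n_fireflies=15, n_iter=50,
                    beta0=1.0, gamma=0.01, alpha=0.2, seed=0):
    """Generic firefly search for k centers; lower SSE means a brighter firefly."""
    rng = random.Random(seed)
    # Each firefly is a candidate solution: k centers sampled from the data.
    swarm = [[tuple(p) for p in rng.sample(points, k)] for _ in range(n_fireflies)]
    cost = [sse(f, points) for f in swarm]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best, best_cost = list(swarm[best_i]), cost[best_i]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2
                             for ci, cj in zip(swarm[i], swarm[j])
                             for a, b in zip(ci, cj))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction fades with distance
                    # Simplification: the m-th center of i moves toward the m-th center of j.
                    swarm[i] = [tuple(a + beta * (b - a) + alpha * (rng.random() - 0.5)
                                      for a, b in zip(ci, cj))
                                for ci, cj in zip(swarm[i], swarm[j])]
                    cost[i] = sse(swarm[i], points)
                    if cost[i] < best_cost:
                        best, best_cost = list(swarm[i]), cost[i]
        alpha *= 0.97  # gradually reduce random exploration
    return best, best_cost
```

Tracking the best solution seen so far guarantees the returned SSE never exceeds that of the best initial firefly.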

6.
Long Long, Deng Wei. Application Research of Computers (计算机应用研究), 2013, 30(4): 1095-1098
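Because blogs are currently mostly text, a semantic-understanding-based method for analyzing the sentiment orientation of blog posts is proposed. Starting from the HowNet sentiment vocabulary, a green-network cloud system creates and continually refines a cloud database of sentiment dictionaries for the green-network system. A word-similarity method computes the sentiment weight of each word, and these weights are used to make an initial judgment of a post's orientation, from which the post's sentiment orientation is determined. Experiments verify that the algorithm effectively determines the sentiment orientation of blog text and gives the green-network system an accurate basis for deciding whether to filter a blog.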

7.
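The Chinese term 博客 (boke) corresponds to the English "blog" (or "blogger"); "blog" is itself a blend of "web" and "log" (weblog). A blog is a platform managed by a blogger, who publishes new articles at irregular intervals. This paper describes a simple blog system in which bloggers publish posts and reply to comments online while readers browse and comment on posts, giving readers a friendly platform for sharing information and communicating conveniently. The site follows the B/S (browser/server) model and was developed on Windows with the XAMPP integrated server package, the Sublime editor, and PHP, HTML, jQuery and CSS; a MySQL back-end database meets the data-storage needs. The system supports simple information exchange between bloggers and readers.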

8.
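Identifying opinion leaders in the blogosphere is an important research direction in online public-opinion analysis. Traditional methods rely on link analysis between blogs and ignore post content, so an algorithm combining link analysis with content analysis is proposed. It computes each blogger's influence score from four aspects of their posts (number of in-links, number of out-links, number of comments, and article length), ranks the bloggers, and selects the top-K as opinion leaders. Experimental results show that, compared with link-analysis-based algorithms, the algorithm achieves better full-path coverage and selects opinion leaders whose topics are more diverse, so it can be applied to analyzing opinion leaders in online public opinion.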
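The four-factor influence score and top-K selection can be sketched as follows. The abstract does not give the formula, so this assumes min-max normalization of each factor across bloggers followed by a weighted sum; the weights and the input layout are hypothetical.

```python
def influence_scores(bloggers, weights=(0.4, 0.1, 0.3, 0.2)):
    """bloggers: dict name -> (in_links, out_links, comments, total_length).
    Each factor is min-max normalized across bloggers, then combined with
    hypothetical weights."""
    names = list(bloggers)
    columns = list(zip(*(bloggers[n] for n in names)))  # one tuple per factor
    normed = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        normed.append([(v - lo) / span if span else 0.0 for v in col])
    return {n: sum(w * normed[f][i] for f, w in enumerate(weights))
            for i, n in enumerate(names)}

def top_k_leaders(bloggers, k, weights=(0.4, 0.1, 0.3, 0.2)):
    """Rank bloggers by influence score and keep the top K as opinion leaders."""
    scores = influence_scores(bloggers, weights)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For instance, with `{"a": (50, 5, 200, 30000), "b": (5, 20, 10, 5000), "c": (20, 10, 80, 12000)}`, `top_k_leaders(..., 2)` returns `["a", "c"]`.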

9.
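Building on information retrieval theory and text sentiment orientation analysis, and drawing on domestic and international research on opinion retrieval, this paper proposes a relevance-degree-based text opinion retrieval algorithm. It jointly considers how the final opinion retrieval results are affected by query expansion during topic retrieval, the retrieval relevance of the text, the strength of the text's sentiment orientation, and the degree of association between the query topic and the text's sentiment. The algorithm therefore accounts, in principle, for the interactions among the different factors in opinion retrieval. Experiments on the data of the COAE2008 opinion retrieval subtask show that the proposed algorithm achieves good results.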

10.
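As a medium through which users publish their views, blogs have become an important platform on the Web for expressing and exchanging feelings, and blog search provides a fast, convenient way into these exchanges. Users searching blogs often care more about the author's opinion of, or sentiment toward, an event, yet most current blog search results are organized by topic rather than by sentiment orientation. This paper therefore proposes SOAD (Sentiment Orientation Analysis based on syntactic Dependency), an algorithm that uses syntactic dependency parsing to analyze the sentiment orientation of blog search results. Based on SOAD, a prototype Chinese blog search system is built that post-processes blog search results. Experiments show that SOAD has a clear advantage in analyzing the sentiment of blog posts, and that the prototype system achieves the goal of returning search results according to sentiment orientation.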

11.
The massive acceptance and usage of blog communities by a significant portion of Web users has made knowledge extraction from blogs a particularly important research field. One of the most interesting related problems is opinion retrieval, that is, the retrieval of blog entries that contain opinions about a topic. A remarkable amount of work has gone into improving the effectiveness of opinion retrieval systems. The primary objective of these systems is to retrieve blog posts that are both relevant to a given query and contain opinions, and to generate a ranked list of the retrieved documents according to their relevance and opinion scores. Although a wide variety of effective opinion retrieval methods have been proposed, to the best of our knowledge none of them takes into consideration the importance of the retrieved opinions. In this work we introduce a ranking model that combines existing retrieval strategies with query-independent information to enhance the ranking of opinionated documents. More specifically, our model accounts for the influence of the blogger who authored an opinion, the reputation of the blog site that published a specific blog post, and the impact of the post itself. Furthermore, we extend current proximity-based opinion scoring strategies by considering the physical locations of the query and opinion terms within a document. We conduct extensive experiments with the TREC Blogs08 dataset, which demonstrate that our methods improve retrieval precision by a significant margin.
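Proximity-based opinion scoring can be illustrated with a common kernel formulation; this is a generic sketch, not the authors' model. It assumes each opinion-term occurrence contributes a Gaussian kernel of its distance (in tokens) to the nearest query-term occurrence, and that the query-independent importance of blogger, blog and post is folded in through a hypothetical linear mix with parameter `lam`.

```python
import math

def proximity_opinion_score(tokens, query_terms, opinion_terms, sigma=5.0):
    """Sum over opinion-term occurrences of a Gaussian kernel of the token
    distance to the nearest query-term occurrence (closer means more weight)."""
    q_pos = [i for i, t in enumerate(tokens) if t in query_terms]
    if not q_pos:
        return 0.0
    score = 0.0
    for i, t in enumerate(tokens):
        if t in opinion_terms:
            d = min(abs(i - j) for j in q_pos)
            score += math.exp(-d * d / (2 * sigma * sigma))
    return score

def final_score(relevance, opinion, importance, lam=0.7):
    """Hypothetical mix of query-dependent evidence (relevance x opinion) with a
    query-independent importance prior (blogger influence, blog reputation, post impact)."""
    return lam * relevance * opinion + (1 - lam) * importance
```

An opinion word adjacent to the query term scores higher than the same word twenty tokens away, and a post with no query-term occurrence gets an opinion score of zero.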

12.
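With the growth of the mobile industry and advances in mobile technology, location-based services such as emergency assistance and information lookup have developed rapidly, and innovation in location-based services has become a major driving force of the mobile industry. This paper designs and implements a location-based mobile blog system on the ISG platform. Unlike traditional mobile blog systems, it incorporates the user's location: when a user writes a post, the system automatically records the user's location and stores it bound to the post, and users can dynamically search for blog posts according to their own location.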
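The location-based search step can be sketched as a radius query over posts tagged with latitude and longitude. The ISG platform's actual API is not described, so the `GeoPost` record and `posts_near` function are hypothetical; distances use the standard haversine great-circle formula with a mean Earth radius of 6371 km. A production system would normally use a spatial index rather than a linear scan.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

@dataclass
class GeoPost:  # hypothetical record: a post bound to the author's location
    title: str
    lat: float
    lon: float

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def posts_near(posts, lat, lon, radius_km):
    """Posts within radius_km of the user's position, nearest first (linear scan)."""
    hits = [(haversine_km(lat, lon, p.lat, p.lon), p) for p in posts]
    return [p for d, p in sorted(hits, key=lambda h: h[0]) if d <= radius_km]
```

One degree of latitude comes out at roughly 111.2 km, and a 50 km search from central Beijing returns a Beijing-tagged post but not a Shanghai-tagged one.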

13.
Blog retrieval is a complex task because of informal language usage. Blogs deviate from the language used in traditional corpora for various reasons: spelling errors, grammatical irregularity, overuse of abbreviations, and symbolic characters such as emoticons are a few of the reasons why blog corpora are irregular. To make the retrieval of blogs easier, this paper discusses the novel idea of a personalized semantic based blog retrieval (PSBBR) system. Blogs are tagged with their relationships to one another with reference to an ontology. The meanings of the blog content and key terms are tagged as XML tags, and the query term accesses the XML tags to retrieve the entire blog content. The system is evaluated with a large number of blogs extracted from various blog sources. Relevance score is calculated for every blog associated with

14.
Although topic detection and tracking techniques have made great progress, most researchers have paid little attention to the following two aspects. First, the construction of a topic model does not take the characteristics of different topics into consideration. Second, the factors that determine the formation and development of hot topics have not been analyzed in depth. In order to correctly extract news blog hot topics, this paper views the above problems from a new perspective based on the W2T (Wisdom Web of Things) methodology, in which the characteristics of blog users, the context of topic propagation and information granularity are investigated in a unified way. The motivations and features of blog users are first analyzed to understand the characteristics of news blog topics. Then the context of topic propagation is decomposed into the blog community, the topic network and the opinion network. Important factors such as user behavior patterns, opinion leaders and network opinion are identified to track the development trends of news blog topics. Moreover, a blog hot topic detection algorithm is proposed, in which news blog hot topics are identified by measuring duration, topic novelty, users' attention degree and topic growth. Experimental results show that the proposed method is feasible and effective. These results are also useful for further study of the formation mechanism of opinion leaders in the blogosphere.

15.
Blogs are increasingly accepted as a useful means to proliferate a variety of information on the web. As the popularity of blogs grows rapidly, a number of blog search engines have appeared recently to help users access and discover blog posts efficiently. Nevertheless, existing approaches tend to focus on ranking the blog posts according to their recency or popularity only, leaving the problem of retrieving more topic relevant posts to a user's query largely unexplored. In this paper, we present a novel blog ranking framework, called PTRank, that improves search quality by taking account of relevance feedback from users as well as various information available from RSS feeds. A neural network method is employed to learn ranking functions that provide a relevance score between a keyword and a blog post. Extensive experiments on real blog data have been conducted to validate the proposed ranking framework for blog post search, and the results indicate that PTRank performs significantly better than the existing popular approach.

16.
Fan Chunlong, Xia Jia, Xiao Xin, Lü Hongwei, Xu Lei. Journal of Computer Applications (计算机应用), 2011, 31(9): 2417-2420
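Blogs are an important type of online information resource, and extracting their comments underpins research such as public-opinion analysis. This paper surveys the current mainstream blog comment extraction algorithms and introduces the use of page structure in information extraction. Observing that people make full use of indicative phrases such as "Home" (首页) when reading a web page, it proposes a technique that extracts comments using functional semantic units, i.e. page elements with explicit semantics and a clear functional, indicative role. It describes in detail the steps involved: page-structure linearization, functional semantic unit identification, main-text identification, and the comment extraction algorithm. Experiments show that the technique performs well at extracting both the main text and the comments of blogs.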

17.
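Full-text keyword retrieval is time-consuming, and labels/categories introduce sentence ambiguity and synonym problems, so this paper proposes selecting joint keywords on a blog link platform to cluster blogs. It hypothesizes that the candidate keywords (or joint keywords) with which a blog article is queried can represent that article's topic. To test this hypothesis, tracking code is first embedded in the blog link (BC) component to collect the keywords that readers query. Suitable candidate keywords are then selected as joint keywords. Finally, four similarity measures (overlap projection, mutual-information projection, distributional information, and Kendall's τ coefficient) are used to validate the joint keywords extracted by the BC component. Experimental results show that the method gives searchers a fast path to the corresponding blog. In addition, the generated joint keywords reduce the complexity and redundancy of full-text keyword retrieval and meet blog users' needs well.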
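Kendall's τ, one of the four measures listed, compares two rankings by counting concordant and discordant pairs. The abstract does not say which τ variant or what exactly is ranked, so this sketch assumes the simple tau-a form (no tie correction) applied to two equally long score lists, for example the scores a set of keywords receives under two different measures.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a between two equally long score lists (no tie correction):
    (concordant pairs - discordant pairs) / total pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

Identical orderings give 1.0, reversed orderings give -1.0, and swapping one adjacent pair among four items gives (5 - 1) / 6 ≈ 0.667.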

18.
In this paper, we present clear and formal definitions of the ranking factors that should be considered in opinion retrieval and propose a new opinion retrieval model that combines these factors simultaneously from a generative modeling perspective. The proposed model formally unifies relevance-based ranking with subjectivity detection at the document level by taking multiple ranking factors into consideration: topical relevance, subjectivity strength, and opinion-topic relatedness. The topical relevance measures how strongly a document relates to a given topic, and the subjectivity strength indicates the likelihood that the document contains subjective information. The opinion-topic relatedness reflects whether the subjective information is expressed with respect to the topic of interest. We also demonstrate the universality of our model by presenting derivations of it that represent other existing opinion retrieval approaches. Experimental results on a large-scale blog retrieval test collection demonstrate that the individual ranking factors are not only necessary in opinion retrieval but also cooperate advantageously to produce a better document ranking when used together. The retrieval performance of the proposed model is comparable to that of previous systems in the literature.

19.
Blog clustering is an important approach for online public opinion analysis. Traditional clustering methods usually group blogs by keywords, stories and timeline, and usually ignore the opinions and emotions expressed in blog articles. In this paper, an integrated graph-based model for clustering Chinese blogs by embedded sentiments is proposed. A novel graph-based representation and the corresponding clustering algorithm are applied to Chinese blog search results. The proposed SoB-graph model considers not only sentiment words but also structural information in blogs. Experimental results show that, compared with the traditional graph-based document representation model and the vector space document representation model, the proposed SoB-graph model achieves better performance in clustering sentiments in Chinese blog documents.
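The SoB-graph representation itself is not described in enough detail to reproduce. As a much simpler stand-in for the general idea of graph-based sentiment clustering, the sketch below links two posts when the Jaccard similarity of their sentiment-word sets reaches a hypothetical threshold, then returns the connected components (found with union-find) as clusters. Unlike SoB-graph, it ignores structural information in the blogs.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_by_sentiment(docs, lexicon, threshold=0.3):
    """docs: list of token lists. Posts whose sentiment-word sets overlap enough
    are joined by an edge; connected components are returned as clusters."""
    feats = [set(t for t in doc if t in lexicon) for doc in docs]
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(feats[i], feats[j]) >= threshold:
                parent[find(i)] = find(j)  # union the two components
    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Two posts sharing "great" and "love" end up in one cluster and two posts sharing "awful" and "hate" in another, even though their topic words ("phone", "camera", "battery") differ.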

