Similar Documents
1.
Chen Qian, Gui Zhiguo, Guo Xin, Xiang Yang. Journal of Computer Applications (计算机应用), 2015, 35(2): 456-460
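Most research on topic evolution in text streams in the era of big web data relies on classic probabilistic topic models, whose bag-of-words assumption causes a loss of topic semantics and forces batch processing. To address these problems, an online incremental topic evolution algorithm based on a feature ontology is proposed. First, a feature ontology is built from word co-occurrence and the general-purpose ontology library WordNet, and the topics of the text stream are modeled with this feature ontology. Second, an algorithm for constructing a text-stream topic matrix is proposed, enabling online incremental analysis of topic evolution. Finally, based on this matrix, an algorithm for constructing a text-stream topic ontology evolution graph is proposed: topic similarity is computed from the subgraph similarity of the feature ontologies, which yields the patterns by which topics in the stream evolve over time. In experiments on scientific literature, the method's satisfaction score is on par with that of the traditional online Latent Dirichlet Allocation (LDA) model, while its time complexity is reduced to O(nK+N). By introducing an ontology and adding semantic-relation annotations, the method can display the semantic features of topics graphically and, on that basis, builds the topic evolution graph online and incrementally, giving it advantages in semantic interpretability and topic visualization.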
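The topic-similarity step can be illustrated with a minimal sketch. It assumes each topic is a subgraph of the feature ontology, given as a list of undirected concept pairs, and it substitutes plain Jaccard similarity over edge sets for the paper's subgraph similarity measure; the function names and the 0.3 linking threshold are hypothetical.

```python
from itertools import product


def edge_set(edges):
    """Normalize undirected edges so (a, b) and (b, a) compare equal."""
    return {tuple(sorted(e)) for e in edges}


def subgraph_similarity(g1, g2):
    """Jaccard similarity of two topic subgraphs given as edge lists (assumed measure)."""
    e1, e2 = edge_set(g1), edge_set(g2)
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 0.0


def evolution_links(prev_topics, curr_topics, threshold=0.3):
    """Link topic i at time t-1 to topic j at time t when their similarity reaches the threshold."""
    links = []
    for (i, gi), (j, gj) in product(enumerate(prev_topics), enumerate(curr_topics)):
        s = subgraph_similarity(gi, gj)
        if s >= threshold:
            links.append((i, j, round(s, 3)))
    return links


prev = [[("ontology", "semantics"), ("semantics", "wordnet")], [("gpu", "cuda")]]
curr = [[("wordnet", "semantics"), ("ontology", "reasoning")]]
print(evolution_links(prev, curr))  # [(0, 0, 0.333)]
```

Each link is one edge of an evolution graph between consecutive time slices; only pairwise comparisons between adjacent batches are needed, which fits the online, incremental setting.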

2.
Tag recommender schemes suggest related tags for untagged resources and better tags for already-tagged resources. Tagging is especially valuable when the user can identify the most precise tag to use when searching for interesting blogs. In a tagging process, however, there is no clear information about the meaning of each tag. A user can apply different tags to the same content and can also introduce new tags for an item in a blog. When users select tags, the resulting metadata may contain homonyms and synonyms, which can create improper relationships among items and make searches for topic information ineffective. Collaborative tag recommendation allows users to assign a set of freely chosen text keywords as tags. Because there is no control over tag assignment, these tags are often imprecise, irrelevant, and misleading: no formal guidelines govern tag generation, and tags are assigned according to each user's own knowledge. The result is misspelled tags, multiple tags with the same meaning, bad word encoding, and personalized words without a common meaning. These problems lead to miscategorized items, irrelevant search results, wrong predictions, and poor recommendations. Tag relevancy can be judged only by a specific user. These aspects pose new challenges, and open new opportunities, for the tag recommendation problem. This paper reviews the challenges of the tag recommendation problem and presents a brief comparison of existing works, from which novel research directions are identified. The overall performance of our ontology-based recommender systems compares favorably with other systems in the literature.

3.
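To address diversity and tag quality in person-tag recommendation, this paper proposes a person-tag recommendation method that combines personalization and diversity. The method models the accounts a user follows with a topic model and uses cluster analysis to group accounts with similar posts into the same cluster. It then removes redundant tags within each cluster and selects representative tags. Finally, the tags from different clusters are fused and ranked to obtain the Top-K tags recommended to the user. Experimental results show that, compared with existing recommendation methods, the method reflects users' interests while significantly improving both the quality and the diversity of the recommended tags.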
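The last two steps (per-cluster redundancy removal and cross-cluster fusion) can be sketched as follows. The sketch assumes clustering has already produced, for each cluster, a list of (tag, score) pairs; it treats case-insensitive duplicates as "redundant" and uses round-robin merging (largest cluster first) as the fusion rule. Both choices, and all names here, are illustrative assumptions rather than the paper's exact procedure.

```python
def representative_tags(cluster_tags, per_cluster=2):
    """Merge case-insensitive duplicates (keeping the best score) and keep the top tags."""
    best = {}
    for tag, score in cluster_tags:
        key = tag.strip().lower()
        if key not in best or score > best[key]:
            best[key] = score
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:per_cluster]]


def fuse_top_k(clusters, k, per_cluster=2):
    """Round-robin over clusters (largest first) so the Top-K list covers several interests."""
    ordered = sorted(clusters, key=len, reverse=True)
    lists = [representative_tags(c, per_cluster) for c in ordered]
    result, seen = [], set()
    for rank in range(per_cluster):
        for tags in lists:
            if rank < len(tags) and tags[rank] not in seen:
                result.append(tags[rank])
                seen.add(tags[rank])
                if len(result) == k:
                    return result
    return result


clusters = [
    [("Python", 0.9), ("python", 0.8), ("ML", 0.7), ("AI", 0.6)],
    [("football", 0.9), ("soccer", 0.5)],
]
print(fuse_top_k(clusters, k=3))  # ['python', 'football', 'ml']
```

Taking one tag from each cluster before taking a second from any cluster is what trades a little relevance for diversity.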

4.
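Traditional item-based collaborative filtering computes item similarity from rating data alone and ignores the items' own features. With the emergence of social tagging, tags can reflect item features to some extent, but because tags are semantically ambiguous, incorporating them directly into collaborative filtering is problematic. To solve this, an improved item-based collaborative filtering recommendation algorithm is proposed. The algorithm clusters tags into topic tag clusters, computes the relevance between each item and each topic from how the item has been tagged, and builds an item-topic relevance matrix. This matrix is combined with the item-rating matrix to compute the similarity between items, and collaborative filtering is then used to predict ratings for target items, producing personalized recommendations. Experimental results on the MovieLens dataset show that the algorithm resolves the semantic ambiguity of tags and improves recommendation quality.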
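A minimal sketch of how the two matrices might be combined is shown below. It assumes cosine similarity for both the rating-based and the topic-based item similarity and a linear blend with a hypothetical weight `alpha`; the paper's exact combination rule is not given in the abstract.

```python
import numpy as np


def cosine_rows(m):
    """Pairwise cosine similarity between the rows of m (zero rows get similarity 0)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = m / norms
    return unit @ unit.T


def item_similarity(ratings, item_topic, alpha=0.5):
    """ratings: users x items; item_topic: items x topics (item-topic relevance)."""
    rating_sim = cosine_rows(ratings.T)
    topic_sim = cosine_rows(item_topic)
    return alpha * rating_sim + (1 - alpha) * topic_sim


def predict(ratings, sim, user, item):
    """Similarity-weighted average of the user's ratings on other items (0 = unrated)."""
    rated = np.nonzero(ratings[user])[0]
    rated = rated[rated != item]
    weights = sim[item, rated]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ ratings[user, rated] / weights.sum())


ratings = np.array([[5, 0, 4], [4, 2, 5], [1, 5, 0]], dtype=float)
item_topic = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
sim = item_similarity(ratings, item_topic)
print(predict(ratings, sim, user=0, item=1))
```

Items that share a topic cluster become similar even when few users have rated both, which is the sparsity benefit the topic matrix is meant to provide.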

5.
Topic-based ranking in Folksonomy via probabilistic model (total citations: 1; self-citations: 0; citations by others: 1)
Social tagging is an increasingly popular way to describe and classify documents on the web. Because tags are authored freely, however, their quality varies considerably, so how to rank tags has become an important issue. Most social tagging systems simply order tags by their input sequence, conveying little information about their importance and relevance. This limits applications of tags such as information search and tag recommendation. In this paper, we focus on finding the authority score of tags in the whole tag space conditioned on topics, and we put forward a topic-sensitive tag ranking (TSTR) approach that ranks tags automatically according to their topic relevance. We first extract topics from the folksonomy using a probabilistic model and then construct a transition probability graph. Finally, we perform a random walk at the topic level on the graph to obtain topic rank scores for tags. Experimental results show that the proposed tag ranking method is both effective and efficient. We also apply tag ranking to tag recommendation, which demonstrates that the proposed approach boosts the performance of social-tagging-related applications.
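The random-walk step can be sketched as a topic-biased (personalized) PageRank over a tag transition graph. This is a generic sketch, not the TSTR implementation: it assumes `topic_pref` holds P(tag | topic) from the probabilistic topic model, uses it as the teleport distribution, and uses the conventional damping factor 0.85.

```python
import numpy as np


def topic_sensitive_rank(adj, topic_pref, damping=0.85, tol=1e-10, max_iter=1000):
    """adj[i, j]: transition weight from tag i to tag j; topic_pref: P(tag | topic)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    teleport = np.asarray(topic_pref, dtype=float)
    teleport = teleport / teleport.sum()
    out = adj.sum(axis=1, keepdims=True)
    # Row-normalize; tags without out-links jump according to the topic distribution.
    trans = np.where(out > 0, adj / np.where(out == 0, 1.0, out), teleport)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = damping * r @ trans + (1 - damping) * teleport
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


adj = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
print(topic_sensitive_rank(adj, [0.1, 0.1, 0.8]))
```

Running the walk once per topic gives one score vector per topic, which is what makes the ranking topic-sensitive.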

6.
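This paper improves the retrieval method based on the vector space model and proposes an information retrieval model based on ontological semantics. The WordNet lexicon serves as the reference ontology for computing the semantic similarity between concepts. The weights of the index terms in the query vector are adjusted according to the similarity among the query's index terms, and the index terms are expanded with synonyms, hypernyms, and hyponyms from the WordNet ontology. On this basis, the similarity between a query and a document is defined. Compared with traditional word-form-based information retrieval, the method improves retrieval precision at the semantic level.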
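The query reweighting and expansion steps can be sketched as below. Everything here is an assumption for illustration: `SYNONYMS` and `HYPERNYMS` are a hypothetical toy lexicon standing in for WordNet (hyponyms are omitted for brevity), `sim` is any concept-similarity function supplied by the caller, and the averaging-based boost and the 0.8 / 0.5 expansion weights are not the paper's formulas.

```python
import math

# Hypothetical toy lexicon standing in for WordNet.
SYNONYMS = {"car": ["automobile"]}
HYPERNYMS = {"car": ["vehicle"]}


def expand_query(query_weights, sim, syn_w=0.8, hyper_w=0.5):
    """Reweight query terms by their mean similarity to the other terms, then expand."""
    terms = list(query_weights)
    expanded = {}
    for t in terms:
        others = [sim(t, u) for u in terms if u != t]
        boost = 1.0 + (sum(others) / len(others) if others else 0.0)
        expanded[t] = query_weights[t] * boost
    for t in terms:
        for s in SYNONYMS.get(t, []):
            expanded.setdefault(s, expanded[t] * syn_w)   # synonyms: strong weight
        for h in HYPERNYMS.get(t, []):
            expanded.setdefault(h, expanded[t] * hyper_w)  # hypernyms: weaker weight
    return expanded


def cosine(q, d):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


q = {"car": 1.0, "rental": 1.0}
doc = {"automobile": 1.0, "rental": 1.0}
print(cosine(q, doc), cosine(expand_query(q, lambda a, b: 0.5), doc))
```

The document uses "automobile" rather than "car", so a purely word-form match scores lower than the expanded query does.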

7.
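Based on an in-depth analysis of currently popular text topic extraction techniques and methods, a method that applies ontology technology to text topic extraction is proposed. Ontology technology is used to represent each sentence of a text as a semantic vector; the text is preprocessed, semantic similarity is computed and semantic clustering is performed, and finally a representative sentence is extracted from each cluster to generate the text's topic. Experimental results show that the method is effective for extracting text topics.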
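The final step, picking one representative sentence per cluster, can be sketched as choosing the sentence whose semantic vector is closest to its cluster centroid. This assumes the vectors and cluster labels have already been produced by the earlier steps; "closest to the centroid" is one common choice of representative, not necessarily the paper's.

```python
import numpy as np


def representative_sentences(vectors, labels, sentences):
    """For each cluster, return the sentence whose vector is nearest the cluster centroid."""
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        centroid = vectors[idx].mean(axis=0)
        dists = np.linalg.norm(vectors[idx] - centroid, axis=1)
        picked.append(sentences[idx[np.argmin(dists)]])
    return picked


vectors = [[0, 0], [0, 2], [0, 1], [10, 10]]
labels = [0, 0, 0, 1]
print(representative_sentences(vectors, labels, ["a", "b", "c", "d"]))  # ['c', 'd']
```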

8.
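Current Semantic Web-based ontology mapping algorithms search too narrow a range of background ontologies and collect them imprecisely. To address this, a virtual-document-based mapping technique is used to extract from WordNet the synsets that are synonymous with a concept, so that a search for a single concept becomes a search for a set of synonymous concepts; this broadens the ontology search and retrieves more background ontologies. A dynamic ontology mapping algorithm based on semantic context is then proposed to exclude incorrect background ontologies and make ontology collection more precise. Experimental results show that the algorithm effectively improves the recall and precision of mapping.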

9.
Social tagging systems leverage social interoperability by facilitating the searching, sharing, and exchanging of tagged resources. A major drawback of existing social tagging systems is that social tags are used as keywords in keyword-based search: they focus on keywords and human interpretability rather than on computer-interpretable semantic knowledge. Social tags are therefore useful for sharing and organizing information, but they lack the computer interpretability needed to support personalized social tag recommendation. An interesting issue is how to automatically generate a personalized list of recommended social tags when a user accesses a resource. The novel solution proposed in this study is a hybrid approach that combines a semantic tag-based resource profile with user preferences to provide personalized social tag recommendation. Experiments show that, in terms of precision and recall, the proposed hybrid approach effectively improves the accuracy of social tag recommendation.

10.
An improved ontology mapping algorithm based on the Semantic Web (total citations: 1; self-citations: 1; citations by others: 0)
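Current Semantic Web-based ontology mapping algorithms search too narrow a range of background ontologies and collect them imprecisely. To address this, a virtual-document-based mapping technique is used to extract from WordNet the synsets that are synonymous with a concept, so that a search for a single concept becomes a search for a set of synonymous concepts; this broadens the ontology search and retrieves more background ontologies. A dynamic ontology mapping algorithm based on semantic context is then proposed to exclude incorrect background ontologies and make ontology collection more precise. Experimental results show that the algorithm effectively improves the recall and precision of mapping.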

11.
Xing Shuangshuang, Liu Mingwei, Peng Xin. Journal of Software (软件学报), 2022, 33(11): 4027-4045
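Code snippets in open-source and enterprise software projects and on software development websites are important software development resources. However, many developers' code search needs concern the high-level intent and topic of the code, which information retrieval techniques based on code text can hardly match precisely. Semantic tags that reflect the overall intent and topic of code are therefore very important for improving code search and aiding code comprehension. Existing tag generation techniques mainly target textual content or depend on historical data, and cannot meet the need for large-scale semantic annotation of code to support search and comprehension. To address this problem, this paper proposes KGCodeTagger, a knowledge-graph-based method for automatically generating semantic tags for code. The method builds a software knowledge graph by extracting concepts and relations from API documentation and software development Q&A text, and uses this graph as the basis for generating semantic code tags. For a given piece of code, the method identifies and extracts general API calls or concept mentions and links them to the related concepts in the software knowledge graph. On this basis, it identifies other concepts related to the linked concepts as candidates, ranks them by diversity and representativeness, and produces the final semantic tags. Experiments evaluate each step of KGCodeTagger's knowledge graph construction and, by comparison with several existing baseline methods, the quality of the generated semantic code tags. The results show that KGCodeTagger's knowledge graph construction steps are reasonable and effective, and that the tags it generates are of high quality, meaningful, and able to help developers quickly understand the intent of code.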
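Ranking candidate concepts "by diversity and representativeness" can be illustrated with greedy Maximal Marginal Relevance (MMR), a standard technique swapped in here for illustration; the abstract does not say KGCodeTagger uses MMR. The sketch assumes a representativeness score per candidate and a pairwise concept similarity in [0, 1]; `lam` is a hypothetical trade-off weight.

```python
def mmr_select(candidates, relevance, similarity, k=3, lam=0.7):
    """Greedy MMR: balance a candidate's score against its similarity to tags already chosen."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(c):
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected


relevance = {"list": 0.9, "arraylist": 0.85, "hashmap": 0.6, "iterator": 0.5}
similarity = lambda a, b: 0.9 if {a, b} == {"list", "arraylist"} else 0.1
print(mmr_select(relevance, relevance, similarity, k=2))  # ['list', 'hashmap']
```

Although "arraylist" scores higher on its own, it is nearly redundant with "list", so the second slot goes to a more distinct concept.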

12.
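Recommending valuable and interesting content to microblog users is an important way to improve their experience. After analyzing the characteristics of microblogs and the shortcomings of existing microblog recommendation algorithms, this paper uses tag information to characterize user interests and proposes LPCMR, a microblog recommendation method based on the probabilistic correlation of tags. First, the method constructs a tag similarity matrix from the probabilistic correlation between tags. It then strengthens tag weights with a correlated-tag weighting scheme to build a user-tag matrix. Because the user-tag matrix is sparse, it is updated with the tag similarity matrix so that it contains both user interest information and the relationships between tags. A series of experiments and analyses on microblog data crawled through the public Sina Weibo API show that the proposed recommendation algorithm performs well.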
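The two matrix steps can be sketched as follows. The sketch assumes "probabilistic correlation" means the conditional co-occurrence probability P(t_j | t_i), symmetrized by averaging, and that the update is a matrix product followed by per-user max-normalization; these are plausible readings of the abstract, not LPCMR's published formulas.

```python
import numpy as np


def tag_similarity_matrix(cooc):
    """cooc[i, j]: co-occurrence count of tags i and j; cooc[i, i]: count of tag i."""
    cooc = np.asarray(cooc, dtype=float)
    counts = np.diag(cooc).copy()          # np.diag returns a read-only view for 2-D input
    counts[counts == 0] = 1.0
    cond = cooc / counts[:, None]          # cond[i, j] = P(t_j | t_i)
    return (cond + cond.T) / 2             # symmetrize


def update_user_tag(user_tag, sim):
    """Densify a sparse user-tag matrix by propagating weight to correlated tags."""
    updated = user_tag @ sim
    row_max = updated.max(axis=1, keepdims=True)
    row_max[row_max == 0] = 1.0
    return updated / row_max


S = tag_similarity_matrix([[4, 2, 0], [2, 2, 0], [0, 0, 3]])
print(update_user_tag(np.array([[1.0, 0.0, 0.0]]), S))  # [[1.   0.75 0.  ]]
```

A user who only carries tag 0 now also gets weight on tag 1, which co-occurs with it, while the unrelated tag 2 stays at zero.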

13.
User modeling is aimed at capturing the users’ interests in a working domain, which forms the basis of providing personalized information services. In this paper, we present an ontology based user model, called user ontology, for providing personalized information service in the Semantic Web. Different from the existing approaches that only use concepts and taxonomic relations for user modeling, the proposed user ontology model utilizes concepts, taxonomic relations, and non-taxonomic relations in a given domain ontology to capture the users’ interests. As a customized view of the domain ontology, a user ontology provides a richer and more precise representation of the user’s interests in the target domain. Specifically, we present a set of statistical methods to learn a user ontology from a given domain ontology and a spreading activation procedure for inferencing in the user ontology. The proposed user ontology model with the spreading activation based inferencing procedure has been incorporated into a semantic search engine, called OntoSearch, to provide personalized document retrieval services. The experimental results, based on the ACM digital library and the Google Directory, support the efficacy of the user ontology approach to providing personalized information services.  相似文献   
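Spreading activation, the inference procedure named above, can be sketched in a generic form. The sketch assumes the user ontology is a weighted graph (taxonomic and non-taxonomic relations alike) with edge weights in [0, 1]; the decay factor, firing threshold, and step limit are hypothetical parameters, not OntoSearch's settings.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.05, max_steps=3):
    """graph: concept -> list of (neighbour, weight); seeds: concept -> initial activation."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        nxt = {}
        for node, act in frontier.items():
            for nbr, w in graph.get(node, []):
                pulse = act * w * decay
                if pulse >= threshold:            # weak pulses stop spreading
                    nxt[nbr] = nxt.get(nbr, 0.0) + pulse
        if not nxt:
            break
        for node, pulse in nxt.items():
            activation[node] = activation.get(node, 0.0) + pulse
        frontier = nxt
    return activation


graph = {"java": [("programming", 1.0), ("coffee", 0.2)], "programming": [("software", 0.8)]}
print(spread_activation(graph, {"java": 1.0}))
```

Interest in "java" activates related concepts in proportion to relation strength, so strongly related concepts rank above loosely related ones.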

14.
Ma Huifang, Zhang Di, Zhao Weizhong, Shi Zhongzhi. Journal of Software (软件学报), 2019, 30(11): 3397-3412
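Recommending valuable and interesting content to microblog users is an important way to improve their experience. After analyzing the characteristics of microblogs and the shortcomings of existing microblog recommendation algorithms, this paper uses tag information to characterize user interests and proposes a microblog recommendation method that combines tag expansion with the probabilistic correlation of tags. First, because most microblog users have added no tags, or very few, to their profiles, a hypergraph is built in which each microblog post published by a user is a hyperedge and the words in the posts are vertices. Hyperedges and vertices are weighted with a chosen weighting strategy, and a random walk on the hypergraph yields a number of keywords that are used to expand the user's tags. Then, a correlated-tag weighting scheme is used to build the user-tag matrix, a tag similarity matrix is constructed from the probabilistic correlation between tags, and the user-tag matrix is updated so that it contains both user interest information and the relationships between tags. A series of experiments and analyses on microblog data crawled through the public Sina Weibo API show that the proposed recommendation algorithm performs well.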
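The tag-expansion step, a random walk on a hypergraph whose hyperedges are a user's posts and whose vertices are words, can be sketched as below. The transition rule (pick a post containing the current word in proportion to the post's weight, then a word in that post in proportion to its frequency) is a standard hypergraph random walk; the uniform default post weights, the restart term, and the parameter values are assumptions, since the abstract does not give the paper's weighting strategy.

```python
import numpy as np


def hypergraph_keywords(posts, top_k=3, edge_weight=None, damping=0.85, iters=200):
    """posts: list of token lists; each non-empty post is a hyperedge over its words."""
    keep = [i for i, p in enumerate(posts) if p]
    posts = [posts[i] for i in keep]
    we = np.ones(len(posts)) if edge_weight is None else np.asarray(edge_weight, float)[keep]
    vocab = sorted({w for p in posts for w in p})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    H = np.zeros((n, len(posts)))           # H[v, e] = count of word v in post e
    for e, post in enumerate(posts):
        for w in post:
            H[index[w], e] += 1
    v_to_e = (H > 0) * we                   # word -> post, proportional to post weight
    v_to_e /= v_to_e.sum(axis=1, keepdims=True)
    e_to_u = (H / H.sum(axis=0, keepdims=True)).T   # post -> word, by frequency
    P = v_to_e @ e_to_u
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = damping * r @ P + (1 - damping) / n     # random walk with uniform restart
    order = np.argsort(-r, kind="stable")[:top_k]
    return [vocab[i] for i in order]


posts = [["deep", "learning", "model"], ["deep", "learning"], ["learning", "rate"]]
print(hypergraph_keywords(posts, top_k=2))
```

Words that appear in many posts become hubs of the walk, so they surface as candidate tags for users who never tagged themselves.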

15.
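Topic detection is an important research direction in text mining. Traditional topic detection methods are grounded in statistical theory and ignore the semantics inherent in the data, which leads to drawbacks such as severe bias and strong dependence on the sample data. To address these drawbacks, a feature-ontology-based topic detection method for text stream data is proposed. First, a text feature ontology is constructed. Second, this relatively complex feature ontology is regarded as a connected graph composed of several topics, and the topic graph is decomposed into a set of single-edge graphs. Third, computing topic similarity is reduced to computing the contribution of each single-edge graph and the similarity between graphs. Finally, each new batch of texts is checked for new topics, so that the number of topics grows over time. An empirical study on scientific literature and news corpora shows that the threshold parameter δ determines how often new topics appear in the text stream, and that the results are broadly consistent with those of classic topic models. In addition, compared with traditional methods, the proposed method better supports the semantic representation of topics, suits stream data, and detects topics incrementally, which gives it greater practical advantages.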
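The role of the threshold δ in the final step can be illustrated with a short sketch: a candidate topic from a new batch becomes a new topic only if its best similarity to every known topic falls below δ. The sketch assumes topics are keyword sets and uses Jaccard similarity in place of the paper's graph-based similarity.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets (assumed stand-in for graph similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_topics(batches, similarity, delta=0.5):
    """Return the final topic list and the topic count after each batch."""
    topics, history = [], []
    for batch in batches:
        for cand in batch:
            best = max((similarity(cand, t) for t in topics), default=0.0)
            if best < delta:          # not close enough to any known topic
                topics.append(cand)
        history.append(len(topics))
    return topics, history


batches = [[{"ontology", "wordnet"}], [{"ontology", "reasoning"}, {"gpu", "cuda"}]]
print(detect_topics(batches, jaccard, delta=0.5)[1])  # [1, 3]
print(detect_topics(batches, jaccard, delta=0.3)[1])  # [1, 2]
```

Lowering δ makes it harder for a candidate to count as new, so fewer new topics appear, which mirrors the observation that δ controls how often new topics emerge.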

16.
Social annotation systems (SAS) allow users to annotate online resources with keywords (tags). These systems help users find, organize, and retrieve online resources, and they provide collaborative semantic data that recommender systems can potentially exploit. Previous studies on SAS focused on tag recommendation; recently, SAS-based resource recommendation has received more attention from scholars. In most such systems, resources are recommended to the user according to the annotated tags, and the user's recent behavior and click-through are not taken into account. To design and implement a more precise recommender system, the current study uses both previous users' tagging data and the user's current click-through to address both resource recommendation (web pages, research papers, etc.) and tag recommendation. Moreover, applying a heat diffusion algorithm during the recommendation process presents more diverse options to the user. After extracting data such as users, tags, resources, and the relations between them, the recommender system, called "Swallow", builds a graph-based pattern from system log files. Finally, by following the active user's path and observing heat conduction on this pattern, the system anticipates the user's further goals and recommends them to the user. Test results on an SAS dataset demonstrate that the proposed algorithm improves on the accuracy of earlier recommendation algorithms.
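Heat diffusion on a recommendation graph can be sketched generically: nodes on the active user's path are held hot, and at each step every node's temperature moves toward the average temperature of its neighbours. This is a generic discrete heat-conduction sketch, not the Swallow algorithm; the graph layout, `alpha`, and the step count are assumptions.

```python
import numpy as np


def heat_diffusion(adj, sources, alpha=0.5, steps=10):
    """Each step, a node's temperature moves toward the mean temperature of its neighbours."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    W = adj / np.where(deg == 0, 1.0, deg)   # row-normalized: neighbour average
    h = np.zeros(adj.shape[0])
    h[list(sources)] = 1.0
    for _ in range(steps):
        h = (1 - alpha) * h + alpha * (W @ h)
        h[list(sources)] = 1.0               # heat sources stay at constant temperature
    return h


def recommend(adj, sources, candidates, k=2, **kw):
    """Rank candidate nodes (e.g. resources or tags) by their temperature."""
    h = heat_diffusion(adj, sources, **kw)
    return sorted(candidates, key=lambda i: -h[i])[:k]


# Path graph: 0 = active user, 1 = tag, 2..4 = resources progressively farther away.
adj = np.zeros((5, 5))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[a, b] = adj[b, a] = 1.0
print(recommend(adj, [0], [2, 3, 4]))  # [2, 3]
```

Because heat reaches every connected node, weakly connected items still receive a non-zero score, which is one way diffusion widens the set of options shown to the user.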

17.
As a valuable tool for text understanding, semantic similarity measurement enables discriminative semantic-based applications in natural language processing, information retrieval, computational linguistics, and artificial intelligence. Most existing studies have used structured taxonomies such as WordNet to explore lexical semantic relationships; however, improving computational accuracy remains a challenge for them. To address this problem, in this paper we propose CSSM-ICSP, a hybrid WordNet-based approach to measuring concept semantic similarity, which leverages the information content (IC) of concepts to weight the shortest-path distance between concepts. To improve IC computation, we also develop a novel model of the intrinsic IC of concepts that takes into account a variety of semantic properties present in the structure of WordNet. In addition, we summarize and classify the technical characteristics of previous WordNet-based approaches and evaluate our approach against them on various benchmarks. The results of the proposed approach correlate more closely with human similarity judgments in terms of the correlation coefficient, which indicates that our IC model and similarity measure are comparable to, or better than, other approaches for semantic similarity measurement.
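The idea of IC-weighted shortest paths can be sketched on a toy taxonomy. The sketch assumes a simple intrinsic IC based only on the number of descendants, IC(c) = 1 - log(hypo(c) + 1) / log(N), rather than the richer CSSM-ICSP model, weights each parent-child edge by the IC difference of its endpoints, and maps path length d to similarity 1 / (1 + d). The taxonomy itself is hypothetical and must contain at least two concepts.

```python
import heapq
import math
from collections import defaultdict


def descendants(children, node):
    """All concepts below node in the taxonomy."""
    stack, seen = list(children.get(node, [])), set()
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(children.get(c, []))
    return seen


def ic_weighted_similarity(children, a, b):
    """Dijkstra over the taxonomy with edges weighted by |IC(child) - IC(parent)|."""
    nodes = set(children) | {c for cs in children.values() for c in cs}
    n = len(nodes)
    ic = {c: 1 - math.log(len(descendants(children, c)) + 1) / math.log(n) for c in nodes}
    graph = defaultdict(list)
    for parent, cs in children.items():
        for c in cs:
            w = abs(ic[c] - ic[parent])
            graph[parent].append((c, w))
            graph[c].append((parent, w))
    dist, heap = {a: 0.0}, [(0.0, a)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == b:
            return 1.0 / (1.0 + d)
        if d > dist.get(u, math.inf):
            continue
        for v, w in graph[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return 0.0


taxonomy = {"entity": ["animal", "vehicle"], "animal": ["dog", "cat"], "vehicle": ["car"]}
print(ic_weighted_similarity(taxonomy, "dog", "cat"), ic_weighted_similarity(taxonomy, "dog", "car"))
```

Edges near the general root carry less information, so a path through "entity" is longer in IC terms, and "dog" comes out closer to "cat" than to "car".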

18.
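Data mining methods that combine fuzzy sets with ontologies have attracted wide attention. To enrich mining results and make the mined rules more complete, this paper builds on mining algorithms for fuzzy ontologies: it defines a similarity measure for leaf nodes in a fuzzy ontology, defines multiple minimum supports according to the number of itemsets at each semantic level, and proposes a generalized association rule algorithm based on fuzzy ontologies. Comparative experiments show that the results mined by this algorithm are more readable and contain richer semantic association rules, that concept generalization during generalized association rule mining becomes more reasonable, and that algorithm efficiency improves.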
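Two ingredients of this kind of mining, generalizing fuzzy transactions up an ontology and testing support against level-dependent minimum supports, can be sketched as below. The sketch assumes min as the fuzzy t-norm, max to propagate membership to ancestors, and the MS-Apriori convention that an itemset's threshold is the lowest threshold among its items; the ontology, memberships, and per-level thresholds are hypothetical, and the paper's rule for deriving thresholds from itemset counts is not reproduced.

```python
def generalize(transaction, parent):
    """Add ancestor concepts; an ancestor's membership is the max over its descendants."""
    out = dict(transaction)
    for item, mu in transaction.items():
        node = parent.get(item)
        while node is not None:
            out[node] = max(out.get(node, 0.0), mu)
            node = parent.get(node)
    return out


def fuzzy_support(itemset, transactions):
    """Mean over transactions of the itemset's minimum membership (min t-norm)."""
    return sum(min(t.get(i, 0.0) for i in itemset) for t in transactions) / len(transactions)


def is_frequent(itemset, transactions, level, min_sup_by_level):
    """Compare support with the lowest minimum support among the itemset's levels."""
    threshold = min(min_sup_by_level[level[i]] for i in itemset)
    return fuzzy_support(itemset, transactions) >= threshold


parent = {"apple": "fruit", "pear": "fruit", "milk": "dairy"}
raw = [{"apple": 0.8, "milk": 0.6}, {"pear": 0.9}, {"milk": 0.4, "apple": 0.2}]
g = [generalize(t, parent) for t in raw]
level = {"fruit": 1, "dairy": 1, "apple": 2, "pear": 2, "milk": 2}
min_sup = {1: 0.5, 2: 0.3}   # deeper, more specific concepts get a lower threshold
print(fuzzy_support({"fruit"}, g), fuzzy_support({"apple"}, g))
```

Generalizing "apple" and "pear" to "fruit" raises support, while the lower threshold for deeper levels keeps rarer specific concepts from being pruned too early.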

19.
Nowadays, due to the rapid growth of digital technologies, huge volumes of image data are created and shared on social media sites. User-provided tags attached to social images are widely recognized as a bridge across the semantic gap between low-level image features and high-level concepts. Hence, combining images with their corresponding tags is useful for intelligent retrieval systems, which are designed to gain a high-level understanding of images and facilitate semantic search. In practice, however, user-provided tags are usually incomplete and noisy, which may degrade retrieval performance. To tackle this problem, we present a novel retrieval framework that automatically associates visual content with textual tags and enables effective image search. To this end, we first propose a probabilistic topic model, learned on social images, that discovers latent topics from the co-occurrence of tags and image features. Moreover, our topic model exploits expert knowledge about the correlation between tags and visual content and about the relationships among image features, formulated in terms of spatial location and color distribution. The discovered topics then help predict missing tags for unseen images as well as for partially labeled images in the database. These predicted tags greatly facilitate a reliable measure of semantic similarity between the query and database images. We therefore further present a scoring scheme that estimates this similarity by fusing textual tags and visual representations. Extensive experiments on three benchmark datasets show that our topic model provides accurate annotations despite noisy and incomplete tags. Using our generalized scoring scheme, which is particularly advantageous for many types of queries, the proposed approach also outperforms state-of-the-art approaches in retrieval accuracy.
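A scoring scheme that fuses tags and visual features could look like the sketch below. It assumes each image carries a map from tags to probabilities (observed or predicted by the topic model) plus a visual feature vector, and it combines a weighted Jaccard tag similarity with cosine visual similarity through a hypothetical weight `lam`; the paper's actual scoring function is not given in the abstract.

```python
import numpy as np


def tag_sim(q_tags, d_tags):
    """Weighted Jaccard similarity over tag -> probability maps."""
    keys = set(q_tags) | set(d_tags)
    num = sum(min(q_tags.get(k, 0.0), d_tags.get(k, 0.0)) for k in keys)
    den = sum(max(q_tags.get(k, 0.0), d_tags.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def visual_sim(q_vec, d_vec):
    """Cosine similarity between visual feature vectors."""
    q, d = np.asarray(q_vec, float), np.asarray(d_vec, float)
    denom = np.linalg.norm(q) * np.linalg.norm(d)
    return float(q @ d / denom) if denom else 0.0


def fused_score(q, d, lam=0.6):
    """Linear fusion of textual and visual similarity (lam is an assumed weight)."""
    return lam * tag_sim(q["tags"], d["tags"]) + (1 - lam) * visual_sim(q["feat"], d["feat"])


query = {"tags": {"beach": 0.9, "sea": 0.7}, "feat": [1, 0, 1]}
db = [
    {"tags": {"beach": 0.8, "sea": 0.6}, "feat": [1, 0, 0.9]},
    {"tags": {"city": 0.9}, "feat": [0, 1, 0]},
]
print(sorted(range(len(db)), key=lambda i: -fused_score(query, db[i])))  # [0, 1]
```

Predicted tags plug into `tag_sim` the same way as user-provided ones, so an image with missing tags can still match on text once the topic model has filled them in.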

20.
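A method is proposed for computing the semantic similarity between words and ontology concepts. The method uses edit distance and Wikipedia to assess word-concept similarity from both the syntactic and the semantic perspective. Guided by a domain ontology, the method is applied to semantic annotation to establish mappings between words and ontology concepts. A knowledge base is built during annotation to improve the algorithm's performance, and experimental results show that the method is effective.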
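The combination of a syntactic and a semantic score can be sketched as follows. The syntactic part is normalized Levenshtein edit distance; the semantic part is left as a caller-supplied function (`semantic_sim`, hypothetical) because the abstract does not specify how the Wikipedia-based score is computed. The equal weighting `beta=0.5` is also an assumption.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lexical_similarity(a, b):
    """1 - edit distance / length of the longer string, case-insensitive."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def combined_similarity(word, concept, semantic_sim, beta=0.5):
    """semantic_sim: hypothetical callable, e.g. a Wikipedia-based relatedness in [0, 1]."""
    return beta * lexical_similarity(word, concept) + (1 - beta) * semantic_sim(word, concept)


def map_to_concept(word, concepts, semantic_sim, beta=0.5):
    """Annotate a word with the ontology concept of highest combined similarity."""
    return max(concepts, key=lambda c: combined_similarity(word, c, semantic_sim, beta))


print(levenshtein("kitten", "sitting"))  # 3
print(map_to_concept("ontologies", ["ontology", "taxonomy"], lambda w, c: 0.0))  # ontology
```

The edit-distance term catches spelling variants such as plurals, while the semantic term is what would let a word map to a differently spelled but related concept.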

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), Beijing ICP Registration No. 09084417 (京ICP备09084417号)