Similar Literature
1.
2.
Tags are user-generated keywords for entities. Recently, tags have become a popular way for users to contribute metadata to large corpora on the web. However, tagging-style websites lack a mechanism for guaranteeing tag quality for other uses, such as collaboration/community building, clustering, and search. As a remedy, automatic tag recommendation, which recommends a set of candidate tags for the user to choose from while tagging a document, has recently drawn much attention. In this paper, we introduce statistical language model theory into the tag recommendation problem, an approach we call language model for tag recommendation (LMTR). We convert tag recommendation into a ranking problem and then model the correlation between tags and documents within the language model framework. Furthermore, to improve the performance of LMTR, we leverage two methods, one based on keyword extraction and one based on keyword expansion, to collect candidate tags before ranking them with LMTR. Experiments on large-scale tagging datasets of both scientific and web documents indicate that our proposals make tag recommendations efficiently and effectively.
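To make "ranking tags with a language model" concrete, the sketch below scores each candidate tag t by log P(document | t). Here P(w | t) is a unigram model pooled from training documents labeled with t and Dirichlet-smoothed toward the collection model. This is a generic query-likelihood formulation assumed for illustration, not necessarily LMTR's exact model. The function names, the smoothing parameter `mu`, and the toy data are hypothetical. Candidate tags are taken as given, standing in for the keyword extraction/expansion step.

```python
import math
from collections import Counter

def build_tag_models(tagged_docs):
    """Pool word counts per tag from (words, tags) training pairs."""
    tag_counts, collection = {}, Counter()
    for words, tags in tagged_docs:
        collection.update(words)
        for tag in tags:
            tag_counts.setdefault(tag, Counter()).update(words)
    return tag_counts, collection

def score_tag(doc_words, tag_counter, collection, mu=100.0):
    """log P(doc | tag): unigram model of the tag, Dirichlet-smoothed with the collection."""
    coll_total, vocab = sum(collection.values()), len(collection)
    tag_total = sum(tag_counter.values())
    score = 0.0
    for w in doc_words:
        # add-one collection model keeps words unseen in training away from log(0)
        p_coll = (collection[w] + 1) / (coll_total + vocab + 1)
        score += math.log((tag_counter[w] + mu * p_coll) / (tag_total + mu))
    return score

def recommend(doc_words, candidates, tag_counts, collection, k=3):
    """Rank candidate tags (hypothetically from keyword extraction/expansion); keep top k."""
    known = [t for t in candidates if t in tag_counts]
    known.sort(key=lambda t: score_tag(doc_words, tag_counts[t], collection), reverse=True)
    return known[:k]

train = [  # toy training data
    (["neural", "network", "training"], ["deep-learning"]),
    (["gradient", "network", "layers"], ["deep-learning"]),
    (["sql", "index", "query"], ["database"]),
]
tag_counts, collection = build_tag_models(train)
print(recommend(["network", "layers"], ["deep-learning", "database"], tag_counts, collection, k=1))
```

Dirichlet smoothing is one common choice. It lets tags with little training text still receive non-zero probability for unseen words.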

3.
Bursty event detection from collaborative tags   Cited by: 1 (self-citations: 0, others: 1)

4.
Automatic image tagging assigns semantic keywords, called tags, to images, which greatly facilitates image search and organization. Most existing image tagging approaches are constrained by a model learned from a training dataset, and they do not exploit other types of web resources (e.g., web text documents). In this paper, we propose a search-based image tagging algorithm (CTSTag), in which the resulting tags are derived from web search results. Specifically, it assigns the query image a more comprehensive tag set derived from both web images and web text documents. First, content-based image search is used to retrieve a set of visually similar images, which are ranked by their semantic consistency values, and a set of relevant tags derived from the top-ranked images forms the initial tag set. Second, text-based search is used to retrieve other relevant web resources, with the initial tag set as the query. After denoising, the initial tag set is expanded with other tags mined from the text-based search results. Then, a probability flow measure is proposed to estimate the probabilities of the expanded tags. Finally, all tags are refined using the Random Walk with Restart (RWR) method, and the top-ranked ones are assigned to the query image. Experiments on the NUS-WIDE dataset demonstrate both the performance of the proposed algorithm and the advantages of image retrieval and organization based on the resulting tags.
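The final refinement step uses Random Walk with Restart. Below is a minimal sketch of RWR over a weighted tag graph. It assumes edge weights come from something like tag co-occurrence, and that restart scores come from the earlier probability-estimation step. The graph, the restart values, and `alpha` are hypothetical, and CTSTag's actual graph construction is not specified here.

```python
def random_walk_with_restart(graph, restart, alpha=0.15, iters=100, tol=1e-10):
    """Random Walk with Restart over a weighted tag graph.

    graph:   dict tag -> dict neighbour -> nonnegative weight (symmetric, no isolated tags)
    restart: dict tag -> initial score (e.g. estimated tag probabilities); normalized here
    alpha:   probability of jumping back to the restart distribution at each step
    Iterates r <- (1 - alpha) * P r + alpha * e, where P moves a tag's mass to its
    neighbours in proportion to edge weight.
    """
    nodes = list(graph)
    total = sum(restart.get(n, 0.0) for n in nodes)
    e = {n: restart.get(n, 0.0) / total for n in nodes}
    out = {n: sum(graph[n].values()) for n in nodes}
    r = dict(e)
    for _ in range(iters):
        spread = {n: 0.0 for n in nodes}
        for j in nodes:
            for i, w in graph[j].items():
                spread[i] += r[j] * w / out[j]
        new_r = {n: (1 - alpha) * spread[n] + alpha * e[n] for n in nodes}
        delta = sum(abs(new_r[n] - r[n]) for n in nodes)
        r = new_r
        if delta < tol:
            break
    return r

# Hypothetical co-occurrence weights between candidate tags
edges = {("beach", "sea"): 3.0, ("beach", "sand"): 2.0, ("sea", "sand"): 1.0, ("beach", "car"): 0.1}
graph = {}
for (a, b), w in edges.items():
    graph.setdefault(a, {})[b] = w
    graph.setdefault(b, {})[a] = w
scores = random_walk_with_restart(graph, {"beach": 0.5, "sea": 0.2, "car": 0.3})
print(sorted(scores, key=scores.get, reverse=True))
```

A tag such as "car" that starts with a fair initial score but is weakly connected to the rest of the set loses rank. A tag like "sand" with no initial score gains mass through its neighbours.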

5.
An image retrieval clustering method for collaborative tagging systems   Cited by: 2 (self-citations: 0, others: 2)
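To make image retrieval more effective, this paper proposes an image retrieval clustering method for Web 2.0 collaborative tagging systems. The algorithm first addresses the inconsistency in the tag space caused by the diversity of tag expressions. It mines lexical relationships between tags to perform semantic-level query expansion, obtaining an expanded result set of potentially semantically relevant images. It then uses a tag-to-tag relevance measure to select, from the result set, the tags that are highly relevant to the query tag. Next, it applies a top-down heuristic graph-partitioning algorithm to automatically classify the set of secondarily relevant tags. Finally, the image result set is clustered according to this tag classification. To validate the method, a large collection of real images and their tag metadata was randomly downloaded from the photo-sharing site Flickr. Extensive experiments were then run on the implemented image retrieval prototype PivotBrowser. The results show that the clustering algorithm effectively resolves both the inconsistency of tag expressions in the tag space and the ambiguity of tag queries, and provides more satisfactory retrieval for users.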
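The abstract does not give the relevance measure or the partitioning heuristic. The sketch below substitutes simpler stand-ins. Jaccard co-occurrence serves as the tag-to-tag relevance. Tags are split into "highly relevant" (near-synonyms of the query) and "secondarily relevant" by a threshold. Secondary tags are grouped into connected components of a thresholded co-occurrence graph, in place of the paper's top-down graph partitioning. Result images are then clustered by the tag group they share. The thresholds `high` and `link` and the toy photos are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Co-occurrence relevance of two tags: |A & B| / |A | B| over their image sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_results(photo_tags, query, high=0.8, link=0.3):
    """Group an (already expanded) image result set by the senses of its secondary tags.

    photo_tags: dict photo_id -> set of tags.
    Tags with relevance >= high to the query are treated as near-synonyms and skipped;
    tags with 0 < relevance < high are grouped when their mutual relevance >= link.
    Returns dict frozenset(tag group) -> set of photo ids (a photo may fall in several).
    """
    tag_images = {}
    for photo, tags in photo_tags.items():
        for t in tags:
            tag_images.setdefault(t, set()).add(photo)
    q_images = tag_images.get(query, set())
    secondary = [t for t in tag_images
                 if t != query and 0 < jaccard(q_images, tag_images[t]) < high]
    parent = {t: t for t in secondary}  # union-find over secondary tags

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(secondary, 2):
        if jaccard(tag_images[a], tag_images[b]) >= link:
            parent[find(a)] = find(b)
    groups = {}
    for t in secondary:
        groups.setdefault(find(t), set()).add(t)
    return {frozenset(g): {p for p, tags in photo_tags.items() if tags & g}
            for g in groups.values()}

photos = {  # toy result set for the ambiguous query "apple"
    1: {"apple", "fruit", "red"},
    2: {"apple", "fruit", "green"},
    3: {"apple", "iphone", "phone"},
    4: {"apple", "iphone", "ios"},
}
for tags, members in cluster_results(photos, "apple").items():
    print(sorted(tags), sorted(members))
```

The example shows how tag grouping separates the two senses of an ambiguous query tag into two image clusters.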

6.
In social tagging systems such as Delicious and Flickr, users collaboratively manage tags to annotate resources. Naturally, a social tagging system can be modeled as a (user, tag, resource) hypernetwork. The hypernetwork has three types of nodes, namely users, resources and tags. Each hyperedge has three end nodes, connecting a user, a resource, and a tag that the user employs to annotate the resource. How, then, can we automatically cluster related users, resources and tags, respectively? This is a problem of community detection in a 3-partite, 3-uniform hypernetwork. More generally, consider a K-partite K-uniform (hyper)network, where each (hyper)edge is a K-tuple composed of nodes of K different types. How can we automatically detect communities for nodes of different types? In this paper, we turn this problem into one of finding an efficient compression of the (hyper)network's structure. On that basis, we propose a quality function for measuring the goodness of partitions of a K-partite K-uniform (hyper)network into communities, and develop a fast community detection method based on optimization. Our method overcomes the limitations of state-of-the-art techniques and has several desired properties: it is comprehensive, parameter-free, and scalable. We compare our method with existing methods on both synthetic and real-world datasets.
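As a concrete picture of the data model, the sketch below stores each tagging action as a K-tuple hyperedge, checks K-uniformity, and computes per-type node degrees. It also shows the pairwise projection that simpler graph methods fall back on. The projection discards which user applied which tag to which resource, which is the motivation for working on the hypernetwork directly. This illustrates the representation only and does not implement the paper's compression-based quality function. Function names and data are hypothetical.

```python
from collections import Counter

def build_hypernetwork(hyperedges, k):
    """Store a K-partite K-uniform hypernetwork as a multiset of K-tuples.

    Position i of every tuple holds a node of type i (e.g. 0=user, 1=tag, 2=resource),
    so equal strings of different types never collide. Returns the hyperedge counts
    and one degree Counter per node type.
    """
    edges = Counter()
    degrees = [Counter() for _ in range(k)]
    for edge in hyperedges:
        if len(edge) != k:
            raise ValueError(f"hyperedge {edge!r} does not have exactly {k} nodes")
        edges[tuple(edge)] += 1
        for i, node in enumerate(edge):
            degrees[i][node] += 1
    return edges, degrees

def project(edges, i, j):
    """Weighted pairwise projection between node types i and j (loses higher-order structure)."""
    proj = Counter()
    for edge, count in edges.items():
        proj[edge[i], edge[j]] += count
    return proj

assignments = [("alice", "python", "r1"), ("alice", "tutorial", "r1"),
               ("bob", "python", "r2"), ("bob", "python", "r1")]
edges, degrees = build_hypernetwork(assignments, k=3)
print(degrees[1]["python"], project(edges, 0, 2)["alice", "r1"])
```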

7.
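Tags are an important means of classifying and indexing information in the Web 2.0 era. Tagging systems suffer from inconsistency, redundancy, and incompleteness, and tag recommendation addresses these problems by offering candidate tags that improve tag quality. To further improve the quality of tag recommendation, this paper proposes a tag recommendation method based on a fused analysis of the relationships among objects in a tagging system and the content of resources. It presents TSM/Forc, an LDA (latent Dirichlet allocation)-based generative model of tagging systems that jointly represents object relationships and resource content. It also proposes a probability-based tag recommendation method and a parameter estimation method based on Gibbs sampling. Experimental results show that the method provides more accurate recommendations than current mainstream and state-of-the-art methods.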
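TSM/Forc itself is not specified in enough detail to reproduce here. The sketch below shows the two generic ingredients the abstract names. The first is collapsed Gibbs sampling for plain LDA. The second is probability-based recommendation by P(tag | d) = Σ_z θ_{d,z} φ_{z,tag}. As a simplifying assumption, each resource's content words and tags are treated as one token stream. The hyperparameters, token ids, and function names are hypothetical.

```python
import random

def lda_gibbs(docs, vocab_size, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (not the full TSM/Forc model).

    docs: list of token-id lists, ids in range(vocab_size).
    Returns theta (doc -> topic probabilities) and phi (topic -> token probabilities).
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # topic counts per document
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # token counts per topic
    nk = [0] * n_topics                                # total tokens per topic
    z = []                                             # topic assignment of every token

    def update(d, w, k, delta):
        ndk[d][k] += delta
        nkw[k][w] += delta
        nk[k] += delta

    for d, doc in enumerate(docs):                     # random initialization
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, w, k, 1)
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the token's current assignment
                # full conditional p(z = k | all other assignments), up to a constant
                weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + vocab_size * beta)
                           for k in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, w, z[d][i], 1)
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in topics]
    return theta, phi

def recommend_tags(theta_d, phi, tag_ids, k=2):
    """Rank tag tokens by P(tag | d) = sum_z theta_d[z] * phi[z][tag]."""
    score = {t: sum(theta_d[z] * phi[z][t] for z in range(len(phi))) for t in tag_ids}
    return sorted(tag_ids, key=lambda t: -score[t])[:k]

# Hypothetical ids: 0 neural, 1 network, 2 sql, 3 query, 4 tag "ml", 5 tag "db"
docs = [[0, 1, 0, 1, 4], [2, 3, 2, 3, 5], [0, 1, 4], [2, 3, 5]]
theta, phi = lda_gibbs(docs, vocab_size=6, n_topics=2, iters=100)
print(recommend_tags(theta[2], phi, tag_ids=[4, 5], k=1))
```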

8.
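Inaccurate and semantically ambiguous user tags lower the accuracy of retrieval over collaboratively tagged images. Yet existing spam-tag filtering methods tend to focus on the tags themselves and ignore the association between collaborative tags and images. Based on an analysis of the association between the visual content of collaboratively tagged images and their tags, this paper proposes a spam-tag detection method based on that visual content. The method analyzes the visual content of the images under the same tag and designs separate kernel functions for color features and SIFT (scale-invariant feature transform) features. It then maps the two low-dimensional features into a high-dimensional multimodal feature space to form a mixed kernel. Images under the same tag are clustered with max-min distance clustering based on this mixed kernel. For images in minority groups, the tag has little association with the image content, so it is regarded as mislabeled by the user; this is how spam tags are detected. Experimental results show that the method improves the accuracy of spam-tag detection for collaboratively tagged images.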
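The abstract does not give the kernels or thresholds. The sketch below assumes each image is summarized by a color histogram and a fixed-length SIFT descriptor vector, for example a bag-of-visual-words histogram. It combines an additive chi-square kernel on the color part with an RBF kernel on the SIFT part as a weighted sum. It then runs classic max-min distance clustering, with distances measured in the kernel-induced feature space, and flags images in small clusters as carrying a suspect tag. The kernel choices, the weight `w`, `theta`, `minority`, and the toy vectors are all hypothetical.

```python
import math
from collections import Counter

def chi2_kernel(x, y):
    """Additive chi-square kernel for nonnegative histograms (assumed for color)."""
    return sum(2 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel (assumed for the SIFT descriptor vector)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mixed_kernel(p, q, w=0.5):
    """Weighted sum of the two kernels; p and q are (color_hist, sift_vec) pairs."""
    return w * chi2_kernel(p[0], q[0]) + (1 - w) * rbf_kernel(p[1], q[1])

def kdist(p, q):
    """Distance in the kernel-induced feature space: sqrt(K(p,p) + K(q,q) - 2K(p,q))."""
    return math.sqrt(max(mixed_kernel(p, p) + mixed_kernel(q, q) - 2 * mixed_kernel(p, q), 0.0))

def max_min_clustering(items, theta=0.5):
    """Max-min distance clustering: add the point farthest from all current centers as a
    new center while that distance exceeds theta * (distance between the first two centers)."""
    n = len(items)
    if n < 2:
        return [0] * n
    centers = [0]
    far = max(range(n), key=lambda i: kdist(items[0], items[i]))
    base = kdist(items[0], items[far])
    if base == 0:
        return [0] * n
    centers.append(far)
    while True:
        nearest = [min(kdist(items[i], items[c]) for c in centers) for i in range(n)]
        cand = max(range(n), key=lambda i: nearest[i])
        if nearest[cand] <= theta * base:
            break
        centers.append(cand)
    return [min(range(len(centers)), key=lambda c: kdist(items[i], items[centers[c]]))
            for i in range(n)]

def suspect_images(items, minority=0.2):
    """Indices of images in clusters holding less than `minority` of the tag's images."""
    labels = max_min_clustering(items)
    sizes = Counter(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] < minority * len(items)]

beach = [  # (color histogram, SIFT vector) for six images tagged "beach"; toy values
    ([0.6, 0.3, 0.1], [1.0, 0.0]), ([0.5, 0.4, 0.1], [0.9, 0.1]),
    ([0.6, 0.2, 0.2], [1.0, 0.1]), ([0.55, 0.35, 0.1], [0.95, 0.0]),
    ([0.6, 0.3, 0.1], [1.0, 0.05]), ([0.0, 0.1, 0.9], [0.0, 1.0]),
]
print(suspect_images(beach))
```

Both component kernels are positive semidefinite, so their nonnegative weighted sum is too. That is what makes the kernel-induced distance well defined.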

9.
In this paper, we study the problem of tag completion. Given an image and a set of tags, it is known for only a few of the tags whether or not they are associated with the image, and the problem is to predict whether the other tags are associated with it. To solve this problem, we propose to learn a tag scoring vector for each image and use it to predict the image's associated tags. To learn the tag scoring vectors, we use local linear learning: a local linear function is used in the neighborhood of each image to predict the tag scoring vectors of its neighboring images. We construct a unified objective function for learning both the tag scoring vectors and the local linear function parameters. In this objective, we require the learned tag scoring vectors to be consistent with the known tag associations of each image and minimize the prediction error of each local linear function, while reducing the complexity of each local function. The objective function is optimized by an iterative algorithm that combines an alternating optimization strategy with gradient descent. We compare the proposed algorithm against several state-of-the-art tag completion methods, and the results show its advantages.
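A simplified version of this alternating scheme fits in a few lines of NumPy. The assumed objective is Σ_i Σ_{j∈N(i)} ||F_j − W_iᵀ[x_j, 1]||² + λ||W_i||², with F fixed to the known labels on observed entries. Step 1 solves each local linear function in closed form by ridge regression, where the λ term stands in for the complexity penalty. Step 2 sets each score vector to the mean of the local predictions covering it, which minimizes the squared error for fixed W. The closed-form steps replace the paper's gradient descent. The neighbourhood size `k`, `lam`, and the toy data are hypothetical.

```python
import numpy as np

def complete_tags(X, Y, known, k=3, lam=0.1, iters=20):
    """Alternating minimization of
        sum_i sum_{j in N(i)} ||F_j - W_i^T [x_j, 1]||^2 + lam * ||W_i||^2
    subject to F agreeing with Y on the observed (image, tag) entries.

    X: (n, d) image features; Y: (n, m) tag labels, read only where `known` is True.
    known: (n, m) boolean mask of observed associations. Returns completed scores F.
    """
    n, d = X.shape
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    nbrs = np.argsort(sq_dist, axis=1)[:, :k]   # k nearest images (normally incl. itself)
    Xb = np.hstack([X, np.ones((n, 1))])         # bias column
    F = np.where(known, Y, 0.0).astype(float)    # unknown scores start at 0
    for _ in range(iters):
        pred_sum = np.zeros_like(F)
        pred_cnt = np.zeros((n, 1))
        for i in range(n):
            A = Xb[nbrs[i]]
            # step 1: ridge-regularized local linear function for neighbourhood i
            W = np.linalg.solve(A.T @ A + lam * np.eye(d + 1), A.T @ F[nbrs[i]])
            pred_sum[nbrs[i]] += A @ W
            pred_cnt[nbrs[i]] += 1
        # step 2: each score vector becomes the mean of the local predictions covering it
        F = pred_sum / np.maximum(pred_cnt, 1)
        F[known] = Y[known]                      # keep observed associations fixed
    return F

# Toy data: two visual clusters; tag 0 belongs to cluster A, tag 1 to cluster B.
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
Y = np.array([[1, 0], [1, 0], [0, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
known = np.ones_like(Y, dtype=bool)
known[2, 0] = False   # is tag 0 on image 2? (cluster A)
known[5, 0] = False   # is tag 0 on image 5? (cluster B)
F = complete_tags(X, Y, known)
print(F[2, 0].round(3), F[5, 0].round(3))
```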

10.
A folksonomy consists of three basic entities, namely users, tags and resources. This kind of social tagging system is a good way to index information, facilitate searches and navigate resources. The main objective of this paper is to present a novel method for improving the quality of tag recommendation. Statistical analysis shows that the total number of tags used by a user changes over time in a social tagging system. This paper therefore introduces the concept of user tagging status, comprising the growing status, the mature status and the dormant status. An algorithm is then presented that determines which of these three statuses a user is in at a given point in time. Finally, three corresponding strategies are developed to compute the tag probability distribution based on the statistical language model, in order to recommend the tags users are most likely to use. Experimental results show that the proposed method outperforms the compared methods in tag recommendation accuracy.
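The abstract names the three statuses but not the rules for assigning them or the per-status strategies. The sketch below makes both concrete under loudly hypothetical choices. A user is "dormant" after a long gap in activity. A user is "growing" when a large share of recently used tags are new to them, and "mature" otherwise. Each status then sets the weight of the user's own tag history in a unigram mixture with the global tag distribution. All thresholds and the `PERSONAL_WEIGHT` values are invented for illustration.

```python
from collections import Counter

# Hypothetical weight on the user's own tag history for each status.
PERSONAL_WEIGHT = {"growing": 0.4, "mature": 0.8, "dormant": 0.2}

def tagging_status(history, now, window=30, new_ratio=0.3, dormant_after=90):
    """Classify one user as 'growing', 'mature' or 'dormant' (illustrative thresholds).

    history: list of (day, tag) pairs for the user; now: current day.
    """
    if not history or now - max(day for day, _ in history) > dormant_after:
        return "dormant"
    earlier = {tag for day, tag in history if day < now - window}
    recent = [tag for day, tag in history if day >= now - window]
    if not recent:
        return "mature"
    new_fraction = sum(tag not in earlier for tag in recent) / len(recent)
    return "growing" if new_fraction > new_ratio else "mature"

def tag_distribution(history, global_counts, status):
    """Unigram mixture P(t) = w * P_user(t) + (1 - w) * P_global(t), w set by status."""
    w = PERSONAL_WEIGHT[status] if history else 0.0
    user_counts = Counter(tag for _, tag in history)
    u_total = sum(user_counts.values()) or 1
    g_total = sum(global_counts.values())
    vocab = set(user_counts) | set(global_counts)
    return {t: w * user_counts[t] / u_total + (1 - w) * global_counts.get(t, 0) / g_total
            for t in vocab}

history = [(1, "python"), (2, "python"), (3, "ml"),
           (95, "python"), (96, "rust"), (97, "go"), (98, "wasm")]
status = tagging_status(history, now=100)
dist = tag_distribution(history, {"python": 50, "ml": 30, "java": 20}, status)
print(status, sorted(dist, key=dist.get, reverse=True)[:3])
```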

11.
The advent of the internet has led to significant growth in the amount of information available, resulting in information overload, i.e., individuals have too much information to make a decision. To address this problem, collaborative tagging systems form a categorization called a folksonomy to organize web resources. A folksonomy aggregates the results of personal free tagging of information and objects to form a categorization structure that utilizes the collective intelligence of crowds. Folksonomies are more appropriate for organizing huge amounts of information on the Web than traditional taxonomies established by expert cataloguers. However, the attributes of collaborative tagging systems and their folksonomies make them impractical for organizing resources in personal environments. This work designs a desktop collaborative tagging (DCT) system that enables collaborative workers to tag their documents, and proposes an application in patent analysis based on the DCT system. The folksonomy in DCT is built by aggregating personal tagging results and is represented by a concept space. Concept spaces provide synonym control, tag recommendation and relevant search. Additionally, to protect authors' privacy and decrease transmission cost, relations between tagged and untagged documents are constructed from extracted document features rather than the full text. Experimental results reveal that the adoption rate of recommended tags for new documents increases by 10% after users have tagged five or six documents. Furthermore, DCT recommends tags with higher adoption rates for new documents whose topics are similar to those of previously tagged ones. Relevant search in DCT is observed to be superior to keyword search when frequently used tags are adopted as queries.
The average precision, recall, and F-measure of DCT are 12.12%, 23.08%, and 26.92% higher, respectively, than those of keyword searching. DCT allows a multi-faceted categorization of resources for collaborative workers and recommends tags that make categorization easier. Additionally, the DCT system provides relevance searching, which is more effective than traditional keyword searching for personal resources.
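The abstract says DCT relates tagged and untagged documents through extracted features instead of full text. The sketch below illustrates that idea under assumed choices. A document is reduced to its most frequent longer words, documents are compared by cosine similarity, and the most similar tagged documents vote for their tags. The feature extractor, the length filter standing in for stop-word removal, `k`, `n_tags`, and the toy patents are hypothetical, not DCT's actual design.

```python
import math
import re
from collections import Counter

def features(text, top_n=10):
    """Compact document features: the top_n most frequent words longer than 3 letters.
    Only these counts, not the full text, need to be shared."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]
    return dict(Counter(words).most_common(top_n))

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend_from_similar(new_feats, tagged, k=2, n_tags=3):
    """tagged: list of (features, tag set). Each of the k most similar tagged documents
    votes for its tags with weight equal to its cosine similarity."""
    sims = sorted(((cosine(new_feats, f), tags) for f, tags in tagged),
                  key=lambda x: x[0], reverse=True)[:k]
    votes = Counter()
    for sim, tags in sims:
        for t in tags:
            votes[t] += sim
    return [t for t, v in votes.most_common(n_tags) if v > 0]

tagged = [
    (features("Battery electrode patent covering lithium battery cathode materials"),
     {"battery", "lithium"}),
    (features("Wireless antenna patent for mobile signal reception"), {"antenna", "wireless"}),
]
print(recommend_from_similar(features("Lithium battery anode coating method"), tagged, k=1))
```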

12.
Tag prediction based on probabilistic topic models   Cited by: 2 (self-citations: 1, others: 1)
Yuan Liu, Zhang Longbo. Computer Science (计算机科学), 2011, 38(7): 175-180
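Making full use of user-defined tags is an important way to understand the semantics of Web resources and to make Web applications more intelligent. Tag assignments to resources are widely incomplete and inconsistent. To address this, a probabilistic topic model based on users' tagging behavior is built and used to predict tags for resources with incomplete tag information. Depending on the statistical characteristics of the tags associated with each resource, different forms of tag documents can be generated. By analyzing the quality of the topics generated from each form, the tag-document form best suited to a given dataset is determined. Exploiting the strong correlation among words within the same topic, a ranking method for predicted tags is designed. This enables both tag prediction for incompletely tagged resources and the detection of semantically inconsistent tags. Tests on the DeliciousT140 and Wikilo+ datasets show that the method predicts tags effectively and can improve information retrieval performance.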
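Given a fitted topic model, prediction and inconsistency detection can both be expressed through P(tag | resource) = Σ_z P(tag | z) P(z | resource). The sketch below estimates a resource's topic mixture from its existing tags with a crude fold-in that assumes a uniform topic prior. It ranks unseen tags by that probability. It then flags an existing tag as inconsistent when it is improbable under the topics implied by the resource's other tags. The `phi` values, the fold-in rule, and `threshold` are hypothetical, not the paper's ranking method.

```python
def topic_posterior(tags, phi):
    """Crude fold-in: P(z | resource) proportional to sum of P(tag | z) over its tags
    (assumes a uniform topic prior)."""
    weights = [sum(topic.get(t, 0.0) for t in tags) for topic in phi]
    total = sum(weights)
    return [w / total for w in weights] if total else [1.0 / len(phi)] * len(phi)

def tag_prob(tag, theta, phi):
    """P(tag | resource) = sum_z P(tag | z) * P(z | resource)."""
    return sum(th * topic.get(tag, 0.0) for th, topic in zip(theta, phi))

def predict_tags(tags, phi, n=3):
    """Rank tags the resource does not have yet by P(tag | resource)."""
    theta = topic_posterior(tags, phi)
    vocab = sorted({t for topic in phi for t in topic} - set(tags))
    return sorted(vocab, key=lambda t: -tag_prob(t, theta, phi))[:n]

def inconsistent_tags(tags, phi, threshold=0.01):
    """Flag tags that are improbable given the topics of the resource's other tags."""
    return [t for t in tags
            if tag_prob(t, topic_posterior([o for o in tags if o != t], phi), phi) < threshold]

phi = [  # hypothetical topic -> tag distributions from a fitted topic model
    {"python": 0.4, "programming": 0.3, "code": 0.2, "tutorial": 0.1},
    {"sunset": 0.5, "beach": 0.3, "photo": 0.2},
]
print(predict_tags(["python", "code"], phi, n=2))           # ['programming', 'tutorial']
print(inconsistent_tags(["python", "code", "sunset"], phi))  # ['sunset']
```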

13.
14.
Learning Social Tag Relevance by Neighbor Voting   Cited by: 2 (self-citations: 0, others: 2)
Social image analysis and retrieval is important for helping people organize and access the increasing amount of user-tagged multimedia. Since user tagging is known to be uncontrolled, ambiguous, and overly personalized, a fundamental problem is how to interpret the relevance of a user-contributed tag with respect to the visual content the tag describes. Intuitively, if different people label visually similar images using the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose in this paper a neighbor voting algorithm that accurately and efficiently learns tag relevance by accumulating votes from visual neighbors. Under a set of well-defined and realistic assumptions, we prove that our algorithm provides a good measure of tag relevance for both image ranking and tag ranking. Three experiments on 3.5 million Flickr photos demonstrate the general applicability of our algorithm in both social image retrieval and image tag suggestion. Our tag relevance learning algorithm substantially improves upon baselines in all the experiments. The results suggest that the proposed algorithm is promising for real-world applications.
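The neighbor-voting intuition can be sketched as follows. The relevance of tag w for image I is the number of I's k visual neighbours labeled w, minus the number expected by chance, k·|images labeled w|/|all images|. The published formulation may differ in detail. The Euclidean features, `k`, and the toy collection are assumptions. This sketch also ignores who uploaded each neighbour, so one user batch-tagging many near-identical photos could dominate the vote.

```python
import math

def tag_relevance(image, tag, features, tags, k=3):
    """Neighbour voting: votes from the k visually nearest images that carry `tag`,
    minus the votes expected by chance, k * |images with tag| / |images|."""
    others = [i for i in features if i != image]
    neighbours = sorted(others, key=lambda i: math.dist(features[image], features[i]))[:k]
    votes = sum(tag in tags[i] for i in neighbours)
    prior = k * sum(tag in t for t in tags.values()) / len(tags)
    return votes - prior

features = {"a": (0, 0), "b": (0.1, 0), "c": (0, 0.1),
            "d": (5, 5), "e": (5.1, 5), "f": (5, 5.1)}
tags = {"a": {"dog", "grass"}, "b": {"dog"}, "c": {"dog"},
        "d": {"car"}, "e": {"car", "dog"}, "f": {"car"}}
for img, tag in [("a", "dog"), ("a", "grass"), ("e", "car"), ("e", "dog")]:
    print(img, tag, round(tag_relevance(img, tag, features, tags, k=2), 3))
```

Subtracting the prior keeps frequent but uninformative tags from scoring highly just because they are common.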

15.
In recent years, social Web users have been overwhelmed by the huge amount of social media available. Consequently, users have trouble finding social media suited to their needs. To help such users retrieve useful social media content, we propose a new model of tag-based personalized search that enhances not only retrieval accuracy but also retrieval coverage. Leveraging social tagging as a preference indicator, we build two models. The first is (i) a latent tag preference model that reflects how a given user has assigned tags similar to a given tag. The second is (ii) a latent tag annotation model that captures how users have assigned a given tag to resources similar to a given resource. We then seamlessly map the tags onto items, depending on an individual user's query, to find the content most relevant to the user's needs. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art algorithms and show its feasibility for personalized search in social media services.
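One way to combine the two latent models is sketched below. Tag and resource similarities are cosine similarities of co-occurrence count vectors. The latent preference of user u for tag t is Σ_s sim(t, s)·n(u, s), normalized by the user's tag count. The latent annotation of tag t on resource r is Σ_{r'} sim(r, r')·n(r', t). A resource's score for query tag q is Σ_t sim(q, t)·pref(u, t)·ann(r, t). This scoring rule is an assumption for illustration and is not the paper's exact formulation. The data and function names are hypothetical.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def personalized_search(assignments, user, query, top=3):
    """Rank resources for a user's query tag from (user, tag, resource) triples."""
    tag_res = defaultdict(lambda: defaultdict(int))  # tag -> resource -> count
    res_tag = defaultdict(lambda: defaultdict(int))  # resource -> tag -> count
    user_tag = defaultdict(int)                      # tag -> count, this user only
    for u, t, r in assignments:
        tag_res[t][r] += 1
        res_tag[r][t] += 1
        if u == user:
            user_tag[t] += 1
    if query not in tag_res:
        return []
    tags, resources = list(tag_res), list(res_tag)
    tag_sim = {(a, b): cosine(tag_res[a], tag_res[b]) for a in tags for b in tags}
    res_sim = {(a, b): cosine(res_tag[a], res_tag[b]) for a in resources for b in resources}
    total = sum(user_tag.values()) or 1
    # (i) latent tag preference: how much this user uses tags similar to t
    pref = {t: sum(tag_sim[t, s] * c for s, c in user_tag.items()) / total for t in tags}

    # (ii) latent tag annotation: how often t was assigned to resources similar to r
    def ann(r, t):
        return sum(res_sim[r, s] * tag_res[t].get(s, 0) for s in resources)

    # expand the query to similar tags, weighted by the user's preference for them
    scores = {r: sum(tag_sim[query, t] * pref[t] * ann(r, t) for t in tags) for r in resources}
    ranked = sorted(resources, key=lambda r: (-scores[r], r))
    return [r for r in ranked[:top] if scores[r] > 0]

assignments = [("alice", "jazz", "r1"), ("alice", "jazz", "r2"), ("bob", "jazz", "r2"),
               ("bob", "music", "r2"), ("bob", "music", "r3"),
               ("carol", "rock", "r3"), ("carol", "music", "r4")]
print(personalized_search(assignments, "alice", "jazz"))
```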

16.
Tag recommender schemes suggest relevant tags for untagged resources and better tags for already-tagged resources. Tagging matters if the user is to identify precise tags for searching interesting blogs. In a tagging process, there is no clear information about the meaning of each tag. A user can use various tags for the same content and can also use new tags for an item in a blog. When users select tags, the resulting metadata may contain homonyms and synonyms, which may cause improper relationships among items and ineffective searches for topical information. Collaborative tagging allows users to assign a set of freely chosen text keywords as tags. Because there is no control over tag assignment, these tags can be imprecise, irrelevant, and misleading. No formal guidelines assist tag generation, and tags are assigned to resources based on users' own knowledge. This results in misspelled tags, multiple tags with the same meaning, bad word encoding, and personalized words without a common meaning. These in turn lead to miscategorized items, irrelevant search results, and wrong predictions and recommendations. Tag relevance can be judged only by a specific user. These aspects pose new challenges and opportunities for the tag recommendation problem. This paper reviews the challenges of the tag recommendation problem and presents a brief comparison of existing works, from which novel research directions are identified. The overall performance of our ontology-based recommender systems compares favorably with other systems in the literature.

17.
Wu Xiaohui, Chai Peiqi. Computer Engineering (计算机工程), 2003, 29(2): 151-152, 160
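Automatic Chinese part-of-speech (POS) tagging and prosodic phrase segmentation are both important components of Chinese text-to-speech (TTS) systems. Prosodic phrase breaks in text can be predicted automatically from boundary patterns and probability information learned from a manually annotated corpus. In that setting, the morpheme tag "g" is too vague and leads to unreasonable break predictions. This paper proposes modifying the POS tag set to remove the morpheme tag "g". Instead, content morphemes are labeled during POS tagging with the part of speech they take in the sentence, which improves prosodic phrase segmentation. Closed and open tests on a 100,000-word training set and a 50,000-word test set give POS tagging accuracies of 96.67% and 92.60%, respectively. Using the modified tag set, prosodic phrase breaks were predicted for a 1,000-sentence text, with a recall of about 66.21% and a precision of 75.79%.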
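A minimal way to learn "boundary patterns and probability information" from an annotated corpus is a relative-frequency estimate of P(break after a word | its POS, the next word's POS). That estimate is then thresholded at prediction time. The sketch is a stand-in under that assumption, not the paper's model. It also shows why an overly broad tag hurts: every context involving a catch-all tag such as "g" shares one averaged break probability. The placeholder words, tags, and `threshold` are hypothetical.

```python
from collections import Counter

def train_break_model(sentences):
    """Estimate P(break after word | its POS, next word's POS) by relative frequency.

    sentences: lists of (word, pos, break_after) triples, break_after in {0, 1},
    taken from a manually annotated corpus.
    """
    seen, breaks = Counter(), Counter()
    for sent in sentences:
        for (_, left, brk), (_, right, _) in zip(sent, sent[1:]):
            seen[left, right] += 1
            breaks[left, right] += brk
    return {pair: breaks[pair] / seen[pair] for pair in seen}

def predict_breaks(tagged, model, threshold=0.5):
    """Indices i of (word, pos) pairs after which a prosodic break is predicted."""
    return [i for i, ((_, left), (_, right)) in enumerate(zip(tagged, tagged[1:]))
            if model.get((left, right), 0.0) >= threshold]

def recall_precision(predicted, gold):
    hits = len(set(predicted) & set(gold))
    return (hits / len(gold) if gold else 0.0,
            hits / len(predicted) if predicted else 0.0)

# Toy corpus with placeholder words; tags n (noun), v (verb), u (auxiliary) are illustrative.
corpus = [
    [("w1", "n", 0), ("w2", "u", 1), ("w3", "v", 0)],
    [("w4", "n", 0), ("w5", "u", 1), ("w6", "n", 0)],
]
model = train_break_model(corpus)
pred = predict_breaks([("x", "n"), ("y", "u"), ("z", "v")], model)
print(pred, recall_precision(pred, gold=[1]))
```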

18.
In recent years, as the amount of data grows, personal information management has become both essential and challenging in everyday life. Tagging, an alternative or complement to classifying items into tree-structured directories, allows users to place a single information item in multiple categories. Owing to this flexibility, tagging systems have become popular and have been the subject of many studies. Most previous research investigated tag quality with tools such as questionnaires; however, actual usage behavior in tag-based browsing and retrieval of stored information has rarely been studied. In this study, we examined the effects of tag attributes on user behavior when browsing self-tagged documents in a personal information management setting.

Three attributes were identified: tag commonness, tag frequency and tag position. A controlled experiment with tagging and retrieval tasks was designed to trace users' behavior. It revealed that tags with higher commonness, higher frequency, and lower position were more likely to be used. Tags with lower commonness and lower frequency, in turn, helped users recognize a desired document among a list of candidates. Among the three attributes, tag position was found to be the most influential. The findings of this study are expected to deepen the understanding of tag quality and help information designers build effective tagging environments.


19.
This paper describes a novel approach to morphological tagging for Korean, an agglutinative language with a very productive inflectional system. The tagger takes raw text as input and returns a lemmatized and morphologically disambiguated output for each word: the lemma is labeled with a part-of-speech (POS) tag and the inflections are labeled with inflectional tags. Unlike the standard approach to tagging for morphologically complex languages, in our proposed approach the tagging phase precedes the analysis phase. It comprises a trigram-based tagging component followed by a morphological rule application component, obtaining 95% precision and recall on unseen test data.
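The trigram-based tagging component can be illustrated with standard second-order HMM Viterbi decoding, where each search state is a pair of consecutive tags. The sketch covers only generic trigram decoding, not the paper's morphological rule application step. It assumes probabilities are already estimated and uses a crude floor for unseen events. The tag names, the romanized tokens, and every probability below are hypothetical toy values.

```python
import math

def viterbi_trigram(words, tags, trans, emit, floor=1e-8):
    """Most likely tag sequence under a second-order (trigram) HMM.

    trans[(t2, t1, t)] = P(t | t2, t1) with "<s>" padding the sentence start;
    emit[(t, w)] = P(w | t). Unseen events get a small floor probability
    (a crude stand-in for real smoothing); no end-of-sentence transition is used.
    """
    def lp(table, key):
        return math.log(table.get(key, floor))

    # state = (previous tag, current tag); value = (best log prob, best tag path)
    best = {("<s>", "<s>"): (0.0, [])}
    for w in words:
        new = {}
        for (t2, t1), (score, path) in best.items():
            for t in tags:
                s = score + lp(trans, (t2, t1, t)) + lp(emit, (t, w))
                if (t1, t) not in new or s > new[t1, t][0]:
                    new[t1, t] = (s, path + [t])
        best = new
    return max(best.values(), key=lambda v: v[0])[1]

# Hypothetical toy model: "hakgyo e" (school + locative particle), romanized.
tags = ["NNG", "JKB"]
emit = {("NNG", "hakgyo"): 0.9, ("JKB", "hakgyo"): 0.01,
        ("JKB", "e"): 0.8, ("NNG", "e"): 0.05}
trans = {("<s>", "<s>", "NNG"): 0.7, ("<s>", "<s>", "JKB"): 0.3,
         ("<s>", "NNG", "JKB"): 0.8, ("<s>", "NNG", "NNG"): 0.2}
print(viterbi_trigram(["hakgyo", "e"], tags, trans, emit))
```

Keeping whole paths in each state is simpler to read than backpointers, at the cost of extra copying on long sentences.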

20.
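Traditional near-duplicate text image retrieval methods require the transformation types between near-duplicate images to be determined manually in advance, which makes them susceptible to human subjectivity. To address this, a three-branch Siamese network for near-duplicate text image retrieval is proposed that automatically learns the various transformations between images. The network takes triplets as input: a query image, a near-duplicate of the query image, and a non-near-duplicate image. Training uses a triplet loss so that the distance between the query image and its near-duplicate is smaller than the distance between the query image and the non-near-duplicate. The proposed method achieves mAP (mean average precision) of 98.76% and 96.50% on two datasets, outperforming existing methods.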
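The training objective described above is the standard triplet loss, L = max(0, d(q, p) − d(q, n) + m), for query q, near-duplicate p, non-near-duplicate n, and margin m. The sketch evaluates it on embedding vectors that are assumed to come from the network's three branches, which are not modeled here. Euclidean distance and the margin value 0.2 are assumptions; some implementations use squared distances instead.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(query, near_dup, non_dup, margin=0.2):
    """max(0, d(q, p) - d(q, n) + margin): zero once the near-duplicate is closer to
    the query than the non-near-duplicate by at least `margin`."""
    return max(0.0, euclidean(query, near_dup) - euclidean(query, non_dup) + margin)

def batch_loss(triplets, margin=0.2):
    """Mean triplet loss over (query, near_dup, non_dup) embedding triples."""
    return sum(triplet_loss(q, p, n, margin) for q, p, n in triplets) / len(triplets)

easy = ([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])   # already well separated -> 0 loss
hard = ([0.0, 0.0], [0.1, 0.0], [0.15, 0.0])  # non-duplicate too close -> positive loss
print(triplet_loss(*easy), triplet_loss(*hard), batch_loss([easy, hard]))
```

Only triplets that violate the margin produce a gradient, which is why training pipelines often focus on such "hard" triplets.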

