Similar literature
1.
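Because spatial databases typically hold massive amounts of data, even an ordinary spatial query can return too many results. To address this, an automatic categorization method for spatial query results is proposed. In the offline phase, the coupling relationship between spatial objects is evaluated from their location proximity and semantic relevance, and the objects are then clustered using a probability density estimation method, with each cluster representing one type of user need. In the online query-processing phase, for a given spatial query, an improved C4.5 decision tree algorithm dynamically builds a categorization tree over the result set, and users can locate the spatial objects of interest step by step by checking the labels on the tree's branches. Experimental results show that the proposed clustering method effectively captures the semantic and positional closeness of spatial objects, and that the categorization method achieves good categorization quality at low search cost.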
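The abstract does not describe how its C4.5 variant is modified, so the following is only a minimal sketch of the standard C4.5 split criterion (gain ratio), applied to a toy result set. The attribute names (`category`, `district`), the `cluster` label key, and the helper names are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, label_key):
    """Standard C4.5 gain ratio for splitting `rows` (a list of dicts) on `attribute`."""
    labels = [row[label_key] for row in rows]
    # Group the class labels by the value of the candidate attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[label_key])
    total = len(rows)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


def best_split(rows, attributes, label_key):
    """Pick the attribute with the highest gain ratio as the next tree level."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, label_key))


# Hypothetical query results; "cluster" stands in for the offline cluster label.
results = [
    {"category": "restaurant", "district": "north", "cluster": "dining"},
    {"category": "restaurant", "district": "south", "cluster": "dining"},
    {"category": "hotel", "district": "north", "cluster": "lodging"},
    {"category": "hotel", "district": "south", "cluster": "lodging"},
]
print(best_split(results, ["category", "district"], "cluster"))
```

In this toy data `category` separates the two clusters perfectly while `district` carries no information, so the tree would branch on `category` first and use its values as branch labels.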

2.
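Most current search-result clustering algorithms cluster the web page snippets generated for a user query; because snippets are short and of uneven quality, clustering quality is hard to guarantee. A search-result clustering algorithm based on frequent word-sense sequences is proposed, which uses WordNet together with syntactic and semantic features to build clusters and labels for search results. Unlike traditional clustering algorithms based on the vector space model, it takes into account the sequential patterns of words in documents. The algorithm first preprocesses the text and generates compact documents to reduce the dimensionality of the text data, builds a generalized suffix tree, mines the maximal frequent itemsets, and then derives frequent word-sense sequences. The ordered frequent itemsets obtained from documents better reflect document topics, so search results on the same topic are clustered together and those most relevant to the user query are ranked first. Experiments show that the algorithm yields high-quality, query-relevant clusters and semantics-based cluster labels, with higher clustering accuracy and running efficiency, and good scalability.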

3.
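Relationships between document links and the category structure of Chinese Wikipedia are established, and topic classification of Web search is implemented on Wikipedia linked data. Semantic features are expanded for the query terms to capture and express the query topic; relevance is computed over the Wikipedia category space according to the semantic relations between categories; and the classification label is identified by a weighted sum of the relevance scores between each category label and each class. Experimental results and analysis are then presented. The results indicate that the approach is of significance and practical value for improving the quality of Web information retrieval.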

4.
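Query expansion adds semantically related new terms to the original user query to form a semantically more precise query. This paper improves the expansion-term weighting in query expansion algorithms and proposes a method that weights expansion terms according to the user's initial query intent and the semantic relatedness between terms. The weight obtained for an expansion term reflects not only its relatedness to the original keyword but also its relatedness to all elements of the query keyword set. The semantic-tree-based query expansion problem can therefore be reduced to computing the expansion-term weight wijs,o,p, and how to compute this weight is the core of the paper. Experiments show that the algorithm improves retrieval precision.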
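The abstract does not give the weight formula, so the sketch below is not the paper's method. It only illustrates the stated idea that an expansion term's weight should reflect its relatedness to every term in the query, here by a plain average. The `relatedness` table, its scores, and both function names are hypothetical.

```python
def expansion_weight(candidate, query_terms, relatedness):
    """Average relatedness of `candidate` to every term in the query.

    `relatedness` maps an unordered term pair (a frozenset) to a score in [0, 1];
    missing pairs count as 0. This averaging rule is an assumption for illustration.
    """
    scores = [relatedness.get(frozenset((candidate, q)), 0.0) for q in query_terms]
    return sum(scores) / len(scores) if scores else 0.0


def rank_expansions(candidates, query_terms, relatedness, top_n=2):
    """Return the `top_n` candidates with the highest weights, best first."""
    weighted = [(c, expansion_weight(c, query_terms, relatedness)) for c in candidates]
    weighted.sort(key=lambda pair: pair[1], reverse=True)
    return weighted[:top_n]


# Hypothetical relatedness scores between terms.
relatedness = {
    frozenset(("iphone", "apple")): 0.9,
    frozenset(("iphone", "phone")): 0.8,
    frozenset(("fruit", "apple")): 0.9,
}
ranked = rank_expansions(["fruit", "iphone"], ["apple", "phone"], relatedness)
print(ranked)
```

A candidate such as "fruit" is strongly related to one query term only, so weighting against the whole query set ranks it below "iphone", which relates to both terms.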

5.
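Query expansion adds semantically related new terms to the original user query to form a semantically more precise query. This paper improves the expansion-term weighting in query expansion algorithms and proposes a method that weights expansion terms according to the user's initial query intent and the semantic relatedness between terms. The weight obtained for an expansion term reflects not only its relatedness to the original keyword but also its relatedness to all elements of the query keyword set. The semantic-tree-based query expansion problem can therefore be reduced to computing the expansion-term weight wijs,o,p, and how to compute this weight is the core of the paper. Experiments show that the algorithm improves retrieval precision.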

6.
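To solve the too-many-results problem of Web databases, an automatic categorization method for Web database query results based on an improved decision tree algorithm is proposed. In the offline phase, the method analyzes the query history of all users in the system and aggregates semantically similar queries; based on the aggregated queries, the original data are partitioned into tuple clusters, each corresponding to one type of user preference. When a query arrives, an improved decision tree algorithm uses the tuple clusters from the offline phase to automatically build a labeled hierarchical categorization tree over the query result set, so that users can quickly select and locate the information they need by checking the labels. Experimental results show that the proposed method has low search cost and good categorization quality, and effectively meets the personalized query needs of different types of users.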

7.
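A query expansion algorithm based on latent semantic analysis   Cited by: 5 (self-citations: 0, citations by others: 5)
This paper proposes a new query expansion algorithm. By performing latent semantic analysis on the text, it introduces a method for computing the semantic similarity between words and applies text clustering to the interactive retrieval process in order to improve the quality of information retrieval. Experimental results show that the algorithm is effective at improving retrieval accuracy.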

8.
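To resolve the mixing of topics in search engine results and help users quickly and accurately locate valuable information, a search-result clustering method based on topic phrases is proposed. Query terms are first extracted from the results and combined with adjacent words to form topic phrases, and a hybrid vector space model is built that contains both high-frequency individual words and topic phrases. The thesaurus Tongyici Cilin (同义词词林) is introduced to semantically expand the feature terms. Finally, an improved k-means clustering algorithm clusters the search results, and a label is extracted for each cluster. Experimental results show that the algorithm effectively improves the accuracy of the clustering results.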

9.
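The performance of neural networks in deep learning depends on high-quality samples, and noisy labels reduce classification accuracy. Noisy-label learning algorithms have been proposed to reduce this effect: they first split the training set into a clean sample set and a noisy sample set, then use semi-supervised learning to assign pseudo-labels to the noisy set. However, wrong pseudo-labels and an insufficient number of training samples still limit their performance. To address these problems, a noisy-label deep learning algorithm based on K-means clustering and feature space augmentation is proposed. First, K-means clusters the clean sample set by label, and hard-to-classify noisy samples are filtered out according to the distance between the noisy samples and the cluster centers, improving the quality of the training samples. Second, the mixup algorithm is used to augment the clean and noisy sample sets, increasing the number of training samples. Finally, a feature space augmentation algorithm suppresses the noisy samples newly generated by mixup, improving classification accuracy. Experiments on four datasets, CIFAR10, CIFAR100, MNIST, and ANIMAL-10, verify the effectiveness of the algorithm.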
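The mixup step named in this abstract can be sketched as follows. This shows only the generic mixup operation (a convex combination of two samples and their one-hot labels, with the mixing coefficient drawn from a Beta distribution); the `alpha` value and the function name are assumptions, and the paper's clean/noisy split and feature space augmentation are not reproduced here.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels with a Beta(alpha, alpha) coefficient.

    Returns the mixed feature vector, the mixed (soft) label, and the coefficient.
    `alpha=0.2` is an illustrative default, not a value taken from the abstract.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed_x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    mixed_y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y, lam


# Two toy samples from different classes, with one-hot labels.
rng = random.Random(0)
mixed_x, mixed_y, lam = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
print(mixed_x, mixed_y, lam)
```

Because the mixed label is a soft label that still sums to 1, a mixed sample built from a mislabeled input inherits part of that wrong label, which is the kind of newly generated noise the abstract says the final step is meant to suppress.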

10.
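A Web document clustering algorithm based on association rules   Cited by: 32 (self-citations: 1, citations by others: 32)
宋擒豹 (Song Qinbao), 沈钧毅 (Shen Junyi). 《软件学报》 (Journal of Software), 2002, 13(3): 417-423
Web document clustering can effectively shrink the search space, speed up retrieval, and improve query precision. A clustering algorithm for Web documents is proposed. The algorithm first represents topics with the vector space model (VSM) and represents documents by their topics. Treating documents as transactions and topics as transaction items, it then casts the document-topic relationship in transactional form and applies association rule mining to discover frequent topic sets; the corresponding document sets form the initial document clusters. Finally, clusters are merged and split according to thresholds on inter-cluster distance and intra-cluster connection strength, yielding the final clustering. Experimental results show that the algorithm is effective, handles the inherent overlap between document clusters, and has some practical value.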
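The "documents as transactions, topics as items" step can be illustrated with a small sketch. It enumerates topic sets by brute force rather than with a real association rule miner such as Apriori, and it omits the merge and split stage; the document ids, topic names, and `min_support` value are hypothetical.

```python
from itertools import combinations


def frequent_topic_sets(doc_topics, min_support, max_size=3):
    """Map each frequent topic set to the ids of the documents that contain it.

    `doc_topics` maps a document id to its set of topics (the "transaction").
    Brute-force enumeration is used here for clarity; it is only suitable for
    small topic vocabularies.
    """
    all_topics = sorted(set().union(*doc_topics.values()))
    clusters = {}
    for size in range(1, max_size + 1):
        for combo in combinations(all_topics, size):
            docs = {d for d, topics in doc_topics.items() if set(combo) <= topics}
            if len(docs) >= min_support:
                clusters[frozenset(combo)] = docs
    return clusters


# Hypothetical documents, each already reduced to its set of topics.
doc_topics = {
    "d1": {"db", "mining"},
    "d2": {"db", "mining", "web"},
    "d3": {"web", "search"},
    "d4": {"web", "search", "mining"},
}
clusters = frequent_topic_sets(doc_topics, min_support=2)
for topics, docs in clusters.items():
    print(sorted(topics), sorted(docs))
```

Document `d2` supports both `{db, mining}` and `{mining, web}`, so the initial clusters overlap, which matches the abstract's remark that the approach can handle overlap between document clusters.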

11.
Automatic image tagging assigns semantic keywords, called tags, to images, which significantly facilitates image search and organization. Most existing image tagging approaches are constrained by the model learned from the training dataset, and they do not exploit other types of web resources (e.g., web text documents). In this paper, we propose a search-based image tagging algorithm (CTSTag), in which the result tags are derived from web search results. Specifically, it assigns the query image a more comprehensive tag set derived from both web images and web text documents. First, content-based image search is used to retrieve a set of visually similar images, which are ranked by their semantic consistency values. A set of relevant tags is then derived from the top-ranked images as the initial tag set. Second, a text-based search retrieves other relevant web resources by using the initial tag set as the query. After a denoising process, the initial tag set is expanded with other tags mined from the text-based search results. A probability flow measure is then proposed to estimate the probabilities of the expanded tags. Finally, all the tags are refined using the Random Walk with Restart (RWR) method, and the top ones are assigned to the query image. Experiments on the NUS-WIDE dataset show not only the performance of the proposed algorithm but also the advantage of image retrieval and organization based on the resulting tags.
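The final refinement step uses Random Walk with Restart. The sketch below shows the generic RWR iteration on a small weighted tag graph; the graph, the restart distribution, the `restart_prob` value, and the function name are all assumptions, since the abstract does not say how CTSTag builds its graph.

```python
def random_walk_with_restart(adjacency, restart, restart_prob=0.15, iterations=100):
    """Iterate scores = (1 - c) * (normalized walk step) + c * restart.

    `adjacency` maps each node to {neighbor: weight}; every neighbor must also be
    a key of `adjacency`. `restart` maps nodes to restart probabilities summing to 1.
    Nodes without outgoing edges simply pass on no score in this sketch.
    """
    nodes = list(adjacency)
    scores = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: restart_prob * restart.get(n, 0.0) for n in nodes}
        for src in nodes:
            out_total = sum(adjacency[src].values())
            if out_total == 0:
                continue
            for dst, weight in adjacency[src].items():
                nxt[dst] += (1 - restart_prob) * scores[src] * weight / out_total
        scores = nxt
    return scores


# Hypothetical tag graph: edge weights could come from tag co-occurrence.
tag_graph = {
    "sky": {"cloud": 1.0},
    "cloud": {"sky": 1.0, "sunset": 1.0},
    "sunset": {"cloud": 1.0},
    "car": {},
}
scores = random_walk_with_restart(tag_graph, restart={"sky": 1.0})
print(sorted(scores, key=scores.get, reverse=True))
```

Tags that are well connected to the restart tags accumulate score, while an unconnected tag such as `car` stays at zero and would be dropped when only the top-ranked tags are kept.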

12.
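Inaccurate and semantically ambiguous user tags lead to low retrieval accuracy for collaboratively tagged images, and existing spam-tag filtering methods tend to focus on the tags themselves while ignoring the association between collaborative tags and the images. Based on an analysis of the association between the visual content of collaboratively tagged images and their tags, this paper proposes a spam-tag detection method based on that visual content. The method analyzes the visual content of images sharing the same tag, designs different kernel functions for the color and SIFT (scale-invariant feature transform) feature subsets, and maps the two low-dimensional features into a high-dimensional multimodal feature space to form a hybrid kernel. Images under the same tag are clustered by max-min distance clustering based on the hybrid kernel; a tag on images that fall into minority clusters has little association with the image content and is treated as a user tagging error, which identifies spam tags. Experimental results show that the method improves the accuracy of spam-tag detection for collaboratively tagged images.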

13.
Learning Social Tag Relevance by Neighbor Voting   Cited by: 2 (self-citations: 0, citations by others: 2)
Social image analysis and retrieval is important for helping people organize and access the increasing amount of user-tagged multimedia. Since user tagging is known to be uncontrolled, ambiguous, and overly personalized, a fundamental problem is how to interpret the relevance of a user-contributed tag with respect to the visual content the tag is describing. Intuitively, if different persons label visually similar images using the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose in this paper a neighbor voting algorithm which accurately and efficiently learns tag relevance by accumulating votes from visual neighbors. Under a set of well-defined and realistic assumptions, we prove that our algorithm is a good tag relevance measurement for both image ranking and tag ranking. Three experiments on 3.5 million Flickr photos demonstrate the general applicability of our algorithm in both social image retrieval and image tag suggestion. Our tag relevance learning algorithm substantially improves upon baselines for all the experiments. The results suggest that the proposed algorithm is promising for real-world applications.
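The neighbor-voting idea can be sketched in a few lines: count how many of an image's k nearest visual neighbors carry the tag. Subtracting the number of votes expected by chance from the tag's overall frequency is an assumption added here, since the abstract only mentions accumulating votes. The feature vectors, tags, and function name are hypothetical toy data.

```python
def tag_relevance(image_id, tag, features, tags, k=2):
    """Votes for `tag` among the k nearest visual neighbors, minus the chance level.

    `features` maps image ids to feature vectors and `tags` maps image ids to tag
    sets. The prior k * (images with tag) / (all images) is an assumed correction
    for frequent tags, not something stated in the abstract.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    others = [i for i in features if i != image_id]
    neighbors = sorted(
        others, key=lambda i: squared_distance(features[image_id], features[i])
    )[:k]
    votes = sum(1 for i in neighbors if tag in tags[i])
    prior = k * sum(1 for i in features if tag in tags[i]) / len(features)
    return votes - prior


# Hypothetical 2-D visual features and user tags.
features = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (0.0, 0.1), "d": (5.0, 5.0), "e": (5.1, 5.0)}
tags = {"a": {"beach", "me"}, "b": {"beach"}, "c": {"beach"}, "d": {"city", "me"}, "e": {"city"}}
print(tag_relevance("a", "beach", features, tags), tag_relevance("a", "me", features, tags))
```

For image `a`, the tag "beach" is confirmed by both visual neighbors, while the personal tag "me" receives no votes and scores below zero, so it would rank last among that image's tags.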

14.
Folksonomy, considered a core component of the Web 2.0 user-participation architecture, is a classification system built from users' tags on web resources. Recently, various image retrieval approaches that exploit folksonomy have been proposed to improve image search results. However, characteristics of the tags such as semantic ambiguity and lack of control limit their effectiveness for image retrieval. In particular, tags attached to images in random order provide no information about the relevance between a tag and an image. In this paper, we propose a novel image tag ranking system called i-TagRanker, which exploits the semantic relationships between tags to reorder the tags according to their relevance to an image. The proposed system consists of two phases: 1) a tag propagation phase and 2) a tag ranking phase. In the tag propagation phase, we first collect the most relevant tags from similar images and then propagate them to an untagged image. In the tag ranking phase, tags are ranked according to their semantic relevance to the image. Experimental results on a Flickr photo collection of over 30,000 images show the effectiveness of the proposed system.

15.
Mining multi-tag association for image tagging   Cited by: 1 (self-citations: 0, citations by others: 1)
Automatic media tagging plays a critical role in modern tag-based media retrieval systems. Existing tagging schemes mostly perform tag assignment based on community-contributed media resources, where the tags are provided by users interactively. However, such social resources usually contain dirty and incomplete tags, which severely limit the performance of these tagging methods. In this paper, we propose a novel automatic image tagging method that aims to automatically discover more complete tags, together with their information importance, for test images. Given an image dataset, all the near-duplicate clusters are discovered. For each near-duplicate cluster, all the tags occurring in the cluster form the cluster's "document". Given a test image, we first initialize the candidate tag set from its near-duplicate cluster's document. The candidate tag set is then expanded by considering the implicit multi-tag associations mined from all the clusters' documents, where each cluster's document is regarded as a transaction. To further reduce noisy tags, a visual relevance score between each candidate tag and the test image is computed based on a new tag model. Tags with very low scores can be removed from the final tag set. Extensive experiments conducted on a real-world web image dataset, NUS-WIDE, demonstrate the promising effectiveness of our approach.

16.
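Objective: With advances in Web 2.0 technology, social websites centered on user-generated content are flourishing, making tag-based image retrieval increasingly important. However, because user tagging is arbitrary and personalized, the tags users submit are incomplete, which lowers retrieval accuracy. Method: To address this, a regularized non-negative matrix factorization method is proposed to enrich incomplete image tags and improve their completeness. Non-negative matrix factorization projects the original tag-image matrix into a latent low-rank space to remove noise, and the intra-class visual dispersion of the images serves as a regularization term to improve denoising and tag enrichment. Results: Comparative experiments on a large number of social images downloaded from Flickr verify the effectiveness of the method for enriching image tags. Compared with currently popular optimization algorithms, the method achieves a clear performance gain, improving average accuracy by 12.3%. Conclusion: Non-negative matrix factorization with intra-class visual dispersion as a regularization term enriches the tags of social images well and addresses the incompleteness of web image tags.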

17.
The vast number of images available on the Web calls for an effective and efficient search service to help users find relevant images. The prevalent approach is to provide a keyword interface for users to submit queries. However, the number of images without any tags or annotations is beyond the reach of manual effort. To overcome this, automatic image annotation techniques have emerged, which generally select a suitable set of tags for a given image without user intervention. However, there are three main challenges in Web-scale image annotation: scalability, noise resistance, and diversity. Scalability has a twofold meaning: first, an automatic image annotation system should scale to billions of images on the Web; second, it should be able to automatically identify several relevant tags among a huge tag set for a given image within seconds or even faster. Noise resistance means that the system should be robust against typos and ambiguous terms used in tags. Diversity means that image content may include both scenes and objects, which are further described by multiple image features constituting different facets of the annotation. In this paper, we propose a unified framework to tackle these three challenges for automatic Web image annotation. It involves two main components: tag candidate retrieval and multi-facet annotation. In the former, content-based indexing and a concept-based codebook are leveraged to address scalability and noise resistance. In the latter, a joint feature map is designed to describe the different facets of tags in annotations and the relations between these facets. A tag graph represents the tags in the entire annotation, and structured learning is employed to construct a learning model on top of the tag graph based on the generated joint feature map. Millions of images from Flickr are used in our evaluation. Experimental results show a 33% performance improvement over single-facet approaches in terms of three metrics: precision, recall, and F1 score.

18.
19.
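Because users annotate semantics arbitrarily, the large volume of uploaded web images has incomplete tags, which greatly reduces image retrieval efficiency. Low-rank sparse decomposition is an effective way to reduce data noise. To improve the accuracy of image semantic tag completion, an image tag completion method based on low-rank sparse decomposition optimization (LRSDO) is proposed. First, the neighbor image set of the image to be completed is retrieved using both its visual features and its semantics. Next, a low-rank sparse decomposition model yields the mapping between visual features and semantics, which is used to predict candidate tags for the image. Finally, an individual-oriented tag co-occurrence frequency method denoises and refines the candidate tags, giving more accurate automatic tag completion. Experiments on the benchmark dataset Corel5K and the real-world dataset Flickr30Concepts show that the method performs better in average precision, average recall, and coverage.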

20.
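王振海 (Wang Zhenhai). 《计算机工程与应用》 (Computer Engineering and Applications), 2012, 48(36): 190-193, 220
Exploiting the shape features of trademark images, a trademark retrieval algorithm is proposed that fuses global and local image features. The global features capture the overall information of an image and can be used to quickly build a candidate image set, while the local features allow more accurate matching against the candidates. Fourier descriptors of the image are extracted for a preliminary retrieval ranked by similarity, and the candidates in that result set are then matched precisely using SIFT features. Experimental results show that the method retains the strong descriptive power of SIFT features while reducing the number of computations needed for precise matching, lowering complexity.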
