Similar literature
 Found 20 similar documents; search took 31 ms
1.
Retrieving images from an image database by their objects and the spatial relationships among them has emerged as an important research subject in recent decades. To retrieve images similar to a given query image, retrieval methods must assess the degree of similarity between a database image and the query image from the extracted features with acceptable efficiency and effectiveness. This paper proposes a graph-based model, SRG (spatial relation graph), to represent the semantic information of the objects contained in an image and their spatial relationships, with no file annotation. In an SRG, the image objects are symbolized by predefined class names as vertices, and the spatial relations between object pairs are represented as arcs. The proposed model assesses the degree of similarity between two images by calculating the maximum common subgraph of the two corresponding SRGs through intersection, which has quadratic time complexity owing to the characteristics of SRG. Its efficiency remains quadratic regardless of the duplication rate of the object symbols. An extended model, SRGT, is also proposed, with the same time complexity, for applications that need to consider the topological relations among objects. A synthetic symbolic image database and an existing image dataset are used in the experiments to verify the performance of the proposed models. The experimental results show that the proposed models have comparable retrieval quality with remarkable efficiency improvements compared with three well-known methods, LCS_Clique, SIMR, and 2D Be-string, where LCS_Clique uses the number of objects in the maximum common subimage as its similarity function, SIMR uses an accumulation-based similarity function of similar object pairs, and 2D Be-string calculates the similarity of 2D patterns by a linear combination of two 1D similarities.

2.

Similar item recommendations, a common feature of many Web sites, point users to other interesting objects given a currently inspected item. A common way of computing such recommendations is to use a similarity function, which expresses how much alike two given objects are. Such similarity functions are usually designed based on the specifics of the given application domain. In this work, we explore how such functions can be learned from human judgments of similarities between objects, using two domains of “quality and taste”, cooking recipe and movie recommendation, as guiding scenarios. In our approach, we first collect a few thousand pairwise similarity assessments with the help of crowdworkers. Using these data, we then train different machine learning models that can be used as similarity functions to compare objects. Offline analyses reveal for both application domains that models that combine different types of item characteristics are the best predictors for human-perceived similarity. To further validate the usefulness of the learned models, we conducted additional user studies. In these studies, we exposed participants to similar item recommendations using a set of models that were trained with different feature subsets. The results showed that the combined models that exhibited the best offline prediction performance led not only to the highest user-perceived similarity but also to recommendations that were considered useful by the participants, thus confirming the feasibility of our approach.


3.
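To resolve ontology heterogeneity and enable interoperability and data integration between applications built on different ontologies, an improved similarity-propagation matching algorithm based on RDF graphs is proposed. First, initial seed similar pairs are discovered using WordNet, and the ontologies are preprocessed into RDF triples. Exploiting the characteristics of RDF graphs, the conditions for similarity propagation are extended to triples in order to discover candidate similar pairs; similarity is then computed using a method that combines element features. Similarity propagation, discovery of candidate similar-pair seeds, and similarity computation form an iterative loop that runs until a convergence condition is met. Experiments demonstrate the effectiveness of the algorithm and show an improvement in running time.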

4.
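Graphs are widely used in many scientific and engineering fields (such as pattern recognition and computer vision) and have strong expressive power for information. When graphs are used to represent the structure of objects, measuring the similarity of objects turns into computing the similarity of two graphs; this is graph matching. Over recent decades, research on graph matching techniques and algorithms has become an important topic, and with the arrival of the big-data era, graphs, as a way of representing relationships among data, will receive more and more attention. This paper surveys the state of the art of graph matching, describes its theoretical foundations in detail, and reviews several mainstream approaches to solving the graph matching problem. Finally, the performance of several algorithms is compared and analyzed using a concrete application of graph matching.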

5.
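To address the inherent difficulty of expressing similarity in medical image retrieval, as well as the influence of noise, a method is proposed that performs diffusion on a tensor product graph and uses the contextual information of other data points to improve a texton-based pairwise similarity measure. First, a statistical texton method is used to describe and extract medical image features, and pairwise image similarity is obtained by weighting texton similarities. Then the tensor product graph is used to propagate similarity along the intrinsic manifold of the data points, yielding a global similarity measure. Experiments on ImageCLEFmed 2009 show that the algorithm improves class-average precision by 32% over a Gabor-based retrieval algorithm and by 19% over a retrieval algorithm based on the scale-invariant feature transform (SIFT), and that it is well suited to medical image retrieval.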

6.
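Research on a personalized recommendation system for e-commerce websites based on similar-pattern clustering   Total citations: 5, self-citations: 0, citations by others: 5
An important factor in ensuring that a personalized recommendation system produces high-quality recommendations is that the system must determine how similar visitors are in their browsing behavior, so that it can predict their browsing and purchasing interests. The key technique for this is computing the similarity distance between visitor objects over the whole attribute space or part of it, which yields the degree of similarity of browsing behavior. The paper first analyzes the distance functions commonly used in recommendation systems to compute the similarity of browsing behavior and finds that they measure visitor objects as an average over the entire tested attribute space, so similar patterns in subspaces of the attribute set are not mined effectively. It then proposes a new personalized e-commerce recommendation system based on a similar-pattern clustering algorithm. The system draws on the available data sources (such as website content, the site's hyperlink structure, customers' browsing behavior, actual purchase records, and customer identity data) to obtain the sequences of pages that users visit on the e-commerce site, builds a matrix model of the behavior of high-volume purchasers, and efficiently finds similar browsing behavior of visitor objects over the whole attribute space or part of it. By mining the similar-pattern features shared by potential purchasers and high-volume purchasers, it helps customers find information about the products they want to buy, with the aim of increasing actual purchases. Experimental data show that the system is efficient and widely applicable.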

7.
彭昂  王如龙  陈泉泉  张锦 《计算机应用》2010,30(7):1930-1932
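To address the problem of effectively segmenting telecom customers, a clustering algorithm for complex attributes is proposed, based on the idea of attribute similarity measurement. The algorithm measures the similarity of objects with a complex-attribute distribution similarity function, builds a graph model from the similarities, and finally partitions the graph to obtain clusters. Compared with traditional clustering algorithms based on dimension selection and dimension reduction, the proposed algorithm handles high-dimensional data and complex attributes effectively. In addition, parameter tuning does not require traversing the original data, which also reduces manual intervention. Simulation experiments on real telecom customer data show that the proposed algorithm performs well and can effectively solve the telecom customer segmentation problem.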

8.
MatchSim: a novel similarity measure based on maximum neighborhood matching   Total citations: 1, self-citations: 1, citations by others: 0
Measuring object similarity in a graph is a fundamental data mining problem in various application domains, including Web linkage mining, social network analysis, information retrieval, and recommender systems. In this paper, we focus on the neighbor-based approach, which rests on the intuition that "similar objects have similar neighbors", and propose a novel similarity measure called MatchSim. Our method recursively defines the similarity between two objects by the average similarity of the maximum-matched similar neighbor pairs between them. We show that MatchSim conforms to the basic intuition of similarity; therefore, it can overcome the counterintuitive contradiction in SimRank. Moreover, MatchSim can be viewed as an extension of the traditional neighbor-counting scheme by taking the similarities between neighbors into account, leading to higher flexibility. We present the MatchSim score computation process and prove its convergence. We also analyze its time and space complexity and suggest two accelerating techniques: (1) proposing a simple pruning strategy and (2) adopting an approximation algorithm for maximum matching computation. Experimental results on real-world datasets show that although our method is less efficient computationally, it outperforms classic methods in terms of accuracy.
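
The recursive definition in this abstract can be illustrated with a small sketch. It is only an illustration under stated assumptions: the normalization by the larger neighbor-set size, the identity initialization, the brute-force matching, and the names `matchsim` and `best_matching_weight` are hypothetical choices made here, not details confirmed by the abstract.

```python
from itertools import permutations, product


def best_matching_weight(xs, ys, sim):
    """Weight of the best one-to-one matching between two neighbor lists.

    Brute force over permutations, so only suitable for tiny examples.
    Relies on sim being symmetric, which holds for the update below.
    """
    if len(xs) > len(ys):
        xs, ys = ys, xs  # always match the shorter list into the longer one
    return max(
        sum(sim[(x, y)] for x, y in zip(xs, perm))
        for perm in permutations(ys, len(xs))
    )


def matchsim(neighbors, iterations=5):
    """Iterate a MatchSim-style score (assumed form, see lead-in).

    neighbors maps every node to the list of its neighbors.
    """
    nodes = list(neighbors)
    # Assumed starting point: a node is similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            na, nb = neighbors[a], neighbors[b]
            if not na or not nb:
                # No neighbors to match: keep the identity score.
                new_sim[(a, b)] = 1.0 if a == b else 0.0
                continue
            # Average similarity of the best-matched neighbor pairs.
            new_sim[(a, b)] = best_matching_weight(na, nb, sim) / max(len(na), len(nb))
        sim = new_sim
    return sim
```

With identity scores on the neighbors, this reduces to counting common neighbors and dividing by the larger neighbor-set size, which is one way to read the abstract's remark that the measure extends the neighbor-counting scheme.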

9.
Similarity-based clustering is a simple but powerful technique which usually results in a clustering graph for a partitioning of threshold values in the unit interval. The guiding principle of similarity-based clustering is "similar objects are grouped in the same cluster." To judge whether two objects are similar, a similarity measure must be given in advance. The similarity measure presented in the paper is determined in terms of the weighted distance between the features of the objects. Thus, the clustering graph and its performance (which is described by several evaluation indices defined in the paper) will depend on the feature weights. The paper shows that, by using a gradient descent technique to learn the feature weights, the clustering performance can be significantly improved. It is also shown that our method helps to reduce the uncertainty (fuzziness and nonspecificity) of the similarity matrix. This enhances the quality of similarity-based decision making.

10.
A multifractal-based cluster hierarchy optimization algorithm   Total citations: 2, self-citations: 0, citations by others: 2
闫光辉  李战怀  党建武 《软件学报》2008,19(6):1283-1300
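Similarities of varying strength exist among large numbers of initial clustering results, which makes it harder for users to understand and describe the clustering results and hinders subsequent data mining work. Traditional clustering algorithms focus on cluster shape and spatial adjacency, or consider the uniformity of the global data distribution density, and in practice struggle to solve this kind of problem. To this end, a fractal-based cluster hierarchy optimization algorithm, FCHO (fractal-based cluster hierarchy optimization), is proposed. Based on multifractal theory, FCHO uses the multifractal dimension of each cluster and the degree to which the multifractal dimension changes after clusters are merged to measure the similarity between initial clusters, and finally generates a cluster family tree that reflects the natural aggregation of the data. In addition, the time and space complexity of the algorithm is analyzed in a preliminary way, and experiments on synthetic and standard datasets confirm the algorithm's effectiveness.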

11.
张应龙  李翠平  陈红 《软件学报》2014,25(11):2602-2615
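Information networks are ubiquitous. By abstracting the objects in a network as vertices and the relationships between objects as edges, an information network can be represented as a graph. Computing the similarity of nodes in a graph is a basic problem in graph data management and is used in many fields, such as social network analysis, information retrieval, and recommender systems. Well-known similarity measures include Personalized PageRank and SimRank. Both measures are essentially defined in terms of paths in the graph, but the paths they emphasize are quite different. This paper therefore proposes a measure, SuperSimRank, which not only covers these paths but also takes into account paths that neither Personalized PageRank nor SimRank considers, and so better reflects the nature of the link relationship. SuperSimRank is then analyzed theoretically and a corresponding optimized algorithm is proposed, improving the computational cost from O(kn^4) in the worst case to O(knl), where k is the number of iterations, n is the number of nodes, and l is the number of edges. Finally, experiments verify that SuperSimRank outperforms SimRank and Personalized PageRank and that the optimized algorithm is effective in a variety of settings.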
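
As background for this abstract, SimRank (one of the two baseline measures it names) can be sketched in its naive iterative form. This is not SuperSimRank and not the paper's optimized algorithm; the function name `simrank`, the decay factor `c`, and the dictionary layout are assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, c=0.8, iterations=5):
    """Naive iterative SimRank (illustrative sketch).

    in_neighbors maps every node to the list of its in-neighbors.
    c is the decay factor; 0.8 is a common choice, assumed here.
    """
    nodes = list(in_neighbors)
    # Start from the identity: each node is similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar to no other node.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, damped by c.
            total = sum(sim[(u, v)] for u in ia for v in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim
```

Each iteration visits all n^2 node pairs and, for each pair, all pairs of in-neighbors, so with k iterations and in-degrees close to n this naive form costs on the order of k·n^4. That is the same order as the worst case quoted in the abstract, though the abstract's figure refers to SuperSimRank itself.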

12.
In several applications, data objects move on pre-defined spatial networks such as road segments, railways, and invisible air routes. Many of these objects exhibit similarity with respect to their traversed paths, and therefore two objects can be correlated based on their motion similarity. Useful information can be retrieved from these correlations and this knowledge can be used to define similarity classes. In this paper, we study similarity search for moving object trajectories in spatial networks. The problem poses some important challenges, since it is quite different from the case where objects are allowed to move freely in any direction without motion restrictions. New similarity measures should be employed to express similarity between two trajectories that do not necessarily share any common sub-path. We define new similarity measures based on spatial and temporal characteristics of trajectories, such that the notion of similarity in space and time is well expressed, and moreover they satisfy the metric properties. In addition, we demonstrate that similarity range queries in trajectories are efficiently supported by utilizing metric-based access methods, such as M-trees.

13.
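Research and implementation of a fast clustering method for Web search results   Total citations: 2, self-citations: 0, citations by others: 2
To help Web users pick out the documents they need from the large number of snippets returned by a search engine, a fast clustering method for Web search results is presented, based on a study of the clustering process. The method analyzes and simplifies each stage of clustering, from building the index model and computing similarity to forming the clustering result. It computes the similarity between returned results from the information in three parts of each result: the title, the URL, and the snippet. The first portion of results returned is clustered using an undirected-graph mapping method, and the remaining results are then assigned to their closest clusters to form the final clustering. The method is simple to implement. Experiments show that it responds quickly, produces clusters of fairly high relevance, and uses little space.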

14.
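This paper proposes a method for computing glyph similarity, aimed at recognizing and finding Chinese characters with similar shapes (called similar-shaped characters). First, a method for decomposing Chinese characters is proposed and a knowledge graph of radicals and components is built. Then, based on the graph and the structural characteristics of Chinese characters, the 2CTransE model is proposed to learn representations of the semantic information of character entities. Finally, the resulting entity vectors are used to compute glyph similarity, producing a candidate set of similar-shaped characters for a target character. Experimental results show that the proposed method is effective to some degree for computing the glyph similarity of characters with different structures, and the resulting library of character components provides a practical dataset for later research on glyph computation. The work also broadens research ideas for computing glyph similarity in Chinese-like scripts such as Japanese.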

15.
Although there has been a large body of work on computing the similarity of static shapes, similarity judgments on deforming meshes have not been studied well. In this study, we investigate a similarity measurement method for comparing two deforming meshes. Based on the degree of deformation, we first assign each triangle within each frame a binary label, either ‘deformed’ or ‘rigid’, then merge the ‘deformed’ triangles in both the spatial and temporal domains for the segmentation. The segmentation results are encoded in the form of an evolving graph, with the aim of obtaining a compact representation of the motion of the mesh. Finally, we formulate the similarity measurement as a sequence matching problem: after clustering similar graphs and assigning each graph its cluster label, each deforming mesh is represented by a sequence of labels. Then, we apply a sequence alignment algorithm to compute the locally optimal alignment between the two label sequences, and compute the similarity by normalizing the alignment score. The experimental results over several datasets show that the similarities of animation data can be captured correctly using our approach. This may be significant, as it solves a problem that cannot be handled by current approaches.

16.
钱忠胜  宋涛 《软件学报》2021,32(9):2691-2712
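Software testing is an important part of software development and can effectively improve software reliability and quality. Reusing test cases reduces the workload of software testing and improves its efficiency. This paper proposes a keyword-flow-graph-oriented method for reusing test cases between similar programs, which reuses test data already generated for one program in programs similar to it. The preliminary work for test case reuse is therefore to determine program similarity. To determine program similarity, a method based on comparing keyword flow graphs is given: first, the keywords in the program code are stored in the corresponding nodes of the flow graph to build a keyword flow graph; next, a dynamic programming algorithm is used to find the maximum common subgraph of the keyword flow graphs of the programs under test; finally, program similarity is computed with a maximum-common-subgraph distance algorithm. Programs with high similarity can then be used in the test case reuse method. When a genetic algorithm is used to generate test cases, test cases with high fitness from similar programs are introduced, and the population is repeatedly crossed with these cases during evolution, speeding up test case generation. Experiments show that reusing test cases in test generation for similar programs has clear advantages over traditional methods in coverage, average number of generations, and other respects.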
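
The abstract does not state which maximum-common-subgraph distance it uses. One widely used form, given here only as an assumed illustration, normalizes the size of the maximum common subgraph by the size of the larger graph:

```latex
% Assumption: this normalization is a common choice, not confirmed by the abstract.
% |G| is the number of nodes of G; mcs(G_1, G_2) is the maximum common subgraph.
d(G_1, G_2) = 1 - \frac{|\mathrm{mcs}(G_1, G_2)|}{\max(|G_1|, |G_2|)},
\qquad
\mathrm{sim}(G_1, G_2) = 1 - d(G_1, G_2)
```

For example, under this assumed form, two keyword flow graphs with 10 and 8 nodes that share a 6-node maximum common subgraph would have distance 1 - 6/10 = 0.4 and similarity 0.6.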

17.
Although multimedia objects such as images, audio, and text are of different modalities, there are many semantic correlations among them. In this paper, we propose a transductive learning method to mine the semantic correlations among media objects of different modalities so as to achieve cross-media retrieval. Cross-media retrieval is a new kind of search technology in which the query examples and the returned results can be of different modalities, e.g., querying images with an audio example. First, according to the features of the media objects and their co-existence information, we construct a uniform cross-media correlation graph, in which media objects of different modalities are represented uniformly. To perform cross-media retrieval, a positive score is assigned to the query example; the score spreads along the graph, and the media objects of the target modality or MMDs with the highest scores are returned. To boost retrieval performance, we also propose different approaches to long-term and short-term relevance feedback to mine the information contained in the positive and negative examples.

18.
Given a large undirected or directed weighted data graph and a similar, smaller weighted pattern graph, the problem of weighted subgraph matching is to find a mapping of the nodes in the pattern graph to a subset of nodes in the data graph such that the sum of edge weight differences is minimum. Biological interaction networks such as protein-protein interaction networks and molecular pathways are often modeled as weighted graphs in order to account for the high false positive rate occurring intrinsically during the detection process of the interactions. Nonetheless, complex biological problems such as disease gene prioritization and conserved phylogenetic tree construction largely depend on the similarity calculation among the networks. Although several existing methods measure graph and subgraph similarity efficiently, they produce nonintuitive results due to the underlying unweighted graph model assumption. Moreover, the very few algorithms that exist for weighted graph matching apply only under the restriction that the data and pattern graphs are of equal size. In this paper, we introduce a novel algorithm for weighted subgraph matching which can effectively be applied to both directed and undirected weighted graphs. Experimental results demonstrate the superiority and relative scalability of the algorithm over available state-of-the-art methods.

19.
We propose a graph model for the mutual-information-based clustering problem. This problem was originally formulated as a constrained optimization problem with respect to the conditional probability distribution of clusters. Based on the stationary distribution induced from the problem setting, we propose a function which measures the relevance among data objects under the problem setting. This function is used to capture the relation among data objects, and the full set of objects is represented as an edge-weighted graph in which pairs of objects are connected by edges weighted with their relevance. We show that, in hard assignment, the clustering problem can be approximated as a combinatorial problem over the proposed graph model when data is uniformly distributed. By representing the data objects as a graph based on our graph model, various graph-based algorithms can be used to solve the clustering problem over the graph. The proposed approach is evaluated on the text clustering problem over the 20 Newsgroup and TREC datasets. The results are encouraging and indicate the effectiveness of our approach.

20.
Automatic annotation is an essential technique for effectively handling and organizing Web objects (e.g., Web pages), which have experienced an unprecedented growth over the last few years. Automatic annotation is usually formulated as a multi-label classification problem. Unfortunately, labeled data are often time-consuming and expensive to obtain. Web data also accommodate a much richer feature space. This calls for new semi-supervised approaches that are less demanding on labeled data to be effective in classification. In this paper, we propose a graph-based semi-supervised learning approach that leverages random walks and ℓ1 sparse reconstruction on a mixed object-label graph with both attribute and structure information for effective multi-label classification. The mixed graph contains an object-affinity subgraph, a label-correlation subgraph, and object-label edges with adaptive weight assignments indicating the assignment relationships. The object-affinity subgraph is constructed using ℓ1 sparse graph reconstruction with extracted structural meta-text, while the label-correlation subgraph captures pairwise correlations among labels via a linear combination of their co-occurrence similarity and kernel-based similarity. A random walk with adaptive weight assignment is then performed on the constructed mixed graph to infer probabilistic assignment relationships between labels and objects. Extensive experiments on real Yahoo! Web datasets demonstrate the effectiveness of our approach.
