1.
Nowadays, it is common for organizations to maintain collections of hundreds or even thousands of business processes. Techniques
exist to search through such a collection for business process models that are similar to a given query model. However, those
techniques compare the query model to each model in the collection in terms of graph structure, which is inefficient and computationally
complex. This paper presents an efficient algorithm for similarity search. The algorithm works by efficiently estimating model
similarity, based on small characteristic model fragments, called features. The contribution of this paper is threefold. First,
it presents three techniques to improve the efficiency of the currently fastest similarity search algorithm. Second, it presents
a software architecture and prototype for a similarity search engine. Third, it presents an advanced evaluation of the algorithm.
Experiments show that the algorithm in this paper helps to perform similarity search about 10 times faster than the original
algorithm.
2.
Affinity propagation (AP) is a recently proposed clustering algorithm that has been successfully used in many practical problems. Although effective in finding meaningful clustering solutions, a key disadvantage of AP is its low efficiency, which has become the bottleneck when applying AP to large-scale problems. In the literature, most of the methods proposed to improve the efficiency of AP are based on implementing the message-passing on a sparse similarity matrix, while neither the decline in effectiveness nor the improvement in efficiency is theoretically analyzed. In this paper, we propose a two-stage fast affinity propagation (FastAP) algorithm. Different from previous work, the scale of the similarity matrix is first compressed by selecting only potential exemplars, then further reduced by sparsification according to k nearest neighbors. More importantly, we provide theoretical analysis, based on which the improvement of efficiency in our method is controllable with guaranteed clustering performance. In experiments, two synthetic data sets, seven publicly available data sets, and two real-world streaming data sets are used to evaluate the proposed method. The results demonstrate that FastAP achieves clustering performance comparable to the original AP algorithm, while the computational efficiency is improved with a several-fold speed-up on small data sets and a dozens-of-fold speed-up on larger-scale data sets.
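The k-nearest-neighbor sparsification step mentioned in this abstract can be illustrated with a minimal sketch. It is not the FastAP implementation: the function name `knn_sparsify`, the list-of-lists matrix layout, and the choice to leave the diagonal out are assumptions made for illustration, and the exemplar pre-selection stage is not shown because the abstract does not describe it.

```python
def knn_sparsify(similarity, k):
    """Keep, for each row i, only the k largest off-diagonal similarities.

    `similarity` is a dense square matrix (list of lists). The result is one
    dict per row mapping neighbor index -> similarity, i.e. a sparse matrix.
    The diagonal is skipped here (an assumption of this sketch).
    """
    sparse = []
    for i, row in enumerate(similarity):
        others = [j for j in range(len(row)) if j != i]
        # Sort candidate neighbors by similarity, most similar first.
        keep = sorted(others, key=lambda j: row[j], reverse=True)[:k]
        sparse.append({j: row[j] for j in keep})
    return sparse


# Similarities given as negative squared distances between three points.
S = [
    [0, -1, -4],
    [-1, 0, -2],
    [-4, -2, 0],
]
nearest = knn_sparsify(S, k=1)
```

Message passing would then run only over the retained entries, which is where the efficiency gain of a sparse similarity matrix comes from.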
3.
Videos play an ever-increasing role in our everyday lives, with applications ranging from news and entertainment to scientific research, security, and surveillance. Coupled with the fact that cameras and storage media are becoming less expensive, this has resulted in people producing more video content than ever before. This necessitates the development of efficient indexing and retrieval algorithms for video data. Most state-of-the-art techniques index videos according to the global content in the scene, such as color, texture, and brightness. In this paper, we discuss the problem of activity-based indexing of videos. To address the problem, we first describe activities as a cascade of dynamical systems, which significantly enhances the expressive power of the model while retaining many of the computational advantages of using dynamical models. Second, we derive methods to incorporate view and rate invariance into these models so that similar actions are clustered together irrespective of the viewpoint or the rate of execution of the activity. We also derive algorithms to learn the model parameters from a video stream and demonstrate how a single video sequence may be partitioned into different clusters, where each cluster represents an activity. Experimental results for five different databases show that the clusters found by the algorithm correspond to semantically meaningful activities.
4.
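In web page clustering, the HAC (Hierarchical Agglomerative Clustering) algorithm and the K-means algorithm are both frequently used, but each has its own shortcomings. A two-stage clustering method is proposed. In the first stage, the HAC algorithm clusters the titles of web search results; in the second stage, the K-means algorithm clusters the titles and abstracts, using the first-stage results as initial centers, to obtain reasonably good clustering results. Because titles are generally short, the running time of the HAC algorithm is greatly reduced. The method thus meets the time requirements of web search while still producing good clustering results.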
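The two-stage idea in this abstract (HAC on titles, then K-means on titles plus abstracts seeded from the HAC result) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes documents are already converted to numeric vectors, uses centroid linkage and Euclidean distance, and all function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n for d in range(len(vectors[0])))


def hac_groups(vectors, k):
    """Stage 1: merge the two closest clusters (centroid linkage) until k remain.

    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([vectors[i] for i in clusters[a]])
                cb = centroid([vectors[i] for i in clusters[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def kmeans(vectors, centers, iterations=10):
    """Stage 2: Lloyd iterations from the given centers; returns cluster labels."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(v, centers[c]))
            for v in vectors
        ]
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return labels


def two_stage_cluster(title_vecs, full_vecs, k):
    """HAC on the short title vectors, then K-means on the full vectors."""
    groups = hac_groups(title_vecs, k)
    seeds = [centroid([full_vecs[i] for i in g]) for g in groups]
    return kmeans(full_vecs, seeds)


# Toy data: four documents, title vectors (2-D) and title+abstract vectors (3-D).
titles = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
full = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 10.0, 5.0), (10.0, 11.0, 5.0)]
labels = two_stage_cluster(titles, full, k=2)
```

The quadratic pairwise search in stage 1 is the expensive part, which is why running it only on the short title vectors keeps the overall cost down.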
5.
Similarity search in graph databases has been widely investigated. It is worthwhile to develop a fast algorithm to support similarity search in large-scale graph databases. In this paper, we investigate a k-NN (k-Nearest Neighbor) similarity search problem using locality sensitive hashing (LSH). We propose an innovative fast graph search algorithm named LSH-GSS, which first transforms complex graphs into vectorial representations based on prototypes in the database and then accelerates a query in Euclidean space by employing LSH. Because images can be represented as attributed graphs, we propose an approach to transform attributed graphs into n-dimensional vectors and apply LSH-GSS to perform image retrieval. Experiments on three real graph datasets and two image datasets show that our methods are highly accurate and efficient.
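To make the "LSH in Euclidean space" step concrete, here is a minimal sketch of a random-projection hash table for vectors. It is not LSH-GSS itself: the class name `EuclideanLSH`, its parameters, and the single-table design are assumptions for illustration, and the graph-to-vector embedding is assumed to have been done already.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Single hash table using h(v) = floor((a . v + b) / w) per hash function."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # Gaussian projection directions and uniform offsets in [0, w).
        self.projections = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)
        ]
        self.offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def _key(self, vec):
        # Concatenate the quantized projections into one bucket key.
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, vec)) + b) / self.w)
            for proj, b in zip(self.projections, self.offsets)
        )

    def add(self, label, vec):
        self.buckets[self._key(vec)].append((label, list(vec)))

    def query(self, vec, k=1):
        """Return up to k labels from the query's bucket, nearest first."""
        candidates = self.buckets.get(self._key(vec), [])
        ranked = sorted(candidates, key=lambda item: math.dist(item[1], vec))
        return [label for label, _ in ranked[:k]]


# Hypothetical 2-D embeddings of two graphs.
index = EuclideanLSH(dim=2)
index.add("g1", [0.0, 0.0])
index.add("g2", [100.0, 100.0])
hits = index.query([0.0, 0.0], k=1)
```

Only the vectors that share the query's bucket are compared exactly, so nearby vectors that fall into a different bucket are missed; practical systems usually use several tables to reduce that risk.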
6.
Computational Visual Media - In this paper, we reconsider the clustering problem for image over-segmentation from a new perspective. We propose a novel search algorithm called “active...
7.
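Video usage in virtual news systems exhibits something close to the scale-free phenomenon of complex network theory, which degrades the overall quality of the virtual news. To address this problem, a new video semantic similarity network is designed. The paper details the description model for video semantics, the rules for constructing the network, the similarity computation method, and a video retrieval algorithm built on the similarity network. Experiments on the video semantic similarity network show that it solves the problems arising in video usage very effectively.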
8.
Nearest-neighbor search of high-dimensionality spaces is critical for many applications, such as content-based retrieval from multimedia databases, similarity search of patterns in data mining, and nearest-neighbor classification. Unfortunately, even with the aid of the commonly used indexing schemes, the performance of nearest-neighbor (NN) queries deteriorates rapidly with the number of dimensions. We propose a method, called Clustering with Singular Value Decomposition (CSVD), which supports efficient approximate processing of NN queries, while maintaining good precision-recall characteristics. CSVD groups homogeneous points into clusters and separately reduces the dimensionality of each cluster using SVD. Cluster selection for NN queries relies on a branch-and-bound algorithm and within-cluster searches can be performed with traditional or in-memory indexing methods. Experiments with texture vectors extracted from satellite images show that CSVD achieves significantly higher dimensionality reduction than plain SVD for the same normalized mean squared error (NMSE), which translates into a higher efficiency in processing approximate NN queries.
9.
Nowadays, many companies standardize their operations through business processes (BPs), which are stored in repositories and reused when new functionalities are required. However, finding specific processes may become a cumbersome task due to the large size of these repositories. This paper presents MulTimodalGroup, a model for grouping and searching business processes. The grouping mechanism is built upon a clustering algorithm that uses a similarity function based on fuzzy logic; this grouping is performed using the results of each user request. The search, for its part, is based on a multimodal representation that integrates textual and structural information of BPs. The assessment of the proposed model was carried out in two phases: 1) internal quality assessment of groups and 2) external assessment of the created groups compared with an ideal set of groups. The assessment was performed using a closed BP collection designed collaboratively by 59 experts. The experimental results in each phase are promising and support the validity of the proposed model.
10.
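Association rule mining generates a large number of rules, which creates a major obstacle to the subsequent analysis and use of those rules. Based on the characteristics of association rules, a new rule similarity measure is proposed; a new rule distance measure is derived from it, and the rules are clustered using the group average method of hierarchical clustering. Experimental results show that the distance measure takes the overall information of association rules into account, and that the classes and their number can be determined from the clustering dendrogram and the rule scatter plot, which facilitates classifying the rules.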
11.
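Existing text clustering algorithms based on the vector space model were studied and found to share two shortcomings: excessively high data dimensionality and neglect of the semantic relationships between words. To address these problems, a text clustering algorithm based on word similarity is proposed. The algorithm first groups words by word similarity to capture the semantic relationships between them, then uses the resulting word classes as the terms of the vector space to represent texts, which reduces the dimensionality of the vector space, and finally clusters the texts with a partition-based clustering method. Experimental results show that the algorithm clusters better than traditional clustering algorithms based on the vector space model.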
12.
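Practice has shown that clustering is an effective way to improve how search results are presented. However, current clustering methods do not take user interests into account and return the same clustering results to all users for the same query. A personalized clustering retrieval method is therefore proposed. The method improves the k-means algorithm and uses it to cluster the results returned by a traditional search engine in light of user interests, returning web page clusters tailored to the specific user. Experiments show that the method provides personalized service, improves clustering quality, and increases users' retrieval efficiency.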
13.
The clustering assumption is to maximize the within-cluster similarity and simultaneously to minimize the between-cluster similarity for a given unlabeled dataset. This paper deals with a new spectral clustering algorithm based on a similarity and dissimilarity criterion by incorporating a dissimilarity criterion into the normalized cut criterion. The within-cluster similarity and the between-cluster dissimilarity can be enhanced to result in good clustering performance. Experimental results on toy and real-world datasets show that the new spectral clustering algorithm has a promising performance.
14.
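To improve the quality of text clustering and obtain satisfactory results, and to address the fact that text clustering ignores the intension of concepts and lacks links between concepts, TCBOS (text clustering based on ontology and similarity) is designed and improved. Methods for text preprocessing and word segmentation are studied; a method that uses finite state automata to extract concepts and relations automatically is designed; the methods for semantic expansion of concepts and for similarity computation are improved and refined; and the K-medoids algorithm for similarity-based text clustering is refined by using ontology-based semantic similarity to measure how close documents are. Experiments show that, in terms of clustering accuracy and clustering relatedness, the method...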
15.
This paper proposes a novel approach for recognizing faces in videos with a high recognition rate. Initially, a feature vector based on Normalized Local Binary Patterns is obtained for the face region. A set of training and testing videos is used in this face recognition procedure. Each frame in the query video is matched with the signatures of the faces in the database using Euclidean distance, and a ranked list is formed. Each ranked list is clustered and its reliability is analyzed for re-ranking. Multiple re-ranked lists of the query video are fused together to form a video signature. This video signature embeds diverse intra-personal variations such as poses and expressions, and facilitates matching two videos with large variations. For matching two videos, their composite ranked lists are compared using a Kendall Tau distance measure. The developed methods are evaluated on the YouTube and ChokePoint videos, and they exhibit significant performance improvement over existing techniques.
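The Kendall Tau distance used to compare ranked lists counts the item pairs that two rankings place in opposite order. The sketch below shows the basic, unnormalized form for two full rankings of the same items; the function name is hypothetical, and the paper's composite lists may require a variant (for example, one that handles partial lists) that the abstract does not specify.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Number of item pairs ordered differently by the two rankings.

    Both arguments must be orderings of the same distinct items.
    """
    if (
        len(rank_a) != len(rank_b)
        or len(set(rank_a)) != len(rank_a)
        or set(rank_a) != set(rank_b)
    ):
        raise ValueError("rankings must be permutations of the same items")
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is discordant when the two rankings disagree on its order.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant


same = kendall_tau_distance(["a", "b", "c"], ["a", "b", "c"])      # 0
swapped = kendall_tau_distance(["a", "b", "c"], ["b", "a", "c"])   # 1
reversed_ = kendall_tau_distance(["a", "b", "c"], ["c", "b", "a"]) # 3
```

A distance of 0 means identical rankings, and the maximum, n(n-1)/2 for n items, is reached when one ranking is the reverse of the other.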
16.
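To overcome the subjective human factors in the faceted classification representation, components are represented by combining faceted classification with full-text retrieval. In addition, from a semantic perspective and using optimization techniques, a component clustering algorithm based on semantic similarity and optimization is proposed. The algorithm effectively reduces the subjectivity of faceted classification and further improves the efficiency and accuracy of component queries; it is compared with component clustering based on the vector space model. Experimental results demonstrate the effectiveness of the algorithm: it improves component clustering to some extent and raises clustering quality.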
17.
World Wide Web - Node similarity search on graphs has wide applications in recommendation, link prediction, to name just a few. However, existing studies are insufficient due to two reasons: (i)...
19.
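With the rapid development of the Internet of Things and the industrial Internet, research on cyberspace security is receiving growing attention from industry and academia. Because source code is unavailable, binary code similarity search has become a key core technique for vulnerability discovery and malware analysis. First, starting from the basic concepts of binary code similarity search, a system framework for binary code similarity search is given. Next, the state of the art in syntactic, semantic, and pragmatic similarity search for binary code is systematically reviewed with a focus on similarity techniques. Then, existing solutions are summarized and compared from the perspectives of binary hashing, instruction sequences, graph structure, basic-block semantics, feature learning, debug information recovery, and recognition of high-level function semantics. Finally, future directions and prospects for binary code similarity search are discussed.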
20.
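At present, most search result clustering methods are document-based and cannot generate meaningful cluster labels. To solve this problem, a Chinese search result clustering method based on clustering key noun phrases is proposed. The method takes noun phrases and related search terms as candidate cluster labels, filters the labels using the C-Value algorithm and IDF values, then clusters the labels with the Chameleon algorithm, and finally assigns the search results to the most relevant clusters. Experiments show that, by using key noun phrases and related search terms as cluster labels, the method effectively improves the descriptiveness of the labels and reduces the time complexity of the clustering algorithm.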