Similar literature
1.
The need for suitable divergence measures arises because they play an important role in discriminating between two probability distributions. This communication introduces one such divergence measure, based on the Jensen inequality and Shannon entropy, and validates it. A new dissimilarity measure based on the proposed divergence measure is also introduced; besides establishing its validity, some of its major properties are studied. Further, a new multiple attribute decision making method based on the proposed dissimilarity measure is introduced and explained in detail with an illustrative example. The paper closes with an application of the proposed dissimilarity measure to pattern recognition.
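The abstract does not state the proposed measure itself. As a point of reference only, the classical Jensen-Shannon divergence is one well-known divergence built from the Jensen inequality and Shannon entropy; the sketch below shows that classical measure, not the paper's new one. The function names are illustrative.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits, using the convention 0 * log(0) = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Classical Jensen-Shannon divergence: H((p+q)/2) - (H(p) + H(q)) / 2.

    Non-negative because entropy is concave (Jensen inequality).
    Assumes p and q are probability vectors of equal length.
    """
    midpoint = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(midpoint) - (shannon_entropy(p) + shannon_entropy(q)) / 2
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with disjoint support).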

2.
Evaluating automatic text summarization is challenging because it is difficult to calculate the similarity of two texts. In this paper, we define a new dissimilarity measure, compression dissimilarity, to compute the dissimilarity between documents. We then propose a new automatic evaluation method based on compression dissimilarity. The proposed method is a complete “black box” and needs no preprocessing steps. Experiments show that compression dissimilarity can clearly distinguish automatic summaries from human summaries. The compression dissimilarity measure can evaluate an automatic summary by comparing it with high-quality human summaries or with its original document. The evaluation results are highly correlated with human assessments, and the correlation between the compression dissimilarity of summaries and the compression dissimilarity of documents can serve as a meaningful measure of the consistency of an automatic text summarization system.
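The abstract does not give the formula for compression dissimilarity. A minimal sketch of the general idea, assuming the widely used normalized compression distance (NCD) and zlib as the compressor (both are assumptions, not the paper's stated choices), might look like this:

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def compression_dissimilarity(a: str, b: str) -> float:
    """Normalized compression distance, used here as a stand-in.

    NCD(a, b) = (C(ab) - min(C(a), C(b))) / max(C(a), C(b)),
    where C is the compressed size. Texts that share content compress
    well together, so their value is smaller.
    """
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)
```

No tokenization or stemming is involved, which matches the "black box" character described in the abstract; a summary would be scored by computing this value against a reference summary or against the source document.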

3.
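胡明晓. 《计算机工程》 (Computer Engineering), 2010, 36(18): 197-199
To identify plagiarism within a local document collection, the document plagiarism distance is defined as an approximation of a generalized edit distance based on the number of backtracks and the number of forward jumps. Sufficient conditions are analyzed under which this plagiarism distance satisfies the triangle inequality and the weak triangle inequality. A fast full-text identification algorithm is proposed that can identify all ordered pairs of documents in the collection that are suspected of plagiarism. Experimental results show that, compared with other algorithms, the proposed algorithm is 3 to 5 times more efficient while maintaining recall.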

4.
In Dempster-Shafer evidence theory, the pignistic probability function is used to transform the basic probability assignment (BPA) into pignistic probabilities. Since the transformation maps from the power set of the frame of discernment to the set itself, it may cause some information loss. The distance between betting commitments is constructed on the basis of the pignistic probability function and is used to measure the dissimilarity between two BPAs. However, it is a pseudo-metric and may give unreasonable results in some cases. To solve this problem, we propose a power-set-distribution (PSD) pignistic probability function based on a new explanation of the non-singleton focal elements in the BPA. The new function operates directly on the power set, so it captures more of the information contained in the BPA than the pignistic probability function does. Based on the new function, we also propose the distance between PSD betting commitments, which can better measure the dissimilarity between two BPAs, and we prove that it is a metric. To demonstrate the performance of the new distance, numerical examples compare it with three existing dissimilarity measures. Moreover, its applications in combining conflicting BPAs are presented through two examples.
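For orientation, the standard pignistic transformation splits the mass of each focal element equally among its elements. The sketch below shows that standard transformation and one common definition of the distance between betting commitments (the largest difference in betting probability over all subsets); the abstract does not state either formula, so both are assumptions here, and the PSD variant proposed in the paper is not shown.

```python
from typing import Dict, FrozenSet


def pignistic(bpa: Dict[FrozenSet[str], float]) -> Dict[str, float]:
    """Standard pignistic transformation: BetP(x) = sum of m(A) / |A| over A containing x.

    Assumes a normalized BPA with no mass on the empty set.
    """
    betp: Dict[str, float] = {}
    for focal, mass in bpa.items():
        if not focal:
            raise ValueError("mass on the empty set is not supported in this sketch")
        share = mass / len(focal)
        for element in focal:
            betp[element] = betp.get(element, 0.0) + share
    return betp


def betting_distance(bpa1: Dict[FrozenSet[str], float],
                     bpa2: Dict[FrozenSet[str], float]) -> float:
    """max over subsets A of |BetP1(A) - BetP2(A)|.

    For two probability distributions this maximum equals half the
    L1 distance between them, which avoids enumerating the power set.
    """
    p1, p2 = pignistic(bpa1), pignistic(bpa2)
    elements = set(p1) | set(p2)
    return 0.5 * sum(abs(p1.get(e, 0.0) - p2.get(e, 0.0)) for e in elements)
```

Two different BPAs can share the same pignistic probabilities, in which case this distance is zero even though the BPAs differ; that is the pseudo-metric behavior the abstract refers to.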

5.
Robust fuzzy clustering of relational data
Popular relational-data clustering algorithms, namely the relational dual of fuzzy c-means (RFCM), non-Euclidean RFCM (NERFCM) (both by Hathaway et al.), and FANNY (by Kaufman and Rousseeuw), are examined. A new algorithm, a generalization of FANNY called the fuzzy relational data clustering (FRC) algorithm, is introduced; it has the same objective functional as RFCM. However, FRC does not share the restriction of RFCM that the relational data must be derived from Euclidean distance as the measure of dissimilarity between objects, and it does not share the limitations of FANNY, including the use of a fixed membership exponent, or fuzzifier exponent, m. The FRC algorithm is further improved by incorporating the concept of Dave's object-data noise clustering (NC) algorithm through a proposed concept of noise-dissimilarity. Next, based on constrained minimization, which includes an inequality constraint on the memberships and the corresponding Kuhn-Tucker conditions, a noise-resistant FRC algorithm is derived that works well for all types of non-Euclidean dissimilarity data. It is thus shown that the extra computations for data expansion (β-spread transformation) required by the NERFCM algorithm are not necessary. This new algorithm is called robust non-Euclidean fuzzy relational data clustering (robust-NE-FRC), and its robustness is demonstrated through several numerical examples. The advantages of this new algorithm are faster convergence, robustness against outliers, and the ability to handle all kinds of relational data, including non-Euclidean data. The paper also presents a new and better interpretation of the noise class.

6.
A tree is a data structure used to express various objects such as semistructured data and genes. When objects are represented as trees, computing tree similarity is essential for pattern recognition and retrieval. This paper considers the noisy subsequence tree recognition problem, whose purpose is to recognize the original tree given its noisy subsequence tree. Previous research on this problem relied on the constrained tree edit distance to measure dissimilarity. However, the number of relabelings must be predetermined to compute it. This paper proposes a new dissimilarity measure for this problem. Our dissimilarity measure is obtained by counting the node edit operations in the unit-cost tree edit distance that contribute to the matching of node labels. The number of relabelings need not be specified to compute our dissimilarity measure. Moreover, our measure achieves more accurate recognition and faster execution than the constrained tree edit distance. Our measure is also useful for the tree inclusion problem, which is the problem of deciding whether one tree includes another, and it shows the extent of approximate tree inclusion when a tree incompletely includes another tree. © 2011 Wiley Periodicals, Inc.

7.
Pattern Recognition Letters, 2002, 23(1-3): 151-160
An approach for clustering on the basis of incomplete dissimilarity data is given. The data is first completed using simple triangle inequality-based approximation schemes and then clustered using the non-Euclidean relational fuzzy c-means algorithm. Results of numerical tests are included.
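The abstract does not specify the approximation schemes. One plausible sketch, assuming the missing entry d(i, j) is estimated as the average of the tightest triangle-inequality upper bound, min over k of d(i, k) + d(k, j), and the tightest lower bound, max over k of |d(i, k) - d(k, j)|, is shown below; the averaging rule and the use of `None` for missing values are assumptions.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def complete_dissimilarities(d: Matrix) -> Matrix:
    """Fill missing entries (None) using triangle-inequality bounds.

    Only originally known entries are used as evidence, so the result
    does not depend on the order in which entries are filled. An entry
    with no usable intermediate object k is left as None.
    """
    n = len(d)
    completed = [row[:] for row in d]
    for i in range(n):
        for j in range(n):
            if d[i][j] is not None:
                continue
            pairs = [(d[i][k], d[k][j]) for k in range(n)
                     if d[i][k] is not None and d[k][j] is not None]
            if pairs:
                upper = min(a + b for a, b in pairs)      # d(i,j) <= d(i,k) + d(k,j)
                lower = max(abs(a - b) for a, b in pairs)  # d(i,j) >= |d(i,k) - d(k,j)|
                completed[i][j] = 0.5 * (upper + lower)
    return completed
```

The completed matrix need not be Euclidean, which is consistent with the abstract's choice of a non-Euclidean relational clustering algorithm for the second step.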

8.
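Fuzzy K-Modes clustering algorithm based on a new dissimilarity measure
白亮, 曹付元, 梁吉业. 《计算机工程》 (Computer Engineering), 2009, 35(16): 192-194
The traditional fuzzy K-Modes clustering algorithm uses simple matching to measure the dissimilarity between an object and a mode. It does not fully account for how well the mode represents its cluster, which easily causes information loss and weakens within-cluster similarity. To address this problem, a new dissimilarity measure is proposed that uses the membership degrees of objects in a cluster to reflect how well the mode represents that cluster, and it is applied to the traditional fuzzy K-Modes clustering algorithm. Comparison with the traditional K-Modes and fuzzy K-Modes clustering algorithms shows that the new dissimilarity measure is effective.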
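For context, the traditional simple matching dissimilarity that the abstract criticizes counts the attributes on which an object and a mode disagree. The sketch below shows only that baseline and a per-attribute mode computation; the membership-weighted measure proposed in the paper is not given in the abstract and is not reproduced here. Function names are illustrative.

```python
from collections import Counter
from typing import Sequence, Tuple


def simple_matching(obj: Sequence[str], mode: Sequence[str]) -> int:
    """Number of categorical attributes on which obj and mode differ."""
    return sum(1 for a, b in zip(obj, mode) if a != b)


def cluster_mode(cluster: Sequence[Sequence[str]]) -> Tuple[str, ...]:
    """Most frequent value of each attribute across the cluster's objects."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))
```

Under simple matching, a mode value held by 51% of a cluster counts exactly the same as one held by 100% of it, which is the loss of information the abstract describes.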

9.
Skeletal trees are commonly used to express the geometric properties of a shape. Accordingly, tree-edit distance is used to compute the dissimilarity between two given shapes. We present a new tree-edit-based shape matching method that uses a recent coarse skeleton representation. The coarse skeleton representation allows us to represent both shapes and shape categories as depth-1 trees. Consequently, we can easily integrate the influence of the categories into shape dissimilarity measurements. The new dissimilarity measure gives better within-group versus between-group separation, and it mimics the asymmetric nature of human similarity judgements.

10.
In this paper, we propose a novel method to measure the dissimilarity of categorical data. The key idea is to consider the dissimilarity between two categorical values of an attribute as a combination of dissimilarities between the conditional probability distributions of other attributes given these two values. Experiments with real data show that our dissimilarity estimation method improves the accuracy of the popular nearest neighbor classifier.

11.
The need for suitable measures of the distance between two probability distributions arises because they play an eminent role in problems based on discrimination and inference. In this communication, we introduce one such divergence measure based on the well-known Shannon entropy and establish its existence. In addition, a new dissimilarity measure for intuitionistic fuzzy sets corresponding to the proposed divergence measure is introduced and validated. Some major properties of the proposed dissimilarity measure are also discussed. Further, a new multiple attribute decision-making (MADM) method based on the proposed dissimilarity measure is introduced using the concept of TOPSIS and is explained in detail with an illustrative example on a supplier selection problem. Finally, the proposed dissimilarity measure is applied to pattern recognition, and its performance is compared with that of some existing divergence measures in the literature.

12.
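Similarity measure for polygon representations
This paper introduces the concepts of triangle partition, weak triangle partition, and angle-preserving partition of a polygon, together with a new polygon similarity measure based on these partitions. The method is invariant under rotation, transformation, enlargement, and reduction. A method and examples of similarity-based retrieval of image data using this approach are given.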

13.
This paper presents level forms of the triangle inequalities in fuzzy metric spaces (X, d, L, R). To aid discussion, a fuzzy pre-metric condition is introduced. It is first pointed out that under the fuzzy pre-metric condition the first triangle inequality is always equivalent to its level form. The second triangle inequality is equivalent to one level form when R is right continuous, and also to another level form when further conditions are imposed on R. In a fuzzy metric space, the level form of the first triangle inequality and one of the level forms of the second triangle inequality are always valid. The other level form of the second triangle inequality holds for all but at most countably many α ∈ [0, 1). Finally, a fixed point theorem for fuzzy metric spaces is derived as an application of the preceding results.

14.
Sources of evidence may differ in reliability and importance in real decision-making applications. The estimation of the discounting (weighting) factors when prior knowledge is unavailable has been regularly studied until recently. In the past, the determination of the weighting factors focused only on the reliability discounting rule and depended mainly on a dissimilarity measure between basic belief assignments (bba's) represented by an evidential distance. Nevertheless, it is very difficult to characterize the dissimilarity efficiently through an evidential distance alone. Thus, both a distance and a conflict coefficient based on the probabilistic transformation BetP are proposed to characterize the dissimilarity. The distance represents the difference between bba's, whereas the conflict coefficient reveals the degree of divergence between the hypotheses that two belief functions strongly support. These two aspects of dissimilarity are complementary in a certain sense, and their fusion is used as the dissimilarity measure. A new method for estimating weighting factors is then presented using the proposed dissimilarity measure. In evaluating the weight of a source, both its dissimilarity with other sources and their weighting factors are considered. The weighting factors can be applied in both the importance and reliability discounting rules, but the choice of discounting rule should depend on the actual application. Simple numerical examples are given to illustrate the interest of the proposed approach.

15.
Similarity and dissimilarity measures are widely used in many research areas and applications. When a dissimilarity measure is used, it is normally required to be a distance metric. However, when a similarity measure is used, there is no formal requirement. In this article, we make three contributions. First, we give a formal definition of a similarity metric. Second, we show the relationship between a similarity metric and a distance metric. Third, we present general solutions for normalizing a given similarity metric or distance metric.

16.
Semisupervised clustering algorithms partition a given data set using limited supervision from the user. The success of these algorithms depends on the type of supervision and also on the kind of dissimilarity measure used while creating partitions of the space. This paper proposes a clustering algorithm that uses supervision in terms of relative comparisons, viz., x is closer to y than to z. The proposed clustering algorithm simultaneously learns the underlying dissimilarity measure while finding compact clusters in the given data set using relative comparisons. Through our experimental studies on high-dimensional textual data sets, we demonstrate that the proposed algorithm achieves higher accuracy and is more robust than similar algorithms using pairwise constraints for supervision.

17.
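For the static data management problem, an approximation algorithm is designed for geometric networks in which the link cost does not satisfy the triangle inequality. By introducing two restricted data placements for comparison, and through an analysis similar to amortized analysis, it is shown that the approximation algorithm achieves a constant approximation ratio for given values of the relevant parameters. The result assumes, however, that the ratio of the maximum to the minimum link cost in the network is known.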

18.
We introduce a novel dissimilarity into a probabilistic clustering task to properly measure dissimilarity among multiple clusters when each cluster is characterized by a subpopulation in the mixture model. This measure is called redundancy-based dissimilarity among probability distributions. From the perspectives of both source coding and statistical hypothesis testing, we shed light on several theoretical reasons why redundancy-based dissimilarity among probability distributions is a reasonable measure of dissimilarity among clusters. We also elucidate a principle common to redundancy-based dissimilarity and Ward's method in terms of hierarchical clustering criteria. Moreover, we show several related theorems that are significant for clustering tasks. In the experiments, the properties of redundancy-based dissimilarity are examined in comparison with several other measures.

19.
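Construction method for intuitionistic fuzzy similarity relations
Traditional methods for constructing fuzzy similarity relations cannot be used to construct intuitionistic fuzzy similarity relations. Based on the dissimilarity and similarity degrees of intuitionistic fuzzy sets, this work studies the construction of intuitionistic fuzzy similarity relations. Several existing methods for measuring the similarity and dissimilarity of intuitionistic fuzzy sets are analyzed. On this basis, the dissimilarity degree of intuitionistic fuzzy sets is defined, an effective method for measuring their dissimilarity and similarity is given, and a practical method for constructing intuitionistic fuzzy similarity relations is proposed. A concrete numerical example verifies and demonstrates the correctness and effectiveness of the method.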

20.
Searching in metric spaces by spatial approximation
We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them which satisfies the triangle inequality. The goal is, given a set of objects and a query, to retrieve those objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data structure, called sa-tree (“spatial approximation tree”), is based on approaching the searched objects spatially, that is, getting closer and closer to them, rather than the classic divide-and-conquer approach of other data structures. We analyze our method and show that the number of distance evaluations to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that does not need parameter tuning, which makes it appealing for use by non-experts. Edited by R. Sacks-Davis. Received: 17 April 2001 / Accepted: 24 January 2002 / Published online: 14 May 2002
