20 similar documents found; search took 15 ms
1.
Victor Cheng, Chun-Hung Li, James T. Kwok 《Pattern recognition》2004,37(7):1471-1477
Defining a good distance (dissimilarity) measure between patterns is of crucial importance in many classification and clustering algorithms. While a lot of work has been performed on continuous attributes, nominal attributes are more difficult to handle. A popular approach is to use the value difference metric (VDM) to define a real-valued distance measure on nominal values. However, VDM treats the attributes separately and ignores any possible interactions among attributes. In this paper, we propose the use of adaptive dissimilarity matrices for measuring the dissimilarities between nominal values. These matrices are learned via optimizing an error function on the training samples. Experimental results show that this approach leads to better classification performance. Moreover, it also allows easier interpretation of (dis)similarity between different nominal values.
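The baseline VDM that this paper builds on can be sketched as follows: the distance between two nominal values is the sum, over classes, of the differences in their class-conditional probabilities, estimated from training counts. The function name and the choice of exponent `q=2` here are illustrative, not taken from the paper:

```python
from collections import Counter, defaultdict

def vdm(values, labels, a, b, q=2):
    """Value Difference Metric between nominal values a and b:
    sum over classes c of |P(c|a) - P(c|b)|**q, with the conditional
    probabilities estimated from the (values, labels) training data."""
    counts = defaultdict(Counter)          # counts[value][class] = frequency
    for v, c in zip(values, labels):
        counts[v][c] += 1
    classes = set(labels)
    na, nb = sum(counts[a].values()), sum(counts[b].values())
    return sum(abs(counts[a][c] / na - counts[b][c] / nb) ** q
               for c in classes)
```

Note how each attribute is handled in isolation: the adaptive dissimilarity matrices proposed in the paper are learned instead, precisely to capture the inter-attribute interactions this per-attribute estimate ignores.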
2.
Nuno Mendes Pedro Neto J. Norberto Pires Altino Loureiro 《Expert systems with applications》2013,40(4):1143-1151
This paper presents methodologies to discretize nominal robot paths extracted from 3-D CAD drawings. Behind robot path discretization is the ability to have a robot adjust the traversed paths so that the contact between robot tool and work-piece is properly maintained. In addition, a hybrid force/motion control system based on Fuzzy-PI control is proposed to adjust robot paths with external sensory feedback. Together, these capabilities facilitate the robot programming process and increase the robot’s autonomy.
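The path-adjustment idea can be illustrated with a single control step: the contact-force error is turned into a small offset applied to the current discretized path point. A plain PI law stands in here for the paper's Fuzzy-PI controller, and the gains and time step are illustrative values, not taken from the paper:

```python
def pi_force_correction(force_error, state, kp=0.002, ki=0.0005, dt=0.01):
    """One control step: convert the contact-force error (desired minus
    measured force, in N) into a position offset (in m) for the current
    path point.  `state` holds the integral term between calls."""
    state["integral"] += force_error * dt
    return kp * force_error + ki * state["integral"]
```

In a real loop this correction would be applied along the tool's contact normal at each discretized path point, which is what lets the robot track a nominal CAD-derived path while keeping contact with the work-piece.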
3.
The need for measuring the dispersion of nominal categorical attributes appears in several applications, such as clustering or data anonymization. For a nominal attribute whose categories can be hierarchically classified, a measure of the variance of a sample drawn from that attribute is proposed which takes the attribute’s hierarchy into account. The new measure is the reciprocal of “consanguinity”: the less related the nominal categories in the sample, the higher the measured variance. For non-hierarchical nominal attributes, the proposed measure yields results consistent with previous diversity indicators. Applications of the new nominal variance measure to economic diversity measurement and data anonymization are also discussed.
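The abstract does not give the exact formula, but the "less related, higher variance" idea can be instantiated in a simple way: take the mean pairwise tree distance between the sampled categories. The encoding (a child-to-parent map) and all function names below are my own illustrative choices, not the paper's definition:

```python
def _path_to_root(parent, x):
    """Node x followed by all its ancestors, given a child -> parent map."""
    path = [x]
    while x in parent:
        x = parent[x]
        path.append(x)
    return path

def tree_distance(parent, a, b):
    """Number of edges between categories a and b in the hierarchy."""
    pa, pb = _path_to_root(parent, a), _path_to_root(parent, b)
    lca = next(x for x in pa if x in pb)    # lowest common ancestor
    return pa.index(lca) + pb.index(lca)

def hierarchical_variance(parent, sample):
    """Mean pairwise hierarchy distance over the sample: the less related
    the sampled categories, the higher the value."""
    n = len(sample)
    if n < 2:
        return 0.0
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(tree_distance(parent, sample[i], sample[j])
               for i, j in pairs) / len(pairs)
```

For example, with a small animal taxonomy, a sample of {cat, eagle} scores higher than {cat, dog}, matching the abstract's intuition that distantly related categories yield higher variance.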
4.
We propose a new distance called Hierarchical Semantic-Based Distance (HSBD), devoted to the comparison of nominal histograms equipped with a dissimilarity matrix providing the semantic correlations between the bins. The computation of this distance is based on a hierarchical strategy, progressively merging the considered instances (and their bins) according to their semantic proximity. For each level of this hierarchy, a standard bin-to-bin distance is computed between the corresponding pair of histograms. These bin-to-bin distances are then fused, taking into account the semantic coherency of their associated level, to obtain the proposed distance. As a result, the proposed distance can handle histograms that would otherwise require cross-bin distances. It preserves the advantages of such cross-bin distances (namely robustness to histogram translation and histogram bin-size issues), while inheriting the low computational cost of bin-to-bin distances. Validations in the context of geographical data classification emphasize the relevance and usefulness of the proposed distance.
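The hierarchical strategy can be sketched as follows: repeatedly merge the two semantically closest bins (per the bin dissimilarity matrix), compute an L1 bin-to-bin distance at each level, and combine the per-level distances. Note the simplifications: uniform averaging replaces the paper's semantic-coherency weighting, and a single-linkage update stands in for whatever merge rule the paper actually uses:

```python
import numpy as np

def hsbd_sketch(h1, h2, D):
    """Toy sketch of a hierarchical histogram distance: merge the two
    closest bins per the dissimilarity matrix D, measure an L1 bin-to-bin
    distance at every level, and average the levels (a simplification of
    HSBD's coherency-weighted fusion)."""
    h1, h2 = np.array(h1, dtype=float), np.array(h2, dtype=float)
    D = np.array(D, dtype=float)
    level_dists = [np.abs(h1 - h2).sum()]
    while len(h1) > 1:
        n = len(h1)
        iu = np.triu_indices(n, 1)
        k = int(np.argmin(D[iu]))
        i, j = iu[0][k], iu[1][k]                # closest pair of bins
        h1[i] += h1[j]                           # merge bin j into bin i
        h2[i] += h2[j]
        D[i, :] = np.minimum(D[i, :], D[j, :])   # single-linkage update
        D[:, i] = D[i, :]
        keep = [x for x in range(n) if x != j]
        h1, h2, D = h1[keep], h2[keep], D[np.ix_(keep, keep)]
        level_dists.append(np.abs(h1 - h2).sum())
    return sum(level_dists) / len(level_dists)
```

The cross-bin robustness shows up in the coarser levels: two histograms whose mass sits in semantically close but distinct bins become identical once those bins merge, so their combined distance is lower than a pure bin-to-bin comparison would suggest.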
5.
To overcome the weakness of traditional fully supervised machine-learning models, whose training depends on large numbers of labeled samples, this paper presents an algorithm combining semi-supervised learning with active learning. An active-learning selection strategy chooses the most valuable sentences for annotation, while semi-supervised learning makes full use of the unlabeled sentences. The selection strategy is further improved to suit the characteristics of Chinese corpora. Experimental results show that, compared with selecting annotation samples at random, the algorithm raises the learner's F-score by 10.2% with the same number of training samples, and reduces manual annotation by 32% for the classifier to reach the same performance, effectively lowering the learner's demand for labeled samples.
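The abstract does not specify the selection strategy beyond choosing "the most valuable sentences"; one common instantiation is least-confidence sampling, sketched below. The function name and the use of raw class probabilities are my own illustrative assumptions:

```python
def select_least_confident(probs, k):
    """Return indices of the k unlabeled samples whose top class
    probability is lowest: the classifier is least sure about them, so
    labeling them is expected to be most informative."""
    confidence = [max(p) for p in probs]
    return sorted(range(len(probs)), key=confidence.__getitem__)[:k]
```

In a combined scheme like the paper's, the selected samples go to a human annotator, while the remaining high-confidence samples can be self-labeled by the classifier and folded back into training (the semi-supervised half).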
6.
《Intelligent Data Analysis》1998,2(1-4):265-286
The main problem considered in this paper consists of binarizing categorical (nominal) attributes having a very large number of values (204 in our application). A small number of relevant binary attributes are gathered from each initial attribute. Suppose that we want to binarize a categorical attribute v with L values, where L is large or very large. The total number of binary attributes that can be extracted from v is 2^(L−1) − 1, which for a large L is prohibitive. Our idea is to select only those binary attributes that are predictive; these constitute a small fraction of all possible binary attributes. To do this, the key idea is to group the L values of the categorical attribute by means of a hierarchical clustering method, for which we need to define a similarity between values that reflects their predictive power. By clustering the L values into a small number of clusters (J), we define a new categorical attribute with only J values. The hierarchical clustering method we use, AVL, allows us to choose a significant value for J. Now, we could consider using all the 2^(J−1) − 1 binary attributes associated with this new categorical attribute. Nevertheless, the J values are tree-structured, because we have used a hierarchical clustering method. We profit from this and consider only about 2 × J binary attributes. If L is extremely large, for complexity and statistical reasons, we might not be able to apply a clustering algorithm directly. In this case, we start by “factorizing” v into a pair (v1, v2), each with about √L values. For a simple example, consider an attribute v with only four values m1, m2, m3, m4. Obviously, in this example, there is no need to factorize the set of values of v, because it has a very small number of values.
Nevertheless, for illustration purposes, v could be decomposed (factorized) into two attributes with only two values each; the correspondence between the values of v and (v1, v2) would be:
| v | v1 | v2 |
|---|----|----|
| m1 | 1 | 1 |
| m2 | 1 | 2 |
| m3 | 2 | 1 |
| m4 | 2 | 2 |
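The factorization step can be sketched as a simple positional encoding: enumerate the L values and split each index into two codes, each ranging over about √L values. The function name and the row/column encoding are illustrative choices, but for the four-value example they reproduce the correspondence table above:

```python
import math

def factorize_values(values):
    """Map each of L nominal values to a code pair (v1, v2), each
    component taking about ceil(sqrt(L)) distinct values."""
    base = math.ceil(math.sqrt(len(values)))
    return {v: (i // base + 1, i % base + 1) for i, v in enumerate(values)}
```

Each factor attribute then has few enough values for the hierarchical clustering step to be applied to it directly.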