首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In this paper, we present a systematization of techniques that use quality metrics to help in the visual exploration of meaningful patterns in high-dimensional data. In a number of recent papers, different quality metrics are proposed to automate the demanding search through large spaces of alternative visualizations (e.g., alternative projections or ordering), allowing the user to concentrate on the most promising visualizations suggested by the quality metrics. Over the last decade, this approach has witnessed a remarkable development but few reflections exist on how these methods are related to each other and how the approach can be developed further. For this purpose, we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a thorough literature review. We carefully analyze the papers and derive a set of factors for discriminating the quality metrics, visualization techniques, and the process itself. The process is described through a reworked version of the well-known information visualization pipeline. We demonstrate the usefulness of our model by applying it to several existing approaches that use quality metrics, and we provide reflections on implications of our model for future research.  相似文献   

2.
Many computer vision and pattern recognition algorithms are very sensitive to the choice of an appropriate distance metric. Some recent research sought to address a variant of the conventional clustering problem called semi-supervised clustering, which performs clustering in the presence of some background knowledge or supervisory information expressed as pairwise similarity or dissimilarity constraints. However, existing metric learning methods for semi-supervised clustering mostly perform global metric learning through a linear transformation. In this paper, we propose a new metric learning method that performs nonlinear transformation globally but linear transformation locally. In particular, we formulate the learning problem as an optimization problem and present three methods for solving it. Through some toy data sets, we show empirically that our locally linear metric adaptation (LLMA) method can handle some difficult cases that cannot be handled satisfactorily by previous methods. We also demonstrate the effectiveness of our method on some UCI data sets. Besides applying LLMA to semi-supervised clustering, we have also used it to improve the performance of content-based image retrieval systems through metric learning. Experimental results based on two real-world image databases show that LLMA significantly outperforms other methods in boosting the image retrieval performance.  相似文献   

3.
4.
In this paper, we study the visual mining of time series, and we contribute to the study and evaluation of 3D tubular visualizations. We describe the state of the art in the visual mining of time-dependent data, and we concentrate on visualizations that use a tubular shape to represent data. After analyzing the motivations for studying such a representation, we present an extended tubular visualization. We propose new visual encodings of the time and data, new interactions for knowledge discovery, and the use of rearrangement clustering. We show how this visualization can be used in several real-world domains and that it can address large datasets. We present a comparative user study. We conclude with the advantages and the drawbacks of our method (especially the tubular shape).  相似文献   

5.
In traditional illustration the choice of appropriate styles and rendering techniques is guided by the intention of the artist. For illustrative volume visualizations it is difficult to specify the mapping between the 3D data and the visual representation that preserves the intention of the user. The semantic layers concept establishes this mapping with a linguistic formulation of rules that directly map data features to rendering styles. With semantic layers fuzzy logic is used to evaluate the user defined illustration rules in a preprocessing step. In this paper we introduce interaction‐dependent rules that are evaluated for each frame and are therefore computationally more expensive. Enabling interaction‐dependent rules, however, allows the use of a new class of semantics, resulting in more expressive interactive illustrations. We show that the evaluation of the fuzzy logic can be done on the graphics hardware enabling the efficient use of interaction‐dependent semantics. Further we introduce the flat rendering mode and discuss how different rendering parameters are influenced by the rule base. Our approach provides high quality illustrative volume renderings at interactive frame rates, guided by the specification of illustration rules.  相似文献   

6.
Illustrative interactive stipple rendering   总被引:1,自引:0,他引:1  
Simulating hand-drawn illustration can succinctly express information in a manner that is communicative and informative. We present a framework for an interactive direct stipple rendering of volume and surface-based objects. By combining the principles of artistic and scientific illustration, we explore several feature enhancement techniques to create effective, interactive visualizations of scientific and medical data sets. We also introduce a rendering mechanism that generates appropriate point lists at all resolutions during an automatic preprocess and modifies rendering styles through different combinations of these feature enhancements. The new system is an effective way to interactively preview large, complex volume and surface data sets in a concise, meaningful, and illustrative manner. Stippling is effective for many applications and provides a quick and efficient method to investigate both volume and surface models.  相似文献   

7.
In distance metric learning, recent work has shown that value difference metric (VDM) with a strong attribute independence assumption outperforms other existing distance metrics. However, an open question is whether VDM with a less restrictive assumption can perform even better. Many approaches have been proposed to improve VDM by weakening the assumption. In this paper, we make a comprehensive survey on the existing improved approaches and then propose a new approach to improve VDM by attribute weighting. We name the proposed new distance function as attribute-weighted value difference metric (AWVDM). Moreover, we propose a modified attribute-weighted value difference metric (MAWVDM) by incorporating the learned attribute weights into the conditional probability estimates of AWVDM. AWVDM and MAWVDM significantly outperform VDM and inherit the computational simplicity of VDM simultaneously. Experimental results on a large number of UCI data sets validate the performance of AWVDM and MAWVDM.  相似文献   

8.
Most existing representative works in semi-supervised clustering do not sufficiently solve the violation problem of pairwise constraints. On the other hand, traditional kernel methods for semi-supervised clustering not only face the problem of manually tuning the kernel parameters due to the fact that no sufficient supervision is provided, but also lack a measure that achieves better effectiveness of clustering. In this paper, we propose an adaptive Semi-supervised Clustering Kernel Method based on Metric learning (SCKMM) to mitigate the above problems. Specifically, we first construct an objective function from pairwise constraints to automatically estimate the parameter of the Gaussian kernel. Then, we use pairwise constraint-based K-means approach to solve the violation issue of constraints and to cluster the data. Furthermore, we introduce metric learning into nonlinear semi-supervised clustering to improve separability of the data for clustering. Finally, we perform clustering and metric learning simultaneously. Experimental results on a number of real-world data sets validate the effectiveness of the proposed method.  相似文献   

9.
软件缺陷预测能够提高软件开发和测试的效率,保障软件质量。无监督缺陷预测方法具有不需要标签数据的特点,从而能够快速应用于工程实践中。提出了基于概率的无监督缺陷预测方法—PCLA,将度量元值与阈值的差值映射为概率,使用概率评估类存在缺陷的可能性,然后再通过聚类和标记来完成缺陷预测,以解决现有无监督方法直接根据阈值判断时对阈值比较敏感而引起的信息丢失问题。将PCLA方法应用在NetGen和Relink两组数据集,共7个软件项目上,实验结果表明PCLA方法在查全率、查准率、F-measure上相对现有无监督方法分别平均提升4.1%、2.52%、3.14%。  相似文献   

10.
The performance of many supervised and unsupervised learning algorithms is very sensitive to the choice of an appropriate distance metric. Previous work in metric learning and adaptation has mostly been focused on classification tasks by making use of class label information. In standard clustering tasks, however, class label information is not available. In order to adapt the metric to improve the clustering results, some background knowledge or side information is needed. One useful type of side information is in the form of pairwise similarity or dissimilarity information. Recently, some novel methods (e.g., the parametric method proposed by Xing et al.) for learning global metrics based on pairwise side information have been shown to demonstrate promising results. In this paper, we propose a nonparametric method, called relaxational metric adaptation (RMA), for the same metric adaptation problem. While RMA is local in the sense that it allows locally adaptive metrics, it is also global because even patterns not in the vicinity can have long-range effects on the metric adaptation process. Experimental results for semi-supervised clustering based on both simulated and real-world data sets show that RMA outperforms Xing et al.'s method under most situations. Besides applying RMA to semi-supervised learning, we have also used it to improve the performance of content-based image retrieval systems through metric adaptation. Experimental results based on two real-world image databases show that RMA significantly outperforms other methods in improving the image retrieval performance.  相似文献   

11.
Constrained clustering methods (that usually use must-link and/or cannot-link constraints) have been received much attention in the last decade. Recently, kernel adaptation or kernel learning has been considered as a powerful approach for constrained clustering. However, these methods usually either allow only special forms of kernels or learn non-parametric kernel matrices and scale very poorly. Therefore, they either learn a metric that has low flexibility or are applicable only on small data sets due to their high computational complexity. In this paper, we propose a more efficient non-linear metric learning method that learns a low-rank kernel matrix from must-link and cannot-link constraints and the topological structure of data. We formulate the proposed method as a trace ratio optimization problem and learn appropriate distance metrics through finding optimal low-rank kernel matrices. We solve the proposed optimization problem much more efficiently than SDP solvers. Additionally, we show that the spectral clustering methods can be considered as a special form of low-rank kernel learning methods. Extensive experiments have demonstrated the superiority of the proposed method compared to recently introduced kernel learning methods.  相似文献   

12.
The problem of clustering with side information has received much recent attention and metric learning has been considered as a powerful approach to this problem. Until now, various metric learning methods have been proposed for semi-supervised clustering. Although some of the existing methods can use both positive (must-link) and negative (cannot-link) constraints, they are usually limited to learning a linear transformation (i.e., finding a global Mahalanobis metric). In this paper, we propose a framework for learning linear and non-linear transformations efficiently. We use both positive and negative constraints and also the intrinsic topological structure of data. We formulate our metric learning method as an appropriate optimization problem and find the global optimum of this problem. The proposed non-linear method can be considered as an efficient kernel learning method that yields an explicit non-linear transformation and thus shows out-of-sample generalization ability. Experimental results on synthetic and real-world data sets show the effectiveness of our metric learning method for semi-supervised clustering tasks.  相似文献   

13.
在7个数据集上对3种不同聚类算法与3种不同相似性度最标准的多种组合进行实验,以评估这些因素对聚类性能的影响.为便于确定聚类参数,提出一种针对蛋白质结构预测的聚类中心选择算法.实验结果表明,在3种相似性度量标准中,RMSD对于聚类的效果最好,而在3种聚类算法中,SPICKER性能最优,其次是AP聚类算法.  相似文献   

14.
15.
Diffusion Tensor Imaging (DTI) has made feasible the visualization of the fibrous structure of the brain white matter. In the last decades, several fiber‐tracking methods have been developed to reconstruct the fiber tracts from DTI data. Usually these fiber tracts are shown individually based on some selection criteria like region of interest. However, if the white matter as a whole is being visualized clutter is generated by directly rendering the individual fiber tracts. Often users are actually interested in fiber bundles, anatomically meaningful entities that abstract from the fibers they contain. Several clustering techniques have been developed that try to group the fiber tracts in fiber bundles. However, even if clustering succeeds, the complex nature of white matter still makes it difficult to investigate. In this paper, we propose the use of illustration techniques to ease the exploration of white matter clusters. We create a technique to visualize an individual cluster as a whole. The amount of fibers visualized for the cluster is reduced to just a few hint lines, and silhouette and contours are used to improve the definition of the cluster borders. Multiple clusters can be easily visualized by a combination of the single cluster visualizations. Focus+context concepts are used to extend the multiple‐cluster renderings. Exploded views ease the exploration of the focus cluster while keeping the context clusters in an abstract form. Real‐time results are achieved by the GPU implementation of the presented techniques.  相似文献   

16.
We address the problem of metric learning for multi-view data. Many metric learning algorithms have been proposed, most of them focus just on single view circumstances, and only a few deal with multi-view data. In this paper, motivated by the co-training framework, we propose an algorithm-independent framework, named co-metric, to learn Mahalanobis metrics in multi-view settings. In its implementation, an off-the-shelf single-view metric learning algorithm is used to learn metrics in individual views of a few labeled examples. Then the most confidently-labeled examples chosen from the unlabeled set are used to guide the metric learning in the next loop. This procedure is repeated until some stop criteria are met. The framework can accommodate most existing metric learning algorithms whether types-of-side-information or example-labels are used. In addition it can naturally deal with semi-supervised circumstances under more than two views. Our comparative experiments demonstrate its competiveness and effectiveness.  相似文献   

17.
Most existing semi-supervised clustering algorithms are not designed for handling high-dimensional data. On the other hand, semi-supervised dimensionality reduction methods may not necessarily improve the clustering performance, due to the fact that the inherent relationship between subspace selection and clustering is ignored. In order to mitigate the above problems, we present a semi-supervised clustering algorithm using adaptive distance metric learning (SCADM) which performs semi-supervised clustering and distance metric learning simultaneously. SCADM applies the clustering results to learn a distance metric and then projects the data onto a low-dimensional space where the separability of the data is maximized. Experimental results on real-world data sets show that the proposed method can effectively deal with high-dimensional data and provides an appealing clustering performance.  相似文献   

18.
Hypergraph Models and Algorithms for Data-Pattern-Based Clustering   总被引:2,自引:0,他引:2  
In traditional approaches for clustering market basket type data, relations among transactions are modeled according to the items occurring in these transactions. However, an individual item might induce different relations in different contexts. Since such contexts might be captured by interesting patterns in the overall data, we represent each transaction as a set of patterns through modifying the conventional pattern semantics. By clustering the patterns in the dataset, we infer a clustering of the transactions represented this way. For this, we propose a novel hypergraph model to represent the relations among the patterns. Instead of a local measure that depends only on common items among patterns, we propose a global measure that is based on the cooccurences of these patterns in the overall data. The success of existing hypergraph partitioning based algorithms in other domains depends on sparsity of the hypergraph and explicit objective metrics. For this, we propose a two-phase clustering approach for the above hypergraph, which is expected to be dense. In the first phase, the vertices of the hypergraph are merged in a multilevel algorithm to obtain large number of high quality clusters. Here, we propose new quality metrics for merging decisions in hypergraph clustering specifically for this domain. In order to enable the use of existing metrics in the second phase, we introduce a vertex-to-cluster affinity concept to devise a method for constructing a sparse hypergraph based on the obtained clustering. The experiments we have performed show the effectiveness of the proposed framework.  相似文献   

19.
Development of geometry data compression techniques in the past years has been limited by the lack of a metric with proven correlation with human perception of mesh distortion. Many algorithms have been proposed, but usually the aim has been to minimise mean squared error, or some of its derivatives. In the field of dynamic mesh compression, the situation has changed with the recent proposal of the STED metric, which has been shown to capture the human perception of mesh distortion much better than previous metrics. In this paper we show how existing algorithms can be steered to provide optimal results with respect to this metric, and we propose a novel dynamic mesh compression algorithm, based on trajectory space PCA and Laplacian coordinates, specifically designed to minimise the newly proposed STED error. Our experiments show that using the proposed algorithm, we were able to reduce the required data rate by up to 50% while preserving the introduced STED error.  相似文献   

20.
大多数现存的谱聚类方法均使用传统距离度量计算样本之间的相似性, 这样仅仅考虑了两两样本之间的相似性而忽略了周围的近邻信息, 更没有顾及数据的全局性分布结构. 因此, 本文提出一种新的融合欧氏距离和 Kendall Tau距离的谱聚类方法. 该方法通过融合两两样本之间的直接距离以及其周围的近邻信息, 充分利用了不同的相似性度量可以从不同角度抓取数据之间结构信息的优势, 更加全面地反映数据的底层结构信息. 通过与传统聚类算法在UCI标准数据集上的实验结果作比较, 验证了本文的方法可以显著提高聚类效果.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号