Similar documents
20 similar documents found (search time: 15 ms)
1.
Unsupervised feature selection is an important problem, especially for high-dimensional data. However, it has been scarcely studied, and existing algorithms do not provide satisfactory performance. In this paper, we propose a new unsupervised feature selection algorithm using similarity-based feature clustering, Feature Selection-based Feature Clustering (FSFC). FSFC removes redundant features according to the results of feature clustering based on feature similarity. First, it clusters the features according to their similarity, using a new feature clustering algorithm that overcomes the shortcomings of K-means. Second, it selects from each cluster a representative feature that carries most of the information of the features in that cluster. The efficiency and effectiveness of FSFC are tested on real-world data sets and compared with two representative unsupervised feature selection algorithms, Feature Selection Using Similarity (FSUS) and Multi-Cluster-based Feature Selection (MCFS), in terms of runtime, feature compression ratio, and the clustering results of K-means. The results show that FSFC not only reduces the feature space in less time, but also significantly improves the clustering performance of K-means.
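Read as a recipe, the abstract's two steps (cluster features by similarity, then keep one representative per cluster) can be sketched as follows. This is a minimal illustration, not the paper's FSFC algorithm: the similarity measure (absolute Pearson correlation), the greedy one-pass clustering, the 0.8 threshold, and both function names are assumptions of this sketch.

```python
import numpy as np

def cluster_features_by_similarity(X, threshold=0.8):
    """Greedily group features whose absolute Pearson correlation exceeds
    `threshold`. One pass, no preset number of clusters (unlike K-means).
    Returns a list of clusters, each a list of feature indices."""
    n_features = X.shape[1]
    sim = np.abs(np.corrcoef(X, rowvar=False))  # feature-feature similarity
    clusters, assigned = [], set()
    for f in range(n_features):
        if f in assigned:
            continue
        cluster = [f] + [g for g in range(f + 1, n_features)
                         if g not in assigned and sim[f, g] >= threshold]
        assigned.update(cluster)
        clusters.append(cluster)
    return clusters

def select_representatives(X, clusters):
    """Pick, per cluster, the feature with the highest mean similarity to
    its cluster mates (a stand-in for 'most representative')."""
    sim = np.abs(np.corrcoef(X, rowvar=False))
    reps = []
    for cluster in clusters:
        scores = [sim[f, cluster].mean() for f in cluster]
        reps.append(cluster[int(np.argmax(scores))])
    return reps
```

Because the greedy pass derives the number of clusters from the similarity threshold, it sidesteps the need to fix k in advance, which is the K-means shortcoming the abstract alludes to.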

2.
Clustering is a popular technique for analyzing microarray data sets with n genes and m experimental conditions. As explored by biologists, there is a real need to identify coregulated gene clusters, which include both positively and negatively regulated gene clusters. Existing pattern-based and tendency-based clustering approaches cannot be directly applied to find such coregulated gene clusters, because they are designed to find positively regulated gene clusters only. In this paper, in order to cluster coregulated genes, we propose a coding scheme that allows two genes to be placed in the same cluster if they have the same code, where two genes with the same code can be either positively or negatively coregulated. Based on this coding scheme, we propose a new algorithm, with new pruning techniques, for finding maximal subspace coregulated gene clusters. A maximal subspace coregulated gene cluster groups a set of genes over a condition sequence such that the cluster is not contained in any other subspace coregulated gene cluster. We conduct extensive experimental studies showing that our approach finds maximal subspace coregulated gene clusters effectively and efficiently, and that it outperforms existing approaches for finding positively regulated gene clusters.
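A toy version of such a coding scheme: code each gene by the up/down pattern of its expression profile and canonicalize under sign flips, so a gene and its negatively regulated mirror receive the same code. The paper's actual scheme and its subspace/pruning machinery are more involved; the function names and the '+/-/0' encoding here are illustrative assumptions.

```python
def sign_code(profile):
    """Up/down code of an expression profile: '+' rise, '-' fall, '0' flat."""
    return ['+' if b > a else '-' if b < a else '0'
            for a, b in zip(profile, profile[1:])]

def canonical_code(profile):
    """Canonical code under sign flip: flip the whole code when its first
    nonzero move is a fall, so positively and negatively coregulated
    profiles map to the same string."""
    code = sign_code(profile)
    first = next((s for s in code if s != '0'), '+')
    if first == '-':
        flip = {'+': '-', '-': '+', '0': '0'}
        code = [flip[s] for s in code]
    return ''.join(code)
```

Two genes then land in the same cluster exactly when their canonical codes match, regardless of regulation direction.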

3.
Several attempts have been made to grasp the three-dimensional (3D) shape of the ground from a 3D point cloud generated by aerial vehicles, which helps fast situation recognition. However, identifying objects on the ground from a 3D point cloud, which consists of 3D coordinates and color information, is not straightforward because of the gap between the low-level point information (coordinates and colors) and the high-level context information (objects). In this paper, we propose a ground object recognition and segmentation method for geo-referenced point clouds. We rely on existing tools to generate such a point cloud from aerial images, and our method tries to give semantics to each set of clustered points. First, points that correspond to the ground surface are removed using elevation data from the Geographical Survey Institute. Next, we apply interpoint-distance-based clustering and color-based clustering. Then, clusters that share some regions are merged, to correctly identify a cluster that corresponds to a single object. We have evaluated our method in several experiments in real fields and confirmed that it can remove the ground surface within 20 cm error and can recognize most of the objects.
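The interpoint-distance clustering step can be sketched with a union-find pass that joins any two points closer than a radius eps. This is a simplified stand-in for the pipeline the abstract describes (no ground removal, no color clustering, no cluster merging); the function name and the eps parameter are assumptions.

```python
from itertools import combinations

def distance_clusters(points, eps):
    """Union-find clustering: points within distance `eps` of each other
    end up in one cluster. Quadratic in the number of points; real point
    clouds would use a spatial index instead."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if sum((a - b) ** 2 for a, b in zip(p, q)) <= eps * eps:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```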

4.
Advances in technology coupled with the availability of low-cost sensors have resulted in the continuous generation of large time series from several sources. In order to visually explore and compare these time series at different scales, analysts need to execute online analytical processing (OLAP) queries that include constraints and group-by's at multiple temporal hierarchies. Effective visual analysis requires these queries to be interactive. However, while existing OLAP cube-based structures can support interactive query rates, the exponential memory requirement to materialize the data cube is often unsuitable for large data sets. Moreover, none of the recent space-efficient cube data structures allow for updates. Thus, the cube must be re-computed whenever there is new data, making them impractical in a streaming scenario. We propose Time Lattice, a memory-efficient data structure that makes use of the implicit temporal hierarchy to enable interactive OLAP queries over large time series. Time Lattice is a subset of a fully materialized cube and is designed to handle fast updates and streaming data. We perform an experimental evaluation which shows that the space efficiency of the data structure does not hamper its performance when compared to the state of the art. In collaboration with signal processing and acoustics research scientists, we use the Time Lattice data structure to design the Noise Profiler, a web-based visualization framework that supports the analysis of noise from cities. We demonstrate the utility of Noise Profiler through a set of case studies.
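The core idea, materializing only the aggregates along the implicit temporal hierarchy so that each arriving sample updates one bucket per level rather than re-computing a cube, can be sketched as below. This is a toy reading of the design, not the actual Time Lattice structure; the three fixed levels and the class name are assumptions.

```python
from collections import defaultdict

class TemporalAggregates:
    """Keep sum/count per bucket at each level of a fixed temporal
    hierarchy (second -> minute -> hour). An insert touches O(depth)
    buckets, so the structure handles streaming updates; queries at any
    level are single dictionary lookups."""
    LEVELS = {'second': 1, 'minute': 60, 'hour': 3600}

    def __init__(self):
        self.sums = {lv: defaultdict(float) for lv in self.LEVELS}
        self.counts = {lv: defaultdict(int) for lv in self.LEVELS}

    def insert(self, timestamp, value):
        for lv, width in self.LEVELS.items():
            bucket = timestamp // width
            self.sums[lv][bucket] += value
            self.counts[lv][bucket] += 1

    def avg(self, level, timestamp):
        bucket = timestamp // self.LEVELS[level]
        return self.sums[level][bucket] / self.counts[level][bucket]
```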

5.
The hierarchical edge bundle (HEB) method generates useful visualizations of dense graphs, such as social networks, but requires a predefined clustering hierarchy, and does not easily benefit from existing straight-line visualization improvements. This paper proposes a new clustering approach that extracts the community structure of a network and organizes it into a hierarchy that is flatter than existing community-based clustering approaches and maps better to HEB visualization. Our method not only discovers communities and generates clusters with better modularization qualities, but also creates a balanced hierarchy that allows HEB visualization of unstructured social networks without predefined hierarchies. Results on several data sets demonstrate that this approach clarifies real-world communication, collaboration and competition network structure and reveals information missed in previous visualizations. We further implemented our techniques into a social network visualization application on facebook.com and let users explore the visualization and community clustering of their own social networks.

6.
On the basis of cluster size and cluster cohesion, we propose a generalized cluster-reliability (CR) measure, which indicates the overall reliability of the arguments in a cluster. Taking the reliability of clusters as order-inducing variables, we introduce a generalized cluster-reliability-induced ordered weighted averaging (CRI-OWA) operator from the viewpoint of combining representative arguments of clusters. Furthermore, we propose a grid-based cohesion measure for grid-based clusters; on the basis of this cohesion measure, we obtain a special CR measure and CRI-OWA operator for grid-based clusters. We then introduce two other special CR measures, for graph-based and prototype-based clusters, respectively. Taking the CR computed by these two measures as order-inducing variables, we obtain two further kinds of CRI-OWA operators, for graph-based and prototype-based clusters, respectively. © 2012 Wiley Periodicals, Inc.
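An induced OWA of this kind is straightforward to state in code: reliabilities act only as order-inducing variables, and the OWA weights are applied positionally to the re-ordered arguments. The CR formula shown (relative cluster size times cohesion) is an illustrative stand-in for the paper's generalized measure, and the function names are assumptions.

```python
def cri_owa(pairs, weights):
    """CRI-OWA: sort representative arguments by decreasing cluster
    reliability, then apply the OWA weights by position.
    `pairs` is a list of (reliability, argument)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "OWA weights must sum to 1"
    ordered = sorted(pairs, key=lambda p: p[0], reverse=True)
    return sum(w * arg for w, (_, arg) in zip(weights, ordered))

def cluster_reliability(size, cohesion, total):
    """One simple CR measure in the spirit of the abstract: reliability
    grows with both relative cluster size and cohesion (illustrative
    form, not the paper's exact formula)."""
    return (size / total) * cohesion
```

Note that, unlike a plain OWA, swapping two reliabilities changes which argument each weight hits, even though the multiset of arguments is unchanged.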

7.
Multidimensional projection-based visualization methods typically rely on clustering and attribute selection mechanisms to enable visual analysis of multidimensional data. Clustering is often employed to group similar instances according to their distance in the visual space. However, considering only distances in the visual space may be misleading, both because of projection errors and because there is no guarantee that distinct clusters contain instances with different content. Identifying clusters made up of a few elements is also an issue for most clustering methods. In this work we propose a novel multidimensional projection-based visualization technique that relies on representative instances to define clusters in the visual space. Representative instances are selected by a deterministic sampling scheme derived from matrix decomposition, which is sensitive to the variability of the data while still being able to handle classes with a small number of instances. Moreover, the sampling mechanism can easily be adapted to select relevant attributes from each cluster. Our methodology therefore unifies sampling, clustering, and feature selection in a simple framework. A comprehensive set of experiments validates our methodology, showing that it outperforms most existing sampling and feature selection techniques. A case study shows the effectiveness of the proposed methodology as a visual data analysis tool.
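A deterministic, variability-sensitive sampler in this spirit can be built from a column-pivoted, Gram-Schmidt-style decomposition: repeatedly pick the instance with the largest residual norm and project its direction out of the data. This is one plausible instantiation for illustration, not necessarily the matrix decomposition the paper uses; the function name is an assumption.

```python
import numpy as np

def representative_rows(X, k):
    """Greedy pivoted selection of k representative rows: at each step
    take the row with the largest residual norm after removing the
    directions of the rows already chosen (deterministic by construction)."""
    R = X.astype(float).copy()
    chosen = []
    for _ in range(k):
        norms = (R ** 2).sum(axis=1)
        i = int(np.argmax(norms))
        chosen.append(i)
        v = R[i] / np.sqrt(norms[i])   # unit vector of the chosen row
        R = R - np.outer(R @ v, v)     # project that direction out
    return chosen
```

Because each pick maximizes the residual, the sampler favors directions of high variability, yet a small class that points in its own direction still gets selected once the dominant directions are exhausted.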

8.
A mandatory component for many point set algorithms is the availability of consistently oriented vertex normals (e.g. for surface reconstruction, feature detection, visualization). Previous orientation methods on meshes or raw point clouds do not consider a global context, are often based on unrealistic assumptions, or have extremely long computation times, making them unusable on real-world data. We present a novel massively parallelized method to compute globally consistent oriented point normals for raw and unsorted point clouds. Built on the idea of graph-based energy optimization, we create a complete kNN graph over the entire point cloud. A new weighted similarity criterion encodes the graph energy. To orient normals in a globally consistent way we perform a highly parallel greedy edge collapse, which merges similar parts of the graph and orients them consistently. We compare our method to current state-of-the-art approaches and achieve speedups of up to two orders of magnitude. The achieved quality of normal orientation is on par with or better than existing solutions, especially for real-world noisy 3D scanned data.
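For contrast with the parallel edge-collapse approach, here is the classic sequential baseline for the same problem: breadth-first propagation over the neighbor graph, flipping a normal whenever it disagrees (negative dot product) with the neighbor it was reached from. The function name and the adjacency-dict representation are assumptions of this sketch.

```python
from collections import deque

def orient_normals(neighbors, normals):
    """BFS orientation propagation from point 0 over a connected
    neighbor graph. `neighbors` maps point index -> adjacent indices;
    `normals` is a list of 3D vectors. Returns flipped copies so that
    adjacent normals agree in sign."""
    normals = [list(n) for n in normals]
    seen, queue = {0}, deque([0])
    while queue:
        i = queue.popleft()
        for j in neighbors[i]:
            if j in seen:
                continue
            if sum(a * b for a, b in zip(normals[i], normals[j])) < 0:
                normals[j] = [-c for c in normals[j]]  # flip to agree
            seen.add(j)
            queue.append(j)
    return normals
```

The sequential dependency of this propagation (each flip depends on the previous one) is precisely what makes the problem hard to parallelize and motivates the greedy edge-collapse formulation.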

9.
10.
Real-world datasets often contain large numbers of unlabeled data points, because obtaining labels carries additional cost. Semi-supervised learning (SSL) algorithms use both labeled and unlabeled data points for training, which can result in higher classification accuracy on these datasets. Traditional SSLs tentatively label the unlabeled data points on the basis of the smoothness assumption that neighboring points should have the same label. When this assumption is violated, unlabeled points are mislabeled, injecting noise into the final classifier. An alternative SSL approach is cluster-then-label (CTL), which partitions all the data points (labeled and unlabeled) into clusters and creates a classifier from those clusters. CTL is based on the less restrictive cluster assumption that data points in the same cluster should have the same label. This allows CTLs to achieve higher classification accuracy on many datasets where the cluster assumption holds but smoothness does not. However, cluster configuration problems (e.g., irrelevant features, insufficient clusters, and incorrectly shaped clusters) can violate the cluster assumption. We propose a new framework for CTLs that uses a genetic algorithm (GA) to evolve classifiers free of these cluster configuration problems (the GA removes irrelevant attributes, updates the number of clusters, and changes the shape of the clusters). We demonstrate that a CTL based on this framework achieves accuracy comparable to or higher than both traditional SSLs and CTLs on 12 University of California, Irvine machine learning datasets.
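The basic CTL step, independent of the GA machinery, is easy to sketch: give every cluster the majority label of its labeled members, and let all points in the cluster inherit it. The dictionary-based interface below is an assumption of this sketch.

```python
def cluster_then_label(clusters, labels):
    """Cluster-then-label baseline. `clusters` maps cluster id -> list of
    point ids (labeled and unlabeled mixed); `labels` maps point id ->
    label for the labeled subset only. Returns a label for every point."""
    predicted = {}
    for members in clusters.values():
        seen = [labels[p] for p in members if p in labels]
        majority = max(set(seen), key=seen.count) if seen else None
        for p in members:
            # labeled points keep their own label; others inherit majority
            predicted[p] = labels.get(p, majority)
    return predicted
```

The GA framework in the abstract would then search over the clusterings themselves (feature subsets, number and shape of clusters) to maximize the accuracy of this labeling.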

11.
The distribution of visual attention can be evaluated using eye tracking, providing valuable insights into usability issues and interaction patterns. However, when used in real, augmented, and collaborative environments, new challenges arise that go beyond desktop scenarios and purely virtual environments. Toward addressing these challenges, we present a visualization technique that provides complementary views on the movement and eye tracking data recorded from multiple people in real-world environments. Our method is based on a space-time cube visualization and a linked 3D replay of recorded data. We showcase our approach with an experiment that examines how people investigate an artwork collection. The visualization provides insights into how people moved and inspected individual pictures in their spatial context over time. In contrast to existing methods, this analysis is possible for multiple participants without extensive annotation of areas of interest. Our technique was evaluated with a think-aloud experiment to investigate analysis strategies and an interview with domain experts to examine the applicability in other research fields.

12.
Visual analytics of multidimensional multivariate data is a challenging task because of the difficulty in understanding metrics in attribute spaces with more than three dimensions. Frequently, the analysis goal is not to look into individual records but to understand the distribution of the records at large and to find clusters of records with similar attribute values. A large number of (typically hierarchical) clustering algorithms have been developed to group individual records to clusters of statistical significance. However, only few visualization techniques exist for further exploring and understanding the clustering results. We propose visualization and interaction methods for analyzing individual clusters as well as cluster distribution within and across levels in the cluster hierarchy. We also provide a clustering method that operates on density rather than individual records. To not restrict our search for clusters, we compute density in the given multidimensional multivariate space. Clusters are formed by areas of high density. We present an approach that automatically computes a hierarchical tree of high density clusters. To visually represent the cluster hierarchy, we present a 2D radial layout that supports an intuitive understanding of the distribution structure of the multidimensional multivariate data set. Individual clusters can be explored interactively using parallel coordinates when being selected in the cluster tree. Furthermore, we integrate circular parallel coordinates into the radial hierarchical cluster tree layout, which allows for the analysis of the overall cluster distribution. This visual representation supports the comprehension of the relations between clusters and the original attributes. The combination of the 2D radial layout and the circular parallel coordinates is used to overcome the overplotting problem of parallel coordinates when looking into data sets with many records. 
We apply an automatic coloring scheme based on the 2D radial layout of the hierarchical cluster tree, encoding hue, saturation, and value of the HSV color space. The colors support linking the 2D radial layout to other views such as the standard parallel coordinates or, in case the data is obtained from multidimensional spatial data, the distribution in object space.

13.
Molecular visualization is often challenged by the need to render large molecular structures in real time. We introduce a novel approach that enables us to show even large protein complexes. Our method is based on the level-of-detail concept, exploiting three different abstractions combined in one visualization. First, the molecular surface abstraction blends three surfaces, the solvent-excluded surface (SES), Gaussian kernels, and van der Waals spheres, into one surface by linear interpolation. Second, we introduce three shading abstraction levels and a method for creating seamless transitions between these representations: the SES representation with full shading and added contours stands in focus, while a sphere representation of a cluster of atoms with constant shading and without contours provides the context. Third, we propose a hierarchical abstraction based on a set of clusters formed over the molecular atoms. All three abstraction models are driven by one importance function that classifies the scene into near-, mid-, and far-field. Moreover, we introduce a methodology to render the entire molecule directly using the A-buffer technique, which further improves the performance. The rendering performance is evaluated on a series of molecules of varying atom counts.

14.
Most existing traffic simulation efforts focus on urban regions with a coarse two-dimensional representation; relatively few studies simulate realistic three-dimensional traffic flows on a large, complex road network in rural scenes. In this paper, we present a novel agent-based approach called the accident-avoidance full velocity difference model (abbreviated AA-FVDM) to simulate realistic street-level rural traffic, built on top of the existing FVDM. The main distinction between the two is that FVDM cannot handle a critical real-world traffic problem, whereas AA-FVDM resolves this problem while retaining the essence of FVDM. We also design a novel scheme to animate the lane-changing maneuvering process (in particular, its execution course). Through numerous simulations, we demonstrate that, besides addressing a previously unaddressed real-world traffic problem, our AA-FVDM method efficiently (in real time) simulates large-scale traffic flows (tens of thousands of vehicles) with realistic, smooth effects. Furthermore, we validate our method using real-world traffic data; the validation results show that it measurably outperforms state-of-the-art traffic simulation methods. Copyright © 2013 John Wiley & Sons, Ltd.
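The underlying FVDM is a standard car-following model: acceleration a = kappa * (V(gap) - v) + lambda * dv, where V is a tanh-shaped optimal velocity function, v the follower's speed, and dv the speed difference to the leader. The sketch below shows plain FVDM with illustrative parameter values; the paper's AA-FVDM adds an accident-avoidance mechanism on top, which is not reproduced here.

```python
import math

def fvdm_accel(gap, v, dv, kappa=0.41, lam=0.5, vmax=33.0, hc=25.0):
    """FVDM acceleration. `gap` is the headway to the leader (m), `v` the
    follower's speed (m/s), `dv` the leader-minus-follower speed
    difference. Parameter values are illustrative, not from the paper."""
    v_opt = 0.5 * vmax * (math.tanh(gap - hc) + math.tanh(hc))
    return kappa * (v_opt - v) + lam * dv

def step(gap, v, dv, dt=0.1):
    """One explicit Euler step of the follower's speed."""
    return v + fvdm_accel(gap, v, dv) * dt
```

With a large gap a stationary car accelerates toward the optimal velocity, while a closing speed difference (dv < 0) produces braking even at the equilibrium speed, which is the "full velocity difference" term at work.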

15.
Quotient Cube and QC-tree attempt to condense the size of a data cube while preserving the semantics the cube implies. However, the former stores no semantic relations, and the semantic relations stored by the latter are obscure. We therefore propose the drill-down cube, a structure that for the first time approaches data cube storage from a semantic perspective: it stores not the contents of classes but the direct drill-down relations between classes. The drill-down cube not only greatly reduces the storage size of a data cube, but also clearly expresses the drill-down semantics implied by the original cube. In addition, it offers high query response performance, which is especially pronounced for range queries. Experiments and analysis show that the drill-down cube clearly outperforms QC-tree in both storage size and query response, and is well suited for organizing and storing data cubes.
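Storing direct drill-down relations instead of class contents can be illustrated on the lattice of group-by levels: each level points only to the levels reachable by drilling down one more dimension. This is a schematic sketch of the idea, not the paper's structure; the function name is an assumption.

```python
from itertools import combinations

def drill_down_edges(dims):
    """Build the direct drill-down relation over all group-by levels of a
    cube on `dims`: level (A,) drills down to (A, B), etc. Only these
    one-step edges are stored; any coarser-to-finer path follows by
    chaining them."""
    levels = []
    for r in range(len(dims) + 1):
        levels.extend(combinations(dims, r))
    edges = {}
    for parent in levels:
        edges[parent] = [child for child in levels
                         if len(child) == len(parent) + 1
                         and set(parent) <= set(child)]
    return edges
```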

16.
Rajkumar Buyya, Software, 2000, 30(7): 723-739
Workstation/PC clusters have become a cost-effective solution for high performance computing. C-DAC's PARAM 10000 (internal code name OpenFrame) is a large cluster of high-performance workstations interconnected through low-latency, high-bandwidth networks. The management and control of such a huge system is a tedious and challenging task, since workstations/PCs are typically designed to work as standalone systems rather than as part of a cluster. We have designed and developed a tool called PARMON that allows effective monitoring and control of large clusters. It supports the monitoring of critical system resource activities and their utilization at three levels: entire system, node, and component. It also allows the monitoring of multiple instances of the same component, for instance, multiple processors in SMP-type cluster nodes. PARMON is a portable, flexible, interactive, scalable, location-transparent, and comprehensive environment based on client-server technology. Its major components are parmon-server, which provides information on system resource activities and utilization, and parmon-client, a GUI-based client responsible for interacting with parmon-server and with users, gathering data in real time and presenting it graphically for visualization. The client is developed as a Java application; the server is developed as a multithreaded server using C and POSIX/Solaris threads, since Java does not support interfaces to access system internals. PARMON is regularly used to monitor the PARAM 10000 supercomputer, a cluster of 48+ Ultra-4 workstations powered by the Solaris operating system. The recent popularity of Beowulf-class clusters (dedicated Linux clusters) in terms of price-performance ratio motivated us to port PARMON to Linux (accomplished by porting the system-dependent portions of parmon-server). This enables the management/monitoring of both Solaris- and Linux-based clusters (federated clusters) through a single user interface.
Copyright © 2000 John Wiley & Sons, Ltd.

17.
18.
SSW: A Small-World-Based Overlay for Peer-to-Peer Search
Peer-to-peer (P2P) systems have become a popular platform for sharing and exchanging voluminous information among thousands or even millions of users. The massive amount of information shared in such systems mandates efficient semantic-based search instead of key-based search; the majority of existing proposals, however, can only support simple key-based search. This paper presents the design of an overlay network, semantic small world (SSW), that facilitates efficient semantic-based search in P2P systems. SSW achieves this efficiency through four ideas: 1) semantic clustering, where peers with similar semantics organize into peer clusters; 2) dimension reduction, where, to address the high maintenance overhead of capturing high-dimensional data semantics in the overlay, peer clusters are adaptively mapped to a one-dimensional naming space; 3) small world network, where peer clusters form a one-dimensional small world network, which is search efficient with low maintenance overhead; and 4) efficient search algorithms, where peers perform efficient semantic-based search, including approximate point query and range query, in the proposed overlay. Extensive experiments using both synthetic and real data demonstrate that SSW is superior to the state of the art on various aspects, including scalability, maintenance overhead, adaptivity to the distribution of data and locality of interest, resilience to peer failures, load balancing, and efficiency in supporting various types of queries on data objects with high dimensions.

19.
Indexing the Prefix Cube
The prefix cube is a recently proposed data cube structure. It exploits prefix sharing and base single tuples to effectively reduce the size of the data cube and, correspondingly, the time needed to compute it. To improve the query performance of the prefix cube, this paper proposes an indexing mechanism for it, the Prefix-CuboidTree. Extensive experiments on both real and synthetic data sets demonstrate the query performance of this index.
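Prefix sharing is naturally served by a trie: cube tuples that share a dimension-value prefix share a path. The sketch below is one plausible reading of a Prefix-CuboidTree-style index, not the paper's exact structure; the class layout, the '#' measure key, and the query interface are assumptions.

```python
class PrefixCuboidTree:
    """Trie over dimension-value prefixes. Each node is a dict mapping a
    dimension value to a child node; the reserved key '#' holds the
    aggregated measure stored at that node."""
    def __init__(self):
        self.root = {}

    def insert(self, tuple_values, measure):
        node = self.root
        for v in tuple_values:
            node = node.setdefault(v, {})
        node['#'] = node.get('#', 0) + measure

    def query(self, prefix):
        """Sum of all measures stored at or below a dimension-value
        prefix (a range-style lookup along the shared path)."""
        node = self.root
        for v in prefix:
            if v not in node:
                return 0
            node = node[v]
        total = node.get('#', 0)
        return total + sum(self.query(list(prefix) + [k])
                           for k in node if k != '#')
```

A prefix query walks one shared path and then aggregates its subtree, which is why prefix-organized storage pays off particularly for range queries, as the abstract notes.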

20.
In this report, we organize and reflect on recent advances and challenges in the field of sports data visualization. The exponentially-growing body of visualization research based on sports data is a prime indication of the importance and timeliness of this report. Sports data visualization research encompasses the breadth of visualization tasks and goals: exploring the design of new visualization techniques; adapting existing visualizations to a novel domain; and conducting design studies and evaluations in close collaboration with experts, including practitioners, enthusiasts, and journalists. Frequently this research has impact beyond sports in both academia and in industry because it is i) grounded in realistic, highly heterogeneous data, ii) applied to real-world problems, and iii) designed in close collaboration with domain experts. In this report, we analyze current research contributions through the lens of three categories of sports data: box score data (data containing statistical summaries of a sport event such as a game), tracking data (data about in-game actions and trajectories), and meta-data (data about the sport and its participants but not necessarily a given game). We conclude this report with a high-level discussion of sports visualization research informed by our analysis, identifying critical research gaps and valuable opportunities for the visualization community. More information is available at the STAR's website: https://sportsdataviz.github.io/.


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号