Similar Documents
20 similar documents retrieved (search time: 15 ms).
1.
Today, as data sets used in computations grow in size and complexity, the technologies developed over the years to deal with scientific data sets have become less efficient and effective. Many frequently used operations, such as eigenvector computation, can quickly exhaust a desktop workstation once the data size reaches certain limits. The high-dimensional data sets we collect every day only compound the problem: many conventional metric designs built on quantitative or categorical data cannot be applied directly to heterogeneous data sets with multiple data types. While building new machines with more resources might conquer the data-size problem, the complexity of today's computations requires a new breed of projection techniques to support analysis of the data and verification of the results. We introduce the concept of a data signature, which captures the essence of a scientific data set in a compact format, and use it to conduct analysis as if using the original. A time-dependent climate simulation data set demonstrates our approach and its results.

2.
High-dimensional data, such as documents, digital images, and audio clips, can be considered as spatial objects, which induce a metric space where the metric measures dissimilarities between objects. We investigate a method for retrieving objects within some distance from a given object by utilizing the spatial indexing/access method R-tree, which usually assumes a Euclidean metric. First, we prove that objects in a discrete L1 (Manhattan distance) metric space can be embedded into vertices of a unit hypercube in Euclidean space when the square root of the L1 distance is used as the distance. To take full advantage of R-tree spatial indexing, we have to project objects into a space of relatively low dimension; we adopt FastMap by Faloutsos and Lin to reduce the dimension of the object space. The range corresponding to a query (Q, h), which retrieves objects within distance h of an object Q, is naturally considered a hyper-sphere even after FastMap projection, which is an orthogonal projection in Euclidean space. However, it turns out that for objects embedded in the above-mentioned way, the query range contracts to a hyper-box smaller than the hyper-sphere. Finally, we give a brief summary of experiments applying our method to Japanese chess boards.

Takeshi Shinohara, Dr.Sci.: He is a Professor in the Department of Artificial Intelligence at Kyushu Institute of Technology. He obtained his bachelor's degree in Mathematics from Kyoto University in 1980, and his Dr.Sci. from Kyushu University in 1986. His research interests are in Computational/Algorithmic Learning Theory, Information Retrieval, and Approximate Retrieval of Multimedia Data.

Hiroki Ishizaka, Dr.Sci.: He is an Associate Professor in the Department of Artificial Intelligence at Kyushu Institute of Technology. He obtained his bachelor's degree in Mathematics from Kyushu University in 1984, and his Dr.Sci. from Kyushu University in 1993. His research interests are in Computational/Algorithmic Learning Theory.
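The hypercube embedding in the first result can be checked directly: for 0/1 coordinate vectors, |x − y| equals (x − y)², so the Euclidean distance between hypercube vertices is exactly the square root of their L1 (Manhattan) distance. A minimal numerical sketch, with binary vectors standing in for the embedded objects:

```python
import math

def l1(a, b):
    # Manhattan (L1) distance between two vectors
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two vertices of the unit hypercube in 5 dimensions
u = [1, 0, 1, 1, 0]
v = [0, 0, 1, 0, 1]

# For 0/1 coordinates, |x - y| == (x - y)**2, hence
# euclidean(u, v) == sqrt(l1(u, v))
assert math.isclose(euclidean(u, v), math.sqrt(l1(u, v)))
```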

3.
We have implemented the USGS National Climate Change Viewer (NCCV), which is an easy-to-use web application that displays future projections from global climate models over the United States at the state, county and watershed scales. We incorporate the NASA NEX-DCP30 statistically downscaled temperature and precipitation for 30 global climate models being used in the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC), and hydrologic variables we simulated using a simple water-balance model. Our application summarizes very large, complex data sets at scales relevant to resource managers and citizens and makes climate-change projection information accessible to users of varying skill levels. Tens of terabytes of high-resolution climate and water-balance data are distilled to compact binary format summary files that are used in the application. To alleviate slow response times under high loads, we developed a map caching technique that reduces the time it takes to generate maps by several orders of magnitude. The reduced access time scales to >500 concurrent users. We provide code examples that demonstrate key aspects of data processing, data exporting/importing and the caching technique used in the NCCV.
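The caching idea can be sketched with a simple memoization layer: the first request for a given map pays the full rendering cost, and repeats are served from memory. Function and parameter names below are illustrative assumptions, not the NCCV's actual API:

```python
import functools

@functools.lru_cache(maxsize=1024)
def render_map(variable, model, period):
    # Stand-in for the expensive step: reading the binary summary
    # files and rasterizing a map image for this combination.
    return f"map[{variable}/{model}/{period}]"

tile_a = render_map("tasmax", "CCSM4", "2070-2099")
tile_b = render_map("tasmax", "CCSM4", "2070-2099")  # served from cache
assert tile_a is tile_b  # second call returned the cached object
```

Keying the cache on the full request tuple is what lets response time stay flat as concurrent users repeat popular map requests.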

4.
Research issues in model-based visualization of complex data sets
At the most abstract level, data visualization maps discrete values computed over an n-dimensional domain onto pixel colors. It is largely a dimension-reducing process, justified by its leverage of human perceptual capacities for extracting information from visual stimuli. The difficulty is implementing a mapping that reveals the data characteristics relevant to the application at hand. Effective visualization solutions let the user control the process parameters interactively and enhance the automatically extracted features. We argue for an intelligent, model-based approach to visualization, which extracts the intrinsic data characteristics and constructs multiresolution graphics models suitable for interactive rendering on commercially available hardware adapters. The model-based approach has four parts, which we summarize.

5.
After two decades of research, the techniques for efficient similarity search in metric spaces have combined virtually all the available tricks, resulting in many structural index designs. As representative state-of-the-art metric access methods (also called metric indexes) that vary in their use of filtering rules and in structural design, we could mention the M-tree, the M-Index and the List of Clusters, to name a few. In this paper, we present the concept of cut-regions, which can substantially improve the performance of metric indexes originally designed to employ simple ball-regions. We show that the shape of cut-regions is far more compact than that of ball-regions while preserving a simple and concise representation. We present three re-designed metric indexes, originating from the above-mentioned ones but utilizing cut-regions instead of ball-regions. We show that cut-regions can be fully exploited in the index structure, positively affecting not only query processing but also index construction. In our experiments, the re-designed metric indexes significantly outperform their original versions.

6.
We describe an architecture for distributed collaborative visualization that integrates video conferencing, distributed data management and grid technologies as well as tangible interaction devices for visualization. High-speed, low-latency optical networks support high-quality collaborative interaction and remote visualization of large data.

7.
This paper presents a new approach to carry out erosion, dilation and connected component labeling. We use the extreme vertices model, an orthogonal polyhedra representation, to describe binary images and volume data sets in a very efficient way.

Our proposal does not use a voxel-based approach but deals with the inner sections of the object. It allows images and volumes to be treated uniformly, using the same algorithm and data structure with no memory overhead, and can be applied to manifold as well as non-manifold data. The connected component labeling algorithm also detects non-manifold zones and, through a user-specified parameter, permits objects to be split (or not) at these zones.
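For contrast with the voxel-free approach described above, the conventional per-pixel erosion it avoids can be sketched as follows. This is a naive 3×3 binary erosion on a 2-D image, shown purely as the illustrative baseline, not the paper's algorithm:

```python
def erode(img):
    # Naive pixel-based binary erosion with a 3x3 square structuring
    # element: a pixel survives only if it and all eight neighbours are
    # set (out-of-bounds neighbours count as unset).
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = int(all(
                0 <= i + di < h and 0 <= j + dj < w and img[i + di][j + dj]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)))
    return out

square = [[1] * 4 for _ in range(4)]
eroded = erode(square)
# Only the 2x2 interior of a 4x4 square survives erosion
assert sum(map(sum, eroded)) == 4
```

The per-pixel cost of this baseline is what a representation working on the object's inner sections sidesteps.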


8.
Searching a dataset for elements similar to a given query element is a core problem in applications that manage complex data, and it has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, while also providing faster responses to similarity queries. The increase in main memory capacity and its falling cost also motivate the use of memory-based MAMs. In this paper, we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests.

9.
Some approximate indexing schemes have recently been proposed for metric spaces which sort the objects in the database according to pseudo-scores. It is known that (1) some of them provide a very good trade-off between response time and accuracy, and (2) probability-based pseudo-scores can provide an optimal trade-off in range queries if the probabilities are correctly estimated. Based on these facts, we propose a probabilistic enhancement scheme which can be applied to any pseudo-score based scheme. Our scheme computes probability-based pseudo-scores using pseudo-scores obtained from a pseudo-score based scheme. In order to estimate the probability-based pseudo-scores, we use object-specific parameters in logistic regression and learn the parameters using MAP (Maximum a Posteriori) estimation and the empirical Bayes method. We also propose a technique which speeds up learning the parameters using pseudo-scores. We applied our scheme to two state-of-the-art schemes, the standard pivot-based scheme and the permutation-based scheme, and evaluated them using various kinds of datasets from the Metric Space Library. The results showed that our scheme outperformed the conventional schemes, with regard to both the number of distance computations and the CPU time, on all the datasets.

10.
Mean shift spectral clustering (MSSC) offers a relatively new approach to clustering tasks in pattern recognition. However, because the time complexity of its embedded mean-shift procedure is quadratic in the sample size, its practicality on large data sets is greatly diminished. By replacing the Parzen window (PW) density estimator with the fast reduced set density estimator (FRSDE) and incorporating graph-based relaxed clustering (GRC), we propose a fast mean shift spectral clustering (FMSSC) algorithm. Compared with the original MSSC, the overall asymptotic time complexity of the algorithm is linear in the sample size, and the method is adaptive and convenient to use.

11.
Dr. K. Dürre, Computing, 1976, 16(3):271-279
Given a non-branched tree (a path) with n vertices, an n-g-coloration is a partition of the set of vertices into no more than g classes such that adjacent vertices belong to different classes. Supposing the set \(\mathfrak{S}\) of all n-g-colorations (for given n and g) is lexicographically ordered, two algorithms are given: the first directly determines (without enumerating the set itself) the ordinal number of an arbitrary element of \(\mathfrak{S}\); the other directly generates an element of \(\mathfrak{S}\) from its given ordinal number.
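Specialized to the path case, the two algorithms (ranking and unranking in lexicographic order) can be sketched as follows, assuming colors are numbered 0..g-1; there are g·(g-1)^(n-1) proper colorings in total, since each vertex after the first may take any color other than its predecessor's:

```python
def rank(coloring, g):
    # Ordinal number (0-based) of a proper coloring of a path among all
    # proper colorings in lexicographic order, computed positionally.
    n = len(coloring)
    r = coloring[0] * (g - 1) ** (n - 1)
    for i in range(1, n):
        # colors smaller than coloring[i] that differ from the predecessor
        smaller = coloring[i] - (1 if coloring[i - 1] < coloring[i] else 0)
        r += smaller * (g - 1) ** (n - 1 - i)
    return r

def unrank(r, n, g):
    # Inverse: generate the coloring with ordinal number r directly.
    c0, r = divmod(r, (g - 1) ** (n - 1))
    coloring = [c0]
    for i in range(1, n):
        k, r = divmod(r, (g - 1) ** (n - 1 - i))
        # k-th color (in increasing order) distinct from the predecessor
        coloring.append(k if k < coloring[-1] else k + 1)
    return coloring

# Round trip over all 3 * 2**2 = 12 proper 3-colorings of a 3-vertex path
assert all(rank(unrank(r, 3, 3), 3) == r for r in range(12))
```

Neither direction enumerates the set \(\mathfrak{S}\): both run in time linear in n.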

12.
This work focuses on fast nearest neighbor (NN) search algorithms that work in any metric space (not just with the Euclidean distance) and where the distance computation is very time consuming. One of the best known methods in this field is the AESA algorithm, used as a baseline for performance measurement for over twenty years. AESA repeats two steps: first it searches for a promising NN candidate and computes its distance (approximation step); then it eliminates all unsuitable NN candidates in view of the new information acquired by that computation (elimination step).

This work introduces the PiAESA algorithm, which improves the performance of AESA by splitting the approximation criterion: in the first iterations, when there is not enough information to find good NN candidates, it uses a list of pivots (objects in the database) to obtain a cheap approximation of the distance function; once a good approximation is obtained, it switches to the usual AESA behavior. As the pivot list is built at preprocessing time, the running time of PiAESA is almost the same as that of AESA.

We report experiments comparing PiAESA with competing methods. Our empirical results show that this new approach obtains a significant reduction in distance computations with no execution-time penalty.
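The approximation/elimination loop described above can be sketched as follows. Euclidean points stand in for a general metric space; in the real AESA, the object-to-object distances used for the lower bounds are precomputed at index time, so only query-to-object distance computations are counted:

```python
import math

def aesa_style_nn(query, db, dist=math.dist):
    # Approximation/elimination loop in the AESA style: repeatedly pick
    # the candidate with the smallest lower bound, compute its true
    # distance, and discard every candidate whose triangle-inequality
    # lower bound exceeds the best distance found so far.
    lb = dict.fromkeys(range(len(db)), 0.0)  # lower bounds on d(query, x)
    best, best_d, computations = None, math.inf, 0
    while lb:
        i = min(lb, key=lb.get)              # approximation step
        del lb[i]
        d = dist(query, db[i])
        computations += 1
        if d < best_d:
            best, best_d = i, d
        survivors = {}                       # elimination step
        for j, b in lb.items():
            # triangle inequality: |d(q, p) - d(x, p)| <= d(q, x);
            # dist(db[i], db[j]) is precomputed in the real AESA
            new_b = max(b, abs(d - dist(db[i], db[j])))
            if new_b < best_d:
                survivors[j] = new_b
        lb = survivors
    return best, best_d, computations

db = [(0, 0), (5, 5), (9, 1), (4, 6)]
best, best_d, comps = aesa_style_nn((4.2, 5.8), db)
assert db[best] == (4, 6) and comps <= 3
```

On this toy database the loop finds the nearest neighbor after computing only a fraction of the query distances, which is the effect PiAESA amplifies by seeding the early iterations with pivots.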

13.
Population models are widely applied in biomedical data analysis since they characterize both the average and individual responses of a population of subjects. In the absence of a reliable mechanistic model, one can resort to the Bayesian nonparametric approach, which models the individual curves as Gaussian processes. This paper develops an efficient computational scheme for estimating the average and individual curves from large data sets collected in standardized experiments, i.e. with a fixed sampling schedule. It is shown that the overall scheme exhibits a "client-server" architecture. The server is in charge of handling and processing the collective database of past experiments. The clients ask the server for the information needed to reconstruct the individual curve in a single new experiment. This architecture allows the clients to take advantage of the overall data set without violating possible privacy and confidentiality constraints and with negligible computational effort.

14.
Data visualization of high-dimensional data is possible through the use of dimensionality reduction techniques. However, to decide which dimensionality reduction technique to use in practice, quantitative metrics are necessary for evaluating the transformed, lower-dimensional embedding. In this paper, we propose a manifold visualization metric based on the pairwise correlation of geodesic distances in a data manifold. This metric is compared with metrics based on the Euclidean, Mahalanobis, city-block, Minkowski, cosine, Chebyshev, and Spearman distances. The results of applying different dimensionality reduction techniques to various types of nonlinear manifolds are compared and discussed. Our experiments show that the proposed metric is suitable for quantitatively evaluating the results of dimensionality reduction when the data lie on an open planar nonlinear manifold. This has practical significance for the implementation of knowledge-based visualization systems and the application of knowledge-based dimensionality reduction methods.
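The flavor of such a metric can be sketched by correlating manifold (geodesic) distances with the distances an embedding produces; the toy below is illustrative only and not the paper's exact formulation:

```python
import math
from itertools import combinations

def pearson(xs, ys):
    # Pearson correlation between two equal-length distance lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy manifold: points on a unit circle. The geodesic distance is the
# arc length; the ambient Euclidean distance is the chord. A high
# correlation between the two indicates the straight-line view distorts
# the manifold structure only mildly on this small arc.
angles = [0.0, 0.4, 0.9, 1.5]
points = [(math.cos(t), math.sin(t)) for t in angles]
pairs = list(combinations(range(len(angles)), 2))
geodesic = [abs(angles[i] - angles[j]) for i, j in pairs]     # arc length
euclid = [math.dist(points[i], points[j]) for i, j in pairs]  # chord
score = pearson(geodesic, euclid)
assert score > 0.99
```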

15.
Audio data is large in volume and high in dimensionality, so retrieving audio directly suffers from the "curse of dimensionality"; it is therefore necessary to extract from the audio the frames that best represent its features. We propose an audio data reduction algorithm based on the Fuzzy Rough Set Model (FRSM): audio data are fuzzily discretized according to membership degrees, attributes are reduced according to their knowledge-representation capability, and an equivalence partition is used to compute a knowledge core with equivalent classification capability. Experimental results show that the algorithm obtains a minimal reduction while preserving the audio features to the greatest possible extent, improving retrieval efficiency.

16.
Riemannian metric tensors are used to control the adaptation of meshes for finite element and finite volume computations. To study the numerous metric construction and manipulation techniques, a new method has been developed to visualize two-dimensional metrics without interference from an adaptation algorithm. This method traces a network of orthogonal tensor lines, tangent to the eigenvectors of the metric field, to form a pseudo-mesh visually close to a perfectly adapted mesh but without many of its constraints. Anisotropic metrics can be visualized directly using such pseudo-meshes but, for isotropic metrics, the eigensystem is degenerate and an anisotropic perturbation has to be used. This perturbation merely preserves directional information usually present during metric construction and is small enough, about 1% of the prescribed target element size, to be visually imperceptible. Both analytical and solution-based examples show the effectiveness and usefulness of the present method. As an example, pseudo-meshes are used to visualize the effect on metrics of Laplacian-like smoothing and gradation control techniques. Application to adaptive quadrilateral mesh generation is also discussed.

17.
We consider the problem of similarity search in databases with costly metric distance measures. Given limited main memory, our goal is to develop a reference-based index that reduces the number of comparisons needed to answer a query. The idea in reference-based indexing is to select a small set of reference objects that serve as a surrogate for the other objects in the database. We consider novel strategies for selecting references and for assigning references to database objects. For dynamic databases with frequent updates, we propose two incremental versions of the selection algorithm. Our experimental results show that our selection and assignment methods far outperform competing methods. This work is partially supported by the National Science Foundation under Grant No. 0347408.
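The core filtering rule of reference-based indexing follows from the triangle inequality: with d(x, ref) precomputed for every database object, any x with |d(q, ref) − d(x, ref)| > r for some reference cannot fall inside the range query (q, r), so its (costly) distance to the query never needs to be computed. A minimal sketch, with Euclidean points standing in for an expensive metric:

```python
import math

def range_query(query, radius, db, refs, ref_dists, dist):
    # ref_dists[i][k] = dist(db[i], refs[k]), precomputed at index time.
    results, computations = [], 0
    q_to_ref = [dist(query, ref) for ref in refs]
    for i, x in enumerate(db):
        # Triangle inequality: |d(q, ref) - d(x, ref)| <= d(q, x).
        # If any reference's lower bound exceeds the radius, prune x.
        if any(abs(qr - xr) > radius
               for qr, xr in zip(q_to_ref, ref_dists[i])):
            continue
        computations += 1            # only survivors cost a comparison
        if dist(query, x) <= radius:
            results.append(x)
    return results, computations

db = [(0, 0), (1, 1), (8, 8), (9, 7), (2, 1)]
refs = [(0, 0), (10, 10)]
ref_dists = [[math.dist(x, r) for r in refs] for x in db]
hits, comps = range_query((1, 0), 1.5, db, refs, ref_dists, math.dist)
assert (1, 1) in hits and (8, 8) not in hits
```

Selecting which objects become references, and which references each object is compared against, is exactly what the strategies in the abstract optimize.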

18.
We propose a content-based video indexing method. Video content is classified into three modes, face, landscape, and object motion, to enable fast video indexing on a mobile-phone platform. First, in each mode, a sliding-window method computes a degree-of-interest score for every time slice. In face mode, a horizontal projection histogram is used to estimate the approximate location of faces, and changes in the face region are aggregated; in landscape mode, histograms are computed in HSV color space; in object-motion mode, an optical-flow-based tracking algorithm yields the motion vectors of the target. After the user selects the corresponding mode, for the video within each time slice, each pair of consecutive frames is used to compute the …

19.
Data in its raw form can potentially contain valuable information, but much of that value is lost if it cannot be presented to a user in a way that is useful and meaningful. Data visualization techniques offer a solution to this issue. Such methods are especially useful in spatial data domains such as medical scan data and geophysical data. However, to properly see trends in data or to relate data from multiple sources, multiple-data-set visualization techniques must be used. In research with the time-line paradigm, we have integrated multiple streaming data sources into a single visual interface. Data visualization takes place on several levels, from the visualization of query results in a time-line fashion to the use of multiple visualization techniques to view, analyze, and compare the data from the results. A significant contribution of this research effort is the extension and combination of existing research efforts in the visualization of multiple data sets to create new and more flexible techniques. We specifically address visualization issues regarding clarity, speed, and interactivity. The developed visualization tools have also recently led to the visualization-querying paradigm and challenge highlighted herein.

20.
Spatial data sets are analysed in many scientific disciplines. Kriging, i.e. minimum mean squared error linear prediction, is probably the most widely used method of spatial prediction. Computation time and memory requirements can be obstacles for kriging on data sets with many observations. Calculations are accelerated and memory requirements decreased by using a Gaussian Markov random field on a lattice as an approximation of a Gaussian field. The algorithms are also well suited to nonlattice data when a bilinear interpolation is exploited at nonlattice locations.
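The bilinear interpolation used at nonlattice locations can be sketched as follows, assuming a unit-spaced lattice with value z[i][j] at integer coordinates (i, j):

```python
import math

def bilinear(z, x, y):
    # Interpolate lattice values z[i][j] (unit spacing) at a nonlattice
    # location (x, y), weighting the four surrounding lattice nodes by
    # the fractional offsets within their cell.
    i, j = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - i, y - j
    return ((1 - fx) * (1 - fy) * z[i][j]
            + fx * (1 - fy) * z[i + 1][j]
            + (1 - fx) * fy * z[i][j + 1]
            + fx * fy * z[i + 1][j + 1])

z = [[0.0, 1.0],
     [2.0, 3.0]]  # z[i][j] = 2*i + j on a 2x2 lattice
# A linear field is reproduced exactly by bilinear interpolation
assert math.isclose(bilinear(z, 0.25, 0.5), 2 * 0.25 + 0.5)
```

Because each nonlattice prediction touches only the four nearest lattice nodes, the interpolation adds negligible cost on top of the lattice GMRF computation.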
