Similar Literature
20 similar documents found.
1.
Current information visualization techniques assume unrestricted access to data. However, privacy protection is a key issue for many real-world data analyses. Corporate data, medical records, etc. are rich in analytical value, but they cannot be shared without first going through a transformation step in which explicit identifiers are removed and the data are sanitized. Over the years, researchers in data mining have proposed various techniques for privacy-preserving data publishing and for subsequently mining such sanitized data. A well-known drawback of these methods is that even a small privacy guarantee greatly reduces the utility of the datasets. In this paper, we propose an adaptive technique for privacy preservation in parallel coordinates. Based on knowledge about the sensitivity of the data, we compute a clustered representation on the fly, which allows the user to explore the data without breaching privacy. Through the use of screen-space privacy metrics, the technique adapts to the user's screen parameters and interaction. We demonstrate our method in a case study and discuss potential attack scenarios.
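The following minimal Python sketch makes the idea of a screen-space privacy check concrete. It assumes, purely for illustration, that privacy requires every rendered group on an axis to cover at least k records, and that values map linearly onto a pixel column of a given height. The function name `screen_space_buckets` and the merging rule are hypothetical; they are not the metrics defined in the paper.

```python
def screen_space_buckets(values, lo, hi, height_px, k):
    """Hypothetical screen-space check for one parallel-coordinates axis.

    Values are mapped to pixel rows; adjacent non-empty rows are merged until
    every group covers at least k records. Returns (first_row, last_row, count)
    tuples, or [] if fewer than k records exist in total.
    """
    if hi <= lo or height_px <= 0 or k <= 0:
        raise ValueError("need hi > lo, height_px > 0 and k > 0")
    counts = [0] * height_px
    for v in values:
        # Linear mapping of the value onto the axis, clamped to the visible rows.
        t = (v - lo) / (hi - lo)
        row = min(height_px - 1, max(0, int(t * height_px)))
        counts[row] += 1

    groups = []
    start, acc, last = None, 0, None
    for row, c in enumerate(counts):
        if c == 0:
            continue  # empty rows never open or close a group
        if start is None:
            start = row
        acc += c
        last = row
        if acc >= k:
            groups.append((start, row, acc))
            start, acc = None, 0
    if acc > 0:
        if not groups:
            return []  # too few records to draw anything safely
        s, _, c = groups.pop()
        groups.append((s, last, c + acc))  # fold the undersized tail into its neighbour
    return groups
```

Because the pixel mapping depends on `height_px`, the same data can yield different groupings on a smaller or larger screen. That dependence is the intuition behind adapting privacy to screen parameters.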

2.
Displaying a large number of lines within a limited amount of screen space is a task that is common to many different classes of visualization techniques such as time‐series visualizations, parallel coordinates, link‐node diagrams, and phase‐space diagrams. This paper addresses the challenging problems of cluttering and overdraw inherent to such visualizations. We generate a 2×2 tensor field during line rasterization that encodes the distribution of line orientations through each image pixel. Anisotropic diffusion of a noise texture is then used to generate a dense, coherent visualization of line orientation. In order to represent features of different scales, we employ a multi‐resolution representation of the tensor field. The resulting technique can easily be applied to a wide variety of line‐based visualizations. We demonstrate this for parallel coordinates, a time‐series visualization, and a phase‐space diagram. Furthermore, we demonstrate how to integrate a focus+context approach by incorporating a second tensor field. Our approach achieves interactive rendering performance for large data sets containing millions of data items, due to its image‐based nature and ease of implementation on GPUs. Simulation results from computational fluid dynamics are used to evaluate the performance and usefulness of the proposed method.
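The tensor-accumulation step can be sketched as follows. Each pixel a line passes through receives the outer product d·dᵀ of the line's unit direction d, so the per-pixel sum encodes the distribution of orientations and ignores sign. This minimal Python sketch substitutes simple point sampling along each segment for a real GPU rasterizer. The names `accumulate_tensors` and `dominant_angle` and the `steps_per_px` parameter are hypothetical, and the anisotropic-diffusion and multi-resolution stages are not shown.

```python
import math
from collections import defaultdict

def accumulate_tensors(segments, steps_per_px=2):
    """Sum d * d^T per pixel for line segments given in pixel coordinates.

    segments: iterable of ((x0, y0), (x1, y1)).
    Returns a dict pixel -> [Txx, Txy, Tyy] (the tensor is symmetric).
    """
    field = defaultdict(lambda: [0.0, 0.0, 0.0])
    for (x0, y0), (x1, y1) in segments:
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            continue
        ux, uy = dx / length, dy / length  # unit direction of this line
        n = max(1, int(length * steps_per_px))
        visited = set()
        for i in range(n + 1):
            t = i / n
            px = (math.floor(x0 + t * dx), math.floor(y0 + t * dy))
            if px in visited:
                continue  # each line contributes once per pixel
            visited.add(px)
            T = field[px]
            T[0] += ux * ux
            T[1] += ux * uy
            T[2] += uy * uy
    return field

def dominant_angle(T):
    """Orientation (radians) of the principal eigenvector of a 2x2 tensor."""
    txx, txy, tyy = T
    return 0.5 * math.atan2(2 * txy, txx - tyy)
```

Because d and −d give the same outer product, a line drawn left-to-right and one drawn right-to-left add up to the same orientation. That is the property a line-orientation field needs.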

3.
Today's unprecedented ability, and common practice, to collect personal data, together with the power of data‐driven approaches in business, services and security, also introduces significant privacy issues. There have been extensive studies on privacy-preserving problems in the data mining community, but relatively few provide user-supervised control over the anonymization process. Preserving both the value and the privacy of the data is a non‐trivial task. We present the design and evaluation of a visual interface that assists users in applying commonly used data anonymization techniques to create privacy-preserving visualizations. Specifically, we focus on event sequence data because of its vulnerability to privacy breaches. Our interface is designed for data owners to examine potential privacy issues, obfuscate information as suggested by the algorithm, and fine‐tune the results at their discretion. Multiple use case scenarios demonstrate the utility of our design, and a user study investigates the effectiveness of the privacy-preserving strategies. Our results show that a visual interface is effective for identifying potential privacy issues, for revealing the underlying anonymization processes, and for allowing users to balance data utility against privacy.

4.
The publication of microdata is pivotal for medical research, data analysis and data mining. Such published data contain a substantial amount of sensitive information; for example, a hospital may publish several sensitive attributes such as diseases, treatments and symptoms. Releasing multiple sensitive attributes puts the privacy of individuals at risk. The main vulnerability of such a release is that if an adversary succeeds in identifying a single sensitive attribute, the other sensitive attributes can then be inferred through correlation. A variety of techniques, such as SLOMS and SLAMSA, already exist for anonymizing multiple sensitive attributes; however, they have drawbacks both in preserving privacy and in ensuring data utility. We propose an efficient approach, (p, k)-Angelization, for the anonymization of multiple sensitive attributes. Our approach protects the privacy of individuals and yields promising results in terms of utility compared with currently used techniques. (p, k)-Angelization not only preserves privacy by eliminating the threat of background join and non-membership attacks but also reduces information loss, thus improving the utility of the released information.

5.
Clickstreams are visitors' paths through a Web site. Analysis of clickstreams shows how a Web site is navigated and used by its visitors. Clickstream data of online stores contain information useful for understanding the effectiveness of marketing and merchandising efforts, such as how customers find the store, which products they see, and which products they purchase. In this paper, we present an interactive visualization system that gives users greater ability to interpret and explore clickstream data of online stores. The system visualizes the effectiveness of Web merchandising from two points of view using two visualization techniques: parallel coordinates for visualizing sessions and starfield graphs for visualizing product performance. Furthermore, the system provides facilities for zooming, filtering, color-coding, dynamic querying and data sampling. It also provides summary information alongside the visualizations and, by maintaining a connection between the visualizations and the source database, updates this summary information dynamically. To demonstrate how the system supports examining online store clickstreams, we present a series of parallel coordinates and starfield visualizations that display clickstream data from an operating online retail store. A framework for understanding Web merchandising is briefly explained. A set of metrics called micro-conversion rates, defined for Web merchandising analysis in our previous work (Lee et al., Electronic Markets, 2000), is also explained and used to visualize online store effectiveness.
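Micro-conversion rates can be illustrated as ratios between successive stages of a product's funnel: shown, clicked, placed in the basket, purchased. The sketch below assumes that four-stage funnel and the rate names commonly cited from Lee et al. The `ProductStats` fields are hypothetical, and the exact definitions should be checked against the original paper.

```python
from dataclasses import dataclass

@dataclass
class ProductStats:
    impressions: int  # times the product was shown to visitors ("looks")
    clicks: int       # click-throughs to the product detail page
    baskets: int      # basket placements
    purchases: int    # completed purchases

def _ratio(num: int, den: int) -> float:
    # Guard against products that never reached a stage.
    return num / den if den else 0.0

def micro_conversion_rates(s: ProductStats) -> dict:
    """Stage-to-stage conversion ratios for one product (assumed definitions)."""
    return {
        "look_to_click": _ratio(s.clicks, s.impressions),
        "click_to_basket": _ratio(s.baskets, s.clicks),
        "basket_to_buy": _ratio(s.purchases, s.baskets),
        "look_to_buy": _ratio(s.purchases, s.impressions),
    }
```

For example, 1,000 impressions, 50 clicks, 10 basket placements and 5 purchases give a look-to-click rate of 0.05, a click-to-basket rate of 0.2, a basket-to-buy rate of 0.5 and a look-to-buy rate of 0.005. When no stage is empty, the last rate equals the product of the first three.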

6.
This paper presents new techniques for seamlessly transitioning between parallel coordinate plots, star plots, and scatter plots. The star plot serves as a mediator visualization between parallel coordinate plots and scatter plots since it uses lines to represent data items as parallel coordinates do and can arrange axes orthogonally as used for scatter plots. The design of the transitions also motivated a new variant of the star plot, the polycurve star plot, that uses curved lines instead of straight ones and has advantages both in terms of space utilization and the detection of clusters. Furthermore, we developed a geometrically motivated method to embed scatter points from a scatter plot into star plots and parallel coordinate plots to track the transition of structural information such as clusters and correlations between the different plot types. The integration of our techniques into an interactive analysis tool for exploring multivariate data demonstrates the advantages and utility of our approach over a multi-view approach for scatter plots and parallel coordinate plots, which we confirmed in a user study and concrete usage scenarios.
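The geometric relationship between the two plot types fits in a few lines. In parallel coordinates, axis i sits at a horizontal offset. In a star plot, the same axis is rotated to angle 2πi/n around a common origin. The Python sketch below assumes values already normalized to [0, 1] and uses plain linear interpolation of vertex positions. It is not the transition or the polycurve variant designed in the paper, and the function names are hypothetical.

```python
import math

def star_vertices(values, radius=1.0):
    """Star-plot vertices for one record; axis i points at angle 2*pi*i/n."""
    n = len(values)
    return [(radius * v * math.cos(2 * math.pi * i / n),
             radius * v * math.sin(2 * math.pi * i / n))
            for i, v in enumerate(values)]

def pc_vertices(values, spacing=1.0):
    """Parallel-coordinates polyline vertices: axis i at x = i * spacing."""
    return [(i * spacing, v) for i, v in enumerate(values)]

def blend(values, t):
    """Naive linear morph from parallel coordinates (t=0) to a star plot (t=1)."""
    pc, star = pc_vertices(values), star_vertices(values)
    return [((1 - t) * px + t * sx, (1 - t) * py + t * sy)
            for (px, py), (sx, sy) in zip(pc, star)]
```

A star plot closes its polygon by connecting the last vertex back to the first, whereas the parallel-coordinates polyline stays open. Any animated transition therefore also has to decide when to draw that closing edge.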

7.
In this paper, we present a systematization of techniques that use quality metrics to help in the visual exploration of meaningful patterns in high-dimensional data. In a number of recent papers, different quality metrics have been proposed to automate the demanding search through large spaces of alternative visualizations (e.g., alternative projections or orderings), allowing the user to concentrate on the most promising visualizations suggested by the quality metrics. Over the last decade, this approach has seen remarkable development, but little reflection exists on how these methods are related to each other and how the approach can be developed further. For this purpose, we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a thorough literature review. We carefully analyze the papers and derive a set of factors for discriminating the quality metrics, visualization techniques, and the process itself. The process is described through a reworked version of the well-known information visualization pipeline. We demonstrate the usefulness of our model by applying it to several existing approaches that use quality metrics, and we provide reflections on the implications of our model for future research.
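As one concrete instance of a quality metric for axis ordering, the sketch below scores a parallel-coordinates ordering by the sum of absolute Pearson correlations between neighbouring axes, then searches all orderings exhaustively. This particular metric and the brute-force search are illustrative assumptions, not taken from the surveyed papers. Practical systems use many other metrics and heuristic search, because the number of orderings grows as n!.

```python
import itertools
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def ordering_score(columns, order):
    """Sum of |r| between neighbouring axes: higher means more linear structure."""
    return sum(abs(pearson(columns[a], columns[b]))
               for a, b in zip(order, order[1:]))

def best_ordering(columns):
    """Exhaustive search over all axis orderings; feasible only for a few axes."""
    return max(itertools.permutations(range(len(columns))),
               key=lambda order: ordering_score(columns, order))
```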

8.
Random-data perturbation techniques and privacy-preserving data mining
Privacy is becoming an increasingly important issue in many data-mining applications. This has triggered the development of many privacy-preserving data-mining techniques. A large fraction of them use randomized data-distortion techniques to mask the data and so preserve the privacy of sensitive data. This methodology attempts to hide the sensitive data by randomly modifying the data values, often using additive noise. This paper questions the utility of the random-value distortion technique in privacy preservation. The paper first notes that random matrices have predictable structures in the spectral domain, and then develops a random matrix-based spectral-filtering technique to retrieve original data from a dataset distorted by adding random values. The proposed method works by comparing the spectrum generated from the observed data with that of random matrices. This paper presents the theoretical foundation and extensive experimental results to demonstrate that, in many cases, random-data distortion preserves very little data privacy. The analytical framework presented in this paper also points out several possible avenues for the development of new privacy-preserving data-mining techniques. Examples include algorithms that explicitly guard against privacy breaches through linear transformations, and algorithms that exploit multiplicative and colored noise for preserving privacy in data-mining applications.
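The spectral-filtering idea rests on a known result about random matrices. Assume an m × n noise matrix with independent entries of zero mean and variance σ², and let Q = m/n ≥ 1. The eigenvalues of its sample covariance matrix then asymptotically lie within the Marchenko–Pastur bounds below. In a basic form of the attack, eigenvalues of the perturbed data's covariance that exceed λ_max are attributed to the original signal. The perturbed m × n data matrix U_p is then projected onto the corresponding eigenvectors V_s to estimate the original data. The notation here is ours, and this is only a sketch of the approach.

```latex
\lambda_{\min} = \sigma^2\left(1 - \frac{1}{\sqrt{Q}}\right)^2,
\qquad
\lambda_{\max} = \sigma^2\left(1 + \frac{1}{\sqrt{Q}}\right)^2,
\qquad
Q = \frac{m}{n} \ge 1

\hat{U} = U_p \, V_s V_s^{\top},
\qquad
V_s = \{\, v_i : \lambda_i > \lambda_{\max} \,\}
```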

9.
Accelerometer-based activity recognition (AAR) has attracted considerable attention due to the widespread use of smartphones and its energy efficiency. However, since accelerometer data contain individual characteristics, AAR may raise privacy concerns. Although numerous privacy-preservation approaches, such as privacy filtering, differential privacy, and inferential privacy, have been proposed to conceal sensitive information, they cannot address the privacy problem associated with AAR. In this paper, we report our efforts to control the use of AAR while preserving privacy. To achieve this, our method leverages a connection to the agglomerative information bottleneck, through which the disclosed data can be compressed so that irrelevant private information is reduced. It also leverages a connection to the general statistical inference framework for privacy, in which both privacy leakage and utility accuracy are measured as mutual information. Our experimental results show that the proposed solution can greatly reduce privacy leakage while maintaining relatively good utility.
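Both quantities in this framework are mutual informations, so a small estimator makes the trade-off concrete. Privacy leakage can be read as I(private attribute; released data), and utility as I(activity label; released data). The Python sketch below computes empirical mutual information for discrete samples only. The discretization of accelerometer features and the agglomerative information-bottleneck compression described in the paper are not shown, and the function name is hypothetical.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information I(X; Y) in bits from (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        # Each observed cell contributes p(x,y) * log2(p(x,y) / (p(x) p(y))).
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi
```

A release mechanism would then aim to keep `mutual_information` of (activity, released) pairs high while keeping it low for (private attribute, released) pairs.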

10.
Medical illustrations have long been used for teaching and for communicating information for diagnosis or surgery planning. Illustrative visualization systems create methods and tools that adapt traditional illustration techniques to enhance renderings. Clipping the volume is a popular operation in volume rendering for inspecting inner parts, though it may remove context information that is worth preserving. In this paper, we present a new editing technique based on clipping planes, direct structure extrusion, and illustrative methods, which preserves the context by adapting the extruded region to the structures of interest in the volumetric model. We show that users can interactively modify the clipping plane and edit the structures to highlight, in order to easily create the desired result. Our approach works with both segmented and non‐segmented volume models; in the latter case, a local segmentation is performed on‐the‐fly. We demonstrate the efficiency and utility of our method.

11.
Privacy preserving clustering on horizontally partitioned data
Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. However, the popularity and wide availability of data mining tools have also raised concerns about the privacy of individuals. The aim of privacy-preserving data mining research is to develop data mining techniques that can be applied to databases without violating the privacy of individuals. Privacy-preserving techniques have been proposed for various data mining models, initially for classification on centralized data and then for association rules in distributed environments. In this work, we propose methods for constructing, in a privacy-preserving manner, the dissimilarity matrix of objects held at different sites. This matrix can be used for privacy-preserving clustering as well as for database joins, record linkage and other operations that require pair-wise comparison of individual private data objects horizontally distributed across multiple sites. We evaluate the communication and computation complexity of our protocol through experiments on synthetically generated and real datasets. Each experiment is also performed with a baseline protocol that has no privacy guarantees; comparing the two protocols shows the overhead that comes with security and privacy.
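To illustrate how pairwise dissimilarities can be computed without revealing raw values, here is a deliberately simplified toy protocol in Python. It assumes two data-holding sites that share a secret integer offset, derived from a seed the third party never sees, and a third party that computes cross-site absolute differences on masked numeric values. It is not the protocol proposed in the paper. It also leaks information that a real protocol must protect, for example differences between values held at the same site.

```python
import random

def masked(values, shared_seed):
    """Run at each data site: add a secret offset known to both sites only."""
    r = random.Random(shared_seed).randint(-10**6, 10**6)
    return [v + r for v in values]

def cross_site_dissimilarity(masked_a, masked_b):
    """Run at the third party: the common offset cancels in each difference,
    since |(a + r) - (b + r)| = |a - b|."""
    return [[abs(a - b) for b in masked_b] for a in masked_a]
```

Within-site dissimilarities can be computed locally by each site, so the full matrix is assembled from local blocks plus the cross-site block computed by the third party.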

12.
Visual analytics of multidimensional multivariate data is a challenging task because of the difficulty in understanding metrics in attribute spaces with more than three dimensions. Frequently, the analysis goal is not to look into individual records but to understand the distribution of the records at large and to find clusters of records with similar attribute values. A large number of (typically hierarchical) clustering algorithms have been developed to group individual records into clusters of statistical significance. However, only a few visualization techniques exist for further exploring and understanding the clustering results. We propose visualization and interaction methods for analyzing individual clusters as well as the cluster distribution within and across levels of the cluster hierarchy. We also provide a clustering method that operates on density rather than on individual records. To avoid restricting our search for clusters, we compute density in the given multidimensional multivariate space. Clusters are formed by areas of high density. We present an approach that automatically computes a hierarchical tree of high-density clusters. To visually represent the cluster hierarchy, we present a 2D radial layout that supports an intuitive understanding of the distribution structure of the multidimensional multivariate data set. Individual clusters can be explored interactively using parallel coordinates when selected in the cluster tree. Furthermore, we integrate circular parallel coordinates into the radial hierarchical cluster tree layout, which allows for the analysis of the overall cluster distribution. This visual representation supports the comprehension of the relations between clusters and the original attributes. The combination of the 2D radial layout and the circular parallel coordinates is used to overcome the overplotting problem of parallel coordinates when looking into data sets with many records.
We apply an automatic coloring scheme based on the 2D radial layout of the hierarchical cluster tree, encoding hue, saturation, and value of the HSV color space. The colors support linking the 2D radial layout to other views, such as standard parallel coordinates or, if the data are obtained from multidimensional spatial data, the distribution in object space.
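One possible way to derive HSV colors from a radial tree layout is sketched below. It assumes that hue follows a node's angular position and saturation grows with its depth, so that sibling clusters get distinct hues and deeper clusters get more saturated colors. The specific mapping and the name `radial_color` are hypothetical; the paper's exact scheme may differ. The sketch uses Python's standard `colorsys` module.

```python
import colorsys
import math

def radial_color(angle, depth, max_depth, value=1.0):
    """Hypothetical mapping from a node's radial-layout position to RGB.

    angle: node's angular position in radians (any range; wrapped to [0, 2*pi)).
    depth: node's level in the cluster tree (0 = root).
    """
    hue = (angle % (2 * math.pi)) / (2 * math.pi)       # position around the circle
    saturation = depth / max_depth if max_depth else 0.0  # root unsaturated, leaves vivid
    return colorsys.hsv_to_rgb(hue, saturation, value)
```

Because the color is a pure function of layout position, the same cluster receives the same color in every linked view.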

13.
Multi‐dimensional data originate from many different sources and are relevant for many applications. One specific sub‐type of such data is continuous trajectory data in multi‐dimensional state spaces of complex systems. We adapt the concept of spatially continuous scatterplots and spatially continuous parallel coordinate plots to such trajectory data, leading to continuous‐time scatterplots and continuous‐time parallel coordinates. Together with a temporal heat map representation, we design coordinated views for visual analysis and interactive exploration. We demonstrate the usefulness of our visualization approach in three case studies that cover examples of complex dynamic systems: cyber‐physical systems consisting of heterogeneous sensor and actuator networks (the collection of time‐dependent sensor network data of an exemplary smart home environment), the dynamics of robot arm movement, and motion characteristics of humanoids.

14.
Studying transformation in a chemical system by considering its energy as a function of the coordinates of the system's components provides insight and changes our understanding of this process. Currently, a lack of effective visualization techniques for high‐dimensional energy functions limits chemists to plotting energy with respect to one or two coordinates at a time. In some complex systems, developing a comprehensive understanding requires new visualization techniques that show relationships between all coordinates at the same time. We propose a new visualization technique that combines concepts from topological analysis, multi‐dimensional scaling, and graph layout to enable the analysis of energy functions for a wide range of molecular structures. We demonstrate our technique by studying the energy function of a dimer of formic and acetic acids and of an LTA zeolite structure, in which we consider the diffusion of methane.

15.
Color, as one of the most effective visual variables, is used in many techniques to encode and group data points according to different features. Relations between features and groups appear as visual patterns in the visualization. However, optical illusions may bias perception at the first level of the analysis process. For instance, in pixel‐based visualizations, contrast effects make pixels appear brighter if surrounded by a darker area, which distorts the encoded metric quantity of the data points. Even if we are aware of these perceptual issues, our visual cognition system is not able to compensate for these effects accurately. To overcome this limitation, we present a color optimization algorithm based on perceptual metrics and color perception models to reduce physiological contrast or color effects. We evaluate our technique with a user study and find that the technique doubles the accuracy of users comparing and estimating color-encoded data values. Since the presented technique can be used in any application without adaptation to the visualization itself, we are able to demonstrate its effectiveness on data visualizations in different domains.

16.
The execution performance of an information gathering plan can suffer significantly due to remote I/O latencies. A streaming dataflow model of execution addresses the problem to some extent, exploiting all natural opportunities for parallel execution, as allowed by the data dependencies in a plan. Unfortunately, plans that integrate information from multiple sources often use the results of one operation as the basis for forming queries to a subsequent operation. Such cases require sequential execution, an inefficiency that can erase prior gains made through techniques like streaming dataflow. To address this problem, we present a technique called speculative plan execution, an out-of-order method that capitalizes on knowledge gained from prior executions as a means for overcoming remaining data dependencies between plan operators. Our approach inserts additional plan operators that generate and confirm speculative results, while preserving the safety and fairness of overall execution. To increase the utility of speculative execution, we propose a method of value prediction that combines caching with the more effective and space-efficient techniques of classification and transduction. We present experimental results that demonstrate how the performance of information gathering plans can benefit from speculative execution and how its overall utility can be increased through our hybrid method of value prediction.
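The core speculate-then-confirm pattern can be sketched with a cache-based value predictor. While the first operator's real result is being fetched, the dependent operator already runs on the predicted value, and its output is kept only if the prediction turns out to be correct. This minimal Python sketch assumes thread-based concurrency and a last-value cache. The paper's classification and transduction predictors and its safety and fairness guarantees are not modeled, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

class CachePredictor:
    """Hypothetical last-value predictor keyed by operator input."""
    def __init__(self):
        self.cache = {}

    def predict(self, key):
        return self.cache.get(key)

    def learn(self, key, value):
        self.cache[key] = value

def speculative_pipeline(first_op, second_op, key, predictor, pool):
    """Compute second_op(first_op(key)), speculating on first_op's result.

    Returns (result, speculation_confirmed).
    """
    guess = predictor.predict(key)
    real_future = pool.submit(first_op, key)  # slow remote operation
    spec_future = pool.submit(second_op, guess) if guess is not None else None
    real = real_future.result()
    predictor.learn(key, real)  # remember for future executions
    if spec_future is not None and guess == real:
        return spec_future.result(), True  # prediction confirmed: keep speculative work
    return second_op(real), False          # no guess or mis-speculation: recompute
```

On the first run there is nothing in the cache, so execution is sequential. On later runs with the same input, the dependent query overlaps with the lookup it depends on.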

17.
The smart meter is a basic device of the smart grid; it improves the efficiency of the power grid and brings considerable convenience to industry and to people’s daily lives. However, real-time power consumption data contain sensitive information and could disclose users' privacy. As an immunological technique, the negative survey has been proposed to preserve the privacy of static data. In this paper, we first demonstrate that the traditional negative survey might disclose users' privacy when it is used to collect time-series data. Second, we propose an improved negative survey method for collecting time-series data. Third, for the first time, we apply the negative survey to preserving the privacy of power consumption data aggregated from smart meters. Theoretical analysis and experimental results demonstrate that the proposed method can aggregate power consumption data while preserving users' privacy. Compared with existing techniques, our method is simple and efficient and does not need a trusted third party. Moreover, it can tolerate the failure of some users and resist differential attacks.
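The basic negative survey for static categorical data works as follows. Each participant reports one category they do not belong to, chosen uniformly at random, and the collector inverts the expected counts. With N participants, c categories and r_i reports of category i, the expected value of r_i is (N − t_i)/(c − 1), which gives the estimate t_i = N − (c − 1)·r_i. The Python sketch covers only this classic scheme, assuming for illustration that power consumption has been discretized into ranges. The paper's improved method for time-series data is not reproduced.

```python
import random
from collections import Counter

def negative_report(true_category, num_categories, rng):
    """Participant side: report a uniformly chosen category that is NOT theirs."""
    choices = [c for c in range(num_categories) if c != true_category]
    return rng.choice(choices)

def estimate_counts(reports, num_categories):
    """Collector side: unbiased estimate t_i = N - (c - 1) * r_i."""
    n = len(reports)
    r = Counter(reports)
    return [n - (num_categories - 1) * r[i] for i in range(num_categories)]
```

The estimates always sum to N, because the sum over i of N − (c − 1)·r_i equals cN − (c − 1)N. Individual reports reveal only a category the participant is not in.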

18.
Illustrative parallel coordinates (IPC) is a suite of artistic rendering techniques for augmenting and improving parallel coordinate (PC) visualizations. IPC techniques can be used to convey a large amount of information about a multidimensional dataset in a small area of the screen through the following approaches: (a) edge‐bundling through splines; (b) visualization of “branched” clusters to reveal the distribution of the data; (c) opacity‐based hints to show cluster density; (d) opacity and shading effects to illustrate local line density on the parallel axes; and (e) silhouettes, shadows and halos to help the eye distinguish between overlapping clusters. Thus, the primary goal of this work is to convey as much information as possible in a manner that is aesthetically pleasing and easy to understand for non‐experts.

19.
K-anonymisation is an approach to protecting individuals from being identified from data. Good k-anonymisations should retain data utility and preserve privacy, but few methods have considered these two conflicting requirements together. In this paper, we extend our previous work on a clustering-based method for balancing data utility and privacy protection, and propose a set of heuristics to improve its effectiveness. We introduce new clustering criteria that treat utility and privacy on equal terms and propose sampling-based techniques to set its parameters optimally. Extensive experiments show that the extended method achieves good accuracy in query answering and is able to prevent linking attacks effectively.
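A minimal sketch of clustering-based k-anonymisation on a single numeric quasi-identifier follows. Records are sorted, cut into consecutive groups of at least k, and each value is generalized to its group's range, so every published value is shared by at least k records. This sort-and-partition rule is a simple stand-in assumed for illustration. It does not implement the paper's clustering criteria or its sampling-based parameter setting.

```python
def k_anonymise_1d(values, k):
    """Generalise each value to the [min, max] range of a group of >= k records.

    Returns one (lo, hi) range per record, in the original record order.
    """
    if k <= 0 or len(values) < k:
        raise ValueError("need 0 < k <= number of records")
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # merge an undersized tail group
    out = [None] * len(values)
    for g in groups:
        lo, hi = values[g[0]], values[g[-1]]  # groups are sorted by value
        for i in g:
            out[i] = (lo, hi)
    return out
```

Narrower ranges keep more utility for range queries. A clustering method that groups close values, as here, trades off range width against the k-record privacy constraint.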

20.
This survey gives an overview of the current state of the art in GPU techniques for interactive large‐scale volume visualization. Modern techniques in this field have brought about a sea change in how interactive visualization and analysis of giga‐, tera‐ and petabytes of volume data can be enabled on GPUs. In addition to combining the parallel processing power of GPUs with out‐of‐core methods and data streaming, a major enabler for interactivity is making both the computational and the visualization effort proportional to the amount and resolution of data that is actually visible on screen, i.e. ‘output‐sensitive’ algorithms and system designs. This leads to recent output‐sensitive approaches that are ‘ray‐guided’, ‘visualization‐driven’ or ‘display‐aware’. In this survey, we focus on these characteristics and propose a new categorization of GPU‐based large‐scale volume visualization techniques based on the notions of actual output‐resolution visibility and the current working set of volume bricks, i.e. the current subset of data that is minimally required to produce an output image of the desired display resolution. Furthermore, we discuss the differences and similarities of different rendering and data traversal strategies in volume rendering by putting them into a common context: the notion of address translation. For our purposes here, we view parallel (distributed) visualization using clusters as an orthogonal set of techniques that we do not discuss in detail but that can be used in conjunction with what we present in this survey.
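Address translation in bricked volume rendering can be illustrated with a single-level page table that maps virtual brick coordinates to slots in a physical brick pool. The Python sketch below assumes cubic bricks and a flat pool. On a miss it records the brick, so that only bricks actually touched during rendering are requested, in the spirit of ray-guided approaches. Real systems typically use multi-level, multi-resolution page tables on the GPU, and all names here are hypothetical.

```python
def translate(voxel, brick_size, page_table, misses):
    """Map a virtual voxel (x, y, z) to a linear index in a flat brick pool.

    page_table: dict mapping virtual brick coordinates -> pool slot index.
    misses: set that collects non-resident bricks to request later.
    Returns the linear pool index, or None if the brick is not resident.
    """
    x, y, z = voxel
    b = brick_size
    brick = (x // b, y // b, z // b)  # virtual brick coordinates
    slot = page_table.get(brick)
    if slot is None:
        misses.add(brick)  # cache miss: remember the brick so it can be requested
        return None
    ox, oy, oz = x % b, y % b, z % b  # voxel offset inside the brick
    return slot * b ** 3 + (oz * b + oy) * b + ox
```

Collecting misses during traversal, rather than loading every brick up front, keeps the working set proportional to what the rays actually reach.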
