Similar Documents
20 similar documents found
1.
HD-Eye: visual mining of high-dimensional data
Clustering in high-dimensional databases is an important problem, and a number of different clustering algorithms can be applied to high-dimensional data. The authors consider how an advanced clustering algorithm, combined with new visualization methods, can cluster data more effectively through interaction. Experiments show that these techniques improve the data mining process.

2.
Mobile payments are growing at a great pace in the UAE. The latest technological advancements have made it easy to provide consumers with speedy, cheap, and convenient mobile payment services. However, the success of these services does not rely solely on these technical factors. Mobile payment services will only continue growing this rapidly if the users of these services are legally well protected. This article tests the adequacy of the current legal framework for mobile payments in the UAE with regard to consumer protection. It focuses in particular on oversight and governance of mobile payment providers, protection of consumers' data, protection against unauthorized charges, and the availability of viable dispute resolution for consumers. Legislative lacunae are highlighted and solutions for improvement are proposed.

3.
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition (ASR) systems. In this overview paper, we review the techniques proposed in the last two decades for attacking this problem. We focus our discussion on the speech separation problem, given its central role in the cocktail party environment, and describe the conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF) and generative models, the conventional multi-channel techniques such as beamforming and multi-channel blind source separation, and the newly developed deep learning-based techniques such as deep clustering (DPCL), the deep attractor network (DANet), and permutation invariant training (PIT). We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment. We argue that effectively exploiting information in the microphone array, the acoustic training set, and the language itself, with more powerful models and better optimization objectives and techniques, will be the approach to solving the cocktail party problem.
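Of the deep learning techniques surveyed, permutation invariant training (PIT) is simple enough to sketch: the separation loss is evaluated under every assignment of estimated sources to reference speakers, and only the minimum is kept, so the network is not penalized for emitting sources in a different order than the labels. A minimal NumPy illustration (the function name and the MSE criterion are illustrative assumptions, not the survey's exact formulation):

```python
from itertools import permutations
import numpy as np

def pit_mse_loss(est, ref):
    """Permutation invariant loss: try every permutation of the estimated
    sources against the references and return the minimum MSE along with
    the permutation that achieved it."""
    n_src = est.shape[0]
    best, best_perm = float("inf"), None
    for perm in permutations(range(n_src)):
        loss = np.mean((est[list(perm)] - ref) ** 2)
        if loss < best:
            best, best_perm = loss, perm
    return best, best_perm
```

In an actual separation network this minimum would be taken per utterance inside the training loop and back-propagated.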

4.
Visual analysis methods for data clustering use visualization and interaction techniques to help users analyze the clustering process and its results from multiple perspectives, thereby revealing structures and relationships hidden in the data. However, the "curse of dimensionality" inherent in high-dimensional data confronts clustering analysis with many challenges, such as model parameter setting, data feature capture, result interpretation, and visual presentation. Starting from the problems encountered in clustering high-dimensional data, this paper first summarizes and compares the data processing methods commonly used in high-dimensional clustering; these methods can largely mitigate the "curse of dimensionality" and help users mine the clustering patterns present in the data. Because different upstream data processing methods require different exploration and analysis strategies when interpreting the internal structures and regularities contained in different clustering results, this paper groups the visual analysis methods for high-dimensional data clustering of the last decade into two categories: visual analysis methods based on dimensionality reduction and visual analysis methods based on subspace clustering. Finally, the current opportunities and challenges in this field are discussed.

5.
Qian Yan-min, Weng Chao, Chang Xuan-kai, Wang Shuai, Yu Dong. Journal of Zhejiang University (Science C, English Edition), 2019, 20(3): 438-438.
Frontiers of Information Technology & Electronic Engineering - In the original version of this article, there is a mistake in the result of DPCL++ (Isik et al., 2016) in Section 5.6 (Fig....

6.
7.
During the last decade, the deluge of multimedia data has impacted a wide range of research areas, including multimedia retrieval, 3D tracking, database management, data mining, machine learning, social media analysis, medical imaging, and so on. Machine learning is heavily involved in multimedia applications, building models for classification, regression, and similar tasks; the learning principle consists in designing models based on the information contained in the multimedia dataset. While many paradigms exist and are widely used in the context of machine learning, most of them suffer from the "curse of dimensionality", meaning that strange phenomena appear when data are represented in a high-dimensional space. Given the high dimensionality and high complexity of multimedia data, it is important to investigate new machine learning algorithms to facilitate multimedia data analysis. To deal with the impact of high dimensionality, an intuitive approach is to reduce the dimensionality; other researchers have devoted themselves to designing effective learning schemes that work directly with high-dimensional data. In this survey, we cover feature transformation, feature selection and feature encoding, three approaches for fighting the consequences of the curse of dimensionality. Next, we briefly introduce some recent progress on effective learning algorithms. Finally, promising future trends in multimedia learning are envisaged.
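As a concrete instance of the feature-transformation family discussed above, a plain PCA projection is the textbook way to shrink dimensionality before learning. A minimal sketch using NumPy's SVD (the function name and the choice of PCA are this note's assumptions; the survey covers many more techniques):

```python
import numpy as np

def pca_transform(X, k):
    """Reduce an n x d data matrix X to its top-k principal components:
    center each feature, take the SVD of the centered matrix, and project
    onto the k leading right singular vectors."""
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # n x k projection
```

For data that truly lie near a k-dimensional subspace, the discarded components carry almost no variance.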

8.
In many application fields, data analysts have to deal with datasets that contain many expressions per item. The effective analysis of such multivariate datasets depends on the user's ability to understand both the intrinsic dimensionality of the dataset and the distribution of the dependent values with respect to the dimensions. In this paper, we propose a visualization model that enables joint interactive visual analysis of multivariate datasets with respect to their dimensions as well as to the actual data values. We describe a dual setting of visualization and interaction in items space and in dimensions space. The visualization of items is linked to the visualization of dimensions with brushing and focus+context visualization. With this approach, the user is able to jointly study the structure of the dimensions space and the distribution of data items with respect to the dimensions. Even though the proposed visualization model is general, we demonstrate its application in the context of DNA microarray data analysis.
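The brushing that links the two views reduces to a one-line predicate: the user sweeps an interval in one dimension, and every linked view highlights the same selected items. A minimal sketch (the function name and array layout are illustrative, not the paper's API):

```python
import numpy as np

def brush(data, dim, lo, hi):
    """Return a boolean mask over the items (rows of `data`) whose value
    in column `dim` falls inside the brushed interval [lo, hi]. The same
    mask can then drive highlighting in every linked view."""
    return (data[:, dim] >= lo) & (data[:, dim] <= hi)
```

Linking views is then just reusing the mask: the item-space scatterplot and the dimensions-space view both render the rows where the mask is true as the focus set.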

9.
In toponomics, the functional protein pattern in cells or tissue (the toponome) is imaged and analyzed for applications in toxicology, new drug development, and patient-drug interaction. The most advanced imaging technique is robot-driven multi-parameter fluorescence microscopy. This technique is capable of co-mapping hundreds of proteins and their distribution and assembly in protein clusters across a cell or tissue sample by running cycles of fluorescence tagging with monoclonal antibodies or other affinity reagents, imaging, and bleaching in situ. The imaging results in complex multi-parameter data composed of one slice or a 3D volume per affinity reagent. Biologists are particularly interested in the localization of co-occurring proteins, the frequency of co-occurrence, and the distribution of co-occurring proteins across the cell. We present an interactive visual analysis approach for the evaluation of multi-parameter fluorescence microscopy data in toponomics. Multiple, linked views facilitate the definition of features by brushing multiple dimensions. The feature specification result is linked to all views, establishing a focus+context visualization in 3D. In a new attribute view, we integrate techniques from graph visualization. Each node in the graph represents an affinity reagent, while each edge represents two co-occurring affinity reagent bindings. The graph visualization is enhanced by glyphs which encode specific properties of the binding, and the graph view is equipped with brushing facilities. By brushing in the spatial and attribute domains, the biologist achieves a better understanding of the functional protein patterns of a cell. Furthermore, an interactive table view is integrated which summarizes unique fluorescence patterns. We discuss our approach with respect to a cell probe containing lymphocytes and a prostate tissue section.
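The attribute graph described above is driven by co-occurrence counts. If each affinity reagent's binding is stored as a binary mask over pixels, the edge weights fall out of a single matrix product. A minimal sketch under that assumption (the representation and function name are illustrative, not the paper's data model):

```python
import numpy as np

def cooccurrence_graph(masks):
    """masks: r x n binary array, one row per affinity reagent, one column
    per pixel (1 = the reagent binds there). Entry (i, j) of the result
    counts the pixels where reagents i and j co-occur; the off-diagonal
    entries are the edge weights of the attribute graph, the diagonal
    the per-reagent binding counts."""
    m = np.asarray(masks, dtype=int)
    return m @ m.T
```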

10.
Clustering is the task of classifying patterns or observations into clusters or groups. Generally, clustering in high-dimensional feature spaces involves many complications, such as: the unidentified or unknown data shape, which is typically non-Gaussian and follows different distributions; the unknown number of clusters in the case of unsupervised learning; and the existence of noisy, redundant, or uninformative features, which normally compromise modeling capability and speed. High-dimensional data clustering has therefore been a subject of extensive research in data mining, pattern recognition, image processing, computer vision, and other areas for several decades. However, most existing research tackles only one or two of these problems at a time, which is unrealistic because the problems are connected and should be tackled simultaneously. Thus, in this paper, we propose two novel inference frameworks for unsupervised non-Gaussian feature selection in the context of finite asymmetric generalized Gaussian (AGG) mixture-based clustering. The AGG distribution is chosen mainly for its ability not only to approximate a large class of statistical distributions (e.g. impulsive, Laplacian, Gaussian, and uniform distributions) but also to capture asymmetry. In addition, the two frameworks simultaneously perform model parameter estimation and model complexity determination (i.e., both model and feature selection) in the same step. This is done by incorporating a minimum message length (MML) penalty in the model learning step for the first framework, and by fading out redundant densities in the mixture using the rival penalized EM (RPEM) algorithm for the second. Furthermore, both algorithms tackle the problem of noisy and uninformative features by determining a set of relevant features for each data cluster. The efficiency of the proposed algorithms is validated by applying them to real, challenging problems, namely action and facial expression recognition.

11.
Connectionist systems (artificial neural networks), as developed and widely utilized so far, are mainly based on a single, brain-like connectionist principle of information processing, where learning and information exchange occur in the connections. This paper extends this paradigm to a new trend: integrative connectionist learning systems (ICOS), which integrate in their structure and learning algorithms principles from different hierarchical levels of information processing in the brain, including the neuronal, genetic, and quantum levels. Spiking neural networks (SNN) are used as the basic connectionist learning model, which is further extended with other information learning principles to create different ICOS. For example, evolving SNN for multitask learning are presented and illustrated on a case study of person authentication based on multimodal auditory and visual information. Integrative gene-SNN are presented, where gene interactions are included in the functioning of a spiking neuron; they are applied to a case study of computational neurogenetic modeling. Integrative quantum-SNN are introduced with quantum Hebbian learning, where input features as well as information spikes are represented by quantum bits, resulting in exponentially faster feature selection and model learning. ICOS can be used to solve challenging biological and engineering problems more efficiently when fast adaptive learning systems are needed to learn incrementally in a large-dimensional space. They can also help to better understand complex information processes in the brain, especially how processes at different information levels interact. Open questions, challenges, and directions for further research are presented.

12.
Advances in multimedia data acquisition and storage technology have led to the growth of very large multimedia databases. Analyzing this huge amount of multimedia data to discover useful knowledge is a challenging problem, and this challenge has opened the opportunity for research in Multimedia Data Mining (MDM). Multimedia data mining can be defined as the process of finding interesting patterns from media data such as audio, video, image, and text that are not ordinarily accessible by basic queries and their associated results. The motivation for doing MDM is to use the discovered patterns to improve decision making. MDM has therefore attracted significant research efforts in developing methods and tools to organize, manage, search, and perform domain-specific tasks for data from domains such as surveillance, meetings, broadcast news, sports, archives, movies, medical data, as well as personal and online media collections. This paper presents a survey of the problems and solutions in multimedia data mining, approached from the following angles: feature extraction, transformation and representation techniques; data mining techniques; and current multimedia data mining systems in various application domains. We discuss the main aspects of feature extraction, transformation, and representation techniques: the level of feature extraction, feature fusion, feature synchronization, feature correlation discovery, and accurate representation of multimedia data. A comparison of MDM techniques with state-of-the-art video processing, audio processing, and image processing techniques is also provided. Similarly, we compare MDM techniques with state-of-the-art data mining techniques involving clustering, classification, sequence pattern mining, association rule mining, and visualization. We review current multimedia data mining systems in detail, grouping them according to problem formulations and approaches. The review includes supervised and unsupervised discovery of events and actions from one or more continuous sequences. We also provide a detailed analysis of what has been achieved and of the remaining gaps where future research efforts could be focused, and conclude this survey with a look at open research directions.

13.
The shortest path between two concepts in a taxonomic ontology is commonly used to represent the semantic distance between concepts in edge-based semantic similarity measures. In the past, edge counting, which is simple, intuitive, and of low computational complexity, was considered the default method for path computation. However, a large lexical taxonomy such as WordNet has irregular link densities between concepts due to its broad domain, and edge counting-based path computation is powerless against this non-uniformity. In this paper, we advocate that path computation can be separated from edge-based similarity measures and can form various general computing models. Therefore, to address the non-uniformity of concept density in a large taxonomic ontology, we propose a new path computing model based on compensating for the local area density of concepts, defined as the number of direct hyponyms of the subsumers of the concepts on the shortest path. This path model treats local area density as an extension of the edge counting-based path in accordance with information theory. It is a general path computing model and can be applied in various edge-based similarity approaches. The experimental results show that the proposed path model improves the average optimal correlation between edge-based measures and human judgments on the Miller and Charles benchmark for WordNet from less than 0.79 to more than 0.86, and on the Pedersen et al. benchmark (average of both physician and coder judgments) for SNOMED-CT from less than 0.75 to more than 0.82. It also has a large efficiency advantage over information content computation in a dynamic ontology, thereby successfully making the edge-based similarity measure a method with both high performance and high efficiency.
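The separation of path computation from the similarity measure can be made concrete: edge counting is a BFS shortest path, and the density compensation adds a per-subsumer term that grows with the number of its direct hyponyms. A minimal sketch (the log weighting below is an assumption made for illustration; the paper derives its own compensation from information theory):

```python
from collections import deque
import math

def shortest_path(graph, a, b):
    """Plain edge-counting shortest path (BFS) between two concepts in an
    undirected taxonomy given as {node: [neighbors]}. Returns the node
    list from a to b, or None if they are disconnected."""
    prev, frontier = {a: None}, deque([a])
    while frontier:
        node = frontier.popleft()
        if node == b:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nb in graph[node]:
            if nb not in prev:
                prev[nb] = node
                frontier.append(nb)
    return None

def compensated_length(graph, children, a, b):
    """Density-compensated path length in the spirit of the paper: the edge
    count plus a log term for the number of direct hyponyms (children) of
    each concept on the path; dense regions thus yield longer paths."""
    path = shortest_path(graph, a, b)
    penalty = sum(math.log(1 + len(children.get(n, []))) for n in path)
    return (len(path) - 1) + penalty
```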

14.
This paper looks beyond the existing United States Government Open Systems Interconnection Profile (GOSIP) toward several important challenges to be met in the years ahead. The first challenge is creating effective, economical, and technically credible test policies and procedures for GOSIP. The second challenge is stimulating the strategic and tactical planning within federal agencies necessary to implement the provisions of GOSIP. The third challenge is adding functions to later versions of GOSIP to provide directory services, dynamic routing, security, transaction processing, and electronic data interchange. The fourth challenge is fostering and successfully pursuing international collaboration in functional standards, procurement profiles, and testing. Beyond these four challenges lies the next horizon: integrated, interoperable network management.

15.
Zhang Jiao, Huang Tao, Wang Shuo, Liu Yun-jie. Journal of Zhejiang University (Science C, English Edition), 2019, 20(9): 1185-1194.
Frontiers of Information Technology & Electronic Engineering - Traditional networks face many challenges due to the diversity of applications, such as cloud computing, Internet of Things, and...

16.
17.
Grid benchmarking is an important and challenging topic of Grid computing research. In this paper, we present an overview of the key challenges that need to be addressed for the integration of benchmarking practices, techniques, and tools in emerging Grid computing infrastructures. We discuss the problems of performance representation, measurement, and interpretation in the context of Grid benchmarking, and propose the use of ontologies for organizing and describing benchmarking metrics. Finally, we present a survey of ongoing research efforts that develop benchmarks and benchmarking tools for the Grid. Copyright © 2006 John Wiley & Sons, Ltd.

18.
Improvements to the reliability and safety of medical devices are vital for healthcare systems. It is necessary to consider the user experience (UX) of patients and healthcare professionals at all stages of medical device development, and ergonomic design principles can also reduce the cost of medical device production. This study is a multilateral analytical literature review of UX and usability issues in healthcare and medical device design; the total number of literature sources reviewed is n = 88. The sources are classified according to the difference between UX and usability for various target products and services, including healthcare, and are reviewed with a focus on human-oriented aspects. This includes medical technology and device design, which depend on the user type, medical device scope, and area of use. A review of key medical device standards and regulatory documents is presented. The main analysis methodologies for UX and their advantages and disadvantages are discussed, and future challenges in healthcare and medical ergonomics are briefly highlighted. Above all, this study examines the difference between the usability and UX of general products and those of medical devices through a review of the existing literature. Even standards do not reflect this difference well, and it needs to be addressed through further research in academia and industry.
Relevance to industry: The obtained results will help medical-device designers and healthcare professionals understand the main medical-research trends and improve the design process. Additionally, they will help increase the satisfaction level among medical-device users and reduce user risks.

19.
20.
High-dimensional data visualization is a more complex process than ordinary dimensionality reduction to two or three dimensions. We therefore propose and evaluate a novel four-step visualization approach built upon the combination of three components: metric learning, intrinsic dimensionality estimation, and feature extraction. Although many successful applications of dimensionality reduction techniques for visualization are known, we believe that the sophisticated nature of high-dimensional data often requires a combination of several machine learning methods to solve the task. Here, this is provided by a novel framework and experiments with real-world data.
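Of the three components combined above, intrinsic dimensionality estimation is the easiest to illustrate: one crude estimator is the number of principal components needed to retain a fixed fraction of the variance. A minimal NumPy sketch (a stand-in for the estimation step, not the authors' estimator):

```python
import numpy as np

def intrinsic_dim_pca(X, var_threshold=0.95):
    """Estimate intrinsic dimensionality as the number of principal
    components needed to retain var_threshold of the total variance:
    center the data, take singular values, and find where the cumulative
    variance ratio first crosses the threshold."""
    Xc = X - X.mean(axis=0)
    S = np.linalg.svd(Xc, compute_uv=False)
    ratio = np.cumsum(S ** 2) / np.sum(S ** 2)
    return int(np.searchsorted(ratio, var_threshold) + 1)
```

For data lying exactly on a k-dimensional subspace, the estimate is k regardless of the ambient dimension.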
