首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Distributed data mining: a survey   总被引:1,自引:1,他引:0  
Most data mining approaches assume that the data can be provided from a single source. If data was produced from many physically distributed locations like Wal-Mart, these methods require a data center which gathers data from distributed locations. Sometimes, transmitting large amounts of data to a data center is expensive and even impractical. Therefore, distributed and parallel data mining algorithms were developed to solve this problem. In this paper, we survey the-state-of-the-art algorithms and applications in distributed data mining and discuss the future research opportunities.  相似文献   

2.
Multimedia Tools and Applications - Image pattern recognition in the field of big data has gained increasing importance and attention from researchers and practitioners in many domains of science...  相似文献   

3.
With the recent development of deep learning technology comes the wide use of artificial intelligence (AI) models in various domains. AI shows good performance for definite-purpose tasks, such as image recognition and text classification. The recognition performance for every single task has become more accurate than feature engineering, enabling more work that could not be done before. In addition, with the development of generation technology (e.g., GPT-3), AI models are showing stable performances in each recognition and generation task. However, not many studies have focused on how to integrate these models efficiently to achieve comprehensive human interaction. Each model grows in size with improved performance, thereby consequently requiring more computing power and more complicated designs to train than before. This requirement increases the complexity of each model and requires more paired data, making model integration difficult. This study provides a survey on visual language integration with a hierarchical approach for reviewing the recent trends that have already been performed on AI models among research communities as the interaction component. We also compare herein the strengths of existing AI models and integration approaches and the limitations they face. Furthermore, we discuss the current related issues and which research is needed for visual language integration. More specifically, we identify four aspects of visual language integration models: multimodal learning, multi-task learning, end-to-end learning, and embodiment for embodied visual language interaction. Finally, we discuss some current open issues and challenges and conclude our survey by giving possible future directions.  相似文献   

4.
From visual data exploration to visual data mining: a survey   总被引:8,自引:0,他引:8  
We survey work on the different uses of graphical mapping and interaction techniques for visual data mining of large data sets represented as table data. Basic terminology related to data mining, data sets, and visualization is introduced. Previous work on information visualization is reviewed in light of different categorizations of techniques and systems. The role of interaction techniques is discussed, in addition to work addressing the question of selecting and evaluating visualization techniques. We review some representative work on the use of information visualization techniques in the context of mining data. This includes both visual data exploration and visually expressing the outcome of specific mining algorithms. We also review recent innovative approaches that attempt to integrate visualization into the DM/KDD process, using it to enhance user interaction and comprehension.  相似文献   

5.
In this article, I will discuss three challenges in today’s data mining field. These challenges include: the transfer learning challenge, the social learning challenge and the mobile context mining challenge. I pick these three challenges because I think time is ripe for each of them to be addressed in a major way in the near future, given the current technological and societal readiness to tackle them. I also believe that each of the three challenges discussed in this article will help move the science and engineering of data mining forward, and have a great impact on society.  相似文献   

6.
Advances in multimedia data acquisition and storage technology have led to the growth of very large multimedia databases. Analyzing this huge amount of multimedia data to discover useful knowledge is a challenging problem. This challenge has opened the opportunity for research in Multimedia Data Mining (MDM). Multimedia data mining can be defined as the process of finding interesting patterns from media data such as audio, video, image and text that are not ordinarily accessible by basic queries and associated results. The motivation for doing MDM is to use the discovered patterns to improve decision making. MDM has therefore attracted significant research efforts in developing methods and tools to organize, manage, search and perform domain specific tasks for data from domains such as surveillance, meetings, broadcast news, sports, archives, movies, medical data, as well as personal and online media collections. This paper presents a survey on the problems and solutions in Multimedia Data Mining, approached from the following angles: feature extraction, transformation and representation techniques, data mining techniques, and current multimedia data mining systems in various application domains. We discuss main aspects of feature extraction, transformation and representation techniques. These aspects are: level of feature extraction, feature fusion, features synchronization, feature correlation discovery and accurate representation of multimedia data. Comparison of MDM techniques with state of the art video processing, audio processing and image processing techniques is also provided. Similarly, we compare MDM techniques with the state of the art data mining techniques involving clustering, classification, sequence pattern mining, association rule mining and visualization. We review current multimedia data mining systems in detail, grouping them according to problem formulations and approaches. The review includes supervised and unsupervised discovery of events and actions from one or more continuous sequences. We also do a detailed analysis to understand what has been achieved and what are the remaining gaps where future research efforts could be focussed. We then conclude this survey with a look at open research directions.  相似文献   

7.
基于Web的数据挖掘研究综述   总被引:4,自引:0,他引:4  
基于Web数据挖掘是一个结合了数据挖掘和WWW的热门研究主题。文章介绍了Web数据挖掘最流行的分类;Web内容挖掘,Web结构挖掘和Web使用记录挖掘,根据Web数据挖掘的最近研究状况,总结了几个研究热点,并介绍了一个Web使用记录挖掘的框架WebSIFT.  相似文献   

8.
Multimedia Tools and Applications - Rating a video based on its content is one of the most important solutions to classify videos for audience age groups. In this regard, Film content rating and TV...  相似文献   

9.
Support Vector Machine, an optimization technique, is well known in the data mining community. In fact, many other optimization techniques have been effectively used in dealing with data separation and analysis. For the last 10 years, the author and his colleagues have proposed and extended a series of optimization-based classification models via Multiple Criteria Linear Programming (MCLP) and Multiple Criteria Quadratic Programming (MCQP). These methods are different from statistics, decision tree induction, and neural networks. The purpose of this paper is to review the basic concepts and frameworks of these methods and promote the research interests in the data mining community. According to the evolution of multiple criteria programming, the paper starts with the bases of MCLP. Then, it further discusses penalized MCLP, MCQP, Multiple Criteria Fuzzy Linear Programming (MCFLP), Multi-Class Multiple Criteria Programming (MCMCP), and the kernel-based Multiple Criteria Linear Program, as well as MCLP-based regression. This paper also outlines several applications of Multiple Criteria optimization-based data mining methods, such as Credit Card Risk Analysis, Classification of HIV-1 Mediated Neuronal Dendritic and Synaptic Damage, Network Intrusion Detection, Firm Bankruptcy Prediction, and VIP E-Mail Behavior Analysis.  相似文献   

10.
Spatiotemporal aggregate computation: a survey   总被引:3,自引:0,他引:3  
Spatiotemporal databases are becoming increasingly more common. Typically, applications modeling spatiotemporal objects need to process vast amounts of data. In such cases, generating aggregate information from the data set is more useful than individually analyzing every entry. In this paper, we study the most relevant techniques for the evaluation of aggregate queries on spatial, temporal, and spatiotemporal data. We also present a model that reduces the evaluation of aggregate queries to the problem of selecting qualifying tuples and the grouping of these tuples into collections on which an aggregate function is to be applied. This model gives us a framework that allows us to analyze and compare the different existing techniques for the evaluation of aggregate queries. At the same time, it allows us to identify opportunities for research on types of aggregate queries that have not been studied.  相似文献   

11.
12.
Editorial survey: swarm intelligence for data mining   总被引:1,自引:0,他引:1  
This paper surveys the intersection of two fascinating and increasingly popular domains: swarm intelligence and data mining. Whereas data mining has been a popular academic topic for decades, swarm intelligence is a relatively new subfield of artificial intelligence which studies the emergent collective intelligence of groups of simple agents. It is based on social behavior that can be observed in nature, such as ant colonies, flocks of birds, fish schools and bee hives, where a number of individuals with limited capabilities are able to come to intelligent solutions for complex problems. In recent years the swarm intelligence paradigm has received widespread attention in research, mainly as Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO). These are also the most popular swarm intelligence metaheuristics for data mining. In addition to an overview of these nature inspired computing methodologies, we discuss popular data mining techniques based on these principles and schematically list the main differences in our literature tables. Further, we provide a unifying framework that categorizes the swarm intelligence based data mining algorithms into two approaches: effective search and data organizing. Finally, we list interesting issues for future research, hereby identifying methodological gaps in current research as well as mapping opportunities provided by swarm intelligence to current challenges within data mining research.  相似文献   

13.

针对流数据中概念漂移发生后,在线学习模型不能对分布变化后的数据做出及时响应且难以提取数据分布的最新信息,导致学习模型收敛较慢的问题,提出一种基于在线集成的概念漂移自适应分类方法(adaptive classification method for concept drift based on online ensemble,AC_OE). 一方面,该方法利用在线集成策略构建在线集成学习器,对数据块中的训练样本进行局部预测以动态调整学习器权重,有助于深入提取漂移位点附近流数据的演化信息,对数据分布变化进行精准响应,提升在线学习模型对概念漂移发生后新数据分布的适应能力,提高学习模型的实时泛化性能;另一方面,利用增量学习策略构建增量学习器,并随新样本的进入进行增量式的训练更新,提取流数据的全局分布信息,使模型在平稳的流数据状态下保持较好的鲁棒性. 实验结果表明,该方法能够对概念漂移做出及时响应并加速在线学习模型的收敛速度,同时有效提高学习器的整体泛化性能.

  相似文献   

14.
基于数据挖掘的人口数据预测模型综述   总被引:3,自引:1,他引:2       下载免费PDF全文
论文调查了国内外基于数据挖掘技术的人口数据预测模型。根据预测目的不同对这些模型进行了分类比较,在此基础上综合各模型的优缺点,对今后的研究工作做了进一步展望。  相似文献   

15.
Crowdsourcing allows large-scale and flexible invocation of human input for data gathering and analysis, which introduces a new paradigm of data mining process. Traditional data mining methods often require the experts in analytic domains to annotate the data. However, it is expensive and usually takes a long time. Crowdsourcing enables the use of heterogeneous background knowledge from volunteers and distributes the annotation process to small portions of efforts from different contributions. This paper reviews the state-of-the-arts on the crowdsourcing for data mining in recent years. We first review the challenges and opportunities of data mining tasks using crowdsourcing, and summarize the framework of them. Then we highlight several exemplar works in each component of the framework, including question designing, data mining and quality control. Finally, we conclude the limitation of crowdsourcing for data mining and suggest related areas for future research.  相似文献   

16.
This paper introduces a software tool named KEEL which is a software tool to assess evolutionary algorithms for Data Mining problems of various kinds including as regression, classification, unsupervised learning, etc. It includes evolutionary learning algorithms based on different approaches: Pittsburgh, Michigan and IRL, as well as the integration of evolutionary learning techniques with different pre-processing techniques, allowing it to perform a complete analysis of any learning model in comparison to existing software tools. Moreover, KEEL has been designed with a double goal: research and educational. Supported by the Spanish Ministry of Science and Technology under Projects TIN-2005-08386-C05-(01, 02, 03, 04 and 05). The work of Dr. Bacardit is also supported by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant GR/T07534/01.  相似文献   

17.
In this paper we present some initial results of a project which uses data-mining techniques to search for evidence of massive compact halo objects (MACHOs) from very large time series database. MACHOs are the proposed materials that probably make the “dark matter” surrounding our own and other galaxies. It was suggested that MACHOs may be detected through the gravitational microlensing effect which can be identified from the light curves of background stars. The objective of this project is two-fold, namely, (i) identification of new classes of variable stars and (ii) detection of microlensing events. In this paper, we present the major characteristics of the time series astronomical data, data preprocessing techniques to process these time series, and some domain-specific techniques to separate candidate variable stars from the nonvariant ones. We discuss the use of the Fourier model to represent the time series and the k-means based clustering method to classify variable stars.  相似文献   

18.
本文着重分析了航天探测信息系统建设的现状与成就,指出了当前航空物探数据管理的一些问题,提出了加大航天航空探测数据库建设。实现数据入库、检查和查询三大功能。本文针对其查询较为繁琐的问题引入了数据挖掘这一思想,使数据查询和使用更加的高效和便捷,进一步完善了我航空航天数据库系统的建设。  相似文献   

19.
网络信息技术的高速发展产生了新的数据模型,即数据流模型,并且越来越多的领域出现了对数据流实时处理的需求,庞大且高速的数据以及应用场景的实时性需求均推进了数据流挖掘技术的发展。首先介绍了常见的数据流模型;然后根据数据流模型的特点总结数据流挖掘的支撑技术;最后,分析了分布式数据流挖掘的重要性和有效性,给出了算法并行化的数学模型,并介绍了几种具有代表性的分布式数据流处理系统。  相似文献   

20.
Applied Intelligence - The goal of this survey is to review the state-of-the art Reversible Data Hiding (RDH) methods, classify these methods into different classes, and list out new trends in this...  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号