Found 20 similar documents (search time: 78 ms)
1.
Starting from the goal of efficient and accurate parallel computation over high-dimensional data streams of multiple forms, this paper proposes a parallel computation method for high-dimensional data streams based on granularity theory. A dynamic-granularity stream-mining model is used to mine high-dimensional data streams efficiently; noise in the streams is suppressed using locality preserving projection (LPP) and principal component analysis (PCA), reducing the risk it poses; according to the characteristics of the denoised streams, a parallel correlation-analysis method computes the Pearson product-moment correlation coefficients of the high-dimensional data, thereby associating the streams; and, based on the turnstile model of data streams, a sliding-window scheme suited to high-dimensional stream analysis is defined, realizing parallel computation over the streams. Experimental results show that the method mines high-dimensional data streams with low memory consumption, denoises them effectively, and achieves high parallel-computation precision and efficiency.
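The correlation step described above can be illustrated with a minimal Python sketch that maintains the Pearson product-moment correlation coefficient of two synchronized streams over a sliding window. The class and function names are illustrative assumptions, not the paper's actual method:

```python
from collections import deque
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

class SlidingCorrelation:
    """Correlation of two synchronized streams over the last w elements."""
    def __init__(self, w):
        # deque(maxlen=w) evicts the oldest element automatically,
        # giving a sliding window in O(1) per update.
        self.xs, self.ys = deque(maxlen=w), deque(maxlen=w)

    def update(self, x, y):
        self.xs.append(x)
        self.ys.append(y)
        return pearson(self.xs, self.ys) if len(self.xs) > 1 else None
```

Two perfectly linearly related streams yield a coefficient of 1.0 once the window fills; the paper's method additionally parallelizes this across many stream pairs.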
2.
3.
4.
The data flow diagram visual editing tool SECAI is a software tool suited to traditional data-flow-oriented requirements analysis. SECAI was built to support drawing data flow diagrams and compiling data dictionaries during requirements analysis; its main functions therefore include drawing hierarchical data flow diagrams, maintaining data flow consistency, and recording information about external entities, processes, data flows, and data stores (files), from which it automatically compiles the data dictionary. The paper describes the tool's design using UML, designs a data-flow consistency-preserving algorithm, and presents its running results.
5.
As a new form of data, data streams differ from traditional static data: they are continuous and fast, transient, and unpredictable, which makes effective analysis and mining highly challenging. This paper introduces the basic concepts of data streams, data stream models, data stream processing models, and some current data stream management systems, and surveys and classifies data stream techniques and mining algorithms.
6.
《数字社区&智能家居》 (Digital Community & Smart Home), 2008, (Z2)
This paper introduces the definition and characteristics of data streams and the basic concepts of frequent patterns over data streams. In view of the characteristics of data streams, it discusses and analyzes current domestic and international frequent-pattern mining algorithms for data streams, their properties, and their applications, and finally outlines directions for further research on frequent-pattern mining over data streams.
7.
Service processes must handle the exchange of large volumes of heterogeneous data between services, and different data-flow processing schemes directly affect a service process's execution efficiency. This paper describes the data-flow representation model, the data-mapping mechanism, and the data-flow verification mechanism in service process models; discusses data-management issues during service process execution, including data-flow scheduling, data storage, and transmission; and analyzes applications of data-flow processing in service processes. Finally, drawing on existing research, it offers an outlook on future data-flow research.
8.
An Analysis of Data Stream Management and Mining Techniques  Cited by: 2 (self-citations: 1, other citations: 1)
Data stream management and mining is one of the new research directions in the database field. This paper surveys trends in database technology and the concept, characteristics, architecture, and application areas of data streams; analyzes the construction of synopsis data structures for streams and continuous approximate query techniques; and finally introduces data stream mining techniques. It aims to describe the state of data stream management and mining and to provide a useful reference for further research.
9.
Based on data-flow framework theory, this paper shows how data-flow analysis can be applied to Java bytecode: by relating the data-flow problem to a semilattice and to the function call graph, type information can be analyzed. Experiments show that this data-flow analysis method can analyze the type information in class files fairly precisely.
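The kind of analysis described above is typically implemented as a worklist fixed-point iteration over a semilattice of facts. A minimal sketch, assuming facts are sets of possible types joined by union (the `analyze` function and toy control-flow graph are illustrative, not the paper's implementation):

```python
def analyze(cfg, transfer):
    """Forward data-flow analysis by fixed-point iteration.
    cfg: dict mapping node -> list of successor nodes.
    Facts are frozensets; the semilattice join is set union.
    transfer(node, in_fact) returns the node's output fact."""
    preds = {n: set() for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    out = {n: frozenset() for n in cfg}   # start at bottom (empty set)
    worklist = set(cfg)
    while worklist:
        n = worklist.pop()
        in_fact = frozenset()
        for p in preds[n]:                # join over all predecessors
            in_fact |= out[p]
        new_out = transfer(n, in_fact)
        if new_out != out[n]:             # fact grew: re-examine successors
            out[n] = new_out
            worklist.update(cfg[n])
    return out
```

For example, if branch `B` assigns a variable a `String` and branch `C` an `Integer`, the join node sees the fact `{String, Integer}`, which is how the merged type information is recovered.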
10.
11.
12.
A novel hash-based approach for mining frequent itemsets over data streams requiring less memory space  Cited by: 2 (self-citations: 1, other citations: 1)
In recent times, data are generated as a form of continuous data streams in many applications. Since handling data streams is necessary and discovering knowledge behind data streams can often yield substantial benefits, mining over data streams has become one of the most important issues. Many approaches for mining frequent itemsets over data streams have been proposed. These approaches often consist of two procedures: continuously maintaining synopses for data streams and finding frequent itemsets from the synopses. However, most of the approaches assume that the synopses of data streams can be saved in memory and ignore the fact that the information of the non-frequent itemsets kept in the synopses may significantly degrade memory utilization. In this paper, we consider compressing the information of all the itemsets into a structure with a fixed size using a hash-based technique. This hash-based approach skillfully summarizes the information of the whole data stream by using a hash table, provides a novel technique to estimate the support counts of the non-frequent itemsets, and keeps only the frequent itemsets to speed up the mining process. Therefore, the goal of optimizing memory space utilization can be achieved. The correctness guarantee, error analysis, and parameter setting of this approach are presented, and a series of experiments is performed to show the effectiveness and the efficiency of this approach.
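The core idea — bounding memory by hashing itemset counts into a fixed-size table, at the cost of overestimated supports when itemsets collide — can be sketched as follows. The `HashCounter` class is an illustrative stand-in under that assumption, not the paper's actual structure:

```python
class HashCounter:
    """Fixed-size table of counters. Each itemset hashes to one bucket,
    so the bucket count is an upper bound on the itemset's true support
    (collisions only ever inflate the estimate, never deflate it)."""
    def __init__(self, n_buckets):
        self.buckets = [0] * n_buckets

    def _slot(self, itemset):
        # frozenset makes the slot independent of item order.
        return hash(frozenset(itemset)) % len(self.buckets)

    def add(self, itemset):
        self.buckets[self._slot(itemset)] += 1

    def estimate(self, itemset):
        return self.buckets[self._slot(itemset)]
```

Memory stays fixed at `n_buckets` counters regardless of how many distinct itemsets the stream contains; the paper's contribution lies in the error analysis and parameter setting that control how loose the overestimate gets.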
13.
14.
This work aims to connect two rarely combined research directions, i.e., non-stationary data stream classification and data analysis with skewed class distributions. We propose a novel framework employing stratified bagging for training base classifiers to integrate data preprocessing and dynamic ensemble selection methods for imbalanced data stream classification. The proposed approach has been evaluated in computer experiments carried out on 135 artificially generated data streams with various imbalance ratios, label noise levels, and types of concept drift, as well as on two selected real streams. Four preprocessing techniques and two dynamic selection methods, used at both the bagging-classifier and base-estimator levels, were considered. Experimental results showed that, for highly imbalanced data streams, dynamic ensemble selection coupled with data preprocessing can outperform online and chunk-based state-of-the-art methods.
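The stratified-bagging idea — resampling within each class separately so minority-class examples survive into every bootstrap bag — can be sketched roughly as follows. This is a simplified, illustrative version; the paper's framework combines it with preprocessing and dynamic ensemble selection, which are not shown:

```python
import random

def stratified_bag(X, y, rng):
    """Draw one bootstrap bag by sampling with replacement within each
    class separately, preserving the original per-class counts. Plain
    bagging could omit a rare class entirely; this variant cannot."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    bag_X, bag_y = [], []
    for label, items in by_class.items():
        for _ in range(len(items)):       # resample class to its own size
            bag_X.append(rng.choice(items))
            bag_y.append(label)
    return bag_X, bag_y
```

An ensemble would call this once per base classifier with independent random states; with, say, an 18:2 class ratio, every bag still contains exactly two minority-class draws.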
15.
These days, endless streams of data are generated by various sources such as sensors, applications, users, etc. Due to possible issues in sources, such as malfunctions in sensors, platforms, or communication, the generated data might be of low quality, and this can lead to wrong outcomes for the tasks that rely on these data streams. Therefore, controlling the quality of data streams has become increasingly significant. Many approaches have been proposed for controlling the quality of data streams, and hence, various research areas have emerged in this field. To the best of our knowledge, there is no systematic literature review of research papers within this field that comprehensively reviews approaches, classifies them, and highlights the challenges. In this paper, we present the state of the art in the area of quality control of data streams, and characterize it along four dimensions. The first dimension represents the goal of the quality analysis, which can be either quality assessment or quality improvement. The second dimension focuses on the quality control method, which can be online, offline, or hybrid. The third dimension focuses on the quality control technique, and finally, the fourth dimension represents whether the quality control approach uses any contextual information (inherent, system, organizational, or spatiotemporal context) or not. We compare and critically review the related approaches proposed in the last two decades along these dimensions. We also discuss the open challenges and future research directions.
16.
A data stream is a massive, open-ended sequence of data elements continuously generated at a rapid rate. Mining data streams is more difficult than mining static databases because of the huge, high-speed, and continuous nature of streaming data. In this paper, we propose a new one-pass algorithm called DSM-MFI (Data Stream Mining for Maximal Frequent Itemsets), which mines the set of all maximal frequent itemsets in landmark windows over data streams. A new summary data structure called the summary frequent itemset forest (abbreviated as SFI-forest) is developed for incrementally maintaining the essential information about the maximal frequent itemsets embedded in the stream so far. Theoretical analysis and experimental studies show that the proposed algorithm is efficient and scalable for mining the set of all maximal frequent itemsets over the entire history of the data streams.
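For intuition, maximal frequent itemsets are the frequent itemsets with no frequent proper superset; reporting only these compresses the answer, since every frequent itemset is a subset of some maximal one. A brute-force sketch of the definition (illustrative only — DSM-MFI itself is one-pass and uses the SFI-forest, which this does not implement):

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Brute-force enumeration of all frequent itemsets with their counts.
    transactions: list of sets of items; minsup: minimum support count."""
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            s = set(cand)
            count = sum(1 for t in transactions if s <= t)
            if count >= minsup:
                freq[frozenset(cand)] = count
                found = True
        if not found:      # no frequent k-itemset => none of size k+1 either
            break
    return freq

def maximal(freq):
    """Keep only itemsets with no frequent proper superset."""
    return {s for s in freq if not any(s < t for t in freq)}
```

For transactions {a,b,c}, {a,b}, {a,c}, {a,b,c} with minimum support 2, seven itemsets are frequent but the single maximal one, {a,b,c}, summarizes them all.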
17.
18.
Querying live media streams is a challenging problem that is becoming an essential requirement in a growing number of applications. Research in multimedia information systems has addressed and made good progress in dealing with archived data. Meanwhile, research in stream databases has received significant attention for querying alphanumeric symbolic streams. The lack of a data model capable of representing different multimedia data in a declarative way, hiding media heterogeneity, and providing reasonable abstractions for querying live multimedia streams poses the challenge of how to make the best use of data from video, audio, and other media sources in various applications. In this paper we propose a system that directly captures media streams from sensors and automatically generates more meaningful feature streams that can be queried by a data stream processor. The system provides an effective combination of extensible digital processing techniques and general data stream management research. Together with other query techniques developed in related data stream management systems, our system can be used in application areas where multifarious live media sensors are deployed, such as surveillance, disaster response, live conferencing, and telepresence.
Bin Liu