Found 20 similar documents; search took 15 ms
1.
Yong Joon Lee 《Journal of Systems and Software》2009,82(1):155-167
Temporal data mining remains an important research topic because many application areas need knowledge extracted from temporal data, such as sequential patterns, similar time sequences, and cyclic and temporal association rules. Although temporal data mining has been studied extensively, existing work does not address discovering knowledge from temporal interval data such as patient histories, purchaser histories, and web logs. We propose a new temporal data mining technique that extracts temporal interval relation rules from temporal interval data using Allen’s theory. It consists of a preprocessing algorithm that generalizes temporal interval data and a temporal relation algorithm that mines temporal relation rules from the generalized data. This technique can provide more useful knowledge than conventional data mining techniques.
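As an illustration of the interval relations that Allen’s theory defines, the following minimal sketch classifies the relation between two closed intervals. It is not the paper's preprocessing or rule-mining algorithm; the function name `allen_relation` and the tuple representation of an interval are assumptions made for this example.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) with start < end; hypothetical representation


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the name of the Allen relation that holds from interval a to interval b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    # Disjoint intervals, with or without a gap between them.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    # Intervals that share at least one endpoint.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    # Strict containment in either direction.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only partial overlap remains.
    return "overlaps" if a_start < b_start else "overlapped-by"
```

For example, with two hypothetical hospital-stay intervals measured in days, `allen_relation((1, 5), (3, 8))` yields `"overlaps"`. A rule miner could count how often each relation holds between pairs of event types.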
2.
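A survey of data stream mining algorithms  Total citations: 15, self-citations: 3, citations by others: 15
Stream data mining is a new research direction in data mining and has gradually become a useful tool in many fields. After introducing the basic characteristics of data streams and the significance of data stream mining, this survey summarizes the main ideas and methods of existing data stream mining algorithms and points out their limitations. Finally, it outlines future directions for data stream mining.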
3.
The utility of an itemset is considered to be the value of that itemset, and utility mining aims at identifying the itemsets with high utilities. The temporal high utility itemsets are the itemsets whose support is larger than a pre-specified threshold in the current time window of the data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns such as association rules from data streams. In this paper, we propose a novel method, namely THUI (Temporal High Utility Itemsets)-Mine, for mining temporal high utility itemsets from data streams efficiently and effectively. To the best of our knowledge, this is the first work on mining temporal high utility itemsets from data streams. The novel contribution of THUI-Mine is that it can effectively identify the temporal high utility itemsets by generating fewer candidate itemsets, so that the execution time for mining all high utility itemsets in data streams is reduced substantially. In this way, the process of discovering all temporal high utility itemsets under all time windows of data streams can be achieved effectively with less memory space and execution time. This meets the critical requirements on time and space efficiency for mining data streams. Through experimental evaluation, THUI-Mine is shown to significantly outperform other existing methods, such as the Two-Phase algorithm, under various experimental conditions.
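To make the notion of itemset utility within a time window concrete, here is a brute-force sketch. It is not THUI-Mine and makes no attempt at candidate pruning. It assumes that the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, and that an itemset qualifies when its total utility in the window reaches a threshold. The names `itemset_utility`, `high_utility_itemsets`, `profit`, and `min_utility` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable

Transaction = Dict[str, int]  # item -> purchased quantity (assumed representation)


def itemset_utility(itemset: Iterable[str], transaction: Transaction,
                    profit: Dict[str, float]) -> float:
    """Utility of an itemset in one transaction; zero unless every item is present."""
    items = list(itemset)
    if not all(item in transaction for item in items):
        return 0.0
    return sum(transaction[item] * profit[item] for item in items)


def high_utility_itemsets(window: Iterable[Transaction], profit: Dict[str, float],
                          min_utility: float, max_size: int = 2) -> Dict[FrozenSet[str], float]:
    """Enumerate itemsets of up to max_size items and keep those whose summed
    utility over the transactions in the window is at least min_utility."""
    totals: Dict[FrozenSet[str], float] = {}
    for transaction in window:
        items = sorted(transaction)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                totals[key] = totals.get(key, 0.0) + itemset_utility(key, transaction, profit)
    return {key: value for key, value in totals.items() if value >= min_utility}
```

A `collections.deque(maxlen=w)` of transactions can serve as the sliding window: appending a new transaction evicts the oldest one, after which `high_utility_itemsets` is re-run on the current contents. The exhaustive enumeration here grows quickly with `max_size`, which is the cost that candidate-reduction methods aim to avoid.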
4.
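Sun Na 《计算机工程与设计》 (Computer Engineering and Design) 2013,34(9)
To overcome the effect of concept drift on classification models and improve data stream classification accuracy, a data stream classification model based on a concept drift detection algorithm is proposed. Different detection methods are used for different types of concept drift. By monitoring concept drift, the model effectively controls how often the classification model is updated, so that the classifier is updated only when needed and classification performance improves. Experiments on two different data sets, with comparisons against traditional classification models, verify the effectiveness and correctness of the model.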
5.
The proliferation of sensor technology, especially in the context of embedded systems, has brought forward novel types of applications that make use of streams of continuously generated sensor data. Many applications like telemonitoring in healthcare or roadside traffic monitoring and control particularly require data stream management (DSM) to be provided in a distributed, yet reliable way. This is even more important when DSM applications are deployed in a failure-prone distributed setting including resource-limited mobile devices, for instance in applications which aim at remotely monitoring mobile patients. In this paper, we introduce a model for distributed and reliable DSM. The contribution of this paper is threefold. First, in analogy to the SQL isolation levels, we define levels of reliability and describe necessary consistency constraints for distributed DSM that specify the tolerated loss, delay, or re-ordering of data stream elements, respectively. Second, we use this model to design and analyze an algorithm for reliable distributed DSM, namely efficient coordinated operator checkpointing (ECOC). We show that ECOC provides lossless and delay-limited reliable data stream management and thus can be used in critical application domains such as healthcare, where the loss of data stream elements cannot be tolerated. Third, we present detailed performance evaluations of the ECOC algorithm running on mobile, resource-limited devices. In particular, we can show that ECOC provides a high level of reliability while, at the same time, featuring good performance characteristics with moderate resource consumption.
6.
Improving industrial product reliability, maintainability and thus availability is a challenging task for many industrial companies. In industry, there is a growing need to process data in real time, since the generated data volume exceeds the available storage capacity. This paper consists of a review of data stream mining and data stream management systems aimed at improving product availability. Further, a newly developed and validated grid-based classifier method is presented and compared to one-class support vector machine (OCSVM) and a polygon-based classifier.
7.
Flexible decision tree for data stream classification in the presence of concept change, noise and missing values  Total citations: 1, self-citations: 0, citations by others: 1
In recent years, classification learning for data streams has become an important and active research topic. A major challenge posed by data streams is that their underlying concepts can change over time, which requires current classifiers to be revised accordingly and in a timely manner. To detect concept change, a common methodology is to observe the online classification accuracy. If accuracy drops below some threshold value, a concept change is deemed to have taken place. An implicit assumption behind this methodology is that any drop in classification accuracy can be interpreted as a symptom of concept change. Unfortunately, this assumption is often violated in the real world, where data streams carry noise that can also introduce a significant reduction in classification accuracy. To compound this problem, traditional noise cleansing methods are ill-suited to data streams. Those methods normally need to scan data multiple times, whereas learning for data streams can only afford a one-pass scan because of the data’s high speed and huge volume. Another open problem in data stream classification is how to deal with missing values. When new instances containing missing values arrive, how a learning model classifies them and how it updates itself according to them is an issue whose solution is far from being explored. To solve these problems, this paper proposes a novel classification algorithm, flexible decision tree (FlexDT), which extends fuzzy logic to data stream classification. The advantages are three-fold. First, FlexDT offers a flexible structure to effectively and efficiently handle concept change. Second, FlexDT is robust to noise. Hence it can prevent noise from interfering with classification accuracy, and an accuracy drop can be safely attributed to concept change. Third, it deals with missing values in an elegant way. Extensive evaluations are conducted to compare FlexDT with representative existing data stream classification algorithms using a large suite of data streams and various statistical tests. Experimental results suggest that FlexDT offers a significant benefit to data stream classification in real-world scenarios where concept change, noise and missing values coexist.
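The common accuracy-threshold methodology that this abstract describes (and criticizes) can be sketched as follows. This is the baseline only, not FlexDT: it cannot tell noise from concept change, which is exactly the weakness the abstract points out. The class name `AccuracyDriftMonitor` and its default parameters are assumptions chosen for illustration.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Flags a possible concept change when accuracy over the most recent
    window_size predictions falls below threshold."""

    def __init__(self, window_size: int = 100, threshold: float = 0.7) -> None:
        self.window_size = window_size
        self.threshold = threshold
        # Each entry records whether one prediction was correct.
        self._hits = deque(maxlen=window_size)

    def accuracy(self) -> float:
        """Accuracy over the predictions currently held in the window."""
        return sum(self._hits) / len(self._hits)

    def update(self, predicted, actual) -> bool:
        """Record one labeled prediction; return True if a change is suspected."""
        self._hits.append(predicted == actual)
        if len(self._hits) < self.window_size:
            return False  # not enough evidence yet
        return self.accuracy() < self.threshold
```

In a one-pass setting, `update` would be called once per arriving labeled instance, and a `True` result would trigger whatever revision step the classifier supports.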
8.
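Traditional data classification algorithms adapt poorly to the unbounded volume and concept drift of data streams. To address this, a real-time data stream mining algorithm is proposed. This Bayesian data stream classification algorithm handles discrete and continuous attributes differently and compresses the data within each time window. It then recombines the compressed data according to the weight of each time window and learns a single Bayesian classifier from the recombined compressed data. Experimental results show that the algorithm outperforms comparable algorithms in classification performance, classification accuracy, and classification speed.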
9.
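Count-based frequent pattern mining in data streams  Total citations: 1, self-citations: 0, citations by others: 1
Mining frequent itemsets is a basic task of stream data mining. Many approximate algorithms can mine frequent items effectively but cannot effectively control memory consumption. This paper proposes and implements the 0—δ algorithm, which keeps memory consumption under control. In addition to a thorough theoretical analysis, detailed experiments demonstrate the effectiveness of the new method.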
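For readers unfamiliar with counting-based frequent-item mining under a fixed memory budget, the sketch below uses the classical Misra-Gries summary. It is a substitute chosen for illustration and is not the algorithm proposed in this paper, whose details the abstract does not give. The function name `misra_gries` and the parameter `k` are assumptions.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Summarize a stream with at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to remain in the result; stored counts are lower bounds.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

Memory use is bounded by `k - 1` counters regardless of stream length, at the price of undercounting; a second pass (when available) can verify the surviving candidates.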
10.
11.
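An incremental subspace data mining algorithm based on data stream density in complex networks  Total citations: 1, self-citations: 0, citations by others: 1
To improve accuracy when mining large-scale network data streams in complex networks, an incremental subspace data mining algorithm based on complex-network data stream density is proposed. The algorithm first analyzes the data stream density of the complex network, partitions communities according to the data stream density of different networks, and traverses undirected cycles to determine the community to which each data stream belongs. It then uses the incremental subspace data mining algorithm to compute the correlation between community networks and data streams, as well as the correlation coefficient between the nodes a data stream passes through and time, and thereby accurately locates the node of the target data stream. Simulation results and data analysis show that the algorithm maintains high mining accuracy even when the numbers of nodes and communities are large.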
12.
Unlabeled training examples are readily available in many applications, but labeled examples are fairly expensive to obtain. For instance, in our previous work on classification of peer-to-peer (P2P) Internet traffic, we observed that only about 25% of examples can be labeled as “P2P” or “NonP2P” using a port-based heuristic rule. We also expect that even fewer examples can be labeled in the future as more and more P2P applications use dynamic ports. This fact motivates us to investigate techniques that enhance the accuracy of P2P traffic classification by exploiting the unlabeled examples. In addition, Internet data flows dynamically in large volumes (streaming data). In P2P applications, new communities of peers often join and old communities of peers often leave, requiring the classifiers to be capable of updating the model incrementally and dealing with concept drift. Based on these requirements, this paper proposes an incremental Tri-Training (iTT) algorithm. We tested our approach on a real data stream with 7.2 Mega labeled examples and 20.4 Mega unlabeled examples. The results show that the iTT algorithm can enhance the accuracy of P2P traffic classification by exploiting unlabeled examples. In addition, it can effectively deal with the dynamic nature of streaming data to detect changes in communities of peers. We extracted attributes only from the IP layer, eliminating the privacy concern associated with techniques that use deep packet inspection.
Bijan Raahemi is an assistant professor at the Telfer School of Management, University of Ottawa, Canada, with a cross-appointment with the School of Information Technology and Engineering. He received his Ph.D. in Electrical and Computer Engineering from the University of Waterloo, Canada, in 1997. Prior to joining the University of Ottawa, Dr. Raahemi held several research positions in the telecommunications industry, including at Nortel Networks and Alcatel-Lucent, focusing on Computer Networks Architectures and Services, Dynamics of Internet Traffic, Systems Modeling, and Performance Analysis of Data Networks. His current research interests include Knowledge Discovery and Data Mining, Information Systems, and Data Communications Networks. Dr. Raahemi’s work has appeared in several peer-reviewed journals and conference proceedings. He also holds 10 patents in Data Communications. He is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE), and a member of the Association for Computing Machinery (ACM). Weicai Zhong is a post-doctoral fellow at the Telfer School of Management, University of Ottawa, Canada. He received a B.S. degree in computer science and technology from Xidian University, Xi’an, China, in 2000 and a Ph.D. in pattern recognition and intelligent systems from Xidian University in 2004. Prior to joining the University of Ottawa, Dr. Zhong was a senior statistician at SPSS Inc. from Jan. 2005 to Dec. 2007. His current research interests include Internet Traffic Identification, Data Mining, and Evolutionary Computation. He is a member of the Institute of Electrical and Electronics Engineers (IEEE). Jing Liu is an Associate Professor with Xidian University, China. She received a B.S. degree in computer science and technology from Xidian University, Xi’an, China, in 2000, and a Ph.D. in circuits and systems from Xidian University in 2004. Her research interests include Data Mining, Evolutionary Computation, and Multiagent Systems. She is a member of the Institute of Electrical and Electronics Engineers (IEEE).
13.
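A data stream classification method that adapts to concept drift  Total citations: 1, self-citations: 0, citations by others: 1
Most current data stream classification methods assume that the data distribution is stable and ignore the fact that the underlying concept of real data may change over a period of time, which can reduce the predictive accuracy of the classification model. Based on the characteristics of data streams, an online classification algorithm is proposed that can recognize and adapt to concept drift. Experiments show that it automatically adjusts the training window and the number of new samples used during model rebuilding according to the current state of concept drift.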
14.
This paper presents a novel approach to handling large amounts of geometric data. Data stream clustering is used to reduce the amount of data and build a hierarchy of clusters. The data stream concept allows for the processing of very large data sets. The cluster hierarchy is then used in a dynamic triangulation to create a multiresolution model. It allows for the interactive selection of a different level of detail in various parts of the data. A method for removing multiple points from a Delaunay triangulation is proposed. It is significantly faster than the traditional approach. The clustering and the triangulation are supplemented by an elliptical metric to handle data with anisotropic properties. Compared to the closest competitive method by Isenburg et al., the presented algorithm requires only a single pass over the data and offers high flexibility. These advantages come at the cost of a long running time. The method was tested on several large digital elevation maps. The clustering phase can take up to a few hours. Once the cluster hierarchy is built, the terrains can be efficiently manipulated in real time.
15.
16.
This special issue provides a leading forum for timely, in-depth presentation of recent advances in algorithms, theories and applications in temporal data mining. The selected papers underwent a rigorous refereeing and revision process.
17.
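Mining frequent itemsets is a basic task in data stream mining. Many approximate algorithms can mine frequent itemsets from data streams but cannot effectively control memory consumption and mining time. To improve the efficiency of data stream mining, frequent closed itemsets are mined from the stream to reduce the number of resulting itemsets. Drawing on the Relim algorithm and the Manku algorithm, a group of transaction linked lists is introduced as the synopsis data structure, and a new algorithm for mining frequent closed itemsets in data streams is proposed. Experiments demonstrate the effectiveness of the algorithm.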
18.
In this paper we introduce a method called CL.E.D.M. (CLassification through ELECTRE and Data Mining), which employs aspects of the methodological framework of the ELECTRE I outranking method and aims at increasing the accuracy of existing data mining classification algorithms. In particular, the method chooses the best decision rules extracted from the training process of the data mining classification algorithms, and then assigns the classes that correspond to these rules to the objects that must be classified. Three well-known data mining classification algorithms are tested on five different widely used databases to verify the robustness of the proposed method.
19.
In this paper we present a new credal classification rule (CCR) based on belief functions to deal with uncertain data. CCR allows the objects to belong (with different masses of belief) not only to the specific classes, but also to the sets of classes called meta-classes, which correspond to the disjunction of several specific classes. Each specific class is characterized by a class center (i.e. prototype), and consists of all the objects that are sufficiently close to the center. The belief of the assignment of a given object to a specific class is determined from the Mahalanobis distance between the object and the center of the corresponding class. The meta-classes are used to capture the imprecision in the classification of the objects when they are difficult to classify correctly because of the poor quality of available attributes. The selection of meta-classes depends on the application and the context, and a measure of the degree of indistinguishability between classes is introduced. In this new CCR approach, the objects assigned to a meta-class should be close to the center of this meta-class, having similar distances to all the involved specific classes’ centers, and the objects too far from the others will be considered as outliers (noise). CCR provides robust credal classification results with a relatively low computational burden. Several experiments using both artificial and real data sets are presented at the end of this paper to evaluate and compare the performance of this CCR method with respect to other classification methods.
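The role of the Mahalanobis distance to class centers can be illustrated with a small two-dimensional sketch. It does not compute belief masses and is not the CCR rule itself. As a crude stand-in for the idea of "similar distances to several class centers", it simply lists every class whose distance is within a tolerance of the closest one. The functions `mahalanobis_2d` and `candidate_classes`, and the `tolerance` parameter, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]  # 2x2 covariance matrix


def mahalanobis_2d(x: Point, center: Point, cov: Cov2) -> float:
    """Mahalanobis distance between x and center for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - center[0], x[1] - center[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    squared = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(squared)


def candidate_classes(x: Point, classes: Dict[str, Tuple[Point, Cov2]],
                      tolerance: float) -> List[str]:
    """Return the classes whose distance to x is within tolerance of the minimum.

    More than one returned name signals an ambiguous object, loosely analogous
    to assigning it to a meta-class (an assumption made for illustration only).
    """
    distances = {name: mahalanobis_2d(x, center, cov)
                 for name, (center, cov) in classes.items()}
    best = min(distances.values())
    return sorted(name for name, dist in distances.items() if dist - best <= tolerance)
```

With an identity covariance the distance reduces to the Euclidean distance, so a point midway between two class centers is returned with both class names, while a point near one center is returned with that class alone.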
20.
《Expert systems with applications》2014,41(11):5431-5450
We present a method for the classification of multi-labeled text documents explicitly designed for data stream applications that must process a virtually infinite sequence of data using constant memory and constant processing time. Our method is composed of an online procedure used to efficiently map text into a low-dimensional feature space and a partition of this space into a set of regions for which the system extracts and keeps statistics used to predict multi-label text annotations. Documents are fed into the system as a sequence of words, mapped to a region of the partition, and annotated using the statistics computed from the labeled instances colliding in the same region. This approach is referred to as clashing. We illustrate the method on real-world text data, comparing the results with those obtained using other text classifiers. In addition, we provide an analysis of the effect of the representation space dimensionality on the predictive performance of the system. Our results show that the online embedding indeed approximates the geometry of the full corpus-wise TF and TF-IDF space. The model obtains competitive F measures with respect to the most accurate methods, using significantly fewer computational resources. In addition, the method achieves a higher macro-averaged F measure than methods with similar running time. Furthermore, the system is able to learn faster than the other methods from partially labeled streams.