Similar literature
 20 similar articles found (search time: 15 ms)
1.
Mining user behavior patterns in mobile environments is an emerging topic in data mining with wide applications. By integrating moving paths with purchasing transactions, one can find sequential purchasing patterns together with the moving paths; these are called the mobile sequential patterns of mobile users. Mobile sequential patterns can be applied not only to planning mobile commerce environments but also to analyzing and managing online shopping websites. However, the unit profits and purchased quantities of items are not considered in the traditional framework of mobile sequential pattern mining, so patterns with high utility (i.e., profit here) cannot be found. In view of this, we aim to integrate mobile data mining with utility mining to find high-utility mobile sequential patterns. Two types of algorithms, namely level-wise and tree-based methods, are proposed for mining high-utility mobile sequential patterns. A series of analyses and comparisons of the performance of the two types of algorithms is conducted through experimental evaluations. The results show that the proposed algorithms outperform the state-of-the-art mobile sequential pattern algorithms and that the tree-based algorithms deliver better performance than the level-wise ones under various conditions.
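
The utility notion in the abstract above (unit profit times purchased quantity) can be illustrated with a minimal Python sketch. The data layout, the function names `pattern_utility` and `is_high_utility`, and the choice of scoring only the first in-order match are assumptions made for illustration; they are not the level-wise or tree-based algorithms of the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a user sequence is a list of (location, {item: quantity}) steps.
Step = Tuple[str, Dict[str, int]]
# Hypothetical layout: a pattern is an ordered list of (location, item) pairs.
Pattern = List[Tuple[str, str]]


def pattern_utility(sequence: List[Step], pattern: Pattern,
                    unit_profit: Dict[str, float]) -> float:
    """Utility of the first in-order match of `pattern` in `sequence`.

    Returns 0.0 when the pattern does not occur. Simplification: only the
    greedy first match is scored, not the best match over all occurrences.
    """
    total = 0.0
    pos = 0
    for location, item in pattern:
        while pos < len(sequence):
            loc, purchases = sequence[pos]
            pos += 1
            if loc == location and item in purchases:
                # utility = purchased quantity * unit profit
                total += purchases[item] * unit_profit[item]
                break
        else:
            # ran off the end of the sequence without matching this element
            return 0.0
    return total


def is_high_utility(database: List[List[Step]], pattern: Pattern,
                    unit_profit: Dict[str, float], min_utility: float) -> bool:
    """A pattern is treated as high-utility if its summed utility reaches the threshold."""
    total = sum(pattern_utility(seq, pattern, unit_profit) for seq in database)
    return total >= min_utility
```

Under these assumptions, a pattern that is rare but involves expensive items can pass the threshold even though a purely frequency-based miner would discard it.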

2.
Understanding latency in network-based applications has received considerable attention as a means of providing consistent and acceptable levels of service. This paper presents an empirical approach, a pattern-based prediction method, to predict end-to-end network latency. The key idea of the approach is to use the history of latency and its variation patterns in latency predictions. After a preliminary study of simple numerical prediction models, we examine the effectiveness of the proposed method with real latency data and various definitions of network stability. Our results show that the pattern-based method outperforms any single numerical model, obtaining an overall prediction accuracy of 86.2%.

3.
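杜超, 王志海, 江晶晶, 孙艳歌. Journal of Software (《软件学报》), 2017, 28(11): 2891-2904
Pattern-based Bayesian classification models are an effective approach to classification problems in data mining. However, most pattern-based Bayesian classifiers consider only a pattern's support in the target-class dataset and ignore its support in the opposing-class dataset. Moreover, pattern-based Bayesian classifiers designed for static datasets are not applicable to unbounded data streams that change rapidly and dynamically. To address these problems, a Bayesian classification model for data streams based on emerging patterns, EPDS (Bayesian classifier algorithm based on emerging pattern for data stream), is proposed. The model uses a simple hybrid forest structure to maintain the itemsets of in-memory transactions and adopts a fast pattern extraction mechanism to speed up the algorithm. EPDS uses a semi-lazy learning strategy to continuously update emerging patterns and builds a local classification model under each class for every transaction to be classified. Extensive experimental results show that the algorithm achieves higher accuracy than other data stream classification models.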
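
The contrast between a pattern's support in the target class and in the opposing class is commonly measured by a growth rate. The sketch below shows that ratio under assumed names (`support`, `growth_rate`) and an assumed set-based transaction layout; it is not the EPDS model itself, which also maintains a forest structure over a stream.

```python
from typing import FrozenSet, Iterable, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(items: Iterable[str], target: List[Set[str]],
                opposing: List[Set[str]]) -> float:
    """Support in the target class divided by support in the opposing class.

    A large value marks an itemset as 'emerging' in the target class.
    Convention assumed here: infinity when the itemset never occurs in the
    opposing class but does occur in the target class, 0.0 when it occurs in neither.
    """
    itemset = frozenset(items)
    s_target = support(itemset, target)
    s_opposing = support(itemset, opposing)
    if s_opposing == 0.0:
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_opposing
```

A classifier that looks only at `support(itemset, target)` would rate two itemsets equally even when one of them is just as common in the opposing class; the ratio separates the two cases.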

4.
As the total amount of traffic data in networks has been growing at an alarming rate, a substantial body of research now attempts to mine traffic data to obtain useful information. For instance, some investigations detect Internet worms and intrusions by discovering abnormal traffic patterns. However, since network traffic data contain information about the Internet usage patterns of users, network users’ privacy may be compromised during the mining process. In this paper, we propose an efficient and practical method that preserves privacy during sequential pattern mining on network traffic data. To discover frequent sequential patterns without violating privacy, our method uses the N-repository server model, which operates as a single mining server, and the retention replacement technique, which changes the answer to a query probabilistically. In addition, our method accelerates the overall mining process by maintaining meta tables at each site so as to determine quickly whether candidate patterns have ever occurred at the site. Extensive experiments with real-world network traffic data confirmed the correctness and efficiency of the proposed method.
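
The idea of changing a query answer probabilistically can be sketched as follows. This is a generic retention-replacement scheme for yes/no answers, assuming that a replaced answer is drawn uniformly from {True, False}; the names `perturb` and `estimate_true_fraction` and the retention probability `p` are illustrative, and the paper's actual protocol may differ.

```python
import random


def perturb(answer: bool, p: float, rng: random.Random) -> bool:
    """Keep the true answer with probability p; otherwise replace it
    with a fair coin flip (assumed replacement distribution)."""
    if rng.random() < p:
        return answer
    return rng.random() < 0.5


def estimate_true_fraction(observed_fraction: float, p: float) -> float:
    """Invert the perturbation in aggregate.

    With true 'yes' fraction t, the expected observed fraction is
    o = p * t + (1 - p) * 0.5, hence t = (o - (1 - p) * 0.5) / p.
    """
    return (observed_fraction - (1.0 - p) * 0.5) / p


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [i < 3000 for i in range(10000)]  # 30% of sites saw the pattern
    noisy = [perturb(a, 0.8, rng) for a in truth]
    observed = sum(noisy) / len(noisy)
    print(round(estimate_true_fraction(observed, 0.8), 3))
```

No individual perturbed answer can be trusted, yet the miner can still recover an estimate of how often a candidate pattern truly occurs, which is what support counting needs.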

5.
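Data mining on data with temporal attributes is called temporal data mining and is used to discover knowledge about how data behave over time. When data change irregularly, as stock trading data do, it is difficult to discover valuable regularities and rules. Neural networks offer parallelism, fault tolerance, hardware implementability, and self-learning, and can therefore serve as a method for stock classification and prediction. By combining stock data with temporal types, stock data are converted into temporal stock data, and a classification method based on a temporal neural network model is proposed. Ten years of stock data collected from a number of listed companies are analyzed, and a temporal stock data neural network classifier is built to classify and predict stocks. Experiments show that the classifier achieves higher classification accuracy than the neural network before the improvement and than support vector machines. The results show that this temporal data neural network model is very effective for the classification and prediction of multiple stocks and can be applied well to classification and prediction in the stock market.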

6.
Sequential pattern mining is one of the most important data mining techniques. Previous research on mining sequential patterns discovered patterns from point-based event data, interval-based event data, and hybrid event data. In many real-life applications, however, an event may involve many statuses; it might not occur only at one certain point in time or over a period of time. In this work, we propose a generalized representation of temporal events. We treat events as multi-label events with many statuses, and introduce an algorithm called MLTPM to discover multi-label temporal patterns from temporal databases. The experimental results show that the efficiency and scalability of the MLTPM algorithm are satisfactory. We also discuss interesting multi-label temporal patterns discovered when MLTPM was applied to historical Nasdaq data.

7.
Traditional clustering models based on distance similarity are not always effective in capturing correlation among data objects, whereas pattern-based clustering can identify correlation hidden among data objects. However, the state-of-the-art pattern-based clustering methods are inefficient and provide no metric to measure clustering quality. This paper presents a new pattern-based subspace clustering method that tackles both problems. Observing the analogy between mining frequent itemsets and discovering subspace clusters, we apply the pattern tree, a structure used in frequent itemset mining, to determine the target subspaces by scanning the database once, which can be done efficiently on large datasets. Furthermore, we introduce a general clustering quality evaluation model to guide the identification of meaningful clusters. The proposed method enables users to flexibly set appropriate quality-control parameters to meet different needs. Experimental results on synthetic and real datasets show that our method outperforms the existing methods in both efficiency and effectiveness.

8.
Mining interesting user behavior patterns in mobile commerce environments   (cited 6 times in total: 6 self-citations, 0 by others)
Discovering user behavior patterns in mobile commerce environments is an essential topic with wide applications, such as planning physical shopping sites, maintaining e-commerce on mobile devices and managing online shopping websites. Mobile sequential pattern mining is an emerging issue in this topic; it considers users’ moving paths and purchased items in mobile commerce environments to find the complete set of mobile sequential patterns. However, an important factor, namely users’ interests, has not yet been considered in past studies. In practical applications, users may be interested only in the patterns that satisfy some user-specified constraints. Traditional methods that do not consider the constraints pose two crucial problems: (1) users may need to filter out uninteresting patterns from a huge number of patterns; (2) finding the complete set of patterns, including the uninteresting ones, incurs high computational cost and runtime. In this paper, we address the problem of mining mobile sequential patterns with two kinds of constraints, namely importance constraints and pattern constraints. Here, we consider the importance of an item to be its utility (i.e., profit) in the mobile commerce environment. An efficient algorithm, IM-Span (Interesting Mobile Sequential Pattern mining), is proposed for dealing with the two kinds of constraints. Several effective strategies are employed to reduce the search space and computational cost in different aspects. Experimental results show that the proposed algorithm significantly outperforms state-of-the-art algorithms under various conditions.

9.
This research aims to evaluate the potential of ensemble learning (bagging, boosting, and modified bagging) for predicting microbially induced concrete corrosion in sewer systems from the data mining (DM) perspective. Particular focus is placed on ensemble techniques for network-based DM methods, including the multi-layer perceptron neural network (MLPNN) and the radial basis function neural network (RBFNN), as well as tree-based DM methods, such as the chi-square automatic interaction detector (CHAID), classification and regression tree (CART), and random forests (RF). Hence, an interdisciplinary approach is presented that combines findings from material sciences and hydrochemistry with data mining analyses to predict concrete corrosion. The factors that affect concrete corrosion, such as time, gas temperature, gas-phase H2S concentration, relative humidity, pH, and exposure phase, are used as the models’ inputs. All 433 datasets are randomly selected to construct an individual model and twenty component models of boosting, bagging, and modified bagging, based on training, validation, and testing for each DM base learner. The best ensemble predictive models are selected according to several model performance indices (e.g., root mean square error, RMSE; mean absolute percentage error, MAPE; correlation coefficient, r). The results indicate that the prediction ability of the random forests DM model is superior to that of the other ensemble learners, followed by the ensemble Bag-CHAID method. On average, the ensemble tree-based models performed better than the ensemble network-based models; nevertheless, it was also found that taking advantage of ensemble learning enhances the general performance of individual DM models by more than 10%.

10.
Mining minimal distinguishing subsequence patterns with gap constraints   (total citations: 1, self-citations: 4, citations by others: 1)
Discovering contrasts between collections of data is an important task in data mining. In this paper, we introduce a new type of contrast pattern, called a Minimal Distinguishing Subsequence (MDS). An MDS is a minimal subsequence that occurs frequently in one class of sequences and infrequently in sequences of another class. It is a natural way of representing strong and succinct contrast information between two sequential datasets and can be useful in applications such as protein comparison, document comparison and building sequential classification models. Mining MDS patterns is a challenging task and is significantly different from mining contrasts between relational/transactional data. One particularly important type of constraint that can be integrated into the mining process is the gap constraint. We present an efficient algorithm called ConSGapMiner (Contrast Sequences with Gap Miner) to mine all MDSs satisfying a minimum and maximum gap constraint, plus a maximum length constraint. It employs highly efficient bitset and boolean operations for powerful gap-based pruning within a prefix growth framework. A performance evaluation with both sparse and dense datasets demonstrates the scalability of ConSGapMiner and shows its ability to mine patterns from high-dimensional datasets at low supports.
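
To make the gap constraint concrete, the sketch below checks whether a pattern occurs in a sequence when the number of skipped symbols between consecutive matched positions must lie in [min_gap, max_gap], and then tests the "frequent in one class, infrequent in the other" condition. The function names and thresholds are hypothetical, minimality is not checked, and the plain set-based matching stands in for the bitset operations of ConSGapMiner.

```python
from typing import List, Sequence


def occurs_with_gap(sequence: Sequence[str], pattern: Sequence[str],
                    min_gap: int, max_gap: int) -> bool:
    """True if `pattern` is a subsequence of `sequence` such that each pair of
    consecutive matched positions i < j satisfies min_gap <= j - i - 1 <= max_gap."""
    if not pattern:
        return True
    # positions at which the prefix matched so far can end
    ends = {i for i, s in enumerate(sequence) if s == pattern[0]}
    for symbol in pattern[1:]:
        next_ends = set()
        for i in ends:
            lo = i + 1 + min_gap
            hi = min(len(sequence), i + 1 + max_gap + 1)
            for j in range(lo, hi):
                if sequence[j] == symbol:
                    next_ends.add(j)
        ends = next_ends
        if not ends:
            return False
    return True


def class_support(dataset: List[Sequence[str]], pattern: Sequence[str],
                  min_gap: int, max_gap: int) -> float:
    """Fraction of sequences in `dataset` that contain `pattern` under the gap constraint."""
    if not dataset:
        return 0.0
    hits = sum(occurs_with_gap(s, pattern, min_gap, max_gap) for s in dataset)
    return hits / len(dataset)


def is_distinguishing(pattern: Sequence[str], pos: List[Sequence[str]],
                      neg: List[Sequence[str]], min_pos_support: float,
                      max_neg_support: float, min_gap: int, max_gap: int) -> bool:
    """Frequent in `pos` and infrequent in `neg` (minimality is not checked here)."""
    return (class_support(pos, pattern, min_gap, max_gap) >= min_pos_support
            and class_support(neg, pattern, min_gap, max_gap) <= max_neg_support)
```

Tightening `max_gap` can only remove occurrences, so a pattern's support never grows as the constraint narrows.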

11.

In this paper, after reviewing and comparing existing social network datasets, we introduce the SNEFL dataset: the first social network dataset that includes the level of users’ likes (fuzzy like) in addition to the likes between users. The data has been collected from a social network with users’ privacy in mind. It includes several additional features, including the age, gender, marital status, height, weight, educational level and religiosity of the users. We describe its structure, analyse its features and evaluate its advantages in comparison with other social network datasets. In addition, using the unique feature of the SNEFL dataset (fuzzy like), a rule-based algorithm has been developed for the first time to detect involuntary celibates (Incels) in social networks. Despite Incel activity in online social networks, no study in computer science has so far been performed to identify them. This study is the first step towards addressing this challenge that society faces today. Experimental results show that the accuracy of the proposed algorithm in identifying Incels is 23.21% among all social network users and 68.75% among users who have fuzzy like data. In addition to Incel detection, the SNEFL dataset can be used by researchers in different fields to produce more accurate results. Study areas in which the SNEFL dataset can be used include network analysis, frequent pattern mining, classification and clustering.


12.
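陈郑淏, 冯翱, 何嘉. Journal of Computer Applications (《计算机应用》), 2019, 39(7): 1936-1941
To address the loss of feature semantic information and the weak ability to express sequential features in traditional two-dimensional convolutional models for sentiment classification, a hybrid model based on a one-dimensional convolutional neural network (CNN) and a recurrent neural network (RNN) is proposed. First, one-dimensional convolution replaces two-dimensional convolution to retain richer local semantic features. Next, after dimensionality reduction by a pooling layer, the features enter a recurrent neural network layer that integrates the sequential relationships among them. Finally, a softmax layer performs the sentiment classification. Experimental results on several standard English datasets show that, on the SST and MR datasets, the proposed model improves classification accuracy by 1 to 3 percentage points over traditional statistical methods and end-to-end deep learning methods, and an analysis of the network's components confirms that introducing one-dimensional convolution and the recurrent neural network helps improve classification accuracy.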

13.
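OSAF-tree: an iterative method for mining mobile sequential patterns with incremental updating   (cited 1 time in total: 0 self-citations, 1 by others)
The development of mobile communication and wireless positioning technologies has accumulated massive, dynamically growing spatio-temporal data. Using data mining techniques to mine users' mobile sequential patterns from the spatio-temporal trajectories of mobile users has broad application prospects in mobile communication, traffic management, location-based services, and other fields. Because network resources in mobile environments are scarce and data volumes are large, traditional sequential pattern mining methods can hardly meet efficiency requirements. Based on the concept of projection, the OSAF-tree algorithm needs to scan the database only once and handles mobile sequential pattern mining together with its incremental updating and iterative mining, which makes it a very efficient algorithm. Compared with existing methods, the OSAF-tree algorithm has clear advantages in performance and I/O cost.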

14.
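胡耀炜, 段磊, 李岭, 韩超. Journal of Computer Applications (《计算机应用》), 2018, 38(2): 427-432
To address the unsatisfactory classification accuracy and long model training time of existing pattern-based sequence classification algorithms on biological sequences, density-aware patterns are proposed and a biological sequence classification algorithm based on them, BSC, is designed. First, frequent sequential patterns with "density awareness" are mined from biological sequences. Next, the mined frequent sequential patterns are filtered and ranked to form classification rules. Finally, the classification rules are used to predict the classes of unclassified sequences. Experiments on four real biological sequence datasets analyze the effect of BSC's parameters on the results and provide recommended parameter settings. The classification results show that, compared with four other pattern-based classification algorithms, BSC improves accuracy on the experimental datasets by at least 2.03 percentage points. The results indicate that BSC achieves high classification accuracy and execution efficiency on biological sequences.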

15.
Data stream mining is an emerging research topic in the data mining field. Finding frequent itemsets is one of the most important tasks in data stream mining, with wide applications such as online e-business and web click-stream analysis. However, two main problems exist in relevant studies: (1) the utilities (e.g., importance or profits) of items are not considered, so the actual utilities of patterns cannot be reflected in frequent itemsets; (2) existing utility mining methods produce too many patterns, which makes it difficult for users to filter useful patterns out of the huge set. In view of this, we propose a novel framework, named GUIDE (Generation of maximal high Utility Itemsets from Data strEams), to find maximal high utility itemsets from data streams under different models, i.e., the landmark, sliding window and time fading models. The proposed structure, named MUI-Tree (Maximal high Utility Itemset Tree), maintains essential information for the mining processes, and the proposed strategies further improve the performance of GUIDE. The main contributions of this paper are as follows: (1) to the best of our knowledge, this is the first work on mining the compact form of high utility patterns from data streams; (2) GUIDE is an effective one-pass framework that meets the requirements of data stream mining; (3) GUIDE generates novel patterns that are not only high utility but also maximal, which provide compact and insightful hidden information in the data streams. Experimental results show that our approach outperforms the state-of-the-art algorithms under various conditions in data stream environments on different models.

16.
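成淑慧, 武优西. Control and Decision (《控制与决策》), 2024, 39(3): 1012-1020
Although collaborative filtering can provide personalized recommendations for users, most collaborative filtering models and their improved variants do not consider features such as those of users and items, and therefore cannot discover nonlinear relationships among samples. Compared with collaborative filtering, deep learning can mine rich user interest patterns, but its network topology is based on two-way decisions and ignores how difficult a sample is to recommend. To strengthen the model's nonlinear expressiveness while distinguishing easy from hard recommendation samples, and inspired by sequential three-way decisions, a personalized recommendation model based on a sequential three-way decision neural network is proposed (personalized recommendation model based on sequential three-way decision with single feedforward neural network, STWD-SFNN-PR). First, to map high-dimensional sparse feature vectors to low-dimensional dense ones, STWD-SFNN-PR uses embeddings for feature processing. Second, it learns the recommendation samples in an incremental network structure, optimizes the network parameters with Adam, and returns the samples that are hard to recommend. Third, it uses sequential three-way decisions to add a delayed-decision strategy and applies sequential thresholds at different granularity levels, thereby dynamically partitioning the hard-to-recommend samples. Finally, to verify the feasibility and effectiveness of the model, several movie recommendation datasets are studied, and classical neural network recommendation, classical...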

17.
This paper studies the problem of mining frequent itemsets along with their temporal patterns from large transaction sets. A model is proposed in which users define a large set of temporal patterns that are interesting or meaningful to them. A temporal pattern defines the set of time points at which the user expects a discovered itemset to be frequent. The model is general in that (i) no constraints are placed on the interesting patterns given by the users, and (ii) two measures, inclusiveness and exclusiveness, are used to capture how well the temporal patterns match the time points given by the discovered itemsets. Intuitively, these measures indicate to what extent a discovered itemset is frequent at time points included in a temporal pattern p, but not at time points outside p. Using these two measures, one can model many temporal data mining problems that have appeared in the literature, as well as some that have not yet been studied. By exploiting the relationships within and between itemset space and pattern space simultaneously, a series of pruning techniques is developed to speed up the mining process. Experiments show that these pruning techniques yield performance benefits of up to 100 times over a direct extension of non-temporal data mining algorithms.

18.
Frequent sequential pattern mining has become one of the most important tasks in data mining. It has many applications, such as sequential analysis, classification, and prediction. How to generate candidates and how to control the combinatorially explosive number of intermediate subsequences are the most difficult problems. Intelligent systems such as recommender systems, expert systems, and business intelligence systems use only a few patterns, namely those that satisfy a number of defined conditions. Challenges include the mining of top-k patterns, top-rank-k patterns, closed patterns, and maximal patterns. In many cases, end users need to find itemsets that occur with a sequential pattern. Therefore, this paper proposes approaches for mining the top-k co-occurrence items usually found with a sequential pattern. The Naive Approach Mining (NAM) algorithm discovers top-k co-occurrence items by directly scanning the sequence database to determine the frequency of items. The Vertical Approach Mining (VAM) algorithm is based on vertical database scanning. The Vertical with Index Approach Mining (VIAM) algorithm is based on a vertical database with index scanning. VAM and VIAM use pruning strategies to reduce the search space, thus improving performance. VAM and VIAM are especially effective in mining the co-occurrence items of a long input pattern. The three algorithms were evaluated using real-world databases. The experimental results show that these algorithms perform well, especially VAM and VIAM.
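
The direct-scan idea can be sketched in a few lines of Python. This is an illustration in the spirit of the naive approach described above, not the published NAM algorithm: it assumes each sequence is a flat list of items, counts an item at most once per sequence, and breaks ties alphabetically. The names `contains` and `top_k_cooccurrence` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def contains(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` is an in-order (not necessarily contiguous) subsequence."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must appear in order
    return all(item in it for item in pattern)


def top_k_cooccurrence(database: List[Sequence[str]], pattern: Sequence[str],
                       k: int) -> List[Tuple[str, int]]:
    """Scan every sequence; for those containing `pattern`, count the other items.

    Returns the k most frequent co-occurring items as (item, count) pairs,
    ordered by descending count and then by item name.
    """
    counts: Counter = Counter()
    for seq in database:
        if contains(seq, pattern):
            counts.update(set(seq) - set(pattern))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

Every query rescans the whole database, which is the cost that vertical representations and indexes are meant to avoid.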

19.
Jin Canghong, Chen Dongkai, Lin Zhiwei, Liu Zemin, Wu Minghui. GeoInformatica, 2021, 25(4): 799-820

Identification of individuals based on transit modes is of great importance in user tracking systems. However, identifying users in real-life studies is not trivial owing to the following challenges: 1) activity data containing both temporal and spatial context are high-order and sparse; 2) traditional two-step classifiers depend on trajectory patterns as input features, which limits accuracy, especially in the case of scattered and diverse data; 3) in some cases, there are few positive instances and they are difficult to detect. Therefore, approaches involving statistics-based or trajectory-based features do not work effectively. Deep learning methods also suffer from the problem of how to represent trajectory vectors for user classification. Here, we propose a novel end-to-end scenario-based deep learning method to address these challenges, based on the observation that individuals may visit the same place for different reasons. We first define a scenario using critical places and related trajectories. Next, we embed scenarios via path-based or graph-based approaches using extended embedding techniques. Finally, a two-level convolutional neural network is constructed for the classification. Our model is applied to the problem of detecting addicts using transit records directly, without feature engineering, based on real-life data collected from mobile devices. Based on constructed scenarios with dense trajectories, our model outperforms classical classification approaches, anomaly detection methods, state-of-the-art sequential deep learning models, and graph neural networks. Moreover, we provide statistical analyses and intuitive explanations to interpret the characteristics of resident and addict mobility. Our method could be generalized to other trajectory-related tasks involving scattered and diverse data.


20.
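An efficient algorithm for mining compressed sequential patterns   (cited 1 time in total: 0 self-citations, 1 by others)
Mining frequent sequential patterns from sequence databases is a central research topic in data mining, and a variety of efficient sequential pattern mining algorithms have been proposed and studied. Because the mining process produces a large number of frequent sequential patterns, many researchers have recently shifted their focus from the efficiency of sequential pattern mining algorithms to making the resulting pattern sets easier for users to understand. Inspired by the idea of compressing frequent itemsets, an algorithm called CFSP (compressing frequent sequential patterns) is proposed. It mines a small number of representative sequential patterns that express the information of all frequent sequential patterns and eliminates a large number of redundant ones. CFSP is a two-step algorithm: in the first step, it obtains all closed sequential patterns as the candidate set of representative patterns and, at the same time, obtains most of the representative patterns; in the second step, it spends only a small amount of time discovering the remaining representative patterns. An experimental study on real and synthetic datasets also demonstrates that CFSP is efficient.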
