1.
Grouping individual tourists who follow the same or similar routes over the same time period makes travel more convenient and cheaper for them, because shared resources such as regular or occasional tour buses, drivers, and tour guides can be provided. In this paper, we propose a mathematical formulation of the tour route clustering problem and a two-phase sequential pattern algorithm that clusters similar or identical routes from the routes of individual tourists, with illustrative examples. The first phase constructs a site-by-site frequency matrix and prunes infrequent tour route patterns from the matrix. The second phase clusters the tour routes using a sequential pattern mining algorithm to determine the tour route. We compare and evaluate the performance of our algorithms in terms of execution time and memory used. The proposed algorithm is efficient in both runtime and memory usage as the number of transactions increases.
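The first phase described above can be sketched roughly as follows. The abstract does not say how the site-by-site frequency matrix is defined, so this sketch assumes it counts, for each ordered pair of sites, the number of routes in which one site is visited immediately after the other. The function names `site_transition_counts` and `prune_infrequent`, the `min_support` parameter, and the sample routes are all hypothetical.

```python
from collections import defaultdict


def site_transition_counts(routes):
    """Count, for each ordered site pair (a, b), the routes where b directly follows a."""
    counts = defaultdict(int)
    for route in routes:
        # A pair is counted at most once per route, as in support counting.
        for pair in set(zip(route, route[1:])):
            counts[pair] += 1
    return dict(counts)


def prune_infrequent(counts, min_support):
    """Keep only the site pairs that occur in at least min_support routes."""
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical routes of three tourists over sites A-D.
routes = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C", "D"],
]
counts = site_transition_counts(routes)
frequent = prune_infrequent(counts, min_support=2)
```

With these routes, only the transitions A to B and B to C appear in two routes, so they are the only entries that survive pruning. A sparse dictionary is used here instead of a dense matrix purely for brevity.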
2.
Instagram is a popular photo-sharing social application. It is widely used by tourists to record journey information such as location, time, and interest. Consequently, a huge volume of geo-tagged photos with spatio-temporal information is generated along tourists' travel trajectories. Such Instagram photo trajectories capture travel paths, travel density distributions, and traveller behaviors, preferences, and mobility patterns. Mining Instagram photo trajectories is thus very useful for many mobile and location-based social applications, including tour guide and recommender systems. However, we have not found any work that extracts interesting group-like travel trajectories from Instagram photos taken asynchronously by different tourists. Motivated by this, we propose a novel concept, the coterie, which reveals representative travel trajectory patterns hidden in Instagram photos taken by users at shared locations and paths. Our work includes the discovery of (1) coteries, (2) closed coteries, and (3) the recommendation of popular travel routes based on closed coteries. For this, we first build a statistically reliable trajectory database from Instagram geo-tagged photos. These trajectories are then clustered by the DBSCAN method to find tourist density. Next, we transform each raw spatio-temporal trajectory into a sequence of clusters. All discriminative closed coteries are further identified by a Cluster-Growth algorithm. Finally, distance-aware and conformity-aware recommendation strategies are applied to closed coteries to recommend popular tour routes. Visualized demos and extensive experimental results demonstrate the effectiveness and efficiency of our methods.
4.
A bankruptcy trajectory reflects the dynamic changes in a company's financial situation, and hence makes it possible to track the evolution of companies and recognize important trajectory patterns. This study aims at a compact visualization of the complex temporal behaviors in financial statements. We use the self-organizing map (SOM) to analyze and visualize the financial situation of companies over several years through a two-step clustering process. First, the bankruptcy risk is characterized by a feature self-organizing map (FSOM), so that the temporal sequence is converted to a trajectory vector projected on the map. Afterwards, the trajectory self-organizing map (TSOM) clusters the trajectory vectors into a number of trajectory patterns. The proposed approach is applied to a large database of French companies spanning four years. The experimental results demonstrate the promising functionality of SOM for bankruptcy trajectory clustering and visualization. From the viewpoint of decision support, the method might give experts insight into the development patterns of bankrupt and healthy companies.
5.
This paper proposes a method for grouping trajectories represented as two-dimensional time-series data. The method takes a two-stage approach. First, it compares two trajectories based on their structural similarity and determines the best correspondence between partial trajectories. Then, it calculates the value-based dissimilarity for all pairs of matched segments and outputs their total as the dissimilarity of the two trajectories. We evaluated the method on two data sets. Experimental results on the Australian sign language dataset and a chronic hepatitis dataset demonstrate that the method can capture the structural similarity between trajectories even in the presence of noise and local differences, and can provide better proximity for discriminating objects.
7.
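Based on an analysis of representative association knowledge mining algorithms, a new database storage structure for mining frequent patterns, the AFP-tree, is proposed, and a frequent pattern mining algorithm is designed on top of this structure. Theoretical study has demonstrated the effectiveness of the AFP-tree and the efficiency of the associated algorithm.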
8.
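For sequential pattern mining, a frequent 2-sequence graph (F2SG) is proposed to represent the sequence information in a database. With a single scan of the database, the information relevant to the mining task is mapped into the F2SG, and on this basis a new sequential pattern discovery algorithm, GBSP, is proposed. GBSP fully exploits the order relations among items represented in the F2SG to mine frequent sequences, which improves the efficiency of generating them. Theoretical analysis and experiments show that the algorithm outperforms traditional sequential pattern discovery algorithms in both time and space.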
9.
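A multidimensional sequential pattern mining algorithm, MSP, is proposed that combines maximal frequent patterns, pattern similarity, and attribute description. The algorithm has three steps: mine the maximal frequent patterns in the dataset, with each frequent pattern becoming a pattern class; compare the item sequence of each sequence in the data with each pattern class for containment and similarity; and extract the attributes related to each pattern class according to certain rules, giving the multidimensional sequential pattern mining result in the form of multidimensional sequence rules with attributes as the antecedent and the pattern class as the consequent....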
10.
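Effectively mining trajectory patterns and regularities from trajectory data is of great importance. This paper studies trajectory prediction for moving objects on a road network. Combining sequence analysis with Markov statistical models, it proposes a variable-order Markov model mining method based on suffix automata. The method learns from the historical trajectory data of moving objects, computes the probabilistic features of trajectory sequence contexts, builds a suffix automaton model of the sequences, and, together with the current actual trajectory data, dynamically and adaptively predicts future locations. Experimental results show that, as the order increases (L>=2), the prediction accuracy of fixed-order Markov models gradually decreases, whereas the proposed method adapts dynamically and maintains an accuracy of about 81.3%, achieving good prediction results. In addition, the method needs only linear time and space, which greatly reduces storage and time costs and enables online learning on large-scale data.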
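To illustrate the general idea of variable-order prediction, here is a minimal sketch that backs off from the longest known context to shorter ones. It is not the suffix-automaton method of the abstract: contexts are stored in a plain dictionary, which does not give the linear space bound the abstract reports. The names `train` and `predict`, the `max_order` parameter, and the road-segment identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def train(trajectories, max_order):
    """Count next-location frequencies for every context of length 1..max_order."""
    model = defaultdict(Counter)
    for traj in trajectories:
        for i in range(1, len(traj)):
            for k in range(1, max_order + 1):
                if i - k < 0:
                    break
                context = tuple(traj[i - k:i])
                model[context][traj[i]] += 1
    return model


def predict(model, recent, max_order):
    """Predict the next location from the longest context seen during training."""
    for k in range(min(max_order, len(recent)), 0, -1):
        context = tuple(recent[-k:])
        if context in model:
            # Most frequent successor of this context.
            return model[context].most_common(1)[0][0]
    return None  # No suffix of the recent trajectory was seen in training.


# Hypothetical historical trajectories over road segments r1..r5.
history = [["r1", "r2", "r3"], ["r4", "r2", "r5"], ["r1", "r2", "r3"]]
model = train(history, max_order=2)
```

Here `predict(model, ["r4", "r2"], 2)` uses the order-2 context and returns `"r5"`, while `predict(model, ["r9", "r2"], 2)` has no matching order-2 context and falls back to the order-1 context `("r2",)`, returning `"r3"`.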
11.
Location is a key context ingredient, and many existing pervasive applications rely on the current locations of their users. However, with the ability to predict the future location and movement behavior of a user, the usability of these applications can be greatly improved. In this paper, we propose an approach to predict both the intended destination and the future route of a person. Rather than predicting the destination and future route separately, we focus on making predictions in an integrated way by exploiting personal movement data (i.e. trajectories) collected by GPS. Since trajectories contain information about a person's daily whereabouts, the proposed approach first detects the significant places where the person may depart from or go to using a clustering-based algorithm called FBM (Forward–Backward Matching), then abstracts the trajectories based on a space partitioning method, and finally extracts movement patterns from the abstracted trajectories using an extended CRPM (Continuous Route Pattern Mining) algorithm. Extracted movement patterns are organized in terms of origin–destination couples. The prediction is made based on a pattern tree built from these movement patterns. With the real personal movement data of 14 participants, we conducted a number of experiments to evaluate the performance of our system. The results show that our approach can achieve approximately 80% and 60% accuracy in destination prediction and 1-step prediction, respectively, and results in an average deviation of approximately 60 m in continuous future route prediction. Finally, based on the proposed approach, we implemented a prototype running on mobile phones, which can extract patterns from a user’s historical movement data and predict the destination and future route.
12.
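With the rapid development of global positioning and mobile communication technologies, massive amounts of spatio-temporal trajectory data have emerged. These data faithfully reflect the movement patterns and behavioral characteristics of moving objects in a spatio-temporal environment and contain rich information, which has important application value in fields such as urban planning, traffic management, service recommendation, and location prediction. Applying spatio-temporal trajectory data in these fields usually requires sequential pattern mining on the data. Spatio-temporal trajectory...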
13.
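Clinical behavior data still contain temporal-relation noise after cleaning, so it is hard to discover high-quality patterns when the data are fed directly into sequence mining algorithms. A time normalization model is proposed. The model defines sequential and parallel relations between temporal behaviors, computes an intersection coefficient for the given relations, and uses the result to identify noise in the temporal relations between behaviors. Noise is removed under the criterion that, after normalization, all behaviors are mutually noise-free while their original correct relations remain unchanged. An algorithm implementing the model was developed, and tests on sample data show that the processed data meet the requirements of subsequent pattern mining.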
14.
Mining association rules is among the most common techniques for knowledge discovery from databases (KDD). It is used to discover relationships among items or itemsets. Temporal data mining, in turn, is concerned with the analysis of temporal data and the discovery of temporal patterns and regularities. In this paper, a new concept of up-to-date patterns is proposed, which is a hybrid of association rules and temporal mining. An itemset may not be frequent (large) for an entire database but may be large up-to-date, because items that seldom occurred early on may occur often recently. An up-to-date pattern is thus composed of an itemset and its up-to-date lifetime, in which the user-defined minimum-support threshold must be satisfied. The proposed approach can mine more useful large itemsets than conventional approaches, which discover only large itemsets valid for the entire database. Experimental results show that the proposed algorithm is more effective than traditional ones in discovering such up-to-date temporal patterns, especially when the minimum-support threshold is high.
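The notion of an up-to-date lifetime can be illustrated with a naive sketch. It assumes, beyond what the abstract states, that the lifetime of an itemset is the longest window of transactions ending at the most recent one in which the itemset meets the minimum-support ratio. The function name `up_to_date_lifetime_start` and the sample transactions are hypothetical, and this is not the algorithm proposed in the paper.

```python
def up_to_date_lifetime_start(transactions, itemset, min_support):
    """Return the earliest index s such that the itemset's support ratio over
    transactions[s:] is at least min_support, or None if no such window exists."""
    itemset = set(itemset)
    n = len(transactions)
    hits = 0
    best_start = None
    # Grow the window backwards from the most recent transaction.
    for start in range(n - 1, -1, -1):
        if itemset <= set(transactions[start]):
            hits += 1
        window = n - start
        if hits >= min_support * window:
            best_start = start
    return best_start


# Hypothetical transaction log, oldest first.
log = [{"a"}, {"b"}, {"a", "c"}, {"b", "c"}, {"b", "c"}, {"a", "b", "c"}]
start = up_to_date_lifetime_start(log, {"b", "c"}, min_support=0.7)
```

In this log, the itemset {b, c} appears in only 3 of 6 transactions, so it is not large for the whole database at a 0.7 threshold. Over the window beginning at index 2 it appears in 3 of 4 transactions, so `start` is 2.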
15.
Frequent pattern mining is based on the assumption that users can specify the minimum-support for mining their databases. It has been recognized that setting the minimum-support is a difficult task for users. This can hinder the widespread application of these algorithms. In this paper we propose a computational strategy for identifying frequent itemsets, consisting of polynomial approximation and fuzzy estimation. More specifically, our algorithms (polynomial approximation and fuzzy estimation) automatically generate actual minimum-supports (appropriate to a database to be mined) according to users’ mining requirements. We experimentally examine the algorithms using different datasets, and demonstrate that our fuzzy estimation algorithm fittingly approximates actual minimum-supports from the commonly-used requirements.
This work is partially supported by Australian ARC grants for discovery projects (DP0449535, DP0559536 and DP0667060), a China NSF Major Research Program (60496327), a China NSF grant (60463003), an Overseas Outstanding Talent Research Program of the Chinese Academy of Sciences (06S3011S01), and an Overseas-Returning High-level Talent Research Program of China Human-Resource Ministry.
A preliminary and shortened version of this paper has been published in the Proceedings of the 8th Pacific Rim International Conference on Artificial Intelligence (PRICAI ’04).
16.
High-utility pattern mining (HUPM) has emerged in recent years as an alternative to association-rule mining for discovering more interesting and useful information for decision making. Many algorithms have been developed to find high-utility patterns (HUPs) in quantitative databases, but they do not consider the timestamps of patterns, especially in recent intervals. A pattern may not be a HUP in an entire database but may be a HUP in recent intervals. In this paper, a new concept, the up-to-date high-utility pattern (UDHUP), is designed. It considers not only the utility measure but also the timestamp factor to discover recent HUPs. The UDHUP-apriori algorithm is first proposed to mine UDHUPs in a level-wise way. Since UDHUP-apriori uses an Apriori-like approach to recursively derive UDHUPs, a second algorithm, UDHUP-list, is then presented to discover UDHUPs efficiently, based on the developed UDU-list structures and a pruning strategy without candidate generation, thus speeding up the mining process. A flexible minimum-length strategy with two specific lifetimes is also designed to find more efficient UDHUPs based on a user's specification. Experiments are conducted to evaluate the performance of the two proposed algorithms in terms of execution time, memory consumption, and number of generated UDHUPs on several real-world and synthetic datasets.
17.
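FP-growth is a classic algorithm for mining frequent itemsets. It uses the FP-tree, a compact data structure, to store all the information in a transaction database that is relevant to frequent itemset mining, but it is not suitable for mining weighted frequent itemsets. This paper analyzes the problems in existing weighted frequent itemset mining algorithms, improves the FP-tree to construct a new weighted FP-tree, and proposes an algorithm that effectively mines weighted frequent itemsets. Finally, an example illustrates the mining process, and experiments verify the effectiveness of the algorithm.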
18.
The authors explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. The solution procedure consists of two steps. First, they derive an algorithm to convert the original sequence of log data into a set of maximal forward references. By doing so, one can filter out the effect of some backward references, which are mainly made for ease of traveling, and concentrate on mining meaningful user access sequences. Second, they derive algorithms to determine the frequent traversal patterns, i.e., large reference sequences, from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences; one is based on some hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. Performance of these two methods is comparatively analyzed. It is shown that the option of selective scan is very advantageous and can lead to prominent performance improvement. Sensitivity analysis on various parameters is conducted.
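The first step, converting a traversal log into maximal forward references, can be sketched as follows. This is a minimal reading of the idea in the abstract, assuming that a revisit to a page already on the current path counts as a backward reference and that a forward path is emitted whenever a backward move follows a forward one. The function name `maximal_forward_references` and the click sequence are hypothetical.

```python
def maximal_forward_references(traversal):
    """Split one user's page traversal into its maximal forward references."""
    path = []              # current forward path from the start page
    results = []
    moving_forward = False
    for page in traversal:
        if page in path:
            # Backward reference: emit the path only if we were moving forward,
            # then cut the path back to the revisited page.
            if moving_forward:
                results.append(list(path))
            path = path[:path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward:
        results.append(list(path))
    return results


# Hypothetical click sequence of one session.
clicks = list("ABCDCBEGHGWAOUOV")
refs = ["".join(r) for r in maximal_forward_references(clicks)]
```

For this session, `refs` is `["ABCD", "ABEGH", "ABEGW", "AOU", "AOV"]`. The second step would then count how often each contiguous subsequence occurs across these references to find the large reference sequences.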
20.
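Sequential pattern mining is a very important research topic in data mining. Many algorithms already exist for mining sequential patterns, but there has been relatively little research on the incremental updating of sequential patterns. To address this, SPIU, a mining algorithm for the incremental updating of sequential patterns, is proposed. SPIU makes full use of the previous mining results and prunes the generated candidate frequent sequences, which effectively reduces the number of candidates and thus considerably improves mining efficiency. Test results show that SPIU is correct and efficient, and that it also scales well.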