Found 20 similar documents; search time: 15 ms
1.
From sequential pattern mining to structured pattern mining: A pattern-growth approach Total citations: 10, self-citations: 0, citations by others: 10
Jia-Wei Han, Jian Pei, Xi-Feng Yan 《Journal of Computer Science and Technology》2004,19(3):0-0
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we perform a systematic introduction and presentation of the pattern-growth methodology and study its principles and extensions. We first introduce two interesting pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance in l
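The pattern-growth idea named in this abstract can be illustrated with a small sketch. The code below is a simplified, PrefixSpan-style miner written for illustration only: it assumes every sequence element is a single item (not an itemset), and the names `prefixspan`, `grow` and `min_support` are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for all frequent sequential patterns.

    Simplifying assumption: each sequence is a list of single items.
    """
    results = {}

    def grow(prefix, projected):
        # `projected` holds (sequence index, start offset) pairs, i.e. the
        # suffixes that remain after matching `prefix` in each sequence.
        supporters = defaultdict(set)
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                supporters[seq[pos]].add(sid)

        for item, sids in sorted(supporters.items()):
            if len(sids) < min_support:
                continue  # infrequent extension: prune this branch
            pattern = prefix + (item,)
            results[pattern] = len(sids)

            # Project on the first occurrence of `item` in each suffix.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            grow(pattern, new_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


db = [["a", "b", "c"], ["a", "c", "b"], ["a", "b"]]
patterns = prefixspan(db, min_support=2)
```

With this toy database and `min_support=2`, the miner grows each frequent prefix only inside its projected suffixes, so no candidate is generated that does not occur in the data.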
2.
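To overcome the shortcomings of traditional questionnaire surveys in studying product function usage, such as limited sample size and weak targeting, a method for analyzing product function usage based on Web semantic mining is proposed. A related-word table is built with a manually corrected HowNet-based method; a product usage information system is then developed and a quantitative research model of product functions is constructed to analyze product function usage. The systematic method is validated on a particular mobile phone, providing, for product function decision-making, ...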
3.
4.
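Traditional research on activity semantics recognition focuses on extracting human activity semantics from the spatial information of spatio-temporal trajectories and under-exploits the temporal characteristics of trajectory data. Taking both temporal and spatial features into account, this paper proposes an activity semantics recognition method based on periodic pattern mining. First, the separated activity trajectory data are grouped into trajectory clusters by density clustering on spatial distance. Then, an individual's visiting period for a specific location is mined from the temporal features of the trajectory clusters; based on this visiting period, combined with features such as the stay duration at that location and the distribution of nearby points of interest, a classification model is built to recognize the individual's activity semantics. Experiments on check-in data and simulated data show that the method with periodic features improves recognition accuracy by more than 20% over the same experiments without periodic features, and by more than 10% over other recognition methods on two identical check-in datasets.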
5.
Rong Zhao 《Pattern recognition》2002,35(3):593-600
In this paper, we present the results of a project that seeks to transform low-level features to a higher level of meaning. This project concerns a technique, latent semantic indexing (LSI), in conjunction with normalization and term weighting, which have been used for full-text retrieval for many years. In this environment, LSI determines clusters of co-occurring keywords, sometimes called concepts, so that a query which uses a particular keyword can then retrieve documents perhaps not containing this keyword, but containing other keywords from the same cluster. In this paper, we examine the use of this technique for content-based image retrieval, using two different approaches to image feature representation. We also study the integration of visual features and textual keywords, and the results show that it can help improve the retrieval performance significantly.
6.
Vineet Chaoji Mohammad Al Hasan Saeed Salem Mohammed J. Zaki 《Data mining and knowledge discovery》2008,17(3):457-495
Frequent pattern mining (FPM) is an important data mining paradigm to extract informative patterns like itemsets, sequences,
trees, and graphs. However, no practical framework for integrating the FPM tasks has been attempted. In this paper, we describe
the design and implementation of the Data Mining Template Library (DMTL) for FPM. DMTL utilizes a generic data mining approach, where all aspects of mining are controlled via a set of properties. It uses a novel pattern property hierarchy to define and mine different pattern types. This property hierarchy can be thought of as a systematic characterization of
the pattern space, i.e., a meta-pattern specification that allows the analyst to specify new pattern types, by extending this
hierarchy. Furthermore, in DMTL all aspects of mining are controlled by a set of different mining properties. For example, the kind of mining approach to use, the kind of data types and formats to mine over, the kind of back-end storage
manager to use, are all specified as a list of properties. This provides tremendous flexibility to customize the toolkit for
various applications. Flexibility of the toolkit is exemplified by the ease with which support for a new pattern can be added.
Experiments on synthetic and public datasets are conducted to demonstrate the scalability provided by the persistent back-end
in the library. DMTL has been publicly released as open-source software (), and has been downloaded by numerous researchers from all over the world.
7.
Improving the quality of image data through noise filtering has attracted attention for a long time. To date, many studies have been devoted to filtering the noise inside an image, while few of them focus on filtering the instance-level noise among normal images. In this paper, aiming at providing a noise filter for bag-of-features images, (1) we first propose to utilize the cosine interesting pattern to construct the noise filter; (2) then we prove that filtering noise only requires mining the shortest cosine interesting patterns, which dramatically simplifies the mining process; (3) we present an in-breadth pruning technique to further speed up the mining process. Experimental results on two real-life image datasets demonstrate the effectiveness and efficiency of our noise filtering method.
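The abstract does not define the cosine measure it relies on. As a hedged illustration, the sketch below assumes the commonly used generalization of cosine similarity to itemsets, namely the support of the itemset divided by the geometric mean of its items' individual supports. The function names `support` and `cosine`, and the toy transactions (each standing in for one image's bag of visual words), are assumptions made for this example and are not taken from the paper.

```python
import math


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def cosine(itemset, transactions):
    """Assumed definition: supp(X) / (product of single-item supports)^(1/|X|)."""
    items = list(itemset)
    joint = support(items, transactions)
    singles = [support([i], transactions) for i in items]
    denom = math.prod(singles) ** (1.0 / len(items))
    return joint / denom if denom > 0 else 0.0


# Each transaction is a hypothetical set of visual words for one image.
images = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
score = cosine(["a", "b"], images)
```

Under this assumed definition a pattern is "cosine interesting" when its score reaches a chosen threshold; how that threshold is set, and how the shortest such patterns drive the filter, is described in the paper and not reproduced here.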
8.
A multi-step recognition process is developed for extracting compound forest cover information from manually produced scanned historical topographic maps of the 19th century. This information is a unique data source for GIS-based land cover change modeling. Based on salient features in the image the steps to be carried out are character recognition, line detection and structural analysis of forest symbols. Semantic expansion implying the meanings of objects is applied for final forest cover extraction. The procedure resulted in high accuracies of 94% indicating a potential for automatic and robust extraction of forest cover from larger areas.
9.
10.
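A data mining algorithm based on a hierarchical neural network model Total citations: 1, self-citations: 0, citations by others: 1
The data mining process for building pattern recognition of strip flatness defects is introduced. To address the low recognition accuracy of ordinary neural networks, a new data mining method based on hierarchical neural networks is proposed. The method adopts a binary-tree structure, refines the prediction range layer by layer, and applies multiple neural networks recursively. Experimental results show that the hierarchical neural network model achieves considerably higher prediction accuracy than the ordinary neural network model and fully meets the needs of actual production.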
11.
When computationally feasible, mining huge databases produces tremendously large numbers of frequent patterns. In many cases,
it is impractical to mine those datasets due to their sheer size; not only the extent of the existing patterns, but mainly
the magnitude of the search space. Many approaches have suggested the use of constraints to apply to the patterns or searching
for frequent patterns in parallel. So far, those approaches are still not genuinely effective to mine extremely large datasets.
We propose a method that combines both strategies efficiently, i.e. mining in parallel for the set of patterns while pushing
constraints. Using this approach we could mine significantly large datasets, with sizes never reported in the literature before.
We are able to effectively discover frequent patterns in a database made of billion transactions using a 32-processor cluster
in less than an hour and a half.
Recommended by: Ahmed Elmagarmid
12.
As a core area in data mining, frequent pattern (or itemset) mining has been studied for a long time. Weighted frequent pattern mining prunes unimportant patterns and maximal frequent pattern mining discovers compact frequent patterns. These approaches contribute to improving mining performance by reducing the search space. However, we need to consider both the downward closure property and patterns' subset checking process when integrating these different methods in order to prevent unintended pattern losses. Moreover, it is also essential to extract valid patterns with faster runtime and less memory consumption. For this reason, in this paper, we propose more efficient maximal weighted frequent pattern (MWFP) mining approaches based on tree and array structures. We describe how to handle these problems more efficiently, maintaining the correctness of our method. We develop two types of maximal weighted frequent mining algorithms based on weight ascending order and support descending order and compare these two algorithms to conclude which is more suitable for MWFP mining. In addition, comprehensive tests in this paper show that our algorithms are more efficient and scalable than state‐of‐the‐art algorithms, and they also have the correctness of the MWFP mining in terms of their pattern generation results.
13.
XML plays an important role as the standard language for representing structured data for the traditional Web, and hence many
Web-based knowledge management repositories store data and documents in XML. If semantics about the data are formally represented
in an ontology, then it is possible to extract knowledge: This is done as ontology definitions and axioms are applied to XML
data to automatically infer knowledge that is not explicitly represented in the repository. Ontologies also play a central
role in realizing the burgeoning vision of the semantic Web, wherein data will be more sharable because their semantics will
be represented in Web-accessible ontologies. In this paper, we demonstrate how an ontology can be used to extract knowledge
from an exemplar XML repository of Shakespeare’s plays. We then implement an architecture for this ontology using de facto
languages of the semantic Web including OWL and RuleML, thus preparing the ontology for use in data sharing. It has been predicted
that the early adopters of the semantic Web will develop ontologies that leverage XML, provide intra-organizational value
such as knowledge extraction capabilities that are irrespective of the semantic Web, and have the potential for inter-organizational
data sharing over the semantic Web. The contribution of our proof-of-concept application, KROX, is that it serves as a blueprint
for other ontology developers who believe that the growth of the semantic Web will unfold in this manner.
Henry M. Kim
14.
15.
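In today's big data era, big data processing frameworks such as MapReduce have limited processing capability and are often slow and inefficient on graph data, a typical example being the 3-clique counting problem; an efficient algorithm for this class of clique counting problems is therefore needed. Since the 3-clique counting problem has been studied in depth in earlier literature, this work investigates its extension, the 4-clique counting problem. Starting from a heuristic idea, a probabilistic sampling algorithm based on neighboring-edge sampling is proposed, and the Chernoff inequality is used to prove that, in the approximate setting, the algorithm needs only a certain number of samplers to guarantee the relative error. Experimental comparison shows that, relative to traditional exact algorithms, the probabilistic sampling algorithm loses a small amount of accuracy but has a large advantage in running time and space usage. It is concluded that the algorithm has great practical value in real applications.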
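To make the 4-clique counting problem concrete, the sketch below shows a simple exact counter of the kind a sampling algorithm would be compared against. It is not the neighboring-edge sampling algorithm from the paper, whose details the abstract does not give; the function name `count_4_cliques` and the edge-list input format are assumptions for illustration.

```python
from itertools import combinations


def count_4_cliques(edges):
    """Exactly count 4-cliques in an undirected graph given as an edge list.

    Node labels must be mutually comparable (e.g. all ints).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    count = 0
    for u in adj:
        for v in adj[u]:
            if not u < v:
                continue  # visit each edge once, with u as the smaller node
            # Common neighbours larger than v, so that each clique
            # {a < b < c < d} is counted only for u=a, v=b.
            common = sorted(w for w in adj[u] & adj[v] if w > v)
            for w, x in combinations(common, 2):
                if x in adj[w]:
                    count += 1
    return count


k5_edges = list(combinations(range(5), 2))  # complete graph on 5 nodes
n_cliques = count_4_cliques(k5_edges)
```

The exact counter enumerates pairs of common neighbours for every edge, which is the cost a sampling estimator trades for a small loss of accuracy.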
16.
Chang-Hwan Lee 《Applied Intelligence》2007,26(3):231-242
Sequential pattern mining is an important data mining problem with broad applications. While the current methods are inducing
sequential patterns within a single attribute, the proposed method is able to detect them among different attributes. By incorporating
the additional attributes, the sequential patterns found are richer and more informative to the user. This paper proposes
a new method for inducing multi-dimensional sequential patterns with the use of Hellinger entropy measure. A number of theorems
are proposed to reduce the computational complexity of the sequential pattern systems. The proposed method is tested on some
synthesized transaction databases.
Dr. Chang-Hwan Lee has been a full professor at the Department of Information and Communications at DongGuk University, Seoul, Korea since 1996. He
received his B.Sc. and M.Sc. in Computer Science and Statistics from Seoul National University in 1982 and 1988, respectively.
He received his Ph.D. in Computer Science and Engineering from the University of Connecticut in 1994. Prior to joining DongGuk
University in Korea, he worked for AT&T Bell Laboratories, Middletown, USA (1994-1995). He was also a visiting professor
at the University of Illinois at Urbana-Champaign (2000-2001). He is author or co-author of more than 50 refereed articles
on topics such as machine learning, data mining, artificial intelligence, pattern recognition, and bioinformatics.
17.
The present paper reviews the techniques for automated extraction of information from signals. The techniques may be classified broadly into two categories: the conventional pattern recognition approach and the artificial intelligence (AI) based approach. The conventional approach comprises two methodologies: statistical and structural. The paper reviews salient issues in the application of conventional techniques for extraction of information. The systems that use the artificial intelligence approach are characterized with respect to three key properties. The basic differences between the approaches and the computational aspects are reviewed. Current trends in the use of the AI approach are indicated. Some key ideas in current literature are reviewed.
18.
Stemming is the basic operation in natural language processing (NLP) to remove derivational and inflectional affixes without performing a morphological analysis. This practice is essential to extract the root or stem. In NLP domains, the stemmer is used to improve the process of information retrieval (IR), text classification (TC), text mining (TM) and related applications. In particular, Urdu stemmers utilize only unigram words from the input text, ignoring bigrams, trigrams, and n-gram words. To improve the process and efficiency of stemming, bigram and trigram words must be included. Despite this fact, few methods for Urdu stemming have been developed in past studies. Therefore, in this paper, we propose an improved Urdu stemmer, using a hybrid approach divided into multi-step operations, to deal with unigram, bigram, and trigram features as well. To evaluate the proposed Urdu stemming method, we have used two corpora: a word corpus and a text corpus. Moreover, two different evaluation metrics have been applied to measure the performance of the proposed algorithm. The proposed algorithm achieved an accuracy of 92.97% and a compression rate of 55%. These experimental results indicate that the proposed system can be used to increase the effectiveness and efficiency of the Urdu stemmer for better information retrieval and text mining applications.
19.
In this paper, the problem of mining complex temporal patterns in the context of multivariate time series is considered. A new method called the Fast Temporal Pattern Mining with Extended Vertical Lists is introduced. The method is based on an extension of the level‐wise property, which requires a more complex pattern to start at positions within a record where all of the subpatterns of the pattern start. The approach is built around a novel data structure called the Extended Vertical List that tracks positions of the first state of the pattern inside records and links them to appropriate positions of a specific subpattern of the pattern called the prefix. Extensive computational results indicate that the new method performs significantly faster than the previous version of the algorithm for Temporal Pattern Mining; however, the increase in speed comes at the expense of increased memory usage.
20.
Successive stages can be distinguished in the development of the human visual system's ability to use and recognize signs. The stages involve perception of parts of objects, of whole objects, of several objects, and of their interrelations. The system of signs described in this paper was developed through experimental investigations of visual perception in adults, children, and mentally ill or brain-damaged persons.