A total of 20 similar documents were found (search time: 0 ms).
1.
To overcome the limitations of traditional questionnaire surveys for studying product function usage, such as restricted sample sizes and poorly targeted respondents, a product function usage analysis method based on Web semantic mining is proposed. A related-term vocabulary was built using a manually corrected HowNet-based method, a product usage information system was then developed, and a quantitative model of product functions was constructed to analyze function usage. The overall method was validated on a specific mobile phone model, providing support for product function decisions...
2.
3.
Rong Zhao 《Pattern recognition》2002,35(3):593-600
In this paper, we present the results of a project that seeks to transform low-level features into a higher level of meaning. This project concerns latent semantic indexing (LSI), in conjunction with normalization and term weighting, a technique that has been used for full-text retrieval for many years. In this setting, LSI determines clusters of co-occurring keywords, sometimes called concepts, so that a query using a particular keyword can retrieve documents that do not contain this keyword but do contain other keywords from the same cluster. In this paper, we examine the use of this technique for content-based image retrieval, using two different approaches to image feature representation. We also study the integration of visual features and textual keywords, and the results show that this integration can significantly improve retrieval performance.
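Mechanically, the LSI step described above comes down to a truncated SVD of a weighted, normalized term-by-image matrix; the sketch below illustrates only that mechanism. The toy matrix, the weighting scheme, and the rank k = 2 are illustrative assumptions, not the paper's actual features or parameters.

```python
# Minimal LSI sketch: rows = keywords (or quantised visual features), columns = images.
import numpy as np

A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])
A = A / np.linalg.norm(A, axis=0, keepdims=True)   # simple column normalisation (assumed scheme)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                              # retained latent "concepts" (illustrative)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

docs_k = Vtk.T                          # each row: an image in the k-dimensional concept space
q = np.array([1.0, 0.0, 0.0, 0.0])      # query uses only the first keyword
q_k = (Uk.T @ q) / sk                   # fold the query into the same concept space
scores = docs_k @ q_k / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k) + 1e-12)
print(np.argsort(-scores))              # images ranked by latent-concept similarity
```

Because matching happens in concept space, an image indexed only by co-occurring keywords can still rank highly for the query, which is the behaviour the abstract describes.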
4.
Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, Mohammed J. Zaki 《Data mining and knowledge discovery》2008,17(3):457-495
Frequent pattern mining (FPM) is an important data mining paradigm for extracting informative patterns such as itemsets, sequences, trees, and graphs. However, no practical framework for integrating the FPM tasks has been attempted. In this paper, we describe the design and implementation of the Data Mining Template Library (DMTL) for FPM. DMTL adopts a generic data mining approach in which all aspects of mining are controlled via a set of properties. It uses a novel pattern property hierarchy to define and mine different pattern types. This property hierarchy can be thought of as a systematic characterization of the pattern space, i.e., a meta-pattern specification that allows the analyst to specify new pattern types by extending the hierarchy. Furthermore, in DMTL all aspects of mining are controlled by a set of mining properties; for example, the kind of mining approach to use, the kind of data types and formats to mine over, and the kind of back-end storage manager to use are all specified as a list of properties. This provides tremendous flexibility to customize the toolkit for various applications, exemplified by the ease with which support for a new pattern type can be added. Experiments on synthetic and public datasets are conducted to demonstrate the scalability provided by the persistent back-end in the library. DMTL has been publicly released as open-source software (), and has been downloaded by numerous researchers from all over the world.
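DMTL itself is a C++ template library; the Python sketch below only illustrates the property-driven design idea described in the abstract, where a set of properties (pattern type, minimum support, back-end, ...) selects the concrete mining behaviour. The function and property names are invented for illustration, not DMTL's API.

```python
# Property-driven dispatch sketch: properties pick the miner; the miner itself is a
# plain brute-force frequent-itemset enumerator over toy data.
from itertools import combinations

def mine_itemsets(db, minsup):
    """Enumerate frequent itemsets level by level (fine for tiny toy databases)."""
    items = sorted({i for t in db for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            sup = sum(1 for t in db if set(cand) <= t)
            if sup >= minsup:
                result[cand] = sup
                found = True
        if not found:          # Apriori property: no frequent k-itemset => stop
            break
    return result

MINERS = {"itemset": mine_itemsets}   # "sequence", "tree", "graph" miners would plug in here

def mine(db, properties):
    miner = MINERS[properties["pattern-type"]]   # property hierarchy -> concrete miner
    return miner(db, properties["minsup"])

db = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
props = {"pattern-type": "itemset", "minsup": 2, "backend": "memory"}
print(mine(db, props))
```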
5.
Improving the quality of image data through noise filtering has long received attention. To date, many studies have been devoted to filtering the noise inside an image, while few focus on filtering instance-level noise among normal images. In this paper, aiming to provide a noise filter for bag-of-features images, (1) we first propose to use cosine interesting patterns to construct the noise filter; (2) we then prove that filtering noise only requires mining the shortest cosine interesting patterns, which dramatically simplifies the mining process; and (3) we present an in-breadth pruning technique to further speed up the mining process. Experimental results on two real-life image datasets demonstrate the effectiveness and efficiency of our noise filtering method.
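The abstract does not define the cosine measure or the exact filtering rule, so the sketch below uses one common form of the cosine interestingness measure on pairs of visual words, sup(AB) / sqrt(sup(A) * sup(B)), and a deliberately simplified rule: a visual word joining no interesting pair is treated as instance-level noise. Treat both choices as assumptions.

```python
from itertools import combinations
from math import sqrt

def cosine_pairs(bags, min_cos=0.5):
    """Cosine-interesting 2-itemsets of visual words across bag-of-features images."""
    sup, pair_sup = {}, {}
    for bag in bags:
        for w in bag:
            sup[w] = sup.get(w, 0) + 1
        for a, b in combinations(sorted(bag), 2):
            pair_sup[(a, b)] = pair_sup.get((a, b), 0) + 1
    return {p: c / sqrt(sup[p[0]] * sup[p[1]])
            for p, c in pair_sup.items()
            if c / sqrt(sup[p[0]] * sup[p[1]]) >= min_cos}

bags = [{"w1", "w2", "w5"}, {"w1", "w2"}, {"w1", "w2", "w3"}, {"w4"}]
interesting = cosine_pairs(bags)
kept = {w for pair in interesting for w in pair}
noise = {w for bag in bags for w in bag} - kept      # words never in an interesting pair
print(interesting, noise)
```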
6.
7.
A multi-step recognition process is developed for extracting compound forest cover information from manually produced, scanned historical topographic maps of the 19th century. This information is a unique data source for GIS-based land cover change modeling. Based on salient features in the image, the steps carried out are character recognition, line detection, and structural analysis of forest symbols. Semantic expansion, which infers the meanings of objects, is applied for final forest cover extraction. The procedure achieved a high accuracy of 94%, indicating potential for automatic and robust extraction of forest cover over larger areas.
8.
A Data Mining Algorithm Based on a Hierarchical Neural Network Model
This paper describes the data mining process for recognizing shape defect patterns in steel strip. To address the relatively low recognition accuracy of ordinary neural networks, a new data mining method based on a hierarchical neural network is proposed. The method adopts a binary-tree structure, narrowing the prediction range layer by layer and applying multiple neural networks recursively. Experimental results show that the hierarchical neural network model achieves considerably higher prediction accuracy than an ordinary neural network model and fully meets the needs of actual production.
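A minimal sketch of the binary-tree idea only: each node trains a small network that merely decides which half of the remaining defect classes a sample belongs to, so the prediction range narrows at every level. The synthetic 2-D data, the use of scikit-learn's MLPClassifier, and all sizes are assumptions, not the paper's strip-shape data or network design.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def make_node(X, y, classes):
    """Recursively build a binary tree of classifiers over the class list."""
    if len(classes) == 1:
        return classes[0]
    left, right = classes[: len(classes) // 2], classes[len(classes) // 2:]
    side = np.isin(y, right).astype(int)          # 0 -> left half, 1 -> right half
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    clf.fit(X, side)
    keep_l, keep_r = np.isin(y, left), np.isin(y, right)
    return (clf,
            make_node(X[keep_l], y[keep_l], left),
            make_node(X[keep_r], y[keep_r], right))

def predict_one(node, x):
    while not isinstance(node, (int, np.integer)):
        clf, left, right = node
        node = right if clf.predict(x.reshape(1, -1))[0] == 1 else left
    return int(node)

# Toy data: 4 "defect classes" as separated clusters in a 2-D feature space.
X = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in [(0, 0), (0, 3), (3, 0), (3, 3)]])
y = np.repeat(np.arange(4), 40)
tree = make_node(X, y, list(range(4)))
print([predict_one(tree, x) for x in X[::40]])    # one sample per class
```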
9.
When computationally feasible, mining huge databases produces tremendously large numbers of frequent patterns. In many cases, it is impractical to mine such datasets due to their sheer size; not only the number of existing patterns, but mainly the magnitude of the search space. Many approaches have suggested applying constraints to the patterns or searching for frequent patterns in parallel. So far, those approaches are still not genuinely effective for mining extremely large datasets. We propose a method that combines both strategies efficiently, i.e., mining in parallel for the set of patterns while pushing constraints. Using this approach we could mine significantly larger datasets, with sizes never reported in the literature before. We are able to effectively discover frequent patterns in a database of a billion transactions using a 32-processor cluster in less than an hour and a half.
Recommended by: Ahmed Elmagarmid
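The two ingredients the abstract combines can be illustrated generically: counting candidate supports over database partitions in parallel, and pushing an anti-monotone constraint (here, a simple cap on pattern length) into the level-wise search. This is a plain count-distribution Apriori sketch with invented names and toy data, not the authors' cluster-scale system.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations
from collections import Counter

def count_partition(args):
    """Count candidate supports within one database partition."""
    partition, candidates = args
    c = Counter()
    for t in partition:
        t = set(t)
        for cand in candidates:
            if set(cand) <= t:
                c[cand] += 1
    return c

def parallel_apriori(partitions, minsup, max_len=3, workers=2):
    items = sorted({i for p in partitions for t in p for i in t})
    level, frequent = [(i,) for i in items], {}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        k = 1
        while level and k <= max_len:                 # length constraint pushed into the search
            total = Counter()
            for c in pool.map(count_partition, [(p, level) for p in partitions]):
                total.update(c)                       # merge partial counts
            freq_k = {cand: s for cand, s in total.items() if s >= minsup}
            frequent.update(freq_k)
            prev = sorted(freq_k)
            level = sorted({tuple(sorted(set(a) | set(b)))
                            for a, b in combinations(prev, 2)
                            if len(set(a) | set(b)) == k + 1})
            k += 1
    return frequent

if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "d"], ["b", "c"]]
    partitions = [db[:2], db[2:]]                     # stand-in for data spread over a cluster
    print(parallel_apriori(partitions, minsup=2))
```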
10.
11.
Chang-Hwan Lee 《Applied Intelligence》2007,26(3):231-242
Sequential pattern mining is an important data mining problem with broad applications. While current methods induce sequential patterns within a single attribute, the proposed method is able to detect them across different attributes. By incorporating the additional attributes, the sequential patterns found are richer and more informative to the user. This paper proposes a new method for inducing multi-dimensional sequential patterns using the Hellinger entropy measure. A number of theorems are proposed to reduce the computational complexity of sequential pattern systems. The proposed method is tested on synthesized transaction databases.
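The abstract does not spell out Lee's exact Hellinger-entropy formulation, so the snippet below only shows the standard Hellinger distance between two discrete distributions, used here (as an assumption) to score how much conditioning on a candidate pattern shifts the distribution of the next event away from the baseline.

```python
from math import sqrt

def hellinger(p, q):
    """H(P, Q) = (1/sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2)."""
    return sqrt(sum((sqrt(pi) - sqrt(qi)) ** 2 for pi, qi in zip(p, q))) / sqrt(2)

baseline      = [0.50, 0.30, 0.20]   # P(next event) over the whole sequence database (toy)
given_pattern = [0.10, 0.70, 0.20]   # P(next event | candidate pattern observed) (toy)
print(round(hellinger(baseline, given_pattern), 3))   # larger value -> more informative pattern
```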
Dr. Chang-Hwan Lee has been a full professor in the Department of Information and Communications at DongGuk University, Seoul, Korea, since 1996. He received his B.Sc. and M.Sc. in Computer Science and Statistics from Seoul National University in 1982 and 1988, respectively, and his Ph.D. in Computer Science and Engineering from the University of Connecticut in 1994. Prior to joining DongGuk University, he worked for AT&T Bell Laboratories, Middletown, USA (1994-1995). He was also a visiting professor at the University of Illinois at Urbana-Champaign (2000-2001). He is author or co-author of more than 50 refereed articles on topics such as machine learning, data mining, artificial intelligence, pattern recognition, and bioinformatics.
12.
The present paper reviews the techniques for automated extraction of information from signals. The techniques may be classified broadly into two categories: the conventional pattern recognition approach and the artificial intelligence (AI) based approach. The conventional approach comprises two methodologies: statistical and structural. The paper reviews salient issues in the application of conventional techniques for extraction of information. The systems that use the artificial intelligence approach are characterized with respect to three key properties. The basic differences between the approaches and the computational aspects are reviewed. Current trends in the use of the AI approach are indicated. Some key ideas in current literature are reviewed.
13.
Stemming is a basic operation in natural language processing (NLP) that removes derivational and inflectional affixes without performing a morphological analysis. This practice is essential for extracting the root or stem. In NLP, stemmers are used to improve information retrieval (IR), text classification (TC), text mining (TM), and related applications. Existing Urdu stemmers, in particular, use only unigram words from the input text, ignoring bigram, trigram, and higher n-gram words. To improve the stemming process and its efficiency, bigram and trigram words must be included; despite this, only a few Urdu stemming methods have been developed in past studies. Therefore, in this paper we propose an improved Urdu stemmer that uses a hybrid, multi-step approach to handle unigram, bigram, and trigram features. To evaluate the proposed Urdu stemming method, we used two corpora: a word corpus and a text corpus. Moreover, two different evaluation metrics were applied to measure the performance of the proposed algorithm, which achieved an accuracy of 92.97% and a compression rate of 55%. These experimental results indicate that the proposed system can increase the effectiveness and efficiency of Urdu stemming for better information retrieval and text mining applications.
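A rough sketch of the multi-step, n-gram-aware idea only: look a token up in an exception list first, otherwise strip affixes, and stem bigrams or trigrams constituent by constituent. The exception list, affix lists, and example words below are Latin placeholders for illustration, not the paper's Urdu rules or data.

```python
EXCEPTIONS = {"went": "go"}        # step 1: lookup list (hypothetical entries)
PREFIXES = ("un",)                 # step 2: affix lists (hypothetical entries)
SUFFIXES = ("ings", "ing", "ed", "s")

def stem_token(token):
    if token in EXCEPTIONS:
        return EXCEPTIONS[token]
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 3:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 3:
            return token[: -len(s)]
    return token

def stem_ngram(ngram):
    """Step 3: bigrams and trigrams are stemmed constituent-wise."""
    return " ".join(stem_token(t) for t in ngram.split())

print(stem_ngram("unlocking recorded songs"))   # -> "lock record song"
```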
14.
Successive stages can be distinguished in the development of the human visual system's ability to use and recognize signs. The stages involve perception of parts of objects, of whole objects, of several objects, and of their interrelations. The system of signs described in this paper was developed through experimental investigations of visual perception in adults, children, and mentally ill or brain-damaged persons. 相似文献
15.
BTopicMiner: A Domain-Specific Hot Topic Mining System for Chinese Microblogs
With the rapid growth of microblogging applications, automatically extracting hot topics of interest to users from massive microblog data has become a challenging research problem. To this end, a hot topic extraction algorithm for Chinese microblogs based on an extended topic model is studied and proposed. To address the data sparsity inherent in microblog messages, the algorithm first uses text clustering to merge content-related microblog messages into microblog documents. Based on the assumption that the reply relations between microblogs imply topical relatedness, the algorithm extends the traditional latent Dirichlet allocation (LDA) topic model to model these reply relations. Finally, mutual information (MI) is used to compute the topic words of the extracted topics for hot topic recommendation. To verify the effectiveness of the extended topic extraction model, a prototype system for domain-specific hot topic mining from Chinese microblogs, BTopicMiner, was implemented. Experimental results show that the extended topic model based on microblog reply relations can automatically extract hot topics more accurately, and that the semantic similarity between the topic words computed automatically with the MI measure and manually selected hot words exceeds 75%.
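The abstract does not give the exact MI formulation used to pick topic words, so the sketch below scores candidate words by pointwise mutual information between "word occurs" and "message belongs to the topic", with toy English tokens standing in for Chinese microblog text. Treat the formulation and data as illustrative assumptions.

```python
from math import log

def pmi(word, topic_docs, all_docs):
    """Pointwise mutual information between a word and membership in one topic."""
    n = len(all_docs)
    p_w = sum(word in d for d in all_docs) / n
    p_t = len(topic_docs) / n
    p_wt = sum(word in d for d in topic_docs) / n
    return log(p_wt / (p_w * p_t)) if p_wt > 0 else float("-inf")

all_docs = [{"price", "phone"}, {"price", "rise"}, {"phone", "battery"}, {"rise", "oil"}]
topic_docs = [all_docs[1], all_docs[3]]            # messages assigned to one extracted topic
for w in ["rise", "phone"]:
    print(w, round(pmi(w, topic_docs, all_docs), 3))   # higher PMI -> better topic word
```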
16.
Carson Kai-Sang Leung, Quamrul I. Khan, Zhan Li, Tariqul Hoque 《Knowledge and Information Systems》2007,11(3):287-311
Since its introduction, frequent-pattern mining has been the subject of numerous studies, including incremental updating. Many existing incremental mining algorithms are Apriori-based and are not easily adaptable to FP-tree-based frequent-pattern mining. In this paper, we propose a novel tree structure, called CanTree (canonical-order tree), that captures the content of the transaction database and orders tree nodes according to some canonical order. By exploiting its nice properties, the CanTree can be easily maintained when database transactions are inserted, deleted, and/or modified. For example, the CanTree does not require adjustment, merging, and/or splitting of tree nodes during maintenance. No rescan of the entire updated database or reconstruction of a new tree is needed for incremental updating. Experimental results show the effectiveness of our CanTree in the incremental mining of frequent patterns. Moreover, the applicability of CanTrees is not confined to incremental mining; CanTrees can also be applied to other frequent-pattern mining tasks, including constrained mining and interactive mining.
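A simplified sketch of the CanTree idea as the abstract describes it: transactions are inserted into a prefix tree with items always arranged in one fixed canonical order (lexicographic here), so adding or removing a transaction only adjusts counts along a single path, with no node merging, splitting, or database rescan. The class names and the count-decrement deletion are simplifications, not the authors' implementation.

```python
class CanTreeNode:
    def __init__(self):
        self.count = 0
        self.children = {}

class CanTree:
    def __init__(self):
        self.root = CanTreeNode()

    def _path(self, transaction):
        return sorted(set(transaction))            # canonical (lexicographic) item order

    def insert(self, transaction, delta=1):
        node = self.root
        for item in self._path(transaction):
            node = node.children.setdefault(item, CanTreeNode())
            node.count += delta

    def delete(self, transaction):
        self.insert(transaction, delta=-1)         # incremental update, no rescan

tree = CanTree()
for t in [["b", "a", "c"], ["a", "c"], ["c", "b"]]:
    tree.insert(t)
tree.delete(["a", "c"])
print(tree.root.children["a"].count)               # "a" now lies on one remaining transaction path
```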
Carson K.-S. Leung received his B.Sc. (Honours), M.Sc., and Ph.D. degrees, all in computer science, from the University of British Columbia, Canada. Currently, he is an Assistant Professor at the University of Manitoba, Canada. His research interests include the areas of databases, data mining, and data warehousing. His work has been published in refereed journals and conferences such as ACM Transactions on Database Systems (TODS), the IEEE International Conference on Data Engineering (ICDE), and the IEEE International Conference on Data Mining (ICDM).
Quamrul I. Khan received his B.Sc. degree in computer science from North South University, Bangladesh, in 2001. He then worked as a Test Engineer and a Software Engineer for a few years before starting his current M.Sc. degree program in computer science at the University of Manitoba under the academic supervision of Dr. C. K.-S. Leung.
Zhan Li received her B.Eng. degree in computer engineering from Harbin Engineering University, China, in 2002. Currently, she is pursuing her M.Sc. degree in computer science at the University of Manitoba under the academic supervision of Dr. C. K.-S. Leung.
Tariqul Hoque received his B.Sc. degree in computer science from North South University, Bangladesh, in 2001. Currently, he is pursuing his M.Sc. degree in computer science at the University of Manitoba under the academic supervision of Dr. C. K.-S. Leung.
17.
S. A. Starks, R. J. P. de Figueiredo, D. L. Van Rooy 《International journal of parallel programming》1977,6(1):41-54
A computational algorithm is presented for the extraction of an optimal single linear feature from several Gaussian pattern classes. The algorithm minimizes the increase in the probability of misclassification in the transformed (feature) space. The general approach used in this procedure was developed in a recent paper by R. J. P. de Figueiredo (1). Numerical results on the application of this procedure to remotely sensed data from the Purdue C1 flight line as well as Landsat data are presented. It was found that classification using the optimal single linear feature yielded a probability of misclassification on the order of 30% less than that obtained using the best single untransformed feature. The optimal single linear feature gave performance results comparable to those obtained using the two features that maximized the average divergence. Also discussed are improvements in classification results using this method when the size of the training set is small. This work was supported by the Air Force Office of Scientific Research under Grant 75-2777 and by the National Aeronautics and Space Administration under contract NAS 9-12776.
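The paper's optimization (minimizing the increase in misclassification probability) is not reproduced here; as a related classical baseline, the sketch below computes Fisher's linear discriminant direction, a single linear feature w = Sw^{-1}(mu1 - mu0) for two Gaussian classes. The class parameters are toy values, and this is explicitly not de Figueiredo's algorithm.

```python
import numpy as np

mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
S0 = np.array([[1.0, 0.2], [0.2, 1.0]])
S1 = np.array([[1.0, -0.1], [-0.1, 0.8]])

Sw = S0 + S1                                  # within-class scatter
w = np.linalg.solve(Sw, mu1 - mu0)            # direction of the single linear feature
w /= np.linalg.norm(w)

x = np.array([1.5, 0.4])                      # project a sample onto the 1-D feature
print(w, float(w @ x))
```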
18.
Motivated by a growing need for intelligent housing to accommodate ageing populations, we propose a novel application of intertransaction association rule (IAR) mining to detect anomalous behaviour in smart home occupants. We detail an efficient mining algorithm that avoids the candidate generation bottleneck limiting the application of current IAR mining algorithms to smart home data sets. An original visual interface for exploring new and changing behaviours, distilled from discovered patterns using a new process for finding emergent rules, is presented. Finally, we discuss our observations on the emergent behaviours detected in the homes of two real-world subjects.
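A toy illustration of the intertransaction idea only: events from nearby transactions (time slots) are tagged with their offset inside a sliding window, so a pattern such as "kettle at t, toaster at t+1" becomes an ordinary co-occurrence count. This naive enumeration is exactly the candidate-heavy style the authors' algorithm avoids, so treat it as background, not their method.

```python
from collections import Counter
from itertools import combinations

def intertransaction_pairs(slots, window=2, minsup=2):
    counts = Counter()
    for start in range(len(slots) - window + 1):
        extended = {(offset, e)                      # (relative offset, event) pairs
                    for offset in range(window)
                    for e in slots[start + offset]}
        for pair in combinations(sorted(extended), 2):
            counts[pair] += 1
    return {p: c for p, c in counts.items() if c >= minsup}

# Each inner set = sensor events observed in one time slot of a smart home (toy data).
slots = [{"kettle"}, {"toaster"}, {"kettle"}, {"toaster"}, {"tv"}]
print(intertransaction_pairs(slots))   # frequent cross-slot pattern: kettle at t, toaster at t+1
```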
19.
Mining changing regions from access-constrained snapshots: a cluster-embedded decision tree approach
Change detection on spatial data is important in many applications, such as environmental monitoring. Given a set of snapshots of spatial objects at various temporal instants, a user may want to derive the changing regions between any two snapshots. Most existing methods have to use at least one of the original data sets to detect changing regions. However, in some important applications, original data may not be available for change analysis due to data access constraints such as privacy concerns and limited online availability. In this paper, we tackle the problem by proposing a simple yet effective model-based approach. In the model construction phase, data snapshots are summarized using novel cluster-embedded decision trees as concise models. Once the models are built, the original data snapshots are not accessed anymore. In the change detection phase, to mine the changing regions between any two instants, we compare the two corresponding cluster-embedded decision trees. Our systematic experimental results on both real and synthetic data sets show that our approach can detect changes accurately and effectively.
Irene Pekerskaya's and Jian Pei's research is supported in part by the Natural Sciences and Engineering Research Council of Canada and the National Science Foundation of the US, and by a President's Research Grant and an Endowed Research Fellowship Award at Simon Fraser University. Ke Wang's research is supported in part by the Natural Sciences and Engineering Research Council of Canada. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
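A hedged stand-in for the model-based comparison in entry 19: plain decision trees trained on per-point region labels take the place of the paper's cluster-embedded decision trees, and "changing regions" are simply the grid cells where the two trees disagree. The synthetic snapshots, the probe grid, and the use of scikit-learn are assumptions for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

def summarise(points, region_labels):
    """Concise model of one snapshot; the snapshot itself can be discarded afterwards."""
    return DecisionTreeClassifier(max_depth=3, random_state=0).fit(points, region_labels)

# Two synthetic snapshots: region 1 sits to the right at t1 and on top at t2.
pts1 = np.vstack([rng.normal((1, 1), 0.3, (60, 2)), rng.normal((4, 1), 0.3, (60, 2))])
pts2 = np.vstack([rng.normal((1, 1), 0.3, (60, 2)), rng.normal((1, 4), 0.3, (60, 2))])
labels = np.repeat([0, 1], 60)        # per-point region id (would come from clustering in practice)
m1, m2 = summarise(pts1, labels), summarise(pts2, labels)

# Probe a grid with both models; disagreement marks a changing region, no raw data needed.
xs, ys = np.meshgrid(np.linspace(0, 5, 25), np.linspace(0, 5, 25))
grid = np.column_stack([xs.ravel(), ys.ravel()])
changed = m1.predict(grid) != m2.predict(grid)
print(f"{changed.mean():.0%} of probed cells fall in changing regions")
```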