首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Future trends in data mining   总被引:3,自引:1,他引:3  
Over recent years data mining has been establishing itself as one of the major disciplines in computer science with growing industrial impact. Undoubtedly, research in data mining will continue and even increase over coming decades. In this article, we sketch our vision of the future of data mining. Starting from the classic definition of “data mining”, we elaborate on topics that — in our opinion — will set trends in data mining.  相似文献   

2.
基于Web的数据挖掘技术研究及其在电子商务中的应用   总被引:1,自引:0,他引:1  
基于Web的数据挖掘是一种结合了数据挖掘和互联网系统的热门研究课题.本文首先综述了基于Web的几类数据挖掘技术,包括Web内容挖掘、Web的访问挖掘、Web页面聚类以及用户频繁访问路径发现等技术.在此基础上又着重介绍了Web数据挖掘技术在电子商务中的具体应用.  相似文献   

3.
基于Web的数据挖掘是一种结合了数据挖掘和互联网系统的热门研究课题。本文首先综述了基于Web的几类数据挖掘技术,包括Web内容挖掘、Web的访问挖掘、Web页面聚类以及用户频繁访问路径发现等技术。在此基础上又着重介绍了Web数据挖掘技术在电子商务中的具体应用。  相似文献   

4.
This paper reports on conceptual development in applications of neural networks to data mining and knowledge discovery. Hypothesis generation is one of the significant differences of data mining from statistical analyses. Nonlinear pattern hypothesis generation is a major task of data mining and knowledge discovery. Yet, few methods of nonlinear pattern hypothesis generation are available.

This paper proposes a model of data mining to support nonlinear pattern hypothesis generation. This model is an integration of linear regression analysis model, Kohonen's self-organizing maps, the algorithm for convex polytopes, and back-propagation neural networks.  相似文献   


5.
The medical diagnosis system described here uses underlying knowledge in the isokinetic domain, obtained by combining the expertise of a physician specialised in isokinetic techniques and data mining techniques applied to a set of existing data. An isokinetic machine is basically a physical support on which patients exercise one of their joints, in this case the knee, according to different ranges of movement and at a constant speed. The data on muscle strength supplied by the machine are processed by an expert system that has built-in knowledge elicited from an expert in isokinetics. It cleans and pre-processes the data and conducts an intelligent analysis of the parameters and morphology of the isokinetic curves. Data mining methods based on the discovery of sequential patterns in time series and the fast Fourier transform, which identifies similarities and differences among exercises, were applied to the processed information to characterise injuries and discover reference patterns specific to populations. The results obtained were applied in two environments: one for the blind and another for elite athletes.  相似文献   

6.
Current trends clearly indicate that online learning has become an important learning mode. However, no effective assessment mechanism for learning performance yet exists for e-learning systems. Learning performance assessment aims to evaluate what learners learned during the learning process. Traditional summative evaluation only considers final learning outcomes, without concerning the learning processes of learners. With the evolution of learning technology, the use of learning portfolios in a web-based learning environment can be beneficially adopted to record the procedure of the learning, which evaluates the learning performances of learners and produces feedback information to learners in ways that enhance their learning. Accordingly, this study presents a mobile formative assessment tool using data mining, which involves six computational intelligence theories, i.e. statistic correlation analysis, fuzzy clustering analysis, grey relational analysis, K-means clustering, fuzzy association rule mining and fuzzy inference, in order to identify the key formative assessment rules according to the web-based learning portfolios of an individual learner for the performance promotion of web-based learning. Restated, the proposed method can help teachers to precisely assess the learning performance of individual learner utilizing only the learning portfolios in a web-based learning environment. Hence, teachers can devote themselves to teaching and designing courseware, since they save a lot of time in measuring learning performance. More importantly, teachers can understand the main factors influencing learning performance in a web-based learning environment based on the interpretable learning performance assessment rules obtained. Experimental results indicate that the evaluation results of the proposed scheme are very close to those of summative assessment results and the factor analysis provides simple and clear learning performance assessment rules. Furthermore, the proposed learning feedback with formative assessment can clearly promote the learning performances and interests of learners.  相似文献   

7.
Very little research in knowledge discovery has studied how to incorporate statistical methods to automate linear correlation discovery (LCD). We present an automatic LCD methodology that adopts statistical measurement functions to discover correlations from databases’ attributes. Our methodology automatically pairs attribute groups having potential linear correlations, measures the linear correlation of each pair of attribute groups, and confirms the discovered correlation. The methodology is evaluated in two sets of experiments. The results demonstrate the methodology’s ability to facilitate linear correlation discovery for databases with a large amount of data.  相似文献   

8.
Applying data mining to manufacturing: the nature and implications   总被引:2,自引:1,他引:1  
Recent advances in computers and manufacturing techniques have made it easy to collect and store all kinds of data in manufacturing enterprises. The problem of how to enable engineers and managers to understand large amount of data remains. Traditional data analysis methods are no longer the best alternative to be used. Data Mining (DM) approaches have created new intelligent tools for extracting useful information and knowledge automatically. All these will have a profound impact on current practices in manufacturing. In this paper the nature and implications of DM techniques in manufacturing and their implementations on product design and manufacturing are discussed.  相似文献   

9.
CRISP-DM is the standard to develop Data Mining projects. CRISP-DM proposes processes and tasks that you have to carry out to develop a Data Mining project. A task proposed by CRISP-DM is the cost estimation of the Data Mining project.  相似文献   

10.
The exponential growth of databanks creates opportunities to expand Operational Research. An example is the development of scientific approaches to "mine" intelligently the huge databanks that complex systems rely on for their management. The contribution presents an approach to exploit a health insurance databank to evaluate the performance of cardiovascular surgery nation wide.  相似文献   

11.
Association Rule Mining algorithms operate on a data matrix (e.g., customers products) to derive association rules [AIS93b, SA96]. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the “goodness” of a set of discovered rules. We also propose the “guessing error” as a measure of the “goodness”, that is, the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess missing/hidden values from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can “guess” the amount spent on butter. Thus, unlike association rules, Ratio Rules can perform a variety of important tasks such as forecasting, answering “what-if” scenarios, detecting outliers, and visualizing the data. Moreover, we show that we can compute Ratio Rules in a single pass over the data set with small memory requirements (a few small matrices), in contrast to association rule mining methods which require multiple passes and/or large memory. Experiments on several real data sets (e.g., basketball and baseball statistics, biological data) demonstrate that the proposed method: (a) leads to rules that make sense; (b) can find large itemsets in binary matrices, even in the presence of noise; and (c) consistently achieves a “guessing error” of up to 5 times less than using straightforward column averages. Received: March 15, 1999 / Accepted: November 1, 1999  相似文献   

12.
潜语义标与汉语信息检索研究   总被引:4,自引:0,他引:4  
1 引言典型的传统信息检索系统,如布尔逻辑模型、向量空间模型,根据用户提供的查询条件,依据关键词的匹配或向量空间的相似系数,返回相关查询结果。对于相同的概念,使用不同的词汇表示,如同义词或近义词,或同一词汇在不同的语言环境中拥有不同的语义,即一词多义,因此基于语词匹配的查询方法,其准确性和完整性都不够理想。尽管同义词词典的使用,在一定程度上,提高了信息检索的查全率(recall),但却降低了查询的精度,且在实际应用中,需要不断更新同义词库,才能满足系统不断变化的要求。  相似文献   

13.
This paper presents the insights gained from applying knowledge discovery in databases (KDD) processes for the purpose of developing intelligent models, used to classify a country's investing risk based on a variety of factors. Inferential data mining techniques, like C5.0, as well as intelligent learning techniques, like neural networks, were applied to a dataset of 52 countries. The dataset included 27 variables (economic, stock market performance/risk and regulatory efficiencies) on 52 countries, whose investing risk category was assessed in a Wall Street Journal survey of international experts. The results of applying KDD techniques to the dataset are promising, and successfully classified most countries as compared to the experts' classifications. Implementation details, results, and future plans are also presented.  相似文献   

14.
 The combination of objective measurements and human perceptions using hidden Markov models with particular reference to sequential data mining and knowledge discovery is presented in this paper. Both human preferences and statistical analysis are utilized for verification and identification of hypotheses as well as detection of hidden patterns. As another theoretical view, this work attempts to formalize the complementarity of the computational theories of hidden Markov models and perceptions for providing solutions associated with the manipulation of the internet.  相似文献   

15.
An Overview of Data Mining and Knowledge Discovery   总被引:9,自引:0,他引:9       下载免费PDF全文
With massive amounts of data stored in databases,mining information and knowledge in databases has become an important issue in recent research.Researchers in many different fields have shown great interest in date mining and knowledge discovery in databases.Several emerging applications in information providing services,such as data warehousing and on-line services over the Internet,also call for various data mining and knowledge discovery tchniques to understand used behavior better,to improve the service provided,and to increase the business opportunities.In response to such a demand,this article is to provide a comprehensive survey on the data mining and knowledge discorvery techniques developed recently,and introduce some real application systems as well.In conclusion,this article also lists some problems and challenges for further research.  相似文献   

16.
A key problem in traffic engineering is the optimization of the flow of vehicles through urban intersections by improving the timing policy of traffic signals. Current methods of signal control policy are based on the junction topography and prespecified static traffic volumes. However, the actual daily traffic volumes can be affected by many time‐dependent factors making a static policy hardly optimal. In this paper, we induce nonstationary predictive models of traffic flow by applying novel methods of time‐series data mining to the traffic sensors data collected from a signalized intersection in Jerusalem over a period of 3 years. Our methodology for modeling dynamic traffic volumes combines clustering and segmentation algorithms. The results of a case study based on real‐world traffic data demonstrate that a dynamic signal policy using the data mining approach can produce a decrease of about 33.7% in the total waiting time of drivers during 1 year, in comparison to the existing static traffic policy. The resulting savings for this junction only would be about 13,800 driving hours, which are worth of about $52,000 per annum in terms of Israeli economy. © 2011 Wiley Periodicals, Inc.  相似文献   

17.
XML data mining     
With the spreading of XML sources, mining XML data can be an important objective in the near future. This paper presents a project focussed on designing a general‐purpose query language in support of mining XML data. In our framework, raw data, mining models and domain knowledge are represented by way of XML documents and stored inside native XML databases. Data mining (DM) tasks are expressed in an extension of XQuery. Special attention is given to the frequent pattern discovery problem, and a way of exploiting domain‐dependent optimizations and efficient data structures as deeper as possible in the extraction process is presented. We report the results of a first bunch of experiments, showing that a good trade‐off between expressiveness and efficiency in XML DM is not a chimera. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

18.
Learning often occurs through comparing. In classification learning, in order to compare data groups, most existing methods compare either raw instances or learned classification rules against each other. This paper takes a different approach, namely conceptual equivalence, that is, groups are equivalent if their underlying concepts are equivalent while their instance spaces do not necessarily overlap and their rule sets do not necessarily present the same appearance. A new methodology of comparing is proposed that learns a representation of each group’s underlying concept and respectively cross-exams one group’s instances by the other group’s concept representation. The innovation is fivefold. First, it is able to quantify the degree of conceptual equivalence between two groups. Second, it is able to retrace the source of discrepancy at two levels: an abstract level of underlying concepts and a specific level of instances. Third, it applies to numeric data as well as categorical data. Fourth, it circumvents direct comparisons between (possibly a large number of) rules that demand substantial effort. Fifth, it reduces dependency on the accuracy of employed classification algorithms. Empirical evidence suggests that this new methodology is effective and yet simple to use in scenarios such as noise cleansing and concept-change learning.  相似文献   

19.
Increasing availability of music data via Internet evokes demand for efficient search through music files. Users' interests include melody tracking, harmonic structure analysis, timbre identification, and so on. We visualize, in an illustrative example, why content based search is needed for music data and what difficulties must be overcame to build an intelligent music information retrieval system.  相似文献   

20.
Sequential pattern mining is essential in many applications, including computational biology, consumer behavior analysis, web log analysis, etc. Although sequential patterns can tell us what items are frequently to be purchased together and in what order, they cannot provide information about the time span between items for decision support. Previous studies dealing with this problem either set time constraints to restrict the patterns discovered or define time-intervals between two successive items to provide time information. Accordingly, the first approach falls short in providing clear time-interval information while the second cannot discover time-interval information between two non-successive items in a sequential pattern. To provide more time-related knowledge, we define a new variant of time-interval sequential patterns, called multi-time-interval sequential patterns, which can reveal the time-intervals between all pairs of items in a pattern. Accordingly, we develop two efficient algorithms, called the MI-Apriori and MI-PrefixSpan algorithms, to solve this problem. The experimental results show that the MI-PrefixSpan algorithm is faster than the MI-Apriori algorithm, but the MI-Apriori algorithm has better scalability in long sequence data.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号