Found 20 similar articles (search time: 15 ms)
1.
Future trends in data mining   Cited by: 3 (self-citations: 1, citations by others: 3)
Hans-Peter Kriegel, Karsten M. Borgwardt, Peer Kröger, Alexey Pryakhin, Matthias Schubert, Arthur Zimek 《Data mining and knowledge discovery》2007,15(1):87-97
Over recent years data mining has been establishing itself as one of the major disciplines in computer science with growing
industrial impact. Undoubtedly, research in data mining will continue and even increase over coming decades. In this article,
we sketch our vision of the future of data mining. Starting from the classic definition of “data mining”, we elaborate on
topics that, in our opinion, will set trends in data mining.
2.
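Web-based data mining is an active research topic that combines data mining with Internet systems. This paper first surveys several classes of Web-based data mining techniques, including Web content mining, Web usage mining, Web page clustering, and the discovery of users' frequent access paths. It then focuses on concrete applications of Web data mining in e-commerce.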
3.
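Research on Web-based data mining techniques and their application in e-commerce   Cited by: 1 (self-citations: 0, citations by others: 1)
Web-based data mining is an active research topic that combines data mining with Internet systems. This paper first surveys several classes of Web-based data mining techniques, including Web content mining, Web usage mining, Web page clustering, and the discovery of users' frequent access paths. It then focuses on concrete applications of Web data mining in e-commerce.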
4.
Shouhong 《Data & Knowledge Engineering》2002,40(3):273-283
This paper reports on conceptual development in applications of neural networks to data mining and knowledge discovery. Hypothesis generation is one of the significant differences of data mining from statistical analyses. Nonlinear pattern hypothesis generation is a major task of data mining and knowledge discovery. Yet, few methods of nonlinear pattern hypothesis generation are available.
This paper proposes a model of data mining to support nonlinear pattern hypothesis generation. The model integrates a linear regression analysis model, Kohonen's self-organizing maps, the algorithm for convex polytopes, and back-propagation neural networks.
5.
Fernando Alonso, Juan P. Caraça-Valente, Angel L. González, César Montes 《Expert systems with applications》2002,23(4)
The medical diagnosis system described here uses underlying knowledge in the isokinetic domain, obtained by combining the expertise of a physician specialised in isokinetic techniques and data mining techniques applied to a set of existing data. An isokinetic machine is basically a physical support on which patients exercise one of their joints, in this case the knee, according to different ranges of movement and at a constant speed. The data on muscle strength supplied by the machine are processed by an expert system that has built-in knowledge elicited from an expert in isokinetics. It cleans and pre-processes the data and conducts an intelligent analysis of the parameters and morphology of the isokinetic curves. Data mining methods based on the discovery of sequential patterns in time series and the fast Fourier transform, which identifies similarities and differences among exercises, were applied to the processed information to characterise injuries and discover reference patterns specific to populations. The results obtained were applied in two environments: one for the blind and another for elite athletes.
6.
Current trends clearly indicate that online learning has become an important learning mode. However, no effective assessment mechanism for learning performance yet exists for e-learning systems. Learning performance assessment aims to evaluate what learners learned during the learning process. Traditional summative evaluation only considers final learning outcomes, without considering the learning processes of learners. With the evolution of learning technology, learning portfolios in a web-based learning environment can be used to record the learning process, evaluate the learning performance of learners, and produce feedback that enhances their learning. Accordingly, this study presents a mobile formative assessment tool using data mining, which involves six computational intelligence theories, i.e. statistical correlation analysis, fuzzy clustering analysis, grey relational analysis, K-means clustering, fuzzy association rule mining and fuzzy inference, in order to identify the key formative assessment rules according to the web-based learning portfolios of an individual learner for the performance promotion of web-based learning. Restated, the proposed method can help teachers to precisely assess the learning performance of an individual learner utilizing only the learning portfolios in a web-based learning environment. Hence, teachers can devote themselves to teaching and designing courseware, since they save a lot of time in measuring learning performance. More importantly, teachers can understand the main factors influencing learning performance in a web-based learning environment based on the interpretable learning performance assessment rules obtained. Experimental results indicate that the evaluation results of the proposed scheme are very close to those of summative assessment and that the factor analysis provides simple and clear learning performance assessment rules. Furthermore, the proposed learning feedback with formative assessment can clearly promote the learning performance and interests of learners.
7.
Very little research in knowledge discovery has studied how to incorporate statistical methods to automate linear correlation discovery (LCD). We present an automatic LCD methodology that adopts statistical measurement functions to discover correlations from databases’ attributes. Our methodology automatically pairs attribute groups having potential linear correlations, measures the linear correlation of each pair of attribute groups, and confirms the discovered correlation. The methodology is evaluated in two sets of experiments. The results demonstrate the methodology’s ability to facilitate linear correlation discovery for databases with a large amount of data.
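The "pair, measure, confirm" loop described in this abstract can be illustrated with a minimal Python sketch. It assumes Pearson's r as the measurement function and a fixed cutoff of 0.9 as the confirmation step; the abstract does not name the statistical functions actually used, and the helper names and sample table here are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as none
    return cov / sqrt(var_x * var_y)

def correlated_pairs(table, threshold=0.9):
    """Pair every two attributes, measure r, keep pairs with |r| >= threshold.

    The threshold is an assumed stand-in for the confirmation step.
    """
    confirmed = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            confirmed.append((a, b, r))
    return confirmed

# Hypothetical table: weight is an exact linear function of height.
table = {
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 58, 66, 74, 82],
    "noise": [3, 1, 4, 1, 5],
}
pairs = correlated_pairs(table)
```

Here only the height/weight pair passes the cutoff. Comparing all attribute pairs is quadratic in the number of attributes, which is presumably why the methodology first selects groups with potential correlations.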
8.
Kesheng Wang 《Journal of Intelligent Manufacturing》2007,18(4):487-495
Recent advances in computers and manufacturing techniques have made it easy to collect and store all kinds of data in manufacturing enterprises. The problem of how to enable engineers and managers to understand large amounts of data remains. Traditional data analysis methods are no longer the best option. Data Mining (DM) approaches have created new intelligent tools for extracting useful information and knowledge automatically. All these will have a profound impact on current practices in manufacturing. In this paper the nature and implications of DM techniques in manufacturing and their implementations on product design and manufacturing are discussed.
9.
CRISP-DM is the standard for developing Data Mining projects. It proposes the processes and tasks to carry out when developing a Data Mining project. One of the tasks it proposes is estimating the cost of the project.
10.
The exponential growth of databanks creates opportunities to expand Operational Research. An example is the development of scientific approaches to "mine" intelligently the huge databanks that complex systems rely on for their management. The contribution presents an approach to exploit a health insurance databank to evaluate the performance of cardiovascular surgery nationwide.
11.
Flip Korn, Alexandros Labrinidis, Yannis Kotidis, Christos Faloutsos 《The VLDB Journal: The International Journal on Very Large Data Bases》2000,8(3-4):254-266
Association Rule Mining algorithms operate on a data matrix (e.g., customers × products) to derive association rules [AIS93b, SA96]. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the “goodness” of a set of discovered rules. We also propose the “guessing
error” as a measure of the “goodness”, that is, the root-mean-square error of the reconstructed values of the cells of the
given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess missing/hidden values
from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can “guess”
the amount spent on butter. Thus, unlike association rules, Ratio Rules can perform a variety of important tasks such as forecasting,
answering “what-if” scenarios, detecting outliers, and visualizing the data. Moreover, we show that we can compute Ratio Rules
in a single pass over the data set with small memory requirements (a few small matrices), in contrast to association rule mining methods
which require multiple passes and/or large memory. Experiments on several real data sets (e.g., basketball and baseball statistics,
biological data) demonstrate that the proposed method: (a) leads to rules that make sense; (b) can find large itemsets in
binary matrices, even in the presence of noise; and (c) consistently achieves a “guessing error” of up to 5 times less than
using straightforward column averages.
Received: March 15, 1999 / Accepted: November 1, 1999
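The "guessing error" in this abstract is the root-mean-square error obtained when each cell of the matrix is hidden in turn and then reconstructed. The Python sketch below illustrates only that measurement. The ratio-based guesser is a deliberately crude stand-in: it uses the column means of the other rows as a single spending ratio, and is not the method by which the paper derives its Ratio Rules. The function names and the spending matrix are hypothetical.

```python
from math import sqrt

def column_means(rows):
    """Mean of each column over the given rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def guess_by_average(rows, i, j):
    """Baseline: guess hidden cell (i, j) as the column average of the other rows."""
    others = [r for k, r in enumerate(rows) if k != i]
    return column_means(others)[j]

def guess_by_ratio(rows, i, j):
    """Assumed stand-in for a ratio rule: row i is a scaled copy of one direction."""
    others = [r for k, r in enumerate(rows) if k != i]
    direction = column_means(others)
    known = [c for c in range(len(direction)) if c != j]
    # Least-squares scale that best fits the known cells of row i.
    num = sum(rows[i][c] * direction[c] for c in known)
    den = sum(direction[c] ** 2 for c in known)
    scale = num / den if den else 0.0
    return scale * direction[j]

def guessing_error(rows, guess):
    """RMS error over all cells, each hidden and reconstructed in turn."""
    squared = [
        (guess(rows, i, j) - rows[i][j]) ** 2
        for i in range(len(rows))
        for j in range(len(rows[0]))
    ]
    return sqrt(sum(squared) / len(squared))

# Hypothetical dollars spent on (milk, bread, butter); rows share the ratio 10:3:2.
spend = [
    [10.0, 3.0, 2.0],
    [20.0, 6.0, 4.0],
    [5.0, 1.5, 1.0],
    [40.0, 12.0, 8.0],
]
error_ratio = guessing_error(spend, guess_by_ratio)
error_average = guessing_error(spend, guess_by_average)
```

Because every row in this toy matrix follows the same ratio exactly, the ratio-based guess reconstructs each hidden cell almost perfectly, while the column-average baseline does not. Real data would show a smaller gap.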
12.
T. D. Pham 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2002,6(5):400-405
The combination of objective measurements and human perceptions using hidden Markov models with particular reference to sequential
data mining and knowledge discovery is presented in this paper. Both human preferences and statistical analysis are utilized
for verification and identification of hypotheses as well as detection of hidden patterns. As another theoretical view, this
work attempts to formalize the complementarity of the computational theories of hidden Markov models and perceptions for providing
solutions associated with the manipulation of the internet.
13.
Learning often occurs through comparing. In classification learning, in order to compare data groups, most existing methods compare either raw instances or learned classification rules against each other. This paper takes a different approach, namely conceptual equivalence, that is, groups are equivalent if their underlying concepts are equivalent while their instance spaces do not necessarily overlap and their rule sets do not necessarily present the same appearance. A new methodology of comparing is proposed that learns a representation of each group’s underlying concept and cross-examines each group’s instances using the other group’s concept representation. The innovation is fivefold. First, it is able to quantify the degree of conceptual equivalence between two groups. Second, it is able to retrace the source of discrepancy at two levels: an abstract level of underlying concepts and a specific level of instances. Third, it applies to numeric data as well as categorical data. Fourth, it circumvents direct comparisons between (possibly a large number of) rules that demand substantial effort. Fifth, it reduces dependency on the accuracy of employed classification algorithms. Empirical evidence suggests that this new methodology is effective and yet simple to use in scenarios such as noise cleansing and concept-change learning.
14.
The increasing availability of music data via the Internet creates demand for efficient search through music files. Users' interests include melody tracking, harmonic structure analysis, timbre identification, and so on. We show, in an illustrative example, why content-based search is needed for music data and what difficulties must be overcome to build an intelligent music information retrieval system.
15.
Sequential pattern mining is essential in many applications, including computational biology, consumer behavior analysis, web log analysis, etc. Although sequential patterns can tell us what items are frequently purchased together and in what order, they cannot provide information about the time span between items for decision support. Previous studies dealing with this problem either set time constraints to restrict the patterns discovered or define time-intervals between two successive items to provide time information. Accordingly, the first approach falls short in providing clear time-interval information while the second cannot discover time-interval information between two non-successive items in a sequential pattern. To provide more time-related knowledge, we define a new variant of time-interval sequential patterns, called multi-time-interval sequential patterns, which can reveal the time-intervals between all pairs of items in a pattern. Accordingly, we develop two efficient algorithms, called the MI-Apriori and MI-PrefixSpan algorithms, to solve this problem. The experimental results show that the MI-PrefixSpan algorithm is faster than the MI-Apriori algorithm, but the MI-Apriori algorithm has better scalability in long sequence data.
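The Python sketch below illustrates what "time-intervals between all pairs of items" means for one occurrence of a pattern, including non-successive pairs. It is not MI-Apriori or MI-PrefixSpan: it only locates the leftmost occurrence of a given pattern in a single timestamped sequence and reports the pairwise gaps. The function names, the purchase data, and the assumption that pattern items are distinct are all hypothetical.

```python
def first_occurrence(sequence, pattern):
    """Return timestamps of the leftmost in-order match of pattern, or None.

    sequence is a list of (item, time) tuples sorted by time.
    """
    times = []
    pos = 0
    for item in pattern:
        while pos < len(sequence) and sequence[pos][0] != item:
            pos += 1
        if pos == len(sequence):
            return None  # pattern does not occur in this sequence
        times.append(sequence[pos][1])
        pos += 1
    return times

def pairwise_intervals(sequence, pattern):
    """Time gap between every pair of pattern items, not only successive ones.

    Assumes the items in pattern are distinct, so item pairs can serve as keys.
    """
    times = first_occurrence(sequence, pattern)
    if times is None:
        return None
    return {
        (pattern[i], pattern[j]): times[j] - times[i]
        for i in range(len(pattern))
        for j in range(i + 1, len(pattern))
    }

# Hypothetical purchase history: (item, day of purchase).
purchases = [("laptop", 0), ("mouse", 3), ("bag", 4), ("printer", 30)]
intervals = pairwise_intervals(purchases, ("laptop", "mouse", "printer"))
# intervals == {("laptop", "mouse"): 3, ("laptop", "printer"): 30, ("mouse", "printer"): 27}
```

A pattern with intervals only between successive items would report the 3-day and 27-day gaps; the laptop-to-printer gap of 30 days is the extra information the multi-time-interval variant exposes.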
16.
The human visual sense has a unique status, offering a very broadband channel for information flow. Visual approaches to analysis and mining attempt to take advantage of our abilities to perceive pattern and structure in visual form and to make sense of, or interpret, what we see. Visual Data Mining techniques have proven to be of high value in exploratory data analysis and they also have a high potential for mining large databases. In this work, we try to investigate and expand the area of visual data mining by proposing new visual data mining techniques for the visualization of mining outcomes.
17.
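This paper first introduces the development of statistical language models (SLMs) and the commonly used n-gram model. It then gives a formal description of the main models used in information retrieval, compares them, and points out the similarities and differences among them and with traditional probabilistic retrieval methods. After analyzing the weaknesses of statistical language models, it presents possible improvements and the latest research progress, and discusses applications and challenges in Chinese information retrieval.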
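For readers unfamiliar with language-model retrieval, the Python sketch below shows one common formulation: a unigram query-likelihood score with linear interpolation smoothing against the whole collection. The abstract does not say which models the survey formalizes, so this is an assumed example; the function name, the smoothing weight `lam`, and the two toy documents are hypothetical.

```python
from collections import Counter
from math import log

def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P(query | doc) under a unigram model smoothed with the collection.

    P(term) = lam * P(term | doc) + (1 - lam) * P(term | collection)
    """
    doc_counts = Counter(doc)
    collection_counts = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc) if doc else 0.0
        p_collection = collection_counts[term] / len(collection)
        p = lam * p_doc + (1 - lam) * p_collection
        if p == 0.0:
            return float("-inf")  # term unseen anywhere in the collection
        score += log(p)
    return score

# Hypothetical two-document collection, tokenized on whitespace.
docs = {
    "d1": "data mining finds patterns in data".split(),
    "d2": "language models rank documents for a query".split(),
}
collection = [term for tokens in docs.values() for term in tokens]
query = ["data", "mining"]
ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(query, docs[name], collection),
    reverse=True,
)
```

Smoothing keeps a document from scoring zero just because it lacks one query term, which is the usual motivation for interpolating with collection statistics.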
18.
We propose a new similar sequence matching method that efficiently supports variable-length and variable-tolerance continuous query sequences on time-series data streams. Earlier methods do not support variable lengths or variable tolerances adequately for continuous query sequences if there are too many query sequences registered to handle in main memory. To support variable-length query sequences, we use the window construction mechanism that divides long sequences into smaller windows for indexing and searching the sequences. To support variable-tolerance query sequences, we present a new notion of intervaled sequences whose individual entries are an interval of real numbers rather than a real number itself. We also propose a new similar sequence matching method based on these notions, and then formally prove the correctness of the method. In addition, we show that our method has the prematching characteristic, which finds future candidates of similar sequences in advance. Experimental results show that our method outperforms the naive one by 2.6-102.1 times and the existing methods in the literature by 1.4-9.8 times over the entire ranges of parameters tested when the query selectivities are low (<32%), which are practically useful in large database applications.
19.
Exploiting data mining techniques for broadcasting data in mobile computing environments   Cited by: 1 (self-citations: 0, citations by others: 1)
Mobile computers can be equipped with wireless communication devices that enable users to access data services from any location. In wireless communication, the server-to-client (downlink) communication bandwidth is much higher than the client-to-server (uplink) communication bandwidth. This asymmetry makes the dissemination of data to client machines a desirable approach. However, dissemination of data by broadcasting may induce high access latency in case the number of broadcast data items is large. We propose two methods aiming to reduce client access latency of broadcast data. Our methods are based on analyzing the broadcast history (i.e., the chronological sequence of items that have been requested by clients) using data mining techniques. With the first method, the data items in the broadcast disk are organized in such a way that the items requested subsequently are placed close to each other. The second method focuses on improving the cache hit ratio to be able to decrease the access latency. It enables clients to prefetch the data from the broadcast disk based on the rules extracted from previous data request patterns. The proposed methods are implemented on a Web log to estimate their effectiveness. It is shown through performance experiments that the proposed rule-based methods are effective in improving the system performance in terms of the average latency as well as the cache hit ratio of mobile clients.
20.
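Concepts, system architecture, and methods of data mining   Cited by: 7 (self-citations: 5, citations by others: 7)
Mao Guojun (毛国君) 《Computer Engineering and Design (计算机工程与设计)》2002,23(8):13-17
This paper first summarizes the concept of data mining and the related schools of thought, then presents an architecture for a data mining system and uses it to introduce the system's main functional components, and finally analyzes the main data mining methods.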