1.
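Research on Parallel Strategies for Data Mining  Total citations: 3, self-citations: 1, citations by others: 3
This article classifies parallel strategies for data mining algorithms. The classification techniques focus mainly on partitioning the training data and on extracting attributes from the processors at the end of each phase. This approach has been studied extensively for association rules and decision trees. The DD algorithm is used as an example to illustrate how the strategies are applied. The article closes with an outlook on future directions for parallel data mining.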
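The data-partitioning idea mentioned in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the DD algorithm the article describes: the function names (`partition`, `local_counts`, `frequent_itemsets`) and the toy data are assumptions, and the "processors" are simulated by a plain loop rather than real parallel workers.

```python
from collections import Counter
from itertools import combinations


def partition(transactions, n_parts):
    """Split the transaction list into n_parts roughly equal slices."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def local_counts(part, k):
    """Count every k-itemset that occurs in one partition."""
    counts = Counter()
    for t in part:
        for itemset in combinations(sorted(set(t)), k):
            counts[itemset] += 1
    return counts


def frequent_itemsets(transactions, k, min_count, n_parts=2):
    # Each partition would be handled by its own processor; here they run in turn.
    partials = [local_counts(p, k) for p in partition(transactions, n_parts)]
    # End of the phase: merge the per-partition counts into global counts.
    total = sum(partials, Counter())
    return {s: c for s, c in total.items() if c >= min_count}


# Hypothetical toy data for illustration.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = frequent_itemsets(transactions, k=2, min_count=3)
```

Because the merged counts do not depend on how the data was split, changing `n_parts` should leave the result unchanged; only the amount of work per simulated processor differs.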
2.
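A Survey of Image Data Mining Research  Total citations: 1, self-citations: 0, citations by others: 1
This paper gives a fairly comprehensive study of image data mining, an emerging area within data mining. It defines image data mining, analyzes how it resembles and differs from related fields such as image processing and analysis, image pattern recognition, and image retrieval, and describes in some detail how image data mining changes and extends traditional data mining. Image data mining is classified from several perspectives. The paper describes the different ways in which association rules, clustering, classification, and other techniques are used in image data mining. Finally, it briefly reviews the state of research in several application areas of image data mining.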
3.
Process mining can be viewed as the missing link between model-based process analysis and data-oriented analysis techniques. The lion's share of process mining research has focused on process discovery (creating process models from raw data) and replay techniques to check conformance and analyze bottlenecks. These techniques have helped organizations to address compliance and performance problems. However, for a more refined analysis, it is essential to correlate different process characteristics. For example, do deviations from the normative process cause additional delays and costs? Are rejected cases handled differently in the initial phases of the process? What is the influence of a doctor's experience on the treatment process? These and other questions may involve process characteristics related to different perspectives (control-flow, data-flow, time, organization, cost, compliance, etc.). Specific questions (e.g., predicting the remaining processing time) have been investigated before, but a generic approach was missing thus far. The proposed framework unifies a number of approaches for correlation analysis proposed in the literature, offering a general solution that can perform those analyses and many more. The approach has been implemented in ProM and combines process and data mining techniques. In this paper, we also demonstrate the applicability using a case study conducted with the UWV (Employee Insurance Agency), one of the largest “administrative factories” in The Netherlands.
4.
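A Clustering Method for Association Rule Discovery  Total citations: 2, self-citations: 0, citations by others: 2
The MARC algorithm (Mining Association Rules using Clustering) applies clustering to association rule discovery. MARC uses clustering to compress the transaction database, which reduces the amount of data the mining algorithm has to process and so improves mining efficiency. The algorithm also introduces the concept of a cluster-summary transformation to mitigate the information loss caused by compressing the data. Experiments on several real data sets show that the algorithm achieves both high accuracy and high performance.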
5.
6.
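Based on an analysis of the mining process of the classic Apriori algorithm, an association algorithm based on transaction-set grouping is proposed. The algorithm first clusters readers by characteristics such as major, grade, and number of books borrowed, and then performs association analysis on each cluster separately. The quality of the resulting book recommendations improves on that of the classic Apriori algorithm.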
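The group-then-mine idea in this abstract can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' algorithm: readers are grouped by a fixed key instead of a real clustering step, only frequent pairs are mined rather than running full Apriori, and the field names, the `heavy`/`light` threshold, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def group_key(reader):
    """Hypothetical grouping: major, grade, and a coarse borrowing-volume band."""
    band = "heavy" if len(reader["books"]) >= 3 else "light"
    return (reader["major"], reader["grade"], band)


def frequent_pairs_by_group(readers, min_support):
    """Return, for each reader group, the book pairs whose support meets min_support."""
    groups = defaultdict(list)
    for r in readers:
        groups[group_key(r)].append(set(r["books"]))

    result = {}
    for key, baskets in groups.items():
        counts = Counter()
        for basket in baskets:
            counts.update(combinations(sorted(basket), 2))
        # Support is measured within the group, not over all readers.
        result[key] = {
            pair: c / len(baskets)
            for pair, c in counts.items()
            if c / len(baskets) >= min_support
        }
    return result


# Hypothetical sample data for illustration.
readers = [
    {"major": "CS", "grade": 2, "books": ["algo", "db", "os"]},
    {"major": "CS", "grade": 2, "books": ["algo", "db", "net"]},
    {"major": "CS", "grade": 2, "books": ["algo", "os", "net"]},
    {"major": "Math", "grade": 1, "books": ["calc", "algebra"]},
    {"major": "Math", "grade": 1, "books": ["calc", "algebra"]},
]
pairs = frequent_pairs_by_group(readers, min_support=0.6)
```

Measuring support inside each group lets a pair that is common among one kind of reader surface even when it is rare across the whole library.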
7.
8.
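Based on an analysis of the mining process of the classic Apriori algorithm, an association algorithm based on transaction-set grouping is proposed. The algorithm first clusters readers by characteristics such as major, grade, and number of books borrowed, and then performs association analysis on each cluster separately. The quality of the resulting book recommendations improves on that of the classic Apriori algorithm.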
9.
Mining changing regions from access-constrained snapshots: a cluster-embedded decision tree approach
Change detection on spatial data is important in many applications, such as environmental monitoring. Given a set of snapshots of spatial objects at various temporal instants, a user may want to derive the changing regions between any two snapshots. Most of the existing methods have to use at least one of the original data sets to detect changing regions. However, in some important applications, due to data access constraints such as privacy concerns and limited data online availability, original data may not be available for change analysis. In this paper, we tackle the problem by proposing a simple yet effective model-based approach. In the model construction phase, data snapshots are summarized using the novel cluster-embedded decision trees as concise models. Once the models are built, the original data snapshots will not be accessed anymore. In the change detection phase, to mine changing regions between any two instants, we compare the two corresponding cluster-embedded decision trees. Our systematic experimental results on both real and synthetic data sets show that our approach can detect changes accurately and effectively.
Irene Pekerskaya’s and Jian Pei’s research is supported partly by the Natural Sciences and Engineering Research Council of Canada and the National Science Foundation of the US, and a President’s Research Grant and an Endowed Research Fellowship Award at Simon Fraser University. Ke Wang’s research is supported partly by the Natural Sciences and Engineering Research Council of Canada. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
10.
Searching for simplified farmers' crop choice models for integrated watershed management in Thailand: A data mining approach  Total citations: 1, self-citations: 0, citations by others: 1
This study used the C4.5 data mining algorithm to model farmers' crop choice in two watersheds in Thailand. Previous attempts in the Integrated Water Resource Assessment and Management Project to model farmers' crop choice produced large sets of decision rules. In order to produce simplified models of farmers' crop choice, data mining operations were applied for each soil series in the study areas. The resulting decision trees were much smaller in size. Land type, water availability, tenure, capital, labor availability as well as non-farm and livestock income were found to be important considerations in farmers' decision models. Profitability was also found to be important, although it was represented in approximate ranges. Unlike the conventional wisdom on farmers' crop choice, these decision trees came with threshold values and a sequential order of the important variables. The decision trees were validated using the remaining unused set of data, and their accuracy in predicting farmers' decisions was around 84%. Because of their simple structure, the decision trees produced in this study could be useful to analysts of water resource management as they can be integrated with biophysical models for sustainable watershed management.
11.
Yi-Fan Wang, Ding-An Chiang, Mei-Hua Hsu, Cheng-Jung Lin, I-Long Lin 《Expert Systems with Applications》2009,36(4):8071-8075
A major concern for modern enterprises is to promote customer value, loyalty and contribution through services that can help establish a long-term, honest relationship with customers. For better customer relationship management, data mining technology is commonly used to analyze large quantities of data about customer bargains, purchase preferences, customer churn, etc. This paper proposes a recommender system for wireless network companies to understand and avoid customer churn. To ensure the accuracy of the analysis, we use the decision tree algorithm to analyze data on over 60,000 transactions and more than 4000 members, collected over a period of three months. The data of the first nine weeks is used as the training data, and that of the last month as the testing data. The results of the experiment are found to be very useful for making strategy recommendations to avoid customer churn.
12.
Cristóbal Romero, José María Luna, José Raúl Romero, Sebastián Ventura 《Advances in Engineering Software》2011,42(8):566-576
Nowadays, there are a great number of both specific and general data mining tools available to carry out association rule mining. However, it is necessary to use several of these tools in order to obtain only the most interesting and useful rules for a given problem and dataset. To resolve this drawback, this paper describes a fully integrated framework to help in the discovery and evaluation of association rules. Using this tool, any data mining user can easily discover, filter, visualize, evaluate and compare rules by following a helpful and practical guided process described in this paper. The paper also explains the results obtained using a sample public dataset.
13.
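A New Method for Decision Table Reduction in Transmission Network Fault Diagnosis  Total citations: 1, self-citations: 0, citations by others: 1
This paper studies fault diagnosis in power transmission networks. As power systems grow more complex, the decision tables used in power system fault diagnosis inevitably grow more complex as well. To address decision-table reduction, a method that combines rough set theory with association-rule data mining is proposed. The traditional rough-set reduction algorithm is improved, and the resulting reduct is then reduced further with association-rule data mining, exploiting the internal relationships among protections, circuit breakers, and components, which finally yields a simple decision strategy. Analysis of a worked example shows that the proposed algorithm is simple, fast, and effective. The decision-table reduction for the example was implemented in Visual C++, confirming the correctness of the new method.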
14.
The human visual sense has a unique status, offering a very broadband channel for information flow. Visual approaches to analysis and mining attempt to take advantage of our abilities to perceive pattern and structure in visual form and to make sense of, or interpret, what we see. Visual data mining techniques have proven to be of high value in exploratory data analysis, and they also have high potential for mining large databases. In this work, we investigate and expand the area of visual data mining by proposing new visual data mining techniques for the visualization of mining outcomes.
15.
Security and privacy issues have been well investigated in typical vehicular ad hoc networks. However, given the properties of drive-thru Internet, in particular for secure in-motion payment services, merely implementing the existing online payment schemes may be either infeasible or inefficient. In this paper, we propose an advanced online payment framework that integrates three main features: the novel pairing-free certificateless encryption, signature and semi-honest RSU-aided verification, and CA-aided tracking and batch auditing. These features independently provide the following properties: achieving a higher trust level and supporting primary security services, introducing a semi-honest RSU for greater practicality, and optimizing verification and auditing efficiency when there is a large number of authentication requests. Performance evaluations, including security analysis, efficiency analysis, and simulation, show the security and feasibility of the proposed framework.
16.
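Mining Association Rules for Quantitative Attribute Data Using Fuzzy Sets  Total citations: 3, self-citations: 0, citations by others: 3
Most association-rule mining methods work on Boolean attribute data, but real applications often need to mine associations in data with quantitative attributes. This paper proposes an algorithm that introduces fuzzy sets into the classic Apriori candidate-set algorithm. The quantitative attributes in the data set are partitioned according to fuzzy-set definitions, which converts the original transaction data into fuzzy-set-based data, and the Apriori algorithm is then applied to discover potential association rules.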
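The fuzzification step described in this abstract can be illustrated with a short sketch. It is not the paper's algorithm: the membership functions, attribute names, and breakpoints are hypothetical, combining memberships with `min` is one common choice that the abstract does not confirm, and the sketch computes fuzzy support and confidence for a single rule rather than running the full Apriori candidate generation.

```python
def ramp(x, lo, hi):
    """Linear ramp from 0 at lo to 1 at hi, clamped to [0, 1]."""
    return max(0.0, min(1.0, (x - lo) / (hi - lo)))


# Hypothetical fuzzy sets for two quantitative attributes.
FUZZY_SETS = {
    "age": {
        "young": lambda x: 1.0 - ramp(x, 25, 45),
        "old": lambda x: ramp(x, 25, 45),
    },
    "income": {
        "low": lambda x: 1.0 - ramp(x, 2000, 6000),
        "high": lambda x: ramp(x, 2000, 6000),
    },
}


def fuzzify(record):
    """Turn a numeric record into {(attribute, label): membership degree}."""
    return {
        (attr, label): fn(record[attr])
        for attr, sets in FUZZY_SETS.items()
        for label, fn in sets.items()
    }


def fuzzy_support(records, itemset):
    """Average, over all records, of the smallest membership among the items."""
    fuzzy_records = [fuzzify(r) for r in records]
    return sum(min(fr[item] for item in itemset) for fr in fuzzy_records) / len(records)


# Hypothetical sample data for illustration.
records = [
    {"age": 25, "income": 2000},
    {"age": 45, "income": 6000},
    {"age": 35, "income": 4000},
    {"age": 45, "income": 4000},
]
antecedent = [("age", "old")]
rule_items = [("age", "old"), ("income", "high")]
support = fuzzy_support(records, rule_items)
confidence = support / fuzzy_support(records, antecedent)
```

A record can belong partly to both `young` and `old`, so a value near a boundary contributes a fraction to each fuzzy item instead of being forced into a single crisp interval.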
17.
A valuation model for cut diamonds  Total citations: 1, self-citations: 0, citations by others: 1
Margarida G. M. S. Cardoso, Luis Chambel 《International Transactions in Operational Research》2005,12(4):417-436
Cut diamonds are hard to value given the number and type of properties used in price construction. This project aims to develop a valuation model for cut diamonds based on data published on the Internet. Regression trees (Classification and Regression Trees and Chi-Square Automatic Interaction Detection) and neural networks (using backpropagation) are used for this purpose. The proposed approaches have a complementary role in the application. Neural networks perform better in prediction, accounting for around 96% of the variation in cut diamond unit prices. The role of regression trees is fundamental in interpretability, helping to understand the contribution of predictors in pricing. The models' results may prove to have some advantages over the Rapaport price lists (an industry-wide adopted price indicator).
18.
Irresponsible and negligent use of natural resources in the last five decades has made it an important priority to adopt more intelligent ways of managing existing resources, especially the ones related to energy. The main objective of this paper is to explore the opportunities of integrating internal data already stored in Data Warehouses together with external Big Data to improve energy consumption predictions. This paper presents a study in which we propose an architecture that makes use of already stored energy data and external unstructured information to improve knowledge acquisition and allow managers to make better decisions. This external knowledge is represented by a torrent of information that, in many cases, is hidden across heterogeneous and unstructured data sources, from which it is retrieved by an Information Extraction system. Alternatively, it is present in social networks expressed as user opinions. Furthermore, our approach applies data mining techniques to exploit the already integrated data. Our approach has been applied to a real case study and shows promising results. The experiments carried out in this work are twofold: (i) using and comparing diverse Artificial Intelligence methods, and (ii) validating our approach with data source integration.
19.