Similar Documents
20 similar documents found.
1.
To meet a large steel company's need to forecast steel sales, a predictive analysis model was built on the company's massive steel sales data. With Java as the development language, an Oracle-based data warehouse was established; the data mining algorithm adopts a Bayesian dynamic linear model (DLM, Dynamic Linear Models) forecasting algorithm, which improves accuracy through recursive computation, and graphical simulation is finally used to forecast the mill's sales, forming a predictive analysis system for data mining. The system was run on site, and comparing actual against predicted values verified the correctness of the model, with the forecasts reaching high accuracy. It provides an important reference for managers' decisions, has performed well in practice, and can be popularized within the steel industry.
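The recursive DLM forecasting step described above can be sketched as a Kalman-style update for a first-order (local level) model. The model order, noise variances, and the sales figures below are illustrative assumptions, not values from the paper:

```python
# Minimal sketch of a first-order dynamic linear model (local level DLM).
# v = observation noise variance, w = evolution noise variance,
# m0/c0 = prior mean and variance; all values are illustrative.

def dlm_filter(observations, v=1.0, w=0.5, m0=0.0, c0=10.0):
    """Recursively update the posterior mean/variance and return the
    one-step-ahead forecasts, as in Kalman filtering for a DLM."""
    m, c = m0, c0
    forecasts = []
    for y in observations:
        r = c + w                # prior variance for the next state
        forecasts.append(m)      # one-step-ahead forecast f_t = m_{t-1}
        q = r + v                # forecast variance
        a = r / q                # adaptive coefficient (Kalman gain)
        m = m + a * (y - m)      # posterior mean update
        c = a * v                # posterior variance update
    return forecasts

sales = [100, 102, 101, 105, 107, 110]
print(dlm_filter(sales))
```

Each pass through the loop refines the posterior, which is the recursion the abstract credits for the improved accuracy.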

2.
Hydrological data integration has become a necessary path for the further development of hydrological agencies: to develop and improve further, such agencies must integrate their data. Adopting data integration technology, and addressing the massive volume, complexity, and spatio-temporal characteristics of channel hydrological data, this article establishes an analysis workflow covering data warehousing, data mining, data integration, and basic-unit data, building a unified data integration platform. As data management and data analysis technologies mature, a comprehensive hydrological information database system based on data integration will move toward practical use.

3.
An oil and gas exploration data warehouse system was built on the POSC software integration platform. The paper describes the system's overall architecture, the Epicentre-based multidimensional data model, and the implementation of data integration and data mining. The system was used to predict the oil- and gas-bearing properties of rocks; practical application shows that it meets the high-level analysis and decision-making needs of oil and gas exploration.

4.
With the ongoing research and development of data mining technology, it is currently applied mainly in business intelligence; how to apply it in the petrochemical field remains to be studied. Based on the Oracle Data Mining (ODM) tool, this paper uses an example to introduce the methodology and basic process of applying data mining in petrochemical enterprises.

5.
Retrospect and prospect of research on hydrological data mining in China
Hydrological research faces uncertainty and imperfectly known problems from many directions. Introducing the theory and techniques of data mining, combining them with the needs of hydrological science, and fully applying modern computer-based information technology to study the theory, techniques, and methods of hydrological data mining offers a new approach to these problems. At present, hydrological data mining research is still in its infancy: work has concentrated on simulating and processing single-item and local data, global multi-factor mining over hydrological databases is rarely addressed, and research on the suitability of data mining techniques for hydrological data remains insufficient. To fully exploit data mining's capacity for knowledge discovery, further research is needed on hydrological subject databases and multidimensional data cubes; on classification, clustering, and association-rule mining of hydrological series and their optimized algorithms; and on similarity, periodicity, and other sequence-pattern mining of hydrological series, moving toward hydrological data mining software and data platforms.

6.
A roadheader condition-monitoring data analysis system based on data mining
Data mining techniques were applied to build a roadheader condition-monitoring data analysis system. The system collects operating parameters of the roadheader's mechanical, hydraulic, electrical, and transmission systems through a condition-monitoring data acquisition system and sends them over the Internet to an analysis platform, where data mining is applied to analyze the operating parameters, enabling remote maintenance of the roadheader along with rapid fault location and handling. The system also learns from and updates its operating-data analysis experience base, providing knowledge support for roadheader fault handling.

7.
The decision tree has long been a popular artificial intelligence technique. With the development of data mining, decision trees, as a powerful technique for building decision systems, have come to play a very important role in data mining and data analysis. In data mining, decision trees are used mainly for classification, prediction, and data preprocessing.
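As a toy illustration of how a decision tree chooses classification splits, the stump below picks the single-feature threshold that minimizes Gini impurity. Gini is one common splitting criterion; the abstract does not say which criterion any particular system uses, and the data here is invented:

```python
# Hand-rolled decision stump: find the best binary split on one feature
# by minimizing the weighted Gini impurity of the two resulting partitions.

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1 - p) * (1 - p)

def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best split x <= t."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # a perfect split at threshold 3, impurity 0.0
```

A full tree applies this split search recursively to each partition until the leaves are pure or a stopping rule fires.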

8.
To accelerate the adoption and integration of big data technology in the oilfield domain, a multifunctional big data platform covering the entire data-processing life cycle is proposed. The platform integrates various big data analysis and machine learning frameworks and provides oilfield-oriented data mining functions supporting both real-time and offline processing. The computing frameworks and algorithm services are packaged in Docker containers, with container orchestration and scheduling handled by Kubernetes. Architecturally, the system adopts microservices, decomposing applications built on different technology stacks into independent service modules to ensure the reliability and scalability of business services. This lets enterprise data analysts focus on business data analysis problems instead of spending large amounts of time learning framework deployment and other big data mining details.

9.
Teaching quality management in higher education needs the support of data mining systems. This paper introduces data mining technology and commonly used mining methods, and discusses how to design a SQL Server-based data mining system for university teaching quality. With decision trees as the basic method and SQL Server as the mining platform, a teaching-quality data mining module was designed and the mining system implemented.

10.
A multi-tier data mining system based on the J2EE application architecture is proposed. The system is modeled with UML and developed on a WebWork + Spring + Hibernate stack, improving maintainability, reusability, and extensibility. Association-rule mining is introduced into the analysis of teaching evaluation data.
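The association-rule mining mentioned above rests on two measures, support and confidence. A minimal sketch over invented teaching-evaluation transactions (the item names are hypothetical, not from the paper):

```python
# Support and confidence for a candidate association rule A -> B.
# Each transaction is a set of items observed together.

def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Conditional frequency of rhs given lhs: supp(lhs ∪ rhs) / supp(lhs)."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)

evals = [
    {"attends", "high_score"},
    {"attends", "high_score"},
    {"attends"},
    {"absent", "low_score"},
]
print(support(evals, {"attends", "high_score"}))      # 0.5
print(confidence(evals, {"attends"}, {"high_score"}))
```

Algorithms such as Apriori enumerate frequent itemsets above a support threshold and then keep only rules whose confidence also clears a threshold.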

11.
12.
With the rapid development of the Internet, and especially the recent emergence of technologies such as cloud computing and the Internet of Things and the wide adoption of services such as social networking, the volume of data in human society is growing rapidly: the big data era has arrived. How to acquire and analyze big data has become a widespread problem, and the data security issues it brings must be taken seriously. Starting from the concept and characteristics of big data, this paper describes the security challenges big data faces and proposes strategies for addressing them.

13.
The optimization capabilities of RDBMSs make them attractive for executing data transformations. However, despite the fact that many useful data transformations can be expressed as relational queries, an important class of data transformations that produce several output tuples for a single input tuple cannot be expressed in that way.

To overcome this limitation, we propose to extend Relational Algebra with a new operator named data mapper. In this paper, we formalize the data mapper operator and investigate some of its properties. We then propose a set of algebraic rewriting rules that enable the logical optimization of expressions with mappers and prove their correctness. Finally, we experimentally study the proposed optimizations and identify the key factors that influence the optimization gains.
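The one-to-many behavior that distinguishes the mapper from classical relational operators can be mimicked in ordinary code as a flat-map over tuples. The relation and the phone-splitting transformation below are hypothetical illustrations, not examples from the paper:

```python
# Sketch of a data mapper: unlike a relational projection, the mapping
# function may emit several output tuples for a single input tuple.

def mapper(relation, fn):
    """Apply fn to each tuple; fn returns a *list* of output tuples."""
    return [out for row in relation for out in fn(row)]

# Hypothetical source relation with a multi-valued phone attribute.
employees = [("alice", "555-1;555-2"), ("bob", "555-3")]

def split_phones(row):
    name, phones = row
    return [(name, p) for p in phones.split(";")]

print(mapper(employees, split_phones))
# → [('alice', '555-1'), ('alice', '555-2'), ('bob', '555-3')]
```

A plain relational query over `employees` could not produce two tuples for alice from her single input tuple, which is exactly the gap the mapper operator fills.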


14.
As the amount of multimedia data increases day by day thanks to cheaper storage devices and an increasing number of information sources, machine learning algorithms are faced with large-sized datasets. When the original data is huge, small sample sizes are preferred for various applications; this is typically the case for multimedia applications. But using a simple random sample may not yield satisfactory results, because such a sample may not adequately represent the entire data set due to random fluctuations in the sampling process. The difficulty is particularly apparent when small sample sizes are needed. Fortunately, the use of a good sampling set for training can improve the final results significantly. In KDD'03 we proposed EASE, which outputs a sample based on its 'closeness' to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER, which extends EASE in two ways. (1) EASE is a halving algorithm, i.e., to achieve the required sample ratio it starts from a suitably large initial sample and iteratively halves. EASIER, on the other hand, does away with the repeated halving by directly obtaining the required sample ratio in one iteration. (2) EASE was shown to work on the IBM QUEST dataset, which is a categorical count data set. EASIER, in addition, is shown to work on continuous image and audio feature data. We have successfully applied EASIER to image classification and audio event identification applications. Experimental results show that EASIER outperforms SRS significantly. Surong Wang received the B.E. and M.E. degrees from the School of Information Engineering, University of Science and Technology Beijing, China, in 1999 and 2002 respectively. She is currently studying toward the Ph.D. degree at the School of Computer Engineering, Nanyang Technological University, Singapore.
Her research interests include multimedia data processing, image processing and content-based image retrieval. Manoranjan Dash obtained Ph.D. and M.Sc. (Computer Science) degrees from the School of Computing, National University of Singapore. He has worked in academic and research institutes extensively and has published more than 30 research papers (mostly refereed) in various reputable machine learning and data mining journals, conference proceedings, and books. His research interests include machine learning and data mining, and their applications in bioinformatics, image processing, and GPU programming. Before joining the School of Computer Engineering (SCE), Nanyang Technological University, Singapore, as Assistant Professor, he worked as a postdoctoral fellow at Northwestern University. He is a member of IEEE and ACM. He has served as a program committee member of many conferences and is on the editorial board of the "International Journal of Theoretical and Applied Computer Science." Liang-Tien Chia received the B.S. and Ph.D. degrees from Loughborough University, in 1990 and 1994, respectively. He is an Associate Professor in the School of Computer Engineering, Nanyang Technological University, Singapore. He has recently been appointed Head, Division of Computer Communications, and he also holds the position of Director, Centre for Multimedia and Network Technology. His research interests include image/video processing & coding, multimodal data fusion, multimedia adaptation/transmission and multimedia over the Semantic Web. He has published over 80 research papers.

15.
Time series analysis has always been an important and interesting research field due to its frequent appearance in different applications. In the past, many approaches based on regression, neural networks and other mathematical models were proposed to analyze time series. In this paper, we attempt to use data mining techniques to analyze time series. Many previous studies on data mining have focused on handling binary-valued data, whereas time series data are usually quantitative values. We thus extend our previous fuzzy mining approach to handle time-series data and find linguistic association rules. The proposed approach first uses a sliding window to generate continuous subsequences from a given time series and then analyzes the fuzzy itemsets from these subsequences. Appropriate post-processing is then performed to remove redundant patterns. Experiments are also made to show the performance of the proposed mining algorithm. Since the final results are represented by linguistic rules, they are friendlier to humans than a quantitative representation.
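The first step of the approach, generating continuous subsequences with a sliding window, can be sketched as follows. The window width of 3 and the series values are arbitrary illustrative choices:

```python
# Slide a fixed-width window over a time series, producing every
# continuous subsequence of that width (the inputs to fuzzy itemset mining).

def sliding_window(series, width):
    return [tuple(series[i:i + width]) for i in range(len(series) - width + 1)]

ts = [5, 7, 6, 8, 9]
print(sliding_window(ts, 3))
# → [(5, 7, 6), (7, 6, 8), (6, 8, 9)]
```

Each subsequence would then be fuzzified into linguistic terms (e.g. "low", "high") before the itemset analysis described above.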

16.
Compression-based data mining of sequential data
The vast majority of data mining algorithms require the setting of many input parameters. The dangers of working with parameter-laden algorithms are twofold. First, incorrect settings may cause an algorithm to fail in finding the true patterns. Second, a perhaps more insidious problem is that the algorithm may report spurious patterns that do not really exist, or greatly overestimate the significance of the reported patterns. This is especially likely when the user fails to understand the role of parameters in the data mining process. Data mining algorithms should have as few parameters as possible. A parameter-light algorithm would limit our ability to impose our prejudices, expectations, and presumptions on the problem at hand, and would let the data itself speak to us. In this work, we show that recent results in bioinformatics, learning, and computational theory hold great promise for a parameter-light data-mining paradigm. The results are strongly connected to Kolmogorov complexity theory. However, as a practical matter, they can be implemented using any off-the-shelf compression algorithm with the addition of just a dozen lines of code. We will show that this approach is competitive or superior to many of the state-of-the-art approaches in anomaly/interestingness detection, classification, and clustering with empirical tests on time series/DNA/text/XML/video datasets. As further evidence of the advantages of our method, we will demonstrate its effectiveness in solving a real-world classification problem in recommending printing services and products. Responsible editor: Johannes Gehrke
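One concrete instance of this parameter-light idea, associated with this line of work, is the compression-based dissimilarity measure CDM(x, y) = C(xy) / (C(x) + C(y)), where C(·) is the compressed size under any off-the-shelf compressor. A sketch with zlib, over invented sequences:

```python
# Compression-based dissimilarity: sequences that share structure compress
# better together, so CDM is lower for similar inputs (roughly 0.5..1.0).
import zlib

def cdm(x: bytes, y: bytes) -> float:
    c = lambda s: len(zlib.compress(s, 9))  # compressed size, max effort
    return c(x + y) / (c(x) + c(y))

a = b"abcabcabcabcabcabc" * 20   # repetitive sequence
a2 = b"abcabcabcabcabcabc" * 20  # identical structure
d = b"xqzwvu" * 60               # different structure
print(cdm(a, a2) < cdm(a, d))    # similar pair scores lower
```

Note the only "parameter" is the choice of compressor, which is exactly the parameter-lightness the abstract argues for.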

17.
18.
Linear combinations of translates of a given basis function have long been successfully used to solve scattered data interpolation and approximation problems. We demonstrate how the classical basis function approach can be transferred to the projective space ℙ^{d−1}. To be precise, we use concepts from harmonic analysis to identify positive definite and strictly positive definite zonal functions on ℙ^{d−1}. These can then be applied to solve problems arising in tomography since the data given there consists of integrals over lines. Here, enhancing known reconstruction techniques with the use of a scattered data interpolant in the "space of lines" naturally leads to reconstruction algorithms well suited to limited angle and limited range tomography. In the medical setting algorithms for such incomplete data problems are desirable as using them can limit radiation dosage.

19.
Existing automated test data generation techniques tend to start from scratch, implicitly assuming that no pre‐existing test data are available. However, this assumption may not always hold, and where it does not, there may be a missed opportunity; perhaps the pre‐existing test cases could be used to assist the automated generation of additional test cases. This paper introduces search‐based test data regeneration, a technique that can generate additional test data from existing test data using a meta‐heuristic search algorithm. The proposed technique is compared to a widely studied test data generation approach in terms of both efficiency and effectiveness. The empirical evaluation shows that test data regeneration can be up to 2 orders of magnitude more efficient than existing test data generation techniques, while achieving comparable effectiveness in terms of structural coverage and mutation score. Copyright © 2010 John Wiley & Sons, Ltd.

20.
Data protection has been a difficult problem ever since the Internet appeared. From the moment social media sites began to flex their muscles in the digital market, the protection of user data and information has kept policymakers vigilant. In the era of the digital economy, data has gradually become a key factor by which enterprises improve their competitiveness, and market competition centered on data keeps growing. Enterprises' attention to and contention for data resources have pushed disputes and conflicts between platform rights and the protection of users' personal information, and unfair-competition disputes over data between Internet companies, into the spotlight. How to balance the reasonable use of data against its protection, and how to regulate unfair competition, are therefore especially important for securing a competitive advantage amid the rapid development of the digital economy. By analyzing the dual nature of data, this article discusses the value of data in the digital economy era and, drawing on anti-unfair-competition law and practical cases, further examines the relationship between data use and data protection.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号