Similar articles
1.
A Moderator is a knowledge-based system that supports collaborative working by raising awareness of the priorities and requirements of other team members. However, the amount of advice a Moderator can provide is limited by the knowledge it holds about team members. Data mining techniques can help automate knowledge acquisition for a Moderator and uncover hidden data patterns and relationships that facilitate the moderation process. A novel approach is presented, consisting of a knowledge discovery framework that provides a semi-automatic methodology for generating rules by inserting relationships discovered through data mining into a generic template. An application case is described to demonstrate the methodology. The application case acquires knowledge for a Moderator by mining summaries of successful past projects, so that project partners are made aware of how best to formulate a proposal for a European research project. Findings from the application case are presented.

2.
The number of databases that are accessible over networks within organizations is increasing. This paper presents a methodology for automatically converting the data in these databases into a useful knowledge base of case-based semantic networks that can be accessed through a browsing facility. A parallel processing strategy has been implemented for this knowledge acquisition process to support its scalability to large databases. This methodology has potential application in the development of organizational intranets. It can also be used for retrospective browsing of the context of interesting patterns discovered by data mining. The database examples used in this paper are from clinical laboratories that provide data to a hospital infection control committee. Even though the results presented here use a single domain, the methodology can be used with no changes to explore the construction of multidomain knowledge bases.

3.
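Data mining, the extraction of knowledge from large volumes of data, has recently become an active research topic in the database field. Among the many data mining methods, the rough set approach is one of the most important. Attribute reduction is a key step in data mining models based on rough set theory and a central topic in rough set research. Building on a detailed study of rough-set attribute reduction algorithms, this paper proposes an improved heuristic attribute reduction algorithm based on computing the discernibility matrix. The improved algorithm draws on Hu's algorithm and Jelonek's algorithm. Working from the discernibility matrix, it guarantees that a reduct of the decision information system is ultimately found, and its running time is markedly lower than that of Jelonek's algorithm.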
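
To make the discernibility-matrix idea concrete, the following is a minimal sketch of a generic greedy attribute reduction. It is not the improved algorithm from the abstract, nor Hu's or Jelonek's algorithm; the function names, the tie-breaking rule, and the toy decision table are all assumptions made for illustration.

```python
from itertools import combinations

def discernibility_matrix(rows, decisions):
    """For every pair of objects with different decisions, collect the
    indices of the condition attributes on which the two objects differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(a for a in range(len(rows[i]))
                             if rows[i][a] != rows[j][a])
            if diff:  # identical objects with different decisions are skipped
                entries.append(diff)
    return entries

def greedy_reduct(rows, decisions):
    """Pick the attribute that appears in the most uncovered entries until
    every entry is covered, then drop attributes that turn out redundant."""
    entries = discernibility_matrix(rows, decisions)
    remaining = list(entries)
    chosen = set()
    while remaining:
        counts = {}
        for entry in remaining:
            for a in entry:
                counts[a] = counts.get(a, 0) + 1
        # Ties are broken in favour of the lowest attribute index (assumption).
        best = max(sorted(counts), key=lambda a: counts[a])
        chosen.add(best)
        remaining = [e for e in remaining if best not in e]
    # Backward elimination: remove any attribute not needed for coverage.
    for a in sorted(chosen):
        trial = chosen - {a}
        if all(e & trial for e in entries):
            chosen = trial
    return chosen

# Hypothetical decision table: three condition attributes, one decision.
toy_rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
toy_decisions = [0, 1, 0, 1]
toy_reduct = greedy_reduct(toy_rows, toy_decisions)
```

In this toy table the decision equals the second attribute, so the sketch selects only attribute index 1. A greedy cover of this kind yields a valid reduct after the elimination pass, but not necessarily a shortest one.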

4.
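Research on Uncertain Knowledge Processing Based on Bayesian Networks   Total citations: 15, self-citations: 4, citations by others: 11
Bayesian networks have recently attracted attention in data mining and related fields because of their strengths in handling uncertain knowledge. Compared with currently popular data mining algorithms such as decision trees, neural networks, and genetic algorithms, Bayesian networks are easier to interpret and predict well, which makes them suitable for domains that are inherently uncertain. After comparing the advantages of Bayesian networks in handling uncertain knowledge, the paper describes the process of data mining with Bayesian networks and the main research directions, and closes with an analysis of and outlook on their application areas, current research, and prospects.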
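
As a small illustration of how a Bayesian network updates belief under uncertainty, the sketch below computes a posterior in the simplest possible network, a single cause node with a single evidence node. The function name and the probabilities are assumptions chosen for the example and do not come from the paper.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(cause | evidence) in a two-node network cause -> evidence.

    prior               : P(cause)
    likelihood_if_true  : P(evidence | cause)
    likelihood_if_false : P(evidence | not cause)
    """
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    if evidence == 0:
        raise ValueError("the evidence has probability zero under this model")
    return prior * likelihood_if_true / evidence

# Hypothetical numbers: a fault occurs 20% of the time; an alarm fires in
# 90% of fault cases and in 10% of non-fault cases.
p_fault_given_alarm = posterior(0.2, 0.9, 0.1)
```

With these assumed numbers, observing the alarm raises the belief in a fault from 0.2 to roughly 0.69. Larger networks apply the same rule across many conditional probability tables.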

5.
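Application of a Data Fusion-Based Knowledge Discovery Method in Network Management   Total citations: 2, self-citations: 0, citations by others: 2
The paper proposes a data fusion-based knowledge discovery system framework for network management and studies the use of data fusion techniques in the data preparation and preprocessing stages of knowledge discovery. It examines how well association rules express network management knowledge and, because network management data is temporal, introduces episode rules to represent the knowledge to be mined. It presents mining algorithms for association rules and episode rules in network fault management, together with an algorithm for incremental knowledge updating, and briefly describes the implementation of a prototype system.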

6.
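Chen Huaying. 《微机发展》 (Microcomputer Development), 2006, 16(9): 85-86
As a natural extension of enterprise IT applications, data mining has in recent years become a focus for enterprises that have implemented data warehouse projects. Drawing on experience from a large number of data mining projects, the paper analyzes in depth the characteristics of such projects, their team composition and roles, their methodology, and their key stages. This provides a theoretical basis for continually tracking the latest data mining knowledge and project implementation methodologies and for creating business value through data mining projects. Data mining theory and technology should be a focal topic for the domestic information technology field for some time to come.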

7.
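Knight: A General-Purpose Knowledge Mining Tool   Total citations: 23, self-citations: 0, citations by others: 23
Existing knowledge mining systems generally suffer from poor generality and from relying on a single discovery method.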

8.
The integration of data mining techniques with data warehousing is gaining popularity because the two disciplines complement each other in extracting knowledge from large datasets. However, most approaches focus on applying data mining as a front-end technology to mine data warehouses. Surprisingly, little progress has been made in incorporating mining techniques into the design of data warehouses. While methods such as data clustering applied to multidimensional data have been shown to enhance the knowledge discovery process, a number of fundamental issues remain unresolved with respect to the design of multidimensional schema. These relate to automated support for selecting informative dimension and fact variables in high-dimensional, data-intensive environments, an activity that may challenge human designers because of the sheer volume of data and number of variables involved. In this research, we propose a methodology that selects a subset of informative dimension and fact variables from an initial set of candidates. Experiments on three real-world datasets from the UCI machine learning repository show that the knowledge discovered from the schema we generated was more diverse and informative than that obtained by the standard approach of mining the original data without our multidimensional structure imposed on it.

9.
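Data mining is an advanced intelligent activity, and the data mining process depends on background knowledge. From a data mining perspective, this paper analyzes in detail the significance and role of background knowledge in data mining, gives a narrow definition of background knowledge, and proposes a background knowledge technique based on first-order predicate logic. Finally, using association rule mining and decision tree construction as examples, it shows that background knowledge can effectively improve both the efficiency and the quality of data mining.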

10.
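Research on User Services Based on Web Mining   Total citations: 3, self-citations: 0, citations by others: 3
An abundance of data alongside a scarcity of knowledge gave rise to the fields of knowledge discovery and data mining. Web-based data mining automatically and intelligently extracts the knowledge hidden in the Web's massive data. The paper analyzes the concepts, characteristics, and techniques of Web mining. Under the most widely used classification, Web data mining divides into Web content mining, Web structure mining, and Web usage mining, where Web usage mining applies data mining ideas to the analysis of server logs. Drawing on recent research in Web data mining, the paper focuses on applying browsing patterns mined from an updated set of frequent paths to personalized services for Web users. It also discusses how the discovered knowledge can be applied in online services and gives the corresponding algorithms.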
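
The notion of a frequent browsing path can be sketched as follows. This is a plain counting approach over sessions that are assumed to have already been reconstructed from server logs; it is not the paper's updated frequent-path-set algorithm, and the function name, session data, and threshold are illustrative assumptions.

```python
from collections import Counter

def frequent_paths(sessions, length, min_support):
    """Return contiguous page sequences of the given length that occur in
    at least min_support sessions, mapped to their session counts."""
    counts = Counter()
    for pages in sessions:
        # A set ensures each path is counted at most once per session.
        seen = {tuple(pages[i:i + length])
                for i in range(len(pages) - length + 1)}
        counts.update(seen)
    return {path: n for path, n in counts.items() if n >= min_support}

# Hypothetical sessions, each an ordered list of requested URLs.
toy_sessions = [
    ["/", "/news", "/sports"],
    ["/", "/news", "/tech"],
    ["/news", "/sports"],
    ["/", "/about"],
]
toy_frequent = frequent_paths(toy_sessions, length=2, min_support=2)
```

Here the paths `("/", "/news")` and `("/news", "/sports")` each appear in two sessions and pass the threshold. A personalization service could use such paths to suggest the likely next page.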

11.
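Society has entered the era of big data, and data mining has become an important technology in application areas such as business, healthcare, manufacturing, and government administration, with considerable social value. A data mining course draws on knowledge from many disciplines, and its instructional design and teaching methods directly affect teaching outcomes and the quality of training. In response to the characteristics of big data, and with the construction of a core knowledge system for the course as the theme, the authors adopted case-based teaching, reformed the traditional approach to assessment, and combined theory with practice in an attempt to innovate in the teaching of a graduate data mining course. The teaching achieved the expected results and was well received by students.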

12.
The technology-driven economy has inevitably increased the importance of protecting intellectual property through patents. Recent global pro-patenting shifts have further resulted in high technology overlaps. Technology components are now spread across a huge corpus of patent documents, making their interpretation a knowledge-intensive engineering activity. Intelligent collaborative patent mining facilitates the integration of inputs from patented technology components held by diverse stakeholders. Topic generative models are powerful natural language tools used to decompose a data corpus into topics and associated bag-of-words distributions. This research develops and validates a superior text mining methodology, called Excessive Topic Generation (ETG), as a preprocessing framework for topic analysis and visualization. The ETG methodology adapts the topic generation characteristics of Latent Dirichlet Allocation (LDA), with the added capability of generating word distance relationships among key terms. The novel ETG approach is used as the core process for intelligent collaborative patent mining. In a case study, 741 global Industrial Immersive Technology (IIT) patents covering inventive and novel concepts of Virtual Reality (VR), Augmented Reality (AR), and Brain Machine Interface (BMI) are systematically processed and analyzed using the proposed methodology. Based on the discovered topics of the IIT patents, patent classification (IPC/CPC) predictions are analyzed to validate the superior ETG results.

13.
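Data mining can extract implicit, previously unknown, essential regularities from large volumes of incomplete, noisy, fuzzy, and random real-world data. To discover fault-symptom knowledge effectively in the diagnosis of rotating machinery faults, data mining techniques and methods are introduced. A fault diagnosis knowledge acquisition system for rotating machinery is built on the RIPPER (Repeated Incremental Pruning to Produce Error Reduction) algorithm. Fault phenomena are collected and organized into fault information samples consisting of fault symptoms, fault types, and related fields. The RIPPER algorithm is then applied to analyze the faults and produce a file of fault diagnosis rule sets. This enables knowledge acquisition and automatic updating for the fault diagnosis system, which can diagnose common faults of rotating machinery, confirming the soundness of the algorithm.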

14.
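Computer integrated manufacturing systems (CIMS) for the process industry adopt a three-layer BPS/MES/PCS architecture. The paper identifies the limitations of the existing three-layer CIMS architecture and, in response, proposes a new data platform based on data mining and data storage technologies. The unified data platform is designed using knowledge discovery techniques, managing the enterprise's explicit knowledge and discovering implicit knowledge in production and management activities. The results indicate that the proposed modern integrated automation system for the process industry has a complete structure for information collection and knowledge sharing.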

15.
Dimensionality optimization involves optimizing the size of data sets along both dimensions, through variable selection and observation selection. Its ultimate objective is to obtain the induced data space by reducing both dimensionalities in such a way that the reduced subset retains sufficient information. In most real-world applications, it is not known what the best subset is or what it should contain. Selecting the appropriate subset is extremely important for mining large data sets effectively, because it is the only source from which any data mining and knowledge discovery algorithm can work reliably with the data of interest. The statistical and artificial intelligence communities have provided good methods in this domain, but many improvements are still needed, especially for data mining applications. This paper introduces a heuristic methodology that integrates greedy search methods and tree-structured SampleC4.5 to efficiently find the optimal subset of very large data sets along both dimensions simultaneously. A GA-based optimization approach is also proposed. Experimental results illustrate the effectiveness of our approaches in uncovering the important underlying patterns and indicate the potential of the proposed techniques to improve the optimizing process while avoiding misleading dilemmas. The results also show the robustness of our approaches and their complementary characteristics for knowledge discovery and data mining tasks.

16.
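Data mining is a new information processing technology whose purpose is to extract latent, valuable regularities or models from massive data. Analyzing university teaching data with data mining can produce genuinely valuable knowledge, provide information support to decision makers, and help advance teaching reform and development across the institution. Focusing on the credit-based course selection database system currently in use at universities, and taking association rule mining as an example, this paper proposes a simple and feasible approach to implementing data mining applications.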
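
For readers unfamiliar with association rules, the two standard measures behind them, support and confidence, can be sketched on course selection records as follows. The course names and records are invented for illustration, and the helper functions are assumptions rather than anything described in the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, defined as
    support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base

# Hypothetical course selections, one set of courses per student.
toy_selections = [
    {"Databases", "Data Mining"},
    {"Databases", "Data Mining", "Statistics"},
    {"Databases", "Networks"},
    {"Statistics", "Data Mining"},
]
rule_support = support(toy_selections, {"Databases", "Data Mining"})
rule_confidence = confidence(toy_selections, {"Databases"}, {"Data Mining"})
```

In this toy data, half of the students take both courses, and two of the three students who take Databases also take Data Mining. A rule is usually reported only when both measures exceed thresholds chosen by the analyst.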

17.
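Data mining can uncover latent knowledge from large collected data sets, which means it may also expose information involving personal privacy. This has given rise to privacy-preserving data mining. The paper first analyzes MASK, the privacy-preserving association rule mining algorithm proposed by Rizvi, and then improves MASK using a divide-and-conquer strategy. Both time complexity analysis and experimental results show that the improvement to MASK is effective.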
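
The core idea usually attributed to MASK is that each 0/1 entry is kept with probability p and flipped otherwise, after which the miner estimates true supports from the distorted data. The sketch below covers only the single-item case under that assumption; it does not show the divide-and-conquer improvement from the abstract, and the function names, seed, and parameters are illustrative.

```python
import random

def distort(bits, p, rng):
    """Keep each 0/1 value with probability p, flip it with probability 1 - p."""
    return [b if rng.random() < p else 1 - b for b in bits]

def estimate_support(distorted_bits, p):
    """Estimate the original fraction of 1s from a distorted column.

    If s is the true support, the expected distorted support is
    s * p + (1 - s) * (1 - p); solving for s gives the formula below.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 removes all information about the data")
    observed = sum(distorted_bits) / len(distorted_bits)
    return (observed - (1 - p)) / (2 * p - 1)

# Hypothetical column: 20,000 customers, 30% of whom bought the item.
rng = random.Random(0)
true_bits = [1] * 6000 + [0] * 14000
noisy_bits = distort(true_bits, 0.9, rng)
estimated = estimate_support(noisy_bits, 0.9)
```

The estimate approaches the true support as the number of records grows, while any individual distorted record remains deniable. Supports of larger itemsets require inverting a larger distortion matrix, which is where the computational cost arises.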

18.
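Design of an Agent-Based Knowledge Discovery Model   Total citations: 8, self-citations: 3, citations by others: 8
Research on KDD (Knowledge Discovery in Databases) models is an important branch of data mining. Existing models each have strengths, but none is perfect, and all perform poorly in terms of intelligence in particular. This paper designs an agent-based intelligent data mining system that uses multi-agent technology to implement information collection, preprocessing, querying, automatic knowledge extraction, and data mining, making the whole mining process knowledge-driven and intelligent. The system can provide necessary support for intelligent information systems.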

19.
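As the Internet continues to develop, research on and application of data mining have become increasingly active. Applying data mining to the Web in order to discover useful and important knowledge (including patterns and rules) from Web server logs has become an important research and application area within data mining and knowledge discovery. This is Web log-based data mining.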

20.
Textual databases are useful sources of information and knowledge, and if they are well utilised, issues related to future project management and to product or service quality improvement may be resolved. A large part of corporate information, approximately 80%, is available in textual data formats. Text classification techniques are well known for managing on-line sources of digital documents. Identifying the key issues discussed within textual data and classifying them into two different classes could help decision makers or knowledge workers to manage their future activities better. This research is relevant for most text-based documents and is demonstrated on Post Project Reviews (PPRs), which are a valuable source of information and knowledge. The application of textual data mining techniques for discovering useful knowledge and classifying textual data into different classes is a relatively new area of research. The work presented in this paper focuses on hybrid applications of text mining techniques to classify textual data into two different classes. The research applies clustering techniques at the first stage and Apriori Association Rule Mining at the second stage. Apriori Association Rule Mining is applied to generate Multiple Key Term Phrasal Knowledge Sequences (MKTPKS), which are later used for classification. Additionally, studies were made to improve the classification accuracies of the classifiers, i.e. C4.5, K-NN, Naïve Bayes and Support Vector Machines (SVMs). The classification accuracies were measured and the results compared with those of a single term based classification model. The proposed methodology could be used to analyse any free formatted textual data; in the current research it has been demonstrated on an industrial dataset consisting of PPRs collected from the construction industry. The data or information available in these reviews is codified in multiple formats, but in the current research scenario only free formatted text documents are examined. Experiments showed that the performance of the classifiers improved when the proposed methodology was adopted.
