Similar Literature
Found 20 similar documents (search time: 15 ms)
1.
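Research on Domain-Oriented Data Mining Systems   Total citations: 1 (self-citations: 0, citations by others: 1)
By combining domain engineering, domain frameworks, object-oriented technology, software component technology, and data mining, this work proposes a development model for domain-oriented data mining systems and a new layered mining architecture analogous to the OSI network reference model. It also designs a domain-oriented data mining system framework that effectively addresses the problems the paper identifies.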

2.
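To address the unstable tracking performance of traditional algorithms when the external environment or the target's motion changes its appearance, a robust multiple kernel learning tracking algorithm is proposed. It introduces Boosting into the multiple kernel learning framework and, training on fewer samples than traditional multiple kernel learning algorithms, builds a pool of weak classifiers based on complementary feature sets and kernel function sets. Several single-kernel weak classifiers from the pool are combined into one multi-kernel strong classifier, which still correctly separates background from target in candidate image patches under strong background interference or target occlusion. Tests on different video sequences show that, compared with the OAB algorithm, which also uses Boosting, and the LOT algorithm, which has shown high tracking accuracy in recent years, the proposed algorithm tracks the target more accurately in complex environments.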

3.
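A Securities Customer Analysis System Based on Data Mining Technology   Total citations: 2 (self-citations: 0, citations by others: 2)
A securities customer analysis system was studied and implemented using data mining technology. After detailed analysis and preprocessing of the data, models were built with the K-means and C5.0 algorithms in the data mining tool SPSS CLEMENTINE8.0 and used to predict the customers with the greatest potential. Practical application verified the models' accuracy.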

4.
5.
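Application of SVM to the Classification of Gene Microarray Cancer Data   Total citations: 1 (self-citations: 0, citations by others: 1)
Building on a summary of applications of binary support vector machines, a new method is proposed for analyzing gene microarray cancer data: features are selected with the t-test and the Wilcoxon test, and a support vector machine (SVM) serves as the classifier. Classification experiments on a leukemia data set and a colon cancer data set show that the method achieves a high recognition rate while requiring only a small feature subset, improving both classification accuracy and classification speed.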
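
The t-test feature selection step described in this abstract can be illustrated with a minimal sketch. It assumes Welch's form of the t-statistic and a toy expression matrix; the function names `welch_t` and `rank_genes` and the data are hypothetical and not taken from the paper, and the SVM classification stage is not shown.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two lists of expression values."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def rank_genes(samples, labels, top_k):
    """Return the indices of the top_k genes ranked by |t|, largest first.

    samples: list of rows, one row of expression values per sample.
    labels:  list of 0/1 class labels, one per sample.
    """
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, y in zip(samples, labels) if y == 1]
        neg = [row[g] for row, y in zip(samples, labels) if y == 0]
        scores.append((abs(welch_t(pos, neg)), g))
    # Sort by descending score; break ties by gene index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scores[:top_k]]


# Toy data (assumed): gene 0 separates the classes, genes 1 and 2 do not.
samples = [
    [5.0, 1.00, 3.0],
    [5.2, 0.90, 2.0],
    [4.9, 1.10, 4.0],
    [1.0, 1.00, 3.1],
    [1.2, 1.05, 2.2],
    [0.9, 0.95, 3.9],
]
labels = [1, 1, 1, 0, 0, 0]
selected = rank_genes(samples, labels, top_k=1)
```

The selected gene indices would then define the reduced feature vectors passed to whichever classifier is used.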

6.
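A Data Mining System for Gene Expression Profile Chip Data*   Total citations: 1 (self-citations: 0, citations by others: 1)
Li Rong (李荣), 《计算机应用研究》 (Application Research of Computers), 2009, 26(8): 2938-2941
Gene chips are an important tool in genome research, and their data analysis relies heavily on data mining technology. Combining data mining technology with bioinformatics research, several data mining analysis models for gene expression profile chips and a corresponding data mining system were designed and implemented. The system offers good scalability and entity independence, and the complex underlying data mining algorithms are transparent to the user.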

7.
8.
Accurate recognition of cancers from microarray gene expression data is very important for helping doctors choose a proper treatment. Genomic microarrays are powerful research tools in bioinformatics and modern medical research. However, even a simple microarray experiment often yields very high-dimensional data and a huge amount of information, which challenges researchers to extract the important features and reduce the dimensionality. This paper proposes a nonlinear dimensionality reduction method, kernel-based locally linear embedding, that selects the optimal number of nearest neighbors and constructs a uniformly distributed manifold. In addition, the support vector machine, which has given rise to a new class of theoretically elegant learning machines, is used to classify and recognise genomic microarray data. The techniques are applied to two published DNA microarray data sets. The experimental results and comparisons demonstrate that the proposed method is an effective approach.

9.
An interactive approach to mining gene expression data   Total citations: 1 (self-citations: 0, citations by others: 1)
Effective identification of coexpressed genes and coherent patterns in gene expression data is an important task in bioinformatics research and biomedical applications. Several clustering methods have recently been proposed to identify coexpressed genes that share similar coherent patterns. However, there is no objective standard for groups of coexpressed genes. The interpretation of coexpression depends heavily on domain knowledge. Furthermore, groups of coexpressed genes in gene expression data are often highly connected through a large number of "intermediate" genes, so there may be no clear boundaries to separate clusters. Clustering gene expression data also faces the challenges of satisfying biological domain requirements and addressing the high connectivity of the data sets. In this paper, we propose an interactive framework for exploring coherent patterns in gene expression data. A novel coherent pattern index is proposed to give users highly confident indications of the existence of coherent patterns. To derive a coherent pattern index and facilitate clustering, we devise an attraction tree structure that summarizes the coherence information among genes in the data set. We present efficient and scalable algorithms for constructing attraction trees and coherent pattern indices from gene expression data sets. Our experimental results show that our approach is effective in mining gene expression data and is scalable for mining large data sets.

10.
In this paper, we present Microarray Medical Data explorer (Microarray-MD), a novel software system that assists in the exploratory analysis of gene expression microarray data. It implements a combination scheme of multiple Support Vector Machines, which integrates a variety of gene selection criteria and allows for the discrimination of multiple diseases or subtypes of a disease. The system can be trained, and can automatically tune its parameters, when pathologically characterized gene expression data are provided as input. Given a set of new, uncharacterized patient data as input, it outputs a decision on the type or subtype of a disease. A graphical user interface provides easy access to the system operations and direct adjustment of its parameters. The system has been tested on various publicly available datasets, and its overall accuracy was estimated to exceed 90%.

11.
Discriminative models are used to analyze the differences between two classes and to identify class-specific patterns. Most existing discriminative models use the entire feature space to compute the discriminative patterns for each class. Co-clustering has been proposed to capture patterns that are correlated in a subset of features, but it cannot handle discriminative patterns in labeled datasets. In certain biological applications such as gene expression analysis, it is critical to consider discriminative patterns that are correlated only in a subset of the feature space. The objective of this paper is twofold. First, it presents an algorithm to efficiently find arbitrarily positioned co-clusters in complex data. Second, it extends this co-clustering algorithm to discover discriminative co-clusters by incorporating class information into the co-cluster search process. In addition, we characterize the discriminative co-clusters and propose three novel measures that can be used to evaluate the performance of any discriminative subspace pattern-mining algorithm. We evaluated the proposed algorithms on several synthetic and real gene expression datasets, and our experimental results showed that they outperformed several existing algorithms available in the literature.

12.
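Considering the business characteristics of full-cost analysis in banks and the application characteristics of various data mining algorithms, an analysis model that applies association-rule-based classification to a bank full-cost analysis system is proposed. The model was compared experimentally with other machine learning classification algorithms; it achieved the best results in this domain, and the mined rules were endorsed by bank staff.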
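
As a rough illustration of what association-rule-based classification involves, the sketch below mines rules of the form "itemset → class" that meet minimum support and confidence, then classifies a record with the first matching rule. It is a simplified brute-force version under stated assumptions: the attribute values, thresholds, and the names `mine_class_rules` and `classify` are hypothetical, and the paper's actual model is not specified in the abstract.

```python
from itertools import combinations


def mine_class_rules(records, min_support, min_confidence, max_len=2):
    """Mine rules (itemset, label, support, confidence) from labeled records.

    records: list of (set_of_items, class_label) pairs.
    Rules are returned sorted by confidence, then support (both descending).
    """
    n = len(records)
    itemset_counts = {}  # itemset -> number of records containing it
    rule_counts = {}     # (itemset, label) -> number of matching records
    for items, label in records:
        for size in range(1, max_len + 1):
            for combo in combinations(sorted(items), size):
                itemset_counts[combo] = itemset_counts.get(combo, 0) + 1
                key = (combo, label)
                rule_counts[key] = rule_counts.get(key, 0) + 1
    rules = []
    for (combo, label), count in rule_counts.items():
        support = count / n
        confidence = count / itemset_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, label, support, confidence))
    rules.sort(key=lambda r: (-r[3], -r[2], r[0]))
    return rules


def classify(rules, items, default):
    """Return the label of the first (best-ranked) rule whose itemset matches."""
    for combo, label, _, _ in rules:
        if set(combo) <= items:
            return label
    return default


# Toy cost records (assumed attributes and labels, for illustration only).
records = [
    ({"branch=urban", "staff=high"}, "high_cost"),
    ({"branch=urban", "staff=high"}, "high_cost"),
    ({"branch=urban", "staff=low"}, "low_cost"),
    ({"branch=rural", "staff=low"}, "low_cost"),
    ({"branch=rural", "staff=low"}, "low_cost"),
    ({"branch=rural", "staff=high"}, "high_cost"),
]
rules = mine_class_rules(records, min_support=0.3, min_confidence=0.9)
prediction = classify(rules, {"branch=urban", "staff=high"}, default="unknown")
```

Because each mined rule is a readable "if these attributes, then this class" statement, domain staff can inspect the rules directly, which fits the abstract's remark that bank staff reviewed the mined rules.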

13.
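Drawing on the application of Web data mining in E-learning platforms, this work analyzes the basic process and key technologies of Web data mining, proposes a personalized learning platform model based on Web mining, and describes how Web mining is applied in the platform and how its personalized search engine is implemented.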

14.
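A distributed intrusion detection system model based on data mining is constructed. Combining misuse detection with anomaly detection, it applies data mining techniques such as association analysis, sequence analysis, classification, and clustering to inspect security audit data intelligently, analyzes intrusion attacks and unauthorized behavior originating from the network, and provides real-time alarms and automatic responses, yielding an adaptive, scalable distributed intrusion detection system. Experiments show that the model has a very high detection rate for known attack patterns and some detection capability for unknown attack patterns.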

15.
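A major factor affecting the accuracy of video-detection-based vehicle type classification systems is the accuracy of the captured vehicle shape parameters. To address this, a method based on multi-source data fusion is proposed to extract vehicle shape parameters, and an SVM (support vector machine) is used to classify the vehicles. Experimental results show that multi-source data fusion effectively controls the errors caused by noise introduced during acquisition and by lens distortion, improving the accuracy of the vehicle shape parameters. Classification with a support vector machine avoids the local-extremum problem that neural networks cannot escape. The method improves vehicle type classification accuracy, offers good real-time performance, and is suitable for real-time vehicle classification systems.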

16.
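Huo Xiaogang (霍晓钢), 《计算机时代》 (Computer Era), 2013, (4): 12-14, 17
Data mining technology can turn the large volumes of data accumulated through educational assessment into knowledge useful to students, teachers, and education administrators. To this end, the knowledge discovery process for educational assessment is studied systematically, covering the purpose of assessment, data preparation, data accumulation methods, and knowledge mining from the data. Knowledge discovery methods for each of the three kinds of needs are discussed, with the aim of moving educational assessment beyond purely qualitative conclusions toward actual knowledge discovery and of enriching what educational assessment can offer.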

17.
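A Cluster Analysis Method for Gene Expression Data Based on Representative Entropy   Total citations: 1 (self-citations: 0, citations by others: 1)
Gene expression data have few samples and high dimensionality. For this setting, and especially when prior knowledge for sample typing is lacking, a two-way clustering algorithm based on representative entropy is proposed that draws on the strengths of self-organizing feature maps. The algorithm first clusters genes with a self-organizing feature map (SOM) network and selects feature genes according to a fluctuation coefficient. It then judges the quality of the gene clustering by the magnitude of the representative entropy and determines the number of neurons in the network. Finally, the FCM (Fuzzy C Means) clustering algorithm performs sample typing on the selected feature gene set. Applied to two public gene expression data sets, the algorithm achieved high clustering accuracy while reducing the feature dimensionality.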
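
The final FCM step named in this abstract can be sketched as follows. This is a generic textbook form of Fuzzy C Means with fuzzifier m = 2, run on toy two-dimensional points; the deterministic initialisation from the first c points, the iteration count, and the function name `fcm` are assumptions made for illustration, and the SOM and representative-entropy stages are not shown.

```python
from math import dist


def fcm(points, c, m=2.0, iters=50):
    """Fuzzy C Means: return (centers, memberships).

    memberships[j][i] is the degree to which point j belongs to cluster i;
    each row sums to 1.
    """
    dims = len(points[0])
    # Assumed initialisation: use the first c points as starting centers.
    centers = [list(p) for p in points[:c]]
    memberships = []
    for _ in range(iters):
        # Membership update: closer centers receive larger membership.
        memberships = []
        for p in points:
            d = [max(dist(p, ctr), 1e-12) for ctr in centers]
            row = [
                1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for i in range(c)
            ]
            memberships.append(row)
        # Center update: membership-weighted mean of all points.
        for i in range(c):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * p[k] for w, p in zip(weights, points)) / total
                for k in range(dims)
            ]
    return centers, memberships


# Toy samples (assumed): two well-separated groups, interleaved.
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
centers, memberships = fcm(points, c=2)
# Harden the fuzzy memberships into one cluster index per sample.
hard_labels = [row.index(max(row)) for row in memberships]
```

Unlike hard clustering, each sample keeps a graded membership in every cluster, which can be useful when sample types overlap.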

18.
To ensure that a series of missions can be completed with only finite breaks, many systems are required to guarantee system safety and mission success. For these systems, maintenance decision support is vital. One widely used maintenance strategy is selective maintenance. Most traditional selective maintenance optimisation research has focused on binary state systems, which are subject to distribution deterioration or failure. However, a majority of systems used in aerospace or industrial applications are multi-state systems with more than two states deteriorating at the same time, meaning that real-time state distribution is needed to provide more timely and effective maintenance. This paper presents a novel integrated system health management-oriented maintenance decision support methodology and framework for a multi-state system based on data mining. An aero-engine system numerical example is given to illustrate the methodology; the results demonstrate the significant advantages of using data mining to efficiently obtain state distribution information, and the benefits of using a robust optimal model to choose suitable strategies. This methodology, which is applicable to multi-state systems of varying sizes, can solve maintenance problems when imperfect maintenance quality is considered.

19.
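To address the shortcomings of existing intrusion detection systems, an intrusion detection model layered by protocol is proposed, based on the differences in how intrusion and normal access patterns appear in network data and on the occurrence patterns of specific data packets. Different data mining methods are applied at each protocol layer to extract intrusion features, in order to improve modeling accuracy and detection speed and to overcome the subjectivity of manually extracted intrusion features. The data mining algorithms used are mainly association mining, sequence mining, classification, and clustering.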

20.
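Normal user behavior changes over time, so an anomaly analysis system must adapt by updating its model of normal behavior in order to avoid false alarms. An incremental update algorithm is studied. Linear regression is used to estimate the similarity; if the actual similarity differs from the estimate by more than a threshold, an alarm is raised; otherwise, an improved sliding-window incremental mining method updates the normal activity model. Feasibility is verified on the DARPA-MIT 1999 data set.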
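
The alarm-or-update decision described in this abstract can be sketched as below. The sketch assumes a least-squares line fitted to the recent similarity values and a plain fixed-size window as a stand-in for the paper's improved sliding-window incremental mining, which the abstract does not detail; the function names, threshold, and sample values are hypothetical.

```python
def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept). Requires at least two values.
    """
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def check_similarity(history, observed, threshold):
    """Predict the next similarity and flag an alarm on a large deviation."""
    slope, intercept = fit_line(history)
    predicted = slope * len(history) + intercept
    alarm = abs(observed - predicted) > threshold
    return predicted, alarm


def step(window, observed, threshold, window_size):
    """Raise an alarm, or slide the window forward to absorb normal drift."""
    _, alarm = check_similarity(window, observed, threshold)
    if not alarm:
        # Simplified update: append the new value and drop the oldest.
        window = (window + [observed])[-window_size:]
    return window, alarm


# Toy similarity history (assumed): behavior drifting slowly and steadily.
history = [0.90, 0.88, 0.86, 0.84]
predicted, _ = check_similarity(history, 0.81, threshold=0.05)
window_ok, alarm_ok = step(history, 0.81, threshold=0.05, window_size=4)
window_bad, alarm_bad = step(history, 0.40, threshold=0.05, window_size=4)
```

Gradual drift stays within the threshold and is absorbed into the window, while an abrupt drop in similarity triggers an alarm and leaves the window unchanged.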
