首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 512 毫秒
1.

Cancer classification is one of the main steps during patient healing process. This fact enforces modern clinical researchers to use advanced bioinformatics methods for cancer classification. Cancer classification is usually performed using gene expression data gained in microarray experiment and advanced machine learning methods. Microarray experiment generates huge amount of data, and its processing via machine learning methods represents a big challenge. In this study, two-step classification paradigm which merges genetic algorithm feature selection and machine learning classifiers is utilized. Genetic algorithm is built in MapReduce programming spirit which makes this algorithm highly scalable for Hadoop cluster. In order to improve the performance of the proposed algorithm, it is extended into a parallel algorithm which process on microarray data in distributed manner using the Hadoop MapReduce framework. In this paper, the algorithm was tested on eleven GEMS data sets (9 tumors, 11 tumors, 14 tumors, brain tumor 1, lung cancer, brain tumor 2, leukemia 1, DLBCL, leukemia 2, SRBCT, and prostate tumor) and its accuracy reached 100% for less than 25 selected features. The proposed cloud computing-based MapReduce parallel genetic algorithm performed well on gene expression data. In addition, the scalability of the suggested algorithm is unlimited because of underlying Hadoop MapReduce platform. The presented results indicate that the proposed method can be effectively implemented for real-world microarray data in the cloud environment. In addition, the Hadoop MapReduce framework demonstrates substantial decrease in the computation time.

  相似文献   

2.
改进的朴素贝叶斯垃圾邮件过滤算法   总被引:1,自引:1,他引:0       下载免费PDF全文
介绍了朴素贝叶斯垃圾邮件过滤算法,对于朴素贝叶斯算法中条件概率的计算,选用了多变量贝努里事件模型的计算方法,在多变量贝努里事件模型的基础上进行了改进,并在Ling-Spam语料库上进行实验,实验结果表明改进后的算法有效地提高了过滤器的召回率和精确率,并且降低了过滤器的错误率。  相似文献   

3.
Smartphone devices particularly Android devices are in use by billions of people everywhere in the world. Similarly, this increasing rate attracts mobile botnet attacks which is a network of interconnected nodes operated through the command and control (C&C) method to expand malicious activities. At present, mobile botnet attacks launched the Distributed denial of services (DDoS) that causes to steal of sensitive data, remote access, and spam generation, etc. Consequently, various approaches are defined in the literature to detect mobile botnet attacks using static or dynamic analysis. In this paper, a novel hybrid model, the combination of static and dynamic methods that relies on machine learning to detect android botnet applications is proposed. Furthermore, results are evaluated using machine learning classifiers. The Random Forest (RF) classifier outperform as compared to other ML techniques i.e., Naïve Bayes (NB), Support Vector Machine (SVM), and Simple Logistic (SL). Our proposed framework achieved 97.48% accuracy in the detection of botnet applications. Finally, some future research directions are highlighted regarding botnet attacks detection for the entire community.  相似文献   

4.
为了从大量的电子邮件中检测垃圾邮件,提出了一个基于Hadoop平台的电子邮件分类方法。不同于传统的基于内容的垃圾邮件检测,通过在Map Reduce框架上统计分析邮件收发记录,提取邮件账号的行为特征。然后使用Map Reduce框架并行的实现随机森林分类器,并基于带有行为特征的样本训练分类器和分类邮件。实验结果表明,基于Hadoop平台的电子邮件分类方法大大提高了大规模电子邮件的分类效率。  相似文献   

5.
增量学习利用增量数据中的有用信息通过修正分类参数来更新分类模型,而朴素贝叶斯算法具有利用先验信息以及增量信息的特性,因此朴素贝叶斯算法是增量学习算法设计的最佳选择。三支决策是一种符合人类认知模式的决策理论,具有主观的特性。将三支决策思想融入朴素贝叶斯增量学习中,提出一种基于三支决策的朴素贝叶斯增量学习算法。基于朴素贝叶斯算法构造了一个称为分类确信度的概念,结合代价函数,用以确定三支决策理论中的正域、负域和边界域。利用三个域中的有用信息构造基于三支决策的朴素贝叶斯增量学习算法。实验结果显示,在阈值[α]和[β]选择合适的情况下,基于该方法的分类准确性和召回率均有明显的提高。  相似文献   

6.
基于高光谱吸收特征参数的分类研究   总被引:3,自引:1,他引:2  
在Weka平台上,采用决策树C4.5、朴素贝叶斯、朴素贝叶斯树三种算法进行了带缺失属性值的高光谱分类研究。针对高光谱波段数众多、信息冗余量大的特点,首先对光谱曲线进行光谱特征参数提取,然后再选择合适的吸收峰波段作为输入向量来进行分类。实验表明,由NBTree建立的铀黑-沥青铀矿分类模型的分类误差最小,分类精度最高,其次是Na?觙veBayes和J4.8,但从训练时间来看,NBTree则高于NB和J4.8。最后,对三种分类算法的分类结果进行了分析。  相似文献   

7.
基于改进Naïve Bayes的垃圾邮件过滤模型研究   总被引:1,自引:0,他引:1  
分析了目前在垃圾邮件过滤中广泛应用的Naïve Bayes过滤模型(NBF),指出了期望交叉熵(ECE)特征词选取方法的不足。提出了改进的Naïve Bayes垃圾邮件过滤模型(A-NBF),用改进的期望交叉熵(AECE)选取垃圾邮件特征词,并在邮件分类过程中对特征词进行加权,从而提高对垃圾邮件过滤的精度。实验结果可以看出A-NBF比NBF在过滤精度方面有明显的提高。  相似文献   

8.
基于属性加权的朴素贝叶斯分类算法   总被引:3,自引:0,他引:3       下载免费PDF全文
朴素贝叶斯分类是一种简单而高效的方法,但是它的属性独立性假设,影响了它的分类性能。通过放松朴素贝叶斯假设可以增强其分类效果,但通常会导致计算代价大幅提高。提出了属性加权朴素贝叶斯算法,该算法通过属性加权来提高朴素贝叶斯分类器性能,加权参数直接从训练数据中学习得到。权值可以看作是计算某个类的后验概率时,某属性取值对该类别的影响程度。实验结果表明,该算法可行而且有效。  相似文献   

9.
In Internet applications, due to the growth of big data with more features, intrusion detection has become a difficult process in terms of computational complexity, storage efficiency and getting optimized solutions of classification through existing sequential computing environment. Using a parallel computing model and a nature inspired feature selection technique, a Hadoop Based Parallel Binary Bat Algorithm method is proposed for efficient feature selection and classification in order to obtain optimized detection rate. The MapReduce programming model of Hadoop improves computational complexity, the Parallel Binary Bat algorithm optimizes the prominent features selection and parallel Naïve Bayes provide cost-effective classification. The experimental results show that the proposed methodologies perform competently better than sequential computing approaches on massive data and the computational complexity is significantly reduced for feature selection as well as classification in big data applications.  相似文献   

10.
Naïve Bayes learners are widely used, efficient, and effective supervised learning methods for labeled datasets in noisy environments. It has been shown that naïve Bayes learners produce reasonable performance compared with other machine learning algorithms. However, the conditional independence assumption of naïve Bayes learning imposes restrictions on the handling of real-world data. To relax the independence assumption, we propose a smooth kernel to augment weights for the likelihood estimation. We then select an attribute weighting method that uses the mutual information metric to cooperate with the proposed framework. A series of experiments are conducted on 17 UCI benchmark datasets to compare the accuracy of the proposed learner against that of other methods that employ a relaxed conditional independence assumption. The results demonstrate the effectiveness and efficiency of our proposed learning algorithm. The overall results also indicate the superiority of attribute-weighting methods over those that attempt to determine the structure of the network.  相似文献   

11.
Detection of malicious software (malware) using machine learning methods has been explored extensively to enable fast detection of new released malware. The performance of these classifiers depends on the induction algorithms being used. In order to benefit from multiple different classifiers, and exploit their strengths we suggest using an ensemble method that will combine the results of the individual classifiers into one final result to achieve overall higher detection accuracy. In this paper we evaluate several combining methods using five different base inducers (C4.5 Decision Tree, Naïve Bayes, KNN, VFI and OneR) on five malware datasets. The main goal is to find the best combining method for the task of detecting malicious files in terms of accuracy, AUC and Execution time.  相似文献   

12.
传统分布式大型邮件系统对海量邮件的过滤存在编程难、效率低、前期训练耗用资源大等缺点,为此,对传统贝叶斯过滤算法进行并行化改进,利用云计算MapReduce模型在海量数据处理方面的优势,设计一种基于Hadoop开源云架构的贝叶斯邮件过滤MapReduce模型,优化邮件的训练和过滤过程。实验结果表明,与传统分布式计算模型相比,该模型在召回率、查准率和精确率方面性能较好,同时可降低邮件过滤成本,提高系统执行效率。  相似文献   

13.
提出了一个基于35维特征向量的恶意程序检测方法。特征向量的每一维用于表示一种恶意行为事件,每一事件由相应的Win32 API调用及其参数表示。实现了一个自动化行为追踪系统(Argus)用于行为特征的提取。实验数据集从8223个恶意可执行程序和2821个正常可执行程序中获取,并依据程序发生事件数的不同设立事件阈值,建立不同的训练集,分别用于训练贝叶斯分类器。实验表明,当事件阈值为3时,分类器达到最佳检测效果。  相似文献   

14.
In this paper we propose a new image classification technique. According to this note that most research focuses on extraction of features in the frequency domain, location, and reduction of feature dimensions, in this research we focused on learning step in image classification. The main aim is to use the heuristic methods to increase the function of the estimator of the learning algorithm and continue to achieve the desired state, as well as categorization without user interference and automatically performed by the model produced from the above steps. So, in this paper, a new learning approach based on the Salp Swarm Algorithm was proposed that was implemented and evaluated on learning algorithm Decision Tree, K-Nearest Neighbors and Naïve Bayes. The results demonstrate the improvement of the performance of learning algorithms in all the achieved criteria by using the SSA algorithm in comparison with traditional learning algorithms. In the accuracy, sensitivity, classification error and F1 criterion, the best performance of the proposed model is using the Decision Tree learning method with values of 99.17%, 100%, 0.83% and 95.65% respectively. In the specificity and precision criterion, the best performance of the proposed model is based on K-Nearest Neighbors learning method with values of 100%.  相似文献   

15.
传统的协同过滤推荐技术在大数据环境下存在一定的不足。针对该问题,提出了一种基于云计算技术的个性化推荐方法:将大数据集和推荐计算分解到多台计算机上并行处理。在对经典ItemCF算法MapReduce化后,建立了一个基于Hadoop开源框架的并行推荐引擎,并通过在已商用的英语训练平台上进行学习推荐工作验证了该系统的有效性。实验结果表明,在集群中使用云计算技术处理海量数据,可以大大提高推荐系统的可扩展性。  相似文献   

16.
Numerous models have been proposed to reduce the classification error of Na¨ ve Bayes by weakening its attribute independence assumption and some have demonstrated remarkable error performance. Considering that ensemble learning is an effective method of reducing the classification error of the classifier, this paper proposes a double-layer Bayesian classifier ensembles (DLBCE) algorithm based on frequent itemsets. DLBCE constructs a double-layer Bayesian classifier (DLBC) for each frequent itemset the new instance contained and finally ensembles all the classifiers by assigning different weight to different classifier according to the conditional mutual information. The experimental results show that the proposed algorithm outperforms other outstanding algorithms.  相似文献   

17.
云计算的诞生,有效地解决了海量数据集的存储和分析处理。在云计算实现的开源Hadoop分布式系统集群上,使用MapReduce并行编程模型,设计并实现了一种对TFIDF改进的分布式朴素贝叶斯文本分类算法。实验结果表明,基于Hadoop框架的分布式朴素贝叶斯文本自动分类器不仅能处理节点失效,同时具有高效性和易扩展性的优势。  相似文献   

18.
Collaborative filtering (CF) technique is capable of generating personalized recommendations. However, the recommender systems utilizing CF as their key algorithms are vulnerable to shilling attacks which insert malicious user profiles into the systems to push or nuke the reputations of targeted items. There are only a small number of labeled users in most of the practical recommender systems, while a large number of users are unlabeled because it is expensive to obtain their identities. In this paper, Semi-SAD, a new semi-supervised learning based shilling attack detection algorithm is proposed to take advantage of both types of data. It first trains a naïve Bayes classifier on a small set of labeled users, and then incorporates unlabeled users with EM-λ to improve the initial naïve Bayes classifier. Experiments on MovieLens datasets are implemented to compare the efficiency of Semi-SAD with supervised learning based detector and unsupervised learning based detector. The results indicate that Semi-SAD can better detect various kinds of shilling attacks than others, especially against obfuscated and hybrid shilling attacks.  相似文献   

19.
分类问题,尤其是文本自动分类一直是机器学习与数据挖掘研究中的研究热点与核心技术,其中如朴素贝叶斯、KNN等近年来得到了广泛的关注和快速的发展。文中在统计学理论的基础上给出了一种基于支持向量机方法的文本分类算法,并设计出了相应的垃圾邮件过滤系统。实验证明与朴素贝叶斯方法相比,该算法极大地提高了分类准确率和查全率,具有应用推广的价值。  相似文献   

20.
针对Hadoop平台MapReduce分布式计算模型运行机制中的顺序制约而产生的计算资源浪费问题,从提高平台中每个执行节点的细粒度并行数据处理角度出发,结合Java共享内存多线程编程技术,对该模型进行了优化,提出一种MapReduce+OpenMP粗细粒度相结合的分布式并行计算模型。并在由四个节点组成的Hadoop集群环境下对不同规模大小的出租车GPS轨迹数据分析处理,验证该模型的性能和效率,实验结果证明MapReduce+OpenMP分布式并行计算模型确实能够提高针对大数据集的计算效率,是对Hadoop平台大数据分析处理模型有效的完善和优化。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号