Similar Literature
 Found 20 similar documents (search time: 31 ms)
1.
Heart failure is now widespread throughout the world. Heart disease affects approximately 48% of the population, and it is both expensive and difficult to treat. This paper presents machine learning models to predict heart failure. The fundamental idea is to compare the accuracy of various machine learning (ML) algorithms and to use boosting algorithms to improve prediction accuracy. Supervised algorithms including K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Logistic Regression (LR) are considered to achieve the best results. Boosting algorithms such as Extreme Gradient Boosting (XGBoost) and CatBoost, together with Artificial Neural Networks (ANN), are also used to improve prediction. The research also focuses on data visualization to identify patterns, trends, and outliers in a massive data set. Python and scikit-learn are used for ML; TensorFlow and Keras, along with Python, are used for ANN model training. The DT and RF algorithms achieved the highest accuracy among the classifiers, 95%. KNN obtained the second-highest accuracy, 93.33%. XGBoost had a satisfactory accuracy of 91.67%; SVM, CatBoost, and ANN had an accuracy of 90%; and LR had an accuracy of 88.33%.

2.
We have produced the first 30 m resolution global land-cover maps using Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) data. We have classified over 6600 scenes of Landsat TM data after 2006, and over 2300 scenes of Landsat TM and ETM+ data before 2006, all selected from the green season. These images cover most of the world's land surface except Antarctica and Greenland. Most of these images came from the United States Geological Survey in level L1T (orthorectified). Four classifiers that were freely available were employed, including the conventional maximum likelihood classifier (MLC), J4.8 decision tree classifier, Random Forest (RF) classifier and support vector machine (SVM) classifier. A total of 91,433 training samples were collected by traversing each scene and finding the most representative and homogeneous samples. A total of 38,664 test samples were collected at preset, fixed locations based on a globally systematic unaligned sampling strategy. Two software tools, Global Analyst and Global Mapper developed by extending the functionality of Google Earth, were used in developing the training and test sample databases by referencing the Moderate Resolution Imaging Spectroradiometer enhanced vegetation index (MODIS EVI) time series for 2010 and high resolution images from Google Earth. A unique land-cover classification system was developed that can be crosswalked to the existing United Nations Food and Agriculture Organization (FAO) land-cover classification system as well as the International Geosphere-Biosphere Programme (IGBP) system. Using the four classification algorithms, we obtained the initial set of global land-cover maps. The SVM produced the highest overall classification accuracy (OCA) of 64.9% assessed with our test samples, with RF (59.8%), J4.8 (57.9%), and MLC (53.9%) ranked from the second to the fourth. 
We also estimated the OCAs using a subset of our test samples (8629), each of which represented a homogeneous area greater than 500 m × 500 m. Using this subset, we found the OCA for the SVM to be 71.5%. As a consistent source for estimating the coverage of global land-cover types in the world, estimation from the test samples shows that only 6.90% of the world is planted for agricultural production. The total area of cropland is 11.51% if unplanted croplands are included. The forests, grasslands, and shrublands cover 28.35%, 13.37%, and 11.49% of the world, respectively. The impervious surface covers only 0.66% of the world. Inland waterbodies, barren lands, and snow and ice cover 3.56%, 16.51%, and 12.81% of the world, respectively.

3.
This study evaluates four commonly used forms of synthetic aperture radar (SAR) data for land-cover classification in tropical rural areas. The backscatter coefficient of linearly polarized L-band SAR was compared to two distinctive feature sets derived from Eigen-based and model-based decompositions. The performance of six classifiers available in Orfeo Toolbox (OTB), that is, Bayes, artificial neural networks (ANNs), Support Vector Machine (SVM), decision trees, Random Forests (RFs), and gradient boosting trees (GBTs), was investigated to distinguish five and seven land-cover classes, with particular attention given to several types of woody vegetation: forest, mixed garden, rubber, oil palm, and tea plantations. Classifiers reacted differently to ingested forms of SAR data, and careless use of data input yielded a negative impact. The results showed that SVM provided the highest overall accuracy although the performance was not significantly better than the others. Tuning the parameters, however, significantly improved the accuracy of ANN and SVM, while RF and GBT did not respond well. Responses of two SVM parameters (cost and kernel type) fluctuated somewhat, which required further attention. ANN accuracy was improved when the number of neurons in the hidden layer was set between 10 and 12. We found that accuracy imbalance existed between designated land-cover classes, especially in woody vegetation. Imbalance can partially be reduced by tuning specific classifiers. We showed that classifier tuning can lead to significantly improved accuracy, especially for classes having medium or low accuracies. This research also demonstrated that freely available toolkits such as OTB and QGIS can be beneficial for mapping activities in developing countries, achieving a reasonable accuracy if the classification parameters are tuned properly.

4.
This article compares three classification algorithms for mapping crops in Hokkaido, Japan, using TerraSAR-X data. In the study area, beans, beets, grasslands, maize, potatoes, and winter wheat were cultivated. Although classification maps are required for both management and estimation of agricultural disaster compensation, techniques for producing them have yet to be established. Some supervised learning models may allow accurate classification. Therefore, the classification and regression tree (CART), the support vector machine (SVM), and random forests (RF) were compared. SVM was the optimum algorithm in this study, achieving an overall accuracy of 89.1% for same-year classification (training data from 2009 used to classify test data from 2009) and 78.0% for cross-year classification (training data from 2009 used to classify data from 2012).

5.
Zhang Hongpo, Cheng Ning, Zhang Yang, Li Zhanbo 《Applied Intelligence》2021,51(7):4503-4514

A label flipping attack is a poisoning attack that flips the labels of training samples to reduce the classification performance of the model. Robustness measures how well machine learning algorithms withstand adversarial attacks. The Naive Bayes (NB) algorithm is a noise-tolerant and robust machine learning technique; it shows good robustness on tasks such as document classification and spam filtering. Here we propose two novel label flipping attacks to evaluate the robustness of NB under label noise. For three datasets in the spam classification domain (Spambase, TREC 2006c, and TREC 2007), our attack goal is to increase the false negative rate of NB under the influence of label noise without affecting the classification of normal mail. Our evaluation shows that at a noise level of 20%, the false negative rate on Spambase and TREC 2006c increased by about 20%, and the test error on the TREC 2007 dataset increased to nearly 30%. We compared the classification accuracy of five classic machine learning algorithms (random forest (RF), support vector machine (SVM), decision tree (DT), logistic regression (LR), and NB) and two deep learning models (AlexNet, LeNet) under the proposed label flipping attacks. The experimental results show that the two label-noise attacks apply to a variety of classification models and effectively reduce their accuracy.
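The general idea of a label flipping attack can be sketched in a few lines. The example below is a minimal illustration only, not the two attacks proposed in the paper (the abstract does not describe them in detail): it flips a random subset of spam labels to normal mail, which is one plausible way to push a classifier toward false negatives. The function name `flip_labels`, the label encoding (1 = spam, 0 = normal mail), and the definition of the noise level as a fraction of all training labels are assumptions made for this sketch.

```python
import random


def flip_labels(labels, noise_level, target_label=1, flipped_label=0, seed=0):
    """Return a copy of `labels` with some `target_label` entries flipped.

    Assumption for this sketch: `noise_level` is the fraction of ALL
    training labels to flip, and only `target_label` (e.g. spam) entries
    are eligible, so the poisoned data under-represents that class.
    """
    rng = random.Random(seed)
    # Indices of samples that are eligible to be flipped.
    candidates = [i for i, y in enumerate(labels) if y == target_label]
    # Number of flips requested, capped by the number of eligible samples.
    n_flip = min(int(round(noise_level * len(labels))), len(candidates))
    chosen = set(rng.sample(candidates, n_flip))
    return [flipped_label if i in chosen else y for i, y in enumerate(labels)]


if __name__ == "__main__":
    clean = [1] * 5 + [0] * 5          # 5 spam, 5 normal mails
    poisoned = flip_labels(clean, 0.2)  # flip 20% of the labels
    print(clean)
    print(poisoned)
```

Training the same classifier on `clean` and on `poisoned` labels and comparing the false negative rates on an untouched test set is the usual way to measure robustness under this kind of noise.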


6.
Remote sensing scientists are increasingly adopting machine learning classifiers for land cover and land use (LCLU) mapping, but model selection, a critical step of machine learning classification, has usually been ignored in past research. In this paper, step-by-step guidance (for classifier training, model selection, and map production) with supervised learning model selection is first provided. Then, model selection is exhaustively applied to different machine learning classifiers (e.g. Artificial Neural Network (ANN), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF)) to identify the optimal polynomial degree of input features (d) and hyperparameters with Landsat imagery of study regions in China and Ghana. We evaluated the map accuracy and computing time associated with different versions of machine learning classification software (i.e. ArcMap, ENVI, TerrSet, and R).

The optimal classifiers and their associated polynomial degree of input features and hyperparameters vary for the two image datasets that were tested. The optimum combination of d and hyperparameters for each type of classifier was used across software packages, but some classifiers (i.e. ENVI and TerrSet ANN) were customized due to the constraints of the software packages. The LCLU map derived from ENVI SVM has the highest overall accuracy (72.6%) for the Ghana dataset, while the LCLU map derived from R DT has the highest overall accuracy (48.0%) for the China (FNNR) dataset. All LCLU maps for the Ghana dataset are more accurate than those for the China dataset, likely due to more limited and uncertain training data for the China (FNNR) dataset. For the Ghana dataset, LCLU maps derived from tree-based classifier routines (ArcMap RF, TerrSet DT, and R RF) are accurate, but these maps have artefacts resulting from model overfitting.


7.
Landsat images, which have fine spatial resolution, are an important data source for land-cover mapping. Multi-temporal Landsat classification has become popular because of the abundance of free-access Landsat images that are available. However, cloud cover is inevitable due to the relatively low temporal frequency of the data. In this paper, a novel approach for multi-temporal Landsat land-cover classification is proposed. The land cover for each Landsat acquisition date was first classified using a Support Vector Machine (SVM) and then the classification results were combined using different strategies, with missing observations allowed. Three strategies, including the majority vote (MultiSVM-MV), Expectation Maximisation (MultiSVM-EM) and joint SVM probability (JSVM), were used to merge the multi-temporal classification maps. The three algorithms were then applied to a region of the path/row 143/31 scene using 2010 Landsat-5 Thematic Mapper (TM) images. The results demonstrated that, for these three algorithms, the average overall accuracy (OA) improved with the increase in temporal depth; also, for a given temporal depth, the performance of JSVM was clearly better than that of MultiSVM-MV and MultiSVM-EM, and the performance of MultiSVM-EM was slightly better than that of MultiSVM-MV. The OA values for the three classification results, which use all epochs, were 70.28%, 72.40% and 74.80% for MultiSVM-MV, MultiSVM-EM and JSVM, respectively. In comparison, two other annual composite image-based classification methods, annual maximum Normalised Difference Vegetation Index (NDVI) composite image-based classification and annual best-available-pixel (BAP) composite image-based classification, gave OA values of 68.08% and 69.92%, respectively, meaning that our method produced a better performance. 
Therefore, the novel multi-temporal Landsat classification method presented in this paper can deal with the cloud-contamination problem and produce accurate annual land-cover mapping using multi-temporal cloud-contaminated images, which is of importance for regional and global land-cover mapping.
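The simplest of the three merging strategies named above, the majority vote with missing observations allowed, can be illustrated as follows. This is a hedged sketch rather than the authors' MultiSVM-MV implementation: each per-date classification map is assumed to be a flat list of per-pixel labels, cloud-covered pixels are assumed to be marked with `None`, and ties are broken in favour of the label seen first. The names `majority_vote` and `merge_maps` are hypothetical.

```python
from collections import Counter


def majority_vote(labels_per_date, missing=None):
    """Most frequent label among the dates on which the pixel was observed.

    Returns `missing` when the pixel was never observed. Ties go to the
    label that appears first, because Counter.most_common keeps first-seen
    order among equal counts (an arbitrary choice for this sketch).
    """
    votes = Counter(label for label in labels_per_date if label != missing)
    if not votes:
        return missing
    return votes.most_common(1)[0][0]


def merge_maps(maps, missing=None):
    """Merge per-date classification maps pixel by pixel."""
    # zip(*maps) groups the labels of the same pixel across all dates.
    return [majority_vote(pixel_labels, missing) for pixel_labels in zip(*maps)]


if __name__ == "__main__":
    maps = [
        ["forest", "water", None],   # date 1 (third pixel cloudy)
        ["forest", None, None],      # date 2
        ["crop", "water", None],     # date 3
    ]
    print(merge_maps(maps))
```

With more acquisition dates, fewer pixels stay unobserved and single-date errors are outvoted, which is consistent with the reported improvement in overall accuracy as temporal depth increases.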

8.
Resourcesat-2, with its improved features and capabilities, is a highly suitable satellite for crop classification studies. Data from one of its sensors, the Linear Imaging Self-Scanning sensor (LISS IV), which has a spatial resolution of 5.8 m, were used to compare the relative accuracies achieved by support vector machine (SVM), artificial neural network (ANN), and spectral angle mapper (SAM) algorithms for the classification of various crop and non-crop covers in part of Varanasi district, Uttar Pradesh, India. A separability analysis between categories was performed using the transformed divergence (TD) method to assess the quality of the training samples. The outcome of the present study indicates better performance of the SVM and ANN algorithms in comparison to SAM for classification using LISS IV sensor data. Based on error matrix analysis, the overall accuracies obtained by SVM and ANN were 93.45% and 92.32%, respectively, whereas the SAM algorithm achieved a lower accuracy of 74.99%. Results derived from the SVM, ANN, and SAM classification algorithms were validated with ground truth information acquired during a field visit on the same day as the satellite data acquisition.
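Overall accuracy, the figure quoted in this and several other abstracts in the list, is derived from the error (confusion) matrix as the sum of the diagonal divided by the total number of test samples. The sketch below shows that arithmetic; the matrix values are invented for illustration and are not taken from any of the studies.

```python
def overall_accuracy(error_matrix):
    """Overall accuracy = correctly classified samples / all samples.

    `error_matrix[i][j]` is the number of test samples of reference
    class i that were assigned to class j (square matrix assumed).
    """
    total = sum(sum(row) for row in error_matrix)
    if total == 0:
        raise ValueError("error matrix contains no samples")
    # Correct classifications sit on the main diagonal.
    correct = sum(error_matrix[i][i] for i in range(len(error_matrix)))
    return correct / total


if __name__ == "__main__":
    # Hypothetical two-class example: rows = reference, columns = predicted.
    matrix = [
        [50, 5],
        [10, 35],
    ]
    print(f"{overall_accuracy(matrix):.2%}")
```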

9.
Hyperspectral and multispectral imagery enables remote-sensing applications such as land-cover mapping, which is an important baseline for understanding and monitoring the Earth and is also relevant to socio-economic activities. For that reason, high land-classification accuracy is imperative, and short image processing time is essential. In addition, gathering documented samples for each class is complicated, which means that the classification system must perform well with a limited number of training observations. Furthermore, there are hardly any methods that can be used interchangeably for hyperspectral and multispectral images. This paper proposes a novel classification system that can be used for both types of images. The system is composed of a novel parallel feature extraction algorithm, which uses a cluster of two graphics processing units in combination with a multicore central processing unit (CPU), and an artificial neural network (ANN) devised specifically to classify the features produced by that feature extraction method. To assess the performance of the proposed classification system, it is compared with non-parallel and CPU-only parallel implementations on multispectral and hyperspectral databases. Moreover, experiments with different numbers of training samples are performed. Finally, the proposed ANN is compared with a state-of-the-art support vector machine in terms of classification results and processing time.

10.
罗世奇  田生伟  禹龙  于炯  孙华 《计算机应用》2018,38(4):1058-1063
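To further improve the accuracy and degree of automation of malicious code recognition, a deep-learning-based method for analyzing and detecting Android malicious code is proposed. First, a malicious-code texture fingerprint is proposed to capture the content similarity of blocks in malicious-code binary files, and a vector space of 33 categories of malicious-code activities is selected to reflect the potential dynamic behavior of malicious code. Second, to improve classification accuracy, these features are fused to train an AutoEncoder (AE) and a Softmax classifier. In tests on different data samples, the Stacked AutoEncoder (SAE) model achieves an average classification accuracy of 94.9% on Android malicious code, 1.1 percentage points higher than a Support Vector Machine (SVM). The experimental results show that the proposed method effectively improves the accuracy of malicious code recognition.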

11.
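The support vector machine (SVM) is one of the most important machine learning methods and has successfully solved many practical classification problems. Focusing on how to improve the classification accuracy and training efficiency of SVMs, and following the classification process, this paper reviews the feature selection methods and learning strategies applied before SVM training. On this basis, it compares the classification accuracy of the feature selection methods SFS, IWSS, IWSSr, and BARS, and analyzes two performance measures on the test set, classification accuracy and the precision/recall breakeven point, for classifiers obtained by combining active learning strategies with SVMs. The experimental results show that feature selection methods combining wrapper and filter approaches effectively improve SVM classification accuracy and reduce the number of training samples. When labeled data are scarce, active learning achieves better classification accuracy, and to reach the same accuracy passive learning needs six times as many samples as active learning.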

12.
FROM-GLC (Fine Resolution Observation and Monitoring of Global Land Cover) is the first 30 m resolution global land-cover map produced using Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) data. Because no temporal features were used as inputs in producing FROM-GLC, considerable confusion exists among land-cover types (e.g. agriculture lands, grasslands, shrublands, and bareland). The Moderate Resolution Imaging Spectroradiometer (MODIS) provides high-temporal-frequency information on surface cover. Other auxiliary data, including bioclimatic data, a digital elevation model (DEM), and world maps of soil-water conditions, are possible sources for improving the accuracy of FROM-GLC. In this article, a segmentation-based approach was applied to Landsat imagery to down-scale coarser-resolution MODIS data (250 m) and other 1 km resolution auxiliary data to the segment scale based on TM data. Two classifiers (support vector machine (SVM) and random forest (RF)) and two different strategies for the use of training samples (global and regional samples based on a spatial-temporal selection criterion) were tested. Results show that RF based on the global use of training samples achieves an overall classification accuracy of 67.08% when assessed by test samples collected independently. This is better than the 64.89% achieved by FROM-GLC based on the same set of test samples. Accuracies for vegetation cover types are most substantially improved.

13.
As one of the most important algorithms in deep learning, the convolutional neural network (CNN) has been successfully applied in many fields. CNNs can recognize objects in an image by considering morphology and structure rather than simply individual pixels. One advantage of CNNs is that they exhibit translational invariance; when an image contains a certain degree of distortion or shift, a CNN can still recognize the object in the image. However, this advantage becomes a disadvantage when CNNs are applied to pixel-based classification of remote-sensing images, because their translational invariance causes distortions in land-cover boundaries and outlines in the classification result image. This problem severely limits the application of CNNs in remote-sensing classification. To solve this problem, we propose a central-point-enhanced convolutional neural network (CE-CNN) to classify high-resolution remote-sensing images. By introducing the central-point-enhanced layer when classifying a sample, the CE-CNN increases the weight of the central point in feature maps while preserving the original textures and characteristics. In our experiment, we selected four representative positions on a high-resolution remote-sensing image to test the classification ability of the proposed method and compared the CE-CNN with the traditional multi-layer perceptron (MLP) and a traditional CNN. The results show that the proposed method not only achieves a higher classification accuracy but also produces less distortion and fewer incorrect results at the boundaries of land covers. We further compared the CE-CNN with six state-of-the-art methods: k-NN, maximum likelihood, classification and regression tree (CART), MLP, support vector machine, and CNN. The results show that the CE-CNN's classification accuracy is better than that of the other methods.

14.
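To address the generally low classification accuracy of motor imagery electroencephalogram (MI-EEG) signals, a convolutional neural network (CNN) model based on a deep framework is introduced. First, the short-time Fourier transform (STFT) and the continuous wavelet transform (CWT) are used to obtain time-frequency information at two different resolutions. This information is then combined with the electrode channel positions and fed to the CNN as a three-dimensional tensor. Next, two network models based on different convolution strategies, MixedCNN and StepByStepCNN, are designed to perform feature extraction and classification on the two forms of input. Finally, the mixup data augmentation strategy is introduced to counter the overfitting that easily occurs when the training set is too small. Experimental results on BCI Competition II dataset III show that the model trained by feeding the CWT-derived samples, augmented with mixup, into MixedCNN achieves the highest recognition accuracy (93.57%). This is 19.1%, 20.2%, 11.7%, and 2.3% higher, respectively, than four other methods: common spatial pattern (CSP) + support vector machine (SVM), adaptive autoregressive model (AAR) + linear discriminant analysis (LDA), discrete wavelet transform (DWT) + long short-term memory network (LSTM), and STFT + stacked autoencoder (SAE). The proposed method can serve as a reference for MI-EEG classification tasks.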
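The mixup strategy named in this abstract builds synthetic training samples as convex combinations of two real samples and their one-hot labels, with the mixing weight drawn from a Beta(α, α) distribution. The sketch below shows that general recipe on flat feature vectors; it is not the paper's pipeline (which feeds three-dimensional time-frequency tensors to MixedCNN), and the function names, `alpha=0.2`, and the per-sample random pairing are assumptions made for illustration.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Blend two samples: lam * (x_i, y_i) + (1 - lam) * (x_j, y_j).

    Features and one-hot labels are plain lists of floats here.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def mixup_batch(xs, ys, alpha=0.2, seed=0):
    """Mix every sample with a randomly chosen partner from the same batch."""
    rng = random.Random(seed)
    partners = list(range(len(xs)))
    rng.shuffle(partners)  # partner index for each sample
    mixed_x, mixed_y = [], []
    for i, j in enumerate(partners):
        lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
        x, y = mixup_pair(xs[i], ys[i], xs[j], ys[j], lam)
        mixed_x.append(x)
        mixed_y.append(y)
    return mixed_x, mixed_y


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    ys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # one-hot labels, two classes
    for x, y in zip(*mixup_batch(xs, ys)):
        print(x, y)
```

Because the labels are mixed with the same weight as the inputs, the network is trained on soft targets, which acts as a regularizer when the training set is small.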

15.
Land use classification is an important part of many remote sensing applications. A lot of research has gone into the application of statistical and neural network classifiers to remote-sensing images. This research involves the study and implementation of a new pattern recognition technique introduced within the framework of statistical learning theory, Support Vector Machines (SVMs), and its application to remote-sensing image classification. Standard classifiers such as the Artificial Neural Network (ANN) need a number of training samples that increases exponentially with the dimension of the input feature space. With a limited number of training samples, the classification rate thus decreases as the dimensionality increases. SVMs are independent of the dimensionality of the feature space, as the main idea behind this classification technique is to separate the classes with a surface that maximizes the margin between them, using boundary pixels to create the decision surface. Results from SVMs are compared with traditional Maximum Likelihood Classification (MLC) and an ANN classifier. The findings suggest that the ANN and SVM classifiers perform better than the traditional MLC. The SVM and the ANN show comparable results. However, accuracy is dependent on factors such as the number of hidden nodes (in the case of ANN) and kernel parameters (in the case of SVM). The training time taken by the SVM is several orders of magnitude less.

16.
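When constructing sample pairs, traditional support vector machine (SVM) similarity learning algorithms consider all original training samples, so the size of the pair space grows quadratically with that of the original sample space, and the excessive number of training pairs slows training. To address this, an improved SVM similarity learning algorithm is proposed and applied to face recognition. A binary sample-pair method is introduced to construct pairs, the K-nearest neighbor algorithm is used to reduce the number of dissimilar pairs generated and thereby speed up SVM training, and a random dimensionality reduction method is used to reduce the dimensionality of the face data. Experimental results show that the proposed algorithm achieves a higher recognition rate than algorithms based on difference-space pairs and absolute-difference pairs.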

17.
曹鸿亮  张莹  武斌  李繁菀  那绪博 《计算机应用》2021,41(12):3608-3613
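Many machine learning algorithms handle prediction and classification problems well, but when applied to medical datasets with few samples and large feature spaces they suffer from low prediction accuracy and F1 scores. To improve the accuracy and F1 score of liver transplantation complication prediction, a classification method based on Transfer Component Analysis (TCA) and Support Vector Machine (SVM) is proposed. The method uses TCA to map and reduce the dimensionality of the feature space, mapping the source domain and the target domain into the same reproducing kernel Hilbert space and thereby achieving marginal distribution adaptation. After the transfer, an SVM is trained on the source domain and then used to predict complications on the target domain. In the liver transplantation complication prediction experiments, predictions were made for complications I, II, IIIa, IIIb, and IV. Compared with traditional machine learning and progressive-alignment Heterogeneous Domain Adaptation (HDA), the proposed method improved accuracy by 7.8% to 42.8% and reached F1 scores of 85.0% to 99.0%, whereas traditional machine learning and HDA showed high precision but low recall because of the imbalance between positive and negative samples. The experimental results show that combining TCA with SVM effectively improves the accuracy and F1 score of liver transplantation complication prediction.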

18.
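Automatic Entity Relation Extraction   Total citations: 36, self-citations: 7, citations by others: 36
Entity relation extraction is an important research topic in the field of information extraction. This paper uses two feature-vector-based machine learning algorithms, Winnow and the support vector machine (SVM), to run entity relation extraction experiments on the training data of the 2004 ACE (Automatic Content Extraction) evaluation. Appropriate feature selection is applied for both algorithms; the best extraction results are obtained when the two words to the left and right of each entity are selected as features, with weighted average F-scores of 73.08% for Winnow and 73.27% for SVM. This shows that when different learning algorithms use the same feature set to identify entity relations, the final performance differs little. Therefore, when extracting entity relations automatically, effort should be concentrated on finding good features.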

19.
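To address the problem that the prediction accuracy of support vector machine (SVM) regression is not proportional to the training sample size, an improved classification-then-regression prediction algorithm for large-sample data is proposed that combines SVM classification and SVM regression. An algorithm for optimizing the training sample size is designed; the sample data are manually divided into classes based on prior knowledge, a classification model is trained, an SVM regression model is obtained for the samples of each class, and the data are then predicted. Experiments on Shanghai Composite Index data show that the SVM classify-then-regress algorithm yields a mean squared error of 12.4, lower than the 47.8 obtained with an artificial neural network and far lower than the 436.9 obtained with direct SVM regression, which verifies the effectiveness and feasibility of the method.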

20.
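The support vector machine (SVM) and the artificial neural network (ANN) are two branches of artificial intelligence methods, and their principles are introduced in detail. A network security assessment index system is established, and both SVM and ANN are applied to network security risk assessment. Their assessment results are compared on a worked example. The results show that with small samples the classification accuracy of the SVM is generally higher than that of the ANN, that the SVM has good classification and generalization ability, and that it also has a clear advantage in training time. Practice confirms the effectiveness and superiority of the SVM for network security risk assessment.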
