Similar Documents
Found 20 similar documents (search time: 187 ms)
1.
Filtering methods that combine machine learning and text classification have become a research hotspot. When applied to e-mail filtering, however, these methods use overly large training sets and high-dimensional feature vectors, leading to the "curse of dimensionality" and heavy computation. Combining the evidence-theory-based K-nearest neighbors method (EKNN) with the transductive confidence machines (TCM) framework, this paper proposes a TCM-EKNN e-mail filtering method, and uses active-learning sample selection to pick a small number of high-quality training samples for building the classifier, thereby filtering spam efficiently. Comparative experiments show that TCM-EKNN achieves good filtering performance relative to traditional filtering methods, demonstrating its effectiveness. Moreover, while matching the high accuracy of traditional methods, TCM-EKNN with active learning greatly reduces the number of training samples and improves filter performance, outperforming the baselines on all evaluation metrics.
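The active-learning sample selection the abstract describes can be illustrated with a toy nearest-neighbour spam filter: query the pool message whose labeled neighbourhood is most evenly split. This is a generic uncertainty-sampling sketch, not the paper's TCM-EKNN; the 1-D "spam score" feature, data, and k value are invented.

```python
def knn_vote(labeled, x, k=2):
    """Fraction of 'spam' (label 1) among the k nearest labeled points (1-D toy features)."""
    nearest = sorted(labeled, key=lambda p: abs(p[0] - x))[:k]
    return sum(lab for _, lab in nearest) / k

def most_uncertain(labeled, pool, k=2):
    """Active-learning step: query the pool point whose neighbourhood vote is closest to 0.5."""
    return min(pool, key=lambda x: abs(knn_vote(labeled, x, k) - 0.5))

# Toy data: ham clustered near score 0, spam near score 1.
labeled = [(0.05, 0), (0.10, 0), (0.90, 1), (0.95, 1)]
pool = [0.02, 0.48, 0.97]
print(most_uncertain(labeled, pool))  # 0.48: the boundary point is queried first
```

Labeling only such boundary points is what lets the classifier reach high accuracy with far fewer training samples.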

2.
In recent years, MAX-phase crystals have become a global research hotspot owing to their unique nano-laminated crystal structure, which confers self-lubrication, high toughness, and electrical conductivity. Among them, M2AX-phase crystals combine the properties of ceramics and metallic compounds, offering thermal-shock resistance, high toughness, and electrical and thermal conductivity; however, single-phase samples of these materials are difficult to prepare experimentally, which has limited their development. Active learning is a machine learning approach that achieves good predictive performance with only a few labeled samples. This paper combines the efficient global optimization (EGO) algorithm with residual-based active-learning regression to propose an improved selection strategy, RS-EGO. On a dataset of 169 M2AX-phase crystals, the bulk, Young's, and shear moduli are modeled, predicted, and optimized, exploring material properties by simulation so as to reduce fruitless validation experiments. Results show that RS-EGO finds optimal values quickly while retaining good predictive ability; its overall performance exceeds both original selection strategies, making it well suited to materials-property prediction with small sample sizes, and the choice of combination parameter affects the optimization direction of the improved algorithm. Applying the improved algorithm to two public datasets confirms its effectiveness; a choice of combination parameter is given, and model experiments under different combination parameters further probe how the parameters affect the optimization direction.
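At the core of EGO-style selection is the expected-improvement (EI) acquisition, which trades off a surrogate model's predicted mean against its uncertainty. A minimal stdlib sketch (not the paper's RS-EGO residual strategy; the candidate names, means, and uncertainties are invented):

```python
import math

def expected_improvement(mu, sigma, best):
    """EI for maximisation: expected amount by which a candidate beats the incumbent."""
    if sigma <= 0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)   # standard normal pdf
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))          # standard normal cdf
    return (mu - best) * cdf + sigma * pdf

# Surrogate predictions (mu) and uncertainties (sigma) for three hypothetical candidates.
candidates = {"A": (1.0, 0.1), "B": (0.8, 0.8), "C": (1.1, 0.0)}
best_seen = 1.05
scores = {name: expected_improvement(m, s, best_seen) for name, (m, s) in candidates.items()}
print(max(scores, key=scores.get))  # "B": high uncertainty can win despite a lower mean
```

This is why EGO can locate optima quickly: it deliberately samples where the model is unsure, not only where it predicts well.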

3.
王弘业  钱权  武星 《工程科学学报》2023,45(7):1225-1231
Materials data are prepared in batches and in stages, and the distributions of different batches differ. When a neural network learns materials data batch by batch, its average accuracy drops as batches accumulate, posing a major challenge for applying artificial intelligence to materials science. To address this, incremental learning is applied to materials data. By analyzing changes in model parameters, a parameter-penalty mechanism is established to limit overfitting to new data; by enhancing the diversity of the sample space, an experience replay method is introduced into incremental learning, jointly training on new data together with old data sampled from a buffer pool. The method is applied to a materials sound-absorption-coefficient regression task and an image classification task; experiments show average accuracy improves by 45.93% and 2.62%, respectively, while average forgetting drops by 2.25% and 7.54%. The effect of the penalty and replay parameters on average accuracy is also analyzed: average accuracy increases with the replay ratio, and first increases then decreases with the penalty coefficient. In summary, the proposed method learns across modalities and tasks, with flexible parameter settings that can be tuned to different environments and tasks, offering a feasible scheme for incremental learning on materials data.
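The experience-replay idea (mix each new batch with old samples drawn from a buffer pool) can be sketched in a few lines. This is a generic sketch, not the paper's implementation; the capacity, replay ratio, and batch contents are invented.

```python
import random

class ReplayBuffer:
    """Fixed-size cache of past samples; new batches are mixed with replayed old ones."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []

    def add(self, samples):
        self.data.extend(samples)
        self.data = self.data[-self.capacity:]   # keep only the most recent items

    def sample(self, n):
        return random.sample(self.data, min(n, len(self.data)))

def joint_batch(buffer, new_batch, replay_ratio=0.5):
    """Joint training batch: new data plus old data drawn from the buffer pool."""
    n_old = int(len(new_batch) * replay_ratio)
    return new_batch + buffer.sample(n_old)

random.seed(0)
buf = ReplayBuffer(capacity=100)
buf.add([("batch1", i) for i in range(10)])                    # older preparation batch
batch = joint_batch(buf, [("batch2", i) for i in range(10)])   # new batch arrives
print(len(batch))  # 15: 10 new samples + 5 replayed old ones
```

The abstract's finding that accuracy rises with the replay ratio corresponds to raising `replay_ratio` here, at the cost of larger joint batches.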

4.
Identifying rock and ore thin sections is a highly specialized task; manual identification inevitably introduces subjective errors and is very inefficient. Deep-learning image recognition can identify thin sections efficiently, but training deep models requires large amounts of labeled data, so making efficient use of limited labels matters. Using a multi-label classification approach, a classifier is first trained on the labeled set, then used to generate pseudo-labels for a large number of unlabeled thin-section images, and finally the model is retrained on the labeled data together with all the unlabeled data. Results show that multi-label classification is feasible for identifying thin-section textures and minerals, and that training with this semi-supervised method improves the model's generalization without extensive manual labeling.
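The train-then-pseudo-label-then-retrain loop can be shown with a deliberately trivial 1-D "classifier" (a threshold between class means) standing in for the deep model; the data and threshold rule are invented for illustration.

```python
def train_threshold(labeled):
    """'Train' a trivial 1-D classifier: the midpoint between the two class means."""
    m0 = sum(x for x, y in labeled if y == 0) / sum(1 for _, y in labeled if y == 0)
    m1 = sum(x for x, y in labeled if y == 1) / sum(1 for _, y in labeled if y == 1)
    return (m0 + m1) / 2

def pseudo_label(unlabeled, threshold):
    """Generate pseudo-labels for the unlabeled pool with the current classifier."""
    return [(x, int(x > threshold)) for x in unlabeled]

labeled = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
unlabeled = [0.15, 0.3, 0.7, 0.95]
t = train_threshold(labeled)                         # midpoint ≈ 0.5 for this toy set
retrain_set = labeled + pseudo_label(unlabeled, t)   # retrain on both, as in the abstract
print(len(retrain_set), round(t, 2))
```

In the paper's setting the "classifier" is a multi-label deep network and the pool is unlabeled thin-section images, but the control flow is the same.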

5.
武森  刘露  卢丹 《工程科学学报》2017,39(8):1244-1253
Most traditional classification algorithms assume a balanced dataset and pursue overall accuracy. Real datasets are often imbalanced, so traditional classifiers tend to misclassify minority-class samples at a high rate. Existing improvements for imbalanced data fall into two categories: data-level methods, which oversample the minority class or undersample the majority class; and algorithm-level methods. Building on clustering-based undersampling and ensemble learning, this paper combines the two to classify imbalanced data: clustering-based undersampling first forms a balanced dataset in the data-processing stage, then the AdaBoost ensemble algorithm is trained on the new dataset, with weights introduced during ensembling to distinguish the contributions of minority- and majority-class samples to the ensemble error rate. This makes the algorithm focus more on the minority class and improves its classification accuracy.
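Clustering-based undersampling keeps one representative majority sample per cluster so the classes end up the same size. A 1-D stdlib sketch (naive k-means with quantile-spaced initial centers, assuming k ≥ 2; data invented, not the paper's datasets):

```python
def kmeans_1d(xs, k, iters=20):
    """Naive 1-D k-means; centers start at evenly spaced sorted positions."""
    centers = [xs[i * (len(xs) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            clusters[min(range(k), key=lambda i: abs(x - centers[i]))].append(x)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

def cluster_undersample(majority, k):
    """Keep one representative (the sample nearest each centroid) per cluster."""
    centers = kmeans_1d(majority, k)
    return sorted(min(majority, key=lambda x: abs(x - c)) for c in centers)

majority = [1.0, 1.1, 1.2, 5.0, 5.1, 5.3, 9.0, 9.2, 9.3]   # 9 majority samples
minority = [4.0, 6.0, 7.0]                                   # 3 minority samples
balanced_majority = cluster_undersample(majority, k=len(minority))
print(balanced_majority)  # [1.1, 5.1, 9.2]: one sample per region, classes now balanced
```

The balanced set would then feed an AdaBoost-style ensemble, as the abstract describes.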

6.
Sinter tumbler strength is one of the key indicators of sinter quality in the sintering process; predicting it accurately improves the precision and efficiency of process control while reducing production cost and resource waste. In practice, however, prediction is hampered by limited data volume and poor data quality. To improve accuracy, a generative adversarial network (GAN) is first used to augment the original dataset, addressing the limited data; a prediction model is then built with an echo state network (ESN) optimized by the sparrow search algorithm (SSA). Compared with traditional neural networks, the ESN is more stable, generalizes better, and trains and adapts to new data quickly. Experiments verify the model's accuracy and efficiency and compare it with other prediction algorithms. Results show that the augmented dataset and the ESN model markedly improve prediction accuracy, reducing the mean absolute percentage error from 1.41% to 1.06%.

7.
Sleep staging is the necessary basis for evaluating sleep quality. Most current work relies on fully supervised learning and single-view information, which not only requires technicians to label large amounts of sleep data but may also limit staging accuracy through insufficient feature extraction. A semi-supervised strategy is used to learn from unlabeled EEG data. A multi-view hybrid neural network is proposed: a multi-channel time-frequency mechanism first extracts temporal and spatial signal features separately, achieving multi-view feature extraction; an attention mechanism then strengthens the salient features; finally, the hybrid features are fused and classified. Evaluated against fully supervised learning on three public datasets and one private dataset, the semi-supervised approach achieves an average accuracy of 81.0% and a kappa of 73.2%. The results show the model is comparable to fully supervised sleep-staging models while greatly reducing technicians' labeling workload.

8.
To improve classification accuracy on imbalanced datasets, a resampling algorithm based on neighborhood relations in the sample space is proposed. Safety levels of minority-class samples are first assessed from their spatial neighbors, and the synthetic minority oversampling technique (SMOTE) is applied in a guided manner according to those levels; local density is then computed for majority-class samples from their neighborhood relations, and dense majority regions are undersampled. Together these two steps balance the dataset while controlling its size to prevent overfitting, equalizing the treatment of the two classes. Training and test sets are produced by ten-fold cross-validation; after resampling the training set, a kernel extreme learning machine is trained as the classifier and validated on the test set. Experiments on UCI imbalanced datasets and measured circuit-fault-diagnosis data show the proposed method outperforms other resampling algorithms overall.
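SMOTE's core step is interpolating between a minority sample and one of its minority-class nearest neighbours. A 1-D stdlib sketch (without the paper's safety-level guidance or density-based undersampling; data and parameters invented):

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate synthetic minority samples by interpolating towards a random
    one of the k nearest minority neighbours (1-D toy version of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p != x), key=lambda p: abs(p - x))[:k]
        nb = rng.choice(neighbours)
        synthetic.append(x + rng.random() * (nb - x))   # a point on the segment x -> nb
    return synthetic

minority = [2.0, 2.4, 2.5, 3.0]
new_points = smote(minority, n_new=4)
print(all(min(minority) <= p <= max(minority) for p in new_points))  # True
```

Because every synthetic point lies between two real minority samples, the method densifies the minority region rather than inventing points outside it.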

9.
With the rapid arrival of the Internet of Everything, massive data are generated at the edge, and traditional cloud-based distributed training faces heavy network load, high energy consumption, and privacy concerns. Edge intelligence has emerged in this context, and collaborative training at the edge, which assists or carries out distributed training of machine learning models, has become a major focus of edge intelligence research. However, edge intelligence must coordinate large numbers of edge nodes to train machine learning models, and edge scenarios pose many challenges. Drawing on a thorough survey of existing research on edge collaborative training, this paper summarizes the key techniques from two aspects, overall architecture and core modules, and discusses the challenges and solutions for training under device heterogeneity, constrained device resources, and unstable network environments, with attention to the interaction frameworks between edge devices and to the problem of updating neural network model parameters when many edge devices train collaboratively. Finally, the open challenges of edge collaborative training and its future prospects are analyzed and summarized.

10.
To address the low efficiency and poor real-time performance of traditional conveyor-idler anomaly detection, an idler anomaly detection model based on infrared image recognition is proposed. The infrared idler image dataset is expanded through field collection together with label smoothing and Mosaic data augmentation, reducing the model's training cost. In the feature-extraction module, a GhostNet backbone network effectively reduces the cost of feature extraction; in the feature-fusion module, an SPP-Net block is used to optimize the PaNet fusion network, enlarging the model's receptive field. Depthwise separable convolution blocks simplify the model structure, cutting its computation and parameter counts, and the LeakyReLU activation function improves its learning capacity. Experiments show the model effectively identifies idler anomalies: in field detection it reaches 94.9% average precision at 39.2 FPS, supporting accurate and efficient inspection of mine conveyor-belt idlers.
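The parameter saving from the depthwise separable convolutions mentioned above is easy to quantify: a standard k×k convolution costs k·k·C_in·C_out weights, while the depthwise-plus-pointwise factorization costs k·k·C_in + C_in·C_out. A small sketch with invented channel sizes (not the paper's actual layer dimensions):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise (k x k per input channel) plus pointwise (1 x 1) convolution."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 64, 128)                 # 73728
sep = depthwise_separable_params(3, 64, 128)  # 576 + 8192 = 8768
print(std, sep, round(std / sep, 1))          # roughly 8x fewer parameters
```

The ratio approaches C_out (and k²) for large channel counts, which is why the factorization helps real-time models hit high frame rates.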

11.
《钢铁冶炼》2013,40(5):418-426
Abstract

Galvanised coated steel is today an essential product in several key manufacturing sectors because of its anticorrosive properties. The increase in demand has led managers to improve the different phases of their production chains. Among the efforts needed to accomplish this task, process modelling offers the most powerful results, despite its non-trivial development. In many fields, such as industrial modelling, multilayer feedforward neural networks are often proposed as universal function approximators. These supervised networks are commonly trained with the traditional back-propagation scheme, which minimises the mean squared error (MSE) of the training data. However, in the presence of corrupted or extremely deviated samples (outliers), this training scheme may produce incorrect models, and it is well known that industrial data sets frequently contain outliers. The process modelled is a steel coil annealing furnace in a galvanising line, which shares characteristics with most furnaces used in galvanising lines all over the world. This paper reports the effectiveness of robust learning algorithms, compared with the classical MSE-based learning algorithm, for modelling a real industrial process. From this model an adequate line velocity (the velocity set point) for a coil can be obtained, depending on the coil's characteristics and the furnace condition on receiving it (temperature set points). With this set-point generation model the operator can set strategies to manage the line, e.g. set the order in which coils are treated or preview the line's speed for transitory situations.
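The abstract does not name its robust algorithms, but the contrast with MSE can be illustrated with one standard robust loss, the Huber loss, which is quadratic near zero and linear in the tails so that a single outlier cannot dominate training. A stdlib sketch with invented residuals:

```python
def mse_loss(residual):
    return residual ** 2

def huber_loss(residual, delta=1.0):
    """Quadratic near zero, linear in the tails: outliers get bounded influence."""
    r = abs(residual)
    return r * r / 2 if r <= delta else delta * (r - delta / 2)

residuals = [0.1, -0.2, 0.3, 8.0]                     # last one is an outlier
print(round(sum(mse_loss(r) for r in residuals), 2))   # 64.14: dominated by the outlier
print(round(sum(huber_loss(r) for r in residuals), 2)) # 7.57: the outlier's pull is capped
```

Under MSE, the gradient contribution of a residual grows linearly with its size; under Huber it saturates at delta, which is the mechanism that keeps corrupted furnace measurements from distorting the model.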

12.
Outliers are an inevitable concern that needs to be identified and dealt with whenever one analyzes a large data set. Today's water quality data are often collected on different scales, encompass several sites, monitor several correlated parameters, involve a multitude of individuals from several agencies, and span several years. As such, the ability to identify outliers, which may affect the results of the analysis, is crucial. This note presents several statistical techniques that have been developed to deal with this problem, with particular emphasis on robust multivariate methods. These techniques are capable of isolating outliers while overcoming the effects of masking that can hinder the effectiveness of common outlier detection techniques such as Mahalanobis distances (MD). This note uses a comprehensive national metadata set on lake water quality as a case study to analyze the effectiveness of three robust outlier detection techniques, namely, the minimum covariance determinant (MCD), the minimum volume ellipsoid (MVE), and M-estimators. The note compares the results generated from these three techniques to assess the severity of each method when it comes to labeling observations as outliers. The results demonstrate the limitations of using MD to analyze multidimensional water quality data. The analysis also highlighted the differences between the three robust multivariate methods, whereby the MVE method was found to be the most severe when it came to outlier detection, while the MCD was the most lenient. Of the three robust multivariate outlier detection methods analyzed, the M-estimator proved to be the most flexible because it allowed for downweighting rather than censoring many borderline outlier observations.
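The classical baseline the note criticizes, the squared Mahalanobis distance, can be computed directly; robust methods (MCD, MVE, M-estimators) differ in estimating the mean and covariance from a clean subset or with downweighting rather than from all points at once. A 2-D stdlib sketch with invented data (with many outliers, the outliers' own inflation of the covariance is exactly the masking effect described):

```python
def mahalanobis_2d(point, data):
    """Squared Mahalanobis distance of `point` from the mean/covariance of `data`."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # sample covariance matrix entries
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    # (dx, dy) @ inverse(cov) @ (dx, dy)^T, with the 2x2 inverse written out
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

data = [(1.0, 1.1), (1.2, 1.0), (0.9, 0.9), (1.1, 1.2), (5.0, 5.0)]  # last point deviates
d_outlier = mahalanobis_2d((5.0, 5.0), data)
d_inlier = mahalanobis_2d((1.0, 1.0), data)
print(d_outlier > d_inlier)  # True
```

A robust variant would compute the same distance but with `mx`, `my`, and the covariance estimated from a subset chosen to exclude candidate outliers.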

13.
Several authors have considered the problem of detecting outliers from the general linear model Y = Xβ + μ. Ellenberg [1973], among others, has advocated a detection method that examines the set of internally standardized least squares residuals. Mickey [1974] and Snedecor and Cochran [1968], apparently concerned about the usefulness of an outlier detection method based on residual estimates that are themselves biased by the presence of the outlier, have proposed two other alternatives. It is shown that the three approaches are exactly equivalent. A detection procedure is described which uses as its test statistic the maximum of the internally standardized least squares residuals, and upper and lower bounds for the percentage points of the test statistic are given by Bonferroni inequalities. The computations required to obtain these approximate percentage points are illustrated in a numerical example. Finally, a brief simulation study of the procedure's performance illustrates that the power of the test can be influenced by the position of the outlier vis-à-vis the structure of the design matrix X.
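The test statistic described above, the maximum internally standardized (studentized) residual, can be computed for simple linear regression, where the leverage has the closed form h_i = 1/n + (x_i - x̄)²/S_xx. A stdlib sketch with invented data (the Bonferroni percentage points are not reproduced here):

```python
def studentized_residuals(xs, ys):
    """Internally studentized residuals of a simple least-squares line fit."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - 2)           # residual mean square
    lev = [1 / n + (x - xbar) ** 2 / sxx for x in xs]  # leverages h_ii
    return [e / (s2 * (1 - h)) ** 0.5 for e, h in zip(resid, lev)]

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 12.0, 8.1]   # observation at x=7 is corrupted
r = studentized_residuals(xs, ys)
worst = max(range(len(r)), key=lambda i: abs(r[i]))
print(worst)  # 6: the corrupted observation gives the largest |studentized residual|
```

In the actual procedure, max|r_i| would then be compared against Bonferroni-bounded percentage points to decide whether to declare an outlier.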

14.
Because there has been no simple and effective data-filtering method for the multi-variable, large-sample data used in BOF endpoint forecasting models, a method of outlier identification and judgment was introduced and applied to data screening to improve the BOF endpoint forecasting model. Outside values, as potential outliers, are calculated using the five-number summary, a robust estimate of the population parameters, and the potential outliers are then judged with a clustering method. By comparing the exceptional data from the clustering analysis with the outside values from the five-number summary, the intersection of the two groups is taken as the final outliers to be deleted; exceptional data that are not outside values are kept as exceptional data for further analysis; and outside values that are not exceptional data are likewise treated as final outliers and deleted. Finally, to verify the data selection, an improved BP neural network model is used to predict the endpoint carbon content and temperature. With this data pretreatment, the absolute values of the mean and maximum training residuals of endpoint carbon and temperature decreased by 26.7% and 41%, and by 17.3% and 34.5%, respectively; those of the prediction decreased by 10% and 44.9%, and by 9.4% and 22.9%, respectively. The proposed method thus effectively improves the neural network model for BOF endpoint forecasting.
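The "outside values" step is the classic five-number-summary screen: points beyond the Tukey fences Q1 − 1.5·IQR and Q3 + 1.5·IQR. A stdlib sketch with simple nearest-rank-style quartiles and invented endpoint-carbon data (the clustering cross-check is omitted):

```python
def five_number_summary(xs):
    """Min, Q1, median, Q3, max (simple split-half quartiles)."""
    s = sorted(xs)
    n = len(s)
    def med(seq):
        m = len(seq)
        return seq[m // 2] if m % 2 else (seq[m // 2 - 1] + seq[m // 2]) / 2
    return s[0], med(s[: n // 2]), med(s), med(s[(n + 1) // 2 :]), s[-1]

def outside_values(xs, k=1.5):
    """Points beyond the Tukey fences Q1 - k*IQR, Q3 + k*IQR: candidate outliers."""
    _, q1, _, q3, _ = five_number_summary(xs)
    iqr = q3 - q1
    return [x for x in xs if x < q1 - k * iqr or x > q3 + k * iqr]

carbon = [0.04, 0.05, 0.05, 0.06, 0.06, 0.07, 0.08, 0.30]  # last heat mis-recorded
print(outside_values(carbon))  # [0.3]
```

Because quartiles are robust to extreme values, the fences stay sensible even when the outlier itself is in the sample, which is the property the abstract relies on.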

15.
Modeling the activated sludge wastewater treatment plant plays an important role in improving its performance. However, the data available for model identification, calibration, and verification have many limitations, such as the presence of missing values and outliers. Because the available records are generally short, these gaps and outliers cannot simply be discarded but must be replaced by more reasonable estimates. The aim of this study is to use the Kohonen self-organizing map (KSOM), an unsupervised neural network, to predict the missing values and replace outliers in time-series data for an activated sludge wastewater treatment plant in Edinburgh, U.K. The method is simple, computationally efficient, and highly accurate. The results demonstrate that the KSOM is an excellent tool for replacing outliers and missing values in a high-dimensional data set. A comparison of the KSOM with multiple regression analysis and back-propagation artificial neural networks showed that the KSOM is superior in performance to either of the latter two approaches.
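A self-organizing map imputes a missing component by finding the best-matching unit (BMU) using only the known components, then reading the missing one off the unit's codebook vector. Below is a deliberately tiny 1-D-grid SOM on 2-D records, not the study's KSOM configuration; the plant variables, grid size, learning schedule, and data are all invented.

```python
import random

def train_som(data, n_units=6, epochs=200, seed=0):
    """Minimal 1-D Kohonen map on 2-D records: the winner and its grid
    neighbours move towards each presented sample."""
    rng = random.Random(seed)
    units = [list(rng.choice(data)) for _ in range(n_units)]
    for t in range(epochs):
        lr = 0.5 * (1 - t / epochs)          # decaying learning rate
        x = rng.choice(data)
        bmu = min(range(n_units),
                  key=lambda i: sum((u - v) ** 2 for u, v in zip(units[i], x)))
        for i in range(n_units):
            h = 1.0 if i == bmu else (0.5 if abs(i - bmu) == 1 else 0.0)  # neighbourhood
            units[i] = [u + lr * h * (v - u) for u, v in zip(units[i], x)]
    return units

def impute(units, known_x):
    """Fill a missing second component: find the BMU using the known
    component only, then read off the unit's stored value."""
    bmu = min(units, key=lambda u: (u[0] - known_x) ** 2)
    return bmu[1]

# Toy plant data: (flow, effluent quality) roughly follows quality = 2 * flow.
data = [(x / 10, 2 * x / 10) for x in range(1, 21)]
som = train_som(data)
estimate = impute(som, known_x=1.0)   # record with quality missing at flow = 1.0
print(round(estimate, 2))
```

Because every codebook vector is pulled towards real records, the imputed value stays inside the range the data actually exhibit, unlike extrapolating with a regression line.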

16.
To address outliers in blast-furnace ironmaking data, the data types are first partitioned according to their characteristics and targeted outlier-screening methods are selected. Data samples from a steel enterprise in Hebei are analyzed in light of the actual state of the data: an improved boxplot outlier-screening method that combines global and local, multi-level views is used to screen time-series data, while the K-means algorithm is optimized using difference values and data strongly correlated with the target parameters as conditions, and the outliers of the correlated data…

17.
As the main production quality index of the grinding process, grinding particle size is the key to achieving closed-loop optimal control of grinding. Keeping particle size within a given range improves the concentrate grade and the recovery of valuable minerals in downstream separation and reduces metal losses. Owing to economic and technical constraints, real-time measurement of grinding particle size is difficult, so its online estimation is particularly important. However, most of the iron ore processed in China is hematite of unstable character; its slurry particles exhibit magnetic agglomeration, so the collected data contain many outliers, giving particle-size models built from the data large errors. Meanwhile, traditional feedforward neural networks converge slowly and easily fall into local minima when modeling particle-size data, single models generalize poorly, and existing ensemble learning degrades severely under outlier interference. Building on an improved random vector functional link network (RVFLN), this paper combines the Bagging algorithm with adaptive weighted data fusion to propose a robust RVFLN-based ensemble modeling method for grinding particle size. The method is first studied experimentally on benchmark regression problems and then validated on real industrial grinding data, demonstrating its effectiveness.

18.
We studied the pathway of cholesterol efflux from fibroblasts by testing plasma samples from obese and lean subjects. Plasma samples were incubated with [3H]cholesterol-labeled human skin fibroblasts for 1 h to ensure uniform labeling of all of the high density lipoprotein (HDL) subfractions. Supernatants were then transferred to unlabeled cells, and the displacement of labeled cholesterol within HDL subfractions by unlabeled cellular cholesterol was analyzed in short-term experiments. Plasma samples of obese subjects were characterized by a lower content of total apolipoprotein A-I (apoA-I) and alpha1-HDL and a lower overall capacity to take up labeled cholesterol. In plasma of lean subjects, pre beta2-HDL and alpha1-HDL appeared to be the most active particles in the initial uptake of unlabeled cellular cholesterol. By contrast, in plasma of obese subjects, pre beta1-HDL appeared to be most active in taking up unlabeled cellular cholesterol and transferring [3H]cholesterol. There were negative correlations between body mass index (BMI) and apoA-I and alpha1-HDL concentrations, and with the apparent increments of cellular cholesterol uptake within pre beta2-HDL and alpha1-HDL, as well as with the overall capacity to promote cholesterol efflux. By contrast, BMI was positively correlated with the apparent increment in cellular cholesterol within pre beta1-HDL. While cholesterol efflux was correlated with total plasma apoA-I, there were no such correlations with the concentration of any individual HDL subfraction. We conclude that the pattern of cholesterol transfer between fibroblasts and high density lipoprotein particles is influenced by body fatness and may be a factor in the abnormal metabolism of HDL in obesity.

19.
The objective of this study was to investigate the variability of optimal power models, in contrast to common regression models, within and between analytical methods, as well as the frequency of outlier rejection. This was done by fitting the power model to calibration-curve data using the minimum sum of squared residuals as the curve selection criterion. The jackknife percent deviation was used to detect outliers. The data were obtained from 2087 analytical batches for 91 projects using various analytical techniques. The most frequent regression model varied between analytical techniques, while the median and interquartile range of the optimal powers were stable. Outlier rejection is highest in GC and LCMS, for which the Wagner model (quadratic, log-log) is the most frequent. These results suggest that the greatest source of variability in the ideal transformation may not be the analytical technique but other within-lab sources. Outlying values may be due to these other sources of variability, as suggested by the outlier rejection profile.
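The jackknife percent deviation mentioned above refits the calibration curve with each standard left out in turn, predicts that standard, and reports its percent deviation; a large value marks an outlying standard. A stdlib sketch using a simple power model y = a·xᵇ fitted in log-log space (the actual study selects among several power transformations; the calibration data here are invented):

```python
import math

def fit_loglog(points):
    """Least-squares fit of the power model y = a * x^b in log-log space."""
    logs = [(math.log10(x), math.log10(y)) for x, y in points]
    n = len(logs)
    mx = sum(u for u, _ in logs) / n
    my = sum(v for _, v in logs) / n
    b = (sum((u - mx) * (v - my) for u, v in logs)
         / sum((u - mx) ** 2 for u, _ in logs))
    return 10 ** (my - b * mx), b        # a, b

def jackknife_percent_deviation(points):
    """Refit the curve without each point in turn, predict that point, and
    report its percent deviation from the observed response."""
    devs = []
    for i, (x, y) in enumerate(points):
        a, b = fit_loglog(points[:i] + points[i + 1:])
        devs.append(100 * (a * x ** b - y) / y)
    return devs

cal = [(10, 10.1), (20, 19.8), (40, 41.0), (80, 120.0), (160, 161.0), (320, 318.0)]
devs = jackknife_percent_deviation(cal)
print(max(range(len(devs)), key=lambda i: abs(devs[i])))  # 3: the degraded standard
```

Leaving the suspect point out of the fit is what keeps its own influence from hiding it, the bias problem that plagues residual-based checks computed from the full fit.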

20.
To overcome the shortcomings of traditional algorithms in robustness to illumination change, large-displacement optical flow, and outlier filtering, a motion-adaptive V1-MT (MAV1MT) optical flow estimation algorithm for image sequences is proposed, based on machine learning and biological models and starting from the mechanisms of human visual cognition. First, ROF-model-based structure-texture decomposition (STD) effectively handles the effects of illumination and color change. Second, the MT cell model is simulated by a weighted combination of multiple V1 cells with nonlinear regularization, and motion-adaptive weights are obtained by ridge-regression training, solving the perception of target motion speed. Finally, a coarse-to-fine enhancement scheme and image-pyramid local motion-estimation sampling apply the V1-MT motion estimation model to real large-displacement video sequences. Theoretical analysis and experimental results show that the new method better matches the characteristics of human visual information processing and achieves general, effective, and robust motion perception on video sequences.
