Similar Documents
20 similar documents found (search time: 15 ms)
1.
A graph-based semi-supervised learning method is proposed for detecting occlusion boundaries in depth images. The method first constructs a connected undirected graph whose vertices are the labeled pixels together with the pixels of the depth image under test. Next, for each pixel it extracts a feature vector consisting of the maximum depth difference and the sum of valid depth differences over the 8-neighborhood; the similarity between vertices is computed from these feature vectors and assigned as the weight of the corresponding edge. Following the graph-based semi-supervised learning principle, the method then decides whether each pixel under test is an occlusion boundary point, and finally visualizes those points to obtain the occlusion boundaries of the depth image. Experimental results show that, although the proposed method requires only a small number of labeled samples, its accuracy is comparable to that of existing supervised methods.
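The construct-graph-then-propagate idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature values, the Gaussian edge weights, and the 0.5 decision threshold are all assumptions made for the example.

```python
import math

def propagate_labels(features, labels, sigma=1.0, iters=50):
    """Graph-based semi-supervised labeling: every pixel is a vertex,
    edge weights are Gaussian similarities between feature vectors,
    and unlabeled scores are repeatedly averaged over weighted
    neighbors until they settle."""
    n = len(features)

    def weight(i, j):  # similarity between feature vectors -> edge weight
        d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        return math.exp(-d2 / (2 * sigma ** 2))

    # Score in [0, 1]: 1 = occlusion boundary, 0 = interior.
    f = [0.5 if y is None else float(y) for y in labels]
    for _ in range(iters):
        for i in range(n):
            if labels[i] is None:  # labeled vertices stay clamped
                num = sum(weight(i, j) * f[j] for j in range(n) if j != i)
                den = sum(weight(i, j) for j in range(n) if j != i)
                f[i] = num / den
    return [score >= 0.5 for score in f]

# Hypothetical feature vectors: (max depth difference, 8-neighborhood sum).
feats = [(5.0, 8.0), (4.5, 7.5), (0.2, 0.5), (0.1, 0.4), (4.8, 7.8)]
labs = [1, 1, 0, 0, None]  # last pixel is unlabeled
print(propagate_labels(feats, labs))  # -> [True, True, False, False, True]
```

The unlabeled vertex inherits the boundary label because its edge weights to the two labeled boundary pixels dominate the weighted average.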

2.
Backscatter data from multibeam bathymetric sonar carry acoustic information about the seabed surface and can be used to classify surficial sediment types. In practice, however, obtaining sediment-type labels over a large area by physical sampling is prohibitively expensive, which limits the performance of conventional supervised classification algorithms. For the common situation of abundant unlabeled data and only a small amount of labeled data, this paper proposes a semi-supervised sediment classification algorithm based on autoencoder pretraining and pseudo-label self-training. The algorithm is validated on multibeam backscatter data collected in two experiments in the same sea area in 2018 and 2019. The results show that, compared with supervised classification algorithms that use only labeled data, the proposed semi-supervised algorithm maintains classification accuracy while requiring fewer labeled samples; with autoencoder pretraining, accuracy remains above 75% even when labeled samples are extremely scarce.
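The pseudo-label self-training loop described above can be sketched generically. A nearest-centroid classifier stands in for the paper's autoencoder-pretrained network, and all feature values and class names are made up for the example:

```python
def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def self_train(labeled, unlabeled, rounds=3, min_margin=0.5):
    """Pseudo-label self-training: fit on the labeled set, attach
    pseudo-labels to the unlabeled samples the model is confident
    about, and refit on the enlarged set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        classes = {y for _, y in labeled}
        cents = {c: centroid([x for x, y in labeled if y == c]) for c in classes}
        still_unlabeled = []
        for x in pool:
            d = {c: dist2(x, m) for c, m in cents.items()}
            best, runner_up = sorted(d, key=d.get)[:2]
            margin = (d[runner_up] - d[best]) / (d[runner_up] + d[best] + 1e-12)
            if margin > min_margin:       # confident -> pseudo-label
                labeled.append((x, best))
            else:                         # keep for a later round
                still_unlabeled.append(x)
        pool = still_unlabeled
    return labeled, pool

# Hypothetical 2-D backscatter features for two sediment classes.
seeds = [((0.0, 0.0), "sand"), ((10.0, 10.0), "mud")]
extra = [(0.5, 0.2), (9.5, 9.9), (0.1, 0.8)]
grown, leftover = self_train(seeds, extra)
```

Only samples with a large confidence margin are pseudo-labeled in each round, which is what keeps self-training from amplifying its own mistakes.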

3.
In machine fault diagnosis, accurate and complete labeled training samples are hard to obtain, and existing classification methods make poor use of the information contained in large numbers of unlabeled fault samples. To address this, an online semi-supervised fault diagnosis method is proposed. Based on the Tri-training algorithm, it extends the online sequential extreme learning machine from supervised to semi-supervised learning: an initial classifier is built from a small set of imprecisely labeled samples, the labeled set is then expanded online from the large pool of unlabeled samples, and the classifier is updated incrementally to improve its generalization. Experiments on semi-supervised benchmark datasets show that, for a fixed total number of training samples, the classification accuracy is comparable across different ratios of labeled to unlabeled samples, with training times differing by less than a factor of 1.2. Validation on fault modes of a diesel engine under eight operating conditions shows that, when labeled fault samples are scarce, adding unlabeled fault samples improves fault classification accuracy by 5% to 8%.

4.
Intrusion detection involves identifying unauthorized network activity and recognizing whether the data constitute an abnormal network transmission. Recent research has focused on using semi-supervised learning mechanisms to identify abnormal network traffic in order to deal with the mix of labeled and unlabeled data found in industry. However, training and classifying network traffic in real time poses challenges, as it can degrade the overall dataset and make attacks difficult to prevent. Additionally, existing semi-supervised learning research often lacks a comprehensive analysis of experimental results. This paper proposes XA-GANomaly, a novel technique for explainable adaptive semi-supervised learning that uses GANomaly, an image anomaly detection model, and dynamically trains on small subsets to address these issues. First, this research introduces a deep neural network (DNN)-based GANomaly for semi-supervised learning. Second, this paper presents an adaptive algorithm for the DNN-based GANomaly, validated with four subsets of the adaptive dataset. Finally, this study demonstrates a monitoring system that incorporates three explainable techniques (Shapley additive explanations, reconstruction error visualization, and t-distributed stochastic neighbor embedding) to respond effectively to attacks on traffic data at each stage: feature engineering, semi-supervised learning, and adaptive learning. Compared to other single-class classification techniques, the proposed DNN-based GANomaly achieves higher scores on the Network Security Laboratory-Knowledge Discovery in Databases (NSL-KDD) and UNSW-NB15 datasets, by 13% and 8% in F1 score and by 4.17% and 11.51% in accuracy, respectively. Furthermore, experiments with the proposed adaptive learning show results mostly improved over the initial values. An analysis and monitoring system combining the three explainable methodologies is also described. The proposed method thus has potential for practical industrial application, and future research will explore handling unbalanced real-time datasets in various scenarios.

5.
For many Internet companies, a huge number of KPIs (e.g., server CPU usage, network usage, business monitoring data) are generated every day. Closely monitoring these KPIs and then quickly and accurately detecting anomalies in such huge data volumes for troubleshooting and business recovery is a great challenge, especially for unlabeled data. KPIs can be monitored with supervised learning when labels are available, but most KPIs are unlabeled, and labeling anomalies is time-consuming and laborious work for company engineers. Building an unsupervised model to detect anomalies in unlabeled data is therefore an urgent need. In this paper, unsupervised DBSCAN is combined with feature extraction from the data; for some KPIs its best F-score reaches about 0.9, which is quite good for the problem at hand.
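The pipeline in this abstract, extract features then flag low-density points as anomalies, can be illustrated with a minimal DBSCAN. The points below are toy stand-ins for extracted KPI features, and the `eps`/`min_pts` values are assumptions for the example:

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: a core point has at least min_pts neighbors
    (itself included) within eps; clusters grow outward from core
    points; everything unreachable is noise (label -1)."""
    def neighbors(i):
        return [j for j in range(len(points))
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps * eps]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # noise (may be claimed later as a border point)
            continue
        labels[i] = cid
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:       # noise reachable from a core -> border point
                labels[j] = cid
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:    # j is itself a core point: keep expanding
                queue.extend(jn)
        cid += 1
    return labels

# Hypothetical 2-D KPI feature points: two dense groups and one outlier.
pts = [(0, 0), (0.2, 0), (0, 0.2), (0.1, 0.1),
       (5, 5), (5.2, 5), (5, 5.2), (5.1, 5.1),
       (10, 0)]
print(dbscan(pts, eps=0.5, min_pts=3))  # -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point gets label -1, which is exactly the property that makes DBSCAN usable as an anomaly detector: anomalies are whatever fails to join any dense cluster.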

6.
As a direct consequence of the digitalization of production systems, high-frequency and high-dimensional data have become more easily available. In data analysis, latent structure based methods are often employed for multivariate and complex data, but these methods are designed for supervised learning problems with sufficient labeled data. Particularly at fast production rates, quality characteristic data tend to be scarcer than the process data generated through multiple sensors and automated data collection schemes. One way to overcome the problem of scarce outputs is to employ semi-supervised learning methods, which use both labeled and unlabeled data; it has been shown that a semi-supervised approach is advantageous when the labeled and unlabeled data come from the same distribution. In real applications, however, the unlabeled data may contain outliers or even a drift in the process, which affects the performance of semi-supervised methods. The research question addressed in this work is how to detect outliers in the unlabeled data set using the scarce labeled data set. An iterative strategy based on combined Hotelling's T2 and Q statistics is proposed and applied with a semi-supervised principal component regression (SS-PCR) approach on both simulated and real data sets.
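The two monitoring statistics named above can be made concrete for 2-D data with one retained principal component. This is a toy sketch under that assumption (the training numbers are invented, and a real SS-PCR model would also carry control limits for both statistics):

```python
import math

def fit_pca2(train):
    """PCA on 2-D data, one retained component, via the closed-form
    eigendecomposition of the 2x2 covariance matrix."""
    n = len(train)
    mean = tuple(sum(col) / n for col in zip(*train))
    xc = [(a - mean[0], b - mean[1]) for a, b in train]
    sxx = sum(x * x for x, _ in xc) / (n - 1)
    syy = sum(y * y for _, y in xc) / (n - 1)
    sxy = sum(x * y for x, y in xc) / (n - 1)
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)   # leading eigenvalue
    vx, vy = lam - syy, sxy                       # leading eigenvector
    norm = math.hypot(vx, vy)
    return mean, (vx / norm, vy / norm), lam

def t2_q(model, x):
    """Hotelling's T^2 (distance inside the PCA model) and Q / SPE
    (squared residual outside the model) for one new sample."""
    mean, p, lam = model
    d = (x[0] - mean[0], x[1] - mean[1])
    t = d[0] * p[0] + d[1] * p[1]                 # score on the retained PC
    t2 = t * t / lam
    rx, ry = d[0] - t * p[0], d[1] - t * p[1]     # residual off the model plane
    return t2, rx * rx + ry * ry

# Hypothetical labeled process data lying near the line y = x.
model = fit_pca2([(0, 0), (1, 1), (2, 2.1), (3, 2.9), (4, 4)])
t2_in, q_in = t2_q(model, (2.5, 2.5))     # consistent with the model
t2_out, q_out = t2_q(model, (2.5, -2.5))  # off-model outlier
```

An unlabeled sample with large Q does not fit the correlation structure learned from the labeled data at all, which is the signature of the outliers the paper's iterative strategy screens out.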

7.
Semi-supervised clustering improves learning performance by using a small number of labeled samples to guide the learning on unlabeled samples. This paper implements and compares unsupervised and semi-supervised clustering analyses of BOA-Argo ocean text data. Unsupervised K-Means and Affinity Propagation (AP) are two classical clustering algorithms; the Election-AP algorithm is proposed to handle the final cluster number in AP clustering, which has proved difficult to keep within a suitable range. Thermocline data in the BOA-Argo dataset are sampled according to the standard thermocline definition and used for semi-supervised cluster analysis. Several semi-supervised clustering algorithms were chosen for comparison of learning performance: Constrained-K-Means, Seeded-K-Means, SAP (Semi-supervised Affinity Propagation), LSAP (Loose Seed AP), and CSAP (Compact Seed AP). To adapt to single-label data, this paper improves these algorithms into SCKM (improved Constrained-K-Means), SSKM (improved Seeded-K-Means), and SSAP (improved Semi-supervised Affinity Propagation) for semi-supervised clustering analysis of the data. A DSAP (Double Seed AP) semi-supervised clustering algorithm based on compact seeds is also proposed, and the experimental data show that DSAP achieves a better clustering effect. The unsupervised and semi-supervised clustering results are used to analyze potential patterns in the marine data.
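Among the algorithms compared above, Seeded-K-Means is the simplest to sketch: the labeled seed samples replace random centroid initialization. The sketch below is a minimal version with invented thermocline-style features (a real implementation would also check convergence rather than run a fixed number of iterations):

```python
def seeded_kmeans(seeds, data, iters=20):
    """Seeded-K-Means: centroids are initialized from the few labeled
    seed samples per class rather than at random, and the seeds keep
    their class during the update step."""
    def mean(pts):
        return tuple(sum(c) / len(pts) for c in zip(*pts))

    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    cents = {c: mean(pts) for c, pts in seeds.items()}
    assign = {}
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = {x: min(cents, key=lambda c: d2(x, cents[c])) for x in data}
        # Update step: recompute centroids over seeds plus assigned points.
        cents = {c: mean(seeds[c] + [x for x, lab in assign.items() if lab == c])
                 for c in cents}
    return assign

# Hypothetical 2-D profile features: two seed classes, three new profiles.
seeds = {"thermocline": [(10.0, 10.0), (9.5, 9.5)],
         "no_thermocline": [(0.0, 0.0), (0.5, 0.5)]}
profiles = [(1.0, 0.8), (9.0, 9.2), (0.2, 0.1)]
print(seeded_kmeans(seeds, profiles))
```

Because the seeds anchor the centroids, the final clusters come out already named after the seed classes, which is what distinguishes seeded clustering from its purely unsupervised counterpart.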

8.
As the power core of an aircraft, the aero-engine plays a decisive role in flight safety, and keeping it running smoothly is of great importance. While supervised fault diagnosis techniques for aero-engines continue to advance, converting the large amounts of unlabeled data routinely collected into labeled data suitable for training diagnostic models has become a major bottleneck for the field. To address this problem, the unsupervised DPCA algorithm is introduced to classify and label unlabeled datasets accurately, and two of its weaknesses are remedied: for the misidentification caused by computing local density with the standard Gaussian kernel, a shared-nearest-neighbor algorithm is introduced to optimize the local density computation; and for the misidentification caused by manually determining the number of clusters, the BIC selection criterion is introduced to optimize the choice of cluster number. Combining the original DPCA algorithm with the shared-nearest-neighbor algorithm and the BIC selection criterion yields the proposed BDPCA algorithm. Finally, BDPCA is validated on aero-engine rotor fault data with good results, confirming its practical value for aero-engine gas-path fault diagnosis.

9.
The rapid growth of cloud computing, which provisions services to various industries, makes data security a common concern that motivates intrusion detection systems (IDS). An IDS is an essential component for meeting security requirements. Recently, diverse machine learning (ML) approaches have been used to model effective IDS, most of them categorized as supervised or unsupervised. However, an IDS based on supervised learning depends on labeled data, a common drawback that causes it to miss unfamiliar attack patterns, while unsupervised learning often fails to provide satisfactory outcomes. This work therefore concentrates on a semi-supervised learning model, a Fuzzy-based semi-supervised approach through Latent Dirichlet Allocation (F-LDA), for intrusion detection in cloud systems, which helps resolve the aforementioned challenges. LDA provides good generalization ability when training on the labeled data, while a fuzzy model is adopted to analyze the unlabeled data. Pre-processing is carried out to eliminate redundancy in the network dataset. To validate the efficiency of F-LDA for intrusion detection, the model is tested on the NSL-KDD Cup dataset, a common traffic benchmark. Simulations in the MATLAB environment show that the proposed F-LDA achieves better accuracy and more promising outcomes than the prevailing approaches.

10.
李吉明  贾森  彭艳斌 《光电工程》2012,39(11):88-86
Hyperspectral remote sensing images contain large amounts of high-dimensional data, and traditional supervised learning algorithms require a sufficient number of labeled samples to train a classifier on such data. However, accurately labeling the classes of the many complex land-cover pixels in a hyperspectral image usually demands enormous manual effort. This paper proposes a semi-supervised spectral and texture feature co-training (STF-CT) algorithm, which uses a co-training mechanism to combine two different kinds of features, the spectral features and the spatial texture features of the hyperspectral image, for classification with small training sets. STF-CT fully exploits these two independent views, spectral and texture, to build an effective semi-supervised classification method that improves classifier accuracy when training samples are scarce. Experimental results show that the algorithm performs well on hyperspectral land-cover classification with small training sets.

11.
Based on an analysis of existing activity recognition techniques for Android phones, and targeting the difficulty of inferring human activities from incomplete and insufficient mobile sensor data, this paper combines semi-supervised (SS) learning, which uses unlabeled samples to improve prediction accuracy and speed, with the extreme learning machine (ELM), an efficient learning mechanism for classification and regression, into a semi-supervised extreme learning machine (SS-ELM) approach to human activity recognition on the Android platform. A further PCA+SS-ELM method, combining principal component analysis (PCA) with SS-ELM, is then proposed. Experimental results show that the method achieves a human activity recognition accuracy of 95%, better than the recently proposed semi-supervised mixture-of-experts model, verifying the feasibility of the new method.

12.
In protein-protein interaction (PPI) extraction research, labeled corpora are limited while unlabeled biomedical free text is readily available. This work therefore studies PPI extraction by combining a transductive support vector machine (TSVM) with active learning: the most informative unlabeled samples are autonomously selected and added to the TSVM training process, maximizing system performance. Experimental results show that the combined TSVM and active learning algorithm learns well from a mixed set of few labeled and many unlabeled samples; compared with the traditional support vector machine (SVM) and TSVM, it effectively reduces the number of training samples needed and improves classification accuracy, achieving an F-measure of 64.12% on the AImed corpus.
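The "autonomously select the most informative unlabeled samples" step is, in spirit, uncertainty sampling: pick the unlabeled instances closest to the current decision boundary. A toy sketch with a fixed linear scorer follows; the weights and pool are invented, and a real system would rescore with the retrained TSVM after each selection round:

```python
def select_uncertain(w, b, unlabeled, k=2):
    """Uncertainty sampling: the k unlabeled samples with the smallest
    |w . x + b| lie closest to the current separating hyperplane and
    are the most informative to label and feed back into training."""
    def margin(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return sorted(unlabeled, key=margin)[:k]

w, b = [1.0, -1.0], 0.0   # hypothetical current decision hyperplane
pool = [(5.0, 1.0), (2.0, 1.8), (0.5, 0.4), (4.0, 4.3)]
print(select_uncertain(w, b, pool))  # the two points nearest the boundary
```

Samples far from the hyperplane would be classified confidently anyway; spending the annotation budget near the boundary is what lets the mixed labeled/unlabeled setup reach good accuracy with few labels.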

13.
Document processing in natural language includes retrieval, sentiment analysis, theme extraction, etc. Classical methods for these tasks are based on probability models, semantic models, and machine learning networks. Probability models inherently lose semantic information, which hurts processing accuracy. Machine learning approaches include supervised, unsupervised, and semi-supervised methods; labeled corpora are necessary for semantic models and supervised learning. Reliable labeled corpora are built manually, which is costly and time-consuming because annotators have to read and label every document. Recently, the continuous CBOW model has proven efficient for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships; it can easily be extended to learn paragraph vectors, but not precisely. To address these problems, this paper develops a new model for learning paragraph vectors, combining the CBOW model and CNNs into a new deep learning model. Experimental results show that the paragraph vectors generated by the new model are better than those generated by the CBOW model in semantic relatedness and accuracy.
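As a baseline for intuition, the simplest CBOW-flavored paragraph representation is the mean of a paragraph's word vectors, which is the kind of imprecise extension the paper improves on. The 3-D word vectors below are invented for illustration; real systems learn them from a corpus:

```python
import math

# Hypothetical pretrained word vectors (a real system learns these with CBOW).
WORD_VECS = {
    "neural":   (0.9, 0.1, 0.0),
    "network":  (0.8, 0.2, 0.1),
    "learning": (0.7, 0.3, 0.0),
    "ocean":    (0.0, 0.1, 0.9),
    "current":  (0.1, 0.0, 0.8),
}

def paragraph_vector(words):
    """Baseline paragraph vector: the mean of the paragraph's word vectors."""
    vecs = [WORD_VECS[w] for w in words if w in WORD_VECS]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

p1 = paragraph_vector(["neural", "network"])
p2 = paragraph_vector(["network", "learning"])
p3 = paragraph_vector(["ocean", "current"])
```

Even this crude average ranks the two machine-learning paragraphs as more similar to each other than to the oceanography one; the paper's CBOW+CNN model aims to do the same ranking more precisely.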

14.
To address the large number of training parameters, long training times, and large sample requirements of existing deep learning algorithms for rolling bearing fault diagnosis, a fast fault diagnosis algorithm (TL-ResNet) based on transfer learning (TL) and a deep residual network (ResNet) is proposed. First, a method is developed that combines the short-time Fourier transform (STFT) with pseudo-color processing to convert vibration signals into three-channel image data. Then a ResNet18 model pretrained on the ImageNet dataset is applied, via transfer learning, to the rolling bearing fault diagnosis domain. Finally, for fault diagnosis of rolling bearings under different operating conditions, small-sample transfer is proposed. Experiments on the Case Western Reserve University (CWRU) and Paderborn University (PU) datasets show diagnosis accuracies of 99.8% and 95.2% for TL-ResNet, respectively, with a training time of only 1.5 s on the CWRU dataset, indicating that the algorithm outperforms other deep learning based fault diagnosis algorithms as well as classical algorithms and can be used for fast fault diagnosis in real industrial environments.

15.
Supervised machine learning approaches are effective in text mining, but their success relies heavily on manually annotated corpora. However, annotated biomedical event corpora are few, and the available datasets contain insufficient examples for training classifiers; the common cure is to seek large numbers of training samples from unlabeled data, but such sets often contain many mislabeled samples, which degrade classifier performance. This study therefore proposes a novel error detection approach for reducing noise in unlabeled biomedical event data. First, we construct a mislabeled dataset through error analysis on the development dataset. Vector representations of sample pairs are then obtained by means of sequence patterns and a joint model of a convolutional neural network and a long short-term memory recurrent neural network. Following this, a sample identification strategy is proposed that performs error detection on unlabeled data based on the pair representations; the selected samples are added to enrich the training dataset and improve classification performance. In the BioNLP Shared Task GENIA, experimental results indicate that the proposed approach is competent at extracting biomedical events from the biomedical literature: it effectively filters out noisy examples and builds a satisfactory prediction model.

16.
The majority of big data analytics applied to transportation datasets suffer from being too domain-specific: they draw conclusions for a dataset based on analytics over that same dataset, so models trained on one domain (e.g. taxi data) transfer poorly to a different domain (e.g. Uber data). Achieving accurate analyses on a new domain requires substantial amounts of data, which limits practical applications. To remedy this, we propose semi-supervised and active learning over big data for the domain adaptation task: selectively choosing a small number of datapoints from a new domain while achieving performance comparable to using all of them. We use the New York City (NYC) taxi and Uber transportation data as our dataset, simulating different domains with 90% as the source domain for training and the remaining 10% as the target domain for evaluation. We propose semi-supervised and active learning strategies and apply them to the source domain to select datapoints. Experimental results show that our adaptation achieves performance comparable to using all datapoints while using only a fraction of them, substantially reducing the amount of data required. Our approach has two major advantages: it can make accurate analytics and predictions when big datasets are not available, and even when they are, it chooses the most informative datapoints, making the process much more efficient without having to process huge amounts of data.

17.
At present, data stream classification plays a key role in big data analytics due to the enormous growth of streaming data. Most existing classification methods use ensemble learning, which is reliable, but they are not effective when learning from imbalanced big data and assume that all data are pre-classified. Another weakness of current methods is the long evaluation time when the target data stream contains a large number of features. The main objective of this research is to develop a new method for incremental learning based on the proposed ant lion fuzzy-generative adversarial network model, implemented on a Spark architecture. For each data stream, the class output is computed at slave nodes by training a generative adversarial network with back-propagation error based on fuzzy bound computation. This method overcomes the limitations of existing methods: it can classify data streams that are partially or completely unlabeled while providing high scalability and efficiency. The results show that the proposed model outperforms the state of the art in terms of accuracy (0.861), precision (0.9328), and minimal MSE (0.0416).

18.
Artificial intelligence, which has emerged with the rapid development of information technology, is drawing attention as a tool for solving problems posed by society and industry. In particular, convolutional neural networks (CNNs), a type of deep learning technology, are prominent in computer vision fields such as image classification, recognition, and object tracking. Training CNN models requires a large amount of data, and a lack of data can lead to performance degradation through overfitting. As CNN architecture development and optimization studies have become active, ensemble techniques have emerged that perform image classification by combining features extracted from multiple CNN models. In this study, data augmentation and contour image extraction were performed to overcome the data shortage problem, and a hierarchical ensemble technique is proposed to achieve high image classification accuracy even when trained on a small amount of data. First, we trained pretrained VGGNet, GoogLeNet, ResNet, DenseNet, and EfficientNet models on the UC-Merced land use dataset and on the contour image of each sample. We then applied the hierarchical ensemble technique over the possible combinations of these models. Experiments with training set proportions of 30%, 50%, and 70% yielded a performance improvement of up to 4.68% over the average accuracy of the individual models.
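The core ensembling step, combining the outputs of several CNNs into one prediction, can be sketched as simple soft voting over class-probability vectors. The probabilities below are made up, and the paper's hierarchical scheme is more elaborate than this flat average:

```python
def soft_vote(prob_lists):
    """Soft-voting ensemble: average the class-probability vectors of
    all member models, then predict the arg-max class."""
    n, k = len(prob_lists), len(prob_lists[0])
    avg = [sum(p[i] for p in prob_lists) / n for i in range(k)]
    return max(range(k), key=avg.__getitem__), avg

# Hypothetical per-class probabilities from three CNNs for one image.
preds = [[0.6, 0.3, 0.1],   # e.g. a VGG-style member
         [0.2, 0.5, 0.3],   # e.g. a ResNet-style member
         [0.5, 0.4, 0.1]]   # e.g. a DenseNet-style member
cls, avg = soft_vote(preds)
print(cls)  # class index with the highest mean probability
```

Note that the middle model alone would have predicted class 1; averaging lets the two models that agree on class 0 outvote it, which is the basic mechanism by which ensembles smooth over individual-model errors from small training sets.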

19.
The application of deep learning to object detection has seen much progress. However, due to the domain shift problem, applying an off-the-shelf detector to another domain leads to a significant performance drop, while training models on the new domain requires a large number of ground-truth labels, demanding substantial human and financial resources. To avoid both the excessive resource requirements and the performance drop caused by domain shift, this paper proposes a new domain adaptive approach to cross-domain vehicle detection that improves the model in both image space and feature space. In image space, we employ the objectives of a generative adversarial network together with a cycle consistency loss for image style transfer; in feature space, we align the feature distributions of the source and target domains to improve detection accuracy. Experiments on two different datasets show that the technique effectively improves vehicle detection accuracy in the target domain.

20.
To address the high dimensionality and class imbalance of machinery fault data, a fault classification method based on multi-cluster feature selection on the Grassmann manifold and iterative nearest-neighbor oversampling is proposed. Time-domain and frequency-domain features are extracted from the acquired vibration signals, and multi-cluster feature selection maps the high-dimensional data to a low-dimensional feature set that preserves the local manifold structure. Unlabeled samples are classified with iterative nearest-neighbor oversampling, with the goal of restoring maximum class balance, and the remaining unlabeled samples are classified by fuzzy classification. Fault data of rolling bearings (normal, outer race, inner race, and rolling element) are selected for validation, and the method is compared with support vector machines and a graph-based semi-supervised learning algorithm. The results show that the proposed method effectively identifies minority-class faults and achieves a significant overall classification effect.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号