Similar Documents
20 similar documents retrieved (search time: 15 ms)
1.
2.
A Semi-Supervised Gaussian Process Classification Algorithm for Class-Imbalanced Data   Cited by: 1 (self-citations: 0, citations by others: 1)
Traditional supervised learning methods struggle with real-world datasets that carry little label information and whose training sets are class-imbalanced. To address this, a semi-supervised Gaussian process classification algorithm for class-imbalanced data is proposed. The algorithm adopts the self-training idea from semi-supervised learning and uses Gaussian process classification to compute posterior probabilities, injecting class labels into unlabeled data to obtain more accurate and reliable labeled samples. This makes the class distribution of the training set relatively balanced, and the classifier adaptively optimizes itself to achieve better classification performance. Experimental results show that, with class-imbalanced training samples and scarce label information, the algorithm obtains effective labels through self-training and markedly improves classification accuracy, offering a new approach to classifying class-imbalanced data.
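A minimal sketch of the self-training idea described in this entry (an illustration, not the authors' implementation), using scikit-learn's GaussianProcessClassifier; the confidence threshold and round count are assumed values, and integer labels 0..K-1 are assumed:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def self_train_gp(X_l, y_l, X_u, threshold=0.9, rounds=5):
    """Self-training: repeatedly fit a GP classifier and absorb unlabeled
    samples whose posterior class probability exceeds `threshold`."""
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        gpc = GaussianProcessClassifier().fit(X_l, y_l)
        proba = gpc.predict_proba(X_u)      # posterior class probabilities
        picked = proba.max(axis=1) >= threshold   # confident pseudo-labels only
        if not picked.any():
            break
        X_l = np.vstack([X_l, X_u[picked]])
        y_l = np.concatenate([y_l, proba[picked].argmax(axis=1)])
        X_u = X_u[~picked]
    return GaussianProcessClassifier().fit(X_l, y_l)
```

To keep the growing training set balanced, the selection step could be applied per class so that minority-class pseudo-labels are preferred.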

3.
Zero-shot learning (ZSL) aims to recognize unseen image classes without requiring any training samples of those classes. ZSL is typically achieved by building a semantic embedding space, such as attributes, to bridge the visual features and class labels of images. Currently, most ZSL approaches focus on learning a visual-semantic alignment from seen classes using only human-designed attributes, and the ZSL problem is then solved by transferring semantic knowledge from seen classes to unseen classes. However, few works examine whether the human-designed attributes are discriminative enough for image class prediction. To address this issue, we propose a semantic-aware dictionary learning (SADL) framework to explore discriminative visual attributes across seen and unseen classes. Furthermore, the semantic cues are elegantly integrated into the feature representations via the learned visual attributes for the recognition task. Experiments conducted on two challenging benchmark datasets show that our approach outperforms other state-of-the-art ZSL methods.
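For illustration only, the dictionary-learning building block can be sketched with scikit-learn; the SADL framework's semantic-alignment term and cross-class discrimination are not reproduced here, and all shapes and data are assumed:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Assumed shapes: 200 seen-class images with 256-D visual features,
# compressed into 32 latent "visual attributes" via sparse coding.
X_seen = np.random.randn(200, 256)
dl = DictionaryLearning(n_components=32, alpha=1.0, max_iter=20,
                        transform_algorithm='lasso_lars', random_state=0)
codes = dl.fit_transform(X_seen)   # sparse codes = latent attribute activations
D = dl.components_                 # dictionary atoms, shape (32, 256)
```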

4.
When samples are scarce in a high-dimensional space, rejection-capable classification methods based on statistical models struggle to build reasonable covering models for the complex geometry of the sample distribution. To address this, the paper proposes a rejection-capable classification model based on an adaptive covering model built from the minimum spanning tree (MST) in high-dimensional space. The model uses the MST to characterize the distribution of sample points, treating the edges of the tree as additional virtual samples that provide a better description of the within-class distribution. By grouping nearby samples of the same class into one connected geometric covering region, and assigning nearby samples of different classes to different covering regions, coverage of each training class is achieved. To prevent unreasonable virtual samples from degrading the classifier's rejection performance, an adaptive covering-radius adjustment strategy is introduced to obtain a compact cover of each training class. A test sample is then rejected or accepted according to the covering boundaries of the training classes; for accepted samples falling in overlapping covers, a data-field strategy determines the true class. Experimental results show that the method is reasonable and effective.
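A hedged sketch of the covering idea: the MST over one class's samples supplies edges that act as virtual samples, and a test point is accepted when it lies within a covering radius of some edge segment. The fixed radius below is an assumption; the paper adapts it per class:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def class_mst_edges(X):
    """MST over one class's samples; each edge is a 'virtual sample' segment."""
    D = squareform(pdist(X))                  # pairwise Euclidean distances
    mst = minimum_spanning_tree(D).tocoo()    # sparse MST
    return list(zip(mst.row, mst.col, mst.data))

def accepts(x, X, edges, radius):
    """Accept x if it lies within `radius` of any MST edge segment."""
    for i, j, _ in edges:
        a, b = X[i], X[j]
        # Closest point on segment a-b to x (clamped projection).
        t = np.clip(np.dot(x - a, b - a) / (np.dot(b - a, b - a) + 1e-12), 0.0, 1.0)
        if np.linalg.norm(x - (a + t * (b - a))) <= radius:
            return True
    return False
```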

5.
Cost-sensitive learning has been applied to resolve the multi-class imbalance problem in Internet traffic classification, and it has achieved considerable results. But classification performance on the minority classes with few bytes is still unsatisfactory, because existing research focuses only on the classes with a large number of bytes. Therefore, class-dependent misclassification cost is studied. First, a flow-rate-based cost matrix (FCM) is investigated. Second, a new cost matrix named the weighted cost matrix (WCM) is proposed, which computes a reasonable weight for each cost in FCM by taking into account the data imbalance degree and the classification accuracy of each class. This further improves classification performance on the difficult minority class (a class with more flows but worse classification accuracy). Experimental results on twelve real traffic datasets show that FCM and WCM obtain more than 92% flow g-mean and 80% byte g-mean on average; on a test set collected one year later, WCM outperforms FCM in terms of stability.
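An illustrative sketch of assembling a weighted cost matrix; the weighting formula below (imbalance ratio times a difficulty term) is an assumption for exposition, not the paper's WCM formula:

```python
import numpy as np

def weighted_cost_matrix(base_cost, class_counts, class_accuracy):
    """Scale each misclassification cost by the true class's rarity and error rate.

    base_cost[i, j]  : cost of predicting class j when the truth is class i (FCM-style)
    class_counts[i]  : number of training samples in class i
    class_accuracy[i]: per-class accuracy of a baseline classifier
    """
    counts = np.asarray(class_counts, dtype=float)
    acc = np.asarray(class_accuracy, dtype=float)
    imbalance = counts.max() / counts     # rarer classes get larger weights
    difficulty = 1.0 - acc                # poorly classified classes get larger weights
    w = imbalance * (1.0 + difficulty)    # illustrative combination, not the paper's
    return np.asarray(base_cost, dtype=float) * w[:, None]
```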

6.
Constraints are commonly used in both simulation and formal verification to specify expected input conditions and state transitions. Constraint solving is the process of determining input vectors that satisfy the set of constraints during constrained random simulation. Even though constraints are used in formal property checking to restrict the search space, constraint solving has not previously had direct application to formal property checking. There are often many simple yet powerful invariants that can be learned from constraint solving during constrained random simulation, and this paper shows that they can significantly simplify the formal verification problem. We use approximate constraint solving to compute an approximate set of valid input vectors, which is a strict superset of the set of all legal input vectors. We use BDD techniques to compute these input vectors during constrained random simulation, then process the resulting BDDs to learn invariants that can be used during formal property checking. This paper presents efficient BDD algorithms for learning invariants from the BDDs generated by approximate constraint solving. We also show how these learned invariants can be applied to formal property checking. Experimental results show that invariants learned during constraint solving can significantly improve the performance of formal property checking on many industrial designs.
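As a hedged illustration of the over-approximation idea, using the Python `dd` BDD package as an assumed stand-in for the paper's BDD machinery: existentially quantifying variables out of the constraint BDD yields a superset of the legal inputs, and any formula implied by that superset is a sound invariant for property checking.

```python
# pip install dd  -- a pure-Python BDD package (not the paper's implementation)
from dd.autoref import BDD

bdd = BDD()
bdd.declare('a', 'b', 'c')
a, b, c = bdd.var('a'), bdd.var('b'), bdd.var('c')

# Input constraint: (a -> b) and not(b and c).
constraint = (~a | b) & ~(b & c)

# Over-approximate the legal-input set by existentially quantifying 'c';
# the result is a strict superset of the legal inputs.
approx = bdd.quantify(constraint, {'c'}, forall=False)

# A candidate invariant is sound if the approximation implies it,
# i.e. (~approx | candidate) is a tautology.
candidate = ~a | b      # "a -> b"
print('invariant holds:', (~approx | candidate) == bdd.true)
```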

7.
程磊  吴晓富  张索非 《信号处理》2020,36(1):110-107
Class imbalance in datasets is a common problem in machine learning, and transfer learning is no exception. Since the impact of class imbalance under transfer learning has been insufficiently studied, this paper analyzes the effect of the following imbalance-handling methods on transfer learning: oversampling, undersampling, weighted random sampling, a weighted cross-entropy loss function, the Focal Loss function, and the meta-learning-based L2RW (Learning to Reweight) algorithm. The first three methods eliminate dataset imbalance through random sampling; weighted cross-entropy and Focal Loss adapt the loss function of conventional classification algorithms to training on imbalanced datasets; L2RW uses a meta-learning mechanism to dynamically adjust sample weights for better generalization. Extensive experimental results show that, among these imbalance-handling methods, oversampling and weighted random sampling are better suited to transfer learning.
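Two of the listed remedies, sketched in PyTorch for illustration (not the paper's code); gamma = 2 and inverse-frequency weights follow common practice:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import WeightedRandomSampler

def focal_loss(logits, targets, gamma=2.0):
    """Focal Loss: scales cross-entropy by (1 - p_t)^gamma so easy,
    majority-class examples contribute less to the gradient."""
    log_p = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_p, targets, reduction='none')
    p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    return ((1.0 - p_t) ** gamma * ce).mean()

def make_balanced_sampler(labels):
    """Weighted random sampling: each sample is drawn with probability
    inversely proportional to its class frequency."""
    counts = torch.bincount(labels)
    weights = 1.0 / counts[labels].float()
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
```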

8.
Fuzzy computing for data mining   Cited by: 7 (self-citations: 0, citations by others: 7)
The study is devoted to linguistic data mining, an endeavor that exploits the concepts, constructs, and mechanisms of fuzzy set theory. The roles of information granules, information granulation, and the techniques therein are discussed in detail. Particular attention is given to the manner in which these information granules are represented as fuzzy sets and manipulated according to the main mechanisms of fuzzy sets. We introduce unsupervised learning (clustering) where optimization is supported by the linguistic granules of context, thereby giving rise to so-called context-sensitive fuzzy clustering. The combination of neuro, evolutionary, and granular computing in the context of data mining is explored. Detailed numerical experiments using well-known datasets are also included and analyzed.
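A plain fuzzy c-means sketch in NumPy as a reference point; the context-sensitive variant discussed above additionally conditions the memberships on a linguistic context, which is omitted here:

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, iters=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means: alternate between center and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)        # memberships sum to 1 per sample
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1))          # standard FCM membership update
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U
```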

9.
Anti-random testing has proved useful in a series of empirical evaluations. The basic premise of anti-random testing is to choose new test vectors that are as far away from existing test inputs as possible, where the distance measure is either Hamming distance or Cartesian distance. Unfortunately, this method essentially requires enumeration of the input space and computation of each input vector when used on an arbitrary set of existing test data, which prevents scale-up to large test sets and/or long input vectors. We present and empirically evaluate a technique for generating anti-random vectors that is computationally feasible for large input vectors and long sequences of tests. We also show how this fast anti-random test generation (FAR) can account for retained state (i.e., the effects of subsequent inputs on each other). We evaluate the effectiveness of applying anti-random vectors to behavioral model verification using branch coverage as the testing criterion.
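A naive greedy sketch of anti-random generation (not the FAR technique itself, which avoids this cost): each new vector maximizes its minimum Hamming distance to the vectors chosen so far, searching over sampled candidates rather than the full input space:

```python
import numpy as np

def antirandom_suite(n_bits, n_vectors, n_candidates=256, seed=0):
    """Greedily build a test suite whose vectors are mutually far apart
    in Hamming distance (candidate sampling keeps this tractable)."""
    rng = np.random.default_rng(seed)
    suite = [rng.integers(0, 2, n_bits)]
    for _ in range(n_vectors - 1):
        candidates = rng.integers(0, 2, (n_candidates, n_bits))
        # Minimum Hamming distance of each candidate to the current suite.
        dists = np.array([[np.sum(c != v) for v in suite] for c in candidates])
        suite.append(candidates[dists.min(axis=1).argmax()])
    return np.array(suite)
```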

10.
This paper investigates variable selection (VS) and classification for biomedical datasets with a small sample size and a very high input dimension. Sequential sparse Bayesian learning methods with linear bases are used as the basic VS algorithm. Selected variables are fed to kernel-based probabilistic classifiers: Bayesian least squares support vector machines (BayLS-SVMs) and relevance vector machines (RVMs). We employ bagging techniques for both VS and model building in order to improve the reliability of the selected variables and the predictive performance. This modeling strategy is applied to real-life medical classification problems, including two binary cancer diagnosis problems based on microarray data and a brain tumor multiclass classification problem using spectra acquired via magnetic resonance spectroscopy. The approach is experimentally compared to other VS methods. It is shown that the use of bagging can improve the reliability and stability of both VS and model prediction.
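An illustrative sketch of bagged variable selection, with L1-regularized linear regression standing in for the sequential sparse Bayesian learner; the vote threshold is an assumed parameter:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.utils import resample

def bagged_variable_selection(X, y, n_bags=50, alpha=0.05, keep_frac=0.5, seed=0):
    """Bagged VS: keep a variable if a sparse model selects it (nonzero
    coefficient) in at least `keep_frac` of the bootstrap rounds."""
    votes = np.zeros(X.shape[1])
    for b in range(n_bags):
        Xb, yb = resample(X, y, random_state=seed + b)   # bootstrap sample
        coefs = Lasso(alpha=alpha).fit(Xb, yb).coef_
        votes += (np.abs(coefs) > 1e-8)
    return np.where(votes / n_bags >= keep_frac)[0]      # stable variable indices
```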

11.

Research on Computer-Aided Diagnosis (CAD) of medical images has been actively conducted to support the decisions of radiologists. Since deep learning has shown distinguished abilities in classification, detection, segmentation, and other problems, many studies on CAD have used deep learning. One of the reasons behind the success of deep learning is the availability of large application-specific annotated datasets. However, it is quite laborious for radiologists to annotate hundreds or thousands of medical images, and thus it is difficult to obtain large-scale annotated datasets for various organs and diseases. Therefore, many techniques that effectively train deep neural networks have been proposed, one of which is transfer learning. This paper focuses on transfer learning and conducts a case study on ROI-based opacity classification of diffuse lung diseases in chest CT images. The aim of this paper is to clarify what characteristics of the pre-training datasets and what structures of deep neural networks for fine-tuning enhance the effectiveness of transfer learning. In addition, the number of training samples is varied and the effectiveness of transfer learning is evaluated. In the experiments, nine transfer-learning conditions and a method without transfer learning are compared to identify the most appropriate conditions. The experimental results clarify that a pre-training dataset with more (and more varied) classes, together with a compact structure for fine-tuning, shows the best accuracy in this work.
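A minimal PyTorch sketch of the fine-tuning setup this entry studies (illustrative; the specific pre-training datasets and network structures compared in the paper are not reproduced). N_CLASSES and the frozen-backbone choice are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

N_CLASSES = 6   # assumed number of opacity patterns

# Load an ImageNet-pretrained backbone (torchvision >= 0.13 weights API).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                 # freeze the pretrained backbone

# Replace the classification head; only the new head is trained here.
model.fc = nn.Linear(model.fc.in_features, N_CLASSES)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```

Unfreezing deeper layers with a smaller learning rate is a common alternative when more annotated ROIs are available.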


12.
Semantic segmentation is a prominent problem in scene understanding, expressed as a dense labeling task, with deep learning models being one of the main methods for solving it. Traditional training algorithms for semantic segmentation models produce less than satisfactory results when not combined with post-processing techniques such as CRFs. In this paper, we propose a method to train segmentation models using an approach which utilizes classification information in the training process of the segmentation network. Our method employs a classification network that detects the presence of classes in the segmented output. These class scores are then used to train the segmentation model. The method is motivated by the fact that conditioning the training of the segmentation model on these scores allows higher-order features to be captured. Our experiments show significantly improved performance of the segmentation model on the CamVid and CityScapes datasets with no additional post-processing.
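A hedged sketch of how class-presence scores might condition segmentation training; the loss combination, the presence-target construction, and the `classifier` module below are assumptions, not the authors' exact formulation:

```python
import torch
import torch.nn as nn

def joint_loss(seg_logits, seg_target, classifier, lam=0.5):
    """Segmentation loss plus a class-presence term.

    seg_logits : (B, C, H, W) raw segmentation scores
    seg_target : (B, H, W) class indices in [0, C) (no ignore labels assumed)
    classifier : a (hypothetical) network mapping soft masks to presence logits
    """
    seg_loss = nn.functional.cross_entropy(seg_logits, seg_target)
    # Multi-hot vector of which classes appear in the ground-truth mask (B, C).
    n_classes = seg_logits.shape[1]
    present = torch.zeros(seg_target.shape[0], n_classes, device=seg_logits.device)
    present.scatter_(1, seg_target.flatten(1), 1.0)
    # The classifier scores the predicted (soft) segmentation for class presence.
    cls_logits = classifier(seg_logits.softmax(dim=1))
    cls_loss = nn.functional.binary_cross_entropy_with_logits(cls_logits, present)
    return seg_loss + lam * cls_loss
```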

13.
Liao  Jiyong  Wu  Sheng  Liu  Ailian 《Wireless Personal Communications》2021,116(3):1639-1657

High-utility itemset mining has become a hot research topic in association rule mining. However, many algorithms mine the dataset directly, which is problematic on dense datasets: too many itemsets are stored in each transaction, so mining association rules takes a great deal of storage space and degrades the running efficiency of the algorithm. Existing work lacks efficient itemset mining algorithms for dense datasets. Aiming at this problem, a high-utility itemset mining algorithm based on a divide-and-conquer strategy is proposed. Using an improved silhouette coefficient to select the best number of K-means clusters, the dataset is divided into many smaller subclasses. Association-rule mining is then performed on each subclass via a Boolean matrix compression operation, and the results are iteratively merged to obtain the final mining output. We also analyze the time complexity of our method and of the Apriori algorithm. Finally, experiments on several well-known real-world datasets show that the improved algorithm runs faster and consumes less memory on dense datasets, effectively improving computational efficiency.
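A short sketch of the cluster-count selection step; note the paper uses an improved silhouette coefficient, while the standard one from scikit-learn is used here for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k(X, k_range=range(2, 11)):
    """Pick the K-means cluster count with the highest silhouette score."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)
```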


14.
A Kernel-SMOTE-Based Classification Method for Imbalanced Datasets   Cited by: 7 (self-citations: 0, citations by others: 7)
曾志强  吴群  廖备水  高济 《电子学报》2009,37(11):2489-2495
This paper proposes a classification method based on kernel SMOTE (Synthetic Minority Over-sampling Technique) to handle the classification of imbalanced datasets with support vector machines (SVMs). The core idea is to first oversample the minority-class samples in the feature space using kernel SMOTE, then find the pre-images of the synthesized samples in the input space via the distance relations between the input space and the feature space, and finally train an SVM on the result. Experiments show that the samples synthesized by kernel SMOTE are of higher quality than those produced by the SMOTE algorithm, effectively improving SVM classification on imbalanced datasets.
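For reference, a sketch of standard input-space SMOTE; the paper's kernel variant instead synthesizes samples in the kernel-induced feature space and maps them back via pre-images, which is not shown:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]   # a random true neighbor
        out.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(out)
```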

15.
With the advantages of a simple structure and fast training speed, the broad learning system (BLS) has attracted attention for hyperspectral image (HSI) classification. However, BLS cannot make good use of the discriminative information contained in an HSI, which limits its classification performance. In this paper, we propose a robust discriminative broad learning system (RDBLS). For HSI classification, RDBLS introduces the total scatter matrix to construct a new loss function for training the BLS, simultaneously minimizing within-class feature distances and maximizing between-class feature distances, so as to improve the discriminative ability of BLS features. RDBLS inherits the advantages of the BLS and, to a certain extent, alleviates insufficient learning with limited HSI samples. The classification results of RDBLS are verified on three HSI datasets and are superior to those of other comparison methods.
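An illustrative computation of the within- and between-class scatter matrices that a discriminative loss of this kind builds on (how RDBLS weighs them in its loss function is not reproduced here):

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter of features X with
    labels y; a discriminative regularizer typically shrinks trace(Sw)
    while enlarging trace(Sb)."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb
```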

16.
Neurofuzzy modeling of chemical vapor deposition processes   Cited by: 2 (self-citations: 0, citations by others: 2)
The modeling of semiconductor manufacturing processes has been the subject of intensive research efforts for years. Physics-based (first-principles) models have proved difficult to develop for processes such as plasma etching and plasma deposition, which exhibit highly nonlinear and complex multidimensional relationships between input and output process variables. As a result, many researchers have turned to empirical techniques to model many semiconductor processes. This paper presents a neurofuzzy approach as a general tool for modeling chemical vapor deposition (CVD) processes. A five-layer feedforward neural network is proposed to model the input-output relationships of plasma-enhanced CVD deposition of a SiN film. The proposed five-layer network is constructed from a set of input-output training data using unsupervised and supervised neural learning techniques. Product-space data clustering is used to partition the input and output spaces. Fuzzy logic rules that describe the input-output relationships are then determined using competitive learning algorithms. Finally, the fuzzy membership functions of the input and output variables are optimally adjusted using the backpropagation learning algorithm. A salient feature of the proposed neurofuzzy network is that after training, the internal units are transparent to the user, and the input-output relationship of the CVD process can be described linguistically in terms of IF-THEN fuzzy rules. Computer simulations are conducted to verify the validity and performance of the proposed neurofuzzy network for modeling CVD processes.

17.
Due to the small size of training sets, statistical shape models often over-constrain the deformation in medical image segmentation. Hence, artificial enlargement of the training set has been proposed as a solution to increase the flexibility of the models. In this paper, different methods for artificially enlarging a training set were evaluated. Further objectives were to study the effects of the size of the training set, to estimate the optimal number of deformation modes, to study the effects of different error sources, and to compare different deformation methods. The study was performed for a cardiac shape model consisting of ventricles, atria, and epicardium, built from magnetic resonance (MR) volume images of 25 subjects. Both shape modeling and image segmentation accuracies were studied. The objectives were reached by utilizing different training sets and datasets, and two deformation methods. The evaluation proved that artificial enlargement of the training set improves both modeling and segmentation accuracy. All but one of the enlargement techniques gave statistically significantly $(p \ll 0.05)$ better segmentation results than the standard method without enlargement. The two best enlargement techniques were the nonrigid movement technique and the technique that combines principal component analysis (PCA) and a finite element model (FEM). The optimal number of deformation modes was found to be near 100 in our application. Active shape model segmentation gave better segmentation accuracy than segmentation based on simulated annealing optimization of the model weights.
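A compact sketch of the underlying point-distribution model: PCA over aligned training shapes yields a mean shape plus deformation modes, and artificial enlargement adds synthetic shapes before this step. The shapes and the mode-sampling scheme below are illustrative assumptions:

```python
import numpy as np

def shape_model(shapes, n_modes):
    """Point-distribution model from aligned shapes of form
    (n_subjects, n_points * n_dims): mean plus principal deformation modes."""
    mean = shapes.mean(axis=0)
    U, S, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    var = (S ** 2) / (len(shapes) - 1)       # variance explained per mode
    return mean, Vt[:n_modes], var[:n_modes]

def plausible_shape(mean, modes, var, rng):
    """Sample a new shape with mode weights clipped to +/- 3 std devs."""
    b = np.clip(rng.normal(0.0, np.sqrt(var)),
                -3.0 * np.sqrt(var), 3.0 * np.sqrt(var))
    return mean + b @ modes
```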

18.
邬少清  董一鸿  王雄  曹燕  辛宇 《电信科学》2020,36(12):20-32
Existing network representation learning methods fail to mine and exploit the deep latent information in a network. To mine this latent information further, this paper introduces latent pattern-structure similarity and defines a similarity score between network structures to measure how alike they are, allowing a node to skip over unrelated vertices and capture high-order similarity at the level of global structure. Deep learning is used to fuse multiple information sources into joint training, compensating for the shortcomings of random walks, so that the information sources are tightly coupled and complement one another for the best effect. The experiments take Lap, DeepWalk, TADW, SDNE, and CANE as baselines and use three real-world networks as datasets to validate the model on node classification and link reconstruction. On node classification, performance improves by 1.7 percentage points on average across datasets and training ratios; on link reconstruction, better performance is achieved with only half the embedding dimensions. Finally, the performance gain at different network depths is discussed: increasing the model depth raises average node-classification performance by 1.1 percentage points.

19.
This paper presents a novel combinational-circuit equivalence-checking technique based on Boolean satisfiability (SAT). The algorithm simplifies the verification problem by reasoning over the miter circuit, combining AND-inverter graph structural simplification, BDD expansion, and implication learning, and finally solves the verification task with the efficient SAT solver zChaff. The approach combines the strengths of BDDs and SAT: bounding the size of BDD construction avoids memory blow-up, while simplification through reasoning shrinks the SAT search space. Experimental results on the ISCAS85 circuits demonstrate the effectiveness of the algorithm.
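A toy miter check with the python-sat package, as an illustration of the SAT side only (zChaff itself and the AIG/BDD simplifications are not shown): two encodings of the same function are proved equivalent because the miter CNF is unsatisfiable.

```python
# pip install python-sat
from pysat.solvers import Glucose3

# Miter idea: two circuits are equivalent iff "outputs differ" is UNSAT.
# Tiny example: f = a AND b  vs.  g = NOT(NOT a OR NOT b)  (De Morgan).
# Tseitin variables: a=1, b=2, f=3, g=4, miter output m=5.
cnf = [
    # f <-> (a & b)
    [-3, 1], [-3, 2], [3, -1, -2],
    # g <-> ~(~a | ~b), which also reduces to g <-> (a & b)
    [-4, 1], [-4, 2], [4, -1, -2],
    # m <-> (f XOR g), then assert the miter output m
    [-5, 3, 4], [-5, -3, -4], [5, 3, -4], [5, -3, 4],
    [5],
]
with Glucose3(bootstrap_with=cnf) as s:
    print('equivalent:', not s.solve())   # UNSAT miter => circuits equivalent
```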

20.
Significant challenges remain despite the impressive recent advances in machine learning techniques, particularly in multimedia data understanding. One of the main challenges in real-world scenarios is the nature of, and relation between, training and test datasets. Very often, only small sets of coarse-grained labeled data are available to train models, which are then expected to be applied to large datasets and fine-grained tasks. Weakly supervised learning approaches handle such constraints by maximizing the useful training information in labeled and unlabeled data. In this research direction, we propose a weakly supervised approach that analyzes the dataset manifold to expand the available labeled set. A hypergraph manifold ranking algorithm is exploited to represent the contextual similarity information encoded in the unlabeled data and to identify strong similarity relations, which are taken as a path to label expansion. The expanded labeled set is subsequently exploited for a more comprehensive and accurate training process. The proposed model was evaluated jointly with supervised and semi-supervised classifiers, including Graph Convolutional Networks. The experimental results on image and video datasets demonstrate significant gains and accurate results for different classifiers in diverse scenarios.

