首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Domain adaptation aims to correct the mismatch in statistical properties between the source domain on which a classifier is trained and the target domain to which the classifier is to be applied. In this paper, we address the challenging scenario of unsupervised domain adaptation, where the target domain does not provide any annotated data to assist in adapting the classifier. Our strategy is to learn robust features which are resilient to the mismatch across domains and then use them to construct classifiers that will perform well on the target domain. To this end, we propose novel kernel learning approaches to infer such features for adaptation. Concretely, we explore two closely related directions. In the first direction, we propose unsupervised learning of a geodesic flow kernel (GFK). The GFK summarizes the inner products in an infinite sequence of feature subspaces that smoothly interpolates between the source and target domains. In the second direction, we propose supervised learning of a kernel that discriminatively combines multiple base GFKs. Those base kernels model the source and the target domains at fine-grained granularities. In particular, each base kernel pivots on a different set of landmarks—the most useful data instances that reveal the similarity between the source and the target domains, thus bridging them to achieve adaptation. Our approaches are computationally convenient, automatically infer important hyper-parameters, and are capable of learning features and classifiers discriminatively without demanding labeled data from the target domain. In extensive empirical studies on standard benchmark recognition datasets, our appraches yield state-of-the-art results compared to a variety of competing methods.  相似文献   

2.
多核局部领域适应学习   总被引:1,自引:0,他引:1  
陶剑文  王士同 《软件学报》2012,23(9):2297-2310
领域适应(或跨领域)学习旨在利用源领域(或辅助领域)中带标签样本来学习一种鲁棒的目标分类器,其关键问题在于如何最大化地减小领域间的分布差异.为了有效解决领域间特征分布的变化问题,提出一种三段式多核局部领域适应学习(multiple kernel local leaning-based domain adaptation,简称MKLDA)方法:1)基于最大均值差(maximum mean discrepancy,简称MMD)度量准则和结构风险最小化模型,同时,学习一个再生多核Hilbert空间和一个初始的支持向量机(support vector machine,简称SVM),对目标领域数据进行初始划分;2)在习得的多核Hilbert空间,对目标领域数据的类别信息进行局部重构学习;3)最后,利用学习获得的类别信息,在目标领域训练学习一个鲁棒的目标分类器.实验结果显示,所提方法具有优化或可比较的领域适应学习性能.  相似文献   

3.
领域相关的大规模和高质量的标注训练数据是分类器性能的重要保证,而标注训练语料是一件费时费力的工作。该文提出了一种采用小规模标注语料识别中文观点句的方法。首先采用Bootstrapping方法扩展训练语料,分别训练贝叶斯、支持向量机和最大熵分类器。最后,通过给三个训练好的分类器赋权获得一个集成分类器。实验结果表明,集成后的分类器性能优于单分类器,并且该方法在使用部分标注训练数据的情况下也能取得与采用全部标注训练数据相近的实验结果。  相似文献   

4.
The production of robust classifiers by combining supervised training with unsupervised training is discussed. A supervised training phase exploits statistically scene invariant labeled data to produce an initial classifier. This is followed by an unsupervised training phase that exploits clustering properties of unlabeled data. This two-phase process is termed mixed adaptation. A probabilistic model supporting this technique is presented along with examples illustrating mixed adaptation. These examples include the detection of unspecified dotted curves in dotted noise and the detection and classification of vehicles in cinematic sequences of infrared imagery  相似文献   

5.
Exploiting semantic resources for large scale text categorization   总被引:1,自引:0,他引:1  
The traditional supervised classifier for Text Categorization (TC) is learned from a set of hand-labeled documents. However, the task of manual data labeling is labor intensive and time consuming, especially for a complex TC task with hundreds or thousands of categories. To address this issue, many semi-supervised methods have been reported to use both labeled and unlabeled documents for TC. But they still need a small set of labeled data for each category. In this paper, we propose a Fully Automatic Categorization approach for Text (FACT), where no manual labeling efforts are required. In FACT, the lexical databases serve as semantic resources for category name understanding. It combines the semantic analysis of category names and statistic analysis of the unlabeled document set for fully automatic training data construction. With the support of lexical databases, we first use the category name to generate a set of features as a representative profile for the corresponding category. Then, a set of documents is labeled according to the representative profile. To reduce the possible bias originating from the category name and the representative profile, document clustering is used to refine the quality of initial labeling. The training data are subsequently constructed to train the discriminative classifier. The empirical experiments show that one variant of our FACT approach outperforms the state-of-the-art unsupervised TC approach significantly. It can achieve more than 90% of F1 performance of the baseline SVM methods, which demonstrates the effectiveness of the proposed approaches.  相似文献   

6.
-1We address the problem of visual domain adaptation for transferring object models from one dataset or visual domain to another. We introduce a unified flexible model for both supervised and semi-supervised learning that allows us to learn transformations between domains. Additionally, we present two instantiations of the model, one for general feature adaptation/alignment, and one specifically designed for classification. First, we show how to extend metric learning methods for domain adaptation, allowing for learning metrics independent of the domain shift and the final classifier used. Furthermore, we go beyond classical metric learning by extending the method to asymmetric, category independent transformations. Our framework can adapt features even when the target domain does not have any labeled examples for some categories, and when the target and source features have different dimensions. Finally, we develop a joint learning framework for adaptive classifiers, which outperforms competing methods in terms of multi-class accuracy and scalability. We demonstrate the ability of our approach to adapt object recognition models under a variety of situations, such as differing imaging conditions, feature types, and codebooks. The experiments show its strong performance compared to previous approaches and its applicability to large-scale scenarios.  相似文献   

7.
一种协同半监督分类算法Co-S3OM   总被引:1,自引:0,他引:1  
为了提高半监督分类的有效性, 提出了一种基于SOM神经网络和协同训练的半监督分类算法Co-S3OM (coordination semi-supervised SOM)。将有限的有标记样本分为无重复的三个均等的训练集, 分别使用改进的监督SSOM算法(supervised SOM)训练三个单分类器, 通过三个单分类器共同投票的方法挖掘未标记样本中的隐含信息, 扩大有标记样本的数量, 依次扩充单分类器训练集, 生成最终的分类器。最后选取UCI数据集进行实验, 结果表明Co-S3OM具有较高的标记率和分类率。  相似文献   

8.
This paper proposes a classification method that is based on easily interpretable fuzzy rules and fully capitalizes on the two key technologies, namely pruning the outliers in the training data by SVMs (support vector machines), i.e., eliminating the influence of outliers on the learning process; finding a fuzzy set with sound linguistic interpretation to describe each class based on AFS (axiomatic fuzzy set) theory. Compared with other fuzzy rule-based methods, the proposed models are usually more compact and easily understandable for the users since each class is described by much fewer rules. The proposed method also comes with two other advantages, namely, each rule obtained from the proposed algorithm is simply a conjunction of some linguistic terms, there are no parameters that are required to be tuned. The proposed classification method is compared with the previously published fuzzy rule-based classifiers by testing them on 16 UCI data sets. The results show that the fuzzy rule-based classifier presented in this paper, offers a compact, understandable and accurate classification scheme. A balance is achieved between the interpretability and the accuracy.  相似文献   

9.
目的 近年来,深度网络成功应用于高光谱图像分类。然而,难以获取充足的标记数据大大限制了深度网络的充分训练,进而导致网络对高光谱图像的分类能力下降。为解决以上困难,提出一种关联子域对齐网络的高光谱图像迁移分类方法。方法 基于深度迁移学习方法,通过对两域分布进行多角度、全面领域适应的同时将两域分类器进行差异适配。一方面,利用关联对齐从整体上对齐了两域的二阶统计量信息,适配了两域的全局分布;另一方面,利用局部最大均值差异对齐了相关子域的一阶统计量信息,适配了两域的局部分布。另外,构造一种分类器适配模块并将其加入所提网络中,通过对两域分类器差异进行适配,进一步增强网络的领域适应效果。结果 从4组真实高光谱数据集上的实验结果可看出:在分别采集于不同区域的高光谱图像数据对上,所提方法的精度比排名第2的分类方法高出1.01%、0.42%、0.73%和0.64%。本文方法的Kappa系数也取得最优结果。结论 与现有主流算法相比较,所提网络能够在整体和局部、一阶和二阶统计量上分别对两域进行有效对齐,进而充分利用在源域上训练好的分类器完成对目标域高光谱数据的跨域分类。  相似文献   

10.
提出了一种使用基于规则的基分类器建立组合分类器的新方法PCARules。尽管新方法也采用基分类器预测的加权投票来决定待分类样本的类,但是为基分类器创建训练数据集的方法与bagging和boosting完全不同。该方法不是通过抽样为基分类器创建数据集,而是随机地将特征划分成K个子集,使用PCA得到每个子集的主成分,形成新的特征空间,并将所有训练数据映射到新的特征空间作为基分类器的训练集。在UCI机器学习库的30个随机选取的数据集上的实验表明:算法不仅能够显著提高基于规则的分类方法的分类性能,而且与bagging和boosting等传统组合方法相比,在大部分数据集上都具有更高的分类准确率。  相似文献   

11.
已有的数据流分类算法多采用有监督学习,需要使用大量已标记数据训练分类器,而获取已标记数据的成本很高,算法缺乏实用性。针对此问题,文中提出基于半监督学习的集成分类算法SEClass,能利用少量已标记数据和大量未标记数据,训练和更新集成分类器,并使用多数投票方式对测试数据进行分类。实验结果表明,使用同样数量的已标记训练数据,SEClass算法与最新的有监督集成分类算法相比,其准确率平均高5。33%。且运算时间随属性维度和类标签数量的增加呈线性增长,能够适用于高维、高速数据流分类问题。  相似文献   

12.
An increasingly popular method for retrieving information is via the community question answering (CQA) systems such as Yahoo! Answers and Baidu Knows. In CQA, question classification plays an important role to find the answers. However, the labeled training examples for statistical question classifier are fairly expensive to obtain, as they require the experienced human efforts. Meanwhile, unlabeled data are readily available. This paper employs the method of domain adaptation via kernel mapping to solve this problem. In detail, the kernel approach is utilized to map the target-domain data and the source-domain data into a common space, where the question classifiers are trained under the closer conditional probabilities. The kernel mapping function is constructed by domain knowledge. Therefore, domain knowledge could be transferred from the labeled examples in the source domain to the unlabeled ones in the targeted domain. The statistical training model can be improved by using a large number of unlabeled data. Meanwhile, the Hadoop Platform is used to construct the mapping mechanism to reduce the time complexity. Map/Reduce enable kernel mapping for domain adaptation in parallel in the Hadoop Platform. Experimental results show that the accuracy of question classification could be improved by the method of kernel mapping. Furthermore, the parallel method in the Hadoop Platform could effective schedule the computing resources to reduce the running time.  相似文献   

13.
In this paper, we address the issue of generating in-domain language model training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances is induced. In the second stage, two sampling methods are explored to filter the synthetic corpus to achieve a desired probability distribution of the semantic content, both on the sentence level and on the class level. The first method utilizes user simulation technology, which obtains the probability model via an interplay between a probabilistic user model and the dialogue system. The second method synthesizes novel dialogue interactions from the raw data by modelling after a small set of dialogues produced by the developers during the course of system refinement. Evaluation is conducted on recognition performance in a restaurant information domain. We show that a partial match to usage-appropriate semantic content distribution can be achieved via user simulations. Furthermore, word error rate can be reduced when limited amounts of in-domain training data are augmented with synthetic data derived by our methods.
Stephanie SeneffEmail:
  相似文献   

14.
传统的文本分类方法需要大量的已知类别样本来得到一个好的文本分类器,然而在现实的文本分类应用过程中,大量的已知类别样本通常很难获得,因此如何利用少量的已知类别样本和大量的未知类别样本来获得比较好的分类效果成为一个热门的研究课题。本文为此提出了一种扩大已知类别样本集的新方法,该方法先从已知类别样本集中提取出每个类别的代表特征,然后根据代表特征从未知类别样本集中寻找相似样本加入已知类别样本集。实验证明,该方法能有效地提高分类效果。  相似文献   

15.
为了从海量的网络信息中迅速准确地获取评价信息,观点句识别已经成了自然语言处理的一个研究热点。现在观点句识别系统大都是基于机器学习的方法,一般使用机器学习的方法来进行分类会受到领域差异性影响。针对这个问题,该文对微博观点句识别系统是否会受到微博话题影响做了经验性研究,同时为了弥补训练数据的不足,该文通过规则方法自动标注网络数据进行了训练集的扩充。实验结果表明,微博话题间存在差异,进行分话题模型训练可以提升微博观点句识别系统的性能。  相似文献   

16.
Fuzzy feature selection   总被引:2,自引:0,他引:2  
In fuzzy classifier systems the classification is obtained by a number of fuzzy If–Then rules including linguistic terms such as Low and High that fuzzify each feature. This paper presents a method by which a reduced linguistic (fuzzy) set of a labeled multi-dimensional data set can be identified automatically. After the projection of the original data set onto a fuzzy space, the optimal subset of fuzzy features is determined using conventional search techniques. The applicability of this method has been demonstrated by reducing the number of features used for the classification of four real-world data sets. This method can also be used to generate an initial rule set for a fuzzy neural network.  相似文献   

17.
Hidden Markov model (HMM) classifier design is considered for the analysis of sequential data, incorporating both labeled and unlabeled data for training; the balance between the use of labeled and unlabeled data is controlled by an allocation parameter lambda isin (0, 1), where lambda = 0 corresponds to purely supervised HMM learning (based only on the labeled data) and lambda = 1 corresponds to unsupervised HMM-based clustering (based only on the unlabeled data). The associated estimation problem can typically be reduced to solving a set of fixed-point equations in the form of a "natural-parameter homotopy." This paper applies a homotopy method to track a continuous path of solutions, starting from a local supervised solution (lambda = 0) to a local unsupervised solution (lambda = 1). The homotopy method is guaranteed to track with probability one from lambda = 0 to lambda = 1 if the lambda = 0 solution is unique; this condition is not satisfied for the HMM since the maximum likelihood supervised solution (lambda = 0) is characterized by many local optima. A modified form of the homotopy map for HMMs assures a track from lambda = 0 to lambda = 1. Following this track leads to a formulation for selecting lambda isin (0, 1) for a semisupervised solution and it also provides a tool for selection from among multiple local-optimal supervised solutions. The results of applying the proposed method to measured and synthetic sequential data verify its robustness and feasibility compared to the conventional EM approach for semisupervised HMM training.  相似文献   

18.
ABSTRACT

This investigation proposes a fuzzy min-max hyperbox classifier to solve M-class classification problems. In the proposed fuzzy min-max hyperbox classifier, a supervised learning method is implemented to generate min-max hyperboxes for the training patterns in each class so that the generated fuzzy min-max hyperbox classifier has a perfect classification rate in the training set. However, the 100% correct classification of the training set generally leads to overfitting. In order to improve this drawback, a procedure is employed to decrease the complexity of the input decision boundaries so that the generated fuzzy hyperbox classifier has a good generalization performance. Finally, two benchmark data sets are considered to demonstrate the good performance of the proposed approach for solving this classification problem.  相似文献   

19.
One of the serious challenges in computer vision and image classification is learning an accurate classifier for a new unlabeled image dataset, considering that there is no available labeled training data. Transfer learning and domain adaptation are two outstanding solutions that tackle this challenge by employing available datasets, even with significant difference in distribution and properties, and transfer the knowledge from a related domain to the target domain. The main difference between these two solutions is their primary assumption about change in marginal and conditional distributions where transfer learning emphasizes on problems with same marginal distribution and different conditional distribution, and domain adaptation deals with opposite conditions. Most prior works have exploited these two learning strategies separately for domain shift problem where training and test sets are drawn from different distributions. In this paper, we exploit joint transfer learning and domain adaptation to cope with domain shift problem in which the distribution difference is significantly large, particularly vision datasets. We therefore put forward a novel transfer learning and domain adaptation approach, referred to as visual domain adaptation (VDA). Specifically, VDA reduces the joint marginal and conditional distributions across domains in an unsupervised manner where no label is available in test set. Moreover, VDA constructs condensed domain invariant clusters in the embedding representation to separate various classes alongside the domain transfer. In this work, we employ pseudo target labels refinement to iteratively converge to final solution. Employing an iterative procedure along with a novel optimization problem creates a robust and effective representation for adaptation across domains. Extensive experiments on 16 real vision datasets with different difficulties verify that VDA can significantly outperform state-of-the-art methods in image classification problem.  相似文献   

20.
传统机器学习面临一个难题,即当训练数据与测试数据不再服从相同分布时,由训练集得到的分类器无法对测试集文本准确分类。针对该问题,根据迁移学习原理,在源领域和目标领域的交集特征中,依据改进的特征分布相似度进行特征加权;在非交集特征中,引入语义近似度和新提出的逆文本类别指数(TF-ICF),对特征在源领域内进行加权计算,充分利用大量已标记的源领域数据和少量已标记的目标领域数据获得所需特征,以便快速构建分类器。在文本数据集20Newsgroups和非文本数据集UCI中的实验结果表明,基于分布和逆文本类别指数的特征迁移加权算法能够在保证精度的前提下对特征快速迁移并加权。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号