In real-world applications, we often have to deal with high-dimensional, sparse, noisy data that are not independent and identically distributed (non-i.i.d.). In this paper, we aim to handle this kind of complex data in a transfer learning framework, and propose a robust non-negative matrix factorization model with joint sparse and graph regularization for transfer learning. First, we employ a robust non-negative matrix factorization model with sparse regularization (RSNMF) to handle source domain data and learn a meaningful matrix that contains much of the information common to the source and target domains. Second, we treat this learned matrix as a bridge and transfer it to the target domain, where the target domain data are reconstructed by our robust non-negative matrix factorization model with joint sparse and graph regularization (RSGNMF). Third, we apply a feature selection technique to the new sparse representation of the target data. Fourth, we provide novel, efficient iterative algorithms for the RSNMF and RSGNMF models, and give rigorous convergence and correctness analyses for each. Finally, experimental results on both text and image data sets demonstrate that our REGTL model outperforms existing state-of-the-art methods.

Supervised learning methods require sufficient labeled examples to learn a good model for classification or regression. However, the available labeled data are insufficient in many applications. Active learning (AL) and domain adaptation (DA) are two strategies for minimizing the amount of labeled data required for model training. AL requires the domain expert to label a small number of highly informative examples to facilitate classification, while DA involves tuning the source domain knowledge for classification on the target domain. In this paper, we demonstrate how AL can efficiently minimize the amount of labeled data required for DA. Since the source and target domains usually have different distributions, the domain expert may not have sufficient knowledge to answer each query correctly. We exploit our active DA framework to handle incorrect labels provided by domain experts. Experiments with multimedia data demonstrate the efficiency of our proposed framework for active DA with noisy labels.

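The effectiveness of traditional machine learning methods depends on large amounts of valid training data, which are often hard to obtain; transfer learning has therefore been widely studied and has become a popular research topic in recent years. To address the drop in classification performance in multi-class settings caused by a severe shortage of training data, an inductive transfer learning method based on DLSR (discriminative least squares regressions), named TDLSR, is proposed. Starting from inductive transfer learning, the method uses a knowledge-leverage mechanism to transfer source domain knowledge to the target domain and learns the model jointly with the target domain data, improving classification performance while keeping the source domain data secure. TDLSR inherits DLSR's advantage of enlarging the margins between classes in multi-class tasks and adds transfer capability to DLSR to cope with insufficient data, making it better suited to complex multi-class tasks. Experiments on 12 real-world UCI data sets verify the effectiveness of the proposed method.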

Supporting heterogeneity is a major requirement in current approaches to the integration and transformation of data. This paper proposes a new approach to the translation of schema and data from one data model to another, and we illustrate its implementation in the tool MIDST-RT. We build on our previous work on MIDST, a platform conceived to perform translations in an off-line fashion. In such an approach, the source database (both schema and data) is imported into a repository, where it is stored in a universal model. Then, the translation is applied within the tool as a composition of elementary transformation steps, specified as Datalog programs. Finally, the result (again both schema and data) is exported into the operational system. Here we illustrate a new, lightweight approach where the database is not imported. MIDST-RT needs only to know the schema of the source database and the model of the target one, and generates views on the operational system that expose the underlying data according to the corresponding schema in the target model. Views are generated in an almost automatic way, on the basis of the Datalog rules for schema translation. The proposed solution can be applied to different scenarios, which include data and application migration, data interchange, and object-to-relational mapping between applications and databases.

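Most current domain adaptation algorithms assume that the source and target domains share the same classes, and use label-rich source domain information to perform transfer learning on target domain data that have few labels and a similar distribution; this line of work has produced many results. However, because real-world scenarios are complex and open, the class spaces of the source and target domains are not identical, and each domain often contains samples of unknown classes that fall outside the existing class definitions. For such challenging open-set scenarios, traditional domain adaptation algorithms will be unable...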

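Building an evaluation data set for autonomous driving depends largely on whether complex and diverse traffic scenes can be covered. Collecting data manually, however, involves an enormous workload and can hardly cover every scene; in this situation, a virtual engine offers great convenience. Common driving scenes are built with the UE4 (Unreal Engine 4) virtual engine, and a purpose-designed post-processing system and automatic annotation algorithm are used to acquire and annotate complex-scene images efficiently. To address the insufficient realism of images generated by the virtual engine, an ObjectGAN domain adaptation model is built to reconstruct realistic images from virtual data; for the target data, the model introduces feature consistency supervision and effectively narrows the domain gap with real data without requiring additional annotation. A novel virtual autonomous driving data set for complex scenes is created, containing data for a variety of weather, lighting, and driving conditions. Experiments on this data set verify that the ObjectGAN model effectively narrows the domain gap between virtual and real data, and that the data after domain adaptation can be used to evaluate the performance of mainstream detectors effectively in complex scenes.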

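This paper focuses on the design and implementation principles of Microsoft's synchronization system (SS). The framework is a highly complex synchronization platform that supports any data type, any storage device, any transfer mode, and any network topology, and it serves a very wide range of applications. To meet the real-time requirements of file synchronization, the paper also studies file monitoring in depth. Combining file monitoring with file synchronization achieves real-time file synchronization: whenever a file in the source directory changes, the corresponding file in the target directory changes with it, which satisfies the system requirements well.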

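Most existing cross-domain sentiment classification methods use only the transfer features from a single source domain to the target domain, and do not fully consider the relationships between target domain instances and the different source domains. To address this problem, this paper proposes an unsupervised multi-source cross-domain sentiment classification model. First, base classifiers are trained using the transfer features from each single source domain to the target domain, and the different base classifiers are weighted. Then, the ensemble consistency of the base classifiers' predictions on target domain instances is taken as the objective function, and optimizing this objective yields the weights of the base classifiers. Finally, the weighted base classifiers produce the sentiment classification results for the target domain. Multi-source sentiment transfer experiments on the Amazon data set gave good results: compared with the other baseline models, the model achieved an average improvement of 0.75% across four groups of experiments.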
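The last step of the abstract above, combining base classifiers through learned weights, can be sketched as a weighted soft vote. This is only an illustration under stated assumptions: the function name `weighted_vote`, the label strings, and the example weights are hypothetical, and the abstract does not specify how the ensemble-consistency objective is optimized, so the weights are simply taken as given here.

```python
from typing import Dict, List


def weighted_vote(probabilities: List[Dict[str, float]], weights: List[float]) -> str:
    """Return the label with the highest weighted average probability.

    probabilities[i] maps each label to the probability assigned by base
    classifier i (one classifier per source domain); weights[i] is that
    classifier's weight. Both are illustrative inputs, not the paper's data.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per base classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    combined: Dict[str, float] = {}
    for probs, weight in zip(probabilities, weights):
        for label, p in probs.items():
            # Normalize the weights so the combined scores stay on a probability scale.
            combined[label] = combined.get(label, 0.0) + (weight / total) * p
    return max(combined, key=lambda label: combined[label])


if __name__ == "__main__":
    # Three hypothetical base classifiers scoring one target-domain review.
    predictions = [
        {"pos": 0.9, "neg": 0.1},
        {"pos": 0.4, "neg": 0.6},
        {"pos": 0.3, "neg": 0.7},
    ]
    print(weighted_vote(predictions, [0.6, 0.2, 0.2]))
    print(weighted_vote(predictions, [0.2, 0.4, 0.4]))
```

The two calls differ only in the weights, which shows why the choice of weights matters: the same base predictions can yield different final labels.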

In many applications, a face recognition model learned on a source domain but applied to a novel target domain can degrade significantly because of the mismatch between the two domains. Aiming to learn a better face recognition model for the target domain, this paper proposes a simple but effective domain adaptation approach that transfers the supervision knowledge from a labeled source domain to the unlabeled target domain. Our basic idea is to convert the source domain images to the target domain (hereinafter termed targetizing the source domain) while keeping their supervision information. For this purpose, each source domain image is simply represented as a linear combination of sparse target domain neighbors in the image space, with the combination coefficients learned in a common subspace. The principle behind this strategy is that the common knowledge is favorable only for accurate cross-domain reconstruction, whereas for classification in the target domain the specific knowledge of the target domain is also essential and thus should be mostly preserved (through targetization in the image space in this work). To discover the common knowledge, a common subspace is learned in which the structures of both domains are preserved and the disparity between the source and target domains is reduced. The proposed method is extensively evaluated under three face recognition scenarios, i.e., domain adaptation across view angle, across ethnicity, and across imaging condition. The experimental results illustrate the superiority of our method over competing ones.

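Standard unsupervised domain adaptation transfers knowledge learned from a related source domain to the target domain, and usually assumes that source domain data are directly available during training. Because of privacy and security concerns, however, source domain data are often not directly accessible in some real-world applications; how to use target domain data effectively so as to reduce noisy class outputs or noisy features is a major challenge for source-free domain adaptation. To solve this problem, a source-free domain adaptation model based on a dual-correction mechanism (source-free domain adaptation with dual-correction mechanism, DCM) is proposed. First, the information structure of the target domain samples is explored to correct noisy class outputs. Second, a teacher-student model guides feature learning, maximizing the consistency among high-confidence features and the difference among low-confidence features. Finally, experimental results on digit data sets, Office-31, and Office-Home confirm the effectiveness of DCM.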

Toward Web-based application management systems   Total citations: 2; self-citations: 0; citations by others: 2
As Web technology spreads, the number, variety, and sophistication of Web-based information services are exploding. While some effort has been put into managing a single, centrally controlled Web site, current Web technologies offer little help for managing Web-based applications in-the-large. This is partly due to the distributed, heterogeneous, and open nature of such applications. The paper proposes a generic framework for managing Web-based applications which addresses both semantic and managerial issues. Semantic issues are addressed through the inclusion of a domain model component in the framework which describes the kinds of information that are available. Management issues are treated through a framework which includes formally defined notions for an information model, information base consistency, transactions, and concurrency control. Thus, the proposed management system provides a semantically robust environment for Web-based information services while allowing for Web source independence.

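Existing multi-source adaptation learning schemes cannot effectively identify the useful information in multiple source domains and transfer it to the target domain. To address this problem, a multi-source adaptation classification framework with feature selection (MACFFS) is proposed, which integrates feature selection and shared feature subspace learning into a unified framework for joint feature learning. Specifically, MACFFS projects the feature data from the multiple source domains into different latent spaces to learn multiple source domain classification models, which are used to classify the target domain. The resulting classification results are then integrated to learn the target domain classification model. In addition, the framework uses L2,1-norm sparse regression in place of the traditional L2-norm least squares regression to improve robustness. Finally, several existing methods are compared experimentally with MACFFS on two tasks. The results show that, compared with DSM, the best-performing existing method, MACFFS saves nearly 1/4 of the computation time and improves the recognition rate by about 2%. Overall, MACFFS draws on machine learning and statistical learning, offers a new approach to multi-source adaptation, and performs better than existing methods in recognition applications in real-world scenarios.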
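For readers unfamiliar with the L2,1 norm mentioned above, the following minimal sketch contrasts it with the squared L2 (Frobenius) measure used in ordinary least squares. It only illustrates the standard definitions; the function names and the example matrix are hypothetical and are not taken from MACFFS.

```python
import math
from typing import List

Matrix = List[List[float]]


def l21_norm(matrix: Matrix) -> float:
    """L2,1 norm: the sum over rows of each row's Euclidean (L2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def frobenius_norm_squared(matrix: Matrix) -> float:
    """Squared Frobenius norm: the sum of all squared entries."""
    return sum(v * v for row in matrix for v in row)


if __name__ == "__main__":
    # Hypothetical residual matrix: one row per sample, the last row is an outlier.
    residuals = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
    print(l21_norm(residuals))                # row norms 5, 0 and 10 are added up
    print(frobenius_norm_squared(residuals))  # entries are squared before summing
```

Each row contributes its norm to the L2,1 value but its squared norm to the Frobenius value, so a single large-residual sample dominates the latter much more strongly. This is the usual argument for the robustness of L2,1-based regression; applied to a weight matrix, the same norm drives entire rows toward zero, which is how it supports feature selection.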

Tang Song, Chen Lijuan, Chen Zhixian, Ye Mao. Journal of Computer Applications (《计算机应用》), 2017, 37(4): 1164-1168
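In many practical engineering applications, the training scene (source domain) and the test scene (target domain) have different distributions, and a classifier trained in the source domain often suffers a large drop in performance when applied directly to the target domain. Most current domain adaptation methods are based on probabilistic derivation. Approaching the problem from the perspective of image feature representation, a new unsupervised method based on collaborative features is proposed for adaptive image classification. First, all source samples are used as the dictionary. Then, the three target domain samples nearest to a target sample are used to help represent the local neighborhood geometry robustly. Finally, encoding is performed by combining the dictionary with the local neighborhood information, and a nearest-neighbor classifier completes the classification. Because the collaborative features incorporate local neighborhood information from the target domain, they are more robust and more discriminative, and the classification method based on this feature encoding achieves better classification performance. Comparative experiments on domain adaptation data sets show that the proposed algorithm is effective.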

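When the training data and the test data come from different cover sources, that is, under cover source mismatch, the detection accuracy of an otherwise well-performing steganalyzer usually drops. In practice, steganalysts often have to process images collected from the Internet. Compared with the training data, however, these suspicious images are likely to have entirely different capture and processing histories, so the steganalysis model may suffer varying degrees of degradation in detection performance; this is also why steganalysis tools are hard to deploy successfully in real applications. To increase the practical value of deep-learning-based steganalysis, information from the test samples is exploited and domain adaptation is used to address cover source mismatch: the training data are treated as the source domain and the test data as the target domain, and the steganalyzer's detection performance in the target domain is improved by minimizing the discrepancy between the feature distributions of the two domains. An adversarial subdomain adaptation network (ASAN) is proposed. On the one hand, from the perspective of feature generation, the source domain and target domain features generated by the steganalysis model are required to be as similar as possible, so that the discriminator cannot tell which domain a feature comes from. On the other hand, from the perspective of reducing the inter-domain feature distribution discrepancy, subdomain adaptation is used to reduce undesired changes in the distributions of the relevant subdomains, which effectively enlarges the distance between cover and stego samples and helps improve classification accuracy. Through...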

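As software systems grow rapidly in scale and complexity, defects in them are inevitable. In recent years, defect prediction based on deep learning has become a research hotspot in software engineering. Such techniques can find latent defects without running the code, and have therefore attracted wide attention in industry and academia. However, most existing methods focus on whether method-level source code contains a defect and cannot identify the specific defect category precisely, which reduces developers' efficiency in locating and fixing defects. Moreover, in real software development, a new project usually lacks enough defect data to train a high-accuracy deep learning model, while models trained on historical data from existing projects often fail to generalize well to the new project. This paper therefore first reformulates the traditional binary defect prediction task as a multi-label classification problem, using the defect categories described in CWE (common weakness enumeration) as fine-grained prediction labels. To improve model performance in the cross-project setting, the paper proposes a multi-source domain adaptation framework that combines adversarial training with an attention mechanism. Specifically, the framework uses adversarial training to reduce the differences between domains (that is, software projects), and further uses domain-invariant features to obtain the feature correlation between each source domain and the target domain. The framework also uses weighted maximum mean discrepancy as an attention mechanism to minimize the representation distance between source domain and target domain features, so that the model can learn more domain-independent features. Finally, extensive comparative experiments against state-of-the-art baselines on eight real-world open-source projects verify the effectiveness of the proposed method.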
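As background for the maximum mean discrepancy (MMD) mentioned above, here is a minimal sketch of the plain, unweighted, biased estimate of squared MMD with a Gaussian kernel. It is an assumption-laden illustration: the abstract does not say which kernel or weighting scheme the framework uses, and the names `gaussian_kernel`, `mmd_squared`, and `gamma` are chosen here for the example.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_squared(source: Sequence[Vector], target: Sequence[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two non-empty samples of feature vectors."""

    def mean_kernel(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        # Average kernel value over all pairs drawn from a and b.
        return sum(gaussian_kernel(u, v, gamma) for u in a for v in b) / (len(a) * len(b))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


if __name__ == "__main__":
    # Hypothetical 2-D features from one source project and two candidate target projects.
    source = [[0.0, 0.0], [1.0, 0.0]]
    near_target = [[0.1, 0.0], [1.1, 0.0]]
    far_target = [[5.0, 5.0], [6.0, 5.0]]
    print(mmd_squared(source, near_target))
    print(mmd_squared(source, far_target))
```

A smaller value means the two feature samples look more alike under the kernel, which is why minimizing an MMD term pushes a model toward features that do not depend on the domain.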

Radio frequency identification (RFID) holds the promise of identifying, locating, tracking, and monitoring physical objects in real time without line of sight, and it can be used for a wide range of pervasive computing applications. To achieve these goals, RFID data have to be collected, transformed, and expressively modeled as the virtual counterparts of physical objects in the virtual world. RFID data, however, have their own unique characteristics, including aggregation, location, temporal, and history-oriented characteristics, which have to be fully considered and integrated into the data model. The diversity of RFID applications poses further challenges to a generalized framework for RFID data modeling. In this paper, we explore the fundamental characteristics of RFID applications, and classify applications into a set of basic scenarios based on these characteristics. We then develop constructs for modeling each scenario, which can then be integrated to model most complex RFID applications in the real world. We further demonstrate that our model provides powerful support for querying physical objects in RFID-based applications.

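Classical machine learning algorithms assume that training data and test data share the same input feature space and data distribution, but in many real-world applications this assumption does not hold, and classical algorithms fail. Domain adaptation is a newer machine learning strategy whose key technique is to learn a new feature representation that aligns the data distributions of the source and target domains, so that a model trained in the labeled source domain can be transferred directly to the unlabeled target domain without a marked drop in performance. This paper introduces the definition, taxonomy, and representative algorithms of domain adaptation, and discusses two classes of domain adaptation algorithms, those based on metric learning and those based on adversarial learning. On this basis, it analyzes typical applications and open challenges of domain adaptation, and looks ahead to its development trends and future research directions.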

The application of transfer learning to effectively identify rolling bearing faults has been attracting much attention. Most current studies are based on a single source domain or on multiple source domains constructed from different working conditions of the same machine. In practical scenarios, however, it is common to obtain multiple source domains from different machines, which raises the new challenge of how to use these source domains for fault diagnosis. To solve this issue, a conditional distribution-guided adversarial transfer learning network with multi-source domains (CDGATLN) is developed for fault diagnosis of bearings installed on different machines. First, the knowledge of multiple source domains from different machines is transferred to the single target domain by decreasing the data distribution discrepancy between each source domain and the target domain. Then, a conditional distribution-guided alignment strategy is introduced to decrease the conditional distribution discrepancy and to calculate the importance of each source domain based on that discrepancy, so as to promote the knowledge transfer of each source domain. Finally, a monotone importance specification mechanism is constructed to constrain each importance value, ensuring that a source domain with low importance is not discarded, which enables the knowledge of every source domain to participate in the construction of the model. Extensive experimental results verify the effectiveness and superiority of CDGATLN.
