Similar Articles
20 similar articles found.
1.
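Baidu Baike contains a large number of entities with rich link and category relations, and holds a great deal of human knowledge in the Chinese domain, which can compensate for the limited vocabulary coverage of ordinary dictionaries. For mining commodity brand names, this paper proposes a graph-based semi-supervised method for discovering new brand names. Using the relations between Baidu Baike entries and their open categories, the paper computes the similarity between entries under different criteria, combines entry-category and category-category associations, and applies a label propagation algorithm to mine brand names from 1.3 million entries, achieving good results.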
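The label propagation step can be illustrated with a minimal Python sketch. It assumes that a symmetric similarity matrix over entries has already been built; the paper's actual similarity criteria and its entry-category graph are not reproduced. Known brand and non-brand entries act as clamped seeds, and every other node repeatedly takes the similarity-weighted average of its neighbours' scores. All names here are hypothetical.

```python
def label_propagation(weights, seeds, iterations=100):
    """Propagate binary 'is a brand' scores over a weighted graph.

    weights: symmetric n x n list of non-negative similarities (hypothetical input).
    seeds: dict {node_index: 1.0 or 0.0} of known labels, clamped every round.
    Returns a list of scores in [0, 1].
    """
    n = len(weights)
    scores = [seeds.get(i, 0.5) for i in range(n)]  # unknown nodes start at 0.5
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            if i in seeds:
                new_scores.append(seeds[i])  # clamp labeled nodes
                continue
            total = sum(weights[i])
            if total == 0:
                new_scores.append(scores[i])  # isolated node keeps its score
                continue
            # similarity-weighted average of the neighbours' current scores
            new_scores.append(sum(w * s for w, s in zip(weights[i], scores)) / total)
        scores = new_scores
    return scores


chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(label_propagation(chain, {0: 1.0, 3: 0.0}))
```

On this four-node chain, with node 0 seeded as a brand and node 3 as a non-brand, the interior nodes converge to about 2/3 and 1/3; a threshold such as 0.5 (an assumption) would then mark node 1 as a candidate brand.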

2.
In multilabel learning, each instance in the training set is associated with a set of labels, and the task is to output, for each unseen instance, a label set whose size is not known a priori. In this paper, this problem is addressed by proposing a neural network algorithm named BP-MLL, i.e., Backpropagation for Multilabel Learning. It is derived from the popular Backpropagation algorithm by employing a novel error function that captures the characteristics of multilabel learning, i.e., the labels belonging to an instance should be ranked higher than those not belonging to that instance. Applications to two real-world multilabel learning problems, i.e., functional genomics and text categorization, show that the performance of BP-MLL is superior to that of some well-established multilabel learning algorithms.
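The error function described above is usually written, for an instance with relevant label set Y, irrelevant label set Ȳ (its complement) and network outputs c_1, ..., c_Q, as E = (1 / (|Y| |Ȳ|)) * Σ over k in Y and l in Ȳ of exp(-(c_k - c_l)). The sketch below computes only this per-instance term in plain Python; the network, its training loop, and the threshold used to select the output label set are omitted, and the function name is hypothetical.

```python
import math


def bpmll_instance_error(outputs, relevant):
    """Pairwise ranking error for one instance.

    outputs: list of network outputs c_1..c_Q.
    relevant: set of indices of the labels that belong to the instance.
    """
    irrelevant = [l for l in range(len(outputs)) if l not in relevant]
    if not relevant or not irrelevant:
        raise ValueError("need at least one relevant and one irrelevant label")
    total = 0.0
    for k in relevant:
        for l in irrelevant:
            # large when a relevant label k is not ranked well above irrelevant label l
            total += math.exp(-(outputs[k] - outputs[l]))
    return total / (len(relevant) * len(irrelevant))
```

The error shrinks toward zero as every relevant output exceeds every irrelevant one by a wide margin, which is exactly the ranking property the abstract describes.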

3.
Semi-supervised learning has attracted a significant amount of attention in pattern recognition and machine learning. Most previous studies have focused on designing special algorithms to effectively exploit the unlabeled data in conjunction with labeled data. Our goal is to improve the classification accuracy of any given supervised learning algorithm by using the available unlabeled examples. We call this the semi-supervised improvement problem, to distinguish the proposed approach from existing approaches. We design a meta semi-supervised learning algorithm that wraps around the underlying supervised algorithm and improves its performance using unlabeled data. This problem is particularly important when we need to train a supervised learning algorithm with a limited number of labeled examples and a multitude of unlabeled examples. We present a boosting framework for semi-supervised learning, termed SemiBoost. The key advantages of the proposed semi-supervised learning approach are: 1) performance improvement of any supervised learning algorithm with a multitude of unlabeled data, 2) efficient computation by the iterative boosting algorithm, and 3) exploiting both the manifold and cluster assumptions in training classification models. An empirical study on 16 different data sets and on text categorization demonstrates that the proposed framework improves the performance of several commonly used supervised learning algorithms, given a large number of unlabeled examples. We also show that the performance of the proposed algorithm, SemiBoost, is comparable to that of state-of-the-art semi-supervised learning algorithms.

4.
Many data mining applications have a large amount of data, but labeling data is usually difficult, expensive, or time-consuming, as it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data in the training process. Co-Training is a popular semi-supervised learning algorithm that assumes that each example is represented by multiple sets of features (views) and that these views are sufficient for learning and independent given the class. However, these assumptions are strong and are not satisfied in many real-world domains. In this paper, a single-view variant of Co-Training, called Co-Training by Committee (CoBC), is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce a new labeling confidence measure for unlabeled examples based on estimating the local accuracy of the committee members in the example's neighborhood. We then introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is applied to both C4.5 decision trees and 1-nearest-neighbor classifiers to construct the diverse ensembles used for semi-supervised learning and active learning. Experiments show that these two combinations can outperform other, non-committee-based ones.
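One way to read the local-accuracy confidence measure is sketched below: each committee member's vote for an unlabeled example is weighted by how accurately that member classifies the example's k nearest labeled neighbours, and the confidence is the winning label's share of the weighted vote. The abstract does not give the exact formula, so this weighting, the squared Euclidean distance, and the default k = 3 are assumptions; committee members are modelled as plain callables rather than C4.5 or 1-NN classifiers.

```python
def local_confidence(x, labeled, members, k=3):
    """Return (label, confidence) for unlabeled example x.

    labeled: list of (features, label) pairs.
    members: list of callables mapping features -> label (hypothetical committee).
    Each member's vote is weighted by its accuracy on the k labeled
    examples nearest to x.
    """
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    neighbours = sorted(labeled, key=lambda item: dist(item[0], x))[:k]
    votes = {}
    for member in members:
        # fraction of the neighbours this member labels correctly
        accuracy = sum(member(f) == y for f, y in neighbours) / len(neighbours)
        label = member(x)
        votes[label] = votes.get(label, 0.0) + accuracy
    total = sum(votes.values())
    best = max(votes, key=votes.get)
    return best, (votes[best] / total if total > 0 else 0.0)
```

Examples whose confidence exceeds some threshold (not specified in the abstract) would then be added to the labeled pool in each co-training round.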

5.
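Semantic analysis is an important technique and a difficult research problem in content-based text mining. Supervised machine learning methods are limited by the size of annotated corpora and struggle to achieve high performance with small labeled samples. Targeting shallow semantic parsing, this paper adopts a novel semi-supervised learning method, the transductive support vector machine (TSVM), and, based on its training characteristics, proposes an active-learning-based sample optimization strategy. Experiments show that the proposed shallow semantic parsing method, by integrating active learning with semi-supervised learning, achieves good learning performance with small labeled samples.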
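The abstract does not specify the sample optimization strategy. As an assumption, the sketch below shows a common active-learning heuristic for SVM-type models, margin (uncertainty) sampling: the unlabeled samples whose decision values lie closest to the separating hyperplane are sent for annotation. The decision values are taken as given, for example from an already trained TSVM; the function name is hypothetical.

```python
def select_queries(decision_values, budget):
    """Return indices of the `budget` samples with the smallest |f(x)|.

    decision_values: list of SVM decision values f(x) for unlabeled samples.
    Samples near f(x) = 0 are the ones the current model is least sure about.
    """
    ranked = sorted(range(len(decision_values)), key=lambda i: abs(decision_values[i]))
    return ranked[:budget]


print(select_queries([2.0, -0.1, 0.5, -1.5], 2))  # [1, 2]
```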

6.
Some recent successful semi-supervised learning methods construct more than one learner from both labeled and unlabeled data for inductive learning. This paper proposes a novel multiple-view multiple-learner (MVML) framework for semi-supervised learning, which differs from previous methods in possessing both multiple views and multiple learners. The method adopts a co-training-style learning paradigm to enlarge the labeled data from a much larger set of unlabeled data. To the best of our knowledge, it is the first attempt to combine the advantages of multiple-view learning and ensemble learning for semi-supervised learning. Using multiple views promises better performance than single-view learning because information is exploited more effectively. At the same time, because an ensemble of classifiers is learned from each view, predictions can be more accurate than those of a single classifier from the same view. Experiments on different applications involving both multiple-view and single-view data sets show encouraging results for the proposed MVML method.

7.
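Multi-label learning mainly addresses problems in which a single example belongs to multiple classes simultaneously. Traditional multi-label learning usually assumes that the training set contains a large number of labeled examples. In many real-world problems, however, only a small fraction of the training examples are labeled. To better exploit the abundant unlabeled training examples and improve classification performance, a regularization-based inductive semi-supervised multi-label learning method, MASS, is proposed. Specifically, on top of minimizing the empirical risk, MASS introduces two regularization terms: one constrains the complexity of the classifier, and the other requires similar examples to have similar structured multi-label outputs. A fast solution is then obtained by alternating optimization. Experimental results on web page classification and gene function analysis verify the effectiveness of MASS.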
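The structure of such an objective can be made concrete with a simplified stand-in, not MASS itself. The sketch assumes a linear model F = XW, squared loss on the labeled rows, a Frobenius-norm penalty for classifier complexity, and a graph-Laplacian penalty tr(W^T X^T L X W) (half of the sum over i, j of S_ij ||f(x_i) - f(x_j)||^2) that pushes similar examples toward similar label vectors. With these choices the minimiser has a closed form, so the sketch solves one linear system instead of using the alternating optimization described in the abstract. The similarity matrix S and the weights lam1 and lam2 are assumptions.

```python
import numpy as np


def laplacian_multilabel_ls(X, Y_l, n_labeled, S, lam1=0.1, lam2=0.1):
    """Linear multi-label predictor with a graph-smoothness regulariser.

    X: (n, d) features for all examples; the first n_labeled rows are labeled.
    Y_l: (n_labeled, q) label matrix with entries in {-1, +1}.
    S: (n, n) symmetric non-negative similarity matrix over all examples.
    Minimises ||X_l W - Y_l||^2 + lam1 ||W||^2 + lam2 tr(W^T X^T L X W),
    where L = D - S is the graph Laplacian.
    """
    X_l = X[:n_labeled]
    L = np.diag(S.sum(axis=1)) - S
    # setting the gradient to zero gives A W = X_l^T Y_l with A positive definite
    A = X_l.T @ X_l + lam1 * np.eye(X.shape[1]) + lam2 * X.T @ L @ X
    return np.linalg.solve(A, X_l.T @ Y_l)
```

The predicted label set for an example x would then be the labels whose entries in sign(x @ W) are positive.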

8.
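This paper presents the origin, essence, and tasks of multi-relational learning, and points out that multi-relational learning can be understood in both a narrow and a broad sense. It gives a brief survey of rough sets and multi-relational learning, showing that rough set theory plays an important role in multi-relational learning. It analyzes several ways of combining rough sets with inductive logic programming (ILP) for multi-relational learning, focusing in particular on the RSILP family of models, for which a general model is given. Other rough set methods used in multi-relational learning are also briefly introduced. Improving and extending the RSILP models and further applying rough set methods to multi-relational learning are directions for future work.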
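Rough set theory rests on lower and upper approximations; this is background to the survey, not the RSILP models themselves. The sketch below computes Pawlak's approximations of a target set of objects: objects that agree on the chosen attributes fall into the same equivalence class, classes contained in the target form the lower approximation, and classes that overlap it form the upper approximation. The toy data layout is hypothetical.

```python
def approximations(universe, attributes, target):
    """Rough-set lower and upper approximations of `target`.

    universe: dict {object_id: tuple of attribute values}.
    attributes: indices of the attributes that induce indiscernibility.
    target: set of object ids to approximate.
    """
    classes = {}
    for obj, values in universe.items():
        key = tuple(values[a] for a in attributes)
        classes.setdefault(key, set()).add(obj)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:
            lower |= block  # every object in the block certainly belongs to target
        if block & target:
            upper |= block  # objects in the block possibly belong to target
    return lower, upper
```

Using fewer attributes merges equivalence classes and widens the gap between the two approximations (the boundary region), which is how rough sets express uncertainty caused by coarse descriptions.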

9.
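Network representation learning is an important research topic whose goal is to represent high-dimensional attributed networks as low-dimensional dense vectors, providing effective feature representations for downstream tasks. The recently proposed attributed network representation learning model SNE (Social Network Embedding) uses both network structure and attribute information to learn node representations, but it is an unsupervised model and cannot fully exploit easily obtained prior information to improve the quality of the learned representations. With this in mind, a semi-supervised attributed network representation learning method, SSNE (Semi-supervised Social Network Embedding), is proposed. It feeds the attributed network and a small amount of node prior information into a feed-forward neural network; after nonlinear transformations through multiple hidden layers, the output layer learns optimized node representations by preserving the network's link structure and the small amount of node priors. Compared with existing mainstream methods on four real attributed networks and two synthetic attributed networks, the representations learned by this method perform well on clustering and classification tasks.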

10.
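Most existing disease gene prediction methods rely on various kinds of annotation information about disease-causing genes, yet many diseases have no annotation information at all. To address this problem, a disease gene prediction method based on text mining and functional similarity is proposed. It obtains the Gene Ontology (GO) terms related to a disease through data mining, uses functional similarity to measure the degree of association between each gene and the disease, and ranks all candidate genes by this degree of association to identify disease-causing genes. Test results show that the method can effectively predict disease-causing genes that have no known functional annotation.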
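The ranking step can be sketched under simplifying assumptions. Here the functional similarity between a disease and a gene is taken to be the Jaccard overlap of their GO term sets; the paper may use a different, ontology-aware similarity measure. The GO identifiers in the usage example are placeholders, not real GO terms, and the function name is hypothetical.

```python
def rank_candidates(disease_terms, gene_terms):
    """Rank candidate genes by Jaccard similarity between GO-term sets.

    disease_terms: set of GO term IDs mined for the disease.
    gene_terms: dict {gene: set of GO term IDs}.
    Returns a list of (gene, score) pairs sorted by decreasing score.
    """
    scores = []
    for gene, terms in gene_terms.items():
        union = disease_terms | terms
        score = len(disease_terms & terms) / len(union) if union else 0.0
        scores.append((gene, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


print(rank_candidates({"GO:1", "GO:2"},
                      {"A": {"GO:1"}, "B": {"GO:1", "GO:2"}, "C": {"GO:3"}}))
```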

11.
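Because databases are updated frequently, temporal databases hide a large amount of unknown information, so temporal association rules should be generated for databases that are updated in real time. Although association rule algorithms have been studied extensively, research on temporal association rule algorithms for text data is still rare. After examining temporal association rule algorithms and their research value for text data, this work studies temporal association rule mining with temporal text as its object, builds a time representation model for temporal text data, and proposes a text temporal association rule algorithm, SPFM. Finally, experiments validate the algorithm and show that it is correct and feasible.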
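SPFM itself is not described in the abstract. As background, the sketch below shows the basic quantities that temporal association rule methods build on: support and confidence computed only over transactions (here, documents with their terms) whose timestamps fall inside a given time window. Function names and the data layout are hypothetical.

```python
def temporal_support(transactions, itemset, start, end):
    """Support of `itemset` among transactions time-stamped within [start, end].

    transactions: list of (timestamp, set_of_items), e.g. documents and their terms.
    """
    in_window = [items for t, items in transactions if start <= t <= end]
    if not in_window:
        return 0.0
    hits = sum(1 for items in in_window if itemset <= items)
    return hits / len(in_window)


def temporal_confidence(transactions, antecedent, consequent, start, end):
    """Confidence of antecedent -> consequent within the same time window."""
    base = temporal_support(transactions, antecedent, start, end)
    if base == 0:
        return 0.0
    return temporal_support(transactions, antecedent | consequent, start, end) / base
```

A rule can thus be strong in one window and weak in another, which is the kind of time-dependent pattern a temporal model is meant to expose.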

12.
Semi-Supervised Learning on Riemannian Manifolds (total citations: 1; self-citations: 0; citations by others: 1)
We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on the submanifold in question rather than the total ambient space. Using the Laplace-Beltrami operator one produces a basis (the Laplacian Eigenmaps) for a Hilbert space of square integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once such a basis is obtained, training can be performed using the labeled data set. Our algorithm models the manifold using the adjacency graph for the data and approximates the Laplace-Beltrami operator by the graph Laplacian. We provide details of the algorithm, its theoretical justification, and several practical applications for image, speech, and text classification.
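The procedure described above (adjacency graph, graph Laplacian as the approximation of the Laplace-Beltrami operator, smoothest eigenvectors as a basis, least-squares fit on the labeled points) can be sketched with NumPy. The neighbour count and the number of eigenvectors are toy assumptions chosen for a tiny example; real data would use larger values and possibly weighted edges.

```python
import numpy as np


def eigenmap_classifier(X, labels, n_neighbors=2, n_eigs=2):
    """Semi-supervised classification with Laplacian eigenmaps.

    X: (n, d) array of all points (labeled and unlabeled).
    labels: dict {index: +1 or -1} for the labeled points.
    Returns predicted labels (+1/-1) for every point.
    """
    n = X.shape[0]
    dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        # connect each point to its nearest neighbours (position 0 is the point itself)
        for j in np.argsort(dists[i])[1:n_neighbors + 1]:
            W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W       # graph Laplacian
    _, vecs = np.linalg.eigh(L)          # eigenvalues returned in ascending order
    E = vecs[:, :n_eigs]                 # smoothest eigenvectors form the basis
    idx = sorted(labels)
    y = np.array([labels[i] for i in idx], dtype=float)
    coef, *_ = np.linalg.lstsq(E[idx], y, rcond=None)  # fit using labeled points only
    return np.where(E @ coef >= 0, 1, -1)
```

Only the basis construction touches the unlabeled points; the labeled points enter solely through the small least-squares problem, mirroring the two-stage structure described in the abstract.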

13.
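This paper describes the application of HowNet to text mining: HowNet is used to compute the similarity of Chinese words from the perspective of Chinese semantics and to compute the relatedness between words, laying the groundwork for deeper information processing.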
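The abstract does not give the similarity formula. One commonly used scheme for HowNet-based word similarity scores two sememes by their path distance d in the sememe hierarchy as alpha / (d + alpha), where alpha is a tunable constant. The sketch below implements only that sememe-level step on a hypothetical toy hierarchy (not HowNet's actual taxonomy); the value 1.6 for alpha is an assumption, and combining sememe scores into a word similarity is omitted.

```python
def sememe_distance(parent, a, b):
    """Path length between sememes a and b in a tree given as {child: parent}."""
    def ancestors(node):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    path_a, path_b = ancestors(a), ancestors(b)
    depth_in_b = {node: i for i, node in enumerate(path_b)}
    for i, node in enumerate(path_a):
        if node in depth_in_b:  # lowest common ancestor found
            return i + depth_in_b[node]
    return None  # no common ancestor: the sememes are in different trees


def sememe_similarity(parent, a, b, alpha=1.6):
    """alpha / (d + alpha); identical sememes score 1.0, unrelated ones 0.0."""
    d = sememe_distance(parent, a, b)
    return 0.0 if d is None else alpha / (d + alpha)


toy = {"human": "animate", "animal": "animate", "animate": "entity", "thing": "entity"}
print(sememe_similarity(toy, "human", "animal"))  # 1.6 / 3.6
```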

14.
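Most existing data stream classification algorithms use supervised learning and require a large amount of labeled data to train the classifier; since labeled data are costly to obtain, these algorithms are of limited practical use. To address this problem, this paper proposes SEClass, a semi-supervised ensemble classification algorithm that uses a small amount of labeled data and a large amount of unlabeled data to train and update an ensemble classifier, and classifies test data by majority voting. Experimental results show that, with the same amount of labeled training data, SEClass is on average 5.33% more accurate than the latest supervised ensemble classification algorithm. Moreover, its running time grows linearly with the number of attribute dimensions and class labels, making it suitable for high-dimensional, high-speed data stream classification.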
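The majority-voting step, plus one possible way to keep the ensemble current as new data chunks arrive, can be sketched as follows. The replacement policy (drop the oldest member when the ensemble is full) is an assumption, since the abstract does not say how SEClass updates its members; classifiers are modelled as plain callables and the function names are hypothetical.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Predict x's label as the most common prediction among ensemble members."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def update_ensemble(ensemble, new_member, max_size):
    """Append a member trained on the newest chunk, keeping at most max_size members."""
    return (ensemble + [new_member])[-max_size:]
```

Because each prediction costs one call per member, classification time grows linearly with ensemble size, independent of how many chunks have already streamed past.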

15.
16.
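In e-Learning systems, evaluating student learning is difficult. After analyzing the characteristics of learning assessment in the e-Learning environment, this paper introduces the e-portfolio assessment method and proposes applying text mining techniques to learning assessment, evaluating students' learning processes according to a student learning assessment rubric.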

17.
Low-rank structures play important roles in recent advances on many problems in image science and data science. As a natural extension of low-rank structures to data with nonlinear structures, the concept of a low-dimensional manifold structure has been considered in many data processing problems. Inspired by this concept, we consider a manifold-based low-rank regularization as a linear approximation of manifold dimension. This regularization is less restrictive than global low-rank regularization, and thus enjoys more flexibility in handling data with nonlinear structures. As applications, we apply the proposed regularization to classical inverse problems in image science and data science, including image inpainting, image super-resolution, X-ray computed tomography (CT) image reconstruction, and semi-supervised learning. We conduct extensive numerical experiments on several image restoration problems and on a semi-supervised learning problem of classifying handwritten digits using the MNIST data. Our numerical tests demonstrate the effectiveness of the proposed methods and show that the new regularization methods produce outstanding results compared with many existing methods.

18.
王树芬, 张哲, 马士尧, 陈俞强, 伍一. Computer Engineering (计算机工程), 2022, 48(6): 107-114+123
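Federated learning allows edge devices or clients to keep their data stored locally while collaboratively training a shared global model. Mainstream federated learning systems are usually built on the assumption that clients' local data are labeled; in practice, however, client data generally lack ground-truth labels, and data availability and data heterogeneity are the main challenges facing federated learning systems. For the scenario in which clients' local data are unlabeled, a robust semi-supervised federated learning system is designed. The FedMix method analyzes the implicit relationship between successive iterations of the global model, and the supervised model learned on labeled data and the unsupervised model learned on unlabeled data are trained separately. The FedLoss aggregation method mitigates the impact of non-independent and identically distributed (non-IID) client data on the convergence speed and stability of the global model by dynamically adjusting each local model's weight in the global model according to the value of its loss function. Experimental results on the CIFAR-10 dataset show that the system improves classification accuracy by about 3 percentage points over mainstream federated learning systems and is more robust to client data with different levels of non-IID-ness.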
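The abstract says FedLoss adjusts each local model's aggregation weight according to its loss value but does not give the rule. The sketch below assumes one plausible choice, a softmax over negative losses, so that clients with lower loss receive larger weights, and then averages flattened parameter vectors. The weighting rule, the temperature parameter, and the function name are assumptions, not the paper's definition.

```python
import math


def loss_weighted_average(client_params, client_losses, temperature=1.0):
    """Aggregate client parameter vectors with loss-dependent weights.

    client_params: list of equally long lists of floats (one per client).
    client_losses: list of local loss values, in the same order.
    Weights are softmax(-loss / temperature), an assumed weighting rule.
    Returns (global_params, weights).
    """
    scaled = [-loss / temperature for loss in client_losses]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    n_params = len(client_params[0])
    global_params = [
        sum(w * params[k] for w, params in zip(weights, client_params))
        for k in range(n_params)
    ]
    return global_params, weights
```

With equal losses this reduces to plain parameter averaging; a larger temperature flattens the weights toward that uniform average, while a smaller one concentrates weight on the lowest-loss clients.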

19.
崔鹏, 张汝波. Computer Engineering (计算机工程), 2009, 35(15): 187-189
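This paper introduces Gaussian fields (GF) defined on a nearest-neighbor graph, together with related background on using GF for dimensionality reduction and classification, and proposes a Gaussian field for semi-supervised regression that sets its model parameters and number of neighbors automatically and performs active learning by entropy-based query selection over both labeled and unlabeled data. Experiments compare it with semi-supervised learning methods and verify the effectiveness of GF.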
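The Gaussian-field idea can be illustrated with the harmonic-function solution on a neighbour graph, where the unlabeled values satisfy (D_uu - W_uu) f_u = W_ul f_l, followed by a simple entropy-based query that picks the unlabeled node whose score (read as a probability in [0, 1]) is most uncertain. This is the classic classification-style formulation, not the paper's regression model; automatic parameter and neighbour-count selection are omitted, and the function names are hypothetical.

```python
import numpy as np


def harmonic_solution(W, labeled_idx, f_labeled):
    """Gaussian-field (harmonic function) values on all nodes.

    W: (n, n) symmetric non-negative weight matrix of the neighbour graph.
    labeled_idx: indices of labeled nodes; f_labeled: their target values.
    Solves (D_uu - W_uu) f_u = W_ul f_l for the unlabeled nodes.
    """
    n = W.shape[0]
    labeled_set = set(labeled_idx)
    u = np.array([i for i in range(n) if i not in labeled_set])
    l = np.asarray(labeled_idx)
    L = np.diag(W.sum(axis=1)) - W  # L_uu equals D_uu - W_uu
    rhs = W[np.ix_(u, l)] @ np.asarray(f_labeled, dtype=float)
    f = np.zeros(n)
    f[l] = f_labeled
    f[u] = np.linalg.solve(L[np.ix_(u, u)], rhs)
    return f


def entropy_query(f, labeled_idx, eps=1e-12):
    """Index of the unlabeled node whose score has maximal binary entropy."""
    best, best_h = None, -1.0
    for i, p in enumerate(f):
        if i in labeled_idx:
            continue
        p = min(max(p, eps), 1 - eps)  # keep log() finite at 0 and 1
        h = -(p * np.log(p) + (1 - p) * np.log(1 - p))
        if h > best_h:
            best, best_h = i, h
    return best
```

On a five-node chain with the ends labeled 1 and 0, the harmonic values fall linearly (1, 0.75, 0.5, 0.25, 0), and the query selects the middle node, where the field is least certain.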

20.
Dozier C., Jackson P. IEEE Software, 2005, 22(3): 94-100
Text mining is a relatively new research area associated with the creation of novel information resources from electronic text repositories. An expert-witness database based on text from legal, medical, and news documents demonstrates the successful application of text-mining techniques.
