1.

Learning to select pseudo labels: a semi-supervised method for named entity recognition

Li Zhen-zhen, Feng Da-wei, Li Dong-sheng, Lu Xi-cheng 《Journal of Zhejiang University (Part C, English Edition)》2020,21(6):903-916

Frontiers of Information Technology & Electronic Engineering - Deep learning models have achieved state-of-the-art performance in named entity recognition (NER); the good performance, however,...

2.

RankCNN: When learning to rank encounters the pseudo preference feedback

《Computer Standards & Interfaces》2014,36(3):554-562

Learning to rank has received great attention in the field of text retrieval for several years. However, only a few researchers have introduced the topic into visual reranking, owing to the special nature of image presentation. In this paper, a novel unsupervised visual reranking approach is proposed, termed ranking via convolutional neural networks (RankCNN). This approach integrates deep learning with pseudo preference feedback. The optimal set of pseudo preference pairs is first detected from the initial list by a modified graph-based method. Ranking is then reduced to pairwise classification within a CNN architecture. In addition, Accelerated Mini-Batch Stochastic Dual Coordinate Ascent (ASDCA) is introduced into the framework to accelerate training. Experiments show competitive performance on the LETOR 4.0, Paris, and Francelandmark datasets.
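
The reduction of ranking to pairwise classification described above can be illustrated with a small sketch. It assumes, purely for illustration, that pseudo preference pairs come from a top-versus-bottom heuristic over the initial list (standing in for the paper's modified graph-based method) and that a linear perceptron replaces the CNN; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[List[float], int]


def pseudo_preference_pairs(initial_list: List[int], k: int) -> List[Tuple[int, int]]:
    """Hypothetical heuristic: assume every top-k item of the initial ranking
    is preferred over every bottom-k item (stand-in for the graph-based step)."""
    top, bottom = initial_list[:k], initial_list[-k:]
    return [(i, j) for i in top for j in bottom]


def pairwise_examples(features: List[Vector], pairs: List[Tuple[int, int]]) -> List[Example]:
    """Reduce ranking to binary classification: preference (i over j) becomes
    x_i - x_j labeled +1 and x_j - x_i labeled -1."""
    examples: List[Example] = []
    for i, j in pairs:
        diff = [a - b for a, b in zip(features[i], features[j])]
        examples.append((diff, 1))
        examples.append(([-d for d in diff], -1))
    return examples


def train_linear_scorer(examples: List[Example], dim: int,
                        epochs: int = 20, lr: float = 0.1) -> List[float]:
    """Perceptron updates on difference vectors (stand-in for the CNN)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


def rerank(features: List[Vector], w: List[float]) -> List[int]:
    """Order items by the learned linear score, highest first."""
    scores = [sum(wi * xi for wi, xi in zip(w, f)) for f in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])


# Usage: item 1 starts below item 2; the learned scorer reorders them.
features = [[1.0, 0.0], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
pairs = pseudo_preference_pairs([0, 2, 1, 3], k=1)
w = train_linear_scorer(pairwise_examples(features, pairs), dim=2)
print(rerank(features, w))  # prints [0, 1, 2, 3]
```

Only the extreme items are used as pseudo labels, yet the scorer learned from them also corrects the order of the middle items, which is the intuition behind pseudo preference feedback.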

3.

Efficient query filtering for streaming time series with applications to semisupervised learning of time series classifiers (total citations: 1; self-citations: 1; citations by others: 1)

Li Wei, Eamonn Keogh, Helga Van Herle, Agenor Mafra-Neto, Russell J. Abbott 《Knowledge and Information Systems》2007,11(3):313-344

In this paper, we define time series query filtering, the problem of monitoring streaming time series for a set of predefined patterns. This problem is of great practical importance given the massive volume of streaming time series available through sensors, medical patient records, financial indices and space telemetry. Since the data may arrive at a high rate and the number of predefined patterns can be relatively large, it may be impossible for the comparison algorithm to keep up. We propose a novel technique that exploits the commonality among the predefined patterns to allow monitoring at higher bandwidths, while maintaining a guarantee of no false dismissals. Our approach is based on the widely used envelope-based lower-bounding technique. As we demonstrate in extensive experiments in diverse domains, our approach achieves tremendous improvements in performance in the offline case, and significant improvements in the fastest possible arrival rate of the data stream that can be processed with guaranteed no false dismissals. As a further demonstration of the utility of our approach, we show that it can make semisupervised learning of time series classifiers tractable.

Li Wei is a Ph.D. candidate in the Department of Computer Science & Engineering at the University of California, Riverside. She received her B.S. and M.S. degrees from Fudan University, China. Her research interests include data mining and information retrieval. Eamonn Keogh is an Assistant Professor of computer science at the University of California, Riverside. His research interests include data mining, machine learning and information retrieval. Several of his papers have won best paper awards, including papers at SIGKDD and SIGMOD. Dr. Keogh is the recipient of a 5-year NSF Career Award for “Efficient Discovery of Previously Unknown Patterns and Relationships in Massive Time Series Databases”.
Helga Van Herle is an Assistant Clinical Professor of medicine at the Division of Cardiology of the Geffen School of Medicine at UCLA. She received her M.D. from UCLA in 1993; completed her residency in internal medicine at the New York Hospital (Cornell University; 1993–1996) and her cardiology fellowship at UCLA (1997–2001). Dr. Van Herle holds an M.Sc. in bioengineering from Columbia University (1987) and a B.Sc. in chemical engineering from UCLA (1985). Agenor Mafra-Neto, Ph.D., is the CEO of ISCA Technologies, Inc., in California and the founder of ISCA Technologies, LTDA, in Brazil. His research interests include the analysis of insect behavior and communication systems, the manipulation of insect behavior, and the automation of pest monitoring and pest control. Dr. Mafra-Neto is currently coordinating the deployment of area-wide smart sensor and effector networks to micromanage agricultural and public health pests in the field in an automatic fashion. Russell J. Abbott is a Professor of computer science at California State University, Los Angeles, and a member of the staff at the Aerospace Corporation, El Segundo, CA. His primary interests are in the field of complex systems. He is currently organizing a workshop to bring together people working in the fields of complex systems and systems engineering.
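
The no-false-dismissal argument behind grouping patterns can be sketched as follows, assuming LB_Keogh-style envelopes with a warping window of radius `r`, equal-length patterns, and a single merged envelope for the whole pattern set (the paper's own grouping of patterns may be more elaborate; the function names are hypothetical). Because the merged envelope encloses every individual envelope, its lower bound never exceeds any individual one, so dismissing the whole group when the merged bound exceeds the threshold cannot discard a true match.

```python
import math
from typing import List, Sequence, Tuple

Envelope = Tuple[List[float], List[float]]  # (upper, lower)


def envelope(q: Sequence[float], r: int) -> Envelope:
    """Upper/lower envelope of pattern q under a warping window of radius r."""
    n = len(q)
    upper = [max(q[max(0, i - r): min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r): min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def merge_envelopes(envs: List[Envelope]) -> Envelope:
    """Enclose several equal-length envelopes in one wider envelope."""
    upper = [max(vals) for vals in zip(*(up for up, _ in envs))]
    lower = [min(vals) for vals in zip(*(lo for _, lo in envs))]
    return upper, lower


def lb_keogh(c: Sequence[float], env: Envelope) -> float:
    """Envelope-based lower bound on the windowed DTW distance to c."""
    upper, lower = env
    total = 0.0
    for ci, ui, li in zip(c, upper, lower):
        if ci > ui:
            total += (ci - ui) ** 2
        elif ci < li:
            total += (ci - li) ** 2
    return math.sqrt(total)


def patterns_to_check(c: Sequence[float], patterns: List[Sequence[float]],
                      r: int, threshold: float) -> List[int]:
    """Indices of patterns that survive filtering for subsequence c.
    Survivors would still need an exact DTW comparison."""
    envs = [envelope(p, r) for p in patterns]
    if lb_keogh(c, merge_envelopes(envs)) > threshold:
        return []  # one test dismisses every pattern in the group
    return [k for k, env in enumerate(envs) if lb_keogh(c, env) <= threshold]
```

In a real monitor the envelopes would be computed once rather than for every incoming subsequence; they are recomputed here only to keep the function self-contained.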

4.

Feature-based approach to semi-supervised similarity learning

Philippe H. Gosselin, Matthieu Cord 《Pattern recognition》2006,39(10):1839-1851

For the management of digital document collections, automatic database analysis still has difficulty dealing with the semantic queries and abstract concepts that users are looking for. Although interactive learning strategies may improve search results, system performance still depends on the representation of the document collection. In this paper, we introduce a weakly supervised optimization of a feature vector set. Given an incomplete set of partial labels, the method improves the representation of the collection, even if the size, number, and structure of the concepts are unknown. Experiments have been carried out on synthetic and real data to validate our approach.

5.

Preference-based learning to rank

Nir Ailon, Mehryar Mohri 《Machine Learning》2010,80(2-3):189-211

This paper presents an efficient preference-based ranking algorithm running in two stages. In the first stage, the algorithm learns a preference function defined over pairs, as in a standard binary classification problem. In the second stage, it uses that preference function to produce an accurate ranking, thereby reducing the learning problem of ranking to binary classification. This reduction is based on the familiar QuickSort and guarantees an expected pairwise misranking loss of at most twice that of the binary classifier derived in the first stage. Furthermore, in the important special case of bipartite ranking, the factor of two in loss is reduced to one. This improved bound also applies to the regret achieved by our ranking and that of the binary classifier obtained. Our algorithm is randomized, but we prove a lower bound for any deterministic reduction of ranking to binary classification showing that randomization is necessary to achieve our guarantees. This, and a recent result by Balcan et al., who show a regret bound of two for a deterministic algorithm in the bipartite case, suggest a trade-off between achieving low regret and determinism in this context. Our reduction also admits an improved running time guarantee with respect to that deterministic algorithm. In particular, the number of calls to the preference function in the reduction is improved from Ω(n²) to O(n log n). In addition, when only the top k ranked elements are required (k ≪ n), as in many applications in information extraction or search engine design, the time complexity of our algorithm can be further reduced to O(k log k + n). Our algorithm is thus practical for realistic applications where the number of points to rank exceeds several thousand.
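
The QuickSort-based reduction can be sketched with a randomized QuickSort that calls the learned preference function as its comparator, plus a quickselect-style variant that fully sorts only the part needed for the top k. This is an illustrative sketch under the assumption that `prefers(u, v)` returns a hard binary decision; it does not reproduce the paper's loss or regret analysis, and the names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def _split(items: List[T], prefers: Callable[[T, T], bool],
           rng: random.Random) -> Tuple[List[T], T, List[T]]:
    """Pick a uniformly random pivot and split the other items by whether the
    preference function ranks them above the pivot."""
    p = rng.randrange(len(items))
    pivot = items[p]
    above: List[T] = []
    below: List[T] = []
    for idx, x in enumerate(items):
        if idx != p:
            (above if prefers(x, pivot) else below).append(x)
    return above, pivot, below


def quicksort_rank(items: List[T], prefers: Callable[[T, T], bool],
                   rng: Optional[random.Random] = None) -> List[T]:
    """Full ranking; prefers(u, v) == True means u is placed above v."""
    if rng is None:
        rng = random.Random()
    if len(items) <= 1:
        return list(items)
    above, pivot, below = _split(items, prefers, rng)
    return quicksort_rank(above, prefers, rng) + [pivot] + quicksort_rank(below, prefers, rng)


def quicksort_top_k(items: List[T], prefers: Callable[[T, T], bool], k: int,
                    rng: Optional[random.Random] = None) -> List[T]:
    """Top-k only: descend into the lower part only when the upper part
    plus the pivot holds fewer than k items."""
    if rng is None:
        rng = random.Random()
    if k <= 0 or not items:
        return []
    if len(items) == 1:
        return list(items)
    above, pivot, below = _split(items, prefers, rng)
    if len(above) >= k:
        return quicksort_top_k(above, prefers, k, rng)
    head = quicksort_rank(above, prefers, rng) + [pivot]
    return head + quicksort_top_k(below, prefers, k - len(head), rng)
```

The random pivot is what makes the expected number of preference calls O(n log n) for the full ranking; in the top-k variant, items that fall below a pivot are never sorted unless they can still reach the top k.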

6.

Research on feature analysis for learning to rank

Hua Guichun, Zhang Min, Kuang Da, Liu Yiqun, Ma Shaoping, Ru Liyun 《Computer Engineering and Applications》2011,47(17):122-127

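Ranking is a key component of information retrieval. More than a hundred features for constructing ranking functions have been proposed, and how to use these features to build more effective ranking functions has become a hot topic; consequently, learning to rank, an interdisciplinary field spanning information retrieval and machine learning, is receiving increasing attention. From the way ranking features are constructed, it is easy to see that they are not fully independent of one another; however, existing research on learning-to-rank methods rarely builds more effective ranking functions from the perspective of feature recombination and selection grounded in feature analysis. To address this problem, the following framework is proposed: analyze the feature set used to construct the ranking function, recombine and select features, and then learn the ranking function with a learning-to-rank method. Based on this framework, four feature-processing algorithms are proposed: feature recombination based on principal component analysis (PCA), and feature selection based on MAP, on forward selection, and on the selection implicit in learning-to-rank algorithms. Experimental results show that, after feature processing, the ranking functions built by learning-to-rank algorithms generally outperform the original ranking functions.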
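
The MAP-based forward-selection idea can be illustrated with a greedy loop that repeatedly adds the feature whose inclusion most improves MAP. This sketch assumes, for simplicity, that the ranking function is the unweighted sum of the selected features and that selection stops once no feature improves MAP; the paper instead learns the ranking function with a learning-to-rank algorithm, and all names here are hypothetical.

```python
from typing import List, Sequence, Tuple

# One query: a list of (feature vector, binary relevance label) pairs.
Query = List[Tuple[Sequence[float], int]]


def average_precision(labels_in_rank_order: List[int]) -> float:
    """AP of a ranked list of binary relevance labels."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels_in_rank_order, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(queries: List[Query], selected: List[int]) -> float:
    """MAP when each document is scored by the sum of the selected features.
    Python's stable sort keeps the input order for tied scores."""
    aps = []
    for docs in queries:
        ranked = sorted(docs, key=lambda d: -sum(d[0][f] for f in selected))
        aps.append(average_precision([rel for _, rel in ranked]))
    return sum(aps) / len(aps)


def forward_select(queries: List[Query], n_features: int, max_features: int) -> List[int]:
    """Greedy forward selection with MAP as the criterion."""
    selected: List[int] = []
    best = mean_average_precision(queries, selected)
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        score, feat = max(((mean_average_precision(queries, selected + [f]), f)
                           for f in candidates), key=lambda t: t[0])
        if score <= best:
            break  # no remaining feature improves MAP
        selected.append(feat)
        best = score
    return selected
```

Greedy selection evaluates O(d) candidate subsets per step instead of all 2^d subsets, which is why forward selection is a common practical choice when features are numerous and correlated.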

7.

Stream-based semi-supervised learning for recommender systems

Pawel Matuszyk, Myra Spiliopoulou 《Machine Learning》2017,106(6):771-798

To alleviate the problem of data sparsity inherent to recommender systems, we propose a semi-supervised framework for stream-based recommendations. Our framework uses abundant unlabelled information to improve the quality of recommendations. We extend a state-of-the-art matrix factorization algorithm with the ability to add new dimensions to the matrix at runtime and implement two approaches to semi-supervised learning: co-training and self-learning. We introduce a new evaluation protocol that includes statistical testing and parameter optimization. We then evaluate our framework on five real-world datasets in a stream setting. On all datasets, our method achieves statistically significant improvements in the quality of recommendations.
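
The idea of adding new dimensions to the matrix at runtime can be sketched with a factorization model whose latent vectors are created lazily for users and items that first appear in the stream. This covers only the incremental-growth part (not co-training or self-learning), assumes plain SGD with L2 regularization, and uses hypothetical names and hyperparameters.

```python
import random
from typing import Dict, Hashable, List


class StreamingMF:
    """Hypothetical sketch: matrix factorization trained by SGD on a stream
    of (user, item, rating) events. Unseen users or items get a latent
    vector on first sight, i.e. a new row or column of the matrix."""

    def __init__(self, k: int = 4, lr: float = 0.05, reg: float = 0.01, seed: int = 0):
        self.k, self.lr, self.reg = k, lr, reg
        self.rng = random.Random(seed)
        self.users: Dict[Hashable, List[float]] = {}
        self.items: Dict[Hashable, List[float]] = {}

    def _latent(self, table: Dict[Hashable, List[float]], key: Hashable) -> List[float]:
        # Grow the model: small random initialization for a new user/item.
        if key not in table:
            table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.k)]
        return table[key]

    def predict(self, user: Hashable, item: Hashable) -> float:
        p = self._latent(self.users, user)
        q = self._latent(self.items, item)
        return sum(a * b for a, b in zip(p, q))

    def update(self, user: Hashable, item: Hashable, rating: float) -> None:
        """One SGD step on a single event from the stream."""
        p = self._latent(self.users, user)
        q = self._latent(self.items, item)
        err = rating - sum(a * b for a, b in zip(p, q))
        for f in range(self.k):
            pf, qf = p[f], q[f]  # use pre-update values for both factors
            p[f] += self.lr * (err * qf - self.reg * pf)
            q[f] += self.lr * (err * pf - self.reg * qf)
```

Each event costs O(k) regardless of how many users and items have been seen, which is what makes this style of model suitable for a stream setting.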

8.

Pointwise manifold regularization for semi-supervised learning

Yunyun WANG, Jiao HAN, Yating SHEN, Hui XUE 《Frontiers of Computer Science》2021,15(1):151303-98

Manifold regularization (MR) provides a powerful framework for semi-supervised classification using both labeled and unlabeled data. Following the manifold assumption, it constrains similar instances over the manifold graph to share similar classification outputs. MR is built on pairwise smoothness over the manifold graph, i.e., the smoothness constraint is imposed on all instance pairs and effectively treats each instance pair as a single operand. However, smoothness can be pointwise in nature; that is, smoothness inherently occurs “everywhere”, relating the behavior of each point or instance to that of its close neighbors. In this paper, we therefore develop a pointwise MR (PW_MR for short) for semi-supervised learning that imposes constraints on individual local instances. In this way, the pointwise nature of smoothness is preserved, and, by considering individual instances rather than instance pairs, the importance or contribution of each instance can be introduced. Such importance can be described, for example, by the confidence of correct prediction or by the local density. PW_MR thus provides a different way of implementing manifold smoothness. Finally, empirical results show the competitiveness of PW_MR compared with pairwise MR.
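
The contrast between pairwise and pointwise smoothness can be made concrete. The pairwise term below is the standard MR penalty summed over all weighted instance pairs; the pointwise term is only an assumed illustration (each instance compared with the weighted average of its neighbors, scaled by a per-instance importance `c[i]`), not necessarily PW_MR's exact formulation, and the names are hypothetical.

```python
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def pairwise_smoothness(f: Sequence[float], W: Matrix) -> float:
    """Standard MR term: sum over all pairs of W[i][j] * (f[i] - f[j])**2."""
    n = len(f)
    return sum(W[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n))


def pointwise_smoothness(f: Sequence[float], W: Matrix,
                         c: Optional[Sequence[float]] = None) -> float:
    """Assumed pointwise variant: one term per instance, comparing f[i] with
    the W-weighted average of its neighbors and scaling it by c[i]."""
    n = len(f)
    weights = [1.0] * n if c is None else c
    total = 0.0
    for i in range(n):
        deg = sum(W[i])
        if deg == 0:
            continue  # isolated instance: no neighborhood to be smooth with
        neighbor_avg = sum(W[i][j] * f[j] for j in range(n)) / deg
        total += weights[i] * (f[i] - neighbor_avg) ** 2
    return total


def density_importance(W: Matrix) -> List[float]:
    """One possible importance: local density approximated by node degree."""
    return [sum(row) for row in W]
```

Because the pointwise penalty has one term per instance, an importance such as `density_importance(W)` or a prediction confidence can be attached directly to each instance, which the pairwise form does not allow without splitting weights across pairs.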

9.

Leveraging pointwise prediction with learning to rank for top-N recommendation

Zhu Nengjun, Cao Jian, Lu Xinjiang, Gu Qi 《World Wide Web》2021,24(1):375-396

World Wide Web - Pointwise prediction and Learning to Rank (L2R) are two hot strategies to model user preference in recommender systems. Currently, these two types of approaches are often...

10.

Multi-task learning to rank for web search

Yi Chang, Jing Bai, Hongyuan Zha 《Pattern recognition letters》2012,33(2):173-181

Both the quality and the quantity of training data have a significant impact on the accuracy of ranking functions in web search. To serve global search needs, a commercial search engine must extend its well-tailored service to small countries as well. Because query intents and search results are inherently heterogeneous across domains (i.e., across languages and regions), it is difficult for a generic ranking function to satisfy all types of queries. Instead, each domain should use its own well-tailored ranking function. To train a ranking function for each domain in a scalable way, it is critical to leverage existing training data to enhance the ranking functions of domains that lack sufficient training data. In this paper, we present a boosting framework for learning to rank in the multi-task learning context to address this problem. In particular, we propose to learn non-parametric common structures adaptively from multiple tasks in a stage-wise way. An algorithm is developed to iteratively discover super-features that are effective for all the tasks. The regression function for each task is then learned as a linear combination of those super-features. We evaluate the accuracy of multi-task learning methods for web search ranking using data from multiple domains of a commercial search engine. Our results demonstrate that multi-task learning methods bring significant relevance improvements over the existing baseline method.

11.

Flexible data representation with feature convolution for semi-supervised learning

Dornaika F. 《Applied Intelligence》2021,51(11):7690-7704

Applied Intelligence - Data representation plays a crucial role in semi-supervised learning. This paper proposes a framework for semi-supervised data representation. It introduces a flexible...

12.

A unified framework for semi-supervised PU learning

Haoji Hu, Chaofeng Sha, Xiaoling Wang, Aoying Zhou 《World Wide Web》2014,17(4):493-510

Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, while unlabeled data is used as the test set. In practice, labeled data is often hard to obtain, and the unlabeled data contains instances that belong to the predefined class as well as instances outside the labeled categories. This problem has been widely studied in recent years, and semi-supervised PU learning is an efficient way to learn from positive and unlabeled examples. Among the many semi-supervised PU learning methods, it is hard to choose a single approach that fits every unlabeled data distribution. In this paper, a new framework is designed to integrate different semi-supervised PU learning algorithms in order to take advantage of existing methods. In essence, we propose an automatic KL-divergence learning method that uses knowledge of the unlabeled data distribution. The experimental results show that (1) data distribution information is very helpful for semi-supervised PU learning; and (2) the proposed framework achieves higher precision than the state-of-the-art method.

13.

Ensemble diversified learning for image classification with noisy labels

Ahmed Ahmed, Yousif Hayder, He Zhihai 《Multimedia Tools and Applications》2021,80(14):20759-20772

Multimedia Tools and Applications - In this work, we develop a new approach for learning a deep neural network for image classification with noisy labels using ensemble diversified learning. We...

14.

Infrared image filtering method based on improved pseudo-median filtering and non-local means filtering

Zhang Qian 《Industry and Mine Automation》2014,(12):57-60

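First, the pseudo-median filtering algorithm is improved: the noise-detection process incorporates factors such as pixel gray values and geometric distance, so that noise points are progressively separated from the image pixels, and the noise is then removed by weighted filtering. Second, the way prior information is obtained for the improved non-local means filtering algorithm is itself improved: a lifting wavelet transform is applied to the noisy image, a new threshold function is used to select the low-frequency decomposition coefficients, the coefficients above the threshold are reconstructed to obtain a reference image, and the similarity weights computed from the reference image serve as prior information for the improved non-local means algorithm. Finally, an infrared image filtering method is built on the two improved algorithms: the infrared image is filtered first by the improved pseudo-median filter and then by the prior-guided improved non-local means filter, and the result is fused with the reference image to correct over-filtered regions. Experimental results show that the method filters infrared images corrupted by high-density noise well.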

15.

A semi-supervised learning system for service robots to recognise human actions

Chao Tang, Huosheng Hu, Wei Pan, Lidong Xie 《Advanced Robotics》2014,28(13):907-918

16.

Supervised rank aggregation based on query similarity for document retrieval

Yang Wang, Yalou Huang, Xiaodong Pang, Min Lu, Maoqiang Xie, Jie Liu 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2013,17(3):421-429

This paper is concerned with supervised rank aggregation, which aims to improve ranking performance by combining the outputs of multiple rankers. Previous rank aggregation approaches have two main shortcomings. First, the learned weights for base rankers do not distinguish among queries, which is suboptimal because queries vary significantly in terms of ranking. Second, most current aggregation functions do not directly optimize the evaluation measures used in ranking. In this paper, differences among queries are taken into account, and a supervised rank aggregation function, referred to as RankAgg.NDCG, is proposed that directly optimizes the evaluation measure NDCG. We prove that RankAgg.NDCG can achieve better NDCG performance than the linear combination of the base rankers. Experimental results on benchmark datasets show that our approach outperforms a number of baseline approaches.
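
Evaluating an aggregated ranking with NDCG can be sketched as below. The sketch assumes the common gain `2**rel - 1` with a `log2(rank + 1)` discount, a linear combination of base-ranker scores, and, as a hypothetical stand-in for query-dependent weighting, a similarity-weighted blend of weight vectors learned for training queries; it does not reproduce RankAgg.NDCG's optimization, and the names are hypothetical.

```python
import math
from typing import List, Sequence


def dcg_at_k(rels_in_rank_order: Sequence[int], k: int) -> float:
    """DCG@k; enumerate is 0-based, so rank r+1 is discounted by log2(r + 2)."""
    return sum((2 ** rel - 1) / math.log2(r + 2)
               for r, rel in enumerate(rels_in_rank_order[:k]))


def ndcg_at_k(scores: Sequence[float], rels: Sequence[int], k: int) -> float:
    """NDCG@k of the ranking induced by scores (higher score ranks first)."""
    order = sorted(range(len(scores)), key=lambda d: -scores[d])
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k([rels[d] for d in order], k) / ideal if ideal > 0 else 0.0


def aggregate(base_scores: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Linear aggregation; base_scores[m][d] is ranker m's score for doc d."""
    n_docs = len(base_scores[0])
    return [sum(w * s[d] for w, s in zip(weights, base_scores)) for d in range(n_docs)]


def query_dependent_weights(similarities: Sequence[float],
                            train_weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical: blend training-query weight vectors by query similarity."""
    total = sum(similarities)
    dim = len(train_weights[0])
    return [sum(s * w[m] for s, w in zip(similarities, train_weights)) / total
            for m in range(dim)]
```

Because NDCG depends only on the sorted order of the aggregated scores, it is piecewise constant in the weights, which is why directly optimizing it, as RankAgg.NDCG is said to do, requires more than ordinary gradient descent.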

17.

Graph-based semi-supervised learning with Local Binary Patterns for holistic object categorization

《Expert systems with applications》2014,41(17):7744-7753

18.

A general dimension for query learning

《Journal of Computer and System Sciences》2007,73(6):924-940

We introduce a combinatorial dimension that characterizes the number of queries needed to exactly (or approximately) learn concept classes in various models. Our general dimension provides tight upper and lower bounds on the query complexity for all sorts of queries, not only for example-based queries as in previous works. As an application, we show that for learning DNF formulas, unspecified attribute value membership and equivalence queries are not more powerful than standard membership and equivalence queries. Further, in the approximate learning setting, we use the general dimension to characterize the query complexity in the statistical query model as well as in the learning-by-distances model. Moreover, we derive close bounds on the number of statistical queries needed to approximately learn DNF formulas.

19.

Active learning with effective scoring functions for semi-supervised temporal action localization

《Displays》2023

Temporal Action Localization (TAL) aims to predict both the action category and the temporal boundary (i.e., start and end time) of action instances in untrimmed videos. Existing works usually adopt fully supervised solutions; however, a practical bottleneck of these solutions is the large amount of labeled training data they require. To reduce the expensive cost of human labeling, this paper focuses on a rarely investigated yet practical task named semi-supervised TAL and proposes an effective active learning method, named AL-STAL. We use four steps, named Train, Query, Annotate, and Append, to actively select highly informative video samples and train the localization model. AL-STAL is equipped with two scoring functions that consider the uncertainty of the localization model, facilitating video sample ranking and selection. The first, named Temporal Proposal Entropy (TPE), takes the entropy of the predicted label distribution as the measure of uncertainty. The second, named Temporal Context Inconsistency (TCI), introduces a new metric based on the mutual information between adjacent action proposals. To validate the effectiveness of the proposed method, we conduct extensive experiments on three benchmark datasets: THUMOS’14, ActivityNet 1.3, and ActivityNet 1.2. Experimental results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully supervised learning.
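
The Query step with a TPE-style score can be sketched as ranking unlabeled videos by the entropy of their predicted label distributions. The sketch assumes each video is represented by per-proposal class probabilities and that the video-level score is the mean proposal entropy (the paper's aggregation may differ); TCI and the Train, Annotate, and Append steps are not shown, and the names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def temporal_proposal_entropy(proposal_probs: List[Sequence[float]]) -> float:
    """Assumed video-level score: mean entropy over the video's proposals."""
    if not proposal_probs:
        return 0.0
    return sum(entropy(p) for p in proposal_probs) / len(proposal_probs)


def query_videos(unlabeled: Dict[str, List[Sequence[float]]], budget: int) -> List[str]:
    """Query step: rank unlabeled videos by uncertainty and return the
    `budget` most uncertain ones for human annotation."""
    ranked = sorted(unlabeled,
                    key=lambda v: temporal_proposal_entropy(unlabeled[v]),
                    reverse=True)
    return ranked[:budget]
```

A near-uniform class distribution has the highest entropy, so videos whose proposals the current model cannot tell apart are sent for annotation first.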

20.

Boosted multi-class semi-supervised learning for human action recognition

Tianzhu Zhang, Si Liu, Changsheng Xu, Hanqing Lu 《Pattern recognition》2011,44(10-11):2334-2342

Human action recognition is a challenging task due to significant intra-class variations, occlusion, and background clutter. Most existing work uses action models based on statistical learning algorithms for classification. To achieve good recognition performance, a large number of labeled samples is therefore required to train the sophisticated action models. However, collecting labeled samples is labor-intensive. To tackle this problem, we propose a boosted multi-class semi-supervised learning algorithm in which the co-EM algorithm is adopted to leverage information from unlabeled data. Three key issues are addressed in this paper. First, we formulate action recognition as a multi-class semi-supervised learning problem to deal with insufficient labeled data and high computational expense. Second, boosted co-EM is employed to construct the semi-supervised model. To cope with the high-dimensional feature space, weighted multiple discriminant analysis (WMDA) is used to project the features into low-dimensional subspaces in which Gaussian mixture models (GMMs) are trained, and a boosting scheme is used to integrate the subspace models. Third, we present an upper bound on the training error in the multi-class framework, which guides the construction of the novel classifier. In theory, the proposed solution is proved to minimize this upper error bound. Experimental results show good performance on public datasets.