1.
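Tweets contain a large number of non-standard words, which people create either deliberately or unintentionally. For many natural language processing tasks, normalizing tweet text beforehand is necessary. To address the poor performance of existing normalization systems, a novel unsupervised text normalization system is proposed. First, a constructed dictionary of standard words is used to decide whether the current tweet needs normalization. Next, the features of each non-standard word in the tweet determine whether one-to-one or one-to-many normalization is applied; for non-standard words that require one-to-many normalization, forward and backward search algorithms compute all possible multi-word combinations. Then, for the non-standard words within those multi-word combinations, suitable candidates are generated using random walks on a bipartite graph together with spell checking. Finally, a context-based language model selects the most suitable standard word. The proposed algorithm achieves an F-score of 86.4% on the dataset, exceeding the current best graph-based random walk algorithm by 10 percentage points.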
2.
In this paper, a novel unsupervised dimensionality reduction algorithm, unsupervised Globality-Locality Preserving Projections in Transfer Learning (UGLPTL), is proposed. It builds on the conventional Globality-Locality Preserving Projections (GLPP) dimensionality reduction algorithm, which does not work well in real-world Transfer Learning (TL) applications. In TL applications, one application (the source domain) contains sufficient labeled data, but the related application (the target domain) contains only unlabeled data. Compared to existing TL methods, the proposed method incorporates all of the following objectives: minimizing the marginal and conditional distribution differences between the two domains, maximizing the variance of the target domain, and performing Geometrical Diffusion on Manifolds, all of which are essential for transfer learning applications. UGLPTL seeks a projection vector that projects the source and target domain data into a common subspace where both the labeled source data and the unlabeled target data can be utilized to perform dimensionality reduction. Comprehensive experiments have verified that the proposed method outperforms many state-of-the-art non-transfer learning and transfer learning methods on two popular real-world cross-domain visual transfer learning data sets. The proposed UGLPTL approach achieved mean accuracies of 82.18% and 87.14% over all the tasks of the PIE Face and Office-Caltech data sets, respectively.
3.
Record linkage is the process of identifying records that refer to the same real-world entity. Many existing approaches to record linkage apply supervised machine learning techniques to generate a classification model that classifies a pair of records as either a match or a non-match. The main requirement of such an approach is a labelled training dataset. In many real-world applications no labelled dataset is available, so manual labelling is required to create a sufficiently large training dataset for a supervised machine learning algorithm. Semi-supervised machine learning techniques, such as self-learning or active learning, which require only a small manually labelled training dataset, have been applied to record linkage. These techniques reduce the amount of manual labelling needed for the training dataset. However, they have yet to achieve a level of accuracy similar to that of supervised learning techniques. In this paper we propose a new approach to unsupervised record linkage based on a combination of ensemble learning and enhanced automatic self-learning. In the proposed approach an ensemble of automatic self-learning models is generated with different similarity measure schemes. To further improve the automatic self-learning process, we incorporate field weighting into the automatic seed selection for each of the self-learning models. We propose an unsupervised diversity measure to ensure that there is high diversity among the selected self-learning models. Finally, we propose to use the contribution ratios of the self-learning models to remove those with poor accuracy from the ensemble. We have evaluated our approach on 4 publicly available datasets that are commonly used in the record linkage community. Our experimental results show that the proposed approach has advantages over state-of-the-art semi-supervised and unsupervised record linkage techniques. On 3 of the 4 datasets it also achieves results comparable to those of the supervised approaches.
4.
This letter presents a novel unsupervised competitive learning rule for scalar quantization, called the boundary adaptation rule (BAR). It is shown both mathematically and by simulations that BAR converges to equiprobable quantizations of univariate probability density functions and that, in this way, it outperforms other unsupervised competitive learning rules.
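To illustrate what an "equiprobable quantization" means, here is a minimal Python sketch. It is not the BAR learning rule from the abstract, whose update equations are not given here. It simply places interval boundaries at empirical quantiles of a fixed sample, so that every interval receives the same number of points. The function names and the toy data are hypothetical.

```python
import bisect

def equiprobable_boundaries(samples, n_intervals):
    """Return n_intervals - 1 boundaries placed at empirical quantiles,
    so each quantization interval holds (nearly) the same number of samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # Boundary k sits at the k/n_intervals empirical quantile.
    return [ordered[(k * n) // n_intervals] for k in range(1, n_intervals)]

def quantize(x, boundaries):
    """Index of the interval that x falls into."""
    return bisect.bisect_right(boundaries, x)

# Toy data (an assumption for illustration): 1000 distinct values with a
# skewed density, generated deterministically.
data = [(((i * 37) % 1000) / 1000) ** 2 for i in range(1000)]

bounds = equiprobable_boundaries(data, 4)
counts = [0] * 4
for x in data:
    counts[quantize(x, bounds)] += 1
```

Because the density is skewed towards small values, the boundaries are unevenly spaced, yet each of the four intervals is used equally often. An adaptive rule such as BAR is described as converging to this property online rather than by sorting a stored sample.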
5.
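The behavioral and cognitive deficits of patients with autism are related to underlying abnormalities in brain function. For the high-dimensional features of resting-state functional magnetic resonance imaging (fMRI), traditional linear feature extraction methods cannot fully extract the information that is useful for classification. This paper therefore proposes a new unsupervised fuzzy feature mapping method for fMRI data and combines it with a multi-view support vector machine (SVM) to build a classification model for computer-aided diagnosis of autism. The method first uses the rule-antecedent learning method of a multi-output TSK fuzzy system to map the original feature data into a linearly separable high-dimensional space. It then introduces a manifold regularization learning framework and proposes a new unsupervised fuzzy feature learning method, which yields a nonlinear low-dimensional embedding of the original output feature vectors. Finally, a multi-view SVM algorithm performs the classification. Experimental results show that the method effectively extracts important features from resting-state fMRI data and, while maintaining superior and stable classification performance, also improves the interpretability of the model.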
6.
We show how the quantum paradigm can be used to speed up unsupervised learning algorithms. More precisely, we explain how it is possible to accelerate learning algorithms by quantizing some of their subroutines. Quantization refers to the process that partially or totally converts a classical algorithm to its quantum counterpart in order to improve performance. In particular, we give quantized versions of clustering via minimum spanning tree, divisive clustering and k-medians that are faster than their classical analogues. We also describe a distributed version of k-medians that allows the participants to save on the global communication cost of the protocol compared to the classical version. Finally, we design quantum algorithms for the construction of a neighbourhood graph, outlier detection as well as smart initialization of the cluster centres.
7.
We propose an automatic thresholding technique for difference images in unsupervised change detection. Such a technique takes into account the different costs that may be associated with commission and omission errors in the selection of the decision threshold. This allows the generation of maps in which the overall change-detection cost is minimized, i.e. the more critical kind of error is reduced according to end-user requirements.
8.
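Feature extraction is a key step in image recognition, and accurate feature representations yield more accurate classification. A soft-threshold encoder is used together with the orthogonal matching pursuit (OMP) algorithm, which orthogonalizes the visual dictionary, to improve the recognition rate of a single-stage computational architecture. A two-stage architecture is then constructed to obtain more accurate image features and further improve the recognition rate. Experiments show that the soft-threshold encoder and the OMP algorithm improve the feature extraction ability of the single-stage architecture and raise the image recognition rate on large-sample datasets. The two-stage architecture raises the recognition rate on a self-selected dataset. The OMP algorithm raises the recognition rate on VOC2012 data. On the self-selected dataset, the two-stage architecture outperforms the single-stage one, shows an advantage over the NIN architecture, and is on par with a convolutional neural network (CNN), which indicates that the two-stage architecture adapts well to the self-selected dataset.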
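The abstract above does not define its soft-threshold encoder. The Python sketch below assumes the form commonly used in single-layer feature learning, where each feature is `max(0, d·x - alpha)` for a dictionary atom `d` and a fixed threshold `alpha`. The function name, the toy dictionary, and the threshold value are hypothetical, and the paper's actual encoder may differ.

```python
def soft_threshold_encode(x, dictionary, alpha):
    """Encode vector x against each dictionary atom.

    Assumed form: feature_k = max(0, <atom_k, x> - alpha).
    Responses below the threshold alpha are set to zero, giving a sparse code.
    """
    features = []
    for atom in dictionary:
        response = sum(a * b for a, b in zip(atom, x))  # inner product
        features.append(max(0.0, response - alpha))
    return features

# Toy two-dimensional example with three unit-norm atoms (illustrative values).
dictionary = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
x = [0.5, 2.0]
codes = soft_threshold_encode(x, dictionary, alpha=1.0)
```

Here the first atom responds weakly (0.5) and is suppressed to zero, while the other two survive the threshold with reduced magnitude.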
9.
Shrestha Sushma Alsadoon Abeer Prasad P. W. C. Seher Indra Alsadoon Omar Hisham 《Multimedia Tools and Applications》2021,80(14):21293-21313
Deep learning has not been successfully implemented in the past with accurate segmentation of prostate on Magnetic Resonance (MR) image in nerve sparing prostate...
10.
Over the last decade, deep neural networks have been a hot topic in machine learning. They are a breakthrough technology for processing images, video, speech, text and audio. Thanks to its deep architecture, a deep neural network allows us to overcome some limitations of a shallow neural network. In this paper we investigate the nature of unsupervised learning in the restricted Boltzmann machine (RBM). We have proved that maximizing the log-likelihood of the input data distribution of a restricted Boltzmann machine is equivalent to minimizing the cross-entropy and, in a special case, to minimizing the mean squared error. Thus the nature of unsupervised learning is invariant to different training criteria. As a result we propose a new technique called “REBA” for the unsupervised training of deep neural networks. In contrast to Hinton’s conventional approach to learning in the restricted Boltzmann machine, which is based on a linear training rule, the proposed technique is founded on a nonlinear training rule. We have shown that the classical equations for RBM learning are a special case of the proposed technique. As a result the proposed approach is more universal than the traditional energy-based model. We demonstrate the performance of the REBA technique on a well-known benchmark problem. The main contribution of this paper is a novel view and new understanding of unsupervised learning in deep neural networks.
11.
L H Andrew 《Neural Networks, IEEE Transactions on》1996,7(1):254-256
This note proposes an alternative to a neural network for designing scalar quantizers proposed by Van Hulle and Martinez (ibid., vol.5, p.498-501, May 1994). It also points out that the performance measure used is of limited applicability.
12.
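Multidimensional scaling (MDS) has been widely applied in dimensionality reduction and data mining. The main drawback of MDS is that it is defined on the training data, so the mapping of a new test sample cannot be obtained directly. In addition, MDS is based on the Euclidean distance metric and is not suited to capturing the nonlinear manifold structure in similar data. MDS is extended to a correlation metric space, yielding correlation metric multidimensional scaling (CMDS). Unlike traditional MDS, which performs the mapping on the training data and thereby shrinks the space, CMDS can obtain the mapping of test samples directly. Moreover, because CMDS is based on a correlation metric, it can effectively learn the nonlinear manifold structure in similar data. Theoretical analysis shows that CMDS can be extended to a new feature space using kernel methods to handle nonlinear problems. Experimental results show that CMDS and its kernel form KG-CMDS outperform commonly used traditional dimensionality reduction methods.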
13.
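To make full use of the latent information in face images, a method is proposed that obtains multi-scale image features by using convolution kernels of different sizes: the Multi-Scale Convolutional Auto-Encoder (MSCAE). The features of different scales extracted by this structure reflect the essential information of a face and allow the face image to be reconstructed better. The feature extraction framework is a hierarchical structure that alternates convolution and sampling, which makes the features highly invariant to rotation, translation, and scaling. MSCAE is trained in encoder-decoder mode to obtain feature extractors, which are used to extract features that are then fused into a feature vector for classification. Classification results with a BP neural network on the ORL and Yale face databases show that multi-scale features outperform single-scale features in both recognition rate and performance. In addition, fusing MSCAE features with HOG (Histograms of Oriented Gradients) features yields a higher recognition rate than either feature alone.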
14.
We propose a method for visual tracking-by-detection based on online feature learning. Our learning framework performs feature encoding with respect to an over-complete dictionary, followed by spatial pyramid pooling. We then learn a linear classifier based on the resulting feature encoding. Unlike previous work, we learn the dictionary online and update it to help capture the appearance of the tracked target as well as the background. In more detail, given a test image window, we extract local image patches from it and each local patch is encoded with respect to the dictionary. The encoded features are then pooled over a spatial pyramid to form an aggregated feature vector. Finally, a simple linear classifier is trained on these features. Our experiments show that the proposed tracker, simple as it is, outperforms all the state-of-the-art tracking methods that we have tested. Moreover, we evaluate the performance of different dictionary learning and feature encoding methods in the proposed tracking framework, and analyze the impact of each component in the tracking scenario. In particular, we show that a small dictionary, learned and updated online, is as effective as, and more efficient than, a huge dictionary learned offline. We further demonstrate the flexibility of feature learning by showing how it can be used within a structured learning tracking framework. The outcome is one of the best trackers reported to date, which combines the advantages of both feature learning and structured output prediction. We also implement a multi-object tracker, which achieves state-of-the-art performance.
15.
Reducing the dimensionality of the data has been a challenging task in data mining and machine learning applications. In these applications, the existence of irrelevant and redundant features negatively affects the efficiency and effectiveness of different learning algorithms. Feature selection is one of the dimension reduction techniques, which has been used to allow a better understanding of data and improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels is still a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. The paper then presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far. The greedy algorithm is based on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with the state-of-the-art methods for unsupervised feature selection.
16.
Dong-Chul Park 《Neural Networks, IEEE Transactions on》2000,11(2):520-528
An unsupervised competitive learning algorithm based on the classical k-means clustering algorithm is proposed. The proposed learning algorithm, called the centroid neural network (CNN), estimates centroids of the related cluster groups in the training data. This paper also explains algorithmic relationships between the CNN and some of the conventional unsupervised competitive learning algorithms, including Kohonen's self-organizing map and Kosko's differential competitive learning algorithm. The CNN algorithm requires neither a predetermined schedule for the learning coefficient nor a total number of iterations for clustering. The simulation results on clustering problems and image compression problems show that CNN converges much faster than conventional algorithms with comparable clustering quality, while other algorithms may give unstable results depending on the initial values of the learning coefficient and the total number of iterations.
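The idea of a competitive learner that needs no learning-coefficient schedule can be sketched in Python as below. This is not the CNN algorithm from the paper, whose exact update rules are not given in the abstract. It is a simplified sequential k-means style update in which the winning centroid moves by a step of 1/count, so it always equals the running mean of the points it has won. The function name and the one-dimensional toy data are hypothetical.

```python
def online_centroids(points, initial_centroids):
    """One pass of winner-take-all learning with a running-mean update.

    The step size 1/count is set by the data themselves, so no hand-tuned
    learning-rate schedule is required (illustrative simplification).
    """
    centroids = list(initial_centroids)
    counts = [0] * len(centroids)
    for x in points:
        # Competition: the nearest centroid wins the sample.
        winner = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        counts[winner] += 1
        # Update only the winner; it becomes the mean of all samples it has won.
        centroids[winner] += (x - centroids[winner]) / counts[winner]
    return centroids, counts

# Two well-separated 1-D groups presented in interleaved order.
points = [1.0, 9.0, 2.0, 10.0, 3.0, 11.0]
centroids, counts = online_centroids(points, initial_centroids=[0.0, 5.0])
```

On this toy input each centroid ends at the mean of its group (2.0 and 10.0). Note that the first sample a centroid wins replaces its initial value entirely, which is one reason such schemes are sensitive to presentation order.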
17.
Raffay Hamid Siddhartha Maddi Aaron Bobick Charles Isbell 《Artificial Intelligence》2009,173(14):1221-1244
Formalizing computational models for everyday human activities remains an open challenge. Many previous approaches towards this end assume prior knowledge about the structure of activities, which is used to learn explicitly defined models in a completely supervised manner. For a majority of everyday environments, however, the structure of the in situ activities is generally not known a priori. In this paper we investigate knowledge representations and manipulation techniques that facilitate learning of human activities in a minimally supervised manner. The key contribution of this work is the idea that global structural information of human activities can be encoded using a subset of their local event subsequences, and that this encoding is sufficient for activity-class discovery and classification. In particular, we investigate modeling activity sequences in terms of their constituent subsequences that we call event n-grams. Exploiting this representation, we propose a computational framework to automatically discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding characterizations of these discovered classes from a holistic as well as a by-parts perspective. Using such characterizations, we present a method to classify a new activity to one of the discovered activity-classes, and to automatically detect whether it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our approach in a variety of everyday environments.
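The event n-gram representation mentioned in the abstract can be sketched in Python as follows. The extraction step is a direct reading of "constituent subsequences"; the similarity score is only an assumption for illustration (a histogram overlap between n-gram counts), not the measure defined in the paper. The function names and event labels are hypothetical.

```python
from collections import Counter

def event_ngrams(events, n):
    """Count every contiguous length-n subsequence of an event sequence."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

def histogram_overlap(a, b):
    """Illustrative similarity in [0, 1]: shared n-gram mass over total mass.

    Counter '&' keeps the minimum count per n-gram and '|' the maximum,
    so this is a multiset version of the Jaccard index.
    """
    total = sum((a | b).values())
    return sum((a & b).values()) / total if total else 1.0

# Two hypothetical activities, each described as a sequence of discrete events.
activity_a = ["enter", "pick", "scan", "pack", "exit"]
activity_b = ["enter", "pick", "pack", "exit"]

bigrams_a = event_ngrams(activity_a, 2)
bigrams_b = event_ngrams(activity_b, 2)
similarity = histogram_overlap(bigrams_a, bigrams_b)
```

The two activities share the bigrams `(enter, pick)` and `(pack, exit)` out of five distinct bigrams overall. Pairwise scores of this kind could populate the edge weights of a fully connected activity graph, in which groups of mutually similar activities would then be sought.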
18.
Neural-network front ends in unsupervised learning (total citations: 1; self-citations: 0; citations by others: 1)
An idea of partial supervision is proposed, realized in the form of a neural-network front end to schemes of unsupervised learning (clustering). This neural network makes the induced feature space anisotropic. The anisotropy provides the local deformation of the space that is needed to represent labeled data properly and to enhance the efficiency of the clustering mechanisms applied afterwards. The network is trained on the available labeled patterns; a referential form of the labeling gives rise to reinforcement learning. It is shown that the discussed approach is universal and can be utilized in conjunction with any clustering method. Experimental studies concentrate on three main categories of unsupervised learning: FUZZY ISODATA, Kohonen self-organizing maps, and hierarchical clustering.
19.
Ling Chen Chuandong Li Tingwen Huang Yiran Chen Xin Wang 《Neural computing & applications》2014,25(2):393-400
This letter presents a new memristor crossbar array system and demonstrates its applications in image learning. Controlled-pulse and image-overlay techniques are introduced for programming the memristor crossbars and promise better noise-reduction performance. The time-slot technique helps improve image processing speed. Simulink and numerical simulations are employed to demonstrate useful applications of the proposed circuit structure in image learning.