Found 20 similar articles (search time: 15 ms)
1.
Matthew Chang 《Journal of Systems and Software》2009,82(6):1036-1045
In this paper, we report our experience on the use of phrases as basic features in the email classification problem. We performed extensive empirical evaluation using our large email collections and tested with three text classification algorithms, namely, a naive Bayes classifier and two k-NN classifiers using TF-IDF weighting and resemblance respectively. The investigation includes studies on the effect of phrase size, the size of local and global sampling, the neighbourhood size, and various methods to improve the classification accuracy. We determined suitable settings for various parameters of the classifiers and performed a comparison among the classifiers with their best settings. Our result shows that no classifier dominates the others in terms of classification accuracy. Also, we made a number of observations on the special characteristics of emails. In particular, we observed that public emails are easier to classify than private ones.
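A k-NN classifier over phrase features, as described in the abstract above, might look roughly like the sketch below. It assumes that "phrases" are word n-grams and that "resemblance" means the Jaccard overlap of two phrase sets; the abstract does not spell out either definition. The function names and the toy training data are hypothetical.

```python
from collections import Counter


def phrases(text, size=2):
    """Return the set of word n-grams ("phrases") of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def resemblance(a, b):
    """Jaccard overlap of two phrase sets (assumed definition; 0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_classify(text, labelled, k=3, size=2):
    """Label `text` by majority vote among its k most resembling examples."""
    query = phrases(text, size)
    ranked = sorted(
        labelled,
        key=lambda item: resemblance(query, phrases(item[0], size)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up training data for illustration only.
TRAIN = [
    ("project meeting moved to monday morning", "work"),
    ("please review the project meeting notes", "work"),
    ("family dinner on sunday evening", "personal"),
    ("see you at the family dinner", "personal"),
]
```

Calling `knn_classify("notes from the project meeting", TRAIN, k=1)` picks the label of the single most resembling training email. The phrase size and neighbourhood size `k` are the kind of parameters the paper reports tuning.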
2.
Qi Tian Ying Wu Thomas S. Huang 《Pattern recognition》2005,38(6):903-917
For learning-based tasks such as image classification, the feature dimension is usually very high. The learning is afflicted by the curse of dimensionality as the search space grows exponentially with the dimension. Discriminant-EM (DEM) proposed a framework by applying self-supervised learning in a discriminating subspace. This paper extends the linear DEM to a nonlinear kernel algorithm, Kernel DEM (KDEM), and evaluates KDEM extensively on benchmark image databases and synthetic data. Various comparisons with other state-of-the-art learning techniques are investigated for several tasks of image classification. Extensive results show the effectiveness of our approach.
3.
Behzad Zamani Ahmad Akbari Babak Nasersharif Azarakhsh Jalalvand 《Pattern recognition letters》2011,32(7):948-955
Feature extraction is an important component of pattern classification and speech recognition. Extracted features should discriminate classes from each other while being robust to environmental conditions such as noise. For this purpose, several feature transformations are proposed which can be divided into two main categories: data-dependent transformation and classifier-dependent transformation. The drawback of data-dependent transformation is that its optimization criteria are different from the measure of classification error which can potentially degrade the classifier’s performance. In this paper, we propose a framework to optimize data-dependent feature transformations such as PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis) and HLDA (Heteroscedastic LDA) using minimum classification error (MCE) as the main objective. The classifier itself is based on Hidden Markov Model (HMM). In our proposed HMM minimum classification error technique, the transformation matrices are modified to minimize the classification error for the mapped features, and the dimension of the feature vector is not changed. To evaluate the proposed methods, we conducted several experiments on the TIMIT phone recognition and the Aurora2 isolated word recognition tasks. The experimental results show that the proposed methods improve performance of PCA, LDA and HLDA transformation for mapping Mel-frequency cepstral coefficients (MFCC).
4.
This paper presents a document classifier based on text content features and its application to email classification. We test the validity of a classifier which uses Principal Component Analysis Document Reconstruction (PCADR), where the idea is that principal component analysis (PCA) can compress optimally only the kind of documents (in our experiments, email classes) that are used to compute the principal components (PCs), and that for other kinds of documents the compression will not perform well using only a few components. Thus, the classifier computes the PCA separately for each document class, and when a new instance arrives to be classified, it is projected onto each set of computed PCs corresponding to each class, and then reconstructed using the same PCs. The reconstruction error is computed and the classifier assigns the instance to the class with the smallest error or divergence from the class representation. We test this approach in email filtering by distinguishing between two message classes (e.g. spam from ham, or phishing from ham). The experiments show that PCADR is able to obtain very good results with the different validation datasets employed, reaching a better performance than the popular Support Vector Machine classifier.
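The reconstruction-error idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it approximates the principal components by power iteration with deflation (a swapped-in, dependency-free substitute for a proper eigen-decomposition), and the three-dimensional toy vectors stand in for real term-weight vectors. All function names and data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_components(rows, k, iters=200):
    """Return (mean, up to k principal directions) via power iteration with deflation."""
    n, dim = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, mu)] for row in rows]
    comps = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(dim)]  # fixed, non-degenerate start
        for _ in range(iters):
            # Apply X^T X to v without forming the covariance matrix.
            scores = [dot(row, v) for row in centred]
            w = [sum(s * row[j] for s, row in zip(scores, centred)) for j in range(dim)]
            # Deflate: remove the directions already found.
            for c in comps:
                proj = dot(w, c)
                w = [a - proj * b for a, b in zip(w, c)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                return mu, comps  # no variance left to explain
            v = [a / norm for a in w]
        comps.append(v)
    return mu, comps


def reconstruction_error(x, mu, comps):
    """Distance between x and its reconstruction from the class's components."""
    centred = [a - m for a, m in zip(x, mu)]
    recon = [0.0] * len(x)
    for c in comps:
        coeff = dot(centred, c)
        recon = [r + coeff * ci for r, ci in zip(recon, c)]
    return math.sqrt(sum((a - r) ** 2 for a, r in zip(centred, recon)))


def fit(classes, k=1):
    """Compute one PCA model per class label."""
    return {label: top_components(rows, k) for label, rows in classes.items()}


def classify(x, models):
    """Assign x to the class whose components reconstruct it with the smallest error."""
    return min(models, key=lambda label: reconstruction_error(x, *models[label]))


# Toy, made-up feature vectors for illustration only.
CLASSES = {
    "ham": [[1, 0, 0], [2, 0.1, 0], [3, -0.1, 0], [4, 0, 0.1]],
    "spam": [[0, 1, 0], [0.1, 2, 0], [-0.1, 3, 0], [0, 4, 0.1]],
}
MODELS = fit(CLASSES, k=1)
```

With these toy classes, `classify([2.5, 0, 0], MODELS)` compares how well each class's single component reconstructs the new vector and returns the label with the lower error.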
5.
Learning discriminative spatiotemporal features for precise crop classification from multi-temporal satellite images (total citations: 1, self-citations: 0, citations by others: 1)
《International journal of remote sensing》2012,33(8):3162-3174
Precise crop classification from multi-temporal remote sensing images has important applications such as yield estimation and food transportation planning. However, the mainstream convolutional neural networks based on 2D convolution collapse the time series information. In this study, a 3D fully convolutional neural network (FCN) embedded with a global pooling module and channel attention modules is proposed to extract discriminative spatiotemporal representations of different types of crops from multi-temporal high-resolution satellite images. Firstly, a novel 3D FCN structure is introduced to replace 2D FCNs as well as to improve current 3D convolutional neural networks (CNNs) by providing a means to learn distinctive spatiotemporal representations of each crop type from the reshaped multi-temporal images. Secondly, to strengthen the learning significance of the spatiotemporal representations, our approach includes 3D channel attention modules, which regulate the between-channel consistency of the features from the encoder and the decoder, and a 3D global pooling module, which selects the most distinctive features at the top of the encoder. Experiments were conducted using two data sets with different types of crops and time spans. Our results show that our method outperformed several mainstream 2D FCNs, as well as a recent 3D CNN designed for crop classification, in both accuracy and efficiency. The experimental data and source code are made openly available at http://study.rsgis.whu.edu.cn/pages/download/.
6.
《Journal of Network and Computer Applications》2012,35(2):770-777
Without imposing restrictions, many enterprises find nonwork-related contents consuming network resources. Business communication over emails thus incurs undesired delays and inflicts damages to businesses, explaining why many enterprises are concerned with the competition to use email services. Obviously, enterprises should prioritize business emails over personal ones in their email service. Therefore, previous works present content-based classification methods to categorize enterprise emails into business or personal correspondence. Accuracy of these methods is largely determined by their ability to survey as much information as possible. However, in addition to decreasing the performance of these methods, monitoring the details of email contents may violate privacy rights that are under legal protection, requiring a careful balance between accurately classifying enterprise emails and protecting privacy rights. The proposed email classification method is thus based on social features rather than a survey of email contents. Social-based metrics are also designed to characterize emails as social features; the obtained features are treated as an input of machine learning-based classifiers for email classification. Experimental results demonstrate the high accuracy of the proposed method in classifying emails. In contrast with other content-based methods that examine email contents, the emphasis on social features in the proposed method is a promising alternative for solving similar email classification problems.
7.
8.
9.
Customer complaint management is becoming a critical key success factor in today's business environment. This study introduces a methodology to improve complaint-handling strategies through an automatic email-classification system that distinguishes complaints from non-complaints. As such, complaint handling becomes less time-consuming and more successful. The classification system combines traditional text information with new information about the linguistic style of an email. The empirical results show that adding linguistic style information into a classification model with conventional text-classification variables results in a significant increase in predictive performance. In addition, this study reveals linguistic style differences between complaint emails and others.
10.
Shougang Ren Sheng Wan Xiangbo Shu Huangliang Xu 《International journal of remote sensing》2019,40(15):5812-5834
In hyperspectral image (HSI) processing, the inclusion of both spectral and spatial features, e.g. morphological features and shape features, has shown great success in classification of hyperspectral data. Nevertheless, there exist two main issues to address: (1) The multiple features are often treated equally and thus the complementary information among them is neglected. (2) The features are often degraded by a mixture of various kinds of noise, which decreases the classification accuracy. In order to address these issues, a novel robust discriminative multiple features extraction (RDMFE) method for HSI classification is proposed. The proposed RDMFE aims to project the multiple features into a common low-rank subspace, where the specific contributions of different types of features are sufficiently exploited. With a low-rank constraint, RDMFE is able to uncover the intrinsic low-dimensional subspace structure of the original data. In order to make the projected features more discriminative, we make the learned representations optimal for classification. With intrinsic information preserving and discrimination capabilities, the learned projection matrix works well in HSI classification tasks. Experimental results on three real hyperspectral datasets confirm the effectiveness of the proposed method.
11.
Effective and efficient texture feature extraction and classification is an important problem in image understanding and recognition. Recently, texton learning based texture classification approaches have been widely studied, where the textons are usually learned via K-means clustering or sparse coding methods. However, the K-means clustering is too coarse to characterize the complex feature space of textures, while sparse texton learning/encoding is time-consuming due to the l0-norm or l1-norm minimization. Moreover, these methods mostly compute the texton histogram as the statistical features for classification, which may not be effective enough. This paper presents an effective and efficient texton learning and encoding scheme for texture classification. First, a regularized least square based texton learning method is developed to learn the dictionary of textons class by class. Second, a fast two-step l2-norm texton encoding method is proposed to code the input texture feature over the concatenated dictionary of all classes. Third, two types of histogram features are defined and computed from the texton encoding outputs: coding coefficients and coding residuals. Finally, the two histogram features are combined for classification via a nearest subspace classifier. Experimental results on the CUReT, KTH_TIPS and UIUC datasets demonstrated that the proposed method is very promising, especially when the number of available training samples is limited.
12.
Machine hearing is an emerging research field that is analogous to machine vision in that it aims to equip computers with the ability to hear and recognise a variety of sounds. It is a key enabler of natural human-computer speech interfacing, as well as in areas such as automated security surveillance, environmental monitoring, and smart homes/buildings/cities. Recent advances in machine learning allow current systems to accurately recognise a diverse range of sounds under controlled conditions. However, doing so in real-world noisy conditions remains a challenging task. Several front-end feature extraction methods have been used for machine hearing, employing speech recognition features like MFCC and PLP, as well as image-like features such as AIM and SIF. The best choice of feature is found to be dependent upon the noise environment and machine learning techniques used. Machine learning methods such as deep neural networks have been shown capable of inferring discriminative classification rules from less structured front-end features in related domains. In the machine hearing field, spectrogram image features have recently shown good performance for noise-corrupted classification using deep neural networks. However, there are many methods of extracting features from spectrograms. This paper explores a novel data-driven feature extraction method that uses variance-based criteria to define spectral pooling of features from spectrograms. The proposed method, based on maximising the pooled spectral variance of foreground and background sound models, is shown to achieve very good performance for robust classification.
13.
14.
15.
Chengjun Liu 《Pattern recognition letters》2011,32(14):1796-1804
This paper presents a discriminative color features (DCF) method, which applies a simple yet effective color model, a novel similarity measure, and effective color feature extraction methods, for improving face recognition performance. First, the new color model is constructed according to the principle of Ockham’s razor from a number of available models that take advantage of the subtraction of the primary colors for boosting pattern recognition performance. Second, the novel similarity measure integrates both the angular and the distance information for improving upon the broadly applied similarity measures. Finally, the discriminative color features are extracted from a compact color image representation by means of discriminant analysis with enhanced generalization capabilities. Experiments on the Face Recognition Grand Challenge (FRGC) version 2 Experiment 4, which contains 12,776 training images, 16,028 controlled target images, and 8,014 uncontrolled query images, show the feasibility of the proposed method.
16.
Context-based email classification requires understanding of the semantic and structural attributes of email. Most of the research has focused on generating semantic properties through structural components of email. By viewing emails as events (a major subset of the class of email), a rich contextual test-bed representation for understanding the semantic attributes of emails has been devised. Event-based emails have traditionally been studied based on simple structural properties. In this paper, we present a novel approach by first representing this class of emails as graphs, followed by heuristically applying a graph mining and matching algorithm to pick templates representing contextual and semantic attributes that help classify emails. The classification templates used three key event classes: social, personal and professional. Results show that our template-based approach, supported by graph mining and matching, performs consistently well on an event email data set with high accuracy.
17.
Cai Huanhuan Huang Lei Zhang Wenfeng Wei Zhiqiang 《Multimedia Tools and Applications》2022,81(2):1787-1809
Multimedia Tools and Applications - We focus on the one-example person re-identification (Re-ID) task, where each identity has only one labeled example along with many unlabeled examples. Since...
18.
Email is one of the most ubiquitous and pervasive applications, used daily by millions of people worldwide; individuals and organizations increasingly rely on email to communicate and share information and knowledge. However, the increase in email users has resulted in a dramatic increase in spam emails during the past few years. Processing and managing emails efficiently is becoming a big challenge for individuals and organizations. This paper proposes new email classification models using a linear neural network trained by the perceptron learning algorithm and a nonlinear neural network trained by the back-propagation learning algorithm. An efficient semantic feature space (SFS) method is introduced in these classification models. The traditional back-propagation neural network (BPNN) learns slowly and is prone to becoming trapped in a local minimum, so a modified back-propagation neural network (MBPNN) is presented to overcome these limitations. The vector space model based email classification system suffers from a large number of features and ambiguity in the meaning of terms, which leads to a sparse and noisy feature space. We therefore use the SFS to convert the original sparse and noisy feature space to a semantically richer feature space, which helps to accelerate learning. The experiments are conducted with different training set sizes and extracted feature sizes. Experimental results show that the models using MBPNN outperform the traditional BPNN, and that the use of SFS can greatly reduce the feature dimensionality and improve email classification performance.
19.
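Existing view metrics cannot describe the global features and the local detail features of a 3D model at the same time, so it is difficult to obtain an ideal best view. We propose a best-view extraction algorithm that combines statistical classification with the edge detail features of views. First, Adaboost is used for example-based learning, and a set of candidate views is obtained from the geometric feature similarity among best views. Then, an edge distribution entropy is defined to analyze the local features of the candidate views and extract the best view, so that the extracted best view effectively describes both the structural features and the intrinsic detail features of the 3D model and agrees with human visual perception. Finally, the algorithm is analyzed statistically on a 3D model database. Experimental results show that the proposed algorithm outperforms similar best-view algorithms.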
20.
Multimedia Systems - The bag-of-words (BOW) based methods are widely used in image classification. However, a huge amount of visual information is inevitably omitted in the quantization step of the...