Similar literature
 20 similar articles found
1.
In this paper, genetic algorithm oriented latent semantic features (GALSF) are proposed to obtain a better representation of documents in text classification. The proposed approach consists of a feature selection stage and a feature transformation stage. The first stage is carried out using state-of-the-art filter-based methods. The second stage employs latent semantic indexing (LSI) empowered by a genetic algorithm, so that a better projection is attained using appropriate singular vectors, which, unlike in the standard LSI approach, are not limited to those corresponding to the largest singular values. In this way, singular vectors with small singular values may be used for projection, and vectors with large singular values may be eliminated, to obtain better discrimination. Experimental results demonstrate that GALSF outperforms both LSI and filter-based feature selection methods on benchmark datasets for various feature dimensions.

2.
Kristof  Dirk   《Decision Support Systems》2008,44(4):870-882
Customer complaint management is becoming a critical success factor in today's business environment. This study introduces a methodology to improve complaint-handling strategies through an automatic email-classification system that distinguishes complaints from non-complaints. As a result, complaint handling becomes less time-consuming and more successful. The classification system combines traditional text information with new information about the linguistic style of an email. The empirical results show that adding linguistic style information to a classification model with conventional text-classification variables results in a significant increase in predictive performance. In addition, this study reveals linguistic style differences between complaint emails and others.

3.
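When text classification uses the vector space model (VSM) to represent text features, the feature vectors tend to be high-dimensional and sparse. To simplify the original text feature vectors effectively, a dimensionality reduction method based on independent component analysis (ICA) optimized by particle swarm optimization (PSO) is proposed and applied to text classification. In this algorithm, negentropy serves as the fitness function of the PSO algorithm, and the separating matrix is updated adaptively, with its Gaussianity principle used as the criterion for judging independence. Experimental results show that, compared with traditional feature dimensionality reduction methods, the proposed method resolves the difficulty of reducing high-dimensional text feature vectors and significantly improves the efficiency and accuracy of text classification.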

4.
Texture-based image analysis techniques have been widely employed in the interpretation of earth cover images obtained using remote sensing techniques, seismic trace images, and medical images, and in query by content in large image databases. Developments in multi-resolution analysis, such as the wavelet transform, have produced adequate tools to characterize different scales of textures effectively. However, the wavelet transform lacks the ability to decompose an input image into multiple orientations, which limits its application to rotation invariant image analysis. This paper presents a new approach for rotation invariant texture classification using Gabor wavelets. Gabor wavelets are a mathematical model of the visual cortical cells of the mammalian brain, and with them an image can be decomposed into multiple scales and multiple orientations. The Gabor function has been recognized as a very useful tool in texture analysis, due to its optimal localization properties in both the spatial and frequency domains, and has found widespread use in computer vision. Texture features are found by calculating the mean and variance of the Gabor filtered image. Rotation normalization is achieved by a circular shift of the feature elements, so that all images have the same dominant direction. The texture similarity between the query image and a target image in the database is computed by the minimum distance criterion.
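The rotation normalization and minimum distance matching described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single scale with one feature value per orientation, takes the largest value as the dominant direction, and uses Euclidean distance. The function names `rotation_normalize` and `min_distance_match` are hypothetical.

```python
def rotation_normalize(features):
    """Circularly shift per-orientation features so the dominant one comes first.

    Assumption: the dominant direction is the orientation with the largest value.
    """
    if not features:
        return []
    dominant = max(range(len(features)), key=lambda i: features[i])
    return features[dominant:] + features[:dominant]


def min_distance_match(query, database):
    """Return the key of the database entry closest to the query.

    Both vectors are rotation-normalized first; the distance is Euclidean
    (an assumption, since the abstract only says "minimum distance").
    """
    q = rotation_normalize(query)

    def distance(item):
        target = rotation_normalize(item[1])
        return sum((a - b) ** 2 for a, b in zip(q, target)) ** 0.5

    return min(database.items(), key=distance)[0]
```

Under these assumptions, rotating the input image shifts the orientation features circularly, so two rotated copies of the same texture map to the same normalized vector.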

5.
Classifying walking patterns helps in diagnosing health status, tracking disease progression, and assessing the effect of interventions. In this paper, we build on previous research on human gait to extract a meaningful set of parameters that allow us to design a highly interpretable system capable of identifying different gait styles with linguistic fuzzy if-then rules. The model easily discriminates among five different walking patterns, namely: normal walk, on tiptoes, dragging left limb, dragging right limb, and dragging both limbs. We carried out a complete set of experiments to test how well the extracted parameters classify these five gait styles.

6.
A methodology is described for classifying noisy fingerprints directly from raw unprocessed images. The directional properties of fingerprints are exploited as input features by computing one-dimensional fast Fourier transform (FFT) of the images over some selected bands in four and eight directions. The ability of the multilayer perceptron (MLP) for generating complex boundaries is utilised for the purpose of classification. The superiority of the method over some existing ones is established for fingerprints corrupted with various types of distortions, especially random noise.

7.
Knowledge and Information Systems - Text classification results can be hindered when just the bag-of-words model is used for representing features, because it ignores word order and senses, which...

8.
Supervised text classifiers need to learn from many labeled examples to achieve high accuracy. However, in a real context, sufficient labeled examples are not always available because human labeling is enormously time-consuming. For this reason, there has been recent interest in methods that can obtain high accuracy when the training set is small. In this paper we introduce a new single-label text classification method that performs better than baseline methods when the number of labeled examples is small. Unlike most existing methods, which usually use a vector of features composed of weighted words, the proposed approach uses a structured vector of features composed of weighted pairs of words. The proposed vector of features is automatically learned, given a set of documents, using a global method for term extraction based on Latent Dirichlet Allocation implemented as the Probabilistic Topic Model. Experiments performed using a small percentage of the original training set (about 1%) confirmed our theories.

9.
A graph-based approach to document classification is described in this paper. The graph representation allows for a much more expressive document encoding than the more standard bag of words/phrases approach, and consequently gives improved classification accuracy. Document sets are represented as graph sets to which a weighted graph mining algorithm is applied to extract frequent subgraphs, which are then further processed to produce feature vectors (one per document) for classification. Weighted subgraph mining is used to ensure classification effectiveness and computational efficiency; only the most significant subgraphs are extracted. The approach is validated and evaluated using several popular classification algorithms together with a real-world textual data set. The results demonstrate that the approach can outperform existing text classification algorithms on some datasets. As the size of the dataset increases, further processing of the extracted frequent features becomes essential.

10.
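Text classification method based on N-gram language models   Total citations: 6, self-citations: 0, citations by others: 6
Classification has been a research hotspot in natural language processing in recent years. After analyzing traditional classification models, this paper proposes using an N-gram language model as a Chinese text classification model. Instead of representing a document with the traditional "bag of words" method, the model treats a document as a random observation sequence of words. Based on this approach, a word-based bigram language model classifier is designed and implemented. An experimental comparison of the N-gram language model with traditional classification models (the vector space model and the Naive Bayes model) shows that the N-gram classifier achieves better classification performance.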
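A word-based bigram classifier of the kind this abstract describes can be sketched as below. This is an illustrative sketch only: the add-one smoothing, the uniform class prior, the `<s>` start marker, and the class name `BigramClassifier` are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


class BigramClassifier:
    """One bigram language model per class; classify by highest log-likelihood."""

    def __init__(self):
        self.bigrams = {}   # label -> Counter of (previous word, word)
        self.contexts = {}  # label -> Counter of previous word
        self.vocab = set()

    def train(self, label, tokens):
        seq = ["<s>"] + list(tokens)  # "<s>" marks the start of a document
        bigrams = self.bigrams.setdefault(label, Counter())
        contexts = self.contexts.setdefault(label, Counter())
        for prev, word in zip(seq, seq[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
        self.vocab.update(tokens)

    def log_prob(self, label, tokens):
        seq = ["<s>"] + list(tokens)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        bigrams, contexts = self.bigrams[label], self.contexts[label]
        # Add-one smoothing (an assumption, not taken from the paper).
        return sum(
            math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + v))
            for prev, word in zip(seq, seq[1:])
        )

    def classify(self, tokens):
        # Uniform class prior assumed, so only the likelihoods are compared.
        return max(self.bigrams, key=lambda label: self.log_prob(label, tokens))
```

For Chinese text, the token lists would come from a word segmenter, which is outside the scope of this sketch.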

11.
尹春勇  何苗 《计算机应用》2020,40(9):2525-2530
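To address the problems that the pooling operation in convolutional neural networks (CNN) loses part of the feature information and that the capsule network (CapsNet) has low classification accuracy, an improved CapsNet model is proposed. First, two convolutional layers extract local features from the feature information. Then, CapsNet extracts the overall features of the text. Finally, a softmax classifier performs the classification. In text classification, the proposed model improves classification accuracy by 3.42 percentage points over CNN and by 2.14 percentage points over CapsNet. The experimental results show that the improved CapsNet model is better suited to text classification.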

12.
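尹春勇  何苗 《计算机应用》2020,40(9):2525-2530
To address the problems that the pooling operation in convolutional neural networks (CNN) loses part of the feature information and that the capsule network (CapsNet) has low classification accuracy, an improved CapsNet model is proposed. First, two convolutional layers extract local features from the feature information. Then, CapsNet extracts the overall features of the text. Finally, a softmax classifier performs the classification. In text classification, the proposed model improves classification accuracy by 3.42 percentage points over CNN and by 2.14 percentage points over CapsNet. The experimental results show that the improved CapsNet model is better suited to text classification.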

13.
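From the perspective of information theory, a new text classification model is proposed. The model uses the information that a text provides about the categories as the basis for classification, approaching the text classification problem from a different angle. From a practical point of view, the model is related to the traditional naive Bayes model and to the KL-distance-based centroid vector method, and a proof of this relationship is given. Based on the basic concepts of generalized information theory, the model is then extended and the concept of feature weight is introduced. The text classification model can be corrected by adjusting the feature weights, which provides a theoretical basis for solving the model correction problem.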

14.
Pattern Analysis and Applications - Several methods have been proposed for determining plagiarism between pairs of sentences, passages or even full documents. However, the majority of these methods...

15.
Web 2.0 and social media give users an opportunity to discuss and share opinions. As a result, a considerable amount of information emerges that can be drawn upon to determine some demographic and behavioral features. This study is an attempt to predict gender, as a demographic feature, using linguistic features of data collected from users' comments on social media. For this purpose, a framework is proposed to predict a user's gender by counting the occurrences of given words, including verbs, pronouns, articles, adjectives, adverbs, prepositions and numbers. The framework was then tested using comments left by readers of the Los Angeles Times, and the model was observed to predict gender with an accuracy of 66.66%. Security solutions and e-marketing can use this framework for authentication and niche marketing, respectively.

16.
Medical thermography has proved to be useful in various medical applications including the detection of breast cancer where it is able to identify the local temperature increase caused by the high metabolic activity of cancer cells. It has been shown to be particularly well suited for picking up tumours in their early stages or tumours in dense tissue and outperforms other modalities such as mammography for these cases. In this paper we perform breast cancer analysis based on thermography, using a series of statistical features extracted from the thermograms quantifying the bilateral differences between left and right breast areas, coupled with a fuzzy rule-based classification system for diagnosis. Experimental results on a large dataset of nearly 150 cases confirm the efficacy of our approach that provides a classification accuracy of about 80%.

17.
Text classification algorithm based on hidden Markov models   Total citations: 2, self-citations: 0, citations by others: 2
杨健  汪海航 《计算机应用》2010,30(9):2348-2350
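Several mature classification algorithms have emerged in automatic text classification in recent years, but they rely mainly on probabilistic and statistical models and are not connected to the syntax and semantics of the text itself. An algorithm that applies the hidden Markov model (HMM) for sequence analysis to automatic text classification is proposed. First, a set of feature words representing each document category is constructed, and the feature-word sequence of a document category serves as the observation sequence of the corresponding HMM classifier, while the HMM state transition sequence implicitly represents how the content of documents in each category forms and evolves. At classification time, the class label of the HMM classifier with the highest generation probability is the classification result for the test document. The classifier model built by this algorithm reflects, to some extent, the syntactic and semantic features of documents in different categories, supports multi-class automatic text classification, and classifies efficiently.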
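The decision rule in this abstract, picking the class whose HMM gives the test sequence the highest generation probability, can be sketched with the standard forward algorithm. This is a minimal sketch under stated assumptions: each class model is a hypothetical `(start, trans, emit)` triple of plain dictionaries, the parameters are taken as already trained, and feature-word extraction is not shown.

```python
def forward_probability(obs, start, trans, emit):
    """Probability that an HMM generates the observation sequence obs.

    start[s]    : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][w]  : probability that state s emits feature word w
    """
    if not obs:
        return 1.0
    states = list(start)
    # Initialization: start in each state and emit the first word.
    alpha = {s: start[s] * emit[s].get(obs[0], 0.0) for s in states}
    # Induction: extend every partial path by one observation.
    for word in obs[1:]:
        alpha = {
            s: emit[s].get(word, 0.0) * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


def classify(obs, models):
    """Return the label of the model with the highest generation probability."""
    return max(models, key=lambda label: forward_probability(obs, *models[label]))
```

For long documents the probabilities underflow quickly, so a practical version would work in log space or rescale `alpha` at each step.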

18.
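A new text classification model based on extended Petri nets is proposed. The basic idea is to extend the Petri net system with the qualitative mapping method and to reason with the state equation, so that text classification comes closer to the process of human judgment. A classification algorithm is also given.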

19.
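To address the problem that real-world text classification usually has only a small number of labeled samples, which hurts classification accuracy, a classification algorithm based on the probabilistic topic model latent Dirichlet allocation (LDA) is proposed. Each document is represented as a vector of term weights using the standard term frequency-inverse document frequency (TF-IDF) function. A probabilistic topic model is used in preprocessing to simplify the documents and extract terms from them. The latent Dirichlet allocation model is then used for relational learning, and a graph-based classifier is built to perform the classification. Classification experiments on the public Reuters-21578 collection evaluate the effectiveness of the method: compared with the support vector machine, which classifies well, the method achieves higher classification accuracy in most cases.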
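The first step of this pipeline, representing each document as a vector of TF-IDF term weights, can be sketched as below. The abstract does not say which TF-IDF variant is used, so this sketch assumes relative term frequency multiplied by log(N / df); the function name `tfidf_vectors` is hypothetical, and the LDA and graph-based classifier stages are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a {term: weight} dictionary.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {term: (count / len(doc)) * math.log(n / df[term]) for term, count in tf.items()}
        )
    return vectors
```

With this weighting, a term that appears in every document gets weight zero, so it contributes nothing to the comparison between documents.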

20.
Pattern Analysis and Applications - In this paper, we propose a novel framework for text classification based on subspace-based methods. Recent studies showed the advantages of modeling texts as...
