Similar Literature
1.
Email has become one of the fastest and most economical forms of communication, and it is one of the most ubiquitous applications, used daily by millions of people worldwide. However, the growth in email users has been accompanied by a dramatic increase in spam over the past few years. This paper proposes a new spam filtering system based on a revised back-propagation (RBP) neural network and automatic thesaurus construction. The conventional back-propagation (BP) neural network learns slowly and is prone to becoming trapped in local minima, which degrades both performance and efficiency; the RBP neural network is presented to overcome these limitations. A well-constructed thesaurus is recognized as a valuable tool for effective text classification, and it can also overcome a weakness of keyword-based spam filters, which ignore the relationships between words. Experiments on the Ling-Spam corpus show that the proposed system achieves higher performance, especially when the RBP neural network is combined with automatic thesaurus construction.
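The abstract does not say how the thesaurus is constructed automatically. One common approach links two terms when their document co-occurrence profiles are similar. The sketch below illustrates only that assumption. The cosine threshold of 0.8 and the helper names are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from math import sqrt

def term_vectors(docs):
    """Map each term to a sparse {doc_index: count} vector."""
    vecs = defaultdict(lambda: defaultdict(int))
    for i, doc in enumerate(docs):
        for term in doc:
            vecs[term][i] += 1
    return vecs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_thesaurus(docs, threshold=0.8):
    """Relate terms whose co-occurrence profiles have cosine >= threshold (hypothetical cut-off)."""
    vecs = term_vectors(docs)
    terms = sorted(vecs)
    thesaurus = {t: set() for t in terms}
    for i, t in enumerate(terms):
        for u in terms[i + 1:]:
            if cosine(vecs[t], vecs[u]) >= threshold:
                thesaurus[t].add(u)
                thesaurus[u].add(t)
    return thesaurus
```

A filter could then treat related terms as one feature, so a message containing "prize" also matches rules learned for "free".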

2.
In this paper, a corpus-based thesaurus and WordNet are used to improve text categorization performance. The k-NN algorithm and back-propagation neural network (BPNN) algorithms are employed as classifiers. k-NN is a simple, well-known approach to categorization, and BPNNs have been widely used in categorization and pattern recognition. However, the standard BPNN has some generally acknowledged limitations: it trains slowly and can easily become trapped in a local minimum. To alleviate these problems, two modified versions, the Morbidity neurons Rectified BPNN (MRBP) and the Learning Phase Evaluation BPNN (LPEBP), are considered and applied to text categorization. Experiments on the standard Reuters-21578 data set and the 20 Newsgroups data set show that the proposed methods achieve high categorization effectiveness as measured by precision, recall, and F-measure.
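The following sketch shows the k-NN classifier mentioned above as a reference point. It represents documents as raw term-count vectors, scores each training document by cosine similarity, and takes an unweighted majority vote among the k nearest. The toy data, k = 3, and plain voting are illustrative assumptions. The paper's thesaurus- and WordNet-based feature enrichment is not reproduced.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(train, doc, k=3):
    """train: list of (tokens, label); doc: tokens. Majority vote among the k most similar documents."""
    query = Counter(doc)
    scored = sorted(((cosine(Counter(toks), query), label) for toks, label in train), reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

train = [
    (["stock", "market", "shares"], "finance"),
    (["bank", "interest", "rates"], "finance"),
    (["shares", "bank"], "finance"),
    (["football", "goal", "match"], "sport"),
    (["match", "team", "win"], "sport"),
]
```

Weighting each vote by its similarity score is a common variant when k is larger.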

3.
This paper proposes a new text categorization model that combines a modified back-propagation neural network (MBPNN) with latent semantic analysis (LSA). The traditional back-propagation neural network (BPNN) trains slowly and easily becomes trapped in a local minimum, which leads to poor performance and efficiency. The MBPNN is proposed to accelerate BPNN training and improve categorization accuracy. LSA overcomes the problems of matching individual words by using statistically derived conceptual indices instead. It constructs a conceptual vector space in which each term or document is represented as a vector; this not only greatly reduces the dimensionality but also uncovers important associative relationships between terms. The model is tested on the 20 Newsgroups and Reuters-21578 corpora. Experimental results show that the MBPNN is much faster than the traditional BPNN and also improves on its performance, and that applying LSA yields dramatic dimensionality reduction while still achieving good classification results.
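The LSA step can be sketched as a truncated SVD of a term-by-document matrix A. This sketch assumes NumPy is available. It uses raw counts rather than any particular term weighting, and `lsa` and `fold_in` are hypothetical helper names. Document coordinates are computed as S_k V_k^T, which equals U_k^T A. A new document or query q is therefore projected into the same k-dimensional space as U_k^T q.

```python
import numpy as np

def lsa(term_doc, k):
    """Truncated SVD of a (terms x documents) matrix.
    Returns U_k, the top-k singular values, and the k-dim document vectors (one column per document)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    doc_vectors = np.diag(s_k) @ Vt_k  # identical to U_k.T @ term_doc
    return U_k, s_k, doc_vectors

def fold_in(U_k, term_counts):
    """Project a new term-count vector into the same concept space as doc_vectors."""
    return U_k.T @ term_counts

# 5 terms x 4 documents (hypothetical counts)
A = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 2],
              [0, 0, 0, 1]], dtype=float)
U_k, s_k, docs_2d = lsa(A, 2)
```

The choice of k controls how aggressively the dimensionality is reduced. Classifiers such as the MBPNN would then take the k-dimensional vectors as input instead of the raw term counts.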

4.
This paper proposes a new improved back-propagation neural network, together with an efficient method for reducing dimensionality and improving performance. The traditional back-propagation neural network (BPNN) learns slowly and easily becomes trapped in a local minimum, which leads to poor performance and efficiency. The learning phase evaluation back-propagation neural network (LPEBP) is proposed to improve on the traditional BPNN. A singular value decomposition (SVD) technique is adopted to reduce the dimensionality and to construct latent semantic relationships between terms. Experimental results show that the LPEBP is much faster than the traditional BPNN and also improves on its performance. The SVD technique not only greatly reduces the high dimensionality but also improves performance, so it further increases both the precision and the efficiency of the document classification system.

5.
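Text classification automatically assigns a set of predefined categories or topics to a document, and the way documents are represented strongly affects the learner's performance. To classify Kazakh text, the authors designed and implemented a stemmer for Kazakh based on Kazakh grammar rules, which handles the preprocessing of Kazakh text. A sample-distance formula based on the nearest support vector machine is proposed. It avoids having to choose the parameter k, and Kazakh text classification is achieved with a special combination of the SVM and KNN classification algorithms (SV-NN). Text classification simulation experiments on a self-built Kazakh text corpus demonstrate the effectiveness of the proposed algorithm and confirm the theoretical results.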

6.
The slow convergence of the back-propagation neural network (BPNN) has become a challenge in data mining and knowledge discovery applications. It stems from the drawbacks of the gradient descent (GD) optimization method widely adopted in BPNN learning. To address this, standard optimization techniques such as the conjugate gradient and Newton methods have been proposed to improve the convergence rate of the BP learning algorithm. This paper presents a heuristic method that adds an adaptive smoothing momentum term to the original BP learning algorithm to speed up convergence. In the improved algorithm, an adaptive smoothing technique automatically adjusts the momentum in the weight-update formula according to the "3σ limits theory." With the adaptive smoothing momentum term, the improved algorithm trains and converges faster, and generalizes better, than the standard BP learning algorithm. To verify its effectiveness, three typical foreign exchange rates, the British pound (GBP), the euro (EUR), and the Japanese yen (JPY), are chosen as forecasting targets for illustration. Results from homogeneous algorithm comparisons show that the proposed algorithm outperforms other comparable BP algorithms in both performance and convergence rate. Results from heterogeneous model comparisons also confirm its effectiveness.
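The paper adapts the momentum coefficient automatically using its 3σ rule, but the abstract does not give that rule. The sketch below therefore uses a fixed momentum coefficient (standard heavy-ball momentum) to show what the momentum term in the weight update does. The quadratic objective, learning rate 0.1, and momentum 0.9 are hypothetical choices.

```python
def gd_momentum(grad, w0, lr=0.1, momentum=0.9, steps=200):
    """Gradient descent with a fixed heavy-ball momentum term:
    v <- momentum * v - lr * grad(w);  w <- w + v."""
    w = list(w0)
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w

def quad_grad(w):
    """Gradient of the ill-conditioned quadratic f(w) = 0.5 * (1 * w0**2 + 20 * w1**2), minimum at (0, 0)."""
    return [1.0 * w[0], 20.0 * w[1]]
```

In this toy setting, plain gradient descent (`momentum=0.0`) with lr = 0.1 oscillates indefinitely along the stiff w1 direction. With momentum 0.9, both coordinates converge toward zero.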

7.
Practical information retrieval faces several problems: user queries that do not express the information need accurately, mismatches between documents and queries, and query optimization. To address them, we propose a query expansion method based on a conceptual semantic space built with deep learning. This hybrid technique combines deep learning with pseudo-relevance feedback. Deep learning and the semantic network WordNet are used to construct a query concept tree at the level of the conceptual semantic space. The pseudo-relevance feedback documents are processed with an observation window, and the co-occurrence weights of words are computed with average mutual information to obtain the final set of expansion terms. Experimental results show that this expansion algorithm performs better than the traditional pseudo-relevance feedback algorithm.
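The co-occurrence weighting step can be sketched as follows. This assumes that "average mutual information" means the expected mutual information between the presence of two terms, measured over sliding observation windows of the feedback documents. The window size of 5 tokens is a hypothetical choice. The concept-tree construction with deep learning and WordNet is not shown.

```python
from math import log

def windows(tokens, size):
    """Sliding observation windows of `size` tokens (the whole document if it is shorter)."""
    if len(tokens) <= size:
        return [set(tokens)]
    return [set(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

def average_mutual_information(docs, a, b, size=5):
    """Expected mutual information between 'a present' and 'b present' over all windows."""
    wins = [w for doc in docs for w in windows(doc, size)]
    n = len(wins)
    counts = {(x, y): 0 for x in (True, False) for y in (True, False)}
    for w in wins:
        counts[(a in w, b in w)] += 1
    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # 0 * log(0) is taken as 0
        p_xy = c / n
        p_x = sum(v for (xx, _), v in counts.items() if xx == x) / n
        p_y = sum(v for (_, yy), v in counts.items() if yy == y) / n
        mi += p_xy * log(p_xy / (p_x * p_y))
    return mi
```

This measure captures dependence in either direction. Two terms that never share a window can therefore score as high as two that always do, so an expansion step would typically also require positive co-occurrence before accepting a candidate term.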

8.
In this paper, a novel control scheme based on the back-propagation neural network (BPNN) is proposed to handle process uncertainties, in the form of disturbance loads and modelling errors, as well as time-varying process parameters. A BPNN predictive controller that replaces the entire Smith predictor structure is first trained offline. Lyapunov's direct method is used to prove that convergence of this BPNN is guaranteed by selecting a suitable learning rate. However, the Smith-predictor-based BPNN control relies on offline training, which is time-consuming and requires known plant inputs from the controller. The desired control input to the process is difficult to obtain for training the network, so suitable training data (target control inputs and outputs) can hardly be provided. To overcome this problem, a BPNN with an online training algorithm is introduced for the control of a first-order plus dead time (FOPDT) process. Stability analysis using the Lyapunov criterion demonstrates the network's convergence. Simulation results show that the proposed online-trained neural Smith predictor controller provides excellent robustness to process modelling errors and disturbance loads, and high adaptability to time-varying process parameters.
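The FOPDT plant that the controller targets, G(s) = K e^(-θs) / (τs + 1), can be simulated to generate input/output data. The sketch below uses a forward-Euler discretization of τ dy/dt = -y(t) + K u(t - θ). The parameters are hypothetical: gain K = 2, time constant τ = 5 s, dead time θ = 1 s, and step dt = 0.1 s. It models only the plant, not the neural Smith predictor.

```python
def simulate_fopdt(u, K=2.0, tau=5.0, theta=1.0, dt=0.1):
    """Forward-Euler simulation of tau * dy/dt = -y + K * u(t - theta).
    u: input samples spaced dt seconds apart. Returns output samples y[k] after each step."""
    delay = int(round(theta / dt))  # dead time expressed in samples
    y, out = 0.0, []
    for k in range(len(u)):
        u_delayed = u[k - delay] if k >= delay else 0.0
        y += dt / tau * (-y + K * u_delayed)
        out.append(y)
    return out

step_response = simulate_fopdt([1.0] * 600)  # 60 s unit step
```

For a unit step, the output stays at zero for the dead time and then rises toward K. A smaller dt brings the Euler approximation closer to the continuous response.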

9.
In this paper, we study the problem of adding a large number of new words to a Chinese thesaurus according to their definitions in a Chinese dictionary, while minimizing the effort of hand tagging. We first use a supervised learning technique to learn a set of defining formats for each class in the thesaurus; these formats characterize the regularities in the definitions of the words in that class. We then use traditional graph-theoretic techniques to derive a minimal subset of the new words that satisfies the following condition: if the words in this subset are added to the thesaurus by hand, the remaining new words can be added automatically by matching their definitions against the defining formats of each class. The method uses little, if any, language-specific or thesaurus-specific knowledge, and it can be applied to thesauri of other languages. This revised version was published online in July 2006 with corrections to the Cover Date.

10.
This paper studies parallel training of an improved neural network for text categorization. With the explosive growth in the amount of digital information available on the Internet, text categorization has become increasingly important, especially now that millions of mobile devices connect to the Internet. The improved back-propagation neural network (IBPNN) is an efficient approach to classification problems that overcomes the limitations of the traditional BPNN. In this paper, parallel computing is used to speed up IBPNN training. The parallel IBPNN algorithm for text categorization is implemented on a Sun Cluster with 34 nodes (processors), and its communication time and speedup are studied as the number of nodes varies. Experiments on various data sets show that the parallel IBPNN, combined with the SVD technique, achieves fast computation and high text categorization accuracy.
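The abstract does not describe how training is split across nodes. A common data-parallel scheme gives each node a shard of the training examples. Each node computes a local gradient, and the local gradients are combined, weighted by shard size, so that the result equals the full-batch gradient. The sketch below simulates the nodes sequentially on a linear model with squared error, purely as an assumption for illustration. On a real cluster, the per-shard gradients would be summed with a collective operation such as MPI's allreduce.

```python
def mse_grad(w, b, data):
    """Gradient of the mean squared error of y ~ w*x + b over (x, y) pairs."""
    n = len(data)
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    return gw, gb

def parallel_grad(w, b, data, num_workers):
    """Split data into shards (one per simulated node), compute each shard's gradient,
    and combine them weighted by shard size so the result equals the full-batch gradient."""
    shards = [s for s in (data[i::num_workers] for i in range(num_workers)) if s]
    total = sum(len(s) for s in shards)
    gw = gb = 0.0
    for s in shards:
        sw, sb = mse_grad(w, b, s)
        gw += sw * len(s) / total
        gb += sb * len(s) / total
    return gw, gb
```

Because the combined gradient is exact, speedup comes only from computing the shards concurrently. Communication time grows with the number of nodes, which matches the trade-off the paper measures.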

11.
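The resource allocating network (RAN) algorithm has two problems: its hidden-layer nodes depend heavily on the initial training data, and it converges slowly. To address them, a new RAN learning algorithm is proposed. The initial hidden-layer nodes are determined by a mean-value algorithm, and an RMS window is added to the original "novelty criterion" to better decide whether a hidden-layer node should be added. In addition, the network parameters are adjusted by combining the least mean squares (LMS) algorithm with the extended Kalman filter (EKF) algorithm, which speeds up learning. Text models based on the word vector space have difficulty handling the high dimensionality and semantic complexity of text. A semantic feature selection method is therefore used to extract semantic features from the text input space and reduce its dimensionality. Experimental results show that the new RAN learning algorithm learns quickly, produces a compact network structure, and classifies well. Semantic feature selection also reduces dimensionality, which greatly shortens text classification time and effectively improves classification accuracy.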
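The node-growth decision can be sketched as follows. This sketch assumes the classic RAN novelty criterion: the input is far from every existing center and the instantaneous error is large. As the abstract describes, it adds a further check that the RMS error over a sliding window of recent samples is also large. All thresholds and the window length are hypothetical. The mean-value initialization and the LMS/EKF parameter updates are not shown.

```python
from collections import deque
from math import sqrt

class GrowthCriterion:
    """Decides whether a RAN-style network should allocate a new hidden unit.
    Threshold values are hypothetical; the paper's settings are not given."""

    def __init__(self, dist_min=0.5, err_min=0.1, rms_min=0.05, window=10):
        self.dist_min, self.err_min, self.rms_min = dist_min, err_min, rms_min
        self.recent = deque(maxlen=window)  # sliding window of recent output errors

    def should_add_unit(self, x, centers, error):
        self.recent.append(error)
        nearest = min((sqrt(sum((a - c) ** 2 for a, c in zip(x, center))) for center in centers),
                      default=float("inf"))
        rms = sqrt(sum(e * e for e in self.recent) / len(self.recent))
        # Novelty criterion (distance + instantaneous error) plus the RMS-window check.
        return nearest > self.dist_min and abs(error) > self.err_min and rms > self.rms_min
```

The window smooths out isolated error spikes. A single moderately large error after a run of small ones does not trigger growth, which helps keep the network compact.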

12.
Research on a Text Clustering Algorithm Based on Association Rules   Total citations: 1 (self-citations: 0, citations by others: 1)
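The K-means clustering algorithm is currently one of the better text clustering algorithms. Its similarity computation is usually based on word-frequency statistics, so it clusters short documents and simple sentences poorly because their word frequencies are too low. To address this, a similarity computation based on word association degree is proposed. An association rule algorithm is run on the simple document set to obtain keyword-based association rules, and a word association matrix is derived from these rules. The texts are then represented as feature vectors using term weights. Finally, sentence similarity is computed from the association matrix and the feature vectors with a specific algorithm. Experiments show that the algorithm produces good clustering results, because it uses not only word-frequency statistics but also the relationships between words.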
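The abstract does not give the exact similarity formula. One measure that combines an association matrix M with feature vectors a and b is the "soft cosine" a^T M b / sqrt((a^T M a)(b^T M b)). In that measure, associated words contribute to the similarity even when two sentences share no words. The sketch swaps in this measure as an assumption. It takes M[i][i] = 1 and off-diagonal entries as hypothetical association degrees, and assumes M is positive semidefinite so that the denominator is well defined.

```python
from math import sqrt

def bilinear(a, M, b):
    """Compute a^T M b for vectors a, b and a square matrix M given as lists of floats."""
    return sum(a[i] * M[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

def soft_cosine(a, b, M):
    """Cosine-like similarity in which associated (not only identical) words contribute via M."""
    denom = sqrt(bilinear(a, M, a) * bilinear(b, M, b))
    return bilinear(a, M, b) / denom if denom else 0.0

# Vocabulary: ["price", "cost", "weather"]; association degree 0.8 between price and cost (hypothetical)
M = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
```

With plain cosine, the one-word sentences "price" and "cost" would have similarity 0. Here they score 0.8, which is the kind of gain the paper targets for short texts.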

13.
A connectionist scheme, the σ-Fuzzy Lattice Neurocomputing scheme (σ-FLN), recently introduced in the literature for clustering in a lattice data domain, is employed to compute clusters of directed graphs in a master-graph. New tools are presented and used, including a convenient inclusion measure function for clustering graphs. σ-FLN treats a directed graph as a single datum in the mathematical lattice of subgraphs of a master-graph. A series of experiments is detailed in which the master-graph emanates from a thesaurus of spoken-language synonyms. The words of the thesaurus are fed to σ-FLN to compute clusters of semantically related words, namely hyperwords. The arithmetic parameters of σ-FLN can be adjusted to calibrate the total number of hyperwords computed in a specific application. The paper demonstrates how hyperwords reduce the number of features needed for document classification, using the a priori semantic knowledge contained in the thesaurus. In a series of comparative document classification experiments, the proposed method appears to improve classification accuracy on problems involving longer documents, whereas performance deteriorates on problems involving short documents.
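Once hyperwords have been computed, the feature reduction amounts to replacing each word by its cluster before counting. The sketch below assumes a hypothetical word-to-hyperword mapping. It does not reproduce σ-FLN itself. Words outside every cluster are assumed to keep their own feature.

```python
from collections import Counter

def to_hyperword_features(tokens, hyperword_of):
    """Count features after replacing each word by its hyperword (cluster id).
    Words without a hyperword keep their own feature."""
    return Counter(hyperword_of.get(t, t) for t in tokens)

# Hypothetical clusters of semantically related words
hyperword_of = {"car": "H1", "automobile": "H1", "vehicle": "H1", "fast": "H2", "quick": "H2"}
features = to_hyperword_features(["car", "automobile", "quick", "fast", "road"], hyperword_of)
```

Here five distinct words collapse to three features. This merging helps when documents are long enough for related words to co-occur, and it may blur distinctions in short documents.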

14.
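Associative Chinese Text Classification Combined with Class Frequency   Total citations: 6 (self-citations: 2, citations by others: 6)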
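This paper proposes ARCTC, an algorithm that combines word class frequency with associative Chinese text classification. The algorithm treats each document as a transaction and each keyword as an item. Taking the characteristics of text transactions into account, it uses the class frequency of words to filter out words that are only weakly related to classification. It then applies an improved association rule mining algorithm to mine the relationships between items and classes. The mined rules form a set of characteristic words for each class. To classify a document whose class label is unknown, each class's set is intersected with the document's word set, and the document is assigned to the class with the largest intersection. Experiments show that the algorithm reduces training and test time while achieving good recall, precision, and F-measure.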
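The final classification step described above, assigning the class whose characteristic-word set overlaps most with the document, can be sketched directly. The class word sets below are hypothetical stand-ins for the mined rules, and ties are broken by class order as an illustrative choice. The class-frequency filtering and rule mining are not shown.

```python
def classify_by_intersection(class_features, doc_words):
    """Return the class whose characteristic-word set shares the most words with the document.
    Ties go to the class listed first in class_features."""
    words = set(doc_words)
    best_label, best_overlap = None, -1
    for label, features in class_features.items():
        overlap = len(features & words)
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label

# Hypothetical characteristic-word sets produced by the rule-mining stage
class_features = {
    "sport": {"match", "team", "goal"},
    "finance": {"stock", "rate", "bank"},
}
```

Because classification reduces to set intersections, testing is fast, which is consistent with the reduced test time the abstract reports.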

15.
Query refinement is essential for information retrieval. In this study, a query refinement mechanism based on a fuzzy-related thesaurus is proposed. The thesaurus can be generated dynamically during retrieval for a document collection classified by an unsupervised neural network, the self-organising map. Compared with a general relational thesaurus, the fuzzy-related thesaurus is more effective and efficient. Because the relationships between terms are based on the classification of the document collection, the generated thesaurus naturally has more power to enhance retrieval quality. The relationships can be recognized automatically without human involvement, which significantly reduces the cost of constructing the thesaurus. An evaluation of the mechanism has been conducted, and the preliminary results are promising. A significant improvement in retrieval performance was observed when the fuzzy-related thesaurus was used for query refinement on a software document collection.

16.
Bo Yu, Dong-hua Zhu. Knowledge, 2009, 22(5): 376-381
Email is one of the most ubiquitous applications, used daily by millions of people worldwide, and individuals and organizations increasingly rely on it to communicate and share information and knowledge. However, the growth in email users has been accompanied by a dramatic increase in spam over the past few years. Processing and managing email efficiently has become a major challenge for individuals and organizations. This paper proposes new email classification models using a linear neural network trained by the perceptron learning algorithm and a nonlinear neural network trained by the back-propagation learning algorithm. An efficient semantic feature space (SFS) method is introduced into these models. The traditional back-propagation neural network (BPNN) learns slowly and is prone to becoming trapped in a local minimum, so a modified back-propagation neural network (MBPNN) is presented to overcome these limitations. An email classification system based on the vector space model suffers from a large number of features and from ambiguity in term meanings, which leads to a sparse and noisy feature space. The SFS converts this space into a semantically richer one, which also helps accelerate learning. Experiments are conducted with different training-set sizes and numbers of extracted features. Results show that the models using MBPNN outperform the traditional BPNN, and that SFS greatly reduces feature dimensionality and improves email classification performance.
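The linear model above is trained with the perceptron learning algorithm. Here is a minimal sketch of that standard rule. The two binary features (whether a message contains "free" and whether it contains "meeting") and the four labeled examples are hypothetical, and the SFS transformation and the nonlinear MBPNN model are not shown.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule. samples: list of (features, label) with label in {-1, +1}."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary: move the hyperplane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical features [contains "free", contains "meeting"]; +1 = spam, -1 = legitimate
samples = [([1, 0], 1), ([1, 1], 1), ([0, 1], -1), ([0, 0], -1)]
w, b = train_perceptron(samples)
```

The perceptron converges only on linearly separable data, which is one reason the paper also trains a nonlinear network.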

17.
Feature selection for text categorization is a well-studied problem whose goal is to improve categorization effectiveness, computational efficiency, or both. Text categorization systems based on traditional term matching represent each document in the vector space model. However, this requires a high-dimensional space and ignores the semantic relationships between terms, which leads to poor categorization accuracy. The latent semantic indexing (LSI) method can overcome this problem by replacing individual terms with statistically derived conceptual indices. To improve both the accuracy and the efficiency of categorization, this paper proposes a two-stage feature selection method. First, a novel feature selection method reduces the number of terms; then a new semantic space between terms is constructed using LSI. Applications involving spam database categorization show that the two-stage method performs better.

18.
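This paper proposes a dictionary-independent bridging pattern filtering algorithm (BPFA) for extracting text feature words. The algorithm counts the Chinese character combination patterns in a text and their frequencies, obtains each pattern's support frequency by eliminating its bridging frequency, and uses this support frequency to identify and extract correct words. Experimental results show that BPFA effectively improves the precision and recall of word segmentation. The algorithm is suited to Chinese information processing applications that are sensitive to word frequency, such as text classification and automatic text summarization.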
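The first step of the algorithm, counting character combination patterns and their frequencies, can be sketched as follows. The pattern lengths of 2 to 4 characters are a hypothetical choice. The bridging-frequency elimination that BPFA applies afterwards is not reproduced, because the abstract does not define it.

```python
from collections import Counter

def count_char_patterns(text, min_len=2, max_len=4):
    """Count every contiguous character pattern of length min_len..max_len in the text.
    This is only the statistics-gathering step; BPFA's bridge-frequency elimination is not shown."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

counts = count_char_patterns("文本分类文本聚类")
```

In this example the true word "文本" appears twice. The pattern "类文" appears once only because it bridges the boundary between two words, and discounting such cross-boundary patterns is the problem BPFA addresses.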

19.
20.

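To predict the carbon content of spent catalyst more accurately, this paper proposes a prediction model in which a BP neural network is optimized by a modified teaching-learning-based optimization algorithm (MTLBO). The basic TLBO algorithm has poor global search ability. To address this, preview and review processes are added before and after the teacher phase, and learners are updated in a quantum manner during the learner phase. Test results show that these modifications improve TLBO's global exploration and local refinement abilities. The modified algorithm is used to optimize the weights and thresholds of the BP neural network, which then predicts the carbon content of the spent catalyst. Simulation results show that both the prediction accuracy and the generalization ability of the improved model increase to some extent.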
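For reference, here is a minimal sketch of the basic TLBO algorithm that the paper modifies. It minimizes an objective with a teacher phase and a learner phase, each followed by greedy acceptance. The preview/review steps and the quantum learner update of MTLBO are not included. The population size, iteration count, seed, and sphere test function are hypothetical. In the paper's setting, the objective would be the BP network's training error as a function of its weights and thresholds.

```python
import random

def tlbo(f, dim, lower, upper, pop_size=20, iters=100, seed=0):
    """Basic teaching-learning-based optimization (minimization)."""
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lower), upper)

    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(iters):
        # Teacher phase: move each learner toward the best solution, away from TF * class mean.
        teacher = pop[min(range(pop_size), key=fit.__getitem__)]
        mean = [sum(x[d] for x in pop) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.choice((1, 2))  # teaching factor
            cand = [clip(pop[i][d] + rng.random() * (teacher[d] - tf * mean[d])) for d in range(dim)]
            fc = f(cand)
            if fc < fit[i]:  # greedy acceptance
                pop[i], fit[i] = cand, fc
        # Learner phase: move toward a better classmate, or away from a worse one.
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            sign = 1 if fit[i] < fit[j] else -1
            cand = [clip(pop[i][d] + rng.random() * sign * (pop[i][d] - pop[j][d])) for d in range(dim)]
            fc = f(cand)
            if fc < fit[i]:
                pop[i], fit[i] = cand, fc
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

def sphere(x):
    return sum(v * v for v in x)

best_x, best_f = tlbo(sphere, dim=2, lower=-5.0, upper=5.0)
```

Greedy acceptance means a learner's fitness never gets worse. The paper's extra preview and review steps target the limited global exploration this basic scheme can show on harder, multimodal objectives.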



Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号