Similar articles
20 similar articles found.
1.
Spammers often embed text into images to evade text-based spam filters, which results in a large number of advertisement spam images. Spam image recognition has become one of the hotspots in Internet spam filtering research. Its goal is to address the sharp performance decline, or even outright failure, of traditional spam filtering methods when they are applied to spam images. Based on a clustering algorithm, this paper proposes a method to expand the data samples, which greatly increases the number of high-quality training samples and meets the needs of model training. We then train a convolutional neural network on the enlarged data set to recognize spam in real time. The experimental results show that the accuracy of the model increases by more than 14% after applying the proposed data augmentation method, and by 6% compared with other data augmentation methods. Combining convolutional neural networks with the proposed data augmentation method, the accuracy of our spam filtering model is 7–11% higher than that of the traditional method.

2.
In this paper, a three-layer back-propagation neural network (BPNN) is employed for spam detection using a concentration based feature construction (CFC) approach. In the CFC approach, ‘self’ and ‘non-self’ concentrations are constructed through ‘self’ and ‘non-self’ gene libraries, respectively, to form a two-element concentration vector that represents the e-mail efficiently. A three-layer BPNN with two-element input is then employed to classify e-mails automatically. Comprehensive experiments are conducted on two public benchmark corpora, PU1 and Ling, to demonstrate that the proposed CFC-based BPNN classifier is not only very fast but also achieves 97% and 99% classification accuracy on PU1 and Ling, respectively, using just a two-element concentration feature vector.
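The two-element concentration vector described above can be illustrated with a minimal sketch. The abstract does not say how the gene libraries are built or how a concentration is computed, so this sketch assumes each library is a plain set of words and each concentration is the fraction of an e-mail's distinct words found in that library. The function name `concentration_vector` and the sample libraries are hypothetical.

```python
def concentration_vector(tokens, self_lib, nonself_lib):
    """Return (self concentration, non-self concentration) for one e-mail.

    Assumption: a concentration is the share of the e-mail's distinct
    words that appear in the corresponding gene library.
    """
    words = set(tokens)
    if not words:
        # An empty message has no words to match against either library.
        return (0.0, 0.0)
    self_conc = len(words & self_lib) / len(words)
    nonself_conc = len(words & nonself_lib) / len(words)
    return (self_conc, nonself_conc)


# Hypothetical libraries: words typical of legitimate mail and of spam.
SELF_LIB = {"meeting", "agenda"}
NONSELF_LIB = {"free", "offer", "winner"}

vector = concentration_vector(
    ["free", "offer", "meeting", "free"], SELF_LIB, NONSELF_LIB
)
print(vector)
```

Under these assumptions, the resulting pair would serve as the two inputs of the three-layer network.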

3.
This paper focuses on a method to overcome some of the disadvantages associated with the use of artificial neural networks (ANNs) as supervised classifiers. The proposed method aims at speeding up network learning, improving classification accuracies and reducing variability in classification performance due to random weight initialization. This can be realized by transferring implicit knowledge from a previously learned source task to a new target task using the proposed algorithm, Discriminality Based Transfer (DBT). The presented approach is compared with conventional network training and a literal transfer method in a 13-class tropical savannah classification experiment using Landsat Thematic Mapper (TM) data. Knowledge was extracted from a network trained on the Kara experimental site in Togo. This information was used to classify the Savanes-L'Oti area, which differs in terms of geographical position, image acquisition date, climatological condition and land cover. It was possible to speed up network learning 5.2, 4.3 and 1.8 times using, respectively, 5-, 10- and 20-pixels-per-class training sets. Larger training sets showed less speed improvement. After applying DBT, average classification accuracies were not significantly different from accuracies obtained after training randomly initialized networks, although DBT tended to show better performance on smaller training sets. It was possible to explain differences in individual class accuracies by analysing Bhattacharyya (BH) distances calculated between all Kara and Savanes-L'Oti classes. Finally, variability in classification performance decreased significantly when training with 5-, 10- and 20-pixels-per-class training sets after DBT application.

4.
The brevity of mobile phone messages makes it difficult to distinguish lexical patterns that identify spam. This paper proposes a novel approach to spam classification of extremely short messages using not only lexical features that reflect the content of a message but also new stylistic features that indicate the manner in which the message is written. Experiments on two mobile phone message collections in two different languages show that the approach outperforms previous content-based approaches significantly, regardless of language.

5.
Email has become one of the fastest and most economical forms of communication. Email is also one of the most ubiquitous and pervasive applications, used on a daily basis by millions of people worldwide. However, the increase in email users has resulted in a dramatic increase in spam emails during the past few years. This paper proposes a new spam filtering system using a revised back propagation (RBP) neural network and automatic thesaurus construction. The conventional back propagation (BP) neural network learns slowly and is prone to becoming trapped in a local minimum, which leads to poor performance and efficiency. The authors present the RBP neural network to overcome these limitations of the conventional BP neural network. A well-constructed thesaurus has been recognized as a valuable tool in the effective operation of text classification; it can also overcome the problems of keyword-based spam filters, which ignore the relationships between words. The authors conduct experiments on the Ling-Spam corpus. Experimental results show that the proposed spam filtering system achieves higher performance, especially when the RBP neural network is combined with automatic thesaurus construction.

6.
This paper proposes an approach using large-scale text features for fault-prone module detection inspired by spam filtering. The number of every text feature in the source code of a module is counted and used as data for training detection models. In this paper, we prepared a naive Bayes classifier and a logistic regression model as detection models. To show the effectiveness of our approaches, we conducted experiments with five open source projects and compared them with a well-known metrics set, thereby achieving higher detection results. The results imply that large-scale text features are useful in constructing practical detection models, and measuring sophisticated metrics is not always necessary for detecting fault-prone modules.

7.
A methodology with back-propagation neural network models is developed to explore artificial neural net (ANN) technology in the new application territory of design optimization. This design methodology could go beyond the Hopfield network model (Hopfield and Tank, 1985) for combinatorial optimization problems. In this approach, pattern classification with a back-propagation network, the most demonstrated power of neural network applications, is utilized to identify the boundaries of the feasible and the infeasible design regions. These boundaries enclose the multi-dimensional space within which designs satisfy all design criteria. A feedforward network is then incorporated to perform function approximation of the design objective function. This approximation is performed by training the feedforward network with objective functions evaluated at selected design sets in the feasible design regions. Additional optimum design sets in the classified feasible regions are calculated and included in the successive training sets to improve the function mapping. Iteration is continued until convergence criteria are satisfied. This paper demonstrates that artificial neural net technology provides a global perspective of the entire design space with good and near-optimal solutions. ANN can indeed be a potential technology for design optimization.

8.
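In spam filtering, feature words contribute differently to the classification of legitimate mail and spam. By defining a classification contribution ratio coefficient, the idea of feature-word classification contribution is applied to feature selection and to the design of a naive Bayes filter. Experiments on an English corpus show that the spam filtering method based on feature-word classification contribution effectively improves the filter's ability to recognize legitimate mail and spam and reduces its misclassification rate for both.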

9.
The problem of cloud data classification from satellite imagery using neural networks is considered. Several image transformations such as singular value decomposition (SVD) and wavelet packet (WP) were used to extract the salient spectral and textural features attributed to satellite cloud data in both visible and infrared (IR) channels. In addition, the well-known gray-level cooccurrence matrix (GLCM) method and spectral features were examined for the sake of comparison. Two different neural-network paradigms, namely probability neural network (PNN) and unsupervised Kohonen self-organized feature map (SOM), were examined and their performance was benchmarked on the geostationary operational environmental satellite (GOES) 8 data. Additionally, a postprocessing scheme was developed which utilizes the contextual information in the satellite images to improve the final classification accuracy. Overall, the performance of the PNN when used in conjunction with these feature extraction and postprocessing schemes showed the potential of this neural-network-based cloud classification system.

10.
Multi-class pattern classification has many applications including text document classification, speech recognition, object recognition, etc. Multi-class pattern classification using neural networks is not a trivial extension of two-class neural networks. This paper presents a comprehensive and competitive study in multi-class neural learning with a focus on issues including neural network architecture, encoding schemes, training methodology and training time complexity. Our study includes multi-class pattern classification using either a system of multiple neural networks or a single neural network, and modeling pattern classes using one-against-all, one-against-one, one-against-higher-order, and P-against-Q. We also discuss implementations of these approaches and analyze the training time complexity associated with each approach. We evaluate six different neural network system architectures for multi-class pattern classification along the dimensions of imbalanced data, large number of pattern classes, and large vs. small training data through experiments conducted on well-known benchmark data.
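To make the difference between two of the class-modeling schemes named above concrete, the following minimal sketch enumerates the binary sub-problems that one-against-all and one-against-one would create for a set of class labels. It illustrates only the standard definitions of these two schemes, not the paper's implementation; the function names and the label list are hypothetical.

```python
from itertools import combinations


def one_against_all(classes):
    """One binary task per class: that class versus all the others."""
    return [([c], [o for o in classes if o != c]) for c in classes]


def one_against_one(classes):
    """One binary task per unordered pair of classes."""
    return [([a], [b]) for a, b in combinations(classes, 2)]


labels = ["A", "B", "C", "D"]  # hypothetical class labels
print(len(one_against_all(labels)), "one-against-all tasks")
print(len(one_against_one(labels)), "one-against-one tasks")
```

With K classes, one-against-all yields K binary tasks and one-against-one yields K(K-1)/2, which is one reason the number of networks to train differs between the schemes.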

11.
This paper presents a general framework for robust adaptive neural network (NN)‐based feedback linearization controller design for a greenhouse climate system. The controller is based on the well‐known feedback linearization, combined with radial basis function NNs, which allows the feedback linearization technique to be used in an adaptive way. In addition, a robust sliding mode control is incorporated to deal with the bounded disturbances and the approximation errors of NNs. As a result, an inherently nonlinear robust adaptive control law is obtained, which not only provides fast and accurate tracking of varying set‐points, but also guarantees asymptotic tracking even if there are inherent approximation errors. Copyright © 2010 John Wiley & Sons, Ltd.

12.
Multimedia Tools and Applications - In this paper, we propose a new model based on 3D discrete orthogonal moments and deep neural networks (DNN) to improve the classification accuracy of 3D objects...

13.
Content-based e-mail spam filtering continues to be a challenging machine learning problem. Usually, the joint distribution of e-mails and labels changes from user to user and from time to time, and the training data are poor representatives of the true distribution. E-mail service providers have two options for automatic spam filtering at the service-side: a single global filter for all users or a personalized filter for each user. The practical usefulness of these options, however, depends upon the robustness and scalability of the filter. In this paper, we address these challenges by presenting a robust personalizable spam filter based on local and global discrimination modeling. Our filter exploits highly discriminating content terms, identified by their relative risk, to transform the input space into a two-dimensional feature space. This transformation is obtained by linearly pooling the discrimination information provided by each term for spam or non-spam classification. Following this local model, a linear discriminant is learned in the feature space for classification. We also present a strategy for personalizing the local and global models using unlabeled e-mails, without requiring user feedback. Experimental evaluations and comparisons are presented for global and personalized spam filtering, for varying distribution shift, for handling the problem of gray e-mails, on unseen e-mails, and with varying filter size. The results demonstrate the robustness and effectiveness of our filter and its suitability for global and personalized spam filtering at the service-side.

14.
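Image spam filtering algorithm based on text-region features   Total citations: 4, self-citations: 0, citations by others: 4
Spam images usually contain many text regions, and these regions often carry many highly discriminative features. An algorithm for recognizing spam images based on the features of text regions in the image is proposed. First, features of the text regions are extracted, including the number and area of text regions, color saturation, number of characters and number of colors, together with attribute features of the image such as image area. A support vector machine classifier is then used to recognize spam images. Experiments show that, on a real e-mail image set, the algorithm recognizes 98.5% of spam images with an accuracy above 98%.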

15.
Early detection of cancer is the most promising way to enhance a patient's chance of survival. This paper presents a computer-aided classification method for computed tomography (CT) images of lungs, developed using an artificial neural network. The entire lung is segmented from the CT images and the parameters are calculated from the segmented image. Statistical parameters such as mean, standard deviation, skewness, kurtosis, fifth central moment and sixth central moment are used for classification. The classification is done by feed forward and feed forward back propagation neural networks. Compared to feed forward networks, the feed forward back propagation network gives better classification. The skewness parameter gives the maximum classification accuracy. Among the thirteen already available training functions of the back propagation neural network, the Traingdx function gives the maximum classification accuracy of 91.1%. Two new training functions are proposed in this paper. The results show that the proposed training function 1 gives an accuracy of 93.3%, specificity of 100%, sensitivity of 91.4% and a mean square error of 0.998. The proposed training function 2 gives a classification accuracy of 93.3% and a minimum mean square error of 0.0942.
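The statistical parameters listed above can be computed from the pixel intensities of a segmented region as in the minimal sketch below. The abstract does not give the exact definitions used, so this sketch assumes population (divide-by-N) moments, standardized skewness and kurtosis, and unnormalized fifth and sixth central moments. The function name `statistical_features` and the sample intensities are hypothetical.

```python
import math


def statistical_features(pixels):
    """Compute six statistical parameters from a non-empty list of intensities.

    Assumptions: population moments (divide by N); skewness and kurtosis
    are standardized by powers of the standard deviation; the fifth and
    sixth central moments are left unnormalized.
    """
    n = len(pixels)
    mean = sum(pixels) / n

    def central_moment(k):
        return sum((p - mean) ** k for p in pixels) / n

    std = math.sqrt(central_moment(2))
    # Guard against constant regions, where the standardized moments are undefined.
    skewness = central_moment(3) / std ** 3 if std > 0 else 0.0
    kurtosis = central_moment(4) / std ** 4 if std > 0 else 0.0
    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "moment5": central_moment(5),
        "moment6": central_moment(6),
    }


features = statistical_features([1, 2, 3, 4, 5])  # hypothetical intensities
print(features)
```

A feature vector of this kind would then be passed to the classifier; which normalization the original work used is not stated in the abstract.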

16.
The increased synergy between neural networks (NN) and fuzzy sets has led to the introduction of granular neural networks (GNNs) that operate on granules of information, rather than information itself. The fact that processing is done on a conceptual rather than on a numerical level, combined with the representation of granules using linguistic terms, results in increased interpretability. This is the actual benefit, and not increased accuracy, gained by GNNs. The constraints used to implement the GNN are such that accuracy degradation should not be surprising. Having said that, it is well known that simple structured NNs tend to be less prone to over‐fitting the training data set, maintaining the ability to generalize and more accurately classify previously unseen data. Standard NNs are frequently found to be accurate but difficult to explain, hence they are often associated with the black box syndrome. Because in GNNs the operation is carried out at a conceptual level, the components have unambiguous meaning, revealing how classification decisions are formed. In this paper, the interpretability of GNNs is exploited using a satellite image classification problem. We examine how land use classification using both spectral and non‐spectral information is expressed in GNN terms. One further contribution of this paper is the use of specific symbolization of the network components to easily establish causality relationships.

17.
Hierarchical multi-label classification is a complex classification task where the classes involved in the problem are hierarchically structured and each example may simultaneously belong to more than one class in each hierarchical level. In this paper, we extend our previous works, where we investigated a new local-based classification method that incrementally trains a multi-layer perceptron for each level of the classification hierarchy. Predictions made by a neural network in a given level are used as inputs to the neural network responsible for the prediction in the next level. We compare the proposed method with one state-of-the-art decision-tree induction method and two decision-tree induction methods, using several hierarchical multi-label classification datasets. We perform a thorough experimental analysis, showing that our method obtains results competitive with a robust global method regarding both precision and recall evaluation measures.

18.
Robust radar target classifier using artificial neural networks   Total citations: 3, self-citations: 0, citations by others: 3
In this paper an artificial neural network (ANN) based radar target classifier is presented, and its performance is compared with that of a conventional minimum distance classifier. Radar returns from realistic aircraft are synthesized using a thin wire time domain electromagnetic code. The time varying backscattered electric field from each target is processed using both a conventional scheme and an ANN-based scheme for classification purposes. It is found that a multilayer feedforward ANN, trained using a backpropagation learning algorithm, provides a higher percentage of successful classification than the conventional scheme. The performance of the ANN is found to be particularly attractive in an environment of low signal-to-noise ratio. The performance of both methods is also compared when a preemphasis filter is used to enhance the contributions from the high frequency poles in the target response.

19.
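Zhang Jian, Yan Ke, Ma Xiang. Journal of Computer Applications (《计算机应用》), 2022, 42(3): 770-777
Spam recognition is one of the main tasks in natural language processing. Traditional methods are based on text features or word frequency, and their recognition accuracy depends mainly on whether specific keywords appear; they therefore perform poorly when keywords are misrecognized or when the spam text contains none of the keywords. To address this, a neural-network-based method is proposed. First, traditional methods are trained and tested on this kind of spam text; then, using spam text messages, advertisements and spam e-mail...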

20.
Manufacturing features recognition using backpropagation neural networks   Total citations: 3, self-citations: 0, citations by others: 3
A backpropagation neural network (BPN) is applied to the problem of feature recognition from a boundary representation (B-rep) solid model to facilitate process planning of manufactured products. It is based on the use of the face complexity code to represent the features and a neural network for the analysis of the recognition. The face complexity code is a measure of the face complexity of a feature based on the convexity or concavity of the surrounding geometry. The codes for various features are fed to the network for analysis. A backpropagation network is implemented for recognition of features and tested on published results to measure its performance. Any two or more features having significant differences in face complexity codes were used as exemplars for training the network. A new feature presented to the network is associated with one of the existing clusters if they are similar; otherwise, the network creates a new cluster. Experimental results show that the network was consistent in recognizing features, and hence is appropriate for application to the problem of feature recognition in an automated manufacturing environment.
