Found 20 similar documents; search took 15 ms
1.
Email has become one of the fastest and most economical forms of communication. It is also one of the most ubiquitous and pervasive applications, used daily by millions of people worldwide. However, the growth in email users has brought a dramatic increase in spam over the past few years. This paper proposes a new spam filtering system that uses a revised back propagation (RBP) neural network and automatic thesaurus construction. The conventional back propagation (BP) neural network learns slowly and is prone to becoming trapped in a local minimum, which leads to poor performance and efficiency. The authors present the RBP neural network to overcome these limitations of the conventional BP neural network. A well-constructed thesaurus has been recognized as a valuable tool for effective text classification; it can also overcome the problems of keyword-based spam filters, which ignore the relationships between words. The authors conduct experiments on the Ling-Spam corpus. Experimental results show that the proposed spam filtering system achieves higher performance, especially when the RBP neural network is combined with automatic thesaurus construction.
2.
In this paper, we address the problem of document re-ranking in information retrieval, which is usually conducted after initial
retrieval to improve rankings of relevant documents. To deal with this problem, we propose a method which automatically constructs
a term resource specific to the document collection and then applies the resource to document re-ranking. The term resource
includes a list of terms extracted from the documents as well as their weighting and correlations computed after initial retrieval.
The term weighting, based on local and global distributions, makes the re-ranking insensitive to different choices of pseudo
relevance, while the term correlation helps avoid bias toward any specific concept embedded in queries. Experiments with
NTCIR3 data show that the approach not only improves the performance of initial retrieval but also makes a significant contribution
to standard query expansion.
3.
In this paper, we propose a new semi-supervised fuzzy co-clustering algorithm called SS-FCC for categorization of large web documents. In this approach, the clustering process incorporates prior domain knowledge of a dataset, in the form of pairwise constraints provided by users, into the fuzzy co-clustering framework. Each constraint specifies whether a pair of objects “must” or “cannot” be clustered together. With the help of these constraints, the clustering problem is formulated as the maximization of a competitive agglomeration cost function with fuzzy terms that takes the provided domain knowledge into account. The update rules for fuzzy memberships are derived, and an iterative algorithm is designed for the soft co-clustering process. Our experimental studies show that the quality of clustering results can be improved significantly with the proposed approach. Simulations on 10 large benchmark datasets demonstrate the strength and potential of SS-FCC in terms of performance evaluation criteria, stability and operating time, compared with some existing semi-supervised algorithms.
4.
Multimedia Tools and Applications - Online social media is a powerful source of information that can influence users’ decisions. Due to the huge volume of data generated by such media, many...
5.
The list of documents returned by Internet search engines in response to a query can be quite overwhelming. There is an increasing need to organise this information and present it in a more compact and efficient manner. This paper describes a method for automatically clustering World Wide Web documents according to their relevance to the user’s information needs, using a hybrid neural network. The objective is to reduce the time and effort the user has to spend finding the information sought. Clustering documents by features representative of their contents (in this case, key words and phrases) increases the effectiveness and efficiency of the search process. It is shown that a two-dimensional visual presentation of information on retrieved documents, instead of the traditional linear listing, can create a more user-friendly interface between a search engine and the user.
7.
This paper proposes a new text categorization model based on the combination of a modified back propagation neural network (MBPNN)
and latent semantic analysis (LSA). The traditional back propagation neural network (BPNN) trains slowly and is
easily trapped in a local minimum, which leads to poor performance and efficiency. In this paper, we propose the MBPNN
to accelerate the training of BPNN and improve categorization accuracy. LSA can overcome the problems of relying on
individual words by using statistically derived conceptual indices instead. It constructs a conceptual vector space in which
each term or document is represented as a vector. It not only greatly reduces the dimension but also discovers
the important associative relationships between terms. We test our categorization model on the 20 Newsgroups corpus and the Reuters-21578
corpus. Experimental results show that the MBPNN is much faster than the traditional BPNN and also improves on its
performance. Applying LSA in our system leads to dramatic dimensionality reduction while achieving
good classification results.
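The dimensionality-reduction step described in this abstract can be illustrated with a minimal sketch of LSA by truncated SVD. This is not the paper's implementation: the function name `lsa_project`, the toy term-document matrix, and the choice of `k` are assumptions made here for illustration only.

```python
import numpy as np

def lsa_project(term_doc, k):
    """Map terms and documents into a k-dimensional latent space via truncated SVD.

    term_doc: array of shape (n_terms, n_docs) holding term weights or counts.
    Returns (term_vectors, doc_vectors), each with k columns.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    term_vectors = U[:, :k] * s[:k]          # shape (n_terms, k)
    doc_vectors = (Vt[:k].T) * s[:k]         # shape (n_docs, k)
    return term_vectors, doc_vectors

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy data: terms 0-1 occur only in documents 0-1,
# terms 2-3 occur only in documents 2-3.
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 1, 2]], dtype=float)

terms, docs = lsa_project(X, k=2)
same_topic = cosine(docs[0], docs[1])
different_topic = cosine(docs[0], docs[2])
```

In this toy case, documents that share vocabulary end up close together in the two-dimensional space, while documents with disjoint vocabulary stay nearly orthogonal. The reduced document vectors would then be the input to whatever classifier is used.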
8.
The objective of this paper is to find a sequence of jobs in the flow shop that minimizes makespan. A feed-forward back propagation
neural network is used to solve the problem. The network is trained with the optimal sequences of completely enumerated five-,
six- and seven-job, ten-machine problems, and the trained network is then used to solve problems with a greater number of
jobs. The sequence obtained using the artificial neural network (ANN) is given as the initial sequence to a heuristic proposed
by Suliman, and also to a genetic algorithm (GA) as one of the sequences of the population, for further improvement. The approaches
are referred to as the ANN-Suliman heuristic and the ANN-GA heuristic respectively. The makespans of the sequences obtained by these heuristics
are compared with the makespans of the sequences obtained using the heuristic proposed by Nawaz, Enscore and Ham (NEH) and
the Suliman heuristic initialized with the Campbell, Dudek and Smith (CDS) heuristic, called the CDS-Suliman approach. It is found that
the ANN-GA and ANN-Suliman heuristic approaches perform better than the NEH and CDS-Suliman heuristics for the problems considered.
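As a rough illustration of the objective in this abstract, the sketch below computes the makespan of a job sequence in a permutation flow shop and finds an optimal sequence by complete enumeration, which is only practical for very small instances. The function names and the processing-time table are hypothetical; the neural network, Suliman, GA, NEH and CDS procedures from the abstract are not reproduced here.

```python
from itertools import permutations

def makespan(sequence, proc_times):
    """Completion time of the last job on the last machine.

    proc_times[j][m] is the processing time of job j on machine m.
    Every job visits the machines in the same order (permutation flow shop).
    """
    n_machines = len(proc_times[0])
    finish = [0] * n_machines  # finish time of the latest job on each machine
    for job in sequence:
        for m in range(n_machines):
            # A job starts on machine m once the machine is free and the
            # job has finished on the previous machine.
            ready = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], ready) + proc_times[job][m]
    return finish[-1]

def best_sequence_by_enumeration(proc_times):
    """Try every job order and return one with minimum makespan."""
    jobs = range(len(proc_times))
    return min(permutations(jobs), key=lambda seq: makespan(seq, proc_times))

# Hypothetical instance: 3 jobs, 2 machines.
P = [[3, 2],
     [1, 4],
     [2, 1]]

naive = makespan((0, 1, 2), P)
best = best_sequence_by_enumeration(P)
best_value = makespan(best, P)
```

Enumeration grows factorially with the number of jobs, which is consistent with the abstract's use of enumerated optima only for small training problems.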
9.
In the present article we introduce and validate an approach for single-label multi-class document categorization based on text content features. The approach uses a statistical property of Principal Component Analysis: it minimizes the reconstruction error of the training documents used to compute a low-rank category transformation matrix. Such a matrix transforms the original set of training documents from a given category to a new low-rank space and then optimally reconstructs them in the original space with minimum reconstruction error. The proposed method, called the Minimizer of the Reconstruction Error (mRE) classifier, extends this property and applies it to new, unseen test documents. Several experiments on four multi-class text categorization datasets are conducted to test the stable and generally better performance of the proposed approach in comparison with other popular classification methods.
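One way to read the reconstruction-error idea is sketched below: build a low-rank projector per category from its training documents, then assign a test document to the category whose projector reconstructs it with the smallest error. This is a simplified interpretation, not the mRE classifier itself. It assumes uncentered data, a fixed rank, and hypothetical function names and toy vectors.

```python
import numpy as np

def fit_category_projectors(X, y, rank):
    """Return {label: projector} where each projector maps a document onto
    the top-`rank` principal directions of that category's training documents.

    Assumption for this sketch: no mean-centering is applied.
    """
    projectors = {}
    for label in np.unique(y):
        docs = X[y == label]
        # Rows of Vt are the principal directions of this category.
        _, _, Vt = np.linalg.svd(docs, full_matrices=False)
        V = Vt[:rank].T                      # shape (n_features, rank)
        projectors[int(label)] = V @ V.T     # project, then reconstruct
    return projectors

def predict(projectors, x):
    """Pick the category with the smallest reconstruction error for x."""
    errors = {label: np.linalg.norm(x - P @ x) for label, P in projectors.items()}
    return min(errors, key=errors.get)

# Hypothetical toy data: category 0 uses features 0-1, category 1 uses features 2-3.
X_train = np.array([[1.0, 0.9, 0.0, 0.0],
                    [0.8, 1.0, 0.0, 0.1],
                    [0.0, 0.1, 1.0, 0.9],
                    [0.0, 0.0, 0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])

projectors = fit_category_projectors(X_train, y_train, rank=1)
label_a = predict(projectors, np.array([1.0, 1.0, 0.0, 0.0]))
label_b = predict(projectors, np.array([0.0, 0.0, 1.0, 1.0]))
```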
10.
The current thrust of research in robotics is to build robots that can operate in dynamic and/or partially known environments. The ability to learn endows the robot with a form of autonomous intelligence to handle such situations. This paper focuses on the intersection of robot control and learning methods as represented by artificial neural networks. An in-depth overview of the application of neural networks to the problem of robot control is presented. Some typical neural network architectures are discussed first. The important issues involved in the study of robotics are then highlighted. The paper concentrates on neural network applications to the motion control of robots in both non-contact and contact tasks. The current state of research in this area is surveyed, and the strengths and weaknesses of the present approaches are emphasized. The paper concludes by identifying areas that need future research.
11.
We propose a new approach to text categorization known as generalized instance set (GIS) algorithm under the framework of generalized instance patterns. Our GIS algorithm unifies the strengths of k-NN and linear classifiers and adapts to characteristics of text categorization problems. It focuses on refining the original instances and constructs a set of generalized instances. We also propose a metamodel framework based on category feature characteristics. It has a metalearning phase which discovers a relationship between category feature characteristics and each component algorithm. Extensive experiments have been conducted on two large-scale document corpora for both GIS and the metamodel. The results demonstrate that both approaches generally achieve promising text categorization performance.
12.
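Based on the characteristics of large-scale Chinese text classification, a fast orthogonal text encoding method based on maximum feature value selection is proposed, and a Hopfield neural network model with a fast convergence rate is constructed. Using neurodynamic methods, the stability of the network structure of the self-feedback Hopfield neural network is analyzed. A KNN re-prediction mechanism is introduced into the Hopfield neural network so that samples that enter a spurious state and are rejected can effectively escape from it. Experimental results show that the method performs well when applied to large-scale Chinese text classification.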
13.
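An automatic patent classification method based on a back propagation neural network is proposed. Feature terms are extracted from the patent document set by Chinese word segmentation and weighted according to their frequency in the patent documents, so that each patent document is represented as a feature-term vector. To train the BP neural network (BPN) more effectively, the χ² statistic is used to reduce the dimensionality of the feature vectors, and a BPN patent classifier is used to classify the patent documents. Using patent documents under international classification code H02 as test data, good classification results were obtained.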
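The χ² dimensionality-reduction step mentioned in this abstract can be sketched as follows. The sketch scores each term against a single category from a binary term-presence matrix using the standard 2×2 contingency form of the χ² statistic. The function name, the binary (rather than frequency-weighted) representation, and the toy data are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def chi_square_scores(presence, labels):
    """Chi-square score of each term with respect to one category.

    presence: (n_docs, n_terms) array of 0/1 term-occurrence flags.
    labels:   (n_docs,) array, 1 if the document belongs to the category, else 0.
    """
    presence = np.asarray(presence, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = presence.shape[0]
    a = labels @ presence                # term present, document in category
    b = (1 - labels) @ presence          # term present, document not in category
    c = labels.sum() - a                 # term absent, document in category
    d = (1 - labels).sum() - b           # term absent, document not in category
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    # Terms that occur in every document (or in none) carry no information.
    return np.divide(numerator, denominator,
                     out=np.zeros_like(numerator), where=denominator > 0)

# Hypothetical toy data: 4 documents, 3 terms; the first two documents are in the category.
presence = [[1, 1, 1],
            [1, 1, 0],
            [0, 1, 1],
            [0, 1, 0]]
labels = [1, 1, 0, 0]

scores = chi_square_scores(presence, labels)
top_term = int(np.argmax(scores))   # index of the most category-specific term
```

Keeping only the highest-scoring terms shrinks the input layer of the classifier, which is the motivation the abstract gives for this step.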
14.
Camouflaged people, such as soldiers on the battlefield, and camouflaged objects in natural environments are hard to detect because of the strong resemblance between the hidden target and the background, which makes seeing these hidden objects a challenging task. Due to the nature of hidden objects, identifying them requires a significant level of visual perception. To overcome this problem, we present a new end-to-end framework based on a multi-level attention network. We design a novel inception module that extracts features with multi-scale receptive fields in order to enhance feature representation. Furthermore, we use a dense feature pyramid to take advantage of multi-scale semantic features. Finally, to better locate and distinguish the camouflaged target from the background, we develop a multi-attention module that generates more discriminative feature representations and combines semantic information with spatial information from different levels. Experiments on the camouflaged people dataset show that our approach outperforms all state-of-the-art methods.
15.
As one of the most fundamental and important methods of data clustering, the center-based partitioning approach divides the dataset into k subsets, each of which is represented by a centroid or medoid. In this paper, we propose a new medoid-based k-partitions approach called Clustering Around Weighted Prototypes (CAWP), which works with a similarity matrix. In CAWP, each cluster is characterized by multiple objects with different representative weights. With this new cluster representation scheme, CAWP aims to simultaneously produce clusters of improved quality and a set of ranked representative objects for each cluster. An efficient algorithm is derived to alternately update the clusters and the representative weights of objects with respect to each cluster. An annealing-like optimization procedure is incorporated to alleviate the local optimum problem for better clustering results and, at the same time, to make the algorithm less sensitive to parameter settings. Experimental results on benchmark document datasets show that CAWP achieves favorable effectiveness and efficiency in clustering and also provides useful information for cluster-specific analysis.
16.
This paper proposes an improved method for the back propagation neural network and uses an efficient method to reduce the dimension and improve performance. The traditional back propagation neural network (BPNN) learns slowly and is easily trapped in a local minimum, which leads to poor performance and efficiency. In this paper, we propose the learning phase evaluation back propagation neural network (LPEBP) to improve the traditional BPNN. We adopt a singular value decomposition (SVD) technique to reduce the dimension and construct the latent semantics between terms. Experimental results show that the LPEBP is much faster than the traditional BPNN and also improves on its performance. The SVD technique can not only greatly reduce the high dimensionality but also enhance performance. SVD can therefore further improve the precision and efficiency of document classification systems.
18.
A back-propagation (BP) neural network has good self-learning, self-adapting and generalization ability, but it may easily
get stuck in a local minimum, and has a poor rate of convergence. Therefore, a method to optimize a BP algorithm based on
a genetic algorithm (GA) is proposed to speed the training of BP, and to overcome BP’s disadvantage of being easily stuck
in a local minimum. The UCI data set is used here for experimental analysis and the experimental result shows that, compared
with the BP algorithm and a method that only uses GA to learn the connection weights, our method that combines GA and BP to
train the neural network works better; is less easily stuck in a local minimum; yields a trained network with better generalization
ability; and has good stability.
19.
This paper presents a new single-layer neural network which is based on orthogonal functions. This neural network is developed to avoid the problems of traditional feedforward neural networks such as the determination of initial weights and the numbers of layers and processing elements. The desired output accuracy determines the required number of processing elements. Because weights are unique, the training of the neural network converges rapidly. An experiment in approximating typical continuous and discrete functions is given. The results show that the neural network has excellent performance in convergence time and approximation error.
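To make the idea of a single layer built on orthogonal functions more concrete, the sketch below fits a function with a weighted sum of Chebyshev polynomials, solving for the weights by linear least squares. The abstract does not say which orthogonal functions or training rule the paper uses, so the Chebyshev basis, the least-squares solve, and all names here are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def fit_orthogonal_layer(x, y, n_units):
    """Solve for the output weights of a single layer whose units are the
    first `n_units` Chebyshev polynomials evaluated at x (x in [-1, 1])."""
    H = cheb.chebvander(x, n_units - 1)            # shape (n_samples, n_units)
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return weights

def predict_layer(weights, x):
    """Evaluate the weighted sum of Chebyshev units at x."""
    return cheb.chebval(x, weights)

# Hypothetical target: f(x) = x^2, which equals (T0 + T2) / 2 exactly.
x_train = np.linspace(-1.0, 1.0, 50)
y_train = x_train ** 2

w = fit_orthogonal_layer(x_train, y_train, n_units=3)
max_error = float(np.max(np.abs(predict_layer(w, x_train) - y_train)))
```

Because the problem is linear in the weights, the solution is unique and no initial weights need to be chosen. Raising `n_units` until `max_error` falls below a tolerance is one way to let the desired accuracy set the number of units.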
20.
Numerous studies have addressed nonlinear functional approximation by multilayer perceptrons (MLPs) and RBF networks as a special case of the more general mapping problem. The performance of both these supervised network models intimately depends on the efficiency of their learning process. This paper presents an unsupervised recurrent neural network, based on the recurrent Mean Field Theory (MFT) network model, that finds a least-squares approximation to an arbitrary L² function, given a set of Gaussian radially symmetric basis functions (RBFs). Essential is the reformulation of RBF approximation as a problem of constrained optimisation. A new concept of adiabatic network organisation is introduced. Together with an adaptive mechanism of temperature control this allows the network to build a hierarchical multiresolution approximation with preservation of the global optimisation characteristics. A revised problem mapping results in a position invariant local interconnectivity pattern, which makes the network attractive for electronic implementation. The dynamics and performance of the network are illustrated by numerical simulation.