Similar Documents
1.
Because deceptive and constantly changing strategies can be concealed in complex financial statements, traditional means of financial analysis are unable to detect such accounting frauds in advance. To detect new accounting frauds and uncover the true meaning of off-balance-sheet arrangements, we propose a simple and feasible method based on an unsupervised learning system. In unsupervised learning, network training is entirely data-driven and no target outputs are provided, and features that do not help clustering can be removed. With unsupervised learning it is possible to learn larger and more complex relations than with supervised learning. In the demonstration, we use adaptive resonance theory to extract four non-traditional warning signals, with Enron and WorldCom as prototypes, in order to identify potential fraud at companies that concern investors or analysts.

2.
This paper investigates the ability of adaptive resonance theory (ART) neural networks to mine hierarchical thematic structure in text collections. We present experimental results with binary ART1 on the benchmark Reuters-21578 corpus. Using both quantitative evaluation with the standard F1 measure and qualitative visualization of the hierarchy obtained with ART, we discuss how useful ART-built hierarchies would be to a user who wants to find and access textual information through them. Our F1 results show that ART1 produces hierarchical clusterings whose quality exceeds that of k-means and of a hierarchical clustering algorithm. However, we identify several critical problems that would make such a hierarchy rather impractical to use in a real-life environment. These problems point to the importance of semantic feature selection. Our main contribution is a detailed test of the applicability of ART to the important domain of hierarchical document clustering, an application of adaptive resonance that has received little attention until now.
Louis Massey
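
Items 1 and 2 both rely on ART clustering. As a generic illustration only, not the implementation used in either paper, a simplified ART1 fast-learning loop for binary inputs (such as binary term vectors) can be sketched as follows. The vigilance `rho`, the choice parameter `beta` and the function name are illustrative assumptions.

```python
from typing import List

def art1_cluster(inputs: List[List[int]], rho: float = 0.7, beta: float = 1.0) -> List[int]:
    """Cluster binary vectors with a simplified ART1 (fast learning); returns a category per input."""
    prototypes: List[List[int]] = []  # binary top-down templates, one per category
    labels: List[int] = []
    for x in inputs:
        norm_x = sum(x)
        if norm_x == 0:
            raise ValueError("ART1 inputs need at least one active bit")

        def overlap(w: List[int]) -> int:
            return sum(a & b for a, b in zip(x, w))  # |x AND w|

        # Search categories in descending order of the choice function |x AND w| / (beta + |w|)
        order = sorted(range(len(prototypes)),
                       key=lambda j: overlap(prototypes[j]) / (beta + sum(prototypes[j])),
                       reverse=True)
        chosen = -1
        for j in order:
            # Vigilance test: the template must match at least a fraction rho of the input
            if overlap(prototypes[j]) / norm_x >= rho:
                chosen = j
                break
        if chosen == -1:
            prototypes.append(list(x))  # no category resonates: create a new one
            chosen = len(prototypes) - 1
        else:
            # Fast learning: shrink the template to its intersection with the input
            prototypes[chosen] = [a & b for a, b in zip(x, prototypes[chosen])]
        labels.append(chosen)
    return labels
```

Raising `rho` makes the vigilance test stricter, so the same inputs are split into more, finer categories.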

3.
In this paper, we propose a new feature-selection algorithm for text classification, called best terms (BT). The complexity of BT is linear with respect to the number of training-set documents and independent of both the vocabulary size and the number of categories. We evaluate BT on two benchmark document collections, Reuters-21578 and 20-Newsgroups, using two classification algorithms, naive Bayes (NB) and support vector machines (SVM). Our experimental results, comparing BT with an extensive and representative list of feature-selection algorithms, show that (1) BT is faster than existing feature-selection algorithms; (2) BT considerably increases the classification accuracy of NB and SVM, as measured by the F1 measure; (3) BT considerably speeds up NB and SVM, in most cases reducing SVM training time by an order of magnitude; and (4) in most cases, combining BT with the simple but very fast NB algorithm yields classification accuracy comparable to SVM, and sometimes even higher.
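
The F1 measure used in items 2 and 3 is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN). On multi-class corpora it is usually reported as a macro-average (mean of per-class F1) or a micro-average (F1 of pooled counts). The abstracts do not say which averaging was used, so the single-label sketch below simply computes both. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

def f1_report(y_true: List[str], y_pred: List[str]) -> Tuple[Dict[str, float], float, float]:
    """Per-class F1, macro-averaged F1 and micro-averaged F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        # F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN)
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    macro = sum(per_class.values()) / len(labels)  # every class counts equally
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))  # every decision counts equally
    return per_class, macro, micro
```

For example, with true labels `a, a, b, b` and predictions `a, b, b, b`, class `a` scores 2/3, class `b` scores 0.8, the macro-average is about 0.733 and the micro-average is 0.75.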

4.
This paper addresses object category classification by committees, or ensembles, of classifiers, each based on a different codebook. Two methods of constructing visual codebook ensembles are proposed. The first builds diverse individual visual codebooks using different clustering algorithms. The second uses visual codebooks of different sizes to construct a highly diverse ensemble. The codebook ensembles are trained to capture and convey image properties from different aspects, and each yields a different type of image representation. A classifier ensemble can then be trained on the different representation datasets derived from the same training image set, and using this ensemble to categorize new images can improve performance. Detailed experimental analysis on a Pascal VOC challenge dataset shows that the ensemble approach performs well, consistently improves the performance of visual object classifiers, and achieves state-of-the-art categorization performance.
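
Item 4 encodes each image once per codebook, trains one classifier per encoding and combines the results. Below is a minimal sketch of two of those steps: hard-assignment bag-of-words histograms and majority voting. The abstract does not state the combination rule, so majority voting is an assumption, as are the function names. The histogram step is also the baseline BoW encoding referred to in item 11.

```python
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]

def bow_histogram(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Hard-assign each local descriptor to its nearest codeword; return an L1-normalised histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

def majority_vote(predictions: List[str]) -> str:
    """Combine the labels predicted by the ensemble members (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]
```

Encoding the same descriptors with a 2-word and a 3-word codebook gives histograms of different lengths, which is how codebooks of different sizes produce different representations of one image.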

5.
Stability and plasticity are equally essential in learning systems, but achieving both simultaneously is difficult. Adaptive resonance theory (ART) neural networks are known for their plastic yet stable learning of categories, and hence provide an answer to the so-called stability-plasticity dilemma. However, it has recently been shown that, contrary to general belief, ART stability is not possible with infinite streaming data. In this paper, we present an improved stabilization strategy for ART neural networks that does not suffer from this problem and that, as a positive side effect, produces a soft-clustering solution. Experimental results on a text clustering task show that the new stabilization strategy works well, with a slight loss in clustering quality compared to the traditional approach. For real-life intelligent applications that generate infinite streaming data, the stable, soft-clustering solution obtained with our approach more than compensates for this small loss in quality. This research was supported in part by the National Defence Academic Research Program (ARP) under grant 743321.

6.
Design of an Automatic Text Classifier Based on the Boosting Algorithm
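Boosting is currently a popular machine learning algorithm. Using an improved Boosting algorithm, AdaBoost.MHKR, as the classification algorithm, we design an automatic text classifier and present the evaluation method and results. The evaluation shows that the classifier achieves good classification accuracy.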
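
The abstract for item 6 does not describe AdaBoost.MHKR. For orientation only, here is a sketch of plain binary discrete AdaBoost with decision stumps. This is the base algorithm that the AdaBoost.MH family extends to multi-class, multi-label problems; it is not AdaBoost.MHKR itself, and the names are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)

def stump_predict(stump: Stump, x: List[float]) -> int:
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def adaboost(X: List[List[float]], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Binary discrete AdaBoost with decision stumps; labels must be +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error
        best: Stump = (0, 0.0, 1)
        best_err = float("inf")
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    err = sum(w for w, xi, yi in zip(weights, X, y)
                              if stump_predict((f, thr, pol), xi) != yi)
                    if err < best_err:
                        best, best_err = (f, thr, pol), err
        best_err = min(max(best_err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # Increase the weight of misclassified examples, decrease the rest, renormalise
        weights = [w * math.exp(-alpha * yi * stump_predict(best, xi))
                   for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble: List[Tuple[float, Stump]], x: List[float]) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```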

7.
Research on Hierarchical Classification of Chinese Text Based on the Vector Space Model
肖雪, 何中市. Journal of Computer Applications, 2006, 26(5): 1125-1126
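When the number of categories in text classification is very large, hierarchical classification is an effective approach. Given the structural characteristics of hierarchical classification, and because different levels of the hierarchy place different demands on feature selection and classification methods, we propose FDS, a new double feature-selection method based on the vector space model, and HTC, a hierarchical classification algorithm. The double feature-selection method performs feature selection separately at each level, changing the number of features and the weighting scheme from level to level. HTC combines the class-centroid vector method, which is more effective for coarse classification, with SVM, which is more effective for fine classification. Experiments show that this approach achieves higher accuracy than flat classification and conventional hierarchical classification methods.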
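
The level-by-level routing in item 7 can be illustrated with a two-level classifier. It first picks a coarse category by cosine similarity to class-centroid vectors, then chooses only among that category's children. This is a simplified sketch: the paper uses SVM at the fine level and its own FDS feature selection, and neither is reproduced here. Centroids are used at both levels, and all names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vec = Dict[str, float]  # sparse term-weight vector, e.g. TF-IDF

def cosine(a: Vec, b: Vec) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors: List[Vec]) -> Vec:
    c: Dict[str, float] = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            c[t] += w / len(vectors)
    return dict(c)

def train_hierarchy(docs: List[Tuple[Vec, str, str]]):
    """docs are (vector, coarse label, fine label); returns centroids for both levels."""
    coarse: Dict[str, List[Vec]] = defaultdict(list)
    fine: Dict[str, Dict[str, List[Vec]]] = defaultdict(lambda: defaultdict(list))
    for v, c, f in docs:
        coarse[c].append(v)
        fine[c][f].append(v)
    return ({c: centroid(vs) for c, vs in coarse.items()},
            {c: {f: centroid(vs) for f, vs in sub.items()} for c, sub in fine.items()})

def classify_hierarchy(v: Vec, model) -> Tuple[str, str]:
    coarse_c, fine_c = model
    c = max(coarse_c, key=lambda k: cosine(v, coarse_c[k]))    # level 1: coarse category
    f = max(fine_c[c], key=lambda k: cosine(v, fine_c[c][k]))  # level 2: children of c only
    return c, f
```

Restricting level 2 to the children of the chosen coarse category is what lets each level use its own features and classifier.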

8.
Based on a combination of a PD controller and a switching-type two-parameter compensation force, this paper presents an iterative learning controller with a projection-free adaptive algorithm for the repetitive control of uncertain robot manipulators. The adaptive iterative learning controller requires no a priori knowledge of the robot parameters, given certain properties of the dynamics of robot manipulators with revolute joints only. The new adaptive algorithm uses a combined time-domain and iteration-domain adaptation law that guarantees the boundedness of the tracking error and the control input in the sense of the infinity norm, as well as the convergence of the tracking error to zero, without any a priori knowledge of the robot parameters. Simulation results illustrate the effectiveness of the learning controller.
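
The controller in item 8 is adaptive and handles unknown robot dynamics, and that machinery is not reproduced here. As a minimal illustration of the underlying iterative-learning idea, which is to improve the input trial by trial from the previous trial's error, this sketch applies a classical P-type ILC update to a hypothetical first-order scalar plant. The plant, the gain `gamma` and the names are assumptions.

```python
from typing import List

def simulate(u: List[float], a: float, b: float) -> List[float]:
    """Hypothetical scalar plant y[t+1] = a*y[t] + b*u[t] with y[0] = 0; returns y[0..T]."""
    y = [0.0]
    for ut in u:
        y.append(a * y[-1] + b * ut)
    return y

def p_type_ilc(r: List[float], a: float, b: float, gamma: float, iterations: int) -> List[float]:
    """Run repeated trials and return the max tracking error of each trial.

    r holds the reference for y[1..T]. Update law: u_{k+1}[t] = u_k[t] + gamma * e_k[t+1].
    For this plant the trial-to-trial error converges when |1 - gamma*b| < 1.
    """
    u = [0.0] * len(r)
    history = []
    for _ in range(iterations):
        y = simulate(u, a, b)
        e = [rt - yt for rt, yt in zip(r, y[1:])]  # e[t] is the error at time t+1
        history.append(max(abs(x) for x in e))
        u = [ut + gamma * et for ut, et in zip(u, e)]  # learn from this trial's error
    return history
```

With a = 0.5, b = 1 and gamma = 0.8, the factor 1 - gamma*b = 0.2, and the maximum error over a 10-step reference shrinks to essentially zero within a few dozen trials.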

9.
蔡月红, 朱倩, 孙萍, 程显毅. Journal of Computer Applications, 2010, 30(4): 1015-1018
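To address the scarcity of labeled corpora in large-scale short-text classification, we propose a semi-supervised short-text classification algorithm based on attribute selection. An attribute-selection technique based on ReliefF evaluation and an independence measure selects a subset of attributes with good mutual independence to take part in model learning, thereby weakening the strong independence assumption of the naive Bayes model. Using ensemble learning, a group of classifiers with a degree of diversity estimates the initial values, and a majority-voting strategy classifies the unlabeled corpus, reducing the sensitivity of the expectation-maximization (EM) algorithm to its initial values. Comparative experiments on a real corpus show that the method can effectively exploit large amounts of unlabeled data to improve the algorithm's generalization ability.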
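
Item 9 uses EM to exploit unlabeled short texts with naive Bayes. Below is a simplified hard-EM (self-training) sketch with multinomial naive Bayes: labels for the unlabeled texts are re-estimated from the current model, and the model is then retrained on all texts. The sketch omits the paper's ReliefF-based attribute selection and ensemble-based initialization, and it uses hard rather than soft (probabilistic) assignments. All names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Model = Dict[str, Tuple[float, Dict[str, float]]]  # class -> (log prior, log P(word | class))

def train_nb(docs: List[List[str]], labels: List[str], alpha: float = 1.0) -> Model:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""
    vocab = {w for d in docs for w in d}
    prior = Counter(labels)
    counts: Dict[str, Counter] = defaultdict(Counter)
    for d, c in zip(docs, labels):
        counts[c].update(d)
    model: Model = {}
    for c, n_c in prior.items():
        total = sum(counts[c].values()) + alpha * len(vocab)
        model[c] = (math.log(n_c / len(labels)),
                    {w: math.log((counts[c][w] + alpha) / total) for w in vocab})
    return model

def predict_nb(model: Model, doc: List[str]) -> str:
    def score(c: str) -> float:
        log_prior, log_lik = model[c]
        return log_prior + sum(log_lik[w] for w in doc if w in log_lik)  # ignore unseen words
    return max(model, key=score)

def hard_em(labeled: List[Tuple[List[str], str]], unlabeled: List[List[str]],
            iterations: int = 5) -> Model:
    """Self-training / hard EM: alternately pseudo-label the unlabeled texts and retrain."""
    docs = [d for d, _ in labeled]
    labels = [c for _, c in labeled]
    model = train_nb(docs, labels)
    for _ in range(iterations):
        pseudo = [predict_nb(model, d) for d in unlabeled]   # E-step (hard assignments)
        model = train_nb(docs + unlabeled, labels + pseudo)  # M-step
    return model
```

Words that appear only in unlabeled texts (for example "team" co-occurring with "goal") become informative after retraining, which is how unlabeled data can improve generalization.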

10.
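Text feature representation is the most important step in automatic text classification. In text representations based on the vector space model (VSM), the choice of feature-unit granularity directly affects classification performance. To address the unsatisfactory performance of Uyghur text classification based on the bag-of-words (BOW) model, we propose a statistics-based Uyghur phrase extraction algorithm, use the extracted phrases as text features, and carry out Uyghur text classification experiments with support vector machines (SVM). The experimental results show that, compared with word-based features, phrase features improve both the precision and the recall of Uyghur text classification.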
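
Item 10 extracts phrases statistically, but the abstract does not name the statistic. One common choice for scoring candidate two-word phrases is pointwise mutual information, PMI(w1, w2) = log2(P(w1 w2) / (P(w1) P(w2))). The sketch assumes PMI with a frequency cut-off. The thresholds and names are illustrative, not the paper's, and the example uses English tokens for readability.

```python
import math
from collections import Counter
from typing import List, Tuple

def extract_phrases(sentences: List[List[str]], min_count: int = 2,
                    min_pmi: float = 1.0) -> List[Tuple[str, float]]:
    """Return adjacent word pairs with count >= min_count and PMI >= min_pmi, best first."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for s in sentences:
        unigrams.update(s)
        bigrams.update(zip(s, s[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    phrases = []
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue  # too rare for a reliable estimate
        # PMI = log2( P(w1 w2) / (P(w1) * P(w2)) )
        pmi = math.log2((c / n_bi) / ((unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)))
        if pmi >= min_pmi:
            phrases.append((f"{w1} {w2}", pmi))
    return sorted(phrases, key=lambda p: -p[1])
```

The extracted phrases would then be added to, or replace, single words as feature units in the vector space model.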

11.
This paper proposes an efficient technique for learning a discriminative codebook for scene categorization. A state-of-the-art approach to scene categorization is the Bag-of-Words (BoW) framework, in which codebook generation plays an important role in determining system performance. Traditionally, the codebook generation methods adopted in BoW techniques are designed to minimize quantization error rather than to optimize classification accuracy. To address this issue, this paper carefully designs the codewords so that the resulting image histograms for each category retain strong discriminative power, while online categorization of a test image remains as efficient as in the baseline BoW. The codewords are refined iteratively offline to improve their discriminative power. The proposed method is validated on the UIUC Scene-15 and NTU Scene-25 datasets, where it outperforms other state-of-the-art codebook generation methods for scene categorization.

12.
This paper provides a comprehensive overview of the motivations, methodology and current status of an ongoing research program whose long-term goal is to elucidate the essential principles of a theory of adaptive behavior. The thoroughly dynamical nature of both adaptive behavior itself and the causal mechanisms that support it is emphasized throughout. An initial mapping of the basic concepts of adaptive behavior into the language of dynamical systems theory is proposed, and some of the general consequences of this preliminary theoretical framework are discussed. The two key ideas of this framework are (1) that an agent and its environment should be understood as two coupled dynamical systems whose mutual interaction is jointly responsible for the agent's behavior, and (2) that an agent's need to maintain its existence in its environment defines a viability constraint on its behavioral dynamics. A constructive research methodology involving the use of evolutionary algorithms to evolve continuous-time recurrent neural networks for controlling the behavior of model agents is described, and several examples of this methodology are presented, including models of chemotaxis, walking, sequential decision-making and learning. Finally, a detailed dynamical analysis of one evolved walking circuit is presented. This analysis illustrates the kinds of insights that can be obtained by treating agents as dynamical systems and applying the tools of dynamical systems theory to their behavior.
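
The continuous-time recurrent neural networks (CTRNNs) in item 12 are commonly written as tau_i dy_i/dt = -y_i + sum_j w_ji sigma(y_j + theta_j) + I_i, where sigma is the logistic function, theta_j a bias and I_i an external input. A minimal forward-Euler simulation step is sketched below. The step size, the weight layout `w[j][i]` (connection from neuron j to neuron i) and the names are assumptions, and the evolutionary search over the parameters is not shown.

```python
import math
from typing import List

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def ctrnn_step(y: List[float], tau: List[float], bias: List[float], w: List[List[float]],
               inputs: List[float], dt: float) -> List[float]:
    """One forward-Euler step of tau_i dy_i/dt = -y_i + sum_j w[j][i]*sigma(y_j + bias_j) + I_i."""
    n = len(y)
    outputs = [sigmoid(y[j] + bias[j]) for j in range(n)]  # firing rates of all neurons
    return [y[i] + dt / tau[i] * (-y[i] + sum(w[j][i] * outputs[j] for j in range(n)) + inputs[i])
            for i in range(n)]
```

For a single unconnected neuron with constant input 1, repeated steps drive its state toward the fixed point y = 1 at a rate set by tau.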

13.
FANNC: A Fast Adaptive Neural Network Classifier
In this paper, a fast adaptive neural network classifier named FANNC is proposed. FANNC exploits the advantages of both adaptive resonance theory and field theory. It needs only one-pass learning and achieves both high predictive accuracy and fast learning. FANNC also supports incremental learning: when new instances arrive, it does not need to retrain on the whole training set. Instead, it learns the knowledge encoded in those instances by slightly adjusting the network topology when necessary, that is, by adaptively appending one or two hidden units and the corresponding connections to the existing network. This characteristic makes FANNC suitable for real-time online learning tasks. Moreover, since the network architecture is set up adaptively, FANNC avoids a drawback of most feed-forward neural networks, namely having to determine the number of hidden units manually. Benchmark tests show that FANNC is a preferable neural network classifier, superior to several other neural algorithms in both predictive accuracy and learning speed. Received 10 February 1999 / Revised 21 June 1999 / Accepted in revised form 11 October 1999

14.
Context: Numerous software design patterns have been introduced and cataloged, either as canonical or as variant solutions to design problems. Existing automatic techniques for design pattern selection help novice software developers choose the most appropriate design pattern(s) from the list of applicable patterns for a design problem during the design phase of the software development life cycle.
Goal: However, existing automatic techniques are limited by their reliance on semi-formal specifications, by the multi-class problem, by the need for a sample size large enough for precise learning, and by the need to train an individual classifier to determine a candidate design pattern class and suggest the more appropriate pattern(s).
Method: To address these issues, we exploit a text categorization based approach using Fuzzy c-means (an unsupervised learning technique) that aims to provide a systematic way to group similar design patterns and to suggest appropriate design pattern(s) to developers based on the specification of a given design problem. We also propose an evaluation model to assess the effectiveness of the proposed approach on several real design problems and design pattern collections. In addition, we propose a new feature selection method, Ensemble-IG, to overcome the multi-class problem and improve the classification performance of the proposed approach.
Results: The promising experimental results suggest that the proposed approach is applicable to the classification and selection of appropriate design patterns. We also observed a significant improvement in the learning precision of the proposed approach when using Ensemble-IG.
Conclusion: The proposed approach has four advantages over previous work: first, a semi-formal specification of design patterns is not required as a prerequisite; second, ground-truth class label assignment is not mandatory; third, no classifier needs to be trained for each design pattern class; and fourth, a large sample size is not required for precise learning.
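
Item 14 groups design patterns with Fuzzy c-means (FCM), in which every item belongs to every cluster with some degree of membership. The standard FCM alternation is sketched below on numeric vectors. Each centre is a mean weighted by u_ij^m, and each membership is updated as u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)). In the paper the vectors would be derived from the text of pattern descriptions and problem specifications, which is not shown. The fuzzifier m = 2, the random initialization, the iteration count and the names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

def fuzzy_c_means(points: List[Sequence[float]], c: int, m: float = 2.0, iterations: int = 100,
                  seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (centres, memberships); memberships[i][j] is the degree of point i in cluster j."""
    rng = random.Random(seed)
    dim = len(points[0])
    u = []
    for _ in points:  # random initial memberships, each row normalised to sum to 1
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centres: List[List[float]] = []
    for _ in range(iterations):
        # Centre update: mean of the points weighted by u_ij^m
        centres = []
        for j in range(c):
            wts = [u[i][j] ** m for i in range(len(points))]
            total = sum(wts)
            centres.append([sum(wt * p[k] for wt, p in zip(wts, points)) / total for k in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)); distances floored to avoid /0
        for i, p in enumerate(points):
            d = [max(sum((a - b) ** 2 for a, b in zip(p, cj)) ** 0.5, 1e-12) for cj in centres]
            u[i] = [1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c)) for j in range(c)]
    return centres, u
```

Unlike k-means, a pattern that fits two groups keeps substantial membership in both, which suits suggesting more than one candidate pattern.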

15.
Text Classification Based on the LDA Model
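To address the limitations of traditional dimensionality-reduction algorithms in high-dimensional, large-scale text classification, we propose a text classification algorithm based on the LDA model. Within the framework of the discriminative SVM model, the LDA probabilistic generative model is used to model the topics of the document collection, and an SVM is trained on the collection's latent topic-document matrix to build the text classifier. Parameter inference uses Gibbs sampling, which represents each document as a probability distribution over a fixed set of latent topics. The optimal number of topics T is determined with a standard method from Bayesian statistics. Classification experiments on a corpus show that this approach classifies better than VSM combined with SVM and LSI combined with SVM.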
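
Item 15 infers LDA with Gibbs sampling and feeds each document's topic distribution to an SVM. A minimal collapsed Gibbs sampler that produces these per-document topic proportions theta is sketched below. The hyperparameters `alpha` and `beta`, the iteration count and the names are assumptions. The SVM stage and the Bayesian choice of T are not shown.

```python
import random
from typing import List

def lda_gibbs(docs: List[List[int]], vocab_size: int, n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200, seed: int = 0) -> List[List[float]]:
    """Collapsed Gibbs sampling for LDA; docs hold word ids in [0, vocab_size).

    Returns theta: one topic distribution per document.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # topic counts per document
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                                # total tokens per topic
    z: List[List[int]] = []                            # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1  # remove the current token from the counts
                nkw[k][w] -= 1
                nk[k] -= 1
                # p(z = t | rest) is proportional to (n_dt + alpha) * (n_tw + beta) / (n_t + V*beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
            for d, doc in enumerate(docs)]
```

Each row of the returned matrix is a fixed-length, low-dimensional document representation, which is what replaces the high-dimensional term vector as classifier input.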

16.
A computer model of adaptive search behavior is constructed and investigated. The model describes the search behavior of caddis fly larvae, which inhabit creek bottoms and build their cases from hard particles of different sizes. With large particles, a larva can build its case more quickly and effectively than with small ones, so it prefers large particles. Inertial switching between search tactics takes place. The model is compared with the results of a biological experiment, and the simulation results agree with the biological data. The text was submitted by the authors in English.

17.
Variations in clothing alter an individual's appearance, making gait identification much more difficult. If the type of clothing differs between the gallery and a probe, certain parts of the silhouettes are likely to change, and the ability to discriminate subjects decreases for these parts. A part-based approach therefore has the potential to select the appropriate parts. This paper proposes a method for part-based gait identification in the presence of substantial clothing variations. We divide the human body into eight sections, including four overlapping ones, because larger parts have higher discrimination capability while smaller parts are more likely to be unaffected by clothing variations. Furthermore, because certain clothes are common to different parts, we present a categorization of clothing items that groups similar clothes. Next, we use the discrimination capability of each part as its matching weight and control the weights adaptively based on the distribution of distances between the probe and all the galleries. Experiments on our large-scale gait dataset with clothing variations show that the proposed method achieves far better performance than other approaches.

18.
Text Classification Based on Active Learning Support Vector Machines
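We propose a text classification method based on active-learning support vector machines. Text features are first extracted using the vector space model (VSM), and their dimensionality is reduced using mutual information. An active learning algorithm is then used to train the support vector machine, and the trained classifier is used to classify new texts. Experimental results show that the method achieves good classification performance.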
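
Item 18 trains an SVM with active learning but does not specify the query strategy. A common choice, assumed here, is uncertainty sampling: ask the oracle to label the unlabeled example closest to the current hyperplane. The sketch trains a bias-free linear SVM with the Pegasos sub-gradient method; appending a constant 1 feature emulates a bias. The `oracle`, the regularization `lam` and all names are hypothetical, and the VSM and mutual-information preprocessing is not shown.

```python
import random
from typing import Callable, List, Tuple

def dot(w: List[float], x: List[float]) -> float:
    return sum(a * b for a, b in zip(w, x))

def pegasos(X: List[List[float]], y: List[int], lam: float = 0.01, epochs: int = 50,
            seed: int = 0) -> List[float]:
    """Linear SVM without a bias term, trained by Pegasos sub-gradient steps; labels are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1       # hinge loss is active at the current w
            w = [(1 - eta * lam) * wk for wk in w]   # shrink (regularisation step)
            if violated:
                w = [wk + eta * y[i] * xk for wk, xk in zip(w, X[i])]
    return w

def query_index(w: List[float], pool: List[List[float]], labeled: List[int]) -> int:
    """Uncertainty sampling: the unlabeled example closest to the hyperplane w.x = 0."""
    unlabeled = [i for i in range(len(pool)) if i not in labeled]
    return min(unlabeled, key=lambda i: abs(dot(w, pool[i])))

def active_learning(pool: List[List[float]], oracle: Callable[[List[float]], int],
                    seed_idx: List[int], queries: int,
                    lam: float = 0.01) -> Tuple[List[float], List[int]]:
    labeled = list(seed_idx)
    w = pegasos([pool[i] for i in labeled], [oracle(pool[i]) for i in labeled], lam)
    for _ in range(min(queries, len(pool) - len(labeled))):
        labeled.append(query_index(w, pool, labeled))  # ask the oracle for the most uncertain item
        w = pegasos([pool[i] for i in labeled], [oracle(pool[i]) for i in labeled], lam)
    return w, labeled
```

Querying near the boundary tends to label the examples most likely to become support vectors, which is why this strategy pairs naturally with SVMs.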

19.
Discovering the hierarchical structures of different classes of object behaviors can satisfy the requirements of various degrees of abstraction in association analysis, behavior modeling, data preprocessing, pattern recognition, decision making and so on. In this paper, we call this process associative categorization, which differs from classical clustering, associative classification and associative clustering. Focusing on representing the associations of behaviors and the corresponding uncertainties, we propose a method for constructing a Markov network (MN) from the results of frequent pattern mining, called the item-associative Markov network (IAMN), in which nodes and edges represent frequent patterns and their associations, respectively. We further discuss the properties of probabilistic graphical models to guarantee the IAMN's correctness theoretically. We then adopt the concept of chordality to reflect the closeness of nodes in the IAMN. Using the algorithm for constructing join trees from an MN, we give an algorithm for IAMN-based associative categorization by hierarchical bottom-up aggregation of nodes. Experimental results show the effectiveness, efficiency and correctness of our methods.
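
The IAMN in item 19 is built from the output of frequent pattern mining, but the abstract does not name the miner. As an assumption, here is a minimal Apriori-style sketch that produces frequent itemsets and their support counts. This is the kind of input from which pattern nodes and association edges could be derived. The Markov-network construction, chordality and join-tree steps are not shown, and the names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def frequent_itemsets(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Level-wise (Apriori) mining of all itemsets contained in >= min_support transactions."""
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Candidate generation: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        result.update(current)
        k += 1
    return result
```

The pruning step relies on the Apriori property: if an itemset is infrequent, no superset of it can be frequent.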

20.
Discovering the hierarchical structures of different classes of object behaviors can satisfy the requirements of various degrees of abstraction in association analysis, behavior modeling, data preprocessing, pattern recognition and decision making, etc. In this paper, we call this process as associative categorization, which is different from classical clustering, associative classification and associative clustering. Focusing on representing the associations of behaviors and the corresponding uncertainties, we propose the method for constructing a Markov network (MN) from the results of frequent pattern mining, called item-associative Markov network (IAMN), where nodes and edges represent the frequent patterns and their associations respectively. We further discuss the properties of a probabilistic graphical-model to guarantee the IAMN’s correctness theoretically. Then, we adopt the concept of chordal to reflect the closeness of nodes in the IAMN. Adopting the algorithm for constructing join trees from an MN, we give the algorithm for IAMN-based associative categorization by hierarchical bottom-up aggregations of nodes. Experimental results show-the effectiveness, efficiency and correctness of our methods.  相似文献   

