1.
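Automatic Classification of XML Documents Based on Kernel Methods (total citations: 3; self-citations: 0; citations by others: 3)
The support vector machine (SVM) builds a classifier by using a kernel function to map the data into a feature space and constructing an optimal separating hyperplane there; it has clear advantages in automatic text classification. XML documents combine textual content with structural information and, as a new form of data, have become an active research topic. Building on the structured link vector model, this paper studies SVM-based automatic classification of XML documents and proposes a kernel suited to XML document classification…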
2.
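Text classification based on the vector space model suffers from low classifier efficiency because text vectors are high-dimensional. To address this, a new text classification method based on cluster partitioning is proposed. The main idea is to partition the training documents into several clusters according to the distances between vectors in the vector space, so that all documents in a cluster belong to the same class. At test time, a document's class is determined by the cluster it falls into. The method is compared with k-NN, a traditional text classification method. Experimental results show that the method generalizes well in high-dimensional space and has good time performance.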
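A minimal Python sketch of the cluster-based idea, assuming documents are already dense vectors (for example TF-IDF). The abstract does not say how clusters are formed, so the greedy same-class "leader" clustering with a distance `radius`, and the helper names `build_clusters` and `classify_by_cluster`, are hypothetical; a test document simply takes the label of the nearest cluster centroid.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_clusters(docs, labels, radius):
    """Greedy leader clustering (assumed): each cluster holds documents of one class only.

    Returns a list of (label, centroid) pairs.
    """
    clusters = []  # each entry: [label, running vector sum, document count]
    for vec, lab in zip(docs, labels):
        best, best_d = None, None
        for c in clusters:
            if c[0] != lab:
                continue  # never mix classes inside a cluster
            centroid = [s / c[2] for s in c[1]]
            d = dist(vec, centroid)
            if d <= radius and (best_d is None or d < best_d):
                best, best_d = c, d
        if best is None:
            clusters.append([lab, list(vec), 1])  # start a new cluster
        else:
            best[1] = [s + x for s, x in zip(best[1], vec)]
            best[2] += 1
    return [(lab, [s / n for s in total]) for lab, total, n in clusters]

def classify_by_cluster(vec, clusters):
    """Label of the cluster whose centroid is nearest to the test vector."""
    return min(clusters, key=lambda c: dist(vec, c[1]))[0]
```

Because only centroids are compared at test time, the cost grows with the number of clusters rather than the number of training documents, which is the kind of saving over k-NN the abstract points to.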
3.
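Web Document Classification Based on Association Rules (total citations: 5; self-citations: 2; citations by others: 5)
Among existing Web document classifiers, some produce fairly accurate classification results and others produce more interpretable classification models, but none yet combines both strengths. The paper therefore proposes a Web document classification method based on association rules. Using the concept of transactions, the method addresses two problems: (1) discovering the best term association rules in the training document set; and (2) using these rules to build a Web document classifier. Experiments show that the classifier performs well, trains quickly, produces rules that are easy for people to understand, and is easy to update and tune.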
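The two steps above (mine term association rules, then classify with them) can be sketched in the style of classification based on association (CBA). This is an assumption, not the paper's algorithm: brute-force mining of rules with at most `max_len` terms, the `min_support` and `min_conf` thresholds, and first-match ranking by confidence are illustrative choices. Each training document is treated as a transaction, i.e. its set of terms.

```python
from collections import Counter
from itertools import combinations

def mine_rules(transactions, labels, min_support=2, min_conf=0.6, max_len=2):
    """Brute-force mining of rules 'term set -> class' from document transactions."""
    itemset_count = Counter()  # how many documents contain each term set
    rule_count = Counter()     # how many documents contain the term set AND have the class
    for terms, lab in zip(transactions, labels):
        for k in range(1, max_len + 1):
            for items in combinations(sorted(terms), k):
                itemset_count[items] += 1
                rule_count[(items, lab)] += 1
    rules = []
    for (items, lab), n in rule_count.items():
        conf = n / itemset_count[items]
        if n >= min_support and conf >= min_conf:
            rules.append((conf, n, items, lab))
    # Rank by confidence, then support, then longer (more specific) term sets
    rules.sort(key=lambda r: (r[0], r[1], len(r[2])), reverse=True)
    return rules

def classify_by_rules(terms, rules, default=None):
    """Apply the best-ranked rule whose term set is contained in the document."""
    for _conf, _n, items, lab in rules:
        if set(items) <= terms:
            return lab
    return default
```

Keeping the rules as plain (confidence, support, terms, class) tuples is what makes such a model easy to read, update, and tune, in line with the interpretability claim above.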
5.
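The centroid-based classifier is efficient, but it needs a large number of training (labeled) documents to classify accurately. Labeled training documents are scarce because labeling them costs considerable human and material resources, while many unlabeled documents are available on the Web. The centroid classifier is therefore improved, and the ONUC and OFFUC algorithms are proposed to remedy the sharp drop in its performance when training documents are insufficient. Because the centroid classifier is sensitive to outliers, an edge-removal step is applied. Experiments show that, when training documents are very few, the proposed algorithms perform better than the plain centroid classifier and other classic semi-supervised algorithms.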
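A minimal sketch of a plain centroid classifier with cosine similarity, the baseline that ONUC and OFFUC build on. The optional `trim` parameter is a hypothetical stand-in for the edge-removal step: it drops the documents least similar to their class centroid and recomputes the centroid. The semi-supervised use of unlabeled documents is not shown, and the function names are illustrative.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def centroid(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def train_centroids(docs, labels, trim=0.0):
    """Return one centroid per class.

    If trim > 0, drop that fraction of each class's documents that are least
    similar to the initial centroid (hypothetical outlier removal) and recompute.
    """
    by_class = {}
    for v, lab in zip(docs, labels):
        by_class.setdefault(lab, []).append(v)
    model = {}
    for lab, vecs in by_class.items():
        c = centroid(vecs)
        if trim > 0 and len(vecs) > 1:
            keep = max(1, int(round(len(vecs) * (1 - trim))))
            vecs = sorted(vecs, key=lambda v: cosine(v, c), reverse=True)[:keep]
            c = centroid(vecs)
        model[lab] = c
    return model

def predict_centroid(vec, model):
    """Assign the class whose centroid has the highest cosine similarity."""
    return max(model, key=lambda lab: cosine(vec, model[lab]))
```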
7.
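To address the low accuracy of text sentiment classification, a multi-level text sentiment classification method based on a CCA-VSM classifier and kernel Fisher discriminant (KFD) is proposed. Canonical correlation analysis (CCA) reduces the dimensionality of each document's term-weight feature vector and part-of-speech feature vector, and vector space models (VSMs) are built on the reduced vector set. A VSM classifier is designed from the degree of difference between models; the R models that differ least from the test document are selected as input to the kernel Fisher discriminant, which finally determines the document's sentiment. Experimental results show that the method achieves higher classification accuracy and faster classification than a traditional support vector machine, and that the weight and part-of-speech features strongly affect classification accuracy.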
8.
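To solve Web document classification efficiently, a document classification algorithm based on kernel discriminant analysis (KDA) and SVM is proposed. The algorithm first uses KDA to reduce the dimensionality of the high-dimensional Web document space of the training set, and then classifies in the resulting low-dimensional feature space with an SVM optimized by multiplicative update rules. Experiments on two well-known document classification datasets, Reuters-21578 and 20-Newsgroup, show that the algorithm not only achieves higher classification accuracy but also requires less running time.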
9.
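Existing document vector representations are affected by noisy words and capture the semantics of important words incompletely. To address this, a new document representation method is proposed that fuses word contribution with Word2Vec word vectors. A Word2Vec model is trained on the dataset, the contribution of each word in the dataset is computed, and a contribution threshold is set; words whose contribution exceeds the threshold form a word set. Then the words that occur in both the document and the set are found, their word vectors are retrieved, and these are fused with the word contributions to produce the document vector. Experimental results show that on the Sogou Chinese text corpus and the Fudan University Chinese text classification corpus, the method outperforms traditional methods such as TF-IDF, mean Word2Vec, and PTF-IDF-weighted Word2Vec in average precision, recall, and F1; it also classifies English texts effectively.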
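The document-vector construction described above can be sketched as follows, assuming the word vectors (for example from a trained Word2Vec model) and per-word contribution scores are already available as dictionaries. The abstract does not define how contributions are computed or how they are fused with the vectors, so the contribution-weighted average used here is an assumed fusion rule, and `select_words` and `doc_vector` are hypothetical names.

```python
def select_words(contribution, threshold):
    """Keep only words whose contribution exceeds the threshold."""
    return {w for w, c in contribution.items() if c > threshold}

def doc_vector(tokens, word_vectors, contribution, selected):
    """Contribution-weighted average of the vectors of selected words found in the document.

    Words shared by the document and the selected set are counted once each.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for w in set(tokens) & selected:
        if w not in word_vectors:
            continue  # no embedding available for this word
        c = contribution[w]
        total = [t + c * x for t, x in zip(total, word_vectors[w])]
        weight_sum += c
    # Documents with no selected words fall back to the zero vector
    return [t / weight_sum for t in total] if weight_sum else total
```

Filtering by the threshold before averaging is what keeps low-contribution (noisy) words such as "the" from pulling the document vector toward generic directions.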
12.
Ronald R. Yager 《Information Sciences》2006,176(5):577-588
Our objective here is to provide an extension of the naive Bayesian classifier that gives us more parameters for matching data. We first describe the naive Bayesian classifier and then discuss the ordered weighted averaging (OWA) aggregation operators. We introduce a new class of OWA operators based on combining OWA operators with t-norm operators, and show that the naive Bayesian classifier can be seen as a special case of this class. We use this to suggest an extended version of the naive Bayesian classifier that involves a weighted summation of products of the probabilities. An algorithm is suggested for obtaining the weights associated with this extended naive Bayesian classifier.
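For reference, a sketch of the two building blocks this abstract names: Yager's OWA operator, which sorts its arguments in descending order and takes a weighted sum with non-negative weights that sum to 1, and the unnormalized naive Bayes score (class prior times the product of feature likelihoods). The t-norm-based extension itself is not reproduced here; the function names are illustrative.

```python
import math

def owa(values, weights):
    """Yager's ordered weighted averaging: the j-th weight applies to the j-th largest value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * b for w, b in zip(weights, ordered))

def naive_bayes_score(prior, likelihoods):
    """Unnormalized naive Bayes score: P(class) times the product of P(feature | class)."""
    return prior * math.prod(likelihoods)
```

Weights (1, 0, ..., 0) turn OWA into the maximum, (0, ..., 0, 1) into the minimum, and uniform weights into the arithmetic mean; the abstract's extension replaces this weighted sum of single values with a weighted sum of products of probabilities.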
13.
Reliable pedestrian detection is of great importance in visual surveillance. In this paper, we propose a novel multiplex classifier model composed of two cascaded parts: a Haar-like cascade classifier and a shapelet cascade classifier. The Haar-like cascade classifier filters out most of the irrelevant image background, while the shapelet cascade classifier concentrates on detecting head-shoulder features. A weighted linear regression model is introduced to train the weak classifiers. We also introduce a structure table to label foreground pixels by means of background differencing. The experimental results show that our classifier model provides satisfactory detection accuracy. In particular, our detection approach also performs well on low-resolution images and relatively complicated backgrounds.
14.
This paper presents a novel method for the differential diagnosis of erythemato-squamous disease. The proposed method is based on fuzzy weighted pre-processing, k-NN (nearest neighbor) based weighted pre-processing, and a decision tree classifier, and it consists of three parts. In the first part, a decision tree classifier was used on its own to diagnose erythemato-squamous disease. In the second part, fuzzy weighted pre-processing, a new method that we improved, was first applied to the inputs of the erythemato-squamous disease dataset; the resulting weighted inputs were then classified with the decision tree classifier. In the third part, k-NN based weighted pre-processing, also a new method that we improved, was applied to the inputs of the dataset, and the resulting weighted inputs were again classified with the decision tree classifier. Under 20-fold cross-validation, the plain decision tree classifier, the fuzzy weighted pre-processing decision tree classifier, and the k-NN based weighted pre-processing decision tree classifier reached classification accuracies of 86.18%, 97.57%, and 99.00%, respectively.
15.
Atorn Nuntiyagul, Kanlaya Naruedomkul, Nick Cercone, Damras Wongsawang 《Computational Intelligence》2007,23(1):28-44
We propose a feature selection approach, Patterned Keyword in Phrase (PKIP), for text categorization in item banks. An item bank is a collection of textual question items that are short sentences. Individual sentences do not contain enough relevant words to be categorized directly by traditional approaches such as "bag-of-words." PKIP was therefore designed to categorize such question items using only the available keywords and their patterns. PKIP identifies the appropriate keywords by computing the weights of all words. In this paper, two keyword selection strategies are proposed to ensure the categorization accuracy of PKIP. PKIP was implemented and tested on an item bank of Thai high primary mathematics questions. The test results show that PKIP categorizes the question items correctly and that the two keyword selection strategies extract highly informative keywords.
16.
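XML Structural Query Expansion Based on Weighted Query Terms (total citations: 9; self-citations: 0; citations by others: 9)
A major reason for poor retrieval quality in text document retrieval is that users find it hard to formulate query expressions that accurately describe their query intent. XML documents have structural features in addition to the content features of text documents, which makes it even harder for users to formulate accurate queries. To address this problem, a query expansion method based on relevance feedback is proposed that helps users construct "content + structure" query expressions matching their query intent. The method first expands the query terms, finding the weighted expanded query terms that best represent the user's intent; it then performs structural query expansion based on the expanded terms, finally forming a complete "content + structure" expanded query expression. Experimental results show that, compared with no query expansion, the average precision at prec@10 and prec@20 improves by more than 30% after expansion.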
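The prec@10 and prec@20 figures above are precision at a rank cutoff k: the fraction of the top k retrieved results that are relevant. A minimal sketch, assuming each query's ranked result IDs and its set of relevant IDs are known; dividing by k even when fewer than k results are returned is one common convention, assumed here, and the function names are illustrative.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved results that are relevant (prec@k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_ids[:k]
    hits = sum(1 for d in top if d in relevant_ids)
    return hits / k  # divide by k even if fewer than k results were returned

def mean_precision_at_k(runs, k):
    """Average prec@k over queries; each run is a (ranked_ids, relevant_ids) pair."""
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)
```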
17.
Daehoon Kim, Daeyong Kim, Sanghoon Jun, Seungmin Rho, Eenjun Hwang 《Multimedia Tools and Applications》2014,73(2):857-872
With the flood and popularity of multimedia content on the Internet, searching for appropriate content and representing it effectively has become essential to user satisfaction. Many content recommendation systems have been proposed for this purpose. A popular approach is to recommend hot or popular content selected with some popularity metric. Recently, social network services (SNSs) such as Facebook and Twitter have become a widespread social phenomenon owing to the smartphone boom. Given their popularity and high user participation, SNSs can be a good source for finding social interests or trends. In this study, we propose a platform called TrendsSummary for retrieving trendy multimedia content and summarizing it. To identify trendy multimedia content, we select candidate keywords from raw data collected from Twitter using a syntactic feature-based filtering method. Then, we merge keyword variants based on several heuristics. Next, we select trend keywords and their related keywords from the merged candidates based on term frequency, and expand them semantically by referencing portal sites such as Wikipedia and Google. Based on the expanded trend keywords, we collect four types of relevant multimedia content (TV programs, videos, news articles, and images) from various websites. The most appropriate media type for the trend keywords is determined with a naïve Bayes classifier. After classification, appropriate content is selected from among the content of the chosen media type. Finally, the trend keywords and their related multimedia content are displayed for effective browsing. We implemented a prototype system and experimentally demonstrated that our scheme provides satisfactory results.
18.
Predicting financial distress effectively is an important problem in corporate financial management. Although much attention has been paid to financial distress prediction methods based on a single classifier, the uncertainty that limits single classifiers and the benefit of combining multiple classifiers for financial distress prediction have been neglected. This paper puts forward a financial distress prediction method based on weighted majority voting over multiple classifiers. The framework of the multiple classifier combination system, the weighted majority voting combination model, the voting-weight model of the base classifiers, and the principles for selecting base classifiers are discussed in detail. An empirical experiment with real-world data from Chinese listed companies indicates that this method can greatly improve average prediction accuracy and stability, and that it is more suitable for financial distress prediction than single classifiers.
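A minimal sketch of weighted majority voting over the labels predicted by base classifiers. The normalized-accuracy weighting in `accuracy_weights` is a hypothetical example; the paper's own voting-weight model is not given in the abstract.

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """Combine base classifiers' labels; each classifier's vote counts with its weight."""
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    # Ties go to the label that was voted for first
    return max(scores, key=scores.get)

def accuracy_weights(accuracies):
    """Hypothetical weighting: each base classifier's validation accuracy, normalized to sum to 1."""
    total = sum(accuracies)
    return [a / total for a in accuracies]
```

With weights 0.6, 0.25, and 0.15 and votes distress, healthy, healthy, the combined decision is distress, even though a plain majority would say healthy.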
19.
Traditional approaches to text data stream classification usually require manual labeling of a number of documents, which is expensive and time-consuming. To overcome this limitation, we propose to classify text streams by keywords without labeled documents, reducing the burden of manual labeling. We build base text classifiers from keywords and unlabeled documents to classify text streams, and use classifier ensemble algorithms to cope with concept drift in text data streams. Experimental results demonstrate that the proposed method can build good classifiers from keywords without manual labeling, and that with the ensemble-based algorithm, concept drift in the streams is well detected and adapted to, giving better performance than the single-window algorithm.
20.
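An Improved Minimum Distance Classifier Algorithm: The Weighted Minimum Distance Classifier (total citations: 12; self-citations: 0; citations by others: 12)
The minimum distance classifier is a simple and effective classification method. The main way to improve its classification performance is to choose a more effective distance metric. By analyzing the classification principles of the multiple-restriction classifier and the decision tree classifier, a weighted minimum distance classifier based on the standardized Euclidean distance is proposed. By defining weighted distances for nominal and string attributes and adding range constraints on attribute values, the classifier extends the applicability of the minimum standardized Euclidean distance classifier and improves its classification accuracy. Experimental results show that the weighted minimum distance classifier achieves high classification accuracy.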
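A sketch of a weighted minimum distance classifier using the standardized Euclidean distance sqrt(Σ w_i ((x_i - p_i) / s_i)²), where p is a class prototype and s_i the standard deviation of attribute i over the training set. The attribute weights, the 0/1 mismatch distance for nominal or string attributes, and the fixed `penalty` for numeric values outside a class's observed range are assumptions that illustrate the ideas in the abstract; they are not the paper's exact definitions, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def fit(rows, labels, numeric_idx):
    """Per-class prototypes: mean for numeric attributes, mode for nominal ones.

    Also records each numeric attribute's global standard deviation and each
    class's observed [min, max] range for that attribute.
    """
    n_attr = len(rows[0])
    std = {i: (pstdev([r[i] for r in rows]) or 1.0) for i in numeric_idx}  # avoid divide-by-zero
    model = {}
    for lab in set(labels):
        members = [r for r, l in zip(rows, labels) if l == lab]
        proto, ranges = [], {}
        for i in range(n_attr):
            col = [r[i] for r in members]
            if i in numeric_idx:
                proto.append(mean(col))
                ranges[i] = (min(col), max(col))
            else:
                proto.append(Counter(col).most_common(1)[0][0])
        model[lab] = (proto, ranges)
    return model, std

def distance(x, proto, ranges, std, numeric_idx, weights, penalty=1.0):
    """Weighted standardized Euclidean distance with an assumed out-of-range penalty."""
    total = 0.0
    for i, (xi, pi) in enumerate(zip(x, proto)):
        if i in numeric_idx:
            d = (xi - pi) / std[i]
            lo, hi = ranges[i]
            if not lo <= xi <= hi:
                d = abs(d) + penalty  # assumed form of the range constraint
        else:
            d = 0.0 if xi == pi else 1.0  # mismatch distance for nominal/string values
        total += weights[i] * d * d
    return math.sqrt(total)

def predict_min_distance(x, model, std, numeric_idx, weights):
    """Assign the class whose prototype is closest under the weighted distance."""
    return min(model, key=lambda lab: distance(x, *model[lab], std, numeric_idx, weights))
```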