首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 11 毫秒
1.
基于类别信息的特征权重计算方法对特征与类别的关系表达不够准确,即对于类别频率相同的特征无法比较其对类别的区分能力,因此要考虑特征在类内的分布情况。将特征的反类别频率(inverse category frequency,ICF)和类内熵(entropy)相结合引入到特征权重计算方案中,构造了两种有监督特征权重计算方案。在维吾尔文文本分类语料上进行的实验结果表明,该方法能够明显改善样本的空间分布状态并提高维吾尔文文本分类的微平均◢F◣▼1▽值。  相似文献   

2.
基于类信息的文本特征选择与加权算法研究   总被引:3,自引:1,他引:2  
文本自动分类中特征选择和加权的目的是为了降低文本特征空间维数、去除噪音和提高分类精度。传统的特征选择方案筛选出的特征往往偏爱类分布不均匀文档集中的大类,而常用的TF·IDF特征加权方案仅考虑了特征与文档的关系,缺乏对特征与类别关系的考虑。针对上述问题,提出了基于类别信息的特征选择与加权方法,在两个不同的语料集上进行比较和分析实验,结果显示基于类别信息的特征选择与加权方法比传统方法在处理类分布不均匀的文档集时能有效提高分类精度,并且降维程度有所提高。  相似文献   

3.
基于SVM的维吾尔文文本分类研究   总被引:1,自引:0,他引:1       下载免费PDF全文
文本自动分类技术在提高文本信息利用的有效性和准确性上具有重要的现实意义和广阔的应用前景。随着Internet上维吾尔文信息的迅速发展,维吾尔文文本分类成为处理和组织这些大量文本数据的关键技术。研究维吾尔文文本分类相关技术和方法,针对维吾尔文文本在向量空间模型表示下的高维性,本文采用词干提取和χ2统计量相结合的方法对表示空间进行降维。采用SVM算法构造了维吾尔文文本分类器。针对维吾尔文文本分类语料进行的实验结果表明,SVM分类器的MacroF1值达到了84.6%,明显好于kNN方法。  相似文献   

4.
特征选择是维吾尔语文本分类的关键技术,对分类结果将产生直接的影响。为了提高传统信息增益在维吾尔文特征选择中的效果,在深度分析维吾尔文语种特点的基础上,提出了一种新的信息增益特征选择方法。该方法结合类词频和特征分布系数以及倒逆文档频率,对传统信息增益进行修正;引入一个备选特征分布系数来平衡类间选取的特征个数;在维吾尔文数据集上实验验证。实验结果表明,改进的算法对维吾尔文分类效果有明显的提高。  相似文献   

5.
中文文本分类中的特征选择研究   总被引:76,自引:3,他引:76  
本文介绍和比较了八种用于文本分类的特征选择方法,其中把应用于二元分类器中的优势率改造成适用于多类问题的形式,并提出了一种新的类别区分词的特征选择方法,结合两种不同的分类方法:文本相似度方法和Na?ve Bayes方法,在两个不同的数据集上分别作了训练和测试,结果表明,在这八种文本特征选择方法中,多类优势率和类别区分词方法取得了最好的选择效果。其中,当用Na?ve Bayes分类方法对各类分布严重不均的13890样本集作训练和测试时,当特征维数大于8000以后,用类别区分词作特征选择得到的宏F1值比用IG作特征选择得到的宏F1值高出3%~5%左右。  相似文献   

6.
基于词频的优化互信息文本特征选择方法   总被引:1,自引:0,他引:1  
互信息(MI)是一种常用的文本特征选择方法,经典MI方法未考虑同一个特征项在不同类别内频数的差异性,也未考虑同一个特征在同一类别内的不同文本之间分布上的差异性。针对上述不足,以特征项的频数为依据,分别从特征项的类内分布、类间分布上的差异以及类内不同文本之间分布上的差异等角度,通过引入特征项的类内频数因子、类内位置分布因子以及类间分布因子,提出一种改进的MI文本特征选择方法,使得特征项的频数信息在MI模型中得到有效利用,合理改善互信息模型在文本特征选择方面的不足。文本分类实验结果表明,改进MI文本特征选择方法的平均准确率、召回率分别提高约5.2%及4.6%,平均综合评价指标值提高约4.9%,有效提高了模型的文本分类效率。  相似文献   

7.
针对文本分类中传统特征选择方法卡方统计量和信息增益的不足进行了分析,得出文本分类中的特征选择关键在于选择出集中分布于某类文档并在该类文档中均匀分布且频繁出现的特征词。因此,综合考虑特征词的文档频、词频以及特征词的类间集中度、类内分散度,提出一种基于类内类间文档频和词频统计的特征选择评估函数,并利用该特征选择评估函数在训练集每个类别中选取一定比例的特征词组成该类别的特征词库,而训练集的特征词库则为各类别特征词库的并集。通过基于SVM的中文文本分类实验表明,该方法与传统的卡方统计量和信息增益相比,在一定程度上提高了文本分类的效果。  相似文献   

8.
基于类别特征域的文本分类特征选择方法   总被引:11,自引:2,他引:11  
特征选择是文本分类的关键问题之一,而噪音与数据稀疏则是特征选择过程中遇到的主要障碍。本文介绍了一种基于类别特征域的特征选择方法。该方法首先利用“组合特征抽取”[1 ]的方法去除原始特征空间中的噪音 ,从中抽取出候选特征。这里“, 组合特征抽取”是指先利用文档频率(DF)的方法去掉一部分低频词,再用互信息的方法选择出候选特征。接下来,本方法为分类体系中的每个类别构建一个类别特征域,对出现在类别特征域中的候选特征进行特征的合并和强化,从而解决数据稀疏的问题。实验表明,这种新的方法较之各种传统方法在特征选择的效果上有着明显改善,并能显著提高文本分类系统的性能。  相似文献   

9.
现有的维吾尔文文本情感分类方法以从空格分词中得到的unigram特征作为文本表示,因而无法挖掘与情感表达相关的深层语言现象。该文从维吾尔文词汇之间的顺序依赖关系入手,总结若干个词性组合规则,提取能够表达丰富情感信息的Bi-tagged特征,并基于支持向量机(SVM)分类器对维吾尔文情感语料库进行了正负情感分类。实验结果表明,在维吾尔文文本情感分类中: (1)当包含该文提出的各项词性规则时,Bi-tagged特征的性能最优;(2)Bi-tagged特征不仅能够提取情感丰富的信息,而且可以提取否定信息;(3)与常用的unigram、bigram特征以及unigram和bigram的组合特征在该文数据集上的分类效果相比,该文所提取的Bi-tagged与unigram的组合特征分类效果更佳,比该文的Baseline的分类准确率提高了4.225%。该研究成果不但可以进一步提高维吾尔文文本情感分类效率,也可为哈萨克语、柯尔克孜语等亲属语言的情感分类提供借鉴。  相似文献   

10.
为解决维吾尔文文本分类中不平衡数据集问题,提出了一种改进的卡方特征选择方法.结合维吾尔文的语言特性对文本进行预处理,降低特征空间维度;运用卡方和逆文档频数相结合的方法进行特征选择,进一步降低特征空间维数;使用朴素贝叶斯分类器进行分类.在维吾尔文不平衡语料库上进行的实验表明,提出的特征选择方法在不平衡数据集中要优于卡方和信息增益特征选择方法.  相似文献   

11.
Abstract This paper describes an approach to the design of interactive multimedia materials being developed in a European Community project. The developmental process is seen as a dialogue between technologists and teachers. This dialogue is often problematic because of the differences in training, experience and culture between them. Conditions needed for fruitful dialogue are described and the generic model for learning design used in the project is explained.  相似文献   

12.
European Community policy and the market   总被引:1,自引:0,他引:1  
Abstract This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven.  相似文献   

13.
融合集成方法已经广泛应用在模式识别领域,然而一些基分类器实时性能稳定性较差,导致多分类器融合性能差,针对上述问题本文提出了一种新的基于多分类器的子融合集成分类器系统。该方法考虑在度量层融合层次之上通过对各类基多分类器进行动态选择,票数最多的类别作为融合系统中对特征向量识别的类别,构成一种新的自适应子融合集成分类器方法。实验表明,该方法比传统的分类器以及分类融合方法识别准确率明显更高,具有更好的鲁棒性。  相似文献   

14.
Development of software intensive systems (systems) in practice involves a series of self-contained phases for the lifecycle of a system. Semantic and temporal gaps, which occur among phases and among developer disciplines within and across phases, hinder the ongoing development of a system because of the interdependencies among phases and among disciplines. Such gaps are magnified among systems that are developed at different times by different development teams, which may limit reuse of artifacts of systems development and interoperability among the systems. This article discusses such gaps and a systems development process for avoiding them.  相似文献   

15.
This paper presents control charts models and the necessary simulation software for the location of economic values of the control parameters. The simulation program is written in FORTRAN, requires only 10K of main storage, and can run on most mini and micro computers. Two models are presented - one describes the process when it is operating at full capacity and the other when the process is operating under capacity. The models allow the product quality to deteriorate to a further level before an existing out-of-control state is detected, and they can also be used in situations where no prior knowledge exists of the out-of-control causes and the resulting proportion defectives.  相似文献   

16.
Going through a few examples of robot artists who are recognized worldwide, we try to analyze the deepest meaning of what is called “robot art” and the related art field definition. We also try to highlight its well-marked borders, such as kinetic sculptures, kinetic art, cyber art, and cyberpunk. A brief excursion into the importance of the context, the message, and its semiotics is also provided, case by case, together with a few hints on the history of this discipline in the light of an artistic perspective. Therefore, the aim of this article is to try to summarize the main characteristics that might classify robot art as a unique and innovative discipline, and to track down some of the principles by which a robotic artifact can or cannot be considered an art piece in terms of social, cultural, and strictly artistic interest. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008  相似文献   

17.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.  相似文献   

18.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)SM and Team Software Process (TSP){SM}. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.  相似文献   

19.
基于复小波噪声方差显著修正的SAR图像去噪   总被引:4,自引:1,他引:3  
提出了一种基于复小波域统计建模与噪声方差估计显著性修正相结合的合成孔径雷达(Synthetic Aperture Radar,SAR)图像斑点噪声滤波方法。该方法首先通过对数变换将乘性噪声模型转化为加性噪声模型,然后对变换后的图像进行双树复小波变换(Dualtree Complex Wavelet Transform,DCWT),并对复数小波系数的统计分布进行建模。在此先验分布的基础上,通过运用贝叶斯估计方法从含噪系数中恢复原始系数,达到滤除噪声的目的。实验结果表明该方法在去除噪声的同时保留了图像的细节信息,取得了很好的降噪效果。  相似文献   

20.
蒙古语言是中国蒙古族使用的通用语言,由于蒙古文区别于其他文字的书写方式和其自身变形机制等特点,在很多通用的文字处理引擎中都不被支持。在嵌入式产品开发与应用领域中Linux加QTE已经成为流行方式。该文给出了一种在QTE环境上实现基于标准Unicode的蒙古文点阵显示和变形算法, 并自定义了支持蒙古文的QTE组件,扩展了QTE功能,为在Linux加QTE方式的嵌入式体系结构中处理蒙古文提供了一种解决方法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号