Similar Documents
A total of 20 similar documents were retrieved.
1.
Text classification techniques mostly rely on single-term analysis of the document set, while many concepts, especially specific ones, are conveyed by sets of terms. To build a more accurate text classifier, more informative features, including frequent co-occurring words in the same sentence and their weights, are particularly important. In this paper, we propose a novel approach to text classification using sentential frequent itemsets, a concept from association rule mining: it views a sentence rather than a document as a transaction, and uses a variable-precision rough set based method to evaluate each sentential frequent itemset's contribution to the classification. Experiments over the Reuters and newsgroup corpora validate the practicability of the proposed system.
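As a hedged illustration of the transaction view described above, the sketch below mines frequent itemsets with each sentence treated as a transaction of terms. The brute-force enumeration (rather than a full Apriori pass), the support threshold, and the toy corpus are illustrative assumptions, not details from the paper.

```python
from itertools import combinations
from collections import Counter

def sentential_frequent_itemsets(sentences, min_support=2, max_size=2):
    """Return itemsets (frozensets of terms) occurring in >= min_support sentences."""
    transactions = [set(s.lower().split()) for s in sentences]
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(t), k):
                counts[frozenset(itemset)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

sentences = [
    "rough set theory supports classification",
    "rough set theory handles uncertainty",
    "frequent itemsets improve classification",
]
# Each surviving itemset would become one feature for the downstream classifier.
for itemset, support in sorted(sentential_frequent_itemsets(sentences).items(),
                               key=lambda x: -x[1]):
    print(set(itemset), support)
```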

2.
With the rapid growth of information and knowledge, automatic text classification is becoming a hotspot of knowledge management. A critical capability of knowledge management systems is to classify text documents into categories that are meaningful to users. In this paper, a text topic classification model based on a domain ontology and the Vector Space Model is proposed. Eigenvectors, the input to the vector space model, are constructed from the concepts and hierarchical structure of the ontology, which also provides the domain knowledge. However, a limited-vocabulary problem is encountered when mapping keywords to their corresponding ontology concepts. A synonymy lexicon is used to extend the ontology and compress the eigenvector, solving the problem that eigenvectors are too large and complex to be calculated by traditional methods. Finally, combining the concepts' support, a top-down method following the ontology structure completes the topic classification. An experimental system is implemented and the model is applied to this practical system. Test results show that the model is feasible.
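A minimal sketch of the mapping step described above, assuming a tiny invented synonymy lexicon and ontology: keywords are normalized through the lexicon and then projected onto ontology concepts, so the feature vector is indexed by concepts rather than raw words. A real system would load a domain ontology instead.

```python
from collections import Counter

# Hypothetical lexicon and concept map, for illustration only.
SYNONYMS = {"auto": "car", "automobile": "car", "ml": "machine_learning"}
CONCEPT_OF = {"car": "Vehicle", "truck": "Vehicle", "machine_learning": "AI"}

def concept_vector(tokens):
    """Compress a token list into a concept-indexed eigenvector (as counts)."""
    canonical = (SYNONYMS.get(t, t) for t in tokens)       # synonym normalization
    concepts = [CONCEPT_OF[w] for w in canonical if w in CONCEPT_OF]
    return Counter(concepts)

print(concept_vector("the auto and the truck use ml".split()))
# Counter({'Vehicle': 2, 'AI': 1}) -- two words collapse onto one concept
```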

3.
4.
In formal concept analysis, the concept lattice, as the fundamental data structure, can be constructed from a formal context. However, it is required that the relation between object and feature in the formal context be certain. For uncertain relations, this paper applies the ideas of upper and lower approximation from rough set theory, and gives the corresponding definitions of missing-value context and rough formal concept. Based on these, the paper employs the rough concept lattice, formed by rough formal concepts and the partial order relation on them, as the basic data structure for concept analysis and knowledge acquisition. A theorem is then presented describing the method of extracting rules from the constructed rough formal concept lattice, and the semantic interpretation of the discovered rules is explained.

5.
Object-oriented software development is a promising software methodology that leads to a wholly new way of solving problems. In research on the rapid construction of a Structured Development Environment (SDE) that supports detailed design and coding in software development, a generator that can generate the SDE has been applied as a metatool. The kernel of the SDE is a syntax-directed editor based on object-oriented concepts. The key issue in the design of the SDE is how to represent the elements of the target language with the class concept and a program internally. In this paper, the key concepts and design of the SDE and its generator, as well as the implementation of a prototype, are discussed.

6.
This paper proposes a new text similarity computing method based on concept similarity for Chinese text processing. The method first converts text to a word vector space model, then splits words into sets of concepts. By computing the inner products between concepts, it obtains the similarity between words, and finally computes the similarity of texts from the similarity of words. The contributions of the paper are: 1) a new similarity formula between words; 2) a new text similarity computing method based on word similarity; 3) a successful application of the method to similarity computation for Web news; and 4) validation of the method through extensive experiments.
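A hedged sketch of the two-level idea (not the paper's exact formulas): word similarity as the cosine of concept vectors computed via inner products, and text similarity as the average best match between the two word sets. The toy concept vectors are assumptions for illustration.

```python
import numpy as np

# Hypothetical concept decompositions of words, for illustration.
WORD_CONCEPTS = {
    "car":   np.array([1.0, 0.2, 0.0]),
    "auto":  np.array([0.9, 0.3, 0.0]),
    "apple": np.array([0.0, 0.1, 1.0]),
}

def word_sim(a, b):
    """Cosine similarity of the two words' concept vectors."""
    u, v = WORD_CONCEPTS[a], WORD_CONCEPTS[b]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def text_sim(words1, words2):
    """Average, over words in text 1, of the best similarity to any word in text 2."""
    return sum(max(word_sim(w1, w2) for w2 in words2) for w1 in words1) / len(words1)

print(text_sim(["car", "apple"], ["auto", "apple"]))
```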

7.
Ambiguous words are words that have multiple meanings, such as apple and window. In text classification they are usually removed by feature reduction methods such as Information Gain. Sometimes there are too many ambiguous words in the corpus, which makes throwing away all of them not a viable option, as when classifying documents from the Web. In this paper we look for a method to classify titled documents with the help of ambiguous words. Titled documents are a kind of documents that have a simple s...

8.
This paper presents a novel classification approach based on rough set theory and the support vector machine. Classification samples often have many attributes, which makes classification difficult. In this approach, the attributes of the data set are first reduced by rough set theory, and classification is then carried out using a support vector machine; the classification results are obtained through the proposed model. Compared with the traditional Naive Bayes algorithm, the proposed model achieves higher prediction accuracy and reduces the cost of calculation.
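A hedged sketch of that pipeline: a greedy rough-set-style reduct on discrete attributes (an attribute is dropped while the remaining attributes' partition still determines the decision class, the "positive region" idea), followed by an SVM on the reduced data. The toy data and scikit-learn usage are illustrative, not the paper's exact reduction algorithm.

```python
from sklearn.svm import SVC

def consistent(rows, labels, attrs):
    """True if rows that agree on attrs always share a label (dependency degree 1)."""
    seen = {}
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, y) != y:
            return False
    return True

def greedy_reduct(rows, labels):
    attrs = list(range(len(rows[0])))
    for a in list(attrs):
        trial = [x for x in attrs if x != a]
        if trial and consistent(rows, labels, trial):
            attrs = trial  # attribute a is dispensable
    return attrs

rows   = [[0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1]]
labels = [0, 0, 1, 1]
keep = greedy_reduct(rows, labels)
X = [[r[a] for a in keep] for r in rows]
clf = SVC(kernel="linear").fit(X, labels)
print("reduct:", keep, "training accuracy:", clf.score(X, labels))
```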

9.
Object-oriented software development is a promising software methodology that leads to a wholly new way of solving problems. In research on the rapid construction of a Structured Development Environment (SDE) that supports detailed design and coding in software development, a generator that can generate the SDE has been applied as a metatool. The kernel of the SDE is a syntax-directed editor based on object-oriented concepts. The key issue in the design of the SDE is how to represent the elements of the target language with the class concept and a program internally. In this paper, the key concepts and design of the SDE and its generator, as well as the implementation of a prototype, are discussed.

10.
Similarity Measures between Rough Sets
Applying rough set theory to incomplete information systems is key to putting rough sets into real applications. In this paper, after analyzing basic concepts of classical and extended rough set theory, similarity measures are developed between two rough sets in classical rough set theory, based on the indiscernibility relation, and between two rough sets in extended rough set theory, based on the limited tolerance relation. Properties of the two similarity measures are then derived, and finally the two measures are compared.
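As a hedged illustration of one common style of similarity measure between two rough sets: average the Jaccard overlap of their lower approximations and of their upper approximations. This is representative of the genre, not the paper's exact formula.

```python
def jaccard(a, b):
    """Jaccard overlap of two plain sets; empty-vs-empty counts as identical."""
    return len(a & b) / len(a | b) if a | b else 1.0

def rough_set_similarity(lower1, upper1, lower2, upper2):
    """Average Jaccard similarity of lower and upper approximations."""
    return 0.5 * (jaccard(lower1, lower2) + jaccard(upper1, upper2))

A_low, A_up = {1, 2}, {1, 2, 3, 4}
B_low, B_up = {2, 3}, {2, 3, 4}
print(rough_set_similarity(A_low, A_up, B_low, B_up))  # 0.5 * (1/3 + 3/4)
```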

11.
A fuzzy classification method for design documents in the manufacturing industry is proposed. Using the hierarchical structure of a domain ontology and the semantic relations between concepts, design documents are structurally segmented and annotated, and feature weights are computed from the distance between feature words and concepts and from positional importance, improving the accuracy of design document classification.

12.
This paper analyzes the shortcomings of the traditional feature selection methods chi-square statistics and information gain in text classification, and concludes that the key to feature selection is choosing feature words that are concentrated in one class of documents, evenly distributed within that class, and frequently occurring. Accordingly, taking into account document frequency, term frequency, between-class concentration, and within-class dispersion of feature words, a feature selection evaluation function based on within-class and between-class document frequency and term frequency statistics is proposed. The function is used to select a fixed proportion of feature words from each class of the training set to form that class's feature lexicon, and the training set's feature lexicon is the union of the per-class lexicons. Experiments on SVM-based Chinese text classification show that, compared with traditional chi-square statistics and information gain, the method improves classification performance to a certain extent.
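A hedged sketch of the evaluation function's ingredients: for each term and class, it combines term frequency, document frequency, between-class concentration, and within-class coverage. The multiplicative combination below is an illustrative stand-in, not the paper's published formula.

```python
from collections import defaultdict

def score_terms(docs, labels):
    """docs: list of token lists; labels: class label per doc. Returns per-class term scores."""
    classes = set(labels)
    scores = defaultdict(dict)
    vocab = {t for d in docs for t in d}
    n_c = {c: labels.count(c) for c in classes}
    for t in vocab:
        df = {c: 0 for c in classes}   # docs of class c containing t
        tf = {c: 0 for c in classes}   # occurrences of t in class c
        for d, y in zip(docs, labels):
            cnt = d.count(t)
            tf[y] += cnt
            df[y] += 1 if cnt else 0
        total_df = sum(df.values())
        for c in classes:
            concentration = df[c] / total_df   # between-class concentration
            coverage = df[c] / n_c[c]          # proxy for within-class dispersion
            scores[c][t] = tf[c] * concentration * coverage
    return scores

docs = [["rough", "set"], ["rough", "theory"], ["svm", "kernel"], ["svm", "margin"]]
labels = ["A", "A", "B", "B"]
print(score_terms(docs, labels)["A"])  # top-scoring terms form class A's lexicon
```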

13.
Research on Web Mining Based on Kernel Methods
Word-space classification methods have difficulty handling the high dimensionality of text and capturing semantic concepts. Using kernel principal component analysis and support vector machines, this paper proposes a new method that extracts semantic concepts by reducing the dimensionality of text data and classifies text based on those concepts. Documents are first mapped into a high-dimensional linear feature space to eliminate nonlinear features; principal component analysis in the mapped space then removes correlations between variables, achieving dimensionality reduction and semantic concept extraction and yielding a semantic concept space for the documents; finally, a support vector machine performs classification in that space. With a newly defined kernel function, the mapping to the semantic concept space need not be computed explicitly, so concept-based classification can be carried out directly in the original document vector space. A kernelized GHA method adaptively and iteratively solves for the eigenvectors and eigenvalues of the kernel matrix, making the approach suitable for large-scale text classification. Experimental results show that the method is effective in improving text classification performance.
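A hedged scikit-learn sketch of this pipeline: TF-IDF document vectors are mapped through kernel PCA into a low-dimensional "semantic concept" space, then classified with an SVM. Here scikit-learn's eigendecomposition-based KernelPCA stands in for the paper's iterative kernelized GHA, and the corpus, kernel, and dimensionality are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

docs = ["rough set theory", "rough set classification",
        "kernel methods for svm", "svm kernel classification"]
labels = [0, 0, 1, 1]

model = make_pipeline(
    TfidfVectorizer(),                       # word-space document vectors
    KernelPCA(n_components=2, kernel="rbf"), # project to a 2-D concept space
    SVC(kernel="linear"),                    # classify in the concept space
)
model.fit(docs, labels)
print(model.predict(["rough set methods"]))
```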

14.
A Classification Method for Massive Short Texts Based on Frequent Word-Set Clustering
王永恒  贾焰  杨树强 《计算机工程与设计》2007,28(8):1744-1746,1780
The rapid development of information technology has produced a massive accumulation of text data, a large part of which is short text. Text classification is important for automatically acquiring knowledge from these massive short texts, but for short texts in which keywords appear only a few times, existing general text mining algorithms struggle to reach acceptable accuracy. Some semantics-based classification methods achieve better accuracy but are too inefficient for massive data. To address this problem, a novel short-text classification algorithm based on frequent word-set clustering is proposed: it uses frequent word-set clustering to compress the data and uses semantic information for classification. Experiments show that when classifying massive short texts, the algorithm's accuracy and performance exceed those of other algorithms.
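A hedged sketch of the compression idea: short texts sharing the same frequent words collapse into one cluster, which can then be labeled as a unit. The grouping key below (the frequent words a text contains) and the support threshold are simplifications of the paper's clustering step.

```python
from collections import Counter, defaultdict

texts = ["rough set theory", "rough set model", "svm kernel trick",
         "svm kernel method", "rough set theory model"]

# Words appearing in at least two texts count as "frequent" (illustrative threshold).
word_counts = Counter(w for t in texts for w in set(t.split()))
frequent = {w for w, c in word_counts.items() if c >= 2}

clusters = defaultdict(list)
for t in texts:
    key = frozenset(w for w in t.split() if w in frequent)
    clusters[key].append(t)

for key, members in clusters.items():
    print(sorted(key), "->", members)   # each cluster is classified as one unit
```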

15.
Maps such as concept maps and knowledge maps are often used as learning materials. These maps consist of nodes and links, with nodes as key concepts and links as relationships between them; from a map, the user can recognize the important concepts and the relationships among them. Building concept or knowledge maps requires domain experts, who are hard to obtain, so the cost of map creation is high. In this study, an attempt was made to automatically build a domain knowledge map for e-learning using text mining techniques. From a set of documents about a specific topic, keywords are extracted using the TF/IDF algorithm. A domain knowledge map (K-map) is based on ranking pairs of keywords according to the number of appearances in a sentence and the number of words in a sentence. The experiments analyzed the number of relations required to identify the important ideas in the text, compared K-map learning to document learning, and found that the K-map identifies the more important ideas.
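A hedged sketch of the K-map construction: keywords are extracted by TF/IDF, and keyword pairs are ranked by sentence co-occurrence, weighted here by inverse sentence length, which is one plausible reading of "the number of words in a sentence". The corpus and the keyword threshold are illustrative.

```python
from itertools import combinations
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["rough sets extend classical sets",
             "rough sets support attribute reduction",
             "attribute reduction helps classification"]

vec = TfidfVectorizer()
tfidf = vec.fit_transform(sentences)
scores = tfidf.sum(axis=0).A1                      # corpus-level TF-IDF mass per term
keywords = {w for w, s in zip(vec.get_feature_names_out(), scores) if s > 0.5}

edge_weight = Counter()
for s in sentences:
    words = [w for w in s.split() if w in keywords]
    for a, b in combinations(sorted(set(words)), 2):
        edge_weight[(a, b)] += 1.0 / len(s.split())  # shorter sentences weigh more

for edge, w in edge_weight.most_common(5):           # top-ranked K-map links
    print(edge, round(w, 2))
```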

16.

This article presents a system that carries out highly effective searches over collections of textual information, such as those found on the Internet. The system is made up of two major parts. The first part is an agent, Musag, that learns to relate concepts that are semantically "similar" to one another; in other words, this agent dynamically builds a dictionary of expressions for a given concept, capturing the words people have in mind when mentioning that concept. We aim to achieve this by learning from the context in which these words appear. The second part is another agent, Sag, which is responsible for retrieving documents given a set of keywords with relative weights. This retrieval makes use of the dictionary learned by Musag, in the sense that the documents retrieved for a query are related to the given concept according to the context of previously scanned documents. In this way, we overcome two main problems with current text search engines, which are largely based on syntactic methods. One problem is that the keyword given in the query may have an ambiguous meaning, leading to the retrieval of documents unrelated to the requested topic. The second concerns relevant documents that are not recommended to the user because they do not include the specific keyword mentioned in the query. Using context-learning methods, such documents can be retrieved if they include other words, learned by Musag, that are related to the main concept. We describe the agents' system architecture, along with the nature of their interactions. We describe our learning and search algorithms and present results from experiments performed on specific concepts. We also discuss the notion of "cost of learning" and how it influences the learning process and the quality of the dictionary at any given time.

17.
A text classification model based on fuzzy formal concept analysis is proposed. Texts are conceptualized into a more abstract concept form, concepts rather than raw texts serve as training samples, and a nearest-neighbor classification algorithm then makes the final classification decision. Experimental results show that the algorithm performs well.

18.
Similarity-based text classification is a popular text processing method. The feature-membership similarity measure for text classification measures document similarity through the membership relations between features and documents. Based on these relations, features are partitioned into fully-member, partially-member, and non-member word sets, and membership functions are defined for the three sets: fully-member words belong to both documents, with membership decreasing as the weight difference grows; partially-member words belong to only one of the two documents and take a constant membership value; non-member words belong to neither document and have zero membership. When measuring similarity, the partial membership relation is weighted above the full membership relation. Since documents of the same class have similar word sets while those of different classes differ markedly, measuring similarity via feature-document membership clearly delineates the membership of word sets to classes and improves classification precision. Experiments on the 20-Newsgroups and Reuters-21578 data sets show that the feature-membership similarity measure outperforms currently popular similarity measures.
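A hedged sketch of the membership partition only: the vocabulary of two documents is split into fully-member words (in both), partially-member words (in exactly one), and non-member words (implicitly everything else), with a weight-difference decay for full members and a constant for partial members, following the description above. The constants and the way the two contributions are combined are illustrative assumptions, not the paper's formula.

```python
from collections import Counter

def similarity(doc1, doc2, partial_value=0.3):
    w1, w2 = Counter(doc1), Counter(doc2)
    full = set(w1) & set(w2)        # fully-member words: in both documents
    partial = set(w1) ^ set(w2)     # partially-member words: in exactly one
    # Full membership decays as the weight (term-frequency) difference grows.
    full_score = sum(1.0 / (1.0 + abs(w1[t] - w2[t])) for t in full)
    # Partial membership is a constant per word; the paper ranks this relation
    # above full membership, but the combination below is only illustrative.
    partial_score = partial_value * len(partial)
    total = len(set(w1) | set(w2))
    return (full_score + partial_score) / total if total else 1.0

print(similarity("rough set rough theory".split(), "rough set model".split()))
```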

19.
Automatic Classification of English Texts Based on Concept Hierarchy
This paper aims to design and implement an automatic categorization and retrieval system for English texts, focusing on improving classification accuracy. In automatic text classification systems, text content is generally stored as an N-dimensional feature space, so the feature extraction method and its accuracy greatly affect the correctness of classification results. Traditional methods are based on word forms and do not examine word meaning, ignoring the diversity and uncertainty of word forms under the same meaning and the relations between word senses, especially hypernym-hyponym relations. The method proposed here builds on the vector space model (VSM), takes "concepts" as the basic unit, and considers hypernym relations between word senses, so that more general information can be distilled from words during training, thereby improving classification precision.
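A hedged sketch of concept-level features with hypernyms, using WordNet as one possible concept hierarchy: each word contributes its synset plus its direct hypernym, so semantically related words share more general features. The first-sense choice is naive and for illustration; this requires NLTK with the WordNet corpus installed.

```python
from collections import Counter
from nltk.corpus import wordnet as wn

def concept_features(tokens):
    """Map tokens to synset + direct-hypernym features (counts)."""
    feats = Counter()
    for t in tokens:
        synsets = wn.synsets(t)
        if not synsets:
            continue
        s = synsets[0]                  # naive first-sense choice, for illustration
        feats[s.name()] += 1
        for h in s.hypernyms():         # add the superordinate concept
            feats[h.name()] += 1
    return feats

print(concept_features(["dog", "cat"]))
# direct hypernyms here are distinct ('canine.n.02' vs 'feline.n.01'), but walking
# further up the hierarchy would reach a shared ancestor such as 'carnivore.n.01'
```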

20.
In information processing, most existing text classification algorithms are based on the vector space model, which cannot effectively express a document's structural information and therefore cannot fully express its semantic information. To express semantic information more effectively, this paper first proposes a new document representation model, a graph model, in which a weighted labeled graph expresses a document's feature terms and their positional associations. On this basis, a new document similarity measure is proposed and applied to Chinese text classification. Experimental results show that this graph-based document representation is effective and feasible.
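A minimal sketch of such a weighted labeled graph: nodes are terms, and an edge links two terms that appear within a small positional window, weighted by proximity and frequency. The window size and weighting scheme are illustrative assumptions, not the paper's definitions.

```python
from collections import defaultdict

def document_graph(tokens, window=2):
    """Build {(term_a, term_b): weight} edges from positional co-occurrence."""
    edges = defaultdict(float)
    for i, a in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            b = tokens[j]
            if a != b:
                edges[tuple(sorted((a, b)))] += 1.0 / (j - i)  # closer pairs weigh more
    return dict(edges)

g = document_graph("rough set theory supports rough set classification".split())
for edge, w in sorted(g.items(), key=lambda x: -x[1]):
    print(edge, round(w, 2))
```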
