
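Bayes Discriminator for BBS Documents Based on Latent Semantic Analysis
Cite this article:LIU Chang-Yu, TANG Chang-Jie, YU Zhong-Hua, DU Yong-Ping, GUO Ying. Bayes Discriminator for BBS Documents Based on Latent Semantic Analysis[J]. Chinese Journal of Computers, 2004, 27(4): 566-572.
Authors:LIU Chang-Yu  TANG Chang-Jie  YU Zhong-Hua  DU Yong-Ping  GUO Ying
Affiliations:1. Department of Computer Science, Sichuan University, Chengdu 610064
2. Department of Computer Science, Shanxi University, Taiyuan 030006
Funding:Supported by the National Natural Science Foundation of China (60073046) and the Specialized Research Fund for the Doctoral Program of Higher Education (20020610007)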
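Abstract:Abuse of electronic bulletin board systems (BBS) is a social problem characterized by information pollution, and discriminating BBS documents has become an important part of information security. Combining data mining, mathematical statistics, and natural language understanding techniques, this paper proposes a BBS document discrimination method based on latent semantic analysis and Bayes classification: natural language processing techniques extract a typical phrase set from the training documents; latent semantic analysis performs synonymy reduction on the typical phrases, and association rule mining is applied to improve the independence among them; a Bayes classifier then discriminates the BBS documents. The paper also discusses and tests extensively the key parameters that affect the system. Experiments show that the method is feasible and effective for discriminating BBS documents.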

Keywords:data mining  association rules  Bayes classification  latent semantic analysis  BBS  electronic bulletin board
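
The abstract says that association rule mining is used to improve the independence among typical phrases, but it does not say how. The sketch below shows only the generic first step, assuming that strongly associated phrase pairs are the ones a later stage would merge or prune: it computes support and confidence for phrase pairs over a set of documents. The function name `pair_rules`, the thresholds, and the toy documents are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def pair_rules(docs, min_support=0.3, min_confidence=0.8):
    """Return sorted (antecedent, consequent, support, confidence) tuples.

    docs is a list of phrase lists. Thresholds are illustrative assumptions.
    """
    n = len(docs)
    item_count = Counter()   # number of documents containing each phrase
    pair_count = Counter()   # number of documents containing both phrases
    for phrases in docs:
        unique = sorted(set(phrases))
        item_count.update(unique)
        pair_count.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Check both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Toy phrase sets (hypothetical data, for illustration only).
docs = [
    ["free", "prize", "click"],
    ["free", "prize"],
    ["free", "meeting"],
    ["prize", "free", "click"],
    ["meeting", "agenda"],
]
rules = pair_rules(docs)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

A pair with high confidence in either direction violates the independence assumption behind a naive Bayes classifier, which is why such pairs are natural candidates for merging into a single feature.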

Bayes Discriminator for BBS Documents Based on Latent Semantic Analysis
LIU Chang-Yu,TANG Chang-Jie,YU Zhong-Hua,DU Yong-Ping,GUO Ying.Bayes Discriminator for BBS Documents Based on Latent Semantic Analysis[J].Chinese Journal of Computers,2004,27(4):566-572.
Authors:LIU Chang-Yu  TANG Chang-Jie  YU Zhong-Hua  DU Yong-Ping  GUO Ying
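Affiliation:LIU Chang-Yu 1), TANG Chang-Jie 1), YU Zhong-Hua 1), DU Yong-Ping 2), GUO Ying 1); 1) Department of Computer Science, Sichuan University, Chengdu 610064; 2) Department of Computer Science, Shanxi University, Taiyuan 030006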
Abstract:With the rapid development of the Internet, the abuse and misuse of BBS have become a social problem of information pollution, creating demand for techniques that discriminate BBS documents. Borrowing techniques from data mining, probability and statistics, and natural language understanding, this paper proposes a new discrimination method for BBS documents, called Bayes Discrimination based on Latent Semantic Analysis (BDLSA). The method consists of four steps: (1) build a typical phrase set by extracting typical sentences from the training documents in a preprocessing stage, using natural language understanding techniques; (2) apply synonymy reduction to the typical phrases by latent semantic analysis; (3) discover association rules between typical phrases to increase the independence of the phrases, so that the traditional Bayes discriminator works efficiently; (4) discriminate BBS documents with a Bayes classifier. Algorithms to construct the typical phrase set and to reduce synonymy are proposed and implemented. The experiments use real documents from the Web, with 583 training documents and 308 test documents, and the accuracy is up to 75%. This shows the effectiveness and validity of the new method.
Keywords:data mining  association rules  Bayes classifier  latent semantic analysis  BBS
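
The final step of the method, discriminating a document with a Bayes classifier over its typical phrases, can be illustrated with a minimal sketch. The abstract does not specify the classifier variant, so this example assumes a multinomial naive Bayes model with Laplace (add-one) smoothing. The class name `NaiveBayes`, the labels `flagged` and `normal`, and the toy training data are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over phrase lists (an assumed variant)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {p for phrases in docs for p in phrases}
        self.doc_count = Counter(labels)
        self.phrase_count = {c: Counter() for c in self.classes}
        for phrases, label in zip(docs, labels):
            self.phrase_count[label].update(phrases)
        self.total = {c: sum(self.phrase_count[c].values()) for c in self.classes}
        return self

    def log_posterior(self, phrases, c):
        """Unnormalized log P(c | phrases): log prior plus log likelihoods."""
        n_docs = sum(self.doc_count.values())
        score = math.log(self.doc_count[c] / n_docs)
        v = len(self.vocab)
        for p in phrases:
            if p not in self.vocab:
                continue  # phrases never seen in training are ignored
            # Laplace smoothing avoids zero probabilities.
            score += math.log((self.phrase_count[c][p] + 1) / (self.total[c] + v))
        return score

    def predict(self, phrases):
        return max(self.classes, key=lambda c: self.log_posterior(phrases, c))


# Toy training set (hypothetical phrases and labels).
train_docs = [
    ["free", "prize"],
    ["prize", "click"],
    ["meeting", "agenda"],
    ["agenda", "notes"],
]
train_labels = ["flagged", "flagged", "normal", "normal"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["prize", "free"]))
print(model.predict(["agenda"]))
```

The model multiplies per-phrase likelihoods as if phrases were independent given the class, which is the assumption that steps (2) and (3) of the method are meant to make more reasonable.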
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.