
Self-Study Algorithm for Filtering Chinese Text Content through Two Layers in Web Real-Time Environment
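Cite this article: DUAN Lei, TANG Changjie, ZUO Jie, PENG Jing, LIU Tingting, GOU Chi. Self-study algorithm for filtering Chinese text content through two layers in Web real-time environment[J]. 计算机科学与探索 (Journal of Frontier of Computer Science and Technology), 2011, 5(8): 695-706. DOI: 10.3778/j.issn.1673-9418.2011.08.003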
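Authors: DUAN Lei, TANG Changjie, ZUO Jie, PENG Jing, LIU Tingting, GOU Chi
Affiliations: 1. School of Computer Science, Sichuan University, Chengdu 610065
2. Department of Science & Technology, Chengdu Municipal Public Security Bureau, Chengdu 610017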
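Funding: Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20100181120029); Sichuan University Young Faculty Research Start-up Fund (No. 2009SCU11030)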
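Abstract: The freedom with which users publish information on the Internet poses new challenges for Web content filtering. To address this, a self-study, two-layer content filtering algorithm called SAFE (self-study algorithm of filtering Chinese text content) is proposed. SAFE processes text as a data stream and, based on the Apriori property, filters documents in two layers by mining key characters and keywords, without relying on a dictionary. The effectiveness of SAFE was verified on real-world Web documents: experiments show that, when filtering text content for a given theme, SAFE achieves a recall of 93.75% or higher and a precision of 100%, and its runtime meets the real-time requirements of Web applications.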

Keywords: data mining; text content filtering; keyword mining
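The abstract only outlines how SAFE works. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' implementation. Documents are treated as raw character strings, with no word segmentation or dictionary. "Key characters" are assumed to be single characters with high document support in a sample of on-topic text. "Keywords" are assumed to be contiguous character n-grams mined level by level with Apriori pruning: a length-n candidate is counted only if its (n-1)-character prefix and suffix are both frequent. The two layers are assumed to be a cheap key-character screen followed by a keyword-count test. The function names `mine_frequent_ngrams` and `two_layer_filter`, the thresholds, and the sample sentences (theme: 网络诈骗, "online fraud") are all hypothetical. The self-study (incremental update) part of SAFE is not modeled.

```python
from collections import Counter
from collections.abc import Iterable, Iterator


def mine_frequent_ngrams(docs: list[str], min_support: int, max_len: int = 4) -> dict[int, set[str]]:
    """Level-wise (Apriori-style) mining of frequent contiguous character n-grams.

    Support is the number of documents that contain the n-gram. A length-n
    candidate is counted only if its (n-1)-character prefix and suffix are both
    frequent: by the Apriori property, an n-gram can never have higher support
    than its sub-grams, so every other candidate can be pruned without counting.
    """
    frequent: dict[int, set[str]] = {}
    # Level 1: characters with enough document support (hypothetical "key characters").
    counts = Counter(ch for d in docs for ch in set(d))
    frequent[1] = {ch for ch, c in counts.items() if c >= min_support}
    for n in range(2, max_len + 1):
        prev = frequent[n - 1]
        counts = Counter()
        for d in docs:
            grams = {d[i:i + n] for i in range(len(d) - n + 1)}  # distinct n-grams in this document
            counts.update(g for g in grams if g[:-1] in prev and g[1:] in prev)
        level = {g for g, c in counts.items() if c >= min_support}
        if not level:
            break  # no frequent n-grams of this length, so none longer can be frequent
        frequent[n] = level
    return frequent


def two_layer_filter(stream: Iterable[str], key_chars: set[str], keywords: set[str],
                     min_chars: int, min_keywords: int) -> Iterator[str]:
    """Yield documents from a stream that pass both (hypothetical) filtering layers."""
    for doc in stream:  # documents are handled one at a time, as in a data stream
        # Layer 1: cheap screen on the number of distinct key characters present.
        if len(key_chars & set(doc)) < min_chars:
            continue
        # Layer 2: count distinct mined keywords that occur as substrings.
        hits = sum(1 for kw in keywords if kw in doc)
        if hits >= min_keywords:
            yield doc


# Hypothetical on-topic training sample.
train = ["网络诈骗案件增多", "警方破获网络诈骗", "防范网络诈骗"]
frequent = mine_frequent_ngrams(train, min_support=3)
key_chars = frequent[1]
keywords = set().union(*(grams for n, grams in frequent.items() if n >= 2))

stream = ["网上诈骗花样多", "今天天气很好", "小心网络诈骗电话"]
kept = list(two_layer_filter(stream, key_chars, keywords, min_chars=2, min_keywords=2))
print(kept)  # ['小心网络诈骗电话']
```

In this toy run, the first stream document passes layer 1 because it contains 网, 诈 and 骗. It matches only one mined keyword (诈骗), so layer 2 rejects it. The second document fails layer 1. Only the third is kept. Raw n-gram mining also produces non-words such as 络诈, so a practical system would need further pruning of the mined keywords.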

Self-Study Algorithm for Filtering Chinese Text Content through Two Layers in Web Real-Time Environment
DUAN Lei,TANG Changjie,ZUO Jie,PENG Jing,LIU Tingting,GOU Chi. Self-Study Algorithm for Filtering Chinese Text Content through Two Layers in Web Real-Time Environment[J]. Journal of Frontier of Computer Science and Technology, 2011, 5(8): 695-706. DOI: 10.3778/j.issn.1673-9418.2011.08.003
Authors: DUAN Lei, TANG Changjie, ZUO Jie, PENG Jing, LIU Tingting, GOU Chi
Affiliation: 1. School of Computer Science, Sichuan University, Chengdu 610065, China
2. Department of Science & Technology, Chengdu Municipal Public Security Bureau, Chengdu 610017, China
Abstract: The freedom users have to publish on the Internet poses new challenges for Web content filtering. This paper presents a self-study algorithm, called SAFE (self-study algorithm of filtering Chinese text content), for filtering Chinese text content through two layers. SAFE processes texts as a data stream. Based on the Apriori property, it filters Chinese text content in two layers by mining key characters and keywords, without a manual dictionary. Experiments on real-world data show that, for a given theme, SAFE achieves a recall of 93.75% or higher and a precision of 100%, and its runtime satisfies the real-time requirements of Web applications.
Keywords: data mining; text content filtering; keyword mining
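The abstract reports SAFE's quality in terms of recall and precision. The small Python sketch below illustrates the standard definitions of these two figures. The function name `precision_recall` and the document IDs are invented for illustration and are not the paper's data. A precision of 100% means every document the filter kept was on-topic (no false positives). A recall below 100% means some on-topic documents were missed.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one filtering run.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    Empty denominators are reported as 0.0 here (an arbitrary convention).
    """
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical IDs: four on-topic documents; the filter keeps three of them and nothing else.
p, r = precision_recall(retrieved={"d1", "d2", "d3"}, relevant={"d1", "d2", "d3", "d4"})
print(p, r)  # 1.0 0.75
```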
This article is indexed in CNKI, Wanfang Data, and other databases.