Keywords filtering technology based on combination of Chinese character constituents (基于汉字部件组合的关键词过滤技术)
Citation: ZHU Wen-xuan, LIU Gong-shen, LI Sheng-hong. Keywords filtering technology based on combination of Chinese character constituents[J]. Information Technology (信息技术), 2008, 32(10)
Authors: ZHU Wen-xuan (朱文轩), LIU Gong-shen (刘功申), LI Sheng-hong (李生红)
Affiliation: Information Security Lab, Shanghai Jiaotong University, Shanghai 200240, China
Funding: National Natural Science Foundation of China; Ministry of Education Trans-Century Training Program for Outstanding Talents
Abstract: Keyword filtering is one of the most commonly used methods of content-based text filtering and is widely applied. Chinese characters are composed of constituents (components), and splitting characters into their constituents makes keyword filtering difficult, because a keyword written as separate constituents no longer matches the keyword string itself. This paper proposes a keyword filtering technique based on recombining Chinese character constituents. It relies on an annotated library of Chinese character structures and applies an improved multi-pattern matching algorithm to process massive volumes of text. Experimental results show that the method can find keywords that have been deliberately split into constituents.

Keywords: Chinese character constituents; multi-pattern matching; filtering
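The abstract does not spell out the algorithm, so the following is only a minimal sketch of how constituent-aware keyword filtering could work, not the authors' method. It assumes a hypothetical constituent table (`CONSTITUENTS`, a few hand-picked entries standing in for the structure-annotated character library), expands each keyword into every spelling in which its characters appear either whole or split, and scans the text with a standard Aho-Corasick automaton, substituted here for the paper's unspecified improved multi-pattern matcher. All names (`CONSTITUENTS`, `expand_keyword`, `AhoCorasick`, `build_filter`) are illustrative.

```python
from collections import deque
from itertools import product

# Hypothetical constituent table: maps a character to the constituent
# sequences it is commonly split into. A real system would load this from a
# structure-annotated character library; these entries are illustrative only.
CONSTITUENTS = {
    "好": ["女子"],
    "吃": ["口乞"],
    "明": ["日月"],
    "林": ["木木"],
}


def expand_keyword(keyword):
    """Return every spelling of keyword in which each character is either
    kept whole or replaced by one of its known constituent splits."""
    options = [[ch] + CONSTITUENTS.get(ch, []) for ch in keyword]
    return {"".join(parts) for parts in product(*options)}


class AhoCorasick:
    """Plain Aho-Corasick automaton (a stand-in for the paper's improved
    multi-pattern matcher, whose details are not given in the abstract)."""

    def __init__(self):
        self.goto = [{}]  # goto[state][char] -> next state; state 0 is the root
        self.fail = [0]   # failure link of each state
        self.out = [[]]   # (pattern_length, label) pairs reported at each state

    def add(self, pattern, label):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append((len(pattern), label))

    def build(self):
        # Breadth-first traversal sets failure links level by level, so a
        # state's failure target (always shallower) is finished before it.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure target as well.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Yield (start_index, matched_text, keyword) for every match."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for length, label in self.out[state]:
                start = i - length + 1
                yield start, text[start:i + 1], label


def build_filter(keywords):
    """Insert all whole/split spellings of each keyword, labelled with the
    original keyword, into one automaton."""
    matcher = AhoCorasick()
    for kw in keywords:
        for variant in expand_keyword(kw):
            matcher.add(variant, kw)
    matcher.build()
    return matcher


if __name__ == "__main__":
    matcher = build_filter(["好吃"])
    for start, found, keyword in matcher.search("好吃的菜，真女子口乞"):
        print(start, found, keyword)
    # prints "0 好吃 好吃" then "6 女子口乞 好吃"
```

Expanding the keywords instead of recombining constituents in the text avoids guessing whether adjacent characters such as 女子 ("woman") form one split character or are an ordinary word: a split form is reported only when the whole keyword appears. The cost is pattern growth, since a keyword whose n characters each have one known split yields 2^n patterns, which stays manageable for typical short keywords.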
This article is indexed in CNKI, VIP (维普), Wanfang Data and other databases.