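Heuristic Chinese sentence compression algorithm based on word heat
Cite this article: HAN Jing, ZHANG Dongzhan. Heuristic Chinese sentence compression algorithm based on word heat[J]. Computer Engineering and Applications (计算机工程与应用), 2014(4): 132-139.
Authors: HAN Jing, ZHANG Dongzhan
Affiliation: School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
Funding: National Natural Science Foundation of China (No. 50604012).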
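Abstract: Traditional sentence compression methods mostly rely on aligned "original sentence / compressed sentence" corpora, which are hard to obtain, so a Chinese sentence compression algorithm that does not depend on an aligned corpus is proposed. By studying human compression results and drawing on linguistic knowledge, two sets of compression rules are proposed, one at the word level and one at the clause level. The algorithm applies both rule sets on top of the original sentence's parse tree and the dependency relations between its words; to make the algorithm more adaptable and accurate, word heat is introduced to strengthen the compression. Finally, the compressed sentence is obtained through sentence cleanup and grammar repair. Three methods are compared: human compression, rule-only compression, and compression with word heat. The experimental results show that the heat-based heuristic Chinese sentence compression algorithm raises the heat of the compressed sentences with little loss in compression rate, grammaticality, and informativeness.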

Keywords: Chinese sentence compression; hot words; linguistics; parse tree

Heuristic Chinese sentence compression algorithm based on hot word
HAN Jing,ZHANG Dongzhan.Heuristic Chinese sentence compression algorithm based on hot word[J].Computer Engineering and Applications,2014(4):132-139.
Authors: HAN Jing, ZHANG Dongzhan
Affiliation: School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
Abstract: Most traditional sentence compression methods depend on parallel sentence/compression corpora, which are not easy to obtain, so a linguistically motivated heuristic Chinese sentence compression algorithm is proposed after studying traditional methods. By analyzing human-produced compressions and applying linguistic knowledge, two sets of rules are proposed, one at the word level and the other at the clause level. The two rule sets are applied to the parse tree and the word dependencies of the original sentence, and hot words are used to enhance the algorithm so that it remains flexible and accurate. In the last step, the compression result is cleaned and repaired. Human-produced compression, the rule-only algorithm, and the hot-word-enhanced algorithm are compared, and the results are evaluated in terms of compression rate, grammaticality, informativeness, and heat. The experimental results show that the heuristic Chinese sentence compression algorithm based on hot words can improve the heat of the compression results without much loss in compression rate, grammaticality, or informativeness.
Keywords: Chinese sentence compression; hot word; linguistics; parse tree
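
The abstract describes word-level deletion rules applied over word dependencies, with word heat deciding whether a deletable word is kept. The following is only a minimal sketch of that idea, not the authors' implementation: the rule set `REMOVABLE`, the dependency labels, the `heat` scores, the `threshold` parameter, and the token-count definition of compression rate are all assumptions made for illustration, and the clause-level rules and the grammar repair step are not covered.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    head: int    # index of the governing token, -1 for the root
    deprel: str  # dependency label (hypothetical tag set)


# Hypothetical word-level rule: these modifier relations may be deleted.
REMOVABLE = {"advmod", "amod"}


def subtree(tokens, i):
    """Return the indices of token i and all of its descendants."""
    out = {i}
    changed = True
    while changed:
        changed = False
        for j, tok in enumerate(tokens):
            if tok.head in out and j not in out:
                out.add(j)
                changed = True
    return out


def compress(tokens, heat, threshold=0.5):
    """Drop removable modifiers unless they contain a hot word."""
    drop = set()
    for i, tok in enumerate(tokens):
        if tok.deprel in REMOVABLE:
            nodes = subtree(tokens, i)
            # Heat heuristic (assumed): delete only if every word is "cold".
            if all(heat.get(tokens[j].text, 0.0) < threshold for j in nodes):
                drop |= nodes
    return [tok.text for i, tok in enumerate(tokens) if i not in drop]


def compression_rate(tokens, compressed):
    """Assumed definition: kept tokens divided by original tokens."""
    return len(compressed) / len(tokens)


# Hand-annotated example sentence: 公司 昨天 正式 发布 了 新款 手机
sentence = [
    Token("公司", 3, "nsubj"),
    Token("昨天", 3, "tmod"),
    Token("正式", 3, "advmod"),
    Token("发布", -1, "root"),
    Token("了", 3, "aux"),
    Token("新款", 6, "amod"),
    Token("手机", 3, "dobj"),
]

rule_only = compress(sentence, heat={})
# rule_only == ["公司", "昨天", "发布", "了", "手机"]
with_heat = compress(sentence, heat={"新款": 0.9})
# with_heat == ["公司", "昨天", "发布", "了", "新款", "手机"]
```

In this toy sentence the rule-only run deletes both modifiers, while the heat-aware run keeps 新款 because its assumed heat score exceeds the threshold. That mirrors the trade-off the abstract reports: a slightly weaker compression rate in exchange for hotter compressed sentences.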
This article is indexed in CNKI, VIP (维普), and other databases.