
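Bilingual sentence alignment method based on sentence length and location information
Cite this article:LI Wei-gang, LIU Ting, ZHANG Yu, LI Sheng. Bilingual sentence alignment method based on sentence length and location information[J]. Journal of Harbin Institute of Technology, 2006, 38(5): 689-692.
Authors:LI Wei-gang  LIU Ting  ZHANG Yu  LI Sheng
Affiliation:School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Abstract:This paper proposes a bilingual sentence alignment method that uses sentence length and position information. Its basic idea is that sentence pairs of a given length have similar position distributions in the two halves of a bilingual text. (1:1) sentence beads, rather than high-frequency words, serve as candidate anchors, which makes the method general. Evaluation on several kinds of test data shows that the method is robust and language independent, and that it effectively solves the sentence alignment problem for real bilingual texts.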

Keywords:sentence alignment  bilingual corpus  anchors  length and location
Article ID:0367-6234(2006)05-0689-04
Received:2004-02-20
Revised:2004-02-20

Bilingual sentence alignment method based on sentence length and location information
LI Wei-gang, LIU Ting, ZHANG Yu, LI Sheng. Bilingual sentence alignment method based on sentence length and location information[J]. Journal of Harbin Institute of Technology, 2006, 38(5): 689-692.
Authors:LI Wei-gang  LIU Ting  ZHANG Yu  LI Sheng
Affiliation:School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract:This paper describes a method for aligning real bilingual texts using the length and location information of sentence pairs. The method is motivated by the observation that sentence pairs of a given length are distributed similarly across the two texts. It uses (1:1) sentence beads instead of high-frequency words as candidate anchors, which makes the method general. The method was developed and evaluated on several different kinds of test data. The results show that it achieves good alignment performance, is robust and language independent, and can resolve the sentence alignment problem on real bilingual text.
Keywords:sentence alignment  bilingual corpus  anchors  length and location  
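
The abstract gives the idea behind the method but not its scoring details. The following is only a minimal sketch of how candidate (1:1) anchors might be selected from length and position alone; it is not the authors' implementation. The function name `candidate_anchors`, the tolerances `len_tol` and `pos_tol`, and the use of character counts as sentence lengths are all assumptions made for illustration.

```python
from collections import Counter


def candidate_anchors(src_lens, tgt_lens, len_tol=0.2, pos_tol=0.05):
    """Pick hypothetical (1:1) anchor pairs from sentence lengths alone.

    src_lens / tgt_lens: sentence lengths (e.g. character counts) in
    document order. A pair (i, j) is a candidate when the target length is
    close to the source length scaled by the overall length ratio AND both
    sentences sit at a similar relative position in their texts.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    total_src, total_tgt = sum(src_lens), sum(tgt_lens)
    if total_src == 0 or total_tgt == 0:
        return []

    ratio = total_tgt / total_src  # expected target/source length ratio
    pairs = []
    src_cum = 0
    for i, ls in enumerate(src_lens):
        # Relative position of the sentence midpoint in the source text.
        src_pos = (src_cum + ls / 2) / total_src
        tgt_cum = 0
        for j, lt in enumerate(tgt_lens):
            tgt_pos = (tgt_cum + lt / 2) / total_tgt
            expected = ratio * ls
            len_ok = abs(lt - expected) <= len_tol * max(lt, expected)
            pos_ok = abs(src_pos - tgt_pos) <= pos_tol
            if len_ok and pos_ok:
                pairs.append((i, j))
            tgt_cum += lt
        src_cum += ls

    # Keep only unambiguous pairs: each source and each target sentence
    # may take part in exactly one candidate.
    src_count = Counter(i for i, _ in pairs)
    tgt_count = Counter(j for _, j in pairs)
    return [(i, j) for i, j in pairs if src_count[i] == 1 and tgt_count[j] == 1]


if __name__ == "__main__":
    # Source sentences 1 and 2 were merged into one target sentence,
    # so only the first and last sentences qualify as 1:1 anchors.
    print(candidate_anchors([10, 20, 20, 30], [10, 40, 30]))
```

Because the sketch relies only on lengths and positions, it needs no lexicon or word-frequency table, which is the property the abstract credits for language independence. The sentences between consecutive anchors would still have to be aligned by some other procedure that is not shown here.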
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.