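The Translation Mining of the Out of Vocabulary Based on Wikipedia
Cite this article: Sun Changlong, Hong Yu, Ge Yundong, Yao Jianmin, Zhu Qiaoming. The Translation Mining of the Out of Vocabulary Based on Wikipedia[J]. Journal of Computer Research and Development, 2011, 48(6).
Authors: Sun Changlong  Hong Yu  Ge Yundong  Yao Jianmin  Zhu Qiaoming
Affiliation: Jiangsu Province Key Laboratory of Computer Information Processing, Soochow University, Suzhou, Jiangsu 215006
Funding: National Natural Science Foundation of China (60970057, 61003152)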
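Abstract: Query translation of out-of-vocabulary (OOV) terms is one of the key factors affecting the performance of cross-language information retrieval (CLIR). Based on the data structure and linguistic characteristics of Wikipedia, this paper divides translation environments into a target-existence environment and a target-deficit environment. To address the difficulty of mining translations in the target-deficit environment, it extracts candidate units using frequency-change information and adjacency information, and builds a hybrid translation mining strategy that combines a frequency-distance model, surface matching templates, and a summary scoring model. The experiments use a search-engine-based OOV mining technique as the baseline and evaluate with the TOP1 measure. They show that the Wikipedia-based hybrid translation mining method reaches a translation accuracy of 0.6822, an improvement of 6.98% over the baseline.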

Keywords: out of vocabulary (OOV)  Wikipedia  cross-language information retrieval  translation mining  target-deficit environment
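
The abstract names a frequency-distance model and TOP1 evaluation but does not give their definitions. The following is only a minimal sketch of the general idea, under loudly stated assumptions: the scoring formula (each occurrence of a candidate contributes the inverse of its token distance to the OOV term) is hypothetical and is not taken from the paper, and the function names `rank_candidates` and `top1_accuracy` are illustrative.

```python
from collections import defaultdict


def rank_candidates(occurrences):
    """Rank candidate translations for one OOV term.

    occurrences: iterable of (candidate, distance) pairs, where distance is
    the token distance between the candidate and the OOV term in a text
    snippet. ASSUMPTION: the score is the sum of 1/distance over all
    occurrences; the paper's actual frequency-distance model is not given
    in the abstract.
    """
    scores = defaultdict(float)
    for candidate, distance in occurrences:
        # Frequent and nearby candidates accumulate higher scores.
        scores[candidate] += 1.0 / max(distance, 1)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


def top1_accuracy(ranked_by_term, gold):
    """Fraction of terms whose top-ranked candidate is a reference translation.

    ranked_by_term: dict mapping term -> ranked list of candidates.
    gold: dict mapping term -> set of acceptable translations.
    """
    if not ranked_by_term:
        return 0.0
    hits = sum(
        1
        for term, ranked in ranked_by_term.items()
        if ranked and ranked[0] in gold.get(term, set())
    )
    return hits / len(ranked_by_term)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    occurrences = [("维基百科", 2), ("百科", 1), ("维基百科", 4), ("维基百科", 1)]
    ranked = {"Wikipedia": rank_candidates(occurrences)}
    print(ranked, top1_accuracy(ranked, {"Wikipedia": {"维基百科"}}))
```

Under this toy scoring, a candidate that appears often and close to the OOV term outranks one that appears once, which is the intuition the name "frequency-distance" suggests; the real model may weight the two factors differently.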

The Translation Mining of the Out of Vocabulary Based on Wikipedia
Sun Changlong, Hong Yu, Ge Yundong, Yao Jianmin, Zhu Qiaoming. The Translation Mining of the Out of Vocabulary Based on Wikipedia[J]. Journal of Computer Research and Development, 2011, 48(6).
Authors:Sun Changlong  Hong Yu  Ge Yundong  Yao Jianmin  Zhu Qiaoming
Affiliation: Sun Changlong, Hong Yu, Ge Yundong, Yao Jianmin, and Zhu Qiaoming (Jiangsu Province Key Laboratory of Computer Information Processing, Soochow University, Suzhou, Jiangsu 215006)
Abstract: Query translation is one of the key factors that affect the performance of cross-language information retrieval (CLIR). During querying, mining translations of out-of-vocabulary (OOV) terms is important for improving CLIR. Out-of-vocabulary terms are words or phrases that cannot be found in the dictionary. In this paper, according to the data structure and language features of Wikipedia, we divide the translation environment into a target-existence environment and a target-deficit environment. Depending on th...
Keywords: out of vocabulary (OOV)  Wikipedia  cross-language information retrieval (CLIR)  translation mining  target-deficit environment
This article is indexed in CNKI, Wanfang Data, and other databases.