
综合型语言知识库的建设与利用
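Cite this article:俞士汶, 段慧明, 朱学锋, 张化瑞. 综合型语言知识库的建设与利用[J]. 中文信息学报, 2004, 18(5): 2-11.
Authors:俞士汶  段慧明  朱学锋  张化瑞
Affiliation:Institute of Computational Linguistics, Peking University (北京大学计算语言学研究所)
Funding:National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China
Abstract:The scale and quality of its language knowledge base determine whether a natural language processing system succeeds or fails. After 18 years of work, the Institute of Computational Linguistics at Peking University has accumulated a series of sizable, high-quality language data resources: a grammatical information dictionary of contemporary Chinese, a large-scale corpus with basic annotation, a semantic dictionary of contemporary Chinese, a Chinese concept dictionary, bilingual corpora aligned at different unit levels, term banks for several specialized domains, a phrase structure rule base for contemporary Chinese, a corpus of classical Chinese poetry, and others. This research will integrate these language data resources into a comprehensive language knowledge base. Integrating different language data resources requires overcoming the "gaps" between them. Besides a unified, user-friendly interface and a convenient application programming interface, the planned knowledge base will provide software tools that support knowledge mining, so that the existing resources keep developing from primary products into deeply processed ones. It will also provide several forms of knowledge dissemination and information service, so that the comprehensive language knowledge base offers all-round, multi-level support for language information processing research, research in linguistics proper, and language teaching.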

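Keywords:computer application  Chinese information processing  language processing  language knowledge base  language data resources  electronic dictionary  corpus
Article ID:1003-0077(2004)05-0001-10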

The Construction and Utilization of a Comprehensive Language Knowledge-base
YU Shi-wen,DUAN Hui-ming,ZHU Xue-feng,ZHANG Hua-rui.The Construction and Utilization of a Comprehensive Language Knowledge-base[J].Journal of Chinese Information Processing,2004,18(5):2-11.
Authors:YU Shi-wen  DUAN Hui-ming  ZHU Xue-feng  ZHANG Hua-rui
Affiliation:Institute of Computational Linguistics, Peking University
Abstract:The scale and quality of a language knowledge-base determine the success or failure of a natural language processing system. After 18 years of work, the Institute of Computational Linguistics of Peking University has accumulated a series of sizable, high-quality language data resources: the grammatical knowledge-base of contemporary Chinese, the large-scale POS-tagged corpus of contemporary Chinese, the Semantics Knowledge-base of Contemporary Chinese (SKCC), the Chinese Concept Dictionary (CCD), a bilingual parallel corpus with different aligned units, term banks for different disciplines, the phrase structure knowledge-base of contemporary Chinese, and a corpus of ancient Chinese poems. The present research will integrate these language data resources into one unified, comprehensive language knowledge-base. When these different resources are integrated, the gaps between them must be bridged. The planned comprehensive language knowledge-base will provide not only a friendly user interface and a convenient application program interface but also various software tools that support knowledge mining, so that the existing language data resources develop steadily from primary products into deeply processed products. It will also set up diverse mechanisms for knowledge dissemination and information services, offering all-round, multi-level support to language information processing, traditional linguistics research and language teaching.
Keywords:computer application  Chinese information processing  natural language processing  language data resources  language knowledge-base  electronic dictionary  corpus
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.