Construction Approach of Large-scale Corpus Based on Web


Cite this article:	LI Pei-feng, ZHU Qiao-ming, QIAN Pei-de. Construction Approach of Large-scale Corpus Based on Web[J]. Computer Engineering, 2008, 34(7): 41-43,4

Authors:	LI Pei-feng, ZHU Qiao-ming, QIAN Pei-de

Affiliation:	School of Computer Science and Technology, Soochow University, Suzhou 215006

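Funding:	National Natural Science Foundation of China; Jiangsu Provincial High-Tech Research and Development Program; Natural Science Foundation of Jiangsu Province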

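Abstract:	Building a large-scale corpus at low cost and within a short period is currently one of the difficult problems in this field. This paper proposes a new method for constructing a large-scale corpus. It addresses how to build a large-scale corpus on the Web and how to correct errors in the corpus so as to improve its quality. The method uses the large-scale computing power of grid technology and the open editing environment of a Wiki to collect and process language materials. Untrustworthy materials are selected according to a trustworthiness model and proofread manually; the trustworthiness of each proofread result is then calculated, and the most trustworthy result is stored in the corpus as the correct material.
Keywords:	large-scale corpus; grid; trustworthiness
Article ID:	1000-3428(2008)07-0041-03
Date revised:	2007-04-30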

LI Pei-feng,ZHU Qiao-ming,QIAN Pei-de. Construction Approach of Large-scale Corpus Based on Web[J]. Computer Engineering, 2008, 34(7): 41-43,4

Authors:	LI Pei-feng, ZHU Qiao-ming, QIAN Pei-de

Affiliation:	School of Computer Science and Technology, Soochow University, Suzhou 215006

Abstract:	Building a large-scale corpus at low cost and within a short period is currently difficult. A new approach is proposed to build such a corpus on the Web. It focuses on how to build a large-scale corpus on the Web and how to then correct mistakes in it. Language materials are collected and processed using grid computing and a Wiki. Based on their trustworthiness, untrustworthy language materials in the corpus are picked out and checked manually on the Wiki. Once checking is complete, the approach calculates the trustworthiness of each checked result and selects the one with the highest trustworthiness as the correct result.

Keywords:	large-scale corpus; grid; trustworthiness
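The abstract does not give the trustworthiness model itself. As a minimal sketch, assuming (hypothetically) that each automatically processed corpus entry carries a confidence score in [0, 1] and that each Wiki editor has a reliability weight, the two selection steps in the abstract (flagging untrustworthy entries for manual checking, then choosing the most trustworthy checked result) could look like the Python below. The names `Item`, `pick_untrustworthy` and `select_most_trustworthy`, the 0.8 threshold, and the weighted-vote formula are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """A corpus entry, e.g. one automatically annotated sentence (hypothetical)."""
    item_id: str
    annotation: str
    confidence: float  # assumed score in [0, 1] from the automatic annotator


def pick_untrustworthy(items, threshold=0.8):
    """Return entries whose confidence is below the (assumed) threshold.

    These are the entries that would be sent to the Wiki for manual checking.
    """
    return [it for it in items if it.confidence < threshold]


def select_most_trustworthy(submissions, editor_weight, default_weight=1.0):
    """Pick the checked result with the highest trustworthiness for one entry.

    submissions: list of (editor, proposed_annotation) pairs.
    editor_weight: hypothetical per-editor reliability weights.
    Trustworthiness of a candidate is assumed to be its share of the total
    editor weight behind all candidates (a weighted vote).
    Returns (best_annotation, trustworthiness).
    """
    if not submissions:
        raise ValueError("no checked results to choose from")
    scores = defaultdict(float)
    for editor, annotation in submissions:
        scores[annotation] += editor_weight.get(editor, default_weight)
    total = sum(scores.values())
    best = max(scores, key=scores.get)  # first candidate wins a tie
    return best, (scores[best] / total if total > 0 else 0.0)


if __name__ == "__main__":
    items = [Item("s1", "NN VV", 0.95), Item("s2", "NN NN", 0.55)]
    print([it.item_id for it in pick_untrustworthy(items)])  # ['s2']

    submissions = [("alice", "NN VV"), ("bob", "NN NN"), ("carol", "NN VV")]
    weights = {"alice": 2.0, "bob": 1.0, "carol": 1.0}
    print(select_most_trustworthy(submissions, weights))  # ('NN VV', 0.75)
```

A weighted vote is used here only because it is a simple, transparent way to turn several manual corrections into one score. A real system would need the paper's own definition of trustworthiness in place of these assumed weights.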
This article is indexed in CNKI, VIP, Wanfang Data and other databases.
