Intra-sentence segmentation based on support vector machines in English–Korean machine translation systems

Authors:	Yu-Seop Kim, Yu-Jin Oh

Affiliation:	(a) Department of Computer Engineering, Hallym University, 39 Hallymdaehak-gil, Chuncheon, Gangwon-do 200-702, Republic of Korea; (b) Department of Economics, University of Seoul, 90 Cheonong Dong, Dongdaemoon Gu, Seoul 130-743, Republic of Korea

Abstract:	This work addresses intra-sentence segmentation performed before the syntactic analysis of long sentences, defined here as sentences of at least 20 words, in an English–Korean machine translation system. Long sentences are known to consume enormous computational time and space during syntactic analysis, and they can also yield poor translations. To resolve this problem, we partitioned each long sentence into a few segments and analyzed each segment separately. To partition a sentence, we first identified candidate segment positions in it. We then generated input vectors representing the lexical context of each candidate and used the support vector machine (SVM) algorithm to learn and recognize the appropriate segment positions. We used three kernel functions (linear, polynomial, and Gaussian) to find the optimal hyperplanes that classify proper segment positions, and we compared the results obtained with each kernel. In our experiments, the linear, polynomial, and Gaussian kernels achieved F-measure values of 0.81, 0.83, and 0.79, respectively.

Keywords:	Intra-sentence segmentation; Support vector machines; Linear kernel; Polynomial kernel; Gaussian kernel
This article is indexed in ScienceDirect and other databases.
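
The abstract names three kernel functions and an F-measure score without giving their definitions. The following Python sketch illustrates the standard textbook forms of each, along with one possible way to turn the lexical context of a candidate position into an input vector. It is not the authors' implementation: the function `context_vector`, its bag-of-words encoding, the window size, and the kernel parameters `degree`, `coef0`, and `gamma` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def context_vector(tokens, position, vocabulary, window=2):
    """Hypothetical encoding: count each vocabulary word among the tokens
    within `window` words on either side of the gap before tokens[position]."""
    left = max(0, position - window)
    context = tokens[left:position + window]
    return [context.count(word) for word in vocabulary]


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree; degree and coef0 are assumed values."""
    return (linear_kernel(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def f_measure(predicted, gold):
    """Balanced F-measure: harmonic mean of precision and recall,
    where a truthy label marks a position as a segment boundary."""
    pairs = list(zip(predicted, gold))
    true_pos = sum(1 for p, g in pairs if p and g)
    false_pos = sum(1 for p, g in pairs if p and not g)
    false_neg = sum(1 for p, g in pairs if not p and g)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

In an SVM, a kernel value acts as a similarity score between two candidate vectors, so swapping the kernel changes the shape of the decision boundary without changing how the input vectors are built. This is what makes a like-for-like comparison of the three kernels on the same candidates possible.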
