

Resolving Ambiguous Segmentation of Korean Compound Nouns Using Statistics and Rules
Authors: Bo-Hyun Yun, Yong-Jae Kwak, and Hae-Chang Rim
Affiliation: Computer Science Department, Korea University
Abstract: Korean compound nouns may be written as a sequence of characters with no blanks between the unit nouns. Korean language processing systems therefore have to segment compound nouns into sequences of unit nouns first. This segmentation is difficult because a single character sequence may admit several valid segmentations into unit nouns. The task is further complicated because Korean compound nouns may contain many unknown unit nouns.
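To make the ambiguity concrete, the Python sketch below enumerates every way a character string can be split completely into dictionary entries. The toy dictionary and the example compound 대학생선교회 ("university student mission society") are illustrative assumptions, not data from the paper, and the sketch covers only plain dictionary lookup, not the structure patterns or unknown-noun heuristics the paper describes.

```python
from functools import lru_cache

# Hypothetical toy dictionary of unit nouns (not the paper's dictionary).
TOY_DICT = frozenset({"대학", "대학생", "학생", "생선", "선교", "선교회", "교회"})


def all_segmentations(text, dictionary):
    """Return every split of `text` into a sequence of dictionary entries."""
    max_len = max(map(len, dictionary), default=0)

    @lru_cache(maxsize=None)
    def seg(start):
        # Reaching the end of the string yields one empty completion.
        if start == len(text):
            return ((),)
        results = []
        # Try every dictionary word that begins at `start`.
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in dictionary:
                for rest in seg(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(units) for units in seg(0)]


print(all_segmentations("대학생선교회", TOY_DICT))
# [['대학', '생선', '교회'], ['대학생', '선교회']]
```

The same string reads either as 대학/생선/교회 ("university/fish/church") or as 대학생/선교회 ("university student/mission society"); dictionary lookup alone cannot choose between them.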
This paper proposes a new method for KCNS (Korean Compound Noun Segmentation) and reports on the application of this segmentation technique to improving the performance of an information retrieval system. In our method, compound nouns are first segmented using a dictionary and structure patterns. If a compound noun is segmented ambiguously, the ambiguity is resolved using statistical information and a preference rule. In addition, we employ three kinds of heuristics to segment compound nouns that contain unknown unit nouns.
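The abstract does not specify which statistics or which preference rule KCNS uses, so the following is only a hypothetical sketch of the general idea of choosing among candidate segmentations. It assumes a smoothed unigram model over unit-noun frequencies as the statistical score and, as a stand-in preference rule, favors the candidate with fewer unit nouns when scores tie. The frequency counts are invented toy values.

```python
import math

# Hypothetical toy unit-noun frequencies (invented, not from the paper).
TOY_FREQ = {"대학": 500, "대학생": 300, "생선": 50, "교회": 200, "선교회": 40}


def unigram_log_score(units, freq):
    """Sum of add-one-smoothed log probabilities of the unit nouns.

    The extra +1 in the denominator reserves probability mass for one
    unseen type, so the score stays defined even when `freq` is empty.
    """
    denom = sum(freq.values()) + len(freq) + 1
    return sum(math.log((freq.get(u, 0) + 1) / denom) for u in units)


def resolve(candidates, freq):
    """Pick the highest-scoring candidate; on a tie, prefer fewer unit nouns."""
    return max(candidates, key=lambda units: (unigram_log_score(units, freq), -len(units)))


candidates = [["대학", "생선", "교회"], ["대학생", "선교회"]]
print(resolve(candidates, TOY_FREQ))  # ['대학생', '선교회']
```

Because each extra unit noun contributes another negative log probability, a unigram score already leans toward segmentations with fewer units; the explicit tie-break matters mainly when frequency data is sparse.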
To evaluate KCNS, we use three kinds of data drawn from various domains. Experimental results show that the precision of KCNS's output averages approximately 96%, regardless of domain. Using the unit nouns produced by KCNS as index terms improves the retrieval performance of our information retrieval system, which demonstrates the effectiveness of the segmentation for indexing.
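The abstract reports precision without defining its granularity. The sketch below assumes compound-level precision: a compound counts as correct only if the system's entire unit-noun sequence matches the gold segmentation, divided by the number of compounds the system segmented. The function name and the toy data are hypothetical and say nothing about the paper's actual results.

```python
def segmentation_precision(system, gold):
    """Fraction of system-segmented compounds whose unit nouns exactly match gold."""
    if not system:
        return 0.0
    correct = sum(1 for compound, units in system.items() if gold.get(compound) == units)
    return correct / len(system)


# Hypothetical toy system output and gold segmentations.
SYSTEM = {
    "대학생선교회": ["대학생", "선교회"],
    "정보검색": ["정보", "검색"],
    "전자계산기": ["전자계산", "기"],
}
GOLD = {
    "대학생선교회": ["대학생", "선교회"],
    "정보검색": ["정보", "검색"],
    "전자계산기": ["전자", "계산기"],
}

print(segmentation_precision(SYSTEM, GOLD))  # 2 of 3 compounds correct, about 0.667
```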
Keywords: compound noun; ambiguous segmentation; unknown noun; information retrieval