Probability-Based Chinese Text Processing and Retrieval |
| |
Authors: | Xiangji Huang, Stephen Robertson, Nick Cercone, & Aijun An |
| |
Affiliation: | Department of Information Science, City University, UK; Department of Computer Science, University of Waterloo, Canada |
| |
Abstract: | We discuss the use of probability-based natural language processing for Chinese text retrieval. We focus on comparing different text extraction methods and probabilistic weighting methods. Several document processing methods and probabilistic weighting functions are presented. A number of experiments have been conducted on large standard text collections. We present experimental results that compare a word-based text processing method with a character-based method. The experimental results also compare a number of term-weighting functions, including both single-unit and compound-unit weighting functions. |
| |
Keywords: | information retrieval; word-based and character-based Chinese text processing; single-unit and compound-unit weighting |