CVA file: an index structure for high-dimensional datasets |
| |
Authors: | Jiyuan An, Hanxiong Chen, Kazutaka Furuse, Nobuo Ohbo |
| |
Affiliation: | (1) Doctoral Program in Engineering, University of Tsukuba, Ibaraki, Japan; (2) Institute of Information Sciences and Electronics, University of Tsukuba, Ibaraki, Japan; (3) Centre for Information Technology Innovation, Queensland University of Technology, 126 Margaret Street, GPO Box 2434, Brisbane, Australia |
| |
Abstract: | Similarity search is important in information-retrieval applications where objects are usually represented as vectors of high dimensionality. This paper proposes a new dimensionality-reduction technique and an indexing mechanism for high-dimensional datasets. For each data vector, the proposed technique drops the dimensions in which that vector's coordinates are less than a critical value. This flexible datawise dimensionality reduction improves indexing mechanisms for high-dimensional datasets whose distributions are skewed in all coordinates. To apply the proposed technique to information retrieval, a CVA file (compact VA file), which is a revised version of the VA file, is developed. With a CVA file, the size of the index files is reduced further, while the index bounds are kept as tight as possible. The effectiveness of the approach is confirmed on synthetic and real data. |
| |
Keywords: | Information retrieval; High-dimensional data; Spatial index; Local dimensionality reduction; Zipf's law; CVA file |
This article is indexed in SpringerLink and other databases. |
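
The datawise reduction described in the abstract can be illustrated with a minimal sketch. This is not the paper's CVA file format or its index bounds, which the abstract does not specify. The function names, the sparse dictionary layout, and the use of the absolute value (so that the sketch also covers negative coordinates) are assumptions made here for illustration. The sketch drops, for each vector separately, the coordinates below a critical value and checks that the Euclidean error introduced by treating dropped coordinates as zero stays within `critical * sqrt(number of dropped coordinates)`.

```python
import math


def reduce_vector(vec, critical):
    """Keep only coordinates whose magnitude is at least `critical`.

    Returns a sparse {dimension index: value} mapping (hypothetical layout,
    not the CVA file structure from the paper).
    """
    return {i: x for i, x in enumerate(vec) if abs(x) >= critical}


def reconstruction_error(vec, kept):
    """Euclidean norm of the coordinates that were dropped."""
    return math.sqrt(sum(x * x for i, x in enumerate(vec) if i not in kept))


def dropped_error_bound(vec, kept, critical):
    """Upper bound on the error: each dropped coordinate is below `critical`."""
    dropped = len(vec) - len(kept)
    return critical * math.sqrt(dropped)


# A skewed vector: a few large coordinates, many near zero.
vec = [0.9, 0.02, 0.0, 0.4, 0.01, 0.03]
critical = 0.05

kept = reduce_vector(vec, critical)
error = reconstruction_error(vec, kept)
bound = dropped_error_bound(vec, kept, critical)
print(kept, error <= bound)
```

Because the set of retained dimensions is chosen per vector rather than once for the whole dataset, two vectors can keep entirely different coordinates, which is what makes the reduction datawise.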
|