Effective asymmetric XML compression
Authors: Przemysław Skibiński, Szymon Grabowski, Jakub Swacha
Affiliations: 1. Institute of Computer Science, University of Wrocław, Joliot-Curie 15, 50-383 Wrocław, Poland; 2. Computer Engineering Department, Technical University of Łódź, Politechniki 11, 90-924 Łódź, Poland; 3. Institute of Information Technology in Management, Szczecin University, Mickiewicza 64, 71-101 Szczecin, Poland
Abstract: The innate verbosity of the extensible markup language (XML) remains one of its main weaknesses, especially for large documents. This problem can be addressed with dedicated XML compression algorithms. In this work, we describe the XML word-replacing transform (XML-WRT), a fast and fully reversible XML transform which, when combined with widely used LZ77-style compression algorithms, makes it possible to attain high compression ratios, comparable to those achieved by current state-of-the-art XML compressors. The resulting compression scheme is asymmetric in the sense that its decoder is much faster than its coder. This is a desirable practical property, as in many XML applications data are read much more often than written. The key features of the transform are dictionary-based encoding of both document structure and content, separation of different content types into multiple streams, and dedicated encoding of specific patterns, including numbers and dates. The test results show that the proposed transform improves the XML compression efficiency of general-purpose compressors by 35% on average for gzip and by 17% for LZMA. Compared with the current state-of-the-art SCMPPM algorithm, XML-WRT with LZMA attains an over 2% better compression ratio while being 55% faster. Copyright © 2007 John Wiley & Sons, Ltd.
Keywords: extensible markup language; XML compression; XML encoding; text transform
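The abstract names three ingredients of XML-WRT: a dictionary that replaces frequent words, separate streams for different content types, and dedicated encoding of numbers. The sketch below is a toy illustration of those ideas under stated assumptions, not the authors' implementation. It assumes a hypothetical tokenizer that splits text into ASCII letter runs, ASCII digit runs and everything else. Words seen at least `min_count` times are replaced by a marker byte plus a dictionary index, and numbers are moved into their own stream as variable-length integers. Each of the three streams is then compressed with zlib (DEFLATE, an LZ77-style method). The names `encode`, `decode`, `min_count`, the 18-digit cap and the marker bytes 0x01/0x02 are all hypothetical choices. The markers rely on XML 1.0 not permitting those control characters in documents.

```python
import re
import zlib
from collections import Counter

# Hypothetical tokenizer: ASCII letter runs, ASCII digit runs, everything else.
TOKEN_RE = re.compile(r"(?P<word>[A-Za-z]+)|(?P<num>[0-9]+)|(?P<other>[^A-Za-z0-9]+)")
# Hypothetical marker bytes; XML 1.0 does not allow these control characters.
WORD_MARK, NUM_MARK = 0x01, 0x02


def _put_varint(out: bytearray, n: int) -> None:
    """Append n as an unsigned LEB128 varint (7 bits per byte)."""
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return


def _get_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned LEB128 varint at pos; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode(xml: str, min_count: int = 2) -> tuple[bytes, bytes, bytes]:
    """Toy word-replacing transform: returns zlib-compressed (dictionary, main, numbers) streams."""
    if "\x01" in xml or "\x02" in xml:
        raise ValueError("input contains a reserved marker character")
    tokens = [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(xml)]
    # Semi-static dictionary: most frequent words first, so they get 1-byte indices.
    counts = Counter(tok for kind, tok in tokens if kind == "word" and len(tok) > 1)
    words = sorted((w for w, c in counts.items() if c >= min_count),
                   key=lambda w: (-counts[w], w))
    index = {w: i for i, w in enumerate(words)}
    main, nums = bytearray(), bytearray()
    for kind, tok in tokens:
        if kind == "word" and tok in index:
            main.append(WORD_MARK)            # dictionary reference stays inline
            _put_varint(main, index[tok])
        elif kind == "num" and len(tok) <= 18 and (tok == "0" or tok[0] != "0"):
            main.append(NUM_MARK)             # number goes to its own stream
            _put_varint(nums, int(tok))       # leading zeros would not round-trip, so excluded
        else:
            main += tok.encode("utf-8")       # everything else is copied literally
    dict_raw = "\n".join(words).encode("ascii")
    return zlib.compress(dict_raw, 9), zlib.compress(bytes(main), 9), zlib.compress(bytes(nums), 9)


def decode(streams: tuple[bytes, bytes, bytes]) -> str:
    """Invert encode(): expand dictionary references and reinsert numbers."""
    dict_raw, main, nums = (zlib.decompress(s) for s in streams)
    words = dict_raw.decode("ascii").split("\n") if dict_raw else []
    parts: list[str] = []
    literal = bytearray()
    pos = npos = 0
    while pos < len(main):
        byte = main[pos]
        if byte == WORD_MARK or byte == NUM_MARK:
            parts.append(literal.decode("utf-8"))  # flush pending literal text
            literal.clear()
            if byte == WORD_MARK:
                i, pos = _get_varint(main, pos + 1)
                parts.append(words[i])
            else:
                n, npos = _get_varint(nums, npos)
                parts.append(str(n))
                pos += 1
        else:
            literal.append(byte)
            pos += 1
    parts.append(literal.decode("utf-8"))
    return "".join(parts)


if __name__ == "__main__":
    doc = "<log>" + "".join(
        f"<entry><date>2007-05-{d:02d}</date><value>{d * 37}</value></entry>" for d in range(1, 29)
    ) + "</log>"
    packed = encode(doc)
    assert decode(packed) == doc
    print("plain zlib:", len(zlib.compress(doc.encode(), 9)), "transform + zlib:", sum(map(len, packed)))
```

In this toy, decoding is a single linear pass of table lookups, while encoding must first count word frequencies over the whole input. LZ77-style compressors such as zlib also typically decompress faster than they compress. This only loosely echoes the coder/decoder asymmetry described in the abstract. Real gains depend on the data, and a tiny dictionary-less input can come out larger than plain zlib.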