Information-theoretic method for classification of texts
Authors: B. Ya. Ryabko, A. E. Gus’kov, I. V. Selivanova
Affiliation: 1. Institute of Computational Technologies, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; 2. Novosibirsk State University, Novosibirsk, Russia; 3. Russian National Public Library for Science and Technology, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
Abstract: We consider a method for automatic (i.e., unmanned) text classification based on methods of universal source coding (or “data compression”). We show that under certain restrictions the proposed method is consistent, i.e., the classification error tends to zero with increasing text lengths. As an example of practical use of the method we consider the classification problem for scientific texts (research papers, books, etc.). The proposed method is experimentally shown to be highly efficient.
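The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind compression-based classification, not the authors' actual procedure. It assumes that each class is represented by one sample text and that a text belongs to the class whose sample lets a compressor encode it most cheaply. The function names (compressed_size, classify) and the use of zlib as a stand-in for a universal code are assumptions made for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (zlib is an assumed stand-in for a universal code)."""
    return len(zlib.compress(data, 9))


def classify(text: str, class_samples: dict[str, str]) -> str:
    """Return the label whose sample makes `text` cheapest to encode.

    For each class, measure how many extra compressed bytes are needed
    when `text` is appended to that class's sample text. A smaller
    increase suggests the text is statistically closer to the class.
    """
    query = text.encode("utf-8")

    def extra_cost(label: str) -> int:
        base = class_samples[label].encode("utf-8")
        return compressed_size(base + query) - compressed_size(base)

    return min(class_samples, key=extra_cost)
```

With real data the class samples would be long training corpora rather than single sentences, and zlib's 32 KB window limits how much of a long sample can influence the result, so a compressor with a larger context may be a better fit in practice.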
This article is indexed in SpringerLink and other databases.