基于日志分析的中文输入法用户行为研究
引用本文:许丹青,刘奕群,岑荣伟,马少平,茹立云,杨磊. 基于日志分析的中文输入法用户行为研究[J]. 中文信息学报, 2011, 25(2): 44-49
作者姓名:许丹青  刘奕群  岑荣伟  马少平  茹立云  杨磊
作者单位:智能技术与系统国家重点实验室,清华信息科学与技术国家实验室(筹),清华大学计算机系,北京 100084
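Funding:Supported by the National Natural Science Foundation of China and the Specialized Research Fund for the Doctoral Program of Higher Education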
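Abstract (translated from the Chinese):Unlike users of alphabetic scripts, Chinese users need input method software to convert Pinyin strings into Chinese character strings. The input method has therefore become a basic tool for human-computer interaction among Chinese users, and its underlying technology has long been a focus of both academia and industry. In input method research, users' behavioral characteristics are crucial to dictionary construction, algorithm design, interaction design, and performance evaluation, yet few studies exist because the data are hard to collect and analyze. Using more than 410 million user input records collected by a Chinese input method with user permission, this paper analyzes the behavior of Chinese input method users. It mines behavioral characteristics such as differences in input word frequency across categories of applications and differences in candidate selection among users within the same category of application. The results offer some guidance for understanding Chinese input method user behavior in depth and, in turn, for improving input method software.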

关 键 词:中文输入法  用户行为  日志分析  

Research on User Input Behavior Based on Log Analysis of a Chinese Input Method Editor
XU Danqing, LIU Yiqun, CEN Rongwei, MA Shaoping, RU Liyun, YANG Lei. Research on User Input Behavior Based on Log Analysis of a Chinese Input Method Editor[J]. Journal of Chinese Information Processing, 2011, 25(2): 44-49
Authors:XU Danqing  LIU Yiqun  CEN Rongwei  MA Shaoping  RU Liyun  YANG Lei
Affiliation:State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract:Unlike alphabetic languages, Chinese requires input software to transform Pinyin strings into characters. Input software therefore plays an important role in human-computer interaction for Chinese users. In research on Chinese input methods, it is important to examine user behavior in order to improve dictionary construction, algorithm design, interaction design, and performance evaluation. However, such work is scarce because the corresponding behavior data are difficult to collect. With the help of a company that develops a widely used Chinese input method editor, we collected input logs, with users' agreement, containing more than 410 million input strings. In analyzing these logs, we focused on the following behavioral features: the distribution of input string length, character/word/phrase selection in different kinds of application software, and the adoption of abbreviations. The conclusions help us better understand users' input behavior and suggest possible ways to improve input software design.
Keywords:Chinese input software   user behavior   log analysis  
This article is indexed in 万方数据 (Wanfang Data) and other databases.