Mining Named Entities in Query Logs Using a Random Walk Model
Cite this article: WU Dayong, LIU Ting. Mining Named Entities in Query Log Using Random Walk Model[J]. Intelligent Computer and Applications, 2012, 0(4): 22-26,30
Authors: WU Dayong, LIU Ting
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Funding: National Natural Science Foundation of China, General Program (61073129, 61073126); National 863 Program Major Project (2011AA01A207)
Abstract: This paper proposes a novel weakly supervised approach to mining named entities (NEs) from search engine query logs. A small number of manually selected NEs serve as seeds, and a random walk model is used to obtain a large number of NEs from the query log. Specifically, the context patterns of NEs in queries, the URLs clicked by users, and the candidate NEs extracted from the query log are used to construct a tripartite graph. A random walk on this graph assigns each candidate NE a probability of belonging to a given target NE category, so that the NEs of that category can be extracted from the query log. Experiments on a real-world query log covering 7 NE categories show that the approach achieves good NE mining performance in every category.

Keywords: named entity; query; random walk
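
The abstract does not give the exact transition weights or walk formulation, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the query log has already been reduced to hypothetical `(entity, pattern, url)` triples, links each candidate NE to its context pattern and clicked URL, and uses a random walk with restart from the seed NEs (a swapped-in, commonly used variant) to score the other candidates. All function names, the `restart` parameter, and the toy data are illustrative assumptions.

```python
from collections import defaultdict


def build_graph(records):
    """Build an undirected, weighted tripartite graph.

    records: iterable of (entity, pattern, url) triples (assumed, pre-extracted).
    Entity nodes link to pattern nodes and URL nodes; weights are co-occurrence counts.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for entity, pattern, url in records:
        e = ("entity", entity)
        for other in (("pattern", pattern), ("url", url)):
            graph[e][other] += 1.0
            graph[other][e] += 1.0
    return graph


def random_walk_with_restart(graph, seeds, restart=0.15, iterations=50):
    """Return the visiting probability of every reachable node.

    The walk starts uniformly on the seed entities and, at each step,
    jumps back to the seed distribution with probability `restart`.
    """
    seed_nodes = [("entity", s) for s in seeds if ("entity", s) in graph]
    if not seed_nodes:
        raise ValueError("no seed entity occurs in the graph")
    start = {node: 1.0 / len(seed_nodes) for node in seed_nodes}
    prob = dict(start)
    for _ in range(iterations):
        nxt = defaultdict(float)
        for node, mass in prob.items():
            total = sum(graph[node].values())
            for neighbor, weight in graph[node].items():
                nxt[neighbor] += (1.0 - restart) * mass * weight / total
        for node, mass in start.items():
            nxt[node] += restart * mass
        prob = nxt
    return prob


def rank_candidates(graph, seeds, **kwargs):
    """Rank non-seed entities by their random-walk probability, highest first."""
    prob = random_walk_with_restart(graph, seeds, **kwargs)
    scores = {
        node[1]: prob.get(node, 0.0)
        for node in graph
        if node[0] == "entity" and node[1] not in seeds
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (invented for illustration): "#" marks the entity slot in a query.
records = [
    ("harry potter", "# movie", "imdb.com"),
    ("harry potter", "# trailer", "youtube.com"),
    ("titanic", "# movie", "imdb.com"),
    ("inception", "# trailer", "imdb.com"),
    ("beijing", "# weather", "weather.com"),
    ("shanghai", "# weather", "weather.com"),
]
seeds = {"harry potter"}
graph = build_graph(records)
ranked = rank_candidates(graph, seeds)
for name, score in ranked:
    print(f"{name}\t{score:.4f}")
```

In this toy graph, candidates that share patterns or clicked URLs with the seed receive probability mass, while candidates in a disconnected part of the graph receive none. A real system would additionally need a threshold or top-k cutoff to decide which candidates are accepted for the target category.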
This article is indexed in CNKI and other databases.