Chinese search results cluster research based on improved STC (original title: 改进后缀树的中文检索结果聚类研究)
Cite this article: YUAN Jinsheng, RONG Yuanyuan. Chinese search results cluster research based on improved STC[J]. Computer Engineering and Applications, 2014, 50(21): 143-146.
Authors: YUAN Jinsheng  RONG Yuanyuan
Affiliation: College of Information, Beijing Forestry University, Beijing 100083, China
Abstract: Search result clustering helps users quickly locate the information they need. This paper focuses on clustering Chinese text while also generating high-quality labels. The method takes the page titles and snippets returned by a search engine, segments the text with a word segmentation tool, and removes stop words. It then builds a single suffix tree, inserting the text into the tree's nodes with the word as the unit, and computes a score for the words at each node from several constraints: word frequency, word length, part of speech, and position. Finally, it merges the base clusters and takes the highest-scoring node words as labels. Experimental results show that the resulting clusters have high purity and that the extracted labels are accurate and discriminative, which makes them convenient for users.

Keywords: search results clustering  suffix tree  cluster label  Chinese search  clustering
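
The pipeline in the abstract (word-level suffix tree, node scoring, base-cluster merging, label selection) can be illustrated with a minimal sketch. It is not the authors' implementation, and everything below is an assumption made for illustration: the input is taken to be already segmented and stop-word filtered, the tree is an uncompressed word-level suffix trie truncated at `max_len` words, the `score` function uses only document frequency and phrase length (the abstract does not give the formula, and the part-of-speech and position constraints are omitted), and the merge step uses a simple greedy pass with a 0.5 overlap threshold in the style of classic suffix tree clustering.

```python
class Node:
    def __init__(self):
        self.children = {}  # word -> Node
        self.docs = set()   # ids of documents that contain this phrase


def build_suffix_trie(docs, max_len=4):
    """Insert every word-level suffix (cut to max_len words) of every document."""
    root = Node()
    for doc_id, words in enumerate(docs):
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_len]:
                node = node.children.setdefault(word, Node())
                node.docs.add(doc_id)
    return root


def base_clusters(root, min_docs=2):
    """Map each phrase shared by at least min_docs documents to its document ids."""
    found = {}
    stack = [((), root)]
    while stack:
        phrase, node = stack.pop()
        if phrase and len(node.docs) >= min_docs:
            found[phrase] = set(node.docs)
        for word, child in node.children.items():
            stack.append((phrase + (word,), child))
    return found


def score(phrase, doc_ids):
    # Hypothetical score: document frequency times phrase length in words.
    return len(doc_ids) * len(phrase)


def merge_clusters(clusters, threshold=0.5):
    """Greedy merge: visit base clusters from highest to lowest score and fold
    each one into the first merged cluster it overlaps by more than
    `threshold` in both directions; otherwise start a new cluster.
    The highest-scoring phrase of each merged cluster serves as its label."""
    merged = []  # each entry is [label_phrase, doc_ids]
    ordered = sorted(clusters.items(), key=lambda kv: (-score(*kv), kv[0]))
    for phrase, ids in ordered:
        for entry in merged:
            common = len(ids & entry[1])
            if common > threshold * len(ids) and common > threshold * len(entry[1]):
                entry[1] |= ids
                break
        else:
            merged.append([phrase, set(ids)])
    return merged


if __name__ == "__main__":
    # Toy input: four snippets, already segmented into words.
    docs = [
        ["后缀树", "聚类", "算法"],
        ["后缀树", "聚类", "标签"],
        ["搜索引擎", "检索", "结果"],
        ["搜索引擎", "检索", "排序"],
    ]
    for label, ids in merge_clusters(base_clusters(build_suffix_trie(docs))):
        print(" ".join(label), sorted(ids))
```

On this toy input the sketch groups snippets 0 and 1 under the label "后缀树 聚类" and snippets 2 and 3 under "搜索引擎 检索". A fuller version would replace `score` with a weighting that also accounts for part of speech and position, as the abstract describes.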
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.