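Distributed massive molecule retrieval model based on consistent Hash
Cite this article:SUN Xia, YU Long, TIAN Shengwei, YAN Yilin, LIN Jiangli. Distributed massive molecule retrieval model based on consistent Hash[J]. Journal of Computer Applications, 2015, 35(4): 956-959. DOI: 10.11772/j.issn.1001-9081.2015.04.0956
Authors:SUN Xia  YU Long  TIAN Shengwei  YAN Yilin  LIN Jiangli
Affiliation:1. School of Software, Xinjiang University, Urumqi 830008; 2. Network Center, Xinjiang University, Urumqi 830046; 3. School of Computer Engineering, Jiangsu University of Technology, Changzhou, Jiangsu 213001; 4. School of Information Science and Engineering, Xinjiang University, Urumqi 830046; 5. School of Chemistry and Chemical Engineering, Xinjiang University, Urumqi 830046
Funding:Supported by the National Natural Science Foundation of China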
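Abstract:To address the inefficiency of traditional general graph matching retrieval and the inability to locate refractive index data quickly in big-data environments, a distributed massive molecule retrieval model based on consistent Hash was established. Drawing on the characteristics of molecules, the model discretizes the continuous refractive index with an equal-width algorithm to build a high-speed Hash index and implements a distributed massive molecule retrieval system. This effectively reduces the amount of molecular data involved in computation, and collisions are handled according to molecule access frequency, which improves retrieval efficiency. Experimental results show that, on a dataset of 200,000 molecules, the average retrieval time of this method is about 5% of that of general graph matching, and that the model performs stably and is highly scalable. The method is well suited to retrieving frequently accessed molecules by refractive index in massive-data environments.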

Keywords:molecular retrieval  discretization  consistent Hash  conflict settlement  distributed computation
Received:2014-11-04
Revised:2014-12-31
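
The abstract describes the refractive-index index only at a high level, so the following is a minimal Python sketch of one way it could work, not the authors' implementation. It assumes bins of width 0.01 starting at 1.0, chained buckets that are reordered by hit count, and a lookup tolerance of 1e-4. The class `RefractiveIndexTable`, its methods, and the molecule IDs and values in the usage lines are hypothetical and introduced purely for illustration.

```python
import math
from collections import defaultdict


class RefractiveIndexTable:
    """Hash index keyed by equal-width bins of the refractive index (illustrative)."""

    def __init__(self, low=1.0, width=0.01):
        self.low = low        # assumed lower bound of the indexed range
        self.width = width    # assumed bin width
        # bin number -> chain of [hits, molecule_id, refractive_index]
        self.buckets = defaultdict(list)

    def bin_of(self, n):
        """Equal-width discretization: map a continuous value to a bin number."""
        # The small epsilon guards against floating-point error at a bin edge.
        return math.floor((n - self.low) / self.width + 1e-9)

    def add(self, molecule_id, n):
        """Append a molecule to the chain of its bin with a hit count of zero."""
        self.buckets[self.bin_of(n)].append([0, molecule_id, n])

    def lookup(self, n, tol=1e-4):
        """Return the first molecule in the bin whose value is within tol of n."""
        chain = self.buckets.get(self.bin_of(n), [])
        for entry in chain:
            if abs(entry[2] - n) <= tol:
                entry[0] += 1
                # Stable sort: entries with more hits move to the front of the chain.
                chain.sort(key=lambda e: -e[0])
                return entry[1]
        return None


if __name__ == "__main__":
    table = RefractiveIndexTable()
    table.add("mol-A", 1.3330)   # made-up IDs and values
    table.add("mol-B", 1.3345)
    table.add("mol-C", 1.5011)
    print(table.lookup(1.3345))
```

Because a query scans only the chain of its own bin, the number of candidates drops from the whole dataset to one bucket, and frequently requested molecules are reached in the first few comparisons. A fuller version would also check the neighbouring bin when a query lies within the tolerance of a bin edge.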

Distributed massive molecule retrieval model based on consistent Hash
SUN Xia,YU Long,TIAN Shengwei,YAN Yilin,LIN Jiangli. Distributed massive molecule retrieval model based on consistent Hash[J]. Journal of Computer Applications, 2015, 35(4): 956-959. DOI: 10.11772/j.issn.1001-9081.2015.04.0956
Authors:SUN Xia  YU Long  TIAN Shengwei  YAN Yilin  LIN Jiangli
Affiliation:1. School of Software, Xinjiang University, Urumqi Xinjiang 830008, China;
2. Network Center, Xinjiang University, Urumqi Xinjiang 830046, China;
3. School of Computer Engineering, Jiangsu University of Technology, Changzhou Jiangsu 213001, China;
4. School of Information Science and Engineering, Xinjiang University, Urumqi Xinjiang 830046, China;
5. School of Chemistry and Chemical Engineering, Xinjiang University, Urumqi Xinjiang 830046, China
Abstract:To address the inefficiency of traditional general graph matching search and the inability to locate refractive index data quickly in big-data environments, a distributed massive molecular retrieval model based on a consistent Hash function was established. Taking the characteristics of molecular storage structures into account, the continuous refractive index was discretized with an equal-width (fixed-width) algorithm to build a high-speed Hash index, and a distributed massive retrieval system was implemented. This effectively reduced the number of molecules involved in each search, and Hash collisions were handled according to access frequency to improve retrieval efficiency. Experimental results show that, on chemical data containing 200,000 molecular structures, the average retrieval time of this method is about 5% of that of traditional general graph matching search. The model also performs stably and is highly scalable. It is suitable for retrieving frequently accessed molecules by refractive index in massive-data environments.
Keywords:molecular retrieval  discretization  consistent Hash  conflict settlement  distributed computation
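
The record does not say how the Hash index is spread across machines, so the following is a minimal Python sketch of a generic consistent-hashing ring rather than the authors' design. It assumes MD5-based ring positions, 100 virtual nodes per server, and integer bin numbers as keys. `HashRing`, `node_for`, and the `node-1` style names are hypothetical.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Place a string on the ring using the first 8 bytes of its MD5 digest."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring with virtual nodes (illustrative)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # assumed number of virtual nodes per server
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> server name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Insert the virtual nodes of one server into the ring."""
        for i in range(self.replicas):
            p = _point(f"{node}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = node

    def remove_node(self, node):
        """Drop every virtual node that belongs to the given server."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Return the server owning the first ring position after the key."""
        if not self._points:
            raise LookupError("ring has no nodes")
        i = bisect.bisect_right(self._points, _point(str(key)))
        # Wrap around to the first position when the key hashes past the last one.
        return self._owner[self._points[i % len(self._points)]]


if __name__ == "__main__":
    ring = HashRing(["node-1", "node-2", "node-3"])
    print(ring.node_for(33))   # server responsible for bin 33
```

When a server joins such a ring, only the keys that now fall before one of its virtual nodes change owner; all other bins stay where they were. This property is what makes consistent hashing a common choice for scalable distributed storage.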
This article is indexed in Wanfang Data and other databases.