File system caching algorithm based on frequent sequence mining

Cite this article:	Du Kexing, Zhang Xiaofang, Zhang Xiao, Zhao Xiaonan. File system caching algorithm based on frequent sequence mining[J]. Application Research of Computers, 2022, 39(3): 831-835.

Authors:	Du Kexing, Zhang Xiaofang, Zhang Xiao, Zhao Xiaonan

Affiliation:	College of Software, Northwestern Polytechnical University, Xi'an 710072, China; College of Computer, Northwestern Polytechnical University, Xi'an 710072, China

Funding:	National Key Research and Development Program of China (2018YFB1004401)

Abstract:	Traditional cache algorithms suffer from low hit rates and high exchange rates, and existing cache algorithms are poorly suited to distributed big-data storage systems. To address this, the paper proposes an adaptive caching strategy based on frequent sequence mining. The method uses a data mining algorithm to mine the frequent sequences within a historical access window, then fuzzily merges those sequences into a set of matching patterns that can be queried. When a new access arrives, the subsequence within a fixed access length is matched against the pattern set, data are prefetched according to the match result, and a modified S4LRU (4-segmented least recently used) data structure handles the eviction of cached data. Simulation experiments on a public big-data processing trace set show that, across different cache sizes and compared with existing typical cache algorithms, the proposed algorithm raises the average hit rate by 0.327 times and lowers the average exchange rate by 0.33 times, while offering low overhead and high timeliness. These results indicate that the method is a more effective caching strategy than traditional replacement algorithms.

Keywords:	caching algorithm; frequent sequence mining; distributed file system; optimization
Received:	2021/8/1
Revised:	2022/2/16
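
The workflow described in the abstract (mine a history window, match recent accesses against the patterns, prefetch, evict with a segmented LRU) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the abstract does not name the mining algorithm, so `mine_patterns` simply counts contiguous subsequences; grouping patterns that share a prefix is only a crude stand-in for the paper's fuzzy merging; and `S4LRU` follows the commonly described unmodified scheme (hits promote one level, overflow cascades downward), not the paper's modified variant. All function, class, and parameter names are hypothetical.

```python
from collections import Counter, OrderedDict


def mine_patterns(history, length=3, min_support=2):
    """Keep contiguous subsequences of `length` seen at least `min_support`
    times in the history window, indexed as prefix -> set of next items.
    (Assumed stand-in for the paper's mining and fuzzy-merge steps.)"""
    counts = Counter(
        tuple(history[i:i + length]) for i in range(len(history) - length + 1)
    )
    patterns = {}
    for seq, n in counts.items():
        if n >= min_support:
            patterns.setdefault(seq[:-1], set()).add(seq[-1])
    return patterns


class S4LRU:
    """Plain 4-segment LRU (assumed variant, not the paper's modified one)."""

    def __init__(self, capacity_per_level, levels=4):
        self.cap = capacity_per_level
        self.levels = [OrderedDict() for _ in range(levels)]

    def _find(self, key):
        for i, seg in enumerate(self.levels):
            if key in seg:
                return i
        return None

    def _insert(self, key, level):
        # Place key at the MRU end of `level`; overflow cascades downward,
        # and whatever falls out of level 0 is evicted.
        self.levels[level][key] = True
        while level >= 0 and len(self.levels[level]) > self.cap:
            victim, _ = self.levels[level].popitem(last=False)  # LRU end
            level -= 1
            if level >= 0:
                self.levels[level][victim] = True

    def access(self, key):
        """Return True on a hit. Hits move up one level; misses enter level 0."""
        lvl = self._find(key)
        if lvl is None:
            self._insert(key, 0)
            return False
        del self.levels[lvl][key]
        self._insert(key, min(lvl + 1, len(self.levels) - 1))
        return True

    def prefetch(self, key):
        # Prefetched items enter at the lowest level if not already cached.
        if self._find(key) is None:
            self._insert(key, 0)


def simulate(trace, history, cache, length=3, min_support=2):
    """Replay `trace`, prefetching whenever the last `length - 1` accesses
    match a mined prefix. Returns the hit ratio. Assumes length >= 2."""
    patterns = mine_patterns(history, length, min_support)
    hits, recent = 0, []
    for key in trace:
        hits += cache.access(key)
        recent = (recent + [key])[-(length - 1):]
        for nxt in patterns.get(tuple(recent), ()):
            cache.prefetch(nxt)
    return hits / len(trace)
```

In this sketch the patterns are mined once from a fixed history; an adaptive variant would re-mine as the window slides, which the abstract implies but does not detail.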
This article is indexed in VIP, Wanfang Data, and other databases.