首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
方才华  刘景宁  童薇  高阳  雷霞  蒋瑜 《计算机应用》2017,37(5):1257-1262
由于NAND闪存的固有限制,写前擦除和擦除粒度较大,基于NAND Flash的固态硬盘(SSD)需要执行垃圾回收以重用失效页。然而垃圾回收带来的高开销会显著降低SSD的性能,也会直接影响SSD的寿命。特别是对于频繁使用的有数据碎片的SSD,垃圾回收带来的性能下降问题将更为严重,现有的垃圾回收(GC)算法各自侧重垃圾回收操作的某个步骤,并没有给出全面考虑各步骤对整体影响的综合方案。针对该问题,在详细剖析垃圾回收过程的基础上,提出了一种全程优化的垃圾回收方法WPO-GC,在数据初始放置、垃圾回收目标块的选择、有效数据的迁移、触发回收的时间点以及中断处理方式上,尽可能全面地考虑各步骤对SSD正常读写请求和寿命的影响。通过开源模拟器SSDsim上的WPO-GC的有效性验证表明,同典型GC算法相比,WPO-GC可以减少SSD读请求延迟20%~40%和写请求延迟17%~40%,均衡磨损近30%。  相似文献   

2.
An efficient peer-to-peer indexing tree structure for multidimensional data   总被引:4,自引:1,他引:3  
As one of the most important technologies for implementing large-scale distributed systems, peer-to-peer (P2P) computing has attracted much attention in both research and industrial communities, for its advantages such as high availability, high performance, and high flexibility to the dynamics of networks. However, multidimensional data indexing remains as a big challenge to P2P computing, because of the inefficiency in search and network maintenance caused by the complicated existing index structures, which greatly limits the scalability of applications and dimensionality of the data to be indexed.We propose SDI (Swift tree structure for multidimensional Data Indexing), a swift index scheme with a simple tree structure for multidimensional data indexing in large-scale distributed systems. While keeping the query efficiency in O(logN) in terms of routing hops, SDI has extremely low maintenance costs which is proved through theoretical analysis. Furthermore, SDI overcomes the root-bottleneck problem existing in most other tree-based distributed indexing systems. Extensive empirical study verifies the superiority of SDI in both query and maintenance performance.  相似文献   

3.
The selection of the right I/O scheduler for a given workload can significantly improve the I/O performance. However, this is not an easy task because several factors should be considered, and even the “best” scheduler can change over the time, specially if the workload’s characteristics change too. To address this problem, we present a Dynamic and Automatic Disk Scheduling framework (DADS) that simultaneously compares two different Linux I/O schedulers, and dynamically selects that which achieves the best I/O performance for any workload at any time. The comparison is made by running two instances of a disk simulator inside the Linux kernel. Results show that, by using DADS, the performance achieved is always close to that obtained by the best scheduler. Thus, system administrators are exempted from selecting a suboptimal scheduler which can provide a good performance for some workloads, but may downgrade the system throughput when the workloads change.  相似文献   

4.
5.
Hierarchical semistructured data arise frequently in the Web, or in biological information processing applications. Semistructured objects describing the same type of information have similar but not identical structure. Usually they share some common ‘schema’. Finding the common schema of a collection of semistructured objects is a very important task and due to the huge amount of such data encountered, data mining techniques have been employed.In this paper, we study the problem of discovering frequently occurring structures in semistructured objects using the notion of association rules. We identify that discovering the frequent structures in the early phases of the mining procedure is the dominant cost and we provide a fast algorithm addressing this issue. We present experimental results, which demonstrate the superiority of the proposed algorithm and also its efficiency in reducing dramatically the processing cost.  相似文献   

6.
Today’s data storage systems are increasingly adopting flash-based solid-state drives (SSD), in which new data is written out-of-place. The space occupied by invalidated data is reclaimed by means of a garbage-collection process. This involves additional write operations that result in write amplification, which negatively affects the performance, endurance, and lifetime of SSDs. Several garbage-collection schemes have been proposed in the literature and corresponding models have been developed for assessing their efficiency. We demonstrate that some of these publications arrive at conflicting results. We establish that the discrepancies identified are due to pitfalls in the modeling and analysis of some of the basic garbage-collection schemes. We effectively resolve these discrepancies by rectifying the pitfalls and developing proper analytical models that yield accurate results. We obtain new results for the circular scheme that are subsequently used to develop a new accurate model for the windowed greedy garbage-collection scheme. Results of theoretical and practical importance are analytically derived and experimentally confirmed. They demonstrate that, as the window size decreases, the write amplification increases, illustrating the tradeoff between computation time and write amplification. The results also show that, under certain conditions, the write amplification of the simple circular scheme can be similar to that of the optimum, albeit more complex greedy garbage-collection scheme. In this case, the write amplification for the windowed greedy scheme is essentially found to be independent of the window size.  相似文献   

7.
The emergence of non-volatile memory (NVM) has introduced new opportunities for performance optimizations in existing storage systems. To better utilize its byte-addressability and near-DRAM performance, NVM can be attached on the memory bus and accessed via load/store memory instructions rather than the conventional block interface. In this scenario, a cache line (usually 64 bytes) becomes the data transfer unit between volatile and non-volatile devices. However, the failureatomicity of write on NVM is the memory bit width (usually 8 bytes). This mismatch between the data transfer unit and the atomicity unit may introduce write amplification and compromise data consistency of node-based data structures such as B+-trees. In this paper, we propose WOBTree, a Write-Optimized B+-Tree for NVM to address the mismatch problem without expensive logging. WOBTree minimizes the update granularity from a tree node to a much smaller subnode and carefully arranges the write operations in it to ensure crash consistency and reduce write amplification. Experimental results show that compared with previous persistent B+-tree solutions, WOBTree reduces the write amplification by up to 86× and improves write performance by up to 61× while maintaining similar search performance.  相似文献   

8.
We will show that the hierarchical linear subspace method is a tree that divides the distances between the subspaces. By doing so, the data is divided into disjoint entities. The asymptotic upper bound estimation of the maximum applicable number of subspaces is logarithmically constrained by the number of represented elements and their dimension. The search in such a tree starts at the subspace with the lowest dimension. In this subspace, the set of all possible similar images is determined. In the next subspace, additional metric information corresponding to a higher dimension is used to reduce this set. The distances between the subspaces correspond to the values represented by the difference between the mean distance of all the points in one space and a corresponding mean distance of the objects in a subspace. The theoretical estimation of temporal complexity of the algorithmic is logarithmic. The costs are equivalent to the search costs in an tree plus the additional costs of the dimension of the data space.  相似文献   

9.
Modern solid-state drives (SSDs) are integrating more internal resources to achieve higher capacity. Parallelizing accesses across internal resources can potentially enhance the performance of SSDs. However, exploiting parallelism inside SSDs is challenging owing to real-time access conflicts. In this paper, we propose a highly parallelizable I/O scheduler (PIOS) to improve internal resource utilization in SSDs from the perspective of I/O scheduling. Specifically, we first pinpoint the conflicting flash requests with precision during the address translation in the Flash Translation Layer (FTL). Then, we introduce conflict eliminated requests (CERs) to reorganize the I/O requests in the device-level queue by dispatching conflicting flash requests to different CERs. Owing to the significant performance discrepancy between flash read and write operations, PIOS employs differentiated scheduling schemes for read and write CER queues to always allocate internal resources to the conflicting CERs that are more valuable. The small dominant size prioritized scheduling policy for the write queue significantly decreases the average write latency. The high parallelism density prioritized scheduling policy for the read queue better utilizes resources by exploiting internal parallelism aggressively. Our evaluation results show that the parallelizable I/O scheduler (PIOS) can accomplish better SSD performance than existing I/O schedulers implemented in both SSD devices and operating systems.  相似文献   

10.
This article addresses the problem of indexing and retrieving first-order predicate calculus terms in the context of automated deduction programs. The four retrieval operations of concern are to find variants, generalizations, instances, and terms that unify with a given term. Discrimination-tree indexing is reviewed, and several variations are presented. The path-indexing method is also reviewed. Experiments were conducted on large sets of terms to determine how the properties of the terms affect the performance of the two indexing methods. Results of the experiments are presented.This was supported by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.  相似文献   

11.
Range and k-nearest neighbor searching are core problems in pattern recognition. Given a database S of objects in a metric space M and a query object q in M, in a range searching problem the goal is to find the objects of S within some threshold distance to g, whereas in a k-nearest neighbor searching problem, the k elements of S closest to q must be produced. These problems can obviously be solved with a linear number of distance calculations, by comparing the query object against every object in the database. However, the goal is to solve such problems much faster. We combine and extend ideas from the M-tree, the multivantage point structure, and the FQ-tree to create a new structure in the "bisector tree" class, called the Antipole tree. Bisection is based on the proximity to an "Antipole" pair of elements generated by a suitable linear randomized tournament. The final winners a, b of such a tournament is far enough apart to approximate the diameter of the splitting set. If dist(a, b) is larger than the chosen cluster diameter threshold, then the cluster is split. The proposed data structure is an indexing scheme suitable for (exact and approximate) best match searching on generic metric spaces. The Antipole tree outperforms by a factor of approximately two existing structures such as list of clusters, M-trees, and others and, in many cases, it achieves better clustering properties.  相似文献   

12.
The K-Nearest Neighbor (K-NN) search problem is the way to find the K closest and most similar objects to a given query. The K-NN is essential for many applications such as information retrieval and visualization, machine learning and data mining. The exponential growth of data imposes to find approximate approaches to this problem. Permutation-based indexing is one of the most recent techniques for approximate similarity search. Objects are represented by permutation lists ordering their distances to a set of selected reference objects, following the idea that two neighboring objects have the same surrounding. In this paper, we propose a novel quantized representation of permutation lists with its related data structure for effective retrieval on single and multicore architectures. Our novel permutation-based indexing strategy is built to be fast, memory efficient and scalable. This is experimentally demonstrated in comparison to existing proposals using several large-scale datasets of millions of documents and of different dimensions.  相似文献   

13.
Adaptive indexing initializes and optimizes indexes incrementally, as a side effect of query processing. The goal is to achieve the benefits of indexes while hiding or minimizing the costs of index creation. However, index-optimizing side effects seem to turn read-only queries into update transactions that might, for example, create lock contention. This paper studies concurrency control and recovery in the context of adaptive indexing. We show that the design and implementation of adaptive indexing rigorously separates index structures from index contents; this relaxes constraints and requirements during adaptive indexing compared to those of traditional index updates. Our design adapts to the fact that an adaptive index is refined continuously and exploits any concurrency opportunities in a dynamic way. A detailed experimental analysis demonstrates that (a) adaptive indexing maintains its adaptive properties even when running concurrent queries, (b) adaptive indexing can exploit the opportunity for parallelism due to concurrent queries, (c) the number of concurrency conflicts and any concurrency administration overheads follow an adaptive behavior, decreasing as the workload evolves and adapting to the workload needs.  相似文献   

14.
The increasing performance and wider spread use of automated semantic annotation and entity linking platforms has empowered the possibility of using semantic information in information retrieval. While keyword-based information retrieval techniques have shown impressive performance, the addition of semantic information can increase retrieval performance by allowing for more accurate sense disambiguation, intent determination, and instance identification, just to name a few. Researchers have already delved into the possibility of integrating semantic information into practical search engines using a combination of techniques such as using graph databases, hybrid indices and adapted inverted indices, among others. One of the challenges with the efficient design of a search engine capable of considering semantic information is that it would need to be able to index information beyond the traditional information stored in inverted indices, including entity mentions and type relationships. The objective of our work in this paper is to investigate various ways in which different data structure types can be adopted to integrate three types of information including keywords, entities and types. We will systematically compare the performance of the different data structures for scenarios where (i) the same data structure types are adopted for the three types of information, and (ii) different data structure types are integrated for storing and retrieving the three different information types. We report our findings in terms of the performance of various query processing tasks such as Boolean and ranked intersection for the different indices and discuss which index type would be appropriate under different conditions for semantic search.  相似文献   

15.
Google and other products have revolutionized the way we search for information. There are, however, still a number of research challenges. One challenge that arises specifically in desktop search is to exploit the structure and semantics of documents, as defined by the application program that generated the data (e.g., Word, Excel, or Outlook). The current generation of search products does not understand these structures and therefore often returns wrong results. This paper shows how today’s search technology can be extended in order to take the specific semantics of certain structures into account. The key idea is to extend inverted file index structures with predicates which encode the circumstances under which certain keywords of a document become visible to a user. This paper provides a framework that allows to express the semantics of structures in documents and algorithms to construct enhanced, predicate-based indexes. Furthermore, this paper shows how keyword and phrase queries can be processed efficiently on such enhanced indexes. It is shown that the proposed approach has superior retrieval performance with regard to both recall and precision and has tolerable space and query running time overheads.  相似文献   

16.
Recent papers have indicated that indexing is a promising approach to fast model-based object recognition because it allows most of the possible matches between sets of image features and sets of model features to be quickly eliminated from consideration. This correspondence describes a system that is capable of indexing using sets of three points undergoing 3D transformations and projection by taking advantage of the probabilistic peaking effect. To be able to index using sets of three points, we must allow false negatives. These are overcome by ensuring that we examine several correct hypotheses. The use of these techniques to speed up the alignment method is described. Results are given on real and synthetic data  相似文献   

17.
18.
America's Space Exploration program is characterized by cutting edge technology in support of NASA's goals and research objectives. One principal component of the Space Shuttle System is the Orbiter Vehicle which undergoes necessary repairs and refurbishment at the Kennedy Space Center following each mission. The Orbiter's Thermal Protection System (TPS) is a mission critical component which protects the orbiter from the heat of re-entry from space. The University of Central Florida (UCF) research team was tasked with evaluating TPS processing requirements to reduce overall cycle times between operational mission. A job/task performance analysis was conducted to identify candidate TPS processes for evaluation and possible improvements. The research team developed a computer generated Improvement Potential Index (IPI) following a systematic data collection process. The IPI was then used to identify the top twenty processes which TPS engineers, quality assurance personnel and technicians identified as tasks which would most likely benefit from process improvement. The specified tasks were then analyzed for methods optimization. This paper documents the results of this effort to date, with emphasis on the Improvement Potential Index.  相似文献   

19.
引言随着计算机科技的发展,无纸办公日益成为各单位日常办公的主要形式。而随着USB存储设备日益广泛的使用,数据泄漏的危害也越来越严重。因此在单位内部对USB存储设备的操作权限进行控制是很有必要的。本设计可将不同的USB存储设备(包括安全存储设  相似文献   

20.
单片机实现对CF卡的读写   总被引:5,自引:0,他引:5  
CF 卡是一种包含了控制器和大容量 Flash 存储器的标准器件,具有容量大、体积小、高性能、携带方便等优点,已广泛应用在数据采集系统和许多消费类电子产品中。本文详细介绍 C F 卡在单片机系统中的硬件接口电路,以及单片机对 CF 卡进行标准文件读写的实现,且写入的文件能被 Windows 操作系统读写。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号