Similar Articles
20 similar articles found.
1.
Database machines are special-purpose backend architectures designed to efficiently support database management system operations. An important problem in the development of database machines has been that of increasing their performance. Earlier research on the performance evaluation of database machines indicated that I/O operations constitute a principal performance bottleneck. This is increasingly the case with advances in multiprocessing and growth in the volume of data handled by a database machine. One possible strategy for improving the performance of a system that handles huge volumes of data is to store the data in compressed form. This can be achieved by introducing VLSI chips for data compression so that data can be compressed and decompressed “on-the-fly”. A set of hardware algorithms for data compression based on the Huffman coding scheme proposed in an earlier work is described. The main focus of this paper is the investigation conducted by the authors into the effect of incorporating such hardware in a special-purpose backend relational database machine. Detailed analytical models of a relational database machine, and analytical results that quantify the performance improvement due to compression hardware, are presented.
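The Huffman scheme underlying those hardware algorithms is easy to illustrate in software. Below is a minimal sketch in plain Python (not the paper's VLSI design; function names are illustrative): build a prefix-free code table from symbol frequencies, then concatenate the per-symbol codes.

```python
import heapq
from collections import Counter

def build_huffman_codes(data: bytes) -> dict:
    """Build a Huffman code table (symbol -> bit string) for `data`."""
    freq = Counter(data)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, symbol-or-subtree),
    # so payloads are never compared directly.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: record the code
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def compress(data: bytes) -> str:
    """Return the concatenated code bits for `data` (as a '0'/'1' string)."""
    codes = build_huffman_codes(data)
    return "".join(codes[b] for b in data)
```

Frequent symbols get short codes: for `b"aaaabbc"` the symbol `a` (frequency 4) receives a 1-bit code while `b` and `c` receive 2-bit codes, so the 7 input symbols compress to 10 bits.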

2.
Scientific and statistical database systems heavily depend on data compression techniques to make possible the management and storage of their large databases. The efficiency of data compression methods has a significant impact on the overall performance of these systems. The purpose of this paper is to show the importance of data compression to scientific/statistical databases, to discuss the pros and cons of data compression, and to survey data compression techniques relevant to scientific/statistical databases. The emphasis is on the basic idea, motivation, and tradeoffs of each approach. Both software and hardware methods are covered. The paper concludes with a discussion of several points of research that seem worthy of further investigation.

3.
张佳辰, 刘晓光, 王刚. Journal of Computer Applications, 2018, 38(5): 1404-1409
In recent years, data volumes have grown rapidly across industries, and so has the demand for performance optimization of the database systems that store them. Because relational databases are I/O-intensive while server CPUs are relatively idle, data compression can be introduced into the database to save storage space and I/O transfer bandwidth. However, the compression schemes in today's mainstream database systems were designed for traditional storage and execution environments, without considering the performance impact of new storage devices such as Solid-State Drives (SSDs) or of virtualized environments such as cloud databases. Taking cache optimization of database compression systems under different storage environments as its starting point, this paper analyzes the impact on overall system performance, presents an analytical model of database compression system performance, carries out a concrete analysis using MySQL as an example, and derives corresponding cache optimization measures. Performance evaluation on a Kernel-based Virtual Machine (KVM) and MySQL test platform shows that the proposed optimizations improve system performance by up to more than 40%, and under some configurations even outperform the physical machine.

4.
Visual (image and video) database systems require efficient indexing to enable fast access to the images in a database. In addition, the large memory capacity and channel bandwidth requirements for the storage and transmission of visual data necessitate the use of compression techniques. We note that image/video indexing and compression are typically pursued independently. This reduces the storage efficiency and may degrade the system performance. In this paper, we present novel algorithms based on vector quantization (VQ) for indexing of compressed images and video. To start with, the images are compressed using VQ. In the first technique, for each codeword in the codebook, a histogram is generated and stored along with the codeword. We note that the superposition of the histograms of the codewords, which are used to represent an image, is a close approximation of the histogram of the image. This histogram is used as an index to store and retrieve the image. In the second technique, the histogram of the labels of an image is used as an index to access the image. We also propose an algorithm for indexing compressed video sequences. Here, each frame is encoded in the intraframe mode using VQ. The labels are used for the segmentation of a video sequence into shots, and for indexing the representative frame of each shot. The proposed techniques not only provide fast access to stored visual data, but also combine compression and indexing. The average retrieval rates are 95% and 94% at compression ratios of 16:1 and 64:1, respectively. The corresponding cut detection rates are 97% and 90%, respectively.
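The first technique (histogram superposition) can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: each codeword is a flattened pixel block, we precompute one intensity histogram per codeword, and an image's index is the label-count-weighted sum of those histograms; retrieval then ranks stored indexes by L1 distance.

```python
def codeword_histograms(codebook, levels=2):
    """Per-codeword intensity histogram (codeword = flattened pixel block,
    pixel values in [0, 256))."""
    hists = []
    for cw in codebook:
        h = [0] * levels
        for pixel in cw:
            h[min(pixel * levels // 256, levels - 1)] += 1
        hists.append(h)
    return hists

def image_index(labels, cw_hists):
    """Superpose the histograms of the codewords used to encode the image:
    a close approximation of the image's own histogram."""
    levels = len(cw_hists[0])
    index = [0] * levels
    for lab in labels:
        for i in range(levels):
            index[i] += cw_hists[lab][i]
    return index

def retrieve(query_index, stored_indexes):
    """Rank stored images by L1 distance between histogram indexes."""
    return sorted(range(len(stored_indexes)),
                  key=lambda j: sum(abs(a - b) for a, b in
                                    zip(query_index, stored_indexes[j])))
```

Because the index is built from VQ labels alone, no decompression of the stored image is needed at query time, which is the point of combining compression and indexing.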

5.
Realtime applications of any microprocessor necessitate interfacing to a large variety of peripheral devices. Various interfacing techniques are discussed. Examples are given in which Intel's 8085 is taken as the typical microprocessor. The I/O transfers considered fall into two categories: memory-mapped transfers and I/O-mapped transfers. Both synchronous and asynchronous types are dealt with. Bit masking and interrupt techniques were used for asynchronous memory-mapped I/O transfer. Also included are multiplexed channel transfers and interrupt transfers. The former are treated as a special class of I/O transfer. The latter are useful in applications where it cannot be predicted when data will arrive for transfer to the microprocessor. Unlike other types of transfer, interrupt transfers are initiated by the I/O devices and not by the microprocessor. They are subdivided into software- and hardware-polled transfers. Examples are given of daisychain and search ring transfers.

6.
The increasing availability of online databases and other information resources in digital libraries and on the World Wide Web has created the need for efficient and effective algorithms for selecting databases to search. A number of techniques have been proposed for query routing or database selection. We have developed a methodology and metrics that can be used to directly compare competing techniques. They can also be used to isolate factors that influence the performance of these techniques so that we can better understand performance issues. In this paper we describe the methodology we have used to examine the performance of database selection algorithms such as gGlOSS and CORI. In addition we develop the theory behind a “random” database selection algorithm and show how it can be used to help analyze the behavior of realistic database selection algorithms. This revised version was published online in August 2006 with corrections to the Cover Date.
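The idea of a "random" selection baseline can be made concrete with a small simulation. The sketch below is hypothetical (it is not the paper's methodology or metrics): it estimates the recall achieved by picking k of n databases uniformly at random, given how many relevant documents each database holds. Realistic algorithms such as gGlOSS or CORI should beat this number.

```python
import random

def random_selection_recall(relevant_per_db, k, trials=2000, seed=0):
    """Monte-Carlo estimate of the recall of the 'random' baseline:
    pick k of n databases uniformly at random and count the fraction
    of all relevant documents that the picked databases hold."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    total = sum(relevant_per_db)
    n = len(relevant_per_db)
    acc = 0.0
    for _ in range(trials):
        picked = rng.sample(range(n), k)
        acc += sum(relevant_per_db[i] for i in picked) / total
    return acc / trials
```

By linearity of expectation the exact value is simply k/n regardless of how the relevant documents are distributed, which is what makes the random baseline analytically convenient.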

7.
Multidimensional aggregation is a dominant operation in data warehouses for on-line analytical processing (OLAP). Many efficient algorithms for computing multidimensional aggregations on relational data warehouses have been developed. To our knowledge, however, nothing to date in the literature addresses aggregation algorithms for multidimensional data warehouses that store datasets in multidimensional arrays rather than in tables. This paper presents a set of multidimensional aggregation algorithms for very large and compressed multidimensional data warehouses. These algorithms operate directly on compressed datasets without the need to first decompress them, and are applicable to a variety of data compression methods. The algorithms have different performance behavior as a function of dataset parameters, output sizes, and main-memory availability. The algorithms are described and analyzed with respect to their I/O and CPU costs, and a decision procedure for selecting the most efficient algorithm for a given aggregation request is also proposed. The analytical and experimental results show that the algorithms are more efficient than traditional aggregation algorithms.
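A toy analogue of aggregating without decompression: SUM over a run-length-encoded measure array. The aggregate is computed from the (value, run-length) pairs directly, one multiply-add per run instead of one add per cell. This is only an illustrative one-dimensional sketch; the paper's algorithms target multidimensional arrays and a variety of compression methods.

```python
def rle_encode(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_sum(runs):
    """SUM computed directly on the compressed form: no decompression,
    one multiply-add per run."""
    return sum(v * n for v, n in runs)
```

The benefit grows with the compression ratio: a column with long constant runs is aggregated in time proportional to the number of runs, not the number of cells.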

8.
The introduction of software defined networking (SDN) has created an opportunity for file access services to get a view of the underlying network and to further optimize large data transfers. This opportunity is still unexplored while the amount of data that needs to be transferred is growing. Data transfers are also becoming more frequent as a result of interdisciplinary collaborations and the nature of research infrastructures. To address the needs for larger and more frequent data transfers, we propose an approach which enables file access services to use SDN. We extend the file access services developed in our earlier work by including network resources in the provisioning for large data transfers. A novel SDN-aware file transfer mechanism is prototyped for improving the performance and reliability of large data transfers on research infrastructure equipped with programmable network switches. Our results show that I/O and data-intensive scientific workflows benefit from SDN-aware file access services.

9.
CUBE Algorithms on Very Large Compressed Data Warehouses
高宏, 李建中. Journal of Software, 2001, 12(6): 830-839
Data compression is an important way to improve the performance of multidimensional data warehouses. On-line analytical processing is the main application on data warehouses, and the Cube operation is one of its most common operations. Developing Cube algorithms for compressed multidimensional data warehouses is an important and challenging task facing the database community. In recent years much work has been done on Cube algorithms, yet little of it concerns multidimensional data warehouses or compressed multidimensional data warehouses; to date, only one paper has proposed a Cube algorithm for compressed multidimensional data warehouses. Building on an in-depth study of Cube algorithms over compressed data warehouses, this paper proposes a heuristic algorithm for generating optimized Cube computation plans, together with three Cube algorithms for compressed multidimensional data warehouses. The proposed Cube algorithms …

10.
This paper presents a performance model of a two-dimensional disk array (TIDA) system, which is composed of several major subsystems including disk cache, intelligent disk array controller, SCSI-like I/O bus, and a two-dimensional array of disk devices. Access conflicts in these subsystems and fork/join synchronization of physical disk requests are considered in the model. The representation of the complex behavior of a whole disk array system, including the interactions among subsystems, distinguishes the model from others that model only individual subsystems. To assist in evaluating the architectural alternatives of TIDA, we employ a subsystem access time modeling methodology, in which we model for each subsystem the mean subsystem access time per request (SATPR). Fed with a given set of representative workload parameters, the performance model is used to conduct performance evaluation, and the SATPRs of the subsystems are utilized to identify the bottleneck subsystem for performance improvement. The results show that (1) the values of some key design parameters, such as the data block size and I/O bus bandwidth that yield the best system throughput, depend not only on subsystem performance but also on the interaction among subsystems; (2) an I/O bus bandwidth of 5 Mbytes/s per disk device is large enough for data transfers from/to disk devices equipped with a 1-Mbyte cache; and (3) the fork/join synchronization of physical disk requests may cause performance degradation, which can be mitigated by using a larger I/O bus bandwidth and/or placing a cache in each disk device.

11.
12.
《Parallel Computing》2014,40(10):697-709
In order to run tasks in a parallel and load-balanced fashion, existing scientific parallel applications such as mpiBLAST introduce a data-initializing stage to move database fragments from shared storage to local cluster nodes. Unfortunately, with the exponentially increasing size of sequence databases in today's big data era, such an approach is inefficient. In this paper, we develop SDAFT, a scalable data access framework that solves the data movement problem for scientific applications dominated by “read” operations for data analysis. SDAFT employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. It consists of two interlocked components: (1) a data-centric load-balanced scheduler (DC-scheduler) that enforces data-process locality, and (2) a translation layer that translates conventional parallel I/O operations into HDFS I/O. By experimenting with our SDAFT prototype on real-world databases and queries across a wide variety of computing platforms, we found that SDAFT can reduce I/O cost by a factor of 4–10 and double overall execution performance compared with existing schemes.

13.
Databases store large amounts of information about consumer transactions and other kinds of transactions. This information can be used to deduce rules about consumer behavior, and the rules can in turn be used to determine company policies, for instance in production, marketing, and several other areas. Since databases typically store millions of records, and each record can have 100 or more attributes, as an initial step it is necessary to reduce the size of the database by eliminating attributes that do not influence the decision at all or do so only minimally. In this paper we present techniques that can be employed effectively for exact and approximate reduction in a database system. These techniques can be implemented efficiently in a database system using SQL (structured query language) commands. We tested their performance on a real data set and validated them. The results showed that classification performance actually improved with a reduced set of attributes as compared to the case when all attributes were present. We also discuss how our techniques differ from statistical methods and other data reduction methods such as rough sets.
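The notion of exact reduction, removing attributes that never affect the decision, can be sketched as a greedy procedure. This is an illustrative in-memory version, not the paper's SQL-based implementation: an attribute is droppable if, after removing it, no two records agree on all remaining attributes yet disagree on the decision.

```python
def is_redundant(records, decisions, attr, kept):
    """`attr` is removable if dropping it never makes two records with
    identical remaining attribute values map to different decisions."""
    remaining = [a for a in kept if a != attr]
    seen = {}
    for rec, d in zip(records, decisions):
        key = tuple(rec[a] for a in remaining)
        if seen.setdefault(key, d) != d:   # same key, different decision
            return False
    return True

def reduce_attributes(records, decisions, attrs):
    """Greedy exact reduction: drop attributes one at a time as long as
    classification consistency is preserved."""
    kept = list(attrs)
    for a in list(attrs):
        if len(kept) > 1 and is_redundant(records, decisions, a, kept):
            kept.remove(a)
    return kept
```

In a database system the same consistency test can be phrased as a grouped SQL query (GROUP BY the remaining attributes, check that each group has a single decision value), which is the style of implementation the paper advocates.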

14.
Image Retrieval Techniques in the Compressed Domain
李晓华, 沈兰荪. Chinese Journal of Computers, 2003, 26(9): 1051-1059
Image retrieval is a key technology in multimedia applications. Most existing content-based image retrieval techniques operate in the uncompressed domain; for the compressed-format images that are now ubiquitous, such techniques must first decompress and then retrieve, which is not only computationally expensive but also requires considerable intermediate storage, seriously limiting the real-time performance and flexibility of retrieval systems. Meanwhile, the introduction and spread of compression standards (such as JPEG, MPEG, and JPEG2000) have pushed researchers to seek retrieval techniques that can operate directly in the compressed domain. This paper surveys the development of existing compressed-domain image retrieval techniques and discusses possible future research directions.

15.
Parallel file systems are an important way to address the I/O bottleneck. Studies show that, for the strided file access patterns common in scientific applications, the way existing parallel file systems access such data yields unacceptable I/O performance on large datasets. To improve the I/O performance of noncontiguous data access in parallel file systems, a new high-performance I/O method is proposed: user-defined file views combined with merged I/O requests. The method has been implemented in the WPFS parallel file system. Analysis and experimental results show that it has the potential to enhance the performance of scientific applications.
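Combining many small noncontiguous requests into fewer large ones is the heart of the described method. A minimal sketch under stated assumptions (hypothetical code, not the WPFS implementation): sort (offset, length) requests and merge neighbors whose gap is at most `max_gap` bytes, trading a few wasted bytes read for fewer I/O calls.

```python
def merge_requests(requests, max_gap=0):
    """Merge noncontiguous (offset, length) file requests into fewer,
    larger contiguous requests. Gaps up to `max_gap` bytes are absorbed:
    reading a few unneeded bytes is often cheaper than an extra I/O call."""
    merged = []
    for off, length in sorted(requests):
        if merged and off - (merged[-1][0] + merged[-1][1]) <= max_gap:
            # Extend the previous request to cover this one (and the gap).
            end = max(merged[-1][0] + merged[-1][1], off + length)
            merged[-1] = (merged[-1][0], end - merged[-1][0])
        else:
            merged.append((off, length))
    return merged
```

With `max_gap=0` only strictly adjacent requests are coalesced; a larger `max_gap` models the common heuristic of over-reading small holes in a strided access pattern.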

16.
17.
Database query languages and their use for programming nontraditional applications, such as engineering and artificial intelligence applications, are discussed. In such environments, database programs are used to code applications that work over large data sets residing in databases. Optimizing such programs then becomes a necessity. An examination is made of various optimization techniques, and transformations are suggested for improving the performance of database programs. These transformations result in new equivalent database programs with better space and time performance. Several of these techniques apply to classical query languages, although extended query languages which include an iteration operator are specifically discussed.

18.
It is desirable to design partitioning methods that minimize the I/O time incurred during query execution in spatial databases. This paper explores optimal partitioning for two-dimensional data for a class of queries and develops multi-disk allocation techniques that maximize the degree of I/O parallelism obtained in each case. We show that hexagonal partitioning has optimal I/O performance for circular queries among all partitioning methods that use convex non-overlapping regions. An analysis and extension of this result to all possible partitioning techniques is also given. For rectangular queries, we show that hexagonal partitioning has overall better I/O performance for a general class of range queries, except for rectilinear queries, in which case rectangular grid partitioning is superior. By using current algorithms for rectangular grid partitioning, parallel storage and retrieval algorithms for hexagonal partitioning can be constructed. Some of these results carry over to circular partitioning of the data—which is an example of a non-convex region.

19.
Compression can sometimes improve performance by making more of the data available to the processors faster. We consider the compression of integer keys in a B+-tree index. For this purpose, systems such as IBM DB2 use variable-byte compression over differentially coded keys. We revisit this problem with various compression alternatives such as Google's VarIntGB, Binary Packing and Frame-of-Reference. In all cases, we describe algorithms that can operate directly on compressed data. Many of our alternatives exploit the single-instruction-multiple-data (SIMD) instructions supported by modern CPUs. We evaluate our techniques in a database environment provided by Upscaledb, a production-quality key-value database. Our best techniques are SIMD accelerated: they simultaneously reduce memory usage while improving single-threaded speeds. In particular, a differentially coded SIMD binary-packing technique (BP128) can offer superior query speed (e.g., 40% better than an uncompressed database) while providing the best compression (e.g., by a factor of ten). For analytic workloads, our fast compression techniques offer compelling benefits. Our software is available as open source.
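The scalar version of differentially coded variable-byte compression is easy to state: store the gap between consecutive sorted keys, using 7 data bits per byte with a continuation flag. The sketch below uses one common byte-oriented convention (high bit set means "more bytes follow"); the SIMD variants the paper studies, such as VarIntGB and BP128, apply the same delta idea to many keys at a time.

```python
def delta_varint_encode(keys):
    """Encode sorted non-negative integer keys as delta-coded varints:
    each key is stored as its gap from the previous key, 7 bits per byte,
    high bit = continuation flag."""
    out = bytearray()
    prev = 0
    for k in keys:
        delta = k - prev
        prev = k
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)  # more bytes follow
            delta >>= 7
        out.append(delta)                      # final byte, high bit clear
    return bytes(out)

def delta_varint_decode(data):
    """Invert delta_varint_encode, recovering the original keys."""
    keys, cur, shift, prev = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7                          # continuation: keep reading
        else:
            prev += cur                         # undo the differential coding
            keys.append(prev)
            cur, shift = 0, 0
    return keys
```

Small gaps between dense keys compress to one byte each, which is why sorted B+-tree leaf keys are such a good fit for this scheme; the four keys in the test below need only 8 bytes instead of 16 (as four 32-bit integers).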

20.
With the exponential growth in size of geometric data, it is becoming increasingly important to make effective use of multilevel caches, limited disk storage, and bandwidth. As a result, recent work in the visualization community has focused either on designing sequential access compression schemes or on producing cache-coherent layouts of (uncompressed) meshes for random access. Unfortunately, combining these two strategies is challenging as they fundamentally assume conflicting modes of data access. In this paper, we propose a novel order-preserving compression method that supports transparent random access to compressed triangle meshes. Our decompression method selectively fetches from disk, decodes, and caches in memory requested parts of a mesh. We also provide a general mesh access API for seamless mesh traversal and incidence queries. While the method imposes no particular mesh layout, it is especially suitable for cache-oblivious layouts, which minimize the number of decompression I/O requests and provide high cache utilization during access to decompressed, in-memory portions of the mesh. Moreover, the transparency of our scheme enables improved performance without the need for application code changes. We achieve compression rates on the order of 20:1 and significantly improved I/O performance due to reduced data transfer. To demonstrate the benefits of our method, we implement two common applications as benchmarks. By using cache-oblivious layouts for the input models, we observe a 2–6 times overall speedup compared to using uncompressed meshes.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号