Similar Documents
20 similar documents retrieved.
1.
A hybrid file system with high flexibility and performance, called Trident file system (TridentFS), is proposed to manage three types of storage with different performance characteristics: Non-Volatile RAM (NVRAM), flash memory and magnetic disk. Unlike previous NVRAM-based hybrid file systems, TridentFS uses novel techniques to improve flexibility and performance. TridentFS achieves flexibility by supporting various forms of flash memory and a wide range of NVRAM sizes: the former is based on the concept of stackable file systems, and the latter is achieved by allowing data eviction from the NVRAM. TridentFS achieves high performance by keeping hot data in the NVRAM and allowing data evicted from the NVRAM to be distributed in parallel across the flash memory and disk. A data eviction policy is proposed to determine which data to evict from the NVRAM. Moreover, a data distribution algorithm is proposed to effectively exploit the parallelism between flash memory and disk during data distribution. TridentFS is implemented as a loadable module on Linux 2.6.29. The performance results show that it works well for both small and large NVRAM sizes, and the proposed eviction policy outperforms LRU by 27%. By effectively exploiting the parallelism between flash memory and disk, the proposed data distribution algorithm outperforms RAID-0 and a size-based distribution method by up to 471.6% and 82.6%, respectively. By considering the data size and the performance characteristics of the storage devices, the proposed data distribution algorithm also outperforms a greedy algorithm by up to 15.5%. Copyright © 2014 John Wiley & Sons, Ltd.
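The distribution algorithm itself is not given in the abstract; the sketch below only illustrates the underlying idea of splitting an evicted block between flash and disk in proportion to their write bandwidths, so that the two concurrent transfers finish at about the same time (the function name and bandwidth figures are hypothetical):

```python
def split_for_parallel_write(size, flash_bw, disk_bw):
    """Split `size` bytes between flash and disk so that, written
    concurrently, both devices finish at about the same time.

    flash_bw, disk_bw: sustained write bandwidths in bytes/second
    (hypothetical figures; the actual TridentFS policy additionally
    weighs data size and per-device characteristics)."""
    total_bw = flash_bw + disk_bw
    flash_part = size * flash_bw // total_bw
    return flash_part, size - flash_part

# Example: a 64 MiB evicted block, 90 MB/s flash vs. 30 MB/s disk
flash_bytes, disk_bytes = split_for_parallel_write(
    64 * 2**20, 90_000_000, 30_000_000)
print(flash_bytes, disk_bytes)  # roughly 3/4 to flash, 1/4 to disk
```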

2.
The constantly increasing complexity of polygonal models in interactive applications poses two major problems. First, the number of primitives that can be rendered at real-time frame rates is currently limited to a few million. Second, less than 45 million triangles (with vertices and normals) can be stored per gigabyte. Although rendering time can be reduced using level-of-detail (LOD) algorithms, which represent a model at several complexity levels, these often increase memory consumption even further. Out-of-core algorithms solve this problem by transferring the data currently required for rendering from external devices. Compression techniques are commonly used because of the limited bandwidth. The main problem of compression and decompression algorithms is that they allow only coarse-grained random access. A similar problem occurs in view-dependent LOD techniques: because of the interdependency of split operations, the adaptation rate is reduced, leading to visible popping artefacts during fast movements. In this paper, we propose a novel algorithm for real-time view-dependent rendering of gigabyte-sized models. It is based on a neighbourhood dependency-free progressive mesh data structure. Using a per-operation compression method, it is suitable for parallel random-access decompression and out-of-core memory management without storing decompressed data.

3.
This paper documents the design and implementation of the IN-Tune software tool suite, which enables a user to collect real-time code and hardware profiling information on Intel-based symmetric multiprocessors running the Linux operating system. IN-Tune provides a virtually non-invasive tool for performance analysis and tuning of programs. Unlike other analysis tools, IN-Tune isolates data with respect to individual threads. It also utilizes hardware performance-monitoring registers to instrument individual threads as they run in situ, collecting data with appropriate consideration for a multiprocessor environment. Data can be sampled using two different mechanisms: the user can collect data by making calls to the system upon the occurrence of specific software events, or data can be collected at a fixed, fine-grained interval (e.g. 1–10 microseconds) using either software or hardware interrupts. To allow observation of code for which source modification is impractical or impossible, a 'shell' task is created that permits monitoring without code modification. Although this work deals with Intel processors and Linux, the widespread availability of performance-monitoring registers in modern processors makes it widely applicable. Copyright © 1999 John Wiley & Sons, Ltd.
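IN-Tune's kernel-level, register-based sampling cannot be reproduced in a short snippet; the user-level Python sketch below only illustrates the fixed-interval sampling mechanism using a POSIX interval timer (the 10 ms period is an assumption; the paper's 1–10 microsecond granularity requires kernel support):

```python
import signal, time

samples = []

def take_sample(signum, frame):
    # In IN-Tune this would read hardware performance counters;
    # here we just record a timestamp as a stand-in.
    samples.append(time.monotonic_ns())

signal.signal(signal.SIGALRM, take_sample)
# Fire every 10 ms; a user-level stand-in for the paper's
# microsecond-granularity kernel sampling.
signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01)

busy = 0
for i in range(10_000_000):   # the workload under observation
    busy += i

signal.setitimer(signal.ITIMER_REAL, 0)  # stop the timer
print(f"collected {len(samples)} samples")
```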

4.
Great advancements in commodity graphics hardware have favoured graphics processing unit (GPU)-based volume rendering as the main adopted solution for interactive exploration of rectilinear scalar volumes on commodity platforms. Nevertheless, long data transfer times and GPU memory size limitations are often the main limiting factors, especially for massive, time-varying or multi-volume visualization, as well as for networked visualization on emerging mobile devices. To address this issue, a variety of level-of-detail (LOD) data representations and compression techniques have been introduced. In order to improve capabilities and performance over the entire storage, distribution and rendering pipeline, the encoding/decoding process is typically highly asymmetric: systems should ideally compress at data production time and decompress on demand at rendering time. Compression and LOD pre-computation do not have to adhere to real-time constraints and can be performed off-line for high-quality results. In contrast, adaptive real-time rendering from compressed representations requires fast, transient and spatially independent decompression. In this report, we review existing compressed GPU volume rendering approaches, covering sampling grid layouts, compact representation models, compression techniques, GPU rendering architectures and fast decoding techniques.

5.
We propose a lossless, single-rate triangle mesh topology codec tailored for fast data-parallel GPU decompression. Our compression scheme coherently orders generalized triangle strips in memory, and we propose a novel parallel and scalable algorithm to unpack these strips efficiently. We order vertices coherently to further improve the compression, and we use a variable bit-length code for additional compression benefits, for which we propose a scalable data-parallel decompression algorithm. For a set of standard benchmark models, we obtain (min: 3.7, med: 4.6, max: 7.6) bits per triangle. Our CUDA decompression requires only about 15% of the time it takes to render the model, even with a simple shader.

6.
Hierarchical culling is a key acceleration technique used to efficiently handle massive models for ray tracing, collision detection, etc. To support such hierarchical culling, bounding volume hierarchies (BVHs) combined with meshes are widely used. However, BVHs may require a very large amount of memory, which can negate their benefits. To address this problem, we present HCCMesh, a novel hierarchical-culling-oriented compact mesh representation that tightly integrates a mesh and a BVH. As an in-core representation, we propose i-HCCMesh, which provides efficient random hierarchical traversal and high culling efficiency with a small runtime decompression overhead. To further reduce the storage requirement, the in-core representation is compressed into our out-of-core representation, o-HCCMesh, using a simple dictionary-based compression method. At runtime, o-HCCMeshes are fetched from an external drive and decompressed into i-HCCMeshes stored in main memory. The i-HCCMesh and o-HCCMesh achieve 3.6:1 and 10.4:1 compression ratios on average, compared with a naively compressed (e.g., quantized) mesh and BVH representation. We test the HCCMesh representations with ray tracing, collision detection, photon mapping, and non-photorealistic rendering. Because of the reduced data access time, smaller working set, and low runtime decompression overhead, we can handle models ten times larger on commodity hardware without expensive disk I/O thrashing; when the thrashing is avoided, runtime performance improves by up to two orders of magnitude over a naively compressed representation.

7.
We present the 3D Video Recorder, a system capable of recording, processing, and playing three-dimensional video from multiple points of view. We first record 2D video streams from several synchronized digital video cameras and store pre-processed images to disk. An off-line processing stage converts these images into a time-varying 3D hierarchical point-based data structure and stores this 3D video to disk. We show how 3D video quality can be traded off against processing performance, and we devise efficient compression and coding schemes for our novel 3D video representation. A typical sequence is encoded at less than 7 Mbps at a frame rate of 8.5 frames per second. The 3D video player decodes and renders 3D videos from hard disk in real time, providing interaction features known from common video cassette recorders, such as variable-speed forward and reverse and slow motion. 3D video playback can be enhanced with novel 3D video effects such as freeze-and-rotate and arbitrary scaling. The player builds upon point-based rendering techniques and is thus capable of rendering high-quality images in real time. Finally, we demonstrate the 3D Video Recorder on multiple real-life video sequences. ACM CCS: I.3.2 Computer Graphics: Graphics Systems; I.3.5 Computer Graphics: Computational Geometry and Object Modelling; I.3.7 Computer Graphics: Three-Dimensional Graphics and Realism

8.
Conventional remote data access middleware usually provides client applications with either a pre-staging scheme or an on-demand access scheme to fetch data. The pre-staging scheme uses parallel downloads to fetch a complete input file from multiple data sources, even when only a tiny file fragment is required; such a transfer scheme wastes transmission time and storage space. In contrast, the on-demand scheme downloads only the required data blocks from a single data source and does not fully utilize the downstream bandwidth of the computing nodes. This paper presents a middleware called 'Spigot' that enables legacy (grid-unaware) applications to transparently access remote data through native I/O function calls. Spigot uses the on-demand concept to avoid unnecessary data transfer and adopts a co-allocation download algorithm to improve transfer performance. Moreover, it uses a pre-fetching strategy to reduce the data waiting time by overlapping data acquisition and data processing. It also provides the client application with its own user-level cache, which offers a larger cache space than the kernel-level cache and makes it easy to maintain data consistency between Spigot nodes. The experimental results indicate that Spigot reduces the data waiting time more effectively than both the pre-staging and on-demand access schemes. Copyright © 2010 John Wiley & Sons, Ltd.
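Spigot's co-allocation algorithm is not detailed in the abstract; the sketch below illustrates only the pre-fetching idea of overlapping the download of block i+1 with the processing of block i (fetch_block and process are hypothetical stand-ins, not Spigot's API):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_block(index):
    """Hypothetical stand-in for a remote block download."""
    return bytes(4096)  # pretend we fetched 4 KiB from a data source

def process(block):
    return len(block)   # stand-in for the application's computation

def run(num_blocks):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_block, 0)      # prime the pipeline
        for i in range(num_blocks):
            block = future.result()               # wait for block i
            if i + 1 < num_blocks:                # prefetch block i+1...
                future = pool.submit(fetch_block, i + 1)
            process(block)                        # ...while processing block i

run(8)
```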

9.
The JFFS2 file system for flash memory compresses files before actually writing them into flash memory. Because of this, multimedia files, for instance, which are already compressed at the application level, go through an unnecessary and time-consuming compression stage, wasting energy. Also, when reading such multimedia files, the default use of the disk cache results in unnecessary main memory accesses, and hence wasted energy, due to the low cache hit ratio. This paper presents two techniques to reduce the energy consumption of the JFFS2 flash file system for power-aware applications: one selectively avoids data compression when writing files, and the other bypasses page caching when reading sequential files. The modified file system is implemented on a PDA running Linux, and the experimental results show that the proposed mechanisms effectively reduce overall energy consumption when accessing large, continuous files.
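The abstract does not spell out how files are selected for compression avoidance; one plausible user-level sketch tests whether a small sample of the data actually shrinks under zlib before deciding to compress (the sample size and threshold are assumptions, not values from the paper):

```python
import zlib

def worth_compressing(data, sample_size=4096, threshold=0.9):
    """Return False for data that is already compressed (e.g. most
    multimedia formats), so a file system could skip its own
    compression stage. Sample size and ratio threshold are
    illustrative assumptions."""
    sample = data[:sample_size]
    if not sample:
        return False
    ratio = len(zlib.compress(sample)) / len(sample)
    return ratio < threshold

print(worth_compressing(b"abc" * 10_000))                 # True: redundant text
print(worth_compressing(zlib.compress(b"abc" * 10_000)))  # False: already packed
```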

10.
A system-on-glass (SOG) dynamic random access memory (DRAM), which enables the implementation of frame-memory-integrated displays, has been developed. A dynamic one-transistor-one-capacitor memory cell with a data retention time of over 16.6 msec and a compression/decompression (CODEC) circuit were developed to reduce the layout area and power. The CODEC enables an 18-bit/pixel color display while reducing the memory capacity from 18 to 12 bits/pixel. A frame-memory macro was created by combining the SOG-DRAM with an embedded controller that enables independent access for writing and reading. Its operation was verified by chip measurement and demonstrated as a frame memory for 262k-color QCIF+ displays. The work reported in this paper was the first step toward creating a Zero-Chip Display with an integrated frame memory, and it proved the concept feasible.

11.
To improve the performance of a document retrieval system, inverted-file compression techniques need to be studied and compared, so that the system can strike a balance between the highest compression ratio and the fastest decompression speed and thus reach maximum throughput. Five compression techniques (Golomb, Elias gamma, Elias delta, Variable Byte Encoding and Binary Interpolative Coding) are evaluated by storing and compressing the TREC Wall Street Journal collection under the Windows operating system. Measured in CPU clock cycles, the techniques are compared on compression ratio, compression and decompression time, and the time taken to read and query the files, and a comprehensive overall evaluation is given.
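Of the five codecs, Variable Byte Encoding is the simplest to illustrate: each integer is split into 7-bit groups, and the high bit of each byte marks whether the byte terminates the integer (a minimal sketch of one common convention, not the evaluated implementation):

```python
def vbyte_encode(numbers):
    """Variable Byte Encoding: 7 data bits per byte, high bit set on
    the final byte of each integer (conventions vary between
    implementations)."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        chunk.reverse()
        chunk[-1] |= 0x80          # mark the last byte of this integer
        out.extend(chunk)
    return bytes(out)

def vbyte_decode(data):
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:               # terminator byte reached
            numbers.append(n)
            n = 0
    return numbers

gaps = [5, 128, 16384]             # e.g. d-gaps of an inverted list
assert vbyte_decode(vbyte_encode(gaps)) == gaps
```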

12.
With the further development of cloud computing and big data, the hardware performance of individual compute nodes keeps improving, yet data centres continue to suffer from low resource utilization and poor scalability. Many attempts have been made to solve this problem from various angles; the multikernel operating system Popcorn Linux is one representative solution. The file system, as a key component of an operating system, directly affects the execution efficiency of data-centre applications. Because of disk-controller constraints, traditional file systems cannot be ported to a multikernel operating system and therefore cannot meet the new requirements. To address this problem, a new file system for multikernel operating systems, POPFUSE, is proposed. Built on the FUSE framework, it solves the problem that multiple kernel instances cannot access disk resources simultaneously when disk controllers are limited; by communicating through shared memory, it keeps communication stable and improves file system efficiency, which in turn improves the overall performance of the multikernel operating system.
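POPFUSE itself is not publicly available here; the minimal read-only FUSE file system below, assuming the third-party fusepy package, merely shows the framework the paper builds on (the shared-memory transport between kernel instances is not modeled):

```python
import stat, errno, sys
from fuse import FUSE, FuseOSError, Operations  # fusepy package

HELLO = b"data served through FUSE\n"

class MiniFS(Operations):
    """Minimal read-only file system exposing a single file."""

    def getattr(self, path, fh=None):
        if path == "/":
            return dict(st_mode=stat.S_IFDIR | 0o755, st_nlink=2)
        if path == "/hello":
            return dict(st_mode=stat.S_IFREG | 0o444, st_nlink=1,
                        st_size=len(HELLO))
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return [".", "..", "hello"]

    def read(self, path, size, offset, fh):
        return HELLO[offset:offset + size]

if __name__ == "__main__":
    # Mount point is an assumption; run e.g.: python minifs.py /mnt/minifs
    FUSE(MiniFS(), sys.argv[1], foreground=True, nothreads=True)
```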

13.
Through careful tailoring and optimized design on top of Linux, a multi-user, two-level Ext2 file system is constructed for GPS and other specialized embedded applications. The system comprises the definitions of the basic entities (index nodes, directory nodes, file nodes, etc.) and the definition and implementation of each functional module; test data for some of the functional modules are presented. The Linux file system constructed here has been tested with good results.

14.
Key time step selection is essential for effective and efficient scientific visualization of large-scale time-varying datasets. We present a novel approach that decides the number of most representative time steps while selecting them so as to minimize the difference in the amount of information from the original data. We use linear interpolation to reconstruct the data of intermediate time steps between selected time steps. To evaluate a selection, we compute the difference in the amount of information (the information difference) using variation of information (VI) from information theory, comparing the interpolated time steps against the original data. In the one-time preprocessing phase, dynamic programming is applied to extract the subset of time steps that minimizes the information difference. In the run-time phase, a novel chart presents the dynamic-programming results, serving as a storyboard of the data that guides the user to select the best time steps very efficiently. We extend the preprocessing into a novel out-of-core approximate algorithm with optimal I/O cost, which also greatly reduces the in-core computing time and exhibits a good trade-off between computing speed and accuracy. As shown in the experiments, our approximate method, which is our major contribution, outperforms the previous globally optimal DTW approach [TLS12] on out-of-core data, significantly improving the running time while maintaining similar quality.
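The VI computation is beyond a short snippet; the sketch below uses mean squared error as a stand-in for the information-difference measure and shows the dynamic-programming structure for picking k representative time steps, with endpoints fixed and intermediate steps linearly interpolated:

```python
import numpy as np

def interp_error(data, i, j):
    """MSE of reconstructing steps i..j by linear interpolation between
    selected steps i and j (MSE is a stand-in for the paper's VI)."""
    err = 0.0
    for t in range(i + 1, j):
        w = (t - i) / (j - i)
        approx = (1 - w) * data[i] + w * data[j]
        err += float(np.mean((data[t] - approx) ** 2))
    return err

def select_time_steps(data, k):
    """Pick k >= 2 time steps (first and last always kept) minimizing
    total interpolation error via O(T^2 k) dynamic programming."""
    T = len(data)
    cost = [[interp_error(data, i, j) for j in range(T)] for i in range(T)]
    INF = float("inf")
    best = [[INF] * T for _ in range(k)]   # best[m][j]: m+1 picks ending at j
    prev = [[-1] * T for _ in range(k)]
    best[0][0] = 0.0
    for m in range(1, k):
        for j in range(1, T):
            for i in range(j):
                c = best[m - 1][i] + cost[i][j]
                if c < best[m][j]:
                    best[m][j], prev[m][j] = c, i
    picks, j = [], T - 1                   # backtrack from the last step
    for m in range(k - 1, 0, -1):
        picks.append(j)
        j = prev[m][j]
    picks.append(0)
    return picks[::-1]

rng = np.random.default_rng(0)
data = np.cumsum(rng.normal(size=(40, 8, 8)), axis=0)  # toy time-varying field
print(select_time_steps(data, 6))
```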

15.
This paper describes the design and implementation of three core factorization routines (LU, QR, and Cholesky) included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to fit entirely in physical memory: the full matrix is stored on disk, and the factorization routines transfer sub-matrix panels into memory. The 'left-looking' column-oriented variant of the factorization algorithm is implemented to reduce disk I/O traffic. The routines are implemented using a portable I/O interface and utilize high-performance ScaLAPACK factorization routines as in-core computational kernels. We present the details of the implementation of the out-of-core ScaLAPACK factorization routines, as well as performance and scalability results on a Beowulf Linux cluster. Copyright © 2000 John Wiley & Sons, Ltd.
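The out-of-core machinery is too large for a snippet, but the 'left-looking' idea, deferring all updates to a column until that column is about to be factored, can be sketched in-core with NumPy (no pivoting, purely illustrative):

```python
import numpy as np

def left_looking_lu(A):
    """Left-looking LU without pivoting: column j receives all updates
    from columns 0..j-1 only when it is about to be factored. In the
    out-of-core routines this access pattern is what keeps disk traffic
    low: each panel is read, updated, factored, and written back once."""
    A = A.astype(float)
    n = A.shape[0]
    for j in range(n):
        for k in range(j):                      # apply deferred updates
            A[k + 1:, j] -= A[k + 1:, k] * A[k, j]
        A[j + 1:, j] /= A[j, j]                 # form column j of L
    L = np.tril(A, -1) + np.eye(n)
    U = np.triu(A)
    return L, U

M = np.array([[4., 3., 2.], [8., 7., 9.], [12., 13., 20.]])
L, U = left_looking_lu(M)
assert np.allclose(L @ U, M)
```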

16.
Sungju Huh, Jonghun Yoo, Seongsoo Hong. Software, 2015, 45(11): 1549–1570
Android smartphones are often reported to suffer from sluggish user interactions due to poor interactivity. This is partly because Android and its task scheduler, the completely fair scheduler (CFS), may incur perceptibly long response times for user-interactive tasks. In particular, the Android framework cannot systematically favor user-interactive tasks over background tasks, since it does not distinguish between them; furthermore, user-interactive tasks can suffer from high dispatch latency due to the non-preemptive nature of CFS. To address these problems, this paper presents framework-assisted task characterization and virtual time-based CFS. The former is a cross-layer resource control mechanism between the Android framework and the underlying Linux kernel: it identifies user-interactive tasks at the framework level using the notion of a user-interactive task chain, and then lets the kernel scheduler selectively promote the priorities of worker tasks appearing in the chain to reduce preemption latency. The latter is a cross-layer refinement of CFS in terms of interactivity: it allows a task to be preempted at every predefined period, and it adjusts the virtual runtimes of the identified user-interactive tasks so that, on wake-up, they are always scheduled before the other tasks in the run queue. As a result, the dispatch latency of a user-interactive task is reduced to a small value. We have implemented our approach in Android 4.1.2 running on Linux kernel 3.0.31. Experimental results show that the response time of a user interaction is reduced by up to 77.35% while incurring only negligible overhead. Copyright © 2014 John Wiley & Sons, Ltd.
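The kernel modifications cannot be reproduced here; the toy model below only illustrates the virtual-runtime mechanics the paper refines: the scheduler always runs the task with minimum vruntime and charges it slice/weight, and a waking interactive task is placed just below the queue's minimum virtual time so it is dispatched next (all names and numbers are hypothetical):

```python
import heapq

class Task:
    def __init__(self, name, weight=1.0, interactive=False):
        self.name, self.weight, self.interactive = name, weight, interactive
        self.vruntime = 0.0

def run_queue(tasks):
    heap = [(t.vruntime, i, t) for i, t in enumerate(tasks)]
    heapq.heapify(heap)
    return heap

def tick(heap, slice_ms):
    """Run the minimum-vruntime task for one period and charge it
    slice/weight, mimicking CFS virtual-time bookkeeping."""
    _, i, task = heapq.heappop(heap)
    task.vruntime += slice_ms / task.weight
    heapq.heappush(heap, (task.vruntime, i, task))
    return task.name

def wake_up(heap, task, bonus=0.001):
    """Virtual time-based placement: a waking user-interactive task is
    put just below the run queue's minimum vruntime, so it is
    dispatched ahead of all background tasks."""
    min_vr = heap[0][0]
    task.vruntime = (min_vr - bonus if task.interactive
                     else max(task.vruntime, min_vr))
    heapq.heappush(heap, (task.vruntime, len(heap), task))

background = [Task(f"bg{i}") for i in range(3)]
heap = run_queue(background)
for _ in range(6):
    tick(heap, slice_ms=3.0)
ui = Task("ui", interactive=True)
wake_up(heap, ui)
print(tick(heap, slice_ms=3.0))   # -> ui: runs immediately after waking
```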

17.
Design and implementation of a UsbKey-based transparent encryption/decryption file system
刘威鹏, 胡俊, 刘毅. 计算机科学 (Computer Science), 2008, 35(11): 100–103
An encrypted file system is an effective means of protecting the system and preventing the leakage of sensitive user data at the operating-system level. This paper first analyses several well-known encrypted file systems and points out their problems in usage model, key protection, configuration management and performance. It then discusses a UsbKey-based transparent encryption/decryption file system in terms of its design goals, system composition and implementation. The scheme is based on the Linux operating system; it uses a UsbKey to securely store and protect user keys, and it exploits the LSM (Linux Security Module) mechanism to implement transparent file encryption and decryption by redirecting key inode operations such as read and write. This work offers guidance for the design and implementation of transparent encryption/decryption file systems under Linux.
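A kernel-level LSM hook is out of scope for a snippet; as a user-level stand-in, the sketch below wraps file reads and writes so that data is encrypted at rest, assuming the third-party cryptography package (in the paper's design the key would be held on the UsbKey, not in memory):

```python
from cryptography.fernet import Fernet  # pip install cryptography

class TransparentFile:
    """Encrypts on write and decrypts on read, so the application sees
    plaintext while the stored file is ciphertext. In the paper this
    interposition happens in the kernel via LSM read/write hooks, and
    the key is protected by the UsbKey rather than kept here."""

    def __init__(self, path, key):
        self.path, self.fernet = path, Fernet(key)

    def write(self, plaintext: bytes):
        with open(self.path, "wb") as f:
            f.write(self.fernet.encrypt(plaintext))

    def read(self) -> bytes:
        with open(self.path, "rb") as f:
            return self.fernet.decrypt(f.read())

key = Fernet.generate_key()        # stand-in for key storage on the UsbKey
doc = TransparentFile("/tmp/secret.bin", key)
doc.write(b"sensitive user data")
assert doc.read() == b"sensitive user data"
```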

18.
This paper investigates the problem of sampled-data controller design for a class of lower-triangular systems in the p-normal form (0<p<1). A multirate digital feedback control scheme is proposed to achieve global strong stabilization of the sampled-data closed-loop system under some assumptions. In the design of the controller, the input-Lyapunov matching strategy and the multirate control approach are combined to obtain better stabilizing performance. Unlike design methods based on an approximate discrete-time model, our controller is obtained from the exact discrete-time equivalent model, which does not need to be computed completely. The approximate multirate digital controllers are proved to be effective in practical implementation. It is shown that, compared with an emulated control scheme, our controller may provide a faster decrease of the Lyapunov function for each subsystem, which allows larger sampling periods. An illustrative example is provided to verify the effectiveness of the proposed control scheme.
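For readers unfamiliar with the terminology, a lower-triangular system in the p-normal form is often written as follows (a generic textbook form, not reproduced from the paper):

```latex
\dot{x}_i = x_{i+1}^{\,p} + f_i(x_1,\dots,x_i), \quad i = 1,\dots,n-1,
\qquad
\dot{x}_n = u^{\,p} + f_n(x_1,\dots,x_n), \qquad 0 < p < 1,
```

where each f_i depends only on the first i states (the lower-triangular structure) and vanishes at the origin.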

19.
Analysis of the EXT2 file system
Linux has gained increasingly broad acceptance thanks to its open source code, and programming against its file system is among the most common and central applications. By analysing the Linux file system source code, this paper presents the on-disk layout of the EXT2 file system; discusses in detail the data structures related to EXT2, including the superblock, group descriptors, inodes and directory structures; analyses the core fields of these structures and the relationships between them; and summarizes, from a programming perspective, the roles these structures play in the file system.
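As a concrete companion to the layout discussion, the sketch below reads a few superblock fields from an ext2 disk image; the offsets follow the standard ext2 on-disk format (superblock at byte 1024, magic number 0xEF53):

```python
import struct

def read_ext2_superblock(image_path):
    """Parse a handful of ext2 superblock fields. The superblock starts
    at byte offset 1024; all fields are little-endian."""
    with open(image_path, "rb") as f:
        f.seek(1024)
        sb = f.read(1024)
    inodes_count, blocks_count = struct.unpack_from("<II", sb, 0)
    log_block_size, = struct.unpack_from("<I", sb, 24)
    magic, = struct.unpack_from("<H", sb, 56)
    if magic != 0xEF53:
        raise ValueError("not an ext2 file system")
    return {
        "inodes": inodes_count,
        "blocks": blocks_count,
        "block_size": 1024 << log_block_size,
    }

# Example with a hypothetical image file:
#   dd if=/dev/zero of=fs.img bs=1M count=4 && mke2fs fs.img
#   print(read_ext2_superblock("fs.img"))
```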

20.
Recently, hybrid disk drives that integrate a small amount of flash memory within a mechanical drive have received significant attention. The hybrid drive extends the storage hierarchy by using flash memory to cache data from the mechanical disk. Unfortunately, current caching architectures fail to fully exploit the potential of the hybrid drive. Furthermore, current disk input/output (I/O) schedulers are optimized for rotational mechanical disk drives and thus must be re-targeted for the hybrid disk drive. In this paper, we propose a new data caching scheme for hybrid drives, called Profit Caching. Profit Caching is a self-optimizing caching algorithm: it considers and seamlessly integrates all data characteristics that impact the performance of hybrid drives, including read count, write count, sequentiality, randomness, and recency, to determine the caching policy. Moreover, we propose a hybrid disk-aware Completely Fair Queuing (HA-CFQ) scheduler to avoid unnecessary I/O anticipations of the CFQ scheduler. We have implemented Profit Caching and the HA-CFQ scheduler in the Linux kernel and, coupled with a trace-driven simulator, conducted detailed experiments under a variety of workloads. Experimental results show that Profit Caching provides significantly improved performance compared with previous schemes. In particular, its throughput outperforms the previous Random Access First and FlashCache caching schemes by factors of up to 1.8 and 7.6, respectively, and the HA-CFQ scheduler reduces the total execution time of the CFQ scheduler by up to 1.74%. Finally, the runtime overhead of Profit Caching is negligible. Copyright © 2014 John Wiley & Sons, Ltd.
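The exact Profit Caching formula is not given in the abstract; the sketch below shows one plausible shape, a weighted score over the named characteristics that decides which blocks deserve flash cache space (all weights and the decay constant are invented for illustration):

```python
import time

def profit_score(meta, now=None, w_read=2.0, w_write=1.5,
                 w_random=1.0, half_life=60.0):
    """Combine read/write counts, randomness and recency into a single
    caching score. Weights and decay are illustrative assumptions, not
    the paper's actual Profit Caching parameters. Sequential blocks
    score low: the mechanical disk already serves them well, so flash
    capacity is better spent on random, hot data."""
    now = time.time() if now is None else now
    recency = 0.5 ** ((now - meta["last_access"]) / half_life)
    randomness = 1.0 - meta["sequentiality"]      # sequentiality in [0, 1]
    return ((w_read * meta["reads"] + w_write * meta["writes"])
            * (w_random * randomness + 0.1) * recency)

hot_random = {"reads": 40, "writes": 10, "sequentiality": 0.1,
              "last_access": time.time() - 5}
cold_sequential = {"reads": 50, "writes": 0, "sequentiality": 0.9,
                   "last_access": time.time() - 300}
print(profit_score(hot_random) > profit_score(cold_sequential))  # True
```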
