1.

Li Zengzhi, Han Dong, Wang Jianguo, Li Gang 《Computer Engineering and Design》2002,23(2):47-50

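As networks rapidly grow in scale, the shortcomings of traditional centralized approaches have become apparent: a flexible and efficient network management platform should be distributed and programmable. This paper explores applying active network technology to network management in order to achieve efficient distributed network management, and proposes a new network architecture.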

2.

Performance linked dynamic cache tuning: A static energy reduction approach in tiled CMPs

《Microprocessors and Microsystems》2017

Advances in semiconductor technology increase power density in recent Chip Multi-Processors (CMPs), which significantly increases the leakage energy consumption of on-chip Last Level Caches (LLCs). Performance-linked dynamic tuning of LLC size is a promising option for reducing cache leakage. This paper reduces static power consumption by dynamically shutting down or turning on cache banks based on system performance and cache bank usage statistics. Shutting down a cache bank remaps its future requests to another active bank, called the target bank. The proposed method is evaluated under three implementation policies: (1) the system may shut down or turn on cache banks periodically throughout process execution; (2) the system may shut down banks initially, but once bank restarting begins, no further shutdown is permitted; (3) the cache is resized as in the first policy, except during predefined time slices in which it cannot be resized. For a 4 MB, 4-way set-associative L2 cache, experimental analysis shows a 66% reduction in static energy with a 29% gain in Energy Delay Product (EDP) for the first policy; the second policy reduces static power by 59% with 27% savings in EDP. Finally, the third policy saves 65% in static power and 30% in EDP with minimal performance penalty.
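
The remapping step described above can be pictured with a small Python model. This is a sketch under stated assumptions, not the paper's implementation: it assumes a line's home bank is selected by low-order line-address bits and that each shut-down bank forwards all its requests to one fixed target bank. The class and method names are hypothetical, and the performance-driven decision of when to shut down or turn on a bank is left out.

```python
class BankedCache:
    """Toy model of a last-level cache split into banks that can be powered off.

    Hypothetical sketch: the home bank comes from low-order line-address bits,
    and each shut-down bank forwards all of its requests to one target bank.
    Data migration and the shutdown/turn-on decision policy are not modelled.
    """

    def __init__(self, num_banks, line_bytes=64):
        self.num_banks = num_banks
        self.line_bytes = line_bytes
        self.active = set(range(num_banks))
        self.target_of = {}  # shut-down bank -> active bank serving its requests

    def home_bank(self, addr):
        return (addr // self.line_bytes) % self.num_banks

    def shut_down(self, bank, target):
        if bank not in self.active or target not in self.active or bank == target:
            raise ValueError("need two distinct active banks")
        self.active.discard(bank)
        self.target_of[bank] = target
        # Banks already redirected to `bank` now follow it to `target`.
        for other, tgt in list(self.target_of.items()):
            if tgt == bank:
                self.target_of[other] = target

    def turn_on(self, bank):
        self.active.add(bank)
        self.target_of.pop(bank, None)

    def serving_bank(self, addr):
        """Bank that actually services a request for `addr`."""
        bank = self.home_bank(addr)
        return bank if bank in self.active else self.target_of[bank]


cache = BankedCache(num_banks=4)
cache.shut_down(1, target=2)
print(cache.serving_bank(0x40))  # line 1's home bank is off, so bank 2 serves it
```

Chaining redirections at shutdown time keeps every lookup to a single dictionary access, so a request never has to follow a sequence of powered-off banks.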

3.

An efficient code generation technique for tiled iteration spaces

Goumas G. Athanasaki M. Koziris N. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(10):1021-1034

This paper presents a novel approach to the problem of generating tiled code for nested for-loops transformed by a tiling transformation. Tiling, or supernode transformation, has been widely used to improve locality in multilevel memory hierarchies, as well as to execute loops efficiently on parallel architectures. However, automatic code generation for tiled loops can be a very complex compiler task, especially when nonrectangular tile shapes and iteration space bounds are involved. Our method considerably enhances previous work on rewriting tiled loops by considering parallelepiped tiles and arbitrary iteration space shapes. To generate tiled code, we first enumerate all tiles containing points within the iteration space and, second, sweep all points within each tile. For the first subproblem, we refine previous results concerning the computation of new loop bounds for an iteration space that has been transformed by a nonunimodular transformation. For the second subproblem, we transform the initial parallelepiped tile into a rectangular one in order to generate efficient code, with the aid of a nonunimodular transformation matrix and its Hermite Normal Form (HNF). Experimental results show that the proposed method significantly accelerates the compilation process and generates much more efficient code.
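
The two-step structure (enumerate the tiles, then sweep the points inside each tile) can be illustrated with a much simpler case than the paper handles: rectangular tiles over a triangular 2-D iteration space, written as a Python generator. This is only a sketch; the parallelepiped tiles, nonunimodular transformations, and Hermite Normal Form that the paper relies on are not modelled, and the function name and tile sizes are illustrative.

```python
def tiled_points(n, ti, tj):
    """Visit the triangular space 0 <= j <= i < n using ti x tj rectangular tiles.

    Equivalent untiled loop nest:
        for i in range(n):
            for j in range(i + 1): ...
    """
    # Step 1: enumerate tile origins (ii, jj). Only tiles with jj <= ii + ti - 1
    # can contain points with j <= i, so the jj loop stops there.
    for ii in range(0, n, ti):
        for jj in range(0, ii + ti, tj):
            # Step 2: sweep the points of this tile, clipped to the
            # original bounds i < n and j <= i.
            for i in range(ii, min(ii + ti, n)):
                for j in range(jj, min(jj + tj, i + 1)):
                    yield (i, j)


# Each point of the original space is visited exactly once, in tile order.
points = list(tiled_points(7, 3, 2))
```

The clipping in the inner loops is what keeps partial tiles at the diagonal and at the i = n boundary correct; with nonrectangular tiles, those bounds are exactly the part that becomes hard to derive, which motivates the paper's approach.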

4.

Design of an efficient communication infrastructure for highly contended locks in many-core CMPs

José L. Abellán Juan Fernández Manuel E. Acacio 《Journal of Parallel and Distributed Computing》2013

Lock synchronization is a key programming primitive for shared-memory many-core CMPs. However, as the number of cores increases, conventional software implementations cannot meet the desirable levels of performance and scalability. Meanwhile, most existing hardware-supported lock proposals require modifications at some level of the memory hierarchy, thus degrading QoS of applications through synchronization traffic.

5.

A design methodology for efficient application-specific on-chip interconnects

Ho W.H. Pinkston T.M. 《Parallel and Distributed Systems, IEEE Transactions on》2006,17(2):174-190

As the level of chip-integration continues to advance at a fast pace, the desire for efficient interconnects - whether on-chip or off-chip - is rapidly increasing. Traditional interconnects like buses, point-to-point wires, and regular topologies may suffer from poor resource sharing in the time and space domains, leading to high contention or low resource utilization. In this paper, we propose a design methodology for constructing networks for special-purpose computer systems with well-behaved (known) communication characteristics. A temporal and spatial model is proposed to define the sufficient condition for contention-free communication. Based upon this model, a design methodology using a recursive bisection technique is applied to systematically partition a parallel system such that the required number of links and switches is minimized while achieving low contention. Results show that the design methodology can generate more optimized on-chip networks with up to 60 percent fewer resources than meshes or tori while providing blocking performance closer to that of a fully connected crossbar.

6.

RACMan: Replication-aware cache management for manycore CMPs with private LLCs

《Microprocessors and Microsystems》2017

The last level cache (LLC) in private configurations offers lower latency and isolation but eliminates the possibility of sharing underutilized cache resources. Cooperative Caching (CC) provides capacity sharing by spilling a line evicted from one cache into another. However, CC proposals have not paid enough attention to a problem inherent to private LLCs: replication. Static policies that either always admit replicated blocks (replicas) into the LLC or always exclude them are invariably inadequate for the complex cache capacity situations of manycore environments. In this paper, we present replication-aware cache management (RACMan) to optimize replication in private configurations. RACMan relies on PBFP, a novel coarse-grained, low-overhead mechanism that monitors and predicts replica reusability in order to dynamically adjust LLC insertion policies, giving replicas different positions in the LRU chain, and hence different chances of survival in the LLC, according to the prediction. Experimental results show that our proposal optimizes replication effectively, outperforming two baseline systems in L2 hit rate, network traffic, IPC, and dynamic energy. RACMan meets the requirements of manycore CMPs with private LLCs for increased system performance, area efficiency, and scalability.
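
The idea of giving replicas different positions in the LRU chain can be sketched with a single cache set in Python. The three insertion points used here (MRU for ordinary lines, mid-chain for replicas predicted to be reused, the LRU end for replicas predicted dead) and the boolean prediction flag are assumptions for illustration; they are not PBFP or RACMan's actual policy.

```python
class LRUSet:
    """One set of a set-associative cache; index 0 is MRU, the last index is LRU."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []

    def access(self, tag):
        """Return True on a hit and promote the line to MRU."""
        if tag in self.lines:
            self.lines.remove(tag)
            self.lines.insert(0, tag)
            return True
        return False

    def insert(self, tag, is_replica=False, reuse_predicted=True):
        """Fill a missing line; return the evicted tag, or None if the set had room.

        Hypothetical policy: normal lines enter at MRU, replicas predicted to be
        reused enter mid-chain, and replicas predicted dead enter at LRU, so
        they are the first candidates for eviction.
        """
        victim = self.lines.pop() if len(self.lines) >= self.ways else None
        if not is_replica:
            self.lines.insert(0, tag)
        elif reuse_predicted:
            self.lines.insert(len(self.lines) // 2, tag)
        else:
            self.lines.append(tag)
        return victim
```

A replica inserted at the LRU end is evicted by the next miss unless it is hit first, which approximates excluding it; a replica inserted at MRU is treated like any other line. The prediction decides which of these extremes (or something in between) each replica gets.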

7.

Fast and efficient extraction algorithm for high-speed interconnects with arbitrary boundaries

Rohit Sharma T. Chakravarty K. Choi 《The Journal of supercomputing》2012,62(1):251-264

In this paper, we present an accurate and general interconnect model for planar transmission line interconnects with arbitrary boundary conditions. Based on the unified approach, we develop a SPICE-compatible parameter extraction algorithm that can be used in high-performance computer-aided-design applications. A range of multilayered interconnect geometries with arbitrary boundaries are analyzed. Different typical configurations of ground placement are considered to verify the applicability of this method. For all such cases, results are compared for admittance, line parameters, and delay giving physical insight on the effect of boundary conditions on them. Compared with existing industry standard numerical field-solvers, like HFSS, the proposed model demonstrates more than 10× speedup within 2% accuracy.

8.

Exploiting the JPEG compression scheme for image retrieval

Shneier M. Abdel-Mottaleb M. 《IEEE transactions on pattern analysis and machine intelligence》1996,18(8):849-853

We address the problem of retrieving images from a large database using an image as a query. The method is specifically aimed at databases that store images in JPEG format, and works in the compressed domain to create index keys. A key is generated for each image in the database and is matched with the key generated for the query image. The keys are independent of the size of the image. Images that have similar keys are assumed to be similar, but there is no semantic meaning to the similarity.

9.

Exploiting multi-scale support vector regression for image compression

Bin Danian Lifeng Shiqiang 《Neurocomputing》2007,70(16-18):3068

Unlike traditional neural networks, which require a predefined network topology, the support vector regression (SVR) approach can model data within a given level of accuracy using only a small subset of the training data, called support vectors (SVs). This sparsity has been exploited as the basis for image compression. In this paper, we propose a multi-scale support vector regression (MS-SVR) approach for still image compression, which models images with both steep and smooth variations very well, resulting in good performance. We test the proposed MS-SVR-based algorithm on several standard images. The experimental results verify that MS-SVR achieves better performance than standard SVR. Over a wide range of compression ratios, MS-SVR is very close to JPEG in peak signal-to-noise ratio (PSNR) but exhibits better subjective quality. Furthermore, MS-SVR even outperforms JPEG in both PSNR and subjective quality when the compression ratio is high enough, for example 25:1 for the Lena image. Compared with JPEG-2000, the results show a very similar trend to the JPEG experiments, except that the compression ratio at which MS-SVR begins to outperform JPEG-2000 is somewhat higher.

10.

Memory efficient and scalable address mapping for flash storage devices

《Journal of Systems Architecture》2014,60(4):357-371

11.

An efficient message scheduling algorithm for WDM lightwave networks

《Computer Networks》1999,31(20):2139-2152

Two important issues that need to be addressed when designing medium access control (MAC) protocols for Wavelength Division Multiplexing networks are message sequencing and channel assignment. Channel assignment addresses the problem of choosing an appropriate data channel via which a message is transmitted. This problem has been addressed extensively in the literature. On the other hand, message sequencing, which addresses the order in which messages are sent, has rarely been addressed. In this paper, we propose a new reservation-based message scheduling algorithm called RO-EATS that addresses both the channel assignment and message sequencing during its scheduling process. We formulate an analytical model and conduct extensive simulations to evaluate the performance of this algorithm. We compare the performance results of a well-known algorithm which only addresses the channel assignment issue with those of our new algorithm. The comparison shows that our new algorithm gives significant improvement over scheduling algorithms that do not consider message sequencing. As a result, we anticipate that these research results will lead to new approaches to message scheduling on WDM networks.
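
Why message sequencing matters alongside channel assignment can be seen in a toy greedy scheduler, sketched in Python. It is not RO-EATS. It simply assumes one tunable transmitter and one receiver per node, assigns each message to the data channel that becomes free earliest, and, when sequencing is enabled, picks the pending message that can start soonest instead of taking messages in FIFO order. The node names and message lengths in the example are made up.

```python
def schedule(messages, num_channels, sequence=True):
    """Greedy schedule of (sender, receiver, length) messages; returns the makespan."""
    channel_free = [0] * num_channels  # time each data channel becomes idle
    tx_free = {}  # sender -> time its single transmitter is idle
    rx_free = {}  # receiver -> time its single receiver is idle
    pending = list(messages)
    makespan = 0
    while pending:
        # Channel assignment: use the channel that becomes available earliest.
        ch = min(range(num_channels), key=lambda c: channel_free[c])

        def start(msg):
            src, dst, _ = msg
            return max(channel_free[ch], tx_free.get(src, 0), rx_free.get(dst, 0))

        # Message sequencing: FIFO order, or the pending message that can start first.
        if sequence:
            idx = min(range(len(pending)), key=lambda k: start(pending[k]))
        else:
            idx = 0
        src, dst, length = pending.pop(idx)
        end = start((src, dst, length)) + length
        channel_free[ch] = tx_free[src] = rx_free[dst] = end
        makespan = max(makespan, end)
    return makespan


msgs = [("A", "C", 4), ("B", "C", 4), ("D", "E", 4), ("F", "G", 4)]
print(schedule(msgs, 2, sequence=False), schedule(msgs, 2, sequence=True))  # 12 8
```

In FIFO order, the message from B to C holds the second channel idle while it waits for C's receiver, giving a makespan of 12 time units. Reordering lets the message from D to E use that slot, which brings the makespan down to 8.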

12.

Designing efficient irregular networks for heterogeneous systems-on-chip

Christian Norbert 《Journal of Systems Architecture》2008,54(3-4):384-396

Networks-on-chip will serve as the central integration platform in future complex systems-on-chip (SoC) designs, composed of a large number of heterogeneous processing resources. Most researchers advocate the use of traditional regular networks like meshes, tori or trees as architectural templates, which have gained high popularity in general-purpose parallel computing. However, most SoC platforms are special-purpose platforms tailored to the domain-specific requirements of their application. They are usually built from a large diversity of heterogeneous components which communicate in a very specific, mostly irregular way.

In this work, we propose a methodology for the design of customized irregular networks-on-chip, called INoC. We take advantage of a priori knowledge of the communication characteristics of the application to generate an optimized network topology and routing algorithm. We show that, for irregular workloads, customized irregular networks are clearly superior to traditional regular architectures in terms of performance at comparable implementation costs. Moreover, they inherently offer a high degree of scalability and expandability, which allows the network to be adapted to an arbitrary number of nodes with a given communication demand. This normally cannot be accomplished by traditional approaches.

13.

Automatic synthesis of compression techniques for heterogeneous files

William H. Hsu Amy E. Zwarico 《Software》1995,25(10):1097-1116

We present a compression technique for heterogeneous files, that is, files that contain multiple types of data such as text, images, binary, audio, or animation. The system uses statistical methods to determine the best algorithm for compressing each block of data in a file (possibly a different algorithm for each block). The file is then compressed by applying the appropriate algorithm to each block. We obtain better savings than are possible by using a single algorithm to compress the whole file. The implementation of a working version of this heterogeneous compressor is described, along with examples of its value in improving compression in both theoretical and applied contexts. We compare our results with those obtained using four commercially available compression programs, PKZIP, Unix compress, StuffIt, and Compact Pro, and show that our system provides better space savings.
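
A minimal sketch of per-block codec selection, using Python's standard zlib, bz2, and lzma modules, is shown here. It makes one deliberate substitution. Instead of the statistical selection the paper describes, it trial-compresses each block with every codec and keeps the smallest result. It also stores a one-byte codec tag and a 4-byte length before each block so the output can be decoded. The block size and this container layout are assumptions.

```python
import bz2
import lzma
import struct
import zlib

# Candidate codecs keyed by the one-byte tag written before each block.
CODECS = {
    0: (lambda b: b, lambda b: b),  # stored uncompressed (e.g. random data)
    1: (zlib.compress, zlib.decompress),
    2: (bz2.compress, bz2.decompress),
    3: (lzma.compress, lzma.decompress),
}
HEADER = struct.Struct(">BI")  # codec tag, payload length (5 bytes, big-endian)


def compress_blocks(data: bytes, block_size: int = 64 * 1024) -> bytes:
    out = bytearray()
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        # Try every codec on this block and keep the smallest encoding.
        tag, payload = min(((t, enc(block)) for t, (enc, _) in CODECS.items()),
                           key=lambda item: len(item[1]))
        out += HEADER.pack(tag, len(payload)) + payload
    return bytes(out)


def decompress_blocks(blob: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(blob):
        tag, n = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        out += CODECS[tag][1](blob[pos:pos + n])
        pos += n
    return bytes(out)
```

Trial compression is the simplest way to guarantee that each block gets its locally best codec, at the cost of running every codec on every block. A statistical predictor such as the one in the paper avoids that cost by guessing the winner from the block's content.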

14.

Exploiting hierarchy parallelism for molecular dynamics on a petascale heterogeneous system

Qiang Wu Canqun Yang Tao Tang Liquan Xiao 《Journal of Parallel and Distributed Computing》2013

Heterogeneous systems with nodes containing more than one type of computation unit, e.g., central processing units (CPUs) and graphics processing units (GPUs), are becoming popular because of their low cost and high performance. In this paper, we have developed a Three-Level Parallelization Scheme (TLPS) for molecular dynamics (MD) simulation on heterogeneous systems. The scheme exploits multi-level parallelism combining (1) inter-node parallelism using spatial decomposition via message passing, (2) intra-node parallelism using spatial decomposition via dynamically scheduled multi-threading, and (3) intra-chip parallelism using multi-threading and short vector extensions in CPUs, and multiple CUDA threads in GPUs. By using a hierarchy of parallelism with optimizations such as intra-node communication hiding and memory optimizations in both CPUs and GPUs, we have implemented and evaluated an MD simulation on the petascale heterogeneous supercomputer TH-1A. The results show that MD simulations can be efficiently parallelized with our TLPS scheme and can benefit from the optimizations.

15.

CCS-MAC: Exploiting the overheard data for compression in wireless sensor networks

Y. Peng Hu R. Li S. Wang Zhou Y. Ping Lin 《Computer Communications》2011,34(14):1696-1707

Both overhearing and overhearing avoidance in a densely deployed sensor network may inevitably incur considerable power consumption. In this paper we propose CCS-MAC (collaborative compression strategy-based MAC), a MAC protocol that exploits overheard data, which traditional MAC protocols treat as useless, in order to save cost and energy. In particular, CCS-MAC enables sensor nodes to compress data cooperatively using the overheard data, so that redundancy in the data prepared for link-layer transmission can be eliminated entirely and as early as possible. The collaborative compression problem is analyzed and formulated as a linear programming model. Based on this model, a heuristic node-selection algorithm with time complexity O(N²) is proposed to solve the linear programming problem. The node-selection algorithm is implemented in CCS-MAC at each sensor node in a distributed manner. Experimental results verify that the proposed CCS-MAC scheme achieves significant energy savings and thereby prolongs the lifetime of sensor networks.

16.

StreamQCTree: a compressed structure for stream data cubes

Gan Liang, Liu Donghong, Jia Yan, Han Weihong 《Computer Engineering and Applications》2011,47(19):140-143

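A data stream management system keeps the results of aggregate queries in memory as a stream data cube (StreamCube), providing fast and accurate online OLAP queries. Because memory is limited, an effective storage method is needed to hold stream data cubes over larger time windows. This paper proposes StreamQCTree, a method for generating, pruning, and querying stream data cubes based on the QC-Tree structure. The upper-bound set in the QC-Tree structure is divided into basic upper-bound classes and additional upper-bound classes, and a cost model for the additional upper-bound classes is analyzed. Based on this model, under a fixed storage budget, a scheme that dynamically selects nodes to materialize is used to materialize a subset of the additional upper-bound classes so that the average query response time on StreamQCTree is minimized. Experiments show that StreamQCTree accesses the data cube efficiently and achieves good compression.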
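
Choosing which upper-bound classes to materialize under a fixed storage budget resembles a knapsack problem. A common greedy heuristic, sketched here in Python, ranks candidates by estimated benefit per unit of storage and takes them while space remains. This is an illustrative stand-in, not the paper's dynamic selection scheme. The (name, size, benefit) tuples, positive sizes, and the assumption that benefits are independent of one another are all hypothetical.

```python
def select_materialized(candidates, budget):
    """Greedily choose nodes to materialize within a storage budget.

    candidates: list of (name, size, benefit), where size > 0 is the storage
    cost and benefit is the assumed reduction in total query response time
    if that node is materialized (treated as independent of other choices).
    """
    chosen, used = [], 0
    # Highest benefit per unit of storage first.
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    for name, size, _benefit in ranked:
        if used + size <= budget:
            chosen.append(name)
            used += size
    return chosen


print(select_materialized([("a", 4, 8), ("b", 2, 6), ("c", 3, 3), ("d", 1, 1)], 6))
```

In the example, "b" (ratio 3) and "a" (ratio 2) fill the budget of 6 exactly, so "c" and "d" are skipped. Real cube-materialization benefits interact, since materializing one node can make another redundant; practical schemes therefore usually recompute benefits after each pick.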

17.

Analysis and design on efficient message relay methods in VANET

Donggeun Lee Sang-woo Chang Sang-sun Lee 《Multimedia Tools and Applications》2015,74(16):6331-6340

18.

A message efficient intersection control algorithm for intelligent transportation in smart cities

《Future Generation Computer Systems》2017

IoV-based traffic control at intersections has recently been studied widely as a way to realize intelligent transportation. However, existing solutions usually suffer from high communication cost, which may cause serious packet interference and long delays. In this paper, we design a new algorithm that realizes intersection control via vehicular ad hoc networking. We adopt a mutual exclusion approach, in which vehicles at an intersection compete for the privilege of passing via message exchange. Unlike existing works, we adopt a group-based privilege competition design: by letting only the group head handle requests from other lanes, message cost can be significantly reduced. Reducing communication cost is a significant issue in IoV-based intersection control because high communication cost causes packet interference and packet losses, which in turn lead to safety problems. The key challenges lie in determining the group size and identifying the group head. Compared with similar works, our new algorithm conducts intersection control with much lower message cost, and its advantage is validated by simulations using ns3.

19.

Real-time compression architecture for efficient coding in autostereoscopic displays

D. P. Chaikalis N. P. Sgouros D. E. Maroulis M. S. Sangriotis 《Journal of Real-Time Image Processing》2010,5(1):45-56

Integral imaging is a promising technique for delivering high-quality three-dimensional content. However, the large amount of data produced during acquisition prohibits direct transmission of Integral Image data. A number of highly efficient compression architectures that outperform standard two-dimensional encoding schemes have been proposed. However, real-time compression for quality-demanding applications remains a primary concern for existing Integral Image encoders. In this work we propose a real-time FPGA-based encoder for Integral Image and integral video content transmission. The proposed encoder is based on a highly efficient compression algorithm used in Integral Imaging applications. Real-time performance is achieved by a pipelined architecture that takes into account the specific structure of an Integral Image. The required memory access operations are minimized by adopting a systolic data flow through the core processing elements, which further boosts performance. The encoder targets real-time, broadcast-type, high-resolution Integral Image and video sequences and performs three orders of magnitude faster than the analogous software approach.

20.

A message combining approach for efficient array redistribution in non-all-to-all communication networks

《International Journal of Computer Mathematics》2012,89(11):1609-1619

The array redistribution problem is at the heart of a number of applications in parallel computing. This paper presents a message combining approach for scheduling runtime redistribution of one-dimensional arrays. The important contribution of the proposed scheme is that it eliminates the need for local data reorganization, as noted by Sundar in 2001: the blocks destined for each processor are combined in a series of messages exchanged between neighbouring nodes, so that the receiving processors do not need to reorganize the incoming data blocks before storing them in memory. The issue of local data reorganization is especially important in networks without direct communication between all nodes (such as tori, meshes, and trees), where a block must travel through a number of relays before reaching the target processor. This generates more messages and, therefore, requires more data permutations within the memory of each target processor to ensure correct data order. The strategy is based on a relation between groups of communicating processor pairs called superclasses.
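
To make the redistribution problem concrete, here is a short Python sketch. It assumes block-cyclic source and destination layouts, which the abstract does not specify, and computes which global elements each processor must send to each other processor when a one-dimensional array is redistributed from a block-cyclic(r) to a block-cyclic(s) layout. It only derives these communication sets; the paper's message combining over relays and its superclass construction are not modelled, and the function names are hypothetical.

```python
from collections import defaultdict


def owner(index, block, nprocs):
    """Processor that owns a global index under a block-cyclic(block) layout."""
    return (index // block) % nprocs


def communication_sets(n, src_block, dst_block, nprocs):
    """Map (source proc, destination proc) -> global indices that must move."""
    sets = defaultdict(list)
    for g in range(n):
        src = owner(g, src_block, nprocs)
        dst = owner(g, dst_block, nprocs)
        if src != dst:  # elements that stay on the same processor need no message
            sets[(src, dst)].append(g)
    return dict(sets)


# 8 elements on 2 processors, block size 2 -> block size 1 (cyclic).
print(communication_sets(8, 2, 1, 2))
```

Each (source, destination) pair with a non-empty set corresponds to at least one message. On a network without direct links between all nodes, these messages must be routed through relays, which is where combining them, as the paper proposes, pays off.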