
1.

Two versions of architectures for dynamic implied addressing mode

Jonghee M. Youn, Minwook Ahn, Yunheung Paek, Jongwung Kim, Jeonghun Cho 《Journal of Systems Architecture》2010,56(8):368-383

The complexity of today’s embedded applications is growing along with requirements on execution time, code size, and power consumption. To satisfy these requirements, efficient instruction set design is an important issue, because an instruction customized for a specific application can outperform multiple general instructions in execution time, code size, and power consumption. Limited encoding space, however, does not allow application-specific and complex instructions to be added freely to the instruction set architecture. To resolve this problem, conventional architectures increase the free encoding space by trimming the bits required beyond the fixed word length. This approach, however, shows severe weaknesses in terms of compiler complexity, code size, and execution time. In this paper, we propose a new instruction encoding scheme based on the dynamic implied addressing mode (DIAM) to resolve the limited encoding space and the side effects of trimming. We report two versions of architectures that support our DIAM-based approach. In the first version, we use a special on-chip memory to store extra encoding information. In the second version, we replace the memory with a small on-chip buffer along with a special instruction. We also suggest a code generation algorithm to fully utilize DIAM. In our experiments, the architecture augmented with DIAM shows about 8% code size reduction and 18% speedup on average, compared with the basic architecture without DIAM.

2.

Developing a custom DSP for vision based human computer interaction applications

Shin Jangseop, Kim MoonKwon, Paek Yunheung, Ko Kwangman 《Multimedia Tools and Applications》2018,77(22):30051-30065

As the computing power of modern devices becomes greater, computer vision is increasingly adopted as the means of human-computer interaction. The industry is...

3.

Unified Interprocedural Parallelism Detection

Jay P. Hoeflinger, Yunheung Paek, Kwang Yi 《International Journal of Parallel Programming》2001,29(2):185-215


4.

Efficient embedded code generation with multiple load/store instructions

Yunheung Paek, Minwook Ahn, Doosan Cho, Taehwan Kim 《Software》2007,37(11):1133-1159

In a recent study, we discovered that many single load/store operations in embedded applications can be parallelized and thus encoded simultaneously in a single‐instruction multiple‐data instruction, called the multiple load/store (MLS) instruction. In this work, we investigate the problem of utilizing MLS instructions to produce optimized machine code, and propose an effective approach to the problem. Specifically, we formalize the MLS problem, that is, the problem of maximizing the use of MLS instructions with an unlimited register file size. Based on this analysis, we show that we can solve the problem efficiently by translating it into a variant of the problem of finding a maximum weighted path cover in a dynamic weighted graph. To handle the more realistic case of a finite register file, our solution is then extended to take into account the constraints of register sequencing in MLS instructions and the limited register resources available in the target processor. We demonstrate the effectiveness of our approach experimentally by using a set of benchmark programs. In summary, our approach can reduce the number of loads/stores by 13.3% on average, compared with the code generated by existing compilers. The total code size reduction is 3.6%. This code size reduction comes at almost no cost because the overall increase in compilation time as a result of our technique remains minimal. Copyright © 2007 John Wiley & Sons, Ltd.
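The path-cover formulation mentioned in this abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a simple greedy heuristic on a static graph, and the abstract does not describe how the dynamic weights are defined. The function name `greedy_path_cover`, the edge format, and the reading of each path as one candidate group of memory operations are assumptions made for illustration only.

```python
from __future__ import annotations


def greedy_path_cover(nodes: list[str],
                      edges: list[tuple[int, str, str]]) -> list[list[str]]:
    """Greedy heuristic for a weighted path cover (illustrative only).

    nodes: hypothetical load/store operations.
    edges: (weight, u, v) means "v may directly follow u in one group",
           with weight as an assumed benefit of chaining them.
    Returns vertex-disjoint paths that together cover every node.
    """
    succ: dict[str, str] = {}   # chosen successor of each node
    pred: dict[str, str] = {}   # chosen predecessor of each node
    parent = {n: n for n in nodes}  # union-find over path fragments

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Consider the heaviest edges first.
    for _weight, u, v in sorted(edges, key=lambda e: -e[0]):
        if u in succ or v in pred:
            continue  # u already has a successor, or v a predecessor
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue  # same fragment: the edge would close a cycle
        succ[u] = v
        pred[v] = u
        parent[root_u] = root_v

    # Read the paths back, starting from nodes without a predecessor.
    paths = []
    for n in nodes:
        if n not in pred:
            path = [n]
            while path[-1] in succ:
                path.append(succ[path[-1]])
            paths.append(path)
    return paths
```

A greedy choice like this does not guarantee a maximum-weight cover; it only shows the shape of the problem, in which every operation ends up on exactly one path.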

5.

DADE: a fast data anomaly detection engine for kernel integrity monitoring

Yi Hayoon, Cho Yeongpil, Paek Yunheung, Ko Kwangman 《The Journal of Supercomputing》2019,75(8):4575-4600

In computer systems, ensuring the integrity of the kernel assumes importance as attacks against the kernel allow an adversary to obtain the highest privilege within...

6.

Fast graph‐based instruction selection for multi‐output instructions

Jonghee M. Youn, Jongwon Lee, Yunheung Paek, Jongeun Lee, Hanno Scharwaechter, Rainer Leupers 《Software》2011,41(6):717-736


7.

Optimization techniques to enable execution offloading for 3D video games

Donghyun Kwon, Seungjun Yang, Yunheung Paek, Kwangman Ko 《Multimedia Tools and Applications》2017,76(9):11347-11360

Nowadays, mobile devices are becoming the most popular computing devices as their computing capabilities increase rapidly. However, it is still challenging to execute highly sophisticated applications such as 3D video games on mobile devices because of their constrained computational resources. Execution offloading approaches have been proposed to resolve this problem by strengthening mobile devices with a powerful cloud. Unfortunately, the existing offloading approaches are not suitable for 3D video games because of their unique execution characteristics. In this paper, we propose a streaming-based execution offloading framework to enable execution offloading for 3D video games. The experiments show that our framework successfully guarantees 20 frames per second for our benchmark.

8.

Compiler transformations for effectively exploiting a zero overhead loop buffer

Gang‐Ryung Uh, Yuhong Wang, David Whalley, Sanjay Jinturkar, Yunheung Paek, Vincent Cao, Chris Burns 《Software》2005,35(4):393-412

A Zero Overhead Loop Buffer (ZOLB) is an architectural feature that is commonly found in DSP processors. This buffer can be viewed as a compiler-managed cache that contains a sequence of instructions that will be executed a specified number of times without incurring any loop overhead. Unlike loop unrolling, a loop buffer can be used to minimize loop overhead without the penalty of increasing code size. In addition, a ZOLB requires relatively little space and power, which are both important considerations for most DSP applications. This paper describes strategies for generating code to effectively use a ZOLB. We have found that many common code-improving transformations used by optimizing compilers on conventional architectures can be easily used to (1) allow more loops to be placed in a ZOLB, (2) further reduce loop overhead of the loops placed in a ZOLB, and (3) avoid redundant loading of ZOLB loops. The results given in this paper demonstrate that this architectural feature can often be exploited with substantial improvements in execution time and slight reductions in code size for various signal processing applications. Copyright © 2004 John Wiley & Sons, Ltd.

9.

Code optimizations for a VLIW-style network processing unit

Jinhwan Kim, Yunheung Paek, Gangryung Uh 《Software》2004,34(9):847-874

The explosive growth in network bandwidth and Internet services such as QoS (quality of service) and SLA (service level agreement) monitoring have created the need for new networking hardware called a Network Processing Unit (NPU). In order to rapidly reconfigure the NPU for frequently varying Internet services and technologies, a high-performance C compiler is urgently needed. Several code generation techniques, which are intended to meet the high code quality demands of other types of application-specific instruction-set processors (ASIPs) like digital signal processors (DSPs), have already been developed. However, these techniques are insufficient for NPUs due to striking architectural differences such as asymmetric data paths. The main purpose of this paper is to discuss our recent experience with the development of a commercial compiler for a new NPU called the Paion PPII, which is basically a packet engine for NPUs that meets the growing need for new high-bandwidth communication equipment targeted at Internet routers and Ethernet adapters. For this purpose, we first show the architectural challenges posed by the target NPU. Then, we describe several compiler techniques that we found to be effective for the target NPU with its various non-orthogonal architectural features. The current implementations of the PPII use a VLIW (Very Long Instruction Word) architecture. We therefore handled this VLIW-style architecture by employing a simple code compaction scheme that packs multiple parallel instructions into one long instruction word. The experimental results show that our techniques are effective in significantly reducing the dynamic instruction count. Copyright © 2004 John Wiley & Sons, Ltd.
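The idea of packing independent instructions into one long instruction word can be sketched as follows. This is a generic greedy compaction for straight-line code, not the PPII compiler's scheme, which the abstract does not detail. The `Op` record, the `depends` test, and the uniform `width` slots per word are assumptions; a real NPU with asymmetric data paths would also restrict which operation may occupy which slot.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    """A hypothetical three-address operation."""
    name: str
    dest: Optional[str]      # register written, or None
    srcs: tuple[str, ...]    # registers read


def depends(later: Op, earlier: Op) -> bool:
    """True if `later` must be issued in a later word than `earlier`.

    Conservative: read-after-write, write-after-write, and
    write-after-read conflicts all force a later word.
    """
    raw = earlier.dest is not None and earlier.dest in later.srcs
    waw = earlier.dest is not None and earlier.dest == later.dest
    war = later.dest is not None and later.dest in earlier.srcs
    return raw or waw or war


def compact(ops: list[Op], width: int) -> list[list[Op]]:
    """Pack ops (in program order) into long words of at most `width` ops."""
    words: list[list[Op]] = []
    for op in ops:
        # The op may not go in or before any word holding a conflicting op.
        earliest = 0
        for i, word in enumerate(words):
            if any(depends(op, prev) for prev in word):
                earliest = i + 1
        # Take the first word from `earliest` on that still has a free slot.
        for i in range(earliest, len(words)):
            if len(words[i]) < width:
                words[i].append(op)
                break
        else:
            words.append([op])
    return words
```

With four operations of which only the second depends on the first, a width of 2 yields two long words instead of four single instructions, which is the kind of reduction in instruction count that compaction aims for.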

10.

A proof method for the correctness of modularized 0CFA

Oukseh Lee, Kwangkeun Yi, Yunheung Paek 《Information Processing Letters》2002,81(4):179-185
