期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

KAIST image computing system (KICS): A parallel architecture for real-time multimedia data processing

JaeHo Hyung-Sun GeonYoung HyunWook 《Journal of Systems Architecture》2000,46(15):1403-1418

An efficient parallel architecture is proposed for high-performance multimedia data processing using multiple multimedia video processors (MVP; TMS320C80), which are fully programmable general digital signal processors (DSP). This paper describes several requirements for a multimedia data processing system and the system architecture of an image computing system called the KAIST Image Computing System (KICS). The performance of the KICS is evaluated in terms of its I/O bandwidth and the execution time for some image processing functions. An application of the KICS to the real-time Moving Picture Expert Group 2 (MPEG-2) encoder is introduced. The programmability and the high-speed data-access capability of the KICS are its most important features as a high-performance system for real-time multimedia data processing. 相似文献

2.

An FPGA-based parallel architecture for on-line parameter estimation using the RLS identification algorithm

T. Ananthan M.V. VaidyanAuthor Vitae 《Microprocessors and Microsystems》2014

A parallel architecture for an on-line implementation of the recursive least squares (RLS) identification algorithm on a field programmable gate array (FPGA) is presented. The main shortcoming of this algorithm for on-line applications is its computational complexity. The matrix computation to update error covariance consumes most of the time. To improve the processing speed of the RLS architecture, a multi-stage matrix multiplication (MMM) algorithm was developed. In addition, a trace technique was used to reduce the computational burden on the proposed architecture. High throughput was achieved by employing a pipelined design. The scope of the architecture was explored by estimating the parameters of a servo position control system. No vendor dependent modules were used in this design. The RLS algorithm was mapped to a Xilinx FPGA Virtex-5 device. The entire architecture operates at a maximum frequency of 339.156 MHz. Compared to earlier work, the hardware utilization was substantially reduced. An application-specific integrated circuit (ASIC) design was implemented in 180 nm technology with the Cadence RTL compiler. 相似文献

3.

Memory requirements for parallel programs

F. Warren Burton David J. Simpson 《Parallel Computing》2000,26(13-14)

Parallel execution is normally used to decrease the amount of time required to run a program. However, the parallel execution may require far more space than that required by the sequential execution. Worse yet, the parallel space requirement may be very much more difficult to predict than the sequential space requirement because there are more factors to consider. These include essentially nondeterministic factors that can influence scheduling, which in turn may dramatically influence space requirements. We survey some scheduling algorithms that attempt to place bounds on the amount of time and space used during parallel execution. We also outline a direction for future research. This direction takes us into the area of functional programming, where the declarative nature of the languages can help the programmer to produce correct parallel programs, a feat that can be difficult with procedural languages. Currently the high-level nature of functional languages can make it difficult for the programmer to understand the operational behavior of the program. We look at some of the problems in this area, with the goal of achieving a programming environment that supports correct, efficient parallel programs. 相似文献

4.

基于布隆过滤器的新型混合内存架构磨损均衡策略

张震付印金胡谷雨《计算机应用》2018,38(8):2230-2235

相变存储器（PCM）凭借低功耗的优势有望成为新一代主存储器,但是耐受性的缺陷成为其广泛应用的重要障碍。现有的随机存取存储器（DRAM）缓存技术和磨损均衡分别从减少PCM写数量以及均匀化写操作分布两个角度延长PCM使用寿命,但前者在写回数据时未考虑数据的读写倾向性,后者在空间局部性较强的应用场景下存在数据交换粒度、空间开销、随机性等诸多问题。因此,设计一种全新的混合存储架构,结合最近最少使用（LRU）算法和带有时间变化的最不经常使用（LFU-Aging）算法提出区分数据读写倾向性的缓存策略,并且基于布隆过滤器（BF）设计针对强空间局部性工作集的动态磨损均衡算法,在有效减少冗余写操作的同时实现低空间开销的组间磨损均衡操作。实验结果表明,该策略能够减少PCM上13.4%~38.6%的写操作,同时有效均匀90%以上分组的写操作分布。相似文献

5.

The impact of scaling on a multimedia connection architecture

Eve M. Schooler 《Multimedia Systems》1993,1(1):2-9

As the last two meetings of the Internet Engineering Task Force have shown, the demand for Internet teleconferencing has arrived. Packet audio and video have now been multicast to approximately 170 different hosts in ten countries, and for upcoming meetings the number of remote participants is likely to be substantially larger. Yet the network infrastructure to support wide-scale packet teleconferencing is not in place. These experiments represent a departure from the two- to ten-site telemeetings that are the norm today. They represent an increase in scale of multiple orders of magnitude in several interrelated dimensions.This paper discusses the impact of scaling on our efforts to define a multimedia teleconferencing architecture. Three scaling dimensions of particular interest are: (1) very large numbers of participants per conference, (2) many simultaneous teleconferences, and (3) a widely dispersed user population. Here we present a strawman architecture and describe how conference-specific information is captured, then conveyed among end systems. We provide a comparison of connection models and outline the tradeoffs and requirements that change as we travel along each dimension of scale. In conclusion, we identify five critical needs for a scalable teleconferencing architecture. 相似文献

6.

A survey of memory architecture for 3D chip multi-processors

Yuang Zhang Li Li Zhonghai Lu Axel Jantsch Minglun Gao Hongbing Pan Feng Han 《Microprocessors and Microsystems》2014

3D chip multi-processors (3D CMPs) combine the advantages of 3D integration and the parallelism of CMPs, which are emerging as active research topics in VLSI and multi-core computer architecture communities. One significant potentiality of 3D CMPs is to exploit the diversity of integration processes and high volume of vertical TSV bandwidth to mitigate the well-known “Memory Wall” problem. Meanwhile, the 3D integration techniques are under the severe thermal, manufacture yield and cost constraints. Research on 3D stacking memory hierarchy explores the high performance and power/thermal efficient memory architectures for 3D CMPs. The micro-architectures of memories can be designed in the 3D integrated circuit context and integrated into 3D CMPs. This paper surveys the design of memory architectures for 3D CMPs. We summarize current research into two categories: stacking cache-only architectures and stacking main memory architectures for 3D CMPs. The representative works are reviewed and the remaining opportunities and challenges are discussed to guide the future research in this emerging area. 相似文献

7.

基于业务区分的无线多媒体传感器网络QoS体系结构

下载免费PDF全文

晏鳌白光伟曹磊《计算机工程与应用》2012,48(3):57-60

无线多媒体传感器网络（WMSNs）的网络服务质量（QoS）一直是人们关心的核心问题,然而目前WMSNs的QoS保障研究主要针对单个协议层或特定应用场景,缺少系统性的QoS体系框架研究。结合无线传感器网络自身特点,利用图论对网络进行建模。在此基础上,提出一种三层可计算QoS指标体系,并根据各种应用不同QoS需求将应用分为四类,设计出一种基于业务区分的无线多媒体传感器网络QoS体系结构（DQoSAW）。以传输MPEG视频流为例对DQoSAW进行验证,实验结果表明DQoSAW能够显著改进WMSNs的整体性能。相似文献

8.

Intermediate-level vision tasks on a memory array architecture

Poras T. Balsara Mary Jane Irwin 《Machine Vision and Applications》1993,6(1):50-65

With the fast advances in the area of computer vision and robotics there is a growing need for machines that can understand images at very high speed. A conventional von Neumann computer is not suitable for this purpose, because it takes a tremendous amount of time to solve most typical image analysis problems. Thus, it is now imperative to study computer vision in a parallel processing framework in order to reduce the processing time. In this paper we demonstrate the applicability of a simple memory array architecture to some intermediate-level computer vision tasks. This architecture, called theAccess Constrained Memory Array Architecture (ACMAA) has a linear array of processors which concurrently access distinct rows or columns of an array of memory modules. Because of its efficient local and global communication capabilities ACMAA is well suited for low-level as well as intermediate-level vision tasks. This paper presents algorithms for connected component labeling, determination of area, perimeter and moments of a labeled region, convex hull of a region, and Hough transform of an image. ACMAA is well suited to an efficient hardware implementation because it has a modular structure, simple interconnect and limited global control. 相似文献

9.

A throughput maximised parallel architecture for 2D fast Discrete Pascal Transform

M.M. Wong I. Hijazin 《Computers & Electrical Engineering》2010,36(3):585-591

In this paper, we present a fully pipelined parallel implementation of a two dimensional (2D) Discrete Pascal Transform (DPT). Our approach first makes use of the properties of the Kronecker product and the vec operation on matrices to form an alternate 2D DPT representation suitable for column parallel computation. Next, we lend ourselves to the results from Skodras’ work in 1D DPT to achieve the final architecture for fast 2D DPT. With a fully pipelined implementation, the architecture possesses an initial latency of 2(N-1) clock cycles and a maximum throughput of one complete two dimensional transform every clock cycle, given any input matrix of size N×N. To evaluate our work, our results obtained from actual FPGA implementation were benchmarked against results from other previous works. 相似文献

10.

A memory organization for parallel computers

F. A. Murzin V. A. Sluev 《New Generation Computing》1988,6(1):3-18

In this paper a computer memory system intended for storing an arbitrary sequence of multidimensional arrays is described. This memory system permits a parallel access to the cuts distinguished in the given array by fixing one of the coordinates and to the large set of parallelepipeds which are the same dimension subarrays of the given arrays. 相似文献

11.

An architecture for adaptive multimedia content delivery

Kaname Harumoto Tadashi Nakano Shinij Shimojo 《New Generation Computing》2000,18(4):375-389

In this paper, we propose an architecture for multimedia content delivery considering Quality of Service (QoS), based on both the policy-based network and the best-effort network. The architecture consists of four fundamental elements: multimedia content model, application level QoS policy, QoS adaptation mechanism, and delivery mechanism. Applications based on current architecture loses their meaning by drastically degrading quality when network congestion occurs. Despite of this all-or-nothing architecture, applications based on our adaptive architecture can reduce its quality and then negotiate with the network entity, keeping its quality measure as much as possible even when network congestion occurs. We may consider a quality measure for Web pages, total page transmission time, and transmission order of inline objects as a segregation. We then define a language to specify application level QoS policies for Web pages and implement a delivery mechanism and a QoS adaptation mechanism to fulfill these policies. Kaname Harumoto, Ph.D.: He received the M.E. and Ph.D. (Eng.) degrees from Osaka University, Osaka, Japan, in 1994 and 1998, respectively. From 1994 through 1999, he was with the Department of Information Systems Engineering, Grauuate School of Engineering, Osaka University. Since November 1999, he has been an Assistant Professor in Computation Center (currently, the name has changed to Cybermedia Center), Osaka University. His research interests include database systems, especially in advanced network environments. He is a member of IEEE. Tadashi Nakano: He received the B.E. degree from Osaka University in 1999. Currently, he is a Ph.D. candidate in Graduate School of Engineering, Osaka University. His current reeearch interests include multimedia content delivery architecture. Shinji SHIMOJO, Ph.D.: He received the M.E. and a Dr.E. degrees from Osaka University in 1983 and 1986, respectively. From 1986 through 1989, he was an Assistant Professor in the Department of Information and Computer Sciences, Faculty of Engineering Science, Osaka University. From 1989 through 1998, he was an Associate Professor and since 1998, he has been a Professor in Computation Center (currently, the name has changed to Cybermedia Center), Osaka University. He was engaged in the project of object-oriented multimedia presentation system called Harmony. His current interests cover wide diversity of multimedia applications such as News On Demand System, multimedia database and networked virtual reality. He is a member of ACM and IEEE. 相似文献

12.

动态关联神经网络存储器及其并联系统的理论分析和推导

陈洪《自动化与仪器仪表》2006,29(6):21-24

作者研究了n个神经子网络的并联系统（PCS），并联系统的存储容量能达到n的指数级，而神经元的个数只是n的线性函数。此外，并联神经网络存储器的纠错能力远远高于每一个子网络，收敛时间仅仅是所有子网络中的最长者。相似文献

13.

Synchronization barrier and related tools for shared memory parallel programming

Boris D. Lubachevsky 《International journal of parallel programming》1990,19(3):225-250

The synchronization barrier is a point in the program where the processing elements (PEs) wait until all the PEs have arrived at this point. In a reduction computation, given a commutative and associative binary operationop, one needs to reduce valuesa ₀,...,a _N-1, stored in PEs 0,...,N-1 to a single valuea _*=a ₀ op a, op...op a _N-1 and then to broadcast the resulta _* to all PEs. This computation is often followed by a synchronization barrier. Routines to perform these functions are frequently required in parallel programs. Simple and efficient, workingC-language routines for the parallel barrier synchronization and reduction computations are presented. The codes are appropriate for a CREW (concurrent-read-exclusive-write) or EREW parallel random access shared memory MIMD computer. They require only shared memory read and write; no locks, semaphores etc. are needed. The running time of each of these routines isO(logN). The amount of shared memory required and the number of shared memory accesses generated are botO(N). These are the asymptotically minimum values for the three parameters. The algorithms employ the obvious computational scheme involving a binary tree. Examples of applications for these routines and results of performance testing on the Sequent Balance 21000 computer are presented.An abstract of this article appeared inProc. 1989 Int. Conf. Parallel Processing, p. II-175. 相似文献

14.

Minimizing accumulative memory load cost on multi-core DSPs with multi-level memory

《Journal of Systems Architecture》2013,59(7):389-399

In multi-core Digital Signal Processing (DSP) Systems, the processor-memory gap remains the primary obstacle in improving system performance. This paper addresses this bottleneck by combining task scheduling and memory accesses so that the system architecture and memory modules of a multi-core DSP can be utilized as efficiently as possible. To improve the system and memory utilization, the key is to take advantage of locality as much as possible and integrate it into task scheduling. Two algorithms are proposed to optimize memory accesses while scheduling tasks with timing and resource constraints. The first one uses Integer Linear Programming (ILP) to produce a schedule with the most efficient memory access sequence while satisfying the constraints. The second one is a heuristic algorithm which can produce a near optimal schedule with polynomial running time. The experimental results show that the memory access cost can be reduced up to 60% while the schedule length is also shortened. 相似文献

15.

A novel conflict-free parallel memory access scheme for FFT constant geometry architectures

CuiMei Ma He Chen JiYang Yu Teng Long 《中国科学:信息科学(英文版)》2013,56(4):1-9

The challenges imposed by environmental issues, such as global warming and the energy crisis, are demanding more responsible energy usage, including in the optical networking field. In optical transmission networks, most of the electrical power is consumed by the optical-electrical-optical conversion in optical repeaters. Modern optical network control plane technologies allow idle optical repeaters to be put into a low-power sleep mode. Inspired by this, we propose a novel power-efficient routing and wavelength assignment (RWA) algorithm, called HTAPE. The HTAPE algorithm exploits the knowledge of the connection holding times to minimize the number of optical repeaters in the active mode, and hence reduce the total electricity consumption of the optical network. We test the new algorithm on the typical CERNET and USNET networks. Compared with traditional RWA algorithms without holding-time-awareness, it is observed that the HTAPE algorithm yields significant reductions in power consumption. 相似文献

16.

Solution of discrete-time optimal control problems on parallel computers

Stephen J. Wright 《Parallel Computing》1990,16(2-3):221-237

We describe locally-convergent algorithms for discrete-time optimal control problems which are amenable to multiprocessor implementation. Parallelism is achieved both through concurrent evaluation of the component functions and their derivatives, and through the use of a parallel solver which solves a linear system to find the step at each iteration.. Results from an implementation on the Alliant FX/8 are described. 相似文献

17.

The parallel ‘Deutschland-Modell’ - A message-passing version for distributed memory computers

Ulrich Schättler Elisabeth Krenzien 《Parallel Computing》1997,23(14):13-2226

The parallel ‘Deutschland-Modell’ and its implementation on distributed memory parallel computers using the message-passing library PARMACS 6.0 is described. Performance results on a Cray T3D are given and the problem of dynamical load imbalances is addressed. 相似文献

18.

Optimization of data exchange in parallel computers with distributed memory

E. V. Adutskevich N. A. Likhoded 《Cybernetics and Systems Analysis》2006,42(2):298-310

The problem of optimization of communications during the execution of a program on a parallel computer with distributed memory is investigated. Statements are formulated that make it possible to determine the possibility of organization of data broadcast and translation. The conditions proposed are represented in the form suitable for practical application and can be used for automated parallelization of programs. This work was done within the framework of the State Program of Fundamental Studies of the Republic of Belarus (under the code name “Mathematical structures 21”) with the partial support of the Foundation for Fundamental Studies of the Republic of Belarus (grant F03-062). __________ Translated from Kibernetika i Sistemnyi Analiz, No. 2, pp. 166–182, March–April 2006. 相似文献

19.

An architecture and a data model for integrated multimedia documents and presentational applications

H. Khalfallah A. Karmouch 《Multimedia Systems》1995,3(5-6):238-250

相似文献

20.

Locally parallel cache design based on KL1 memory access characteristics

Akira Matsumoto Takayuki Nakagawa Masatoshi Sato Yasunori Kimura Kenji Nishida Atsuhiro Goto 《New Generation Computing》1991,9(2):149-169

The parallel inference machine (PIM) is now being developed at ICOT. It consists of a dozen or more clusters, each of which is a tightly coupled multiprocessor (comprising about eight processing elements) with shared global memory and a common bus. Kernel language 1 (KL1), a parallel logic programming language based on Guarded Horn Clauses (GHC), is executed on each PIM cluster. This paper describes the memory access characteristics in KL1 parallel execution and a locally parallel cache mechanism with hardware lock. The most important issue of locally parallel cache design is how to reduce common bus traffic. A write-back cache protocol having five cache states specially optimized for KL1 execution on each PIM cluster is described. We introduced new software controlled memory access commands, named DW, ER, and RP. A hardware lock mechanism is attached to the cache on each processor. This lock mechanism enables efficient word-by-word locking, reducing common bus traffic by using the cache states. 相似文献