Similar literature
1.
This paper compares implementation strategies for function calls in compiled Lisp. We discuss various ways of allowing compiled call instructions to branch immediately to a callee's code (direct calls), rather than to refer to a symbol that points to the function's definition (indirect calls). We examine the performance of direct and indirect function calls on the VAX and MC68020, and on a RISC architecture, the SPUR multiprocessor. For the SPUR architecture, single indirection slows applications by 3–4%, and double indirection slows applications by 6–8%. The performance benefits of direct function calls are considerably smaller for the VAX and MC68020 architectures. We also discuss the costs and complexities involved in implementing direct function calls. This research was funded by DARPA contract number N00039-85-C-0269 as part of the SPUR research project.
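The distinction between direct and indirect calls can be illustrated with a minimal sketch. It is not taken from the paper: the `Symbol` class, the two `square` versions, and the caller factories are hypothetical names, and Python closures only model the binding behaviour, not the machine-level cost. It shows one reason direct calls are more complex to implement: a directly linked caller keeps using the old code after a redefinition unless it is relinked.

```python
# Hypothetical model of a Lisp symbol with a function cell (not from the paper).
class Symbol:
    def __init__(self, name, function):
        self.name = name
        self.function = function  # the function cell


def square_v1(x):
    return x * x


def square_v2(x):
    # Deliberately different result so a redefinition is observable.
    return x * x + 1


def make_indirect_caller(symbol):
    # Indirect call: read the symbol's function cell on every call.
    def caller(x):
        return symbol.function(x)
    return caller


def make_direct_caller(symbol):
    # Direct call: resolve the callee once, when the caller is "linked".
    target = symbol.function

    def caller(x):
        return target(x)
    return caller


SQUARE = Symbol("square", square_v1)
indirect = make_indirect_caller(SQUARE)
direct = make_direct_caller(SQUARE)

# Redefine the function after both callers have been created.
SQUARE.function = square_v2

indirect_result = indirect(3)  # follows the symbol, so it sees square_v2
direct_result = direct(3)      # still bound to square_v1 until relinked
```

In this sketch the indirect caller pays an extra lookup per call but always sees the current definition, while the direct caller avoids the lookup at the price of having to be patched whenever the callee is redefined.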

2.
The Motorola MC68020 is one of the first 32-bit microprocessors. The 68020 is part of the 68000 family, which has a register-based architecture. In the 68020 a number of instructions and addressing modes, and some data types, have been implemented, to increase input and to help in the implementation of modular high-level languages and their associated constructs and data structures. The 68020 is made with the HCMOS process.

3.
K. M. Liu  E. L. Ortiz 《Computing》1989,41(3):205-217
We apply a recent formulation of the Tau Method to reduce the numerical treatment of eigenvalue problems for ordinary and partial functional-differential equations to that of generalized algebraic eigenvalue problems. We find accurate numerical results through the use of a simple algorithm, which we discuss in applications to several concrete examples. Extrapolation is used to refine the results already obtained.

4.
Multicore processors can provide sufficient computing power and flexibility for complex streaming applications, such as high-definition video processing. For less hardware complexity and power consumption, the distributed scratchpad memory architecture is considered, instead of the cache memory architecture. However, the distributed design poses new challenges to programming. It is difficult to exploit all available capabilities and achieve maximal throughput, due to the combined complexity of inter-processor communication, synchronization, and workload balancing. In this study, we developed an efficient design flow for parallelizing multimedia applications on a distributed scratchpad memory multicore architecture. An application is first partitioned into streaming components and then mapped onto multicore processors. Various hardware-dependent factors and application-specific characteristics are involved in generating efficient task partitions and allocating resources appropriately. To test and verify the proposed design flow, three popular multimedia applications were implemented: a full-HD motion JPEG decoder, an object detector, and a full-HD H.264/AVC decoder. For demonstration purposes, SONY PlayStation® 3 was selected as the target platform. Simulation results show that, on PS3, the full-HD motion JPEG decoder with the proposed design flow can decode about 108.9 frames per second (fps) in the 1080p format. The object detection application can perform real-time object detection at 2.84 fps at 1280 × 960 resolution, 11.75 fps at 640 × 480 resolution, and 62.52 fps at 320 × 240 resolution. The full-HD H.264/AVC decoder application can achieve nearly 50 fps.

5.
The Chinese information processing system (CIPS) introduced in this paper can produce graphs, tables, flowcharts, mathematical equations, and forms, and also provides typesetting facilities. The system can process not only Chinese text but also English text or a mixture of the two. It is written in C and runs on a VAX 11/780 under the Unix operating system. CIPS is very easy to use and provides user-defined macros, which allow abbreviations of commonly used Chinese phrases and reduce the complexity of Chinese character coding.

6.
More recently, personal area network connectivity in vehicles has evolved, bringing with it a new set of challenges and associated opportunities. Principal among these challenges is the last-inch problem. The telecommunications industry uses the term "last mile" to refer to the challenge of bringing high-bandwidth connectivity to individual homes. We use the term "last inch" to characterize the challenges of delivering computing services through in-vehicle human-machine interfaces (HMIs) to users who are sometimes driving at speeds upwards of 70 miles per hour. We focus on the nonfunctional requirements of security, privacy, usability, and reliability (SPUR). These attributes both encompass safety concerns and offer insight into the consumer experience. We took the SPUR requirements into account in designing our vehicle consumer services interface (VCSI), a service-oriented middleware architecture that we have implemented in a demonstration vehicle.

7.
Battery life is a major concern on portable devices like smartphones and tablet PCs. On these devices, games constitute the most popular class of applications and are at the same time highly compute-intensive. Every game consists of several states, such as the loading, main menu, and gaming states. Each of those states has its own workload characteristics, e.g., the loading phase is likely to be memory bound and the main menu state is less interactive than the gaming state. We propose an interception technique that allows us to profile the game and detect its current state based on the game’s communication with the underlying OS. Current power management governors are unaware of the running applications and scale the processor’s voltage and frequency merely based on the system’s utilization. We provide the game’s state information and workload profile to our governor, which selects the processing frequency such that the desired frame rate of the current state is ensured. This leads to an optimal choice of processing frequencies and thereby significantly reduces power consumption. We have implemented the scheme on an Android-based Samsung Galaxy Nexus smartphone using popular games like Jetpack Joyride and Temple Run. We reduced the CPU’s power consumption by up to 43.2% compared to the Android interactive governor without impacting the gaming experience. Motivated by these results, we propose a power management API that would allow game developers to significantly reduce the power consumption of their game using simple API calls.
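The idea of a state-aware governor can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' governor: the frequency table, the per-state frame-rate targets, and the cycles-per-frame estimates are all invented numbers, and the model considers only CPU cycles (it ignores the memory-bound behaviour the abstract mentions for the loading state).

```python
# All values below are hypothetical and chosen only for illustration.
AVAILABLE_FREQS_MHZ = [350, 700, 920, 1200]

# Per-state profile: (target frames per second, estimated megacycles per frame).
STATE_PROFILES = {
    "loading": (10, 30.0),
    "main_menu": (30, 20.0),
    "gaming": (60, 15.0),
}


def select_frequency(state):
    """Return the lowest available frequency that sustains the state's target fps."""
    target_fps, mcycles_per_frame = STATE_PROFILES[state]
    # Megacycles needed per second equals the required frequency in MHz.
    required_mhz = target_fps * mcycles_per_frame
    for freq in sorted(AVAILABLE_FREQS_MHZ):
        if freq >= required_mhz:
            return freq
    # No frequency is high enough: fall back to the maximum.
    return max(AVAILABLE_FREQS_MHZ)
```

With these assumed profiles, `select_frequency("gaming")` needs 900 MHz and therefore picks the 920 MHz step, whereas a purely utilization-based policy has no frame-rate target to aim for.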

8.
This paper comparatively evaluates the microarchitectural performance of two representative Computational Fluid Dynamics (CFD) applications on the Intel Many Integrated Core (MIC) product, the Intel Knights Corner (KNC) coprocessor, and the Intel Sandy Bridge (SNB) processor. A Performance Monitoring Unit-based measurement method is used, along with a two-phase measurement method and some considerations to minimize errors and instabilities. The results show that the CFD applications are sensitive to architectural factors. Their single-thread performance and efficiency on KNC are much lower than on SNB. Branch prediction and memory access are the two primary factors behind the performance difference. The applications’ low computational intensity and inefficient vector instruction usage are two additional factors. To be more efficient for the CFD applications, the MIC architecture needs to improve its branch prediction mechanism and memory hierarchy. Fine-tuning of application codes is also crucial and is hard work.

9.
The sequential Prolog machine PEK currently under development is described. PEK is an experimental machine designed for high speed execution of Prolog programs. The PEK machine is controlled by horizontal-type microinstructions. The machine includes bit slice microprocessor elements comprising a microprogram sequencer and ALU, and possesses hardware circuits for unification and backtracking. The PEK machine consists of a host processor (MC68000) and a backend processor (PEK engine). A Prolog interpreter has been developed on the machine and the machine performance evaluated. A single inference can be executed in 89 microinstructions, and execution speed is approximately 60–70 KLIPS.

10.
Prolog-X is an implemented portable interactive sequential Prolog system in which clauses are incrementally compiled for a virtual machine called the ZIP Machine. At present, the ZIP Machine is emulated by software, but it has been designed to permit easy implementation in microcode or hardware. Prolog-X running on the software-based emulator provides performance comparable with existing Prolog interpreters. To demonstrate its efficiency, compatibility, and comprehensiveness of implementation, Prolog-X has been used to compile and run several large application programs. Several novel techniques are used in the implementation, particularly in the areas of the representation of the recordx database, the selection of clauses, and the compilation of arithmetic expressions.

11.
Radio frequency identification (RFID) tags have been widely deployed in many applications, such as supply chain management, inventory control, and traffic card payment. However, these applications can suffer from security issues or privacy violations when the underlying data-protection techniques are not properly designed. Hence, many secure RFID authentication protocols have been proposed. According to the resource usage of the tags, secure RFID protocols are classified into four types: full-fledged, simple, lightweight, and ultra-lightweight. In general, non-full-fledged protocols are vulnerable to desynchronization, impersonation, and tracking attacks, and they also lack scalability. If the tag resources allow more flexibility, full-fledged protocols seem to be an attractive solution. In this study, we examine full-fledged RFID authentication protocols and discuss their security issues. We then design a novel RFID authentication protocol based on elliptic curve cryptography, to avoid these issues. In addition, we present a detailed security analysis and a comparison with related studies; the results show that our scheme is more resistant to a variety of attacks and that it has the best scalability, while maintaining competitive levels of efficiency.

12.
The debuggers of Ref. 11) and most of their derivatives are of the meta-interpreter type. The computation of the debugger tracks the computation of the program to be diagnosed at the level of procedure call. This is adequate if the intuitive understanding of the programmer is in terms of procedure calls, as is indeed the case in Prolog. In LDL, however, while the semantics of the language are described in a bottom-up, fixpoint model of computation,8) the actual execution of a program is a complex sequence of low-level procedure calls determined (and optimized) by the compiler. Consequently, a trace of these procedure calls is of little use to the programmer. Further, one cannot “execute” an LDL program as if it were a Prolog program; the program may simply not terminate in its Prolog reading, and several LDL constructs have no obvious Prolog counterparts. We identify the origin of a fault in the LDL program by a top-down, query/subquery approach. The basic debugger, implemented in Prolog, is a shell program between the programmer and the LDL program: it poses queries and uses the results to drive the interaction with the user. It closely resembles the one presented in Ref. 11). The core of a more sophisticated debugger is presented as well. Several concepts are introduced in order to quantify debugging abilities. One is that of a generated interpretation, in which the structureless intended interpretation of Ref. 11) is augmented with causality. Another is the (idealized) concept of a reliable oracle. We show that given an incorrect program and a reliable oracle which uses a generated interpretation, a cause for the fault will be found in finitely many steps. This result carries over to the more sophisticated debugger.

13.
In this paper we consider a deductive question-answering system for relational databases as a logic database system, and propose a knowledge assimilation method suitable for such a system. The concept of knowledge assimilation for deductive logic is constructed in an implementable form based on the notion of amalgamating object language and metalanguage. This concept calls for checks to be conducted on four subconcepts (provability, contradiction, redundancy, and independency) and for their corresponding internal database updates. We have implemented this logic database knowledge assimilation program in PROLOG, a logic programming language, and have found PROLOG suitable for knowledge assimilation implementation.

14.
This paper presents a highly parallel machine architecture for logic programs. We propose a Reduction-Based Parallel Inference Machine, PIM-R, and describe the parallel execution mechanisms that allow PIM-R to run Prolog and Concurrent Prolog programs, together with software simulation results. PIM-R uses the structure-copy method. It also uses the only reducible goal copy method, a unique process-structuring method, and the reverse compaction method to decrease the amount of copying, the various copying-related operations, and the number of packets passing through the network. PIM-R architecture features include the distributed shared memory for Concurrent Prolog, network nodes for efficient packet distribution, and the structure memory to store a part of structured data for reducing the copying overhead.

15.
16.
Ziyang Ma  Enhua Wu 《The Visual computer》2014,30(10):1133-1144
In this paper, we introduce a novel, real-time and robust hand tracking system, capable of tracking the articulated hand motion in full degrees of freedom (DOF) using a single depth camera. Unlike most previous systems, our system is able to initialize and recover from tracking loss automatically. This is achieved through an efficient two-stage k-nearest neighbor database searching method proposed in the paper. It is effective for searching a pre-rendered database of small hand depth images, designed to provide good initial guesses for model-based tracking. We also propose a robust objective function, and improve the Particle Swarm Optimization algorithm with a resampling-based strategy in model-based tracking. It provides continuous solutions in full DOF hand motion space more efficiently than previous methods. Our system runs at 40 fps on a GeForce GTX 580 GPU, and experimental results show that the system outperforms the state-of-the-art model-based hand tracking systems in terms of both speed and accuracy. The result is of significance to various applications in the fields of human–computer interaction and virtual reality.

17.
High definition video applications often demand heavy computation, high bandwidth, and large amounts of memory, which makes their real-time implementation difficult. Multi-core architecture with parallelism provides new solutions to implementing complex multimedia applications in real time. It is well known that the speed of the H.264 encoder can be increased on a multi-core architecture using the parallelism concept. Most of the parallelization methods proposed earlier for these purposes suffer from the drawbacks of limited scalability and data dependency. In this paper, we present a result obtained using data-level parallelism at the Group-Of-Pictures (GOP) level for the video encoder. The proposed technique involves each GOP being encoded independently and is implemented on JM 18.0 using advanced data structures and OpenMP programming techniques. The performance of the parallelized video encoder is evaluated for various resolutions based on parameters such as encoding speed, bit rate, memory requirements, and PSNR. The results show that with GOP-level parallelism, very high speed-up values can be achieved without much degradation in the video quality.
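GOP-level data parallelism can be sketched in a few lines. The paper uses OpenMP inside the JM 18.0 encoder; the sketch below swaps in a Python thread pool purely to show the structure, and `encode_gop` is a hypothetical stand-in that only labels frames as I or P rather than encoding anything. The point is that each GOP starts with its own intra frame, so GOPs have no data dependencies on each other and can be handed to independent workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_gops(frames, gop_size):
    """Cut the frame sequence into consecutive groups of pictures."""
    return [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]


def encode_gop(gop):
    # Hypothetical stand-in for a real encoder: the first frame of each GOP
    # is marked "I" (intra) and the rest "P" (predicted), so no GOP refers
    # to frames outside itself.
    return [("I" if i == 0 else "P", frame) for i, frame in enumerate(gop)]


def encode_video(frames, gop_size=4, workers=4):
    gops = split_into_gops(frames, gop_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so the output stays in
        # display order even if GOPs finish at different times.
        encoded_gops = list(pool.map(encode_gop, gops))
    return [item for gop in encoded_gops for item in gop]
```

For example, `encode_video(list(range(10)), gop_size=4)` yields three GOPs of sizes 4, 4, and 2, each beginning with an I frame.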

18.
lcc is a retargetable, production compiler for ANSI C; it has been ported to the VAX, Motorola 68020, SPARC, and MIPS R3000, and some versions have been in use for over a year and a half. It is smaller and faster than generally available alternatives, and its local code is comparable. This paper describes the interface between the target-independent front end and the target-dependent back ends. The interface consists of shared data structures, a few functions, and a dag language. While this approach couples the front and back ends tightly, it results in efficient, compact compilers. The interface is illustrated by detailing a code generator that emits naive VAX code.

19.
Chee-Keng Yap 《Algorithmica》1988,3(1-4):279-288
We give a parallel method for triangulating a simple polygon by two (parallel) calls to the trapezoidal map computation. The method is simpler and more elegant than previous methods. Along the way we obtain an interesting partition of one-sided monotone polygons. Using the best-known trapezoidal map algorithm, ours runs in time O(log n) using O(n) CREW PRAM processors.

20.
Multi-core processors and clusters of multi-core processors are ubiquitous. They provide scalable performance but introduce complex and low-level programming models for shared and distributed memory programming. Thus, fully exploiting the potential of shared and distributed memory parallelization can be a tedious and error-prone task: programmers must take care of low-level threading and communication (e.g. message passing) details. To assist programmers in developing performant and reliable parallel applications, Algorithmic Skeletons have been proposed. They encapsulate well-defined, frequently recurring parallel and distributed programming patterns, thus shielding programmers from low-level aspects of parallel and distributed programming. In this paper we take on the design and implementation of the well-known Farm skeleton. To address the hybrid architecture of multi-core clusters, we present a two-tier implementation built on top of MPI and OpenMP. On the basis of three benchmark applications, including a simple ray tracer, an interacting particles system, and an application for calculating the Mandelbrot set, we illustrate the advantages of both skeletal programming in general and this two-tier approach in particular.
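A Farm skeleton with two tiers might look roughly like the sketch below. It is an assumption-laden illustration, not the paper's implementation: the real system uses MPI between nodes and OpenMP within a node, whereas here the "nodes" are simulated in one process and visited one after another, and only the inner tier actually runs in parallel via a thread pool. The names `farm` and `mandelbrot_iterations` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def mandelbrot_iterations(c, max_iter=50):
    """Count iterations of z -> z*z + c before |z| exceeds 2 (capped at max_iter)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def farm(worker, tasks, nodes=2, threads_per_node=2):
    """Apply worker to every task, preserving task order in the result."""
    if not tasks:
        return []
    # Tier 1 (stands in for MPI): deal contiguous chunks of tasks to nodes.
    chunk = (len(tasks) + nodes - 1) // nodes
    chunks = [tasks[i:i + chunk] for i in range(0, len(tasks), chunk)]
    results = []
    # In a real cluster the nodes would work concurrently as separate
    # processes; this simulation handles them sequentially.
    for node_tasks in chunks:
        # Tier 2 (stands in for OpenMP): threads share the node's chunk.
        with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
            results.extend(pool.map(worker, node_tasks))
    return results
```

The caller supplies only the worker function and the task list, for example `farm(mandelbrot_iterations, points)`, and never touches chunking or thread management, which is the shielding effect the skeleton is meant to provide.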
