Similar literature
 20 similar documents found.
1.
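This paper gives an easily implemented general method for deriving parallel algorithms for divisor addition on hyperelliptic curves; the parallel algorithms obtained with this method use the minimum number of parallel rounds. Applying the method to divisor addition on hyperelliptic curves of genus 3 yields a parallel algorithm that performs divisor addition and doubling in 15 rounds, using 9 and 7 multiplication processors respectively.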

2.
We present a design methodology for real-time vision applications aiming at significantly reducing the design-implement-validate cycle time on dedicated parallel platforms. This methodology is based upon the concept of algorithmic skeletons, i.e., higher order program constructs encapsulating recurring forms of parallel computations and hiding their low-level implementation details. Parallel programs are built by simply selecting and composing instances of skeletons chosen in a predefined basis. A complete parallel programming environment was built to support the presented methodology. It comprises a library of vision-specific skeletons and a chain of tools capable of turning an architecture-independent skeletal specification of an application into an optimized, deadlock-free distributive executive for a wide range of parallel platforms. This skeleton basis was defined after a careful analysis of a large corpus of existing parallel vision applications. The source program is a purely functional specification of the algorithm in which the structure of a parallel application is expressed only as a combination of a limited number of skeletons. This specification is compiled down to a parametric process graph, which is subsequently mapped onto the actual physical topology using third-party CAD software. It can also be executed on any sequential platform to check the correctness of the parallel algorithm. The applicability of the proposed methodology and associated tools has been demonstrated by parallelizing several realistic real-time vision applications both on a multi-processor platform and a network of workstations. It is here illustrated with a complete road-tracking algorithm based upon white-line detection. This experiment showed a dramatic reduction in development times (hence the term fast prototyping), while keeping performance on par with that obtained with the handcrafted parallel version. Received: 22 July 1999 / Accepted: 9 November 2000
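The skeleton idea in this abstract can be illustrated with a minimal Python sketch. It is not the environment described in the paper: `farm`, `pipeline`, `threshold_row` and `count_white` are hypothetical names chosen for illustration, and a thread pool stands in for a dedicated parallel platform. The sketch shows a skeleton as a higher-order function that hides how the work is distributed, and shows the same specification being run sequentially to check the parallel result.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def farm(worker, items, parallel=True, max_workers=4):
    """Farm skeleton: apply `worker` independently to every item."""
    if not parallel:
        # Sequential emulation, useful for checking correctness.
        return [worker(x) for x in items]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the inputs in its results.
        return list(pool.map(worker, items))


def pipeline(*stages):
    """Compose skeleton instances (or plain functions) left to right."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


# Hypothetical stand-ins for vision kernels.
def threshold_row(row):
    return [1 if px > 128 else 0 for px in row]


def count_white(rows):
    return sum(sum(r) for r in rows)


def make_app(parallel):
    # The application is expressed only as a composition of skeletons.
    return pipeline(
        lambda img: farm(threshold_row, img, parallel=parallel),
        count_white,
    )


image = [[0, 200, 255], [130, 10, 90], [129, 128, 127]]
parallel_result = make_app(True)(image)
sequential_result = make_app(False)(image)
```

Because the application only names skeletons, switching `parallel` changes the execution strategy without touching the algorithm itself.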

3.
Nested parallelism appears naturally in many applications. It is required whenever a function performing parallel statements needs to call a subroutine using parallelism. A particular case occurs when the function is recursive. Nested parallelism is as basic to parallel programming as nested loops are to sequential programming. Despite this, most existing parallel languages do not provide this feature. This paper presents a new methodology to expand message passing libraries (MPL) with nested parallelism. The tool to support the methodology has processor virtualization, load balancing, pipeline parallelism and collective operations, among other features. The computational results show that the performance obtained is comparable to that obtained using classical message passing programs. Since the methodology does not force the programmer to leave the MPL environment, all the efficiency and portability of the MPL model is preserved. Copyright © 1999 John Wiley & Sons, Ltd.

4.
In this paper, a design methodology for synthesizing efficient parallel algorithms and VLSI architectures is presented. A design process starts with a problem definition specified in the parallel programming language Crystal and is followed by a series of program transformations in Crystal, each aiming at optimizing the target design for a specific purpose. To illustrate the design methodology, a set of design methods for deriving systolic algorithms and architectures is given and the use of these methods in the design of a dynamic programming solver is described. The design methodology, together with this particular set of design methods, can be viewed as a general theory of systolic designs (or multidimensional pipelines). The fact that Crystal is a general purpose language for parallel programming allows new design methods and synthesis techniques, properties and theorems about problems in specific application domains, and new insights into any given problem to be integrated readily within the existing design framework.

5.
A major difficulty in restructuring compilation, and in parallel programming in general, is how to compare parallel performance over a range of system and problem sizes. Execution time varies with system and problem size, and an initially fast implementation may become slow when system and problem size scale up. This paper introduces the concept of range comparison. Unlike conventional execution time comparison, in which performance is compared for a particular system and problem size, range comparison compares the performance of programs over a range of ensemble and problem sizes via scalability and performance crossing point analysis. A novel algorithm is developed to predict the crossing point automatically. The correctness of the algorithm is proven and a methodology is developed to integrate range comparison into restructuring compilations for data-parallel programming. A preliminary prototype of the methodology is implemented and tested under the Vienna Fortran Compilation System. Experimental results demonstrate that range comparison is feasible and effective. It is an important asset for program evaluation, restructuring compilation, and parallel programming.
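To make the notion of a performance crossing point concrete, here is a small Python sketch. It is a brute-force scan and not the prediction algorithm of the paper, which is based on scalability analysis. The two cost models `time_a` and `time_b` (a fixed overhead, a perfectly divisible work term, and a communication term that grows with the processor count) are assumptions made up for illustration.

```python
def crossing_point(time_a, time_b, sizes):
    """Return the first size at which the faster program changes, else None."""
    sizes = list(sizes)
    if not sizes:
        return None
    a_faster_at_start = time_a(sizes[0]) < time_b(sizes[0])
    for p in sizes[1:]:
        if (time_a(p) < time_b(p)) != a_faster_at_start:
            return p
    return None


# Hypothetical execution-time models for two implementations on p processors.
def time_a(p):
    return 1.0 + 100.0 / p + 0.5 * p   # low fixed cost, costly communication


def time_b(p):
    return 4.0 + 100.0 / p + 0.1 * p   # higher fixed cost, cheaper communication


first_switch = crossing_point(time_a, time_b, range(1, 65))
```

With these assumed models, program A is faster on small systems and program B takes over once the system grows, which is exactly the situation a single-size comparison would miss.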

6.
Parallel machines (mill/turn machining centers) provide a powerful and efficient machining alternative to the traditional sequential machining process. The underutilization of parallel machines due to their operating complexity has increased interest in developing an efficient methodology for sequencing the parallel machining operations. This paper presents a mixed integer programming model for sequencing parallel machining operations. A genetic-based algorithm for finding an optimal parallel operation sequence on parallel machines is proposed. Two new genetic operators for solving order-based genetic algorithms and computational experiments are also included.

7.
There are generally two main directions for the investigation and development of parallel manipulators, namely the macro/meso stream and the micro/nano stream. The former has been thoroughly investigated in recent decades, while the latter still has many performance-related open issues that significantly affect its application potential in critical situations such as high-precision automated cell manipulation. Improving the overall performance of parallel manipulators is the bridge that connects academia and industry for further development and real-world application. This research develops a novel methodology called performance decomposition and integration for governing the design optimization process of complicated micromanipulators. A new five degrees-of-freedom (DOF) compliant hybrid parallel micromanipulator, configured with five identical PSS limbs and one constraining UPU limb, is proposed as a case study. The performance visualization, finite element analysis, and dimensional optimization are implemented. The proposed methodology is applicable to the design improvement of different kinds of compliant/parallel mechanisms.

8.
We present finite-element numerical simulations of seismic wave propagation in nonlinear inelastic geological media. We demonstrate the feasibility of large-scale modeling based on an implicit numerical scheme and a nonlinear constitutive model. We illustrate our methodology with an application to regional scale modeling in the French Riviera, which is prone to earthquakes. The PaStiX direct solver is used to handle large matrix numerical factorizations based on hybrid parallelism to reduce memory overhead. A specific methodology is introduced for the parallel assembly in the context of soil nonlinearity. We analyse the scaling of the parallel algorithms on large-scale configurations and we discuss the physical results.

9.
A practical methodology for evaluating and comparing the performance of distributed memory Multiple Instruction Multiple Data (MIMD) systems is presented. The methodology determines machine parameters and program parameters separately, and predicts the performance of a given workload on the machines under consideration. Machine parameters are measured using benchmarks that consist of parallel algorithm structures. The methodology takes a workload-based approach in which a mix of application programs constitutes the workload. The performance of different systems is compared, under the given workload, using the ratio of their speeds. In order to validate the methodology, an example workload has been constructed and the time estimates have been compared with the actual runs, yielding good predicted values. Variations in the workload are analysed in terms of increase in problem sizes and changes in the frequency of particular algorithm groups. Utilization and scalability are used to compare the systems when the number of processors is increased. It has been shown that the performance of parallel computers is sensitive to changes in the workload and therefore any evaluation and comparison must consider a given user workload. The performance improvement that can be obtained by increasing the size of a distributed memory MIMD system depends on the characteristics of the workload as well as the parameters that characterize the communication speed of the parallel system.
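The separation of machine parameters from program parameters can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the model from the paper: the predicted time is taken to be a plain weighted sum, and the algorithm groups, per-group costs and workload counts below are all invented numbers.

```python
def predict_time(machine, workload):
    """Predicted time: sum over algorithm groups of (count) x (cost on machine)."""
    return sum(count * machine[group] for group, count in workload.items())


def speed_ratio(machine_a, machine_b, workload):
    """Speed of A relative to B under a workload; above 1 means A is faster."""
    return predict_time(machine_b, workload) / predict_time(machine_a, workload)


# Machine parameters: hypothetical measured cost per execution of each group.
machine_x = {"broadcast": 2.0, "reduction": 3.0, "stencil": 1.0}
machine_y = {"broadcast": 1.0, "reduction": 4.0, "stencil": 2.0}

# Program parameters: how often each group occurs in two hypothetical workloads.
stencil_heavy = {"broadcast": 10, "reduction": 5, "stencil": 20}
broadcast_heavy = {"broadcast": 30, "reduction": 5, "stencil": 5}

ratio_stencil = speed_ratio(machine_x, machine_y, stencil_heavy)
ratio_broadcast = speed_ratio(machine_x, machine_y, broadcast_heavy)
```

With these invented figures the ranking of the two machines reverses when the frequency of the algorithm groups changes, which illustrates why the abstract insists that a comparison is only meaningful for a given workload.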

10.
This research defines and analyzes a methodology for deriving a performance model for SPMD hybrid parallel applications. Hybrid parallelism combines shared memory and message passing computing models. This work extends the current practice of application performance modelling by developing a methodology for hybrid applications with the following procedures:
  • Creation of a model based on complexity analysis of an application code and its data structures.
  • Enhancement of a static complexity model by dynamic factors to capture execution time phenomena, such as memory hierarchy effects.
  • Quantitative analysis of model characteristics and the effects of perturbations in measured parameters.
These research results are presented in the context of a hybrid parallel implementation of a sparse linear algebra kernel. A model for this kernel is derived and analyzed using the methodology. Application of the model on two large parallel computing platforms provides case studies for the methodology. Operating system issues, machine balance factor, and memory hierarchy effects on model accuracy are examined. Copyright © 2007 John Wiley & Sons, Ltd.

11.
A hardware and software methodology for the design of the three interactive levels of intelligent robotic systems is proposed. The organization level is modeled as an expert system, the coordination level as a loosely coupled parallel processing system and the execution level as a series of specific hardware components which execute specific tasks. Microprocessor-based configurations and discrete logic design techniques are utilized for the overall system hardware configuration. The proposed methodology does not violate the system hierarchical structure. A case study demonstrates the feasibility of the approach.

12.
13.
A parallel processing methodology for high-speed dynamic simulation of controlled multibody mechanical systems is proposed for shared memory multiprocessor implementation. A dual-rate integration method is developed first to account for different time scales of mechanical and control subsystems and to employ different integration algorithms and step sizes that are best suited for individual subsystems. A parallel processing algorithm is designed for shared memory multiprocessors to exploit nested parallelism at a high as well as a medium level. Procedure dependency due to the recurrence relations in the Newton-Euler formulation is eliminated by utilizing a modified system graph that creates independent parallel threads. The effectiveness of the proposed approach is demonstrated using an example on an Alliant FX/8.

14.
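Although visual modeling techniques can reduce the difficulty of parallel program design, complex hardware structures still make parallel program design at the software level difficult. To address this, a visual modeling method for parallel programs based on hierarchical modeling is proposed, together with a layered modeling scheme. A visual modeling system for multi-level cluster environments, e-ParaModel, is designed and implemented, and modeling examples are used to verify its feasibility and practicality.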

15.
A quantitative performance study of two-phase locking in a parallel database machine using a simulation-based methodology is described. The DBSIM simulation methodology uses models at two levels: a Petri net model at the higher level and a queuing network model at the lower level. The Petri net model captures the characteristics of parallelism and synchronization at the workload level, while the queuing network model captures queuing aspects of the system at the physical resource level. Transactions in a workload are specified using a performance-oriented specification language based on the transaction component graph, a data flow graph with database operators. The transaction specifications are translated into Petri net representations to derive the simulation experiments. The workload is a transaction taken from an order-entry application. A shared-nothing parallel machine architecture is assumed. Results of analysis of a two-phase locking strategy with machine sizes ranging from 4 to 256 processors are presented.

16.
A methodology for modeling a system composed of parallel activities with synchronization points is proposed. Specifically, an approach based on a modular state-transition representation of a parallel system called the stochastic automata network (SAN) is developed. The state-space explosion is handled by a decomposition technique. The dynamic behavior of the algorithm is analyzed under Markovian assumptions. The transition matrix of the chain is automatically derived using tensor algebra operators, in a format which involves a very limited storage cost.
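The role of tensor algebra here can be illustrated with the simplest case: two automata that evolve independently, with no synchronization. Under that assumption the generator of the joint continuous-time Markov chain is the Kronecker (tensor) sum of the individual generators. The Python sketch below uses plain lists and invented rates; the helper names are hypothetical, and the extra product terms that synchronizing events require are not shown.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def kron_sum(q1, q2):
    """Tensor sum Q1 (+) Q2 = Q1 (x) I + I (x) Q2 for independent automata."""
    return add(kron(q1, identity(len(q2))), kron(identity(len(q1)), q2))


# Hypothetical generators of two 2-state automata (each row sums to zero).
q1 = [[-2.0, 2.0], [3.0, -3.0]]
q2 = [[-1.0, 1.0], [4.0, -4.0]]

# Generator of the 4-state joint chain, state order (0,0), (0,1), (1,0), (1,1).
q = kron_sum(q1, q2)
```

Only the small factors `q1` and `q2` need to be stored; the joint matrix `q`, whose size is the product of the component sizes, can be rebuilt or applied on demand, which is the source of the limited storage cost mentioned in the abstract.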

17.
This paper describes the design of the Fortran 90D/HPF compiler, a source-to-source parallel compiler for distributed memory systems being developed at Syracuse University. Fortran 90D/HPF is a data parallel language with special directives to specify data alignment and distributions. A systematic methodology to process distribution directives of Fortran 90D/HPF is presented. Furthermore, techniques for data and computation partitioning, communication detection and generation, and the run-time support for the compiler are discussed. Finally, initial performance results for the compiler are presented. We believe that the methodology to process data distribution, computation partitioning, communication system design, and the overall compiler design can be used by the implementors of compilers for HPF.

18.
This paper gives a systematic methodology for the formulation of parallel computation structures and algorithms. The fundamental definition of a computation structure is a graph where each node is the binding of an action to a data object and the arcs are the dependency relationships between the unit computations executed at the nodes. The structure of the graph is determined by the selection of elements for the model of computation in which the graph is expressed. An abstract machine is created by defining the resources, including for example instruction sets for the processors, which realize the conceptual elements of the model of parallel computations. An algorithm is a mapping of the computation graph to the abstract machine and a program which traverses the mapped graph to execute the computation. The methodology proceeds by describing parallel computations on successively more fully specified abstract machines. A model of parallel computation is selected and an abstract machine implementing the model of computation is defined. Specification of increasingly resolved abstract machines is structured both by increasing the span of elements from the model of computation represented in the machine and by increasing the level of detail resolved for each element.

19.
Developed in this paper is a frequency domain design methodology for disturbance rejection in a MISO plant which has a special parallel structure. Shown in this paper is that it is not necessary to close all the parallel loops in order to achieve the hard time domain constraints. The proposed methodology is applied to the idle speed control of a fuel-injected engine.

20.
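Design and Improvement of the ALU and Shift Logic in a Microcontroller   Total citations: 2, self-citations: 0, citations by others: 2
Based on the design of an 8-bit microcontroller IP soft core, this paper analyzes the functions and characteristics of the instruction set, adjusts and optimizes the data path of the processor at the algorithm level, and proposes a method for designing the ALU and the shift logic in parallel. Compared with the traditional serial design method, this parallel design is simpler to describe, and the synthesized circuit consumes less power and runs faster without increasing resource usage.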
