Found 20 similar documents (search time: 15 ms)
1.
Hanho Lee Gerald E. Sobelman 《Computers & Electrical Engineering》2003,29(2):357-377
As field programmable gate array (FPGA) technology has steadily improved, FPGAs are now viable alternatives to other technology implementations for high-speed classes of digital signal processing (DSP) applications. Digit-serial DSP architectures have been an effective implementation method for FPGAs. In this work, a method of implementing digit-serial DSP architectures on FPGAs is presented, and their performance is evaluated with the objective of finding and developing the most efficient digit-serial DSP architectures on FPGAs. This paper discusses area costs and operational delays of the various digit-serial DSP functions and presents the area/delay models on Xilinx XC4000-series FPGAs. These area/delay models can make predictions of performance and hardware resource utilization before a lengthy layout and synthesis process is undertaken. The results show that the area/delay models proposed here are valid and the digit-serial DSP designs are promising candidates for efficient FPGA implementations.
2.
Timed event-graphs, a special class of timed Petri nets, are used for modelling and analyzing job-shop systems. The modelling allows the steady-state performance of the system to be evaluated under a deterministic and cyclic production process. Given any fixed processing times, the productivity (i.e., production rate) of the system can be determined from the initial state. It is shown in particular that, given any desired product mix, it is possible to start the system with enough jobs in process so that some machines will be fully utilized in steady-state. These machines are called bottleneck machines, since they limit the throughput of the system. In that case, the system works at the maximal rate and the productivity is optimal. The minimal number of jobs in process allowing optimal functioning of the system is further specified as an integer linear programming problem. An efficient heuristic algorithm is developed to obtain a near-optimal solution.
3.
4.
5.
Subhash Saini Robert Ciotti Brian T.N. Gunney Thomas E. Spelce Alice Koniges Don Dossa Panagiotis Adamidis Rolf Rabenseifner Sunil R. Tiyyagura Matthias Mueller 《Journal of Computer and System Sciences》2008,74(6):965-982
The HPC Challenge (HPCC) Benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers: SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon Cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC Benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmarks results to study the performance of 11 MPI communication functions on these systems.
6.
Hongsuk Yi 《Computer Physics Communications》2011,(1):263-265
GAIA is a recently developed IBM POWER6 supercomputer consisting of 24 SMP compute nodes with 64-way processors each, and it is currently ranked 393 on the Top500 supercomputer list published in November 2009. In this paper, we present the performance characteristics of GAIA evaluated by interconnecting the 24 computing nodes with a low-latency InfiniBand network. We evaluate the performance of the new dual-core Power 595 system in terms of increased problem size using the multi-zone versions of the NAS Parallel Benchmarks.
7.
Cloud computing is a very attractive research topic. Many studies have examined the infrastructure as a service and software as a service aspects of cloud computing; however, few studies have focused on platform as a service (PaaS). According to recent reports, demand for enterprise PaaS solutions will increase continuously. However, different sectors require different types of PaaS applications and computing resources. Therefore, an evaluation and ranking framework for PaaS solutions according to application needs is required. To address this need, this study presents the most essential aspects of PaaS solutions and provides a framework for evaluating the performance of PaaS providers. It also proposes a suitable set of benchmarking algorithms that can help determine the most appropriate PaaS provider based on different resource needs and application requirements. Performance evaluations of three well-known cloud computing PaaS providers were conducted using the analytic hierarchy process and the logic scoring of preference methods.
8.
9.
Rod Fatoohi Ken Kardys Sumy Koshy Soundarya Sivaramakrishnan Jeffrey S. Vetter 《Parallel Computing》2006,32(11-12):794
We study the performance of high-speed interconnects using a set of communication micro-benchmarks. The goal is to identify certain limiting factors and bottlenecks with these interconnects. Our micro-benchmarks are based on dense communication patterns with different communicating partners and varying degrees of these partners. We tested our micro-benchmarks on five platforms: an IBM system of 68-node 16-way Power3, interconnected by an SP switch2; another IBM system of 264-node 4-way Power PC 604e, interconnected by an SP switch; a Compaq cluster of 128-node 4-way ES40/EV67 processor, interconnected by a Quadrics interconnect; an Intel cluster of 16-node dual-CPU Xeon, interconnected by a Quadrics interconnect; and a cluster of 22-node Sun Ultra Sparc, interconnected by an Ethernet network. Our results show many limitations of these networks including the memory contention within a node as the number of communicating processors increased and the limitations of the network interface for communication between multiple processors of different nodes.
10.
Active schedule is one of the most basic and popular concepts in production scheduling research. For identical parallel machine scheduling with jobs’ dynamic arrivals, the tight performance bounds of active schedules under the measurement of four popular objectives are respectively given in this paper. Similar analysis methods and conclusions can be generalized to static identical parallel machine and single machine scheduling problems.
11.
Fei Chen O'Neil T.W. Sha E.H.-M. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(6):604-614
In this paper, a method combining the loop pipelining technique with data prefetching, called Partition Scheduling with Prefetching (PSP), is proposed. In PSP, the iteration space is first divided into regular partitions. Then a two-part schedule, consisting of the ALU and memory parts, is produced and balanced to produce high throughput. These two parts are executed simultaneously, and hence, the remote memory latencies are overlapped. We study the optimal partition shape and size so that a well-balanced overall schedule can be obtained. Experiments on DSP benchmarks show that the proposed methodology consistently produces optimal or near optimal solutions.
12.
Franz Schreier 《Computer Physics Communications》2006,174(10):783-792
The efficient evaluation of numerous values of functions that vary rapidly only in a small part of the region of interest is presented. An optimized algorithm using a sequence of grids with increasing resolution is developed. The algorithm does not make any assumptions about special properties of the function to be evaluated, e.g., symmetry. An additional speed-up is obtained by exploiting the asymptotic behaviour of the functions to be summed. Two applications from high resolution atmospheric radiative transfer modelling in the infrared and microwave are presented. In a third example asymmetric Rautian line shapes important for high resolution molecular spectroscopy are considered. Computational gains by more than two orders of magnitude with relative errors less than 10^-3 have been achieved.
13.
《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2006,36(4):774-785
This paper proposes a linear belief function (LBF) approach to evaluate portfolio performance. By drawing on the notion of LBFs, an elementary approach to knowledge representation in expert systems is proposed. It is shown how to use basic matrices to represent market information and financial knowledge, including complete ignorance, statistical observations, subjective speculations, distributional assumptions, linear relations, and empirical asset-pricing models. The authors then appeal to Dempster's rule of combination to integrate the knowledge for assessing the overall belief of portfolio performance and updating the belief by incorporating additional evidence. An example of three gold stocks is used to illustrate the approach.
14.
Madhukar Anand Sebastian Fischmeister Insup Lee Linh T. X. Phan 《Real-Time Systems》2012,48(4):430-462
Distributed real-time systems require bounded communication delays and achieve them by means of a predictable and verifiable control mechanism for the communication medium. Real-time bus arbitration mechanisms control access to the medium and guarantee bounded communication delays. These arbitration mechanisms can be static dispatch tables or dynamic, algorithmic approaches. In this work, we introduce a real-time bus arbitration mechanism called tree schedules that takes the best parts of both sides: It can be analyzed like static dispatch tables, and it provides a certain degree of flexibility similar to algorithmic approaches. We present tree schedules as a framework to specify real-time traffic and introduce mechanisms to analyze it. We discuss how tree schedules can capture application-specific behavior in a time-triggered state-based supply model by means of conditional branching built into the model. We present analysis results for this model specifically aiming at schedulability in fixed and dynamic priority schemes and waiting time analysis. Finally, we demonstrate the advantages of state-based supply over stateless supply by means of two case studies.
15.
Uwe Schwiegelshohn 《Journal of Scheduling》2011,14(6):571-581
The paper discusses a rarely used metric that is well suited to evaluate online schedules for independent jobs on massively
parallel processors. The metric is based on the total weighted completion time objective with the weight being the resource
consumption of the job. Although every job contributes to the objective value, the metric exhibits many properties that are
similar to the properties of the makespan objective. For this metric, we particularly address nonclairvoyant online scheduling
of sequential jobs on parallel identical machines and prove an almost tight competitive factor of 1.25 for nondelay schedules.
For the extension of the problem to rigid parallel jobs, we show that no constant competitive factor exists. However, if all
jobs are released at time 0, List Scheduling in descending order of the degree of parallelism guarantees an approximation
factor of 2.
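The metric and the List Scheduling rule described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm or analysis: it assumes all jobs are released at time 0, that a rigid job is a hypothetical pair `(width, duration)` whose resource consumption is `width * duration`, and that the scheduler greedily starts any job from the sorted list that fits on the free processors. The function names `list_schedule` and `weighted_completion` are invented for illustration.

```python
import heapq
from typing import List, Tuple

Job = Tuple[int, int]  # (width = processors required, duration); illustrative format


def list_schedule(m: int, jobs: List[Job]) -> List[Tuple[int, int]]:
    """Greedy list scheduling on m processors, all jobs released at time 0.

    Jobs are considered in descending order of width (degree of parallelism).
    Returns one (start, end) pair per job, in input order.
    """
    if any(w > m or w <= 0 for w, _ in jobs):
        raise ValueError("every job width must be between 1 and m")
    pending = sorted(range(len(jobs)), key=lambda j: -jobs[j][0])
    running: List[Tuple[int, int]] = []  # min-heap of (end_time, width)
    result: List[Tuple[int, int]] = [(0, 0)] * len(jobs)
    free, t = m, 0
    while pending:
        waiting = []
        for j in pending:  # start every listed job that currently fits
            w, p = jobs[j]
            if w <= free:
                free -= w
                result[j] = (t, t + p)
                heapq.heappush(running, (t + p, w))
            else:
                waiting.append(j)
        pending = waiting
        if pending:  # advance time to the next completion and release processors
            t, w = heapq.heappop(running)
            free += w
            while running and running[0][0] == t:
                free += heapq.heappop(running)[1]
    return result


def weighted_completion(jobs: List[Job], schedule: List[Tuple[int, int]]) -> int:
    """Total weighted completion time, with weight = width * duration (assumed)."""
    return sum(w * p * end for (w, p), (_, end) in zip(jobs, schedule))


jobs = [(1, 3), (4, 2), (2, 2), (2, 1)]
schedule = list_schedule(4, jobs)
metric = weighted_completion(jobs, schedule)
```

In this small instance the widest job runs first and occupies all four processors, which is the behaviour the descending-width order is meant to produce; the guarantee of an approximation factor of 2 is the paper's claim and is not verified by the sketch.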
16.
17.
Performance evaluation of incremental training method for face recognition using PCA (cited by: 1; self-citations: 0; citations by others: 1)
Ch. Satyanarayana D. M. Potukuchi L. Pratap Reddy 《Journal of Real-Time Image Processing》2007,1(4):311-327
Relevance of ‘face recognition’ (FR) in the modern world requirements is presented as a case of human machine interaction.
Physical conditions that influence the face recognition process regarding the facial features, illumination changes and viewing
angles etc. are discussed. Face recognition process predominantly depends on machine perception i.e. information through an
array of pixels with respect to the facial image. Details of eigenface approach through the involvement of contemporary algebraic
and statistical analysis are revisited. Methodology involved in the Principal Component Analysis and advantages of exposing
the data to incremental training (using PCA) are discussed. A model for the implementation of IPCA over the face databases
is proposed to estimate its performance for the face recognition process. Performance of the present model is studied in the
domain of Euclidean distance, decay parameter, recognition rate, eigenvalues and overall computational time. Present IPCA
model administered over standard ORL, FERET databases along with that over the JNTU face database with large number of face
images revealed relative performance. The merit of present IPCA is inferred through enhanced recognition rate and reduced
complexity (in the algorithm), intelligent eigenvectors and lesser computational time. The results are presented in the wake
of the body of data available with other methods.
18.
A. Valderruten V. M. Gulías J. Mosquera J. S. Jorge 《Control Engineering Practice》1999,7(12):771-1539
Synchronous reactive modelling provides an optimal framework for the modular decomposition of programs that engage in complex patterns of deterministic interaction, such as many real-time and communication entities. This paper presents an approach which includes performance modelling techniques in the synchronous reactive modelling method supported by ESTEREL. It defines a methodology based on timing and probabilistic quantitative constructs that complete the synchronous reactive models. A monitoring mechanism allows the computation of performance results during the simulation. This methodology is applied to study a multithreaded runtime system for a distributed functional programming language. Performance metrics are computed and validated with experimental results.
19.
The performance evaluation of processor-memory communications for multiprocessor systems using circuit switched interconnection networks with a hold strategy is performed. Message size and processor processing time are considered and shown to have a significant effect on the overall system performance. A closed queuing network model is proposed such that only (n + 2) states are required by the proposed model, in contrast to (n^2 + 3n + 4)/2 states needed in previous studies, where n is the number of stages of the multistage interconnection network. Since a closed-form solution is obtained, the behavior of a complete cycle of memory access through multistage interconnection networks can be accurately analyzed and various performance bounds can be obtained.
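To make the state-count comparison in this abstract concrete, the following short Python sketch evaluates both expressions for a few values of n. Only the two formulas come from the abstract; the function names are hypothetical and nothing here models the queuing network itself.

```python
def states_proposed(n: int) -> int:
    """State count of the proposed closed queuing model: n + 2."""
    return n + 2


def states_previous(n: int) -> int:
    """State count reported for previous studies: (n^2 + 3n + 4) / 2.

    n^2 + 3n = n(n + 3) is always even, so the integer division is exact.
    """
    return (n * n + 3 * n + 4) // 2


# n is the number of stages of the multistage interconnection network.
table = [(n, states_proposed(n), states_previous(n)) for n in (3, 6, 10)]
for n, new, old in table:
    print(f"n={n}: proposed={new}, previous={old}")
```

The gap grows quadratically: for a 10-stage network the expressions give 12 states against 67.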
20.
Granda M. Drake J.M. Gregorio J.A. 《IEEE transactions on pattern analysis and machine intelligence》1992,18(1):55-71
Methods of calculating efficiently the performance measures of parallel systems by using unbounded generalized stochastic Petri nets are presented. An explosion in the number of states to be analyzed occurs when unbounded places appear in the model. The state space of such nets is infinite, but it is possible to take advantage of the natural symmetries of the system to aggregate the states of the net and construct a finite graph of lumped states which can easily be analyzed. With the methods developed, the unbounded places introduce a complexity similar to that of safe places of the net. These methods can be used to evaluate models of open parallel systems in which unbounded places appear; systems which are k-bounded but are complex and have large values of k can also be evaluated in an appropriate way. From the steady-state solution of the model, it is possible to obtain automatically the performance measures of parallel systems represented by this type of net.