Similar literature
 20 similar documents found.
1.
IEEE Micro, 2004, 24(6): 118-127
Power is a major problem for scaling the hardware needed to support memory disambiguation in future out-of-order architectures. In current machines, the traditional detection of memory ordering violations requires frequent associative searches of state proportional to the instruction window size. A new class of solutions yields an order-of-magnitude reduction in the energy required to properly order loads and stores for windows of hundreds to thousands of in-flight instructions.

2.
Modern complex embedded applications in multiple application fields impose stringent and continuously increasing functional and parametric demands. To adequately serve these applications, massively parallel multi-processor systems on a single chip (MPSoCs) are required. This paper is devoted to the design of scalable communication architectures of massively parallel hardware multi-processors for highly demanding applications. We demonstrate that in massively parallel hardware multi-processors the communication network's influence on both throughput and circuit area dominates the processors' influence, while the traditionally used flat communication architectures do not scale well as parallelism increases. Therefore, we propose to design highly optimized application-specific partitioned hierarchical organizations of the communication architectures by exploiting the regularity and hierarchy of the actual information flows of a given application. We developed related communication architecture synthesis strategies and incorporated them into our quality-driven model-based multi-processor design methodology and related automated architecture exploration framework. Using this framework we performed a large series of architecture synthesis experiments. Some of the results of the experiments are presented in this paper. They demonstrate many features of the synthesized communication architectures and show that our method and related framework are able to efficiently synthesize highly scalable communication architectures even for the high-end massively parallel multi-processors that have to satisfy extremely stringent computation demands.

3.
4.
This paper presents a new scalable hardware design implementing modular multiplication. A high-radix Montgomery multiplication algorithm without final subtraction is used to perform this operation. An alternative proof for the final Montgomery multiplication by 1, removing the condition on the modulus, is given. This hardware fits in any chip area and is able to work with any size of modulus. Unlike other scalable designs, only one cell is used. This cell contains a standard, well-optimized digit multiplier and adder. Time–area trade-offs are also available before hardware synthesis for different sizes of the internal data path. The pipelined architecture of the multiplier component increases the clock frequency and the throughput. Time–area trade-offs are analyzed in order to make the best choice for given time and area constraints. This architecture seems to provide a better time–area compromise than previous scalable hardware.
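The idea of a high-radix Montgomery multiplication without final subtraction can be illustrated in software. The following is a minimal sketch of the textbook word-serial algorithm, not the hardware cell described in the abstract. The function name `mont_mul`, the word width `w`, the word count `s`, and the example modulus are all assumptions chosen for illustration. It relies on the known bound that, when R = 2^(w*s) > 4n and both inputs are below 2n, the result also stays below 2n, so no conditional subtraction is needed between chained multiplications.

```python
def mont_mul(a: int, b: int, n: int, w: int, s: int) -> int:
    """Word-serial Montgomery product: a value congruent to a*b*R^-1 (mod n).

    R = 2**(w*s). n must be odd. If R > 4*n and a, b < 2*n, the result
    is < 2*n, so it can be fed back in without a final subtraction.
    """
    mask = (1 << w) - 1
    # -n^-1 mod 2^w; pow with a negative exponent needs Python 3.8+
    n_prime = (-pow(n, -1, 1 << w)) & mask
    t = 0
    for i in range(s):
        a_i = (a >> (w * i)) & mask          # i-th radix-2^w digit of a
        t += a_i * b                         # accumulate partial product
        q = ((t & mask) * n_prime) & mask    # digit that clears the low word
        t = (t + q * n) >> w                 # exact division by 2^w
    return t


# Hypothetical parameters: 8-bit digits, 3 digits, so R = 2^24 > 4*n.
n = 1000003
w, s = 8, 3
R = 1 << (w * s)
r2 = (R * R) % n                   # constant used to enter Montgomery form

x, y = 123456, 654321
xm = mont_mul(x, r2, n, w, s)      # x*R mod n, possibly plus n
ym = mont_mul(y, r2, n, w, s)
zm = mont_mul(xm, ym, n, w, s)     # x*y*R mod n, possibly plus n
z = mont_mul(zm, 1, n, w, s)       # multiplication by 1 leaves Montgomery form
```

In this sketch the intermediate values `xm`, `ym`, and `zm` are only guaranteed to be below 2n rather than fully reduced; the last call, a multiplication by 1, is the conversion step that the abstract's alternative proof concerns.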

5.
Hung, D.L. IEEE Micro, 1995, 15(4): 31-39
Fuzzy systems based on dedicated digital hardware can deliver much higher performance than those based on general-purpose computing machines. The simplicity and versatility of some successful fuzzy inference algorithms, the advent of high-density, user-programmable logic devices, together with powerful EDA tools, make dedicated digital fuzzy hardware a feasible solution for implementing high-performance fuzzy systems.

6.
Support Vector Machine (SVM) regression is an important technique in data mining. The SVM training is expensive and its cost is dominated by: (i) the kernel value computation, and (ii) a search operation which finds extreme training data points for adjusting the regression function in every training iteration. Existing training algorithms for SVM regression are not scalable to large datasets because: (i) each training iteration repeatedly performs expensive kernel value computations, which is inefficient and requires holding the whole training dataset in memory; (ii) the search operation used in each training iteration considers the whole search space which is very expensive. In this article, we significantly improve the scalability and efficiency of SVM regression by exploiting the high performance of Graphics Processing Units (GPUs) and solid state drives (SSDs). Our key ideas are as follows. (i) To reduce the cost of repeated kernel value computations and avoid holding the whole training dataset in the GPU memory, we precompute all the kernel values and store them in the CPU memory extended by the SSD; together with an efficient strategy to read the precomputed kernel values, reusing precomputed kernel values with an efficient retrieval is much faster than computing them on-the-fly. This also alleviates the restriction that the training dataset has to fit into the GPU memory, and hence makes our algorithm scalable to large datasets, especially for large datasets with very high dimensionality. (ii) To enhance the performance of the frequently used search operation, we design an algorithm that minimizes the search space and the number of accesses to the GPU global memory; this optimized search algorithm also avoids branch divergence (one of the causes for poor performance) among GPU threads to achieve high utilization of the GPU resources. Our proposed techniques together form a scalable solution to the SVM regression which we call SIGMA. 
Our extensive experimental results show that SIGMA is highly efficient and can handle very large datasets which the state-of-the-art GPU-based algorithm cannot handle. On the datasets of size that the state-of-the-art GPU-based algorithm can handle, SIGMA consistently outperforms the state-of-the-art GPU-based algorithm by an order of magnitude and achieves up to 86 times speedup.

7.
We demonstrate a strategy for implementing a quantum full adder in a spin chain quantum computer. As an example, we simulate a quantum full adder in a chain containing 201 spins. Our simulations also demonstrate how one can minimize errors generated by non-resonant effects.

8.
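Digital fingerprinting is a digital copyright protection technique that can trace the source of illegal copies; one potential threat to it is a collusion attack mounted by several legitimate users. To address the shortcoming that existing collusion-resistant spread-spectrum orthogonal digital fingerprinting schemes support only a small number of users, an extension scheme for spread-spectrum orthogonal fingerprint codes is proposed. Theory and experiments show that the scheme increases the number of users the system can support while retaining the good collusion resistance of the original scheme.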

9.
Fuzzy systems have been explored in diverse application fields which require reaching fuzzy inferences at high computing rates. To accomplish this task, fuzzy hardware is the best choice. In the inference engine, conjunction and disjunction operations play a very important role in decision making. Common operations in existing fuzzy hardware are minimum, maximum, algebraic product and probabilistic sum. In order to extend the applicability of existing fuzzy hardware, it is necessary to consider a wider range of operations. It is even desirable to have configurable circuits which take advantage of hardware resources. This work presents the hardware implementation of configurable circuits for the realization of diverse fuzzy t-norm and t-conorm operations. The resulting circuits consume few hardware resources, which makes them efficient add-in modules for existing fuzzy hardware in FPGA or ASIC. Comparative results are presented showing the advantages of these circuits.
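The four operations named in the abstract (minimum, maximum, algebraic product, probabilistic sum) can be written down in a few lines. The following is a minimal software sketch of a "configurable" operator selected by name. It is not the circuit from the paper: the function `fuzzy_op`, the table names, and the use of floating-point membership degrees in [0, 1] are assumptions made for illustration (hardware would typically use fixed-point values).

```python
# t-norms model fuzzy conjunction (AND); t-conorms model disjunction (OR).
T_NORMS = {
    "minimum": lambda a, b: min(a, b),
    "algebraic_product": lambda a, b: a * b,
}
T_CONORMS = {
    "maximum": lambda a, b: max(a, b),
    "probabilistic_sum": lambda a, b: a + b - a * b,
}


def fuzzy_op(family: str, name: str, a: float, b: float) -> float:
    """Apply the selected t-norm or t-conorm to two membership degrees."""
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("membership degrees must lie in [0, 1]")
    table = {"t_norm": T_NORMS, "t_conorm": T_CONORMS}[family]
    return table[name](a, b)
```

Selecting the operation through a lookup table mirrors, loosely, the role of a configuration input: the data path is the same and only the selected function changes.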

10.
We present the disparity map computation core of a hardware system for isolating foreground objects in stereoscopic video streams. The operation is based on the computation of dense disparity maps using block-matching algorithms and two well-known metrics: sum of absolute differences and Census transform. Two sets of disparity maps are computed by taking each of the images as reference so that a consistency check can be performed to identify occluded pixels and eliminate spurious foreground pixels. Taking advantage of parallelism, the proposed architecture is highly scalable and provides numerous degrees of adjustment to different application needs, performance levels and resource usage. A version of the system for 640 × 480 images and a maximum disparity of 135 pixels was implemented in a system based on a Xilinx Virtex II-Pro FPGA and two cameras with a frame rate of 25 fps (less than the maximum supported frame rate of 40 fps on this platform). Implementation of the same system on a Virtex-5 FPGA is estimated to achieve 80 fps, while a version with increased parallelism is estimated to run at 140 fps (which corresponds to the calculation of more than 5.9 × 10^9 disparity-pixels per second).
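Block matching with the sum of absolute differences (SAD) metric, as mentioned in the abstract, can be sketched in plain software. The code below is a naive sequential illustration with the left image as reference, not the parallel FPGA architecture of the paper. The function names, the list-of-lists image representation, and the window half-size `half` are assumptions made for this example; the Census transform and the left-right consistency check are not covered.

```python
def sad(left, right, y, x, d, half):
    """SAD between the window at (y, x) in left and (y, x - d) in right."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total


def disparity_map(left, right, max_disp, half=1):
    """Dense disparity map with the left image as reference.

    Border pixels that cannot hold a full window keep disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_d, best_cost = 0, None
            for d in range(max_disp + 1):
                if x - d - half < 0:         # window would leave the image
                    break
                cost = sad(left, right, y, x, d, half)
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

Each pixel requires up to `max_disp + 1` window comparisons, which is why the per-second count of disparity-pixels quoted in the abstract grows with image size, disparity range, and frame rate, and why the independent comparisons lend themselves to parallel hardware.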

11.
Accessibility analysis using computer graphics hardware (cited by: 1; self-citations: 0; citations by others: 1)
Analyzing the accessibility of an object's surface to probes or tools is important for many planning and programming tasks that involve spatial reasoning and arise in robotics and automation. The paper presents novel and efficient algorithms for computing accessible directions for tactile probes used in 3D digitization with Coordinate Measuring Machines. The algorithms are executed in standard computer graphics hardware. They are a nonobvious application of rendering hardware to scientific and technological areas beyond computer graphics.

12.
The real-time probabilistic simulation of quantum systems in classical computers is known to be limited by the so-called dynamical sign problem, a problem leading to exponential complexity. In 1981 Richard Feynman raised some provocative questions in connection to the “exact imitation” of such systems using a special device named a “quantum computer”. Feynman hesitated about the possibility of imitating fermion systems using such a device. Here we address some of his concerns and, in particular, investigate the simulation of fermionic systems. We show how quantum computers avoid the sign problem in some cases by reducing the complexity from exponential to polynomial. Our demonstration is based upon the use of isomorphisms of algebras. We present specific quantum algorithms that illustrate the main points of our algebraic approach.

13.
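To diagnose and eliminate hardware faults in computer hosts quickly and efficiently, and to better carry out computer hardware maintenance in university computer labs, this paper draws on years of practical experience to summarize in detail the common faults of computer hosts. By analyzing fault symptoms and applying techniques such as combined listening and visual inspection, cross-checking, and one-by-one elimination, the fault point is located, and the fault is then cleared using quick, economical, and practical methods, thereby ensuring that university teaching activities proceed normally.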

14.
Quantum cheques could be a forgery-free way to make transactions in a quantum networked banking system with perfect security against any no-signalling adversary. Here, we demonstrate the implementation of the quantum cheque, proposed by Moulick and Panigrahi (Quantum Inf Process 15:2475–2486, 2016), using the five-qubit IBM quantum computer. Appropriate single-qubit, CNOT and Fredkin gates are used in an optimized configuration. The accuracy of the implementation is checked and verified through quantum state tomography by comparing results from the theoretical and experimental density matrices.

15.
This paper presents a methodology employing systems analysis techniques which may be used to evaluate alternative data processing plans for municipalities. The basic information which must be formulated for the analysis includes the following three areas: general hardware specifications, implementation schedules for database applications, and cost profiles which include capital and operating expense estimates. Specific factors are developed which may be used to assign numeric ratings to the relative advantages of each plan, and as a basis for making recommendations in the areas of hardware, software, staffing and costs. The methodology is illustrated by the analysis and evaluation of five plans developed to meet the data processing needs of a large metropolitan area.

16.
17.
The ability to resolve fine picture detail is of paramount importance in medical imaging systems for viewing small tissue, bone structure and anatomy in X-ray images. In this paper, we present a new digital radiographic image processing system that is scalable and adaptable. (i) A new automatic optimization algorithm is proposed for display. (ii) An adaptive detection of a region-of-interest is developed. (iii) A “scalable edge enhancement algorithm” is proposed to improve the image quality for showing subtle structures in digital radiographic images. The advantage of the proposed method is demonstrated through experiments on 200 digital X-ray images and 50 CT images, in which different parts of human body structures are captured.

18.
19.
Translated from Kibernetika i Sistemnyi Analiz, No. 2, pp. 129–140, March–April, 1993.

20.
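Laboratory assessment is an important part of laboratory teaching reform and an important means of ensuring the quality of laboratory teaching. A three-in-one laboratory assessment method aimed at testing ability is proposed, and the construction of the assessment system and the design of the assessment content are described. The method achieved good results in computer hardware laboratory teaching practice and improved students' innovative ability.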
