Similar literature
 Found 20 similar documents; search took 31 ms
1.
A scalable unified multiplier is presented for both prime fields GF(P) and binary extension fields GF(2^k), where P = 2^m - 1 and GF(2^k) is generated by an irreducible all-one polynomial. The proposed unified dual-field multiplier uses an LSB-first bit-serial architecture for multiplication in GF(P) and GF(2^k) rather than the Montgomery multiplication algorithm, which has been employed by most existing dual-field multipliers. The proposed multiplier has low space and time complexity. It is scalable for operands of any size, while other existing dual-field multipliers are only scalable for operands whose sizes are multiples of m. Furthermore, the proposed multiplier is simple, regular, modular, and concurrent, and is well suited to VLSI implementation.
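
The LSB-first bit-serial idea for the prime field case can be sketched in software as follows. This is only an illustrative model, not the hardware architecture of the paper: the function names `mersenne_reduce` and `lsb_first_mul` are hypothetical, and the sketch assumes operands already fit in m bits. It relies on the fact that doubling modulo 2^m - 1 is a cyclic left rotation of an m-bit word.

```python
def mersenne_reduce(x: int, m: int) -> int:
    """Reduce x modulo P = 2**m - 1 by folding the high bits onto the low bits."""
    p = (1 << m) - 1
    while x > p:
        x = (x & p) + (x >> m)   # 2**m is congruent to 1 modulo P
    return 0 if x == p else x


def lsb_first_mul(a: int, b: int, m: int) -> int:
    """Compute a*b mod (2**m - 1), scanning b from its least significant bit.

    Assumes 0 <= b < 2**m (hypothetical software model of a bit-serial datapath).
    """
    mask = (1 << m) - 1
    acc = 0
    shifted = mersenne_reduce(a, m)          # holds a * 2**i mod P at step i
    for i in range(m):
        if (b >> i) & 1:
            acc = mersenne_reduce(acc + shifted, m)
        # Doubling modulo 2**m - 1 is a cyclic left rotation by one bit.
        shifted = ((shifted << 1) & mask) | (shifted >> (m - 1))
    return acc


if __name__ == "__main__":
    print(lsb_first_mul(17, 23, 5), (17 * 23) % 31)
```

Each loop iteration needs only one conditional addition and one rotation, which is the kind of regular, repeated step that a bit-serial design maps onto a single processing cell.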

2.
Advanced arithmetic computing applications make extensive use of approximate multipliers and approximate adders. The Truncation and Rounding-based Scalable Approximate Multiplier (TRSAM) distinguishes a variety of modes by height (h) and truncation (t), written TRSAM (h, t). The TRSAM operation produces a high absolute error in the Least Significant Bit (LSB) data shift unit. A new scalable approximate multiplier approach that uses truncation and rounding, TRSAM (3, 7), is proposed to increase multiplier accuracy. Using a leading-one-bit architecture, the proposed approach reduces the number of partial products. The proposed approximate TRSAM multiplier architecture gives better results in terms of area, delay, and power. An accuracy of 95.2% and an energy consumption of 24.6 nJ are observed for the proposed multiplier design. Compared with the existing approach, the proposed approach shows 0.11%, 0.23%, and 0.24% lower Mean Absolute Relative Error (MARE) for 8-bit, 16-bit, and 32-bit inputs respectively. It also shows 0.13%, 0.19%, and 0.2% lower Variance of Absolute Relative Error (VARE) for 8-bit, 16-bit, and 32-bit inputs respectively. Implemented on a Field-Programmable Gate Array (FPGA), the proposed approach shows delays of 3.640, 6.481, 12.505, 22.572, and 36.893 ns for 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit inputs respectively. Applied to digital filter design in an image application, it achieves a Peak Signal-to-Noise Ratio (PSNR) of 25.05 dB and a Structural Similarity Index Measure (SSIM) of 0.98 with an energy consumption of 393 pJ. The proposed approach is simulated with Xilinx and MATLAB and implemented on an FPGA.

3.
This paper focuses on the challenge of building and programming scalable concurrent computers. The paper describes the inadequacy of current models of computing for programming massively parallel computers and discusses three universal models of concurrent computing, developed from the programming, architecture, and algorithm perspectives respectively. These models provide a powerful representation for parallel computing and are shown to be quite close. Issues in building system architectures that efficiently represent and utilize parallel hardware resources are then discussed. Finally, we argue that by using a flexible universal programming model, an environment supporting heterogeneous programming languages can be developed.

4.
Fault-based side-channel cryptanalysis is a useful technique against symmetric and asymmetric encryption/decryption algorithms. Eliminating cryptographic computation errors therefore becomes critical in preventing such attacks. A simple way to eliminate cryptographic computation errors is to output only correct or corrected ciphers. Multiplication is the most important finite field arithmetic operation in cryptographic computations. Using a time redundancy technique, a novel dual basis (DB) multiplier over Galois fields GF(2^m) is presented with lower space complexity and a feedback-free property. Based on the proposed feedback-free DB multiplier, a DB multiplier with concurrent error detection (CED) capability is also easily developed. Compared with the existing DB multiplier with CED capability, the proposed one saves about 90% of time-area complexity. No existing DB multiplier in the literature has concurrent error correction (CEC) capability. Based on the proposed DB multiplier, a novel DB multiplier with CEC capability is easily designed. The proposed DB multiplier with CEC capability requires only about 3% extra space complexity and 15% extra time complexity compared with the proposed DB multiplier without CEC.

5.
In recent years, deep neural networks have become a fascinating and influential research subject, and they play a critical role in video processing and analytics. Since video analytics is predominantly hardware centric, implementing deep neural networks in hardware deserves more research attention. However, the computational complexity and resource demands of deep neural networks are increasing exponentially over time. Convolutional neural networks (CNNs) are among the most popular deep learning architectures, especially for image classification and video analytics. But these algorithms need an efficient implementation strategy to support more real-time computation when handling video in hardware. Field-programmable gate arrays (FPGAs) are considered more advantageous than Graphics Processing Units (GPUs) for implementing convolutional neural networks in terms of energy efficiency and low computational complexity. Still, an intelligent architecture is required for implementing CNNs on FPGAs for video processing. This paper introduces a modern high-performance, energy-efficient Bat Pruned Ensembled Convolutional network (BPEC-CNN) for processing video in hardware. The system integrates Bat Evolutionary Pruned layers for the CNN and implements new shared Distributed Filtering Structures (DFS) for handling the filter layers of the CNN with a pipelined data path on the FPGA. In addition, the proposed system adopts a hardware-software co-design methodology for energy efficiency and lower computational complexity. Extensive experiments are carried out using CASIA video datasets with ARTIX-7 FPGA boards (number), and algorithm-centric parameters such as accuracy, sensitivity, and specificity, as well as architecture-centric parameters such as power, area, and throughput, are analyzed.
These results are then compared with existing pruned CNN architectures such as CNN-Prunner; the proposed architecture shows 25% better performance than the existing architectures.

6.
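He Dongmei, Gao Wen 《高技术通讯》2000,10(11):68-71,74
A complexity-scalable audio coding algorithm based on wavelet packet decomposition is proposed. The algorithm applies a complexity-scalable incomplete wavelet packet decomposition to the signal and performs zerotree coding of the coefficients, making full use of the characteristics of human hearing and the correlation between wavelet coefficients in different subbands. It not only achieves transparent-quality reconstructed signals at low bit rates, but also supports complexity-scalable encoding and decoding as well as multi-rate scalable coding, so that audio encoding and decoding can be performed in real time on computers with different computing power.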

7.
This paper presents a fast block matching motion estimation algorithm and its architecture. The proposed architecture is based on the Global Elimination (GE) algorithm, which uses pixel averaging to reduce the complexity of motion search while keeping performance close to that of full search. GE uses a preprocessing stage that can skip unnecessary Sum of Absolute Differences (SAD) calculations by comparing the minimum SAD with the sub-sampled SAD (SSAD). In the second stage, SAD is computed at roughly matched candidate positions. The GE algorithm uses fixed sub-block sizes and shapes to compute SSAD values in the preprocessing stage. The complexity of the GE algorithm is further reduced by adaptively changing the sub-block sizes depending on the macro-block features. In this paper, an adaptive Global Elimination algorithm is implemented that reduces the computational complexity of motion estimation and thus results in low power dissipation. The proposed architecture requires 60% fewer computations than the existing full search architecture and achieves 50% higher throughput than the existing fixed Global Elimination architecture.
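
The two-stage idea of checking a cheap SSAD before paying for a full SAD can be sketched as below. This is a simplified software model under stated assumptions, not the architecture of the paper: it uses sub-block sums with a fixed sub-block size `s` (not the adaptive sizes described above), and it skips a candidate whenever its SSAD is at least the best SAD found so far, which is safe because the SSAD of sub-block sums never exceeds the full SAD. All function names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D lists."""
    return sum(abs(p - q)
               for row_a, row_b in zip(block_a, block_b)
               for p, q in zip(row_a, row_b))


def subblock_sums(block, s):
    """Sum the pixels of each s x s sub-block (block side must be a multiple of s)."""
    n = len(block)
    return [[sum(block[y][x]
                 for y in range(by, by + s)
                 for x in range(bx, bx + s))
             for bx in range(0, n, s)]
            for by in range(0, n, s)]


def crop(frame, top, left, n):
    """Extract the n x n block whose top-left corner is (top, left)."""
    return [row[left:left + n] for row in frame[top:top + n]]


def best_match(cur_block, ref_frame, n, s):
    """Search every position in ref_frame; return ((sad, top, left), full_sad_count)."""
    best = None
    full_sads = 0
    cur_sums = subblock_sums(cur_block, s)
    for top in range(len(ref_frame) - n + 1):
        for left in range(len(ref_frame[0]) - n + 1):
            cand = crop(ref_frame, top, left, n)
            # Stage 1: the cheap SSAD is a lower bound on the full SAD, so a
            # candidate whose SSAD is not below the current best cannot win.
            if best is not None and sad(cur_sums, subblock_sums(cand, s)) >= best[0]:
                continue
            # Stage 2: full SAD only for candidates that survived stage 1.
            full_sads += 1
            d = sad(cur_block, cand)
            if best is None or d < best[0]:
                best = (d, top, left)
    return best, full_sads
```

The returned `full_sad_count` makes it easy to see how many full SAD evaluations the preprocessing stage avoided for a given frame.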

8.
Chang HT  Kuo CJ 《Applied optics》1998,37(8):1310-1318
An optical parallel architecture for the random-iteration algorithm to decode a fractal image by use of iterated-function system (IFS) codes is proposed. The code value is first converted into transmittance in film or a spatial light modulator in the optical part of the system. With an optical-to-electrical converter, an electrical-to-optical converter, and some electronic circuits for addition and delay, we can perform the contractive affine transformation (CAT) denoted in IFS codes. In the proposed decoding architecture all CATs generate points (image pixels) in parallel, and these points are then joined for display purposes. Therefore the decoding speed is improved greatly compared with existing serial-decoding architectures. In addition, an error and stability analysis that considers imperfect elements is presented for the proposed optical system. Finally, simulation results are given to validate the proposed architecture.
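
For readers unfamiliar with the random-iteration algorithm, a minimal serial software sketch follows. It is not the optical parallel architecture described above: it applies one randomly chosen contractive affine transformation per step, and the example IFS codes (a Sierpinski triangle), the function name `decode_ifs`, and the parameters `seed` and `burn_in` are all illustrative assumptions.

```python
import random

# Each IFS code (a, b, c, d, e, f) encodes the affine map
#   x' = a*x + b*y + e,   y' = c*x + d*y + f
# These three maps are an assumed example whose attractor is a Sierpinski triangle.
SIERPINSKI = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]


def decode_ifs(codes, n_points, size, seed=0, burn_in=20):
    """Decode IFS codes into a size x size binary image by random iteration.

    Assumes the attractor lies inside the unit square [0, 1] x [0, 1].
    """
    rng = random.Random(seed)
    image = [[0] * size for _ in range(size)]
    x, y = 0.0, 0.0
    for i in range(n_points + burn_in):
        a, b, c, d, e, f = rng.choice(codes)
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= burn_in:                      # discard the first few transient points
            col = min(int(x * size), size - 1)
            row = min(int(y * size), size - 1)
            image[row][col] = 1
    return image


if __name__ == "__main__":
    for row in reversed(decode_ifs(SIERPINSKI, 5000, 32)):
        print("".join("#" if v else "." for v in row))
```

In this serial form each point depends on the previous one; the architecture in the abstract instead lets all transformations produce points at the same time, which is where its speed advantage comes from.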

9.
Liu  L. Hong  X. Wu  J. Lin  J. 《Communications, IET》2009,3(3):487-499
As grid computing continues to gain popularity in industry and the research community, it also attracts more attention at the customer level. The large number of users and the high frequency of job requests in the consumer market make this scenario a challenge for grid computing. Clearly, all current client/server (C/S)-based architectures will become infeasible for supporting large-scale grid applications because of their poor scalability and poor fault tolerance. To address this issue, a novel C/S and peer-to-peer hybrid architecture that realises a highly scalable and flexible platform for optical grids is proposed here. The anycast algorithms used to search for suitable resources for job requests in this hybrid architecture are also investigated in detail. Simulation and experimental results show that the proposed architecture is suitable and efficient for grid applications, and that it outperforms the C/S-based architecture in a large-scale grid environment.

10.
The paper presents a low-voltage (1-1.5 V) 16-bit Booth leapfrog array multiplier with emphasis on low energy dissipation, relatively high speed, and small IC area. These attributes are achieved in two ways. First, low (hardware) complexity dynamic adders (DAs) are proposed and used to reduce spurious switching in the multiplier. Second, the specificities of the leapfrog architecture are exploited through the different output rates of the sum and carry outputs of the proposed DAs. Compared with other array multiplier designs, the proposed multiplier features the lowest energy dissipation and one of the shortest delays, resulting in the lowest energy-delay product. Furthermore, compared with the reported dynamic array multiplier that features somewhat similar electrical characteristics, the proposed multiplier is advantageous in its substantially smaller (~33%) IC area. Based on a 0.35 μm dual-poly four-metal CMOS process and at 1 V operation, the proposed multiplier dissipates ~18 pJ, has a delay of ~188 ns, and occupies 0.11 mm² of IC area. The proposed design is appropriate for low-voltage, energy-critical, and IC area-critical applications, including hearing aids.

11.
Huang H  Itoh M  Yatagai T 《Applied optics》1994,33(26):6146-6156
Fully parallel modified signed-digit arithmetic operations are realized based on the proposed redundant-bit representation of the digits. A new truth-table minimizing technique is presented based on redundant-bit-representation coding. It is shown that only 34 minterms are enough for implementing one-step modified signed-digit addition and subtraction with this new representation. Two optical implementation schemes, correlation and matrix multiplication, are described. Experimental demonstrations of the correlation architecture are presented. Both architectures use fixed minterm masks for arbitrary-length operands, taking full advantage of the parallelism of the modified signed-digit number system and optics.

12.
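Zeng Buqu 《包装工程》2017,38(9):230-235
Objective: To address the problem that current methods require landmarks to be marked manually at corresponding points in images and are limited to specific objects or shape deformations. Methods: A low-dimensional image representation algorithm is proposed that simultaneously captures image color, appearance, and shape. Results: By constraining the manifolds of shape and appearance to low-dimensional subspaces, the algorithm further reduces the sampling complexity of manifold learning. Conclusion: The proposed method performs far better than typical robust optical flow and SIFT flow algorithms, and achieves satisfactory qualitative results in image editing and articulation learning tasks.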

13.
Trusted P2P computing environments with role-based access control   Cited by: 2 (self-citations: 0; cited by others: 2)
A P2P computing environment can be an ideal platform for resource-sharing services in an organisation if it provides trust mechanisms. Current P2P technologies offer content-sharing services for non-sensitive public domains in the absence of trust mechanisms. The lack of sophisticated trust mechanisms in the current P2P environment has become a serious constraint on broader applications of the technology, despite its great potential. Therefore, in this work an approach for securing transactions in the P2P environment is introduced, and ways to incorporate an effective and scalable access control mechanism, role-based access control (RBAC), into current P2P computing environments are investigated, proposing two different architectures: requesting peer-pull (RPP) and ultrapeer-pull (UPP). To provide mobile, session-based authentication and RBAC, especially in the RPP architecture, lightweight peer certificates (LWPCs) are developed. Finally, to prove the feasibility of the proposed ideas, the RPP and UPP RBAC architectures are implemented and their scalability and performance are evaluated.

14.
15.
Research on Web-based measurement and on wide-area, cross-organizational industrial collaboration in measurement is recognized globally. This paper proposes a novel, scalable management architecture for measurement resources that supports the resource organization and resource access of current wide-area collaborative measurement applications in the context of a grid. The complexity of measurement management on a grid arises from the scale, dynamism, autonomy, heterogeneity, and distribution of the measurement systems and the related data systems. This paper mainly discusses the interconnection, collaboration, and transparent access of multiple measurement resources based on the proposed management architecture in the context of this complexity. We first discuss the logical architecture used in measurement fields, and then focus on the resource management system, which has a layered architecture. Finally, problems such as resource interconnection, sharing, and collaboration are studied in the context of the proposed management environment. A typical application instance is given to demonstrate the advantages of the proposed approach.

16.
17.
This article introduces a novel semisupervised automated segmentation approach for breast magnetic resonance (MR) images on multicore CPU-GPU systems. The basic idea of the proposed method is a clustering-based semisupervised classifier built on an elliptical gamma mixture model (EGMM). The parameters of the EGMM are identified by the iterative log-expectation maximization (EM) algorithm. The suggested classifier first labels the groups of voxels in an input image and then classifies the image slices using the EGMM. Two implementations of the proposed algorithm have been developed for two types of high-performance computing architectures: graphics processing units (GPUs) and multicore processors. To assess the real-time segmentation performance of our algorithm on the two architectures, we tested a set of breast MR images collected from MedPix. The two architectures are compared in terms of segmentation performance and computational cost through analysis of simulation and experimental results.

18.
Most current audio coding standards use the modified discrete cosine transform (MDCT) to transform an audio sequence from the time domain into the frequency domain. Existing architectures for MDCT use a lookup table to store cosine values in read-only memory (ROM). For MPEG-2/4 AAC, the memory space taken up by the required lookup table makes the circuit implementation inflexible and large because of the long window length. Therefore this study proposes a memory-free architecture for MDCT without a lookup table. The proposed architecture adopts an arithmetic circuit module to calculate the cosine function based on Taylor and Maclaurin series approximations and makes use of the symmetric and periodic identities of trigonometric functions to reduce circuit complexity. A 0.18 μm TSMC cell library is used to synthesise the proposed architecture, which requires about 9213 gates with a maximum operating frequency of 100 MHz. The proposed architecture, which has a flexible window length for performing MDCT, can be implemented in an area smaller than that of ROM-based architectures.
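
The table-free idea can be illustrated in software with the sketch below. It is an assumption-laden model, not the circuit from the paper: the number of series terms (`terms=8`), the specific range-reduction steps, and the function names are illustrative choices, and the MDCT uses the common textbook definition X[k] = sum over n of x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)) for a window of length 2N.

```python
import math


def cos_maclaurin(x, terms=8):
    """Approximate cos(x) without a lookup table.

    Periodicity and symmetry first fold x into [0, pi/2], where a short
    Maclaurin series 1 - x^2/2! + x^4/4! - ... converges quickly.
    """
    x = math.fmod(x, 2.0 * math.pi)      # periodicity: cos(x + 2*pi) = cos(x)
    if x < 0.0:
        x += 2.0 * math.pi
    if x > math.pi:                      # symmetry: cos(2*pi - x) = cos(x)
        x = 2.0 * math.pi - x
    sign = 1.0
    if x > math.pi / 2.0:                # symmetry: cos(pi - x) = -cos(x)
        x = math.pi - x
        sign = -1.0
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= -x * x / ((2 * k - 1) * (2 * k))   # next series term from the previous one
        total += term
    return sign * total


def mdct(x):
    """MDCT of a length-2N sequence, returning N coefficients."""
    n_half = len(x) // 2
    n0 = 0.5 + n_half / 2.0
    return [sum(x[n] * cos_maclaurin(math.pi / n_half * (n + n0) * (k + 0.5))
                for n in range(2 * n_half))
            for k in range(n_half)]
```

Because the cosine is computed on demand, the window length is just a parameter of `mdct`, which mirrors the flexibility argument made in the abstract.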

19.
Designing a digital signal processor (DSP) involves accurate, high-speed arithmetic units. DSP units are among the most power-consuming and memory-intensive devices. Multipliers are common building blocks in most DSP units and, in portable biomedical devices, are subject to tight power and area constraints. This work applies multiple power-reduction techniques to limit the power dissipation of the proposed LUT multiplier unit. A lookup-table-based multiplier has the advantage of an almost constant area requirement irrespective of the increase in multiplier bit width. Clock gating is commonly used to reduce unnecessary switching activity in idle circuit components. A clock tree structure is employed to enhance the SRAM-based lookup table memory architecture. LUT memory access is sequential in nature, so a ring counter is used instead of an address decoder to scan the memory contents, and a gated driver tree structure is implemented to control clock and data switching activity. The proposed design yields a 20% power reduction compared with existing designs.

20.
We propose a framework for measuring the complexity of aerospace systems and demonstrate its application. A measure that incorporates the size, coupling, and modularity aspects of complexity is developed that emphasizes the importance of indirect coupling and feedback loops in the system. We demonstrate how hierarchical modular structure in the system reduces complexity and present an algorithm to decompose the system into modules. The measure is tested and found to be scalable for large-scale systems involving thousands of components and interactions (typical in modern aerospace systems). We investigate the sensitivity of the measure and demonstrate the ability of the framework to identify incorrectness in system representation. The merits of the framework are exemplified through a case study comparing three spacecraft. The framework provides the designer with three key capabilities that can positively influence the aerospace (or other) design process: the ability to identify complex subsystems, the ability to classify misrepresentations, and the ability to trade off commercial off-the-shelf (COTS) and non-COTS components.
