期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

High-throughput area-delay-efficient systolic multiplier over GF(2m) for a class of trinomials

《Microprocessors and Microsystems》2020

相似文献

2.

Low-complexity bit-parallel dual basis multipliers using the modified Booth’s algorithm

Chiou-Yng Lee Author Vitae Che Wun Chiou Author Vitae Jim-Min Lin Author Vitae 《Computers & Electrical Engineering》2005,31(7):444-459

New bit-parallel dual basis multipliers using the modified Booth’s algorithm are presented. Due to the advantage of the modified Booth’s algorithm, two bits are processed in parallel for reduction of both space and time complexities. A multiplexer-based structure has been proposed for realization of the proposed multiplication algorithm. We have shown that our multiplier saves about 9% space complexity as compared to other existing multipliers if the generating polynomial is trinomial or all one polynomial. Furthermore, the proposed multiplier is faster than existing multipliers. 相似文献

3.

Area and time efficient AB2 multipliers based on cellular automata

Hyun-Sung Kim Sung-Woon Lee 《Computer Standards & Interfaces》2009,31(1):137-143

Finite field operations have been widely used in the data communication security applications. This paper proposes two AB² multipliers based on the cellular automata over finite field. They deploy the mixture advantages in the perspective of both in area and time complexity by applying the property of irreducible all one polynomial as a modulus. First, an AB² multiplier is proposed with a new algorithm. Then, another algorithm and its architecture for AB² multiplication are derived to reduce the complex wiring in the first architecture. They can be used as a basic architecture for the public key cryptosystems in IEEE P1363. 相似文献

4.

A Fast Algorithm for Matrix Multiplication and Its Efficient Realization on Systolic Arrays

L. D. Elfimova Yu. V. Kapitonova 《Cybernetics and Systems Analysis》2001,37(1):109-121

A new fast matrix multiplication algorithm is proposed, which, as compared to the Winograd algorithm, has a lower multiplicative complexity equal to W _M 0.437n³ multiplication operations. Based on a goal-directed transformation of its basic graph, new optimized architectures of systolic arrays are synthesized. A systolic variant of the Strassen algorithm is presented for the first time. 相似文献

5.

Systematic designs of buffers in macropipelines of systolic arrays

《Journal of Parallel and Distributed Computing》1988,5(1):1-25

In a macropipeline of systolic arrays, outputs of one systolic array in a given format have to be fed as inputs to another systolic array in a possibly different format. A common memory becomes a bottleneck and limits the number of systolic arrays that can be connected together. In this paper, we study designs of buffers to convert data from one format to another. The minimum number of buffers is determined by a dynamic-programming algorithm with θ(n²) computational complexity, where n is the problem size. A general-purpose converter to convert data from any distribution to any other in a subset of the possible data distributions is also proposed. Last, buffer designs for a macropipeline to perform feature extraction and pattern classification are used to exemplify the design process. 相似文献

6.

Parallel geometric hashing for robust iris indexing

Anukul Chandra Panda Hunny Mehrotra Banshidhar Majhi 《Journal of Real-Time Image Processing》2013,8(3):341-349

In this paper an effort has been made to improve the time complexity of existing geometric hashing based indexing approach for iris biometrics [1]. In the conventional approach, the annular iris image is used for the extraction of keypoints using Scale Invariant Feature Transform [2]. Further, geometric hashing [3] is used to index the database using extracted keypoints. The existing approach performs with an accuracy of 98.5% with improvement in time. However, to further improve time complexity, existing geometric hashing approach is made parallel during indexing as well as retrieval phase. In the proposed approach, the extracted keypoints are mapped to the processors of the hypercube through shared global memory. The geometric invariants are obtained for each basis pair allotted to individual processors in parallel. During indexing phase, these invariants are stored in the hash table. For iris retrieval, the invariants are obtained and the corresponding entries in the hash table receive a vote. The time complexity of the proposed approach is O(Mn ²) for M iris images each having n keypoints, in comparison to existing approach with time complexity of O(Mn ³). This marks the suitability of proposed approach for real-time applications. 相似文献

7.

Analysis of Multi-Sort Algorithm on Multi-Mesh of Trees (MMT) architecture

Nitin Rakesh Nitin 《The Journal of supercomputing》2011,57(3):276-313

Various sorting algorithms using parallel architectures have been proposed in the search for more efficient results. This paper introduces the Multi-Sort Algorithm for Multi-Mesh of Trees (MMT) Architecture for N=n ⁴ elements with more efficient time complexity compared to previous architectures. The shear sort algorithm on Single Instruction Multiple Data (SIMD) mesh model requires $4\sqrt{N}+O\sqrt{N}$ time for sorting N elements, arranged on a $\sqrt{N}\times \sqrt{N}$ mesh, whereas Multi-Sort algorithm on the SIMD Multi-Mesh (MM) Architecture takes O(N ^1/4) time for sorting the same N elements, which proves that Multi-Sort is a better sorting approach. We have improved the time complexity of intrablock Sort. The Communication time complexity for 2D Sort in MM is O(n), whereas this time in MMT is O(log?n). The time complexity of compare–exchange step in MMT is same as that in MM, i.e., O(n). It has been found that the time complexity of the Multi-Sort on MMT has been improved as on Multi-Mesh architecture. 相似文献

8.

Action-based discovery of satisfying subsets: A distributed method for model correction

《Information and Software Technology》2013,55(2):201-214

ContextUnderstanding and resolving counterexamples in model checking is a difficult task that often takes a significant amount of resources and many rounds of regression model checking after any fix. As such, it is desirable to have algorithmic methods that correct finite-state models when their model checking for a specific property fails without undermining the correctness of the rest of the properties, called the model correction problem.ObjectiveThe objective of this paper is to mitigate time and space complexity of correction.MethodTo achieve the objective, this paper presents a distributed method that solves the model correction problem using the concept of satisfying subsets, where a satisfying subset is a subset of model computations that meets a new property while preserving existing properties. The proposed method automates the elimination of superfluous non-determinism in models of concurrent computing systems, thereby generating models that are correct by construction.ResultsWe have implemented the proposed method in a distributed software tool, called the Model Corrector (ModCor). Due to the distributed nature of the correction algorithms, ModCor exploits the processing power of computer clusters to mitigate the space and time complexity of correction. Our experimental results are promising as we have used a small cluster of five regular PCs to automatically correct large models (with about 3¹⁵⁹ reachable states) in a few hours. Such corrections would have been impossible without using ModCor.ConclusionsThe results of this paper illustrate that partitioning finite-state models based on their transition relations and distributing them across a computer cluster facilitates the automated correction of models when their model checking fails. 相似文献

9.

Parallel crypto-devices for GF(p) elliptic curve multiplication resistant against side channel attacks

Santosh Ghosh^{Author Vitae} Monjur Alam Author Vitae Author Vitae Indranil Sen Gupta Author Vitae 《Computers & Electrical Engineering》2009,35(2):329-338

All elliptic curve cryptographic schemes are based on scalar multiplication of points, and hence its faster computation signifies faster operation. This paper proposes two different parallelization techniques to speedup the GF(p) elliptic curve multiplication in affine coordinates and the corresponding architectures. The proposed implementations are capable of resisting different side channel attacks based on time and power analysis. The 160, 192, 224 and 256 bits implementations of both the architectures have been synthesized and simulated for both FPGA and 0.13μ CMOS ASIC. The final designs have been prototyped on a Xilinx Virtex-4 xc4vlx200-12ff1513 FPGA board and performance analyzes carried out. The experimental result and performance comparison show better throughput of the proposed implementations as compared to existing reported architectures. 相似文献

10.

Low-Complexity Bit-Parallel Multiplier over GF(2$^m$) Using Dual Basis Representation

下载免费PDF全文

Chiou-Yng Lee Jenn-Shyong Horng and I-Chang Jou 《计算机科学技术学报》2006,21(6):887-892

Recently, cryptographic applications based on finite fields have attracted much attention. The most demanding finite field arithmetic operation is multiplication. This investigation proposes a new multiplication algorithm over GF（2^m） using the dual basis representation. Based on the proposed algorithm, a parallel-in parallel-out systolic multiplier is presented, The architecture is optimized in order to minimize the silicon covered area （transistor count）. The experimental results reveal that the proposed bit-parallel multiplier saves about 65% space complexity and 33% time complexity as compared to the traditional multipliers for a general polynomial and dual basis of GF（2^m）. 相似文献

11.

Low-complex dynamic programming algorithm for hardware/software partitioning 总被引：1，自引：0，他引：1

Jigang Wu Thambipillai Srikanthan 《Information Processing Letters》2006,98(2):41-46

A low-complex algorithm is proposed for the hardware/software partitioning. The proposed algorithm employs dynamic programming principles while accounting for communication delays. It is shown that the time complexity of the latest algorithm has been reduced from O(n²⋅A) to O(n⋅A), without increase in space complexity, for n code fragments and hardware area A. 相似文献

12.

Efficient bit-parallel multipliers over finite fields GF(2)

Chiou-Yng Lee Author Vitae Pramod Kumar Meher Author Vitae 《Computers & Electrical Engineering》2010,36(5):955-968

Hardware implementation of multiplication in finite field GF(2^m) based on sparse polynomials is found to be advantageous in terms of space-complexity as well as the time-complexity. In this paper, we present a new permutation method to construct the irreducible like-trinomials of the form (x + 1)^m + (x + 1)ⁿ + 1 for the implementation of efficient bit-parallel multipliers. For implementing the multiplications based on such polynomials, we have defined a like-polynomial basis (LPB) as an alternative to the original polynomial basis of GF(2^m). We have shown further that the modular arithmetic for the binary field based on like-trinomials is equivalent to the arithmetic for the field based on trinomials. In order to design multipliers for composite fields, we have found another permutation polynomial to convert irreducible polynomials into like-trinomials of the forms (x² + x + 1)^m + (x² + x + 1)ⁿ + 1, (x² + x)^m + (x² + x)ⁿ + 1 and (x⁴ + x + 1)^m + (x⁴ + x + 1)ⁿ + 1. The proposed bit-parallel multiplier over GF(2^4m) is found to offer a saving of about 33% multiplications and 42.8% additions over the corresponding existing architectures. 相似文献

13.

An optimization algorithm for free-form surface partitioning based on weighted gaussian image

《Graphical Models》2005,67(1):17-42

Partitioning free-form surfaces into sub-patches and finding optimal representative normal for each patch to maximize a global objective function is an important two-level operation in diverse industrial applications. In this paper, by solving a maximum hemispherical partitioning problem raised from a weighted Gaussian image, an optimization algorithm is proposed to partition a free-form surface into two sub-patches and simultaneously report the optimal representative normals. By discretizing the free-form surface with W sample points and clustering normals on the surface with m distinct sample normals, the proposed algorithm is designed, in general, with O(m²W²) time complexity and O(W²) space complexity, and in particular, if the surface is convex, in O(m² log m) time complexity. Case studies with four representative examples are presented and a real world application is exploited to demonstrate the effectiveness and usefulness of the proposed algorithm. 相似文献

14.

Novel parity-preserving reversible logic array multipliers

Mojtaba Valinataj 《The Journal of supercomputing》2017,73(11):4843-4867

Reversible logic as a new promising design domain can be used for DNA computations, nanocomputing, and especially constructing quantum computers. However, the vulnerability to different external effects may lead to deviation from producing correct results. The multiplication is one of the most important operations because of its huge usage in different computing systems. Thus, in this paper, some novel reversible logic array multipliers are proposed with error detection capability through the usage of parity-preserving gates. By utilizing the new arrangements of existing reversible gates, some new circuits are presented for partial product generation and multi-operand addition required in array multipliers which results in two unsigned and three signed parity-preserving array multipliers. The experimental results show that the best of signed and unsigned proposed multipliers have the lowest values among the existing designs regarding the main reversible logic criteria including quantum cost, gate count, constant inputs, and garbage outputs. For $4\times 4$ multipliers, the proposed designs achieve up to 28 and 46% reduction in the quantum cost and gate count, respectively, compared to the existing designs. Moreover, the proposed unsigned multipliers can reach up to 58% gate count reduction in $16\times 16$ multipliers. 相似文献

15.

A web personalizing technique using adaptive data structures: The case of bursts in web visits

E. Sakkopoulos^{Author Vitae} D. Antoniou Author VitaeAuthor Vitae N. Tsirakis Author VitaeAuthor Vitae 《Journal of Systems and Software》2010,83(11):2200-2210

The explosive growth in the size and use of the World Wide Web continuously creates new great challenges and needs. The need for predicting the users’ preferences in order to expedite and improve the browsing though a site can be achieved through personalizing of the Websites. Recommendation and personalization algorithms aim at suggesting WebPages to users based on their current visit and past users’ navigational patterns. The problem that we address is the case where few WebPages become very popular for short periods of time and are accessed very frequently in a limited temporal space. Our aim is to deal with these bursts of visits and suggest these highly accessed pages to the future users that have common interests. Hence, in this paper, we propose a new web personalization technique, based on advanced data structures.The data structures that are used are the Splay tree (1) and Binary heaps (2). We describe the architecture of the technique, analyze the time and space complexity and prove its performance. In addition, we compare both theoretically and experimentally the proposed technique to another approach to verify its efficiency. Our solution achieves O(P²) space complexity and runs in k log P time, where k is the number of pages and P the number of categories of WebPages. 相似文献

16.

A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets 总被引：3，自引：0，他引：3

Zuqiang Meng Zhongzhi Shi 《Information Sciences》2009,179(16):2774-4942

Efficient attribute reduction in large, incomplete decision systems is a challenging problem; existing approaches have time complexities no less than O(∣C∣²∣U∣²). This paper derives some important properties of incomplete information systems, then constructs a positive region-based algorithm to solve the attribute reduction problem with a time complexity no more than O(∣C∣²∣U∣log∣U∣). Furthermore, our approach does not change the size of the original incomplete system. Numerical experiments show that the proposed approach is indeed efficient, and therefore of practical value to many real-world problems. The proposed algorithm can be applied to both consistent and inconsistent incomplete decision systems. 相似文献

17.

Unified Parallel Systolic Multiplier Over GF（2^m）

下载免费PDF全文

Chiou-Yng Lee Yung-Hui Chen Che-Wun Chiou and Jim-Min Lin 《计算机科学技术学报》2007,22(1):28-38

In general, there are three popular basis representations, standard （canonical, polynomial） basis, normal basis, and dual basis, for representing elements in GF（2^m）. Various basis representations have their distinct advantages and have their different associated multiplication architectures. In this paper, we will present a unified systolic multiplication architecture, by employing Hankel matrix-vector multiplication, for various basis representations. For various element representation in GF（2^m）, we will show that various basis multiplications can be performed by Hankel matrix-vector multiplications. A comparison with existing and similar structures has shown that time complexities. the proposed architectures perform well both in space and 相似文献

18.

Distributed algorithm to detect strong conjunctive predicates

Punit Chandra 《Information Processing Letters》2003,87(5):243-249

This paper presents an on-line distributed algorithm for detection of Definitely(φ) for the class of conjunctive global predicates. The only known algorithm for detection of Definitely(φ) uses a centralized approach. A method for decentralizing the algorithm was also given, but the work load is not fairly distributed and the method uses a hierarchical structure. The centralized approach has a time, space, and total message complexity of O(n²m), where n is the number of processes and m is the maximum number of messages sent by any process. The proposed on-line distributed algorithm uses the concept of intervals rather than events, and assumes p is the maximum number of intervals at any process. The worst-case time complexity across all the processes is O(min(pn²,mn²)). The worst-case space overhead across all the processes is min(2mn²,2pn²). 相似文献

19.

Implementation and analysis of optimized architectures for rank order filter

S. M. Meena K. Linganagouda 《Journal of Real-Time Image Processing》2008,3(1-2):33-41

In this paper we present two architectures based on the replication sort algorithm (RSA) and rank based network sorting algorithm (RBNS) for implementation of Rank order filer (ROF). This paper focuses on optimization strategies for sorting in terms of operating speed (throughput) and area (no. of comparators). The RSA algorithm achieves maximum throughput by sorting, which finds the position of all the window elements in parallel using eight bit comparators, a LUT to store the bit sum and a decoder. The time cost for filtering the complete image remains constant irrespective of the size of the window and the algorithm is generalized for all rank orders. The RBNS architecture is based on Sorting Network architecture algorithm, optimized for each desired output rank with O (N) hardware complexity compared to O (N²) complexity of the existing architectures that are based on bubble-sort and quick-sort reported so far. The proposed architectures use the concepts of pipelining and grain level parallelism and accomplish the task of sorting and filtering each sample appearing at the input window of the filter in one clock cycle, excluding the initial latency. 相似文献

20.

Weighted broadcast in linear radio networks

Gautam K. Das 《Information Processing Letters》2008,106(4):136-143

The weighted version of the broadcast range assignment problem in ad hoc wireless network is studied. Efficient algorithms are presented for the unbounded and bounded-hop broadcast problems for the linear radio network, where radio stations are placed on a straight line. For the unbounded case of the problem, the proposed algorithm runs in O(n²) time and using O(n) space, where n is the number of radio stations in the network. For the h-hop broadcast problem, the time and space complexities of our algorithm are O(hn²logn) and O(hn), respectively. This improves time complexity of the existing results for the same two problems by a factor of n and , respectively [C. Ambuhl, A.E.F. Clementi, M.D. Ianni, G. Rossi, A. Monti, R. Silvestri, The range assignment problem in non-homogeneous static ad hoc networks, in: Proc. 18th Int. Parallel and Distributed Precessing Symposium, 2004]. 相似文献