Similar literature
20 similar documents found; search time 15 ms
1.
This paper focuses on a problem of network synthesis for a class of quantum stochastic systems. The systems under consideration are of triplet-type form and stem from linear quantum optics and linear quantum circuits. A new quantum network realization approach is proposed by generalizing the scattering operator from the scalar form to a unitary matrix in network components. It is shown that the triplet-type quantum stochastic system can be approximated by a quantum network which consists of some one-degree-of-freedom generalized open-quantum harmonic oscillators (1DGQHOs) via series, concatenation and feedback connections.

2.
A quantum circuit is a computational unit that transforms an input quantum state to an output state. A natural way to reason about its behavior is to compute explicitly the unitary matrix implemented by it. However, when the number of qubits increases, the matrix dimension grows exponentially and the computation becomes intractable. In this paper, we propose a symbolic approach to reasoning about quantum circuits. It is based on a small set of laws involving some basic manipulations on vectors and matrices. This symbolic reasoning scales better than the explicit one and is well suited to be automated in Coq, as demonstrated with some typical examples.

3.
Quaternary encoded binary circuits are more compact than their binary counterparts. Although quaternary reversible circuits are realizable, design of such circuits is still in its infancy. This work proposes a new, enhanced method of quaternary Galois field sum of products (QGFSOP) synthesis for quaternary quantum circuits. To reduce QGFSOP product terms, the algorithm makes use of 11 newly defined quaternary Galois field (QGF) expansions (for a total of 21 QGF expansions). This algorithm achieves QGFSOP minimization with the assistance of a pseudo-Kronecker Galois field decision diagram (QGFDD). This is a novel approach for QGFSOP synthesis. Finally, QGFSOP expressions are translated into quantum cost optimized quaternary quantum circuits using: (1) newly developed quaternary quantum gate realizations of controlled Feynman and Toffoli gates that are optimized in terms of quantum cost, (2) composite literals consisting of 1 digit and M–S gates. Performance evaluation against existing works in the literature determined that our proposed method achieves an average QGFSOP expression product term savings of 32.66%. Also, the synthesized QGFSOP circuits were evaluated in terms of quantum cost.

4.
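Systolic arrays have a regular structure and high throughput, suit matrix multiplication algorithms, and are widely used in designing high-performance convolution and matrix multiplication accelerators. In deep-submicron processes, raising chip computing performance by enlarging the array leads to problems such as lower clock frequency and sharply increased power consumption. Therefore, building on 3D integrated circuit technology, this paper proposes 3D-MMA, a double-precision floating-point matrix multiplication accelerator that maps a planar systolic array onto a 3D integrated circuit. First, a blocked mapping and scheduling algorithm is designed for this structure to improve matrix multiplication efficiency. Second, an acceleration system based on 3D-MMA is proposed, a performance model of 3D-MMA is built, and its design space is explored. Finally, the implementation cost of the structure is evaluated and compared with existing state-of-the-art accelerators. Experimental results show that, with a memory bandwidth of 160 GB/s and a stack of four layers of 16×16 systolic arrays, 3D-MMA reaches a peak performance of 3 TFLOPS at 99% efficiency, with an implementation cost lower than that of a 2D implementation. In the same process technology, compared with a linear-array accelerator and the K40 GPU, 3D-MMA delivers 1.36 and 1.92 times their performance, respectively, with a far smaller area. The work explores the advantages of 3D integrated circuits in designing high-performance matrix multiplication accelerators and offers a reference for further improving the performance of future high-performance computing platforms.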

5.
A serious obstacle to large-scale quantum algorithms is the large number of elementary gates, such as the controlled-NOT gate or Toffoli gate. Herein, we present an improved linear-depth ripple-carry quantum addition circuit, which is an elementary circuit used for quantum computations. Compared with previous addition circuits costing at least two Toffoli gates for each bit of output, the proposed adder uses only a single Toffoli gate. Moreover, our circuit may be used to construct reversible circuits for modular multiplication, Cx mod M with x < M, arising as components of Shor’s algorithm. Our modular-multiplication circuits are simpler than previous constructions, and may be used as primitive circuits for quantum computations.

6.
One of the elementary operations in computing systems is multiplication. Therefore, high-speed and low-power multiplier design is mandatory for efficient computing systems. In designing low-energy dissipation circuits, reversible logic is more efficient than irreversible logic but at the cost of higher complexity. This paper introduces an efficient signed/unsigned 4 × 4 reversible Vedic multiplier with minimum quantum cost. The Vedic multiplier is considered fast as it generates all partial products and their sum in one step. This paper proposes two reversible Vedic multipliers with optimized quantum cost and garbage output. First, the unsigned Vedic multiplier is designed based on the Urdhava Tiryakbhyam (UT) Sutra. This multiplier consists of bitwise multiplication and adder compressors. The proposed design has a quantum cost of 111, a reduction of 94% compared to the previous design among Vedic multipliers in the literature. It has a garbage output of 30, an optimization over the best design compared. Second, the proposed unsigned multiplier is expanded to allow the multiplication of signed numbers as well as unsigned numbers. Two signed Vedic multipliers are presented with the aim of obtaining more optimization in performance parameters. Design I has separate binary two’s complement (B2C) and MUX circuits, while Design II combines binary two’s complement and MUX circuits in one circuit. Design I shows the lowest quantum cost, 231, relative to the state of the art. Design II has a quantum cost of 199, a reduction to 86.14% of Design I. The functionality of the proposed multiplier is simulated and verified using XILINX ISE 14.2.

7.
This paper discusses the development and refinement of several distributed matrix multiplication algorithms. Our goal in this research has been to determine if successful distribution of this problem is possible within a loosely-coupled environment. Our criteria for success are fast execution speed and, to a lesser extent, memory efficiency. Our results indicate that, perhaps counter-intuitively, it is possible to use distribution to improve the performance of dense matrix multiplication. The speed increase obtained ranges up to a factor of four, depending upon the algorithm and the process configuration used. Among the factors affecting performance are computational complexity, number and size of interprocess messages, and bookkeeping overhead. We conclude that this approach to matrix multiplication has potential. Furthermore, some of the principles discussed here may be usefully employed in the distribution of other algorithms of the same O(n^3) computational complexity, such as LU decomposition (linear system solvers) and Cholesky factorization.

8.
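Scalar multiplication largely determines how fast an elliptic curve cryptosystem can be implemented, and some elliptic curve public-key cryptosystems need to compute multi-scalar multiplications. Multi-base chains give shorter scalar representations with fewer nonzero digits, which suits fast elliptic curve scalar multiplication. To improve the efficiency of elliptic curve cryptography, a new, more efficient multi-scalar multiplication algorithm is proposed that builds on existing scalar multiplication algorithms over binary and prime fields and combines the sliding-window technique with multi-base algorithms. Experimental results show that the new algorithm needs fewer operations than the traditional Shamir algorithm and the interleaved NAF algorithm, and effectively improves the efficiency of elliptic curve multi-scalar multiplication. Compared with existing multi-scalar multiplication algorithms, its computational efficiency is about 7.9% to 20.6% higher.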

9.
Quantum control plays a key role in quantum technology, in particular for steering quantum systems. As problem size grows exponentially with the system size, it is necessary to deal with fast numerical algorithms and implementations. We improved an existing code for quantum control concerning two linear algebra tasks: the computation of the matrix exponential and efficient parallelisation of prefix matrix multiplication. For the matrix exponential we compare three methods: the eigendecomposition method, the Padé method and a polynomial expansion based on Chebyshev polynomials. We show that the Chebyshev method outperforms the other methods both in terms of computation time and accuracy. For the prefix problem we compare the tree-based parallel prefix scheme, which is based on a recursive approach, with a sequential multiplication scheme where only the individual matrix multiplications are parallelised. We show that this fine-grain approach outperforms the parallel prefix scheme by a factor of 2–3, depending on parallel hardware and problem size, and also leads to lower memory requirements. Overall, the improved linear algebra implementations not only led to a considerable runtime reduction, but also allowed us to tackle problems of larger size on the same parallel compute cluster.
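
As a rough illustration of the prefix matrix multiplication task mentioned in this abstract, the sketch below computes all prefix products sequentially, one multiplication per step. It is not the authors' code: the helper names `matmul` and `prefix_products` are hypothetical, the left-multiplication order is an assumption, and the plain-Python `matmul` only stands in for the parallelised multiplication the abstract refers to.

```python
def matmul(a, b):
    """Dense matrix product of two lists of rows (stand-in for a parallel kernel)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def prefix_products(mats):
    """Return [M1, M2*M1, M3*M2*M1, ...] (assumed order: new factor on the left)."""
    out = []
    acc = None
    for m in mats:
        # One multiplication per step; only this call would be parallelised
        # in the sequential scheme described above.
        acc = m if acc is None else matmul(m, acc)
        out.append(acc)
    return out


mats = [[[1, 1], [0, 1]], [[2, 0], [0, 2]], [[0, 1], [1, 0]]]
for p in prefix_products(mats):
    print(p)
```

The sequential scheme needs only one accumulator besides the stored results, which is consistent with the lower memory requirements the abstract reports for it.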

10.
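Matrix multiplication underlies numerical analysis and graphics and image processing algorithms, and the design of general-purpose matrix multiplication accelerators has long been an active research topic in embedded system design. However, because of its high computational complexity and low processing efficiency, matrix multiplication often becomes the bottleneck for computing speed in embedded systems. To make matrix multiplication more usable in the embedded domain, a hardware/software co-acceleration architecture based on MPSoC (MultiProcessor System-on-Chip) is proposed. Within the MPSoC architecture, on the one hand, a matrix blocking method oriented to hardware constraints is designed, yielding a general-purpose matrix multiplication accelerator system; on the other hand, by exploiting the multi-core architecture of the MPSoC, corresponding task partitioning and load-balancing scheduling algorithms are proposed, improving parallel efficiency and overall system speedup. Experimental results show that the proposed architecture and algorithms implement general-purpose matrix multiplication, and that the multi-core parallel scheduling algorithm realized through hardware/software co-design significantly improves computational efficiency over a traditional single-core design.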
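
The matrix blocking idea in this abstract can be sketched generically as tiled multiplication, where each tile is small enough to fit a fixed hardware buffer. The function name `blocked_matmul` and the `tile` parameter are hypothetical, and this is ordinary loop tiling, not the paper's hardware-constrained blocking method or its scheduling algorithm.

```python
def blocked_matmul(a, b, tile):
    """Multiply a (n x m) by b (m x p) one tile at a time.

    `tile` is an assumed stand-in for a hardware buffer limit: each inner
    triple loop touches at most tile x tile elements of a, b and c.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, p, tile):
            for k0 in range(0, m, tile):
                # Accumulate the contribution of one tile pair into c.
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c


print(blocked_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], tile=2))
```

Because the (i0, j0) tiles of the result are independent, they are natural units for distributing work across cores, which is the kind of task partitioning the abstract describes.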

11.
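Key technologies and algorithms for quantum reversible logic synthesis   Total citations: 1, self-citations: 0, citations by others: 1
Li Zhiqiang, Li Wenqian, Chen Hanwu. Journal of Software (《软件学报》), 2009, 20(9): 2332-2343
The key to optimizing quantum reversible logic is to construct it automatically at minimum quantum cost. To make automatic generation and optimization of reversible logic more efficient, a class-template technique and a fast algorithm are proposed. Templates are an effective optimization tool, and the class-template technique markedly improves the efficiency of template matching. The R-M algorithm is a good iterative method for reversible logic synthesis; building on its original idea, a hash function is constructed and a fast algorithm for reversible logic synthesis is derived from it. Experimental results show that, in the same experimental environment, the class-template technique combined with the fast algorithm achieves far better optimization quality and efficiency than other known algorithms.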

12.
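A fast synthesis algorithm for quantum reversible logic circuits based on bit operations   Total citations: 1, self-citations: 0, citations by others: 1
Quantum reversible logic circuits are the basic building blocks of a quantum computer. Drawing on several reversible logic circuit synthesis algorithms, and on the observation that reversible logic circuit synthesis is essentially a permutation problem, this paper uses bit operations to construct an efficient perfect hash function and proposes a new, efficient hash-table-based synthesis algorithm for quantum reversible logic circuits. The algorithm can use a variety of quantum gates and generates optimal quantum reversible logic circuits with very high efficiency, which in theory minimizes the cost of building quantum circuits. On the internationally accepted benchmark of 3-variable reversible functions, the algorithm not only generates all optimal circuits but also runs far faster than other algorithms. Experimental results show that, when synthesizing circuits under the minimum-length criterion, its average speed is 69.8 times that of the best previous result.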
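
The view of synthesis as a permutation problem can be sketched with a breadth-first search over 3-variable reversible functions. This is only an illustration under stated assumptions, not the paper's algorithm: the gate library is assumed to be NOT, CNOT and Toffoli, the names `gate_perm`, `nct_library` and `shortest_lengths` are hypothetical, and a Python `dict` keyed by tuples stands in for the bit-operation perfect hash function the abstract describes.

```python
from collections import deque

N_BITS = 3
N_STATES = 1 << N_BITS  # 8 input patterns for 3 lines


def gate_perm(target, controls=()):
    """Permutation of 0..7 realised by a NOT/CNOT/Toffoli gate, via bit operations."""
    mask = 0
    for c in controls:
        mask |= 1 << c
    # Flip the target bit only when every control bit is 1.
    return tuple(x ^ (1 << target) if (x & mask) == mask else x
                 for x in range(N_STATES))


def nct_library():
    """All NOT, CNOT and Toffoli gates on 3 lines (assumed gate library)."""
    gates = []
    for t in range(N_BITS):
        others = [b for b in range(N_BITS) if b != t]
        gates.append(gate_perm(t))                 # NOT
        for c in others:
            gates.append(gate_perm(t, (c,)))       # CNOT
        gates.append(gate_perm(t, tuple(others)))  # Toffoli
    return gates


def shortest_lengths(gates):
    """BFS from the identity: maps each reachable permutation to its minimal gate count."""
    identity = tuple(range(N_STATES))
    dist = {identity: 0}  # hash table: permutation -> minimal circuit length
    queue = deque([identity])
    while queue:
        p = queue.popleft()
        for g in gates:
            q = tuple(g[v] for v in p)  # apply gate g after the circuit for p
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


gates = nct_library()
dist = shortest_lengths(gates)
print(len(gates), len(dist), max(dist.values()))
```

Because the search proceeds level by level, the first time a permutation is stored its recorded length is minimal, which is why a hash table with fast lookup dominates the running time of this kind of exhaustive synthesis.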

13.
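Unitary matrix representation of multi-valued logic quantum permutation gates   Total citations: 1, self-citations: 0, citations by others: 1
In theory, quantum reversible circuits dissipate no energy, so quantum computing systems can keep their negative impact on the environment to a minimum. Multi-valued logic quantum permutation gates are the basic building blocks of multi-valued logic quantum circuits. This paper studies the unitary matrices of multi-valued logic quantum permutation gates from a mathematical point of view, proposes a method for constructing them, and discusses its correctness. On this basis, a framework is given for constructing the unitary matrices of hybrid multi-valued logic quantum permutation gates; with it, the unitary matrix of any hybrid logic quantum permutation gate can be constructed conveniently. The unitary matrix is the mathematical model of a quantum gate and clearly reflects its mathematical properties. Studying the unitary matrices of quantum gates is of some value for verifying the correctness and reliability of quantum gates and for analyzing how quantum states evolve in a circuit.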
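
To make the link between a permutation gate and its unitary matrix concrete, the sketch below builds the 0/1 matrix of a single-qudit permutation and checks unitarity. The helper names and the ternary "+1 mod 3" example are assumptions chosen for illustration; the paper's own construction method and its hybrid-gate framework are not reproduced here.

```python
def permutation_matrix(perm):
    """Matrix U with U[perm[j]][j] = 1, so U maps basis state |j> to |perm[j]>."""
    d = len(perm)
    if sorted(perm) != list(range(d)):
        raise ValueError("perm must be a bijection on 0..d-1")
    u = [[0] * d for _ in range(d)]
    for j, image in enumerate(perm):
        u[image][j] = 1
    return u


def is_unitary(u):
    """For a real matrix, unitarity reduces to U^T U = I."""
    d = len(u)
    return all(
        sum(u[k][i] * u[k][j] for k in range(d)) == (1 if i == j else 0)
        for i in range(d) for j in range(d)
    )


# Ternary (3-valued) shift gate: |x> -> |x + 1 mod 3>  (illustrative example)
shift = permutation_matrix([1, 2, 0])
for row in shift:
    print(row)
print(is_unitary(shift))
```

Each column of such a matrix contains exactly one 1, and distinct columns have it in distinct rows, which is why every permutation gate is automatically unitary.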

14.
In this paper, we develop algorithms in order of efficiency for the all-to-all broadcast problem in an N-node (N = 2^n), n-dimensional faulty SIMD hypercube, Q_n, with up to n-1 node faults. The algorithms use a property of a certain ordering of dimensions. Our analysis includes startup time (α) and transfer time (β). We have established the lower bound for such an algorithm to be nα+(2N-3)Lβ in a faulty hypercube with at most n-1 faults (each node has a value of L bytes). Our best algorithm requires 2nα+2NLβ and is near-optimal. We develop an optimal algorithm for matrix multiplication in a faulty hypercube using all-to-all broadcast and compare the efficiency of the all-to-all broadcast approach with the broadcast approach and the global sum approach for matrix multiplication. The algorithms are congestion-free and applicable in the context of available hypercube machines.
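
For orientation, the sketch below simulates all-to-all broadcast on a fault-free hypercube by exchanging along one dimension per step. It is a baseline only: the abstract's algorithms handle up to n-1 faulty nodes, which this code does not attempt, and the function name `all_to_all_broadcast` is hypothetical.

```python
def all_to_all_broadcast(n):
    """Simulate dimension-by-dimension exchange on a fault-free n-cube.

    Returns the set of values known at each node and the number of values
    each node sent in total.
    """
    size = 1 << n
    known = [{node} for node in range(size)]  # each node starts with its own value
    sent_per_node = 0
    for d in range(n):
        # By symmetry every node holds the same number of values at each step.
        sent_per_node += len(known[0])
        # Each node swaps everything it knows with its neighbor across dimension d.
        known = [known[node] | known[node ^ (1 << d)] for node in range(size)]
    return known, sent_per_node


known, sent = all_to_all_broadcast(3)
print(sorted(known[5]), sent)
```

In this fault-free simulation each node takes part in n exchanges and sends N-1 values in total, which gives a sense of scale for the nα and Lβ terms in the bounds quoted for the faulty case.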

15.
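A fast hash-table-based algorithm for quantum reversible logic circuit synthesis   Total citations: 4, self-citations: 1, citations by others: 3
Quantum reversible logic circuits are the basic building blocks of a quantum computer, which is formed by cascading and combining quantum gates. Synthesis of quantum reversible logic circuits means automatically constructing such a circuit, at low quantum cost, from its required function. Drawing on several reversible logic circuit synthesis algorithms, a new, efficient quantum circuit synthesis algorithm is proposed. It constructs a minimal perfect hash function, can use a variety of quantum gates and any quantum cost criterion, and generates optimal quantum reversible logic circuits with very high efficiency. To automate quantum circuit synthesis, a general algorithm is proposed, for the first time, that automatically constructs various quantum gate libraries by permuting quantum wires. On the internationally accepted benchmark of 3-variable reversible functions, the algorithm not only generates all optimal circuits but also runs far faster than other algorithms. Experimental results show that, when synthesizing circuits under the minimum-length and minimum-cost criteria, its average speed is 49.15 times and 365.13 times that of the best previous results, respectively.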

16.
In recent years, quantum computing research has been attracting more and more attention, but few studies have examined in depth the limited interaction distance between quantum bits (qubits). This paper presents a mapping method for transforming multiple-control Toffoli (MCT) circuits into linear nearest neighbor (LNN) quantum circuits instead of traditional decomposition-based methods. In order to reduce the number of inserted SWAP gates, a novel type of gate with the optimal LNN quantum realization was constructed, namely the NNTS gate. The MCT gate with multiple control bits could be better cascaded by the NNTS gates, in which the arrangement of the input lines was the LNN arrangement of the MCT gate. Then, the communication overhead measurement model on inserted SWAP gate count from the original arrangement to the new arrangement was put forward, and we selected one of the LNN arrangements with the minimum SWAP gate count. Moreover, the LNN arrangement-based mapping algorithm was given, and it dealt with the MCT gates in turn and mapped each MCT gate into its LNN form by inserting the minimum number of SWAP gates. Finally, some simplification rules were used, which can further reduce the final quantum cost of the LNN quantum circuit. Experiments on some benchmark MCT circuits indicate that the direct mapping algorithm results in fewer additional SWAP gates in about 50%, while the average improvement rate in quantum cost is 16.95% compared to the decomposition-based method. In addition, it has been verified that the proposed method has greater superiority for reversible circuits cascaded by MCT gates with more control bits.

17.
We present a parallel implementation of the Bose Hubbard model, using imaginary time propagation to find the lowest quantum eigenstate and real time propagation for simulation of quantum dynamics. Scaling issues, performance of sparse matrix-vector multiplication, and a parallel algorithm for determining nonzero matrix elements are described. Implementation of imaginary time propagation yields an O(N) linear convergence on a single processor and slightly better than ideal performance on up to 160 processors for a particular problem size. The determination of the nonzero matrix elements is intractable using sequential non-optimized techniques for large problem sizes. Thus, we discuss a parallel algorithm that takes advantage of the intrinsic structural characteristics of the Fock-space matrix representation of the Bose Hubbard Hamiltonian and utilizes a parallel implementation of a Fock state lookup table to make this task solvable within reasonable timeframes. Our parallel algorithm demonstrates near ideal scaling on thousands of processors. We include results for a matrix 22.6 million square, with 202 million nonzero elements, utilizing 2048 processors.

18.
Summary: This paper analyzes the effect the data representation has on the cost of three operations on sparse matrices: matrix addition, multiplication of a matrix by a non-sparse vector, and matrix multiplication. For each operation an algorithm is given for each representation. Each algorithm is analyzed to compute the number of operations and the time required. The cost of each matrix operation is reported analytically as a function of the matrix representation used, the matrix densities, and the speeds of floating point addition and multiplication relative to fixed point operations.
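
As one concrete instance of how a representation determines operation cost, the sketch below multiplies a sparse matrix by a dense vector using compressed sparse row (CSR) storage. CSR is chosen here only as a familiar example; the abstract does not say which representations the paper analyzes, and the helper names `to_csr` and `csr_matvec` are hypothetical.

```python
def to_csr(dense):
    """Convert a list-of-rows matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's nonzeros
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """y = A x, touching each stored nonzero exactly once."""
    y = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


values, col_idx, row_ptr = to_csr([[0, 2, 0], [1, 0, 3], [0, 0, 0]])
print(values, col_idx, row_ptr)
print(csr_matvec(values, col_idx, row_ptr, [1, 2, 3]))
```

With this layout the floating-point work is proportional to the number of nonzeros, while the index lookups are the fixed-point overhead, the same kind of trade-off the abstract's cost model weighs.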

19.
In this paper, simultaneous reduction of circuit depth and synthesis cost of reversible circuits in quantum technologies with limited interaction is addressed. We developed a cycle-based synthesis algorithm which uses negative controls and limited distance between gate lines. To improve circuit depth, a new parallel structure is introduced in which, before synthesis, a set of disjoint cycles is extracted from the input specification and distributed into subsets. The cycles of each subset are synthesized independently on different sets of ancillae. Accordingly, each disjoint set can be synthesized by different synthesis methods. Our analysis shows that the best worst-case synthesis cost of reversible circuits in the linear nearest neighbor architecture is improved by the proposed approach. Our experimental results reveal the effectiveness of the proposed approach to reduce cost and circuit depth for several benchmarks.

20.
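Using the correspondence between knot intervals of B-spline basis functions, a method is first given for computing the conversion matrices between B-spline basis functions, and from it an interval-skipping algorithm for computing products of B-splines is derived. The algorithm only needs the conversion matrices on some of the knot intervals, hence the name. The method solves the problem of computing the product of a piecewise polynomial and a B-spline curve, and can be applied to problems such as degree elevation of B-spline curves and smooth joining of surfaces. Numerical examples confirm that the method is computationally simple and easy to implement.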
