
1.

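A parallel method for solving linear systems based on Doolittle LU decomposition  Total citations: 1; self-citations: 0; citations by others: 1

Xu Xiaofei, Cao Xiangyu, Yao Xu, Chen Pan 《电子与信息学报》 (Journal of Electronics & Information Technology) 2010,32(8):2019-2022

Fast solution of the matrix equation is the key to applying the method of moments to electrically large problems, and LU decomposition is an effective method for solving systems of linear equations. This paper analyzes the Doolittle LU decomposition process in detail and, based on the characteristics of that process, proposes a parallel solution method for the MPI (Message Passing Interface) environment in which tasks are assigned to processes in a right-angle (L-shaped) cyclic pattern. Experiments show that the method effectively reduces the volume of inter-process data communication and thereby speeds up the computation.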
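For readers unfamiliar with the Doolittle scheme named in the abstract above, the following is a minimal serial sketch, not the paper's MPI method. It assumes a square matrix with nonzero pivots and does no pivoting; the names `doolittle_lu` and `solve` are hypothetical. At step k it fills row k of U and then column k of L, which together form an L-shaped strip.

```python
def doolittle_lu(a):
    """Factor a square matrix (list of lists) as L*U, with ones on the diagonal of L.

    Illustrative only: no pivoting, so a zero pivot raises an error.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Row k of U: subtract the contributions of the rows already computed.
        for j in range(k, n):
            U[k][j] = a[k][j] - sum(L[k][s] * U[s][j] for s in range(k))
        if U[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; pivoting would be needed")
        # Column k of L: same update, then divide by the pivot U[k][k].
        for i in range(k + 1, n):
            L[i][k] = (a[i][k] - sum(L[i][s] * U[s][k] for s in range(k))) / U[k][k]
    return L, U


def solve(a, b):
    """Solve a*x = b by forward substitution (L*y = b) then back substitution (U*x = y)."""
    L, U = doolittle_lu(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][s] * y[s] for s in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][s] * x[s] for s in range(i + 1, n))) / U[i][i]
    return x
```

For example, `solve([[4.0, 3.0], [6.0, 3.0]], [10.0, 12.0])` solves the system 4x + 3y = 10, 6x + 3y = 12. How the strips are distributed across MPI processes is the paper's contribution and is not reproduced here.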

2.

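A digital watermarking algorithm based on matrix LU decomposition  Total citations: 4; self-citations: 0; citations by others: 4

Niu Shaozhang, Niu Xinxin, Yang Yixian 《电子与信息学报》 (Journal of Electronics & Information Technology) 2004,26(10):1620-1625

This paper proposes a new digital watermarking algorithm based on matrix LU decomposition. The method first converts the non-negative matrix representation of a digital image into a G-diagonally dominant matrix, then performs LU decomposition and embeds the watermark by means of a quantization function; the original image is not needed to recover the watermark. The LU-decomposition watermarking algorithm was compared experimentally with the DCT mid-frequency coefficient comparison method. The results show that the method is fast and highly robust.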

3.

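Research on a hardware parallel structure for matrix triangularization decomposition based on a reconfigurable computing system

Liu Shuyong, Wu Yanxia, Zhang Bowei, Zhang Guoyin, Dai Kui 《电子学报》 (Acta Electronica Sinica) 2015,43(8):1642-1650

Reconfigurable computing systems have become one of the important options for accelerating compute-intensive applications. Among the many compute-intensive problems that have attracted attention, matrix triangularization decomposition, a typical foundational application, has always been at the core of research and is of significant value in scientific and engineering problems such as solving systems of linear equations and computing matrix eigenvalues. Targeting the triangularization computation shared by matrix triangularization decompositions, and by analyzing the linear computational regularity of that process, this paper proposes a unified submatrix-update algorithm suited to parallel hardware implementation, together with an FPGA (Field Programmable Gate Array) parallel structure for matrix triangularization. High-performance implementation and optimization of LU triangularization decomposition on this parallel structure template are then studied. Theoretical analysis shows that the algorithm offers higher data parallelism and pipeline parallelism for the matrix triangularization computation; experimental results show that, compared with a software implementation on a general-purpose processor, the FPGA parallel structure built on the algorithm achieves a speedup of more than 10x in key computational performance.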

4.

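An optimized implementation of the N-body problem based on a bulk-synchronous parallel programming model

Zhu Yongzhi, Wang Xiyan 《电子技术》 (Electronic Technology) 2015,(2)

Based on multi-core cluster systems, this article studies parallel programming models in depth and designs a hybrid OpenMP/MPI programming model for multi-level parallel architectures. With an SMP cluster as the background, it realizes layering between nodes and within nodes and uses the multi-level parallel programming model for experiments and analysis. It also studies the performance of the multi-level model in depth and proposes a new bulk-synchronous hybrid design. A bulk-synchronous optimized parallel algorithm for the N-body problem is designed, and its performance is compared with that of a traditional parallel algorithm on the Dawning TC 5000A cluster. Through theoretical study combined with extensive experimental analysis and statistics, a number of conclusions are drawn on optimizing the performance of hybrid parallel programming models on multi-core clusters.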

5.

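Optimization of matrix multiplication on the OpenSPARC T2 multi-core processor

Xie Linchuan, Liu Jie 《数字技术与应用》 (Digital Technology & Application) 2012,(5):226-228

Matrix multiplication is the core computational kernel of many applications. The matrix multiplication algorithm is designed and optimized for the OpenSPARC T2 processor platform. Taking into account the memory-access pattern of matrix multiplication and the processor's 8 cores and 64 threads, a multithreaded parallel matrix multiplication algorithm is designed on the OpenMP parallel programming model, and memory access and block size are tuned. The implementation is written in C. Compared with single-core, single-thread execution, the parallel algorithm on 8 cores and 64 threads reaches a reported speedup of 21.9% and attains 53.9% of peak performance.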
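The block-size tuning mentioned in the abstract above refers to loop tiling. The following is a minimal single-threaded sketch of tiled matrix multiplication, assuming plain Python lists; the function name `matmul_blocked` and the default `block=2` are hypothetical, and the paper's OpenMP threading and C implementation are not reproduced.

```python
def matmul_blocked(a, b, block=2):
    """Multiply a (n x m) by b (m x p) one block x block tile at a time.

    Tiling keeps a small working set of a, b, and c in cache while it is reused.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):          # tile over rows of a
        for k0 in range(0, m, block):      # tile over the shared dimension
            for j0 in range(0, p, block):  # tile over columns of b
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

In a threaded version, the outer loop over `i0` is the natural one to split across threads, because each row tile of `c` is written by exactly one thread.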

6.

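Research on encoding methods for LDPC codes in the CMMB standard

Liu Yanhua, Lei Jing, Wen Lei 《通信技术》 (Communications Technology) 2010,43(10):8-10,16

The structure of the parity-check matrix of the low-density parity-check (LDPC) codes in the China Mobile Multimedia Broadcasting (CMMB) standard is analyzed, and a feasible encoding method is established for that structure, namely LU decomposition based on the theory of R. M. Neal. Three practical triangular-matrix (LU) algorithms are summarized and verified and, after comparison, the best of the proposed schemes is selected for encoding. Simulation results show that the selected method encodes correctly, is advanced, and has good practical value for hardware implementation. The three LU algorithms listed are also applicable to other matrices that satisfy the conditions.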

7.

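Research on parallel algorithms for single-fault-tolerant network storage coding

Guo Jianqi 《电子科技》 (Electronic Science and Technology) 2014,27(7):141-143

In mass storage systems, redundant data coding is an effective way to improve storage reliability. This paper studies algorithms for network storage coding and, for the encoding operation of single-fault-tolerant network disk arrays, focuses on parallel algorithms for RAID5 encoding. To address the running-time shortcomings of the centralized parity encoding algorithm, it proposes fusing the encoding computation with group communication operations and designs a more efficient parallel encoding algorithm based on a many-to-one reduction operation. Experimental results show that the group-communication-based algorithm has a clear performance advantage over the centralized algorithm.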
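RAID5 parity is the bytewise XOR of the data blocks in a stripe. Because XOR is associative and commutative, it can be computed as a many-to-one reduction, which is the property the abstract above relies on. The sketch below shows the reduction in a single process with `functools.reduce` as a stand-in for a group-communication reduce; the function names are hypothetical and the paper's distributed algorithm is not reproduced.

```python
from functools import reduce


def xor_blocks(x, y):
    """Bytewise XOR of two equal-length blocks (the reduction operator)."""
    return bytes(p ^ q for p, q in zip(x, y))


def parity(blocks):
    """Parity block of a stripe: XOR-reduce all data blocks into one."""
    return reduce(xor_blocks, blocks)


def recover(surviving_blocks, parity_block):
    """Rebuild the single missing block from the survivors and the parity."""
    return reduce(xor_blocks, surviving_blocks, parity_block)
```

Since the order of combination does not matter, partial XORs can be formed on different nodes and merged in a tree, rather than shipping every block to one node first.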

8.

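Parallel method of moments on 6×10^5 cores based on a domestic many-core supercomputer

Gu Zongjing, Wu Haoxiang, Zhao Xunwang, Lin Zhongchao, Zhang Yu, Zhang Qi 《电子与信息学报》 (Journal of Electronics & Information Technology) 2019,41(4):845-850

To make electromagnetic computation secure, reliable, and independently controllable, this paper develops a large-scale parallel method of moments (MoM) on the domestic many-core supercomputer "Tianhe-2". To relieve the communication pressure on the cluster during large-scale parallel computation and to accelerate the solution of the MoM integral equation, the paper shows that the matrix produced by discretizing the MoM electric field integral equation is diagonally dominant and, on that basis, proposes a new LU decomposition algorithm, the diagonal-block pivoting LU decomposition (BDPLU) algorithm. The algorithm reduces the computation in the panel column factorization and, more importantly, completely eliminates the MPI communication overhead of pivoting. With BDPLU, the parallel MoM scales beyond 6×10^5 CPU cores, the largest parallel MoM computation achieved so far on a domestic supercomputing platform, with a parallel efficiency of up to 51.95% for the matrix solution. Numerical results show that the parallel MoM can solve large-scale electromagnetic problems accurately and efficiently on domestic supercomputing platforms.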

9.

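Performance analysis and optimization of the method of moments on supercomputers

Chen Yan, Lin Zhongchao, Zhang Yu 《微波学报》 (Journal of Microwaves) 2017,33(3):1-5

Accurate analysis of the electromagnetic characteristics of complex targets often requires enormous storage and very long computation times. To address this, and drawing on China's rapidly developing supercomputer systems, the paper studies the higher-order method of moments, a high-performance electromagnetic algorithm capable of accurate and efficient simulation. An element pre-selection method is proposed to eliminate wasted computation during parallel matrix filling and so accelerate the filling process. A new parallel LU decomposition algorithm with fewer communication operations and a lower communication volume is proposed to accelerate the solution of the matrix equation. Numerical tests show that both the parallel matrix-filling algorithm and the parallel matrix-equation solver achieve high parallel performance on supercomputer platforms, greatly improving the simulation capability of the method of moments.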

10.

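An implementation method for a parallel CRC algorithm  Total citations: 2; self-citations: 1; citations by others: 1

Chen Yuquan 《现代电子技术》 (Modern Electronics Technique) 2005,28(22):21-23,26

The basic principle of the CRC algorithm is briefly analyzed. Building on the traditional serial CRC implementation, a fast parallel CRC algorithm is introduced, and the logic relations for a 32-bit parallel CRC code are obtained through a simple derivation. Compared with the table-lookup method, the parallel algorithm does not need to store a large remainder table and can reduce latency. The same parallel approach also suits parallel CRC codes of other bit widths. Finally, the design is carried out with the ISE development platform and the Verilog HDL hardware description language, implementing a 32-bit parallel CRC-32 encoder based on the algorithm, and simulation and synthesis results are given. The resulting CRC encoder has been successfully applied in an Ethernet access system.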
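As a point of reference for the serial CRC that the abstract above starts from, here is a minimal bit-at-a-time CRC-32 sketch using the reflected Ethernet polynomial 0xEDB88320. It is a software reference model only; the paper's 32-bit parallel XOR equations and Verilog design are not reproduced, and the function name `crc32_bitwise` is hypothetical.

```python
POLY = 0xEDB88320  # CRC-32 (Ethernet) generator polynomial, bit-reflected form


def crc32_bitwise(data: bytes) -> int:
    """Serial CRC-32: shift one bit per step, no lookup table."""
    crc = 0xFFFFFFFF                      # standard initial value
    for byte in data:
        crc ^= byte                       # fold the next byte into the low bits
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, subtract (XOR) the polynomial.
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF               # standard final inversion
```

The result can be cross-checked against `zlib.crc32` from the Python standard library. A parallel hardware design unrolls the inner shift loop into fixed XOR equations so that many input bits are absorbed per clock cycle.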

11.

Combining replication and checkpointing redundancies for reducing resiliency overhead

Hassan Motallebi 《ETRI Journal》2020,42(3):388-398

We herein propose a heuristic redundancy selection algorithm that combines resubmission, replication, and checkpointing redundancies to reduce the resiliency overhead in fault-tolerant workflow scheduling. The appropriate combination of these redundancies for workflow tasks is obtained in two consecutive phases. First, to compute the replication vector (number of task replicas), we apportion the set of provisioned resources among concurrently executing tasks according to their needs. Subsequently, we obtain the optimal checkpointing interval for each task as a function of the number of replicas and characteristics of tasks and computational environment. We formulate the problem of obtaining the optimal checkpointing interval for replicated tasks in situations where checkpoint files can be exchanged among computational resources. The results of our simulation experiments, on both randomly generated workflow graphs and real-world applications, demonstrated that both the proposed replication vector computation algorithm and the proposed checkpointing scheme reduced the resiliency overhead.

12.

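Improving the performance of checkpointing schemes that use task duplication  Total citations: 4; self-citations: 0; citations by others: 4

Li Kaiyuan, Yang Xiaozong 《电子学报》 (Acta Electronica Sinica) 2000,28(5):33-36

Checkpointing is a common technique for reducing program execution time in the presence of faults. Combining checkpointing with task duplication makes possible not only effective fault recovery but also thorough fault detection. The overhead of such a system comes mainly from two sources: the cost of comparing and saving each checkpoint, and the rollbacks caused by faults. This paper uses incremental checkpointing to improve the method proposed by Ziv and Bruck; the improved method not only effectively reduces the cost of comparing and saving checkpoints but also avoids rollbacks caused by latent faults. Analysis shows that the improved method performs better than the method of Ziv and Bruck.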

13.

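Pipelined compressed checkpointing for heterogeneous parallel computing systems

Liu Yongpeng, Wang Feng, Lu Kai, Liu Yongyan 《电子学报》 (Acta Electronica Sinica) 2012,40(2):223-229

In large-scale parallel computing systems, parallel checkpointing triggers a large number of nodes to save their computation state at the same time, which incurs a huge file storage overhead and puts heavy access pressure on the communication and storage systems. Data compression can shrink checkpoint files and so reduce both the storage overhead and the access pressure on the communication and storage systems, but it adds compression computation overhead. For heterogeneous parallel computing systems, this paper proposes a pipelined parallel compressed checkpointing technique that uses a series of optimizations to reduce the latency introduced by compression, including pipelined double write-buffer queues, merging of file write operations, a GPU-accelerated pipelined compression algorithm, and multi-process scheduling of GPU resources. The paper describes the implementation of the technique on the Tianhe-1 system and gives a comprehensive evaluation of the resulting checkpointing system. Experimental data show that the method is feasible, efficient, and practical in large-scale heterogeneous parallel computing systems.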

14.

An optimal checkpointing-strategy for real-time control systems under transient faults

Seong Woo Kwak Byung Jae Choi Byung Kook Kim 《Reliability, IEEE Transactions on》2001,50(3):293-301

Real-time computer systems are often used in harsh environments, such as aerospace, and in industry. Such systems are subject to many transient faults while in operation. Checkpointing enables a reduction in the recovery time from a transient fault by saving intermediate states of a task in a reliable storage facility, and then, on detection of a fault, restoring from a previously stored state. The interval between checkpoints affects the execution time of the task. Whereas inserting more checkpoints and reducing the interval between them reduces the reprocessing time after faults, checkpoints have associated execution costs, and inserting extra checkpoints increases the overall task execution time. Thus, a trade-off between the reprocessing time and the checkpointing overhead leads to an optimal checkpoint placement strategy that optimizes certain performance measures. Real-time control systems are characterized by a timely, and correct, execution of iterative tasks within deadlines. The reliability is the probability that a system functions according to its specification over a period of time. This paper reports on the reliability of a checkpointed real-time control system, where any errors are detected at the checkpointing time. The reliability is used as a performance measure to find the optimal checkpointing strategy. For a single-task control system, the reliability equation over a mission time is derived using the Markov model. Detecting errors at the checkpointing time makes reliability jitter with the number of checkpoints. This forces the need to apply other search algorithms to find the optimal number of checkpoints. By considering the properties of the reliability jittering, a simple algorithm is provided to find the optimal checkpoints effectively. Finally, the reliability model is extended to include multiple tasks by a task allocation algorithm.
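The trade-off described in the abstract above (more checkpoints mean less reprocessing but more checkpointing cost) can be illustrated with Young's classical first-order approximation. This is a different and much simpler model than the paper's Markov reliability analysis, and it is swapped in here only as a sketch. It assumes a fixed cost per checkpoint, failures with a known mean time between failures, and an average loss of half an interval per failure; the function names are hypothetical.

```python
import math


def overhead_fraction(interval, checkpoint_cost, mtbf):
    """First-order overhead per unit of useful work.

    checkpoint_cost / interval : time spent writing checkpoints
    interval / (2 * mtbf)      : expected rework, half an interval per failure
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def young_interval(checkpoint_cost, mtbf):
    """Interval minimizing overhead_fraction: sqrt(2 * cost * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)
```

With a checkpoint cost of 2 time units and a mean time between failures of 100, the minimizing interval is 20; both shorter and longer intervals give a higher overhead under this model.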

15.

Clustered checkpointing: Maximizing the level of confidence for non-equidistant checkpointing

《Integration, the VLSI Journal》2017

Employing fault tolerance often introduces a time overhead, which may cause a deadline violation in real-time systems (RTS). Therefore, for RTS it is important to optimize the fault tolerance techniques such that the probability to meet the deadlines, i.e. the Level of Confidence (LoC), is maximized. Previous studies have focused on evaluating the LoC for equidistant checkpointing. However, no studies have addressed the problem of evaluating the LoC for non-equidistant checkpointing. In this work, we provide an expression to evaluate the LoC for non-equidistant checkpointing. Further, we detail an exhaustive search approach to find the distribution of a given number of checkpoints that results in the maximal LoC. Since the exhaustive search approach is very time-consuming, we propose the Clustered Checkpointing method, a heuristic that distributes checkpoints in a number of clusters with the goal to maximize the LoC. The results show that the LoC can be improved when non-equidistant checkpointing is used. Further, the results indicate that the proposed Clustered Checkpointing method is capable of finding the distribution that results in the maximal LoC in much shorter time than the exhaustive search approach, while considering only a few clusters.

16.

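WOB: a new file checkpointing strategy  Total citations: 6; self-citations: 1; citations by others: 5

Pei Dan, WANG Dong-sheng, Shen Meiming, Zheng Weimin 《电子学报》 (Acta Electronica Sinica) 2000,28(5):9-12

The basis of fault tolerance in distributed/parallel systems is single-process checkpointing and rollback recovery, and saving and restoring the state of a process's active files is an important aspect of that technique. The delayed-write strategy proposed in this paper implements checkpointing of user files and effectively resolves the inconsistency between user file contents and the global process state when a fault occurs. It is transparent to the user and, through optimized sizing of the memory buffer, latency hiding, and other measures, outperforms other methods on metrics such as space overhead, normal running time, and recovery time.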

17.

Parallel sequence fault simulation for synchronous sequential circuits

Chen-Pin Kung Chen-Shang Lin 《Journal of Electronic Testing》1996,9(3):267-277

A novel parallel sequence fault simulation (PSF) algorithm for synchronous sequential circuits is presented. The algorithm successfully extends the parallel pattern method for combinational circuits to sequential circuits by proposing a multiple-pass mechanism to overcome the state dependency in sequential circuits. The fault simulation is performed in parallel by partitioning the entire sequence into subsequences of equal length. Furthermore, techniques are developed to minimize the number of simulation passes. Notably, two compact counters, C_x and C_d, are proposed to facilitate the early stabilization detection of faulty circuit simulation with minimum space overhead. The experimental results on the benchmark circuits show that the speedup ratio over a serial sequence fault simulator based on ROOFS is 9.16 on average for pseudo random vectors. The parallel sequence algorithm of PSF is especially adaptable to parallel and distributed simulation which exploits sequence partition.

18.

A fault tolerant implementation of the Goertzel algorithm

Z. Gao P. Reviriego X. Li J.A. Maestro M. Zhao J. Wang 《Microelectronics Reliability》2014

The Goertzel algorithm is commonly used to compute single points of the Discrete Fourier Transform as it reduces the computational complexity. In this research note, a fault tolerant implementation of this algorithm is presented. The new scheme provides effective protection against single errors with a lower overhead than traditional techniques. Therefore its use can be interesting in systems that implement the Goertzel algorithm.
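For context on the unprotected algorithm that the abstract above builds on, the following is a minimal sketch of the standard Goertzel recurrence for a single DFT bin. It assumes real-valued input samples and an integer bin index; the function name `goertzel` is hypothetical, and the paper's fault-tolerance scheme is not reproduced.

```python
import cmath
import math


def goertzel(samples, k):
    """Return DFT bin X[k] of a real sequence using the Goertzel recurrence.

    Recurrence: s[n] = x[n] + 2*cos(w)*s[n-1] - s[n-2], with w = 2*pi*k/N.
    After the last sample: X[k] = exp(j*w)*s[N-1] - s[N-2].
    """
    n = len(samples)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2   # one real multiply per sample
        s_prev2, s_prev = s_prev, s
    return cmath.exp(1j * w) * s_prev - s_prev2
```

The loop uses only real arithmetic and two state variables, which is why the method is attractive when just a few frequency bins are needed.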

19.

Optimization of test parallelism with limited hardware overhead

Sheng Feng Yashwant K. Malaiya 《Microelectronics Reliability》1991,31(2-3)

The main considerations for built-in self-test (BIST) for complex circuits are fault coverage, test time, and hardware overhead. In the BIST technique, exhaustive or pseudo-exhaustive testing is used to test the combinational logic in a register sandwich. If register sandwiches can be identified in a complex digital system, it is possible to test several of them in parallel using the built-in logic block observation (BILBO) technique. Concurrent built-in logic block observation (CBILBO) technique can further improve the test time, but it requires significant hardware overhead. A systematic scheduling technique is suggested to optimize parallel tests of register sandwiches. Techniques are proposed to deal with shared registers for parallel testing. The proposed method attempts to reduce further the test time while only modestly increasing the hardware overhead.

20.

An efficient optimistic message logging scheme for recoverable mobile computing systems

Taesoon Park Namyoon Woo Yeom H.Y. 《Mobile Computing, IEEE Transactions on》2002,1(4):265-277

A number of checkpointing and message logging algorithms have been proposed to support fault tolerance of mobile computing systems. However, little attention has been paid to the optimistic message logging scheme. Optimistic logging has a lower failure-free operation cost compared to other logging schemes. It also has a lower failure recovery cost compared to the checkpointing schemes. This paper presents an efficient scheme to implement optimistic logging for the mobile computing environment. In the proposed scheme, the task of logging is assigned to the mobile support station so that volatile logging can be utilized. In addition, to reduce the message overhead, the mobile support station takes care of dependency tracking and the potential dependency between mobile hosts is inferred from the dependency between mobile support stations. The performance of the proposed scheme is evaluated by an extensive simulation study. The results show that the proposed scheme requires a small failure-free overhead and the cost of unnecessary rollback caused by the imprecise dependency is adjustable by properly selecting the logging frequency.