1.

Efficient Parallel Algorithms for Some Graph Theory Problems

Ma Jun Ma Shaohan 《计算机科学技术学报》1993,8(4):76-80

In this paper, a sequential algorithm computing the all-vertex-pair distance matrix D and the path matrix P is given. On a PRAM EREW model with p processors, 1 ≤ p ≤ n^2, a parallel version of the sequential algorithm is shown. This method can also be used to obtain a parallel algorithm for computing the transitive closure array A^* of an undirected graph. The time complexity of the parallel algorithm is O(n^3/p). If D, P and A^* are known, it is shown that the problems of finding all connected components, computing the diameter of an undirected graph, determining the center of a directed graph, and searching for a directed cycle with the minimum (maximum) length in a directed graph can all be solved in O(n^2/p + log p) time.
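The abstract does not say which sequential algorithm the authors use. As a rough illustration of what a distance matrix D and a path matrix P look like, here is a minimal sketch using the classical Floyd-Warshall algorithm, which is an assumption swapped in for the paper's method. The function names `all_pairs_shortest_paths` and `diameter` are hypothetical.

```python
import math


def all_pairs_shortest_paths(weights):
    """Floyd-Warshall (assumed stand-in, not the paper's algorithm).

    weights[i][j] is the weight of edge i -> j, or math.inf if there is none.
    Returns (D, P): D[i][j] is the shortest distance from i to j, and P[i][j]
    is the vertex that follows i on a shortest i -> j path (None if j is
    unreachable from i).
    """
    n = len(weights)
    D = [row[:] for row in weights]
    P = [[j if weights[i][j] != math.inf else None for j in range(n)]
         for i in range(n)]
    for i in range(n):
        D[i][i] = 0
        P[i][i] = i
    for k in range(n):
        # For a fixed k, the n^2 updates below do not depend on each other.
        for i in range(n):
            for j in range(n):
                through_k = D[i][k] + D[k][j]
                if through_k < D[i][j]:
                    D[i][j] = through_k
                    P[i][j] = P[i][k]
    return D, P


def diameter(D):
    """Largest finite shortest-path distance, one of the follow-up queries."""
    return max(d for row in D for d in row if d != math.inf)


if __name__ == "__main__":
    inf = math.inf
    D, P = all_pairs_shortest_paths([[0, 3, inf], [inf, 0, 1], [2, inf, 0]])
    print(D, diameter(D))
```

Once D is available, a query such as the diameter is a reduction over n^2 entries, which is the kind of step that splits naturally across p processors.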

2.

K-Dimensional Optimal Parallel Algorithm for the Solution of a General Class of Recurrence Equations

Gao Qingshi Liu Zhiyong 《计算机科学技术学报》1995,10(5):417-424

This paper proposes a parallel algorithm, called KDOP (K-Dimensional Optimal Parallel algorithm), to solve a general class of recurrence equations efficiently. The KDOP algorithm partitions the computation into a series of subcomputations, each of which is executed with all the processors working simultaneously, each one running an optimal sequential algorithm on a subcomputation task. The algorithm solves the equations in O(N/p) steps on the EREW PRAM model (Exclusive Read Exclusive Write Parallel Random Access Machine) using p ≤ N^(1-ε) processors, where N is the size of the problem and ε is a given constant. This is an optimal algorithm (its speedup is O(p)) in the case of p ≤ N^(1-ε). Such an optimal speedup for this problem was previously achieved only in the case of p ≤ N^0.5. The algorithm can be implemented on machines with multiple processing elements or on pipelined vector machines with parallel memory systems.

3.

Reinventing Memory System Design for Many-Accelerator Architecture

王颖 张磊 韩银和 李华伟 《计算机科学技术学报》2014,29(2):273-280

The many-accelerator architecture, mostly composed of general-purpose cores and accelerator-like function units (FUs), has become an attractive alternative to homogeneous chip multiprocessors (CMPs) because of its superior power efficiency. However, the emerging many-accelerator processor shows a much more complicated memory access pattern than general-purpose processors (GPPs), because the abundant on-chip FUs tend to generate highly concurrent memory streams with distinct locality and bandwidth demands. The disordered memory streams issued by diverse accelerators interfere with one another and cannot be handled efficiently by the orthodox main memory interface, which provides an inflexible data fetching mode. Unlike traditional DRAM memory, our proposed Aggregation Memory System (AMS) can adapt to the characterized memory streams from different FUs, because it provides the FUs with different data fetching sizes and protects their locality in memory access by intelligently interleaving their data across memory devices through sub-rank binding. Moreover, AMS can batch requests that have no sub-rank conflict into a read burst with our optimized memory scheduling policy. Experimental results from trace-based simulation show that AMS brings both a conspicuous performance boost and energy savings.

4.

Pseudo-snapshot Based on Forward-backward Cyclic Correlation Function for Narrow-band Signals

Xuebin Liu Gang Wei 《通讯和计算机》2006,3(7):64-67

Many modulated communication signals exhibit cyclostationarity. The cyclic-MUSIC algorithm uses this property to improve detection capability and estimation accuracy over the conventional Multiple Signal Classification (MUSIC) algorithm in Direction of Arrival (DOA) estimation. Building on cyclic-MUSIC, the Spectral Correlation-Signal Subspace Fitting (SC-SSF) algorithm extends the range of application from narrow-band signals to wide-band signals by constructing a pseudo-data matrix that acts as a pseudo-snapshot. In this paper, a new kind of pseudo-snapshot is proposed for narrow-band signals, based on the forward-backward cyclic correlation function, from which a virtual array with twice the aperture of the true array can be obtained. Simulation results show that when conventional MUSIC is applied to the pseudo-snapshot, twice the number of DOAs can be detected, and higher resolution can be achieved than with cyclic-MUSIC and SC-SSF in the case of narrow-band signals.

5.

OpenMDSP: Extending OpenMP to Program Multi-Core DSPs

何江舟 陈文光 陈光日 郑纬民 汤志忠 叶寒栋 《计算机科学技术学报》2014,29(2):316-331

Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing technologies, among others. In comparison with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, such as on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach to programming multi-core DSPs is based on proprietary vendor software development kits (SDKs), which only provide low-level, non-portable primitives. While it is acceptable to write coarse-grained task-level parallel code with these SDKs, writing fine-grained data-parallel code with them is very tedious and error-prone. We believe that a high-level and portable parallel programming model for multi-core DSPs is desirable. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to fill the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP: 1) data placement directives that allow programmers to control the placement of global variables conveniently, 2) distributed array directives that divide a whole array into sections and promote the sections into core-local memory to improve performance, and 3) stream access directives that promote big arrays into core-local memory section by section during parallel loop processing, while hiding the latency of data movement behind the direct memory access (DMA) engine of a DSP. We implement the compiler and runtime system for OpenMDSP on the Freescale MSC8156. The benchmarking results show that seven of nine benchmarks achieve a speedup of more than a factor of 5 when using six threads.

6.

A new parallel meshing technique integrated into the conformal FDTD method for solving complex electromagnetic problems

Yang GUO Xiang-hua WANG Jun HU 《浙江大学学报:C卷英文版》2014,(12):1087-1097

A new efficient parallel finite-difference time-domain (FDTD) meshing algorithm, based on the ray tracing technique, is proposed in this paper. This algorithm can be applied to construct various FDTD meshes, such as regular and conformal ones. The algorithm is coded in the Microsoft F# language, in which all variables are immutable, so that the language's advantages for parallelization are fully exploited. An improved conformal FDTD algorithm, integrated with an improved surface current algorithm, is presented to simulate some complex 3D models, such as a sphere made of eight different materials, a tank, a J-10 aircraft, and an aircraft carrier with 20 aircraft. Both the efficiency and the capability of the developed parallel FDTD algorithm are validated. The algorithm is applied to characterize the induced surface current distribution on an aircraft or a warship.

7.

A high speed multi-level-parallel array processor for vision chips

SHI Cong YANG Jie WU NanJian WANG ZhiHua 《中国科学:信息科学(英文版)》2014,(6):207-218

This paper proposes a high-speed multi-level-parallel array processor for programmable vision chips. The processor includes a 2-D pixel-parallel processing element (PE) array and a 1-D row-parallel row processor (RP) array. Both arrays operate in a single-instruction multiple-data (SIMD) fashion and share a common instruction decoder. The sizes of the arrays are scalable according to the target application. In the PE array, each PE can communicate not only with its nearest-neighbor PEs but also with the next-nearest-neighbor PEs in the diagonal directions. This connection helps to speed up local operations in low-level image processing. Global operations in mid-level processing, on the other hand, are accelerated by the skipping chain and binary boosters in the RP array. The array processor was implemented on an FPGA device and was successfully tested with various algorithms, including real-time face detection based on the PPED algorithm. The results show that the image processing speed of the proposed processor is much higher than that of state-of-the-art digital vision chips.

8.

Parallel Reservoir Integrated Simulation Platform for One Million Grid Blocks Cases

Feng Pan Jianwen Cao 《通讯和计算机》2005,2(11):29-33,42

This paper first provides a brief introduction to numerical reservoir simulation and to a parallel numerical reservoir integrated simulation platform developed by RDCPS (Research & Development Center for Parallel Software, Institute of Software, Chinese Academy of Sciences). The platform includes pre-processing, a simulator (for three-dimensional, three-phase black-oil models) and post-processing, seamlessly integrated with parallel computers. We then present key technologies of the simulator, such as nonlinear and linear solvers, communication among processors, and parallel I/O. Finally, results of using the platform to solve cases with one million grid blocks from Chinese oil fields are given, showing that the simulator is portable, fast enough to meet deadlines, and scalable for the tested cases. As application software, its objective is always to satisfy the deadlines of the oil industry. For a case with one million grid blocks and 20 to 30 years of production, the elapsed time with 16 processors is now less than 12 hours on parallel computers based on Myrinet or QsNet, that is, 'submit a case just before going off duty and get the result just before coming on duty'. A curve of decreasing elapsed time is given for a case with one million grid blocks, from which the development of the simulator alongside parallel computers can also be inferred.

9.

Unsupervised image segmentation based on MRFs and graph cuts

LI Qiu-xu ZHAO Jie-yu 《通讯和计算机》2009,6(9):46-53

Markov random fields (MRFs) can be used for a wide variety of vision problems. In this paper we propose a Markov random field (MRF) image segmentation model. The theoretical framework is based on Bayesian estimation via energy optimization. Graph cuts have emerged as a powerful optimization technique for minimizing the energy functions that arise in low-level vision problems. The theorem of Ford and Fulkerson states that the min-cut and max-flow problems are equivalent, so the minimum s/t cut problem can be solved by finding a maximum flow from the source s to the sink t. We adopt a new min-cut/max-flow algorithm that belongs to the group of algorithms based on augmenting paths. We propose a parameter estimation method using the expectation maximization (EM) algorithm. We also choose a Gaussian mixture model as our image model and model the density associated with each image segment (or class) as a multivariate Gaussian distribution. Characteristic features related to color, texture and position are extracted for each pixel. Experimental results are provided to illustrate the performance of our method.
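The abstract does not name the specific augmenting-path algorithm it adopts. To illustrate the max-flow side of the Ford-Fulkerson equivalence, here is a minimal sketch using the classical Edmonds-Karp algorithm, which is an assumption and not the paper's method. The function name `max_flow` and the dict-of-dicts graph layout are hypothetical.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Edmonds-Karp (assumed stand-in): augment along shortest s -> t paths.

    capacity[u][v] is the capacity of edge u -> v. Returns the maximum flow
    value, which equals the capacity of a minimum s/t cut.
    """
    # Residual capacities, with zero-capacity reverse edges added.
    residual = {s: {}, t: {}}
    for u, neighbours in capacity.items():
        for v, c in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left: the flow is maximal

        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


if __name__ == "__main__":
    graph = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
    print(max_flow(graph, "s", "t"))
```

In a segmentation setting, s and t would stand for the two labels and the edge capacities would come from the MRF energy terms; that mapping is not shown here.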

10.

High performance word level sequential and parallel coding methods and architectures for bit plane coding

ChengYi Xiong JinWen Tian Jian Liu 《中国科学F辑(英文版)》2008,51(4):337-351

This paper introduces a novel high-performance algorithm and VLSI architectures for bit plane coding (BPC) in word-level sequential and parallel modes. The proposed BPC algorithm adopts coding pass prediction together with parallel and pipelined processing to reduce the number of memory accesses and to increase the concurrency of the system, so that all the coefficient bits of a code block can be coded in only one scan. A new parallel bit plane architecture (PA) is proposed to achieve word-level sequential coding. Moreover, an efficient high-speed architecture (HA) is presented to achieve multi-word parallel coding. Compared with the state of the art, the proposed PA reduces the hardware cost more efficiently while the throughput remains one coefficient coded per clock. The proposed HA can code 4 coefficients belonging to a stripe column in one intra-clock cycle, so that coding an N×N code block can be completed in approximately N^2/4 intra-clock cycles. Theoretical analysis and experimental results demonstrate that the proposed designs have a high throughput rate with good performance in terms of speedup to cost, which makes them good alternatives for low-power applications.

11.

Local adaptive segmentation algorithm for 3-D medical image based on robust feature statistics

ZHUO ZiHan ZHAI WeiMing LI Xin LIU LingLing TANG JinTian 《中国科学:信息科学(英文版)》2014,(10):154-165

Medical image segmentation is of pivotal importance in computer-aided clinical diagnosis. Many factors, including noise, the bias field effect, the local volume effect, and tissue movement, may affect the medical image and cause blurring or uneven characteristics when the picture is formed. Such quality defects inevitably reduce the gray-scale difference between adjacent tissues and lead to insufficient segmentation or even leakage during tissue or organ segmentation. In the present investigation, a local adaptive segmentation algorithm for 3-D medical images based on robust feature statistics (LARFS) is proposed. By combining the principles of traditional region growing (RG) and robust feature statistics (RFS), LARFS comprehensively analyzes the location and neighborhood image information of the input seed point. Results show that, for different segmentation objects, the LARFS algorithm adapts well to the regional geometric shape when the growing-factor input parameter is kept within a certain range. Because robust feature statistics are applied in the contour evolution process, the LARFS algorithm is not sensitive to noise and is not easily influenced by image contrast or object topology. Hence, leakage and excessive segmentation are reduced, the edges are smooth, and the accuracy can be kept within the effective error range.

12.

A multi-scale 3D Otsu thresholding algorithm for medical image segmentation

《Digital Signal Processing》2017

Thresholding is one of the most important techniques for image segmentation. In this paper, a novel thresholding algorithm based on 3D Otsu and multi-scale image representation is proposed for medical image segmentation. Considering the high time complexity of the 3D Otsu algorithm, an accelerated variant is devised using a dimension decomposition rule. To reduce the effects of noise and weak edges, multi-scale image representation is brought into the segmentation algorithm. The whole segmentation algorithm is designed as an iterative procedure. In each iteration, the image is segmented by the efficient 3D Otsu, and it is then filtered by a fast local Laplacian filter to get a smoothed image that is the input to the next iteration. Finally, the segmentation results are pooled into a final segmentation using majority voting. The attractive features of the algorithm are that its segmentation results are stable, it is robust to noise, and it holds for both bi-level and multi-level thresholding. Experiments on medical MR brain images are conducted to demonstrate the effectiveness of the proposed method. The experimental results indicate that the proposed algorithm is consistently superior to the other multilevel thresholding algorithms.
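As background for the Otsu criterion and the final voting step, here is a minimal sketch of the classical one-dimensional Otsu threshold and a pixel-wise majority vote. This is not the paper's 3D Otsu or its accelerated variant. The function names and the flat list-of-gray-levels image layout are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Classical 1-D Otsu: the gray level t maximizing between-class variance.

    Pixels with value <= t form the low class, the rest the high class.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # sum of gray levels in the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def majority_vote(segmentations):
    """Pixel-wise majority over several binary segmentations of equal length."""
    n = len(segmentations)
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*segmentations)]


if __name__ == "__main__":
    image = [10] * 50 + [200] * 50
    t = otsu_threshold(image)
    print(t, majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))
```

In the iterative scheme the abstract describes, each iteration would contribute one segmentation to the list passed to the voting step.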

13.

A Segmentation-Based Surface Reconstruction Algorithm for 3D Medical Images

何晖光 田捷 赵明昌 杨骅 《软件学报》2002,13(2):219-226

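A segmentation-based surface reconstruction algorithm for 3D medical images is proposed. It tightly combines image segmentation with the MC (marching cubes) algorithm, so that a suitable segmentation method can be chosen according to the characteristics of different medical images to segment different tissues accurately. The segmentation result is then used to extract the isosurface precisely, which avoids the limitation that MC is only suited to threshold segmentation. In addition, a cube detection method based on region growing is adopted to improve the efficiency of surface tracking. Experiments show that the algorithm improves both reconstruction speed and display quality.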

14.

Segmentation and classification of brain images using firefly and hybrid kernel-based support vector machine

K. Selva Bhuvaneswari P. Geetha 《人工智能实验与理论杂志》2013,25(3):663-678

Magnetic resonance imaging segmentation refers to the process of assigning labels to sets of pixels or multiple regions. It plays a major role in biomedical applications, as it is widely used by radiologists to segment input medical images into meaningful regions. In recent years, various brain tumour detection techniques have been presented in the literature. The segmentation process of our proposed work comprises three phases: threshold generation with dynamic modified region growing, texture feature generation, and region merging. In the first phase, the input image is processed by a dynamic modified region growing procedure in which two thresholds are changed dynamically, with the firefly optimisation algorithm used to optimise them. After the region-grown segmented image is obtained, the edges are detected with an edge detection algorithm. In the second phase, texture features are extracted from the input image using an entropy-based operation. In the region merging phase, the results of the texture feature generation phase are combined with the results of the dynamic modified region growing phase, and similar regions are merged using a distance comparison between regions. After the abnormal tissues are identified, classification is done by a hybrid kernel-based SVM (support vector machine). The performance of the proposed method will be analysed by K-fold cross-validation. The proposed method will be implemented in MATLAB and applied to various images.

15.

An Improved Fingerprint Image Segmentation Algorithm

苏永利 张博 张书玲 《计算机工程与应用》2008,44(30):173-174

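An improved version of the variance method, a traditional fingerprint image segmentation algorithm, is proposed. The algorithm gives a new way of computing the mean and variance of image blocks and uses the ratio of mean to variance as the criterion for separating foreground from background regions. It effectively suppresses the influence of noise on segmentation and also achieves satisfactory segmentation of low-contrast images. Experimental results show that the method segments more accurately than the traditional variance method and the direction method.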

16.

Fuzzy Segmentation of Remote Sensing Images Using Inclusion Degree and Membership Degree

赵泉华 刘冬 李晓丽 李玉 《中国图象图形学报》2017,22(7):988-995

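Objective: The traditional FCM algorithm and its improved variants all use only the membership degree as the criterion for image segmentation. During segmentation, however, the cluster centers are easily affected by geometric noise within homogeneous regions, so such algorithms have difficulty segmenting images with geometric noise effectively. To solve this kind of problem, a fuzzy segmentation algorithm for remote sensing images using inclusion degree and membership degree is proposed. Method: The algorithm assumes that each cluster includes every pixel to a different degree, uses the inclusion degree as a new measure of the relationship between clusters and pixels, and incorporates it into the objective function. The optimal membership and inclusion degrees are obtained by iteratively minimizing the objective function, and remote sensing images with geometric noise are then segmented by defuzzifying the product of membership degree and inclusion degree. Results: Segmentation experiments were carried out on simulated images and real remote sensing images, with comparison against the FCM and FLICM algorithms. Qualitative results show that, for regions containing geometric noise, the user's accuracy and producer's accuracy of the proposed algorithm are higher than those of FCM and FLICM, and its overall accuracy and Kappa value are also higher than those of the compared algorithms. The experimental results show that the proposed algorithm resists the influence of geometric noise on image segmentation, and its segmentation accuracy is far higher than that of the other two algorithms. Conclusion: By considering how clusters include pixels, the proposed algorithm effectively resists the influence of geometric noise on image segmentation, which gives it strong robustness to geometric noise and improves its segmentation accuracy on images containing such noise. The algorithm is suitable for high-resolution remote sensing images containing geometric noise.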
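For readers unfamiliar with the FCM baseline, here is a minimal sketch of one iteration of standard fuzzy C-means on scalar gray values. It covers only the membership-degree part. The inclusion degree introduced by the paper is not modelled, because the abstract does not give its formula. The function names and the fuzzifier default m = 2 are assumptions.

```python
def fcm_memberships(values, centers, m=2.0):
    """Standard FCM membership update: u[i][k] for value i and cluster k."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in values:
        dists = [abs(x - c) for c in centers]
        if min(dists) == 0.0:
            # The value coincides with a center: share full membership
            # among the centers it coincides with.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
        else:
            inverse = [d ** -exponent for d in dists]
            total = sum(inverse)
            memberships.append([v / total for v in inverse])
    return memberships


def fcm_centers(values, memberships, m=2.0):
    """Standard FCM center update: means weighted by u ** m."""
    centers = []
    for k in range(len(memberships[0])):
        weights = [row[k] ** m for row in memberships]
        weighted_sum = sum(w * x for w, x in zip(weights, values))
        centers.append(weighted_sum / sum(weights))
    return centers


def defuzzify(memberships):
    """Assign each value to the cluster with the largest membership."""
    return [row.index(max(row)) for row in memberships]


if __name__ == "__main__":
    values = [0.0, 1.0, 9.0, 10.0]
    u = fcm_memberships(values, [0.5, 9.5])
    print(defuzzify(u), fcm_centers(values, u))
```

The method in the abstract differs at the last step: it defuzzifies the product of membership degree and inclusion degree, not the membership degree alone.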

17.

A Color Image Segmentation Algorithm Based on Sub-Block Region Growing

金军《计算机工程与应用》2008,44(1):82-83

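A region growing algorithm based on image sub-blocks is proposed and applied to color image segmentation. The image is first divided into non-overlapping sub-blocks. Using the color and texture features of each sub-block extracted from the CIE L*a*b* color space, color clustering is first performed within sub-blocks to classify them, and region growing over the classified sub-blocks is then carried out according to a growing criterion to segment natural color images. Experimental results demonstrate the effectiveness of the algorithm, and the segmentation results agree with human subjective perception.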

18.

Robust segmentation of tubular structures in 3-D medical images by parametric object detection and tracking

Behrens T. Rohr K. Stiehl H.S. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2003,33(4):554-561

We present a novel approach to the coarse segmentation of tubular structures in three-dimensional (3-D) image data. Our algorithm, which requires only a few initial values and minimal user interaction, can be used to initialize complex deformable models and is based on an extension of the randomized Hough transform (RHT), a robust method for low-dimensional parametric object detection. Tubular structures are modeled as generalized cylinders. By means of a discrete Kalman filter, they are tracked through 3-D space. Our extensions to the RHT are a feature-adaptive selection of the sample size, expectation-dependent weighting of the input data, and a novel 3-D parameterization for straight elliptical cylinders. Experimental results obtained for 3-D synthetic as well as 3-D medical images demonstrate the robustness of our approach with respect to image noise. We present the successful segmentation of tubular anatomical structures such as the aortic arch and the spinal cord.

19.

Unsupervised Texture Segmentation Based on Fractal and Gray-Level Features

单雅静 马莉 《计算机工程与应用》2008,44(9):190-192

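A new texture image segmentation method based on directional fractal features and gray-level features is proposed. The method first uses a local window to extract the fractal dimension and fractal intercept in different directions from the power spectrum image, combines their respective means and variances with the gray-level mean and gray-level variance to form a multi-dimensional feature vector, and then clusters the vectors with the fuzzy C-means algorithm to segment the texture image. Experimental results show that the method segments both fabric texture images and medical images well and is highly robust.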

20.

Prostate MRI Image Segmentation Using a Multi-Scale Discriminator Conditional Generative Adversarial Network

何俊 吴从中 丁正龙 许良凤 詹曙 《中国图象图形学报》2019,24(9):1581-1587

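Objective: Images obtained by MRI (magnetic resonance imaging) offer high resolution and good soft-tissue contrast, allowing physicians to obtain the information they need more precisely, and accurate prostate MRI segmentation is a necessary preprocessing stage for computer-aided detection and diagnosis algorithms. Clinical practice therefore needs an automatic or semi-automatic prostate segmentation algorithm that provides robust, high-quality results for a wide range of clinical applications. A multi-scale discriminator conditional generative adversarial network is proposed to segment prostate MRI images automatically and meet the needs of clinical practice. Method: The proposed segmentation method is based on a conditional generative adversarial network and consists of a generator and a discriminator. The generator is a U-Net-like convolutional neural network that produces a mask of the prostate region from the input MRI. The discriminator is a multi-scale discriminator: two discriminators with the same network structure but different input image sizes. A feature matching loss is used to stabilize training. During training, an adversarial training mechanism iteratively optimizes the generator and the discriminator until both converge. The trained generator then performs prostate MRI segmentation. Results: The experimental data come from the PROMISE12 prostate segmentation challenge and the First Affiliated Hospital of Anhui Medical University. With the Dice similarity coefficient and the Hausdorff distance as evaluation metrics, the proposed algorithm achieves a Dice similarity coefficient of 88.9% and a Hausdorff distance of 5.3 mm; compared with methods such as U-Net and DSCNN (deeply-supervised convolutional neural network), it segments more accurately and is more robust. At test time, each image takes less than 1 s to segment, which is faster than specialist physicians. Conclusion: A multi-scale discriminator conditional generative adversarial network is proposed for prostate segmentation. Quantitative and qualitative analysis shows that the algorithm is effective, segments the prostate accurately, meets the requirement for real-time segmentation, and satisfies the needs of clinical diagnosis and treatment.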
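The two evaluation metrics named in the abstract have standard definitions, sketched below for segmentations given as collections of pixel coordinates. This illustrates the metrics only and is not the authors' evaluation code. The function names and the coordinate-set representation are assumptions.

```python
import math


def dice_coefficient(predicted, reference):
    """Dice similarity: 2 |A ∩ B| / (|A| + |B|) for two sets of pixels."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # two empty masks agree perfectly
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(source, target):
        # Farthest that any source point lies from its nearest target point.
        return max(min(math.dist(p, q) for q in target) for p in source)

    return max(directed(points_a, points_b), directed(points_b, points_a))


if __name__ == "__main__":
    predicted = {(0, 0), (0, 1), (1, 1)}
    reference = {(0, 1), (1, 1), (2, 2)}
    print(dice_coefficient(predicted, reference))
    print(hausdorff_distance([(0, 0), (1, 0)], [(0, 0), (4, 0)]))
```

Distances here are in pixel units. Reporting a value in millimetres, as the abstract does, would additionally require the voxel spacing of the scan.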