Found 20 similar documents (search time: 0 ms)
1.
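To meet the real-time requirements of high-dimensional data stream processing in environments with limited computing resources, this paper proposes a GPU (graphics processing unit) based processing model for high-dimensional data streams within irregular streams, together with a concrete feasible architecture, and analyzes and designs the associated parallel algorithms. The six-layer model combines the high bandwidth of GPU data processing with the analysis of data streams in sliding windows. Within this framework, and based on the Compute Unified Device Architecture (CUDA), a data cube model and dimensionality reduction techniques are used to analyze in parallel the canonical correlations of multiple high-dimensional data streams. Theoretical analysis and experimental results both show that the parallel method can accurately identify, online, the correlations between high-dimensional data streams in synchronized sliding-window mode. Compared with a pure CPU method, it has a significant speed advantage, meets the real-time requirements of high-dimensional data streams well, and can serve as a general analysis method widely applicable in data stream mining.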
2.
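In many application domains, the Top-k join query over vectors is an important operation: given two vector sets R and S, a Top-k join query returns the k vector pairs from R and S with the smallest distances. Because the data are massive and high-dimensional, traditional centralized algorithms can no longer complete the join within an acceptable time. MapReduce, a parallel processing framework, can process large-scale data effectively. Owing to its high scalability and high availability, MapReduce has become the preferred solution for massive data processing and is widely used in many fields. This paper reduces the dimensionality of high-dimensional vectors using piecewise aggregate approximation (PAA) and then groups the vectors using symbolic aggregate approximation (SAX); on this basis, combined with the MapReduce framework, it proposes a SAX-based parallel Top-k join query algorithm. Experiments show that the proposed scheme has good performance and scalability.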
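The two preprocessing steps named in the abstract above, PAA for dimensionality reduction and SAX for grouping, can be sketched as follows. This is only an illustrative sketch, not the paper's algorithm: the function names, the four-letter alphabet, and the requirement that the vector length be a multiple of the segment count are all assumptions, and the MapReduce stage is left out.

```python
from statistics import NormalDist, mean, pstdev


def paa(vector, segments):
    """Piecewise aggregate approximation: the mean of each equal-width chunk."""
    n = len(vector)
    if n % segments != 0:
        # Simplifying assumption of this sketch, not a limit of PAA itself.
        raise ValueError("len(vector) must be a multiple of segments")
    width = n // segments
    return [mean(vector[i * width:(i + 1) * width]) for i in range(segments)]


def sax_word(vector, segments, alphabet="abcd"):
    """z-normalise the vector, reduce it with PAA, then map each mean to a symbol."""
    mu, sigma = mean(vector), pstdev(vector)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in vector]
    a = len(alphabet)
    # Breakpoints split the standard normal distribution into `a` equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    word = ""
    for value in paa(z, segments):
        # The symbol index is the number of breakpoints lying below the value.
        word += alphabet[sum(value > b for b in breakpoints)]
    return word


def group_by_sax(vectors, segments, alphabet="abcd"):
    """Group vector ids by SAX word (hypothetical helper for illustration)."""
    groups = {}
    for vec_id, vec in vectors.items():
        groups.setdefault(sax_word(vec, segments, alphabet), []).append(vec_id)
    return groups
```

In a distributed setting, the SAX word would be a natural candidate for a partitioning key, so that vectors with similar shapes land in the same group before distances are computed; whether the paper partitions in exactly this way is not stated in the abstract.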
3.
Robust Optic Flow Computation (Total citations: 3, self-citations: 3, citations by others: 3)
This paper formulates the optic flow problem as a set of over-determined simultaneous linear equations. It then introduces and studies two new robust optic flow methods. The first technique uses the Least Median of Squares (LMedS) to detect the outliers; the inlier group is then solved using the least squares technique. The second method employs a new robust statistical method named the Least Median of Squares Orthogonal Distances (LMSOD) to identify the outliers and then uses total least squares to solve the optic flow problem. The performance of both methods is studied by experiments on synthetic and real image sequences. These methods outperform other published methods in both accuracy and robustness.
4.
Position Weight Matrices (PWMs) are broadly used in computational biology. The basic problems, Scan and MultipleScan, aim to find all the occurrences of a given PWM or a set of PWMs in long sequences. Some other PWM tasks share a common NP-hard subproblem, ScoreDistribution. The existing algorithms rely on the enumeration of a large set of scores or words, and they are mostly not suitable for parallelization. We propose a new algorithm, BucketScoreDistribution, that is both very efficient and suitable for parallelization. We bound the error induced by this algorithm. We realized a GPU prototype for Scan, MultipleScan and BucketScoreDistribution with the CUDA libraries, and report for the different problems speedups larger than 10× on several Nvidia cards.
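The Scan and MultipleScan problems described in the abstract above can be illustrated with a plain sequential sketch. The data layout (one dictionary of scores per PWM column), the function names, and the threshold convention are assumptions made for illustration; the paper's GPU implementation and its BucketScoreDistribution algorithm are not reproduced here.

```python
def scan(pwm, sequence, threshold):
    """Return (position, score) for every window whose score reaches the threshold.

    `pwm` is assumed to be a list of columns, each mapping a letter to its score.
    """
    m = len(pwm)
    hits = []
    for start in range(len(sequence) - m + 1):
        # The score of a window is the sum of per-column scores of its letters.
        score = sum(pwm[i][sequence[start + i]] for i in range(m))
        if score >= threshold:
            hits.append((start, score))
    return hits


def multiple_scan(pwms, sequence, thresholds):
    """Run scan() for each named PWM with its own threshold."""
    return {name: scan(pwm, sequence, thresholds[name]) for name, pwm in pwms.items()}
```

Each window position is scored independently of the others, which is the property that makes this kind of scan a candidate for data-parallel hardware.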
5.
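Current parallel local-outlier detection algorithms do not eliminate the redundant data among local outliers when implemented, and they suffer from unstable k values, low local reachability density, and long detection times, which seriously affects normal use of the data. A parallel local-outlier detection algorithm for high-dimensional big data is therefore proposed. Based on the principle of information entropy, the E-PCA algorithm is used to extract the features of high-dimensional big data and eliminate redundant features, which reduces dimensionality and improves detection accuracy. To complete parallel detection of local outliers in a shorter time, the MapReduce framework of the Hadoop distributed platform is combined with a traditional outlier detection algorithm to detect local outliers in parallel in high-dimensional big data. Simulation results show that the proposed algorithm yields moderate k values, high local reachability density, and short detection times.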
6.
《Journal of Symbolic Computation》2002,33(1):13-29
Efficient algorithms are derived for computing the entries of the Bezout resultant matrix for two univariate polynomials of degree n and for calculating the entries of the Dixon–Cayley resultant matrix for three bivariate polynomials of bidegree (m, n). Standard methods based on explicit formulas require O(n^3) additions and multiplications to compute all the entries of the Bezout resultant matrix. Here we present a new recursive algorithm for computing these entries that uses only O(n^2) additions and multiplications. The improvement is even more dramatic in the bivariate setting. Established techniques based on explicit formulas require O(m^4 n^4) additions and multiplications to calculate all the entries of the Dixon–Cayley resultant matrix. In contrast, our recursive algorithm for computing these entries uses only O(m^2 n^3) additions and multiplications.
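The gap between O(n^3) and O(n^2) for the Bezout matrix can be made concrete with a small sketch. It assumes the usual definition, in which the entries b[i][j] are the coefficients of (f(x)g(y) - f(y)g(x))/(x - y), with polynomials given as ascending coefficient lists of equal length n + 1. The recurrence used in `bezout_recursive` follows directly from the explicit sum; whether it matches the paper's recursive algorithm in detail is an assumption.

```python
def bezout_direct(f, g):
    """Entries from the explicit sum formula; O(n^3) arithmetic operations."""
    n = len(f) - 1
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, n - 1 - j) + 1):
                B[i][j] += f[j + k + 1] * g[i - k] - f[i - k] * g[j + k + 1]
    return B


def bezout_recursive(f, g):
    """Same entries via b[i][j] = b[i-1][j+1] + f[j+1]*g[i] - f[i]*g[j+1]; O(n^2)."""
    n = len(f) - 1
    B = [[0] * n for _ in range(n)]
    for i in range(n):  # rows in increasing order, so row i-1 is already filled
        for j in range(n):
            prev = B[i - 1][j + 1] if i > 0 and j + 1 < n else 0
            B[i][j] = prev + f[j + 1] * g[i] - f[i] * g[j + 1]
    return B
```

Each entry in the recursive version costs a constant number of operations because it reuses the entry one step up the anti-diagonal, instead of re-summing up to n terms.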
7.
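This paper studies parallel programs written in the OpenMP + VC++ programming model on multicore computers, and computes series sums, matrix products, and the Cholesky decomposition of matrices on dual-core and quad-core computers using both conventional and parallel algorithms. Experiments show that a conventional serial program can use only one core of a multicore computer, whereas the OpenMP programs achieve high parallel efficiency.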
8.
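To improve the performance of blind channel identification methods based on the second-order covariance matrix in impulsive noise, an α-stable distribution process is taken as the impulsive noise model, and M-estimation is used to obtain a robust estimate of the covariance matrix of the signal received over the impulsive-noise channel; the noise-subspace method is then used to identify the channel blindly. Simulation results show that, in impulsive noise, the method clearly outperforms traditional blind channel identification methods based on second-order statistical covariance matrices.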
9.
Parallel Biomolecular Computation: Models and Simulations (Total citations: 1, self-citations: 0, citations by others: 1)
J. H. Reif 《Algorithmica》1999,25(2-3):142-175
This paper is concerned with the development of techniques for massively parallel computation at the molecular scale, which we refer to as molecular parallelism. While this may at first appear to be purely science fiction, Adleman [Ad1] has already employed molecular parallelism in the solution of the Hamiltonian path problem, and successfully tested his techniques in a lab experiment on DNA for a small graph. Lipton [L] showed that finding the satisfying inputs to a Boolean expression of size n can be done in O(n) lab steps using DNA of length O(n log n) base pairs. This recent work by Adleman and Lipton in molecular parallelism considered only the solution of NP search problems, and provided no way of quickly executing lengthy computations by purely molecular means; the number of lab steps depended linearly on the size of the simulated expression. See [Re3] for further recent work on molecular parallelism and see [Re4] for an extensive survey of molecular parallelism. Our goal is to execute lengthy computations quickly by the use of molecular parallelism. We wish to execute these biomolecular computations using short DNA strands by more or less conventional biotechnology engineering techniques within a small number of lab steps. This paper describes techniques for achieving this goal, in the context of well defined abstract models of biomolecular computation. Although our results are of theoretical consequence only, due to the large amount of molecular parallelism (i.e., large test tube volume) required, we believe that our theoretical models and results may be a basis for more practical later work, just as was done in the area of parallel computing. We propose two abstract models of biomolecular computation.
The first, the Parallel Associative Memory (PAM) model, is a very high-level model which includes a Parallel Associative Matching (PA-Match) operation that appears to improve the power of molecular parallelism beyond the operations previously considered by Lipton [L]. We give some simulations of conventional sequential and parallel computational models by our PAM model. Each of the simulations uses strings of length O(s) over an alphabet of size O(s) (which correspond to DNA of length O(s log s) base pairs). Using O(s log s) PAM operations that are not PA-Match (or O(s) operations assuming a ligation operation) and t PA-Match operations, we can: 1. simulate a nondeterministic Turing Machine computation with space bound s and time bound 2^O(s), with t = O(s), 2. simulate a CREW PRAM with time bound D, with M memory cells, and processor bound P, where here s = O(log(PM)) and t = O(D+s), 3. find the satisfying inputs to a Boolean circuit constructible in s space with n inputs, unbounded fan-out, and depth D, where here t = O(D+s). We also propose a Recombinant DNA (RDNA) model which is a low-level model that allows operations that are abstractions of very well understood recombinant DNA operations and provides a representation, which we call the complex, for the relevant structural properties of DNA. The PA-Match operation for lengthy strings of length s cannot be feasibly implemented by recombinant DNA techniques directly by a single step of complementary pairing in DNA; nevertheless we show this Matching operation can be simulated in the RDNA model with O(s) slowdown by multiple steps of complementary pairing of substrings of length 2 (corresponding to logarithmic length DNA subsequences). Each of the other operations of the PAM model can be executed in our RDNA model, without slowdown.
We further show that, with a further O(s)/log(1/ε) slowdown, the simulations can be done correctly with probability 1/2 even if certain recombinant DNA operations (e.g., Separation) can err with a probability ε. We also observe that our molecular models can be simulated efficiently by PRAMs, and thus by Turing Machines. Received December 30, 1995; revised December 30, 1996, and January 22, 1998.
10.
11.
S. C. Kontogiannis, G. E. Pantziou, P. G. Spirakis, M. Yung 《Theory of Computing Systems》2000,33(5-6):427-464
In this paper we present an efficient general simulation strategy for computations designed for fully operational BSP machines of n ideal processors, on n-processor dynamic-fault-prone BSP machines. The fault occurrences are fail-stop and fully dynamic, i.e., they are allowed to happen on-line at any point of the computation, subject to the constraint that the total number of faulty processors may never exceed a known fraction. The computational paradigm can be exploited for robust computations over virtual parallel settings with a volatile underlying infrastructure, such as a network of workstations (where workstations may be taken out of the virtual parallel machine by their owner).
Our simulation strategy is Las Vegas (i.e., it may never fail, due to backtracking operations to robustly stored instances of the computation, in case of locally unrecoverable situations). It adopts an adaptive balancing scheme of the workload among the currently live processors of the BSP machine.
Our strategy is efficient in the sense that, compared with an optimal off-line adversarial computation under the same sequence of fault occurrences, it achieves an O((log n · log log n)^2) multiplicative factor times the optimal work (namely, this measure is in the sense of the "competitive ratio" of on-line analysis). In addition, our scheme is modular, integrated, and considers many implementation points.
We comment that, to our knowledge, no previous work on robust parallel computations has considered fully dynamic faults in the BSP model, or in general distributed memory systems. Furthermore, this is the first time an efficient Las Vegas simulation in this area is achieved.
Online publication October 26, 2000.
12.
Robust Stability Criteria for Interval Matrices (Total citations: 8, self-citations: 2, citations by others: 8)
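Based on a nonlinear power transformation, the robust stability problem for interval matrices is converted into a nonsingularity problem for a corresponding parameter matrix; combined with the Gershgorin disc theorem, this yields sufficient conditions that guarantee robust stability of the system. The criterion imposes no additional requirements on the matrix entries and is simple and practical. Finally, examples demonstrate the effectiveness of the criterion.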
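To show how the Gershgorin disc theorem can give a sufficient stability test for an interval matrix, the sketch below applies the classical row-disc argument directly. It is not the power-transformation criterion of the paper above, which the abstract does not spell out; the function name and the representation of the interval matrix as elementwise lower and upper bound matrices are assumptions.

```python
def gershgorin_interval_stable(lower, upper):
    """Sufficient (not necessary) test that every real matrix A with
    lower <= A <= upper elementwise has all eigenvalues in the open left half-plane.

    For each row, the largest possible disc centre plus the largest possible
    disc radius must stay strictly negative.
    """
    n = len(lower)
    for i in range(n):
        # Worst-case Gershgorin radius over the whole interval family.
        radius = sum(
            max(abs(lower[i][j]), abs(upper[i][j])) for j in range(n) if j != i
        )
        if upper[i][i] + radius >= 0:
            return False  # inconclusive: the test cannot certify stability
    return True
```

A `False` result only means this simple test fails to certify stability; the family may still be stable, which is why sharper criteria such as the one in the paper are of interest.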
13.
《Journal of Symbolic Computation》2002,33(1):57-65
We describe a "semi-modular" algorithm which computes, for a given integer matrix A of known rank and a given prime p, the multiplicities of p in the factorizations of the elementary divisors of A. Here "semi-modular" means that we apply operations to the integer matrix A but the operations are driven by considering only reductions of row vectors modulo p.
14.
Selim G. Akl 《The Journal of supercomputing》2004,29(1):89-111
Can a parallel computer with n processors solve a computational problem more than n times faster than a sequential computer? Can it solve it more than n times better? New computational paradigms offer an affirmative answer to the above questions through concrete examples in which the improvement in speed or quality is superlinear in the number of processors used by the parallel computer. Furthermore, the improvement is consistent and provable. All examples are characterized by the presence of one or several real-time input streams. In one of the examples, an exponential improvement in speed is achieved despite the fact that the processors of the parallel computer are significantly slower than their sequential counterpart. In another example, the improvement in quality is unbounded. A metaphor from everyday life motivates each computational paradigm in which a superlinear improvement in performance is exhibited.
15.
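With quad-core PCs reaching the market and an 80-core processor successfully developed in the laboratory, multicore is driving a fundamental change in software development. Developers need to add threads to their code to exploit the multiple cores the system provides, thereby improving the functionality and performance of PC application software. This paper discusses techniques for implementing parallel computing on multicore PCs. It introduces the model, directives, and library functions of OpenMP, a parallel programming interface for shared-memory systems, as well as OpenMP support in the Intel C++ Compiler 9.1 and Microsoft Visual Studio 2005; it focuses on the design, implementation, and optimization of a parallel algorithm for the two-dimensional discrete fast Fourier transform; and it looks ahead to the development of high-performance parallel computing software component libraries.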
16.
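Multi-zone parallel computation of hypersonic flow fields is studied using computational fluid dynamics (CFD). A parallel CFD program was written in Fortran on top of the MPI message-passing library; the Navier-Stokes (NS) equations are solved with the AUSMPW+ scheme and the LU-SGS method. The flow field is partitioned into multiple zones, and each subzone is assigned to a corresponding node for computation. At every iteration step, adjacent subzones exchange boundary data. The computations show that the program and method are feasible and can be extended to large-scale parallel computing and engineering applications.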
17.
FPGA Implementation of Parallel CRC Check Code Computation (Total citations: 4, self-citations: 1, citations by others: 4)
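Software computation of CRC check codes can hardly meet the requirements of high-speed data communication. Hardware approaches include the classic serial LFSR circuit and various parallel computation methods derived from software algorithms. Starting from the classic LFSR circuit, this paper studies the principle of computing CRC check codes one byte at a time in parallel and, taking the common CRC-16 and CRC-CCITT as examples, presents a synthesizable design in VHDL. The results show that this implementation outperforms common designs in both speed and resource usage and is well suited to computing CRC check codes in an FPGA.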
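The relation between the bit-serial LFSR update and a byte-at-a-time update can be checked in software. The sketch below uses the CRC-CCITT generator polynomial 0x1021 with an assumed initial value of 0xFFFF and MSB-first bit order; the paper's own parameters and its VHDL design are not given in the abstract, so this only illustrates the equivalence of the two update rules.

```python
POLY_CCITT = 0x1021  # x^16 + x^12 + x^5 + 1


def crc16_bitwise(data, poly=POLY_CCITT, init=0xFFFF):
    """Bit-serial update: one LFSR shift per input bit, MSB first."""
    crc = init
    for byte in data:
        for bit in range(7, -1, -1):
            feedback = ((crc >> 15) & 1) ^ ((byte >> bit) & 1)
            crc = (crc << 1) & 0xFFFF
            if feedback:
                crc ^= poly
    return crc


def make_table(poly=POLY_CCITT):
    """Precompute the effect of eight LFSR shifts for every possible top byte."""
    table = []
    for value in range(256):
        crc = value << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
        table.append(crc)
    return table


def crc16_bytewise(data, table, init=0xFFFF):
    """Byte-at-a-time update: eight shifts collapsed into one step per byte."""
    crc = init
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ table[(crc >> 8) ^ byte]
    return crc
```

In hardware, the same collapse of eight shifts is typically expressed as XOR equations of the current register bits and the input byte rather than as a lookup table; the table here plays that role for the software sketch.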
18.
FPGA Implementation of Parallel CRC Check Code Computation (Total citations: 6, self-citations: 0, citations by others: 6)
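Software computation of CRC check codes can hardly meet the requirements of high-speed data communication. Hardware approaches include the classic serial LFSR circuit and various parallel computation methods derived from software algorithms. Starting from the classic LFSR circuit, this paper studies the principle of computing CRC check codes one byte at a time in parallel and, taking the common CRC-16 and CRC-CCITT as examples, presents a synthesizable design in VHDL. The results show that this implementation outperforms common designs in both speed and resource usage and is well suited to computing CRC check codes in an FPGA.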
19.
Problems of Information Transmission - We consider the problem of detecting (testing) Gaussian stochastic sequences (signals) with imprecisely known means and covariance matrices. An alternative is...
20.
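This paper introduces a power-system load-flow computation model for a PC cluster computing environment. Combining a node-splitting network partitioning method with the characteristics of the PC cluster environment, it proposes a decomposition-coordination load-flow algorithm for power systems based on mathematical partitioning of the network. Test results show that the algorithm achieves high speedup and computational accuracy and is suitable for implementation in a networked computing environment.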