首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 109 毫秒
1.
高效多子空间Skyline查询处理算法   总被引:1,自引:0,他引:1  
随着Skyline查询应用的增多,子空间Skyline查询成为热点。针对实际应用中用户从多角度审视某一数据集的需求,充分研究了多子空间Skyline查询问题。在分析现有子空间Skyline查询算法解决该问题不足的基础上,提出了子空间立方体群(subspace skycube group,SSG)结构,并给出了基于该结构的同时计算任意多个子空间Skyline查询的MSSC(multiple subspace skycube)算法。该算法采用子空间候选集(subspace candidate sets,SCS),并充分利用了子空间立方体群结构中各子空间Skyline结果间的共享关系;在此基础上,算法采用求和过滤以及最大值过滤等方法,对数据集进行剪枝和过滤,从而进一步提高算法效率。最后,分别用人造数据和真实数据对算法进行实验,并与现有算法进行比较,结果表明MSSC算法可以高效地解决多子空间Skyline查询问题。  相似文献   

2.
Skyline查询在多维决策和数据挖掘等方面发挥重要作用,然而随着数据属性维度的增大, Skyline集变得非常庞大.为克服该不足,提出Skyline代表点查询.文中提出新的评价函数改进Skyline点的得分计算方法以选择k个具有代表性的Skyline点.在二维空间提出动态规划算法(DPBA),利用覆盖圆的性质确定非代表点与代表点间的覆盖距离,迭代计算评价函数值,从而得到k个代表点;在高维空间针对NP-hard问题提出一个基于aR-tree结构的近似解决方法,遍历索引结构中的节点,通过与候选Skyline集比较判断是否被支配进行剪枝,降低计算开销.大量基于合成数据与真实数据的实验证明该算法的有效性.  相似文献   

3.
基于高维空间的在线高效子空间Skyline算法——CSky   总被引:2,自引:0,他引:2  
Skyline计算是要发现数据集中不被其他点支配的所有点的集合.近来,它在实时在线服务方面的良好应用前景,使其成为数据库研究领域的一个热点.实际应用中,用户通常期望快速、渐进地返回Skyline计算结果,因此文中主要讨论了高维空间子空间Skyline渐进查询问题.据我们所知,现有的Skyline计算方法都不能直接或者通过简单修改来高效解决该种查询问题.BNL(Blocked Nested Loop)算法是一个可用来进行子空间Skyline计算的算法,但是,该方法低效且非渐进.基于此,文中提出了在线高效子空间Skyline算法--CSky(Count the Skyline).该算法充分利用了一个新颖数据结构--InvertS的特征,即通过对目标数据集进行排序,存放最可能为Skyline点的数据于算法优先扫描的位置,这使得CSky算法能高效计算出任意子空间上的Skyline;同时,CSky每次计算子空间Skyline查询时,至多访问一遍数据库;再有,算法扫描一个点时,只需和当前已发现的Skyline点进行比较即能判断该点是否为Skyline点,保证了算法的渐进性.这样,相比BNL,CSky大大减少了计算开销,具有其他基于索引的Skyline算法计算Skyline时的高效,且这种高效适用于所有子空间.理论分析和实验表明,在解决高维空间子空间Skyline查询问题方面,CSky性能大大优于BNL.  相似文献   

4.
随着数据规模的日益庞大,在大规模数据集中帮助用户定位出数据量可控的代表性信息显得越发重要。虽然Top-k Skyline查询能够找到数据集中前k个最具代表性的信息,在获取代表性信息的同时又控制了结果规模,满足了上述要求,但是现有的Top-k Skyline查询在面对大规模数据集时效率较低,并不适用于大规模数据集。为了解决这个问题,将Top-k Skyline查询与并行化处理相结合,提出了一种面向大规模数据集的并行化Top-k Skyline查询算法PTKS(parallel Top-k Skyline),通过充分利用分布式资源,将原有查询进行有效的并行化处理,同时设计了基于用户偏好的用于缩减结果数据量的筛选规则,满足用户需求。在真实数据集上进行了相关实验,并与现有方法进行了对比,结果表明PTKS在大规模数据集上的查询效率更具有优势,能很好地适用于大规模数据集。  相似文献   

5.
李淼  谷峪  陈默  于戈 《软件学报》2017,28(2):310-325
随着地理位置定位技术的蓬勃发展,基于在线位置服务技术的应用也越来越多.提出一种查询类型——反向空间偏好top-k查询.类似于传统的反向空间top-k查询,对于给定的空间查询对象,该查询返回使该对象满足top-k属性得分的那些用户.但不同的是,该对象的属性不是自身具有的特性,而是通过计算该对象与其他偏好对象之间的空间关系(如距离)而确定.这种查询在市场分析等许多重要领域具有需求,例如,根据查询结果,分析出某个地区中某个设施受欢迎的程度.但是,由于大量空间对象的存在导致对象之间空间关系的计算代价非常高,如何实时地计算出对象的空间属性得分,给查询处理带来很大的挑战.针对该问题提出优化的查询处理算法包括:数据集剪枝、数据集批量处理、基于权重的用户分组等策略.通过理论分析和充分的实验验证,证明了所提出方法的有效性.与普通方法相比,这些方法能够大幅度提高查询处理的执行时间和I/O效率.  相似文献   

6.
大数据对传统的Skyline研究产生了挑战,利用并行框架MapReduce计算大数据下的Skyline已成为一个研究热点。研究了不确定移动对象的Skyline查询问题,提出了一种MapReduce框架下基于事件跟踪的连续概率Skyline查询算法——MR-DTrack(domination-track algorithm based on MapReduce)。首先采用基于角度的划分方法保证负载均衡,通过预计算获取Skyline集可能变化的时刻,在Reduce阶段获取候选概率Skyline集;然后利用局部过滤点剪枝,减少计算开销;最后合并计算出全局概率Skyline集。在人工数据集和真实数据集上的实验验证了算法的有效性。  相似文献   

7.
随着互联网、物联网等信息技术的快速发展,多维数据日益增多,这些海量数据中往往伴随着大量的不完整数据,如何从海量不完整数据中高效地获取用户所需的近似的结果集是一个亟需解决的问题。针对海量高维的不完整数据集,提出了一种基于维度组合的Skyline查询算法,通过构建Rank List数据结构提高查询效率,并减少不完整数据对查询结果的影响;利用维度的不同组合,划分出查询子空间,并渐进地查询出每个子空间的最优先点,从而获得海量不完整数据集上均匀分布的Skyline点。实验结果表明,该算法与Iskyline算法相比,平均查询效率提高了85%,并且在数据量大、维度高时,较普通方法查询效率更高。  相似文献   

8.
传统的Skyline-join查询仅适用于完整数据库,随着新的应用需要的出现,实际应用中考虑到非完整数据库中的Skyline-join查询。概率Skyline利用概率值表示非完整数据项之间的支配关系,有效地避免了传统非完整数据库Skyline查询存在的支配性丢失问题。在分析概率Skyline无法有效处理多关系查询的基础上,对概率Skyline定义进行了扩充,使其适用于多关系查询,并提出了基于多层次分组的PSkyline-join算法。该算法首先基于连接键值及缺失位图对各个关系进行多层次分组,再计算各组数据项的局部Skyline概率上界,然后连接数据项并更新数据项的全局Skyline概率上界,最后利用全局Skyline概率上界与全局Skyline概率下界设计了两种剪枝策略,高效地计算全局概率Skyline结果集。在模拟数据集上验证了PSkyline-join算法效率相较传统算法有着几十倍的提升。  相似文献   

9.
不确定数据流上的Skyline查询技术逐步引起研究者的关注,传统的集中式流处理算法难以满足海量数据的查询需求,并且云计算所提供的海量计算资源和有效的存储管理模式,为研究并行Skyline查询技术提供了充足的条件。基于上述事实,提出了一种不确定数据流上的并行Skyline查询算法(parallel Skyline over uncertain data streams,PSUDS)。该算法通过交叉划分滑动窗口的方式,将集中式流查询转化为并行处理,以并行执行的方式来解决集中式算法处理性能不足的问题。大量实验结果表明,该算法具有较好的并行可扩展性。  相似文献   

10.
为解决海量RDF数据的Skyline查询问题,通过分析现有Skyline查询算法的优缺点,提出一种针对海量RDF数据的查询机制。对RDF数据的存储结构进行分析,根据RDF数据垂直存储结构,设计一种候选Skyline点筛选策略,提前修剪部分非Skyline元组,减少Skyline支配点计算的数据量;在筛选的基础上,给出基于MapReduce的Skyline并行化查询算法。实验结果表明,提前筛选能有效减小查询的数据集,并行化算法能够有效提高查询的效率。  相似文献   

11.
For a set TT of nn points (terminals) in the plane, a Manhattan network   on TT is a network N(T)=(V,E)N(T)=(V,E) with the property that its edges are horizontal or vertical segments connecting points in V⊇TVT and for every pair of terminals, the network N(T)N(T) contains a shortest l1l1-path between them. A minimum Manhattan network   on TT is a Manhattan network of minimum possible length. The problem of finding minimum Manhattan networks has been introduced by Gudmundsson, Levcopoulos, and Narasimhan [J. Gudmundsson, C. Levcopoulos, G. Narasimhan, Approximating a minimum Manhattan network, Nordic Journal of Computing 8 (2001) 219–232. Proc. APPROX’99, 1999, pp. 28–37] and its complexity status is unknown. Several approximation algorithms (with factors 8, 4, and 3) have been proposed; recently Kato, Imai, and Asano [R. Kato, K. Imai, T. Asano, An improved algorithm for the minimum Manhattan network problem, ISAAC’02, in: LNCS, vol. 2518, 2002, pp. 344–356] have given a factor 2-approximation algorithm, however their correctness proof is incomplete. In this paper, we propose a rounding 2-approximation algorithm based on an LP-formulation of the minimum Manhattan network problem.  相似文献   

12.
For two distinct vertices u,vV(G), a cycle is called geodesic cycle with u and v if a shortest path of G joining u and v lies on the cycle; and a cycle C is called balanced cycle with u and v if dC(u,v)=max{dC(x,y)|x,yV(C)}. A graph G is pancyclic [J. Mitchem, E. Schmeichel, Pancyclic and bipancyclic graphs a survey, Graphs and applications (1982) 271-278] if it contains a cycle of every length from 3 to |V(G)| inclusive. A graph G is called geodesic pancyclic [H.C. Chan, J.M. Chang, Y.L. Wang, S.J. Horng, Geodesic-pancyclic graphs, in: Proceedings of the 23rd Workshop on Combinatorial Mathematics and Computation Theory, 2006, pp. 181-187] (respectively, balanced pancyclic) if for each pair of vertices u,vV(G), it contains a geodesic cycle (respectively, balanced cycle) of every integer length of l satisfying max{2dG(u,v),3}?l?|V(G)|. Lai et al. [P.L. Lai, J.W. Hsue, J.J.M. Tan, L.H. Hsu, On the panconnected properties of the Augmented cubes, in: Proceedings of the 2004 International Computer Symposium, 2004, pp. 1249-1251] proved that the n-dimensional Augmented cube, AQn, is pancyclic in the sense that a cycle of length l exists, 3?l?|V(AQn)|. In this paper, we study two new pancyclic properties and show that AQn is geodesic pancyclic and balanced pancyclic for n?2.  相似文献   

13.
We consider the following partition problem: Given a set S of n elements that is organized as k sorted subsets of size n/k each and given a parameter h with 1/k ≤ h ≤ n/k , partition S into g = O(n/(hk)) subsets D 1 , D 2 , . . . , D g of size Θ(hk) each, such that, for any two indices i and j with 1 ≤ i < j ≤ g , no element in D i is bigger than any element in D j . Note that with various combinations of the values of parameters h and k , several fundamental problems, such as merging, sorting, and finding an approximate median, can be formulated as or be reduced to this partition problem. The partition problem also finds many applications in solving problems of parallel computing and computational geometry. In this paper we present efficient parallel algorithms for solving the partition problem and a number of its applications. Our parallel partition algorithm runs in O( log n) time using processors in the EREW PRAM model. The complexity bounds of our parallel partition algorithm on the respective special cases match those of the optimal EREW PRAM algorithms for merging, sorting, and finding an approximate median. Using our parallel partition algorithm, we are also able to obtain better complexity bounds (even possibly on a weaker parallel model) than the previously best known parallel algorithms for several important problems, including parallel multiselection, parallel multiranking, and parallel sorting of k sorted subsets. Received May 5, 1996; revised July 30, 1998.  相似文献   

14.
Experimental investigation followed by thermodynamic assessment of the V-Zn system was carried out in the present study. A series of V-Zn alloys annealed at various temperatures were examined using scanning electron microscopy coupled with energy dispersive spectroscopy/wavelength dispersive X-ray spectrometer, X-ray diffraction and differential thermal analysis. It was confirmed that V Zn16, with a V content of about 5.8 at.%, was indeed an equilibrium phase. DTA results indicated that the peritectic temperature for V Zn16 was about 427 °C. Two new metastable compounds, V Zn9 and V 3Zn2, with V contents of 8.5-11.3 at.% and 60 at.%, respectively, were discovered. DTA results together with SEM-EDS examinations revealed that V Zn9 was formed at around 450 °C in Zn75V25 alloy with a cooling rate greater than 12 °C/min. The V Zn9 phase, however, decomposed into V Zn3 and liquid Zn when the alloy was held above 442 °C. The peritectic temperatures for two equilibrium phases, V 4Zn5 and V Zn3, were 651 °C and 621 °C, respectively. These measurements were slightly lower than the values determined in prior studies. The onset temperature for forming V Zn3 decreased significantly with increasing cooling rate while its exothermic peak widened during fast cooling. These phenomena indicated that both the nucleation and growth processes for V Zn3 were kinetically challenged.In addition, the solubility of Zn in α-V was measured. It was 2.1 at.%, 2.5 at.%, 2.6 at.%, 2.9 at.% and 3.3 at.% at 450 °C, 600 °C, 670 °C, 800 °C and 1000 °C, respectively. Based on the results obtained in the present study and previous investigations, the V-Zn system was reassessed thermodynamically. The assessment was in good agreement with experimental results.  相似文献   

15.
Sequence data for a group of species is often summarized by a distance matrix M where M[s,t] is the dissimilarity between the sequences of species s and t . An ordinal assertion is a statement of the form ``species a and b are as similar as species c and d ' and is supported by distance matrix M if M[a,b] ≤ M[c,d] . Recent preliminary research suggests that ordinal assertions can be used to reconstruct the evolutionary history of a group of species effectively. However, further research on the mathematical and algorithmic properties of ordinal assertions is needed to facilitate the development and assessment of inference methods that utilize ordinal assertions for reconstructing evolutionary histories. A (weighted ) ordinal representation of a distance matrix M is a (weighted) phylogeny T such that, for all species a , b , c , and d labeling T , d T (a,b) ≤ d T (c,d) if and only if M[a,b] ≤ M[c,d], where d T is the weighted path length when T is weighted, otherwise d T is the unweighted path length. Hence, an ordinal representation of M is a phylogeny that supports the same ordinal assertions supported by M , and so is the focus of our examination of the mathematical and algorithmic properties of ordinal assertions. As it turns out, ordinal representations are rich in structure. In this paper several results on weighted and unweighted ordinal representations are presented: — The unweighted ordinal representation of a distance matrix is unique. This generalizes the well-known result that no two phylogenies share the same distance matrix [10], [21]. — The unweighted ordinal representation of a distance matrix can be found in O(n 2 log 2 (n)) time. The algorithm presented improves upon an O(n 3 ) algorithm by Kannan and Warnow [13] that finds binary unweighted ordinal representations of distance matrices. — Under certain conditions, weighted ordinal representations can be found in polynomial time. Received May 11, 1997; revised March 13, 1998.  相似文献   

16.
We systematically investigate minimum capacity unbalanced cut problems arising in social networks. Let k be an input parameter. A cut (A, B) is unbalanced if the size of its smaller side is at most k (called k-size) or exactly k (called Ek-size). An s-t cut (A, B) is unbalanced if its s-side is either k-size or Ek-size. In the min k-size cut (s-t cut, resp.) problem, we want to find a k-size cut (s-t cut, resp.) with the minimum capacity. The corresponding min Ek-size cut (and s-t cut) problem is defined in a similar way. While the classical min s-t cut problem has been studied extensively, the minimum capacity unbalanced cut problem has only recently attracted the attention of researchers. In this paper, we prove that the min k-size s-t cut problem is NP-hard, and give O(log n)-approximation algorithms for the min k-size s-t cut problem, the min Ek-size s-t cut problem, and the min Eksize cut problem. These results, together with previous results, complete our research into minimum capacity unbalanced cut problems.  相似文献   

17.
18.
We consider the problem of minimizing mean flow time for the Imprecise Computation Model introduced by Lin et al. A task system TS=({T i },{r(T i )},{d(T i )},{m(T i )},{o(T i )}) consists of a set of n independent tasks, where r(T i ),d(T i ),m(T i ) , and o(T i ) denote the ready time, deadline, execution time of the mandatory part, and execution time of the optional part of T i , respectively. Given a task system TS and an error threshold K , our goal is to find a preemptive schedule on one processor such that the average error is no more than K and the mean flow time of the schedule is minimized. Such a schedule is called an optimal schedule. In this article we show that the problem of finding an optimal schedule is NP-hard, even if all tasks have identical ready times and deadlines. A pseudopolynomial-time algorithm is given for a set of tasks with identical ready times and deadlines, and oppositely ordered mandatory execution times and total execution times (i.e., there is a labeling of tasks such that m(T i )≤ m(T i+1 ) and m(T i )+o(T i )≥ m(T i+1 )+o(T i+1 ) for each 1≤ i<n ). Finally, polynomial-time algorithms are given for (1) a set of tasks with identical ready times, and similarly ordered mandatory execution times and total execution times and (2) a set of tasks with similarly ordered ready times, deadlines, mandatory execution times, and total execution times. Received January 19, 1995; revised August 2, 1996.  相似文献   

19.
There are many reconstruction algorithms for tomography, raft for short, and some of them are considered “classic” by researchers. The so-called raft library, provide a set of useful and basic tools, usually needed in many inverse problems that are related to medical imaging. The subroutines in raft are free software and written in C language; portable to any system with a working C compiler. This paper presents source codes written according to raft routines, applied to a new imaging modality called X-ray fluorescence tomography.

Program summary

Program title: raftCatalogue identifier: AEJY_v1_0Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJY_v1_0.htmlProgram obtainable from: CPC Program Library, Queen?s University, Belfast, N. IrelandLicensing provisions: GNU General Public Licence, version 2No. of lines in distributed program, including test data, etc.: 218 844No. of bytes in distributed program, including test data, etc.: 3 562 902Distribution format: tar.gzProgramming language: Standard C.Computer: Any with a standard C compilerOperating system: Linux and WindowsClassification: 2.4, 2.9, 3, 4.3, 4.7External routines:
  •  
    raft:
    •  
      autoconf 2.60 or later – http://www.gnu.org/software/autoconf/
    •  
      GSL scientific library – http://www.gnu.org/software/gsl/
    •  
      Confuse parser library – http://www.nongnu.org/confuse/
raft-fun: gengetopt – http://www.gnu.org/software/gengetopt/gengetopt.htmlNature of problem: Reconstruction algorithms for tomography, specially in X-ray fluorescence tomography.Solution method: As a library, raft covers the standard reconstruction algorithms like filtered backprojection, Novikov?s inversion, Hogan?s formula, among others. The input data set is represented by a complete sinogram covering a determined angular range. Users are allowed to set solid angle range for fluorescence emission at each algorithm.Running time: 1 second to 15 minutes, depending on the data size.  相似文献   

20.
S. Kwek  L. Pitt 《Algorithmica》1998,22(1-2):53-75
A randomized learning algorithm {POLLY} is presented that efficiently learns intersections of s halfspaces in n dimensions, in time polynomial in both s and n . The learning protocol is the PAC (probably approximately correct) model of Valiant, augmented with membership queries. In particular, {POLLY} receives a set S of m = poly(n,s,1/ε,1/δ) randomly generated points from an arbitrary distribution over the unit hypercube, and is told exactly which points are contained in, and which points are not contained in, the convex polyhedron P defined by the halfspaces. {POLLY} may also obtain the same information about points of its own choosing. It is shown that after poly(n , s , 1/ε , 1/δ , log(1/d) ) time, the probability that {POLLY} fails to output a collection of s halfspaces with classification error at most ε , is at most δ . Here, d is the minimum distance between the boundary of the target and those examples in S that are not lying on the boundary. The parameter log(1/d) can be bounded by the number of bits needed to encode the coefficients of the bounding hyperplanes and the coordinates of the sampled examples S . Moreover, {POLLY} can be extended to learn unions of k disjoint polyhedra with each polyhedron having at most s facets, in time poly(n , k , s , 1/ε , 1/δ , log(1/d) , 1/γ ) where γ is the minimum distance between any two distinct polyhedra. Received February 5, 1997; revised July 1, 1997.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号