Similar literature
1.
高钰, 刘国华. 《计算机工程》 (Computer Engineering), 2008, 34(3): 105-107
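Selectivity estimation is the foundation for designing query optimizers in spatial databases. Using the MBR buffers and line-segment buffers of spatial objects, and drawing on the characteristics of datasets during spatial joins and the distribution of feature data, this paper proposes a point-buffer-based selectivity estimation method for distance joins over feature data. Experiments show that the method works well for this task: it estimates distance joins over feature line-segment sets fairly accurately and markedly reduces the relative estimation error for feature data.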

2.
李阳, 高鹏, 马骏. 《计算机工程与设计》 (Computer Engineering and Design), 2007, 28(18): 4325-4328, 4332
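Predicate selectivity estimation is an important basis for the decisions made by the query optimizer of a relational database management system. This paper proposes a predicate selectivity estimation method based on compressed histograms. A compressed histogram that combines MCV (most common values) with an equi-height histogram stores the data distribution characteristics of the database; the paper gives a construction method for this compressed histogram and studies the predicate selectivity estimation algorithm. The method's effectiveness has been demonstrated in practice: it produces accurate selectivity estimates at a low construction cost.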
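The idea of pairing an MCV list with an equi-height (equi-depth) histogram can be illustrated with a minimal sketch. It is not the construction from the paper above: the function names, the dictionary layout, and the uniformity assumptions inside buckets and across non-MCV values are all hypothetical choices made for illustration, and the column is assumed to be non-empty and numeric.

```python
from bisect import bisect_right
from collections import Counter


def build_compressed_histogram(values, num_mcv=2, num_buckets=4):
    """Keep exact counts for the most common values; bucket the rest by equal depth."""
    counts = Counter(values)
    mcv = dict(counts.most_common(num_mcv))            # value -> exact row count
    rest = sorted(v for v in values if v not in mcv)   # rows not covered by the MCV list
    bounds = []
    if rest:
        # Boundary i sits at the i/num_buckets quantile, so each bucket
        # covers roughly len(rest) / num_buckets rows.
        bounds = [rest[(i * len(rest)) // num_buckets] for i in range(num_buckets)]
        bounds.append(rest[-1])
    return {
        "total": len(values),
        "mcv": mcv,
        "bounds": bounds,
        "rest_rows": len(rest),
        "rest_distinct": len(set(rest)),
    }


def selectivity_eq(hist, x):
    """Estimate the fraction of rows with column == x."""
    if x in hist["mcv"]:
        return hist["mcv"][x] / hist["total"]          # exact for MCV entries
    if hist["rest_distinct"] == 0:
        return 0.0
    # Assumption: non-MCV rows are spread evenly over the non-MCV distinct values.
    return hist["rest_rows"] / hist["rest_distinct"] / hist["total"]


def selectivity_le(hist, x):
    """Estimate the fraction of rows with column <= x."""
    rows = sum(c for v, c in hist["mcv"].items() if v <= x)
    bounds = hist["bounds"]
    if bounds and x >= bounds[-1]:
        rows += hist["rest_rows"]
    elif bounds and x >= bounds[0]:
        i = bisect_right(bounds, x) - 1                # bucket that contains x
        # Assumption: values are uniform inside a bucket (linear interpolation).
        inside = (x - bounds[i]) / (bounds[i + 1] - bounds[i])
        rows += hist["rest_rows"] * (i + inside) / (len(bounds) - 1)
    return rows / hist["total"]
```

For a column holding fifty 1s, thirty 2s, and the integers 10 to 29, the two frequent values land in the MCV list and are estimated exactly, while the remaining twenty rows are covered by four buckets of about five rows each.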

3.
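Query selectivity estimation is one of the key issues in query processing and optimization. This paper proposes a method based on regional distribution density for constructing a histogram in which every bucket has a uniform or near-uniform distribution, and uses the histogram to estimate query selectivity. Experimental results show that the method yields high-accuracy selectivity estimates for low-dimensional data and can also produce estimates for high-dimensional data.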

4.
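Spatial query optimization is one of the key problems in spatial databases. Query optimization based on query cost estimation is an important way to improve query efficiency, and the main problem in cost estimation is estimating the size of the query result (the selectivity). For the two most common queries in spatial databases, spatial selection and spatial join, this paper describes several major histogram algorithms for query selectivity estimation, analyzes the strengths and weaknesses of each, and finally discusses research directions for spatial query selectivity estimation.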

5.
雷斌, 许嘉, 谷峪, 于戈. 《软件学报》 (Journal of Software), 2013, 24(S2): 188-199
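New data applications, typified by wireless sensor networks, and traditional data applications built on image processing both produce large-scale probabilistic data. In managing probabilistic data, the Top-k similarity join returns the k most similar pairs of probabilistic data and has important practical value. The histogram is one of the most commonly used probabilistic data models, and EMD (Earth Mover's Distance), owing to its robustness, quantifies the similarity between histogram probabilistic data more accurately. Computing EMD, however, has cubic time complexity, which poses a major challenge for EMD-based Top-k similarity joins. Building on the popular MapReduce parallel processing framework and exploiting favorable properties of the dual linear program of EMD, this paper proposes two EMD-based Top-k similarity join algorithms for large-scale probabilistic data. It first presents a basic solution based on the block nested loop join, named Top-k BNLJ. It then improves the data partitioning strategy and proposes Top-k DLPJ, which partitions data according to data locality and effectively reduces the volume of data transferred during MapReduce job execution. An evaluation of both algorithms on large-scale real datasets confirms the efficiency of Top-k DLPJ and its good scalability on large datasets.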
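To make the Top-k similarity join concrete, here is a minimal single-machine sketch. It assumes a special case that is far simpler than the setting of the paper above: one-dimensional histograms of equal total mass with ground distance |i - j| between bins, for which EMD reduces to the sum of absolute differences of the cumulative sums. It uses neither the dual linear program nor MapReduce, and the function names `emd_1d` and `top_k_pairs` are hypothetical.

```python
from heapq import nsmallest
from itertools import accumulate, combinations


def emd_1d(p, q):
    """EMD between two equal-mass 1-D histograms with ground distance |i - j|."""
    if len(p) != len(q) or abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same length and total mass")
    # In one dimension, the mass that has to cross each bin boundary is the
    # difference of the cumulative sums, so EMD is the sum of those differences.
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def top_k_pairs(hists, k):
    """Return the k most similar pairs as (distance, i, j), smallest distance first."""
    candidates = (
        (emd_1d(hists[i], hists[j]), i, j)
        for i, j in combinations(range(len(hists)), 2)
    )
    return nsmallest(k, candidates)
```

This nested enumeration of all pairs is the brute-force baseline; the general EMD between multidimensional histograms requires solving a transportation problem, which is where the cubic cost mentioned in the abstract comes from.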

6.
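Similarity query is an important query method based on vector space. Points, line segments, and regions are the three basic representations of spatial objects in vector space. Without changing the MBR regions of nodes, this paper computes the overlap area of MBR regions by region scanning. Exploiting the property that node MBRs in an R*-tree are allowed to overlap, and given that the dead space caused by region overlap cannot be eliminated, it studies more precise line-segment relations on MBR boundaries and gives a nearest-neighbor query algorithm for line segments and a similar-segment selection algorithm. Experimental results show that the method has a low CPU cost and significantly improves the efficiency of similarity queries and updates.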

7.
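Addressing the effect that the statistical-analysis-resistant LSB steganography scheme proposed by 张新鹏 et al. has on the image difference histogram, this paper proposes a steganalysis scheme based on difference histogram analysis. The scheme estimates the length of the embedded data by combining the transition matrix between the difference histograms of the image before and after embedding with the fitting deviation of the difference histogram from a generalized Laplacian distribution. Experiments show that the algorithm achieves good estimation accuracy.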

8.
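Poor-quality data is widespread in modern data management systems; it degrades data quality and poses new challenges for data management. Several data models for managing poor-quality data already exist, and the entity relational data model is one of them: it tolerates poor-quality data, provides a way to measure data quality, and can return query results according to the required result quality. Given these characteristics, traditional optimization methods for estimating query cost no longer apply well, and new cost estimation techniques are needed. This paper proposes a new method for estimating the size of join results. A weighted min-hash function produces the min-hash signature of an attribute, which gives attributes the same dimensionality and makes fast histogram-based estimation possible. A histogram is then built over the signatures, and finally an improved discrete cosine transform compresses the histogram information so that cost estimation can work directly on the compressed information, keeping both the error rate and the storage cost low even for high-dimensional data. In addition, the method supports dynamic data updates well and eliminates the time overhead of periodically rebuilding the histogram.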

9.
A line segment detection method based on chain code detection   Total citations: 12 (self-citations: 0, citations by others: 12)
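Straight lines are an important image feature, and line parameters are essential base data for image recognition and for the three-dimensional reconstruction of line segments. The chain-code-based line segment detection method has four steps: detect chain codes from the edge image; estimate curvature from the chain codes, detect chain-code corner points, and split the chain codes at those corners; detect straight-line chain codes using the chain-code histogram; estimate line parameters for the straight-line chain codes and connect lines according to a connection criterion. Experiments show that the method detects line segments effectively.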

10.
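A fast and accurate image segmentation algorithm is the basis for understanding the environment in real time. Building on the idea of reducing the dataset, this paper proposes vFCM, a new fast FCM algorithm: it adopts a simpler histogram statistics method, further uses the histogram to preset the initial prototype values for FCM, and finally reworks the membership function with the Voronoi distance, which lowers the algorithm's complexity and speeds it up. Experimental results show that the method clearly increases computation speed while largely preserving segmentation quality, and that it is of practical value where real-time image processing requirements are demanding.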

11.
Efficient spatial query processing is very important because the applications of spatial DBMSs (e.g., GIS, CAD/CAM, LBS) handle massive amounts of data and consume much time. Many spatial queries contain a multi-way spatial join because they compute relationships (e.g., intersect) among the spatial data. Thus, accurate estimation of the spatial join selectivity is essential to generate an efficient spatial query execution plan that takes advantage of spatial access methods efficiently. For multi-way spatial joins, selectivity estimation formulae have been developed for only two kinds of query types, tree and clique. However, selectivity estimation for the general query graph, which contains cycles, has not been developed yet. To fill this gap, we devise a formula for the multi-way spatial ring join selectivity. This is an indispensable step toward computing the selectivity of the general multi-way spatial join whose join graph contains cycles. Our experiment shows that the estimated sizes of query results using our formula are close to the sizes of actual query results.

12.
As RDF data continue to gain popularity, we witness fast growth in both the number of RDF repositories and the size of RDF datasets. Many known RDF datasets contain billions of RDF triples (subject, predicate and object). One of the grand challenges in managing such huge RDF data is how to execute RDF queries efficiently. In this paper, we address the query processing problems posed by the billion-triple challenge. We first identify some causes of the problems in existing query optimization schemes, such as large intermediate results and initial query cost estimation errors. Then, we present our block-oriented dynamic query plan generation approach powered by pipelined execution. Our approach consists of two phases. In the first phase, a near-optimal execution plan for queries is chosen by identifying the processing blocks of queries. We group the join patterns sharing a join variable into building blocks of the query plan, since executing them first provides opportunities to reduce the size of the intermediate results generated. In the second phase, we further optimize the initial pipelining for a given query plan. We employ optimization techniques, such as sideways information passing and semi-join, to further reduce the size of intermediate results, improve the query processing cost estimation and speed up query execution. Experimental results on several RDF datasets of over a billion triples demonstrate that our approach outperforms existing RDF query engines that rely on dynamic-programming-based static query processing strategies.

13.
潘茜, 张育平, 陈海燕. 《计算机科学》 (Computer Science), 2016, 43(10): 190-192, 219
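For K-nearest-neighbor join queries over large-scale spatial data, this paper designs a parallel optimization of the K-nearest-neighbor join algorithm under the CUDA programming model. The parallel process has two stages: 1) build R-Tree indexes for the two datasets P and Q involved in the query; 2) run the KNNJ query on the R-Tree indexes. First, minimum bounding boxes are partitioned according to node positions, and the R-Tree index is created in CUDA using a recursive grid sorting algorithm. The KNNJ query is then executed in CUDA on the R-Tree index and involves two phases, parallel distance computation and parallel distance sorting. In the distance phase each thread computes the distance between one pair of points; because these computations do not depend on one another, they run in parallel. In the sorting phase, quicksort is parallelized on CUDA. Experimental results show that as the sample size grows, the advantage of the R-Tree-based parallel K-nearest-neighbor join algorithm becomes more pronounced, and that the algorithm is efficient and scalable.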
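The semantics of a K-nearest-neighbor join, and the two phases named in the abstract above (distance computation, then sorting), can be shown with a sequential brute-force sketch. This is only a reference baseline under simplifying assumptions: it uses no R-Tree index and no CUDA parallelism, and the function name `knn_join` and its return layout are hypothetical.

```python
from math import dist


def knn_join(P, Q, k):
    """For every point in P, return the indexes of its k nearest points in Q."""
    result = {}
    for i, p in enumerate(P):
        # Phase 1: distance from p to every point of Q. Each distance is
        # independent of the others, which is what makes this step parallelizable.
        distances = [(dist(p, q), j) for j, q in enumerate(Q)]
        # Phase 2: sort by distance and keep the k closest.
        distances.sort()
        result[i] = [j for _, j in distances[:k]]
    return result


if __name__ == "__main__":
    P = [(0, 0), (10, 10)]
    Q = [(1, 0), (0, 2), (9, 9), (20, 20)]
    print(knn_join(P, Q, k=2))
```

The brute-force version performs |P| x |Q| distance computations; an index such as the R-Tree is what lets a real implementation skip most of them.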

14.
Histograms can be useful in estimating the selectivity of queries in areas such as database query optimization and data exploration. In this paper, we propose a new histogram method for multidimensional data, called the Q-Histogram, based on the use of the quad-tree, which is a popular index structure for multidimensional data sets. The use of the compact representation of the target data obtainable from the quad-tree allows a fast construction of a histogram with the minimum number of scans, i.e., a single scan, of the underlying data. In addition to the advantage in computation time, the proposed method also provides better performance than other existing methods with respect to the quality of selectivity estimation. We present a new measure of data skew for a histogram bucket, called the weighted bucket skew. Then, we provide an effective technique for skew-tolerant organization of histograms. Finally, we compare the accuracy and efficiency of the proposed method with other existing methods using both real-life data sets and synthetic data sets. The results of experiments show that the proposed method generally provides better performance than other existing methods in terms of accuracy as well as computational efficiency.

15.
In this paper, we study the node distribution of an R-tree storing region data, like, for instance, islands, lakes, or human-inhabited areas. We will show that real region datasets are packed in an R-tree into minimum bounding rectangles (MBRs) whose area distribution follows the same power law, named REGAL (REGion Area Law), as that for the regions themselves. Moreover, these MBRs are packed in their turn into MBRs following the same law, and so on iteratively, up to the root of the R-tree. Based on this observation, we are able to accurately estimate the search effort for range queries, using a small number of easy-to-retrieve parameters. Furthermore, since our analysis exploits, through a realistic mathematical model, the proximity relations existing among the regions in the dataset, we show how to use our model to predict the selectivity of a self-spatial join query posed on the dataset. Experiments on a variety of real datasets (islands, lakes, human-inhabited areas) show that our estimations are accurate, enjoying a geometric average relative error ranging from 22 percent to 32 percent for the search effort of a range query, and from 14 percent to 34 percent for the selectivity of a self-spatial join query. This is significantly better than using a naive model based on a uniformity assumption, which gives rise to a geometric average relative error up to 270 percent and up to 85 percent for the two problems, respectively.

16.
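To address the curse of dimensionality and the high computation cost of similarity join queries over high-dimensional data, this paper maps high-dimensional data into a low-dimensional space using p-stable distributions. Based on the properties of the chi-square distribution, it proves that if the distance in the low-dimensional space exceeds a certain threshold, then the probability that the distance in the original space exceeds ε has a guaranteed lower bound, so effective filtering can be done in the low-dimensional space at a lower computation cost. On this basis, it proposes a chi-square-distribution-based similarity join query algorithm for high-dimensional data. To further improve query efficiency, it also proposes a similarity join query algorithm based on double filtering. Experiments on real datasets show that the proposed methods perform well. The chi-square-based algorithm reaches a recall above 90%. The double-filtering algorithm improves performance further but loses some recall. Query tasks with strict time requirements and relaxed recall requirements can use the double-filtering algorithm; otherwise, the chi-square-based algorithm is the better choice.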
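The filtering idea can be sketched for the Gaussian case, which is the 2-stable distribution. For k independent standard normal projection vectors, the squared projected distance divided by the squared original distance follows a chi-square distribution with k degrees of freedom, so the projected squared distance divided by k is an unbiased estimate of the original squared distance. The sketch below is an illustration under assumptions, not the algorithm from the paper above: the function names are hypothetical, and `slack` is a placeholder multiplier that a real implementation would derive from a chi-square quantile.

```python
import random


def gaussian_projection(dim, k, seed=0):
    """Return k random projection vectors with i.i.d. standard normal entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def project(matrix, x):
    """Map a dim-dimensional point to k dimensions."""
    return [sum(a * v for a, v in zip(row, x)) for row in matrix]


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def passes_filter(matrix, x, y, eps, slack):
    """Keep (x, y) as a join candidate unless the projected distance is clearly too large.

    `slack` is a hypothetical safety factor; a larger value prunes fewer pairs.
    """
    k = len(matrix)
    estimate = sq_dist(project(matrix, x), project(matrix, y)) / k
    return estimate <= slack * eps ** 2
```

Pairs that pass the filter would still need an exact distance check in the original space; the filter only reduces how many of those expensive checks are run, and any pair it wrongly discards costs recall.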

17.
A new algorithm for constructing optimized serial histograms   Total citations: 1 (self-citations: 0, citations by others: 1)
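A serial histogram is built by optimally partitioning a relation based on its frequency ordering. Its join result size estimates are optimal, and it can also be used to estimate result sizes for equality and range queries. However, the construction algorithm for serial histograms is complex, which has limited its practical use. From a practical standpoint, this paper designs BOS, an algorithm for constructing optimized serial histograms. Its time complexity is greatly reduced while its estimation accuracy approaches that of the optimal histogram, which gives it considerable practical value.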
