Similar Literature
 Found 20 similar documents; search took 171 ms
1.
Partition-based query merging techniques for data warehouses and their applications   Cited: 1 in total (0 self-citations, 1 by others)
Data warehouses hold vast amounts of data, and querying and processing that data consumes substantial resources. An effective solution is to first partition the data into blocks that are convenient to process, process each block, and finally merge the per-block results. This paper introduces common partition-based query merging techniques for data warehouses and their application in VB, with examples drawn from the authors' own programming experience.
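The partition/process/merge pattern this abstract describes can be sketched minimally as follows; the function names and the filter query are illustrative, not taken from the paper.

```python
# Hypothetical sketch of partition-based query merging: split a dataset
# into blocks, run the query on each block, then merge the results.

def partition(rows, block_size):
    """Split rows into blocks of at most block_size elements."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]

def process_block(block, predicate):
    """Run the query (here, a simple filter) against one block."""
    return [r for r in block if predicate(r)]

def merge_results(partials):
    """Merge per-block result sets into a single answer."""
    merged = []
    for part in partials:
        merged.extend(part)
    return merged

rows = list(range(100))
blocks = partition(rows, 16)
partials = [process_block(b, lambda r: r % 7 == 0) for b in blocks]
answer = merge_results(partials)
```

Because each block is processed independently, the per-block step is trivially parallelizable; only the final merge needs to see all partial results.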

2.
This work implements a dynamic merging algorithm, aimed mainly at real-time, cross-node, cross-table paginated queries over distributed structured data. In a distributed database, each table is split into sub-tables stored across multiple data nodes, so both single-table and multi-table queries require merging; this algorithm handles the merging of intermediate data. It adopts a two-way merging strategy, ensuring a high degree of node concurrency so that the merge workload is balanced across the compute nodes. Rather than fixing the pairing of nodes at the start of a task, the merge proceeds dynamically, making the algorithm self-adaptive and avoiding the data waits that a predetermined merge plan can cause. Experiments show that as the number of participating nodes grows, the algorithm clearly outperforms both single-node merging and multi-node merging with a predetermined strategy.
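The dynamic pairing idea can be illustrated with a toy model (not the paper's implementation): sorted runs from different nodes enter a work queue, and each two-way merge takes whichever two runs are available next, so pairings emerge at run time instead of being fixed up front.

```python
import heapq
from collections import deque

def merge_two(a, b):
    """Standard two-way merge of two sorted lists."""
    return list(heapq.merge(a, b))

def dynamic_merge(runs):
    """Pairwise-merge runs until a single sorted run remains.

    Merged runs re-enter the queue, so the pairing order adapts to
    whatever runs are ready, mimicking the dynamic strategy described.
    """
    queue = deque(runs)
    while len(queue) > 1:
        a = queue.popleft()
        b = queue.popleft()
        queue.append(merge_two(a, b))
    return queue[0] if queue else []

# Sorted runs as they might arrive from four data nodes (illustrative).
node_runs = [[1, 5, 9], [2, 6], [3, 7, 8], [0, 4]]
result = dynamic_merge(node_runs)
```

In a real distributed setting each `merge_two` would run on whichever compute node is free, which is what spreads the merge load evenly.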

3.
Processing aggregate data is a core problem in data warehousing, high-volume transaction recording systems, mobile computing, online analytical processing (OLAP), and many other fields. This paper first analyzes the characteristics of aggregate-data queries and introduces an aggregate query language and aggregate query rewriting. It then presents an approximate query computation model over aggregate data to enable fast querying in aggregate query environments. Finally, the model is applied to a census statistics system, achieving fast query processing of statistical data with effective query results.

4.
曹立伟  于磊 《软件世界》2000,(8):104-105
I. Excess data junk degrades applications on the AS/400. Everyone has had to hunt for a file in a hurry; once many files accumulate, the search becomes very difficult. Querying through masses of unneeded material inevitably wastes time and slows retrieval, and the same holds for data queries on the AS/400. Over time, the AS/400 accumulates large amounts of useless or rarely used data. This data not only occupies substantial storage space but also seriously degrades the AS/400's performance. Although the AS/400's data-processing performance is excellent, querying or otherwise processing millions of records still heavily consumes system…

5.
《软件》2016,(3):79-83
With advances in mass data storage and processing, data centers have accumulated large volumes of formatted historical data. Such data is characterized by huge scale, low query frequency, and unpredictable query content, yet current file-oriented systems query it mainly by full traversal with a distributed compute engine, which takes a long time and consumes heavy system resources. This paper therefore proposes an efficient query method for massive data based on columnar multi-level indexes, so that only the nodes holding relevant data participate in computation, greatly reducing system resource consumption. Experiments show that for content queries over large-scale historical data, the method yields clear efficiency gains over mainstream file-system query techniques.
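The core pruning idea, that a coarse index lets a query touch only the nodes that can hold matching rows, can be sketched with a per-node column min/max index. All names here are illustrative; the paper's multi-level index is more elaborate.

```python
# Toy per-node (min, max) column index: a range query consults only
# nodes whose value range overlaps the query interval.

def build_index(node_columns):
    """For each node, record (min, max) of its column values."""
    return {node: (min(vals), max(vals)) for node, vals in node_columns.items()}

def query(node_columns, index, lo, hi):
    """Return matching values and the number of nodes actually scanned."""
    hits, nodes_touched = [], 0
    for node, (nmin, nmax) in index.items():
        if nmax < lo or nmin > hi:
            continue  # pruned: this node cannot contain matches
        nodes_touched += 1
        hits.extend(v for v in node_columns[node] if lo <= v <= hi)
    return sorted(hits), nodes_touched

data = {"n1": [1, 3, 5], "n2": [10, 12], "n3": [20, 25]}
idx = build_index(data)
hits, touched = query(data, idx, 9, 13)
```

Only `n2` overlaps the interval [9, 13], so a single node participates; the other nodes are skipped without reading their data.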

6.
To rapidly obtain structured, analyzable Level-3 solar wind particle data, this paper maps the data into memory and applies language-integrated query (LINQ) technology to achieve complete mapping and fast access. A data-mapping model maps the data labels and data products of PDS data into the model's value domain and attribute domain respectively, solving the in-memory mapping of local or networked data. To test an instance of the model, particle energy spectra were rendered in real time using double buffering and Bézier interpolation. The results show that the model is efficient, complete, and highly consistent. This memory-mapping-based model alleviates the data-source bottleneck in fast structured access to heterogeneous data, providing a foundation for further processing and analysis of the spectra, composition, flux, and spatiotemporal distribution of solar energetic particles.

7.
Large numbers of electronic medical records, medical literature, and clinical trial databases related to birth defects are distributed across the network. Integrating and effectively managing these databases would let medical researchers quickly query and comprehensively analyze the massive data. To meet this need, an ontology-based management platform for birth-defect-related medical knowledge was designed, implementing ontology-based data integration, data annotation, and data querying. The system has been deployed in a large portal for comprehensive population and reproductive health information services, a key project of the National Science and Technology Support Program.

8.
Traditional data processing systems have limited data storage and processing capacity and cannot meet the demands of handling massive data. To exploit the value of data and process large datasets efficiently and with high performance, this paper proposes a big data analysis and processing system built on Spark combined with SIMBA. Retrieval is performed through Spark SQL queries; an index management mechanism is embedded in Spark and encapsulated inside the RDD to improve query efficiency; and storing data in a segment tree speeds up retrieval. During preprocessing, data is partitioned using the RangePartitioner strategy, and queries combine global filtering with local indexes. This keeps the system high-throughput and low-latency during query operations and improves query efficiency.

9.
The storage and querying of remote sensing images is an important part of geographic information processing and plays a key role in the real-time processing of massive imagery. To address the single points of failure, poor scalability, and low processing efficiency of traditional remote sensing image processing, a distributed HBase-based storage and query scheme for remote sensing data is proposed. The imagery is first partitioned with a uniform grid, and, based on this partitioning, an index scheme combining grid IDs with a Hilbert curve is designed. A filter column family built on HBase's filtering mechanism screens data at query time, and MapReduce is used to write and query image data in parallel. Experiments show that, compared with MySQL and MapFile, the method effectively improves data write and query speed and scales well.
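A row-key scheme combining a grid ID with a Hilbert-curve index might look like the sketch below. The `hilbert_index` function is the standard xy-to-distance conversion for a curve covering an n×n grid (n a power of two); the key layout itself is an assumption, since the paper's exact format is not given.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid to its distance along the Hilbert curve."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:               # rotate the quadrant so locality is preserved
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def row_key(grid_id, n, x, y):
    """Illustrative HBase-style row key: grid ID plus zero-padded Hilbert index."""
    return f"{grid_id}_{hilbert_index(n, x, y):08d}"

# Keys for all 16 cells of a 4 x 4 grid; lexicographic key order follows
# the curve, so spatially close cells tend to be close on disk.
keys = sorted(row_key("G01", 4, x, y) for x in range(4) for y in range(4))
```

Sorting row keys lexicographically then groups spatially neighboring tiles into contiguous HBase regions, which is the point of using a space-filling curve rather than plain (x, y) concatenation.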

10.
刘惊雷 《计算机工程》2004,30(24):159-161,179
Taking the TQuery and TUpdateQuery controls in C++ Builder as examples, this paper explains the principles and techniques of querying live data and performing cached updates, and solves the problem of updating queries that span multiple base tables. It also introduces the dynamic data exchange and validation features of the TDataSource control, and finally implements a program that performs multi-table updates using data caching.

11.
An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on tractable size disjoint subsets. The data may be distributed at different sites for which a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of data that is comparable to the best existing approaches, yet scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, clustering all the data at once and a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or clustering all the data at once, while providing very large speedups.
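The centroid-based merging idea can be shown with a deliberately tiny 1-D sketch: k-means runs on disjoint subsets, and the resulting centers are themselves clustered to produce the final partition, instead of re-clustering all the data. This is a minimal illustration of the general idea, not the paper's algorithms, and it ignores cluster-size weighting for simplicity.

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Plain 1-D k-means returning sorted final centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return sorted(centers)

# Two disjoint subsets drawn from two well-separated groups.
subset_a = [0.9, 1.0, 1.1, 9.9, 10.0]
subset_b = [1.0, 1.2, 10.0, 10.1, 9.8]
centers_a = kmeans_1d(subset_a, 2)
centers_b = kmeans_1d(subset_b, 2)

# Merge step: cluster the pooled centers, never touching the raw data again.
merged = kmeans_1d(centers_a + centers_b, 2)
```

The merge step sees only 2k points per subset regardless of subset size, which is where the large speedups over clustering all the data at once come from.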

12.
In time-series databases, untimely file compaction by the log-structured merge tree (LSM-tree) under high write load or constrained resources causes data to pile up in the LSM's C0 level, increasing the latency of ad hoc queries on recently written data. To address this, a two-phase LSM compaction framework is proposed that delivers low-latency queries on newly written time-series data while preserving efficient queries over large blocks of data. File compaction is split into two phases: fast merging of a small number of out-of-order files, and merging of a large number of small files. Each phase offers multiple compaction strategies, and resources are allocated between the two phases according to the system's query load. Both the traditional LSM compaction strategy and the two-phase framework were implemented and tested on the time-series database Apache IoTDB. The results show that, compared with traditional LSM, the two-phase compaction module greatly reduces the number of disk reads for ad hoc queries while making compaction strategies more flexible, and improves the performance of historical data analysis queries by about 20%.
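A toy model of the two-phase split might look like the following; runs are sorted lists, phase one merges the few runs whose key ranges overlap (out-of-order data), and phase two merges the many remaining small runs. This is a hypothetical illustration of the phase structure, not IoTDB's implementation, and the `small_threshold` parameter is invented.

```python
import heapq

def overlaps(a, b):
    """True if the key ranges of two sorted runs overlap."""
    return a[0] <= b[-1] and b[0] <= a[-1]

def compact(runs, small_threshold=3):
    # Phase 1: quickly merge the few runs whose ranges overlap any other run.
    out_of_order = [r for r in runs if any(overlaps(r, s) for s in runs if s is not r)]
    in_order = [r for r in runs if r not in out_of_order]
    phase1 = list(heapq.merge(*out_of_order)) if out_of_order else []
    # Phase 2: merge the many small in-order runs into one larger run.
    small = [r for r in in_order if len(r) <= small_threshold]
    large = [r for r in in_order if len(r) > small_threshold]
    phase2 = list(heapq.merge(*small)) if small else []
    return ([phase1] if phase1 else []) + ([phase2] if phase2 else []) + large

runs = [[1, 4], [3, 6], [10, 11], [12, 13], [20, 21, 22, 23]]
compacted = compact(runs)
```

Keeping the two phases separate is what lets a real system give the overlap-resolving phase priority when ad hoc queries on fresh data dominate the load.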

13.
B-spline surfaces, extracted from scanned sensor data, are usually required to represent objects in inspection, surveying technology, metrology and reverse engineering tasks. In order to express a large object with a satisfactory accuracy, multiple scans, which generally lead to overlapping patches, are always needed due to, inter alia, practical limitations and accuracy of measurements, uncertainties in measurement devices, calibration problems as well as skills of the experimenter. In this paper, we propose an action sequence consisting of division and merging. While the former divides a B-spline surface into many patches with corresponding scanned data, the latter merges the scanned data and its overlapping B-spline surface patch. Firstly, all possible overlapping cases of two B-spline surfaces are enumerated and analyzed from the viewpoint of the locations of the projection points of four corners of one surface in the interior of its overlapping surface. Next, the general division and merging methods are developed to deal with all overlapping cases, and a simulated example is used to illustrate the aforementioned detailed procedures. In the sequel, two scans obtained from a three-dimensional laser scanner are simulated to express a large house with B-spline surfaces. The simulation results show the efficiency and efficacy of the proposed method. In this whole process, the storage space of data points is not increased with newly obtained overlapping scans, and none of the overlapping points are discarded, which increases the representation accuracy. We believe the proposed method has a number of potential applications in the representation and expression of large objects with three-dimensional laser scanner data.

14.
This paper presents an overview of our research project on digital preservation of cultural heritage objects and digital restoration of the original appearance of these objects. As an example of these objects, this project focuses on the preservation and restoration of the Great Buddhas. These are relatively large objects existing outdoors and providing various technical challenges. Geometric models of the Great Buddhas are digitally achieved through a pipeline consisting of acquiring data, aligning multiple range images, and merging these images. We have developed two alignment algorithms: a rapid simultaneous algorithm, based on graphics hardware, for quick data checking on site, and a parallel alignment algorithm, based on a PC cluster, for precise adjustment at the university. We have also designed a parallel voxel-based merging algorithm for connecting all aligned range images. On the geometric models created, we aligned texture images acquired from color cameras. We also developed two texture mapping methods. In an attempt to restore the original appearance of historical objects, we have synthesized several buildings and statues using scanned data and a literature survey with advice from experts.

15.
The rapid development of cloud data storage places high demands on data availability. Currently, erasure codes are mainly used to compute encoded data blocks for distributed redundant storage in order to guarantee availability. Although this encoding technique secures the stored data and reduces the extra storage space required, recovering corrupted data incurs large computation and communication overheads. A multi-replica generation and corrupted-data recovery algorithm based on multi-level network coding is proposed. Based on multi-level network coding, the algorithm extends erasure codes by…

16.
龚敬 《测控技术》2017,36(8):86-89
NAND Flash offers large capacity and fast data access. However, bad blocks arise during use and are randomly distributed, so operating NAND Flash requires considerable care: data must never be written into a bad block. NAND Flash is also more prone to bit flips, so an ECC algorithm is needed to ensure data correctness. On aero engines, NAND Flash is mainly used to record the history data of full-authority digital engine control (FADEC), which demands large capacity, real-time behavior, and correctness. The proposed NAND Flash file storage system for FADEC (FFS_N) effectively solves these problems in using NAND Flash while meeting the recording requirements for aero engine control history data.

17.
External mergesort is normally implemented so that each run is stored continuously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Interleaved layout places blocks from different runs in consecutive disk addresses. This is done in the hope that interleaving will reduce seek overhead during merging. The new reading strategy precomputes the order in which data blocks are to be read according to where they are located on disk and when they are needed for merging. Extra buffer space makes it possible to read blocks in an order that reduces seek overhead, instead of reading them exactly in the order they are needed for merging. A detailed simulation model was used to compare the two layout strategies and three reading strategies. The effects of using multiple work disks were also investigated. We found that, in most cases, interleaved layout does not improve performance, but that the new reading strategy consistently performs better than double buffering and forecasting.
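One piece of the reading strategy, precomputing the order in which run blocks will be consumed during the merge, can be sketched as below: a block is needed when its first key becomes the running minimum, so sorting blocks by first key yields the consumption order. The data layout and function names are illustrative, and a real system would then reorder physical reads within the buffer budget.

```python
import heapq

def consumption_order(runs_blocks):
    """Return (run, block_index) pairs in the order merging will need them.

    runs_blocks[r] is the list of sorted blocks of run r; a block is
    needed when its first key becomes the smallest unread key.
    """
    entries = []
    for run_id, blocks in enumerate(runs_blocks):
        for block_idx, block in enumerate(blocks):
            entries.append((block[0], run_id, block_idx))
    entries.sort()
    return [(run_id, block_idx) for _, run_id, block_idx in entries]

def k_way_merge(runs_blocks):
    """Reference k-way merge over the same runs, for comparison."""
    flat = [[x for b in blocks for x in b] for blocks in runs_blocks]
    return list(heapq.merge(*flat))

# Two runs of two blocks each (block size 2, keys already sorted per run).
runs = [[[1, 2], [7, 8]], [[3, 4], [5, 6]]]
order = consumption_order(runs)
merged = k_way_merge(runs)
```

With this schedule known before merging starts, extra buffers let the reader fetch upcoming blocks in seek-friendly disk order rather than strictly in consumption order.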

18.
Hypergraph Models and Algorithms for Data-Pattern-Based Clustering   Cited: 2 in total (0 self-citations, 2 by others)
In traditional approaches for clustering market basket type data, relations among transactions are modeled according to the items occurring in these transactions. However, an individual item might induce different relations in different contexts. Since such contexts might be captured by interesting patterns in the overall data, we represent each transaction as a set of patterns through modifying the conventional pattern semantics. By clustering the patterns in the dataset, we infer a clustering of the transactions represented this way. For this, we propose a novel hypergraph model to represent the relations among the patterns. Instead of a local measure that depends only on common items among patterns, we propose a global measure that is based on the co-occurrences of these patterns in the overall data. The success of existing hypergraph partitioning based algorithms in other domains depends on sparsity of the hypergraph and explicit objective metrics. For this, we propose a two-phase clustering approach for the above hypergraph, which is expected to be dense. In the first phase, the vertices of the hypergraph are merged in a multilevel algorithm to obtain a large number of high quality clusters. Here, we propose new quality metrics for merging decisions in hypergraph clustering specifically for this domain. In order to enable the use of existing metrics in the second phase, we introduce a vertex-to-cluster affinity concept to devise a method for constructing a sparse hypergraph based on the obtained clustering. The experiments we have performed show the effectiveness of the proposed framework.

19.
To deal with the overlap that occurs when multiple sensors capture the same portion of an object, a multi-sensor data unification algorithm based on 2D mesh processing is proposed. The algorithm unifies the multi-sensor data by applying, in sequence, an improved 2D Delaunay triangulation of the isosurface point data, vertex-removal mesh simplification, replacement of each triangle by its centroid, and data integration and resampling. Processing of human body data obtained from a laboratory laser 3D body scanning system shows that the algorithm is fast, simple to operate, and effective at removing noise.

20.
Efficient storage techniques for digital continuous multimedia   Cited: 4 in total (0 self-citations, 4 by others)
The problem of collocational storage of media strands, which are sequences of continuously recorded audio samples or video frames, on disk to support the integration of storage and transmission of multimedia data with computing is examined. A model that relates disk and device characteristics to the playback rates of media strands and derives storage patterns so as to guarantee continuous retrieval of media strands is presented. To efficiently utilize the disk space, mechanisms for merging storage patterns of multiple media strands by filling the gaps between media blocks of one strand with media blocks of other strands are developed. Both an online algorithm suitable for merging a new media strand into a set of already stored strands and an offline merging algorithm that can be applied a priori to the storage of a set of media strands before any of them have been stored on disk are proposed. As a consequence of merging, the storage patterns of media strands may become slightly perturbed. Read-ahead and buffering mechanisms that compensate for this, so that continuity of retrieval remains satisfied, are also presented.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号