Similar Articles
20 similar articles found (search time: 265 ms)
1.
Compared with similarity joins over deterministic graphs, similarity joins over uncertain graphs usually offer greater practical value but also higher computational complexity. This paper studies similarity joins over uncertain graphs on the MapReduce distributed programming framework and proposes two pruning strategies based on probability sums: map-side pruning and reduce-side pruning. The map-side strategy filters out, during the map phase, uncertain graphs that cannot possibly have a similar counterpart; the reduce-side strategy cuts down the candidate graph pairs during the reduce phase. Building on these two strategies, the paper presents MUGSJoin, a MapReduce-based similarity join algorithm for uncertain graphs. Experimental results show that the algorithm outperforms comparable algorithms in both performance and scalability.
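For illustration only, here is a minimal single-machine sketch of the probability-sum idea behind map-side and reduce-side pruning. It simulates the map and reduce phases with plain Python functions, models each uncertain graph as a dict of edge existence probabilities, and uses a weighted Jaccard over edge probabilities as a stand-in for the paper's graph similarity; the bucketing scheme, threshold, and toy data are assumptions, not the MUGSJoin algorithm itself.

```python
from itertools import combinations

# Each uncertain graph: id -> {edge: existence probability} (toy data).
GRAPHS = {
    "g1": {("a", "b"): 0.9, ("b", "c"): 0.8},
    "g2": {("a", "b"): 0.85, ("b", "c"): 0.7, ("c", "d"): 0.1},
    "g3": {("x", "y"): 0.9, ("y", "z"): 0.9, ("z", "w"): 0.9},
}
THRESHOLD = 0.6  # similarity threshold t

def expected_size(g):
    """Probability sum of a graph's edges (its expected edge count)."""
    return sum(g.values())

def weighted_jaccard(g1, g2):
    """Simplified similarity: weighted Jaccard over edge probabilities."""
    edges = set(g1) | set(g2)
    inter = sum(min(g1.get(e, 0.0), g2.get(e, 0.0)) for e in edges)
    union = sum(max(g1.get(e, 0.0), g2.get(e, 0.0)) for e in edges)
    return inter / union if union else 0.0

def map_phase(graphs, t):
    """Map-side pruning: a graph with probability sum s can only reach
    similarity >= t with graphs whose sum lies in [t*s, s/t], so each
    graph is emitted only to the size buckets in that range and graphs
    with non-overlapping ranges never meet in a reduce group."""
    for gid, g in graphs.items():
        s = expected_size(g)
        for bucket in range(int(t * s), int(s / t) + 1):
            yield bucket, (gid, g)

def reduce_phase(grouped, t):
    """Reduce-side pruning: re-check the size bound inside each bucket
    before paying for the full similarity computation."""
    for bucket, items in grouped.items():
        for (id1, g1), (id2, g2) in combinations(items, 2):
            s1, s2 = expected_size(g1), expected_size(g2)
            if min(s1, s2) < t * max(s1, s2):
                continue  # bound: similarity <= min(s1, s2) / max(s1, s2) < t
            if weighted_jaccard(g1, g2) >= t:
                yield id1, id2

grouped = {}
for bucket, entry in map_phase(GRAPHS, THRESHOLD):
    grouped.setdefault(bucket, []).append(entry)
print({pair for pair in reduce_phase(grouped, THRESHOLD)})
```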

2.
Parallel cube computation on graphics processing units (total citations: 1; self-citations: 0; by others: 1)
Cube computation is a core problem in data warehousing and online analytical processing (OLAP), and improving its performance has attracted wide attention from academia and industry, yet most existing cube algorithms do not take the latest processor architectures into account. In recent years processors have evolved from a single computing core to several or many cores, such as multi-core CPUs and graphics processing units (GPUs). To fully exploit the multi-core resources of modern processors, this paper proposes GPU-Cubing, a GPU-based parallel cube algorithm. The algorithm adopts a bottom-up, breadth-first partitioning strategy, computing and outputting one cuboid in parallel at a time; within the computation of a cuboid, multiple partitions are processed simultaneously and multiple threads run in parallel inside each partition. GPU-Cubing fits the GPU architecture and achieves a high degree of parallelism. Compared with the BUC algorithm on real datasets, it achieves more than an order of magnitude speedup for full cube computation and at least a 2x speedup for iceberg cubes.
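As a point of reference for the bottom-up, breadth-first strategy sketched in the abstract, the plain-Python, single-threaded snippet below computes a full cube one cuboid (one group-by) at a time, level by level; each compute_cuboid call corresponds to the per-cuboid step that GPU-Cubing runs in parallel across partitions and threads, and the toy fact table is an assumption.

```python
from itertools import combinations

# Toy fact table: dimension columns A, B, C and a measure column M.
DIMS = ["A", "B", "C"]
ROWS = [
    {"A": "a1", "B": "b1", "C": "c1", "M": 3},
    {"A": "a1", "B": "b2", "C": "c1", "M": 5},
    {"A": "a2", "B": "b1", "C": "c2", "M": 2},
]

def compute_cuboid(rows, group_dims):
    """One group-by: aggregate the measure over the chosen dimensions."""
    agg = {}
    for row in rows:
        key = tuple(row[d] for d in group_dims)
        agg[key] = agg.get(key, 0) + row["M"]
    return agg

def full_cube(rows, dims):
    """Breadth-first cube computation: the ALL cuboid first, then all
    1-dimensional cuboids, then 2-dimensional ones, and so on."""
    cube = {}
    for level in range(len(dims) + 1):
        for group_dims in combinations(dims, level):
            cube[group_dims] = compute_cuboid(rows, group_dims)
    return cube

for group_dims, cells in full_cube(ROWS, DIMS).items():
    print(group_dims or ("ALL",), cells)
```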

3.
A clustering-based algorithm for selecting spatial measures to materialize (total citations: 1; self-citations: 0; by others: 1)
梁银 《计算机工程》2011,37(8):58-60
In a spatial data warehouse, the aggregated results of spatial measures in materialized views occupy a large amount of storage, so only a subset of the spatial measures can be materialized. Most existing materialized view selection algorithms, however, are designed for selecting views and do not consider selecting the measures within a view. To address this, a clustering-based algorithm for selecting spatial measures to materialize is proposed for the region-merge operation on spatial measures: groups of mergeable spatial objects are clustered, and within each cluster the benefit of each merge group is computed...

4.
王丽  秦小麟  许建秋 《计算机科学》2015,42(1):201-205,214
Indoor spaces are becoming increasingly large and complex, giving rise to more and more indoor spatial query needs. Range queries and nearest neighbor queries for indoor environments have already been studied in the literature, but reverse nearest neighbor queries, a common type of spatial query, have not. This paper therefore proposes the indoor probabilistic threshold reverse nearest neighbor query together with a device reachability graph model built on positioning devices. On top of this graph model, a query processing algorithm is presented that consists of four parts: batch pruning based on the graph model, pruning based on indoor distance, probability-based pruning, and probability computation. The pruning strategies discard objects that cannot appear in the result set, shrinking the search space and improving efficiency.

5.
徐红艳  王丹  王富海  王嵘冰 《计算机应用》2019,39(11):3288-3292
Measuring user relevance is fundamental to research on heterogeneous information networks. Existing methods leave room for improvement in accuracy because they do not fully exploit multi-dimensional analysis and link analysis. This paper therefore proposes a user relevance measure that fuses Latent Dirichlet Allocation (LDA) with meta-path analysis. First, LDA is used for topic modeling and node relevance is computed by analyzing the content of the nodes in the network; then meta-paths are introduced to characterize the types of relations between nodes, and user relevance in the heterogeneous information network is measured with the DPRel relevance measure; next, the node relevance is incorporated into the user relevance computation; finally, experiments on the real IMDB movie dataset compare the proposed method with ULR-CF (a collaborative filtering recommendation method embedding the LDA topic model) and PathSim (a meta-path-based relevance measure). The results show that the proposed method overcomes the drawback of data sparsity and improves the accuracy of user relevance measurement.
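A small numpy sketch of how content and structure can be fused is shown below: cosine similarity over (assumed, pre-computed) LDA topic vectors captures node content, and the PathSim formula over a user-movie meta-path stands in for the DPRel measure used in the paper; the fusion weight alpha and the toy data are assumptions.

```python
import numpy as np

# Assumed LDA output: per-user topic distributions (rows sum to 1).
topics = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
])

# User-movie adjacency for the meta-path User-Movie-User.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pathsim(A):
    """PathSim over the commuting matrix M = A A^T:
    s(i, j) = 2 * M[i, j] / (M[i, i] + M[j, j])."""
    M = A @ A.T
    diag = np.diag(M)
    return 2 * M / (diag[:, None] + diag[None, :])

def fused_relevance(topics, A, alpha=0.5):
    """Convex combination of content (topic) similarity and
    meta-path (structural) relevance."""
    n = len(topics)
    P = pathsim(A)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = alpha * P[i, j] + (1 - alpha) * cosine(topics[i], topics[j])
    return S

print(np.round(fused_relevance(topics, A), 3))
```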

6.
In traditional pruning strategies, the search space of parent and child nodes that share the same transaction set is not fully pruned, which lowers efficiency. To address this, a parent-child equivalence pruning strategy is proposed: using depth-first search over the set-enumeration tree, the search space of child nodes whose transaction sets equal their parent's is pruned, effectively shrinking the search space and reducing the number of frequency computations. A maximal frequent itemset mining algorithm based on this pruning strategy is then given. Experimental results show that the algorithm shortens the time needed to mine maximal frequent itemsets at the same support threshold.
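Below is a compact pure-Python sketch of depth-first set-enumeration mining with parent-child equivalence pruning: an extension item whose tid-set covers the parent's adds no new constraint, so it is absorbed into the parent node instead of opening a new branch. The toy data, minimum support, and the final post-filtering step for maximality are illustrative choices rather than the paper's exact algorithm.

```python
# Toy transaction database: tid -> set of items.
TRANSACTIONS = {
    0: {"a", "b", "c"},
    1: {"a", "b", "c"},
    2: {"b", "c"},
    3: {"a", "d"},
}
MIN_SUP = 2

# tid-set of each single item.
TIDS = {}
for tid, items in TRANSACTIONS.items():
    for item in items:
        TIDS.setdefault(item, set()).add(tid)

found = {}  # frozenset(itemset) -> support

def dfs(prefix, prefix_tids, candidates):
    """Depth-first search over the set-enumeration tree."""
    # Parent-child equivalence pruning: an item occurring in every
    # transaction of the prefix adds no new constraint, so absorb it
    # into the prefix instead of branching on it.
    absorbed = [i for i in candidates if prefix_tids <= TIDS[i]]
    rest = [i for i in candidates if i not in absorbed]
    node = prefix | set(absorbed)
    found[frozenset(node)] = len(prefix_tids)
    for pos, item in enumerate(rest):
        new_tids = prefix_tids & TIDS[item]
        if len(new_tids) >= MIN_SUP:  # recurse on frequent extensions only
            dfs(node | {item}, new_tids, rest[pos + 1:])

items = sorted(i for i in TIDS if len(TIDS[i]) >= MIN_SUP)
dfs(set(), set(TRANSACTIONS), items)

# Keep only maximal frequent itemsets (no frequent proper superset found).
maximal = [s for s in found if not any(s < t for t in found)]
print([(sorted(s), found[s]) for s in maximal])
```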

7.
Clustering uncertain data in obstructed spaces (total citations: 2; self-citations: 0; by others: 2)
In recent years, imprecise data collection and the inherent uncertainty of data have made uncertainty pervasive in location data, and clustering uncertain data in a space containing obstacles poses new challenges. This paper proposes OBS-UK-means (obstacle uncertain K-means), an algorithm for clustering uncertain data in obstructed spaces, together with two pruning strategies based on R-trees and Voronoi diagrams respectively and the notion of a nearest-distance region, which greatly reduce the amount of computation. Experiments verify the efficiency and accuracy of OBS-UK-means and show that the pruning strategies effectively improve clustering efficiency without harming clustering quality.

8.
Local outlier factor (LOF) detection effectively handles outlier detection under skewed data distributions and performs well in many application domains. Targeting outlier detection over large-scale data, this paper proposes MTLOF (multi-granularity upper bound pruning based Top-n LOF detection), a fast Top-n local outlier detection algorithm that combines an index structure with multi-level LOF upper bounds into a multi-granularity pruning strategy for quickly finding the Top-n local outliers. First, four upper bounds that are closer to the true LOF values are proposed to avoid computing LOF directly, and their computational complexity is analyzed theoretically. Second, combining the index structure with the UB1 and UB2 bounds, a two-level cell pruning strategy is proposed that applies not only a global cell pruning strategy but also a local strategy based on the distribution of objects inside each cell, effectively handling pruning in high-density regions. Third, two more effective object-level pruning strategies are built on the proposed UB3 and UB4 bounds; being closer to the true LOF values, these bounds prune more objects, and a computation-reuse scheme for evaluating them greatly lowers their cost. Finally, the selection of the initial Top-n local outliers is optimized: using region partitioning and the index structure, the initial outliers are chosen from sparse regions, which favors objects with large LOF values, raises the initial pruning threshold, prunes more objects in the early stage, and further improves detection efficiency. Comprehensive experiments on six real datasets validate the efficiency and scalability of MTLOF, with up to a 3.5x improvement in runtime over the state-of-the-art TOLF (Top-n LOF) algorithm.
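For orientation, the sketch below computes the baseline LOF scores that MTLOF's upper bounds approximate and then takes the Top-n by brute force; the UB1-UB4 bounds, the cell index, and the pruning strategies that constitute the paper's contribution are not reproduced, and the synthetic data and parameter choices are assumptions.

```python
import numpy as np

def lof_scores(X, k=3):
    """Plain LOF: k-distances, reachability distances, local
    reachability density (lrd), then LOF as the ratio of the average
    neighbor lrd to a point's own lrd."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    knn = np.argsort(D, axis=1)[:, :k]            # k nearest neighbors
    k_dist = D[np.arange(n), knn[:, -1]]          # k-distance of each point

    # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
    lrd = np.empty(n)
    for p in range(n):
        reach = np.maximum(k_dist[knn[p]], D[p, knn[p]])
        lrd[p] = k / np.sum(reach)

    return np.array([np.mean(lrd[knn[p]]) / lrd[p] for p in range(n)])

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),             # dense cluster
               np.array([[8.0, 8.0], [9.0, -7.0]])])  # two planted outliers
scores = lof_scores(X, k=5)
top_n = np.argsort(scores)[::-1][:2]                  # Top-n outliers (n = 2)
print(top_n, np.round(scores[top_n], 2))
```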

9.
In multi-agent systems (MAS), the non-stationary environment and the mutual influence of agents' decisions make policy learning difficult. To address this, a method called observation relation extraction (ORE) is proposed: a complete graph models the relations between the different parts of an agent's observation space, and an attention mechanism computes the importance of these relations. Applying the method to value-decomposition-based multi-agent reinforcement learning yields a multi-agent reinforcement learning algorithm based on observation relation extraction. Experiments on StarCraft micromanagement scenarios (StarCraft multi-agent challenge, SMAC) show that value-decomposition algorithms equipped with the ORE structure outperform the original algorithms in both convergence speed and final performance.
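The snippet below illustrates only the attention step: an observation is split into segments (a hypothetical "own features plus observed entities" layout), and scaled dot-product self-attention over the segments yields importance weights for the pairwise relations; the dimensions, random projections, and plain-numpy setting are assumptions, not the ORE network.

```python
import numpy as np

rng = np.random.default_rng(0)

# One agent's observation split into 4 segments of dimension 6 (layout assumed).
segments = rng.normal(size=(4, 6))

d_k = 8
W_q = rng.normal(scale=0.3, size=(6, d_k))   # query projection
W_k = rng.normal(scale=0.3, size=(6, d_k))   # key projection
W_v = rng.normal(scale=0.3, size=(6, d_k))   # value projection

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Scaled dot-product self-attention over the observation segments.
Q, K, V = segments @ W_q, segments @ W_k, segments @ W_v
weights = softmax(Q @ K.T / np.sqrt(d_k))    # relation importance matrix
mixed = weights @ V                          # relation-aware segment features

print(np.round(weights, 3))   # row i: how much segment i attends to each segment
print(mixed.shape)            # (4, 8): features fed to the value-decomposition net
```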

10.
For the influence maximization (IM) problem in heterogeneous information networks, an influence maximization algorithm based on directed acyclic graphs (DAGIM) is proposed. Node influence is first measured on the DAG structure, and the most influential nodes are then selected with a marginal-gain strategy. The DAG structure is highly expressive: it describes not only the explicit relations between nodes of different types but also the implicit relations between nodes, preserving the heterogeneous information of the network fairly completely. Experiments on three real datasets verify that the proposed DAGIM outperforms the Degree, PageRank, local directed acyclic graph (LDAG), and meta-path-based information entropy (MPIE) algorithms.
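The sketch below shows the greedy marginal-gain seed selection that such influence maximization algorithms share, using plain reachability in a toy DAG as a simplified influence spread in place of the paper's DAG-based influence measure.

```python
# Toy DAG: node -> list of successors.
DAG = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": [],
    "f": [],
}

def spread(seeds):
    """Simplified influence spread: number of nodes reachable from the
    seed set (including the seeds themselves)."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        for nxt in DAG.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)

def greedy_seeds(k):
    """Greedy marginal-gain selection: repeatedly add the node whose
    inclusion increases the spread the most."""
    seeds = set()
    for _ in range(k):
        gains = {v: spread(seeds | {v}) - spread(seeds)
                 for v in DAG if v not in seeds}
        seeds.add(max(gains, key=gains.get))
    return seeds

print(greedy_seeds(2))
```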

11.
Emerging applications face the need to store and analyze interconnected data. Graph cubes permit multi-dimensional analysis of graph datasets based on attribute values available at the nodes and edges of these graphs. Like the data cube that contains an exponential number of aggregations, the graph cube results in an exponential number of aggregate graph cuboids. As a result, they are very hard to analyze. In this work, we first propose intuitive measures based on the information entropy in order to evaluate the rich information contained in the graph cube. We then introduce an efficient algorithm that suggests portions of a precomputed graph cube based on these measures. The proposed algorithm exploits novel entropy bounds that we derive between different levels of aggregation in the graph cube. Per these bounds we are able to prune large parts of the graph cube, saving costly entropy calculations that would be otherwise required. We experimentally validate our techniques on real and synthetic datasets and demonstrate the pruning power and efficiency of our proposed techniques.
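As one concrete reading of the entropy measure, the sketch below groups the nodes of a small attributed graph by each subset of attributes, aggregates edge weights between the groups, and scores every aggregate cuboid by the Shannon entropy of its weight distribution; the attribute schema and the use of edge-weight distributions are assumptions, and the inter-level entropy bounds used for pruning in the paper are not reproduced.

```python
from itertools import combinations
from math import log2

# Toy attributed graph: nodes with attributes, weighted edges.
NODES = {
    1: {"city": "NY", "job": "eng"},
    2: {"city": "NY", "job": "law"},
    3: {"city": "SF", "job": "eng"},
    4: {"city": "SF", "job": "eng"},
}
EDGES = [(1, 2, 3.0), (1, 3, 1.0), (2, 4, 2.0), (3, 4, 4.0)]
ATTRS = ["city", "job"]

def aggregate(attrs):
    """One graph cuboid: merge nodes with equal values on `attrs` and
    sum the weights of the edges between the merged groups."""
    group = {v: tuple(NODES[v][a] for a in attrs) for v in NODES}
    agg = {}
    for u, v, w in EDGES:
        key = (group[u], group[v])
        agg[key] = agg.get(key, 0.0) + w
    return agg

def entropy(weights):
    """Shannon entropy of the normalized aggregate-weight distribution."""
    total = sum(weights)
    h = 0.0
    for w in weights:
        p = w / total
        if p > 0:
            h -= p * log2(p)
    return h

for level in range(len(ATTRS) + 1):
    for attrs in combinations(ATTRS, level):
        cuboid = aggregate(list(attrs))
        print(attrs or ("ALL",), round(entropy(cuboid.values()), 3))
```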

12.
An n-dimensional cube generates 2^n aggregate cuboids, so how to compute the cube while balancing storage space against query time is a key issue in multi-dimensional analysis applications. Based on a partial materialization strategy and the characteristics of water conservancy census data, this paper improves the Minimal Cubing approach and proposes HDEF cubing, a hierarchical dimension encoding fragment method. The method uses short hierarchical dimension encodings and their prefixes to quickly retrieve the encodings that match the query keywords, reducing multi-table join operations and thus improving query efficiency. Using water conservancy census data as an example, the improved cube computation method is shown to store and query cubes efficiently and to be well suited to analyzing census results.
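A minimal illustration of hierarchical dimension encoding with prefix matching is given below: each level of a hierarchy gets a fixed-width code, the concatenation encodes a leaf, and any roll-up is answered by a prefix scan instead of a join; the two-digit widths and the sample hierarchy are assumptions.

```python
# Hierarchy basin -> river -> station, encoded with two digits per level
# (widths and sample data are illustrative only).
LEVELS = ["basin", "river", "station"]
CODEBOOK = {
    "basin": {"Yangtze": "01", "Yellow": "02"},
    "river": {"Han": "01", "Jialing": "02", "Wei": "01"},
    "station": {"S1": "01", "S2": "02", "S3": "01"},
}

def encode(*path):
    """Concatenate the per-level codes into one hierarchical code."""
    return "".join(CODEBOOK[lvl][val] for lvl, val in zip(LEVELS, path))

# Fact rows keyed by their full hierarchical dimension encoding.
FACTS = {
    encode("Yangtze", "Han", "S1"): 120.0,
    encode("Yangtze", "Han", "S2"): 80.0,
    encode("Yangtze", "Jialing", "S3"): 45.0,
    encode("Yellow", "Wei", "S1"): 60.0,
}

def rollup(*partial_path):
    """Aggregate every fact whose encoding starts with the prefix of the
    given (possibly partial) hierarchy path -- no joins required."""
    prefix = encode(*partial_path)
    return sum(v for code, v in FACTS.items() if code.startswith(prefix))

print(rollup("Yangtze"))          # 245.0: all Yangtze stations
print(rollup("Yangtze", "Han"))   # 200.0: stations on the Han river
```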

13.
Traditional collaborative filtering algorithms do not make full use of the interaction information between users and items and suffer from data sparsity and cold-start problems, which makes the results of recommender systems inaccurate. This paper proposes a new recommendation algorithm: graph neural network collaborative filtering fused with meta-paths. The algorithm first embeds the historical user-item interactions as a bipartite graph and obtains high-order features of users and items through multi-layer neural network propagation; it then performs meta-path-based random walks to capture the latent semantic information in the heterogeneous information network; finally, it fuses the high-order and latent features of users and items for rating prediction. Experimental results show that the proposed algorithm clearly outperforms traditional recommendation algorithms.
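The snippet below sketches only the meta-path-guided random walk on a toy heterogeneous graph (meta-path user-item-user), producing walk sequences that a downstream embedding or graph neural network could consume; the graph, walk length, and meta-path are assumptions.

```python
import random

random.seed(0)

# Toy heterogeneous graph: user -> items and item -> users.
USER_ITEMS = {"u1": ["i1", "i2"], "u2": ["i2", "i3"], "u3": ["i3"]}
ITEM_USERS = {"i1": ["u1"], "i2": ["u1", "u2"], "i3": ["u2", "u3"]}

def metapath_walk(start_user, walk_length=4):
    """Random walk constrained to the meta-path U-I-U-I-...: from a user
    step to one of their items, then to one of that item's users, etc."""
    walk = [start_user]
    current, is_user = start_user, True
    for _ in range(walk_length):
        neighbors = USER_ITEMS[current] if is_user else ITEM_USERS[current]
        if not neighbors:
            break
        current = random.choice(neighbors)
        walk.append(current)
        is_user = not is_user
    return walk

# Walk corpus: latent semantic context later fed to an embedding model.
corpus = [metapath_walk(u) for u in USER_ITEMS for _ in range(2)]
for walk in corpus:
    print(walk)
```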

14.
This article presents a method for adaptively representing multidimensional data cubes using wavelet view elements in order to more efficiently support data analysis and querying involving aggregations. The proposed method decomposes the data cubes into an indexed hierarchy of wavelet view elements. The view elements differ from traditional data cube cells in that they correspond to partial and residual aggregations of the data cube. The view elements provide highly granular building blocks for synthesizing the aggregated and range-aggregated views of the data cubes. We propose a strategy for selectively materializing alternative sets of view elements based on the patterns of access of views. We present a fast and optimal algorithm for selecting a non-expansive set of wavelet view elements that minimizes the average processing cost for supporting a population of queries of data cube views. We also present a greedy algorithm for allowing the selective materialization of a redundant set of view element sets which, for measured increases in storage capacity, further reduces processing costs. Experiments and analytic results show that the wavelet view element framework performs better in terms of lower processing and storage cost than previous methods that materialize and store redundant views for online analytical processing (OLAP).
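To make the view-element idea concrete, the toy sketch below decomposes a 1-D array of pre-aggregated measures with an unnormalized Haar transform, keeps only the largest coefficients, and answers a range-sum query from the truncated representation; the 1-D setting, the data, and the coefficient budget are assumptions and do not reflect the paper's indexed multidimensional hierarchy or selection algorithms.

```python
def haar(values):
    """Unnormalized Haar decomposition: repeatedly replace pairs by their
    average and half-difference; returns [overall average, details...]."""
    coeffs, current = [], list(values)
    while len(current) > 1:
        avgs = [(current[i] + current[i + 1]) / 2 for i in range(0, len(current), 2)]
        diffs = [(current[i] - current[i + 1]) / 2 for i in range(0, len(current), 2)]
        coeffs = diffs + coeffs
        current = avgs
    return current + coeffs

def inverse_haar(coeffs):
    """Exact inverse of haar()."""
    current, rest = coeffs[:1], coeffs[1:]
    while rest:
        diffs, rest = rest[:len(current)], rest[len(current):]
        nxt = []
        for s, d in zip(current, diffs):
            nxt += [s + d, s - d]
        current = nxt
    return current

measures = [20, 24, 31, 29, 6, 8, 40, 38]       # pre-aggregated cell values
coeffs = haar(measures)

# Keep only the 4 largest-magnitude coefficients ("view elements").
keep = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:4]
truncated = [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

approx = inverse_haar(truncated)
exact_sum = sum(measures[2:6])                  # range-sum query over cells 2..5
approx_sum = sum(approx[2:6])
print(exact_sum, round(approx_sum, 2))
```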

15.
With a huge amount of data stored in spatial databases and the introduction of spatial components to many relational or object-relational databases, it is important to study the methods for spatial data warehousing and OLAP of spatial data. In this paper, we study methods for spatial OLAP, by integrating nonspatial OLAP methods with spatial database implementation techniques. A spatial data warehouse model, which consists of both spatial and nonspatial dimensions and measures, is proposed. Methods for the computation of spatial data cubes and analytical processing on such spatial data cubes are studied, with several strategies being proposed, including approximation and selective materialization of the spatial objects resulting from spatial OLAP operations. The focus of our study is on a method for spatial cube construction, called object-based selective materialization, which is different from cuboid-based selective materialization (proposed in previous studies of nonspatial data cube construction). Rather than using a cuboid as an atomic structure during the selective materialization, we explore granularity on a much finer level: that of a single cell of a cuboid. Several algorithms are proposed for object-based selective materialization of spatial data cubes, and a performance study has demonstrated the effectiveness of these techniques

16.
We present a new full cube computation technique and a cube storage representation approach, called the multidimensional cyclic graph (MCG) approach. The data cube relational operator has exponential complexity and therefore its materialization involves both a huge amount of memory and a substantial amount of time. Reducing the size of data cubes, without a loss of generality, thus becomes a fundamental problem. Previous approaches, such as Dwarf, Star and MDAG, have substantially reduced the cube size using graph representations. In general, they eliminate prefix redundancy and some suffix redundancy from a data cube. The MCG differs significantly from previous approaches as it completely eliminates prefix and suffix redundancies from a data cube. A data cube can be viewed as a set of sub-graphs. In general, redundant sub-graphs are quite common in a data cube, but eliminating them is a hard problem. Dwarf, Star and MDAG approaches only eliminate some specific common sub-graphs. The MCG approach efficiently eliminates all common sub-graphs from the entire cube, based on an exact sub-graph matching solution. We propose a matching function to guarantee one-to-one mapping between sub-graphs. The function is computed incrementally, in a top-down fashion, and its computation uses a minimal amount of information to generate unique results. In addition, it is computed for any measurement type: distributive, algebraic or holistic. MCG performance analysis demonstrates that MCG is 20-40% faster than Dwarf, Star and MDAG approaches when computing sparse data cubes. Dense data cubes have a small number of aggregations, so there is not enough room for runtime and memory consumption optimization, therefore the MCG approach is not useful in computing such dense cubes. The compact representation of sparse data cubes enables the MCG approach to reduce memory consumption by 70-90% when compared to the original Star approach, proposed in [33]. In the same scenarios, the improved Star approach, proposed in [34], reduces memory consumption by only 10-30%, Dwarf by 30-50% and MDAG by 40-60%, when compared to the original Star approach. The MCG is the first approach that uses an exact sub-graph matching function to reduce cube size, avoiding unnecessary aggregation, i.e. improving cube computation runtime.

17.
Networked data can represent entities and the relations between them, and network structures are ubiquitous in the real world, so studying the relations among nodes and edges is of great importance. Network representation techniques convert structural information into node vectors, reducing the complexity of graph representations while serving tasks such as classification, network reconstruction, and link prediction, and therefore have broad application prospects. The recently proposed SDNE (Structural Deep Network Embedding) algorithm has achieved notable results in graph autoencoding. To address SDNE's limitations on weighted, directed networks, this paper proposes a new graph-autoencoder-based network representation model from the perspectives of network structure and evaluation metrics: it introduces receiving vectors and sending vectors alongside the original node vectors, optimizes the decoder of the autoencoder and hence the structure of the neural network, and reduces the number of network parameters to speed up convergence; it also proposes a node-degree-based metric so that the weighted nature of the network is reflected in the representation results. Experiments on three directed, weighted datasets show that the proposed method outperforms both traditional methods and the original SDNE on network reconstruction and link prediction tasks.

18.
Bibliographic information networks are typical heterogeneous information networks, and similarity search over them is a research hotspot in graph mining. Existing methods, however, mainly rely on meta-paths or meta-structures and do not consider the semantic features of the nodes themselves, which biases the search results. This paper proposes a vector-based method for extracting semantic features from bibliographic information networks and implements VSim, a vector-based node similarity measure; it further designs VPSim, a semantic-feature-based similarity search algorithm that incorporates meta-paths, and devises pruning strategies tailored to the characteristics of bibliographic network data to improve execution efficiency. Experiments on real data verify that VSim is well suited to searching for entities with similar semantic features and that VPSim is effective, efficient, and highly scalable.
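A minimal sketch of the idea of combining node semantics with meta-path structure: nodes carry semantic feature vectors (assumed here to come from some text embedding), candidates are first pruned to those connected to the query via the meta-path Author-Paper-Author, and only then is cosine similarity computed; the exact definitions of VSim and VPSim are not reproduced.

```python
import numpy as np

# Assumed semantic feature vectors for authors (e.g. from text embeddings).
FEATURES = {
    "a1": np.array([0.9, 0.1, 0.3]),
    "a2": np.array([0.8, 0.2, 0.4]),
    "a3": np.array([0.1, 0.9, 0.2]),
    "a4": np.array([0.2, 0.8, 0.1]),
}
# Author -> papers edges of the bibliographic network.
WRITES = {"a1": {"p1", "p2"}, "a2": {"p2"}, "a3": {"p3"}, "a4": {"p4"}}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similar_authors(query, threshold=0.9):
    """Meta-path pruning followed by vector similarity: only authors
    connected to the query via the meta-path Author-Paper-Author are
    scored, so unrelated nodes are pruned before any vector math."""
    candidates = {a for a, papers in WRITES.items()
                  if a != query and papers & WRITES[query]}
    scores = {a: cosine(FEATURES[query], FEATURES[a]) for a in candidates}
    return {a: round(s, 3) for a, s in scores.items() if s >= threshold}

print(similar_authors("a1"))
```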

19.
The data cube operator computes group-bys for all possible combinations of a set of dimension attributes. Since computing a data cube typically incurs a considerable cost, the data cube is often precomputed and stored as materialized views in data warehouses. A materialized data cube needs to be updated when the source relations are changed. The incremental maintenance of a data cube is to compute and propagate only its changes, rather than recompute the entire data cube from scratch. For n dimension attributes, the data cube consists of 2^n group-bys, each of which is called a cuboid. To incrementally maintain a data cube with 2^n cuboids, the conventional methods compute 2^n delta cuboids, each of which represents the change of a cuboid. In this paper, we propose an efficient incremental maintenance method that can maintain a data cube using only a subset of the 2^n delta cuboids. We formulate an optimization problem to find the optimal subset of the 2^n delta cuboids that minimizes the total maintenance cost, and propose a heuristic solution that allows us to maintain a data cube using only a subset of the delta cuboids. As a result, the cost of maintaining a data cube is substantially reduced. Through various experiments, we show the performance advantages of the proposed method over the conventional methods. We also extend the proposed method to handle partially materialized cubes and dimension hierarchies.
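The toy sketch below shows the core observation that a delta cuboid at a fine granularity can be rolled up to refresh coarser cuboids, so not every delta cuboid has to be recomputed from the changed source rows; the two-dimensional schema and the choice to derive everything from the finest delta are illustrative simplifications, not the paper's cost-based subset selection.

```python
from itertools import combinations

DIMS = ["A", "B"]

def group_by(rows, dims):
    agg = {}
    for row in rows:
        key = tuple(row[d] for d in dims)
        agg[key] = agg.get(key, 0) + row["M"]
    return agg

def roll_up(fine_delta, fine_dims, coarse_dims):
    """Derive a coarser delta cuboid from a finer one by re-aggregating
    its cells instead of touching the source relation again."""
    idx = [fine_dims.index(d) for d in coarse_dims]
    agg = {}
    for key, value in fine_delta.items():
        coarse_key = tuple(key[i] for i in idx)
        agg[coarse_key] = agg.get(coarse_key, 0) + value
    return agg

# Materialized cube (all 4 cuboids over dimensions A, B) and an insertion batch.
base_rows = [{"A": "a1", "B": "b1", "M": 4}, {"A": "a2", "B": "b1", "M": 6}]
cube = {dims: group_by(base_rows, list(dims))
        for level in range(3) for dims in combinations(DIMS, level)}
inserted = [{"A": "a1", "B": "b2", "M": 5}]

# Compute only the finest delta cuboid from the changed rows ...
delta_ab = group_by(inserted, DIMS)
# ... and derive the remaining delta cuboids from it.
deltas = {tuple(DIMS): delta_ab}
for level in range(2):
    for dims in combinations(DIMS, level):
        deltas[dims] = roll_up(delta_ab, DIMS, list(dims))

# Propagate the deltas into the materialized cube.
for dims, delta in deltas.items():
    for key, value in delta.items():
        cube[dims][key] = cube[dims].get(key, 0) + value

print(cube[("A",)], cube[()])
```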

20.
A great deal of real-world data can be modeled as multidimensional networks, and how to analyze such networks better remains a focus of research. OLAP (online analytical processing) has proven to be an effective tool for analyzing multidimensional relational data, but using OLAP to manage and analyze multidimensional network data in support of effective decision making is still a major challenge. This paper designs and proposes a new graph cube model, the path-dimension cube, splits its materialization process into relational path materialization and associated dimension materialization, proposes a materialization strategy for each, and designs the corresponding algorithms on the Spark framework. On this basis, the related GraphOLAP (graph online analytical processing) operations are designed and refined for network data, enriching the analytical perspectives of the framework and improving its ability to analyze multidimensional networks. Finally, the algorithms are implemented on Spark, multidimensional networks are built from data in several real application scenarios and analyzed within the framework, and the experiments show that the proposed graph cube model and materialization algorithms are effective and scalable.
