首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 609 毫秒
1.
《电子技术应用》2017,(9):132-136
K-匿名算法及现存K-匿名改进算法大多使用牺牲时间效率降低发布数据信息损失量的方法实现数据的匿名化,但随着数据量的急剧增长,传统的数据匿名化方法已不适用于对较大数据的处理。针对K-匿名算法在单机执行过程中产生大量频繁项集和重复搜索数据表的缺点,将MapReduce模型引入到抽样泛化路径K-匿名算法中对其进行优化。该方法兼具MapReduce及抽样泛化算法的优点,高效分布式匿名化数据集,降低发布数据集信息损失量,提高数据的可用性。实验结果表明:当数据量较大时,该优化算法在时间效率及数据精度方面有显著提高。  相似文献   

2.
一种面向并行空间查询的数据划分方法   总被引:1,自引:0,他引:1  
在并行空间数据库中,空间数据集在各计算节点是否聚集划分,对提高空间并行查询效率起着关键的作用.Oracle Spatial采用的基于格网的划分方法只考虑了数据集在各节点是否均衡划分,而未考虑空间数据的拓扑特征.基于空间数据聚集划分的目的,提出了一种基于K-平均聚类算法的空间数据划分方法.实验证明,该方法极大地提高了空间数据并行检索和查询效率.  相似文献   

3.
《电子技术应用》2016,(12):115-118
K-匿名是信息隐私保护的一种常用技术,而使用K-匿名技术不可避免会造成发布数据的信息损失,因此,如何提高K-匿名化后数据集的可用性一直以来都是K-匿名隐私保护的研究重点。对此提出了一种基于抽样路径的局域泛化算法——SPOLG算法。该算法基于泛化格寻找信息损失较小的泛化路径,为减少寻径时间,引入等概率抽样的思想,选用等概率抽样中的系统抽样方法进行取样,利用样本代替数据集在泛化格上寻找目标泛化路径,最后在该路径上对数据集进行泛化。同时,本算法使用局域泛化技术,能够降低信息损失量,提高发布数据集的可用性。实验结果证明,本算法匿名化的数据集信息损失度低,数据可用性高。  相似文献   

4.
基于多维泛化路径的K-匿名算法   总被引:3,自引:1,他引:2       下载免费PDF全文
为使微数据发布在满足K-匿名要求的同时提高匿名数据的精度,提出多维泛化路径的概念及相应的2种K-匿名算法,包括完整Filter K-匿名算法和部分Filter K-匿名算法。将它们与Incognito算法和Datafly算法进行比较,实验结果表明2种算法都能有效降低匿名信息损失,提高匿名数据精度和处理效率。  相似文献   

5.
大数据时代,数据的共享与挖掘存在隐私泄露的安全隐患。针对使用K-匿名隐藏实现隐私保护会大幅降低数据分类挖掘性能问题,提出一种基于随机森林特征重要性的K-匿名特征选择算法(RFKA)用于分类挖掘。使用随机森林特征重要性度量特征的分类性能;采用前向序列搜索策略每次选择不破坏K-匿名且分类性能最大的特征加入特征子集;使用特征子集对应的数据集构建模型进行分类实验。实验结果表明,该算法能更有效地平衡K-匿名和分类挖掘性能,且算法运行效率更高。  相似文献   

6.
建模是不确定性数据管理的基础,K-匿名隐私保护模型中不确定性数据有其特殊性:它是人为泛化后的不确定性数据,泛化后的每个实例还原成泛化前元组的概率是相等的。由于其特殊性,以往针对非人为造成不确定性的数据建模方法已经不能简单地用于描述K-匿名隐私保护模型中不确定性数据。为了描述K-匿名隐私保护模型中不确定性数据,本文提出几种针对它的新建模方法:Kattr模型使用attrib-ute-ors方法来描述K-匿名数据中准标识符属性值的不确定性;Ktuple模型把K-匿名表不确定属性值看成是一个关系值,对关系值使用tuple-ors方法来描述;Kupperlower模型把K-匿名表泛化值范围分开成两个字段:上限和下限;Ktree模型根据K-匿名表是对普通表通过泛化树泛化而形成这一特性逆向拆分成树形结构。由这几种模型及它们之间的组合构成了一个描述K-匿名隐私保护模型中不确定性数据的模型空间。并且,本文讨论了模型空间里各种模型的完备性和封闭性等性质。  相似文献   

7.
保护隐私的(L,K) 匿名*   总被引:1,自引:1,他引:0  
提出了一种在K-匿名之上的科学与工程系(L,K)-匿名方法,用于对K-匿名后的数据进行保护,并给出了(L,K)-匿名算法.实验显示该方法能有效地消除K-匿名后秘密匿名属性信息的泄漏,增强了数据发布的安全性.  相似文献   

8.
王莉  宫照煊 《计算机工程》2011,37(2):163-165
针对传统遗传编码存在求解效率低且信息失真大的问题,提出一种基于不定长密歇根编码的遗传算法,采用多种启发式策略进行杂交操作,将基于遗传算法的聚类方法应用到K-匿名化问题中。实验结果表明,该方法可以更好地降低信息失真,从而实现K-匿名化问题。  相似文献   

9.
针对快递单号被盗取和快递单信息保护不当造成的隐私泄露问题进行了研究, 提出了一种新型K-匿名模型对快递信息进行匿名处理。该方法通过随机打破记录中属性值之间的关系来匿名数据, 相比于其他传统方法, 克服了数据间统计关系丢失的问题和先验知识攻击。实验结果表明, 新型K-匿名方法能够加强隐私保护和提高知识保护的准确性。  相似文献   

10.
针对传统K-均值聚类方法不能有效处理大规模数据聚类的问题,提出一种基于随机抽样的加速K-均值聚类(K-means Clustering Algorithm Based on Random Sampling , Kmeans_RS)方法,以提高传统K-均值聚类方法的效率。首先从大规模的聚类数据集中进行随机抽样,得到规模较小的工作集,在工作集上进行传统K-均值聚类,得到聚类中心和半径,并得到抽样结果;然后通过衡量剩下的聚类样本与已得到的抽样结果之间的关系,对剩余的样本进行归类。该方法通过随机抽样大大地减小了参与K-均值聚类的问题规模,从而有效提高了聚类效率,可解决大规模数据的聚类问题。实验结果表明,Kmeans_RS方法在大规模数据集中在保持聚类效果的同时大幅度提高了聚类效率。  相似文献   

11.
查询会话检测的目的是确定用户为了满足某个特定需求而连续提交的相关查询。查询会话检测对于查询日志分析以及用户行为分析来说是非常有用的。传统的查询会话检测方法大都基于查询词的比较,无法解决词语不匹配问题(vocabulary-mismatch problem)——有些主题相关的查询之间并没有相同的词语。为了解决词语不匹配问题,我们在该文提出了一种基于翻译模型的查询会话检测方法,该方法将词与词之间的关系刻画为词与词之间的翻译概率,这样即使词与词之间没有相同的词语,我们也可以捕捉到它们之间的语义关系。同时,我们也提出了两种从查询日志中估计词翻译概率的方法,第一种方法基于查询的时间间隔,第二种方法基于查询的点击URLs。实验结果证明了该方法的有效性。  相似文献   

12.
组最近邻查询是空间对象查询领域的一类重要查询,通过该查询可找到距离给定查询点集最近的空间对象.由于图像分辨率或解析度的限制等因素,空间对象的存在不确定性广泛存在于某些涉及图像处理的查询应用中.这些对象位置数据的存在不确定性会对组最近邻查询结果产生影响.本文给出面向存在不确定对象的概率阈值组最近邻查询定义,设计了高效的查询处理机制,通过剪枝优化等手段提高概率阈值组最近邻查询效率,并进一步提出了高效概率阈值组最近邻查询算法.采用多个真实数据集对概率阈值组最近邻算法进行了实验验证,结果表明所提算法具有良好的查询效率.  相似文献   

13.
Redundant processing is a key problem in the translation of initial queries posed over an ontology into SQL queries, through mappings, as it is performed by ontology-based data access systems. Examples of such processing are duplicate answers obtained during query evaluation, which must finally be discarded, or common expressions evaluated multiple times from different parts of the same complex query. Many optimizations that aim to minimize this problem have been proposed and implemented, mostly based on semantic query optimization techniques, by exploiting ontological axioms and constraints defined in the database schema. However, data operations that introduce redundant processing are still generated in many practical settings, and this is a factor that impacts query execution. In this work we propose a cost-based method for query translation, which starts from an initial result and uses information about redundant processing in order to come up with an equivalent, more efficient translation. The method operates in a number of steps, by relying on certain heuristics indicating that we obtain a more efficient query in each step. Through experimental evaluation using the Ontop system for ontology-based data access, we exhibit the benefits of our method.  相似文献   

14.
谷峪  于晓楠  于戈 《软件学报》2014,25(8):1806-1816
随着智能移动设备和无线定位技术的飞速发展,使用基于位置服务应用的用户越来越多.特别地,不同于传统的针对固定位置的快照查询,移动的用户往往基于移动轨迹发出连续的查询.在真实和虚拟的空间环境中,障碍物的影响都是广泛存在的,障碍空间内的查询处理技术得到了越来越多的关注,其中,障碍空间内的连续反k近邻查询处理有着重要的应用.对障碍空间中的连续反k近邻查询问题进行了定义和系统的研究,通过定义控制点和分割点,提出了针对该问题的处理框架.进一步地,提出了一系列的过滤和求精算法,包括剪枝数据集、获取障碍物、剪枝和计算控制点和更新结果集等处理策略.基于多种数据集对所提出的算法进行了实验评估.与针对每个数据点进行k 近邻计算的基本方法相比,这些方法可以大幅度提高查询处理的CPU 和I/O 效率.  相似文献   

15.
Xyleme is a huge warehouse integrating XML data of the Web. Xyleme considers a simple data model with data trees and tree types for describing the data sources, and a simple query language based on tree queries with boolean conditions. The main components of the data model are a mediated schema modeled by an abstract tree type, as a view of a set of tree types associated with actual data trees, called concrete tree types, and a mapping expressing the connection between the mediated schema and the concrete tree types. The first contribution of this paper is formal: we provide a declarative model-theoretic semantics for Xyleme tree queries, a way of checking tree query containment, and a characterization of tree queries as a composition of branch queries. The other contributions are algorithmic and handle the potentially huge size of the mapping relation which is a crucial issue for semantic integration and query evaluation in Xyleme. First, we propose a method for pre-evaluating queries at compile time by storing some specific meta-information about the mapping into map translation tables. These map translation tables summarize the set of all the branch queries that can be generated from the mediated schema and the set of all the mappings. Then, we propose different operators and strategies for relaxing queries which, having an empty map translation table, will have no answer if they are evaluated against the data. Finally, we present a method for semi-automatically generating the mapping relation.  相似文献   

16.
基于MapX的空间查询应用   总被引:6,自引:0,他引:6       下载免费PDF全文
空间查询是GIS应用系统的基本功能之一,空间查询的功能和效率是GIS应用系统的重要指标。本文讨论了利用MapX实现空间查询的方法,包括基本的图形与属性数据互查和基于空间关系的复杂查询,并给出了详细的实现方法和流程。  相似文献   

17.
空间查询优化是空间数据库中的关键问题之一,以查询代价估算为基础的查询优化技术是提高查询效率的一种重要方法,而估算代价的主要问题是估算查询结果(选择率)的大小。针对空间数据库中最常用的两种查询—空间选择和空间连接,阐述了几种主要用于查询选择率佑计的直方图算法,并对各算法的优缺点做了分析,最后对空间查询选择率估计的研究方向进行了展望。  相似文献   

18.
In a heterogeneous database system, a query for one type of database system (i.e., a source query) may have to be translated to an equivalent query (or queries) for execution in a different type of database system (i.e., a target query). Usually, for a given source query, there is more than one possible target query translation. Some of them can be executed more efficiently than others by the receiving database system. Developing a translation procedure for each type of database system is time-consuming and expensive. We abstract a generic hierarchical database system (GHDBS) which has properties common to database systems whose schema contains hierarchical structures (e.g., System 2000, IMS, and some object-oriented database systems). We develop principles of query translation with GHDBS as the receiving database system. Translation into any specific system can be accomplished by a translation into the general system with refinements to reflect the characteristics of the specific system. We develop rules that guarantee correctness of the target queries, where correctness means that the target query is equivalent to the source query. We also provide rules that can guarantee a minimum number of target queries in cases when one source query needs to be translated to multiple target queries. Since the minimum number of target queries implies the minimum number of times the underlying system is invoked, efficiency is taken into consideration  相似文献   

19.
This paper proposes a graph indexing technique for processing constrained spatial queries and discusses the application of such a technique to road map databases where the graph topology is relatively stationary. The fundamental idea of our technique is to augment the original graph with selected augmented links so that query processing cost, especially I/O cost, is minimized. Based on the computational results derived from the probabilistic analysis, we found that the proposed graph indexing technique is a promising approach for significantly reducing costs of spatial queries.Scope and purposeSpatial data is found in geographic information systems where data attributes are associated with nodes and links in directed graphs. Queries on spatial data are generally expensive because of the recursive nature of spatial data traversal. We propose a graph indexing technique to expedite queries on spatial data. The graph index is an instrument for early identification of the relevant nodes and links to the query so that repeated accesses to the same data pages can be eliminated. This paper presents the graph indexing technique in the context of road map databases and shows that the graph indexing technique can improve significantly on the efficiency of constrained queries on spatial data.  相似文献   

20.
Traditional spatial queries return, for a given query object q, all database objects that satisfy a given predicate, such as epsilon range and k-nearest neighbors. This paper defines and studies inverse spatial queries, which, given a subset of database objects Q and a query predicate, return all objects which, if used as query objects with the predicate, contain Q in their result. We first show a straightforward solution for answering inverse spatial queries for any query predicate. Then, we propose a filter-and-refinement framework that can be used to improve efficiency. We show how to apply this framework on a variety of inverse queries, using appropriate space pruning strategies. In particular, we propose solutions for inverse epsilon range queries, inverse k-nearest neighbor queries, and inverse skyline queries. Furthermore, we show how to relax the definition of inverse queries in order to ensure non-empty result sets. Our experiments show that our framework is significantly more efficient than naive approaches.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号