20 similar documents found; search took 125 ms
1.
High-energy physics experiments now produce ever-larger volumes of data. When the big-data platform Hadoop is used for high-energy physics data processing, data migration becomes a practical requirement, yet existing migration tools do not support transfers between HDFS and other file systems and have clear performance deficiencies. Starting from the synchronization and archiving needs of high-energy physics data, a general-purpose mass data migration system was designed and implemented: by extending the HDFS data access path, it uses MapReduce to move data directly between HDFS data nodes and other storage systems or media. The system also implements a dynamic priority scheduling model that dynamically rates and selects among multiple tasks. It has been deployed for data migration in physics experiments such as the cosmic-ray program of the Large High Altitude Air Shower Observatory (LHAASO); operational results show good performance that meets each experiment's migration needs.
2.
As urbanization advances across China, timely updating of place-name and address data has become urgent. Based on an analysis of the standardization requirements for a place-name/address database, this paper presents a multi-source place-name and address updating workflow built on the ArcGIS platform. The method effectively keeps the place-name/address database current, raises data production efficiency, and meets departmental demands for geographic information data.
3.
Data subscription based on ODS and its update strategy    Total citations: 2 (self-citations: 0, other citations: 2)
With the continual advance of information technology, enterprises' demand for information keeps growing. Because of the large number of functional departments and differing historical constraints, individual business units choose different database systems to suit their own data, so sharing data across these business systems has become a current research topic. For this class of application, the paper presents a data subscription scheme based on Operational Data Store (ODS) technology, describes its functional structure and workflow, and on that basis discusses update strategies for data subscription. In an actual project, the data subscription scheme achieved the project's goals and satisfied the application's requirements for data consistency, timeliness, and efficiency.
4.
With the widespread adoption of government clouds, the privacy and security of government data have drawn growing attention; traditional data encryption alone cannot address increasingly complex data privacy threats. Targeting privacy protection needs in the government-cloud environment, this paper analyzes data privacy risks and current practice, studies key privacy protection techniques, and, combining the characteristics and security mechanisms of real government data applications, designs a role-based access control model for privacy protection. Taking into account the security requirements of the different principals in the cloud environment (cloud service provider, data owner, trusted access control center, and data visitor), it builds an overall privacy protection framework and privacy protection policies and designs the privacy-preserving access control workflow. Finally, the security of the model is analyzed, offering a reference for data security and privacy protection in government clouds.
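The role-based access control idea in this abstract can be illustrated with a minimal sketch; the role names and permission sets below are hypothetical, not taken from the paper:

```python
# Minimal sketch of role-based access control for privacy-sensitive
# government-cloud data. The role names and permission sets are
# hypothetical illustrations, not taken from the paper.
ROLE_PERMISSIONS = {
    "data_owner":   {"read", "write", "grant"},
    "data_visitor": {"read"},
    "csp_operator": set(),  # cloud service provider: no plaintext access
}

def is_allowed(role: str, action: str) -> bool:
    """Return True iff the role's permission set covers the action."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

A real deployment would attach roles to authenticated identities and let the trusted access control center evaluate policies, but the core decision reduces to a set-membership test like this.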
5.
《A&S(安防工程商)》2007,(12):54-55
Fingerprint USB flash drives combine USB mobile storage technology with fingerprint-based identity authentication, meeting users' needs both for convenient, compatible mobile storage and for data confidentiality and storage security. This article gives a brief introduction to the development of fingerprint-recognition USB drive technology for the reader's benefit.
6.
7.
8.
黎辛晓 《数字社区&智能家居》2007,2(6):1191-1192,1194
Based on an analysis of the geographic distribution of a large nonferrous-metals enterprise's organizational units, its procurement workflows, and the characteristics of its procurement system data, and drawing on Oracle's strong support for distributed databases, a distributed database scheme for the industry's procurement system was designed around the distribution of the various procurement data and each department's data access needs, in order to improve data access efficiency. The procurement system built on this scheme largely solves the efficiency problem of large data transfers, avoids the communication bottleneck of a centralized database, achieves load balancing of data processing, and meets data sharing requirements.
9.
10.
11.
Efficient bulk-loading of gridfiles    Total citations: 2 (self-citations: 0, other citations: 2)
This paper considers the problem of bulk-loading large data sets for the gridfile multiattribute indexing technique. We propose a rectilinear partitioning algorithm that heuristically seeks to minimize the size of the gridfile needed to ensure no bucket overflows. Empirical studies on both synthetic data sets and on data sets drawn from computational fluid dynamics applications demonstrate that our algorithm is very efficient, and is able to handle large data sets. In addition, we present an algorithm for bulk-loading data sets too large to fit in main memory. Using a sort of the entire data set, it creates a gridfile without incurring any overflows.
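One simple way to picture a rectilinear partitioning for grid-file bulk-loading is to place per-dimension split points at equi-depth quantiles of the data; the sketch below is an illustrative heuristic under that assumption, not the paper's algorithm:

```python
def rectilinear_splits(points, splits_per_dim):
    """Pick per-dimension split points at equi-depth quantiles so the
    resulting grid roughly balances bucket occupancy. An illustrative
    heuristic only, not the partitioning algorithm from the paper."""
    dims = len(points[0])
    scales = []
    for d in range(dims):
        coords = sorted(p[d] for p in points)
        k = splits_per_dim
        # k interior equi-depth boundaries, extremes excluded
        scales.append([coords[(i * len(coords)) // (k + 1)]
                       for i in range(1, k + 1)])
    return scales
```

The returned per-dimension split lists (the grid "scales") define the rectilinear cells into which records are then bulk-loaded.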
12.
13.
廖多杨 《计算机测量与控制》2018,26(2):183-185
To address the problems that hospital clinical data are massive, strongly inter-correlated, and prone to inaccurate extraction, an intelligent classification method for hospital clinical data based on fuzzy classification is proposed. The method describes the clinical operation indicators and analyzes the data's characteristics against those indicators; clinical data are then retrieved and extracted, and fuzzy classification is applied according to the data's characteristics to complete the intelligent classification. Experimental results show that the proposed method classifies clinical data far better than traditional methods, meets the hospital's data processing requirements, and lays a solid foundation for classifying the large volumes of hospital data to come.
14.
To address the problem that feature selection on the large unsupervised data sets common in "big data" is too slow for practical use, a fast attribute selection algorithm is proposed on the basis of the classic incremental algorithm for absolute reducts in rough set theory. First, the large data set is treated as a randomly arriving sequence of objects, and the candidate reduct is initialized to the empty set. Then objects are drawn from the data set at random without replacement, one at a time; for each drawn object, the algorithm checks whether the current candidate reduct can discern it from every object in the current object set that ought to be discerned, then adds the object to the current set, and if some pair cannot be discerned, adds a suitable attribute to the candidate reduct. Finally, if no indiscernible object is found for I consecutive draws, the candidate reduct is taken as the reduct of the data set. Experiments on five large unsupervised data sets show that the resulting reduct discerns more than 95% of the object pairs, and computing it takes less than 1% of the time required by the discernibility-matrix-based algorithm and the incremental reduction algorithm. In a text topic mining experiment, the topics mined from the reduced data set were essentially the same as those mined from the original data set. Together, the two experiments show that the method performs attribute selection on large data sets quickly and effectively.
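The randomized incremental selection described above can be sketched roughly as follows. The discernibility criterion is simplified here (any two distinct objects should be discernible) and the attribute choice is a naive greedy pick, so this is an illustration of the control flow rather than the paper's algorithm:

```python
import random

def incremental_reduct(objects, n_attrs, stop_after=50, seed=0):
    """Randomly stream objects; grow the candidate reduct whenever it
    fails to discern the new object from an already-seen distinct object.
    Stop early after `stop_after` consecutive successes. Illustrative
    simplification of the abstract's method."""
    rng = random.Random(seed)
    pool = list(objects)
    rng.shuffle(pool)  # simulate random draws without replacement
    reduct, seen, streak = set(), [], 0
    for obj in pool:
        added = False
        for prev in seen:
            if prev == obj:
                continue  # identical objects need not be discerned
            if not any(obj[a] != prev[a] for a in reduct):
                # candidate reduct fails: add an attribute that
                # discerns this pair (greedy: first differing one)
                diff = [a for a in range(n_attrs) if obj[a] != prev[a]]
                reduct.add(diff[0])
                added = True
        seen.append(obj)
        streak = 0 if added else streak + 1
        if streak >= stop_after:
            break
    return reduct
```

Because the reduct only grows, pairs discerned earlier stay discerned, so on termination every distinct pair among the processed objects is discernible by the selected attributes.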
15.
For the problem of optimizing sorts over large data volumes, a new Java-based sorting algorithm that splits data by bits is proposed. The algorithm partitions the data bit by bit and processes the partitions in parallel using Java threads. Experimental results show that for large data volumes its performance is clearly better than quicksort, and the algorithm parallelizes efficiently.
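The bit-split idea can be sketched as a binary MSD radix sort whose top-level halves are handled by separate threads. The paper's implementation is in Java; this Python sketch only illustrates the structure (CPython threads do not give true CPU parallelism) and assumes non-negative integers:

```python
from concurrent.futures import ThreadPoolExecutor

def radix_sort_bits(nums, bits=32):
    """Binary MSD radix sort: partition on the current bit, recurse on
    lower bits. The two top-level partitions are sorted in parallel
    threads, loosely following the abstract's bit-split scheme."""
    def sort(part, bit):
        if bit < 0 or len(part) <= 1:
            return part
        zeros = [n for n in part if not (n >> bit) & 1]
        ones = [n for n in part if (n >> bit) & 1]
        return sort(zeros, bit - 1) + sort(ones, bit - 1)

    top = bits - 1
    zeros = [n for n in nums if not (n >> top) & 1]
    ones = [n for n in nums if (n >> top) & 1]
    with ThreadPoolExecutor(max_workers=2) as ex:
        lo, hi = ex.map(sort, (zeros, ones), (top - 1, top - 1))
    return lo + hi
```

In Java, each partition would map naturally onto a `Thread` or an `ExecutorService` task, where the parallelism is real.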
16.
Prodip Hore, Dmitry B. Goldgof 《Pattern recognition》2009,42(5):676-1901
An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on tractable-size disjoint subsets. The data may be distributed at different sites, for which a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of data that is comparable to the best existing approaches, yet scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, clustering all the data at once, and a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means-based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or clustering all the data at once, while providing very large speedups.
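A centroid-based merge of ensemble partitions can be sketched by pooling the cluster centers from all partitions and clustering the centers themselves. The hard k-means version below is illustrative and omits the paper's weighting details:

```python
import random

def merge_partitions(center_sets, k, iters=20, seed=1):
    """Pool the cluster centers produced on each data subset and run
    plain (hard) k-means over the centers to obtain the final k merged
    centers. Illustrative sketch, not the authors' exact algorithm."""
    pts = [c for cs in center_sets for c in cs]
    rng = random.Random(seed)
    centers = rng.sample(pts, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            # assign each pooled center to its nearest merged center
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centers[i])))
            groups[j].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old center if a group empties out
                centers[i] = tuple(sum(c) / len(g) for c in zip(*g))
    return centers
```

The point of the approach is that only the centers, not the raw data, are revisited, which is why it scales to extremely large or distributed data sets.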
17.
Wee-Keong Ng Ravishankar C.V. 《Knowledge and Data Engineering, IEEE Transactions on》1997,9(2):314-328
Disk I/O has long been a performance bottleneck for very large databases. Database compression can be used to reduce disk I/O bandwidth requirements for large data transfers. The authors explore the compression of large statistical databases and propose techniques for organizing the compressed data such that standard database operations such as retrievals, inserts, deletes and modifications are supported. They examine the applicability and performance of three methods. Two of these are adaptations of existing methods, but the third, called tuple differential coding (TDC), is a new method that allows conventional access mechanisms to be used with the compressed data to provide efficient access. They demonstrate how the performance of queries that involve large data transfers can be improved with these database compression techniques.
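Tuple differential coding can be illustrated by mapping each tuple to its lexicographic ordinal over the attribute domains and storing the sorted ordinals as small successive differences; this is a simplified sketch of the idea, not the authors' exact encoding:

```python
def tdc_encode(tuples, domains):
    """Map each tuple to its lexicographic ordinal within the given
    attribute domain sizes, sort, and store the first ordinal plus
    successive differences (the differences are small and compress
    well). Simplified sketch of the TDC idea."""
    def ordinal(t):
        n = 0
        for v, d in zip(t, domains):
            n = n * d + v  # mixed-radix positional value
        return n
    ords = sorted(ordinal(t) for t in tuples)
    return [ords[0]] + [b - a for a, b in zip(ords, ords[1:])]

def tdc_decode(codes, domains):
    """Invert tdc_encode: cumulative sums recover the ordinals, and a
    mixed-radix expansion recovers each tuple in sorted order."""
    out, n = [], 0
    for c in codes:
        n += c
        t, rem = [], n
        for d in reversed(domains):
            t.append(rem % d)
            rem //= d
        out.append(tuple(reversed(t)))
    return out
```

Because decoding only needs cumulative sums, ordinary sequential access mechanisms still work over the compressed representation, which is the property the paper exploits.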
18.
With social progress and the rapid growth of informatization, network data has expanded dramatically in scale. For this large-scale network data environment, a scalable big data analysis system is designed on Hadoop and Spark. In the system's Flume module, the Source component collects big data and the Sink component forwards it to Kafka; the analysis and detection module uses Spark to train on scalability data offline and loads the trained model into Spark Streaming, which classifies ordinary big data against the model's features to obtain the scalable big data. The system software then uses the ALS and PageRank algorithms to rank the validity and value of the scalable big data and recommends high-quality scalable big data to users accordingly. Experimental results show that the system analyzes scalable big data with better than 90% precision, outperforming the comparison systems, and offers low energy consumption and high stability, giving it high practical value.
19.
Tikhonova A Correa CD Ma KL 《IEEE transactions on visualization and computer graphics》2010,16(6):1551-1559
Interactivity is key to exploration of volume data. Interactivity may be hindered due to many factors, e.g. large data size, high resolution or complexity of a data set, or an expensive rendering algorithm. We present a novel framework for visualizing volume data that enables interactive exploration using proxy images, without accessing the original 3D data. Data exploration using direct volume rendering requires multiple (often redundant) accesses to possibly large amounts of data. The notion of visualization by proxy relies on the ability to defer operations traditionally used for exploring 3D data to a more suitable intermediate representation for interaction: proxy images. Such operations include view changes, transfer function exploration, and relighting. While previous work has addressed specific interaction needs, we provide a complete solution that enables real-time interaction with large data sets and has low hardware and storage requirements.
20.
Abhijit Pol Christopher Jermaine Subramanian Arumugam 《The VLDB Journal: The International Journal on Very Large Data Bases》2008,17(5):997-1018
Random sampling is one of the most fundamental data management tools available. However, most current research involving sampling considers the problem of how to use a sample, and not how to compute one. The implicit assumption is that a "sample" is a small data structure that is easily maintained as new data are encountered, even though simple statistical arguments demonstrate that very large samples of gigabytes or terabytes in size can be necessary to provide high accuracy. No existing work tackles the problem of maintaining very large, disk-based samples from a data management perspective, and no techniques now exist for maintaining very large samples in an online manner from streaming data. In this paper, we present online algorithms for maintaining on-disk samples that are gigabytes or terabytes in size. The algorithms are designed for streaming data, or for any environment where a large sample must be maintained online in a single pass through a data set. The algorithms meet the strict requirement that the sample always be a true, statistically random sample (without replacement) of all of the data processed thus far. We also present algorithms to retrieve small random samples from the large disk-based sample, which may be used for various purposes, including statistical analyses by a DBMS.
A version of this work appeared in the proceedings of SIGMOD 2004, Paris, France, under the title "Online Maintenance of Very Large Random Samples". Material in this paper is based upon work supported by the National Science Foundation under Grant No. 0347408.
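The invariant the paper's disk-based algorithms maintain, namely that the sample is always a true random sample without replacement of everything seen so far, is the same one classic in-memory reservoir sampling provides; a minimal single-pass sketch:

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Classic single-pass reservoir sampling: after processing n items,
    the reservoir is a uniform random sample (without replacement) of
    those n items. The paper extends this invariant to disk-based,
    terabyte-scale samples; this is only the in-memory baseline."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # item i is kept with probability k / (i + 1)
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

The hard part the paper addresses is doing the same thing when the reservoir itself is far too large for memory and every replacement would otherwise cost a random disk I/O.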