Similar Literature
 19 similar documents found.
1.
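As an emerging discipline, data science has drawn attention to whether it is truly scientific, yet its scientific questions have not been clearly stated. This paper examines the "scientific nature" of data science from four perspectives: scientific research paradigms and methodology; falsifiability and reproducibility; scientific spirit and rapid iteration; and scientific research programmes and theoretical systems. In doing so, it answers the question of why data science is an emerging science. On this basis, drawing on the DIKW model (DIKW Pyramid or Hierarchy), the DMP (Data-Model-Problem) model, the statistical and machine-learning methodologies of data science, and the processes and activities of data science, it proposes seven core scientific questions of data science: Should explanation come first, come later, or not at all? Should problems be aligned to data, or data to problems? Should we trust data more, or models more? Should we value performance more, or interpretability more? How should data be partitioned? How can known data be used to solve problems involving unknown data? Should humans be in the loop or out of the loop? Finally, it offers four recommendations for data science research: focus on theoretical research into data science itself; further separate and professionalize the science, technology and engineering of data; strengthen the theory and practice of AI-empowered data science; and link Data Science as A Discipline with Data Science Within A Discipline.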

2.
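The research methods of data science, a new discipline of the big data era, are being adopted by other disciplines, including management science. First, the paper discusses the connections and differences between the data science research paradigm, which is grounded in big data, and the classical research paradigms of management science. Second, it analyses articles in the Class A key management journals recognized by the National Natural Science Foundation of China, together with the literature citing them, and classifies the few current hot topics in Chinese management research that rely on data science: data-driven public management, network behaviour management based on complex-network simulation, and innovation management based on multi-source data fusion. Third, it summarizes how data science research methods are currently used in Chinese management research. Finally, it identifies trends in applying data science to management research: paradigm fusion, big data utilization, scenario fusion, and expert collaboration.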

3.
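Data-Intensive Science and Engineering: Requirements and Challenges   Total citations: 4, self-citations: 0, citations by others: 4  
Having passed through the stages of experimental science, theoretical science and computational science, scientific research has entered the stage of data-intensive science, and with it the big data era has arrived. Big data generally refers to data on the scale of hundreds of TB or even PB①, and it is typically distributed, heterogeneous and of low quality. Traditional database management technologies, especially commercial relational databases, have been enormously successful over the past 40 years. Even so, these technologies and systems cannot effectively manage the big data behind Data-Intensive Science and Engineering (DISE). This paper discusses the concrete requirements and practical challenges of DISE across four layers: data storage and organization, computing methods, data analysis, and user interface technologies. Data quality, data security and data curation also need attention at every layer. The paper outlines an overall architecture for DISE, reviews recent developments in related fields, analyses the open challenges, and discusses future research directions.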

4.
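The Fourth Paradigm of Data-Intensive Research   Total citations: 1, self-citations: 0, citations by others: 1  
Chen Ming. Computer Education (计算机教育), 2013, (9): 103-106
Because of the explosive growth of data, a fourth paradigm of scientific research has emerged to support data-intensive knowledge discovery. The article introduces the background of the fourth paradigm, its core content, Gray's laws, the paradigm shift, and the era of the fourth paradigm.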

5.
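Data Science and Big Data Technology is a new undergraduate programme that matters greatly for the development of information technology in China and for the country's overall strength. The article first identifies the programme's main problems in faculty, research and teaching. Then, guided by the concepts of "New Engineering" and "engineering accreditation", it reports practical innovations in the programme's talent-cultivation model and course-teaching model. It also establishes a curriculum system for the programme and presents a capability-cultivation matrix for its students.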

6.
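Big data is characterized by enormous volume and diverse forms. The big data era places higher demands on the overall competence of today's professionals, and information literacy has become an essential skill for taking part in it. Information and Computing Science is one of the main programmes through which universities train information professionals. In the current educational environment, teaching in this programme should require students to master professional computer skills. It should also require them to collect and filter data from the perspective of how information is used. Starting from the characteristics of the big data era and from the problems in how university Information and Computing Science programmes currently cultivate talent, this paper explores a sound talent-cultivation model.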

7.
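Data Management Strategies for Data Grids   Total citations: 6, self-citations: 0, citations by others: 6  
Data grids aim to make possible data-intensive high-performance computing as well as data-intensive data-sharing transaction processing and scientific research. A data grid consists mainly of two parts: a data storage system and a data management system. The data management system manages the stored data, chiefly through operations such as data transfer and replication. The article gives a detailed, classified review of data management strategies and discusses some limitations of current data management systems and directions for further work.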

8.
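Universities in China and abroad are paying growing attention to talent cultivation in big-data-related programmes. Because these programmes differ in direction and disciplinary positioning, each needs its own programme-building model and a matching curriculum. Taking the Big Data Management and Application programme as its case, the paper analyses the programme's development positioning and talent-cultivation goals. It builds a cultivation model for the programme and optimizes its curriculum on three foundations: outcome-based education, project-centred learning and practice, and an emphasis on full-lifecycle data management skills. The work can offer universities that are building Big Data Management and Application programmes a basic framework and a point of reference.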

9.
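With the rise of data analysis research, researchers are paying more attention to data preprocessing, and imputing missing data has become an increasingly important problem. Building on the ROUSTIDA data completion algorithm, this paper targets duplicate records that share key attributes and proposes an improved ROUSTIDA algorithm, Key&Rpt_RS. Key&Rpt_RS keeps the advantages of ROUSTIDA while also accounting for duplication in the target data and analysing how key attributes affect imputation. The result is more accurate and effective imputation.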
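The core rough-set idea behind ROUSTIDA-style completion can be sketched as follows, based on our reading of the general approach. A missing cell is filled only when every row that is indiscernible from the incomplete row agrees on that attribute's value. Two rows count as indiscernible when they agree wherever both are known. This is a simplified, hypothetical illustration: the helper names `compatible` and `rough_set_fill` are invented, the published algorithm may differ in detail, and Key&Rpt_RS's handling of key attributes and duplicate records is not modeled.

```python
from typing import List, Optional

Row = List[Optional[str]]  # None marks a missing value


def compatible(x: Row, y: Row) -> bool:
    """True if x and y agree on every attribute that both rows know."""
    return all(a is None or b is None or a == b for a, b in zip(x, y))


def rough_set_fill(table: List[Row]) -> List[Row]:
    """Fill a missing cell only when all compatible rows agree on its value.

    Rows that know the attribute and are compatible with the incomplete row
    vote. A unanimous vote fills the cell, and a split vote leaves it as None.
    Passes repeat until nothing changes, because one fill can enable others.
    """
    data = [list(row) for row in table]  # leave the caller's table untouched
    changed = True
    while changed:
        changed = False
        for i, row in enumerate(data):
            for j, value in enumerate(row):
                if value is not None:
                    continue
                votes = {other[j] for k, other in enumerate(data)
                         if k != i and other[j] is not None and compatible(row, other)}
                if len(votes) == 1:
                    row[j] = votes.pop()
                    changed = True
    return data


table = [
    ["high", "yes", "flu"],
    ["high", None, "flu"],
    ["low", "no", "cold"],
    [None, "no", "cold"],
]
for row in rough_set_fill(table):
    print(row)
```

In this example, row 1 is compatible only with row 0 and row 3 only with row 2, so both gaps are filled. In a table such as `[["a", "x"], ["a", "y"], ["a", None]]`, the two compatible rows disagree, so the gap stays `None`. This conservative behaviour is the motivation for refinements such as the paper's use of key attributes.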

10.
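Scientific workflow applications are complex and data-intensive. They are common in disciplines with distributed data sources, such as structural biology, high-energy physics and neurology. The data are stored across Internet-based cloud computing platforms, so executing a scientific workflow involves large volumes of data transfer. Cloud computing is billed on a pay-per-use basis, so these transfers incur fees, and the costs rise further when multiple workflows cooperate. Taking a global view, this paper builds a transfer cost model based on a multi-workflow data dependency graph. It then studies a data placement strategy based on binary particle swarm optimization (BPSO) that reduces the cost of leasing cloud transfer resources.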
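Below is a minimal BPSO sketch of the placement search. It assumes a much simpler cost model than the paper's. There are only two data centres, so each dataset is a single bit. A hypothetical list of dependency edges `(a, b, volume)` stands in for the multi-workflow data dependency graph, and a flat per-unit `price` stands in for the pricing. Some datasets are pinned to the centre that holds their source, because otherwise putting everything in one place would trivially cost nothing. The velocity update and sigmoid bit-sampling follow the standard Kennedy-Eberhart binary PSO. The function names and the values of `w`, `c1`, `c2` and `vmax` are illustrative choices, not taken from the paper.

```python
import math
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (dataset a, dataset b, volume exchanged between them)


def transfer_cost(bits: List[int], edges: List[Edge], price: float = 1.0) -> float:
    """Pay-per-use cost: an edge costs volume * price if its endpoints are split."""
    return sum(vol * price for a, b, vol in edges if bits[a] != bits[b])


def bpso_placement(n: int, edges: List[Edge], pinned: Dict[int, int],
                   price: float = 1.0, particles: int = 20, iters: int = 100,
                   w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                   vmax: float = 4.0, seed: int = 0) -> Tuple[List[int], float]:
    """Binary PSO over placements; bit d says which of two centres holds dataset d."""
    rng = random.Random(seed)

    def repair(bits: List[int]) -> List[int]:
        # Hypothetical constraint: pinned datasets never leave their centre.
        for d, centre in pinned.items():
            bits[d] = centre
        return bits

    def cost(bits: List[int]) -> float:
        return transfer_cost(bits, edges, price)

    xs = [repair([rng.randint(0, 1) for _ in range(n)]) for _ in range(particles)]
    vs = [[0.0] * n for _ in range(particles)]
    pbest = [x[:] for x in xs]
    pcost = [cost(x) for x in xs]
    g = min(range(particles), key=lambda i: pcost[i])
    gbest, gcost = pbest[g][:], pcost[g]

    for _ in range(iters):
        for i in range(particles):
            for d in range(n):
                v = (w * vs[i][d]
                     + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                     + c2 * rng.random() * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-vmax, min(vmax, v))  # clamp so bits can still flip
                # The sigmoid of the velocity is the probability that the bit becomes 1.
                prob_one = 1.0 / (1.0 + math.exp(-vs[i][d]))
                xs[i][d] = 1 if rng.random() < prob_one else 0
            repair(xs[i])
            c = cost(xs[i])
            if c < pcost[i]:
                pbest[i], pcost[i] = xs[i][:], c
                if c < gcost:
                    gbest, gcost = xs[i][:], c
    return gbest, gcost


edges = [(0, 1, 5.0), (1, 2, 1.0), (2, 3, 5.0)]
placement, total = bpso_placement(4, edges, pinned={0: 0, 3: 1})
print(placement, total)
```

Take the chain 0-1-2-3 with dataset 0 pinned to centre 0 and dataset 3 to centre 1. At least one edge must cross centres, so the cheapest placement cuts only the volume-1 edge between datasets 1 and 2. A real model would add data centre capacities and per-link prices to `transfer_cost`.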

11.
This article examines how to use big data analytics services to enhance business intelligence (BI). More specifically, this article proposes an ontology of big data analytics and presents a big data analytics service-oriented architecture (BASOA), and then applies BASOA to BI, where our surveyed data analysis shows that the proposed BASOA is viable for enhancing BI and enterprise information systems. This article also explores temporality, expectability, and relativity as the characteristics of intelligence in BI. These characteristics are what customers and decision makers expect from BI in terms of systems, products, and services of organizations. The proposed approach in this article might facilitate the research and development of business analytics, big data analytics, and BI as well as big data science and big data computing.

12.
Open Science is a paradigm in which scientific data, procedures, tools and results are shared transparently and reused by society. The European Open Science Cloud (EOSC) initiative is an effort in Europe to provide an open, trusted, virtual and federated computing environment to execute scientific applications and store, share and reuse research data across borders and scientific disciplines. Additionally, scientific services are becoming increasingly data-intensive, not only in terms of computationally intensive tasks but also in terms of storage resources. To meet those resource demands, computing paradigms such as High-Performance Computing (HPC) and Cloud Computing are applied to e-science applications. However, adapting applications and services to these paradigms is a challenging task, commonly requiring a deep knowledge of the underlying technologies, which often constitutes a general barrier to its uptake by scientists. In this context, EOSC-Synergy, a collaborative project involving more than 20 institutions from eight European countries pooling their knowledge and experience to enhance EOSC’s capabilities and capacities, aims to bring EOSC closer to the scientific communities. This article provides a summary analysis of the adaptations made in the ten thematic services of EOSC-Synergy to embrace this paradigm. These services are grouped into four categories: Earth Observation, Environment, Biomedicine, and Astrophysics. The analysis will lead to the identification of commonalities, best practices and common requirements, regardless of the thematic area of the service. Experience gained from the thematic services can be transferred to new services for the adoption of the EOSC ecosystem framework. 
The article makes several recommendations for integrating thematic services into the EOSC ecosystem. They cover Authentication and Authorization (federated regional or thematic solutions, mainly based on EduGAIN); FAIR data and metadata preservation solutions, for both cataloguing and data preservation (such as EUDAT’s B2SHARE); cloud platform-agnostic resource management services (such as Infrastructure Manager); and workload management solutions.

13.
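Cloud computing provides a powerful and efficient solution for big data processing. In this model, a Data Manager (DM) can rent multiple data centres to process geographically distributed data in real time. However, data generation is dynamic and resource prices fluctuate. A DM that wants to process multi-source data at low cost therefore faces a major problem: which data centres to migrate data to, and how much computing capacity to provision for processing it. This paper first formulates the problem as a joint stochastic optimization problem. It then uses the Lyapunov optimization framework to decompose the original problem into two independent subproblems and designs an online algorithm from their solutions. Theoretical analysis shows that the proposed algorithm continually approaches the offline optimal solution while guaranteeing data-processing delay. Experiments on the WorldCup98 and Youtube datasets confirm the theoretical analysis and show that the method outperforms the alternatives.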
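The core mechanism of Lyapunov optimization is the textbook drift-plus-penalty rule, shown here for a single data queue. This is a hedged sketch, not the paper's algorithm. It assumes one data centre, a per-slot processing cap `b_max`, a price that is known at the start of each slot, and a trade-off weight `V`. The paper's split into a migration subproblem and a provisioning subproblem is not modeled. Each slot, the rule chooses the processing amount `b` that minimises `V * price * b - Q * b`, where `Q` is the current backlog. The function name is illustrative.

```python
from typing import List, Tuple


def drift_plus_penalty(arrivals: List[float], prices: List[float],
                       b_max: float, V: float) -> Tuple[List[float], float]:
    """Online control of one data backlog with the drift-plus-penalty rule.

    The objective V * price * b - Q * b is linear in b, so the minimiser is
    all-or-nothing: process when the backlog Q exceeds V * price, otherwise
    wait for a cheaper slot.
    """
    Q = 0.0
    backlog: List[float] = []
    total_cost = 0.0
    for a, p in zip(arrivals, prices):
        # Cap at Q so the controller never pays for capacity it cannot use.
        b = min(b_max, Q) if Q > V * p else 0.0
        total_cost += p * b
        Q = max(Q - b, 0.0) + a  # Q(t+1) = max(Q(t) - b(t), 0) + a(t)
        backlog.append(Q)
    return backlog, total_cost


print(drift_plus_penalty([4, 4, 4, 4], [1, 5, 1, 5], b_max=10, V=1))  # ([4.0, 8.0, 4.0, 8.0], 8.0)
print(drift_plus_penalty([4, 4, 4, 4], [1, 5, 1, 5], b_max=10, V=0))  # ([4.0, 4.0, 4.0, 4.0], 44.0)
```

With `V=1` the controller processes data only in the cheap slots, at a total cost of 8.0 with a larger backlog. With `V=0` it processes whenever data is waiting, which keeps the backlog small but costs 44.0. A larger `V` therefore trades longer queues, and hence more delay, for lower cost. Lyapunov analysis bounds exactly this cost/delay tension.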

14.
The paper presents a platform for distributed computing, developed using the latest software technologies and computing paradigms to enable big data mining. The platform, called ClowdFlows, is implemented as a cloud-based web application with a graphical user interface which supports the construction and execution of data mining workflows, including web services used as workflow components. As a web application, the ClowdFlows platform poses no software requirements and can be used from any modern browser, including mobile devices. The constructed workflows can be declared either as private or public, which enables sharing the developed solutions, data and results on the web and in scientific publications. The server-side software of ClowdFlows can be multiplied and distributed to any number of computing nodes. From a developer’s perspective the platform is easy to extend and supports distributed development with packages. The paper focuses on big data processing in the batch and real-time processing mode. Big data analytics is provided through several algorithms, including novel ensemble techniques, implemented using the map-reduce paradigm and a special stream mining module for continuous parallel workflow execution. The batch mode and real-time processing mode are demonstrated with practical use cases. Performance analysis shows the benefit of using all available data for learning in distributed mode compared to using only subsets of data in non-distributed mode. The ability of ClowdFlows to handle big data sets and its nearly perfect linear speedup is demonstrated.

15.
Cloud computing is an emerging computing paradigm that offers on-demand, flexible, and elastic computational and storage services to end users. Small and medium-sized businesses with limited budgets can benefit from the scalable services of the cloud. However, migrating organizational data to the cloud raises security and privacy issues. To keep the data confidential, it should be encrypted with a cryptographic method that provides fine-grained and efficient access to uploaded data without affecting the scalability of the system. Mobile devices are resource-constrained. In a mobile cloud computing environment, the selected scheme should therefore be computationally secure and able to offload computationally intensive security operations to the cloud in a trusted mode. The existing manager-based re-encryption and cloud-based re-encryption schemes are computationally secure and can offload computationally intensive data access operations to a trusted entity/cloud. Despite this offloading, the mobile user still performs computationally intensive pairing-based encryption and decryption operations with the limited capabilities of the mobile device. In this paper, we propose the Cloud-Manager-based Re-encryption Scheme (CMReS). It combines the characteristics of manager-based re-encryption and cloud-based re-encryption to provide better security services with minimal processing burden on the mobile device. The experimental results indicate that, compared with existing re-encryption schemes, the proposed cloud-manager-based re-encryption scheme significantly improves turnaround time, energy consumption, and resource utilization on the mobile device.

16.
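Internet technology is developing rapidly and its user base is growing fast, so many online service companies must process terabytes of data or more every day. In today's big data era, extracting useful information from this data has become an important problem. Data Mining algorithms are widely used in many fields, and mining frequent itemsets is one of the most common and important data mining tasks. Apriori is the most typical algorithm for mining frequent itemsets from a large dataset. However, when the dataset is large or only a single host is available, memory is consumed quickly and computation time rises sharply, which degrades performance. Distributed and parallel computing based on MapReduce has therefore been proposed. This paper proposes an improved algorithm, MMRA (Matrix MapReduce Algorithm), which mines all frequent k-itemsets by converting data blocks into matrices. It compares MMRA with two existing algorithms, the one-phase algorithm and the k-phase algorithm, using Hadoop-MapReduce as the experimental platform, since parallel and distributed computing offers a potential solution for processing large datasets. Experimental results show that the improved algorithm outperforms the other two.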
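The block-to-matrix idea can be made concrete with a single-machine sketch. It assumes each block is a list of transactions, and it simulates the map and reduce phases with ordinary function calls rather than Hadoop. Each block becomes a bit matrix stored column by column as Python integers. The support of an itemset inside a block is then the number of 1 bits in the AND of its columns. The reduce step sums the per-block counts, and candidates come from the usual Apriori join-and-prune step. This illustrates the general approach only; it is not MMRA's published matrix layout, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Transaction = Set[str]
Itemset = FrozenSet[str]


def block_to_columns(block: List[Transaction]) -> Dict[str, int]:
    """Column-wise bit matrix: bit r of cols[item] is 1 if row r contains item."""
    cols: Dict[str, int] = {}
    for row, items in enumerate(block):
        for item in items:
            cols[item] = cols.get(item, 0) | (1 << row)
    return cols


def map_block(block: List[Transaction], candidates: List[Itemset]) -> Counter:
    """Map phase: support of every candidate within one block."""
    cols = block_to_columns(block)
    counts: Counter = Counter()
    for cand in candidates:
        if not all(item in cols for item in cand):
            continue  # some item never occurs in this block
        items = sorted(cand)
        bits = cols[items[0]]
        for item in items[1:]:
            bits &= cols[item]  # rows that contain every item so far
        counts[cand] = bin(bits).count("1")
    return counts


def apriori_mapreduce(blocks: List[List[Transaction]], min_support: int) -> Dict[Itemset, int]:
    """Level-wise Apriori whose support counting is split across blocks."""
    items = sorted({i for block in blocks for t in block for i in t})
    candidates: List[Itemset] = [frozenset([i]) for i in items]
    frequent: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        # Reduce phase: add up the per-block counts.
        total = sum((map_block(b, candidates) for b in blocks), Counter())
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        k += 1
        prev = set(level)
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent subset.
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))]
    return frequent


blocks = [[{"a", "b", "c"}, {"a", "b"}], [{"a", "c"}, {"b", "c"}, {"a", "b", "c"}]]
for itemset, support in sorted(apriori_mapreduce(blocks, 3).items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The example has five transactions split across two blocks. Every single item and every pair reaches the threshold of 3. The triple `{a, b, c}` occurs only twice, so the loop stops after level 3. In a real deployment, each `map_block` call would run on the node that stores its block, and only the small per-candidate counters would travel to the reducer.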

17.
With the maturation of grid computing facilities and the recent explosion of cloud computing data centers, midscale computational science has more options than ever before to satisfy its computational needs. But heterogeneity brings complexity. We propose a simple abstraction for interacting with heterogeneous resource managers spanning grid and cloud computing, focusing on the features that make the tool useful for the midscale physical or natural scientist. Key strengths of the abstraction are its support for multiple standard job specification languages, its preservation of direct user interaction with the service (which removes the delay that layers of services can introduce), and its predictable behavior under heavy loads. Copyright © 2012 John Wiley & Sons, Ltd.

18.
Facing the scale, heterogeneity and dynamics of the global computing platform emerging on top of the Internet, autonomic computing has been raised recently as one of the top challenges of computer science research. Such a paradigm calls for alternative programming abstractions, able to express autonomic behaviours. In this quest, nature-inspired analogies regained a lot of interest. More specifically, the chemical programming paradigm, which envisions a program’s execution as a succession of reactions between molecules representing data to produce a result, has been shown to provide some adequate abstractions for the high-level specification of autonomic systems.
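The chemical metaphor is easiest to see in a toy interpreter in the style of Gamma, the classic chemical model of Banâtre and Le Métayer. A program is a multiset of molecules plus a reaction rule. Execution repeatedly replaces any pair of molecules that can react with the reaction's products, until the solution is inert. This sketch is a simplified illustration, not the language used in the cited work. It assumes pairwise reactions only and a random choice among enabled pairs, and `gamma` is a hypothetical name.

```python
import random
from typing import Any, Callable, Iterable, List, Optional

Reaction = Callable[[Any, Any], Optional[List[Any]]]


def gamma(molecules: Iterable[Any], reaction: Reaction, seed: int = 0) -> List[Any]:
    """Rewrite the multiset until no ordered pair of molecules can react.

    reaction(x, y) returns the products that replace x and y, or None if
    they do not react. Enabled pairs are chosen at random, mimicking the
    unordered, in-principle-parallel execution of the chemical model.
    """
    rng = random.Random(seed)
    solution = list(molecules)
    while True:
        enabled = [(i, j)
                   for i in range(len(solution))
                   for j in range(len(solution))
                   if i != j and reaction(solution[i], solution[j]) is not None]
        if not enabled:
            return solution  # inert solution: this is the result
        i, j = rng.choice(enabled)
        products = reaction(solution[i], solution[j])
        solution = [m for k, m in enumerate(solution) if k not in (i, j)] + products


# Maximum: any two numbers react, leaving the larger one.
print(gamma([3, 1, 4, 1, 5, 9, 2, 6], lambda x, y: [x] if x >= y else None))  # [9]
# Prime sieve: a number consumes any of its proper multiples.
print(sorted(gamma(range(2, 31), lambda x, y: [x] if y % x == 0 else None)))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The programs never say in what order the reactions happen. The result is determined only by the reaction rule and the stopping condition (no more reactions possible). This order-freedom is what makes the paradigm attractive for specifying self-managing, autonomic behaviour.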

19.
Cloud computing advocates a promising paradigm that facilitates access among heterogeneous services, platforms, and end users. However, platforms (or host servers) have been limited to devices with considerable computing resources. How to make efficient use of pervasive, resource-constrained devices therefore remains an open issue. This study investigates the seamless connection of embedded devices to cloud resources in order to enhance their computing capability and to provide context-aware services. A method for wireless program dissemination and boot loading is proposed to transfer the necessary information and resources between the service and the target device(s). Experimental results on time delay and energy cost demonstrate the feasibility and performance of the method.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417