Similar literature
 20 similar articles found (search time: 15 ms)
1.
Big Data has emerged with new opportunities for research, development, innovation and business. It is characterized by the so-called four Vs (volume, velocity, veracity and variety) and may bring significant value once processed. Transforming Big Data's four Vs into the fifth (value) is a grand challenge for processing capacity. Cloud Computing has emerged as a new paradigm that provides computing as a utility service, addressing different processing needs with a) on-demand services, b) pooled resources, c) elasticity, d) broadband access and e) measured services. Delivering computing capability as a utility offers a potential solution for transforming Big Data's four Vs into the fifth (value). This paper investigates how Cloud Computing can be used to address Big Data challenges and enable such a transformation. We introduce and review four geospatial scientific examples: climate studies, geospatial knowledge mining, land cover simulation, and dust storm modelling. The method is presented in a tabular framework as guidance for leveraging Cloud Computing for Big Data solutions. The four examples demonstrate that the framework supports the life cycle of Big Data processing, including management, access, mining analytics, simulation and forecasting. The tabular framework can also serve as guidance for developing solutions to other big geospatial data challenges and initiatives, such as smart cities.

2.
Multimedia Tools and Applications

3.
4.
Provenance is information about the origin and creation of data. In data science and engineering for cloud environments, such information is useful and sometimes even critical. In data analytics, making data-driven decisions requires tracing back history and reproducing final or intermediate results, and even tuning models and adjusting parameters in real time. In the cloud in particular, users need to evaluate the trustworthiness of data and pipelines. In this paper, we propose LogProv, a solution toward realizing these functionalities for big data provenance. LogProv requires renovating data pipelines, or parts of the big data software infrastructure, to generate structured logs for pipeline events; it then stores data and logs separately in cloud space. The data are explicitly linked to the logs, which implicitly record pipeline semantics. Semantic information can be retrieved from the logs easily because they are well defined and structured beforehand. We implemented and deployed LogProv in Nectar Cloud, in association with Apache Pig and the Hadoop ecosystem, and adopted Elasticsearch to provide the query service. LogProv was evaluated empirically through case studies. The results show that LogProv is efficient: the performance overhead is no more than 10%, queries are answered within 1 second, trustworthiness is marked clearly, and there is no impact on the data processing logic of the original pipelines.

5.
We present Stratosphere, an open-source software stack for parallel data analysis. Stratosphere brings together a unique set of features that allow the expressive, easy, and efficient programming of analytical applications at very large scale. Stratosphere’s features include “in situ” data processing, a declarative query language, treatment of user-defined functions as first-class citizens, automatic program parallelization and optimization, support for iterative programs, and a scalable and efficient execution engine. Stratosphere covers a variety of “Big Data” use cases, such as data warehousing, information extraction and integration, data cleansing, graph analysis, and statistical analysis applications. In this paper, we present the overall system architecture design decisions, introduce Stratosphere through example queries, and then dive into the internal workings of the system’s components that relate to extensibility, programming model, optimization, and query execution. We experimentally compare Stratosphere against popular open-source alternatives, and we conclude with a research outlook for the next years.

6.
Information Systems and e-Business Management - Big data analytics (BDA) projects are expected to provide organizations with several benefits once the project closes. Nevertheless, many BDA...

7.
The age of big data analytics is now here, with companies increasingly investing in big data initiatives to foster innovation and outperform competition. Nevertheless, while researchers and practitioners have started to examine the shifts that these technologies entail and their overall business value, it is still unclear whether and under what conditions they drive innovation. To address this gap, this study draws on the resource-based view (RBV) of the firm and information governance theory to explore the interplay between a firm’s big data analytics capabilities (BDACs) and its information governance practices in shaping innovation capabilities. We argue that a firm’s BDACs help enhance two distinct types of innovative capabilities, incremental and radical, and that information governance positively moderates this relationship. To examine our research model, we analyzed survey data collected from 175 IT and business managers. Results from partial least squares structural equation modelling analysis reveal that BDACs have a positive and significant effect on both incremental and radical innovative capabilities. Our analysis also highlights the important role of information governance, as it positively moderates the relationship between BDACs and a firm’s radical innovative capability, while the moderating effect for incremental innovation capabilities is nonsignificant. Finally, we examine the effect of environmental uncertainty conditions in our model and find that information governance and BDACs have amplified effects under conditions of high environmental dynamism.

8.

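Pairwise learning refers to learning tasks whose loss function depends on two instances. Regret bounds are particularly important for the generalization analysis of pairwise learning. Existing analyses of online pairwise learning provide regret bounds only for convex loss functions. To fill this gap in the theory of online pairwise learning with non-convex loss functions, this paper proposes a stability-based regret bound for online pairwise learning with non-convex losses. First, a generalized online pairwise learning framework is proposed, together with a stability analysis of online pairwise learning with non-convex loss functions. Then, the regret bound under non-convex losses is studied using the relationship between stability and regret. Finally, it is proved that when the learner has access to an offline oracle, the generalized online pairwise learning framework with non-convex loss functions achieves the optimal regret bound of $O(T^{-1/2})$.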

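For orientation, regret in online pairwise learning is often written along the following lines. This is only a sketch in notation assumed here: the symbols $w_t$ (the learner's model after round $t$), $z_t$ (the example revealed at round $t$), $\ell$ (the pairwise loss), $L_t$ and $\mathcal{W}$ (the hypothesis set) do not come from the abstract, and the paper's own definition may differ.

```latex
L_t(w) = \frac{1}{t-1} \sum_{s=1}^{t-1} \ell(w; z_t, z_s),
\qquad
R_T = \sum_{t=2}^{T} L_t(w_{t-1}) - \min_{w \in \mathcal{W}} \sum_{t=2}^{T} L_t(w).
```

Under this convention, a time-averaged regret $R_T / T = O(T^{-1/2})$ corresponds to a cumulative regret $R_T = O(T^{1/2})$. Whether the rate quoted in the abstract refers to the averaged quantity is an assumption of this note, not something the abstract states.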

9.
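A parallel algorithm for approximation spaces of incomplete information systems under big data   (Cited by: 1 in total; self-citations: 0; citations by others: 1)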
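Upper and lower approximation spaces are key concepts in rough set theory, and computing upper and lower approximations is the basis of mining massive data. Classical approximation-space algorithms are unsuited to massive data, and even less suited to massive data with missing information. To address this, the paper analyzes in depth the characteristics of massive data with missing information and, building on the MapReduce programming model, proposes a MapReduce-based parallel algorithm for approximation spaces that handles massive data with missing information. Experimental results demonstrate the effectiveness of the parallel algorithm.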
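To make the notion of lower and upper approximations with missing values concrete, here is a minimal single-machine sketch in Python. It assumes the common tolerance-relation treatment of incomplete information systems, in which two objects are indistinguishable if they agree on every attribute where both values are known. It is not the paper's MapReduce algorithm, and the names `MISSING`, `tolerant`, `tolerance_classes` and `approximations` are hypothetical.

```python
from typing import Dict, Hashable, Optional, Sequence, Set, Tuple

# Marker for a missing attribute value (an assumption of this sketch).
MISSING = None

Row = Sequence[Optional[Hashable]]


def tolerant(a: Row, b: Row) -> bool:
    """True if a and b agree on every attribute where both values are known."""
    return all(x is MISSING or y is MISSING or x == y for x, y in zip(a, b))


def tolerance_classes(table: Dict[str, Row]) -> Dict[str, Set[str]]:
    """For each object, collect all objects it cannot be distinguished from."""
    return {
        u: {v for v, row_v in table.items() if tolerant(row_u, row_v)}
        for u, row_u in table.items()
    }


def approximations(table: Dict[str, Row], target: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (lower, upper) approximations of `target` under the tolerance relation."""
    classes = tolerance_classes(table)
    # Lower: objects whose whole tolerance class lies inside the target set.
    lower = {u for u, cls in classes.items() if cls <= target}
    # Upper: objects whose tolerance class overlaps the target set.
    upper = {u for u, cls in classes.items() if cls & target}
    return lower, upper
```

For a table of four objects with two attributes, `approximations(table, {"x1", "x2"})` returns the objects that certainly belong to the target set and those that possibly belong to it. The pairwise comparison in `tolerance_classes` is quadratic in the number of objects, which is the kind of cost a parallel formulation would aim to distribute.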

10.
11.
Drawing on a revelatory case study, we identify four big data analytics (BDA) actualization mechanisms: (1) enhancing, (2) constructing, (3) coordinating, and (4) integrating, which manifest in actions on three socio-technical system levels, i.e., the structure, actor, and technology levels. We investigate the actualization of four BDA affordances at an automotive manufacturing company, i.e., establishing customer-centric marketing, provisioning vehicle-data-driven services, data-driven vehicle developing, and optimizing production processes. This study introduces a theoretical perspective to BDA research that explains how organizational actions contribute to actualizing BDA affordances. We further provide practical implications that can help guide practitioners in BDA adoption.

12.

Big data analytics in cloud environments introduces challenges such as real-time load balancing, in addition to security, privacy, and energy efficiency. This paper proposes a novel load balancing algorithm for cloud environments that performs resource allocation and task scheduling efficiently. The proposed load balancer reduces the execution response time of big data applications running on clouds. Scheduling, in general, is an NP-hard problem. Our proposed algorithm reduces the search area, which lowers the complexity of load balancing. We recommend two mathematical optimization models to perform dynamic resource allocation to virtual machines and task scheduling. The provided solution is based on the hill-climbing algorithm to minimize response time. We compare the proposed algorithms with some existing algorithms in terms of response time, turnaround time, throughput, and request distribution, and observe significant improvements.

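The hill-climbing idea mentioned in the abstract can be illustrated with a small Python sketch. It is a generic local search under stated assumptions, not the paper's algorithm: makespan (the finishing time of the busiest virtual machine) stands in for the response-time objective, the starting point is a round-robin assignment, and the function names and the `durations` and `speeds` parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def makespan(assignment: Sequence[int], durations: Sequence[float],
             speeds: Sequence[float]) -> float:
    """Finishing time of the busiest VM for a given task-to-VM assignment."""
    loads = [0.0] * len(speeds)
    for task, vm in enumerate(assignment):
        loads[vm] += durations[task] / speeds[vm]
    return max(loads)


def hill_climb_schedule(durations: Sequence[float],
                        speeds: Sequence[float]) -> Tuple[List[int], float]:
    """Greedy local search: move one task at a time while the makespan improves."""
    n_vms = len(speeds)
    # Start from a round-robin assignment (an assumption of this sketch).
    assignment = [task % n_vms for task in range(len(durations))]
    best = makespan(assignment, durations, speeds)
    improved = True
    while improved:
        improved = False
        for task in range(len(durations)):
            current_vm = assignment[task]
            for vm in range(n_vms):
                if vm == current_vm:
                    continue
                assignment[task] = vm
                cost = makespan(assignment, durations, speeds)
                if cost < best:
                    # Keep the improving move and continue from the new state.
                    best = cost
                    current_vm = vm
                    improved = True
                else:
                    # Undo a move that does not improve the makespan.
                    assignment[task] = current_vm
    return assignment, best
```

Because only strictly improving single-task moves are accepted, the search always terminates, but it can stop in a local optimum. That limitation is inherent to plain hill climbing and says nothing about how the paper's two optimization models behave.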

13.
Social media has evolved into a new concept by empowering people to publish their data along with their locations, to the benefit of the community and the country overall. The use of geosocial networks such as Twitter, Facebook, Foursquare, and Flickr has increased significantly. People worldwide can now voice their opinions, report an event instantly, and connect with others while sharing their views. Geosocial network data thus provides comprehensive information on current human trends in terms of behavior, lifestyle, incidents and events, disasters, current medical infections, and much more, with respect to location. Hence, geosocial media can serve as a data asset for countries and their governments when geosocial data is analyzed in real time. However, millions of geosocial network users generate terabytes of heterogeneous data with a variety of information every day and at high speed; such information is called “Big Data.” Analyzing such a large amount of data and making real-time decisions regarding event detection is a challenging task. Therefore, in this paper, we propose an efficient system for exploring geosocial networks while harvesting data in order to make real-time decisions while detecting various events. A novel system architecture is proposed and implemented in a real environment to process large volumes of varied social network data and monitor Earth events, incidents, medical diseases, user trends, and views, in order to make real-time decisions and facilitate future planning. The proposed system consists of five layers: data collection, data processing, application, communication, and data storage. The system deploys Spark on top of the Hadoop ecosystem to run real-time analysis. Twitter and Flickr data are analyzed using the proposed architecture to identify current events or disasters, such as earthquakes, fires, Ebola virus contagion, and snow.
The system is evaluated on Twitter data by considering the detection of a recent earthquake in New Zealand. The system is also evaluated with respect to efficiency, considering system throughput on large datasets. We show that the system has higher throughput and is capable of analyzing a huge amount of geosocial network data in real time while detecting events.

14.
Information Technology and Management - Manufacturing firms generate a massive amount of data points because of higher than ever connected devices and sensor technology adoption. These data points...

15.
This paper presents a novel approach for knowledge discovery from data generated by Point of Care (POC) devices. A very important element of this type of knowledge extraction is that the POC-generated data would never be identifiable, thereby protecting the rights and the anonymity of the individual, whilst still allowing vital population-level evidence to be obtained. This paper also presents a real-world implementation of the novel approach in a big data analytics system. Using Internet of Things (IoT) enabled POC devices and the big data analytics system, the data can be collected, stored, and analyzed in batch and real-time modes to provide a detailed picture of a healthcare system as well as to identify high-risk populations and their locations. In addition, the system offers benefits to national health authorities in the form of optimized resource allocation (from allocating consumables to finding the best location for new labs) and thus supports efficient and timely decision-making processes.

16.
17.
Events shape the human world and can be regarded as semantic units of different granularities for organizing information. Extracting events and temporal information from texts plays an important role in information analytics for big data because of the wide use of multilingual texts. This paper surveys existing research on text-based event temporal resolution and reasoning, including the identification of events, the temporal information resolution of events in English and Chinese texts, rule-based temporal relation reasoning between events, and relevant temporal representations. For scientific big data analytics, we point out the shortcomings of existing research and discuss future work on advancing event identification, the establishment of temporal relations, and temporal relation reasoning.

18.
19.
Big data has become an important issue for a large number of research areas such as data mining, machine learning, computational intelligence, information fusion, the semantic Web, and social networks. The rise of different big data frameworks such as Apache Hadoop and, more recently, Spark, for massive data processing based on the MapReduce paradigm has allowed for the efficient utilisation of data mining methods and machine learning algorithms in different domains. A number of libraries such as Mahout and Spark MLlib have been designed to develop new efficient applications based on machine learning algorithms. The combination of big data technologies and traditional machine learning algorithms has generated new and interesting challenges in other areas such as social media and social networks. These new challenges are focused mainly on problems such as data processing, data storage, data representation, and how data can be used for pattern mining, analysing user behaviours, and visualizing and tracking data, among others. In this paper, we present a review of the new methodologies designed to allow for efficient data mining and information fusion from social media, and of the new applications and frameworks that are currently appearing under the “umbrella” of the social networks, social media and big data paradigms.

20.
Multimedia Tools and Applications - Recent real medical datasets show that the number of outpatients in China has sharply increased since 2013, when the Chinese health insurance reform started....
