Similar Literature
 Found 20 similar articles; search time: 31 ms
1.
Big data has become an important issue for a large number of research areas such as data mining, machine learning, computational intelligence, information fusion, the semantic Web, and social networks. The rise of different big data frameworks such as Apache Hadoop and, more recently, Spark, for massive data processing based on the MapReduce paradigm has allowed for the efficient utilisation of data mining methods and machine learning algorithms in different domains. A number of libraries such as Mahout and Spark MLlib have been designed to develop new efficient applications based on machine learning algorithms. The combination of big data technologies and traditional machine learning algorithms has generated new and interesting challenges in other areas such as social media and social networks. These new challenges are focused mainly on problems such as data processing, data storage, data representation, and how data can be used for pattern mining, analysing user behaviours, and visualizing and tracking data, among others. In this paper, we present a review of the new methodologies designed to allow for efficient data mining and information fusion from social media, and of the new applications and frameworks that are currently appearing under the “umbrella” of the social networks, social media and big data paradigms.

2.
Ischemic stroke is one of the most deadly illnesses in the world, leading to high mortality. Due to lung disease, stroke is the abnormal growth of cells characterized by a single irregular cell and spreads throughout the body. Therefore, to detect and heal the affected area at an early stage, it is necessary to detect the affected area after application. Ischemic stroke is generally regarded as an essential indicator of stroke rehabilitation care. The previous method uses SVM (Support Vector Machine) and STFT (Short Time Fourier Transform Algorithm) to process an image processing system based on stroke detection. This is more accurate and efficient for CT (Computed Tomography) images. The conversion method is significantly slower, and the advanced risk architecture cannot verify the image. The proposed FPGA (Field Programmable Gate Array) and CNN (Convolutional Neural Network) are used to develop image processing and easily interact with the database without introducing complexity. FPGA (Field Programmable Gate Array) is mainly realized by ASIC (Application Specific Integrated Circuit). The system speeds up detecting strokes and lung diseases and can be used as a single process system or another biomedical imaging system component. According to the medical big data system, the image processing system relies on bilateral filtering, edge detection, multiple thresholds, image segmentation, morphological image processing, and image labeling to collect stroke symptoms.

3.
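A Survey of Typical Machine Learning Platforms for Big Data   Cited by: 1 (self-citations: 0, citations by others: 1)
焦嘉烽 (Jiao Jiafeng), 李云 (Li Yun) 《计算机应用》 (Journal of Computer Applications) 2017, 37(11): 3039-3047
Because big data is massive, complex and diverse, and fast-changing, traditional machine learning platforms are no longer adequate, so designing an efficient, general-purpose machine learning platform for big data has become a current research hotspot. By introducing and analyzing the characteristics of machine learning algorithms and the data and model parallelism of large-scale machine learning, the paper leads into the common parallel computing models. It briefly introduces the Bulk Synchronous Parallel (BSP) model, the SSP parallel computing model, and the differences between the BSP and SSP models and the AP model. Its main focus is on typical machine learning platforms built on these parallel models, together with their advantages and disadvantages, and it points out which kinds of big data problems each platform is best suited to. Finally, it summarizes the typical machine learning platforms in terms of the abstract data structures they adopt, their parallel computing models, their fault-tolerance mechanisms and so on, and offers some suggestions and an outlook.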
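The difference between the BSP, SSP and AP models named in this abstract can be illustrated with a single rule. The sketch below is not taken from the paper: it assumes the usual reading of SSP as Stale Synchronous Parallel and AP as Asynchronous Parallel, and the names `may_proceed`, `BSP_STALENESS` and `AP_STALENESS` are hypothetical. Under that reading, a worker may start its next iteration only if it is at most `staleness` iterations ahead of the slowest worker.

```python
import math
from typing import Sequence

# Assumed staleness bounds (illustrative, not from the paper):
BSP_STALENESS = 0          # BSP: a barrier after every iteration
AP_STALENESS = math.inf    # AP: workers never wait for each other


def may_proceed(worker_clock: int, all_clocks: Sequence[int], staleness: float) -> bool:
    """Return True if a worker that has completed `worker_clock` iterations
    may start its next one, given the completed-iteration counts of all workers.

    The worker is allowed to run ahead of the slowest worker by at most
    `staleness` iterations; SSP uses a finite bound greater than zero.
    """
    return worker_clock - min(all_clocks) <= staleness


# Three workers that have completed 5, 3 and 4 iterations respectively.
clocks = [5, 3, 4]
fastest_can_run_under_bsp = may_proceed(5, clocks, BSP_STALENESS)
fastest_can_run_under_ssp = may_proceed(5, clocks, 2)
fastest_can_run_under_ap = may_proceed(5, clocks, AP_STALENESS)
```

With a bound of 0 the rule reduces to BSP, and with an unbounded value it reduces to AP, which is why SSP is often described as a middle ground between the two.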

4.
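Remote Sensing Big Data: Research Status and Development Trends   Cited by: 2 (self-citations: 0, citations by others: 2)
Objective: The spatial, temporal, spectral and radiometric resolution of remote sensing data keeps improving, and the variety of data types keeps growing. The volume of remote sensing data acquired from spaceborne, airborne, near-space and other remote sensing platforms is increasing sharply, so remote sensing data already shows clear big data characteristics. This paper aims to analyze, from a system application perspective, the key technologies and problems involved in remote sensing big data processing, providing a valuable reference for researchers in the field. Methods: Drawing on a large body of literature, the paper first clarifies the characteristics of remote sensing big data. Second, it introduces remote sensing big data processing systems from the perspectives of GPU hardware acceleration, clusters, grids, cloud computing, cloud grids, and complex high-performance computing. Third, it analyzes the key technologies of remote sensing big data processing, such as distributed clustered storage. Finally, it describes the open problems in current research in terms of the multiple types of uncertainty in remote sensing big data, information fusion, machine learning, and analysis platforms, and describes development trends in terms of modeling these multiple types of uncertainty and machine learning methods oriented to remote sensing big data. Results: The paper reviews in detail the characteristics, typical processing systems and core technologies of remote sensing big data, and attempts to summarize the key problems this field needs to solve in practical application and academic research, as well as future trends. Conclusion: Big data technology brings both opportunities and challenges to remote sensing data mining and knowledge acquisition. Breakthroughs in big-data-oriented machine learning, unified data analysis frameworks, and big-data-oriented deep information fusion will further advance remote sensing knowledge mining.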

5.
The blooming proliferation of aeronautics and astronautics platforms, together with the ever-increasing number of remote sensing imaging sensors on these platforms, has led to rapidly growing earth observation data characterized by large volume, variety, velocity, veracity and value, which raises awareness of the importance of large-scale image processing, fusion and mining. We have entered an era of big earth data, also called remote sensing (RS) big data. Although RS big data provides great opportunities for a broad range of applications such as disaster rescue and global security, it inevitably poses many additional processing challenges. As one of the most fundamental and important tasks in RS big data mining, image retrieval (i.e., image information mining) from RS big data has attracted continuous research interest over the last several decades. This paper systematically reviews the emerging achievements in image retrieval from RS big data. It then discusses applications based on RS image retrieval, including fusion-oriented RS image processing, geo-localization and disaster rescue. To facilitate the quantitative evaluation of RS image retrieval techniques, the paper lists publicly available datasets and evaluation metrics, and briefly reviews the mainstream methods on two representative RS image retrieval benchmarks. Considering the latest advances from multiple domains, including computer vision, machine learning and knowledge engineering, the paper points out some promising research directions for RS big data mining. From this survey, engineers from industry may find techniques to improve their RS image retrieval systems, and researchers from academia may find ideas for innovative work.

6.
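谷洪彬 (Gu Hongbin), 杨希 (Yang Xi), 魏孔鹏 (Wei Kongpeng) 《计算机时代》 (Computer Era) 2020, (5): 109-111, 115
To address the storage, management and efficient use of the massive, heterogeneously structured data produced by a university's own business systems, the paper compares emerging data lake technology with the traditional data warehouse and builds a data-lake-based university data management system and data processing mechanism. This provides data-layer storage support for university data governance and a source of unstructured data for big data analysis using machine learning methods.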

7.
The ability to exploit students’ sentiments using different machine learning techniques is considered an important strategy for planning and manoeuvring in a collaborative educational environment. The advancement of machine learning technology is energised by the healthy growth of big data technologies. This helps the applications based on Sentiment Mining (SM) using big data to become a common platform for data mining activities. However, very little has been studied on the sentiment application using a huge amount of available educational data. Therefore, this paper has made an attempt to mine the academic data using different efficient machine learning algorithms. The contribution of this paper is two-fold: (i) studying the sentiment polarity (positive, negative and neutral) from students’ data using machine learning techniques, and (ii) modelling and predicting students’ emotions (Amused, Anxiety, Bored, Confused, Enthused, Excited, Frustrated, etc.) using the big data frameworks. The developed SM techniques using big data frameworks can be scaled and made adaptable for source variation, velocity and veracity to maximise value mining for the benefit of students, faculties and other stakeholders.

8.
The amounts of digital data, when it is generated for each generation, valuable information called big data, have been retained. The cluster is typically used as a research technique; this practical information mining is the process. A considerable amount of diagnosis in the context of big data is established to measure the clustering processing for big data analysis. The so-called fuzzy mechanism-only framework assembled in the security storage sector may include access to the sub-iterative method. The algorithm, based on the incentive of the design and implementation of its low computational needs fuzzy clustering algorithm, big data is possible to cluster the vast data set and biased. Handle the Random Data Storing with Optimization Fuzzy Logic algorithm (RDS-FLA) proposes random data security storage and optimization be applied to the cluster data, the fuzzy logic algorithm. Some of the large-scale data set of experimental learning data has been shown. To evaluate the vague and random data security storage and the time, the attempted performance of RDS-FLA is a form of recommendation for the execution of scalability on a big data cluster. The calculations, the complexity of time and space, run the time, cluster quality, RDS-FLA is, without affecting the quality of clustering, it is about measures in the face to show that that can be performed in a short period. Therefore, the proposed algorithm, shortening the processing time, increase the efficiently stored data security. Advantages such as optimization and efficiency of such data security costs can be determined from the algorithm's experimental results.

9.

Big data has recently attracted wide attention in many fields, such as machine learning, pattern recognition, medicine, finance, and transportation. Data analysis is crucial to converting data into more specific information that is fed to decision-making systems. With diverse and complex types of datasets, knowledge discovery becomes more difficult. One solution is to use feature subset selection as a preprocessing step that reduces this complexity, so that computation and analysis become more convenient. Preprocessing produces a reliable and suitable source for any data-mining algorithm. Effective feature selection can improve a model’s performance and help us understand the characteristics and underlying structure of complex data. This study introduces a novel hybrid cloud-based feature selection model for imbalanced data, based on the k nearest neighbor algorithm. The proposed model showed good performance compared with the simple weighted nearest neighbor. It combines the firefly distance metric with the Euclidean distance used in the k nearest neighbor algorithm. The experimental results showed good insights in both time usage and feature weights compared with the weighted nearest neighbor. The model also improved classification accuracy by 12% compared with the weighted nearest neighbor algorithm, and using the cloud-distributed model reduced the processing time by up to 30%, which is considered substantial compared with recent state-of-the-art methods.


10.
Artificial intelligence and machine learning have been used by many research groups to process large-scale data, known as big data. Machine learning techniques for handling large-scale, complex datasets are computationally expensive. Spark MLlib, the machine learning library of the Apache Spark framework, is becoming a popular platform for big data analysis and is used for many machine learning problems such as classification, regression and clustering. In this work, Apache Spark and a Deep Multilayer Perceptron (MLP) architecture are proposed for Audio Scene Classification. Log Mel band features are used to represent the characteristics of the input audio scenes. The parameters of the DNN are set according to the DNN baseline of the DCASE 2017 challenge. The system is evaluated on the TUT dataset (2017) and the result is compared with the baseline provided.

11.
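Industrial big data is the massive data generated by informatization applications in the industrial domain. The term covers the large data sets, big data technologies and big data applications that serve decision-making problems. The paper first analyzes the 4V characteristics of industrial big data, the features specific to industrial data, and the sources of industrial big data. It then discusses the challenges and potential value of industrial big data from three aspects: integration and fusion methods for multi-source heterogeneous industrial data, computing architectures for industrial big data, and the information security issues raised by big data. It explores analysis and mining methods for industrial big data, proposes a computing architecture and a processing platform for an industrial big data platform, and builds a big data resource center and a big data analysis and decision-making application system for a tire company. Tire sales big data is analyzed and forecast at two levels: sales data analysis and macro data trends. Sales data sources from several different domains are used to address the sparsity of the feature space of historical data for sales forecasting, and a multi-task learning method based on LASSO (Least Absolute Shrinkage and Selection Operator) is used to address the drawbacks of a high-dimensional sample space. Experimental data verify that this can improve the accuracy of tire sales forecasts.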
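For reference, the standard single-task LASSO estimator that the abstract's method builds on can be written as below. This is the textbook formulation, not the paper's multi-task variant, whose exact objective the abstract does not give. The symbols are assumptions introduced here: $X$ is the $n \times p$ feature matrix, $y$ the vector of $n$ sales observations, $\beta$ the coefficient vector, and $\lambda \ge 0$ the regularization weight.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}}
  \; \frac{1}{2n} \,\lVert y - X\beta \rVert_2^2 \;+\; \lambda \,\lVert \beta \rVert_1
```

The $\ell_1$ penalty drives many coefficients exactly to zero, which is why LASSO is a common choice when the feature space is high-dimensional and sparse.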

12.
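To improve information-based retrieval and management of handheld medical devices, a big-data-based retrieval method for handheld medical devices is proposed. A big data distribution model for handheld medical device retrieval is constructed, and a directed graph model is used to build the distribution structure of retrieval nodes in the handheld medical device information base. Semantic association rule analysis is carried out in the information base, string matching is used to build a fuzzy decision model for retrieval from the information base, and a big data fusion method is used to design the retrieval algorithm. An autocorrelation feature matching method is then used to extract semantic features from the information base, yielding an optimized design of the handheld medical device retrieval platform. Simulation results show that retrieval with this method is more intelligent, with higher precision and lower latency.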

13.
The paper presents a platform for distributed computing, developed using the latest software technologies and computing paradigms to enable big data mining. The platform, called ClowdFlows, is implemented as a cloud-based web application with a graphical user interface which supports the construction and execution of data mining workflows, including web services used as workflow components. As a web application, the ClowdFlows platform poses no software requirements and can be used from any modern browser, including mobile devices. The constructed workflows can be declared either as private or public, which enables sharing the developed solutions, data and results on the web and in scientific publications. The server-side software of ClowdFlows can be multiplied and distributed to any number of computing nodes. From a developer’s perspective the platform is easy to extend and supports distributed development with packages. The paper focuses on big data processing in the batch and real-time processing mode. Big data analytics is provided through several algorithms, including novel ensemble techniques, implemented using the map-reduce paradigm and a special stream mining module for continuous parallel workflow execution. The batch mode and real-time processing mode are demonstrated with practical use cases. Performance analysis shows the benefit of using all available data for learning in distributed mode compared to using only subsets of data in non-distributed mode. The ability of ClowdFlows to handle big data sets and its nearly perfect linear speedup is demonstrated.

14.
15.
Providing insight into healthcare consumers' behaviors and attitudes is critical information in an environment where healthcare delivery is moving rapidly towards patient-centered care. We apply a two-stage methodology using both supervised and unsupervised machine learning methods to a patient data set from the electronic medical records of an academic medical center located in central Pennsylvania. The data are from patients who had total joint replacement surgery between December 2013 and September 2015. Two clustering methods and four classification algorithms were applied to the data set. Patients cluster into six distinct health market segments from which the cluster assignment is used as the response variable in supervised learning to classify patients. The classification model accurately predicts the cluster assignment for out-of-sample patients, while offering insight into patient behaviors and attributes to help clinicians, health marketers, and healthcare consumers move toward the goal of patient-centered and value-based healthcare.
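The two-stage idea in this abstract (cluster first, then train a classifier with the cluster assignment as the response variable) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not name its clustering or classification algorithms, so one-dimensional k-means and a 1-nearest-neighbor classifier are stand-ins, and the functions `kmeans_1d` and `predict_1nn` and the toy values are hypothetical.

```python
import random


def kmeans_1d(values, k, iters=20, seed=0):
    """Stage 1 (unsupervised): plain k-means on one-dimensional data.
    Returns a cluster label for each value and the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # initialise from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
    return labels, centroids


def predict_1nn(train_x, train_y, x):
    """Stage 2 (supervised): 1-nearest-neighbor classifier trained on the
    features, with the stage-1 cluster labels as the response variable."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Toy feature values standing in for patient attributes (not real data).
patients = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
segments, _ = kmeans_1d(patients, k=2)

# Classify two out-of-sample "patients" into the learned segments.
new_low = predict_1nn(patients, segments, 1.1)
new_high = predict_1nn(patients, segments, 9.7)
```

The design point is that stage 2 never sees a hand-made label: the only supervision is the segment produced by stage 1, which is what lets the classifier assign new records to existing segments without re-running the clustering.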

16.
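蒋胤傑 (Jiang Yinjie), 况琨 (Kuang Kun), 吴飞 (Wu Fei) 《智能系统学报》 (CAAI Transactions on Intelligent Systems) 2020, 15(1): 175-182
Data-driven machine learning (especially deep learning) has made great progress in fields such as natural language processing, computer vision analysis and speech recognition, and is a hot topic in artificial intelligence research. However, traditional machine learning uses various optimization algorithms to fit the optimal model on the training data set, that is, the model with the smallest average loss, whereas in many real-world problems (such as commercial auctions and resource allocation) the goal of an AI algorithm should be an equilibrium solution, one that also performs well under dynamic conditions. This requires applying game-theoretic ideas to big data intelligence. Through methods such as Monte Carlo tree search and reinforcement learning, game theory can be combined with artificial intelligence to seek equilibrium solutions of adversarial game models. Moving from the optimal solution of data fitting to the equilibrium solution of adversarial games gives big data intelligence a broader space for application.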

17.
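In recent years, with the vigorous development of computer and Internet information technology, China has entered the big data era. Against this background, computer software technology has been widely applied across major fields and industries. The article first introduces the current state of development of computer software technology in the big data era, focuses on several common types of computer software technology in modern computing, analyzes their practical application value in the big data era, and discusses the application of key computer software technologies in the big data era, with the aim of helping contemporary computer software technology better serve society and enterprises.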

18.

The majority of older people wish to live independently at home as long as possible despite having a range of age-related conditions including cognitive impairment. To facilitate this, there has been an extensive focus on exploring the capability of new technologies with limited success. This paper investigates whether MS Kinect (a motion-based sensing 3-D scanner device) within the MiiHome (My Intelligent Home) project in conjunction with other sensory data, machine learning and big data techniques can assist in the diagnosis and prognosis of cognitive impairment and hence prolong independent living. A pool of Kinect devices and various sensors powered by minicomputers providing internet connectivity are being installed in up to 200 homes. This enables continuous remote monitoring of elderly residents living alone. Passive and off-the-shelf sensor technologies were chosen to implement data acquisition specifically from sources that are part of the fabric of the homes, so that no extra effort is required from the participants. Various constraints including environmental, geometrical and big data were identified and appropriately dealt with. A visualization tool (MAGID) was developed for validation and verification of numerous behavioural activities. Then, a subset of data, from twelve pensioners aged over 65 with age-related cognitive decline and frailty, were collected over a period of 6 months. These data were subjected to several machine learning algorithms (multilayer perceptron neural network, neuro-fuzzy and deep learning) for classification and to extract routine behavioural patterns. These patterns were then analysed further to ascertain any health-related information and their attributes. For the first time, important routine behaviour related to Activities of Daily Living (ADL) of elderly people with cognitive and physical decline has been learnt by machine learning techniques from selected sample data obtained by MS Kinect. 
Medically important behaviour, e.g. eating, walking, sitting, was best learnt by deep learning with accuracy of 99.30% during training stage and average error rate of 1.83% with maximum of 12.98% during the implementation phase. Observations obtained from the application of the above learnt behaviours are presented as trends over a period of time. These trends, supplemented by other sensory signals, have provided a clearer picture of physical (in)activities (including falls) of the pensioners. The calculated behavioural attributes related to key indicators of health events can be used to model the trajectory of health status related to cognitive decline in a home setting. These results, based on a small number of elderly residents over a short period of time, imply that within the results obtained from the MiiHome project, it is possible to find indicators of cognitive decline. However, further studies are needed for full clinical validation of these indications in conjunction with assessment of cognitive decline of the participants.


19.
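In recent years, with the vigorous development of computer and Internet information technology, China has entered the big data era. Against this background, computer software technology has been widely applied across major fields and industries. The article first introduces the current state of development of computer software technology in the big data era, focuses on several common types of computer software technology in modern computing, analyzes their practical application value in the big data era, and discusses the application of key computer software technologies in the big data era, with the aim of helping contemporary computer software technology better serve society and enterprises.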

20.
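李敏 (Li Min), 倪少权 (Ni Shaoquan), 邱小平 (Qiu Xiaoping), 黄强 (Huang Qiang) 《计算机应用》 (Journal of Computer Applications) 2015, 35(5): 1267-1272
To address the low real-time performance of heterogeneous big data processing in Internet of Things (IoT) environments, the paper explores methods for data processing and persistence based on the Hadoop framework and proposes HDS, a "context"-based Hadoop big data processing system model. HDS uses the Hadoop framework for parallel data processing and persistence, and abstracts heterogeneous data in the IoT environment as "contexts", which are the objects HDS processes. Definitions of "context distance" and the "context neighborhood system (CNS)" are also proposed. Because the Hadoop framework itself offers limited real-time data processing performance, the HDS design adds a "context queue (CQ)" as auxiliary storage to improve real-time performance. Exploiting the spatio-temporal properties of "contexts", a "context neighborhood system" of user requests is built to regroup tasks. Taking the vehicle scheduling problem of refined oil distribution as an example, MapReduce parallel experiments are used to verify and analyze the data processing and real-time performance of HDS. The experimental results show that, in an IoT environment, HDS has a clear advantage in big data processing performance over the traditional single-point processing model (SDS): with 10 servers in the experimental environment, its computing performance exceeds that of SDS by more than 200 times. The experiments also verify that CQ as auxiliary storage effectively improves real-time data processing: with 10 servers, real-time data processing performance improves by more than 270 times.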
