1.
2.
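With the data explosion brought about by the growth of the Internet, the volume of Web logs keeps increasing, and how to mine valuable information from massive Web logs has become a current research hotspot. This paper proposes mining Web logs on a Hadoop cluster framework. Experimental results show that the cluster system can process massive Web logs and extract valuable information from them, and they confirm the feasibility of using Sqoop to migrate data between the Hive warehouse and a traditional database.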
3.
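Converged communications is one of the current research hotspots in computer applications, and the demands placed on application services in converged communication systems keep rising. For data access, storage based on traditional relational databases or traditional file systems is increasingly unable to meet application needs. With the development of Hadoop and its related subsystems, the advantages of distributed storage have become increasingly evident. After analyzing the characteristics and architectures of HBase and Hive, and drawing on a concrete converged communications project, this paper therefore proposes a storage engine design method based on HBase-Hive integration to address shortcomings in data security and data retrieval efficiency in converged communication systems. Comparative experiments show that the design improves the system's data query and retrieval efficiency and lays the groundwork for later data mining development.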
4.
5.
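This paper studies current trends in malware development, systematically compares the many techniques and methods for registry hiding and detection, and analyzes their shortcomings. It proposes a method for detecting malware hiding that is based on registry Hive files, making malware detection more complete and reliable. Experiments show that the method can detect all current malware that hides itself in the registry.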
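The abstract does not spell out how the Hive file is used. One common way to exploit a raw hive file for this purpose is a cross-view comparison: keys recovered by parsing the on-disk hive are compared with keys reported by the normal registry API, and anything missing from the API view is flagged. The sketch below illustrates only that comparison step; the function name and the sample key lists are hypothetical, and the hive parsing itself is assumed to have been done elsewhere.

```python
def find_hidden_keys(raw_hive_keys, api_keys):
    """Return keys seen in the raw hive view but absent from the API view.

    Registry key names are case-insensitive, so both views are compared
    in lowercase. Both inputs are plain iterables of key-path strings.
    """
    api_view = {key.lower() for key in api_keys}
    return sorted(key for key in set(raw_hive_keys) if key.lower() not in api_view)


# Hypothetical sample data, not taken from the paper.
raw_view = ["Software\\Foo", "Software\\Bar", "Software\\Hidden"]
api_view = ["software\\foo", "SOFTWARE\\Bar"]
suspicious = find_hidden_keys(raw_view, api_view)
```

A key that appears in `suspicious` is only a candidate for further inspection; timing differences between the two snapshots can also produce mismatches.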
6.
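To improve the efficiency of statistics and analysis on the massive monitoring data already gathered at provincial environmental monitoring centers in ambient air quality monitoring systems, this paper proposes a query optimization method that partitions multidimensional data on Hive using a Spark cluster. Taking the air quality monitoring data of the Hubei Provincial Environmental Monitoring Center as the object of study, the data are moved to a Spark cluster, which connects to Hive through Spark SQL and stores the data in partitions. Twelve queries were designed and run against four datasets, and conclusions were drawn by comparison with experiments using the traditional query method. Experimental results show that the Hive-based partition optimization improves query time on air quality big data by 47% to 96%, and that the benefit grows as query complexity and data volume increase.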
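Why partitioned storage can shorten queries is easiest to see with a toy model of partition pruning: when rows are grouped by the columns a query filters on, only the matching partition has to be read. The sketch below simulates this in plain Python rather than in Spark SQL or Hive; the column names (`station`, `month`, `pm25`) and the sample data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_partitions(rows, keys):
    """Group rows into partitions keyed by the values of the given columns."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[tuple(row[k] for k in keys)].append(row)
    return partitions


def full_scan(rows, station, month):
    """Unpartitioned query: every row is inspected."""
    hits = [r for r in rows if r["station"] == station and r["month"] == month]
    return hits, len(rows)


def pruned_scan(partitions, station, month):
    """Partitioned query: only the matching partition is read."""
    part = partitions.get((station, month), [])
    return list(part), len(part)


# Hypothetical data: 3 stations x 12 months x 10 readings each.
rows = [
    {"station": s, "month": m, "pm25": 10 * m + i}
    for s in ("A", "B", "C")
    for m in range(1, 13)
    for i in range(10)
]
partitions = build_partitions(rows, ("station", "month"))
hits_full, scanned_full = full_scan(rows, "B", 7)
hits_pruned, scanned_pruned = pruned_scan(partitions, "B", 7)
```

Both scans return the same rows, but the pruned scan touches far fewer of them. In a real Hive table the same effect comes from choosing partition columns that match the common filter predicates.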
7.
8.
《Digital Communications & Networks》2016,2(3):108-121
Sarcasm is a type of sentiment in which people express negative feelings using positive or intensified positive words in text. While speaking, people often use heavy tonal stress and gestural cues such as rolling of the eyes or hand movements to reveal sarcasm. In textual data these tonal and gestural cues are missing, which makes sarcasm detection very difficult for an average human. Because of these challenges, researchers have shown interest in sarcasm detection in social media text, especially in tweets. The rapid growth in the volume of tweets, and the analysis of that volume, pose major challenges. In this paper, we propose a Hadoop-based framework that captures real-time tweets and processes them with a set of algorithms that identify sarcastic sentiment effectively. We observe that the elapsed time for analysis and processing under the Hadoop-based framework is significantly lower than with conventional methods, making the framework better suited to real-time streaming tweets.
9.
10.
Many organizations rely on relational database platforms for OLAP-style querying (aggregation and filtering) in small- to medium-sized applications. We investigate the impact of scaling up the data sizes for such queries, with the aim of illustrating what kind of performance an organization could expect if it migrated current applications to big data environments. This paper benchmarks the performance of Hive (Thusoo et al., 2009) [9], a parallel data warehouse platform that is part of the Hadoop software stack. We set up a 4-node Hadoop cluster using Hortonworks HDP 1.3.2 (Hortonworks HDP 1.3.2). We use the data generator provided by the TPC-DS benchmark (DSGen v1.1.0) to generate data at different scales. We compare the performance of loading and querying data with SQL on a relational database installation (MySQL) and with Hive Query Language (HiveQL) on a Hive cluster. We measure the query execution speedup for three dataset sizes resulting from the scale-up. Hive loads the large datasets faster than MySQL, but is marginally slower than MySQL when loading the smaller datasets. Query execution in Hive is also faster. We also investigate executing Hive queries concurrently in workloads and conclude that serial execution of queries is a much better practice for clusters with limited resources.
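The abstract does not define how speedup is computed. A minimal sketch, assuming the conventional definition (baseline elapsed time divided by candidate elapsed time), is given below. The function name is hypothetical, and the timings in the usage line are placeholders rather than results from the paper.

```python
def speedup(baseline_seconds, candidate_seconds):
    """Return baseline time divided by candidate time.

    A value above 1.0 means the candidate system was faster.
    Assumes the conventional definition of speedup; the paper's exact
    formula is not given in the abstract.
    """
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    return baseline_seconds / candidate_seconds


# Placeholder timings: a query taking 120 s on the baseline and 30 s on the candidate.
example = speedup(120.0, 30.0)
```

With this definition, a value below 1.0 would correspond to the case the abstract describes for small datasets, where the candidate is slower than the baseline.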