20 similar documents were retrieved (search time: 0 ms).
1.
《计算机工程》2017,(7):1-8
Existing studies on predicting the popularity of TV dramas on video websites consider only a few factors and can rarely make predictions before a drama's premiere, which leaves video websites with incomplete input for decisions such as copyright purchasing and advertisement placement and makes the predictions lag behind. To address this, a method is proposed for predicting the popularity of TV dramas on video websites before their premiere: it jointly considers search data on the drama title and its actors, determines the earliest feasible prediction time through time-series analysis, and predicts popularity with a multiple linear regression model. Experimental results show that, using the Baidu search index of drama titles and actors from days 13 to 18 before the premiere, the method can predict the 30-day post-launch view counts of dramas released on PPTV and Youku in 2014 and 2015; the Pearson correlation coefficients between predicted and actual values reach 0.9437 and 0.9676 respectively, indicating good predictive performance.
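As a rough illustration of the regression step described in this abstract, the sketch below fits a multiple linear regression on hypothetical pre-premiere search-index features and reports the Pearson correlation between predicted and actual 30-day view counts. The feature construction and the synthetic data are assumptions made for illustration, not the paper's dataset.

```python
# Minimal sketch (not the paper's implementation): predict 30-day view counts
# from pre-premiere search-index features with multiple linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
# Hypothetical features: mean Baidu index of the drama title and of its lead
# actors over days 13-18 before the premiere (one row per drama).
X = rng.uniform(1e3, 1e5, size=(60, 2))
# Hypothetical target: total view count in the first 30 days after launch.
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 1e4, size=60)

X_train, X_test, y_train, y_test = X[:40], X[40:], y[:40], y[40:]
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

r, _ = pearsonr(y_pred, y_test)   # the paper evaluates with this coefficient
print(f"Pearson correlation between predicted and actual views: {r:.4f}")
```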
2.
3.
The trend toward wireless communications and advances in mobile technologies are increasing consumer demand for ubiquitous access to Internet-based information and services. A 3D framework provides a basis for designing, analyzing, and evaluating strategies to address data consistency issues in mobile wireless environments. A proposed relay-peer-based cache consistency protocol offers a generic and flexible method for carrying out cache invalidation.
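The abstract does not spell out the protocol, but the general shape of relay-based cache invalidation can be sketched roughly as below: a relay peer forwards the server's invalidation reports to mobile clients, which drop stale entries. The class and method names are invented for illustration and are not taken from the paper.

```python
# Rough sketch (not the paper's protocol): a relay peer forwards server
# invalidation reports to mobile clients, which evict stale cache entries.
class MobileClientCache:
    def __init__(self):
        self.cache = {}                     # object_id -> (value, version)

    def on_invalidation(self, report):
        for obj_id, new_version in report.items():
            cached = self.cache.get(obj_id)
            if cached is not None and cached[1] < new_version:
                del self.cache[obj_id]      # stale copy, drop it

class RelayPeer:
    def __init__(self):
        self.clients = []

    def relay(self, report):
        for client in self.clients:         # fan out the server's report
            client.on_invalidation(report)

relay = RelayPeer()
client = MobileClientCache()
client.cache["item42"] = ("old value", 1)
relay.clients.append(client)
relay.relay({"item42": 2})                  # server updated item42 to version 2
print(client.cache)                         # -> {} : stale entry invalidated
```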
4.
Research on birth population forecasting plays an extremely important role in predicting future population. Commonly used birth forecasting algorithms are designed around subjective factors affecting births, such as the number of women in each age group, mortality, and fertility rates, while ignoring objective factors such as economic conditions, national policy, educational attainment, and the total dependency ratio. In recent years China's actual number of births has been far lower than the officially forecast figures, which clearly exposes the shortcomings of these algorithms. Taking objective factors into account, this paper uses China's historical population data together with principal component analysis, multiple linear regression, and SPSS to optimize the age-specific fertility-rate method, yielding a new optimized birth forecasting algorithm. Simulation results show that the optimized algorithm greatly improves forecasting accuracy and has both theoretical reference value and practical utility.
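The abstract combines principal component analysis with multiple linear regression; a minimal sketch of that pipeline is shown below. The predictor columns (economic index, schooling, dependency ratio, and so on) and the data are placeholders, not the paper's dataset or its exact model.

```python
# Sketch of the PCA + multiple-linear-regression pipeline mentioned above
# (illustrative data only; the paper uses historical Chinese population data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
# Rows: years.  Columns: hypothetical objective factors such as GDP growth,
# average years of schooling, total dependency ratio, policy indicator, ...
factors = rng.normal(size=(30, 6))
births = factors @ np.array([3.0, -1.0, 2.0, 0.5, 0.0, 0.0]) + rng.normal(0, 0.3, 30)

# Reduce correlated factors to a few principal components, then regress.
model = make_pipeline(PCA(n_components=3), LinearRegression())
model.fit(factors[:-5], births[:-5])
print("predicted births for the last 5 years:", model.predict(factors[-5:]))
```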
5.
6.
7.
The development of computing and networking has led to an explosion of data, social media has become part of everyday life, and social network analysis has become a research hotspot. With the arrival of the big data era, research on link prediction in social networks has been strongly affected, and prediction methods based solely on network structure are gradually becoming inadequate. This paper therefore proposes a topic-model-based link prediction method for social networks. First, a Weibo social network is used as the data source and split into a training set and a test set. Next, a topic model is used to extract users' topic features, which are combined with a named-entity set and a user-contact feature set to obtain an interest-feature similarity measure between users; adding structural similarity yields an overall user-node similarity, which is then used to predict links in the social network. Finally, AUC, the most common evaluation metric for link prediction, is used to assess the method. Experiments show that the method achieves higher prediction accuracy.
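A toy sketch of the scoring idea follows: a cosine similarity over per-user topic vectors is mixed with a structural (common-neighbour) score, and the ranking is evaluated with AUC. The weighting, the random graph, and the topic vectors are simplified placeholders rather than the paper's formulation, and a proper evaluation would hide the test edges from the graph when computing structural features.

```python
# Toy sketch: score candidate links by mixing topic-feature similarity with
# structural similarity, then evaluate with AUC (placeholder data and weights).
import numpy as np
import networkx as nx
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
G = nx.erdos_renyi_graph(50, 0.1, seed=2)          # stand-in social graph
topics = rng.dirichlet(np.ones(10), size=50)       # per-user topic vectors (e.g. from LDA)

def score(u, v, alpha=0.5):
    cos = topics[u] @ topics[v] / (np.linalg.norm(topics[u]) * np.linalg.norm(topics[v]))
    cn = len(list(nx.common_neighbors(G, u, v)))   # structural similarity
    return alpha * cos + (1 - alpha) * cn

pairs = [(u, v) for u in G for v in G if u < v]
labels = [1 if G.has_edge(u, v) else 0 for u, v in pairs]
scores = [score(u, v) for u, v in pairs]
print("AUC:", roc_auc_score(labels, scores))
```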
8.
Kaladhar Voruganti M. Tamer Özsu Ronald C. Unrau 《Distributed and Parallel Databases》2004,15(2):137-177
Data-shipping is an important form of data distribution architecture where data objects are retrieved from the server, and are cached and operated upon at the client nodes. This architecture reduces network latency and increases resource utilization at the client. Object database management systems (ODBMS), file-systems, mobile data management systems, multi-tiered Web-server systems and hybrid query-shipping/data-shipping architectures all use some variant of data-shipping. Despite a decade of research, there is still a lack of consensus amongst the proponents of ODBMSs as to the type of data-shipping architectures and algorithms that should be used. The absence of both robust (with respect to performance) algorithms and a comprehensive performance study comparing the competing algorithms is the key reason for this lack of agreement. In this paper we address both of these problems. We first present an adaptive data-shipping architecture which utilizes adaptive data transfer, cache consistency and recovery algorithms to improve the robustness (with respect to performance) of a data-shipping ODBMS. We then present a comprehensive performance study which evaluates the competing client-server architectures and algorithms. The study verifies the robustness of the new adaptive data-shipping architecture, provides new insights into the performance of the different competing algorithms, and helps to overturn some existing notions about some of the algorithms.
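For readers unfamiliar with the model, the following is a minimal sketch of the data-shipping idea itself, not of the paper's adaptive architecture: objects are fetched from the server once, cached and operated on at the client, and the server can invalidate cached copies. All names are illustrative.

```python
# Minimal sketch of data-shipping (illustrative, not the paper's algorithms):
# objects are shipped from the server on a miss, then served from the client
# cache; a consistency callback lets the server invalidate cached copies.
class Server:
    def __init__(self):
        self.objects = {"a": 1, "b": 2}

    def fetch(self, oid):
        return self.objects[oid]

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, oid):
        if oid not in self.cache:                  # miss: ship data from server
            self.cache[oid] = self.server.fetch(oid)
        return self.cache[oid]                     # hit: served locally

    def invalidate(self, oid):
        self.cache.pop(oid, None)                  # consistency callback

srv = Server()
cl = Client(srv)
print(cl.read("a"), cl.read("a"))   # second read is a local cache hit
```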
9.
To meet information retrieval requirements, the shortcomings of conventional MetaSearch systems are analyzed and a queue-based buffering mechanism is proposed. The factors that affect the performance of the buffering system are also analyzed.
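The abstract gives no implementation details; purely as an illustration of a queue-based buffer, the sketch below collects results from several search engines in a bounded FIFO queue before merging them. Class and method names are hypothetical.

```python
# Illustrative sketch only (the abstract gives no details): buffering results
# from multiple search engines in a bounded FIFO queue before merging.
from collections import deque

class ResultBuffer:
    def __init__(self, capacity=100):
        self.queue = deque(maxlen=capacity)    # oldest entries are dropped first

    def push(self, engine, results):
        self.queue.append((engine, results))

    def drain(self):
        merged = []
        while self.queue:
            engine, results = self.queue.popleft()
            merged.extend(results)
        return merged

buf = ResultBuffer(capacity=3)
buf.push("engineA", ["r1", "r2"])
buf.push("engineB", ["r3"])
print(buf.drain())    # -> ['r1', 'r2', 'r3']
```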
10.
Integrating Web Prefetching and Caching Using Prediction Models   Cited by: 2 (self-citations: 0, citations by others: 2)
Web caching and prefetching have been studied in the past separately. In this paper, we present an integrated architecture for Web object caching and prefetching. Our goal is to design a prefetching system that can work with an existing Web caching system in a seamless manner. In this integrated architecture, a certain amount of caching space is reserved for prefetching. To empower the prefetching engine, a Web-object prediction model is built by mining the frequent paths from past Web log data. We show that the integrated architecture improves the performance over Web caching alone, and present our analysis on the tradeoff between the reduced latency and the potential increase in network load.
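A rough sketch of the integration idea is shown below: popular "next object" transitions are mined from a toy access log, and the predicted object is fetched speculatively into a reserved slice of the cache. The log, the one-step frequency model, and the cache layout are simplifying assumptions, not the paper's frequent-path mining algorithm.

```python
# Rough sketch (not the paper's algorithm): mine popular "next page"
# transitions from a Web log and prefetch predicted objects into a
# reserved prefetch area of the cache.
from collections import Counter, defaultdict

log_sessions = [["/a", "/b", "/c"], ["/a", "/b"], ["/a", "/c"]]   # toy access log

# Build a simple next-object prediction model from observed transitions.
transitions = defaultdict(Counter)
for session in log_sessions:
    for cur, nxt in zip(session, session[1:]):
        transitions[cur][nxt] += 1

def predict_next(obj):
    counts = transitions.get(obj)
    return counts.most_common(1)[0][0] if counts else None

cache, prefetch_area = {}, {}          # prefetch area is the reserved space
def access(obj):
    if obj in cache or obj in prefetch_area:
        print("hit:", obj)
    else:
        print("miss:", obj)
        cache[obj] = f"content of {obj}"
    nxt = predict_next(obj)
    if nxt and nxt not in cache:
        prefetch_area[nxt] = f"content of {nxt}"   # speculative fetch

access("/a")   # miss, but "/b" is prefetched
access("/b")   # hit thanks to prefetching
```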
11.
12.
Pull-based overlays are used in some of today’s largest computational grids. Job agents are submitted to resources with the duty of retrieving real workload from a central queue at runtime and executing it. This model helps overcome the problems of direct job submission in the highly complex grid environments, namely, heterogeneity, imprecise status information, relatively high failure rates and slow adaptation to changes of grid conditions or user priorities. This article presents a distributed scheduling architecture for such late-binding overlays. In this architecture, execution nodes share a distributed hash table and cooperatively perform job assignment. As our experiments prove, scalability problems of centralized matching are avoided, achieving low and predictable scheduling overheads even for execution of large workflows, and total turnaround times are improved. This is in line with the predictions of a theoretical model of grid workflow execution that the article also discusses. Scalability makes fine-grained scheduling possible and enables new functionalities, like a distributed data cache shared by the execution nodes, which helps alleviate the commonly congested storage services. In addition, we show that our system is more resilient to problems like communication breakdowns between computation centres. Moreover, the new architecture is better prepared to deal with demanding scenarios like intense demand of popular data files or remote data processing.
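The pull-based (late-binding) model itself can be sketched schematically as below: pilot agents on execution nodes fetch real tasks from a central queue at run time instead of having jobs pushed to them. This is a generic illustration with invented names; it does not include the DHT-based cooperative matching the article proposes.

```python
# Schematic sketch of the late-binding "pull" model described above:
# pilot agents on execution nodes retrieve workload from a central queue
# at runtime. Names and workload are illustrative.
import queue
import threading

task_queue = queue.Queue()
for i in range(6):
    task_queue.put(f"task-{i}")             # workload defined centrally

def pilot_agent(node_name):
    while True:
        try:
            task = task_queue.get_nowait()  # late binding: fetch work at runtime
        except queue.Empty:
            return                          # no more work for this node
        print(f"{node_name} executing {task}")
        task_queue.task_done()

threads = [threading.Thread(target=pilot_agent, args=(f"node-{n}",)) for n in range(3)]
for t in threads: t.start()
for t in threads: t.join()
```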
13.
Big Data   Cited by: 1 (self-citations: 0, citations by others: 1)
14.
15.
Dr. Michael Schermann Dr. Holmer Hemsen Christoph Buchmüller Till Bitter Prof. Dr. Helmut Krcmar Prof. Dr. Volker Markl Prof. Dr. Thomas Hoeren 《WIRTSCHAFTSINFORMATIK》2014,56(5):277-279
“Big data” describes technologies that promise to fulfill a fundamental tenet of research in information systems, which is to provide the right information to the right receiver in the right volume and quality at the right time. For information systems research as an application-oriented research discipline, opportunities and risks arise from using big data. Risks arise primarily from the considerable number of resources used for the explanation and design of fads. Opportunities arise because these resources lead to substantial knowledge gains, which support scientific progress within the discipline and are of relevance to practice as well. From the authors’ perspective, information systems research is ideally positioned to support big data critically and use the knowledge gained to explain and design innovative information systems in business and administration – regardless of whether big data is in reality a disruptive technology or a cursory fad. The continuing development and adoption of big data will ultimately provide clarity on whether big data is a fad or if it represents substantial progress in information systems research. Three theses also show how future technological developments can be used to advance the discipline of information systems. Technological progress should be used for a cumulative supplement of existing models, tools, and methods. By contrast, scientific revolutions are independent of technological progress.
16.
17.
18.
《计算机科学与探索》2018,(3):360-369
How to select an appropriate number of data sources from a large collection of Web data sources, so that the number of sources accessed is minimized while a given query is still satisfied, is one of the key problems in Web big-data integration. A two-stage data source selection scheme is proposed. In the first stage, data sources highly relevant to the query are selected according to the similarity between each source's schema and the mediated schema, and higher-quality sources are then chosen by computing source-dependent quality scores. In the second stage, overlap rates between data sources are estimated based on the maximum entropy principle, and a minimum-query-cost model is designed and implemented to select data sources dynamically. The algorithm is evaluated on an experimental platform, and the results show that it achieves high efficiency and good scalability.
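As a simplified illustration of the second-stage idea, the sketch below selects sources greedily by marginal coverage gain per unit access cost, given (already estimated) overlaps between sources. The coverage sets, costs, and stopping rule are placeholders; the paper estimates overlap rates with a maximum-entropy model rather than assuming them.

```python
# Simplified sketch: greedy source selection by marginal coverage gain per
# unit access cost (placeholder coverage sets and costs, not the paper's data).
sources = {
    "S1": {"coverage": {1, 2, 3, 4}, "cost": 2.0},
    "S2": {"coverage": {3, 4, 5},    "cost": 1.0},
    "S3": {"coverage": {6, 7},       "cost": 1.5},
}

selected, covered = [], set()
target = 6                                  # answers needed for the query
while len(covered) < target and len(selected) < len(sources):
    best = max(
        (s for s in sources if s not in selected),
        key=lambda s: len(sources[s]["coverage"] - covered) / sources[s]["cost"],
    )
    gain = sources[best]["coverage"] - covered
    if not gain:
        break                               # no remaining source adds coverage
    selected.append(best)
    covered |= gain
print("selected sources:", selected)        # greedy order by gain/cost
```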
19.
Caching strategies in Named Data Networking (NDN) usually pay little attention to the service type of content and the differing quality-of-service requirements of different service types, which makes them hard to apply in real scenarios with diverse services and complex user demands. To make full use of limited cache resources, and drawing on the DiffServ model of IP networks, a cache content classification model for NDN is proposed, together with DiffCache, a probabilistic caching algorithm that jointly considers content class, local popularity at the router, and content download delay. Experimental results show that the algorithm achieves dynamic allocation of cache resources and can clearly differentiate the performance of each content class without degrading the overall hit ratio or download delay.
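Purely as an illustration of a class-aware probabilistic caching decision, the sketch below caches a content object with a probability that grows with its class weight, its local popularity at the router, and its observed download delay. The class weights and the combination formula are assumptions for illustration, not the DiffCache algorithm itself.

```python
# Illustrative sketch (not DiffCache): probabilistically cache content based
# on its service class, local popularity, and download delay. Weights and the
# mixing formula are placeholders.
import random

CLASS_WEIGHT = {"realtime": 0.9, "streaming": 0.6, "bulk": 0.3}   # per service class

def cache_probability(content_class, local_popularity, download_delay,
                      max_popularity=100.0, max_delay=1.0):
    w = CLASS_WEIGHT.get(content_class, 0.5)
    pop = min(local_popularity / max_popularity, 1.0)
    delay = min(download_delay / max_delay, 1.0)
    return w * (0.5 * pop + 0.5 * delay)        # simple weighted mix

def maybe_cache(store, name, data, content_class, popularity, delay):
    if random.random() < cache_probability(content_class, popularity, delay):
        store[name] = data                      # probabilistic insertion

content_store = {}
maybe_cache(content_store, "/video/ep1", b"...", "streaming", popularity=80, delay=0.4)
print("/video/ep1 cached:", "/video/ep1" in content_store)
```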
20.
Improving Metadata Caching Efficiency for Data Deduplication via In-RAM Metadata Utilization
We describe a data deduplication system for backup storage of PC disk images, named in-RAM metadata utilizing deduplication (IR-MUD). In-RAM hash granularity adaptation and miniLZO-based data compression are first proposed to reduce the in-RAM metadata size and thereby reduce the space overheads required by the in-RAM metadata caches. Second, an in-RAM metadata write cache, as opposed to the traditional metadata read cache, is proposed for further reducing metadata-related disk I/O operations and improving deduplication throughput. During deduplication, the metadata write cache is managed following the LRU caching policy. For each manifest that is hit in the metadata write cache, an expensive manifest reloading operation from the disk is avoided. After deduplication, all the manifests in the metadata write cache are cleared and stored on the disk. Our experimental results using a 1.5 TB real-world disk image dataset show that 1) IR-MUD achieved about 95% size reduction for the deduplication metadata, with a small time overhead introduced, 2) when the metadata write cache was not utilized, with the same RAM space size for the metadata read cache, IR-MUD achieved a 400% higher RAM hit ratio and a 50% higher deduplication throughput, as compared with the classic Sparse Indexing deduplication system where no metadata utilization approaches are utilized, and 3) when the metadata write cache was utilized and enough RAM space was available, IR-MUD achieved a 500% higher RAM hit ratio compared with Sparse Indexing and a 70% higher deduplication throughput compared with IR-MUD with only a single metadata read cache. The in-RAM metadata harnessing and metadata write caching approaches of IR-MUD can be applied in most parallel deduplication systems for improving metadata caching efficiency.
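A minimal sketch of an LRU metadata write cache, as described in the abstract, is shown below: manifests that hit in the cache avoid a reload from disk, and at the end of deduplication the cached manifests are flushed and the cache is cleared. Disk I/O is mocked and all names are illustrative; this is not the IR-MUD implementation.

```python
# Minimal sketch of an LRU manifest write cache (illustrative, not IR-MUD):
# hits avoid an expensive manifest reload from disk; after deduplication the
# cached manifests are flushed to disk and the cache is cleared.
from collections import OrderedDict

class ManifestWriteCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()              # manifest_id -> manifest (LRU order)

    def get(self, manifest_id, load_from_disk):
        if manifest_id in self.cache:
            self.cache.move_to_end(manifest_id) # hit: no disk reload needed
            return self.cache[manifest_id]
        manifest = load_from_disk(manifest_id)  # miss: pay the reload cost once
        self.cache[manifest_id] = manifest
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict least recently used
        return manifest

    def flush(self, store_to_disk):
        for manifest_id, manifest in self.cache.items():
            store_to_disk(manifest_id, manifest)
        self.cache.clear()                      # cleared after deduplication

wc = ManifestWriteCache(capacity=2)
wc.get("m1", lambda mid: {"chunks": []})
wc.get("m1", lambda mid: {"chunks": []})        # second access is a cache hit
wc.flush(lambda mid, m: print("stored", mid))
```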