Similar literature
1.
We propose a parallelization scheme for an existing algorithm that constructs a web directory containing hierarchically organized categories of web documents. The clustering algorithm automatically infers the number of clusters using a quality function based on graph cuts. A parallel implementation of the algorithm has been developed to run on a cluster of multi‐core processors interconnected by an intranet. The effect of the well‐known Latent Semantic Indexing on the performance of the clustering algorithm is also considered. The parallelized graph‐cut based clustering algorithm achieves an F‐measure in the range [0.69,0.91] for the generated leaf‐level clusters and a precision‐recall performance in the range [0.66,0.84] for the entire hierarchy of generated clusters. In empirical measurements, the parallel algorithm achieves an average speedup of 7.38 over its sequential variant while also yielding better clustering performance than the sequential algorithm in terms of F‐measure. Copyright © 2015 John Wiley & Sons, Ltd.

2.
With the recent emergence of cloud computing based services on the Internet, MapReduce and distributed file systems like HDFS have emerged as the paradigm of choice for developing large-scale, data-intensive applications. Given the scale at which these applications are deployed, minimizing the power consumption of these clusters can significantly cut operational costs and reduce their carbon footprint, thereby increasing utility from a provider’s point of view. This paper addresses energy conservation for clusters of nodes that run MapReduce jobs. The algorithm dynamically reconfigures the cluster based on the current workload and turns cluster nodes on or off when the average cluster utilization rises above or falls below administrator-specified thresholds, respectively. We evaluate our algorithm using the GridSim toolkit, and our results show that the proposed algorithm achieves an energy reduction of 33% under average workloads and up to 54% under low workloads.
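
The threshold rule in this abstract can be illustrated with a minimal sketch. The function name `reconfigure`, the threshold values, and the one-node-per-step policy are assumptions made for illustration; the abstract does not specify them.

```python
def reconfigure(active_nodes, total_nodes, utilization,
                low=0.3, high=0.8, min_nodes=1):
    """Return the number of active nodes after one control step.

    `low` and `high` stand in for the administrator-specified
    thresholds (hypothetical values). Adding or removing exactly one
    node per step is also an assumption of this sketch.
    """
    if utilization > high and active_nodes < total_nodes:
        return active_nodes + 1  # utilization too high: power one node on
    if utilization < low and active_nodes > min_nodes:
        return active_nodes - 1  # utilization too low: power one node off
    return active_nodes          # within the band, or no node left to toggle
```

Calling `reconfigure(4, 10, 0.9)` would grow the cluster to 5 nodes, while a utilization inside the band leaves it unchanged. A real controller would also need to drain tasks and re-replicate data before a node is switched off.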

3.
With the exponential growth of WWW traffic, web proxy caching has become a critical technique for Internet web services. Well-organized proxy caching systems with multiple servers can greatly reduce user-perceived latency and network bandwidth consumption. Many research papers have therefore focused on improving web caching performance through efficient coordination algorithms among multiple servers. Hash-based algorithms are the most widely used server coordination mechanism; however, many technical issues still need to be addressed. In this paper, we propose a new hash-based web caching architecture, Tulip. Tulip aggregates web objects that are likely to be accessed together into object clusters and uses object clusters as the primary access units. Tulip extends the locality-based algorithm in UCFS to hash-based web proxy systems and proposes a simple algorithm to reduce the data grouping overhead. It takes into account the access speed disparity between memory and disk and replaces expensive small disk I/Os with fewer large ones. When a client request cannot be served from the server's memory, the system fetches the whole cluster containing the requested object into memory, so that future requests for other objects in the same cluster can be satisfied directly from memory and slow disk I/Os are avoided. Tulip also introduces a simple and efficient data duplication algorithm, so little maintenance work is needed when a server joins, leaves, or fails. Together with the local caching strategy, Tulip achieves better fault tolerance and load balancing at minimal cost. Our simulation results show that Tulip performs better than previous approaches.
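
The cluster-granular access described above can be sketched as follows. This is not Tulip's implementation: the class name `ClusterCache`, hashing on the cluster id with MD5, and the in-memory dictionaries standing in for disk and RAM are all assumptions for illustration, and eviction and duplication are omitted.

```python
import hashlib


class ClusterCache:
    """Toy hash-based proxy that loads whole object clusters into memory."""

    def __init__(self, servers, cluster_of, disk):
        self.servers = servers        # list of server names
        self.cluster_of = cluster_of  # object id -> cluster id
        self.disk = disk              # cluster id -> {object id: bytes}
        self.memory = {}              # cluster id -> {object id: bytes}
        self.disk_reads = 0           # number of (large) disk reads so far

    def server_for(self, obj_id):
        # Assumption: hash the cluster id rather than the object id, so
        # that all objects of one cluster map to the same server.
        cid = self.cluster_of[obj_id]
        digest = hashlib.md5(cid.encode()).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]

    def get(self, obj_id):
        cid = self.cluster_of[obj_id]
        if cid not in self.memory:
            # Memory miss: one large read brings in the whole cluster.
            self.memory[cid] = self.disk[cid]
            self.disk_reads += 1
        return self.memory[cid][obj_id]
```

After a miss on one object, requests for the other objects of the same cluster are served from `memory` without touching `disk`, which is the effect the abstract attributes to replacing small disk I/Os with fewer large ones.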

4.
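An Energy-Efficient Cluster-Based Time Synchronization Algorithm for Wireless Sensor Networks   Total citations: 1, self-citations: 1, citations by others: 0
Ye Xue (叶雪), Sun Yan (孙燕). Computer Engineering (《计算机工程》), 2009, 35(19): 117-119
This paper proposes an energy-efficient cluster-based time synchronization algorithm (CBTS) for wireless sensor networks. Exploiting the stability of high-performance crystal oscillators, it builds a clustered topology from high-performance cluster heads to achieve time synchronization, which lengthens the synchronization update period of each cluster and reduces the number of two-way synchronization packet exchanges among the nodes within a cluster. Experimental results show that, for a given accuracy, CBTS effectively reduces the energy consumption of the whole network compared with the TPSN algorithm.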

5.
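Open network environments contain large numbers of information documents, and how to judge the trustworthiness and security of document content remains a problem worth in-depth study. This paper studies methods for classifying text by trustworthiness: it collects material that reflects the trustworthiness of text, builds a trust feature vector for text, and, combined with existing feature selection methods, implements a text trustworthiness classification algorithm based on the vector space model. Experiments show that the method classifies well.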

6.
Interval Set Clustering of Web Users with Rough K-Means   Total citations: 1, self-citations: 0, citations by others: 1
Data collection and analysis in web mining face certain unique challenges. For a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than in conventional applications. The analytical techniques in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis. There is increasing interest in research on clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, clustering based on genetic algorithms may not be able to handle the large amount of data typical of a web mining application. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm represents clusters as interval or rough sets. The paper also describes the design of an experiment, including data collection and the clustering process. The experiment is used to create interval set representations of clusters of web visitors.
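
To illustrate what representing clusters as interval sets can look like, here is a minimal sketch of a rough assignment step. The abstract does not give the rule; the distance-difference test with a `threshold` parameter and the function name `rough_assign` are assumptions of this sketch. Each cluster gets a lower approximation (objects that certainly belong) and an upper approximation (objects that possibly belong).

```python
import math


def rough_assign(points, centroids, threshold):
    """Assign points to lower/upper approximations of k clusters.

    Assumed rule: a point whose distance to another centroid is within
    `threshold` of its nearest distance is ambiguous and goes only into
    the upper approximations of all such clusters; otherwise it goes
    into both the lower and upper approximation of its nearest cluster.
    """
    k = len(centroids)
    lower = [[] for _ in range(k)]
    upper = [[] for _ in range(k)]
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(range(k), key=dists.__getitem__)
        close = [j for j in range(k)
                 if j != nearest and dists[j] - dists[nearest] <= threshold]
        if close:
            for j in [nearest] + close:  # boundary object
                upper[j].append(p)
        else:
            lower[nearest].append(p)     # unambiguous object
            upper[nearest].append(p)
    return lower, upper
```

With centroids at (0, 0) and (10, 0), a visitor at (5, 0) would fall in both upper approximations but in neither lower approximation, which captures the lack of crisp boundaries mentioned in the abstract. A full algorithm would also recompute centroids from the two approximations, which is not shown here.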

7.
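Data aggregation is a relatively time-consuming operation in sensor networks, especially in high-density networks. Minimizing data aggregation latency has therefore become a research focus, and the problem has been proven NP-hard. This paper proposes MPMC, a multi-channel, multi-power data aggregation scheduling algorithm based on clustering, to reduce aggregation latency. The algorithm uses low transmission power within clusters and high power between clusters, combined with channel assignment, to reduce aggregation latency; aggregation in different clusters can proceed concurrently without collisions. The paper also shows by analysis that the number of channels used tends to a constant under different network topologies. In simulation experiments, MPMC is compared with the best current single-channel and multi-channel data aggregation scheduling algorithms, and the results confirm that MPMC has the lowest average latency.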

8.
In recent decades, several optimization algorithms have been developed to select the most energy-efficient clusters, saving power by transmitting over shorter distances while limiting interference to Primary Users (PUs). The Cognitive Radio (CR) system considered here is based on the Adaptive Swarm Distributed Intelligent based Clustering algorithm (ASDIC), which shows better spectrum sensing among groups of multiple users in terms of sensing error, power saving, and convergence time. In this paper, the proposed ASDIC algorithm develops more energy-efficient distributed cluster-based sensing with an optimal number of clusters based on their connectivity. Multiple random Secondary Users (SUs) and PUs are considered in the implementation. Compared with existing optimization algorithms, the proposed ASDIC algorithm improves convergence speed by combining multi-user clustered communication. Experimental results show that ASDIC reduces node power by 9.646% compared with existing algorithms. Similarly, it reduces the average node power of SUs by 24.23% compared with existing algorithms. A higher probability of detection is obtained with the Signal-to-Noise Ratio (SNR) reduced to 2 dB. ASDIC also delivers a lower false alarm rate than other existing optimization algorithms in primary detection. Simulation results show that the proposed ASDIC algorithm effectively solves multimodal optimization problems and maximizes network capacity.

9.
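Tang Wei (唐伟), Guo Wei (郭伟). Application Research of Computers (《计算机应用研究》), 2009, 26(8): 3082-3085
Combining node power control with data aggregation makes further reductions in network energy consumption possible, but it also poses new challenges for routing algorithm design. This paper therefore studies energy-saving data delivery combined with data aggregation in WSNs and proposes a new routing algorithm that maximizes network lifetime. The algorithm uses simulated annealing to optimize the selection of data aggregation points, balancing node energy consumption and maximizing network lifetime. Simulation results show that the algorithm clearly outperforms existing algorithms and achieves the goal of extending network lifetime.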

10.
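Deng Yaping (邓亚平), Chen Zheng (陈峥). Journal of Computer Applications (《计算机应用》), 2011, 31(6): 1465-1468
To address the uneven distribution of cluster heads and the uneven energy consumption of nodes in clustering routing protocols for wireless sensor networks (WSNs), this paper proposes a grouping-based clustering algorithm that balances the energy load of nodes. Nodes are grouped by energy, and the number of groups is adjusted dynamically as node energy decreases. Within each group the cluster head is elected according to the energy centroid, and cluster-head rotation together with inter-cluster multi-hop routing further balances node energy consumption. Simulation results show that the algorithm balances load effectively and significantly extends the network's stable period.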

11.
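Network security today is closely tied to wireless sensor networks: the quality of the localization algorithm determines the capability of a wireless sensor network and hence its security capability. Cluster-based localization algorithms are energy efficient, scalable, simple, and practical, but they are coarse-grained localization methods and lack high accuracy. In addition, if cluster-head replacement is selected network-wide, it easily incurs high energy cost. The AOA (angle of arrival) algorithm localizes accurately in complex environments. By applying AOA ranging, this paper gives every node in the clustering algorithm a self-detection capability: using the relative angles between nodes, a coordinate algorithm computes each node's coordinates relative to the cluster head. To keep energy consumption below that of ordinary clustering algorithms, and following the energy-efficiency principle in complete graphs, clustering is performed only once at initialization; when a cluster has no cluster head or its head's energy is low, the cluster head is re-elected and the information is then sent to the overall cluster. Matlab simulations show that the proposed algorithm realizes a range-based cluster localization algorithm while retaining fast deployment, high energy efficiency, and high accuracy. According to the experiments, the algorithm is suited to wireless sensor networks that are not sparsely deployed.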

12.
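To build a home power network that monitors and manages appliances so that all appliances run in coordination and energy savings are maximized, this paper proposes a design for an embedded home gateway based on power line carrier communication and completes the system's hardware and software design. The gateway uses an ATmega128 as the main controller, integrates the IT700PIM power line carrier communication chip and the ENC28J60 network interface chip, ports the uIP protocol stack, and implements an embedded web server on the smart home gateway hardware platform. Experimental results show that the gateway meets the design requirements and the needs of a smart home.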

13.
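Web Site Description and Mining Algorithm Based on a Multi-Granularity Tree Model   Total citations: 2, self-citations: 0, citations by others: 2
Tian Yonghong (田永鸿), Huang Tiejun (黄铁军), Gao Wen (高文). Journal of Software (《软件学报》), 2004, 15(9): 1393-1404
With the rapid growth in the amount and variety of information on the Web, Web site mining is important for automatically discovering and classifying Web resources on specific topics. However, existing Web site classification and mining algorithms have not studied in depth how to exploit contextual semantic information and remove noise in order to further improve classification accuracy. This paper analyzes the problems that must be solved to design an efficient Web site mining algorithm from three aspects: the sampling size, the analysis granularity, and the description structure of a site. On this basis, it proposes a new multi-granularity tree description model for Web sites and describes a multi-granularity Web site mining algorithm comprising a two-phase classification algorithm based on hidden Markov trees, an inter-granularity context fusion algorithm, a two-phase denoising procedure, and an entropy-based dynamic pruning strategy. The multi-granularity description method and mining algorithm lay a foundation for further research on multi-site query optimization, Web utility mining, and related topics. Experiments show that, relative to the baseline system, the algorithm improves classification accuracy by 16% on average and reduces processing time by 34.5%.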

14.
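Introducing power control into routing protocols for wireless ad hoc networks not only reduces network energy consumption but also improves performance metrics such as throughput and delivery ratio, and has become a research focus for ad hoc networks. This paper proposes CPC-AODV (Cross-layer Power Control Ad hoc On-demand Distance Vector), an on-demand routing algorithm based on cross-layer power control. The algorithm builds multiple routes at different power levels on demand; a node forwards packets along the route to the destination with the lowest power level, and applies different power control strategies to network-layer data packets, routing packets, and MAC-layer control frames to reduce energy consumption. Simulation results show that the algorithm lowers communication energy overhead, extends network lifetime, raises the delivery ratio, and improves network delay.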

15.
Web search for a planet: The Google cluster architecture   Total citations: 11, self-citations: 0, citations by others: 11
Barroso, L.A.; Dean, J.; Holzle, U. IEEE Micro, 2003, 23(2): 22-28
Amenable to extensive parallelization, Google's web search application lets different queries run on different processors and, by partitioning the overall index, also lets a single query use multiple processors. To handle this workload, Google's architecture features clusters of more than 15,000 commodity-class PCs with fault-tolerant software. This architecture achieves superior performance at a fraction of the cost of a system built from fewer, but more expensive, high-end servers.

16.
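An Unequal Clustering-Based Routing Protocol for Wireless Sensor Networks   Total citations: 94, self-citations: 0, citations by others: 94
Using clustering in routing protocols can improve the scalability of wireless sensor networks. When cluster heads forward data to the data sink by multi-hop communication, the cluster heads near the sink are overloaded by the large volume of relayed data and may exhaust their energy and fail prematurely, which partitions the network. This paper proposes a novel multi-hop routing protocol for wireless sensor networks based on unequal clustering. Its core is an energy-efficient unequal clustering algorithm for organizing the network topology, in which candidate cluster heads use non-uniform competition ranges to construct clusters of unequal size. Clusters near the sink are smaller than those far from it, so cluster heads near the sink can reserve energy for inter-cluster data forwarding. Simulation results show that the protocol effectively balances the energy consumption of cluster heads and significantly prolongs network lifetime.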
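
The idea of non-uniform competition ranges can be sketched with a simple linear rule. The abstract does not give a formula, so the linear form below, the function name `competition_radius`, and the parameter `c` are assumptions chosen only to show a radius that shrinks toward the sink.

```python
def competition_radius(d, d_min, d_max, r_max, c=0.5):
    """Competition radius of a candidate cluster head.

    d      -- distance from this candidate to the sink
    d_min  -- smallest node-to-sink distance in the network
    d_max  -- largest node-to-sink distance in the network
    r_max  -- radius used by the farthest candidates
    c      -- assumed constant in [0, 1] controlling how much the
              radius shrinks near the sink (hypothetical parameter)
    """
    return (1 - c * (d_max - d) / (d_max - d_min)) * r_max
```

Under these assumptions a candidate at `d_max` competes with the full radius `r_max`, while one at `d_min` uses `(1 - c) * r_max`, so clusters near the sink come out smaller and their heads keep more energy for relaying.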

17.
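Web Document Clustering Based on Web Log Mining   Total citations: 3, self-citations: 1, citations by others: 2
Web log mining is one kind of Web mining. This paper introduces the general process of Web log mining, studies the k-means clustering algorithm, and analyzes its shortcomings. In every iteration, k-means must compute the distance from each data object to the cluster centroids, which makes clustering inefficient. To address this, the paper proposes an improved k-means algorithm that avoids repeatedly computing the distances from data objects to cluster centroids, and uses both algorithms to cluster Web documents. Experimental results show that the improved algorithm increases clustering efficiency.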
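
One way to avoid repeated distance computations is sketched below. The abstract does not say how the improved algorithm does it, so this is a substitute technique rather than the paper's method: distances are cached per point and centroid, and only the columns of centroids that actually moved are recomputed. The function name `kmeans_cached` is hypothetical.

```python
import math


def kmeans_cached(points, centroids, max_iter=100):
    """k-means that recomputes distances only for centroids that moved."""
    k = len(centroids)
    centroids = [tuple(c) for c in centroids]
    # dist[i][j] caches the distance from point i to centroid j.
    dist = [[math.dist(p, c) for c in centroids] for p in points]

    def assign():
        return [min(range(k), key=row.__getitem__) for row in dist]

    for _ in range(max_iter):
        labels = assign()
        moved = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # keep the centroid of an empty cluster as is
            new_c = tuple(sum(xs) / len(members) for xs in zip(*members))
            if new_c != centroids[j]:
                centroids[j] = new_c
                moved.append(j)
        if not moved:
            break  # converged: no centroid changed
        for i, p in enumerate(points):
            for j in moved:  # refresh only the stale cache columns
                dist[i][j] = math.dist(p, centroids[j])
    return assign(), centroids
```

This variant returns the same clustering as plain k-means, because a cached distance is reused only when its centroid is unchanged; the saving grows as more clusters stabilize in later iterations.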

18.
Energy efficiency at the software level has gained much attention in the past decade. This paper presents a performance-aware frequency assignment algorithm for reducing processor energy consumption using Dynamic Voltage and Frequency Scaling (DVFS). Existing energy-saving techniques often rely on simplified predictions or domain knowledge to extract energy savings for specialized software (such as multimedia or mobile applications) or hardware (such as NPUs or sensor nodes). We present an innovative framework, known as EClass, for general-purpose DVFS processors that efficiently recognizes short and repetitive utilization patterns using machine learning. Our algorithm is lightweight and can save up to 52.9% of the energy consumption compared with the classical PAST algorithm. It achieves an average saving of 9.1% compared with an existing online learning algorithm that also uses only the statistics from the current execution. We have simulated the algorithms on a cycle-accurate power simulator. Experimental results show that EClass can effectively save energy for real-life applications that exhibit mixed CPU utilization patterns during execution. Our research challenges an assumption in previous work that a simple and efficient heuristic should be used to adjust the processor frequency online. Our empirical results show that an advanced algorithm such as machine learning not only compensates for the energy needed to run it, but also outperforms prior techniques based on that assumption.

19.
This article introduces the use of a multi-instance genetic programming algorithm for modelling user preferences in web index recommendation systems. The developed algorithm learns user interests by means of rules, which add comprehensibility and clarity to the discovered models and increase the quality of the recommendations. This new model, called the G3P-MI algorithm, is evaluated and compared with other available algorithms. Computational experiments show that our methodology achieves competitive results and provides high-quality user models that improve the accuracy of recommendations.

20.
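To improve the efficiency and flexibility of building web directories, this paper proposes an improved text clustering algorithm. An improved CBC algorithm is used to determine text cluster centers quickly; given the characteristics of web directories, the algorithm adds a hierarchical clustering method to form a hierarchy of text categories; and, considering the rapid growth of web text, it clusters new web pages incrementally. Applied to a collection of web page texts, the algorithm produces meaningful clusters; compared with the K-Means algorithm, it achieves higher precision with good time performance. The experimental results demonstrate the effectiveness of the algorithm.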
