1.

朱靖君吴海燕高国柱程志锐《计算机工程》2010,36(23):25-27

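This paper proposes a Web load-testing method based on log analysis. Sequential pattern mining yields the frequent sequences of user accesses, and log analysis yields the load parameters; together these are used to construct a test load that approximates the real one. The Web application is then load-tested with the performance testing tool LoadRunner. Comparing the test logs with the real logs confirms that the test load resembles the real load.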

2.

Research on a Web Access Sequential Pattern Mining Algorithm

李陶深王伟娜陈庆峰《计算机科学》2013,40(12):41-44

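To address the problems of existing Web access sequential pattern mining algorithms and of the PrefixSpan algorithm, this paper proposes a projection-position-based Web access sequential pattern mining algorithm (PWSPM). Through sequential pattern analysis, the algorithm discovers users' behavior patterns and predicts their page access patterns, which in turn helps improve site performance and organization, raise the quality and efficiency of users' information search, and deliver personalized information services. Experimental and application results show that the proposed algorithm runs more efficiently, is suitable for Web log mining, and can be used to build intelligent Web sites and to address personalized information services.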

3.

A Web Usage Mining Algorithm Based on Sequence Numbers

方刚《计算机系统应用》2010,19(12):100-104

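Exploiting the fact that page attributes in the session patterns of Web server logs are Boolean, this paper proposes a Web usage mining algorithm based on sequence numbers. The algorithm converts user session patterns into binary numbers and then searches for candidate frequent itemsets by incrementing those numbers. It computes support counts from the dimension of the sequence numbers, so the user session patterns are scanned only once, which effectively improves the efficiency of Web usage mining. Experiments show that it is faster and more efficient than existing algorithms.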
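
The binary encoding described in this abstract can be illustrated with a small sketch. This is not the paper's algorithm: it only shows the general idea of representing each session as an integer bitmask and counting support with a bitwise test, and it enumerates candidates by brute force without any pruning. The names `encode_session`, `support`, and `frequent_itemsets` are hypothetical.

```python
def encode_session(pages, page_index):
    """Encode a session (a set of visited pages) as an integer bitmask.

    page_index maps each page name to a bit position (an assumption of this sketch).
    """
    mask = 0
    for page in pages:
        mask |= 1 << page_index[page]
    return mask


def support(candidate, session_masks):
    """Count the sessions that contain every page of the candidate bitmask."""
    return sum(1 for m in session_masks if m & candidate == candidate)


def frequent_itemsets(session_masks, n_pages, min_support):
    """Enumerate candidates by counting upward from 1 to 2**n_pages - 1.

    Brute force for illustration only; it is exponential in n_pages.
    """
    result = {}
    for candidate in range(1, 1 << n_pages):
        count = support(candidate, session_masks)
        if count >= min_support:
            result[candidate] = count
    return result
```

With pages A, B, C mapped to bits 0, 1, 2, the session {A, B} becomes the number 3 (binary 011), and a candidate is contained in a session exactly when `session & candidate == candidate`.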

4.

Implementation of an Association Rule Mining Module Based on Web Log Files

米娜瓦尔·努拉合买提玛依拉·别克强塔依娃张太红曾明 Osmar. R. Zaiane 《微机发展》2011,(9):51-54

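Building on a systematic study of the basic steps of Web usage mining, this paper designs an association rule mining module based on Web log files. The system should be able to mine the access records left on the server when users visit the Web and derive users' access patterns and interests. To identify users' browsing patterns, a module was implemented that applies the association rule mining algorithm Apriori to the user session files produced in the preprocessing stage of Web usage mining. For a set of pages selected by the user, the module generates strong association rules between pages that satisfy the minimum support and minimum confidence, and displays the mining results as text.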
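
As a rough illustration of what mining strong rules from session files involves, the following is a compact, generic Apriori sketch. It assumes each session is simply the collection of pages visited, uses relative support, and is not the module described in the abstract; `apriori` and `strong_rules` are hypothetical names.

```python
from itertools import combinations


def apriori(sessions, min_support):
    """Return {itemset: relative support} for all frequent page sets."""
    sessions = [frozenset(s) for s in sessions]
    n = len(sessions)
    current = [frozenset([p]) for p in {p for s in sessions for p in s}]
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for cand in current:
            sup = sum(1 for s in sessions if cand <= s) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(sub) in level for sub in combinations(c, k - 1))]
    return frequent


def strong_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]  # lhs is frequent by downward closure
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules
```

For example, with sessions `[["a", "b"], ["a", "b", "c"], ["a", "c"], ["b"]]`, a minimum support of 0.5 and a minimum confidence of 0.8, the only strong rule is c → a: every session containing c also contains a.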

5.

An Immune Network Clustering Algorithm for Web Log Mining

吕佳《计算机科学》2007,34(4):204-206

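Web logs implicitly record the behavior and characteristics of users visiting a Web site. Cluster analysis of these logs can reveal users' browsing patterns and their preferences and interests, which helps optimize site structure and deliver personalized services. Based on the characteristics of Web log data, this paper proposes an immune network clustering algorithm. The algorithm treats the Web server as a biological organism and users' Web request sequences as invading antigens to be detected; by simulating the biological mechanism through which antibodies learn antigens, it automatically generates memory antibodies that represent user access patterns, achieving dynamic clustering.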

6.

A Web Prefetching Algorithm Based on User Access Patterns

张晓丽壮志剑史明《计算机工程与设计》2009,30(22)

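Reducing user-perceived latency in Web access is an important problem in Web applications: the server needs to predict users' future HTTP requests while processing the current page in order to improve its response speed. To this end, a Web prefetching algorithm based on user access patterns is proposed. The algorithm analyzes user access patterns from Web log information and computes the transition probabilities between Web pages, which serve as the basis for prefetching users' future requests. Experimental results show that the prefetching algorithm effectively improves prediction accuracy and hit rate and shortens user-perceived latency.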
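
The page-to-page transition probabilities mentioned in this abstract can be estimated from sessionized logs in a few lines. The sketch below assumes a first-order model (the next page depends only on the current one) and a fixed probability threshold for prefetching; both are assumptions of this example, not details taken from the paper, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next page | current page) from ordered page sequences."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {page: c / total for page, c in followers.items()}
    return probs


def prefetch_candidates(probs, current_page, threshold=0.5):
    """Pages worth prefetching after current_page, most likely first."""
    followers = probs.get(current_page, {})
    likely = [page for page, p in followers.items() if p >= threshold]
    return sorted(likely, key=lambda page: -followers[page])
```

Given the sessions `["/", "/a", "/b"]`, `["/", "/a"]` and `["/", "/c"]`, the estimate for moving from `/` to `/a` is 2/3, so `/a` is the only page prefetched after `/` at the default threshold.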

7.

Research on Web Log Mining Technology Based on the MFP Algorithm

张友志钱萌程玉胜《电脑与信息技术》2006,14(2):60-62

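To organize the structure of a Web server more sensibly, users' access patterns need to be analyzed through Web log mining. Data preprocessing and the log mining algorithm are the key techniques in Web log mining. This paper studies both in depth and, given known user access paths, proposes a log mining algorithm based on the MFP algorithm, illustrating its execution with a worked example.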

8.

Interleaved User Session Identification Based on the m-Markov Model

黄浩李兵姜丹《计算机科学》2012,39(Z3)

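Web access log data is a data set formed by individual user clicks, each of which is independent; the task of session identification is to partition these independent clicks into meaningful session segments. General session identification algorithms cannot successfully identify sessions in Web access log data that contains interleaved sessions. An adaptive m-Markov model can identify and reconstruct interleaved server sessions in Web access log data, and, on top of the m-Markov model, different session-end criteria are combined and compared in terms of session identification accuracy. Experimental results show that combining the m-Markov model with a session-end algorithm based on a reward-and-punishment strategy significantly improves the accuracy of session identification and reconstruction.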

9.

Research on Obtaining a 3G WAP Subnet in Web Log Mining

鲍钰《计算机工程与应用》2009,45(18):132-134

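With the arrival of the 3G era, mobile Internet access has become increasingly common. Because of small phone screens and limited bandwidth, mobile visitors need a WAP subnet that retains only the main branches of the original Web site. Users' access paths on the WWW are recorded in Web server logs; analyzing these logs and mining users' main behavior patterns makes it possible to extract the frequently visited backbone of a Web site. The original log sequences are first converted into a user access path session set, UVPSD, and a constrained weighted Web site structure graph, WWSSG, is then used to discover the frequent backbone subnet of the site. The algorithm was used to extract a 3G WAP subnet from the Shanghai community network (上海社区网), and experimental data show that this subnet covers most of that site's popular section pages.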

10.

Filter-Based Web Access Pattern Mining

佟强周园春吴开超阎保平《计算机工程》2007,33(6):59-61

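To address the complexity and inaccuracy of user identification and session identification in traditional Web access pattern mining systems, this paper proposes a filter-based Web access pattern mining system. It identifies users and sessions accurately and provides high-quality data for the mining algorithm. The implementation and deployment of the log filter are described, and a mining algorithm for Web access patterns is proposed. The method is already widely used in scientific database systems.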

11.

Applying Concept Analysis to User-Session-Based Testing of Web Applications

Sampath S. Sprenkle S. Gibson E. Pollock L. Greenwald A.S. 《IEEE Transactions on Software Engineering》2007,33(10):643-658

The continuous use of the Web for daily operations by businesses, consumers, and the government has created a great demand for reliable Web applications. One promising approach to testing the functionality of Web applications leverages the user-session data collected by Web servers. User-session-based testing automatically generates test cases based on real user profiles. The key contribution of this paper is the application of concept analysis for clustering user sessions and a set of heuristics for test case selection. Existing incremental concept analysis algorithms are exploited to avoid collecting and maintaining large user-session data sets and to thus provide scalability. We have completely automated the process from user session collection and test suite reduction through test case replay. Our incremental test suite update algorithm, coupled with our experimental study, indicates that concept analysis provides a promising means for incrementally updating reduced test suites in response to newly captured user sessions with little loss in fault detection capability and program coverage.

12.

Parametric calibration of speed-density relationships in mesoscopic traffic simulator with data mining

Zhu Jiang Yong-Xuan Huang 《Information Sciences》2009,179(12):2002-5052

Speed-density relationships are used by mesoscopic traffic simulators to represent traffic dynamics. While classical speed-density relationships provide useful insights into the traffic dynamics problem, they may be restrictive for such applications. This paper addresses the problem of calibrating speed-density relationship parameters using data mining techniques, and proposes a novel hierarchical clustering algorithm based on K-means clustering. By combining K-means with agglomerative hierarchical clustering, the proposed algorithm reduces the early-stage errors inherent in agglomerative hierarchical clustering, resulting in improved clustering performance. Moreover, in order to improve the precision of parametric calibration, densities and flows are utilized as variables. The proposed approach is tested against sensor data captured from the 3rd Ring Road of Beijing. The testing results show that the performance of our algorithm is better than existing solutions.

13.

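Generation and Optimization of User-Session-Oriented Test Cases for Web Applications (Total citations: 2; self-citations: 0; citations by others: 2)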

钱忠胜缪淮扣《计算机科学与探索》2008,2(6):627-640

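Testing is an effective means of ensuring the high quality and reliability of Web applications. However, because of their particularity and complexity, traditional testing theories and methods are hard to apply directly to Web applications; a key problem is the generation and optimization of test cases. This paper proposes a method that applies a genetic algorithm to the generation and optimization of user-session-based test cases for Web applications. By analyzing the server's user logs and removing irrelevant data, a large number of meaningful user sessions are obtained, and reduction techniques are used to further eliminate redundancy among them. To facilitate test reuse and concurrent execution, the user sessions are divided into groups, each called a test suite, and an initial prioritization is performed both among test suites and within each suite (among test cases). This yields the initial test suites and test cases together with their initial execution order. Because this initial test plan is still not very close to the optimal solution, a genetic algorithm is used to further optimize the grouping and prioritization. A method that uses the crossover operator to generate new test cases is also proposed; the new test cases can detect errors caused by conflicts that may arise when different users share data.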

14.

Extraction and Analysis of User Browsing Behavior Based on Sequential Patterns

车高营张磊张禄旭《计算机技术与发展》2012,(9):9-12,17

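The Internet now offers more and more functions and services, Web content is increasingly rich, and mobile applications are increasingly popular. However, complex Web service applications place higher demands on users and create many browsing problems, often leaving users at a loss. This paper proposes a method for extracting and analyzing user behavior based on users' browsing sequence patterns. The method consists of two parts: browsing pattern analysis and user clustering. In browsing pattern analysis, browsing sequences are first derived from user behavior data; the sequential pattern mining algorithm PrefixSpan is then applied to obtain users' habitual browsing patterns; finally, the patterns are applied to Web browsing to provide personalized services for different user needs. In user clustering, hierarchical clustering groups users by the similarity of their browsing patterns in order to analyze how user attributes (such as age, occupation, and education) affect browsing patterns. Experimental results show that the PrefixSpan algorithm and the hierarchical clustering method used in the paper are feasible and effective for analyzing and studying user browsing patterns.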
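
For readers unfamiliar with PrefixSpan, the following minimal sketch shows its core idea: grow a pattern one item at a time and recurse only on the projected suffixes that follow the pattern's first occurrence. It assumes each browsing sequence is a list of single page identifiers (no itemsets within an element) and an absolute support count; it is a simplified illustration, not the paper's implementation.

```python
def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for all frequent sequential patterns."""
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs, one per
        # sequence that still contains the current prefix.
        counts = {}
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, sup in sorted(counts.items()):
            if sup < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, sup))
            # Project each sequence onto the suffix after the first match of item.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(pattern, new_projected)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results
```

On the sequences `["a", "b", "c"]`, `["a", "c"]` and `["b", "c"]` with a minimum support of 2, the frequent patterns are a, a→c, b, b→c and c.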

15.

Automatic Spectral Clustering of User Sessions Based on Global Sequence Alignment Similarity

姜大庆周勇《计算机科学》2012,39(11):142-144

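Existing user session clustering algorithms in personalized recommendation systems suffer from inaccurate similarity measures and require the number of clusters to be fixed in advance. To address this, the ordered pages visited by a user and the corresponding access times are integrated, and a global sequence alignment method based on dynamic programming is proposed to measure the similarity of user sessions. On this basis, an improved NJW spectral clustering algorithm is used to cluster user sessions automatically. Experimental results show that the algorithm takes full account of both the global characteristics and the local information of user sessions, achieves better clustering performance than related alignment algorithms, and can improve the efficiency of a Web site's personalized recommendation service.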
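
Global sequence alignment by dynamic programming is commonly done with the Needleman-Wunsch recurrence. The sketch below scores two sessions as page sequences only; it ignores the access-time information that the paper integrates, and the match, mismatch, and gap scores are arbitrary assumptions of this example.

```python
def global_alignment_score(s1, s2, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of the best global alignment of two page sequences."""
    rows, cols = len(s1) + 1, len(s2) + 1
    dp = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty sequence costs one gap per page.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if s1[i - 1] == s2[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + pair,  # align the two pages
                           dp[i - 1][j] + gap,       # skip a page of s1
                           dp[i][j - 1] + gap)       # skip a page of s2
    return dp[-1][-1]
```

Two identical three-page sessions score 3, while `["a", "b", "c"]` against `["a", "c"]` scores 1 (two matches and one gap). A higher score means more similar sessions, and such pairwise scores could feed the similarity matrix that a spectral clustering step consumes.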

16.

Research on a K-Means-Based Hierarchical Text Clustering Algorithm (Total citations: 6; self-citations: 0; citations by others: 6)

尉景辉何丕廉孙越恒《计算机应用》2005,25(10):2323-2324

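A hierarchical text clustering algorithm based on K-Means is proposed. It combines the characteristics of agglomerative hierarchical clustering and the K-Means algorithm, reducing the errors that the agglomerative method makes during merging and improving clustering quality. Experimental results show that the clustering quality of the algorithm is better than that of hierarchical clustering.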

17.

Clustering Structured Deep Web Sources by Query Schema

陈娟王贤黄青松《现代计算机》2006,(9):19-21,62

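In recent years the Web has been rapidly deepened by online databases. On the deep Web, a large number of sources provide rich data schemas that specify their target domains and query capabilities, so integrating data at this scale is a current challenge. Cluster analysis is an important method in data mining. This paper discusses clustering structured Web sources through their query interfaces using agglomerative hierarchical clustering, with a slight improvement that clusters first and then classifies. Experiments show that, for clustering Web query schemas, agglomerative hierarchical clustering organizes the sources correctly.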

18.

Shortest-linkage-based parallel hierarchical clustering on main-belt moving objects of the solar system

《Future Generation Computer Systems》2014

Data clustering is an important data preparation process in many scientific analyses. In astronomy, although distributed environments and modern observation techniques enable users to collect and access huge amounts of data, the corresponding clustering process may become very costly. One of the challenges is that sequential clustering algorithms, which can be applied to cluster hundreds of thousands of main-belt asteroids to reason about their origins, cannot be used directly in a distributed environment. Therefore, this study focuses on the problem of parallelizing the traditional hierarchical agglomerative clustering algorithm using shortest-linkage. We propose a new parallel hierarchical agglomerative clustering algorithm based on the master–worker model. The master process divides the whole computation into several small tasks, and distributes the tasks to the worker processes for parallel processing. Then, the master process merges the results from the worker processes to form a hierarchical data structure. The proposed algorithm uses a pruning threshold to reduce the execution time and the storage requirement during the computation. It also supports fast incremental update that merges new data items into a constructed hierarchical tree in seconds, given a tree of about 550,000 data items. To evaluate the performance of our algorithm, this study has conducted several experiments using the MPCORB dataset and a dataset from the DVO database. The results confirm the efficiency of our proposed methodology. Compared with prior similar studies, the proposed algorithm is more flexible and practical in the problem of distributed hierarchical agglomerative clustering.
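
For context, the sequential baseline that this abstract sets out to parallelize can be sketched as follows. This is plain single-process shortest-linkage (single-linkage) agglomerative clustering, not the master-worker algorithm of the paper; using `threshold` as a stopping distance is an assumption of this sketch and is not claimed to match the paper's pruning threshold.

```python
def single_linkage(points, distance, threshold):
    """Merge the two closest clusters repeatedly until the closest pair
    is farther apart than threshold. Cluster distance is the minimum
    pairwise distance between members (shortest linkage)."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None  # (distance, index i, index j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

With one-dimensional points `[1, 2, 10, 11, 30]`, absolute difference as the distance, and a threshold of 3, the result is three clusters: `[1, 2]`, `[10, 11]` and `[30]`. The nested loops recompute every pairwise distance on each pass, which is the cost that makes the naive version impractical for hundreds of thousands of objects.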

19.

Web user clustering and Web prefetching using Random Indexing with weight functions (Total citations: 1; self-citations: 1; citations by others: 0)

Miao Wan Arne Jönsson Cong Wang Lixiang Li Yixian Yang 《Knowledge and Information Systems》2011,33(1):89-115

Users of a Web site usually perform their interest-oriented actions by clicking or visiting Web pages, which are traced in access log files. Clustering Web user access patterns may capture common user interests to a Web site, and in turn, build user profiles for advanced Web applications, such as Web caching and prefetching. The conventional Web usage mining techniques for clustering Web user sessions can discover usage patterns directly, but cannot identify the latent factors or hidden relationships among users' navigational behaviour. In this paper, we propose an approach based on a vector space model, called Random Indexing, to discover such intrinsic characteristics of Web users' activities. The underlying factors are then utilised for clustering individual user navigational patterns and creating common user profiles. The clustering results will be used to predict and prefetch Web requests for grouped users. We demonstrate the usability and superiority of the proposed Web user clustering approach through experiments on a real Web log file. The clustering and prefetching tasks are evaluated by comparison with previous studies demonstrating better clustering performance and higher prefetching accuracy.

20.

Sequence-based clustering for Web usage mining: A new experimental framework and ANN-enhanced K-means algorithm (Total citations: 1; self-citations: 0; citations by others: 1)

Sungjune Nallan C. Bong-Keun 《Data & Knowledge Engineering》2008,65(3):512-543

We develop a general sequence-based clustering method by proposing new sequence representation schemes in association with Markov models. The resulting sequence representations allow for calculation of vector-based distances (dissimilarities) between Web user sessions and thus can be used as inputs of various clustering algorithms. We develop an evaluation framework in which the performances of the algorithms are compared in terms of whether the clusters (groups of Web users who follow the same Markov process) are correctly identified using a replicated clustering approach. A series of experiments is conducted to investigate whether clustering performance is affected by different sequence representations and different distance measures as well as by other factors such as number of actual Web user clusters, number of Web pages, similarity between clusters, minimum session length, number of user sessions, and number of clusters to form. A new, fuzzy ART-enhanced K-means algorithm is also developed and its superior performance is demonstrated.