共查询到20条相似文献,搜索用时 265 毫秒
1.
2.
3.
针对Web服务器日志中会话模式的页面属性为布尔量的特点,提出一种基于序列数的Web使用挖掘算法。该算法将用户会话模式转换成二进制数,然后用数字递增方式搜索候选频繁项;算法通过序列数的维来计算支持数,实现一次扫描用户会话模式,有效地提高了Web使用挖掘的效率。实验表明其效率比现有算法更快速而有效。 相似文献
4.
在对Web应用挖掘的基本步骤作系统性研究的基础上,设计了一个基于Web日志文件的关联规则挖掘模块。该系统应能够对用户访问Web时服务器方留下的访问记录进行挖掘,从中得出用户的访问模式和访问兴趣。为了识别用户浏览模式,实现了利用关联规则挖掘算法Apriori对Web应用挖掘过程中预处理阶段所产生的用户会话文件进行挖掘的模块,该模块针对用户选定的若干页面产生满足最小支持度和最小置信度的页面之间的强关联规则,并以文本的形式显示挖掘的结果。 相似文献
5.
Web日志隐含了用户访问网站的行为和特点,对其进行聚类分析可以获取用户的浏览模式,发现用户访问网站的偏好和兴趣,从而优化站点结构,实现个性化的服务。针对Web日志数据特点,本文提出免疫网络聚类算法。该算法将Web服务器看成生物机体,用户访问Web的请求序列看成需要检测的入侵抗原,模拟抗体学习抗原的生物机理,自动生成代表用户访问模式的记忆抗体,实现动态聚类。 相似文献
6.
缩短Web访问中的用户感知时间,是Web应用中的一个重要问题,服务器需要预测用户未来的HTTP请求和处理当前的网页以提高Web服务器的响应速度,为此提出了一种基于用户访问模式的Web预取算法.该算法根据Web日志信息分析了用户的访问模式,并计算出Web页面间的转移概率,以此作为对用户未来请求预取的依据.实验结果表明,该预取算法能有效提高预测精度和命中率,有效地缩短了用户的感知时间. 相似文献
7.
8.
9.
鲍钰 《计算机工程与应用》2009,45(18):132-134
随着3G时代的到来,手机上网已逐步普及,由于手机屏幕较小及上网带宽限制,需要为手机访问者提供只需保留原Web站点主干分支的WAP子网。WWW上用户的访问路径信息会被记录在Web服务器的日志记录中,分析这些日志并挖掘出用户的主要行为模式,可以提取出Web网站被频繁访问的主干部分。首先将原始日志序列转化成用户访问路径会话集UVPSD,然后通过约束的加权网站结构图WWSSG,最终实现了此Web站点的频繁主干子网的发现。在上海社区网上采用此算法提取出的3G WAP子网,实验数据表明,该子网覆盖了上海社区网的大部分热门栏目页面。 相似文献
10.
11.
Sampath S. Sprenkle S. Gibson E. Pollock L. Greenwald A.S. 《IEEE transactions on pattern analysis and machine intelligence》2007,33(10):643-658
The continuous use of the Web for daily operations by businesses, consumers, and the government has created a great demand for reliable Web applications. One promising approach to testing the functionality of Web applications leverages the user-session data collected by Web servers. User-session-based testing automatically generates test cases based on real user profiles. The key contribution of this paper is the application of concept analysis for clustering user sessions and a set of heuristics for test case selection. Existing incremental concept analysis algorithms are exploited to avoid collecting and maintaining large user-session data sets and to thus provide scalability. We have completely automated the process from user session collection and test suite reduction through test case replay. Our incremental test suite update algorithm, coupled with our experimental study, indicates that concept analysis provides a promising means for incrementally updating reduced test suites in response to newly captured user sessions with little loss in fault detection capability and program coverage. 相似文献
12.
Speed-density relationships are used by mesoscopic traffic simulators to represent traffic dynamics. While classical speed-density relationships provide useful insights into the traffic dynamics problem, they may be restrictive for such applications. This paper addresses the problem of calibrating speed-density relationship parameters using data mining techniques, and proposes a novel hierarchical clustering algorithm based on K-means clustering. By combining K-means with agglomerative hierarchical clustering, the proposed new algorithm is able to reduce early-stage errors inherent in agglomerative hierarchical clustering resulted in improved clustering performance. Moreover, in order to improve the precision of parametric calibration, densities and flows are utilized as variables. The proposed approach is tested against sensor data captured from the 3rd Ring Road of Beijing. The testing results show that the performance of our algorithm is better than existing solutions. 相似文献
13.
测试是保证Web应用的高质量、高可靠性的一种有效手段,然而,由于其特殊性和复杂性,使得传统的测试理论与方法很难直接运用到Web应用的测试中,一个关键的问题就是测试用例的生成及其优化。提出了一种将遗传算法用于基于用户会话的Web应用测试用例生成及其优化的方法。通过分析服务器的用户日志,清除无关的数据,得到大量有意义的用户会话,利用约简技术进一步剔除其中的冗余。为便于测试的重用和并发执行,将用户会话进行合理的分组,每一组称为一个测试套件,并在测试套件之间以及测试套件内部(测试用例之间)进行初步的优先排序。这样就得到了初始的测试套件和测试用例,以及它们的初始执行顺序。这种初始的测试方案离最优解的近似程度还不是很高,需进一步利用遗传算法对它们进行分组优化并优先排序。同时提出了一种利用交叉算子产生新的测试用例的方法,新的测试用例可以检测不同用户共享数据时可能带来的冲突而产生的错误。 相似文献
14.
当今互联网所提供的功能和服务越来越多,Web内容也越来越丰富,移动应用越来越流行。然而,复杂的Web服务应用对用户提出了更高的要求,给用户浏览带来了很多问题,很多时候用户会感到无所适从。文中提出基于用户浏览序列模式的用户行为提取与分析方法。该方法可以分为浏览模式分析和用户聚类两部分。在浏览模式分析时,首先根据用户行为数据得到浏览序列,然后运用序列模式挖掘PrefixSpan算法获取用户习惯的浏览模式,最后把分析获取的用户浏览模式应用到Web浏览中,为不同的用户需求提供个性化的服务。在用户聚类时,运用层次聚类方法按照浏览模式的相似性对用户进行聚类,以分析用户的不同属性(如年龄、职业、学历等)对用户浏览模式的影响。实验结果表明,文中采用的PrefixSpan算法和层次聚类方法在用户浏览模式分析和研究方面具有很好的可行性和有效性。 相似文献
15.
针对现有个性化推荐服务系统中用户会话聚类算法存在相似性度量准确性低和需要事先确定聚类数目的问
题,对序化的用户访问页面和对应的访问时间信息进行整合,提出一种基于动态规划算法的全序列比对方法来度量用
户会话的相似性。在此基础上,运用改进的NJ W谱聚类算法对用户会话进行自动谱聚类。实验结果表明,算法充分
考虑了用户会话的整体特征和局部信息,较相关比对算法具有更高的聚类性能,可以提高网站个性化推荐服务的效
率。 相似文献
16.
17.
18.
Data clustering is an important data preparation process in many scientific analysis researches. In astronomy, although the distributed environments and modern observation techniques enable users to collect and access huge amounts of data, the corresponding clustering process may become very costly. One of the challenges is that the sequential clustering algorithms, that can be applied to cluster hundreds of thousand main-belt asteroids to reason about the origins of the main-belt asteroids, may not be used in the distributed environment directly. Therefore, this study focuses on the problem of parallelizing the traditional hierarchical agglomerative clustering algorithm using shortest-linkage. We propose a new parallel hierarchical agglomerative clustering algorithm based on the master–worker model. The master process divides the whole computation into several small tasks, and distributes the tasks to the worker processes for parallel processing. Then, the master process merges the results from the worker processes to form a hierarchical data structure. The proposed algorithm uses a pruning threshold to reduce the execution time and the storage requirement during the computation. It also supports fast incremental update that merges new data items into a constructed hierarchical tree in seconds, given a tree of about 550,000 data items. To evaluate the performance of our algorithm, this study has conducted several experiments using the MPCORB dataset and a dataset from the DVO database. The results confirm the efficiency of our proposed methodology. Compared with prior similar studies, the proposed algorithm is more flexible and practical in the problem of distributed hierarchical agglomerative clustering. 相似文献
19.
Miao Wan Arne J?nsson Cong Wang Lixiang Li Yixian Yang 《Knowledge and Information Systems》2011,33(1):89-115
Users of a Web site usually perform their interest-oriented actions by clicking or visiting Web pages, which are traced in access log files. Clustering Web user access patterns may capture common user interests to a Web site, and in turn, build user profiles for advanced Web applications, such as Web caching and prefetching. The conventional Web usage mining techniques for clustering Web user sessions can discover usage patterns directly, but cannot identify the latent factors or hidden relationships among users?? navigational behaviour. In this paper, we propose an approach based on a vector space model, called Random Indexing, to discover such intrinsic characteristics of Web users?? activities. The underlying factors are then utilised for clustering individual user navigational patterns and creating common user profiles. The clustering results will be used to predict and prefetch Web requests for grouped users. We demonstrate the usability and superiority of the proposed Web user clustering approach through experiments on a real Web log file. The clustering and prefetching tasks are evaluated by comparison with previous studies demonstrating better clustering performance and higher prefetching accuracy. 相似文献
20.
Sequence-based clustering for Web usage mining: A new experimental framework and ANN-enhanced K-means algorithm 总被引:1,自引:0,他引:1
We develop a general sequence-based clustering method by proposing new sequence representation schemes in association with Markov models. The resulting sequence representations allow for calculation of vector-based distances (dissimilarities) between Web user sessions and thus can be used as inputs of various clustering algorithms. We develop an evaluation framework in which the performances of the algorithms are compared in terms of whether the clusters (groups of Web users who follow the same Markov process) are correctly identified using a replicated clustering approach. A series of experiments is conducted to investigate whether clustering performance is affected by different sequence representations and different distance measures as well as by other factors such as number of actual Web user clusters, number of Web pages, similarity between clusters, minimum session length, number of user sessions, and number of clusters to form. A new, fuzzy ART-enhanced K-means algorithm is also developed and its superior performance is demonstrated. 相似文献