1.

Event log imperfection patterns for process mining: Towards a systematic approach to cleaning event logs

《Information Systems》2017

2.

熊熙《网络安全技术与应用》2010,(6):61-64

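With the rapid spread and widespread use of the Internet, both the volume of online information resources and the complexity of website design are growing sharply. Providing personalized services tailored to user characteristics has become one of the hot topics in computing research. This paper first outlines the concepts and concrete process of Web log mining, then focuses on its key techniques. Finally, it combines a user-group clustering algorithm with a Web page clustering algorithm to mine user access patterns, and analyzes the application and future direction of personalized services.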

3.

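A session identification method based on a dynamic time threshold (total citations: 2, self-citations: 1, citations by others: 2)

戴智丽 王鑫昱 《计算机应用与软件》2010,27(2):244-246

Session identification is a key step in Web log mining, and its quality directly affects the accuracy of subsequent mining. Building on the fixed time threshold of the Timeout method, the paper proposes a dynamic time threshold: thresholds for different time periods are derived by analyzing sample logs. When a log file is processed, the threshold is chosen according to the access time of the record that starts the current session. Experiments show that the method identifies sessions with noticeably higher quality than the Timeout method.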
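The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three time bands and their threshold values are hypothetical placeholders (the paper derives its thresholds from sample logs), and the sketch assumes the gap-based variant of the Timeout method, in which a new session starts when the gap between consecutive requests exceeds the threshold. The function names `threshold_for` and `split_sessions` are invented for illustration.

```python
from datetime import datetime, timedelta

def threshold_for(session_start: datetime) -> timedelta:
    """Pick the timeout from the hour at which the current session started.

    The bands and values below are hypothetical; in the paper they would
    come from an analysis of sample logs.
    """
    if session_start.hour < 8:
        return timedelta(minutes=10)
    if session_start.hour < 18:
        return timedelta(minutes=30)
    return timedelta(minutes=20)

def split_sessions(timestamps):
    """Split one user's chronologically sorted request times into sessions."""
    sessions = []
    for ts in timestamps:
        if sessions:
            current = sessions[-1]
            gap = ts - current[-1]
            # The threshold depends on when the current session began.
            if gap <= threshold_for(current[0]):
                current.append(ts)
                continue
        sessions.append([ts])
    return sessions

requests = [
    datetime(2010, 1, 1, 9, 0),
    datetime(2010, 1, 1, 9, 20),
    datetime(2010, 1, 1, 10, 5),
    datetime(2010, 1, 1, 23, 0),
    datetime(2010, 1, 1, 23, 15),
    datetime(2010, 1, 1, 23, 40),
]
session_lengths = [len(s) for s in split_sessions(requests)]
```

With a fixed 30-minute timeout the last three requests would form a single session; with the hypothetical 20-minute evening threshold the 25-minute gap before 23:40 opens a new one.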

4.

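An ILP method for mining log ontologies by extending AL-log

孙明 陈波 刘东 周明天 《计算机应用研究》2009,26(6):2328-2331

To discover the user access patterns contained in Web usage records, the paper analyzes in depth the abstract relations among events in a log ontology and proposes an ALC propagation rule suited to reasoning about part-whole relations between atomic and composite events, extending the existing reasoning mode. On this basis it proposes an ILP method for mining log ontologies. Exploiting the complementary strengths of description logic and Horn rules in knowledge representation and reasoning, the method builds the knowledge base with the AL-log hybrid system and uses constrained SLD-refutation together with the extended ALC propagation rule to learn user access patterns from the log ontology, in support of site business intelligence and personalization. An example validating the method is given, and the experimental results indicate that it is feasible and effective.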

5.

Using incremental Web log mining to create adaptive web servers

Tapan Kamdar Anupam Joshi 《International Journal on Digital Libraries》2005,5(2):133-150

Personalization of content returned from a Web site is an important problem in general and affects e-commerce and e-services in particular. Targeting appropriate information or products to the end user can significantly change (for the better) the user experience on a Web site. One possible approach to Web personalization is to mine typical user profiles from the vast amount of historical data stored in access logs. We present a system that mines the logs to obtain profiles and uses them to automatically generate a Web page containing URLs the user might be interested in. The profiles are based only on the user's prior traversal patterns on the Web site; they do not require the user to provide any declarative information or to log in. Profiles are dynamic in nature. With time, a user's traversal pattern changes. To reflect these changes in the personalized page generated for the user, the profiles have to be regenerated, taking into account the existing profile. Instead of creating a new profile, we incrementally add and/or remove information from a user profile, aiming to save time as well as physical memory.

6.

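Research on Web log mining based on data warehouse technology

席景科 张辰 谢红侠 《计算机工程与设计》2007,28(24):5890-5892

Web log mining is currently a focus of Web mining research. To address problems in Web log mining, the paper presents a Web log mining scheme based on data warehouse technology and discusses in some depth data preprocessing, data cube design, and the application of data mining techniques. Using the log of one Web site as an example, it describes in detail Web log data preprocessing, Web log cube design, and the implementation of the data mining algorithms, and builds a multidimensional Web log dataset that can effectively address difficulties in Web log analysis.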

7.

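Research on data preprocessing techniques in Web log mining (total citations: 2, self-citations: 0, citations by others: 2)

马瑞民 李向云 《计算机工程与设计》2007,28(10):2358-2360

Data preprocessing is the foundation of the whole Web log mining process. Because of client-side caching, earlier preprocessing approaches could identify transactions only after path completion had reconstructed the user's full access path. The paper proposes an algorithm that generates transactions directly from the user's access sequence, relying only on the site's topology and requiring no path completion.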

8.

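Evaluating access patterns in log ontologies with DL-safe rules

孙明 陈波 周明天 《计算机应用》2009,29(5):1401-1404

To obtain valid patterns from Web usage records, the paper combines a log ontology with application access rules, under the restriction of DL-safe rules, into a hybrid log knowledge base whose reasoning is decidable. On this basis it proposes a method for evaluating valid patterns from a set of candidate user access patterns. The method first applies ideas from inductive logic programming to run observation coverage tests on the candidate patterns and finds the frequent ones by computing pattern support; it then uses a semantic generality measure to improve the quality of pattern evaluation. It also uses the event taxonomy of the log ontology to prune redundant access patterns, which improves evaluation efficiency. Experimental results indicate that the method is effective and feasible.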

9.

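Mining class association rules from Web logs (total citations: 1, self-citations: 0, citations by others: 1)

林文龙 刘业政 姜元春 焦宁 《计算机工程与应用》2007,43(31):123-125

User classification is an important task in Web access pattern mining. The paper proposes a method that classifies Web users with associative classification: Web log files are first preprocessed into a training transaction dataset, class association rules are then mined from that dataset, and a classifier is built from the mined rule set, so that users are classified according to their access history.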
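The pipeline in this abstract (transactions, then class association rules, then a rule-based classifier) can be sketched as follows. This is an illustrative toy, not the authors' algorithm: it assumes each training transaction is a set of visited pages paired with a known user class, enumerates antecedents of at most two pages by brute force, and classifies with the first matching rule in confidence order, a common choice in associative classification. The page names, class labels, and thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def mine_rules(transactions, min_support=0.3, min_confidence=0.6, max_len=2):
    """Mine rules 'page set -> class' and return them ordered best first."""
    n = len(transactions)
    itemset_counts = Counter()  # how many transactions contain the page set
    rule_counts = Counter()     # ... and also carry the given class label
    for pages, label in transactions:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(pages), k):
                itemset_counts[combo] += 1
                rule_counts[(combo, label)] += 1
    rules = []
    for (combo, label), count in rule_counts.items():
        support = count / n
        confidence = count / itemset_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((frozenset(combo), label, confidence, support))
    # Higher confidence first, then higher support, then shorter antecedent.
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0])))
    return rules

def classify(rules, pages, default=None):
    """Return the class of the first rule whose antecedent the user covers."""
    visited = set(pages)
    for antecedent, label, _confidence, _support in rules:
        if antecedent <= visited:
            return label
    return default

# Hypothetical training transactions produced by log preprocessing.
training = [
    ({"/cart", "/pay"}, "buyer"),
    ({"/cart", "/pay", "/home"}, "buyer"),
    ({"/home", "/news"}, "browser"),
    ({"/home", "/news", "/cart"}, "browser"),
]
rules = mine_rules(training)
```

Calling `classify(rules, {"/pay", "/home"})` applies the rules to a new user's access history; a user who matches no rule receives the `default` value.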

10.

Mining sequential patterns from data streams: a centroid approach (total citations: 1, self-citations: 0, citations by others: 1)

Alice Marascu Florent Masseglia 《Journal of Intelligent Information Systems》2006,27(3):291-307

In recent years, emerging applications have introduced new constraints for data mining methods. These constraints are typical of a new kind of data: data streams. In data stream processing, memory usage is restricted, new elements are generated continuously and have to be handled in linear time, no blocking operator can be used, and the data can be examined only once. To date, only a few methods have been proposed for mining sequential patterns in data streams. We argue that the main reason is the combinatorial explosion inherent in sequential pattern mining. In this paper, we propose an algorithm based on sequence alignment for mining approximate sequential patterns in Web usage data streams. To meet the one-scan constraint, a greedy clustering algorithm combined with an alignment method is proposed. We show that our proposal is able to extract relevant sequences with very low thresholds.

11.

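A Web mining tool architecture for shopping websites

王玉珍 《自动化与仪器仪表》2009,(2):42-44

Web mining is one of the new directions in data mining and has a very wide range of applications. The paper designs a Web data mining tool for shopping websites. The tool can uncover useful information about customer identification, customer acquisition, and customer retention, and using this information effectively can promote the growth of shopping websites.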

12.

Factors affecting response rates of the web survey: A systematic review (total citations: 2, self-citations: 0, citations by others: 2)

Weimiao Fan Zheng Yan 《Computers in human behavior》2010

The lower response rate in web surveys has been a major concern for survey researchers. The literature has sought to identify a wide variety of factors that affect response rates in web surveys. In this article, we developed a conceptual model of the web survey process and used the model to systematically review a wide variety of factors influencing the response rate in the stages of survey development, survey delivery, survey completion, and survey return. Practical suggestions and future research directions on how to increase the response rate are discussed.

13.

Web usage mining: extracting unexpected periods from web logs (total citations: 3, self-citations: 0, citations by others: 3)

F. Masseglia P. Poncelet M. Teisseire A. Marascu 《Data mining and knowledge discovery》2008,16(1):39-65

Existing Web usage mining techniques are currently based on an arbitrary division of the data (e.g. “one log per month”) or guided by presumed results (e.g. “what is the customers’ behaviour for the period of Christmas purchases?”). These approaches have two main drawbacks. First, they depend on the above-mentioned arbitrary organization of data. Second, they cannot automatically extract “seasonal peaks” from among the stored data. In this paper, we propose a specific data mining process (in particular, to extract frequent behaviour patterns) in order to reveal the densest periods automatically. From the whole set of possible combinations, our method extracts the frequent sequential patterns related to the extracted periods. A period is considered to be dense if it contains at least one frequent sequential pattern for the set of users connected to the website in that period. Our experiments show that the extracted periods are relevant and our approach is able to extract both frequent sequential patterns and the associated dense periods.

14.

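A survey of Web usage mining (total citations: 29, self-citations: 1, citations by others: 29)

郭岩 白硕 于满泉 《计算机科学》2005,32(1):1-7

Web usage mining helps us better understand the Web and the access patterns of Web users, which is critical to realizing the Web's full economic potential. In general, usage mining comprises three phases: data preprocessing, pattern discovery, and pattern analysis. Taking these three phases as its framework, the article covers the techniques and difficulties of data preprocessing, the methods and algorithms commonly used in Web usage mining, and its main applications.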

15.

Main problems and countermeasures in developing Web usage mining systems (total citations: 6, self-citations: 0, citations by others: 6)

张锋 常会友 《计算机科学》2003,30(6):129-132

With the rapid development of the WWW, Web Usage Mining, like Web Mining in general, has become a hot topic in academic and industrial circles. It is generally believed that Web Usage Mining comprises three tasks: preprocessing, knowledge discovery, and pattern analysis. Although Web Usage Mining still falls within the application of traditional data mining techniques, changes in the application environment and in the data being processed have given rise to new difficulties. This paper addresses these challenges in each of the three phases and introduces some proposed solutions.

16.

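Research on Web log mining based on the MFP algorithm

张友志 钱萌 程玉胜 《电脑与信息技术》2006,14(2):60-62

To organize the structure of a Web server more sensibly, user access patterns need to be analyzed through Web log mining. Data preprocessing and the log mining algorithm are the key techniques in Web log mining. The article studies both in depth, proposes a log mining algorithm based on the MFP algorithm that starts from known user access paths, and walks through its execution with an example.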

17.

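Optimization of a conditional causal mining algorithm for network log data

刘云 肖添 《计算机工程与科学》2021,43(9):1584-1590

Network operations collect large volumes of system log data, and pinpointing system faults accurately has become an important research direction. The paper proposes a conditional causal mining algorithm (CCMA): a set of time series is generated from the log messages; Fourier analysis and linear regression analysis are each used to remove the many irrelevant periodic time series; a causal inference algorithm then outputs a directed acyclic graph; and redundant relations are eliminated by examining the distribution of edges in the graph to produce the final result. Simulation results show that, compared with dependency mining...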

18.

Performance improvement of web caching in Web 2.0 via knowledge discovery

Carlos Guerrero Isaac Lera Carlos Juiz 《Journal of Systems and Software》2013

Web 2.0 systems are more unpredictable and customizable than traditional web applications. As a result, performance techniques such as web caching yield limited improvements. Our study was based on the hypothesis that the use of web caching in Web 2.0 applications, particularly in content aggregation systems, can be improved by adapting the content fragment designs. We proposed to base this adaptation on an analysis of the characterization parameters of the content elements and on the creation of a classification algorithm. This algorithm was implemented with decision trees created in an off-line knowledge discovery process. We also defined a framework to create and adapt fragments of web documents to reduce the user-perceived latency in web caches. The experimental results showed that, in comparison with other web cache schemes, our solution achieved a marked reduction in user-perceived latency, even with losses in cache hit ratio and in the overhead generated on the system.

19.

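Research and implementation of data preprocessing techniques in Web log mining

李甲林 《数字社区&智能家居》2009,(14)

Web log mining lets us discover latent usage regularities and patterns of Web users. Data preprocessing is a critical step in turning raw Web log data, which contains missing values, errors, and noise, into a reliable, complete, and accurate database of user access transactions. The article studies a preprocessing model for Web log mining in depth and applies it to real log data, with satisfactory results.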

20.

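Preference degree: a measure that accurately reflects Web browsing interest (total citations: 7, self-citations: 0, citations by others: 7)

邢东山 沈钧毅 《控制与决策》2004,19(3):307-310

Based on an analysis of how to accurately reflect Web browsing interest, the paper introduces the concept of preference degree and uses it to design a preferred-path mining algorithm based on a user browsing preference tree. A user browsing preference tree (PNT) is first built from Web logs; the PNT is then used to mine user browsing interest patterns and discover users' preferred browsing paths. The algorithm is broadly applicable to e-commerce.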