Similar literature
20 similar documents found.
1.
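Zhu Zhiguo (朱志国), Kong Liping (孔立平). Microcomputer Development (《微机发展》), 2008, 18(6): 228-232
With the deepening development of e-commerce, e-commerce sites must process large volumes of data every day, yet the important information contained in these data resources has so far not been fully mined and exploited. In an increasingly competitive e-commerce market, any information related to consumer behavior is extremely valuable to operators, so it is very important for enterprises to understand users' access patterns. This paper gives a definition and a complete model framework of Web usage mining, then describes and analyzes in detail the latest research progress on its main steps, including data collection, data preprocessing, pattern discovery, and pattern analysis. Finally, it compares the traditional e-commerce architecture model with one based on Web usage mining technology, and analyzes in depth the application of Web usage mining in e-commerce.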

2.
Users of web sites often do not know exactly which information they are looking for, nor what the site has to offer. The purpose of their interaction is not only to fulfill but also to articulate their information needs. In these cases users need to pass through a series of pages before they can use the information that will eventually answer their questions. Current systems that support navigation predict which pages are interesting for the users on the basis of commonalities in the contents or the usage of the pages. They do not take into account the order in which the pages must be visited. In this paper we propose a method to automatically divide the pages of a web site on the basis of user logs into sets of pages that correspond to navigation stages. The method searches for an optimal number of stages and assigns each page to a stage. The stages can be used in combination with the pages’ topics to give better recommendations or to structure or adapt the site. The resulting navigation structures guide the users step by step through the site, providing pages that match not only the topic of the user’s search but also the current stage of the navigation process.

3.
Mining Navigation Patterns Using a Sequence Alignment Method (total citations: 2; self-citations: 0; citations by others: 2)
In this article, a new method is illustrated for mining navigation patterns on a web site. Instead of clustering patterns by means of a Euclidean distance measure, in this approach users are partitioned into clusters using a non-Euclidean distance measure called the Sequence Alignment Method (SAM). This method partitions navigation patterns according to the order in which web pages are requested and handles the problem of clustering sequences of different lengths. The performance of the algorithm is compared with the results of a method based on Euclidean distance measures. SAM is validated by means of user-traffic data of two different web sites. Empirical results show that SAM identifies sequences with similar behavioral patterns not only with regard to content, but also considering the order of pages visited in a sequence.
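To make the idea of an order-sensitive distance between sessions of different lengths concrete, here is a minimal sketch. It uses a plain edit (Levenshtein) distance over page sequences as a stand-in; the abstract does not give SAM's actual scoring scheme, so the function name `alignment_distance`, the unit costs, and the sample sessions are all assumptions made for illustration.

```python
def alignment_distance(a, b, indel=1, substitute=1):
    """Edit distance between two page sequences (lists of page IDs).

    Assumed unit costs; the scoring used by SAM in the paper may differ.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel for j in range(len(b) + 1)]
    for i, page_a in enumerate(a, start=1):
        curr = [i * indel]
        for j, page_b in enumerate(b, start=1):
            cost = 0 if page_a == page_b else substitute
            curr.append(min(prev[j] + indel,       # delete page_a
                            curr[j - 1] + indel,   # insert page_b
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical sessions: same pages, different order or length.
session_1 = ["home", "catalog", "item", "cart"]
session_2 = ["home", "item", "catalog", "cart"]
session_3 = ["home", "catalog", "item"]
```

Under these assumptions, `session_1` and `session_2` contain identical pages yet are two edits apart because the order differs, while `session_3` is one edit away from `session_1` despite being shorter. A pairwise matrix of such distances could then be passed to any clustering method that accepts non-Euclidean distances.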

4.
Data Mining for Measuring and Improving the Success of Web Sites (total citations: 4; self-citations: 0; citations by others: 4)
For many companies, competitiveness in e-commerce requires a successful presence on the web. Web sites are used to establish the company's image, to promote and sell goods and to provide customer support. The success of a web site affects and reflects directly the success of the company in the electronic market. In this study, we propose a methodology to improve the success of web sites, based on the exploitation of navigation pattern discovery. In particular, we present a theory, in which success is modelled on the basis of the navigation behaviour of the site's users. We then exploit WUM, a navigation pattern discovery miner, to study how the success of a site is reflected in the users' behaviour. With WUM we measure the success of a site's components and obtain concrete indications of how the site should be improved. We report on our first experiments with an online catalog, the success of which we have studied. Our mining analysis has shown very promising results, on the basis of which the site is currently undergoing concrete improvements.

5.
Mining linguistic browsing patterns in the world wide web (total citations: 2; self-citations: 0; citations by others: 2)
World Wide Web applications have grown very rapidly and have made a significant impact on computer systems. Among them, web browsing for useful information is probably the most common. Because it is so heavily used, efficient and effective web retrieval has become a very important research topic in this field. Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for a certain purpose. In this paper, we use data mining techniques to discover relevant browsing behavior from log data in web servers, which helps in making rules for the retrieval of web pages. The browsing time of a customer on each web page is used to analyze the retrieval behavior. Since the data collected are numeric, fuzzy concepts are used to process them and to form linguistic terms. A sophisticated web-mining algorithm is then proposed to find relevant browsing behavior from the linguistic data. Each page uses only the linguistic term with the maximum cardinality in later mining processes, so the number of fuzzy regions to be processed is the same as the number of pages. Computational time can thus be greatly reduced. The mined patterns exhibit the browsing behavior and can be used to provide appropriate suggestions to web-server managers.
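The step of turning numeric browsing times into linguistic terms, and keeping only the term with the maximum cardinality per page, can be sketched as follows. The three fuzzy regions (`short`, `middle`, `long`), their breakpoints at 20, 75 and 130 seconds, and the function names are assumptions for illustration; the abstract does not specify the membership functions used.

```python
def _clamp(x):
    """Restrict a membership degree to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def memberships(seconds):
    """Map a browsing time to degrees in three assumed fuzzy regions."""
    return {
        "short": _clamp((75 - seconds) / 55),    # 1 at <= 20 s, 0 at >= 75 s
        "middle": _clamp(min((seconds - 20) / 55,
                             (130 - seconds) / 55)),  # peak at 75 s
        "long": _clamp((seconds - 75) / 55),     # 0 at <= 75 s, 1 at >= 130 s
    }


def dominant_terms(visits):
    """visits: iterable of (page, seconds) pairs.

    Returns {page: (term, cardinality)}, keeping for each page only the
    linguistic term whose summed membership (scalar cardinality) is largest.
    """
    totals = {}
    for page, seconds in visits:
        page_totals = totals.setdefault(
            page, {"short": 0.0, "middle": 0.0, "long": 0.0})
        for term, degree in memberships(seconds).items():
            page_totals[term] += degree
    return {page: max(t.items(), key=lambda kv: kv[1])
            for page, t in totals.items()}
```

Because each page ends up with exactly one fuzzy region, a later mining pass only has to consider as many items as there are pages, which is the source of the reduction in computation that the abstract describes.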

6.
Abstract. In meta-searchers accessing distributed Web-based information repositories, performance is a major issue. Efficient query processing requires an appropriate caching mechanism. Unfortunately, standard page-based as well as tuple-based caching mechanisms designed for conventional databases are not efficient on the Web, where keyword-based querying is often the only way to retrieve data. In this work, we study the problem of semantic caching of Web queries and develop a caching mechanism for conjunctive Web queries based on signature files. Our algorithms cope with both relations of semantic containment and intersection between a query and the corresponding cache items. We also develop the cache replacement strategy to treat situations when cached items differ in size and contribution when providing partial query answers. We report results of experiments and show how the caching mechanism is realized in the Knowledge Broker system. Received June 15, 1999 / Accepted December 24, 1999

7.
This paper provides an overview of a project aimed at using knowledge-based technology to improve accessibility of the Web for visually impaired users. The focus is on the multi-dimensional components of Web pages (tables and frames); our cognitive studies demonstrate that spatial information is essential in comprehending tabular data, and this aspect has been largely overlooked in the existing literature. Our approach addresses these issues by using explicit representations of the navigational semantics of the documents and using a domain-specific language to query the semantic representation and derive navigation strategies. Navigational knowledge is explicitly generated and associated with the tabular and multi-dimensional HTML structures of documents. This semantic representation provides the blind user with an abstract representation of the layout of the document; the user is then allowed to issue commands from the domain-specific language to access and traverse the document according to its abstract layout. Published online: 6 November 2002

8.
In today’s competitive business environment, the majority of companies are expected to be represented on the Internet in the form of an electronic commerce site. In an effort to keep up with current business trends, certain aspects of interface design such as those related to navigation and perception may be overlooked. For instance, the manner in which a visitor to the site might perceive the information displayed or the ease with which they navigate through the site may not be taken into consideration. This paper reports on the evaluation of the electronic commerce sites of three different companies, focusing specifically on the human factors issues such as perception and navigation. Heuristic evaluation, the most popular method for investigating user interface design, is the technique employed to assess each of these sites. In light of the results from the analysis of the evaluation data, virtual environments are suggested as a way of improving the navigation and perception display constraints.

9.
Online personalization is of great interest to e-companies. Virtually all personalization technologies are based on the idea of storing as much historical customer session data as possible, and then querying the data store as customers navigate through a web site. The holy grail of online personalization is an environment where fine-grained, detailed historical session data can be queried based on current online navigation patterns for use in formulating real-time responses. Unfortunately, as more consumers become e-shoppers, the user load and the amount of historical data continue to increase, causing scalability-related problems for almost all current personalization technologies. This paper chronicles the development of a real-time interaction management system through the integration of historical data and online visitation patterns of e-commerce site visitors. It describes the scientific underpinnings of the system as well as its architecture. Experimental evaluation of the system shows that the caching and storage techniques built into the system deliver performance that is orders of magnitude better than those derived from off-the-shelf database components. Received: 30 October 2000 / Accepted: 19 December 2000 Published online: 27 April 2001

10.
11.
Correlation-Based Web Document Clustering for Adaptive Web Interface Design (total citations: 2; self-citations: 2; citations by others: 2)
A great challenge for web site designers is how to give users easy and efficient access to important web pages. In this paper we present a clustering-based approach to address this problem. Our approach is to perform efficient and effective correlation analysis based on web logs and construct clusters of web pages that reflect the co-visit behavior of web site users. We present a novel approach for adapting previous clustering algorithms, designed for databases, to the problem domain of web page clustering, and show that our new methods can generate high-quality clusters for very large web logs when previous methods fail. Based on the high-quality clustering results, we then apply the data-mined clustering knowledge to the problem of adapting web interfaces to improve users' performance. We develop an automatic method for web interface adaptation by introducing index pages that minimize overall user browsing costs. The index pages are aimed at providing shortcuts so that users get to their objective web pages fast, and we solve a previously open problem of how to determine an optimal number of index pages. We empirically show that our approach performs better than many of the previous algorithms based on experiments on several realistic web log files. Received 25 November 2000 / Revised 15 March 2001 / Accepted in revised form 14 May 2001

12.
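Web site navigation is an important research area of Web data mining and is key to accurately understanding how users visit a site. Traditional Web site navigation techniques struggle to fully reflect users' degree of interest in the pages they browse, and find paths to pages of interest with relatively low accuracy. To improve this accuracy, a Web site navigation technique based on the ant colony algorithm is proposed. Web users are treated as artificial ants and their browsing interest as the ants' pheromone; using Web log data, a Web site navigation model is built with positive and negative feedback mechanisms and a probabilistic path selection mechanism, and the navigation paths to pages that interest users are mined. Simulation results show that the ant-colony-based technique improves the accuracy of finding paths to pages of interest, reflects users' browsing interests more accurately, and is feasible for Web site navigation.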
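The three mechanisms named above (pheromone as browsing interest, positive and negative feedback, probabilistic path selection) can be illustrated with a generic ant-colony sketch. This is not the paper's model: the data layout, the function names, and the `deposit` and `evaporation` values are assumptions chosen only to show how the pieces fit together.

```python
import random


def update_pheromone(pheromone, session, deposit=1.0, evaporation=0.1):
    """Apply one logged session to the link pheromone table.

    pheromone: {page: {linked_page: level}} (assumed layout).
    Evaporation on every link is the negative feedback; the deposit on
    links the user actually followed is the positive feedback.
    """
    for links in pheromone.values():
        for target in links:
            links[target] *= (1.0 - evaporation)
    for src, dst in zip(session, session[1:]):
        if dst in pheromone.get(src, {}):
            pheromone[src][dst] += deposit


def transition_probabilities(pheromone, page):
    """Probability of following each outlink, proportional to pheromone."""
    links = pheromone.get(page, {})
    total = sum(links.values())
    if total == 0:
        return {}
    return {target: level / total for target, level in links.items()}


def choose_next(pheromone, page, rng=random):
    """Pick the next page at random, weighted by pheromone; None if no links."""
    probs = transition_probabilities(pheromone, page)
    if not probs:
        return None
    pages = list(probs)
    return rng.choices(pages, weights=[probs[p] for p in pages], k=1)[0]
```

Replaying every session in a Web log through `update_pheromone` and then following the highest-probability links from a start page would yield candidate navigation paths; how the paper scores or validates such paths is not stated in the abstract.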

13.
A rapidly increasing number of Web databases are now accessible via their HTML form-based query interfaces. Query result pages are dynamically generated in response to user queries; they encode structured data and are displayed for human use. Query result pages usually contain other types of information in addition to query results, e.g., advertisements, navigation bars, etc. The problem of extracting structured data from query result pages is critical for web data integration applications, such as comparison shopping and meta-search engines, and has been intensively studied. A number of approaches have been proposed. As the structures of Web pages become more and more complex, the existing approaches start to fail, and most of them do not remove irrelevant contents which may affect the accuracy of data record extraction. We propose an automated approach for Web data extraction. First, it makes use of visual features and query terms to identify data sections and extracts data records in these sections. We also represent several content and visual features of visual blocks in a data section, and use them to filter out noisy blocks. Second, it measures similarity between data items in different data records based on their visual and content features, and aligns them into different groups so that the data in the same group have the same semantics. The results of our experiments with a large set of Web query result pages in different domains show that our proposed approaches are highly effective.

14.
Web mining involves the application of data mining techniques to large amounts of web-related data in order to improve web services. Web traversal pattern mining involves discovering users’ access patterns from web server access logs. This information can provide navigation suggestions for web users indicating appropriate actions that can be taken. However, web logs keep growing continuously, and some web logs may become out of date over time. The users’ behaviors may change as web logs are updated, or when the web site structure is changed. Additionally, it can be difficult to determine a perfect minimum support threshold during the data mining process to find interesting rules. Accordingly, we must constantly adjust the minimum support threshold until satisfactory data mining results can be found. The essence of incremental data mining and interactive data mining is the ability to use previous mining results in order to reduce unnecessary processes when web logs or web site structures are updated, or when the minimum support is changed. In this paper, we propose efficient incremental and interactive data mining algorithms to discover web traversal patterns that match users’ requirements. The experimental results show that our algorithms are more efficient than other comparable approaches.

15.
We compare two link analysis ranking methods of web pages in a site. The first, called Site Rank, is an adaptation of PageRank to the granularity of a web site and the second, called Popularity Rank, is based on the frequencies of user clicks on the outlinks in a page that are captured by navigation sessions of users through the web site. We ran experiments on artificially created web sites of different sizes and on two real data sets, employing the relative entropy to compare the distributions of the two ranking methods. For the real data sets we also employ a nonparametric measure, called Spearman's footrule, which we use to compare the top-ten web pages ranked by the two methods. Our main result is that the distributions of the Popularity Rank and Site Rank are surprisingly close to each other, implying that the topology of a web site is very instrumental in guiding users through the site. Thus, in practice, the Site Rank provides a reasonable first order approximation of the aggregate behaviour of users within a web site given by the Popularity Rank.
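The two comparison measures named in this abstract have standard definitions, sketched below. The function names and the sample inputs are assumptions; the sketch also assumes that both distributions cover the same pages with nonzero probability in `q`, and that both rankings contain the same pages, which may not hold for two independently computed top-ten lists.

```python
import math


def relative_entropy(p, q):
    """Relative entropy (KL divergence) D(p || q) in bits.

    p, q: {page: probability} over the same pages; q must be nonzero
    wherever p is nonzero (assumed here, not checked).
    """
    return sum(p[page] * math.log2(p[page] / q[page])
               for page in p if p[page] > 0)


def footrule(rank_a, rank_b):
    """Spearman's footrule: total absolute displacement between two rankings.

    rank_a, rank_b: lists of the same pages, best first.
    """
    position_b = {page: i for i, page in enumerate(rank_b)}
    return sum(abs(i - position_b[page]) for i, page in enumerate(rank_a))
```

A relative entropy close to zero means the two rank distributions are nearly identical, and a footrule of zero means the two orderings agree exactly. The abstract does not say in which direction the relative entropy was computed, so either argument order is a guess.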

16.
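Analysis and Research of Web Usage Mining Technology* (total citations: 6; self-citations: 0; citations by others: 6)
First, a definition and a complete model framework of Web usage mining are given. Then the latest research progress on the main steps of Web usage mining is described and analyzed in detail, including data collection, data preprocessing, pattern discovery, and pattern analysis. Finally, future research priorities are discussed.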

17.
Web sites contain an ever increasing amount of information within their pages. As the amount of information increases so does the complexity of the structure of the web site. Consequently it has become difficult for visitors to find the information relevant to their needs. To overcome this problem various clustering methods have been proposed to cluster data in an effort to help visitors find the relevant information. These clustering methods have typically focused either on the content or the context of the web pages. In this paper we propose a method based on Kohonen’s self-organizing map (SOM) that utilizes both content and context mining clustering techniques to help visitors identify relevant information more quickly. The input of the content mining is the set of web pages of the web site whereas the source of the context mining is the access-logs of the web site. SOM can be used to identify clusters of web sessions with similar context and also clusters of web pages with similar content. It can also provide means of visualizing the outcome of this processing. In this paper we show how this two-level clustering can help visitors identify the relevant information faster. This procedure has been tested on the access-logs and web pages of the Department of Informatics and Telecommunications of the University of Athens.

18.
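A GEP-Based Multi-Level Association Rule Mining Algorithm and Its Application (total citations: 1; self-citations: 1; citations by others: 0)
To mine frequently accessed sets of Web pages from a Web server log database and discover their association rules in Web usage mining, a new algorithm based on GEP (gene expression programming) suited to mining multi-level association rules is proposed. Generalization is applied in GEP as its fitness function measure, and GEP's powerful self-search capability is exploited; after the population has evolved to a relatively good state, the traditional support-confidence method is used to mine frequent items and association rules within and across multiple levels of the sub-database. The algorithm improves on the traditional multi-level association rule mining framework, and experimental results demonstrate its effectiveness and efficiency on large databases.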

19.
A substantial subset of Web data has an underlying structure. For instance, the pages obtained in response to a query executed through a Web search form are usually generated by a program that accesses structured data in a local database, and embeds them into an HTML template. For software programs to gain full benefit from these “semi-structured” Web sources, wrapper programs must be built to provide a “machine-readable” view over them. Since Web sources are autonomous, they may experience changes that invalidate the current wrapper, thus automatic maintenance is an important issue. Wrappers must perform two tasks: navigating through Web sites and extracting structured data from HTML pages. While several works have addressed the automatic maintenance of data extraction tasks, the problem of maintaining the navigation sequences remains unaddressed to the best of our knowledge. In this paper, we propose a set of novel techniques to fill this gap.

20.
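An Algorithm for Mining User Browsing Interest Transitions by Integrating Web Usage Mining and Content Mining (total citations: 2; self-citations: 0; citations by others: 2)
A model and an algorithm for mining user browsing interest transition patterns, integrating Web usage mining and content mining, are proposed. Web pages and their clustering are introduced. User browsing interest sequences are obtained by replacing the pages in user transactions with their corresponding clusters, and user browsing interest transition patterns are then derived from these sequences. The model is of considerable value for helping site administrators understand users' behavioral characteristics and organize Web site structure.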
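The substitution step described in this abstract (pages replaced by their clusters, then transitions read off the resulting sequences) can be sketched as follows. The page-to-cluster mapping is assumed to come from a separate content clustering step, and collapsing consecutive repeats of the same cluster is an assumption of this sketch, not something the abstract states; all names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def interest_sequence(transaction, page_cluster):
    """Replace each page with its cluster and collapse consecutive repeats.

    transaction: list of page IDs in visit order.
    page_cluster: {page: cluster label} from a prior content clustering.
    """
    clusters = [page_cluster[page] for page in transaction]
    return [cluster for cluster, _ in groupby(clusters)]


def transition_counts(transactions, page_cluster):
    """Count how often users move from one interest cluster to another."""
    counts = Counter()
    for transaction in transactions:
        sequence = interest_sequence(transaction, page_cluster)
        counts.update(zip(sequence, sequence[1:]))
    return counts
```

The most frequent pairs in the resulting counter are candidate interest transition patterns. A real implementation would likely apply a support threshold and consider longer sequences, neither of which is specified in the abstract.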
