Similar literature
1.
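Application of Natural Language Understanding in Web Data Mining   Cited by: 1 (self-citations: 1, citations by others: 0)
The rapid growth of the Internet has made it an increasingly important source of useful data. Conventional search engines rely on keyword queries, which yield a low hit rate and cannot tailor their service to specific users. This paper proposes combining natural language understanding with Web data mining to build personalized Web data mining systems customized to users' particular needs, and presents the application scheme, design, and implementation of News-Miner, a Web mining system for the specific domain of news mining. Preliminary experimental results show that the approach is feasible, and the method can readily be extended to other specialized application domains.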

2.
    
Automatically identifying the user intent behind web queries has started to catch the attention of the research community, since it allows search engines to enhance user experience by adapting results to that goal. It is broadly agreed that there are three archetypal intentions behind search queries: navigational, resource/transactional and informational. Thus, as a natural consequence, this task has been interpreted as a multi-class classification problem. By and large, recent works have focused on comparing several machine learning methods built with words as features. Conversely, this paper examines the influence of assorted properties on three classification approaches. In particular, it focuses on the contribution of linguistic-based attributes. However, most natural language processing tools are designed for documents, not web queries. Therefore, as a means of bridging this linguistic gap, we benefited from caseless models, which are trained with traditionally labeled data, but with all terms converted to lowercase before the models are generated. Overall, the tested attributes proved to be effective, improving on word-based classifiers by up to 8.347% (accuracy) and outperforming a baseline by up to 6.17%. Most notably, linguistic-oriented features from caseless models are shown to be instrumental in narrowing the linguistic gap between queries and documents.

3.
With the increasing growth of Internet commerce, online fraud accounts for as much as 20% of identity theft cases. The present study evaluated Privacy Bird®, a computer program that warns users of privacy preference violations by displaying a colored bird. Users rated their trust of, and willingness to give financial information to, web sites in three categories (financial, retail, and social networking) before and after using Privacy Bird. Privacy Bird improved participants’ privacy practices, increasing their trust in (and willingness to provide financial information to) web sites that yielded green birds, reducing it for sites that yielded red birds, and inducing further consideration of policies for sites that yielded yellow birds. These results suggest that e-commerce sites should address the privacy concerns of users and make salient the cues that inform users that their privacy is protected.

4.
Search engines are among the most popular as well as useful services on the web. There is a need, however, to cater to the preferences of the users when supplying the search results to them. We propose to maintain a search profile for each user, on the basis of which the search results would be determined. This requires the integration of techniques for measuring search quality, learning from user feedback, and biased rank aggregation. For the purpose of measuring web search quality, “user satisfaction” is gauged by the sequence in which the user picks up the results, the time the user spends on those documents, and whether or not the user prints, saves, bookmarks, e-mails to someone, or copies and pastes a portion of a document. For rank aggregation, we adopt and evaluate the classical fuzzy rank ordering techniques for web applications, and also propose a few novel techniques that outperform the existing techniques. A “user satisfaction” guided web search procedure is also put forward. Learning from user feedback proceeds in such a way that the ranking improves for documents that are consistently preferred by the users. As an integration of our work, we propose a personalized web search system.

5.
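Research on User Browsing Interest Paths Based on Web Data Mining   Cited by: 1 (self-citations: 0, citations by others: 1)
User browsing interest patterns are mined by combining Web logs with user browsing behavior. Three matrices are built whose elements are, respectively, the number of visits, the visit time averaged over the number of characters on the page, and the number of scroll-bar drags. Computing path interest degrees over these matrices yields interest sub-paths, which are merged to generate the set of user interest paths. A case analysis shows that the algorithm is feasible and effective, and that it is useful for optimizing e-commerce sites and delivering personalized services.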

6.
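Web mining is one of the new directions in data mining, with a very wide range of applications. This paper builds a Web data mining tool for shopping websites. The tool can uncover useful information for customer identification, customer acquisition, and customer retention, and using this information effectively can promote the development of shopping websites.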

7.
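With the rapid spread and widespread use of the Internet, both the volume of online information resources and the complexity of website design are growing sharply. Providing personalized services tailored to user characteristics has become one of the research hotspots in computing. This paper first outlines the concepts of Web log mining and its concrete implementation process, then focuses on the key techniques of Web log mining. Finally, it combines a user-group clustering algorithm with a Web page clustering algorithm to mine user access patterns, and studies and analyzes the applications and development directions of personalized services.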

8.
This study examines the relationships among perceived usability before actual use, task completion time, and preference, and the effects of design attributes on user preference for e-commerce web sites. Nine online bookstore web sites were used by ten participants. Results indicate: (1) pre-use usability and task completion time were correlated; (2) the relationship between pre-use usability and preference was stronger than that between task completion time and preference; (3) design attribute assessments after actual use were highly intercorrelated; and (4) organizational structure and layout had a greater effect on user preference than aesthetic aspects, such as color and typography. These findings can be used to construct a conceptual framework for understanding user preferences and to develop design guidelines that yield more highly preferred e-commerce web sites. The methodology in this study can also be applied to other computerized applications.

9.
Advances in data mining technologies have enabled intelligent Web capabilities in various applications by utilizing the hidden user behavior patterns discovered from Web logs. Intelligent methods for discovering and predicting users’ patterns are important in supporting intelligent Web applications such as personalized services. Although numerous studies have been done on Web usage mining, few of them consider temporal evolution when discovering Web users’ patterns. In this paper, we propose a novel data mining algorithm named Temporal N-Gram (TN-Gram) for constructing prediction models of Web user navigation by considering the temporality property in Web usage evolution. Moreover, three new kinds of measures are proposed for evaluating the temporal evolution of navigation patterns under different time periods. Through experimental evaluation on both real-life and simulated datasets, the proposed TN-Gram model is shown to outperform other approaches such as N-gram modeling in terms of prediction precision, in particular when Web users’ navigation behavior changes significantly over time.
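
For orientation, the plain N-gram navigation model that the abstract above names as a comparison point can be sketched as follows. This is only a minimal illustration of N-gram next-page prediction, not the TN-Gram algorithm itself (its temporal handling is not described in the abstract); the function names `train_ngram` and `predict_next` and the sample sessions are assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_ngram(sessions, n=2):
    """Count which page follows each (n-1)-page context in the sessions."""
    model = defaultdict(Counter)
    for pages in sessions:
        for i in range(len(pages) - n + 1):
            context = tuple(pages[i:i + n - 1])
            model[context][pages[i + n - 1]] += 1
    return model


def predict_next(model, recent_pages, n=2):
    """Return the most frequent successor of the last n-1 pages, or None."""
    context = tuple(recent_pages[-(n - 1):]) if n > 1 else ()
    followers = model.get(context)
    if not followers:
        return None  # context never seen in the training sessions
    return followers.most_common(1)[0][0]


# Hypothetical click sessions reconstructed from a Web log.
sessions = [
    ["home", "catalog", "item", "cart"],
    ["home", "catalog", "item", "reviews"],
    ["home", "search", "item", "cart"],
]
model = train_ngram(sessions, n=2)
print(predict_next(model, ["home", "catalog", "item"]))  # most common page after "item"
```

A model of this kind treats all sessions alike regardless of when they occurred, which is the limitation the abstract says TN-Gram is meant to address.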

10.
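This paper outlines the components, taxonomy, and current state of Web mining, and points out the limitations of some existing Web mining methods. It introduces a relatively new technology, soft computing, and summarizes its applications in Web mining. Soft computing is well suited to the inherently unlabeled, imprecise, heterogeneous, and dynamic nature of Web data, as well as to human-machine interaction, context sensitivity and approximate queries, and personalized learning.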

11.
The World Wide Web (WWW) has been recognized as the ultimate and unique source of information for the information retrieval and knowledge discovery communities. A tremendous amount of knowledge is recorded using various types of media, producing an enormous number of web pages in the WWW. Retrieval of required information from the WWW is thus an arduous task. Different schemes for retrieving web pages have been used by the WWW community. One of the most widely used schemes is to traverse predefined web directories to reach a user's goal. These web directories are compiled or classified folders of web pages and are usually organized into hierarchical structures. The classification of web pages into proper directories and the organization of directory hierarchies are generally performed by human experts. In this work, we provide a corpus-based method that applies a text mining technique to a corpus of web pages to automatically create web directories and organize them into hierarchies. The method is based on the self-organizing map learning algorithm and requires no human intervention during the construction of web directories and hierarchies. The experiments show that our method can produce comprehensible and reasonable web directories and hierarchies.

12.
Constructing semantic queries is a demanding task for human users, as it requires mastering a query language as well as the schema that has been used for storing the data. In this paper, we describe QUICK, a novel system for helping users to construct semantic queries in a given domain. QUICK combines the convenience of keyword search with the expressivity of semantic queries. Users start with a keyword query and are then guided through a process of incremental refinement steps to specify the query intention. We describe the overall design of QUICK, present the core algorithms that enable efficient query construction, and finally demonstrate the effectiveness of our system through an experimental study.

13.
Newer interaction techniques enable users to explore interfaces in a more natural and intuitive way. However, we do not yet have a scientific understanding of their contribution to user experience or of the theoretical mechanisms underlying their impact. This study examines how a naturally mapped interface, the page-flipping interface, can influence user learning and attitudes. An online experiment with two conditions (page flipping vs. clicking) tests the impact of this naturally mapped interaction technique on user learning and attitudes. The results show that the page-flipping feature creates more positive evaluations of the website in terms of usability and engagement, as well as greater behavioral intention towards the website, by evoking a greater perception of natural mapping and a greater feeling of presence. In terms of learning outcomes, however, participants who flip through the online magazine show poorer recall and recognition memory, unless they perceive page flipping as more natural and intuitive to interact with. Participants perceive the same content as more credible when they flip through it, but only if they appreciate the coolness of the medium. Theoretical and practical implications are discussed.

14.
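Ran Li (冉丽), He Yizhou (何毅舟), Xu Longfei (许龙飞). Journal of Computer Applications (《计算机应用》), 2004, 24(10): 158-160
Search engine spamming evolved from search engine optimization but has a negative impact on the development of the Web. By constructing simplified on-site and off-site models to identify several kinds of spamming behavior, the paper derives a formula for the penalty factor in an improved PageRank algorithm and the characteristics of the three functions it contains, and discusses the outlook for search engine spam detection methods.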
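
The abstract above does not give the penalty-factor formula, so the following is only a rough sketch of the general idea: a standard PageRank power iteration in which the votes cast by suspected spam pages are scaled down. The per-page `penalty` value in [0, 1], the function name `penalized_pagerank`, and the sample link graph are all assumptions for illustration and are not taken from the paper.

```python
def penalized_pagerank(links, penalty, damping=0.85, iterations=50):
    """PageRank power iteration with a hypothetical per-page penalty.

    links:   dict mapping a page to the list of pages it links to
    penalty: dict mapping a page to a value in [0, 1]; the rank a page
             passes on through its links is scaled by (1 - penalty)
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if not targets:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
                continue
            weight = 1.0 - penalty.get(src, 0.0)  # assumed penalty form
            share = damping * rank[src] * weight / len(targets)
            for t in targets:
                new_rank[t] += share
        total = sum(new_rank.values())  # renormalize after penalized loss
        rank = {p: r / total for p, r in new_rank.items()}
    return rank


# Hypothetical graph: two link-farm pages boost "target".
links = {"a": ["b"], "b": ["a"], "target": ["a"],
         "farm1": ["target"], "farm2": ["target"]}
plain = penalized_pagerank(links, {})
punished = penalized_pagerank(links, {"farm1": 0.9, "farm2": 0.9})
print(plain["target"], punished["target"])
```

With the link-farm pages penalized, the rank they pass to "target" shrinks, so the page they were boosting ends up with a lower score than in the unpenalized run.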

15.
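With the rapid growth of the Internet, young people have become an important part of China's Internet population, and the accompanying problem of Internet addiction among young people has drawn great concern from all sectors of society. Filtering harmful Web pages is a major challenge in building a "green" (clean) network. Typical Web page filtering systems work only at the URL level and do not filter at the content level, so once an offender changes the URL, the filter no longer works. This paper proposes a solution that combines natural language understanding with Web mining techniques and applies them to the design of a Web page filtering module, so as to achieve content-level filtering of the Web.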

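16.
Research on the Directory Navigation Service of "Tianwang" (天网)   Cited by: 9 (self-citations: 0, citations by others: 9)
To improve the precision of search engines and help users quickly locate the Web pages they are interested in, this paper studies how to provide a directory navigation service in the spider-based search engine "Tianwang" (天网). The basic idea is to use supervised machine learning to classify Chinese Web pages automatically. There are two main contributions: (1) a large-scale dataset of Chinese Web pages that supports a hierarchical model was collected and built, which is the prerequisite and foundation for automatic classification of Chinese Web pages; (2) in view of the characteristics of Chinese Web page content and the inherent shortcomings of the CHI method, a feature selection algorithm that automatically removes "noise" is proposed, and a classifier capable of handling massive numbers of Chinese Web pages is implemented. Experimental results show that the classifier achieves high classification quality and meets the requirements of the search engine's directory navigation service.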
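
As background for the CHI method mentioned above, the widely used chi-square statistic for scoring a term against a class can be sketched as follows. This shows only the standard statistic; the paper's noise-removing variant is not described in the abstract and is not reproduced here. The function names, the toy English tokens, and the class labels are assumptions for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term occurs in every document or in none
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, target, k):
    """Return the k terms with the highest chi-square score for `target`."""
    vocab = {term for doc in docs for term in doc}
    scores = {}
    for term in vocab:
        a = b = c = d = 0
        for doc, label in zip(docs, labels):
            has_term, in_class = term in doc, label == target
            if has_term and in_class:
                a += 1
            elif has_term:
                b += 1
            elif in_class:
                c += 1
            else:
                d += 1
        scores[term] = chi_square(a, b, c, d)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus of already segmented pages (hypothetical).
docs = [["football", "match", "news"], ["football", "score", "news"],
        ["stock", "market", "news"], ["stock", "price", "news"]]
labels = ["sports", "sports", "finance", "finance"]
print(select_features(docs, labels, "sports", 2))
```

Note that the statistic scores a term that is strongly absent from a class as highly as one that is strongly present, and that a term appearing in every page, such as "news" here, scores zero.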

17.
In the present article an approach to the automatic determination of a user’s sphere of interests is proposed. The approach is based on clustering the documents in which the user is interested. The process of clustering documents is reduced to a discrete optimization problem for which quadratic- and linear-type models are proposed. Identifying interests makes it possible to determine the context of a request without any effort on the user’s part. Different methods are proposed for determining the context of a request. An ant algorithm for solving a quadratic-type discrete optimization problem is also proposed in the present study.

18.
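Because traditional firewall and network detection techniques can no longer detect attacks on servers accurately and promptly, this paper proposes a server intrusion detection method based on Web data mining. First, using the known characteristics of server attacks as sample points, the k-means clustering algorithm performs unsupervised learning to generate a signature attack library of K clusters. Next, a nearest-neighbor classification algorithm assigns each access sample point to a cluster according to its distance from the centers of the signature attack library. Finally, the library's center points are readjusted so that the behavior of new sample points is analyzed more accurately.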
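
The three steps described above (k-means clustering, assignment to the nearest center, and center readjustment) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the two-dimensional feature vectors, the choice of the first k samples as initial centers, and the function names are all hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest(sample, centers):
    """Index of the center closest to the sample."""
    return min(range(len(centers)), key=lambda i: distance(sample, centers[i]))


def kmeans(samples, k, iterations=20):
    """Step 1: cluster known attack samples into k groups."""
    centers = [tuple(s) for s in samples[:k]]  # assumed deterministic init
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in samples:
            clusters[nearest(s, centers)].append(s)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


def assign_and_update(sample, centers, clusters):
    """Steps 2 and 3: merge a new sample into the nearest cluster, then
    readjust that cluster's center."""
    idx = nearest(sample, centers)
    clusters[idx].append(sample)
    centers[idx] = mean(clusters[idx])
    return idx


# Hypothetical attack features, e.g. (requests per second, error ratio x 10).
known_attacks = [(1, 1), (9, 9), (1, 2), (10, 9), (2, 1), (9, 10)]
centers, clusters = kmeans(known_attacks, k=2)
cluster_id = assign_and_update((8, 8), centers, clusters)
print(cluster_id, centers[cluster_id])
```

A practical detector would also need a distance threshold beyond which a request is treated as not matching any known attack cluster; the abstract does not say how this is handled.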

19.
To determine how well user agents conform to UAAG 1.0, the capabilities of user agents were investigated with the UAAG 1.0 Test Suite. It was found that 20 Priority 1 checkpoints were met by all the user agents, while 12 Priority 1 checkpoints, relating to multimedia control and time-dependent interactions, were failed by all of them. The results showed that two major Japanese user agents did not have enough functions to navigate through the Web, whereas the latest ones did have those functions. These results show that there are user agents which meet many requirements of UAAG 1.0, but Web authors still have to pay attention to the capabilities of the user agents that are likely to be used to browse their content.
Masahiro Umegaki

20.
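Research on the Design and Implementation of a Web Data Mining System   Cited by: 9 (self-citations: 4, citations by others: 9)
In the course of global informatization, information overload has become a major problem. Although the Web holds a great deal of information, finding what one needs is difficult. People interact with the Web by clicking links and using search engines, but neither lets them obtain the needed information accurately and quickly; Web data mining is a good way to address this problem. This paper describes the basic theory of Web data mining and, according to the object being mined, divides it into Web content mining, Web link structure mining, and Web usage (access information) mining. Exploiting the particular structural properties of HTML pages, it proposes a general framework for a Web data mining system and discusses some concrete implementation techniques.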
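
To make the distinction between content mining and link structure mining concrete, the following sketch pulls both kinds of raw material out of a single HTML page using Python's standard `html.parser` module. It is only an illustration of exploiting HTML structure, not the framework proposed in the paper; the class name `LinkAndTitleExtractor`, the helper `extract`, and the sample page are assumptions.

```python
from html.parser import HTMLParser


class LinkAndTitleExtractor(HTMLParser):
    """Collect the page title (content) and outgoing hrefs (link structure)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors that carry no href
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(html):
    """Return (title, list of hrefs) for one HTML document."""
    parser = LinkAndTitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


# Hypothetical page used only for demonstration.
page = ('<html><head><title> News </title></head><body>'
        '<a href="/a">A</a><a name="x">anchor</a>'
        '<a href="http://example.com/b">B</a></body></html>')
print(extract(page))
```

The extracted text would feed content mining and the extracted links would feed link structure mining, while usage mining draws on server access logs rather than on the pages themselves.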
