Found 20 similar documents (search time: 46 ms)
Mining linguistic browsing patterns in the world wide web   Total citations: 2, self-citations: 0, citations by others: 2
World Wide Web applications have grown very rapidly and have had a significant impact on computer systems. Among them, browsing the web for useful information is probably the most common. Because of this heavy use, efficient and effective web retrieval has become a very important research topic in the field. Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for a certain purpose. In this paper, we use data mining techniques to discover relevant browsing behavior from log data in web servers, which helps in making rules for the retrieval of web pages. The browsing time of a customer on each web page is used to analyze the retrieval behavior. Since the data collected are numeric, fuzzy concepts are used to process them and to form linguistic terms. A sophisticated web-mining algorithm is then proposed to find relevant browsing behavior from the linguistic data. Each page uses only the linguistic term with the maximum cardinality in later mining processes, so the number of fuzzy regions to be processed equals the number of pages. Computational time can thus be greatly reduced. The mined patterns exhibit the browsing behavior and can be used to provide appropriate suggestions to web-server managers.
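To make the "linguistic term with the maximum cardinality" step concrete, here is a minimal Python sketch. It is not the authors' algorithm: the three terms (short, middle, long), their breakpoints in seconds, and the function name `dominant_terms` are assumptions chosen for illustration. It converts each browsing time into fuzzy membership values, sums them per page (scalar cardinality), and keeps only the strongest term per page.

```python
# Hypothetical membership functions for browsing time in seconds.
# The breakpoints (20, 70, 120) are illustrative assumptions.
def short(t):
    return max(0.0, min(1.0, (70 - t) / 50))

def middle(t):
    return max(0.0, min((t - 20) / 50, (120 - t) / 50))

def long_(t):
    return max(0.0, min(1.0, (t - 70) / 50))

TERMS = {"short": short, "middle": middle, "long": long_}

def dominant_terms(sessions):
    """sessions: list of dicts mapping page -> browsing time in seconds.

    Returns page -> (term, cardinality), keeping for each page only the
    linguistic term whose summed membership over all sessions is largest.
    """
    cardinality = {}
    for session in sessions:
        for page, t in session.items():
            per_page = cardinality.setdefault(page, {name: 0.0 for name in TERMS})
            for name, mu in TERMS.items():
                per_page[name] += mu(t)
    return {page: max(c.items(), key=lambda kv: kv[1])
            for page, c in cardinality.items()}

sessions = [{"A": 30, "B": 110}, {"A": 20, "B": 95}, {"A": 70}]
result = dominant_terms(sessions)
print(result)
```

Because each page contributes a single fuzzy region to later mining, the number of regions equals the number of pages, which is the source of the time saving the abstract mentions.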

With the rapid development of personalized information retrieval, the user profile plays an important role. In this paper, we propose a fuzzy clustering method for the construction of ontology-based user profiles (FCOU). In the FCOU method, we employ fuzzy clustering techniques combined with optimization techniques to develop ontology-based user profiles. One key feature of FCOU is that it employs an augmented Lagrangian function to create a fuzzy clustering model for the construction of user profiles. Another key feature of FCOU is that it employs a combination of FCM, PHR and simulated annealing to develop ontology-based user profiles. The method allows some information to belong to several user profiles simultaneously with different degrees of accuracy.

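Using the principle of fuzzy clustering (the SOM neural network algorithm), this paper proposes a system architecture for personalized Web information retrieval, comprising fuzzy clustering of user personalization and fuzzy clustering of network information, and describes how each is implemented.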

Analysis of Web User Access Patterns Based on Interest Degree   Total citations: 1, self-citations: 0, citations by others: 1
吕佳 《计算机工程与设计》2007,28(10):2403-2404,2407
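Web logs implicitly record the motives and regularities behind users' Web access behavior, and how to mine user access patterns from them effectively is an important research topic in Web log mining. A User_ID-URL matrix is constructed whose elements are the user's degree of interest in each visited page. The classical fuzzy C-means clustering algorithm is applied to analyze user access patterns, and experiments on a real data set show that the log mining algorithm incorporating user interest degree is effective.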
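The following is a minimal sketch of classical fuzzy C-means applied to a small User_ID-URL matrix. It is an illustration only: the interest values are invented, the abstract does not say how interest degree is computed, and the function name `fuzzy_c_means` and its parameters (`m`, `iters`, `seed`) are assumptions.

```python
import random

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Classical FCM on a list of equal-length rows X.

    Returns (U, centers), where U[i][j] is the membership of row i in
    cluster j (each row of U sums to 1) and m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial memberships, normalized per row.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers are membership-weighted means of the rows.
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * X[i][k] for i in range(n)) / total
                            for k in range(d)])
        # Memberships follow from relative distances to the centers.
        for i in range(n):
            dist = [max(sum((X[i][k] - centers[j][k]) ** 2
                            for k in range(d)) ** 0.5, 1e-12)
                    for j in range(c)]
            for j in range(c):
                U[i][j] = 1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0))
                                    for l in range(c))
    return U, centers

# Rows are users, columns are URLs, entries are made-up interest degrees.
X = [[0.9, 0.8, 0.0, 0.1],
     [1.0, 0.7, 0.1, 0.0],
     [0.0, 0.1, 0.9, 1.0],
     [0.1, 0.0, 0.8, 0.9]]
U, centers = fuzzy_c_means(X, 2)
labels = [max(range(2), key=lambda j: row[j]) for row in U]
print(labels)
```

Unlike hard clustering, each user keeps a membership value in every cluster, so a user with mixed interests is not forced into a single group.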

王勇  张伟  陈军 《计算机工程与设计》2007,28(6):1484-1485,F0003
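In Web mining research, traditional hard clustering techniques are often used to analyze the page-browsing preferences of website visitors. However, this approach can only assign each user browsing path to a single group. In other words, it assumes in advance that each browsing path contains only one user preference, and it ignores the fact that a single browsing path may contain several page preferences. To address this, fuzzy clustering is proposed as a replacement for traditional hard clustering, so that the clustering results better match actual browsing behavior.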

Relevance feedback techniques have proven to be a powerful means of improving the results obtained when a user submits a query to an information retrieval system such as a World Wide Web search engine. These techniques modify the user's original query by taking into account the relevance judgements the user provides on the retrieved documents, making the query more similar to the documents judged relevant. The newly generated query makes it possible to retrieve new relevant documents, thus improving the retrieval process by increasing recall. However, although powerful relevance feedback techniques have been developed for the vector space information retrieval model and some of them have been translated to the classical Boolean model, such tools are lacking in more advanced and powerful information retrieval models such as the fuzzy one. In this contribution we introduce a relevance feedback process for extended Boolean (fuzzy) information retrieval systems based on a hybrid evolutionary algorithm combining simulated annealing and genetic programming components. The performance of the proposed technique is compared with the only previously existing approach to this task, Kraft et al.'s method, showing that our proposal outperforms the latter in accuracy and sometimes also in time consumption. Moreover, it is shown how adapting the retrieval threshold through the relevance feedback mechanism increases the effectiveness of the system.

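Web log mining can analyze user access patterns to determine users' degree of interest in what they visit. At present, most Web log mining is frequency-based, and the information it yields is of limited value. The clustering technique proposed here is instead based on access time: it represents a user's browsing pattern as a fuzzy vector that records whether the user visited a page and how long the user stayed. User access sequences are clustered with different clustering methods. By combining fuzzy rough k-means with cosine similarity, a two-layer clustering technique is proposed that reduces sensitivity to the initial cluster centers, and a series of experiments demonstrates its feasibility. The experiments also use the Davies-Bouldin index to evaluate and compare the clustering methods. Because the algorithm is still inefficient on large data volumes, MapReduce is used to parallelize the two-layer clustering, which improves clustering efficiency.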
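For readers unfamiliar with the Davies-Bouldin index, the sketch below computes it for a hard partition using Euclidean distance. This is the standard textbook definition, not code from the paper. The helper names and the toy points are assumptions, and every cluster is assumed to be non-empty with distinct centroids. Lower values indicate more compact, better separated clusters.

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def davies_bouldin(clusters):
    """clusters: list of non-empty lists of points (each point a list of floats)."""
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its cluster centroid.
    scatter = [sum(dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

good = [[[0, 0], [0, 2]], [[4, 0], [4, 2]]]   # compact, well separated
bad = [[[0, 0], [4, 0]], [[0, 2], [4, 2]]]    # stretched, close together
print(davies_bouldin(good), davies_bouldin(bad))
```

Comparing the two partitions of the same four points shows how the index ranks alternative clusterings, which is how the abstract describes using it.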

Fuzzy User Modeling for Information Retrieval on the World Wide Web   Total citations: 5, self-citations: 1, citations by others: 4
Information retrieval from the World Wide Web through the use of search engines is known to be unable to capture effectively the information needs of users. The approach taken in this paper is to add intelligence to information retrieval from the World Wide Web by modeling users, so as to improve the interaction between the user and information retrieval systems. In other words, the aim is to improve the performance of the user in retrieving information from the information source. To effect such an improvement, a retrieval system must somehow make inferences about the information the user might want. The system can then aid the user, for instance by giving suggestions or by adapting a query based on predictions furnished by the model. By combining user modeling and fuzzy logic, a prototype system has been developed, the Fuzzy Modeling Query Assistant (FMQA), which modifies a user's query based on a fuzzy user model. The FMQA was tested in a user study which clearly indicated that, for the limited domain chosen, the modified queries are better than those left unmodified. Received 10 November 1998 / Revised 14 June 2000 / Accepted in revised form 25 September 2000

In recent years, the audio corpora available from the fast-growing Internet and digital libraries have been increasing rapidly. How to classify and retrieve sound files relevant to the user's interest from large databases is crucial for building multimedia web search engines. In this paper, content-based technology is applied to classify and retrieve audio clips using a fuzzy logic system, which is an intuitive choice given the fuzzy nature of human perception of audio, especially audio clips of mixed types. Two features selected from various extracted features are used as input to a constructed fuzzy inference system (FIS). The outputs of the FIS are two types of hierarchical audio classes. The membership functions and rules are derived from the distributions of extracted audio features. Speech and music can thus be discriminated by the FIS. Furthermore, female and male speech can be separated by another FIS, and percussion can be distinguished from other musical instruments. In addition, multiple FISs can be used to form a “fuzzy tree” for retrieval of more types of audio clips. With this approach, generic audio can be classified and retrieved more accurately, using fewer features and less computation time, than with other existing approaches.

Starting from unification based on similarity, a logic programming language called LIKEness in LOGic (Likelog) is derived that relies thoroughly on similarity. An operational semantics and a fix-point semantics of the language are defined, using an extension principle for fuzzy operators. The two approaches are proved to be related and a fuzzy extension of the least Herbrand model is given. One of the principal features of such a logic programming language is that it allows flexible query answering for deductive databases, which we show through an example. Moreover, we describe a system for web information retrieval through Likelog. I want to thank Ferrante Formato, with whom I started and continued this research, and Prof. Giangiacomo Gerla for his great support and contribution to this field.

Strong (perfect) fuzzy functions have been applied to approximate reasoning and vague algebra in the literature on fuzzy sets. The construction of strong (perfect) fuzzy functions plays an important role in their applications. In this paper, some of the results on the construction of strong (perfect) fuzzy functions are improved, and several new and desirable results in this direction are obtained. Furthermore, it is also shown how these results can be used to point out the connections between fuzzy functions in the classical sense and strong (perfect) fuzzy functions.

Interval Set Clustering of Web Users with Rough K-Means   Total citations: 1, self-citations: 0, citations by others: 1
Data collection and analysis in web mining face certain unique challenges. For a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than in conventional applications. The analytical techniques in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis. There is increasing interest in research on clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, genetic-algorithm-based clustering may not be able to handle the large amount of data typical of a web mining application. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm represents clusters as interval or rough sets. The paper also describes the design of an experiment, including data collection and the clustering process. The experiment is used to create interval set representations of clusters of web visitors.
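The abstract does not give the algorithm's details, so the following Python sketch shows one common formulation of rough K-means and should be read as an assumption-laden illustration rather than the paper's method. Each object is assigned to the lower approximation of its nearest cluster unless another centroid is almost as close (within `threshold`), in which case it is placed in the boundary region of every such cluster. Centroids are a weighted mix of the lower-approximation mean and the boundary mean. The names `rough_assign`, `update_centers`, `rough_k_means`, and the weight `w_lower` are hypothetical.

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def rough_assign(X, centers, threshold):
    """Split objects into lower approximations and boundary regions."""
    k = len(centers)
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for x in X:
        d = [dist(x, c) for c in centers]
        i = min(range(k), key=d.__getitem__)
        # Other clusters that are nearly as close as the nearest one.
        close = [j for j in range(k) if j != i and d[j] - d[i] <= threshold]
        if close:
            for j in [i] + close:
                boundary[j].append(x)   # ambiguous: upper approximation only
        else:
            lower[i].append(x)          # unambiguous: lower approximation
    return lower, boundary

def update_centers(lower, boundary, old, w_lower=0.7):
    new = []
    for lo, bd, c in zip(lower, boundary, old):
        if lo and bd:
            ml, mb = mean(lo), mean(bd)
            new.append([w_lower * a + (1 - w_lower) * b for a, b in zip(ml, mb)])
        elif lo:
            new.append(mean(lo))
        elif bd:
            new.append(mean(bd))
        else:
            new.append(list(c))         # empty cluster keeps its old center
    return new

def rough_k_means(X, centers, threshold=0.5, iters=20):
    for _ in range(iters):
        lower, boundary = rough_assign(X, centers, threshold)
        centers = update_centers(lower, boundary, centers)
    lower, boundary = rough_assign(X, centers, threshold)
    return lower, boundary, centers

X = [[0.0], [0.2], [2.0], [2.2], [1.1]]
lower, boundary, centers = rough_k_means(X, [[0.0], [2.0]], threshold=0.3)
print(lower, boundary, centers)
```

In this toy run the point 1.1 sits between both groups, so it ends up in the boundary of both clusters. The pair of lower approximation and lower-plus-boundary set is the interval representation of each cluster.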

In this paper, we extend the work of Kraft et al. to present a new method for fuzzy information retrieval based on fuzzy hierarchical clustering and fuzzy inference techniques. First, we present a fuzzy agglomerative hierarchical clustering algorithm for clustering documents and obtaining the cluster centers of the resulting document clusters. Then, we present a method to construct fuzzy logic rules based on the document clusters and their cluster centers. Finally, we apply the constructed fuzzy logic rules to modify the user's query for query expansion and to guide the information retrieval system to retrieve documents relevant to the user's request. The fuzzy logic rules can represent three kinds of fuzzy relationships between index terms: fuzzy positive association, fuzzy specialization, and fuzzy generalization. The proposed fuzzy information retrieval method is more flexible and more intelligent than the existing methods because it can expand users' queries for fuzzy information retrieval in a more effective manner.

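Applying data mining clustering algorithms to content-based image retrieval can effectively improve retrieval speed and quality. Fuzzy clustering better matches the fuzziness inherent in image retrieval, but its cluster analysis can take so long that retrieval performance suffers. This paper therefore proposes a color image retrieval method based on an optimized block color histogram and fuzzy C-means clustering. First, each image in the image database is divided into blocks and the optimized color feature information of each block is extracted. Next, the fuzzy C-means clustering algorithm clusters the resulting color feature vectors to obtain the cluster center of each image class. Finally, the similarity between the query example image and the images in the corresponding image class is computed, and retrieval results are returned in order of similarity. Experiments show that the proposed method not only achieves high recall and precision but also extracts features of low dimensionality, clusters quickly, and retrieves quickly.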

The rapid development of the World Wide Web as a medium of commerce and information dissemination has generated a growing interest among web portal managers in systems able to identify user profiles from web access logs. The interpretation of these profiles can help re-organize the web portal, e.g., by restructuring the site’s content more efficiently, or even help build adaptive web portals, i.e., portals whose organization and presentation change depending on the specific visitor’s needs. In this paper, we assume that the pages of the web portal have been prearranged in a number of different categories. We introduce a systematic approach to determine a hierarchy of user profiles from the history of users’ accesses to the categories. First, we filter the access log by removing both occasional users and categories of poor interest. Then, we apply an Unsupervised Fuzzy Divisive Hierarchical Clustering (UFDHC) algorithm to cluster the users of the web portal into a hierarchy of fuzzy groups, each characterized by a set of common interests and represented by a prototype that defines the profile of the group's typical member. To identify the profile a specific user belongs to, we propose a novel classification method which fully exploits the information contained in the hierarchy. To prove the effectiveness of our approach, we apply the UFDHC algorithm to access log data collected over a period of 15 days and use the classification method to associate a profile with the users defined by access log data collected during the subsequent 60 days. Finally, we highlight the good characteristics of our system by comparing our results with those obtained by a profiling system based on a modified version of fuzzy C-means.

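Building a customer relationship management (CRM) system is an important means for an enterprise to gain competitive advantage, and data mining plays a key role in implementing CRM. This article introduces data mining and CRM technologies, describes in detail how two data mining methods used in building a hotel CRM system, decision trees and fuzzy clustering, are implemented, and presents an experimental analysis.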

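A Web search engine based on a user motivation model is proposed, together with a scheme for improving the efficiency of building user behavior models. The motivation model sits between the user and the search engine and assists the user's retrieval, with the aim of improving the search engine's retrieval efficiency and accuracy. With the study of human behavior as the theoretical basis and personalization technology as the means, similar user behavior models are merged to build the user motivation model. Experiments verify that a search engine based on the user motivation model adapts to user needs better than a general-purpose search engine.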

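Music similarity computation based on semantic music tags is another new focus in music information retrieval. This paper proposes a song classification method based on tag mining, which studies song similarity using user tags from the music website Last.fm as features. Latent semantic analysis (LSA), commonly used in text clustering, is combined with an improved K-means clustering method and applied to the automatic extraction of semantic music tags. More than 8,000 user tags for 600 songs in 6 major categories were extracted from last.fm as semantic music features, and LSA was used to reduce the dimensionality of the song vectors, yielding a 600×150 matrix that represents the similarity relations among songs. Finally, K-means classifies the songs according to their similarity, completing the song similarity comparison. Comparison with results obtained without LSA dimensionality reduction and with existing HCC results shows that the tag-based model proposed in the paper classifies songs well.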

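The self-organizing map (SOM) neural network is combined with FCM. The parallel computation of SOM reduces the clustering time of the fuzzy C-means algorithm on massive data and can improve both the speed and the quality of clustering. Applying the algorithm to data mining of campus network Web logs makes it possible to analyze user behavior and to propose corresponding measures that improve service efficiency and management quality.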

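To provide users with better personalized Web retrieval services, an improved algorithm for generating a personalized dictionary, IGAUPD, is implemented. It mines the words that truly reflect a user's interests from the large number of interest pages the user has browsed, thereby shrinking the traditional lexicon so that, during user interest modeling, feature descriptions of interest pages can be formed faster and more accurately and personalized retrieval is better supported. IGAUPD uses a new term-weighting formula, IWTUPD, to better describe the importance of a word in the page collection and to exclude frequent words effectively. Finally, experiments verify the advantages of the personalized dictionary generated by IGAUPD.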
