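Web Page Information Extraction Based on an Extended DOM Tree   Total citations: 1, self-citations: 0, citations by others: 1
With the growth of the Internet, Web pages carry an ever larger volume of information at ever higher density. Most Web pages contain multiple information blocks that are compactly laid out and follow similar patterns in their HTML syntax. For Web pages with multiple information blocks, this paper proposes an extraction method: first, build an extended DOM (Document Object Model) tree and break the page into discrete information strips; next, reassemble the strips according to the hierarchy of the extended DOM tree, combined with the necessary visual features and semantic information; finally, identify the subtree that contains the information blocks and traverse the DOM tree depth-first to extract the information. The algorithm can extract information from Web pages containing multiple information blocks.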
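The last step of the method above (locating the subtree that holds the information blocks, then walking it depth-first) can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not specify its node attributes or its visual and semantic features, so the code below assumes a single hypothetical extension attribute, `text_len` (the amount of text under a node), and assumes that information blocks show up as sibling elements sharing one tag name. All class and function names are illustrative.

```python
from collections import Counter
from html.parser import HTMLParser


class Node:
    """A DOM node extended with the amount of text found beneath it."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""      # text directly inside this node
        self.text_len = 0   # assumed extension attribute: text length of the subtree


class TreeBuilder(HTMLParser):
    """Builds a simple tree; assumes well-formed HTML without void tags such as <br>."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def annotate(node):
    """Depth-first pass that fills in text_len for every subtree."""
    node.text_len = len(node.text) + sum(annotate(c) for c in node.children)
    return node.text_len


def find_block_container(root, min_blocks=2):
    """Return (node, tag) for the node with the most same-tag, text-bearing children."""
    best, best_count = None, 0
    stack = [root]
    while stack:                      # iterative depth-first traversal
        node = stack.pop()
        stack.extend(node.children)
        tags = Counter(c.tag for c in node.children if c.text_len > 0)
        if tags:
            tag, count = tags.most_common(1)[0]
            if count >= min_blocks and count > best_count:
                best, best_count = (node, tag), count
    return best


def subtree_text(node):
    """Collect the text of a subtree: the node's own text first, then its children's."""
    parts = [node.text] + [subtree_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def extract_blocks(html):
    """Return the text of each repeated block, or [] if no repeated structure is found."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    annotate(builder.root)
    found = find_block_container(builder.root)
    if found is None:
        return []
    container, tag = found
    return [subtree_text(c) for c in container.children
            if c.tag == tag and c.text_len > 0]
```

Counting repeated child tags is only a stand-in for the similar HTML patterns the abstract mentions; a real page would also need the visual and semantic cues the authors describe.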

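This paper studies a Web page segmentation method based on CURE clustering, together with rules for extracting the main-text block. Node attributes are added to the page's DOM tree, converting it into an extended DOM tree that carries the offsets of information nodes. The CURE algorithm then clusters the information nodes, and each resulting cluster represents a different block of the page. Finally, three main features of the main-text block are extracted and used to construct a block-weight formula, which identifies the main-text block.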

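Extracting the main text of Web pages underlies Web information processing tasks such as information retrieval and text mining. Based on a statistical analysis of the textual and structural features of topical Web pages, this paper proposes a main-text extraction method for topical pages that combines features of the page text with characteristics of HTML tags. After the Web page is parsed into a DOM tree, main-text blocks are obtained from the DOM tree structure, the characteristics of noise within those blocks are analyzed, and the in-block noise is removed. Experiments show that the method achieves good precision and recall.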

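Topic-oriented Web information collection must judge the topical relevance of the URL links it extracts. Based on extracting the context of topic links, topic-type semantic blocks use a fixed length of text surrounding the link, while directory-type and image-type semantic blocks use the hierarchy of the DOM tree, to judge the URL relevance of the link data. Using HowNet-based semantic similarity for link judgment, the paper presents NPR, a URL topical-relevance algorithm that combines content analysis with link-structure analysis; compared with PageRank, it yields more precisely on-topic pages. The results are of practical value to information institutions in China for the in-depth development of subject-specific network information resources.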

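Automatic Segmentation of Information Blocks in Web Pages   Total citations: 8, self-citations: 2, citations by others: 8
With the growth of the Internet and the rapid increase in the number of Web pages, retrieving information quickly and effectively has become increasingly important. One class of Web pages contains multiple information units that are compactly arranged and similarly styled in presentation and that follow similar patterns in HTML syntax, for example the multiple posts on a BBS page; each such unit is called an information block. Applications such as information extraction and information filtering first need to segment the original page into suitable information blocks for later processing. This paper proposes a method for automatically segmenting Web pages into information blocks: first, build a structured HTML parse tree of the page; then determine the subtree that contains the information blocks, using criteria such as the amount of effective text it contains; finally, segment it with the 2-rank PAT algorithm based on subtree depth information. Experiments on extracting information blocks from BBS pages demonstrate the effectiveness of the method.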

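An Improved Session Identification Method for Web Logs   Total citations: 4, self-citations: 0, citations by others: 4
Session identification is an important step in data preprocessing for Web log mining. This paper proposes an improved session identification method. First, after user identification, frame pages are filtered out, which greatly reduces the number of pages left to process in the experiment. Then an access-time threshold is set for each page and adjusted according to the page's importance, which is determined from page content and site structure. Experiments show that, compared with the traditional approach of applying a single a priori threshold to all pages, the sessions obtained by this method are more realistic.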
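The two steps described above (filtering frame pages, then splitting sessions with a per-page threshold) might look roughly like the sketch below. The abstract does not say how page importance is computed or how it adjusts the threshold, so the code assumes a hypothetical `importance` multiplier per URL (default 1.0) applied to a base timeout; the record layout and all names are likewise assumptions.

```python
from collections import defaultdict


def identify_sessions(records, base_timeout=600.0, importance=None,
                      frame_pages=frozenset()):
    """Split per-user page visits into sessions.

    records     -- iterable of (user, timestamp_in_seconds, url), in any order
    importance  -- assumed mapping url -> multiplier for the base timeout
    frame_pages -- urls treated as frame pages and dropped before sessionizing
    """
    importance = importance or {}

    # Step 1: group visits by user and drop frame pages.
    by_user = defaultdict(list)
    for user, ts, url in records:
        if url in frame_pages:
            continue
        by_user[user].append((ts, url))

    # Step 2: cut a session whenever the gap exceeds the threshold
    # of the page being left, scaled by that page's importance.
    sessions = []
    for user, visits in by_user.items():
        visits.sort()
        current = []
        for ts, url in visits:
            if current:
                prev_ts, prev_url = current[-1]
                limit = base_timeout * importance.get(prev_url, 1.0)
                if ts - prev_ts > limit:
                    sessions.append((user, [u for _, u in current]))
                    current = []
            current.append((ts, url))
        if current:
            sessions.append((user, [u for _, u in current]))
    return sessions
```

Passing an empty `importance` mapping reduces this to the single fixed threshold that the abstract describes as the traditional baseline, which makes the two approaches easy to compare on the same log.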

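With the growth of the Internet, Web data mining is increasingly important in helping people obtain topical information. This study works on tree structures and parses Web pages into tag trees. Building on a tree-matching algorithm, it proposes algorithms for mining data regions and identifying semantic link blocks, which implement link removal as a preprocessing step. It also introduces the concept of text structure weight, uses the computed weights to locate the topic region, and obtains the topical information after denoising. Experiments show that the approach performs well on news and blog pages.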

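Most existing methods for automatically extracting Web news content ignore topic features in the text, so they tend to mistake noise text whose styling and layout resemble the main text for main content. To address this, an extraction method based on the topic weights of wildcard nodes is proposed. After the HTML document is parsed into a DOM tree, the wildcard tree corresponding to the DOM tree is matched, the topic weight within each wildcard is computed, and the text nodes covered by wildcard nodes with high topic weights are identified as main-text nodes. Experimental results show that, compared with traditional news extraction methods, this method lowers the misidentification rate for noise text at the edges of Web news content and extracts news content more accurately.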

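As Web databases keep growing, access through the Deep Web is becoming a principal means of obtaining information. How to effectively extract the entity information contained in Deep Web result pages has become a problem worth studying. By analyzing the characteristics of Deep Web result pages, this paper proposes a DOM-tree based entity extraction mechanism for Deepweb (D-EEM), which effectively addresses entity extraction in the Deep Web environment. D-EEM adopts an automatic, DOM-tree based extraction strategy that uses the text content and hierarchy of the DOM tree to determine data regions and entity regions, improving extraction accuracy. In addition, a semantic annotation method based on context distance and co-occurrence counts is proposed to effectively merge the extraction results from different data sources. Experiments verify the feasibility and effectiveness of the key techniques used in D-EEM; compared with other entity extraction strategies, D-EEM has certain advantages in extraction efficiency and accuracy.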

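Web log mining is currently an important research direction in Web data mining, and data preprocessing is a key technique in Web log mining. This paper describes the preprocessing of Web logs in detail. Preprocessing comprises data cleaning, user identification, session identification with frame-page cleaning, and path completion. After user identification, frame pages reduce mining efficiency; filtering them out substantially reduces the number of invalid pages produced.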

Reflecting on a feasibility study into archiving social media, this article traces how “events” are defined in various domains and contexts, and employs case studies to analyze key relationships between hashtags and events to provide a critical analysis of how archival events can be constructed out of social events. It provides an overview of the archival and curatorial considerations involved in defining and preserving a social media event, and outlines the technologies developed for the process of collecting, annotating, and preserving social media events. Overall, the article endeavors to reveal how pragmatic considerations, computational approaches and curatorial perspectives shape digital archives and historical narratives.

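Characteristics of Information Propagation Patterns in BBS   Total citations: 2, self-citations: 1, citations by others: 1
By comparing the transmission mechanism of infectious diseases with that of information, a model of information propagation in BBS is proposed. By modeling how the number of posts changes over time, the paper analyzes the characteristics of information propagation patterns in BBS and illustrates them with real data. The experiments show that a BBS can attract large numbers of participating users, but users are interested in, and join the discussion of, only some topics; for the vast majority of topics (94.9%), the growth rate of the post count first rises and then falls to 0, while for a small number of topics (5.1%) it falls directly to 0. These findings help in understanding the information propagation mechanism of BBS and offer guidance for controlling and managing it.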

The use of social media to share information, enhance learning, and connect with an online community has grown rapidly over the past 10 years. As social media becomes a more common tool in both formal and informal education, it is imperative to understand how it is used by individuals with disabilities. Through a systematic study of the literature, 215 articles on social media used by individuals with disabilities were selected and 29 selected for in-depth thematic analysis. Six major themes were identified: community, cyberbullying, self-esteem, self-determination, access to technology, and accessibility. To confirm these six categories, we expanded our search, yielding an additional 30 articles, for a total of 59 articles reviewed in-depth. Interactions between individuals with disabilities within online communities often had the goal of acquiring knowledge or learning new information. A communities of practice theoretical framework is used to discuss interactions among the elements of social media design, learning, and the building of community by individuals with disabilities.

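Real-name registration on the Internet was proposed to solve the problems caused by online anonymity, yet it is criticized for leaking real-name information. The root cause of such leaks is that real-name authentication depends on real-name information. The network identity model based on social authentication relies on social relationships for identity authentication: it builds network identities from the social relationships of OSN nodes, supporting network supervision while avoiding leaks of real-name information. The model first selects root nodes in the OSN according to a given policy; then it performs social authentication by vouching; finally, it constructs a unique network identity, SANI, for each node without relying on real-name information. A SANI identity contains the node's social authentication information and supports identity authentication and behavior tracing.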

This study explores the relationship between perceived bridging social capital and specific Facebook‐enabled communication behaviors using survey data from a sample of U.S. adults (N=614). We explore the role of a specific set of Facebook behaviors that support relationship maintenance and assess the extent to which demographic variables, time on site, total and “actual” Facebook Friends, and this new measure (Facebook Relationship Maintenance Behaviors) predict bridging social capital. Drawing upon scholarship on social capital and relationship maintenance, we discuss the role of social grooming and attention‐signaling activities in shaping perceived access to resources in one's network as measured by bridging social capital.

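Measuring influence in social networks has received a lot of attention in the data mining community. Influence maximization refers to the process of finding influential users who maximize the spread of information or product adoption. In real settings, a user's influence in a social network can be modeled by the set of actions (e.g., share, retweet, comment) performed by other users of the network after his or her publications. To the best of our knowledge, all models proposed in the literature treat these actions equally. However, it is obvious that a like of a publication carries less influence than a share of the same publication. This suggests that each action has its own level of influence (or importance). In this paper, we propose a model (called the Social Action-Based Influence Maximization Model, SAIM) for influence maximization in social networks. In SAIM, actions are not considered equally in measuring an individual's influence power, and the model consists of two major steps. In the first step, we compute the influence power of each individual in the social network. This influence power is computed from user actions using PageRank. At the end of this step, we obtain a weighted social network in which each node is labeled with its influence power. In the second step of SAIM, we compute an optimal set of influential nodes using a new concept called the influence-BFS tree. Experiments conducted on large-scale real-world and synthetic social networks show the good performance of our model SAIM in computing, on an acceptable time scale, a minimal set of influential nodes that allows maximal spread of information.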
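A rough illustration of the first step only (computing influence power from weighted actions with PageRank) is sketched below. It is not the SAIM formulation: the abstract gives no action weights or graph construction, so the code assumes that each action adds a weighted edge from the acting user to the author of the publication, that the weights are positive and supplied by the caller, and that rank leaving users with no outgoing actions is redistributed uniformly. The second step, the influence-BFS tree, is not covered.

```python
def action_weighted_pagerank(actions, action_weights, damping=0.85, iterations=50):
    """Estimate influence power with PageRank over an action-weighted graph.

    actions        -- iterable of (actor, author, action_type)
    action_weights -- assumed mapping action_type -> positive weight
    Returns a dict user -> score; the scores sum to 1.
    """
    # Accumulate edge weights actor -> author, one contribution per action.
    edges = {}
    nodes = set()
    for actor, author, kind in actions:
        nodes.update((actor, author))
        targets = edges.setdefault(actor, {})
        targets[author] = targets.get(author, 0.0) + action_weights[kind]

    if not nodes:
        return {}

    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        # Users who never act on anyone else have no outgoing edges;
        # their rank is spread uniformly over all users.
        dangling = sum(rank[v] for v in nodes if v not in edges)
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for actor, targets in edges.items():
            total = sum(targets.values())
            for author, weight in targets.items():
                # Each author receives a share proportional to the action weight.
                new_rank[author] += damping * rank[actor] * weight / total
        rank = new_rank
    return rank
```

With weights such as like = 1, comment = 2, share = 3, a user whose publications are shared ends up with more influence power than one whose publications are only liked by the same audience, which is the distinction the abstract draws between action types.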

Posting behaviour on social networking sites (SNS) has become a method enabling unsatisfied users to vent emotions. Based on social cognition theory (SCT), personal outcome expectations and self-efficacy affect posting behaviour for venting emotions on SNS. However, perceived social support (PSS) may alter the relationships within the SCT model. Thus, this study aimed to explore the moderating effect of PSS on the relationships between variables in the SCT model for venting emotions on SNS. In total, 310 unsatisfied customers in Taiwan were investigated, and structural equation modelling was performed to test the hypotheses. The results indicated that personal outcome expectations and self-efficacy were positively associated with posting behaviour which, in turn, increased venting emotions on SNS. Moreover, PSS moderated the relationships between variables in the SCT model.

This paper explores the affordances of social technologies for supporting the construction of a shareable artefact by a group of learners. A qualitative study that captures the use of five different types of social technologies (Facebook, blogs, wikis, Google Documents and Dropbox) in three different classroom settings sheds light on the potentials and challenges of these tools for supporting material exploration, artefact construction and evaluation. Qualitative content analysis of instructors’ field notes, students’ and instructors’ reflections, interviews and focus groups sheds light on the potential of social technologies to transform the activity of learning across a new culture of computational tools. The affordances of social technologies are discussed as well as design principles that need to be followed in these new arenas.

Nowadays, more and more users share real-time news and information in micro-blogging communities such as Twitter, Tumblr or Plurk. In these sites, information is shared via a followers/followees social network structure in which a follower will receive all the micro-blogs from the users he/she follows, named followees. With the increasing number of registered users in this kind of site, finding relevant and reliable sources of information becomes essential. The reduced number of characters present in micro-posts along with the informal language commonly used in these sites make it difficult to apply standard content-based approaches to the problem of user recommendation. To address this problem, we propose an algorithm for recommending relevant users that explores the topology of the network considering different factors that allow us to identify users that can be considered good information sources. Experimental evaluation conducted with a group of users is reported, demonstrating the potential of the approach.

Increasing interactions and engagements in social networks through monetary and material incentives is not always feasible. Some social networks, specifically those that are built on the basis of fairness, cannot incentivize members using tangible things and thus require an intangible way to do so. In such networks, a personalized recommender could provide an incentive for members to interact with other members in the community. Behavior‐based trust models that generally compute social trust values using the interactions of a member with other members in the community have proven to be good for this. These models, however, largely ignore the interactions of those members with whom a member has interacted, referred to as “friendship effects.” Results from social studies and behavioral science show that friends have a significant influence on the behavior of the members in the community. Following the famous Spanish proverb on friendship “Tell Me Your Friends and I Will Tell You Who You Are,” we extend our behavior‐based trust model by incorporating the “friendship effect” with the aim of improving the accuracy of the recommender system. In this article, we describe a trust propagation model based on associations that combines the behavior of both individual members and their friends. The propagation of trust in our model depends on three key factors: the density of interactions, the degree of separation, and the decay of friendship effect. We evaluate our model using a real data set and make observations on what happens in a social network with and without trust propagation to understand the expected impact of trust propagation on the ranking of the members in the recommended list. We present the model and the results of its evaluation. This work is in the context of moderated networks for which participation is by invitation only and in which members are anonymous and do not know each other outside the community. Copyright © 2014 John Wiley & Sons, Ltd.
