首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
为了解决网页信息的自动抽取,该文提出了一种基于视觉特征和领域本体的Web信息抽取算法.该算法以基于领域本体的信息抽取为基础,根据网页的视觉特征来准确划定信息抽取区域,然后结合DOM树技术和抽取路径的启发式学习,获得Web贞面中信息项的抽取路径.通过信息项的抽取路径自动生成信息项的领域本体,通过信息项的领域本体解析出信息项的抽取规则.使用本算法来进行Web信息的抽取,具有查全率与查准率高、时间复杂度低、用户负担较轻和自动化程度高的特点.  相似文献   

2.
张鑫  陈梅  王翰虎  王嫣然 《微机发展》2011,(2):58-61,65
为了解决网页信息的自动抽取,该文提出了一种基于视觉特征和领域本体的Web信息抽取算法。该算法以基于领域本体的信息抽取为基础,根据网页的视觉特征来准确划定信息抽取区域,然后结合DOM树技术和抽取路径的启发式学习,获得Web页面中信息项的抽取路径。通过信息项的抽取路径自动生成信息项的领域本体,通过信息项的领域本体解析出信息项的抽取规则。使用本算法来进行Web信息的抽取,具有查全率与查准率高、时间复杂度低、用户负担较轻和自动化程度高的特点。  相似文献   

3.
网页检索结果中,用户经常会得到内容相同的冗余页面。提出了一种通过新闻主题要素学习新闻内容的新闻网页去重算法。该方法的基本思想是:首先,抽取新闻要素中关于事件发生的时间和地点短语;然后,通过抽取的时间和地点短语抽取新闻的内容;最终,根据学习的新闻内容通过计算它们的相似度来判断新闻网页的重复度。实验结果表明,该方法能够完成针对新闻内容的新闻网页的去重,并得到较高的查全率和查准率。  相似文献   

4.
提出了一种新颖的REA(Review Extract Algorithm)算法进行评论信息的发现与抽取.算法采用了页面分块与信息熵的迭代计算技术实现了评论块的自动发现与抽取.其中,页面分块技术的运用有效地去除了噪声信息;基于块的熵值计算精确定位了每一个用户评论.实验结果证明该算法具有较高的查全率与查准率.  相似文献   

5.
文章提出一种基于静态网页特征的文本信息抽取方法。该方法首先根据静态网页的URL特征判断其是否是静态网页,然后根据静态网页的结构特征和内容特征对标题和正文文本内容进行抽取.再按照统一规范将结果顺序存储便于再处理。实验结果表明,网页内容信息抽取的查全率和查准率分别为96.2%和95.9%,该方法计算量小、抽取速度快、正确率高,可实际应用于大规模的网页内容安全分析。  相似文献   

6.
针对现今较流行的动态Web网页数量巨大、数据价值高,并且网页结构高度模板化的特点,设计了一个基于网页聚类的Web信息自动抽取系统。在DOM抽取技术基础上利用网页聚类寻找高相似簇,并引入列相似度和全局自相似度计算方法,提高了聚类结果的准确性。抽取模板中应用了可选节点对模板的修正和调整,以提高内容节点的正确标识。实验结果表明,该方法能够自动寻找并抽取网页主要信息,达到了较高的准确率和查全率。  相似文献   

7.
电子商务网站允许用户对商品发表评论,用户评论通常含有用户对商品的主观性体验,常被潜在顾客作为比较不同商品并作出购买选择的参考,也可被生产厂商作为市场反馈调查的数据来源.然而,由于电子商务的发展,热门商品常常拥有成百甚至上千条用户评论,这使得阅读所有评论十分耗时.提出了一种基于特征的用户评论自动摘要方法,能够自动生成简洁、全面的摘要 .首先自动从评论中识别用户评价的商品特征,根据特征对评论句分类,然后使用句子抽取的方法生成摘要 .实验证明该特征识别和特征过滤算法的查准率平均可达81%,查全率为52%.相较于Hu和Liu使用的频繁项集挖掘算法.查全率降低了6%,而查准率提高了24%,F1值提高为6%.算法更加注重特征识别的查准率,总体的摘要效果比较好.  相似文献   

8.
李慧  张舒  顾天竺  陈晓红  吴颜 《计算机应用》2006,26(10):2509-2512
准确挖掘商务网站中的用户评论对于商家进行有效的推荐具有重要意义。提出了一种新颖的用户评论抽取(CRE)算法进行评论信息的抽取。该算法采用了页面分块与信息熵的迭代计算技术实现了评论块的自动发现与抽取。实验结果证明了该算法具有较高的查全率与查准率。  相似文献   

9.
随着互联网的高速发展,各种各样的信息资源呈指数级增长,随之出现许多负面影响,需要构建一个安全健康的网络环境。为此,提出针对网页文本内容的敏感信息过滤算法(SWDT-IFA)。该算法不依赖词典与分词,通过构建敏感词决策树,将网页文本内容以数据流形式检索决策树,记录敏感词词频、区域信息以及敏感词级别,计算文本整体敏感度,过滤敏感文本。实验结果表明,SWDT-IFA算法具有较高的查准率和查全率,且执行时间能够满足当前网络环境的实时性要求。  相似文献   

10.
基于内容的中文网页自动分类研究   总被引:7,自引:0,他引:7  
本文主要介绍基于内容的网页自动分类系统,具体介绍了类别词典的建造方法,网页超文本类别词切分的方法,中文网页自动分类算法以及利用类别词与网页间的模糊关系对网页文本进行自动分类等内容.通过对旅游网页进行测试,自动分类正确率可达93.37%以上,有效地提高了查准率和查全率.  相似文献   

11.
Abstract This paper describes an approach to the design of interactive multimedia materials being developed in a European Community project. The developmental process is seen as a dialogue between technologists and teachers. This dialogue is often problematic because of the differences in training, experience and culture between them. Conditions needed for fruitful dialogue are described and the generic model for learning design used in the project is explained.  相似文献   

12.
European Community policy and the market   总被引:1,自引:0,他引:1  
Abstract This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven.  相似文献   

13.
融合集成方法已经广泛应用在模式识别领域,然而一些基分类器实时性能稳定性较差,导致多分类器融合性能差,针对上述问题本文提出了一种新的基于多分类器的子融合集成分类器系统。该方法考虑在度量层融合层次之上通过对各类基多分类器进行动态选择,票数最多的类别作为融合系统中对特征向量识别的类别,构成一种新的自适应子融合集成分类器方法。实验表明,该方法比传统的分类器以及分类融合方法识别准确率明显更高,具有更好的鲁棒性。  相似文献   

14.
Development of software intensive systems (systems) in practice involves a series of self-contained phases for the lifecycle of a system. Semantic and temporal gaps, which occur among phases and among developer disciplines within and across phases, hinder the ongoing development of a system because of the interdependencies among phases and among disciplines. Such gaps are magnified among systems that are developed at different times by different development teams, which may limit reuse of artifacts of systems development and interoperability among the systems. This article discusses such gaps and a systems development process for avoiding them.  相似文献   

15.
This paper presents control charts models and the necessary simulation software for the location of economic values of the control parameters. The simulation program is written in FORTRAN, requires only 10K of main storage, and can run on most mini and micro computers. Two models are presented - one describes the process when it is operating at full capacity and the other when the process is operating under capacity. The models allow the product quality to deteriorate to a further level before an existing out-of-control state is detected, and they can also be used in situations where no prior knowledge exists of the out-of-control causes and the resulting proportion defectives.  相似文献   

16.
Going through a few examples of robot artists who are recognized worldwide, we try to analyze the deepest meaning of what is called “robot art” and the related art field definition. We also try to highlight its well-marked borders, such as kinetic sculptures, kinetic art, cyber art, and cyberpunk. A brief excursion into the importance of the context, the message, and its semiotics is also provided, case by case, together with a few hints on the history of this discipline in the light of an artistic perspective. Therefore, the aim of this article is to try to summarize the main characteristics that might classify robot art as a unique and innovative discipline, and to track down some of the principles by which a robotic artifact can or cannot be considered an art piece in terms of social, cultural, and strictly artistic interest. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008  相似文献   

17.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.  相似文献   

18.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)SM and Team Software Process (TSP){SM}. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.  相似文献   

19.
基于复小波噪声方差显著修正的SAR图像去噪   总被引:4,自引:1,他引:3  
提出了一种基于复小波域统计建模与噪声方差估计显著性修正相结合的合成孔径雷达(Synthetic Aperture Radar,SAR)图像斑点噪声滤波方法。该方法首先通过对数变换将乘性噪声模型转化为加性噪声模型,然后对变换后的图像进行双树复小波变换(Dualtree Complex Wavelet Transform,DCWT),并对复数小波系数的统计分布进行建模。在此先验分布的基础上,通过运用贝叶斯估计方法从含噪系数中恢复原始系数,达到滤除噪声的目的。实验结果表明该方法在去除噪声的同时保留了图像的细节信息,取得了很好的降噪效果。  相似文献   

20.
Abstract  This paper considers some results of a study designed to investigate the kinds of mathematical activity undertaken by children (aged between 8 and 11) as they learned to program in LOGO. A model of learning modes is proposed, which attempts to describe the ways in which children used and acquired understanding of the programming/mathematical concepts involved. The remainder of the paper is concerned with discussing the validity and limitations of the model, and its implications for further research and curriculum development.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号