Similar literature
Found 19 similar records (search time: 125 ms)
1.
With the development of Internet technology, the number of documents on the World Wide Web is growing exponentially. In such a vast information repository, users find it hard to locate the information they need, so automatically and efficiently processing these massive document collections has become an important research topic. This paper builds separate classification models from the hypertext meta-information extracted from the dataset documents (titles, hyperlinks, and tags) and from the document content itself, then combines the individual models' outputs with a neural network to reach a final decision, yielding an ensemble hypertext classification algorithm based on meta-information that makes fuller use of hypertext's multiple kinds of structural information. Experimental results show that, compared with methods that classify using only a single kind of hypertext structural information, the meta-information-based ensemble algorithm achieves better classification performance.
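The combining step described above can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: per-view classifiers (title, anchor text, body) each emit label scores, and a weighted vote stands in for the paper's neural-network combiner. All names and weight values here are assumptions.

```python
# Combine the scores of per-view hypertext classifiers with a weighted vote.
# The views, labels, and weights below are illustrative assumptions.

def combine_views(view_scores, weights):
    """view_scores: {view: {label: score}}; weights: {view: float}.
    Returns the label with the highest weighted total score."""
    totals = {}
    for view, scores in view_scores.items():
        w = weights.get(view, 0.0)
        for label, s in scores.items():
            totals[label] = totals.get(label, 0.0) + w * s
    return max(totals, key=totals.get)

# Example: the body alone favours "sports", but the title and anchor-text
# evidence tips the ensemble toward "news".
scores = {
    "title":  {"news": 0.9, "sports": 0.1},
    "anchor": {"news": 0.6, "sports": 0.4},
    "body":   {"news": 0.3, "sports": 0.7},
}
weights = {"title": 0.5, "anchor": 0.2, "body": 0.3}
print(combine_views(scores, weights))  # -> news
```

A trained combiner (as in the paper) would learn the view weights from data rather than fix them by hand.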

2.
To organize and retrieve Web page information effectively, this paper targets the hypertext structure of Web pages: building on a study of how hyperlinks and hypertext tags aid information extraction, it describes an Internet information-extraction method based on "hypertext tag weighting" and "hyperlink forests" and compares it with traditional methods. Experimental results show that the method performs well for automatic Web page classification.
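"Hypertext tag weighting" in the style described above can be sketched as follows. The tag names and weight values are assumptions for illustration, not the paper's: terms found inside more prominent HTML tags contribute more to the page's term vector.

```python
# Sketch of hypertext tag weighting: terms in prominent HTML tags get a
# larger contribution to the page's term vector. Weights are assumed values.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "a": 2.0, "p": 1.0}

def weighted_term_vector(tagged_tokens):
    """tagged_tokens: iterable of (tag, term) pairs extracted from a page.
    Returns {term: accumulated weight}."""
    vec = {}
    for tag, term in tagged_tokens:
        vec[term] = vec.get(term, 0.0) + TAG_WEIGHTS.get(tag, 1.0)
    return vec

tokens = [("title", "hypertext"), ("p", "hypertext"), ("a", "classification")]
print(weighted_term_vector(tokens))  # -> {'hypertext': 6.0, 'classification': 2.0}
```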

3.
Content-based automatic classification of Chinese Web pages   Cited: 7 (self: 0, others: 7)
This paper presents a content-based automatic Web page classification system, covering the construction of category dictionaries, the segmentation of category words in hypertext pages, an automatic classification algorithm for Chinese Web pages, and the use of fuzzy relations between category words and pages to classify page text. Tested on tourism Web pages, the system achieves a classification accuracy above 93.37%, effectively improving both precision and recall.

4.
陈宇. 《计算机应用研究》, 1999, 16(5): 86-88, 102
Hypertext is a popular information-management technology today. This paper discusses the concept of hypertext, its basic elements, its basic model structure, and expert hypertext structures, and, drawing on application practice, examines the main issues in developing a hypertext information system for a passenger station: system architecture, scheme selection, functional design, and implementation techniques and methods.

5.
Facing today's massive volume of Internet data, helping users locate the information they need accurately has become an important research topic. Weighted Web page classification based on a backfill mechanism is an effective approach to this problem. The method makes full use of Web text structure information: building on class-axis classification, it applies weighting under the backfill mechanism. Experimental results show that the method lets feature words with strong class-discriminating power play a larger role while suppressing interference from low-weight feature words, effectively improving classification precision and recall.

6.
This paper analyzes the structure and characteristics of HTML documents and surveys the techniques currently used for hypertext-based information hiding. It then proposes a new method that requires only minor modifications to the hypertext document, does not affect how the page is displayed, and can hide a relatively large amount of information. Experiments show that the method offers good invisibility and high security.

7.
丛翀, 吕宝粮. 《计算机仿真》, 2008, 25(2): 96-99, 103
Binary classification is one of the most fundamental problems in machine learning. The most widely used, and most effective, learning algorithm today is the support vector machine (SVM). For some nonlinear classification problems, however, SVMs still cannot give satisfactory solutions, so it is desirable to find a way to improve SVM's ability to handle them. For binary classification, this paper proposes a perceptron-based method for partitioning the sample space. The method first uses perceptrons to extract the distribution of the samples and divide the overall problem into classification problems in local subspaces; an SVM then finds the optimal decision boundary for each local problem, and a min-max modular network combines the local boundaries into a global solution. Simulation experiments show that the new method effectively analyzes the sample space, extracts sample-distribution information, and achieves better accuracy on test data than the original method. The new method thus meets its goal of improving the classifier's ability to handle nonlinear classification problems.
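The combination step of the min-max modular network described above can be sketched as follows. The interface is assumed for illustration: each local SVM returns a signed decision score, MIN units take the minimum over their member scores, and the network output is the maximum over the MIN units, stitching the local boundaries into a global one.

```python
# Sketch of min-max modular combination (interface assumed): MIN over each
# group of local signed scores, then MAX over the groups.
def min_max_combine(score_groups):
    """score_groups: list of lists of signed local-classifier scores.
    Returns the combined signed score (positive -> positive class)."""
    return max(min(group) for group in score_groups)

# One group is vetoed by a negative member (min = -0.2); the other group
# agrees on the positive class, so the combined score stays positive.
print(min_max_combine([[0.8, -0.2], [0.5, 0.6]]))  # -> 0.5
```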

8.
NewsGrouper: a software tool for automatically extracting important news   Cited: 1 (self: 0, others: 1)
This paper describes a method for automatically extracting important news from the Internet, along with a software tool built on it: NewsGrouper. The tool uses information contained in the hypertext to cluster the Web pages the hypertext points to, and from the clusters derives the day's important news. The method's advantage is that the user need only supply the information sources; no other manual intervention is required, giving a high degree of automation. Similar techniques can also be applied to automatic document classification, retrieval, and related areas.

9.
To obtain an efficient hypertext classification algorithm, this paper proposes a new coordinated hypertext classification algorithm, introduces k-NN, Bayes, and document similarity into the hypertext classification domain, and experimentally compares the hypertext classification performance of these three classifiers, arriving at an efficient hypertext classifier. The method has already been applied in two newly developed experimental systems: the intelligent search engine WebSearch and the intelligent software assistant WebSoft.
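One of the three component classifiers named above, k-NN over term vectors, can be sketched as follows. This is a generic illustration under assumed choices (cosine similarity, majority vote), not the paper's implementation.

```python
# Minimal k-NN text classifier sketch: cosine similarity over sparse term
# vectors, majority vote among the k nearest training documents.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two sparse {term: weight} vectors."""
    num = sum(a[t] * b.get(t, 0.0) for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def knn_classify(query, training, k=3):
    """training: list of (term_vector, label) pairs."""
    nearest = sorted(training, key=lambda tv: cosine(query, tv[0]),
                     reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

training = [
    ({"ball": 1, "game": 1}, "sports"),
    ({"ball": 1}, "sports"),
    ({"vote": 1, "law": 1}, "politics"),
]
query = {"ball": 1, "game": 2}
print(knn_classify(query, training))  # -> sports
```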

10.
Knowledge space theory provides a way to describe the structure of a given knowledge domain and is regarded as a basis for effectively assessing students' knowledge, but it is a problem-based theory. Exploiting the structural similarity between hypertext and knowledge spaces, this paper converts a hypertext structure of knowledge points into a hypertext knowledge space, grounding knowledge space theory in knowledge points, and uses automata theory over this hypertext knowledge space to realize adaptive assessment of students' knowledge structure. An automaton-based adaptive assessment algorithm is given, and its effectiveness and complexity are analyzed.

11.
A Study of Approaches to Hypertext Categorization   Cited: 34 (self: 2, others: 34)
Hypertext poses new research challenges for text classification. Hyperlinks, HTML tags, category labels distributed over linked documents, and meta data extracted from related Web sites all provide rich information for classifying hypertext documents. How to appropriately represent that information and automatically learn statistical patterns for solving hypertext classification problems is an open question. This paper seeks a principled approach to providing the answers. Specifically, we define five hypertext regularities which may (or may not) hold in a particular application domain, and whose presence (or absence) may significantly influence the optimal design of a classifier. Using three hypertext datasets and three well-known learning algorithms (Naive Bayes, Nearest Neighbor, and First Order Inductive Learner), we examine these regularities in different domains, and compare alternative ways to exploit them. Our results show that the identification of hypertext regularities in the data and the selection of appropriate representations for hypertext in particular domains are crucial, but seldom obvious, in real-world problems. We find that adding the words in the linked neighborhood to the page having those links (both inlinks and outlinks) was helpful for all our classifiers on one data set, but more harmful than helpful for two out of the three classifiers on the remaining datasets. We also observed that extracting meta data from related Web sites was extremely useful for improving classification accuracy in some of those domains. Finally, the relative performance of the classifiers being tested provided insights into their strengths and limitations for solving classification problems involving diverse and often noisy Web pages.
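The "linked neighborhood" representation evaluated above can be sketched minimally as follows. The page format and field names here are assumptions, not the paper's data format: a page's bag of words is augmented with the words of its in- and out-linked pages.

```python
# Sketch of a linked-neighborhood representation: a page's own words plus
# the words of its inlinked and outlinked pages. Field names are assumed.
def neighborhood_words(page, pages):
    """page: {'words': [...], 'inlinks': [urls], 'outlinks': [urls]};
    pages: {url: page}. Returns the page's words plus its neighbors' words."""
    words = list(page["words"])
    for url in page["inlinks"] + page["outlinks"]:
        words.extend(pages[url]["words"])
    return words

pages = {
    "a": {"words": ["hypertext"], "inlinks": [], "outlinks": ["b"]},
    "b": {"words": ["classification"], "inlinks": ["a"], "outlinks": []},
}
print(neighborhood_words(pages["a"], pages))  # -> ['hypertext', 'classification']
```

Whether this augmentation helps depends on the domain, which is precisely the paper's finding.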

12.
Craven, Mark; Slattery, Seán. Machine Learning, 2001, 43(1-2): 97-119
We present a new approach to learning hypertext classifiers that combines a statistical text-learning method with a relational rule learner. This approach is well suited to learning in hypertext domains because its statistical component allows it to characterize text in terms of word frequencies, whereas its relational component is able to describe how neighboring documents are related to each other by hyperlinks that connect them. We evaluate our approach by applying it to tasks that involve learning definitions for (i) classes of pages, (ii) particular relations that exist between pairs of pages, and (iii) locating a particular class of information in the internal structure of pages. Our experiments demonstrate that this new approach is able to learn more accurate classifiers than either of its constituent methods alone.

13.
The World Wide Web has turned hypertext into a success story by enabling world-wide sharing of unstructured information and informal knowledge. The Semantic Web targets the sharing of structured information and formal knowledge pursuing objectives of achieving collective intelligence on the Web. Germane to the structure of the Semantic Web is a layering and standardization of concerns. These concerns are reflected by an architecture of the Semantic Web that we present through a common use case. Semantic Web data for the use case is now found on the Web and is part of a quickly growing set of Semantic Web resources available for formal processing.

14.
We investigate the possibility of using Semantic Web data to improve hypertext Web search. In particular, we use relevance feedback to create a ‘virtuous cycle’ between data gathered from the Semantic Web of Linked Data and web-pages gathered from the hypertext Web. Previous approaches have generally considered the searching over the Semantic Web and hypertext Web to be entirely disparate, indexing, and searching over different domains. While relevance feedback has traditionally improved information retrieval performance, relevance feedback is normally used to improve rankings over a single data-set. Our novel approach is to use relevance feedback from hypertext Web results to improve Semantic Web search, and results from the Semantic Web to improve the retrieval of hypertext Web data. In both cases, an evaluation is performed based on certain kinds of informational queries (abstract concepts, people, and places) selected from a real-life query log and checked by human judges. We evaluate our work over a wide range of algorithms and options, and show it improves baseline performance on these queries for deployed systems as well, such as the Semantic Web Search engine FALCON-S and Yahoo! Web search. We further show that the use of Semantic Web inference seems to hurt performance, while the pseudo-relevance feedback increases performance in both cases, although not as much as actual relevance feedback. Lastly, our evaluation is the first rigorous ‘Cranfield’ evaluation of Semantic Web search.
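Relevance feedback of the kind used above is classically implemented as a Rocchio update. The sketch below is our generic illustration, not the paper's model, and the alpha/beta/gamma values are conventional defaults rather than the authors' parameters: the query vector is moved toward documents judged relevant and away from those judged non-relevant.

```python
# Rocchio-style relevance feedback sketch: alpha/beta/gamma are conventional
# defaults (assumed here, not taken from the paper).
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """query and each document are sparse {term: weight} vectors.
    Returns the updated query, dropping non-positive term weights."""
    terms = set(query) | {t for d in relevant + nonrelevant for t in d}
    new_query = {}
    for t in terms:
        rel = sum(d.get(t, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        non = sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant) if nonrelevant else 0.0
        weight = alpha * query.get(t, 0.0) + beta * rel - gamma * non
        if weight > 0:
            new_query[t] = weight
    return new_query

# Feedback from a result judged relevant expands the query with "car";
# the non-relevant result's "cat" term gets a negative weight and is dropped.
expanded = rocchio({"jaguar": 1.0},
                   relevant=[{"jaguar": 1.0, "car": 1.0}],
                   nonrelevant=[{"jaguar": 1.0, "cat": 1.0}])
print(sorted(expanded))  # -> ['car', 'jaguar']
```

In the paper's cycle, the relevant/non-relevant judgments for one side (hypertext or Semantic Web) come from results on the other side.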

15.
Mining the hyperlink structure of Web sites   Cited: 11 (self: 0, others: 11)
The WWW is a global information system composed of thousands of Web sites distributed around the world, each of which is in turn an information (sub)system made up of many Web pages. Because a document author can hyperlink a document to any known Web page, and the information resources on a Web site are usually contributed by many people, the hyperlinks within a site are typically of all kinds, with varied meanings and uses. This paper analyzes the usage patterns and regularities of hyperlinks in the WWW, proposes a method for classifying hyperlink types and mining site structure, and gives a preliminary discussion of its applications to information gathering and querying.

16.
Before the Web there was Gopher   Cited: 1 (self: 0, others: 1)
The World Wide Web, universally well known today, was preceded by an efficient software tool that was fondly named Gopher. The Internet Gopher, much like the Web, enabled users to obtain information quickly and easily. Why, then, did it disappear but the Web did not? Gopher faded into obscurity for two main reasons: hypertext and commerce.

17.
A comparative study of two Web page models   Cited: 3 (self: 1, others: 2)
Web technology enables cross-platform hypertext and hypermedia linking on the Internet, making information retrieval and publishing fast and convenient. Web page technology has accordingly developed in depth, producing two page models: static and dynamic. This paper compares the two models in four respects: the models themselves and their principles, page authoring, information (data) transmission, and database connection and access techniques.

18.
19.
Imprudent linking weaves a tangled Web   Cited: 1 (self: 0, others: 1)
Lynch, P.J.; Horton, S. Computer, 1997, 30(7): 115-117
Hypertext linking is often embraced uncritically by Web authors eager to explore the power of hypertext without first considering its effects on their readers' comprehension. Hypertext linking is not a substitute for thought: we think with ideas, not with dissociated snippets of raw information. Even the most germane bits of information cannot become ideas, regardless of how cleverly they are stacked, listed or linked. Ideas define relevance, provide context and establish patterns. With patterns, most concepts become intelligible, and we need continuity and stability of theme and context to recognize patterns. So, like most powerful technologies, hypertext links are a mixed blessing. Used improperly, they can be detrimental to Web sites. "Loose links" can drive away an audience, dilute the site's message, confuse the reader with irrelevant digressions and become a continuing maintenance headache for site authors and Webmasters.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号