Similar Documents
20 similar documents found (search time: 9 ms).
1.
Data mining to extract information from Web pages can help provide value-added services. The MDR (mining data records) system exploits Web page structure and uses a string-matching algorithm to mine contiguous and noncontiguous data records.
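A minimal sketch of the string-comparison step that MDR-style record mining relies on: two sibling node groups are treated as part of one data region when the normalized edit distance between their tag strings is small. The tag-string encoding and the 0.3 threshold are illustrative assumptions, not MDR's actual parameters.

```python
# Sketch of the string-similarity test at the core of MDR-style data-record
# mining: two node groups are judged similar (part of one data region) when
# the normalized edit distance of their tag strings is below a threshold.

def edit_distance(a: list[str], b: list[str]) -> int:
    """Classic dynamic-programming Levenshtein distance over tag sequences."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (a[i - 1] != b[j - 1]))  # substitution
    return dp[-1]

def similar(tags_a: list[str], tags_b: list[str], threshold: float = 0.3) -> bool:
    dist = edit_distance(tags_a, tags_b)
    return dist / max(len(tags_a), len(tags_b), 1) <= threshold

# Two record listings with slightly different markup still match:
print(similar(["tr", "td", "img", "td", "b"],
              ["tr", "td", "img", "td", "b", "i"]))   # True (distance 1/6)
```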

2.
Adapting Web pages for small-screen devices (cited 3 times: 0 self-citations, 3 by others)
We propose a page-adaptation technique that splits existing Web pages into smaller, logically related units. To do this, we must first solve two technical problems: how to detect an existing Web page's semantic structure, and how to split a Web page into smaller blocks based on that structure. The Web page can then be adapted to form a two-level hierarchy, with a thumbnail representation at the top level providing a global view and an index to a set of subpages at the bottom level providing detailed information. To date, we've implemented our technique in Web browsers for mobile devices, in a proxy server for adapting Web pages on the fly, and as an authoring-tool plug-in for converting existing Web pages.
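The splitting step can be pictured with a small sketch: recursively divide a DOM tree into blocks no larger than a screen "budget". This is a hypothetical simplification; the paper's technique detects semantic structure from layout and visual cues rather than raw text length, which stands in for block size here.

```python
# Illustrative sketch (not the authors' algorithm): recursively split a DOM
# tree into blocks that fit a small-screen budget, approximating the
# page-splitting step the abstract describes.
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)

def size(node: Node) -> int:
    return len(node.text) + sum(size(c) for c in node.children)

def split_into_blocks(node: Node, budget: int = 500) -> list[Node]:
    """Return a list of subtrees, each intended for one subpage."""
    if size(node) <= budget or not node.children:
        return [node]                      # fits (or is a leaf): keep whole
    blocks = []
    for child in node.children:            # otherwise descend and split
        blocks.extend(split_into_blocks(child, budget))
    return blocks

page = Node("body", children=[Node("div", "intro " * 50),
                              Node("div", "news " * 200)])
print([size(b) for b in split_into_blocks(page)])   # [300, 1000]
```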

3.
Building a cross-platform Web page firewall (cited 1 time: 0 self-citations, 1 by others)
This paper examines the security of cross-platform Web pages. Using PHP and MySQL, it proposes an IP-address technique and a user-permission verification method for building a cross-platform Web page firewall and describes the implementation process. The work has practical significance for Web-based application development and for research on network database security.
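A hedged sketch of the two checks the abstract names, an IP-address filter plus a user-permission test, applied before a page is served. The original work used PHP and MySQL; the Python below only illustrates the control flow, and the network ranges and page-to-role mapping are invented for the example.

```python
# Sketch of an IP-address check followed by a permission check; the allowed
# networks and the page-to-role table are hypothetical.
import ipaddress

ALLOWED_NETS = [ipaddress.ip_network("192.168.0.0/16"),
                ipaddress.ip_network("10.0.0.0/8")]
PAGE_ROLES = {"/admin/users.html": {"admin"}}      # hypothetical mapping

def firewall(client_ip: str, user_role: str, page: str) -> bool:
    ip = ipaddress.ip_address(client_ip)
    if not any(ip in net for net in ALLOWED_NETS):
        return False                               # IP-address check failed
    required = PAGE_ROLES.get(page, set())
    return not required or user_role in required   # user-permission check

print(firewall("192.168.1.7", "guest", "/admin/users.html"))  # False
print(firewall("10.2.3.4", "admin", "/admin/users.html"))     # True
```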

4.
The purpose of Web service composition is to realize complex functionality that no single service can provide, and guaranteeing the correctness of a composition is necessary to achieve value-added services; post-composition correctness verification is therefore an important research direction in Web service composition. This paper describes multi-Web-service composition in terms of the services' interaction behavior and, building on a proposed theory of behavioral compatibility, presents an automated Pi-calculus-based algorithm for verifying the behavioral compatibility of composed Web services: by automatically translating the composition into a composite process in the Pi-calculus, it achieves automated verification of multi-Web-service compositions.
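A minimal Pi-calculus example of the kind of behavioral compatibility such an algorithm checks (the process definitions are illustrative, not drawn from the paper): two services compose compatibly when every send finds a matching receive, so the composition reduces to the inert process without deadlocking.

```latex
% Illustrative only: Client sends a request on channel $req$ and awaits a
% reply on $rep$; Service does the complement, so each action finds its
% co-action and the composition reduces to the inert process.
\begin{align*}
  \mathit{Client}  &= \overline{req}\langle x \rangle.\, rep(y).\, \mathbf{0} \\
  \mathit{Service} &= req(x).\, \overline{rep}\langle y \rangle.\, \mathbf{0} \\
  \mathit{Client} \mid \mathit{Service}
    &\;\xrightarrow{\tau}\; rep(y).\,\mathbf{0} \mid \overline{rep}\langle y \rangle.\,\mathbf{0}
     \;\xrightarrow{\tau}\; \mathbf{0} \mid \mathbf{0}
\end{align*}
```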

5.
Web service modeling and composition-compatibility verification for an airline ticket booking business (cited 1 time: 0 self-citations, 1 by others)
Web services provide a way to solve complex problems by composing basic services and have therefore attracted growing attention in recent years. In practice, composing today's interactive Web services still raises many problems, among them the verification of service compositions. This paper uses the Pi-calculus to formally model the Web services of an airline ticket booking business, improves Pi-calculus-based reasoning, and provides a method for verifying whether multiple composed Web services are compatible. To substantiate the method, the MWB tool is used to demonstrate the compatibility verification process for the composition.

6.
Compatibility analysis of Web services aims to guarantee correct interaction among multiple Web services. This paper formally analyzes Web service compatibility on the basis of the Pi-calculus, giving two formal definitions of compatibility between a pair of Web services. It also defines a projection operation between two Web service processes and, on that basis, gives a formal definition of compatibility among multiple Web services.

7.
Computer Networks, 1999, 31(11–16): 1641–1652
One bottleneck in implementing a system that intelligently queries the Web is developing "wrappers", programs that extract data from Web pages. Here we describe a method for learning general, page-independent heuristics for extracting data from HTML documents. The input to our learning system is a set of working wrapper programs, paired with the HTML pages they correctly wrap. The output is a general procedure for extracting data that works for many formats and many pages. In experiments with a collection of 84 constrained but realistic extraction problems, we demonstrate that 30% of the problems can be handled perfectly by learned extraction heuristics and around 50% can be handled acceptably. We also demonstrate that learned page-independent extraction heuristics can substantially improve the performance of methods for learning page-specific wrappers.
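The paper's idea in miniature, under strong simplifying assumptions: from pages paired with the items a working wrapper extracts, estimate which page-independent features (here, bare tag paths) predict extraction targets, then apply the learned heuristic to unseen pages. The feature set and scoring are illustrative, not the paper's actual learner.

```python
# Learn "which tag paths tend to hold data?" from wrapper-labeled pages,
# then extract from a new page. A page is modeled as [(tag_path, text)].
from collections import Counter

def learn_path_weights(training: list[tuple[list[tuple[str, str]], set[str]]]):
    """training: list of (page, items_the_wrapper_extracted)."""
    pos, total = Counter(), Counter()
    for page, wrapped in training:
        for path, text in page:
            total[path] += 1
            pos[path] += text in wrapped
    return {p: pos[p] / total[p] for p in total}       # P(extract | tag path)

def extract(page, weights, threshold=0.5):
    return [text for path, text in page if weights.get(path, 0) >= threshold]

train_page = [("html/body/td/b", "Casablanca"), ("html/body/p", "About us")]
weights = learn_path_weights([(train_page, {"Casablanca"})])
new_page = [("html/body/td/b", "Vertigo"), ("html/body/p", "Contact")]
print(extract(new_page, weights))                      # ['Vertigo']
```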

8.
Based on GAA-API, a generic authorization and access control interface, this paper proposes a fine-grained authorization and access control method for Web pages that provides fine-grained, flexible access control separately for the static and dynamic resource elements within a page. The method is tested experimentally, and the test results are compared and analyzed.

9.
This paper proposes a simple and efficient algorithm for computing Web page attention. By computing a page's attention value, a user's information retrieval needs can be met when the page is presented. Targeting different users' different needs, the algorithm lets the same page exhibit different attention values for different users. The algorithm is described in detail, a Java implementation is given, and the algorithm is validated with examples; the results demonstrate its effectiveness.

10.
Liu, Xiuwen; Liu, Zhi; Jiao, Qihan; Le Meur, Olivier; Zhao, Wan-Lei. Multimedia Tools and Applications, 2019, 78(15): 21629–21644
This paper proposes a novel saliency-aware inter-image color transfer method to perform image manipulation. Specifically, given the source image, the candidate...

11.
Underexposed, low-light images are acquired when scene illumination is insufficient for a given camera. The camera's limitation lies in the high chance of producing motion-blurred images due to shaky hands. In this paper we suggest actively using underexposure as a measure to prevent motion-blurred images from appearing, and we propose a novel color transfer as a method for low-light image amplification. The proposed solution envisages a dual acquisition, containing a normally exposed, possibly blurred image and an underexposed/low-light but sharp one. Good colors are learned from the normally exposed image and transferred to the low-light one using a framework-matching solution. To ensure that the transfer is spatially consistent, the images are divided into perceptually consistent luminance patches called frameworks, and the optimal mapping is approximated piecewise. The two images may differ in colors and subject, so to improve the robustness of the spatial matching we add supplementary extreme channels. The proposed method shows robust results from both an objective and a subjective point of view.

12.
13.
Web pages often contain a great deal of topic noise, which makes accurate automatic keyword extraction technically difficult. This paper proposes a text-object network model, DON, defines the centrality of an object node along with centrality-based influence-factor propagation rules, and uses them to automatically cluster topic societies in the DON, thereby improving the model's resistance to noise. On top of DON, an automatic keyword extraction algorithm for Web pages, KEYDON (Keywords Extraction Algorithm Based on DON), is proposed. Experimental results show that KEYDON's precision is nearly 20% higher than that of the corresponding algorithm based on the DocView model, indicating that DON has a strong ability to suppress topic noise.
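An illustrative sketch of the general idea behind centrality-based keyword extraction: build a word co-occurrence graph and rank nodes by degree centrality. KEYDON's DON model adds influence propagation and topic-society clustering on top of this; the sketch shows only the centrality-ranking core, with invented toy data.

```python
# Rank words by weighted degree in a sentence-level co-occurrence graph;
# high-centrality words are keyword candidates.
from collections import defaultdict
from itertools import combinations

def keyword_candidates(sentences: list[list[str]], top_k: int = 3) -> list[str]:
    weight = defaultdict(int)
    for words in sentences:
        for a, b in combinations(set(words), 2):   # co-occurrence in a sentence
            weight[a] += 1
            weight[b] += 1
    return sorted(weight, key=weight.get, reverse=True)[:top_k]

docs = [["web", "page", "noise"], ["keyword", "extraction", "web"],
        ["web", "keyword", "noise"]]
print(keyword_candidates(docs))   # e.g. ['web', 'noise', 'keyword']
```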

14.
High-volume phishing attacks are encountered every day because of attackers' high financial returns. Recently, there has been significant interest in applying machine learning to phishing Web page detection. Unlike earlier work, this paper introduces predicted labels of textual contents as part of the features and proposes a novel framework for phishing Web page detection using hybrid features consisting of URL-based, Web-based, rule-based, and textual content-based features. We realize this framework by developing an efficient two-stage extreme learning machine (ELM). The first stage constructs classification models on the textual contents of Web pages using ELM; in particular, we use Optical Character Recognition (OCR) as an assistant tool to extract textual contents from image-format Web pages at this stage. In the second stage, a classification model on the hybrid features is developed using a linear combination model-based ensemble of ELMs (LC-ELMs), with the weights calculated by the generalized inverse. Experimental results indicate that the proposed framework is promising for detecting phishing Web pages.
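A minimal extreme learning machine, the building block of the paper's two-stage framework, sketched under the usual ELM formulation: a random, untrained hidden layer followed by output weights solved in closed form with the Moore-Penrose pseudoinverse. The LC-ELMs ensemble combines many such models; the toy data below is invented.

```python
# Basic ELM: random hidden layer, closed-form output weights via pinv.
import numpy as np

class ELM:
    def __init__(self, n_hidden: int = 50, seed: int = 0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)      # random, untrained layer

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ y   # closed-form solve
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Toy use: separate two Gaussian blobs (labels +1 / -1).
X = np.vstack([np.random.randn(20, 2) + 2, np.random.randn(20, 2) - 2])
y = np.array([1.0] * 20 + [-1.0] * 20)
print((np.sign(ELM().fit(X, y).predict(X)) == y).mean())  # ~1.0
```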

15.
To further improve the speed and accuracy of Web page relevance judgments, this paper proposes a new sentence-weighting method for query-focused summarization. Starting from the result set returned for a query, it identifies phrases in the input query by computing the mutual information between keywords; it analyzes page structure using the tag information in the page text; and it folds the keyword-phrase and page-structure information into the sentence weights. Experimental results show that query-focused summaries generated by this algorithm outperform existing methods in both the speed and the accuracy of relevance judgments.
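A sketch of the phrase-identification step: adjacent query keywords are merged into a phrase when their pointwise mutual information over the returned result set is high. The log base, the substring test, and the example documents are illustrative assumptions, not the paper's exact formulation.

```python
# Pointwise mutual information of a keyword pair over a result set; a high
# value suggests the pair should be treated as one phrase.
import math

def pmi(w1: str, w2: str, docs: list[str]) -> float:
    n = len(docs)
    p1 = sum(w1 in d for d in docs) / n
    p2 = sum(w2 in d for d in docs) / n
    p12 = sum((w1 + " " + w2) in d for d in docs) / n
    if not (p1 and p2 and p12):
        return float("-inf")
    return math.log(p12 / (p1 * p2))

docs = ["new york city guide", "brand phones on sale",
        "new york hotels", "old brick house"]
print(pmi("new", "york", docs))   # log 2 ~ 0.69: merge "new york" into a phrase
```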

16.
The Semantic Web is attracting increasing interest as a way to fulfill the need for sharing, retrieving, and reusing information. Since Web pages are designed to be read by people, not machines, searching and reusing information on the Web is difficult without human participation. To this aim, adding semantics (i.e., meaning) to a Web page would help machines understand Web contents and better support the Web search process. One of the latest developments in this field is Google's Rich Snippets, a service that lets Web site owners add semantics to their Web pages. In this paper we provide a structured approach to automatically annotating a Web page with Rich Snippets RDFa tags. Exploiting a data reverse engineering method, combined with several heuristics and a named-entity recognition technique, our method can recognize and annotate a subset of the Rich Snippets vocabulary: all the attributes of its Review concept and the names of the Person and Organization concepts. We implemented tools and services and evaluated the accuracy of the approach on real e-commerce Web sites.

17.
A Web page analysis method for integrating dynamic, heterogeneous Web information (cited 1 time: 0 self-citations, 1 by others)
Extracting dynamic, heterogeneous Web information resources so that they can be queried and used in a uniform way is a problem that urgently needs solving. This paper presents methods and experience for analyzing the relevant Web pages and implements both the automatic submission of HTML forms to obtain the required pages and the extraction of information from those pages. Experiments confirm the method's effectiveness.
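A hedged sketch of the two implemented steps: automatically submitting an HTML form to obtain a dynamically generated page, then extracting information from it. The URL, form fields, and CSS selector are hypothetical, and requests plus BeautifulSoup stand in for whatever HTTP and parsing stack the paper used.

```python
# Submit a form programmatically, then pull targeted text out of the result.
import requests
from bs4 import BeautifulSoup

def fetch_and_extract(url: str, form_data: dict, selector: str) -> list[str]:
    response = requests.post(url, data=form_data, timeout=10)  # submit the form
    soup = BeautifulSoup(response.text, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]

# Hypothetical endpoint, fields, and selector:
rows = fetch_and_extract("http://example.com/search",
                         {"keyword": "laptop", "page": "1"},
                         "table.results td.title")
print(rows)
```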

18.
Navigation-oriented Web pages usually contain a large amount of noise, which makes it hard to automatically extract their keywords. This paper therefore proposes a new page representation model, PIX-PAGE, and an automatic keyword extraction algorithm for navigation pages, P-KEA. Using a proposed region-merging algorithm, PIX-PAGE segments a page into regions of suitable granularity; it then quantifies each region's visual "singularity" according to characteristics of human vision and applies singularity-propagation rules to further reinforce the singularity of keyword-related regions. Guided by PIX-PAGE's visual quantification, P-KEA can locate the keywords in visually salient regions fairly accurately. Experimental results show that, compared with DVM, the corresponding algorithm based on the DocView model, P-KEA improves precision by 20.9% on average.

19.
付国瑜, 黄贤英. 《计算机应用》 (Journal of Computer Applications), 2009, 29(4): 1114–1116
Addressing the characteristics of Web search engines, this paper proposes a search strategy based on quantum genetic clone mining (QGCMA). The algorithm describes the user's query as the average quality of Web pages and obtains high-affinity antibodies (Web pages) through clone, mutation, and crossover operations. Analysis of experimental results shows that the method has a clear advantage over the standard genetic algorithm (GA) in Web search.

20.
Conceptual-model-based data extraction from multiple-record Web pages (cited 7 times: 0 self-citations, 7 by others)
Electronically available data on the Web is exploding at an ever increasing pace. Much of this data is unstructured, which makes searching hard and traditional database querying impossible. Many Web documents, however, contain an abundance of recognizable constants that together describe the essence of a document's content. For these kinds of data-rich, multiple-record documents (e.g., advertisements, movie reviews, weather reports, travel information, sports summaries, financial statements, obituaries, and many others) we can apply a conceptual-modeling approach to extract and structure data automatically. The approach is based on an ontology – a conceptual model instance – that describes the data of interest, including relationships, lexical appearance, and context keywords. By parsing the ontology, we can automatically produce a database scheme and recognizers for constants and keywords, and then invoke routines to recognize and extract data from unstructured documents and structure it according to the generated database scheme. Experiments show that it is possible to achieve good recall and precision ratios for documents that are rich in recognizable constants and narrow in ontological breadth. Our approach is less labor-intensive than other approaches that manually or semiautomatically generate wrappers, and it is generally insensitive to changes in Web-page format.
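A miniature version of the ontology-driven approach: the "ontology" below pairs each concept with a lexical pattern and context keywords, and parsing it yields recognizers that pull constants out of unstructured text. The patterns, the 15-character context window, and the obituary-style example are illustrative, far simpler than the paper's conceptual-model instances.

```python
# Turn a tiny concept ontology (pattern + context keywords) into recognizers
# and apply them to an unstructured record.
import re

ONTOLOGY = {
    "age":   {"pattern": r"\b\d{1,3}\b", "context": ["age", "aged"]},
    "phone": {"pattern": r"\b\d{3}-\d{4}\b", "context": ["call", "phone"]},
}

def extract_record(text: str) -> dict:
    record = {}
    for concept, spec in ONTOLOGY.items():
        for match in re.finditer(spec["pattern"], text):
            window = text[max(0, match.start() - 15):match.end() + 15].lower()
            if any(kw in window for kw in spec["context"]):   # keyword check
                record[concept] = match.group()
    return record

print(extract_record("Smith died Monday, aged 84. For services call 555-1234."))
# {'age': '84', 'phone': '555-1234'}
```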
