首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 26 毫秒
1.
Automatic integration of Web search interfaces with WISE-Integrator   总被引:3,自引:0,他引:3  
An increasing number of databases are becoming Web accessible through form-based search interfaces, and many of these sources are database-driven e-commerce sites. It is a daunting task for users to access numerous Web sites individually to get the desired information. Hence, providing a unified access to multiple e-commerce search engines selling similar products is of great importance in allowing users to search and compare products from multiple sites with ease. One key task for providing such a capability is to integrate the Web search interfaces of these e-commerce search engines so that user queries can be submitted against the integrated interface. Currently, integrating such search interfaces is carried out either manually or semiautomatically, which is inefficient and difficult to maintain. In this paper, we present WISE-Integrator - a tool that performs automatic integration of Web Interfaces of Search Engines. WISE-Integrator explores a rich set of special metainformation that exists in Web search interfaces and uses the information to identify matching attributes from different search interfaces for integration. It also resolves domain differences of matching attributes. In this paper, we also discuss how to automatically extract information from search interfaces that is needed by WISE-Integrator to perform automatic interface integration. Our experimental results, based on 143 real-world search interfaces in four different domains, indicate that WISE-Integrator can achieve high attribute matching accuracy and can produce high-quality integrated search interfaces without human interactions.Received: 2 January 2004, Accepted: 25 March 2004, Published online: 12 August 2004Edited by: M. Carey  相似文献   

2.
深层网数据库的访问方式主要是通过查询接口,所以查询接口是外部访问深层网数据库的门户.为了能够同时访问同一领域多个Web数据库,需要对多个Web数据库的查询接口进行集成.因此,提出基于本体的深层网查询接口集成方法.首先构建领域核心本体,在模式匹配过程中,不断完善核心本体;然后,以本体作为媒介,在不同查询接口模式间建立属性映射关系,发现属性间的语义关联;最后,根据本体概念出现的频数生成集成接口.实验表明提出的深层网查询接口自动集成方法是可行的和高效的.  相似文献   

3.
Databases deepen the Web   总被引:2,自引:0,他引:2  
Ghanem  T.M. Aref  W.G. 《Computer》2004,37(1):116-117
The Web has become the preferred medium for many database applications, such as e-commerce and digital libraries. These applications store information in huge databases that users access, query, and update through the Web. Database-driven Web sites have their own interfaces and access forms for creating HTML pages on the fly. Web database technologies define the way that these forms can connect to and retrieve data from database servers. The number of database-driven Web sites is increasing exponentially, and each site is creating pages dynamically-pages that are hard for traditional search engines to reach. Such search engines crawl and index static HTML pages; they do not send queries to Web databases. The information hidden inside Web databases is called the "deep Web" in contrast to the "surface Web" that traditional search engines access easily. We expect deep Web search engines and technologies to improve rapidly and to dramatically affect how the Web is used by providing easy access to many more information resources.  相似文献   

4.
电子商务网站以查询接口的方式提供商务信息,查询接口也是隐藏在后端的Deep Web数据库模式信息的载体.有效解析查询接口是访问Deep Web资源的第1步,但是由于查询接口在不同的设计模式和开发语言下实现,所以导致了属性难以抽取、语义关系复杂的现象.为提高属性抽取的准确率且实现在语义层面上对查询接口的解读,提出一种以查询接口启发式信息为基础的属性抽取方法,通过使用本体工具对属性集合进行拓展并获取语义描述.在实际的电子商务网站上进行的广泛实验证明了提出方法的可行性与有效性.  相似文献   

5.
王兵  ;刘彩虹 《微机发展》2008,(7):176-180
随着Internet信息的迅速增长,许多Web信息已经被各种各样的可搜索在线数据库所深化,并被隐藏在Web查询接口下面。传统的搜索引擎由于技术原因不能索引这些信息——DeepWeb信息。由于DeepWeb惟一“入口点”是查询接口,为使查询接口自动产生有意义有查询,给出了DeepWeb信息集成系统框架,提出了基于数据类型的搜索驱动的用户查询转换方法,基于此设计并实现了一个针对中文DeepWeb信息集成原型系统。通过在实际DeepWeb站点上的实验证明了此方法是非常有效的。  相似文献   

6.
基于《知网》的中文Deep Web模式匹配算法研究   总被引:1,自引:1,他引:0  
金玉  范学峰 《计算机应用研究》2009,26(10):3750-3753
随着数据库在Internet中的应用日益广泛,Deep Web集成(即Web数据库集成)成为当前信息领域的研究热点,模式匹配是Deep Web查询接口集成中的一个关键问题。目前大多数这方面的研究都是基于英文的,针对这种情况,探讨了中文Deep Web查询接口的模式匹配方法,并提出了一种基于《知网》、面向中文语义的模式匹配算法,并利用属性在查询接口上的相对位置信息解决语义冲突。手工收集查询表单对算法进行验证,实验表明该方法能使得接口之间属性匹配的正确率达到90 %以上。  相似文献   

7.
一种基于树结构的Web数据自动抽取方法   总被引:8,自引:2,他引:8  
介绍了一种基于树结构的自动从HTML页面中抽取数据的方法.在HTML页面的树形结构之上,提出了基于语义块的HTML页面结构模型:HTML页面中的数据值主要存在于语义块中,不同的HTML页面的主要区别在于语义块的区别.基于语义块的结构模型,自动抽取通过4个步骤完成:通过HTML页面比较发现语义块;区分语义块中数据值的角色;推导数据模式和推导抽取规则.在实际HTML页面上的实验已经证明,这种方法能够达到较高的正确率,同时,随着文档的增大,方法也能够保证线性的时间复杂度.  相似文献   

8.
9.
基于规则的关系数据库到本体的转换方法*   总被引:3,自引:1,他引:2  
提出了一种新的全自动的关系数据库到本体的转换方法,通过分析关系模式的主键、属性、引用关系、完整性约束和部分数据来创建本体,尽量保持了关系数据库的信息,并在构建的过程中对信息进行初步的集成和分类.系统实践证明,该方法可自动进行关系模式和数据到本体的等价转换,而且完成了对关系数据库中部分语义信息的辅助挖掘.  相似文献   

10.
The amount of information contained in databases available on the Web has grown explosively in the last years. This information, known as the Deep Web, is heterogeneous and dynamically generated by querying these back-end (relational) databases through Web Query Interfaces (WQIs) that are a special type of HTML forms. The problem of accessing to the information of Deep Web is a great challenge because the information existing usually is not indexed by general-purpose search engines. Therefore, it is necessary to create efficient mechanisms to access, extract and integrate information contained in the Deep Web. Since WQIs are the only means to access to the Deep Web, the automatic identification of WQIs plays an important role. It facilitates traditional search engines to increase the coverage and the access to interesting information not available on the indexable Web. The accurate identification of Deep Web data sources are key issues in the information retrieval process. In this paper we propose a new strategy for automatic discovery of WQIs. This novel proposal makes an adequate selection of HTML elements extracted from HTML forms, which are used in a set of heuristic rules that help to identify WQIs. The proposed strategy uses machine learning algorithms for classification of searchable (WQIs) and non-searchable (non-WQI) HTML forms using a prototypes selection algorithm that allows to remove irrelevant or redundant data in the training set. The internal content of Web Query Interfaces was analyzed with the objective of identifying only those HTML elements that are frequently appearing provide relevant information for the WQIs identification. For testing, we use three groups of datasets, two available at the UIUC repository and a new dataset that we created using a generic crawler supported by human experts that includes advanced and simple query interfaces. The experimental results show that the proposed strategy outperforms others previously reported works.  相似文献   

11.
智能搜索引擎是解决当前网络信息检索中存在诸多瓶颈问题的有效途径。智能搜索引擎需要获取、预处理、表示和集成不同层次的(如HTML/XML/RDF/OWL文档)的数据和信息,并最终转换成各领域的智能语义信息。领域本体是实行智能的关键。提出了一种实现从Web文档中(半)自动构建本体的学习系统框架,并讨论本体学习中概念的获取、相互关系的获取等关键问题。  相似文献   

12.
13.
一种基于图模型的Web数据库采样方法   总被引:5,自引:0,他引:5  
刘伟  孟小峰  凌妍妍 《软件学报》2008,19(2):179-193
Web数据库中,海量的信息隐藏在具有特定查询能力的查询接口后面,使人无法了解一个Web数据库内容的特征,比如主题的分布、更新的频率等,这就为DeepWeb数据集成带来了巨大的挑战.为了解决这个问题,提出了一种基于图模型的Web数据库采样方法,可以通过查询接口从Web数据库中以增量的方式获取近似随机的样本,即每次查询获取一定数量的样本记录,并且利用已经保存在本地的样本记录生成下一次的查询.该方法的一个重要特点是不受查询接口中属性表现形式的局限,因此是一种一般的Web数据库采样方法.在本地的模拟实验和真实Web数据库上的大量实验表明,该方法可以在较小代价下获得高质量的样本.  相似文献   

14.
The Web as a global information space is developing from a Web of documents to a Web of data. This development opens new ways for addressing complex information needs. Search is no longer limited to matching keywords against documents, but instead complex information needs can be expressed in a structured way, with precise answers as results. In this paper, we present Hermes, an infrastructure for data Web search that addresses a number of challenges involved in realizing search on the data Web. To provide an end-user oriented interface, we support expressive user information needs by translating keywords into structured queries. We integrate heterogeneous Web data sources with automatically computed mappings. Schema-level mappings are exploited in constructing structured queries against the integrated schema. These structured queries are decomposed into queries against the local Web data sources, which are then processed in a distributed way. Finally, heterogeneous result sets are combined using an algorithm called map join, making use of data-level mappings. In evaluation experiments with real life data sets from the data Web, we show the practicability and scalability of the Hermes infrastructure.  相似文献   

15.
Many types of information are geographically referenced and interactive maps provide a natural user interface to such data. However, map presentation in geographical information systems and on the Web is closed related to traditional cartography and provides a very limited interactive experience. In this paper, we present MAPBOT, an interactive Web based map information retrieval system in which Web users can easily and efficiently search geographical information with the assistance of a user interface agent (UIA). Each kind of map feature such as a building or a motorway works as an agent called a Maplet. Each Maplet has a user interface level to assist the user to find information of interest and a graphic display level that controls the presence and the appearance of the feature on the map. The semantic relationships of Maplets are defined in an Ontology Repository provided by the system which is used by the UIA to assist a user to semantically and efficiently search map information interested. An Ontology Editor with a graphic user interface has been implemented to update the Ontology Repository. Visualization on the client is based on Scalable Vector Graphics which provides a high quality Web map.  相似文献   

16.
Databases developed independently in a common open distributed environment may be heterogeneous with respect to both data schema and the embedded semantics. Managing schema and semantic heterogeneities brings considerable challenges to learning from distributed data and to support applications involving cooperation between different organisations. In this paper, we are concerned mainly with heterogeneous databases that hold aggregates on a set of attributes, which are often the result of materialised views of native large-scale distributed databases. A model-based clustering algorithm is proposed to construct a mixture model where each component corresponds to a cluster which is used to capture the contextual heterogeneity among databases from different populations. Schema heterogeneity, which can be recast as incomplete information, is handled within the clustering process using Expectation-Maximisation estimation and integration is carried out within a clustering iteration. Our proposed algorithm resolves the schema heterogeneity as part of the clustering process, thus avoiding transformation of the data into a unified schema. Results of algorithm evaluation on classification, scalability and reliability, using both real and synthetic data, demonstrate that our algorithm can achieve good performance by incorporating all of the information from available heterogeneous data. Our clustering approach has great potential for scalable knowledge discovery from semantically heterogeneous databases and for applications in an open distributed environment, such as the Semantic Web.  相似文献   

17.
冯永  张洋 《计算机应用》2012,32(6):1688-1691
查询接口模式匹配是Deep Web信息集成中的关键部分,双重相关性挖掘方法(DCM)能有效利用关联挖掘方法解决复杂接口模式匹配问题。针对DCM方法在匹配效率、匹配准确性方面的不足,提出了一种基于匹配度和语义相似度的新模式匹配方法。该方法首先使用矩阵存储属性间的关联关系,然后采用匹配度计算属性间的相关度,最后利用语义相似度计算候选匹配的相似性。通过在美国伊利诺斯大学的BAMM数据集上进行实验,所提方法与DCM及其改进方法比较有更高的匹配效率和准确性,表明该方法能更好地处理接口之间模式匹配问题。  相似文献   

18.
使用分类器自动发现特定领域的深度网入口   总被引:4,自引:0,他引:4  
王辉  刘艳威  左万利 《软件学报》2008,19(2):246-256
在深度网研究领域,通用搜索引擎(比如Google和Yahoo)具有许多不足之处:它们各自所能覆盖的数据量与整个深度网数据总量的比值小于1/3;与表层网中的情况不同,几个搜索引擎相结合所能覆盖的数据量基本没有发生变化.许多深度网站点能够提供大量高质量的信息,并且,深度网正在逐渐成为一个最重要的信息资源.提出了一个三分类器的框架,用于自动识别特定领域的深度网入口.查询接口得到以后,可以将它们进行集成,然后将一个统一的接口提交给用户以方便他们查询信息.通过8组大规模的实验,验证了所提出的方法可以准确高效地发现特定领域的深度网入口.  相似文献   

19.
SEEKER:基于关键词的关系数据库信息检索   总被引:20,自引:3,他引:20  
文继军  王珊 《软件学报》2005,16(7):1270-1281
传统上,SQL是存取关系数据库中数据的主要界面.但是,对于没有经验的用户来说,学习复杂的SQL语法是一件困难的事情.实现基于关键词的关系数据库信息检索,将使用户不需要任何SQL语言和底层数据库模式的知识,用搜索引擎的方式来获取数据库中的相关数据.描述了一个基于关键词的关系数据库信息检索系统SEEKER的设计和实现.现有的关系数据库关键词查询系统只能检索关系数据库中的文本属性,而SEEKER还可以检索数据库元数据以及数字属性.并且,SEEKER采用了更合理的排序公式,支持Top-k查询.实验结果显示,SEEKER具有良好的查询性能.  相似文献   

20.
A central task in the development of context-aware applications is the modeling and management of complex context information. In this paper, we present the NexusEditor, which can ease this task by providing a graphical user interface to design schemas for spatial and technical context models, interactively create queries, send them to a server and visualize the results. One main contribution is to show how schema awareness can improve such a tool: The NexusEditor dynamically parses the underlying data model and provides additional syntactic checks, semantic checks, and short-cuts based on the schema information. Furthermore, the tool helps to design new schema definitions based on the existing ones, which is crucial for an iterative and user-centric development of context-aware applications. Finally, it provides interfaces to existing information spaces and visualization tools for spatial data like GoogleEarth.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号