1.

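An OEM-Based Schema Description Method for XML Semistructured Data (total citations: 3; self-citations: 1; cited by others: 3)

聂培尧, 李战怀, 胡正国 《计算机工程与设计》2003,24(1):9-12,29

Types and schemas for semistructured data are key to processing such data efficiently. The paper first discusses the characteristics of semistructured data and of its schemas, then studies XML-based forms of schema description, and proposes a definition and formal description method for XML DTD schemas based on OEM.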

2.

Extracting local schema from semistructured data based on graph-oriented semantic model

王腾蛟, 唐世渭 et al. 《计算机科学技术学报》2001,16(6):0-0

Many modern applications (e-commerce, digital libraries, etc.) require integrated access to various information sources (from traditional RDBMSs to semistructured Web repositories). Extracting schema from semistructured data is a prerequisite to integrating heterogeneous information sources. The traditional method, which extracts a global schema, may require time (and space) that increases exponentially with the number of objects and edges in the source. This paper presents a new method that extracts a local schema. The algorithm keeps the extracted schema within the "schema diameter" by examining the semantic distance of the target set and using the Hash class and its path distance operation. The method is very efficient at restraining schema expansion, and a prototype validates the approach.

3.

Extracting Web Data Using Instance-Based Learning (total citations: 1; self-citations: 0; cited by others: 1)

Yanhong Zhai, Bing Liu 《World Wide Web》2007,10(2):113-132

This paper studies structured data extraction from Web pages. Existing approaches to data extraction include wrapper induction and automated methods. In this paper, we propose an instance-based learning method, which performs extraction by comparing each new instance to be extracted with labeled instances. The key advantage of our method is that it does not require an initial set of labeled pages to learn extraction rules as in wrapper induction. Instead, the algorithm is able to start extraction from a single labeled instance. Only when a new instance cannot be extracted does it need labeling. This avoids unnecessary page labeling, which solves a major problem with inductive learning (or wrapper induction), i.e., the set of labeled instances may not be representative of all other instances. The instance-based approach is very natural because structured data on the Web usually follow some fixed templates. Pages of the same template usually can be extracted based on a single page instance of the template. A novel technique is proposed to match a new instance with a manually labeled instance and in the process to extract the required data items from the new instance. The technique is also very efficient. Experimental results based on 1,200 pages from 24 diverse Web sites demonstrate the effectiveness of the method. It also significantly outperforms existing state-of-the-art systems.

4.

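A Schema-Based XML Storage Model (total citations: 4; self-citations: 0; cited by others: 4)

盛铁强, 仇建伟, 高天仕 《计算机工程与应用》2004,40(20):184-187

XML is based on a semistructured data model, and semistructured data is difficult to store and manage uniformly. The paper proposes SBSM, a schema-based XML storage model, and defines the operations on the model. The model overcomes the limitations of object-relational mapping models and supports queries directly on the model.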

5.

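A Local Schema Extraction Method for Massive Information in a Network Environment

王腾蛟, 唐世渭, 杨冬青, 刘云峰 《软件学报》2001,12(11):1639-1646

Schema extraction for massive information is a difficult problem in research on integrating massive information in a network environment. The paper gives a new method for extracting an exact local schema and maintaining it incrementally. By probing the path distance of the target set and using the Hash class and its path distance operation, it keeps the generated schema within the "schema diameter", thereby effectively restraining schema expansion.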

6.

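Applying Conditional Sequential Pattern Analysis in Web Usage Mining

佘东晓, 陈传波 《计算机工程与科学》2003,25(5):23-26

Web usage mining analyzes the usage data recorded on Web servers to automatically discover how users access the Web. Its results can be used to improve website design, support business decisions, and provide personalized services. Sequential pattern analysis is one of the pattern analysis methods used in data mining. This paper introduces the application of a sequential pattern analysis that accommodates complex constraints to Web usage mining, along with its general steps.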

7.

Web Data Extraction from Query Result Pages Based on Visual and Content Features

Daiyue Weng, Jun Hong, David A. Bell 《International Journal of Software and Informatics》2012,6(3):453-472

A rapidly increasing number of Web databases are now accessible via their HTML form-based query interfaces. Query result pages are dynamically generated in response to user queries; they encode structured data and are displayed for human use. Query result pages usually contain other types of information in addition to query results, e.g., advertisements and navigation bars. The problem of extracting structured data from query result pages is critical for web data integration applications, such as comparison shopping and meta-search engines, and has been intensively studied. A number of approaches have been proposed. As the structures of Web pages become more and more complex, the existing approaches start to fail, and most of them do not remove irrelevant contents which may affect the accuracy of data record extraction. We propose an automated approach for Web data extraction. First, it makes use of visual features and query terms to identify data sections and extracts data records in these sections. We also represent several content and visual features of visual blocks in a data section, and use them to filter out noisy blocks. Second, it measures similarity between data items in different data records based on their visual and content features, and aligns them into different groups so that the data in the same group have the same semantics. The results of our experiments with a large set of Web query result pages in different domains show that our proposed approaches are highly effective.

8.

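Views for Web Users (total citations: 13; self-citations: 2; cited by others: 13)

阳小华, 周龙骧 《软件学报》1999,10(7):690-693

Views are not only an important concept in databases; they can also play an important role in Web systems. However, Web views cannot simply copy the concept of database views; they should reflect the characteristics of the Web. The paper proposes the concept of a browsing region, which characterizes Web user activity well. Based on this concept, it defines a user view that reflects the characteristics of the Web, and gives a preliminary discussion of how Web user views can be implemented and of some possible applications.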

9.

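Schema Matching and Interface Integration for the Chinese Deep Web

张晶星 《计算机系统应用》2012,21(12):203-205,185

Research on the deep Web, in China and abroad, has focused almost entirely on English-language environments, and there has been no research targeting the Chinese deep Web. The paper proposes a method for schema matching and interface integration on the Chinese deep Web. The method first builds a dictionary that stores synonyms, hypernyms, and hyponyms, then uses a rule-based word segmentation algorithm to split the attributes extracted from the interfaces into words. For each attribute, it looks up all corresponding synonyms, hypernyms, and hyponyms in the dictionary, generates a corresponding record, and stores it in a list. It then selects the most frequently occurring attribute in each record as an attribute of the unified interface.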
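The final selection step in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `SYNONYMS` table, the function name `unified_attributes`, and the sample interfaces are all hypothetical, and word segmentation, hypernyms, and hyponyms are left out.

```python
from collections import Counter

# Hypothetical synonym dictionary: each attribute name maps to a group key.
# 作者 / 著者 both mean "author"; 书名 / 题名 both mean "title".
SYNONYMS = {
    "作者": "author", "著者": "author",
    "书名": "title", "题名": "title",
}

def unified_attributes(interfaces):
    """Group attributes by synonym set and keep the most frequent name per group."""
    groups = {}
    for attrs in interfaces:
        for attr in attrs:
            key = SYNONYMS.get(attr, attr)  # unknown attributes form their own group
            groups.setdefault(key, Counter())[attr] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in groups.items()}

# Three hypothetical query interfaces, each given as a list of attribute labels.
interfaces = [["作者", "书名"], ["著者", "书名"], ["作者", "题名"]]
result = unified_attributes(interfaces)
```

Here each group plays the role of one "record" in the abstract, and the label that occurs most often across interfaces becomes the attribute of the unified interface.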

10.

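Design and Implementation of a Topic-Oriented Web Information Collection System (total citations: 7; self-citations: 0; cited by others: 7)

潘春华, 武港山 《小型微型计算机系统》2003,24(12):2150-2154

As information on the Internet continues to grow explosively, the coverage and retrieval precision of general-purpose search engines keep declining, and developing specialized Web information retrieval tools for topic-specific information has become a trend. The topic-oriented Web information collection system proposed in the paper is the core component of such tools. The system uses the document vector model to compute document relevance and filters pages using the context of page links; it adapts a modified Shark heuristic search algorithm to find relevant pages; it can download in parallel on multiple machines to improve collection efficiency; and it updates dynamically according to the importance of each site. The system was implemented in an Internet-facing search engine for computer teaching resources. The whole system runs on low-end desktop machines and achieves high collection precision and efficiency for pages on the specified topic.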
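Relevance scoring with a document vector model is commonly done with cosine similarity over term-frequency vectors. The sketch below assumes that choice; the abstract does not say which weighting or similarity measure the system uses, and `cosine_relevance` and the sample terms are hypothetical.

```python
import math
from collections import Counter

def cosine_relevance(doc_terms, topic_terms):
    """Cosine similarity between raw term-frequency vectors of a page and a topic."""
    d, t = Counter(doc_terms), Counter(topic_terms)
    dot = sum(d[w] * t[w] for w in d.keys() & t.keys())
    norm = math.sqrt(sum(v * v for v in d.values())) * math.sqrt(sum(v * v for v in t.values()))
    return dot / norm if norm else 0.0  # empty vectors are treated as irrelevant

topic = ["compiler", "parser", "grammar"]
on_topic = cosine_relevance(["parser", "grammar", "lexer"], topic)
off_topic = cosine_relevance(["recipe", "pasta"], topic)
```

A crawler of this kind would keep pages whose score exceeds a threshold; the threshold value is a tuning choice that the abstract does not specify.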

11.

Web search engine: Characteristics of user behaviors and their implication (total citations: 5; self-citations: 0; cited by others: 5)

王建勇, 单松巍, 雷鸣, 谢正茂, 李晓明 《中国科学F辑(英文版)》2001,44(5):351-365

In this paper, the distribution characteristics of user behaviors are first studied, based on log data from a massive web search engine. Analysis shows that the stochastic distribution of user queries follows a power law and exhibits strong similarity, and that users' queries and clicked URLs show dramatic locality, which implies that a query cache and a 'hot click' cache can be employed to improve system performance. Three typical cache replacement policies are then compared: LRU, FIFO, and LFU with attenuation. In addition, the distribution characteristics of web information are analyzed, which demonstrates that the link popularity and replica popularity of a URL have a positive influence on its importance. Finally, the variance between link popularity and user popularity, and between replica popularity and user popularity, is analyzed, which gives some important insight that helps improve the ranking algorithms in a search engine.
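To make the cache comparison concrete, here is a small Python sketch that replays a query stream through an LRU cache and a FIFO cache and reports hit rates. The stream, the capacity, and the function name `hit_rate` are illustrative assumptions, not data or code from the paper, and LFU with attenuation is omitted.

```python
from collections import OrderedDict

def hit_rate(queries, capacity, policy):
    """Replay a query stream through a fixed-size cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for q in queries:
        if q in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(q)  # a hit refreshes recency only under LRU
        else:
            if len(cache) >= capacity:
                # Front of the dict is the oldest insert (FIFO) or least recent use (LRU).
                cache.popitem(last=False)
            cache[q] = True
    return hits / len(queries)

# Toy stream with one "hot" query, mimicking the locality the abstract describes.
stream = ["a", "b", "a", "c", "a", "d", "a", "b"]
lru = hit_rate(stream, 2, "LRU")    # 0.375
fifo = hit_rate(stream, 2, "FIFO")  # 0.25
```

On this stream LRU keeps the hot query resident while FIFO evicts it once, which is the kind of difference that locality in real query logs would amplify.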

12.

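Pay-as-You-Go Schema Integration for Web Dataspaces (total citations: 1; self-citations: 0; cited by others: 1)

刘正涛, 王建东 《计算机科学与探索》2011,5(1):87-96

Schema integration for Web dataspaces is implemented with a pay-as-you-go approach. A framework for Web dataspace schema integration is proposed, and a mediated schema is created using a combination method; in addition, the top-k source schemas are provided to users. Experiments show that, through user participation, the method can improve the precision and recall of queries, and that providing the top-k source schemas clearly improves query effectiveness.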

13.

Constraint Preserving Transformation from Relational Schema to XML Schema

Chengfei Liu, Dr. Millist W. Vincent, Jixue Liu 《World Wide Web》2006,9(1):93-110

XML has become the standard for publishing and exchanging data on the Web. However, most business data is managed, and will continue to be managed, by relational database management systems. As such, there is an increasing need to efficiently and accurately publish relational data as XML documents for Internet-based applications. One way to publish relational data is to provide virtual XML documents for relational data via an XML schema which is transformed from the underlying relational database schema, such that users can access the relational database through the XML schema. In this paper, we discuss issues in transforming a relational database schema into the corresponding XML schema. We aim to preserve all integrity constraints defined in a relational database schema, to achieve a high level of nesting, and to avoid introducing data redundancy in the transformed XML schema. We first propose a basic transformation algorithm which introduces no data redundancy, then we improve the algorithm by exploring further nesting of the transformed XML schema.

14.

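A Schema Matching Algorithm Based on Source Schema Splitting

张凌宇, 刘国华, 褚兵义, 王聪, 麻会东, 苑迎 《计算机研究与发展》2008,45(Z1):196-201

Schema matching produces a mapping between semantically corresponding elements of the input schemas. To improve the efficiency of schema matching, the paper proposes a new schema matching method, the source-schema-splitting schema matching algorithm. It can solve problems that standard schema matching struggles with: (1) establishing matches between one attribute of the source schema and multiple attributes of multiple target schemas; (2) matching different tuples in one table to different attribute values of the same tuple in another table. During matching, the method first searches for categorical attributes, then builds selection conditions from them, and finally splits the source schema into views and regenerates the candidate match set, thereby improving the quality of schema matching.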

15.

Incremental mining of the schema of semistructured data

ZHOU Aoying, JIN Wen, ZHOU Shuigeng, QIAN Weining, TIAN Zenping 《计算机科学技术学报》2000,15(3):241-248

Semistructured data are specified without any fixed and rigid schema, even though some implicit structure typically appears in the data. The huge number of on-line applications makes it important and imperative to mine the schema of semistructured data, both for users (e.g., to gather useful information and facilitate querying) and for systems (e.g., to optimize access). The critical problem is to discover the hidden structure in the semistructured data. Current methods for extracting Web data structure are either general and independent of application background, or bound to a concrete environment such as HTML or XML. Both face high cost and difficulty in keeping up with the frequent and complicated changes of Web data. This paper discusses the problem of incrementally mining the schema of semistructured data after the raw data is updated. An algorithm for incrementally mining the schema of semistructured data is provided, along with some experimental results, which show that incremental mining for semistructured data is more efficient than non-incremental mining.

16.

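A Survey of Web Data Management Research (total citations: 53; self-citations: 1; cited by others: 53)

孟小峰 《计算机研究与发展》2001,38(4):385-395

This paper surveys Web data management technology, gives a definition of Web data management research, and discusses several important problems in Web data management. On this basis, it proposes a framework for an XML-based Web data management system and identifies open problems for research.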

17.

Research on Feature Vector Extraction Methods for Phishing Web Pages

司响, 李秋锐, 宋士超 《信息网络安全》2011,(9):201-203

18.

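Research on a Cost-Based Strategy for Mapping XML Schema to Relational Schema

孙媛媛, 柴瑞敏, 李昊洋 《计算机工程与科学》2009,31(12)

As an international standard for data exchange, XML is widely used in many fields, and how to convert accurately between XML and relational databases has become an important research topic. This paper studies a method for mapping XML Schema to a relational schema. By analyzing the structure and syntax of XML Schema, it extracts the semantic information of elements, identifies the elements that are truly of complex type, and converts the XML Schema into an E_Schema, a representation that is simple yet complete. It then processes the E_Schema using a query-cost-based approach to obtain an optimal schema, which is converted into a relational schema that preserves the original hierarchy, yielding a simpler and more complete mapping scheme.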

19.

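An Algorithm for Mapping XML Schema to Relational Schema (total citations: 4; self-citations: 0; cited by others: 4)

康晓兵, 张二虎, 吴学毅 《计算机应用》2004,24(5):106-108

How XML document data is stored in mainstream relational databases is crucial to enterprise information integration. To address this, the paper proposes XSD2RS, a constraint-preserving algorithm for mapping XML Schema to relational schema. The algorithm uses schema object component modeling and a constraint-preservation mechanism to map XML Schema to relational schema, and thereby stores XML document data in a relational database.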

20.

Extracting Schema from an OEM Database (total citations: 1; self-citations: 0; cited by others: 1)

Shen Yidong 《计算机科学技术学报》1998,13(4):289-299

While the schema-less feature of the OEM (Object Exchange Model) gives flexibility in representing semi-structured data, it brings difficulty in formulating database queries. Extracting schema from an OEM database thus becomes an important research topic. This paper presents a new approach to this topic with the following features. (1) In addition to representing the nested label structure of an OEM database, the proposed OEM schema keeps up-to-date information about instance objects of the database. The object-level information is useful in speeding up query evaluation. (2) The OEM schema is explicitly represented as a label-set, which is easy to construct and update. (3) The OEM schema of a database is statically built and dynamically updated. The time complexity of building the OEM schema is linear in the size of the OEM database. (4) The approach is applicable to a wide range of areas where the underlying schema is much smaller than the database itself (e.g., data warehouses that are made from a set of heterogeneous databases).