首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
XML结构聚类     
郝晓丽  冯志勇 《计算机应用》2005,25(6):1398-1400
针对当前XML文档结构聚类算法的一些不足,提出采用段匹配的概念来计算两棵XML文档树中的路径相似性,并在此基础上得出两棵树整体的相似度量。在整个聚类过程中,算法还把一组相关文档与一个XML聚类代表相关联,该聚类代表就包含了一个文档集合中所有文档的最相关的特征。为了构建聚类代表,算法通过构造最佳匹配树,合并树,修剪树三步来实现。通过比较聚类代表,发现新的聚类时更新聚类代表来完成文档聚类。实验结果就充分展现了算法的有效性。  相似文献   

2.
Author-X is a Java-based system that addresses the security issues of access control and policy design for XML document administration. Author-X supports the specification of policies at varying granularity levels and the specification of user credentials as a way to enforce access control. Access control is available according to both push and pull document distribution policies, and document updates are distributed through a combination of hash functions and digital signature techniques. The Author-X approach to distributed updates allows a user to verify a document's integrity without contacting the document server  相似文献   

3.
Access control techniques for XML provide a simple way to protect confidential information at the same granularity level provided by XML schemas. In this article, we describe our approach to these problems and the design guidelines that led to our current implementation of an access control system for XML information  相似文献   

4.
Clustering XML documents is extensively used to organize large collections of XML documents in groups that are coherent according to structure and/or content features. The growing availability of distributed XML sources and the variety of high-demand environments raise the need for clustering approaches that can exploit distributed processing techniques. Nevertheless, existing methods for clustering XML documents are designed to work in a centralized way. In this paper, we address the problem of clustering XML documents in a collaborative distributed framework. XML documents are first decomposed based on semantically cohesive subtrees, then modeled as transactional data that embed both XML structure and content information. The proposed clustering framework employs a centroid-based partitional clustering method that has been developed for a peer-to-peer network. Each peer in the network is allowed to compute a local clustering solution over its own data, and to exchange its cluster representatives with other peers. The exchanged representatives are used to compute representatives for the global clustering solution in a collaborative way. We evaluated effectiveness and efficiency of our approach on real XML document collections varying the number of peers. Results have shown that major advantages with respect to the corresponding centralized clustering setting are obtained in terms of runtime behavior, although clustering solutions can still be accurate with a moderately low number of nodes in the network. Moreover, the collaborativeness characteristic of our approach has revealed to be a convenient feature in distributed clustering as found in a comparative evaluation with a distributed non-collaborative clustering method.  相似文献   

5.
As XML files are less redundant and readily reorganized, it is really difficult to design a XML watermarking scheme which can get a trade-off between robust and invisible. However, this trade-off can be achieved by the Zero-Watermark method. In this paper, two Zero-Watermark methods are designed and tested for XML documents. One is XSLT-related method which is designed with embedding extra codes in XSLT file to serve as sort of copyright function. Another uses the functional dependency of XML file as feature for Zero-Watermark. Experiment results show that both methods have good real-time performances. Experiment results show that Zero-Watermark algorithm with functional dependency can resist selection attacks, alteration attacks, reorganization attacks and compression attacks.  相似文献   

6.
XML关系保护模型   总被引:1,自引:1,他引:0  
为了更好的保护节点关系所携带的信息以及XML的敏感信息.提出了利用"块"来组织的访问控制策略,并使用虚节点来指定与关系相关的机密限制.对具有块概念的XML-BB访问控制模型进行了研究,该模型能够隐藏不同块间节点的关系,以一种细粒度访问控制加强了XML敏感信息的保护.实验结果表明了该模型的可行性.  相似文献   

7.
The distributed nature of the Web, as a decentralized system exchanging information between heterogeneous sources, has underlined the need to manage interoperability, i.e., the ability to automatically interpret information in Web documents exchanged between different sources, necessary for efficient information management and search applications. In this context, XML was introduced as a data representation standard that simplifies the tasks of interoperation and integration among heterogeneous data sources, allowing to represent data in (semi-) structured documents consisting of hierarchically nested elements and atomic attributes. However, while XML was shown most effective in exchanging data, i.e., in syntactic interoperability, it has been proven limited when it comes to handling semantics, i.e.,  semantic interoperability, since it only specifies the syntactic and structural properties of the data without any further semantic meaning. As a result, XML semantic-aware processing has become a motivating challenge in Web data management, requiring dedicated semantic analysis and disambiguation methods to assign well-defined meaning to XML elements and attributes. In this context, most existing approaches: (i) ignore the problem of identifying ambiguous XML elements/nodes, (ii) only partially consider their structural relationships/context, (iii) use syntactic information in processing XML data regardless of the semantics involved, and (iv) are static in adopting fixed disambiguation constraints thus limiting user involvement. In this paper, we provide a new XML Semantic Disambiguation Framework titled XSDFdesigned to address each of the above limitations, taking as input: an XML document, and then producing as output a semantically augmented XML tree made of unambiguous semantic concepts extracted from a reference machine-readable semantic network. XSDF consists of four main modules for: (i) linguistic pre-processing of simple/compound XML node labels and values, (ii) selecting ambiguous XML nodes as targets for disambiguation, (iii) representing target nodes as special sphere neighborhood vectors including all XML structural relationships within a (user-chosen) range, and (iv) running context vectors through a hybrid disambiguation process, combining two approaches: concept-basedand context-based disambiguation, allowing the user to tune disambiguation parameters following her needs. Conducted experiments demonstrate the effectiveness and efficiency of our approach in comparison with alternative methods. We also discuss some practical applications of our method, ranging over semantic-aware query rewriting, semantic document clustering and classification, Mobile and Web services search and discovery, as well as blog analysis and event detection in social networks and tweets.  相似文献   

8.
适合于关系数据库存储的XML文档分解   总被引:5,自引:0,他引:5  
作为未来主宰Web种结构化表达应用的XML语言正得到越来越广泛的应用。但如何分解XML文档并将其数据信息存入主流的关系数据库中,却是阻止XML应用展开的一个首要问题,在考察国内外各个方面对XML文档分解的基础上,提出了适合于关系数据库存储分解内容信息7个数据类的方法。  相似文献   

9.
对XML文档查询的常用方法有两种:一种是使用查询语言;另一种是使用关键字,而使用关键字查询XML文档比使用查询语言更为简单方便。给出了一种使用关键字查询XML文档的索引查找算法。该算法只需要扫描一次关键字对应的编码列,就可以找到需要的编码,提高了查询效率。实验表明该算法是可行的和有效的。  相似文献   

10.
Using structural similarity for clustering XML documents   总被引:2,自引:2,他引:0  
In this paper, we describe a method for clustering XML documents. Its goal is to group documents sharing similar structures. Our approach is two-step. We first automatically extract the structure from each XML document to be classified. This extracted structure is then used as a representation model to classify the corresponding XML document. The idea behind the clustering is that if XML documents share similar structures, they are more likely to correspond to the structural part of the same query. Finally, for the experimentation purpose, we tested our algorithms on both real (ACM SIGMOD Record corpus) and synthetic data. The results clearly demonstrate the interest of our approach.  相似文献   

11.
Efficient extraction of schemas for XML documents   总被引:3,自引:0,他引:3  
In this paper, we present a technique for efficient extraction of concise and accurate schemas for XML documents. By restricting the schema form and applying some heuristic rules, we achieve the efficiency and conciseness. The result of an experiment with real-life DTDs shows that our approach attains high accuracy and is 20 to 200 times faster than existing approaches.  相似文献   

12.
XML is rapidly emerging as a standard for exchanging business data on the World Wide Web. For the foreseeable future, however, most business data will continue to be stored in relational database systems. Consequently, if XML is to fulfill its potential, some mechanism is needed to publish relational data as XML documents. Towards that goal, one of the major challenges is finding a way to efficiently structure and tag data from one or more tables as a hierarchical XML document. Different alternatives are possible depending on when this processing takes place and how much of it is done inside the relational engine. In this paper, we characterize and study the performance of these alternatives. Among other things, we explore the use of new scalar and aggregate functions in SQL for constructing complex XML documents directly in the relational engine. We also explore different execution plans for generating the content of an XML document. The results of an experimental study show that constructing XML documents inside the relational engine can have a significant performance benefit. Our results also show the superiority of having the relational engine use what we call an “outer union plan” to generate the content of an XML document. Received: 15 October 2000 / Accepted: 15 April 2001 Published online: 28 June 2001  相似文献   

13.
Active XML(简记为AXML)文档在XML文档中引入嵌入式Web服务,通过调用这些服务,来获取相应的内涵信息,为AXML文档物化过程。研究了AXML文档物化的终止性检验问题,提出了多项式时间的检验算法,该算法通过构造AXML模式依赖图,检验其无环性来判定AXML文档物化终止性问题,证明了算法的正确性和有效性。  相似文献   

14.
This paper presents a schema matching method for the transformation of XML documents. The proposed method consists of two steps: computing preliminary matching relationships between leaf nodes in the two XML schemas based on proposed ontology and leaf node similarity, and extracting final matchings based on a proposed path similarity. Particularly, for a sophisticated schema matching, the proposed ontology is incrementally updated by users' feedback. Furthermore, since the ontology can describe various relationships between concepts, the proposed method can compute complex matchings as well as simple matchings. Experimental results with schemas used in various domains show that the proposed method performs better than previous methodologies, resulting in a precision of 97% and a recall of 83% on the average.  相似文献   

15.
Selective and authentic third-party distribution of XML documents   总被引:4,自引:0,他引:4  
Third-party architectures for data publishing over the Internet today are receiving growing attention, due to their scalability properties and to the ability of efficiently managing large number of subjects and great amount of data. In a third-party architecture, there is a distinction between the Owner and the Publisher of information. The Owner is the producer of information, whereas Publishers are responsible for managing (a portion of) the Owner information and for answering subject queries. A relevant issue in this architecture is how the Owner can ensure a secure and selective publishing of its data, even if the data are managed by a third-party, which can prune some of the nodes of the original document on the basis of subject queries and access control policies. An approach can be that of requiring the Publisher to be trusted with regard to the considered security properties. However, the serious drawback of this solution is that large Web-based systems cannot be easily verified to be secure and can be easily penetrated. For these reasons, we propose an alternative approach, based on the use of digital signature techniques, which does not require the Publisher to be trusted. The security properties we consider are authenticity and completeness of a query response, where completeness is intended with regard to the access control policies stated by the information Owner. In particular, we show that, by embedding in the query response one digital signature generated by the Owner and some hash values, a subject is able to locally verify the authenticity of a query response. Moreover, we present an approach that, for a wide range of queries, allows a subject to verify the completeness of query results.  相似文献   

16.
XML文档聚类在众多数据应用领域都具有重要作用。基于特征偏好的XML文档聚类算法是对XML文档进行特征选择,将XML文档描述为[n]维特征向量,再结合CFP(Clustering with Feature order Preference)算法,根据特征偏好为其赋予权重,每次迭代聚类过程中进行权重的更新。实验结果表明当CFP算法中的特征偏好权重和XML文档向量化时所用的层次权重设定相结合时,可弥补XML文档向量化时的弊端,提高了XML文档聚类的精度。  相似文献   

17.
18.
The revolution of XML is recognized as the trend of technology on the Internet to researchers as well as practitioners. Companies need to adopt XML technology. With investment in the current relational database systems, they want to develop new XML documents while running existing relational databases on production. They need to reengineer the relational databases into XML documents with constraints preservation. In the process, schema translation must be done before data conversion. Since the existing relational databases are usually normalized, they have to be reconstructed into XML document tree structures. This can be accomplished through denormalization by joining the normalized relations into tables according to their data dependencies constraints. The joined tables are mapped into DOMs, which are then integrated into XML document trees. The user specifies an XML document root with its relevant nodes to form a partitioned XML document tree to meet their requirements. The selected XML document tree is mapped into an XML schema in the form of DTD. We then load joined tables into DOMs, integrate them into a DOM, and transform it into an XML document.  相似文献   

19.
The processing and management of XML data are popular research issues. However, operations based on the structure of XML data have not received strong attention. These operations involve, among others, the grouping of structurally similar XML documents. Such grouping results from the application of clustering methods with distances that estimate the similarity between tree structures. This paper presents a framework for clustering XML documents by structure. Modeling the XML documents as rooted ordered labeled trees, we study the usage of structural distance metrics in hierarchical clustering algorithms to detect groups of structurally similar XML documents. We suggest the usage of structural summaries for trees to improve the performance of the distance calculation and at the same time to maintain or even improve its quality. Our approach is tested using a prototype testbed.  相似文献   

20.
Active XML (AXML) documents combine extensional XML data with intentional data defined through Web service calls. The dynamic properties of these documents pose challenges to both storage and data materialization techniques. In this paper, we present ARAXA, a non-intrusive approach to store and manage AXML documents. We also define a methodology to materialize AXML documents at query time. The storage approach of ARAXA is based on plain relational tables and user-defined functions of Object-Relational DBMS to trigger the service calls. By using a DBMS we benefit from efficient storage tools and query optimization. Approaches without DBMS support have to process XML in main memory or provide for virtual memory solutions. One of the main advantages of ARAXA is that AXML documents do not need to be loaded into main memory at query processing time. This is crucial when dealing with large documents. The experimental results with ARAXA prototype show that our approach is scalable and capable of dealing with large AXML documents.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号