首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 233 毫秒
1.
不一致关系数据库上的初始信任标记算法   总被引:1,自引:0,他引:1  
不一致数据无法正确反映现实世界客观事物的真实状态,导致其上的查询会得到错误的或矛盾的查询结果,降低了数据的利用价值.而现有的很多不一致数据查询处理方面的研究方案都存在信息丢失的问题.Annotation Based Query Answer方案针对这一问题,采用信任标签在属性级别上区分一致和不一致数据,避免了信息丢失.但同时考虑多类约束(函数依赖、健依赖、包含依赖和域约束)且任意分量都不可信时,该研究方案的不一致检测和初始标记算法失效,有一定的应用局限性.针对这一问题,采用启发式近似修复算法,在一个矛盾数据的各类可能修复操作中,通过比较其修复代价,以寻找出错概率更大的分量(或元组),以此纠正明显错误,并确定引起数据不一致的属性.实验结果表明,算法复杂度是候选修复数量的平方阶.  相似文献   

2.
提出不完备决策系统测试代价敏感属性约简问题,给出不一致对象集定义以及求解不一致对象集的算法。根据不一致对象的性质改进属性重要性定义,考虑测试代价因素以及不一致对象个数的改变量给出一个新的属性重要性的定义和属性重要性中权重的设置方法,并给出属性重要性的计算算法。在此基础上,给出一个时间复杂度为O(k|C|2|U|)和空间复杂度为O(|U|)的启发式属性约简算法,并通过理论分析、实例分析和实验分析说明该算法准确性和可行性。  相似文献   

3.
数据依赖是数据库的一个重要概念。函数依赖是一种常见的数据依赖关系,是数据语义的重要组成部分。随着XML文档的大量出现,这一概念被引入到XML的领域中。本文在约束限制范围的基础上,给出了XML函数依赖的定义。引入粗糙集解决XML数据不完整的特点,给出XML函数依赖的判定定理。并且提出了一个发现XML文档中最小非平凡函数依赖的算法。该算法基于一致集的概念,通过不可分辨关系划分元组集减少求一致集的运算次数,使用逐层求精的算法来计算最小非平凡XML函数依赖集的左部。通过该算法得到的XML函数依赖的语义信息对数据存储模式设计、查询优化和更新异常检查来说是十分重要的。  相似文献   

4.
路红  张莉  岳涛 《软件学报》2016,27(4):901-915
在大规模复杂系统产品线工程中,人工配置难免会导致配置的不一致,即,配置数据会违背预定义的约束(也可以称为一致性约束).对于大规模复杂系统产品线体系结构,比如信息物理系统产品线,往往存在成百上千的可变点以及约束,而且约束与可变点之间存在复杂的依赖关系,为不一致配置的修复带来很大的挑战.为了解决这个问题,针对前期提出的基于多目标搜索以及约束求解技术的自动不一致配置修复推荐框架(Zen-Fix),提出一种改进的IBEA算法(De IBEA).De IBEA通过将差分引入IBEA算法,搜索过程中,基于可行解和不可行解的差分变异产生后代,最终为用户推荐符合预定义约束并且对于配置效率来说最优的配置修复方案.基于一个工业案例海底油田采控系统产品线为例,通过模拟一个产品的配置过程,产生了10 189个优化问题,结果表明:Zen-Fix框架结合De IBEA算法,可以实时地为用户提供较优的不一致配置修复方案.此外,通过对这10 189个问题的推荐方案进行对比,证明了De IBEA算法无论从时间效率还是搜索性能上都优于原始的IBEA算法.  相似文献   

5.
将决策粗糙集与代价敏感学习相结合,提出了一种基于决策粗糙集的代价敏感分类方法。依据决策粗糙集理论和属性约简方法,对待预测样本分别计算最优测试属性集,使得样本在最优测试属性集上计算的分类结果具有最小误分类代价和测试代价,依此给出样本的最小总代价分类结果。针对全局最优测试属性集求解过程中计算复杂度高的问题,提出了局部最优测试属性集的启发式搜索算法。该算法以单个属性对降低总分类代价的贡献率为启发函数,搜索各样本的局部最优测试属性集,并输出在局部最优测试属性集上样本的代价敏感分类结果。在UCI数据上的实验分析显示,所提算法有效地降低了分类结果的总代价和测试属性个数,使得样本分类结果同时具有较小的误分类代价和较小的测试代价。  相似文献   

6.
吴爱华  谈子敬  汪卫 《软件学报》2012,23(5):1167-1182
不一致数据无法正确反映现实世界,其上的查询结果内含错误或矛盾,而现有的很多不一致数据查询处理相关研究都存在信息丢失的问题.AQA(annotation based query answer)针对这一问题采用信任标签在属性级别上区分一致和不一致数据,避免了信息丢失.但AQA假设记录在依赖左边属性上的分量可信,且只针对函数依赖一种约束,具有应用局限性.在综合约束(函数依赖、包含依赖和域约束)范围内、不确定属性任意的情况下扩展了AQA,重新审视了AQA的数据模型及其上的查询代数,讨论了任意约束在查询结果上的蕴含约束计算问题.实验结果表明,扩展后的AQA非连接类查询的性能和普通的SQL基夺相同,连接查询经优化后性能接近普通SQL查询,但AQA不丢失信息与部分同类研究相比有很大优势.  相似文献   

7.
文档有效性检验是XML领域的一个基本问题.Active XML(AXML)文档在XML文档中引入Web服务,传统用于解决XML文档有效性的检验方法并不适用于AXML文档,为文档有效性检验提出了新的挑战.研究了AXML文档有效性检验问题,在原始树自动机的基础上,定义了AXML模式树自动机-ASTA机,该树自动机能够有效地描述满足AXML模式约束的文档集合.基于ASTA机,提出了一种多项式时间的AXML文档有效性检验算法.实验数据表明,基于提出的算法能够有效的完成对AXML文档的有效性检验.  相似文献   

8.
潘振君  梁成  张化祥 《计算机应用》2021,41(12):3438-3446
针对多视图数据分析易受原始数据集噪声干扰,以及需要额外的步骤计算聚类结果的问题,提出一种基于一致图学习的鲁棒多视图子空间聚类(RMCGL)算法。首先,在各个视图下学习数据在子空间中的潜在鲁棒表示,并基于该表示得到各视图的相似度矩阵。随后,基于得到的多个相似度矩阵学习一个统一的相似度图。最后,通过对相似度图对应的拉普拉斯矩阵添加秩约束,确保得到的相似度图具有最优的聚类结构,并可直接得到最终的聚类结果。该过程在一个统一的优化框架中完成,能同时学习潜在鲁棒表示、相似度矩阵和一致图。RMCGL算法的聚类精度(ACC)在BBC、100leaves和MSRC数据集上比基于图的多视图聚类(GMC)算法分别提升了3.36个百分点、5.82个百分点和5.71个百分点。实验结果表明,该算法具有良好的聚类效果。  相似文献   

9.
不确定数据的查询处理是数据库领域近年来的热点研究课题.提出一种不确定数据上的范围受限的最近邻查询.给定不确定数据集D={o1,o2,…,on},范围约束R是一个简单多边形,q为一固定的查询点,范围受限的最近邻查询返回的是在数据集D中,既满足范围约束R,又能成为查询点q的最近邻的对象集合.为处理该查询,提出了范围受限的最近邻核心集的概念和范围受限的最近邻核心集的查找算法.并提出一种计算范围受限的最近邻候选集的优化方法,降低了查询代价.最后通过实验验证了该算法的有效性.  相似文献   

10.
在处理高度不平衡数据时,代价敏感随机森林算法存在自助法采样导致小类样本学习不充分、大类样本占比较大、容易削弱代价敏感机制等问题.文中通过对大类样本聚类后,多次采用弱平衡准则对每个集群进行降采样,使选择的大类样本与原训练集的小类样本融合生成多个新的不平衡数据集,用于代价敏感决策树的训练.由此提出基于聚类的弱平衡代价敏感随机森林算法,不仅使小类样本得到充分学习,同时通过降低大类样本数量,保证代价敏感机制受其影响较小.实验表明,文中算法在处理高度不平衡数据集时性能较优.  相似文献   

11.
Temporal XML: modeling, indexing, and query processing   总被引:1,自引:0,他引:1  
In this paper we address the problem of modeling and implementing temporal data in XML. We propose a data model for tracking historical information in an XML document and for recovering the state of the document as of any given time. We study the temporal constraints imposed by the data model, and present algorithms for validating a temporal XML document against these constraints, along with methods for fixing inconsistent documents. In addition, we discuss different ways of mapping the abstract representation into a temporal XML document, and introduce TXPath, a temporal XML query language that extends XPath 2.0. In the second part of the paper, we present our approach for summarizing and indexing temporal XML documents. In particular we show that by indexing continuous paths, i.e., paths that are valid continuously during a certain interval in a temporal XML graph, we can dramatically increase query performance. To achieve this, we introduce a new class of summaries, denoted TSummary, that adds the time dimension to the well-known path summarization schemes. Within this framework, we present two new summaries: LCP and Interval summaries. The indexing scheme, denoted TempIndex, integrates these summaries with additional data structures. We give a query processing strategy based on TempIndex and a type of ancestor-descendant encoding, denoted temporal interval encoding. We present a persistent implementation of TempIndex, and a comparison against a system based on a non-temporal path index, and one based on DOM. Finally, we sketch a language for updates, and show that the cost of updating the index is compatible with real-world requirements.  相似文献   

12.
Dynamically Updating XML Data: Numbering Scheme Revisited   总被引:2,自引:0,他引:2  
Yu  Jeffrey Xu  Luo  Daofeng  Meng  Xiaofeng  Lu  Hongjun 《World Wide Web》2005,8(1):5-26
Almost all existing approaches use certain numbering scheme to encode XML elements to facilitate query processing when XML data is stored in databases. For example, under the most popular region-based numbering scheme, the starting and ending positions of an element in a document are used as the code to identify the element so that the ancestor/descendant relationship between two elements can be determined by merely examining their codes. While such numbering scheme can greatly improve query performance, renumbering large amount of elements caused by updates becomes a performance bottleneck if XML documents are frequently updated. Unfortunately, no satisfactory work has been reported for efficient update of XML data. In this paper, we first formalize the XML data update problem by defining the basic operators to support most XML update queries. We then present a new numbering scheme that not only requires minimal code-length in comparison with existing numbering schema but also improves update performance when XML data is frequently updated at arbitrary positions. The fundamental difference between our new scheme and existing ones is that, instead of maintaining the explicit codes for elements, we only store the necessary information and generate the codes when they are needed in query processing. In addition to present the basic scheme, we also discuss some optimization techniques to further reduce the update cost. Results of a comprehensive performance study are provided to show the advantages of the new scheme.  相似文献   

13.
XML document may contain inconsistencies that violate predefined integrity constraints, which causes the data inconsistency problem. In this paper, we consider how to get the consistent data from an inconsistent XML document. There are two basic concepts for this problem: Repair is the data consistent with the integrity constraints, and also minimally differs from the original one. Consistent data is the data common for every possible repair. First we give a general constraint model for XML, which can express the commonly discussed integrity constraints, including functional dependencies, keys and multivalued dependencies. Next we provide a repair framework for inconsistent XML document with three basic update operations: node insertion, node deletion and node value modification. Following this approach, we introduce the concept of repair for inconsistent XML document, discuss the chase method to generate repairs, and prove some important properties of the chase. Finally we give a method to obtain the greatest lower bound of all possible repairs, which is sufficient for consistent data. We also implement prototypes of our method, and evaluate our framework and algorithms in the experiment.  相似文献   

14.
因特网的不断发展使得XML成为Web上数据交换和表示的标准格式,但是大量的商业数据仍然存储在关系数据库中。因此必须将关系数据发布成XML文档进行传输。提出了一种基于分层框架结构的关系数据库向XML的映射方法,并在分层结构中定义了一种XML模式图作为XML的概念模型。得到的XML文档能够很好地反映关系数据库的语义和各种约束并且没有引入数据冗余。初步实验结果表明方法具有较高的效率和较好的准确性。  相似文献   

15.
As XML becomes increasingly popular, XML schema design has become an increasingly important issue. One of the central objectives of good schema design is to avoid data redundancies: redundantly stored information can lead not just only to a higher data storage cost but also to increased costs for data transfer and data manipulation. Furthermore, such data redundancies can lead to potential update anomalies, rendering the database inconsistent. One strategy to avoid data redundancies is to design redundancy-free schema from the start on the basis of known functional dependencies. We observe that XML databases are often “casually designed” and XML FDs may not be determined in advance. Under such circumstances, discovering XML data redundancies from the data itself becomes necessary and is an integral part of the schema refinement (or re-design) process. We present the design and implementation of the first system, DiscoverXFD, for efficient discovery of XML data redundancies. It employs a novel XML data structure and introduces a new class of partition-based algorithms. The XML data redundancies are defined on the basis of a new notion of XML functional dependency (XML FD) that (1) extends previous notions by incorporating set elements into the XML FD specification, and (2) maintains tuple-based semantics through the novel concept of Generalized Tree Tuple (GTT). Using this comprehensive XML FD notion, we introduce a new normal form (GTT-XNF) for XML documents, and provide comprehensive comparisons with previous studies. Given the set of data redundancies (in the form of redundancy-indicating XML FDs) discovered by DiscoverXFD, we describe a normalization algorithm for converting any original XML schema into one in GTT-XNF.  相似文献   

16.
一个基于XML的Web信息源集成方案   总被引:30,自引:3,他引:27  
XML是Internet上数据表示和数据交换的新标准,它允许开发人员定义各种标记来描述文档中的数据元素,用简单的嵌套和引用来表示元素间的关系。 XML能描述各种各样的数据,能将多个不同来源的数据集成在一个 XML文档中。以 XML作为集成层的数据描述工具和转换工具,建立具有多数据源集成能力的中间件,不仅能适合Web发展的需要,还将大大简化 Web数据源集成系统的实现。  相似文献   

17.
《IT Professional》2001,3(2):37-40
Schemas add data typing and inheritance features, giving XML the sophistication required to create enterprise-class business applications. The XML schema language describes the legal structure, content, and constraints of XML documents. The XML schema language provides the necessary framework for creating XML documents by specifying the valid structure, constraints, and data types for the various elements and attributes of an XML document. Schema language provides enhanced as well as more comprehensive and powerful features than a document type definition (DTD). The XML schema language provides the rich data typing associated with ordinary programming languages. The W3C XML schema specification defines several different built-in data types, such as string, integer, Boolean, date, and time, among others. The specification also provides the capability for defining new types. Developers can use these built-in as well as user-defined data types to effectively define and constrain XML document attributes and element values  相似文献   

18.
XML has become the standard for publishing and exchanging data on the Web. However, most business data is managed and will remain to be managed by relational database management systems. As such, there is an increasing need to efficiently and accurately publish relational data as XML documents for Internet-based applications. One way to publish relational data is to provide virtual XML documents for relational data via an XML schema which is transformed from the underlying relational database schema such that users can access the relational database through the XML schema. In this paper, we discuss issues in transforming a relational database schema into the corresponding XML schema. We aim to preserve all integrity constraints defined in a relational database schema, to achieve high level of nesting and to avoid introducing data redundancy in the transformed XML schema. In the paper, we first propose a basic transformation algorithm which introduces no data redundancy, then we improve the algorithm by exploring further nesting of the transformed XML schema.  相似文献   

19.
XML is becoming a prevalent format and standard for data exchange in many applications. With the increase of XML data, there is an urgent need to research some efficient methods to store and manage XML data. As relational databases are the primary choices for this purpose considering their data management power, it is necessary to research the problem of mapping XML schemas to relational schemas. The semantics of XML schemas are crucial to design, query, and store XML documents and functional dependencies are very important representations of semantic information of XML schemas. As DTDs are one of the most frequently used schemas for XML documents in these days, we will use DTDs as schemas of XML documents here. This paper proposes the concept and the formal definition of XML functional dependencies over DTDs. A method to map XML DTDs to relational schemas with constraints such as functional dependencies, domain constraints, choice constraints, reference constraints, and cardinality constraints over DTDs is given, which can preserve the structures of DTDs as well as the semantics implied by the above constraints over DTDs. The concepts and method of mapping DTDs to relational schemas presented in the paper can be extended to the field of XML Schema just with some modifications in related formal definitions.  相似文献   

20.
Static validation of WS-CDL documents   总被引:1,自引:0,他引:1  
This paper presents an approach to validate WS-CDL documents statically. To deal with those constraints appeared in CDL documents, which cannot be captured by its meta model (XML Schema) totally, we design a machine-based constraint language based on B Method, which uses abstract machines to represent the constraints of the relations among XML nodes. After modelling the constraints, the corresponding checking algorithm can be designed to implement the static validation of CDL documents. Meanwhile, the checking algorithm is integrated into the WS-CDL editor plug-in for Eclipse project. The case studies show that our approach is effective for real examples in practice.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号