Similar Documents
19 similar documents found (search time: 140 ms).
1.
Web table expansion adds further attribute columns related to a table's subject column on the basis of known information, so that users can obtain the information they are interested in from tables. Existing work mainly targets entity-attribute binary tables consisting of a subject column and one column to be expanded, and treats the subject column as the sole basis for expanding the other attribute columns; when this technique is applied to web tables with several columns to be expanded, the result table stitched together from multiple binary tables easily exhibits entity inconsistency. Taking the relationships among attribute columns and among tuple rows into account, this paper proposes the notion of consistency support degree and designs and implements CCA, a consistent table-expansion system based on column overlap, which guarantees high matching scores for candidate values while minimizing the number of source tables used to fill in the result table, effectively avoiding entity inconsistency. Experiments show that, compared with existing methods, CCA achieves higher precision, coverage, and consistency at a lower query-time cost.
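The abstract gives no code, so here is a hedged sketch of the column-overlap idea only: greedily pick source tables that cover the columns to be expanded while keeping the number of tables small. The table format and scoring are illustrative assumptions, not the authors' CCA implementation.

```python
# Hypothetical sketch: greedy source-table selection by column overlap.
# A "source table" is a dict with a set of column names.

def select_source_tables(target_columns, source_tables):
    """Pick a small set of source tables covering target_columns,
    preferring tables that overlap many still-missing columns
    (a stand-in for the paper's consistency support degree)."""
    remaining = set(target_columns)
    chosen = []
    while remaining:
        best = max(source_tables,
                   key=lambda t: len(t["columns"] & remaining),
                   default=None)
        if best is None or not (best["columns"] & remaining):
            break  # no table covers any further column
        chosen.append(best)
        remaining -= best["columns"]
    return chosen, remaining  # remaining == uncoverable columns

tables = [
    {"name": "T1", "columns": {"country", "capital", "currency"}},
    {"name": "T2", "columns": {"country", "population"}},
    {"name": "T3", "columns": {"country", "capital"}},
]
chosen, missing = select_source_tables({"capital", "currency", "population"}, tables)
print([t["name"] for t in chosen], missing)  # ['T1', 'T2'] set()
```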

2.
For relationships across multiple entity sets, this paper introduces the concepts of link attributes and entity chains and investigates how to compute entity chains that share identical or similar link attributes. Computing entity chains across multiple relations is the key to approximate join queries; by analyzing link-attribute similarity, the approach resolves data conflicts among multiple relations. Algorithms for computing 2-entity chains and k-entity chains are designed, and the main steps of entity-chain computation are implemented with an extended SQL query language. Entity chains support efficient and dynamic queries over multiple relations and achieve high query accuracy.
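As a rough, hedged illustration of a 2-entity chain, the sketch below links tuples of two relations whose link attributes are approximately equal; the token-Jaccard similarity and threshold are illustrative assumptions, not the paper's algorithm.

```python
# Hypothetical sketch of a "2-entity chain": pair tuples of two relations
# whose link attributes are similar enough.

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def two_entity_chains(r, s, key_r, key_s, threshold=0.5):
    """Return pairs (t_r, t_s) whose link attributes match approximately."""
    return [(t_r, t_s)
            for t_r in r for t_s in s
            if jaccard(t_r[key_r], t_s[key_s]) >= threshold]

authors = [{"name": "J. K. Rowling"}]
books   = [{"writer": "Rowling J. K.", "title": "Harry Potter"}]
print(two_entity_chains(authors, books, "name", "writer"))
```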

3.
孙伟娟  王宁 《计算机工程》2019,45(4):181-188
Existing entity-expansion techniques return a single result and are only suited to expanding a single attribute column; for multiple attribute columns they easily produce inconsistent entities. This paper therefore proposes two top-k entity-expansion algorithms that, based on the consistency matching degree between answer tables, find among a large collection of web tables the k answer-table sets with the highest consistency support degree, so as to fill in the missing information of the entities to be expanded. Experimental results show that both algorithms realize top-k entity expansion while keeping the expanded results highly consistent and accurate: the top-k expansion algorithm based on consistency matching degree offers better diversity, while the one based on branch and bound performs better in terms of credibility.
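A minimal sketch of the top-k selection step, assuming candidate answer tables already carry a consistency score (the scoring itself is the hard part and is not reproduced here):

```python
import heapq

# Hypothetical sketch: keep the k candidate answer tables with the highest
# consistency score. The score field is a placeholder assumption.

def top_k_answers(candidates, k, score):
    """candidates: iterable of answer tables; score: table -> float."""
    return heapq.nlargest(k, candidates, key=score)

candidates = [
    {"table": "A", "consistency": 0.92},
    {"table": "B", "consistency": 0.75},
    {"table": "C", "consistency": 0.88},
]
print(top_k_answers(candidates, 2, score=lambda t: t["consistency"]))
# tables A and C, in descending consistency order
```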

4.
The development of network technology has produced a huge number of network users, and the social relationships hidden among them have drawn growing attention; many social-network discovery algorithms have been proposed. Previous studies, however, mostly assume that relationship data can be obtained directly, whereas in practice network data mostly exists as individual user behavior and changes in real time. Based on the analysis of users' network behavior logs, this paper proposes a social-relationship discovery algorithm built on a spatio-temporal data analysis model; the algorithm comprises two main steps, behavior analysis and relationship discovery. Experiments show that the algorithm discovers the social relationships hidden in user behavior well.
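A hedged sketch of the spatio-temporal intuition: bucket behavior-log records by time window and location, and treat frequent co-occurrence as a candidate social tie. Window size and threshold are illustrative assumptions, not the paper's model.

```python
from collections import Counter
from itertools import combinations

def candidate_ties(log, window=3600, min_cooccur=3):
    """log: iterable of (user, timestamp_seconds, location) records."""
    buckets = {}
    for user, ts, loc in log:
        buckets.setdefault((ts // window, loc), set()).add(user)
    pairs = Counter()
    for users in buckets.values():
        for u, v in combinations(sorted(users), 2):
            pairs[(u, v)] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_cooccur}

log = [("u1", 100, "cafe"), ("u2", 200, "cafe"),
       ("u1", 4000, "gym"), ("u2", 4100, "gym"),
       ("u1", 7300, "lab"), ("u2", 7400, "lab")]
print(candidate_ties(log))  # {('u1', 'u2'): 3}
```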

5.
Traditional entity-relation extraction methods mainly target text whose semantic information is relatively complete: they extract relations with extraction patterns and select among the extracted candidate relations with heuristic algorithms or probabilistic models. They do not carry over well to semi-structured pages, where entity information is not presented in full sentences. The entity-relation extraction system proposed in this paper handles semi-structured pages well. Its core modules cover extraction-rule learning, data extraction, and inter-entity relation computation, and it offers users a relation-base query interface: the user enters keywords and selects a match type, the system queries the entity base accordingly, uses the qualifying entities to query the relation base, and returns the relations containing those entities.
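A toy sketch of the described query flow, with in-memory dicts standing in for the system's entity and relation bases; all data and the match types are illustrative assumptions.

```python
entity_base = {"apple": ["Apple Inc.", "apple (fruit)"]}
relation_base = [("Apple Inc.", "founded_by", "Steve Jobs")]

def query(keyword, match="exact"):
    """Look up entities by keyword, then return relations involving them."""
    if match == "exact":
        ents = entity_base.get(keyword, [])
    else:  # substring match on the keyword index
        ents = [e for k, v in entity_base.items() if keyword in k for e in v]
    return [r for r in relation_base if r[0] in ents or r[2] in ents]

print(query("apple"))  # [('Apple Inc.', 'founded_by', 'Steve Jobs')]
```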

6.
As big-data applications deepen, the need for unified management and analysis of large-scale structured and unstructured data becomes increasingly prominent. However, the differences between structured and unstructured data in storage management, information acquisition, and retrieval pose technical challenges for unified management and analysis. This paper proposes an extended property-graph model suited to unified management and semantic computation over heterogeneous data, and defines the corresponding property operators and query syntax. Building on this intelligent property-graph model, it presents PandaDB, a system for intelligent unified management of heterogeneous data, and describes PandaDB's overall architecture, storage mechanism, query mechanism, property co-storage, and AI-algorithm integration in detail. Performance tests and application cases show that PandaDB's co-storage mechanism, distributed architecture, and semantic indexing perform well for ad hoc queries and analysis over large-scale heterogeneous data, and the system has been applied to fused data management scenarios such as entity disambiguation and visualization in academic graphs.
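A hedged, minimal sketch of the extended-property idea: a property value may be an unstructured blob whose semantic sub-properties are computed by pluggable AI functions and cached (co-stored). All names are illustrative assumptions, not PandaDB's actual model or query syntax.

```python
class BlobProperty:
    def __init__(self, raw, extractors):
        self.raw = raw                 # e.g. image bytes or long text
        self.extractors = extractors   # sub-property name -> AI function
        self._cache = {}

    def sub(self, name):
        if name not in self._cache:    # compute once, then co-store
            self._cache[name] = self.extractors[name](self.raw)
        return self._cache[name]

node = {
    "name": "Alan Turing",
    "photo": BlobProperty(b"<jpeg bytes>", {
        "faces": lambda raw: 1,        # stand-in for a face detector
    }),
}
# A semantic filter over the graph could then test: photo.sub("faces") > 0
print(node["photo"].sub("faces"))
```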

7.
洪立印  徐蔚然 《软件》2013,(12):148-151
WAF (Word Activation Force) is a statistics-based measure of the relationship between words. It considers not only the association between words but also their order and the distance between them, thereby combining probabilistic information with linguistic rules. This paper proposes a relation-feature extraction algorithm for structured entity data and clusters entities based on these features: it first extracts semantic and contextual features from the structured entity data to model the text, then computes a similarity for each attribute based on WAF values, and finally clusters the entities.
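A hedged sketch of a WAF-style score: one common formulation multiplies the conditional co-occurrence ratios of an ordered word pair and divides by the squared average distance. Treat the exact formula here as an assumption rather than the paper's definition.

```python
from collections import defaultdict

def waf_scores(sentences, window=5):
    """waf(i, j) ~ (f_ij / f_i) * (f_ij / f_j) / mean_dist(i, j)**2,
    where f_ij counts co-occurrences with i preceding j in a window."""
    f = defaultdict(int)        # word frequency
    f_ij = defaultdict(int)     # ordered co-occurrence count
    dist = defaultdict(float)   # summed distances
    for words in sentences:
        for k, w in enumerate(words):
            f[w] += 1
            for m in range(k + 1, min(k + 1 + window, len(words))):
                f_ij[(w, words[m])] += 1
                dist[(w, words[m])] += m - k
    scores = {}
    for (i, j), c in f_ij.items():
        d = dist[(i, j)] / c
        scores[(i, j)] = (c / f[i]) * (c / f[j]) / (d * d)
    return scores

s = waf_scores([["new", "york", "city"], ["new", "york", "times"]])
print(s[("new", "york")])  # 1.0: the strongest ordered association
```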

8.
《软件工程师》2019,(10):1-6
HTML tables on the Web contain rich structured and semi-structured knowledge and are an important data resource for constructing and extending knowledge bases. Parsing HTML tables correctly and obtaining triples to extend a knowledge base, however, is a challenging problem. First, HTML tables differ in structure; second, entities and attributes are represented differently in tables and in the knowledge base and must be unified, i.e., entity linking and attribute alignment. This paper first proposes a knowledge-base-backed framework for parsing online encyclopedia tables and fusing the extracted knowledge, which can extract knowledge from tables of different types; it then proposes knowledge-base-based methods for table entity linking and attribute alignment, which match and fuse the knowledge in tables with the knowledge base. In experiments, 1.26 million online encyclopedia tables were used to extend CN-DBpedia with roughly 10 million triples.
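A toy sketch of the triple-generation step, assuming the HTML table has already been parsed into a header row plus data rows; the header-to-property alias map stands in for attribute alignment, and entity linking is omitted.

```python
# Hypothetical alias map from table headers to KB property names.
header_to_property = {"出生日期": "birthDate", "born": "birthDate",
                      "占地面积": "area", "area": "area"}

def table_to_triples(subject_col, header, rows):
    triples = []
    for row in rows:
        subject = row[subject_col]
        for i, cell in enumerate(row):
            if i == subject_col or not cell:
                continue
            prop = header_to_property.get(header[i].strip().lower())
            if prop:  # keep only columns we can align to the KB
                triples.append((subject, prop, cell))
    return triples

print(table_to_triples(0, ["name", "born"], [["Alan Turing", "1912"]]))
# [('Alan Turing', 'birthDate', '1912')]
```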

9.
Using synonym keywords of relation names as index keys for schema information and vertically partitioning relation tuples, this paper designs a method that indexes both schema and data over a structured overlay network. On top of this two-level index, an algorithm supporting complex multi-attribute queries is proposed. Qualitative analysis and comparison show that the method comes closer to the ideal goals of P2P data management than related work.
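A hedged sketch of the two-level index, with plain dicts standing in for the structured overlay's lookups: level one maps relation-name synonyms to schemas, level two maps (relation, attribute) keys to vertically partitioned columns.

```python
schema_index = {                 # level 1: synonym -> relation name
    "employee": "staff", "worker": "staff",
}
data_index = {                   # level 2: (relation, attribute) -> column
    ("staff", "name"): ["ann", "bob"],
    ("staff", "dept"): ["hr", "it"],
}

def multi_attribute_query(keyword, predicates):
    """Resolve the relation via a synonym, then intersect column matches."""
    rel = schema_index.get(keyword)
    if rel is None:
        return []
    cols = {a: data_index[(rel, a)] for a in predicates}
    n = len(next(iter(cols.values())))
    return [i for i in range(n)
            if all(cols[a][i] == v for a, v in predicates.items())]

print(multi_attribute_query("worker", {"dept": "it"}))  # [1] -> bob's tuple
```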

10.
姚全珠  余训滨 《计算机应用》2012,32(4):1090-1093
To address the problem that XML keyword query results contain many meaningless nodes, this paper proposes a semantics-aware query algorithm. Since XML documents are semi-structured and self-describing, the algorithm makes full use of the semantic relatedness between nodes and introduces the concept of the smallest lowest entity subtree (SLEST), in which keywords are related only through physical connections; to capture IDREF reference relationships between keywords, it further proposes an algorithm based on the smallest interrelated entity subtree (SIEST). SLESTs and SIESTs replace the smallest lowest common ancestor (SLCA) as query results. Experimental results show that the proposed algorithms effectively improve the precision of XML keyword queries.
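For orientation, a sketch of the SLCA baseline that SLEST/SIEST refine: compute the lowest common ancestors of keyword-match nodes (as Dewey-style paths) and keep only the smallest ones. This is not the paper's algorithm.

```python
def lca(path_a, path_b):
    """Longest common prefix of two Dewey-style node paths."""
    i = 0
    while i < min(len(path_a), len(path_b)) and path_a[i] == path_b[i]:
        i += 1
    return path_a[:i]

def slca(match_paths_k1, match_paths_k2):
    """Keep LCAs that have no other LCA as a descendant (smallest ones)."""
    cands = {lca(a, b) for a in match_paths_k1 for b in match_paths_k2}
    return [c for c in cands
            if not any(d != c and d[:len(c)] == c for d in cands)]

print(slca([(0, 1, 0)], [(0, 1, 2), (0, 3)]))  # [(0, 1)]
```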

11.
Wide-column NoSQL databases are an important class of NoSQL (Not only SQL) databases, which scale horizontally and feature high access performance on sparse tables. With current trends towards big Data Warehouses (DWs), it is attractive to run existing business intelligence/data warehousing applications on higher volumes of data in wide-column NoSQL databases for low latency, by mapping multidimensional models to wide-column NoSQL models or using additional SQL add-ons. For example, applications like retail management can run over integrated data sets stored in big DWs or in the cloud to capture current item-selling trends. Many of these systems also employ Snapshot Isolation (SI) as a concurrency control mechanism to achieve high throughput for read-heavy workloads. SI works well in a DW environment, as analytical queries can work on (consistent) snapshots and are not impacted by concurrent update jobs performed by online incremental Extract-Transform-Load (ETL) flows that refresh fact/dimension tables. However, the snapshot made available in the DW is often stale, since at the moment an analytical query is issued, the source updates (e.g. in a remote retail store) may not yet have been extracted and processed by the ETL process due to high input data volume or slow processing speed. This staleness may cause incorrect results for time-critical decision support queries. To address this problem, the snapshots accessed by analytical queries first need to be maintained by the corresponding ETL flows to reflect source updates, based on given freshness needs. Snapshot maintenance in this work means maintaining the distributed data partitions that are required by a query. Since most NoSQL databases are not ACID compliant and do not provide full-fledged distributed transaction support, a snapshot may be derived inconsistently when its data partitions are updated by different ETL maintenance jobs.

This paper describes an extended version of the HBelt system [1], which tightly integrates the wide-column NoSQL database HBase with a clustered, pipelined ETL engine. Our objective is to efficiently refresh HBase tables with remote source updates while guaranteeing a consistent snapshot across distributed partitions for each scan request in analytical queries. A consistency model is defined and implemented to address this so-called distributed snapshot maintenance. To achieve it, ETL jobs and analytical queries are scheduled in a distributed processing environment. In addition, a partitioned, incremental ETL pipeline is introduced to increase the performance of ETL (update) jobs. We validate the efficiency gains from data pipelining and data partitioning using the TPC-DS benchmark, which simulates a modern decision support system for a retail product supplier. Experimental results show that HBelt achieves high query throughput when distributed, refreshed snapshots are demanded.
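A hedged sketch of the consistency idea described above: scans read only up to a global watermark that every partition's ETL maintenance job has reached, so a snapshot is consistent across distributed partitions. Class and method names are illustrative, not HBelt's API.

```python
import threading

class SnapshotCoordinator:
    def __init__(self, partitions):
        self.applied = {p: 0 for p in partitions}  # last batch applied per partition
        self.lock = threading.Lock()

    def report_applied(self, partition, batch_id):
        """Called by an ETL maintenance job after refreshing a partition."""
        with self.lock:
            self.applied[partition] = max(self.applied[partition], batch_id)

    def consistent_watermark(self):
        """Highest batch id guaranteed applied on every partition."""
        with self.lock:
            return min(self.applied.values())

coord = SnapshotCoordinator(["p0", "p1"])
coord.report_applied("p0", 5)
coord.report_applied("p1", 3)
print(coord.consistent_watermark())  # 3: scans read as of batch 3
```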

12.
Dynamic finite versioning (DFV) schemes are an effective approach to concurrent transaction and query processing, where a finite number of consistent, but possibly slightly out-of-date, logical snapshots of the database can be dynamically derived for query access. In DFV, the storage overhead of keeping additional versions of changed data to support the logical snapshots, and the amount of obsolescence faced by queries, are the two major performance issues. We analyze the performance of DFV, with emphasis on the trade-offs between storage cost and obsolescence. We develop analytical models based on a renewal-process approximation to evaluate the performance of DFV using M ⩾ 2 snapshots. Asymptotic closed-form results for high query arrival rates are given for the case of two snapshots. Simulation is used to validate the analytical models and to evaluate the trade-offs between various strategies for advancing snapshots when M > 2. The results show that (1) the analytical models match closely with simulation; (2) storage cost and obsolescence are sensitive to the snapshot-advancing strategies; and (3) increasing the number of snapshots usually trades storage overhead against query obsolescence. For cases with skewed accesses or low update rates, a small increase in the number of snapshots beyond two can substantially reduce the obsolescence; the reduction is more significant as the coefficient of variation of the query-length distribution becomes larger. Moreover, for very low update rates, a large number of snapshots can be used to reduce the obsolescence to almost zero without increasing the storage overhead.

13.
Compared with mirroring, snapshots offer shorter backup and recovery windows, smaller performance loss, and higher capacity utilization, which makes them better suited to protecting against data loss caused by soft failures such as human error. This paper presents a design for snapshot support in volume management that efficiently manages multiple snapshots of a single data source and provides snapshot read/write, creation, and reclamation. To address the poor physical fault tolerance of snapshots, several schemes combining snapshots with mirroring are also proposed, which protect data well against both soft and hard failures in storage networks.
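A minimal sketch of block-level copy-on-write, the usual mechanism behind volume snapshots (the paper's actual design may differ): a write first preserves the old block in every snapshot that has not yet captured it.

```python
class Volume:
    def __init__(self, nblocks):
        self.blocks = [b"\0"] * nblocks
        self.snapshots = []           # each snapshot: {block_no: old_data}

    def create_snapshot(self):
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_no, data):
        for snap in self.snapshots:   # preserve old data once per snapshot
            snap.setdefault(block_no, self.blocks[block_no])
        self.blocks[block_no] = data

    def read_snapshot(self, snap_id, block_no):
        return self.snapshots[snap_id].get(block_no, self.blocks[block_no])

v = Volume(4)
v.write(0, b"v1")
s = v.create_snapshot()
v.write(0, b"v2")
print(v.read_snapshot(s, 0), v.blocks[0])  # b'v1' b'v2'
```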

14.
Sullivan  J. 《Computer》1997,30(6)
Does your relational database speak SQL fluently? It's easy to find out, because the SQL (Structured Query Language) Test Suite is now free on the Web. SQL is the standard that lets DBMS products from different vendors interoperate. It defines common data structures (tables, columns, views, and so on) and provides a data manipulation language to populate, update, and query those structures. Accessing structured data with SQL is quite different from searching the full text of documents on the Web. Structured data in the relational model means data that can be represented in tables: each row represents a different item, and the columns represent various attributes of the item. Columns have names and integrity constraints that specify valid values. Because column values are named and represented in a consistent format, you can select rows precisely on the basis of their contents, which is especially helpful in dealing with numeric data. You can also join data from different tables on the basis of matching column values. Useful types of analysis are possible too, such as listing items that are in one table and are missing from, present in, or have specific attributes in a related table. You can extract from a large table precisely those rows of interest, regroup them, and generate simple statistics on them.
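A small runnable illustration of the operations the column describes, using an in-memory SQLite database (the retail schema is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE item(id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE sale(item_id INTEGER, qty INTEGER);
    INSERT INTO item VALUES (1,'pen',1.5),(2,'ink',9.0),(3,'pad',4.0);
    INSERT INTO sale VALUES (1,10),(1,3),(2,1);
""")
# Join on matching column values: revenue per item actually sold.
for row in con.execute("""
        SELECT item.name, SUM(sale.qty * item.price)
        FROM item JOIN sale ON sale.item_id = item.id
        GROUP BY item.name"""):
    print(row)
# Items in one table that are missing from a related table.
print(con.execute("""
        SELECT name FROM item
        WHERE id NOT IN (SELECT item_id FROM sale)""").fetchall())
```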

15.
This article outlines the types of flashback query in Oracle 10g and illustrates flashback version query and flashback transaction query with examples. Flashback query is the most basic flashback feature: it reconstructs a consistent version of the data as of a given moment directly from the old data in the undo (rollback) segments. Because such a query is only suitable for recovering the data of a single table, it is not suitable for recovering related data across multiple tables in a transaction and cannot guarantee the referential integrity of related data.
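For reference, hedged examples of the Oracle 10g flashback syntax the article discusses, written as SQL strings to be run through any Oracle client (e.g. a DB-API cursor's execute); the table, columns and time ranges are illustrative.

```python
# Basic flashback query: the table as of 15 minutes ago.
flashback_query = """
    SELECT * FROM emp
    AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE)
    WHERE empno = 7369
"""

# Flashback version query: all row versions in the last hour.
flashback_version_query = """
    SELECT versions_xid, versions_operation, sal
    FROM emp
    VERSIONS BETWEEN TIMESTAMP
        (SYSTIMESTAMP - INTERVAL '1' HOUR) AND SYSTIMESTAMP
    WHERE empno = 7369
"""

# Flashback transaction query: undo SQL for a given transaction id.
flashback_transaction_query = """
    SELECT operation, undo_sql
    FROM flashback_transaction_query
    WHERE xid = HEXTORAW(:xid)
"""
```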

16.
Much of the world’s quantitative data reside in scattered web tables. For a meaningful role in Big Data analytics, the facts reported in these tables must be brought into a uniform framework. Based on a formalization of header-indexed tables, we proffer an algorithmic solution to end-to-end table processing for a large class of human-readable tables. The proposed algorithms transform header-indexed tables to a category table format that maps easily to a variety of industry-standard data stores for query processing. The algorithms segment table regions based on the unique indexing of the data region by header paths, classify table cells, and factor header category structures of two-dimensional as well as the less common multidimensional tables. Experimental evaluations substantiate the algorithmic approach to processing heterogeneous tables. As demonstrable results, the algorithms generate queryable relational database tables and semantic-web triple stores. Application of our algorithms to 400 web tables randomly selected from diverse sources shows that the algorithmic solution automates end-to-end table processing.
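A toy sketch of the core mapping: index each data cell by its row-header path and column-header path, producing one category-table (relational) tuple per cell. The two-dimensional layout below is an illustrative assumption.

```python
col_headers = [("2019", "Q1"), ("2019", "Q2")]    # column header paths
row_headers = [("Sales", "EU"), ("Sales", "US")]  # row header paths
data = [[10, 12],
        [20, 25]]

# One relational tuple per data cell: row path + column path + value.
tuples = [row + col + (data[i][j],)
          for i, row in enumerate(row_headers)
          for j, col in enumerate(col_headers)]
for t in tuples:
    print(t)
# ('Sales', 'EU', '2019', 'Q1', 10)  ... and so on for each cell
```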

17.
Linked Data brings inherent challenges in the way users and applications consume the available data. Users consuming Linked Data on the Web should be able to search and query data spread over potentially large numbers of heterogeneous, complex and distributed datasets. Ideally, a query mechanism for Linked Data should abstract users from the representation of data. This work investigates a vocabulary-independent natural language query mechanism for Linked Data, using an approach based on the combination of entity search, a Wikipedia-based semantic relatedness measure and spreading activation. The Wikipedia-based semantic relatedness measure addresses the limitations of existing works that rely on WordNet-based similarity measures and term expansion. Experimental results using the query mechanism to answer 50 natural language queries over DBpedia achieved a mean reciprocal rank of 61.4%, an average precision of 48.7% and an average recall of 57.2%.
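A hedged sketch of spreading activation over an entity graph: activation starts at the entities matched by entity search and decays as it propagates along edges weighted by semantic relatedness. The decay factor, weights and threshold are illustrative assumptions.

```python
def spread(graph, seeds, decay=0.5, iterations=3, threshold=0.05):
    """graph: {node: [(neighbor, relatedness), ...]}; seeds: {node: 1.0}."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(iterations):
        nxt = {}
        for node, a in frontier.items():
            for nb, w in graph.get(node, []):
                pulse = a * w * decay
                if pulse >= threshold:       # prune weak activation
                    nxt[nb] = nxt.get(nb, 0.0) + pulse
        for nb, a in nxt.items():
            activation[nb] = activation.get(nb, 0.0) + a
        frontier = nxt
    return sorted(activation.items(), key=lambda kv: -kv[1])

g = {"dbpedia:Berlin": [("dbpedia:Germany", 0.9)],
     "dbpedia:Germany": [("dbpedia:Europe", 0.7)]}
print(spread(g, {"dbpedia:Berlin": 1.0}))
```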

18.
Data exchange is the problem of transforming data structured under a source schema into data structured under another schema, called the target schema, so that both the source and target data satisfy the relationship between the schemas. Many applications, such as planning, scheduling, medical and fraud detection systems, require data exchange in the context of temporal data. Even though the formal framework of data exchange for relational database systems is well established, it does not immediately carry over to the setting of temporal data, which necessitates reasoning over unbounded periods of time.

In this work, we study data exchange for temporal data. We first motivate the need for two views of temporal data: the concrete view, which depicts how temporal data is compactly represented and on which implementations are based, and the abstract view, which defines the semantics of temporal data as a sequence of snapshots. We extend the chase procedure for the abstract view to provide a conceptual basis for data exchange for temporal databases; with non-temporal source-to-target tuple-generating dependencies and equality-generating dependencies, the chase algorithm can be applied to each snapshot independently. We then define a chase procedure (called c-chase) on concrete instances and show that the result of c-chase on a concrete instance is semantically aligned with the result of the chase on the corresponding abstract instance. To interpret intervals as constants while checking whether a dependency or a query is satisfied by a concrete database, we normalize the instance with respect to the dependency or the query. To obtain the semantic alignment, the nulls (which are introduced by data exchange and model incompleteness) in the concrete view are annotated with temporal information. Furthermore, we show that the result of the concrete chase provides a foundation for query answering: we define naïve evaluation on the result of the c-chase and show that it produces certain answers.
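A toy illustration of the abstract-view observation that, for non-temporal dependencies, the chase can run on each snapshot independently: the single tgd R(x) -> exists y. S(x, y) is chased per snapshot with fresh labeled nulls. Real chase machinery is far richer; this is only a sketch.

```python
import itertools

fresh = (f"_N{i}" for i in itertools.count())  # labeled-null generator

def chase_snapshot(R, S):
    """One chase step for the tgd R(x) -> exists y. S(x, y)."""
    S = set(S)
    for (x,) in R:
        if not any(s[0] == x for s in S):  # dependency not yet satisfied
            S.add((x, next(fresh)))        # add a fact with a labeled null
    return S

# Abstract instance: time point -> snapshot of relation R.
abstract_instance = {1: {("a",)}, 2: {("a",), ("b",)}}
target = {t: chase_snapshot(R, set()) for t, R in abstract_instance.items()}
print(target)  # e.g. {1: {('a', '_N0')}, 2: {('a', '_N1'), ('b', '_N2')}}
```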

19.
A temporary relation is established so that the record pointer of a child table moves along with the record pointer of its parent table, making it possible to browse data in several tables at once. This article first briefly introduces the basic concepts of table relations, the data session, and temporary relations, and then uses examples to explain, step by step, how to build relations between tables with the data session and how to carry out queries.
