Similar Documents
1.
Collaborative databases, such as genome databases, often involve extensive curation activities in which collaborators must interact to converge and agree on the content of the data. In a typical scenario, a member of the collaboration makes some updates, and these become visible to all collaborators for possible comments and modifications. At the same time, these updates usually await approval or rejection by the data custodian, based on the related discussion and the content of the data. Unfortunately, in current databases the approval and authorization of updates is based solely on the identity of the user, e.g., via the SQL GRANT and REVOKE commands. In this paper, we present a scalable cloud-based collaborative database system that supports collaboration and data curation scenarios. Our system is based on an Update Pending Approval model: when a collaborator updates a given data item, the update is marked as pending approval until the data custodian approves or rejects it. Until then, any other collaborator can view and comment on the data. We fully implemented our system inside HBase, a cloud-based platform, and conducted extensive experiments showing that the system scales well under different workloads.
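The Update Pending Approval model can be pictured as a small state machine. The following is a minimal in-memory sketch in Python, not the authors' HBase implementation; the class and method names (`UPAStore`, `propose`, `comment`, `decide`) are hypothetical. Writes are held as pending until the designated custodian decides, while other collaborators can still view and comment on them.

```python
# Hypothetical in-memory sketch of an Update Pending Approval (UPA) store;
# not the paper's HBase implementation.
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Update:
    key: str
    value: str
    author: str
    status: Status = Status.PENDING
    comments: list = field(default_factory=list)  # (who, text) pairs


class UPAStore:
    def __init__(self, custodian: str):
        self.custodian = custodian
        self.committed = {}  # key -> last approved value
        self.pending = {}    # key -> list of Update objects awaiting a decision

    def propose(self, key, value, author):
        upd = Update(key, value, author)
        self.pending.setdefault(key, []).append(upd)
        return upd

    def view(self, key):
        # Collaborators see the approved value plus every pending proposal.
        return self.committed.get(key), [u.value for u in self.pending.get(key, [])]

    def comment(self, upd, who, text):
        upd.comments.append((who, text))

    def decide(self, upd, who, approve):
        # Approval depends on the custodian's decision, not just on write rights.
        if who != self.custodian:
            raise PermissionError("only the data custodian may approve or reject")
        self.pending[upd.key].remove(upd)
        upd.status = Status.APPROVED if approve else Status.REJECTED
        if approve:
            self.committed[upd.key] = upd.value
```

For example, after `u = store.propose("gene:BRCA1", "annotation v2", author="bob")`, `store.view("gene:BRCA1")` returns `(None, ["annotation v2"])` until the custodian calls `store.decide(u, "alice", approve=True)`. Keeping approved values and pending proposals apart is what allows discussion before an update becomes authoritative.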

2.
Malicious users can exploit correlations among data to infer sensitive information from a series of seemingly innocuous data accesses. We therefore develop an inference violation detection system to protect sensitive data content. Based on data dependencies, the database schema, and semantic knowledge, we construct a semantic inference model (SIM) that represents the possible inference channels from any attribute to the pre-assigned sensitive attributes. The SIM is then instantiated as a semantic inference graph (SIG) for query-time inference violation detection. In the single-user case, when a user poses a query, the detection system examines his or her past query log and calculates the probability of inferring sensitive information; the query is denied if this probability exceeds a prespecified threshold. In the multi-user case, users may share their query answers to increase the inference probability. We therefore develop a model that evaluates collaborative inference based on the query sequences of collaborators and their task-sensitive collaboration levels. Experimental studies reveal that information authoritativeness, communication fidelity, and honesty in collaboration are three key factors that affect the level of achievable collaboration. An example illustrates how the proposed technique prevents multiple collaborating users from deriving sensitive information via inference.
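The single-user rule (deny a query once the accumulated inference probability exceeds a threshold) can be sketched as follows. This is a simplified illustration under stated assumptions: per-attribute channel probabilities are given up front and combined with a noisy-OR, i.e. channels are treated as independent, which is not the SIM/SIG computation from the paper, and the collaborative multi-user model is not covered. `InferenceGuard` and its parameters are hypothetical.

```python
# Hypothetical sketch of a query-time inference threshold check.
# Assumption: channels to the sensitive attribute are independent (noisy-OR),
# a stand-in for the paper's semantic inference graph.
from math import prod


class InferenceGuard:
    def __init__(self, channel_prob, threshold):
        # channel_prob[a]: assumed probability that knowing attribute `a`
        # reveals the sensitive attribute.
        self.channel_prob = channel_prob
        self.threshold = threshold
        self.log = {}  # user -> set of attributes already disclosed

    def inference_prob(self, attrs):
        # Noisy-OR: inference fails only if every channel fails.
        return 1 - prod(1 - self.channel_prob.get(a, 0.0) for a in attrs)

    def submit(self, user, query_attrs):
        seen = self.log.setdefault(user, set())
        candidate = seen | set(query_attrs)
        if self.inference_prob(candidate) > self.threshold:
            return False  # deny: past log plus this query is too revealing
        seen.update(query_attrs)  # only answered queries enter the log
        return True
```

With `channel_prob = {"zip": 0.5, "age": 0.4}` and a threshold of 0.6, a user may ask for `zip` (0.5) but is then refused `age`, because the combined estimate would be 1 - 0.5 × 0.6 = 0.7.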

3.
In recent years, large high-dimensional datasets have become ubiquitous. Video and image repositories, financial data, and sensor data are just a few practical examples. Many applications that use such datasets require retrieving the data items most similar to a given query item, that is, the nearest neighbors (NN or $k$-NN) of that item. Another common query retrieves multiple sets of nearest neighbors, i.e., multi-$k$-NN, for different query items over the same data. As commodity multi-core CPUs become increasingly widespread and inexpensive, developing parallel algorithms for these search problems has become increasingly important. While the core nearest neighbor search problem is relatively easy to parallelize, tuning it for optimal performance is challenging, because the various performance-specific algorithmic parameters, or “tuning knobs”, are interrelated and also depend on the data and query workloads. In this paper, we present (1) a detailed study of the various tuning knobs and their contributions to query throughput for parallelized versions of the two most common classes of high-dimensional multi-NN search algorithms, linear scan and tree traversal, and (2) an offline auto-tuner that sets these knobs by iteratively measuring actual query execution times for a given workload and dataset. We show experimentally that our auto-tuner reaches near-optimal performance and significantly outperforms untuned versions of parallel multi-NN algorithms on real video repository data across a variety of multi-core platforms.
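To make the “tuning knob” idea concrete, here is a minimal sequential Python sketch of a blocked linear-scan multi-k-NN search, together with an offline tuner that picks one knob (the data block size) by timing the actual workload. It is an assumption-laden illustration: it does not parallelize across cores, tree traversal is omitted, and the knob and function names are hypothetical.

```python
# Hypothetical sketch: blocked linear-scan multi-k-NN plus an offline tuner
# that times candidate block sizes. Sequential; the paper's versions are parallel.
import heapq
import time


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multi_knn_scan(data, queries, k, block_size):
    """Scan the data in blocks; each block is compared against all queries."""
    heaps = [[] for _ in queries]  # per-query max-heap of (-dist, index)
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        for qi, q in enumerate(queries):
            h = heaps[qi]
            for off, p in enumerate(block):
                d = sq_dist(q, p)
                if len(h) < k:
                    heapq.heappush(h, (-d, start + off))
                elif -d > h[0][0]:  # closer than the current k-th neighbour
                    heapq.heapreplace(h, (-d, start + off))
    # Return each query's neighbour indices by ascending distance.
    return [[idx for _, idx in sorted(h, key=lambda t: -t[0])] for h in heaps]


def _time_once(data, queries, k, block_size):
    t0 = time.perf_counter()
    multi_knn_scan(data, queries, k, block_size)
    return time.perf_counter() - t0


def auto_tune(data, queries, k, candidates, repeats=3):
    """Offline tuner: keep the block size with the best measured time."""
    best, best_t = None, float("inf")
    for bs in candidates:
        t = min(_time_once(data, queries, k, bs) for _ in range(repeats))
        if t < best_t:
            best, best_t = bs, t
    return best, best_t
```

The result of `multi_knn_scan` does not depend on the block size, only its speed does; that is why the tuner can measure execution times on the real workload and simply keep the fastest setting, which may differ from one machine to another.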

4.
Yi Jia, Xue Chen, Wang Shupeng. Computer Science (《计算机科学》), 2017, 44(5): 172-177
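Distributed stream querying is a real-time query computation method based on data streams, and it has received wide attention and developed rapidly in recent years. This paper surveys the research results of distributed stream processing frameworks for real-time relational queries, and analyzes and compares products for distributed data loading, distributed stream computation frameworks, and distributed stream querying. It proposes a distributed stream query model built on Spark Streaming and Apache Kafka, which loads multiple file sources concurrently and uses a purpose-designed in-memory file system for fast data loading; this loading is more than twice as fast as loading based on Apache Flume. On top of Spark Streaming, a distributed stream query interface based on Spark SQL is implemented, and a method of parsing SQL statements with custom-written code is proposed to realize distributed queries. Test results show that for complex query statements, the custom SQL parsing approach offers a clear efficiency advantage.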

5.
6.
A Collaborative Virtual Environment (CVE) is a distributed virtual reality designed to support collaborative activities. As such, CVEs provide a potentially infinite, graphically realised digital landscape within which multiple users can interact with each other and with simple or complex data representations. CVEs are increasingly used to support collaborative work between geographically separated as well as collocated collaborators. CVEs vary in the sophistication of the data and embodiment representations they employ and in the level of interactivity they support. Systems intended to support collaborative activities should be designed with explicit consideration of the tasks to be achieved and of the intended users' social and cognitive characteristics. In this paper, we first discuss the nature of collaborative and cooperative work activities and consider the place of virtual reality systems in supporting such work; we then detail a number of existing systems and applications. Finally, we discuss some future research directions.

7.
Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for products or services during a live interaction. These systems, especially user-based collaborative filtering, are achieving widespread success on the Web. The tremendous recent growth in the amount of available information and in the variety of commodities on Web sites poses some key challenges for recommender systems. One of these challenges is the ability to adapt to environments where users have many completely different interests or items have completely different content (we call this the Multiple-interests and Multiple-content problem). Unfortunately, traditional collaborative filtering systems cannot make accurate recommendations in these two cases, because the item predicted for the active user is not consistent with the common interests of his or her neighbor users. To address this issue, we explore a hybrid method, collaborative filtering based on item and user, which combines item-based and user-based collaborative filtering. It analyzes the user-item matrix to compute the similarity of the target item to other items and generate the set of items similar to the target item; it then determines the active user's neighbors for the target item according to the similarity of other users to the active user, computed over those similar items. In this paper, we first analyze the limitations of user-based and item-based collaborative filtering, respectively, and explain in particular why user-based collaborative filtering does not adapt to Multiple-interests and Multiple-content recommendation. Based on this analysis, we present collaborative filtering based on item and user for Multiple-interests and Multiple-content recommendation.
Finally, we experimentally evaluate the results and compare them with user-based and item-based collaborative filtering, respectively. The experiments suggest that collaborative filtering based on item and user provides dramatically better recommendation quality than either user-based or item-based collaborative filtering.
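The item-and-user hybrid can be sketched in three steps: find the items most similar to the target item, pick the active user's neighbours using similarity computed only over those items, and predict a similarity-weighted rating. This Python sketch assumes cosine similarity and the hypothetical cut-offs `n_items` and `n_users`; the paper's exact similarity measures and parameters may differ.

```python
# Hypothetical sketch of "collaborative filtering based on item and user".
# Assumptions: cosine similarity, fixed neighbourhood sizes.
from math import sqrt


def cosine(u, v, keys):
    """Cosine similarity of two rating dicts, restricted to `keys`."""
    common = [k for k in keys if k in u and k in v]
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    nu = sqrt(sum(u[k] ** 2 for k in common))
    nv = sqrt(sum(v[k] ** 2 for k in common))
    return dot / (nu * nv)


def predict(ratings, active, target, n_items=2, n_users=2):
    """ratings: user -> {item: rating}. Predict `active`'s rating of `target`."""
    # Step 1: item-item similarity over the columns of the user-item matrix.
    by_item = {}
    for user, row in ratings.items():
        for item, r in row.items():
            by_item.setdefault(item, {})[user] = r
    users = list(ratings)
    sims = {i: cosine(by_item[target], by_item[i], users)
            for i in by_item if i != target}
    similar_items = sorted(sims, key=sims.get, reverse=True)[:n_items]

    # Step 2: neighbours of the active user, judged only on those similar items.
    cands = [u for u in ratings if u != active and target in ratings[u]]
    usims = {u: cosine(ratings[active], ratings[u], similar_items) for u in cands}
    neighbours = sorted(usims, key=usims.get, reverse=True)[:n_users]

    # Step 3: similarity-weighted average of the neighbours' target ratings.
    num = sum(usims[u] * ratings[u][target] for u in neighbours)
    den = sum(abs(usims[u]) for u in neighbours)
    return num / den if den else None
```

Restricting user similarity to items that resemble the target is the step meant to handle users with many unrelated interests: two users need only agree on the part of their tastes that is relevant to the item being predicted.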

8.
Design and Implementation of a PHP-Based Courseware Query System   Citations: 4 total, 1 self-citation, 3 by others
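Because different students have different courseware needs, this paper presents the design and implementation of an interactive courseware query system. It describes in detail the advantages of building the system with PHP, MySQL, and the Apache Web Server on the Linux platform, as well as the user authentication and the paged display of query results implemented with this combination.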
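Paged display of query results usually comes down to computing a row offset for a LIMIT/OFFSET query. The original system is written in PHP; the sketch below uses Python for consistency with the other examples, and the function names and the default of 10 rows per page are assumptions.

```python
# Hypothetical pagination helper for a MySQL-backed result list.
from math import ceil


def paginate(total_rows, page, per_page=10):
    """Return (offset, limit, total_pages); out-of-range pages are clamped."""
    total_pages = max(1, ceil(total_rows / per_page))
    page = min(max(page, 1), total_pages)
    offset = (page - 1) * per_page
    return offset, per_page, total_pages


def page_sql(base_query, page, total_rows, per_page=10):
    offset, limit, _ = paginate(total_rows, page, per_page)
    # MySQL accepts "LIMIT row_count OFFSET offset"; both values are integers
    # computed above, never raw user input.
    return f"{base_query} LIMIT {limit} OFFSET {offset}"
```

For instance, with 95 matching rows, page 3 maps to `OFFSET 20` and there are 10 pages in total; a request for page 99 is clamped to the last page.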

9.
Collaborative Framework and Interoperability for Heterogeneous CAD Systems   Citations: 1 total, 0 self-citations, 1 by others
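Collaborative CAD uses networks to help designers in different locations complete product design or decision-making tasks in real time, improving the efficiency of group collaboration. Heterogeneous CAD collaboration integrates distributed, different CAD systems, such as the mature, mainstream commercial systems ProE, UG, and SolidWorks, into a single framework platform, making it easier for people to work together. This paper discusses the architecture of heterogeneous CAD collaboration and the problem of shared awareness, and proposes a replicated system architecture model that supports shared awareness. It further brings existing commercial CAD systems into this model, presents a component-based middleware framework model for collaborative CAD, and discusses the middleware interfaces and interoperability among CAD systems.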

10.
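Hybrid transactional/analytical processing (HTAP) database systems have gradually gained wide acceptance because they can process mixed workloads within a single system. To avoid degrading the write performance of online transaction processing (OLTP) workloads, HTAP database systems often support online analytical processing (OLAP) tasks by maintaining multiple data versions or additional replicas, which introduces a data consistency problem between the versions on the TP and AP sides. At the same time, HTAP database systems face the core challenge of sharing data efficiently under resource isolation, and the design of the data sharing model must balance workload requirements for performance against those for data freshness. To systematically explain the data sharing models and optimization strategies of existing HTAP database systems, this paper first defines data sharing models through consistency models, according to the differences between the versions generated by TP and the versions queried by AP, and divides the consistency models for HTAP data sharing into three categories: linearizability, sequential consistency, and session consistency. It then walks through the full data sharing workflow, starting from three core problems (data version identifier allocation, data version synchronization, and data version tracking), and presents methods for implementing each consistency model. Next, it explains concrete implementations in depth, using typical HTAP database systems as examples. Finally, it summarizes and analyzes optimization strategies for the version synchronization, tracking, and reclamation modules involved in data sharing, and looks ahead to optimization directions for data sharing models, pointing out that an adaptive data synchronization scope, self-tuning of the data synchronization period, and freshness-threshold constraint control for sequential consistency are possible means of improving the performance and freshness of HTAP database systems.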
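The three consistency levels can be related to version tracking by asking which TP version an AP query must see before it runs. The following Python sketch is a toy model, not any particular HTAP system: TP commits receive increasing version ids, an AP replica replays the TP log, and the required version depends on the level (linearizability: the latest commit; session: the session's own last write; sequential: any applied prefix, bounded here by a hypothetical `staleness` freshness threshold). Waiting is simulated by synchronizing the replica on the spot.

```python
# Hypothetical toy model of HTAP data sharing: version id allocation,
# version synchronization, and version tracking per consistency level.
from dataclasses import dataclass, field


@dataclass
class TPSide:
    latest: int = 0                                  # last committed version id
    log: list = field(default_factory=list)          # (version, key, value)
    session_last: dict = field(default_factory=dict)  # session -> last version

    def commit(self, session, key, value):
        self.latest += 1  # version id allocation: monotonically increasing
        self.log.append((self.latest, key, value))
        self.session_last[session] = self.latest
        return self.latest


@dataclass
class APReplica:
    applied: int = 0
    data: dict = field(default_factory=dict)

    def sync(self, tp, up_to):
        # Version synchronization: replay TP log entries in (applied, up_to].
        for v, k, val in tp.log:
            if self.applied < v <= up_to:
                self.data[k] = val
        self.applied = max(self.applied, up_to)


def min_visible_version(level, tp, session=None, staleness=0):
    """Version tracking: the oldest replica state an AP query may use."""
    if level == "linearizable":
        return tp.latest
    if level == "session":
        return tp.session_last.get(session, 0)
    if level == "sequential":
        return max(0, tp.latest - staleness)  # bounded by a freshness threshold
    raise ValueError(f"unknown level: {level}")


def ap_read(tp, ap, key, level, session=None, staleness=0):
    need = min_visible_version(level, tp, session, staleness)
    if ap.applied < need:
        ap.sync(tp, need)  # a real system would wait for synchronization
    return ap.data.get(key)
```

A sequential read with `staleness=2` may legitimately return an older value of `x` while a linearizable read of the same key forces the replica to catch up, which is the performance/freshness trade-off the survey describes.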

11.
Similarity query processing is becoming increasingly important in many applications, such as data cleaning, record linkage, Web search, and document analytics. In this paper, we study how to provide end-to-end similarity query support natively in a parallel database system. We discuss how to express a similarity predicate in the system's query language, how to build indexes, how to answer similarity queries (selections and joins) efficiently in the runtime engine, possibly using indexes, and how to optimize similarity queries. One particular challenge is how to incorporate existing similarity join algorithms, which often require a series of steps to achieve high efficiency, including collecting token frequencies, finding matching record id pairs, and reassembling result records based on those id pairs. We present a novel approach that uses existing runtime operators to implement such complex join algorithms without reinventing the wheel; doing so positions the system to benefit automatically from future improvements to those operators. The approach includes a technique that transforms a similarity join plan into an efficient operator-based physical plan during query optimization, using a template expressed largely in the system's user-level query language; this technique greatly simplifies the specification of such a transformation rule. We use Apache AsterixDB, a parallel Big Data management system, to illustrate and validate our techniques. We conduct an experimental study on a parallel computing cluster using several large, real datasets to assess the similarity query support. We also include experiments involving three other parallel systems and report the efficacy and performance results.
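The three steps named above (collect token frequencies, find matching record-id pairs, reassemble result records) correspond to the well-known prefix-filtering approach for set-similarity joins. Below is a plain, single-process Python sketch of that technique for a Jaccard self-join; it is not AsterixDB's operator-based plan or its query language, and the function name and whitespace tokenization are assumptions.

```python
# Hypothetical single-process sketch of a three-stage Jaccard self-join
# with prefix filtering.
from collections import Counter, defaultdict
from math import ceil


def jaccard(a, b):
    return len(a & b) / len(a | b)


def similarity_self_join(records, threshold):
    """records: id -> text. Returns [(text1, text2, sim)] with sim >= threshold."""
    tokens = {rid: set(text.lower().split()) for rid, text in records.items()}

    # Stage 1: global token frequencies; order each record's tokens rarest-first.
    freq = Counter(t for toks in tokens.values() for t in toks)

    def ordered(toks):
        return sorted(toks, key=lambda t: (freq[t], t))

    # Stage 2: prefix filtering yields candidate id pairs, then exact verification.
    index = defaultdict(list)  # token -> ids whose prefix contains that token
    pairs = []
    for rid in sorted(records):
        toks = ordered(tokens[rid])
        # Prefix length |x| - ceil(t*|x|) + 1; the epsilon guards against
        # float error (e.g. 0.7 * 10) that would make the prefix too short.
        prefix_len = len(toks) - ceil(threshold * len(toks) - 1e-9) + 1
        cands = set()
        for t in toks[:prefix_len]:
            cands.update(index[t])
            index[t].append(rid)
        for other in cands:
            sim = jaccard(tokens[rid], tokens[other])
            if sim >= threshold:
                pairs.append((other, rid, sim))

    # Stage 3: reassemble full records from the matching id pairs.
    return [(records[a], records[b], s) for a, b, s in sorted(pairs)]
```

Ordering tokens rarest-first puts selective tokens into each record's prefix, which keeps the candidate set small; every candidate is still verified with its exact Jaccard score, so filtering never changes the answer.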

12.
Nowadays, spatial and temporal data play an important role in social networks. These data are distributed and dispersed across several heterogeneous data sources. These peculiarities make geographic information retrieval a non-trivial task, considering that spatial data are often unstructured and built by different collaborative communities in social networks. The problem arises when user queries are posed at different levels of semantic granularity, which is very typical in social communities, where users have different levels of expertise. In this paper, we present a novel approach based on three ontology-driven query-matching layers over the heterogeneous data sources. We propose a query contextualization technique for addressing the available heterogeneous data sources, including social networks: a query is contextualized so that if one data source does not contain a relevant result, other sources can provide an answer or, in the best case, each source adds a relevant answer to the result set. The approach is a collaborative learning system based on the experience level of users in different domains. Retrieval draws on three domains, temporal, geographical, and social, which are involved in the user-content context. The work aims to define a GIScience collaborative learning approach for geographic information retrieval using social networks, the Web, and geodatabases.

13.
The LaCOLLA middleware lets collaborators interact using their own resources without depending on centralized control. By contributing their own resources, group members can organize and communicate using a federated peer-to-peer model. This lets the group keep functioning even if a member withdraws resources, and despite network or node failures or disconnections. In turn, this capacity for self-organization, together with location transparency, lets application developers create self-sufficient applications for collaborative activity.

14.
15.
Online video has become a fundamental part of the fabric of the Web, widely used for information sharing, learning, and entertainment. We report results from a design study that explored how people interact to create shared multi-path video representations in a social video environment. Participants created multiple versions of a video by providing alternative, interchangeable scenes that formed different paths through the video content. This multi-path approach was designed to circumvent the limitations of traditional linear video as a shared representation in collaborative knowledge-building activities. The article describes how people created video resources in collaborative activities in two different settings. We discuss the different modes of working that were observed and outline the specific challenges of using video as a shared representation. Finally, we demonstrate how an analysis of the collaborative dimensions of the shared multi-path video representation can be applied to discuss the design space and to advance the discussion about the usefulness of such representations in knowledge-building environments.

16.
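In time-series databases, when the log-structured merge tree (LSM-tree) cannot merge files in time under heavy write loads or constrained resources, large amounts of data accumulate in the LSM C0 level, which increases the latency of ad-hoc queries on recently written data. To address this problem, a two-phase LSM merge framework is proposed that achieves low-latency queries on the most recently written time-series data while preserving efficient queries over large blocks of data. First, the file merge process is divided into two phases: fast merging of a small number of out-of-order files, and merging of a large number of small files. Second, multiple file merge strategies are provided within each phase. Finally, resources are allocated between the two merge phases according to the system's query workload. Both the traditional LSM merge strategy and the two-phase LSM merge framework were implemented and tested in the time-series database Apache IoTDB. Compared with the traditional LSM, the two-phase file merge module greatly reduces the number of disk reads for ad-hoc queries while making strategies more flexible, and improves historical data analysis query performance by about 20%. The experimental results show that the two-phase LSM merge framework improves the efficiency of ad-hoc queries on recently written data, improves historical data analysis query performance, and increases the flexibility of merge strategies.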
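The two-phase idea can be illustrated with sorted runs of `(timestamp, value)` pairs: phase 1 folds the few out-of-order (unsequence) files into the sequence files they overlap, phase 2 compacts many small sequence files, and merge threads are split between the phases according to the query mix. This Python sketch is hypothetical throughout (function names, the size target, and the thread-split policy are assumptions) and does not use Apache IoTDB's APIs.

```python
# Hypothetical sketch of a two-phase merge over time-series files.
# A "file" is a list of (timestamp, value) pairs sorted by timestamp.
import heapq


def merge_runs(runs):
    """Merge sorted runs; for duplicate timestamps the later run wins."""
    tagged = [[(ts, -i, v) for ts, v in run] for i, run in enumerate(runs)]
    out = []
    for ts, _, v in heapq.merge(*tagged):
        if out and out[-1][0] == ts:
            continue  # the newest run's value for this timestamp came first
        out.append((ts, v))
    return out


def phase1_unseq_merge(seq_files, unseq_files):
    """Fold unsequence data into every sequence file overlapping its time range."""
    unseq = merge_runs(unseq_files)
    if not unseq:
        return seq_files
    lo, hi = unseq[0][0], unseq[-1][0]

    def overlaps(f):
        return f[-1][0] >= lo and f[0][0] <= hi

    hit = [f for f in seq_files if overlaps(f)]
    keep = [f for f in seq_files if not overlaps(f)]
    new_file = merge_runs(hit + [unseq])  # unsequence data is the newest
    return sorted(keep + [new_file], key=lambda f: f[0][0])


def phase2_compact(seq_files, target):
    """Concatenate adjacent small files until each output has >= target points."""
    out, buf = [], []
    for f in seq_files:
        buf.append(f)
        if sum(map(len, buf)) >= target:
            out.append(merge_runs(buf))
            buf = []
    if buf:
        out.append(merge_runs(buf))
    return out


def split_threads(total, adhoc_ratio):
    """Assumed policy: phase-1 threads proportional to ad-hoc query share
    (total >= 2, at least one thread per phase)."""
    p1 = min(total - 1, max(1, round(total * adhoc_ratio)))
    return p1, total - p1
```

Phase 1 touches only the files overlapping out-of-order data, so it can run quickly and keep recent ad-hoc queries cheap, while phase 2 produces the large files that favour historical scans.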

17.
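To address problems in current domestic genealogy systems such as limited data sharing, poor scalability, and low compilation efficiency, an online genealogy compilation system based on a browser/server (B/S) architecture and a graph database is proposed and implemented. First, the system adopts the B/S architecture and supports collaborative online data entry by multiple users, which improves entry efficiency. Second, the system stores data in a database, which facilitates centralized management, statistics, and retrieval and increases data sharing. Third, because genealogy data is inherently graph-structured, the system manages it with a graph database, which greatly improves data processing efficiency. Finally, the system's efficiency was compared using real genealogy data to verify its effectiveness. In the experiments, genealogy data of the Liu clan covering about 200,000 people was used to compare storage and query efficiency when the data is managed with the relational database PostgreSQL and with the graph database Neo4j. The results show that Neo4j uses about 50% less storage space than PostgreSQL, and for four common queries (descendant query, ancestor query, kinship query, and descendant gender statistics), the average response time with Neo4j is about 20%, 80%, 16%, and 15%, respectively, of that with PostgreSQL. Thus, an online genealogy compilation system based on a graph database can efficiently process large amounts of genealogy data and supports collaborative online compilation by multiple users.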
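Descendant, ancestor, and kinship queries are graph traversals, which is what makes a graph database a natural fit for this data. The following in-memory Python sketch stands in for the graph database (it is not Neo4j's API); `FamilyTree` and its methods are hypothetical, and kinship is reduced to the nearest common ancestor and the generation distances to it.

```python
# Hypothetical in-memory stand-in for graph traversals over a family tree.
from collections import deque


class FamilyTree:
    def __init__(self):
        self.parents = {}   # person -> list of parents
        self.children = {}  # person -> list of children
        self.gender = {}

    def add_person(self, name, gender, parents=()):
        self.gender[name] = gender
        self.parents[name] = list(parents)
        self.children.setdefault(name, [])
        for p in parents:
            self.children.setdefault(p, []).append(name)

    def _walk(self, start, edges):
        """Breadth-first search: {person: generations away from start}."""
        dist, queue = {start: 0}, deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in edges.get(cur, []):
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        del dist[start]
        return dist

    def descendants(self, name):
        return self._walk(name, self.children)

    def ancestors(self, name):
        return self._walk(name, self.parents)

    def kinship(self, a, b):
        """(nearest common ancestor, generations from a, generations from b)."""
        up_a = {a: 0, **self.ancestors(a)}
        up_b = {b: 0, **self.ancestors(b)}
        common = set(up_a) & set(up_b)
        if not common:
            return None
        anc = min(common, key=lambda x: (up_a[x] + up_b[x], x))
        return anc, up_a[anc], up_b[anc]

    def descendant_gender_counts(self, name):
        counts = {}
        for d in self.descendants(name):
            counts[self.gender[d]] = counts.get(self.gender[d], 0) + 1
        return counts
```

In a relational schema each generation step of these walks typically becomes another self-join, whereas a graph store follows stored edges directly, which is consistent with the response-time gap the abstract reports.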

18.
Bao Rong. Computer Engineering (《计算机工程》), 2009, 35(2): 39-41
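To address the volatility of the multidimensional schema evolution history in traditional data warehouse systems, this paper proposes using version metadata to record each multidimensional schema state during the evolution of a data warehouse. It gives the structure of the version metadata and designs a transparent cross-version query system with corresponding query algorithms. The query decomposition algorithm splits a user's query, posed against one schema structure, into subqueries evaluated on each data warehouse version, and the integration algorithm performs the necessary aggregation and transformation of the subquery results.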
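Query decomposition and integration across warehouse versions can be sketched with a small amount of version metadata: each version records its validity period and how current attribute names map to its own. The Python sketch below is hypothetical (the metadata layout, `VERSIONS`, and the function names are assumptions) and handles only attribute renaming plus summation; the transformations in the paper may be richer.

```python
# Hypothetical sketch of transparent cross-version querying.
from collections import defaultdict

# Assumed version metadata: validity period (years) and a mapping from the
# current schema's attribute names to the names used in that version.
VERSIONS = [
    {"id": "v1", "period": (2000, 2004), "rename": {"region": "area"}},
    {"id": "v2", "period": (2005, 2009), "rename": {}},
]


def decompose(query, versions):
    """Split {group_by, measure, years} into per-version subqueries."""
    lo, hi = query["years"]
    subs = []
    for v in versions:
        a, b = v["period"]
        if a <= hi and lo <= b:  # this version overlaps the requested period
            subs.append({
                "version": v["id"],
                "group_by": v["rename"].get(query["group_by"], query["group_by"]),
                "measure": query["measure"],
                "years": (max(lo, a), min(hi, b)),
            })
    return subs


def run_sub(sub, facts):
    """Evaluate one subquery on that version's fact rows."""
    out = defaultdict(float)
    lo, hi = sub["years"]
    for row in facts[sub["version"]]:
        if lo <= row["year"] <= hi:
            out[row[sub["group_by"]]] += row[sub["measure"]]
    return out


def integrate(partials):
    """Sum subquery results keyed by the common group value."""
    total = defaultdict(float)
    for part in partials:
        for key, val in part.items():
            total[key] += val
    return dict(total)


def cross_version_query(query, versions, facts):
    return integrate(run_sub(s, facts) for s in decompose(query, versions))
```

The user always writes `region`; the version metadata silently turns it into `area` for the older version, which is the sense in which the query is transparent across versions.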

19.
Mobile Query Derivation Based on Semantic Caching   Citations: 19 total, 2 self-citations, 19 by others
Wu Tingting, Zhou Xingming. Chinese Journal of Computers (《计算机学报》), 2002, 25(10): 1104-1110
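In mobile environments, client caching provides an effective way to improve the overall performance of client-server database systems and, in particular, to ensure data availability on the client. This paper addresses how to derive the (partial) result of the current query from a cache described by semantic descriptions. It studies sufficient conditions for deriving a query from the cache and, having defined the cases of exact match, containment match, and mutual match between a query and the cache, gives conditions and corresponding algorithms for determining containment and intersection matches between the cache and a query. Based on this work, queries can make full use of the contents of the local semantic cache, thereby reducing network overhead, shortening response times, and supporting data access while the mobile client is disconnected.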
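For one-dimensional range predicates, matching a query against a cached semantic segment reduces to interval arithmetic: the overlap is answered locally (the probe query) and the uncovered parts are sent to the server (remainder queries). This Python sketch assumes half-open intervals `[lo, hi)` on a single numeric attribute and uses hypothetical function names; the paper's conditions cover more general predicates.

```python
# Hypothetical sketch of semantic-cache matching for 1-D range queries.
# Intervals are half-open (lo, hi), i.e. lo <= value < hi.


def match(q, c):
    """Return (kind, probe, remainders) for query q against cached segment c."""
    qlo, qhi = q
    clo, chi = c
    if q == c:
        return "exact", q, []
    if qhi <= clo or chi <= qlo:
        return "disjoint", None, [q]
    probe = (max(qlo, clo), min(qhi, chi))  # answerable from the cache
    rest = [r for r in ((qlo, clo), (chi, qhi)) if r[0] < r[1]]
    if not rest:
        kind = "contained"  # the cache covers the whole query
    elif clo >= qlo and chi <= qhi:
        kind = "contains"   # the query covers the whole cached segment
    else:
        kind = "intersect"
    return kind, probe, rest


def answer(q, cached, cache_rows, fetch, connected=True):
    """Combine cached rows with server rows; returns (rows, complete)."""
    _, probe, rest = match(q, cached)
    rows = [] if probe is None else [r for r in cache_rows if probe[0] <= r < probe[1]]
    if connected:
        for lo, hi in rest:
            rows.extend(fetch(lo, hi))  # remainder query sent to the server
    # While disconnected, only the cached part is returned.
    return sorted(rows), connected or not rest
```

When the client is disconnected, the probe part can still be answered, so the user gets a partial but correct result instead of nothing.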

20.
This work presents an evolutionary multi-agent system applied to the query optimization phase of relational database management systems (RDBMS) in a non-distributed environment. The query optimization phase must address the well-known query join ordering problem, which directly affects the performance of such systems. The proposed optimizer was implemented in the optimization core of the H2 Database Engine. The experiments followed a fixed-effects factorial design, and the analysis was based on the permutation test for an analysis-of-variance design. The evaluation methodology is based on synthetic benchmarks, and the tests are divided into three experiments: calibration of the algorithm, validation against an exhaustive method, and a general comparison with other database systems, namely Apache Derby, HSQLDB, and PostgreSQL. The results show that the proposed evolutionary multi-agent system generated lower-cost plans with faster execution times in most cases.
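Join ordering can be framed as a search over permutations scored by a cost model. The sketch below uses a plain genetic algorithm (order crossover, swap mutation, tournament selection, elitism), not the paper's evolutionary multi-agent system, together with a toy cost model (sum of intermediate result sizes for left-deep plans, with assumed selectivities). It also includes an exhaustive search as a reference, echoing the validation against an exhaustive method. All names and parameters are hypothetical.

```python
# Hypothetical sketch: genetic-algorithm join ordering with a toy cost model.
import itertools
import random


def plan_cost(order, card, sel):
    """Sum of intermediate result sizes of a left-deep plan. sel maps
    frozenset({a, b}) to a selectivity; missing pairs are cross products."""
    size, joined, cost = card[order[0]], [order[0]], 0.0
    for rel in order[1:]:
        s = 1.0
        for prev in joined:
            s *= sel.get(frozenset((prev, rel)), 1.0)
        size = size * card[rel] * s
        cost += size
        joined.append(rel)
    return cost


def order_crossover(p1, p2, rng):
    """OX: copy a slice of p1, fill the remaining slots in p2's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[i:j + 1] = p1[i:j + 1]
    fill = [g for g in p2 if g not in child]
    for k in range(len(child)):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child


def ga_join_order(card, sel, pop_size=20, generations=40, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    rels = list(card)
    cost = lambda o: plan_cost(o, card, sel)
    pop = [rng.sample(rels, len(rels)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        nxt = pop[:2]  # elitism: keep the two cheapest plans
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 3), key=cost)  # tournament selection
            b = min(rng.sample(pop, 3), key=cost)
            child = order_crossover(a, b, rng)
            if rng.random() < p_mut:  # swap mutation
                x, y = rng.sample(range(len(child)), 2)
                child[x], child[y] = child[y], child[x]
            nxt.append(child)
        pop = nxt
    return min(pop, key=cost)


def exhaustive_join_order(card, sel):
    """Reference optimum by enumerating all permutations (small inputs only)."""
    return min(itertools.permutations(card), key=lambda o: plan_cost(list(o), card, sel))
```

The exhaustive search is only feasible for a handful of relations, since it enumerates n! orders; the genetic search evaluates a fixed number of plans per generation, which is why evolutionary methods are attractive for larger join queries.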


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417