Similar documents
20 similar documents found (search time: 78 ms)
1.
Personalisation and recommender systems in digital libraries
Widespread use of the Internet has resulted in digital libraries that are increasingly used by diverse communities of users for diverse purposes and in which sharing and collaboration have become important social elements. As such libraries become commonplace, as their contents and services become more varied, and as their patrons become more experienced with computer technology, users will expect more sophisticated services from these libraries. A simple search function, normally an integral part of any digital library, increasingly leads to user frustration as user needs become more complex and as the volume of managed information increases. Proactive digital libraries, where the library evolves from being passive and untailored, are seen as offering great potential for addressing and overcoming these issues and include techniques such as personalisation and recommender systems. In this paper, following on from the DELOS/NSF Working Group on Personalisation and Recommender Systems for Digital Libraries, which met and reported during 2003, we present some background material on the scope of personalisation and recommender systems in digital libraries. We then outline the working group’s vision for the evolution of digital libraries and the role that personalisation and recommender systems will play, and we present a series of research challenges and specific recommendations and research priorities for the field.

2.
Many bottlenecks in drug discovery have been addressed with the advent of new assay and instrument technologies. However, storing and processing chemical compounds for screening remains a challenge for many drug discovery laboratories. Although automated storage and retrieval systems are commercially available for medium to large collections of chemical samples, these samples are usually stored at a central site and are not readily accessible to satellite research labs. Drug discovery relies on the rapid testing of new chemical compounds in relevant biological assays. Therefore, newly synthesized compounds must be readily available in various formats to biologists performing screening assays. Until recently, our compounds were distributed in screw-cap vials to assayists, who would then manually transfer and dilute each sample in an “assay-ready” compound plate for screening. The vials would then be managed by the individuals in an ad hoc manner. To relieve the assayist from searching for compounds and preparing their own assay-ready compound plates, a newly customized compound storage system with an ordering software application was implemented at our research facility to eliminate these bottlenecks. The system stores and retrieves compounds in 1-mL mini-tubes or microtiter plates, facilitates compound searching by identifier or structure, orders compounds at varying concentrations in specified wells on 96- or 384-well plates, requests the addition of controls (vehicle or reference compounds), etc. The orders are automatically processed and delivered to the assayist the following day for screening. An overview of our system will demonstrate that we minimize compound waste and ensure compound integrity and availability.

3.
Support vector machines (SVM) and other machine-learning (ML) methods have been explored as ligand-based virtual screening (VS) tools for facilitating lead discovery. While exhibiting good hit selection performance, in screening large compound libraries these methods tend to produce lower hit-rates than the best-performing VS tools, partly because their training sets contain a limited spectrum of inactive compounds. We tested whether the performance of SVM can be improved by using training sets of diverse inactive compounds. In retrospective database screening of active compounds of single mechanism (HIV protease inhibitors, DHFR inhibitors, dopamine antagonists) and multiple mechanisms (CNS active agents) from large libraries of 2.986 million compounds, the yields, hit-rates, and enrichment factors of our SVM models are 52.4–78.0%, 4.7–73.8%, and 214–10,543, respectively, compared to 62–95%, 0.65–35%, and 20–1200 by structure-based VS and 55–81%, 0.2–0.7%, and 110–795 by other ligand-based VS tools in screening libraries of ≥1 million compounds. The hit-rates are comparable and the enrichment factors are substantially better than the best results of other VS tools. 24.3–87.6% of the predicted hits are outside the known hit families. SVM appears to be potentially useful for facilitating lead discovery in VS of large compound libraries.
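The three figures of merit quoted above (yield, hit-rate, enrichment factor) follow standard virtual-screening definitions; a minimal sketch of how they relate (the function name and example numbers are illustrative, not taken from the paper):

```python
def screening_metrics(n_actives_db, n_db, n_hits, n_active_hits):
    """Standard ligand-based VS figures of merit.

    yield:      fraction of the database's actives recovered by the screen
    hit_rate:   fraction of selected compounds that are truly active
    enrichment: hit-rate relative to random selection from the database
    """
    yield_ = n_active_hits / n_actives_db
    hit_rate = n_active_hits / n_hits
    enrichment = hit_rate / (n_actives_db / n_db)
    return yield_, hit_rate, enrichment

# Illustrative numbers: 100 known actives in a 2,986,000-compound library;
# the model selects 500 compounds, of which 70 are active.
y, hr, ef = screening_metrics(100, 2_986_000, 500, 70)
```

Note that with a huge library and few actives, even a modest hit-rate yields a very large enrichment factor, which is why the enrichment ranges reported above span several orders of magnitude.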

4.
Scalable storage architectures allow digital libraries and archives to add or remove storage devices in order to increase storage capacity and bandwidth or to retire older devices. Past work in this area has mainly focused on statically scaling homogeneous storage devices. However, heterogeneous devices are quickly being adopted for storage scaling since they are usually faster, larger, more widely available, and more cost-effective. We propose BroadScale, an algorithm based on Random Disk Labeling, to dynamically scale heterogeneous storage systems by distributing data objects according to their device weights. Assuming a random placement of objects across a group of heterogeneous storage devices, our optimization objectives when scaling are to ensure a uniform distribution of objects, redistribute a minimum number of objects, and maintain fast data access with low computational complexity. We show through experimentation that BroadScale achieves these requirements when scaling heterogeneous storage.
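As a rough illustration of weight-proportional object distribution (not BroadScale's actual Random Disk Labeling construction, whose details are not in the abstract), weighted rendezvous hashing meets the same three goals: capacity-weighted uniformity, minimal object movement on scaling, and cheap lookups:

```python
import hashlib, math

def place(obj_id, devices):
    """Pick a device for obj_id by weighted rendezvous (highest-random-weight)
    hashing: each object scores every device, with the score scaled by the
    device's weight, so load follows capacity; adding or removing a device
    only moves the objects whose best-scoring device changed."""
    best_dev, best_score = None, float("-inf")
    for dev, weight in devices.items():
        h = int.from_bytes(
            hashlib.sha256(f"{obj_id}:{dev}".encode()).digest()[:8], "big")
        u = (h + 0.5) / 2.0**64              # uniform in (0, 1)
        score = -weight / math.log(u)        # weighted rendezvous score
        if score > best_score:
            best_dev, best_score = dev, score
    return best_dev
```

With device weights {"a": 1, "b": 3}, roughly three quarters of objects land on "b", and placement is deterministic per object.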

6.
In the context of virtual screening calculations, a multiple fingerprint-based metric is applied to generate focused compound libraries by database searching. Different fingerprints are used to facilitate a similarity step for database mining, followed by a diversity step to assemble the final library. The method is applied, for example, to build libraries of limited size for hit-to-lead development efforts. In studies designed to inhibit a therapeutically relevant protein–protein interaction, small molecular hits were initially obtained by combined fingerprint- and structure-based virtual screening and used for the design of focused libraries. We review the applied virtual screening approach and report the statistics and results of screening as well as focused library design. While the structures of lead compounds cannot be disclosed, the analysis is thought to provide an example of the interplay of different methods applied in practical lead identification.
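The two-step similarity-then-diversity assembly can be sketched with Tanimoto similarity over fingerprint on-bit sets and MaxMin diversity selection; the concrete fingerprints, cut-offs, and function names here are illustrative, not the paper's:

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as on-bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def focused_library(query, database, k_sim, k_div):
    """Similarity step, then diversity step: keep the k_sim fingerprints
    most similar to the query, then MaxMin-pick a diverse k_div-member
    library from that pool."""
    pool = sorted(database, key=lambda fp: tanimoto(query, fp), reverse=True)[:k_sim]
    picked = [pool[0]]                        # seed with the top-ranked hit
    while len(picked) < min(k_div, len(pool)):
        # add the candidate whose distance (1 - similarity) to the picked
        # set is largest, i.e. the least redundant remaining compound
        picked.append(max((fp for fp in pool if fp not in picked),
                          key=lambda fp: min(1 - tanimoto(fp, p) for p in picked)))
    return picked
```

The similarity step keeps the library focused on the query chemotype; the diversity step then spreads the final picks across that focused pool.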

7.
A fully integrated, web-based virtual screening platform has been developed to allow rapid virtual screening of large numbers of compounds. ORACLE is used to store information at all stages of the process. The system includes ATLAS, a large database of historical compounds from high-throughput screening (HTS) chemical suppliers, containing over 3.1 million unique compounds with their associated physicochemical properties (ClogP, MW, etc.). The database can be screened using a web-based interface to produce compound subsets for virtual screening or virtual library (VL) enumeration. In order to carry out the latter task within ORACLE, a reaction data cartridge has been developed. Virtual libraries can be enumerated rapidly using the web-based interface to the cartridge. The compound subsets can be seamlessly submitted for virtual screening experiments, and the results can be viewed via another web-based interface allowing ad hoc querying of the virtual screening data stored in ORACLE.
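A minimal stand-in for the property-based subset selection described (using SQLite rather than the paper's ORACLE back end; the table and column names below are invented purely for illustration):

```python
import sqlite3

# SQLite stand-in for an ATLAS-like compound table keyed by identifier,
# with a couple of stored physicochemical properties.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (id TEXT PRIMARY KEY, clogp REAL, mw REAL)")
conn.executemany(
    "INSERT INTO compounds VALUES (?, ?, ?)",
    [("CPD-1", 2.1, 310.4), ("CPD-2", 6.3, 580.0), ("CPD-3", 1.2, 250.1)],
)

# Property cut-offs carve out a subset ready to submit for virtual screening.
subset = conn.execute(
    "SELECT id FROM compounds WHERE clogp <= 5 AND mw <= 500 ORDER BY id"
).fetchall()
```

The same filtered subset could then be handed to an enumeration or docking stage, mirroring the platform's seamless handoff from database query to screening experiment.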

8.
With the development of information technology, large-scale data-intensive applications are becoming increasingly common. The massive volumes of information they produce typically must be stored on tertiary devices such as tape libraries and optical disc jukeboxes. Drawing on successful systems developed abroad, and considering the usage characteristics of massive information data, this paper proposes the design of a lightweight massive-information storage system and implements its prototype.

9.
Automated liquid-handling robots and high-throughput screening (HTS) are widely used in the pharmaceutical industry for the screening of large compound libraries, small molecules for activity against disease-relevant target pathways, or proteins. HTS robots capable of low-volume dispensing reduce assay setup times and provide highly accurate and reproducible dispensing, minimizing variation between sample replicates and eliminating the potential for manual error. Low-volume automated nanoliter dispensers ensure accuracy of pipetting within volume ranges that are difficult to achieve manually. In addition, they can expand the range of screening conditions achievable from often-limited amounts of valuable sample, as well as reduce the usage of expensive reagents. The ability to accurately dispense lower volumes provides the potential to obtain more information than could otherwise be achieved using manual dispensing technology. With the emergence of the field of epigenetics, an increasing number of drug discovery companies are beginning to screen compound libraries against a range of epigenetic targets. This review discusses the potential for the use of low-volume liquid-handling robots for molecular biological applications such as quantitative PCR and epigenetics.

10.
Advances in virtual screening have created new channels for expediting the process of discovering novel drugs. Of particular relevance and interest are in silico techniques that enable the enumeration of combinatorial chemical libraries, generation of 3D coordinates and assessment of their propensity for drug-likeness. In a bid to provide an integrated pipeline that encompasses the common components functional for designing, managing and analyzing combinatorial chemical libraries, we describe a platform-independent, standalone Java application entitled CLEVER (Chemical Library Editing, Visualizing and Enumerating Resource). CLEVER supports chemical library creation and manipulation, combinatorial chemical library enumeration using user-specified chemical components, chemical format conversion and visualization, as well as chemical compounds analysis and filtration with respect to drug-likeness, lead-likeness and fragment-likeness based on the physicochemical properties computed from the derived molecules. Also provided is an integrated property-based graphing component that visually depicts the diversity, coverage and distribution of selected compound collections. When deployed in conjunction with large-scale virtual screening campaigns, CLEVER can offer insights into what chemical compounds to synthesize, and more importantly, what not to synthesize. The software is available at http://datam.i2r.a-star.edu.sg/clever/.
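One plausible instance of the drug-likeness filtration CLEVER performs is a Lipinski rule-of-five check on computed physicochemical properties; the exact cut-offs and rules CLEVER applies are not given in the abstract, so this is a generic sketch:

```python
def druglike(mw, clogp, h_donors, h_acceptors):
    """Classic Lipinski rule-of-five check: a compound is considered
    drug-like if it violates at most one of the four property cut-offs
    (MW <= 500, ClogP <= 5, H-bond donors <= 5, H-bond acceptors <= 10)."""
    violations = sum([mw > 500, clogp > 5, h_donors > 5, h_acceptors > 10])
    return violations <= 1
```

Analogous cut-off sets exist for lead-likeness and fragment-likeness (e.g. the rule of three), which is presumably how a tool of this kind offers its separate filtration modes.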

11.
As machines and programs have become more complex, the process of programming applications that can exploit the power of high-performance systems has become more difficult and correspondingly more labor-intensive. This has substantially widened the software gap—the discrepancy between the need for new software and the aggregate capacity of the workforce to produce it. This problem has been compounded by the slow growth of programming productivity, especially for high-performance programs, over the past two decades. One way to bridge this gap is to make it possible for end users to develop programs in high-level domain-specific programming systems. In the past, a major impediment to the acceptance of such systems has been the poor performance of the resulting applications. To address this problem, we are developing a new compiler-based infrastructure, called TeleGen, that will make it practical to construct efficient domain-specific high-level languages from annotated component libraries. We call these languages telescoping languages, because they can be nested within one another. For programs written in telescoping languages, high performance and reasonable compilation times can be achieved by exhaustively analyzing the component libraries in advance to produce a language processor that recognizes and optimizes library operations as primitives in the language. The key to making this strategy practical is to keep compile times low by generating a custom compiler with extensive built-in knowledge of the underlying libraries. The goal is to achieve compile times that are linearly proportional to the size of the program presented by the user, rather than to the aggregate size of that program plus the base libraries.

12.
The ever-growing needs of large multimedia systems cannot be met by magnetic disks due to their high cost and low storage density. Consequently, cheaper and denser tertiary storage systems are being integrated into the storage hierarchies of these applications. Although tertiary storage is cheaper, the access latency is very high due to the need to load and unload media on the drives. This high latency and the bursty nature of I/O traffic result in the accumulation of I/O requests for tertiary storage. We study the problem of scheduling these requests to improve performance. In particular, we address the issues of scheduling across multiple tapes or disks, as opposed to most other studies, which consider only one or two media. We focus on algorithms that minimize the number of switches and show through simulation that these result in near-optimal schedules. For single-drive libraries, an efficient algorithm that produces optimal schedules is developed. For multiple drives the problem is shown to be NP-complete. Efficient and effective heuristics are presented for both single and multiple drives. The scheduling policies developed achieve significant performance gains over naive policies. The algorithms are simple to implement and are not restrictive. The study encompasses all types of storage libraries handling removable media, such as tapes and optical disks.
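For a single drive, the switch-minimizing idea can be sketched by batching all pending requests per medium so each tape or disc is loaded at most once (a simplification of the paper's algorithms, shown only to illustrate why batching is switch-optimal):

```python
from collections import defaultdict

def schedule_single_drive(requests):
    """Batch pending (medium, block) requests per medium: each tape/disc
    is then loaded at most once, so the number of switches equals the
    number of distinct media requested, which is minimal for one drive."""
    by_medium = defaultdict(list)
    for medium, block in requests:
        by_medium[medium].append(block)
    schedule, switches = [], 0
    for medium, blocks in by_medium.items():
        switches += 1                                        # one load per medium
        schedule.extend((medium, b) for b in sorted(blocks)) # seek-ordered reads
    return schedule, switches
```

The multi-drive case is harder because batches must also be partitioned and ordered across drives, which is where the NP-completeness result and the heuristics come in.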

13.
A survey of data storage and query techniques in blockchain systems
Blockchain systems, exemplified by Bitcoin and Ethereum, have matured steadily, and blockchain technology has become a research focus in both academia and industry. In practical applications, however, these systems commonly suffer from serious problems such as simple query functionality and low query performance, owing to the constraints of their data storage models. This paper surveys, and offers an outlook on, research progress in data storage and query techniques for blockchain systems. It first introduces the data storage mechanisms and query processing strategies used in popular blockchain systems. It then describes in detail two approaches to extending query processing on top of existing blockchain systems, comparing and analyzing them along five dimensions: query efficiency, write-performance optimization, storage-space consumption, data security, and availability. Finally, it analyzes future trends in query technology for blockchain systems and discusses the main research directions.
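One of the extension approaches such surveys describe is layering an external attribute index over the chain's append-only storage, so that rich queries avoid a full chain scan; a toy sketch (block and index layout are illustrative, not any particular system's):

```python
import hashlib, json

chain = []   # append-only block store, as in existing blockchain systems
index = {}   # external index: recipient address -> [(block_height, tx_pos)]

def append_block(txs):
    """Append a block of transactions and maintain the external index."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"height": len(chain), "txs": txs, "prev": prev,
             "hash": hashlib.sha256((json.dumps(txs) + prev).encode()).hexdigest()}
    chain.append(block)
    for pos, tx in enumerate(txs):           # index entries point into the chain
        index.setdefault(tx["to"], []).append((block["height"], pos))

def query_by_recipient(addr):
    """Answer 'all transactions sent to addr' from the index alone,
    avoiding the linear chain scan a bare blockchain would need."""
    return [chain[h]["txs"][p] for h, p in index.get(addr, [])]
```

The trade-offs the survey compares are visible even here: the index speeds up queries but costs extra storage and write work, and its entries live outside the tamper-evident chain, raising the data-security and availability questions the paper analyzes.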

14.
With the development of information storage technology, existing storage devices increasingly fail to meet the needs of building large storage systems. This paper studies how storage devices behave within storage systems, designs an architecture for OSD, an object-based intelligent disk device, and tests its functionality and performance in a network storage system built with OSD.

15.
Digital libraries require not only high storage capacity but also high-performance storage systems that provide fast access to the data. These requirements cannot be efficiently supported with the traditional SCSI interfaces. Several serial storage interfaces have been proposed for constructing storage systems with high transfer bandwidth, large storage capacity, and fault tolerance. Among them, Serial Storage Architecture (SSA) and Fibre Channel-Arbitrated Loop (FC-AL) are considered the next-generation storage interfaces with broad industry support. Both technologies support simple cabling, long transmission distance, high data bandwidth, large capacity, fault tolerance, and fair sharing of link bandwidth. In this paper, a tutorial on and a comparison of these two technologies are presented. The tutorial examines their interface specifications, transport protocols, fairness algorithms, and capabilities of fault tolerance. The comparison focuses on their protocol overhead, flow control, fairness algorithms, and fault tolerance. The paper also summarizes the recently proposed Aaron Proposal, which incorporates features from both SSA and FC-AL and aims to merge these two technologies.

16.
Research on a massive information management system based on a multilevel storage architecture
It is very important for large-scale application systems such as global information systems, GIS, digital libraries, and data warehousing to effectively manage and manipulate large volumes of information (more than terabytes of data). One of the challenging problems is the study of massive information management systems and related techniques. Since the large volumes of information are mainly stored on tertiary storage devices, the study of massive information management systems should focus on the following aspects: (1) a multilevel storage architecture in which tertiary storage devices are the main data media and secondary storage devices serve as a data cache; (2) a scalable system architecture for the massive information management system; (3) retrieval processing techniques. This paper studies and discusses these problems in depth.
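Aspect (1), secondary storage acting as a data cache in front of tertiary media, can be sketched as an LRU cache (a simplification for illustration; the paper's actual caching policy is not given in the abstract, and `tertiary_read` below is a stand-in for a slow tape or optical-disc load):

```python
from collections import OrderedDict

class SecondaryCache:
    """A fixed-size secondary (disk) tier acting as an LRU cache in front
    of tertiary storage, which remains the data of record."""
    def __init__(self, capacity, tertiary_read):
        self.capacity, self.tertiary_read = capacity, tertiary_read
        self.cache = OrderedDict()
        self.tertiary_loads = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)      # refresh recency on a hit
            return self.cache[key]
        self.tertiary_loads += 1             # miss: pay the tertiary latency
        value = self.tertiary_read(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used
        return value
```

Every hit served from the secondary tier avoids a media load, which is the whole point of the multilevel architecture given tertiary access latencies.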

17.
Current parallel systems, composed of mixed multi-/manycore processors and GPUs, are becoming more complex due to their heterogeneous nature. The programmability barrier inherent to parallel systems increases with almost every new architecture delivery. The development of libraries, languages, and tools that allow easy and efficient use in this new scenario is mandatory. Among the proposals found to broach this problem, skeletal programming appeared as a natural alternative to ease the programmability of parallel systems in general, and of GPU programming in particular. In this paper, we develop a programming skeleton for dynamic programming on multi-GPU systems. The skeleton, implemented in CUDA, allows the user to execute parallel codes on multiple GPUs just by providing sequential C++ specifications of their problems. The performance and ease of use of this skeleton have been tested on several optimization problems. The experimental results, obtained on a cluster of Nvidia Fermi GPUs, demonstrate the advantages of the approach.

18.
Object-based storage has been one of the research hotspots in the storage-systems field in recent years. Compared with traditional storage systems, object-based storage systems offer higher performance, scalability, and better security and reliability. Targeting the characteristics of object-based storage, this paper presents a layout method for objects and data replicas and validates its implementation on a prototype system. Experimental tests show that this layout method achieves good load balancing and scalability.
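The abstract does not spell out the layout algorithm; one common way to obtain balanced, scalable object-and-replica placement of the kind described is hash-based ranking of nodes per object (shown here purely as an illustration, not as the paper's method):

```python
import hashlib

def replica_nodes(obj_id, nodes, n_replicas=2):
    """Rank all nodes by a per-object hash and take the top n_replicas:
    objects and their copies spread evenly over the nodes, each replica
    lands on a distinct node, and most placements stay put when nodes
    join or leave (only rankings involving the changed node shift)."""
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha256(f"{obj_id}@{n}".encode()).hexdigest())
    return ranked[:n_replicas]
```

Because placement is computed rather than looked up, any client can locate an object's replicas without a central directory, which helps both load balancing and scalability.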

19.
Multidimensional array I/O in Panda 1.0
Large multidimensional arrays are a common data type in high-performance scientific applications. Without special techniques for handling input and output, I/O can easily become a large fraction of execution time for applications using these arrays, especially on parallel platforms. Our research seeks to provide scientific programmers with simpler and more abstract interfaces for accessing persistent multidimensional arrays, and to produce advanced I/O libraries supporting more efficient layout alternatives for these arrays on disk and in main memory. We have created the Panda (Persistence AND Arrays) I/O library as a result of developing interfaces and libraries for applications in computational fluid dynamics in the areas of checkpoint, restart, and time-step output data. In the applications we have studied, we find that a simple, abstract interface can be used to insulate programmers from physical storage implementation details, while providing improved I/O performance at the same time. (A preliminary version of this paper was presented at Supercomputing '94.)
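The kind of abstract interface described, where the caller supplies a logical array and the library owns the physical on-disk layout, can be sketched as follows (a toy row-major layout, far simpler than Panda's actual layout alternatives; the function names are illustrative):

```python
import io, struct

def checkpoint(array2d, stream):
    """Persist a logical 2-D array; the on-disk layout (a dims header
    followed by row-major little-endian doubles) is the library's
    concern, not the caller's."""
    rows, cols = len(array2d), len(array2d[0])
    stream.write(struct.pack("<II", rows, cols))
    for row in array2d:
        stream.write(struct.pack(f"<{cols}d", *row))

def restart(stream):
    """Read the array back without the caller knowing the layout."""
    rows, cols = struct.unpack("<II", stream.read(8))
    return [list(struct.unpack(f"<{cols}d", stream.read(8 * cols)))
            for _ in range(rows)]

buf = io.BytesIO()                 # a real file object would work the same way
checkpoint([[1.0, 2.0], [3.0, 4.0]], buf)
buf.seek(0)
```

Because the layout is hidden behind `checkpoint`/`restart`, the library is free to swap in chunked or otherwise reorganized layouts for better I/O performance without changing caller code, which is exactly the insulation the abstract claims.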

20.
Starner, T. Computer, 2002, 35(1): 133–135
A thin-client approach to mobile computing pushes as many services as possible onto a remote server. Technology trends indicate, however, that an easy route to improving thin-client functionality will be to add disk storage, RAM, and a more powerful CPU. Thus, thin clients will rapidly become multipurpose thick clients. With time, users may come to consider their mobile system as their primary general-purpose computing device, maintaining their most-used files on the mobile system and relying on their desktop systems primarily for larger displays, keyboards, and other nonmobile interfaces.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号