Similar Documents
20 similar documents found (search time: 31 ms)
1.
There has been an ongoing trend toward collaborative software development using open and shared source code published in large software repositories on the Internet. While traditional source code analysis techniques perform well in single-project contexts, new types of source code analysis techniques are emerging that focus on global source code analysis challenges. In this article, we discuss how the Semantic Web can become an enabling technology providing a standardized, formal, and semantically rich representation for modeling and analyzing large global source code corpora. Furthermore, inference services and other services provided by Semantic Web technologies can support a variety of core source code analysis techniques, such as semantic code search, call graph construction, and clone detection. We introduce SeCold, the first publicly available online linked-data source code dataset for software engineering researchers and practitioners. Along with its dataset, SeCold provides several Semantic Web enabled core services to support the analysis of Internet-scale source code repositories. We illustrate through several examples how this linked data, combined with Semantic Web technologies, can be harvested for different source code analysis tasks to support software trustworthiness. For the case studies, we combine our linked dataset and Semantic Web enabled source code analysis services with knowledge extracted from StackOverflow, a crowdsourcing website. Through these case studies, we demonstrate that our approach is not only capable of crawling, processing, and scaling to traditional types of structured data (e.g., source code), but also supports emerging non-structured data sources, such as crowdsourced information (e.g., StackOverflow.com), in a global source code analysis context.
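Clone detection is one of the core analyses the abstract names. The sketch below is not SeCold's actual implementation; it only illustrates the general idea behind token-based clone detection, hashing snippets after abstracting away identifier names and literals so that renamed copies collide. All snippets are invented.

```python
import hashlib
import re

def normalize(snippet: str) -> str:
    """Normalize a code snippet: drop line comments and replace identifiers
    and number literals with placeholders, so renamed clones hash alike."""
    snippet = re.sub(r"//.*|#.*", "", snippet)
    keywords = {"def", "return", "if", "else", "for", "while"}
    out = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", snippet):
        if re.fullmatch(r"[A-Za-z_]\w*", tok) and tok not in keywords:
            out.append("ID")    # abstract away identifier names
        elif re.fullmatch(r"\d+", tok):
            out.append("NUM")   # abstract away numeric literals
        else:
            out.append(tok)     # keep operators and punctuation
    return " ".join(out)

def clone_hash(snippet: str) -> str:
    """Fingerprint of the normalized snippet; equal hashes suggest a clone."""
    return hashlib.sha1(normalize(snippet).encode()).hexdigest()

a = "def add(x, y): return x + y"
b = "def plus(a, b): return a + b"   # only the names differ
assert clone_hash(a) == clone_hash(b)
```

In a real corpus such fingerprints would be bucketed in an index so that candidate clone pairs are found without pairwise comparison.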

2.
Research on Ontology Learning for the Knowledge Grid   (cited 12 times: 1 self-citation, 11 by others)
Grid computing is evolving from purely large-scale, computation-oriented distributed resource sharing into a service-oriented architecture that enables transparent and reliable distributed system integration. Grid intelligence concerns how to acquire, preprocess, represent, and integrate data and information from grid services at different levels (e.g., HTML/XML/RDF/OWL documents, service response times, and quality of service), and ultimately transform them into useful intelligence (knowledge). Because high-level knowledge will play an increasingly important role in future grid applications, ontologies are key to realizing the knowledge grid. This paper proposes WebOntLearn, an ontology learning framework for (semi-)automatically constructing ontologies from Web documents, and discusses key techniques in ontology learning, including the extraction of domain concepts, the extraction of relations between concepts, and the automatic construction of taxonomies.

3.
Semantic Web document search is an important means of discovering Semantic Web data. To address the shortcomings of traditional information retrieval methods, this paper proposes a method for constructing document term vectors based on RDF sentences. First, a document is treated as a set of RDF sentences, so that RDF-sentence-level structural information is preserved during document analysis and indexing. Second, the notion of an authoritative description of a resource is introduced, enabling search across document boundaries over interlinked Semantic Web data. In addition, the traditional inverted index structure is extended so that the system can extract snippets that are easier to read and understand. Experiments on a large-scale real-world dataset show that the method significantly improves the efficiency of document retrieval and offers a clear gain in usability.

4.
The Semantic Web is distributed yet interoperable: distributed, since resources are created and published by a variety of producers, tailored to their specific needs and knowledge; interoperable, as entities are linked across resources, allowing resources from different providers to be used in concert. Complementary to the explicit usage of Semantic Web resources, embedding methods have made them applicable to machine learning tasks. Subsequently, embedding models for numerous tasks and structures have been developed, and embedding spaces for various resources have been published. The ecosystem of embedding spaces is distributed but not interoperable: entity embeddings are not readily comparable across different spaces. To parallel the Web of Data with a Web of Embeddings, we must therefore integrate the available embedding spaces into a uniform space. Current integration approaches are limited to two spaces and presume that both were embedded with the same method; both assumptions are unlikely to hold in the context of a Web of Embeddings. In this paper, we present FedCoder, an approach that integrates multiple embedding spaces via a latent space. We assert that linked entities have a similar representation in the latent space, so that entities become comparable across embedding spaces. FedCoder employs an autoencoder to learn this latent space from linked as well as non-linked entities. Our experiments show that FedCoder substantially outperforms state-of-the-art approaches when faced with different embedding models, that it scales better than previous methods in the number of embedding spaces, and that it improves as more graphs are integrated, while performing comparably with current approaches that assume joint learning of the embeddings and are usually limited to two sources. Our results demonstrate that FedCoder is well suited to integrating the distributed, diverse, and large ecosystem of embedding spaces into an interoperable Web of Embeddings.
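FedCoder learns its latent space with an autoencoder; the sketch below is a drastic simplification of that idea, not the paper's method. It aligns two toy embedding spaces with a single translation vector estimated in closed form from linked entity pairs, which is enough to show how linked entities anchor the mapping and unlinked ones come along for free. All entity names and vectors are invented.

```python
def align_by_translation(space_a, space_b, links):
    """Map space_b into space_a's coordinates with one translation vector,
    estimated as the mean offset over linked entity pairs. A much-simplified
    stand-in for FedCoder's autoencoder-learned latent space."""
    dims = len(next(iter(space_a.values())))
    offset = [0.0] * dims
    for ea, eb in links:
        for d in range(dims):
            offset[d] += space_a[ea][d] - space_b[eb][d]
    offset = [o / len(links) for o in offset]
    # shift every entity of space_b, linked or not, into the shared space
    return {e: [vec[d] + offset[d] for d in range(dims)]
            for e, vec in space_b.items()}

# two toy embedding spaces describing overlapping entities (values invented)
space_a = {"Berlin": [1.0, 2.0], "Paris": [3.0, 4.0]}
space_b = {"Berlin": [0.0, 0.0], "Paris": [2.0, 2.0], "Rome": [4.0, 4.0]}
aligned = align_by_translation(space_a, space_b,
                               [("Berlin", "Berlin"), ("Paris", "Paris")])
assert aligned["Berlin"] == space_a["Berlin"]   # linked entities now coincide
assert aligned["Rome"] == [5.0, 6.0]            # unlinked entities come along
```

A real integration must handle rotations, scale, and non-linear distortions between spaces, which is why the paper learns the mapping with an autoencoder rather than a fixed-form transform.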

5.
Sharing health-care records over the Internet   (cited 1 time: 0 self-citations, 1 by others)
Presents a novel approach to sharing electronic health-care records that leverages the Internet and the World Wide Web, developed as part of two European Commission-funded projects, Synapses and SynEx. The approach provides an integrated view of patient data from heterogeneous, distributed information systems and presents it to users electronically. Synapses and SynEx illustrate a generic approach to applying Internet technologies for viewing shared records, integrated with existing health computing environments. Prototypes have been validated in a variety of clinical domains and health-care settings.

6.
Efficient execution of composite Web services exchanging intensional data   (cited 1 time: 0 self-citations, 1 by others)
Web service technologies provide a standard means of integrating heterogeneous applications distributed over the Internet. Successive compositions of new Web services from pre-existing ones usually create a hierarchical structure of invocations among a large number of Web services. For the efficient execution of these composite Web services, we propose an approach that exploits intensional XML data, i.e., XML documents that contain special elements representing calls to Web services, in order to delegate the invocation of external Web services to relevant nodes. We formalize an invocation plan for composite Web services in which intensional data is used as their parameters and results, and define a cost-based optimization problem to obtain an efficient invocation plan. We provide an A* heuristic search algorithm to find an optimal invocation plan for a given set of Web services, and also present a greedy method that generates an efficient solution in a short time. The experimental results show that the proposed greedy method can find a close-to-optimal solution efficiently and scales well to complex call hierarchies of Web services.
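The paper's cost model and plan formalization are not reproduced here; the sketch below shows only the generic A* search mechanism such an optimizer builds on, run over an invented delegation-cost graph. Node names, costs, and the trivial zero heuristic are all placeholders.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Generic A*: neighbors(n) yields (next_node, edge_cost) pairs and
    h(n) is an admissible estimate of the remaining cost to the goal."""
    open_set = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue                      # stale queue entry, skip it
        for nxt, cost in neighbors(node):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(open_set, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None

# hypothetical delegation costs between nodes hosting composite services
graph = {"client": [("A", 2), ("B", 5)],
         "A": [("B", 1), ("C", 4)],
         "B": [("C", 1)],
         "C": []}
cost, plan = a_star("client", "C", lambda n: graph[n], lambda n: 0)
assert (cost, plan) == (4, ["client", "A", "B", "C"])
```

With a zero heuristic A* degenerates to Dijkstra's algorithm; the paper's point is that a domain-specific admissible heuristic over invocation costs prunes the plan search.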

7.
The Web facilitates a global marketplace that provides an economic platform for application developers, merchants, and customers to exchange goods and services from a wide range of domains. As a result, large volumes of data now reside on the Web. Access to data distributed over the Web is becoming increasingly difficult because of information overload. A system that goes beyond the mere amalgamation of data and provides easy access to data distributed over the Web is necessary. We present WISE, a correctness-preserving approach to the integration of Web data sources. We describe the components of WISE, including a flexible semistructured data model, a common-term vocabulary, and an efficient integration algorithm that automates the integration process. We formally specify these components and show that the global integrated schema is correct, complete, minimal, and understandable.

8.
9.
We present SPUD, a semantic environment for cataloging, exploring, integrating, understanding, processing, and transforming urban information. A series of challenges are identified: the heterogeneity of the domain and the impracticality of a common model; the volume of information and the number of data sets; the requirement for a low entry threshold to the system; the diversity of the input data in terms of format, syntax, and update frequency (streams vs. static data); the complex data dependencies; and the sensitivity of the information. We propose an approach for the incremental and continuous integration of static and streaming data based on Semantic Web technologies, and apply it to a traffic diagnosis scenario. We demonstrate our approach through a system operating on real data in Dublin, and show that semantic technologies can be used to obtain business results in an environment with hundreds of heterogeneous datasets coming from distributed data sources and spanning multiple domains.

10.
Research on Data Mining Techniques in Web Search   (cited 4 times: 0 self-citations, 4 by others)
The WWW has become the world's largest distributed information system, and how to search quickly and effectively for the resources users need has long been a research focus. Web mining has become a relatively mature branch of data mining. This paper surveys the Web mining techniques used in Web resource search. It first analyzes the common techniques of currently popular Web content mining, then focuses on Web structure mining, introducing and evaluating several algorithmic models. It then introduces Web usage mining, and points out the trend of combining Web content mining, structure mining, and usage mining in the development of intelligent search engines.
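A classic example of the link-analysis models evaluated in Web structure mining is PageRank (whether this survey covers that exact algorithm is an assumption on my part). A minimal power-iteration sketch over an invented link graph:

```python
def pagerank(links, d=0.85, iters=50):
    """Power-iteration PageRank over a dict mapping page -> outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}   # random-jump component
        for p, outs in links.items():
            if outs:
                share = rank[p] / len(outs)     # split rank across out-links
                for q in outs:
                    new[q] += d * share
            else:                                # dangling page: spread evenly
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
r = pagerank(web)
assert abs(sum(r.values()) - 1.0) < 1e-9   # total rank mass is conserved
assert r["c"] > r["b"]                     # "c" has two in-links, "b" one
```

The same iteration structure, with authority and hub scores instead of a single rank, underlies HITS, the other structure-mining model commonly compared in surveys of this kind.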

11.
Knowledge extraction from Chinese wiki encyclopedias   (cited 1 time: 0 self-citations, 1 by others)

12.
王立杰, 李萌, 蔡斯博, 李戈, 谢冰, 杨芙清. Journal of Software (《软件学报》), 2012, 23(6): 1335-1349
With the continuous maturation and development of Web service technology, a large number of public Web services have appeared on the Internet. When building software systems with Web services, textual descriptions (e.g., overviews and usage instructions) help service consumers identify, understand, and use Web services intuitively and effectively. Most existing work focuses on obtaining such information from a Web service's WSDL file for service discovery or retrieval; our investigation found, however, that the WSDL files of most Web services on the Internet generally lack, or entirely omit, such information. We therefore propose a Web-search-based method that enriches Web services with textual descriptions drawn from information sources beyond the WSDL file. The method collects Web pages containing identifying features of the target Web service, extracts information fragments from those pages, uses information retrieval techniques to compute the relevance of each fragment to the target service, and selects the most relevant text fragments as the service's supplementary description. Experiments on real-world Internet data show that related Web pages can be obtained for about 51% of Web services on the Internet, and textual descriptions can be supplemented for about 88% of those services. The collected Web services and their textual descriptions have been publicly released.
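The paper scores crawled text fragments against the target service with information retrieval techniques; the exact scheme is not reproduced here. A minimal TF-IDF/cosine sketch of that kind of relevance scoring, with an invented service signature and invented snippets:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors for a small corpus of tokenized documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# hypothetical service signature and candidate snippets from crawled pages
service = "weather forecast temperature city".split()
snippets = [
    "get the weather forecast and temperature for any city".split(),
    "stock price quotes and market news".split(),
]
vecs = tfidf_vectors([service] + snippets)
scores = [cosine(vecs[0], v) for v in vecs[1:]]
assert scores[0] > scores[1]   # the weather snippet is the relevant one
```

Ranking fragments by such a score and keeping those above a threshold is the general shape of the selection step the abstract describes.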

13.
Research on Key Techniques of XML-Based Web Data Mining   (cited 8 times: 0 self-citations, 8 by others)
With the vast amount of information online, the WWW has become a hot target for data mining. This paper introduces data mining techniques for Web pages and proposes an XML-based Web data mining model. It explains why semi-structured HTML documents are converted into well-formed XML documents, and gives conversion code based on the HTML Tidy library. It introduces the key techniques for extracting data from Web pages using XML technology, including XHTML, XSLT, and XQuery, and also discusses other aspects of Web data mining, such as data validation and integration.
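The paper's pipeline (HTML Tidy, then XSLT/XQuery over well-formed XML) is not reproduced here. As a rough stdlib stand-in, the sketch below tolerantly extracts list data from messy HTML, including the unclosed `<li>` tags that motivate the HTML-to-XHTML conversion step, using Python's `html.parser`:

```python
from html.parser import HTMLParser

class ListExtractor(HTMLParser):
    """Pull the text of every <li> out of messy HTML; a toy stand-in for
    the HTML -> XHTML -> XQuery pipeline the paper describes."""
    def __init__(self):
        super().__init__()
        self.items, self._in_li, self._buf = [], False, []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._flush()          # an unclosed <li> ends at the next <li>
            self._in_li = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._flush()

    def handle_data(self, data):
        if self._in_li:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()              # flush an <li> still open at end of input

    def _flush(self):
        if self._in_li:
            self.items.append("".join(self._buf).strip())
            self._in_li, self._buf = False, []

p = ListExtractor()
p.feed("<ul><li>apples<li>pears</li><li>plums</ul>")   # mixed open/closed tags
p.close()
assert p.items == ["apples", "pears", "plums"]
```

Once the document is genuinely well-formed XML, the same extraction becomes a one-line XPath or XQuery expression, which is the point of the conversion step.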

14.
The main goal of gas field information integration is to enable automatic data exchange among autonomous, distributed, heterogeneous data sources, and to provide users with a unified global view of the data. As a service-oriented distributed computing technology, Web services provide a framework for building complex, loosely coupled, Web-based distributed systems. This paper discusses the basic patterns for applying Web services to information integration, and proposes a Web-service-based architecture for gas field information integration together with a metadata-based integration method. The system adopts XML as the standard for data representation and exchange, giving it strong flexibility and maintainability.

15.
This paper studies Web-based database publishing. The rapid development of the Internet, and of Web technology in particular, has made it possible to share database information on a large scale, and database publishing is an important application in this area. Two typical approaches are used to publish data on the Web: CGI-based and Java-based. The paper analyzes the advantages and disadvantages of these two approaches, then proposes a database publishing method based on a new model, an HTTP extension. Preliminary evaluation shows it to be an effective method.

16.
The Web as a global information space is developing from a Web of documents into a Web of data. This development opens new ways to address complex information needs: search is no longer limited to matching keywords against documents; instead, complex information needs can be expressed in a structured way, with precise answers as results. In this paper, we present Hermes, an infrastructure for data Web search that addresses a number of challenges involved in realizing search on the data Web. To provide an end-user-oriented interface, we support expressive user information needs by translating keywords into structured queries. We integrate heterogeneous Web data sources with automatically computed mappings. Schema-level mappings are exploited in constructing structured queries against the integrated schema. These structured queries are decomposed into queries against the local Web data sources, which are then processed in a distributed way. Finally, heterogeneous result sets are combined using an algorithm called map join, which makes use of data-level mappings. In evaluation experiments with real-life datasets from the data Web, we show the practicability and scalability of the Hermes infrastructure.

17.
The Internet and related technologies have seen tremendous growth in distributed applications such as medicine, education, e-commerce, and digital libraries. As demand increases for online content and integrated, automated services, various applications employ Web services technology for document exchange among data repositories. Web services provide a mechanism to expose data and functionality using standard protocols, and hence to integrate many features that enhance Web applications. XML, a well-established text format, is playing an increasingly important role in supporting Web services. XML separates data from style and format definition and allows uniform representation, interchange, sharing, and dissemination of information content over the Internet. XML and Web services provide a simplified application integration framework that drives demand for models that support secure information interchange. Providing document security in XML-based Web services requires access control models that offer specific capabilities. Our XML-based access control specification language addresses a new set of challenges that traditional security models do not address.

18.
Since the Web encourages hypertext and hypermedia document authoring (e.g., HTML or XML), Web authors tend to create documents that are composed of multiple pages connected with hyperlinks. A Web document may be authored in multiple ways, such as: (1) all information in one physical page; or (2) a main page and the related information in separate linked pages. Existing Web search engines, however, return only physical pages containing keywords. We introduce the concept of an information unit, which can be viewed as a logical Web document consisting of multiple physical pages as one atomic retrieval unit. We present an algorithm to efficiently retrieve information units. Our algorithm can perform progressive query processing. These functionalities are essential for information retrieval on the Web and in large XML databases. We also present experimental results on synthetic graphs and real Web data.
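The paper's efficient retrieval algorithm is not reproduced here; the brute-force sketch below only illustrates the information-unit concept itself: the smallest set of hyperlinked pages that jointly cover all query keywords. Page names, keywords, and the link graph are invented.

```python
from itertools import combinations

def information_unit(pages, links, keywords):
    """Brute-force the smallest set of link-connected pages whose combined
    keywords cover the query (the paper gives an efficient algorithm)."""
    adj = {p: set() for p in pages}
    for p, outs in links.items():
        for q in outs:
            adj[p].add(q)
            adj[q].add(p)          # treat hyperlinks as undirected here

    def connected(group):
        group = set(group)
        seen, todo = set(), [next(iter(group))]
        while todo:
            p = todo.pop()
            if p not in seen:
                seen.add(p)
                todo += [q for q in adj[p] if q in group]
        return seen == group

    for size in range(1, len(pages) + 1):        # smallest units first
        for group in combinations(pages, size):
            words = set().union(*(pages[p] for p in group))
            if keywords <= words and connected(group):
                return set(group)
    return None

# hypothetical site: a main page linking to two detail pages, plus an
# unlinked page that matches a keyword but belongs to no information unit
pages = {"main": {"travel"}, "hotels": {"hotel"},
         "flights": {"flight"}, "other": {"hotel"}}
links = {"main": ["hotels", "flights"]}
unit = information_unit(pages, links, {"hotel", "flight"})
assert unit == {"main", "hotels", "flights"}   # three pages, one logical document
```

No single page matches both keywords, so a keyword-only engine returns nothing useful; the information unit groups the linked pages into one atomic answer, which is exactly the abstract's point.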

19.
Recent advances in Semantic Web and Web service technologies have shown promise for automatically deriving geospatial information and knowledge from Earth science data distributed over the Web. In a service-oriented environment, data, information, and knowledge are often consumed or produced by complex, distributed geoscientific workflows or service chains. For the chaining results to be consumable, sufficient metadata must be provided for the data products delivered by service chains. This paper proposes the automatic generation of geospatial metadata for Earth science virtual data products. A virtual data product is represented using process models and can be materialized on demand by dynamically binding and chaining archived data and services, as opposed to requiring that Earth science data products be physically archived. Semantics-enabled geospatial metadata is generated, validated, and propagated during the materialization of a virtual data product. The generated metadata not only provides a context in which end-users can interpret data products before the intensive execution of service chains, but also ensures the semantic consistency of the service chains.

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号