Similar literature
Found 20 similar documents
1.
Web services offer a more reliable and efficient way to access online data than scraping web pages. However, interacting with web services to retrieve data often requires people to write a lot of code. Moreover, many web services return data in complex hierarchical structures that make it difficult for people to perform any further data manipulation. We developed Gneiss, a tool that extends the familiar spreadsheet metaphor to support using structured web service data. Gneiss lets users retrieve or stream arbitrary JSON data returned from web services to a spreadsheet using interaction techniques without writing any code. It introduces a novel visualization that represents hierarchies in data using nested spreadsheet cells and allows users to easily reshape and regroup the extracted structured data. Data flow is two-way between the spreadsheet and the web services, enabling people to easily make a new web service call and retrieve new data by modifying spreadsheet cells. We report results from a user study that showed that Gneiss helped spreadsheet users use and analyze structured data more efficiently than Excel and even outperform professional programmers writing code. We further use a set of examples to demonstrate our tool's ability to create reusable data extraction and manipulation programs that work with complex web service data.
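To illustrate why hierarchical web service data is awkward in a flat table, the following is a minimal sketch that flattens one nested JSON record into a single spreadsheet-style row. It is not how Gneiss works (the abstract describes nested cells, not flattening); the function name `flatten` and the dotted-path column naming are assumptions made for this example.

```python
from typing import Any


def flatten(record: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into one row keyed by path-like column names.

    Hypothetical helper: dict keys are joined with '.', list positions are
    written as '[i]'. Empty dicts and lists contribute no columns.
    """
    row: dict[str, Any] = {}
    if isinstance(record, dict):
        for key, value in record.items():
            path = f"{prefix}.{key}" if prefix else key
            row.update(flatten(value, path))
    elif isinstance(record, list):
        for index, value in enumerate(record):
            row.update(flatten(value, f"{prefix}[{index}]"))
    else:
        # Leaf value (string, number, bool, None): store it under its path.
        row[prefix] = record
    return row


# Example record shaped like a typical JSON response (made-up data).
example = {"user": {"name": "ana"}, "tags": ["x", {"weight": 3}]}
columns = flatten(example)
```

Here `columns` holds the keys `user.name`, `tags[0]`, and `tags[1].weight`. Column names that encode the hierarchy in this way grow quickly with nesting depth, which is the kind of difficulty the nested-cell visualization is meant to avoid.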

2.
Document engineering is the computer science discipline that investigates systems for documents in any form and in all media. As with the relationship between software engineering and software, document engineering is concerned with principles, tools and processes that improve our ability to create, manage, and maintain documents. The ACM Symposium on Document Engineering is an annual meeting of researchers active in document engineering; it is sponsored by ACM through the ACM SIGWEB Special Interest Group. In this editorial, we first point to work carried out in the context of document engineering that is directly related to multimedia tools and applications. We conclude with a summary of the papers presented in this special issue.
Luiz Fernando Gomes Soares

3.
The performance of a focused, or topic-specific Web robot can be improved by taking into consideration the structure of the documents downloaded by the robot. In the case of HTML, document structure is tree-like, defined by nested document elements (tags) and their attributes. By analysing this structure, a robot may use the text of certain HTML elements to prioritise documents for downloading and thus significantly improve the speed of convergence to a topic. Clear separation of the structure-aware document parser from the download scheduler provides flexibility but requires a standard interface and protocol between the two. The paper discusses such an interface in the context of an experimental Web robot, whose speed of convergence to a topic was observed to increase by a factor of 3 to 8, as measured by the number of documents downloaded to reach a given average relevance score.
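The separation between a structure-aware parser and a download scheduler can be sketched roughly as follows. This is an illustrative assumption, not the interface from the paper: the class `AnchorParser`, the functions `score` and `schedule`, and the word-overlap relevance measure are all hypothetical, and only anchor text is used as the "text of certain HTML elements".

```python
import heapq
from html.parser import HTMLParser


class AnchorParser(HTMLParser):
    """Structure-aware parser: collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def score(text, topic_terms):
    """Assumed relevance measure: number of words that are topic terms."""
    return sum(1 for word in text.lower().split() if word in topic_terms)


def schedule(html, topic_terms):
    """Download scheduler: return hrefs ordered from most to least relevant."""
    parser = AnchorParser()
    parser.feed(html)
    # heapq is a min-heap, so scores are negated to pop the best link first.
    heap = [(-score(text, topic_terms), href) for href, text in parser.links]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


page = '<a href="/a">cooking tips</a><a href="/b">robot crawler design</a>'
order = schedule(page, {"robot", "crawler"})
```

In this sketch the parser knows nothing about priorities and the scheduler knows nothing about HTML; the list of `(href, text)` pairs plays the role of the interface between them.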

4.
5.
In the past years we have witnessed Sentiment Analysis and Opinion Mining becoming increasingly popular topics in Information Retrieval and Web data analysis. With the rapid growth of the user-generated content represented in blogs, wikis and Web forums, such an analysis became a useful tool for mining the Web, since it allowed us to capture sentiments and opinions at a large scale. Opinion retrieval has established itself as an important part of search engines. Ratings, opinion trends and representative opinions enrich the search experience of users when combined with traditional document retrieval, by revealing more insights about a subject. Opinion aggregation over product reviews can be very useful for product marketing and positioning, exposing the customers’ attitude towards a product and its features along different dimensions, such as time, geographical location, and experience. Tracking how opinions or discussions evolve over time can help us identify interesting trends and patterns and better understand the ways that information is propagated on the Internet. In this study, we review the development of Sentiment Analysis and Opinion Mining during the last years, and also discuss the evolution of a relatively new research direction, namely, Contradiction Analysis. We give an overview of the proposed methods and recent advances in these areas, and we try to lay out the future research directions in the field.

6.
7.
This paper takes as its premise that the web is a place of action, not just information, and that the purpose of global data is to serve human needs. The paper presents several component technologies, which together work towards a vision where many small micro-applications can be threaded together using automated assistance to enable a unified and rich interaction. These technologies include data detector technology to enable any text to become a start point of semantic interaction; annotations for web-based services so that they can link data to potential actions; spreading activation over personal ontologies, to allow modelling of context; algorithms for automatically inferring ‘typing’ of web-form input data based on previous user inputs; and early work on inferring task structures from action traces. Some of these have already been integrated within an experimental web-based (extended) bookmarking tool, Snip!t, and a prototype desktop application On Time, and the paper discusses how the components could be more fully, yet more openly, linked in terms of both architecture and interaction. As well as contributing to the goal of an action and activity-focused web, the work also exposes a number of broader issues, theoretical, practical, social and economic, for the Semantic Web.

8.
Nowadays, people frequently use different keyword-based web search engines to find the information they need on the web. However, many words are polysemous and, when these words are used to query a search engine, its output usually includes links to web pages referring to their different meanings. Besides, results with different meanings are mixed up, which makes the task of finding the relevant information difficult for the users, especially if the user-intended meanings behind the input keywords are not among the most popular on the web.

9.
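This paper describes the use of De Bruijn sequences to encode structured light. Stripe boundaries are matched to their optimal neighborhoods under a global optimization scheme, and the optimal matching path is found by traversing the matching-path grid with dynamic programming under added constraints. Color correction of the distorted stripe images improves the accuracy of boundary detection. The coding strategy is simple to decode, the matching algorithm performs well, and the resulting point cloud data is accurate enough for 3D surface reconstruction.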
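As background for the encoding step, a De Bruijn sequence B(k, n) over k symbols contains every length-n word exactly once when read cyclically, so any window of n consecutive stripes identifies its position in the pattern. The sketch below uses the standard Lyndon-word construction; mapping symbols to stripe colors, and the choice of k = 3 and n = 3, are assumptions for illustration and are not taken from the paper.

```python
def de_bruijn(k: int, n: int) -> list[int]:
    """Return a De Bruijn sequence B(k, n) over symbols 0..k-1 (requires k >= 2).

    Built by concatenating Lyndon words whose length divides n, in
    lexicographic order (the standard recursive construction).
    """
    a = [0] * (k * n)
    sequence: list[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def cyclic_windows(seq: list[int], n: int) -> list[tuple[int, ...]]:
    """All length-n windows of seq, wrapping around at the end."""
    size = len(seq)
    return [tuple(seq[(i + j) % size] for j in range(n)) for i in range(size)]


# Hypothetical setup: 3 stripe colors, windows of 3 neighboring stripes.
stripes = de_bruijn(3, 3)
windows = cyclic_windows(stripes, 3)
```

Because every window in `windows` is distinct, decoding a detected stripe reduces to looking up the colors of the stripe and its neighbors, which is consistent with the abstract's remark that the coding strategy is simple to decode.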

10.
11.
12.
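Querying Semi-structured Web Data   Total citations: 1, self-citations: 0, citations by others: 1
The information and data on many large Web sites today is structured or semi-structured, so it can be abstracted and treated like a relational or object-oriented database to make operations more efficient, especially queries built on that abstraction. Using a subset of the Araneus data model as the data model, together with link constraints, inclusion constraints, and range constraints, the paper proposes a query rewriting method for semi-structured data. While preserving the correctness and completeness of the algorithm, the method exploits the characteristics of semi-structured data and the relationships among query subgoals to greatly reduce the algorithm's cost.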

13.
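To search and make use of deep data in Web resources, this paper first analyzes the problems of obtaining Web resources through search engines. Drawing on Semantic Web and Web service technologies, it then proposes building a Web resource ontology model to annotate Web resources semantically. Combined with a service management agent, it constructs an application model for a data mediation service and implements the Web resource data mediation service as a Web service. Experiments verify the effectiveness and feasibility of the service, which helps users effectively obtain and share Web resource data across massive Web resources of many forms and kinds.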

14.
15.
Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data that can be leveraged to build and augment knowledge graphs. However, they rarely provide a semantic model to describe their contents. Semantic models of data sources represent the implicit meaning of the data by specifying the concepts and the relationships within the data. Such models are the key ingredients to automatically publish the data into knowledge graphs. Manually modeling the semantics of data sources requires significant effort and expertise, and although desirable, building these models automatically is a challenging problem. Most of the related work focuses on semantic annotation of the data fields (source attributes). However, constructing a semantic model that explicitly describes the relationships between the attributes in addition to their semantic types is critical. We present a novel approach that exploits the knowledge from a domain ontology and the semantic models of previously modeled sources to automatically learn a rich semantic model for a new source. This model represents the semantics of the new source in terms of the concepts and relationships defined by the domain ontology. Given some sample data from the new source, we leverage the knowledge in the domain ontology and the known semantic models to construct a weighted graph that represents the space of plausible semantic models for the new source. Then, we compute the top k candidate semantic models and suggest to the user a ranked list of the semantic models for the new source. The approach takes into account user corrections to learn more accurate semantic models on future data sources. Our evaluation shows that our method generates expressive semantic models for data sources and services with minimal user input.
These precise models make it possible to automatically integrate the data across sources and provide rich support for source discovery and service composition. They also make it possible to automatically publish semantic data into knowledge graphs.

16.
In this paper we address the problem of integrating independent and possibly heterogeneous data warehouses, a problem that has received little attention so far, but that arises very often in practice. We start by tackling the basic issue of matching heterogeneous dimensions and provide a number of general properties that a dimension matching should fulfill. We then propose two different approaches to the problem of integration that try to enforce matchings satisfying these properties. The first approach refers to a scenario of loosely coupled integration, in which we just need to identify the common information between data sources and perform join operations over the original sources. The goal of the second approach is the derivation of a materialized view built by merging the sources, and refers to a scenario of tightly coupled integration in which queries are performed against the view. We also illustrate the architecture and functionality of a practical system that we have developed to demonstrate the effectiveness of our integration strategies. A preliminary version of this paper appeared, under the title “Integrating Heterogeneous Multidimensional Databases” [9], in 17th Int. Conference on Scientific and Statistical Database Management, 2005.

17.
18.
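As informatization deepens and science and technology advance, database and network technologies have helped enterprises achieve office automation and bring information technology into management decision-making and production processes. However, the growing volume of information makes it difficult to collect and to preserve over the long term, and neither traditional information processing technologies nor Hadoop can process massive structured data. To broaden the scope of enterprise decision-making and improve the completeness of the information obtained, this article studies and analyzes a "data service cloud platform". Exploring a new technical architecture from the standpoint of big data applications offers a more reasonable way to solve the key technical problems of enterprise big data applications.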

19.
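As demand for network applications and enterprise decision support systems continues to grow, converting Web data formats has become a topic for further research. XML has recently become the standard for representing and exchanging data on the Web, and researchers in China and abroad have proposed a number of query languages based on XML technology; XSQL is a simple, easy-to-use query language that extends XML technology. The paper studies and applies XSQL and argues that its advantages offer new ideas for the development of Web technology. Using XSQL to improve Web service functionality solves the data format conversion problem, so that data can be displayed on the Web in multiple styles according to different needs. An Oracle database is used as an example to illustrate the application of the technology.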

20.
As the Internet grows, it becomes essential to find efficient tools to deal with all the available information. Question answering (QA) and text summarization (TS) research fields focus on presenting the information requested by users in a more concise way. In this paper, the appropriateness and benefits of using summaries in semantic QA are analyzed. For this purpose, a combined approach where a TS component is integrated into a Web‐based semantic QA system is developed. The main goal of this paper is to determine to what extent TS can help semantic QA approaches, when using summaries instead of search engine snippets as the corpus for answering questions. In particular, three issues are analyzed: (i) the appropriateness of query‐focused (QF) summarization rather than generic summarization for the QA task, (ii) the suitable length comparing short and long summaries, and (iii) the benefits of using TS instead of snippets for finding the answers, tested within two semantic QA approaches (named entities and semantic roles). The results obtained show that QF summarization is better than generic (58% improvement), short summaries are better than long (6.3% improvement), and the use of TS within semantic QA improves the performance for both named‐entity‐based (10%) and, especially, semantic‐role‐based QA (47.5%). © 2011 Wiley Periodicals, Inc.
