Similar Documents
1.
In 2005, Franco Moretti introduced Distant Reading to analyse entire literary text collections. This was a rather revolutionary idea compared to traditional Close Reading, which focuses on the thorough interpretation of an individual work. Both reading techniques are the primary means of visual text analysis. We present an overview of the research conducted since 2005 on supporting text analysis tasks with close and distant reading visualizations in the digital humanities. We classify the surveyed papers according to a taxonomy of text analysis tasks, categorize the close and distant reading techniques applied to support the investigation of these tasks, and illustrate approaches that combine both reading techniques to provide a multi-faceted view of the textual data. In addition, we examine the text sources used and the typical data transformation steps required for the proposed visualizations. Finally, we summarize collaboration experiences from developing visualizations for close and distant reading, and we give an outlook on future challenges in this research area.

2.
Developing a comprehensive explanation of complex social phenomena is a difficult task that analysts often have to perform using vast collections of text documents. On the one hand, solutions exist to assist analysts in creating causal maps from text documents, but these can only articulate the relationships at work in a problem. On the other hand, Fuzzy Cognitive Maps (FCMs) can both articulate these relationships and perform simulations, but no environment exists to help analysts iteratively develop FCMs from text. In this paper, we detail the design and implementation of the first tool that allows analysts to develop FCMs from text collections using interactive visualizations. We make three contributions: (i) we combine text mining and FCMs, (ii) we implement the first visual analytics environment built on FCMs, and (iii) we promote a strong feedback loop between interactive data exploration and model building. We provide two case studies exemplifying how to create a model from the ground up or improve an existing one. Limitations include the increase in display complexity when working with large collections of files and the reliance on KL-divergence for ad hoc retrieval. Several improvements are discussed to further support analysts in creating high-quality models through interactive visualizations.
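The simulation side of an FCM can be sketched in a few lines: concepts hold activation levels, signed weighted edges encode causal influence, and the state is updated with a squashing function until it stabilises. The concepts, weights, and update variant below are invented for illustration and are not taken from the paper.

```python
import math

# Minimal Fuzzy Cognitive Map simulation. Concepts hold activations in
# [0, 1]; directed edges carry signed causal weights; states are updated
# synchronously with a sigmoid squashing function (a Kosko-style update
# with self-feedback) until they converge.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fcm_step(state, weights):
    """One synchronous update: a_i' = f(a_i + sum_j w[j, i] * a_j)."""
    return {node: sigmoid(val + sum(weights.get((src, node), 0.0) * sv
                                    for src, sv in state.items()))
            for node, val in state.items()}

def simulate(state, weights, max_steps=50, eps=1e-6):
    for _ in range(max_steps):
        nxt = fcm_step(state, weights)
        if all(abs(nxt[k] - state[k]) < eps for k in state):
            return nxt
        state = nxt
    return state

# Hypothetical causal map: unemployment raises crime, policing lowers it.
weights = {("unemployment", "crime"): 0.7, ("policing", "crime"): -0.6}
state = {"unemployment": 0.8, "policing": 0.2, "crime": 0.5}
result = simulate(state, weights)
print({k: round(v, 3) for k, v in result.items()})
```

Running what-if scenarios then amounts to changing an initial activation (say, raising "policing") and re-running `simulate`, which is the feedback loop between model building and exploration that the tool supports interactively.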

3.
Many real-world analysis tasks can benefit from the combined efforts of a group of people. Past research has shown that designing visualizations for collaborative visual analytics tasks requires supporting both individual and joint analysis activities. We present Cambiera, a tabletop visual analytics tool that supports individual and collaborative information foraging in large text document collections. We define collaborative brushing and linking as an awareness mechanism that enables analysts to follow their own hypotheses during collaborative sessions while remaining aware of the group's activities. With Cambiera, users can collaboratively search through documents, maintaining awareness of each other's work and building on each other's findings.

4.
FacetAtlas: multifaceted visualization for rich text corpora
Documents in rich text corpora usually contain multiple facets of information. For example, an article about a specific disease often covers facets such as symptom, treatment, cause, diagnosis, prognosis, and prevention. Documents may therefore relate to one another differently depending on the facet considered. Powerful search tools have been developed to help users locate lists of individual documents most related to specific keywords, but effective analysis tools that reveal the multifaceted relations of documents within or across document clusters are lacking. In this paper, we present FacetAtlas, a multifaceted visualization technique for visually analyzing rich text corpora. FacetAtlas combines search technology with advanced visual analytical tools to convey both global and local patterns simultaneously. We describe several unique aspects of FacetAtlas, including (1) node cliques and multifaceted edges, (2) an optimized density map, (3) automated opacity pattern enhancement for highlighting visual patterns, and (4) interactive context switching between facets. In addition, we demonstrate the power of FacetAtlas through a case study targeting patient education in the health care domain. Our evaluation shows the benefits of this work, especially in support of complex multifaceted data analysis.

5.
In this paper we propose an approach in which interactive visualization and analysis are combined with batch tools for processing large data collections. Large and heterogeneous data collections are difficult to analyze and pose specific problems for interactive visualization. Both traditional interactive processing and visualization approaches and pure batch processing suffer considerable drawbacks for such collections because of the amount and type of data: computing resources are insufficient for interactive exploration, while automated analysis gives the user only limited control over and feedback on the analysis process. In our approach, an analysis procedure with the features and attributes of interest is defined interactively and then used for off-line processing of large collections of data sets. The results of the batch process, along with "visual summaries", are used for further analysis. Visualization serves not only to present results but also to monitor the validity and quality of the operations performed during the batch process; operations such as feature extraction and attribute calculation are validated by visual inspection. The approach is illustrated by an extensive case study in which a collection of confocal microscopy data sets is analyzed.

6.
People accumulate large collections of digital photos, which they use for individual, social, and utilitarian purposes. In order to provide suitable technologies for enjoying our expanding photo collections, it is essential to understand how and to what purpose these collections are used. Contextual interviews with 12 participants in their homes explored the use of digital photos, incorporating new photo activities that are offered by new technologies. Based on the qualitative analysis of the collected data, we give an overview of current photo activities, which we term PhotoUse. We introduce a model of PhotoUse, which emphasises the purpose of photo activities rather than the tools to support them. We argue for the use of our model to design tools to support the user’s individual and social goals pertaining to PhotoUse.

7.
The current suite of Internet information tools allows users to reach only specific subsets of the information available on the Internet, and requires information providers and consumers to interact with data through specific paradigms. This has limited our ability to present and use information; indeed, these tools seem to be little more than extensions of the text-based world we have grown up with. There are many untapped possibilities in our interaction with data, but much work needs to be done to provide a framework on which new tools can be deployed. This paper examines the current tools and the infrastructure required to make them work together, and surveys the techniques that have been promoted to enable easier interactions in the future.

8.
Multidimensional visualization techniques are invaluable tools for the analysis of structured and unstructured data with variable dimensionality. This paper introduces PEx-Image (Projection Explorer for Images), a tool aimed at supporting the analysis of image collections. The tool supports a methodology that employs interactive visualizations to aid user-driven feature detection and classification tasks, thus offering improved analysis and exploration capabilities. The visual mappings employ similarity-based multidimensional projections and point placement to lay out the data on a plane for visual exploration. In addition to its application to image databases, we illustrate how the proposed approach can be employed in the simultaneous analysis of different data types, such as text and images, offering a common visual representation for data expressed in different modalities.

9.
Text retrieval in the legal world
The ability to find relevant materials in large document collections is a fundamental component of legal research. The emergence of large machine-readable collections of legal materials has stimulated research aimed at improving the quality of the tools used to access these collections. Important research has been conducted within the traditional information retrieval, the artificial intelligence, and the legal communities with varying degrees of interaction between these groups. This article provides an introduction to text retrieval and surveys the main research related to the retrieval of legal materials.

10.
Digital libraries increasingly benefit from research on automated text categorization for improved access. Such research is typically carried out by means of standard test collections. In this article, we present a pilot experiment that replaces such test collections with a set of 6,000 objects from a real-world digital repository, indexed by Library of Congress Subject Headings, and tests support vector machines in a supervised learning setting for their ability to reproduce the existing classification. To augment the standard approach, we introduce a combination of two novel elements: using functions for document content representation in Hilbert space, and adding extra semantics from lexical resources to the representation. Results suggest that wavelet-based kernels slightly outperformed traditional kernels when reconstructing the classification from abstracts, while the reverse held for full-text documents, the latter outcome being due to word sense ambiguity. The practical implementation of our methodological framework enhances the analysis and representation of specific knowledge relevant to large-scale digital collections, in this case the thematic coverage of the collections. Representation of specific knowledge about digital collections is one of the basic elements of persistent archives, and the less studied one compared to representations of digital objects and collections. Our research is an initial step in this direction, developing the methodological approach further and demonstrating that text categorization can be applied to analyse the thematic coverage of digital repositories.
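The supervised setup described above, learning to reproduce an existing subject classification from document text, can be illustrated with a deliberately simplified stand-in for the SVM-with-kernels machinery: a nearest-centroid bag-of-words classifier. The labels and documents below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy supervised text categorization: build one term-frequency centroid
# per class from labelled documents, then assign a new document to the
# class whose centroid is most cosine-similar. A sketch only -- the paper
# itself uses support vector machines with specialised kernels.

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(labelled_docs):
    """Sum the term vectors of each class into one centroid per label."""
    centroids = defaultdict(Counter)
    for text, label in labelled_docs:
        centroids[label].update(vectorize(text))
    return centroids

def classify(text, centroids):
    vec = vectorize(text)
    return max(centroids, key=lambda lbl: cosine(vec, centroids[lbl]))

training = [
    ("riemann integral measure theory proof", "mathematics"),
    ("group theory algebra lemma proof", "mathematics"),
    ("sonata orchestra symphony score", "music"),
    ("opera choir symphony conductor", "music"),
]
centroids = train(training)
print(classify("algebra proof of a lemma", centroids))  # mathematics
```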

11.
We present results from a study on constructing and evaluating a support tool for the extraction of patterns in distributed decision-making processes, based on design criteria elicited from a study of the work process involved in analysing such decision-making. Specifically, we devised and evaluated an analysis tool for C2 researchers who study simulated decision-making scenarios for command teams. The analysis tool used text clustering as its underlying pattern extraction technique and was evaluated together with C2 researchers in a workshop to establish whether the design criteria were valid and the approach taken was sound. The design criteria elicited from an earlier study with researchers (open-endedness and transparency) were highly consistent with the results from the workshop. Specifically, the evaluation results indicate that successful deployment of advanced analysis tools requires that tools can treat multiple data sources and offer rich opportunities for manipulation and interaction (open-endedness), together with careful design of visual presentations and explanations of the techniques used (transparency). Finally, the results point to the high relevance and promise of text clustering as a support for the analysis of C2 data.

12.
The Synthetic BattleBridge (SBB) gives users interface tools that help them navigate, analyze, and comprehend a complex, active, distributed virtual battlespace environment. A primary objective of the SBB project is to develop an observatory for real-time monitoring and assessment of the activities of intelligently behaving autonomous actors and manned simulators within a virtual environment (VE). The SBB uses both environment and computation distribution to let users monitor and assess activity within a VE in real time and to provide cognitive support for situation analysis. It also addresses the development and evaluation of advanced user interfaces, information aggregation techniques, and information presentation techniques. To function as required, the SBB must present to the user the spatial orientation, type, motion, and distribution of actors in a VE. Key issues regarding these capabilities are updating and displaying the VE at interactive display rates and providing a very large scale environment containing a wide variety of actor types, sizes, and speeds. The SBB provides these capabilities by computing position, motion, and velocity data for all actors in the battlespace and presenting this information in real time through a 3D rendering of the battlespace and its contents. The user controls the SBB and its information presentation through an interface combining visual icons and text, which we describe.

13.
Our understanding of distributed decision making in professional teams and their performance comes in part from studies in which researchers gather and process information about the communications and actions of teams. In many cases, the data sets available for analysis are large and unwieldy and require methods for exploratory and dynamic data management. In this paper, we report the results of interviewing eight researchers about their work process when conducting such analyses and their use of support tools in this process. Our aim was to understand their workflow when studying distributed decision making in teams, and specifically how automated pattern extraction tools could be of use in their work. Based on an analysis of the interviews, we elicited three issues of concern related to the use of support tools in analysis: focusing on a subset of data to study, drawing conclusions from data, and understanding tool limitations. Together, these three issues point to two observations regarding tool use that are of specific relevance to the design of intelligent support tools based on pattern extraction: open-endedness and transparency.

14.
This paper describes our work on developing a language-independent technique for discovering implicit knowledge in multilingual information sources. Text mining has been gaining popularity in the knowledge discovery field, particularly with the increasing availability of digital documents in various languages from around the world. However, most text mining tools currently focus on processing monolingual documents (particularly English ones): little attention has been paid to applying the techniques to documents in Asian languages or to extending the mining algorithms to multilingual information sources. In this work, we attempt to develop a language-neutral method to tackle the linguistic difficulties in the text mining process. Using a variation of automatic clustering techniques based on a neural network approach, the Self-Organizing Map (SOM), we conducted several experiments to uncover associated documents in a Chinese corpus, Chinese-English bilingual parallel corpora, and a hybrid Chinese-English corpus. The experiments show some interesting results and a couple of potential paths for future work in multilingual information discovery. This work is also expected to act as a starting point for exploring, with a machine-learning approach, how linguistic issues affect the mining of meaningful linguistic elements from multilingual text collections.
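The core SOM mechanism behind this kind of clustering can be sketched briefly: a grid of units, each holding a weight vector, where the best-matching unit and its grid neighbours are pulled toward every input. The map size, data, and training schedule below are invented for illustration; real document clustering would use high-dimensional term vectors rather than 2-D points.

```python
import math
import random

# A tiny one-dimensional Self-Organizing Map. For every input, the
# best-matching unit (BMU) and its grid neighbours are pulled toward the
# input, with learning rate and neighbourhood radius decaying over time.

def train_som(data, n_units=4, epochs=200, seed=0):
    rng = random.Random(seed)
    units = [[rng.random(), rng.random()] for _ in range(n_units)]
    for epoch in range(epochs):
        lr = 0.5 * (1 - epoch / epochs)                      # decaying rate
        radius = max(0.5, (n_units / 2) * (1 - epoch / epochs))
        for x in data:
            bmu = min(range(n_units), key=lambda i: math.dist(units[i], x))
            for i in range(n_units):
                # Gaussian neighbourhood over grid-index distance.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                units[i] = [w + lr * h * (xj - w)
                            for w, xj in zip(units[i], x)]
    return units

# Two well-separated toy clusters in a 2-D feature space.
data = [(0.1, 0.1), (0.15, 0.05), (0.9, 0.9), (0.85, 0.95)]
units = train_som(data)
print([[round(w, 2) for w in u] for u in units])
```

After training, documents whose vectors map to the same or neighbouring units are treated as associated, which is how the SOM surfaces related documents across corpora.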

15.
In this paper, we present CatViz (Temporally-Sliced Correspondence Analysis Visualization). This novel method visualizes relationships through time and is suitable for large-scale temporal multivariate data. We couple CatViz with clustering methods and introduce the concept of final centroid transfer, which establishes the correspondence of clusters across time. Although CatViz can be used on any type of temporal data, we show how it can be applied to the exploratory visual analysis of text collections. We present a successful concept of employing feature-type filtering to present different aspects of textual data. We performed case studies on large collections of French and English news articles, and conducted a user study that confirms the usefulness of our method. We present typical tasks of exploratory text analysis and discuss application procedures that an analyst might perform. We believe that CatViz is general and highly applicable to large data sets because of its intuitiveness, effectiveness, and robustness, and we expect it to enable a better understanding of texts in huge historical archives.
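The idea of final centroid transfer can be shown in miniature: cluster one time slice, then initialise the clustering of the next slice with the previous slice's final centroids, so that cluster k in slice t corresponds to cluster k in slice t+1. Plain 1-D k-means stands in here for the paper's clustering method, and all numbers are invented.

```python
# Toy "final centroid transfer": carry the converged centroids of one
# time slice forward as the initialisation of the next slice.

def kmeans(points, centroids, iters=20):
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: abs(p - centroids[k]))
            groups[nearest].append(p)
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

slice_t  = [1.0, 1.2, 9.0, 9.2]   # e.g. document features at time t
slice_t1 = [1.5, 1.7, 9.5, 9.7]   # the same feature space at time t + 1

c_t = kmeans(slice_t, [0.0, 10.0])   # fresh start for the first slice
c_t1 = kmeans(slice_t1, c_t)         # centroids transferred from slice t
print(c_t, c_t1)
```

Because slice t+1 starts from slice t's final centroids, centroid k drifts smoothly rather than being relabelled arbitrarily, which is what lets clusters be tracked through time.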

16.
严宇宇, 陶煜波, 林海. Journal of Software (软件学报), 2016, 27(5): 1114-1126
With the rapid development of information technology, massive amounts of text data are generated, collected, and stored. Topic models are among the most important tools for text analysis and are widely used to analyse large-scale text collections. However, topic models usually offer no intuitive and effective way for users to apply their domain expertise to refine a model's results. To address this problem, we propose an interactive visual analytics system that helps users refine topic models interactively. We first extend the hierarchical Dirichlet process to support word constraints. We then display the topic model in a matrix view and use a semantically organised word-cloud layout to help users find word constraints; by adding such constraints, users iteratively optimise the topic model. Finally, we evaluate the usability of the system through case studies and a user study.

17.
Named Entity Recognition and Classification (NERC) is an important component of applications such as opinion tracking, information extraction, and question answering. When these applications must work in several languages, NERC becomes a bottleneck, because its development requires language-specific tools and resources such as lists of names or annotated corpora. This paper presents a lightly supervised system that acquires lists of names and linguistic patterns from large raw text collections in western languages, starting from only a few seeds per class selected by a human expert. Experiments were carried out with English and Spanish news collections and with the Spanish Wikipedia. Evaluation of NE classification on standard datasets shows that the NE lists achieve high precision and reveals that contextual patterns increase recall significantly. The approach would therefore be helpful for applications where annotated NERC data are unavailable, such as those that must handle several western languages or information from different domains.
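The bootstrapping idea, seed names yielding contextual patterns, which in turn yield new names, can be sketched in a few lines. The corpus, seeds, and single-token "pattern" below are invented; the paper's patterns are richer than one preceding word.

```python
# Lightly supervised name acquisition in miniature: harvest the contexts
# in which seed names occur, then apply those contexts to raw text to
# propose new capitalised tokens as names of the same class.

corpus = ("Dr. Alice visited Paris . Dr. Bob lectured in Rome . "
          "Dr. Carol visited Oslo .")

seeds = {"Alice", "Bob"}

def learn_patterns(text, seeds):
    """Collect the word immediately preceding each seed occurrence."""
    patterns = set()
    tokens = text.split()
    for i, tok in enumerate(tokens[1:], start=1):
        if tok in seeds:
            patterns.add(tokens[i - 1])
    return patterns

def apply_patterns(text, patterns, known):
    """Propose new capitalised tokens that occur in a learned context."""
    found = set()
    tokens = text.split()
    for i, tok in enumerate(tokens[1:], start=1):
        if tokens[i - 1] in patterns and tok[:1].isupper() and tok not in known:
            found.add(tok)
    return found

patterns = learn_patterns(corpus, seeds)           # {'Dr.'}
new_names = apply_patterns(corpus, patterns, seeds)
print(new_names)  # {'Carol'}
```

In a real system the newly found names would be fed back as seeds for another iteration, with precision filters to keep noisy patterns from drifting.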

18.
Collocations are understood in this work as nonrandom combinations of two or more lexical units that are typical both of a language as a whole (texts of any type) and of a particular type of text. A text is a structured sequence of units at different levels; collocations, as complex text substructures, are an important object of study when investigating text analysis procedures. By selecting collections of different types as material, we study both the general patterns and the properties of the analyzed collections. This paper devotes its main attention to digrams (two-word collocations) extracted from a collection of news texts.
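Counting digrams (adjacent word pairs) and ranking them by frequency is the basic operation behind such collocation statistics; nonrandomness is then judged against what chance co-occurrence would predict. The news snippet below is invented for illustration.

```python
from collections import Counter

# Extract digrams per sentence (so no pair spans a full stop) and rank
# them by raw frequency.

text = ("the central bank raised interest rates . "
        "the central bank kept interest rates steady .")

bigrams = Counter()
for sentence in text.split("."):
    tokens = sentence.split()
    bigrams.update(zip(tokens, tokens[1:]))

top = bigrams.most_common(3)
print(top)
```

A real collocation study would go one step further and score each pair with an association measure such as pointwise mutual information, since raw frequency favours pairs of common function words.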

19.
Over the past few years, large human populations around the world have been affected by an increase in significant seismic activity. Both basic scientific research and critical government policy-making depend on the ability to explore and understand seismic and geographical information obtained through scientific instruments. In this work, we present a visual analytics system that enables explorative visualization of seismic data together with satellite-based observational data, and introduce a suite of visual analytical tools. Seismic and satellite data are integrated temporally and spatially. Users can select temporal and spatial ranges to zoom in on specific seismic events, as well as inspect changes both during and after the events. Tools for designing high-dimensional transfer functions have been developed to enable efficient and intuitive comprehension of the multi-modal data. Spreadsheet-style comparisons are used for data drill-down as well as presentation, and comparisons between distinct seismic events characterize event-wise differences. Our system has been designed for scalability in terms of data size, complexity (i.e., number of modalities), and the varying form factors of display environments.
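A transfer function in this sense maps data values to colour and opacity; the high-dimensional design tools generalise the one-dimensional case sketched below. The control points here are invented, not taken from the system.

```python
# A one-dimensional transfer function: a piecewise-linear map from a
# normalised scalar (e.g. a seismic measurement) to an RGBA colour, with
# opacity used to fade out uninteresting values.

CONTROL_POINTS = [                 # (value, (r, g, b, alpha))
    (0.0, (0.0, 0.0, 1.0, 0.0)),   # low: blue, fully transparent
    (0.5, (0.0, 1.0, 0.0, 0.3)),   # middle: green, faint
    (1.0, (1.0, 0.0, 0.0, 1.0)),   # high: red, fully opaque
]

def transfer(v):
    """Linearly interpolate an RGBA colour for v clamped to [0, 1]."""
    v = min(max(v, 0.0), 1.0)
    for (x0, c0), (x1, c1) in zip(CONTROL_POINTS, CONTROL_POINTS[1:]):
        if x0 <= v <= x1:
            t = (v - x0) / (x1 - x0)
            return tuple(a + t * (b - a) for a, b in zip(c0, c1))
    return CONTROL_POINTS[-1][1]

print(transfer(0.75))
```

A multi-modal, high-dimensional version assigns colour and opacity from several data values at once (e.g. magnitude plus a satellite-derived quantity), but the interpolation principle is the same.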
