Similar literature
 20 similar documents found (search time: 31 ms)
1.
Spatializations represent non-spatial data using a spatial layout similar to a map. We present an experiment comparing different visual representations of spatialized data, to determine which representations are best for a non-trivial search and point estimation task. Primarily, we compare point-based displays to 2D and 3D information landscapes. We also compare a colour (hue) scale to a grey (lightness) scale. For the task we studied, point-based spatializations were far superior to landscapes, and 2D landscapes were superior to 3D landscapes. Little or no benefit was found for redundantly encoding data using colour or greyscale combined with landscape height. 3D landscapes with no colour scale (height-only) were particularly slow and inaccurate. A colour scale was found to be better than a greyscale for all display types, but a greyscale was helpful compared to height-only. These results suggest that point-based spatializations should be chosen over landscape representations, at least for tasks involving only point data itself rather than derived information about the data space.

2.
This work presents a real-time active vision tracking system based on log-polar image motion estimation with 2D geometric deformation models. We present a very efficient parametric motion estimation method, where most computation can be done offline. We propose a redundant parameterization for the geometric deformations, which improves the convergence range of the algorithm. A foveated image representation provides extra computational savings and attenuation of background effects. A proper choice of motion models and a hierarchical organization of the iterations provide additional robustness. We present a fully integrated system with real-time performance and robustness to moderate deviations from the assumed deformation models.

3.
Log-polar imaging comprises a family of methods that represent visual information with a space-variant resolution inspired by the visual system of mammals. It has been studied for about three decades and has surpassed conventional approaches in robotics applications, mainly the ones where real-time constraints make it necessary to utilize resource-economic image representations and processing methodologies. This paper surveys the application of log-polar imaging in robotic vision, particularly in visual attention, target tracking, egomotion estimation, and 3D perception. The concise yet comprehensive review offered in this paper is intended to provide novice and experienced roboticists with a quick and gentle overview of log-polar vision and to motivate vision researchers to investigate the many open problems that still need solving. To help readers identify promising research directions, a possible research agenda is outlined. Finally, since log-polar vision is not restricted to robotics, a couple of other areas of application are discussed.
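
As a rough illustration of the space-variant sampling mentioned in this abstract, the textbook log-polar coordinate mapping can be sketched as follows. This is a minimal sketch and not code from the surveyed paper; the function name `to_log_polar` and the parameters `cx`, `cy`, and `r_min` are hypothetical choices made for this example.

```python
import math


def to_log_polar(x, y, cx=0.0, cy=0.0, r_min=1.0):
    """Map a Cartesian point to log-polar coordinates (rho, theta).

    (cx, cy) is the assumed fixation point (fovea centre) and r_min is an
    assumed innermost radius; both are illustrative parameters.
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    # Radii below r_min are clamped so the logarithm stays defined.
    rho = math.log(max(r, r_min) / r_min)
    theta = math.atan2(dy, dx)
    return rho, theta
```

Because rho grows with the logarithm of the radius, each doubling of the distance from the fixation point adds the same increment (log 2) to rho, so equal steps in rho cover ever larger regions of the periphery. That is the sense in which the resolution is space-variant.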

4.
We explore an important phase of information systems design (ISD), namely task redesign, and especially how different viewpoints enter into the discussions. We study how one particular visual representation, a process diagram, is interpreted and how alternative, even competing, representations are produced verbally. To tie the visual and verbal representations and the representational practices to wider social practices, we develop and use the Extended Three-dimensional Model of discourse. Visual representations emerged as focal in bringing in the different viewpoints and as reference points for discussions. Our model provided a focused and powerful means to unveil for the outside researchers how the planned changes in tasks and authority relationships instigated a social struggle. The IS designer was an outsider to the client organization and therefore considered only the information system, not the social system in which it was intended to operate. Other participants did not recognize this and therefore saw the designer as furthering managerial interests. Seeing task redesign in the social context of a client organization can help IS designers and researchers to understand what the users see naturally, that is, the ISD as a dynamic, enabling but socially constrained process where different viewpoints are represented.

5.
We report on an investigation into people’s behaviors on information search tasks, specifically the relation between eye movement patterns and task characteristics. We conducted two independent user studies (n = 32 and n = 40), one with journalism tasks and the other with genomics tasks. The tasks were constructed to represent information needs of these two different user groups and to vary in several dimensions according to a task classification scheme. For each participant we classified eye gaze data to construct models of their reading patterns. The reading models were analyzed with respect to the effect of task types and Web page types on reading eye movement patterns. We report on relationships between tasks and individual reading behaviors at the task and page level. Specifically, we show that transitions between scanning and reading behavior in eye movement patterns and the amount of text processed may be implicit indicators of the current task type facets. This may help in building user and task models for the personalization of information systems and so address design demands driven by increasingly complex user actions with information systems. One of the contributions of this research is a new methodology to model information search behavior and investigate information acquisition and cognitive processing in interactive information tasks.

6.
Automatic view selection through depth-based view stability analysis   Total citations: 3, self-citations: 0, citations by others: 3

7.
Xiao Shaoning, Li Yimeng, Ye Yunan, Chen Long, Pu Shiliang, Zhao Zhou, Shao Jian, Xiao Jun 《Neural Processing Letters》2020,52(2):993-1003

This work aims to address the problem of video question answering (VideoQA) with a novel model and a new open-ended VideoQA dataset. VideoQA is a challenging field in visual information retrieval, which aims to generate the answer according to the video content and question. Ultimately, VideoQA is a video understanding task. Efficiently combining the multi-grained representations is the key factor in understanding a video. The existing works mostly focus on overall frame-level visual understanding to tackle the problem, which neglects finer-grained and temporal information inside the video, or just combine the multi-grained representations simply by concatenation or addition. Thus, we propose the multi-granularity temporal attention network, which enables searching for the specific frames in a video that are holistically and locally related to the answer. We first learn the mutual attention representations of multi-grained visual content and question. Then the mutually attended features are combined hierarchically using a double layer LSTM to generate the answer. Furthermore, we illustrate several different multi-grained fusion configurations to prove the advancement of this hierarchical architecture. The effectiveness of our model is demonstrated on the large-scale video question answering dataset based on the ActivityNet dataset.

8.
There are many interaction tasks a user may wish to accomplish in an immersive virtual environment. A careful examination of these tasks reveals that they are often performed under different contexts. For each task and context, specialized interaction techniques can be developed. We present the context-driven interaction model: a design pattern that represents contextual information as a first-class, quantifiable component within a user interface and supports the development of context-sensitive applications by decoupling context recognition, context representation, and interaction technique development. As a primary contribution, this model provides an enumeration of important representations of contextual information gathered from across the literature and describes how these representations can affect the selection of an appropriate interaction technique. We also identify how several popular 3D interaction techniques adhere to this design pattern and describe how the pattern itself can lead to a more focused development of effective interfaces. We have constructed a formalized programming toolkit and runtime system that serves as a reference implementation of the context-driven model, and we discuss how the toolkit can be used to implement a collection of representative 3D interaction interfaces.

9.
Several augmented reality systems have been proposed for different target fields such as medical, cultural heritage and military. However, most of the current AR authoring tools are actually programming interfaces that are exclusively suitable for programmers. In this paper, we propose an AR authoring tool which provides advanced visual effects, such as occlusion or media content. This tool allows non-programming users to develop low-cost AR applications, especially oriented to on-site assembly and maintenance/repair tasks. A new 3D editing interface is proposed, using photos and Kinect depth information to improve 3D scene composition. In order to validate our AR authoring tool, two evaluations have been performed, to test the authoring process and the task execution using AR. The evaluation results show that overlaying 3D instructions on the actual work pieces reduces the error rate for an assembly task by more than 75%, particularly diminishing cumulative errors common in sequential procedures. The results also show that the proposed editing interface improves the 3D authoring process, making it possible to create more accurate AR scenarios 70% faster.

10.
In this paper, we explore how visual representations of information in 3D virtual environments (3DVEs) support both individual and shared understanding, and consequently contribute to group decision making in tasks with a strong visual component. We integrate insights from cognitive fit theory and cognitive load theory in order to formulate hypotheses about how 3DVEs can contribute to individual understanding, shared understanding, and group decision making. We discuss the results of an experiment in which 192 participants, in 3-person teams, were asked to select an apartment. As proposed by cognitive fit theory, our results indicate that 3DVEs are indeed more effective in supporting individual understanding than 2D information presentations. Next, in line with cognitive load theory, the static presentation of 3D information turns out to be more effective in supporting shared understanding and group decision making than an immersive 3DVE. Our results suggest that although the 3DVE capabilities of realism, immersion and interactivity contribute to individual understanding, these capabilities, combined with the interaction and negotiation processes required for reaching a shared understanding (and group decision), increase cognitive load and make group processes inefficient. The implications of this paper for research and practice are discussed.

11.
We propose a novel visualization technique for graphs that are attributed with scalar data. In many scenarios, these attributes (e.g., birth date in a family network) provide ambient context information for the graph structure, whose consideration is important for different visual graph analysis tasks. Graph attributes are usually conveyed using different visual representations (e.g., color, size, shape) or by reordering the graph structure according to the attribute domain (e.g., timelines). While visual encodings allow graphs to be arranged in a readable layout, assessing contextual information such as the relative similarities of attributes across the graph is often cumbersome. In contrast, attribute-based graph reordering serves the comparison task of attributes, but typically strongly impairs the readability of the structural information given by the graph's topology. In this work, we augment force-directed node-link diagrams with a continuous ambient representation of the attribute context. This way, we provide a consistent overview of the graph's topological structure as well as its attributes, supporting a wide range of graph-related analysis tasks. We resort to an intuitive height field metaphor, illustrated by a topographic map rendering using contour lines and suitable color maps. Contour lines visually connect nodes of similar attribute values, and depict their relative arrangement within the global context. Moreover, our contextual representation supports visualizing attribute value ranges associated with graph nodes (e.g., lifespans in a family network) as trajectories routed through this height field. We discuss how user interaction with both the structural and the contextual information fosters exploratory graph analysis tasks. The effectiveness and versatility of our technique are confirmed in a user study and case studies from various application domains.

12.
《Ergonomics》2012,55(6):1184-1198
This study investigated the use of visual mediators to facilitate information access by low spatial individuals. Based on theories of adaptive learning and field-dependence, two human-computer interfaces were developed which were intended to compensate for the inability of low spatial individuals to readily construct visual mental models of a menu system's structure. The two compensatory interfaces included: a 2D visual hierarchy and a linear structure. The information search performance of high and low spatial individuals was compared on the two compensatory interfaces and a third challenge match interface, which challenged individuals to construct a mental model of a hierarchical menu system in order to perform efficiently. The visual mediators were successful in accommodating low spatial individuals, as indicated by the lack of any significant performance differences being detected between the high and low spatial groups on the two compensatory interfaces. High spatial individuals outperformed low spatial individuals only when information search tasks required the use of spatial ability in mentally constructing a model of the organization and structure of embedded task information. The key factor in the accommodation process was the elimination of the need to mentally visualize the structure of embedded task information. These results indicate that visualization techniques can be successfully used to enhance the information search performance of low spatial individuals.

13.
In this study, we explored how stereoscopic depth affects performance and user experience in a mobile device with an autostereoscopic touch display. Participants conducted a visual search task with an image gallery application on three layouts with different depth ranges. The task completion times were recorded, and the participants were asked to rate their experiences. The results revealed that the image search times were facilitated by a mild depth effect and that too great a depth slowed search times and decreased user-experience ratings.

14.
In joint tasks, adjusting to the actions of others is critical for success. For joint visual search tasks, research has shown that when search partners visually receive information about each other’s gaze, they use this information to adjust to each other’s actions, resulting in faster search performance. The present study used a visual, a tactile and an auditory display, respectively, to provide search partners with information about each other’s gaze. Results showed that search partners performed faster when the gaze information was received via a tactile or auditory display in comparison to receiving it via a visual display or receiving no gaze information. Findings demonstrate the effectiveness of tactile and auditory displays for receiving task-relevant information in joint tasks and are applicable to circumstances in which little or no visual information is available or the visual modality is already taxed with a demanding task such as air-traffic control. Practitioner Summary: The present study demonstrates that tactile and auditory displays are effective for receiving information about actions of others in joint tasks. Findings are applicable either to circumstances in which little or no visual information is available or to those in which the visual modality is already taxed with a demanding task.

15.
This study investigates the effects of performance and communication within audio-visual (shared representations) and audio-only conditions. Two three-dimensional (3D) representations were presented in each communication condition. The goal of the study was to examine both explicit and implicit references made during verbal interactions, and to gather subjective usability evaluations of each representation. Sixty dyads performed a series of problem solving tasks in three experimental conditions: mixed, 3D cylinder and 3D helix representations. Assessment measures included overall performance time and accuracy, and user attitudes pertaining to the usability of the displays. Although no differences in task performance were observed, qualitative measures revealed differences between representation and communication groups. User preferences for 3D cylinder and 3D helix representations were observed, with disparate strategies being adopted between groups. In general, the analyses indicated that the presence of shared visual information enhances collaborative problem solving.

16.
Foveated video quality assessment   Total citations: 2, self-citations: 0, citations by others: 2
Most image and video compression algorithms that have been proposed to improve picture quality relative to compression efficiency have either been designed based on objective criteria such as signal-to-noise-ratio (SNR) or have been evaluated, post-design, against competing methods using an objective sample measure. However, existing quantitative design criteria and numerical measurements of image and video quality both fail to adequately capture those attributes deemed important by the human visual system, except, perhaps, at very low error rates. We present a framework for assessing the quality of and determining the efficiency of foveated and compressed images and video streams. Image foveation is a process of nonuniform sampling that accords with the acquisition of visual information at the human retina. Foveated image/video compression algorithms seek to exploit this reduction of sensed information by nonuniformly reducing the resolution of the visual data. We develop unique algorithms for assessing the quality of foveated image/video data using a model of human visual response. We demonstrate these concepts on foveated, compressed video streams using modified (foveated) versions of H.263 that are standard-compliant. We find that quality vs. compression is enhanced considerably by the foveation approach.

17.
We report an investigation into the processes involved in a common graph-reading task using two types of Cartesian graph. We describe an experiment and eye movement study, the results of which show that optimal scan paths assumed in the task analysis approximate the detailed sequences of saccades made by individuals. The research demonstrates the computational inequivalence of two sets of informationally equivalent graphs and illustrates how the computational advantages of a representation outweigh factors such as user unfamiliarity. We describe two models, using the ACT rational perceptual motor (ACT-R/PM) cognitive architecture, that replicate the pattern of observed response latencies and the complex scan paths revealed by the eye movement study. Finally, we outline three guidelines for designers of visual displays: Designers should (a) consider how different quantities are encoded within any chosen representational format, (b) consider the full range of alternative varieties of a given task, and (c) balance the cost of familiarization with the computational advantages of less familiar representations. Actual or potential applications of this research include informing the design and selection of appropriate visual displays and illustrating the practice and utility of task analysis, eye tracking, and cognitive modeling for understanding interactive tasks with external representations.

18.
Collaborative virtual environments (CVEs) are 3D spaces in which users share virtual objects, communicate, and work together. To collaborate efficiently, users must develop a common representation of their shared virtual space. In this work, we investigated spatial communication in virtual environments. In order to perform an object co-manipulation task, the users must be able to communicate and exchange spatial information, such as object position, in a virtual environment. We conducted an experiment in which we manipulated the contents of the shared virtual space to understand how users verbally construct a common spatial representation of their environment. Forty-four students participated in the experiment to assess the influence of contextual objects on spatial communication and sharing of viewpoints. The participants were asked to perform an object co-manipulation task in dyads. The results show that the presence of contextual objects, such as fixed and lateralized visual landmarks, in the virtual environment positively influences the way male operators collaborate to perform this task. These results allow us to provide some design recommendations for CVEs for object manipulation tasks.

19.
Virtual 3D city models serve as integration platforms for complex geospatial and georeferenced information and as a medium for effective communication of spatial information. In order to explore these information spaces, navigation techniques for controlling the virtual camera are required to facilitate wayfinding and movement. However, navigation is not a trivial task and many available navigation techniques do not support users effectively and efficiently with their respective skills and tasks. In this article, we present an assisting, constrained navigation technique for multiscale virtual 3D city models that is based on three basic principles: users point to navigate, users are led by suggestions, and the exploitation of semantic, multiscale, hierarchical structurings of city models. The technique particularly supports users with low navigation and virtual camera control skills but is also valuable for experienced users. It supports exploration, search, inspection, and presentation tasks, is easy to learn and use, supports orientation, is efficient, and yields effective view properties. In particular, the technique is suitable for interactive kiosks and mobile devices with a touch display and low computing resources and for use in mobile situations where users only have restricted resources for operating the application. We demonstrate the validity of the proposed navigation technique by presenting an implementation and evaluation results. The implementation is based on service-oriented architectures, standards, and image-based representations and allows exploring massive virtual 3D city models particularly on mobile devices with limited computing resources. Results of a user study comparing the proposed navigation technique with standard techniques suggest that the proposed technique provides the targeted properties, and that it is more advantageous to novice than to expert users.

20.
Keil MS 《Neural computation》2006,18(4):871-903
Recent evidence suggests that the primate visual system generates representations for object surfaces (where we consider representations for the surface attribute brightness). Object recognition can be expected to perform robustly if those representations are invariant despite environmental changes (e.g., in illumination). In real-world scenes, it happens, however, that surfaces are often overlaid by luminance gradients, which we define as smooth variations in intensity. Luminance gradients encode highly variable information, which may represent surface properties (curvature), nonsurface properties (e.g., specular highlights, cast shadows, illumination inhomogeneities), or information about depth relationships (cast shadows, blur). We argue, on grounds of the unpredictable nature of luminance gradients, that the visual system should establish corresponding representations, in addition to surface representations. We accordingly present a neuronal architecture, the so-called gradient system, which clarifies how spatially accurate gradient representations can be obtained by relying on only high-resolution retinal responses. Although the gradient system was designed and optimized for segregating, and generating, representations of luminance gradients with real-world luminance images, it is capable of quantitatively predicting psychophysical data on both Mach bands and Chevreul's illusion. It furthermore accounts qualitatively for a modified Ehrenstein disk.
