Similar literature
 20 similar documents found (search time: 31 ms)
1.
One of the major barriers to the social inclusion of blind persons is their limited access to graphics-based learning resources, which are highly vision oriented. This paper presents a cost-effective tool that facilitates comprehension and creation of virtual directed graphs, such as flowcharts, using the alternate modalities of audio and touch. It provides a physically accessible virtual spatial workspace and a multimodal interface to represent directed graphs non-visually in an interactive manner. The concept of spatial query is used to aid exploration and mental visualization through audio and tactile feedback. A unique aspect of the tool, named DiGVis, is that it offers compatible representations of directed graphs for sighted and non-sighted persons. A study with 28 visually challenged subjects indicates that the tool facilitates comprehension of the layout and directional connectivity of elements in a virtual diagram. Further, in a pilot study, blind persons could independently comprehend a virtual flowchart layout and its logical steps. They were also able to create the flowchart data without sighted assistance using DiGVis. A comparison with sighted subjects using DiGVis for a similar task demonstrates the effectiveness of the technique for inclusive education.

2.
This paper introduces a novel interface designed to help blind and visually impaired people explore and navigate the Web. In contrast to traditional assistive tools, such as screen readers and magnifiers, the new interface combines audio and haptic features to provide spatial and navigational information to users. The haptic features are presented via a low-cost force feedback mouse, allowing blind people to interact with the Web in a similar fashion to their sighted counterparts. The audio provides navigational and textual information through non-speech sounds and synthesised speech. Interacting with the multimodal interface offers a novel experience to target users, especially those with total blindness. A series of experiments has been conducted to ascertain the usability of the interface and compare its performance to that of a traditional screen reader. Results show the advantages that the new multimodal interface offers blind and visually impaired people. These include enhanced perception of the spatial layout of Web pages and easier navigation towards elements on a page. Certain issues regarding the design of the haptic and audio features that were raised in the evaluation are discussed and presented as recommendations for future work.

3.
Automatic detection of (semantically) meaningful audio segments, or audio scenes, is an important step in high-level semantic inference from general audio signals, and can benefit various content-based applications involving both audio and multimodal (multimedia) data sets. Motivated by the known limitations of traditional low-level feature-based approaches, we propose in this paper a novel approach to discovering audio scenes, based on an analysis of audio elements and key audio elements, which can be seen as equivalents of the words and keywords in a text document, respectively. In the proposed approach, an audio track is seen as a sequence of audio elements, and the presence of an audio scene boundary at a given time stamp is checked by pairwise measurement of the semantic affinity between different parts of the analyzed audio stream surrounding that time stamp. Our proposed model for semantic affinity exploits proven concepts from text document analysis, and is introduced here as a function of the distance between the audio parts considered and of the co-occurrence statistics and importance weights of the audio elements they contain. Experimental evaluation performed on a representative data set consisting of 5 h of diverse audio data streams indicated that the proposed approach is more effective than the traditional low-level feature-based approaches in solving the posed audio scene segmentation problem.

4.
Neuro-cognitively inspired haptic user interfaces   (cited by: 1; self-citations: 1; citations by others: 0)
Haptic systems and devices are a recent addition to multimodal systems. These devices have widespread applications, such as surgical simulation, medical and procedural training, scientific visualization, assistive and rehabilitative devices for individuals who have physical or neurological impediments, and assistive devices for individuals who are blind. While the potential of haptics in natural human-machine interaction is indisputable, its realization is still a long way off. There are considerable research challenges in the development of natural haptic interfaces. The study of human tactile abilities is a recent endeavor, and many of the available systems still do not incorporate domain knowledge of the psychophysical, biomechanical, and neurological elements of haptic perception. Developing smart and effective haptic interfaces and devices requires extensive studies that link perceptual phenomena with measurable parameters, and the incorporation of such domain knowledge in the engineering of haptic interfaces. This paper presents the design, development, and usability testing of a neuro-cognitively inspired haptic user interface for individuals who are blind. The proposed system design is inspired by the neuro-cognitive basis of haptic perception and incorporates the computational aspects and requirements of a multimodal information processing system. Usability testing of the system suggests that biologically inspired haptic user interfaces may form a powerful paradigm for haptic user interface design.
Sethuraman Panchanathan

5.
We investigate data parallel techniques for belief propagation in acyclic factor graphs on multi-core systems. Belief propagation is a key inference algorithm in factor graphs, a class of probabilistic graphical models that has found applications in many domains. In this paper, we explore data parallelism for the basic operations over potential tables in belief propagation. Data parallel techniques for these table operations are developed for shared memory platforms. We then propose a complete belief propagation algorithm that uses these table operations to perform exact inference in factor graphs. The proposed algorithms are implemented on state-of-the-art multi-socket multi-core systems with additional NUMA-aware optimizations, and they exhibit good scalability on a representative set of factor graphs. On a four-socket Intel Westmere-EX system with 40 cores, we achieve a 39.5$\times$ speedup for the table operations and a 39$\times$ speedup for the complete algorithm using factor graphs with large potential tables.
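Note: the abstract above does not list its table operations, so the following is only a minimal sketch of one plausible operation, marginalizing a potential table, split into chunks that are reduced independently and then merged. The table layout (a dict from assignment tuples to values), the function names, and the thread pool are all assumptions made for illustration, not the paper's implementation; in CPython, threads illustrate the decomposition but will not yield the reported speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def marginalize_chunk(entries, keep_idx):
    """Sum the entries of one chunk onto the variables listed in keep_idx."""
    partial = {}
    for assignment, value in entries:
        key = tuple(assignment[i] for i in keep_idx)
        partial[key] = partial.get(key, 0.0) + value
    return partial


def marginalize(table, keep_idx, workers=4):
    """Marginalize a potential table (assignment tuple -> value) chunk by chunk."""
    entries = list(table.items())
    size = max(1, (len(entries) + workers - 1) // workers)
    chunks = [entries[i:i + size] for i in range(0, len(entries), size)]
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is reduced independently; partial sums are merged afterwards.
        for partial in pool.map(marginalize_chunk, chunks, [keep_idx] * len(chunks)):
            for key, value in partial.items():
                result[key] = result.get(key, 0.0) + value
    return result


# Hypothetical toy potential over two binary variables (A, B).
table = {(a, b): float(1 + a + 2 * b) for a, b in product((0, 1), repeat=2)}
marginal_a = marginalize(table, keep_idx=(0,))  # sum out B, keep A
```

The chunk-then-merge shape is what makes the operation data parallel: no chunk reads another chunk's entries, so only the final merge needs coordination.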

6.
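Video captioning for urban road scenes tends to consider only visual information while ignoring equally important audio information, and multimodal fusion algorithms are one way to address this problem. Existing Transformer-based multimodal fusion algorithms all suffer from poor fusion performance between modalities and high computational complexity. To improve the interaction between multimodal information, a new Transformer-based video captioning model is proposed: multimodal attention bottleneck for video captioning (MABVC). First, pre-trained I3D and VGGish networks extract the visual and audio features of the video, and the extracted features are fed into the Transformer model. Next, the decoder trains on the information of the two modalities separately before performing multimodal fusion. Finally, the decoder output is processed to generate a human-readable text description. Comparative experiments were carried out on the public datasets MSR-VTT and MSVD and on the self-built dataset BUUISE, and the model was validated using evaluation metrics. The results show that the video captioning model based on multimodal attention fusion improves noticeably on every metric. The model also achieves good results on the traffic-scene dataset and has considerable application prospects in the intelligent-driving industry.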

7.
8.
In this paper, a new perceptual spread spectrum audio watermarking scheme is discussed. The watermark embedding process is performed in the Empirical Mode Decomposition (EMD) domain, and the hybrid watermark extraction process is based on the combination of EMD and ISA (Independent Subspace Analysis) techniques, followed by the generic detection system, i.e. inverse perceptual filter, predictor filter and correlation-based detector. Since the EMD decomposes the audio signal into several oscillating components, the intrinsic mode functions (IMFs), the watermark information can be inserted in more than one IMF using spread spectrum modulation, which increases the insertion capacity. The imperceptibility of the inserted data is ensured by the use of a psychoacoustic model. The blind extraction of the watermark signal from the received watermarked audio consists of separating the watermark from the IMFs of the received audio signal. The separation is achieved by a newly proposed under-determined ISA method, here referred to as UISA. The proposed hybrid watermarking system was applied to the SQAM (Sound Quality Assessment Material) audio database (available at http://sound.media.mit.edu/mpeg4/audio/sqam/) and proved to have efficient detection performance in terms of Bit Error Rate (BER) compared to a generic perceptual spread spectrum watermarking system. The perceptual quality of the watermarked audio was objectively assessed using the PEMO-Q (tool for objective perceptual assessment of audio quality) algorithm. Also, using our technique, we can extract the different watermarks without using any information about the original signal or the inserted watermark. Experimental results show that transparency and high robustness of the watermarked audio can be achieved simultaneously with a substantial increase in the amount of information transmitted.
A reliability of $1.8 \times 10^{-4}$ (against $1.5 \times 10^{-2}$ for the generic system), for a bit rate of 400 bits/s, can be achieved when the channel is not disturbed.

9.
Mobile E-Witness   (cited by: 1; self-citations: 0; citations by others: 1)
This paper describes the design, implementation and experimental evaluation of a system prototype, named Mobile E-Witness (MEW), which enables the acquisition and remote storage of multimedia (i.e., audio and video) data streams. In essence, MEW consists of a mobile device, incorporating a camera and a microphone, which can be “worn” (i.e., it can be carried without causing any impediment) by public officers, such as policemen and health care operators, in order to record the events these officers witness while on duty. MEW transmits the audio and video recordings it takes to a remote storage service, which maintains these recordings for future replay. Thus, for example, an event recording can be used as impartial testimony to resolve disputes concerning the relative responsibilities of those participating in the recorded event, including the officers themselves (hence the name Mobile E-Witness). The infrastructure MEW uses to communicate with the remote storage service consists of the wired and wireless communication infrastructures publicly available in metropolitan areas, including the Internet. MEW utilizes these infrastructures in order to (1) ensure that sufficient bandwidth for multimedia data transmission is available, (2) guarantee highly available communications, (3) limit the power consumption of the multimedia transmission and, finally, (4) limit the electromagnetic radiation emitted by the device worn by the public officers. We have carried out an experimental evaluation of a MEW prototype in the city of Bologna. The results of this evaluation, reported in this paper, confirm the potential of our system.
F. Panzieri

10.
We present an approach to connecting multiple remote environments over the Web for natural interaction among people and objects. The focus of current communication and telepresence systems severely restricts user affordances in terms of movement, interaction, peripheral vision, spatio-semantic integrity and even information flow. These systems allow information transfer rather than experiential interaction. We propose Environment-to-Environment (E2E) as a new paradigm for communication, which allows users to interact in a natural manner using text, audio, and video by connecting environments. Each environment is instrumented with as many different types of sensors as may be required to detect the presence and activity of objects. This object position and activity information is used by a scalable event-based multimodal information system called EventServer to share the appropriate experiential information with other environments, as well as to present incoming multimedia information on the right displays and speakers. This paper describes the design principles for E2E communication, discusses the system architecture, and reports our experience in implementing prototypes of such systems in telemedicine and office collaboration applications. We also discuss the research challenges and a road-map for creating more sophisticated E2E applications in the near future.
Vivek K. Singh

11.
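Question answering (QA) systems are one of the research directions with broad prospects in artificial intelligence and natural language processing. Early QA systems were restricted to questions and answers in natural language. In recent years, with the development of multimodal knowledge graphs and multimodal pre-trained models, generalized QA systems that support information queries across modalities such as text, images, audio, and video have gradually become a new research focus; they present results in multimedia form, which is more intuitive and comprehensive. Based on how the task objects of QA systems have changed, this paper divides QA systems into three types: specialized QA systems, general-purpose QA systems, and multimodal QA systems. It analyzes the problems faced during the development of these three types, focuses on summarizing the key technologies and methods used at each stage, illustrates industrial applications of QA systems with examples, and discusses future research directions.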

12.
This paper proposes an on-line audio watermarking system for broadcast monitoring. The designed watermark (WM) encoder is a nonlinear, data-adaptive system that performs perceptual embedding. It allows working at very low watermark-to-signal ratio (WSR) levels and thus preserves inaudibility. The developed decoder adopts wavelet de-noising for blind watermark extraction and is capable of decoding the watermark while establishing synchronization between the transmitter and the receiver. Unlike published watermarking schemes, the introduced WM embedding scheme is shown to minimize the false alarm ratio by adaptively controlling the WSR. Furthermore, it integrates synchronization and WM extraction into one processing step, resulting in an on-line decoding scheme suitable for audio broadcast monitoring. The proposed system is robust to digital-to-analog (D/A) and analog-to-digital (A/D) conversions, compression, and noise, as well as to the attenuation that arises from FM broadcasting and acoustic transmission. Performance under StirMark attacks is also reported. Error-free transmission at 24 bps is shown to be achieved at around WSR = −32 dB when the PSNR is around 50 dB. The granularity of the system is 0.6 s, so it is capable of tracking short audio clips such as commercials.
Bilge Gunsel

13.
The advent of electronic documents and the consequent creation of digital libraries (vast repositories of electronic information) have a profound impact on how we produce, organize, store, retrieve and consume information. Until now, all of these activities have been dictated by the technologies used to share information. A change in the underlying technology, namely the move from paper to electronic documents, offers a unique opportunity to revolutionize how information is archived and disseminated. This paper focuses on a specific aspect of the opportunities opened up by electronic publishing on the NII: the ability to present information in multiple modalities and thereby free it from any single presentation medium. Traditional printed communication relies on a passive intermediary, paper, for the exchange of information between the author and reader. Ideas put down on paper come back to life only when perused by the reader. Electronic publishing is mediated by a computer, an agent capable of processing the information. As a consequence, the ideas expressed by an author need no longer be bound to any single display form, nor is human intervention required to translate the information from one displayed form to another. Electronic information can be processed and displayed in the manner best suited to each individual's needs. Thus, the advent of electronic documents makes information available in more than its visual form; electronic information can now be display-independent. Traditionally, an electronic document has been viewed simply as a digital representation of (or the means of producing) the printed page. Instead, we view the electronic document as the basic entity that represents information; we allow the information to be rendered in different ways: on paper, spoken, processed in different ways by a computer, etc.
This change of viewpoint has allowed us to develop ASTER (Audio System For Technical Readings), a computing system that audio formats electronic documents to produce audio documents. ASTER can speak both literary texts and highly technical documents that contain complex mathematics. Moreover, the listener can ask to have parts of a document repeated in different ways: a document has many different spoken views. The adequacy of the audio rendering depends on how well the electronic document captures the essential internal structure of the information. In this paper, we discuss capturing structure and give guidelines for authors to follow to ensure that their documents exhibit structure adequately. In the context of the NII, the digital libraries of the future can be viewed as large information servers that allow multiple clients to access and display information in a format chosen by the user. By obviating the need to move physical media, e.g., printed paper or recorded tapes, the NII enables the ready dissemination of multimodal renderings of information.

14.
This paper proposes approaches to the minimized embedding of Hamiltonian graphs in an enveloping fault-tolerant graph that represents the structural model of a fault-tolerant multiprocessor computer system. Failures are regarded as faults of vertices and/or of connections between the graph vertices. The mathematical study relies on a group-theoretical analysis of the characteristics of the system structure. This analysis underlies the proposed approach to designing one-fault-tolerant and k-fault-tolerant structures that, after reconfiguration, retain the logical structure of the original target graph and, therefore, the compiled code of the system tasks. Minimum fault-tolerant solutions were obtained for one-fault-tolerant and k-fault-tolerant cycles, simple and diagonal grids, and other popular structures, including arbitrary Hamiltonian graphs, for which the solutions are of a minimized nature. Consideration was given to algorithms for reconfiguration after arbitrary single and multiple faults. Restoration after faults is very simple; it is based on small tables of the group of system automorphisms, which enable correct restoration of the system at the level of theorems, without either static or dynamic additional verification of the reconfiguration process.

15.
The Chinese information processing system (CIPS) introduced in this paper can produce graphs, tables, flowcharts, mathematical equations, and forms, and also provides typesetting facilities. The system can process not only Chinese text but also English text or a mixture of the two. It is written in C and runs on a VAX Ⅱ/780 under the Unix operating system. CIPS is very easy to use and provides user-defined macros, which allow abbreviations of commonly used Chinese phrases and reduce the complexity of Chinese character coding.

16.
Large displays have become ubiquitous in our everyday lives, but these displays are designed for sighted people. This paper addresses the need for visually impaired people to access targets on large wall-mounted displays. We developed an assistive interface that exploits mid-air gesture input and haptic feedback, and examined its potential for pointing and steering tasks in human-computer interaction (HCI). In two experiments, blind and blindfolded users performed target acquisition tasks using mid-air gestures and two different kinds of feedback (i.e., haptic feedback and audio feedback). Our results show that participants perform faster in Fitts' law pointing tasks using the haptic feedback interface than using the audio feedback interface. Furthermore, a regression analysis between movement time (MT) and the index of difficulty (ID) demonstrates that the Fitts' law model and the steering law model are both effective for the evaluation of assistive interfaces for the blind. Our work and findings will serve as an initial step towards helping visually impaired people easily access required information on large public displays using haptic interfaces.
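Note: the regression between MT and ID mentioned above can be sketched as follows. This is a minimal illustration that assumes the Shannon formulation ID = log2(D/W + 1) and an ordinary least-squares fit of MT = a + b * ID; the abstract does not say which formulation the authors used, and the trial values below are made up for the example, not taken from the study.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits (an assumed choice)."""
    return math.log2(distance / width + 1)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up trials: (target distance, target width, movement time in seconds).
trials = [(100, 100, 0.5), (300, 100, 0.8), (700, 100, 1.1)]
ids = [index_of_difficulty(d, w) for d, w, _ in trials]
times = [mt for _, _, mt in trials]
a, b = fit_line(ids, times)  # intercept (s) and slope (s per bit)
```

The slope b is the quantity usually compared across feedback conditions: a smaller slope means movement time grows more slowly as targets become harder to acquire.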

17.
Speaker diarization aims to automatically answer the question “who spoke when” given a speech signal. In this work, we have focused on applying the FLsD approach, a semi-supervised version of Fisher Linear Discriminant analysis, to both the audio and the video signals to form a complete multimodal speaker diarization system. Extensive experiments have shown that the FLsD method boosts the performance of the face diarization task (i.e. the task of discovering faces over time given only the visual signal). In addition, we have shown through experimentation that applying the FLsD method to discriminate between faces is independent of the initial feature space and remains relatively unaffected as the number of faces increases. Finally, a fusion method is proposed that improves performance in comparison to the best individual modality, which is the audio signal.

18.
Computer-based logic proofs are a form of unnatural language in which the process and structure of proof generation can be observed in considerable detail. We have been studying how students respond to multimodal logic teaching, and performance measures have already indicated that students' pre-existing cognitive styles have a significant impact on teaching outcomes. Furthermore, a large corpus of proofs has been gathered via automatic logging of proof development. This paper applies a series of techniques, including corpus statistical methods, to the proof logs. The results indicate that students' cognitive styles influence the structure of their logical discourse, via their differing methods of handling abstract information in diagrams and of transferring information between modalities.

19.
One of the biggest challenges in data embedding is that the confidential data must be “transparent” after being embedded into the audio signal. Embedding methods must therefore reduce the influence of the embedded data on the original audio signal. In this paper, the multiple bit marking layers (MBML) method is proposed to fulfill this requirement. This method reuses the result of the previous embedding pass (layer) as the input data to be embedded into the audio signal in the next layer. The quality of the proposed method is evaluated through embedding error (EE), signal-to-noise ratio (SNR), embedded capacity (EC), and contribution error (CE). Experimental results show that the proposed method provides better EE and SNR than other embedding methods such as LSB (Least Significant Bit), ELS (Embedding Large Sample), BM (Bit Marking), and the BM/SW (Sliding Window) method with a single layer.
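Note: to make the comparison baseline concrete, here is a minimal sketch of plain LSB embedding together with an SNR measurement. It is not the MBML method, whose details the abstract does not give; the integer sample values, the function names, and the SNR definition (signal energy over embedding-noise energy, in dB) are assumptions chosen for illustration.

```python
import math


def embed_lsb(samples, bits):
    """Overwrite the least significant bit of the first len(bits) integer samples."""
    marked = list(samples)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(samples, count):
    """Read back the least significant bit of the first `count` samples."""
    return [s & 1 for s in samples[:count]]


def snr_db(original, marked):
    """Signal-to-noise ratio in dB, treating the embedding change as noise."""
    signal = sum(s * s for s in original)
    noise = sum((s - m) ** 2 for s, m in zip(original, marked))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


# Hypothetical 16-bit-style PCM samples and a 4-bit payload.
samples = [1000, -2001, 3002, -4003]
payload = [1, 0, 1, 1]
marked = embed_lsb(samples, payload)
recovered = extract_lsb(marked, len(payload))
quality = snr_db(samples, marked)
```

Because each sample changes by at most one quantization step, the SNR of LSB embedding is high for loud signals, which is why it is a common reference point for embedding error.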

20.
Current distributed and multi-database systems are designed to allow timely and reliable access to large amounts of data distributed at different locations. Changes in current technology now allow users to access this data via a wide variety of devices through diverse communication media. A mobile data access system is an environment in which a wireless-mobile computing environment is superimposed upon a multi-database environment in order to realize anywhere, anytime access capability. As a potentially large number of users may simultaneously access the available data, there are several issues involved in the ability to concurrently manage transactions. Current multi-database concurrency control schemes do not manage these accesses efficiently because they do not address the limited bandwidth and frequent disconnections associated with wireless networks. This paper first introduces the so-called mobile data access system (MDAS) and then proposes a new hierarchical concurrency control algorithm. The proposed concurrency control algorithm, v-lock, uses global locking tables created with semantic information contained within the hierarchy. The locking tables are subsequently used to serialize global transactions and to detect and remove global deadlocks. The performance of the new algorithm is simulated and the results are presented. In addition, the performance of the proposed algorithm has been compared (through simulation) against the site graph method, the potential conflict graph method, and the forced conflict method.
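Note: the abstract does not describe how v-lock detects global deadlocks, so the sketch below shows only the generic textbook idea of finding a cycle in a wait-for graph built from lock information. The graph representation (a dict mapping each transaction to the transactions it waits for) and the transaction names are hypothetical.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a wait-for cycle, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for blocker in wait_for.get(txn, ()):
            state = color.get(blocker, WHITE)
            if state == GREY:
                # The blocker is already on the current path: a cycle exists.
                return path[path.index(blocker):]
            if state == WHITE:
                cycle = visit(blocker)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(wait_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


# Hypothetical global transactions: T1 waits for T2, T2 for T3, T3 for T1.
waits = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T1"]}
deadlocked = find_deadlock(waits)
```

Removing a deadlock then amounts to aborting one transaction on the returned cycle; which victim to pick is a policy decision that this sketch leaves open.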
