Similar literature
20 similar articles found (search time: 15 ms)
《Ergonomics》2012,55(11):1943-1957
Abstract

Errors, whether created by the user, the recognizer, or inadequate systems design, are an important consideration in the more widespread and successful use of automatic speech recognition (ASR). An experiment is described in which recognition errors are studied under different types of feedback. Subjects entered data verbally to a microcomputer according to four experimental conditions: namely, orthogonal combinations of spoken and visual feedback presented concurrently or terminally after six items. Although no significant differences in terms of error rates or speed of data entry were shown across the conditions, analysis of the time penalty for error correction indicated that, as a general rule, there is a small timing advantage for terminal feedback when the error rate is low. It was found that subjects do not monitor visual feedback with the same degree of accuracy as spoken feedback, since a larger number of incorrect data-entry strings were confirmed as correct. Further evidence for the use of 'second best' recognition data is given, since correct recognition on re-entry could be increased from 83.0% to 92.4% when the first-choice recognition was deleted from the second attempt. Finally, the implications for error correction protocols in system design are discussed.
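The abstract's 'second best' strategy (excluding the already-rejected first-choice hypothesis when the user re-enters a misrecognised item) can be sketched as below. This is a minimal illustration, not the paper's implementation; the n-best list format and function name are assumptions.

```python
def pick_hypothesis(nbest, rejected=None):
    """Return the top-ranked hypothesis from a recogniser's n-best list,
    skipping any hypotheses the user has already rejected.

    nbest    -- list of (word, score) pairs, best first
    rejected -- set of words already rejected by the user
    """
    rejected = rejected or set()
    for word, score in nbest:
        if word not in rejected:
            return word
    return None  # no viable candidate left

# First attempt: the recogniser's top choice is wrong.
nbest = [("five", 0.61), ("nine", 0.30), ("fine", 0.09)]
first = pick_hypothesis(nbest)            # user rejects "five"
# On re-entry, the rejected first choice is excluded, so the
# second-best hypothesis is offered instead.
second = pick_hypothesis(nbest, {first})
```

Falling back to the second-best hypothesis is what raised correct recognition on re-entry from 83.0% to 92.4% in the study.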

3.
Editorial     
《Ergonomics》2012,55(7):829-832
The UK health service, which had been diagnosed as seriously out of step with good design practice, has been advised to draw on design and risk management practice from other safety-critical industries. While those industries have benefited from a broad range of systems modelling approaches, healthcare remains a long way behind. To investigate the healthcare-specific applicability of systems modelling approaches, this study identified 10 distinct methods through meta-model analysis. Healthcare workers' perceptions of 'ease of use' and 'usefulness' were then evaluated.

The characterisation of the systems modelling methods showed that each method had particular capabilities to describe specific aspects of a complex system. However, the healthcare workers found that some of the methods, although potentially very useful, would be difficult to understand, particularly without prior experience. This study provides valuable insights into better use of systems modelling methods in healthcare.

Statement of Relevance: The findings of this study provide insights into how to make better use of various systems modelling approaches in the design and risk management of healthcare delivery systems, which have been a growing research interest among ergonomists and human factors professionals.

4.
《Ergonomics》2012,55(8):759-769
Two studies are described in this paper with the aim of assessing the degree to which the instructions given to a subject, during an experiment designed to investigate human reaction to vibration, affect the equal sensation contour which is produced.

In the first study, 100 subjects produced equal sensation contours by equating pairs of vibration stimuli. After each pair, subjects were required to record the basis on which they had made their judgements. The results demonstrated that subjects differ in the concepts which they use to equate vibration stimuli, although the majority equate in terms of the degree to which parts of the body are shaken.

In the second study, 48 subjects were required to produce equal sensation contours using the terms of either ‘comfort’ or ‘discomfort’ or ‘body shake’ or ‘sensation’. The overall contour shapes produced by the four instruction groups were not significantly different, although the frequency ranges of maximum vibration sensitivity were shown to be significantly different.

Implications of these findings are discussed.

5.
A consistent finding reported in online privacy research is that an overwhelming majority of people are ‘concerned’ about their privacy when they use the Internet. Therefore, it is important to understand the discourse of Internet users’ privacy concerns, and any actions they take to guard against these concerns. A Dynamic Interviewing Programme (DIP) was employed in order to survey users of an instant messaging ICQ (‘I seek you’) client using both closed and open question formats. Analysis of 530 respondents’ data illustrates the importance of establishing users’ privacy concerns and the reasoning behind these concerns. Results indicate that Internet users are concerned about a wider range of privacy issues than surveys have typically covered. The results do not provide final definitions for the areas of online privacy, but provide information that is useful to gain a better understanding of privacy concerns and actions.

6.
Abstract

There has been much research into the feasibility of speech in aircraft cockpits, but little in human supervisory control tasks. Speech displays can provide a number of benefits over conventional, visual displays, particularly as a means of providing alarm information. We discuss the term ‘alarm’, and suggest that different alarm situations will have different information requirements. Thus, a single type of alarm display may not be suitable for the complete range of situations encountered in the control room. We investigated the use of speech for different ‘alarm-initiated actions’: recording, urgency rating, location identification, and action specification. These tasks varied in terms of difficulty, and this affected performance. We also varied the quality of speech, comparing synthesized with human speech. While speech quality affected performance on the recording task, we found that task difficulty interacted with speech quality on the other tasks. This means that definable ‘trade-offs’ exist between the use of speech and the situation in which it is to be used.

7.
Speech recognition technology continues to improve, but users still experience significant difficulty using the software to create and edit documents. The reported composition speed using speech software is only between 8 and 15 words per minute [Proc CHI 99 (1999) 568; Universal Access Inform Soc 1 (2001) 4], much lower than people's normal speaking speed of 125–150 words per minute. What causes the huge gap between natural speaking and composing using speech recognition? Is it possible to narrow the gap and make speech recognition more appealing to users? In this paper we discuss users' learning processes and the difficulties they experience in continuous dictation tasks using state-of-the-art Automatic Speech Recognition (ASR) software. Detailed data were collected for the first time on the three activities involved in document composition tasks: dictation, navigation, and correction. The results indicate that navigation and error correction accounted for a large portion of the dictation task during the early stages of interaction. As users gained more experience, they became more efficient at dictation, navigation and error correction. However, the major improvements in productivity were due to dictation quality and the usage of navigation commands. These results provide insights into the factors that cause the gap between users' expectations of speech recognition software and the reality of use, and how those factors change with experience. Specific advice is given to researchers as to the most critical issues that must be addressed.

8.
Visual speech information plays an important role in automatic speech recognition (ASR) especially when audio is corrupted or even inaccessible. Despite the success of audio-based ASR, the problem of visual speech decoding remains widely open. This paper provides a detailed review of recent advances in this research area. In comparison with the previous survey [97] which covers the whole ASR system that uses visual speech information, we focus on the important questions asked by researchers and summarize the recent studies that attempt to answer them. In particular, there are three questions related to the extraction of visual features, concerning speaker dependency, pose variation and temporal information, respectively. Another question is about audio-visual speech fusion, considering the dynamic changes of modality reliabilities encountered in practice. In addition, the state-of-the-art on facial landmark localization is briefly introduced in this paper. Those advanced techniques can be used to improve the region-of-interest detection, but have been largely ignored when building a visual-based ASR system. We also provide details of audio-visual speech databases. Finally, we discuss the remaining challenges and offer our insights into the future research on visual speech decoding.
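The audio-visual fusion question mentioned above is commonly handled by weighting each modality's score by an estimated reliability. The sketch below shows one such weighted log-score combination; it is a generic illustration of the idea, not the scheme of any specific paper in the survey, and all names are assumptions.

```python
import math

def fuse_scores(audio_logps, visual_logps, audio_reliability):
    """Combine per-class audio and visual log-scores using a reliability
    weight lambda in [0, 1] (e.g. estimated from the audio SNR):

        score(c) = lambda * audio(c) + (1 - lambda) * visual(c)

    Returns the index of the best-scoring class.
    """
    lam = audio_reliability
    fused = [lam * a + (1 - lam) * v
             for a, v in zip(audio_logps, visual_logps)]
    return max(range(len(fused)), key=fused.__getitem__)

# Audio favours class 0; the visual stream favours class 1.
audio = [math.log(0.7), math.log(0.3)]
visual = [math.log(0.2), math.log(0.8)]
clean = fuse_scores(audio, visual, 0.9)  # clean audio: trust acoustics
noisy = fuse_scores(audio, visual, 0.1)  # noisy audio: lean on lips
```

Letting the weight track the changing acoustic conditions is what "dynamic changes of modality reliabilities" refers to: the same utterance can be decided by the audio stream in quiet and by the visual stream in noise.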

9.
《Ergonomics》2012,55(8):1667-1673
Abstract

When ergonomists contribute to the design of products and services they aim to be user-centred. This paper explores two possible meanings of user-centredness; the ergonomist may use theories and findings about human behaviour to act for the user, or may help the user to participate in design. Both approaches are well known in ergonomics and they can point in conflicting directions. This paper examines the rationale for the two approaches and presents the results of a survey, which found that the most successful strategy was to mix the two approaches. It offers a classification to support the identification of the appropriate approach to adopt in different situations. The paper proposes, for example, that 'design by users' is the appropriate strategy when significant value judgements have to be made in a local or bespoke design setting. By contrast, a 'design for users' approach is appropriate for the design of generic products. An additional approach, 'design for users with users', is introduced for settings that require knowledge about human characteristics and that need users to make value judgements.

10.
《Ergonomics》2012,55(3):217-248
Abstract

An experiment was conducted in which drivers negotiated a test course containing thirteen low-speed curves with well-defined lateral boundaries (restricted-path turns), but were free to select their speed of travel. No evidence is found that the 'preferred yaw rate' behaviour exhibited in free-path turns is relevant to restricted-path driving. The results indicate that the maximum lateral acceleration developed was the major determinant of speed selection on a given radius curve, the level adopted decreasing with increased curve radius. The deviations of the vehicle paths from the set-out curves are examined in detail. The effect of experimental instructions designed to elicit 'normal' and 'stressed' driving strategies is also investigated. The data obtained appear to provide the first comprehensive collection of detailed information on driver-vehicle behaviour over a range of curve geometries.

11.
ABSTRACT

Employing ICT platforms has the potential to improve efforts to assist displaced people, or to liberate them in being more able to help each other, or both. And while platform development has resulted in a patchwork of initiatives – an electronic version of ‘letting a thousand flowers bloom’ – there are patterns emerging as to which flowers grow and have ‘staying power’ as compared to ones that wilt and die. Using a partial application of grounded theory, we analyze 47 platforms, categorizing the services they provide, the functionalities they use, and the extent to which end users are involved in initial design and ongoing modification. We found that 23% offer one-way communication, 72% provide two-way communication, 74% involve crowdsourcing and 43% use artificial intelligence. For future developers, we offer a preliminary list of what leads to a successful ICT initiative for refugees and migrants. Finally, we list ethical considerations for all stakeholders.

12.
Dysarthria is a neurological impairment of controlling the motor speech articulators that compromises the speech signal. Automatic Speech Recognition (ASR) can be very helpful for speakers with dysarthria because affected persons are often also physically incapacitated. Mel-Frequency Cepstral Coefficients (MFCCs) have been proven to be an appropriate representation of dysarthric speech, but the question of which MFCC-based feature set represents dysarthric acoustic features most effectively has not been answered. Moreover, most current dysarthric speech recognisers are either speaker-dependent (SD) or speaker-adaptive (SA), and they perform poorly in terms of generalisability as a speaker-independent (SI) model. First, by comparing the results of 28 dysarthric SD speech recognisers, this study identifies the best-performing set of MFCC parameters, which can represent dysarthric acoustic features to be used in Artificial Neural Network (ANN)-based ASR. Next, this paper studies the application of ANNs as a fixed-length isolated-word SI ASR for individuals who suffer from dysarthria. The results show that the speech recognisers trained on the conventional 12-coefficient MFCC features, without delta and acceleration features, provided the best accuracy, and the proposed SI ASR recognised the speech of the unforeseen dysarthric evaluation subjects with a word recognition rate of 68.38%.
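The feature sets being compared (static MFCCs alone versus MFCCs with appended delta and acceleration coefficients) can be assembled as sketched below. The delta computation here is the standard regression formula; it is a stand-in for illustration and not the paper's exact pipeline, and the random array merely stands in for real MFCCs.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based delta (time-derivative) features.
    features: (T, n_coeff) array of static coefficients."""
    T = len(features)
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    num = sum(n * (padded[width + n: T + width + n]
                   - padded[width - n: T + width - n])
              for n in range(1, width + 1))
    denom = 2 * sum(n * n for n in range(1, width + 1))
    return num / denom

def feature_set(mfcc, use_delta=False, use_accel=False):
    """Assemble an MFCC-based feature set: static coefficients,
    optionally with delta and delta-delta (acceleration) appended."""
    parts = [mfcc]
    if use_delta:
        d = delta(mfcc)
        parts.append(d)
        if use_accel:
            parts.append(delta(d))
    return np.hstack(parts)

# 100 frames of 12 static coefficients (random stand-in for real MFCCs).
mfcc = np.random.randn(100, 12)
static = feature_set(mfcc)               # (100, 12) -- best in the study
with_d = feature_set(mfcc, True)         # (100, 24)
with_dd = feature_set(mfcc, True, True)  # (100, 36)
```

The study's finding is that the leanest of these variants, the 12 static coefficients, gave the best dysarthric recognition accuracy.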

13.
Interactive agents such as pet robots or adaptive speech interface systems that require forming a mutual adaptation process with users should have two competences. One of these is recognizing reward information from users' expressed paralanguage information, and the other is informing the learning system about the users by means of that reward information. The purpose of this study was to clarify the specific contents of reward information and the actual mechanism of a learning system by observing how 2 persons could create a smooth speech communication, such as that between owners and their pets.

A communication experiment was conducted to observe how human participants create smooth communication through acquiring meaning from utterances in languages they did not understand. Then, based on experimental results, a meaning-acquisition model that considers the following 2 assumptions was constructed: (a) To achieve a mutual adaptive relationship with users, the model needs to induce users' adaptation and to exploit this induced adaptation to recognize the meanings of a user's speech sounds; and (b) to recognize users' utterances through trial-and-error interaction regardless of the language used, the model should focus on prosodic information in speech sounds, rather than on the phoneme information on which most past interface studies have focused.

The results confirmed that the proposed model could recognize the meanings of users' verbal commands by using participants' adaptations to the model for its meaning-acquisition process. However, this phenomenon was observed only when an experimenter gave the participants appropriate instructions equivalent to catchphrases that helped users learn how to use and interact intuitively with the model. Thus, this suggested the need for a subsequent study to discover how to induce the participants' adaptations or natural behaviors without giving these kinds of instructions.

14.
Abstract

An experiment was designed to determine whether speech input is a valuable alternative or addition to manual input. Subjects used both speech and mouse input for control purposes in a document-annotation system. Speech recognition was realized by a speaker-dependent speech-recognition board. In separate sessions, subjects used either a mouse or speech interface, and comparisons were made between the two media in performance speed, number of commands, and number of errors. In a third session, subjects were free to use either input medium, and measures included both objective (usage) and subjective (questionnaire) preferences for the two media. The main results were that: (1) 9 out of 24 subjects used speech more than the mouse when they were free to use both; (2) 21% of the subjects preferred speech control, because it allowed other devices to be operated manually; and (3) 37% of the subjects preferred to control the system with both input devices available. Speech can be a valuable addition to other input media, enabling users to adapt their choice of media to specific task situations.

15.
We present a method through which domestic service robots can comprehend natural language instructions. For each action type, a variety of natural language expressions can be used, for example, the instruction, ‘Go to the kitchen’ can also be expressed as ‘Move to the kitchen.’ We are of the view that natural language instructions are intuitive and, therefore, constitute one of the most user-friendly robot instruction methods. In this paper, we propose a method that enables robots to comprehend instructions spoken by a human user in his/her natural language. The proposed method combines action-type classification, which is based on a support vector machine, and slot extraction, which is based on conditional random fields, both of which are required in order for a robot to execute an action. Further, by considering the co-occurrence relationship between the action type and the slots along with the speech recognition score, the proposed method can avoid degradation of the robot’s comprehension accuracy in noisy environments, where inaccurate speech recognition can be problematic. We conducted experiments using a Japanese instruction data-set collected using a questionnaire-based survey. Experimental results show that the robot’s comprehension accuracy is higher in a noisy environment using our method than when using a baseline method with only a 1-best speech recognition result.
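The co-occurrence idea described above (using the compatibility between an action type and its slots, together with the ASR score, to re-rank noisy hypotheses) can be illustrated with a simplified scoring scheme. This sketch replaces the paper's SVM classifier and CRF slot extractor with precomputed scores; the co-occurrence table, penalty value, and all names are assumptions made for illustration.

```python
# Hypothetical co-occurrence table: which slot types each action admits.
COOCCURRENCE = {
    "go": {"location"},
    "grasp": {"object"},
    "put": {"object", "location"},
}

def best_interpretation(hypotheses):
    """Each hypothesis is a tuple:
        (asr_score, action, action_score, slots, slot_score)
    with log-domain scores and slots given as a set of slot types.
    Hypotheses whose slots are incompatible with the action's
    co-occurrence set are penalised, so a noisy top ASR result
    that yields an implausible action/slot pairing loses out."""
    def score(h):
        asr, action, a_score, slots, s_score = h
        compatible = slots <= COOCCURRENCE.get(action, set())
        penalty = 0.0 if compatible else -1.0
        return asr + a_score + s_score + penalty
    return max(hypotheses, key=score)

# Noisy audio: the top ASR hypothesis pairs "go" with an object slot
# (implausible); the ASR runner-up pairs it with a location (plausible).
h1 = (-0.2, "go", -0.1, {"object"}, -0.3)
h2 = (-0.5, "go", -0.1, {"location"}, -0.2)
best = best_interpretation([h1, h2])  # the plausible runner-up wins
```

This is the mechanism that lets the method beat a 1-best baseline in noise: linguistic plausibility can overturn a slightly better acoustic score.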

16.
In recent years, several solutions have been proposed to support people with visual impairments or blindness during road crossing. These solutions focus on computer vision techniques for recognizing pedestrian crosswalks and computing their position relative to the user. This contribution instead addresses a different problem: the design of an auditory interface that can effectively guide the user during road crossing. Two original auditory guiding modes based on data sonification are presented and compared with a guiding mode based on speech messages. Experimental evaluation shows that no single guiding mode is best suited for all test subjects. The average time to align and cross is not significantly different among the three guiding modes, and test subjects distribute their preferences for the best guiding mode almost uniformly among the three solutions. The experiments also show that higher effort is necessary for decoding the sonified instructions compared to the speech instructions, and that test subjects require frequent ‘hints’ (in the form of speech messages). Despite this, more than two-thirds of test subjects prefer one of the two guiding modes based on sonification. There are two main reasons for this: firstly, with speech messages it is harder to hear the sound of the environment; and secondly, sonified messages convey information about the “quantity” of the expected movement.

18.
《Ergonomics》2012,55(7):962-981
Despite the success and widespread use of Automatic Teller Machines (ATMs), a significant proportion of bank customers cannot or will not use them, or experience difficulties in their interactions. Speech technology has been suggested as a means by which non-users might be encouraged to use ATMs, while simultaneously improving usability for all. The potential advantages of speech interfaces include hands-free and eyes-free use for physically- and visually-impaired users, and improved ease and speed of use through increased ‘naturalness’ of the interaction. This study investigated user attitudes to the concept of a speech-based ATM, via a large-scale survey and a series of focus groups. Objective performance was also considered in user trials with a prototype speech-driven ATM. The idea of using speech for ATM transactions led to a number of concerns. Privacy (the concern over one's personal financial details being overheard) and security (the fear of potential attackers hearing the user withdraw cash) were the major reasons given. The user trials confirmed that possible solutions, such as the adoption of a hood over the ATM or the use of a telephone handset as the speech input/output device, were ineffective. Groups of impaired users, particularly visually-impaired subjects, were more positive about the concept of speech, citing various difficulties with current visual-manual interactions. Most non-users, however, would not be encouraged to use ATMs by the addition of speech. The paper discusses these and other issues relating to the likely success of using speech for ATM applications.

19.
Context: Open source development allows a large number of people to reuse and contribute source code to the community. Social networking features open opportunities for information discovery, social collaborations, and improved recommendations of potential collaborators.

Objective: Online community and development platforms rely on social network features to increase awareness and attention among community members for improved collaborations. The objective of this work is to introduce an approach for recommending relevant users to follow. Follower networks provide means for informal information propagation. The efficiency and effectiveness of such information flows is impacted by the network structure. Here, we aim to understand the resilience of networks against random or strategic node removal.

Method: Social network features of online software development communities present a new opportunity to enhance online collaboration. Our approach is based on the automatic analysis of user behavior and network structure. The proposed ‘who to follow’ recommendation algorithm can be parametrized for specific contexts. Link-analysis techniques such as PageRank/HITS provide the basis for a novel ‘who to follow’ recommendation model.

Results: We tested the approach using a GitHub-based dataset. Currently, users follow popular community members to get updates regarding their activities instead of maintaining personal relations. Thus, social network features require further improvements to increase reciprocity. The application of our ‘who to follow’ recommendation model using the GitHub dataset shows excellent results with respect to context-sensitive following recommendations. The sensitivity of GitHub's follower network to random node removal is comparable with other social networks, but it is more sensitive to follower-authority-based node removal.

Conclusion: Link-based algorithms can be used for context-sensitive ‘who to follow’ recommendations. GitHub is highly sensitive to authority-based node removal. Information flow established through follower relations will be strongly impacted if many authorities are removed from the network. This underpins the importance of ‘central’ users and the validity of focusing the ‘who to follow’ recommendations on those users.
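A link-analysis-based 'who to follow' recommender of the kind described can be sketched with plain power-iteration PageRank over a follower graph, recommending high-authority users not yet followed. This is a minimal self-contained illustration, not the paper's parametrized model; graph, function names, and the tiny example network are assumptions.

```python
def pagerank(followers, damping=0.85, iters=50):
    """Power-iteration PageRank over a follower graph.
    followers: dict mapping each user to the set of users they follow."""
    users = set(followers) | {v for fs in followers.values() for v in fs}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iters):
        new = {u: (1 - damping) / n for u in users}
        for u in users:
            out = followers.get(u, set())
            if not out:
                # Dangling node: spread its rank uniformly.
                for v in users:
                    new[v] += damping * rank[u] / n
            else:
                for v in out:
                    new[v] += damping * rank[u] / len(out)
        rank = new
    return rank

def who_to_follow(followers, user, k=3):
    """Recommend the k highest-ranked users that `user` does not
    already follow (excluding the user themselves)."""
    rank = pagerank(followers)
    already = followers.get(user, set()) | {user}
    candidates = [u for u in rank if u not in already]
    return sorted(candidates, key=rank.get, reverse=True)[:k]

follows = {
    "alice": {"carol"},
    "bob": {"carol"},
    "carol": {"dave"},
    "dave": set(),
}
# "carol" is excluded (already followed); "dave" inherits carol's
# authority and ranks above "bob".
recs = who_to_follow(follows, "alice")
```

Removing a high-PageRank node such as "dave" or "carol" from a graph like this disconnects much of the information flow, which is the authority-based sensitivity the abstract reports for GitHub.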

20.
The viewing of video increasingly occurs in a wide range of public and private environments via a range of static and mobile devices. The proliferation of content on demand and the diversity of viewing situations mean that delivery systems can play a key role in introducing audiences to contextually relevant content of interest whilst maximising the viewing experience for individual viewers. However, for video delivery systems to do this, they need to take into account the diversity of the situations where video is consumed, and the differing viewing experiences that users desire to create within them. This requires an ability to identify different contextual viewing situations as perceived by users. This paper presents the results from a detailed, multi-method, user-centred field study with 11 UK-based users of video-based content. Following a review of the literature (to identify viewing situations of interest on which to focus), data collection was conducted comprising observation, diaries, interviews and self-captured video. Insights were gained into whether and how users choose to engage with content in different public and private spaces. The results identified and validated a set of contextual cues that characterise distinctive viewing situations. Four archetypical viewing situations were identified: ‘quality time’, ‘opportunistic planning’, ‘sharing space but not content’ and ‘opportunistic self-indulgence’. These can be differentiated in terms of key contextual factors: solitary/shared experiences, public/private spaces and temporal characteristics. The presence of clear contextual cues provides the opportunity for video delivery systems to better tailor content and format to the viewing situation, or additionally augment video services through social media in order to provide specific experiences sensitive to both temporal and physical contexts.
