1.
Alexander Schmitt Dmitry Zaykovskiy Wolfgang Minker 《International Journal of Speech Technology》2008,11(2):63-72
This article presents an overview of different approaches for providing automatic speech recognition (ASR) technology to mobile users. Three principal system architectures with respect to the employment of a wireless communication link are analyzed: Embedded Speech Recognition Systems, Network Speech Recognition (NSR) and Distributed Speech Recognition (DSR). An overview of the solutions that have been standardized so far, as well as a critical analysis of the latest developments in the field of speech recognition in mobile environments, is given. Open issues, pros and cons of the different methodologies and techniques are highlighted. Special emphasis is placed on the constraints and limitations that ASR applications are confronted with under different architectures.
2.
《Pattern recognition》2002,35(9):1917-1931
The aim of this paper is to present a new generalized Hough transform-based hardware algorithm to detect non-analytic objects in a two-dimensional (2D) image space. Our main idea is to use, during the voting process into the 5D parameter space, only a meaningful set of edge points that belong to the boundary of the target object and that share a similar geometric property. In this paper, a same-line support property has been used. This has the merit of reducing the size of the 5D parameter space while increasing the detection accuracy. The whole algorithm was implemented in a highly parallel architecture supported by a single PC board. It is composed of a mixture of digital signal processing and field programmable gate array technologies and uses content addressable memory as the main processing unit. Complexity evaluation of the whole system indicated that a set of 46 different images of 256×256 pixels each can be classified in real time (i.e., within the frame period).
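For orientation, the classical generalized Hough transform core that this hardware algorithm builds on can be sketched as below. This illustrative version votes only for a 2D translation using an R-table, whereas the paper votes in a 5D parameter space and additionally restricts voting to edge points sharing the same line support; it is a textbook sketch, not the authors' implementation.

    import numpy as np

    def build_r_table(template_edges, reference_point):
        """R-table: for each quantized edge-gradient orientation theta, store the
        displacement from the edge point to the shape's reference point."""
        r_table = {}
        for (x, y, theta) in template_edges:
            r_table.setdefault(theta, []).append((reference_point[0] - x,
                                                  reference_point[1] - y))
        return r_table

    def ght_vote(image_edges, r_table, shape):
        """Accumulate votes for the reference-point location (translation only)."""
        accumulator = np.zeros(shape, dtype=np.int32)
        for (x, y, theta) in image_edges:
            for (dx, dy) in r_table.get(theta, []):
                cx, cy = x + dx, y + dy
                if 0 <= cx < shape[1] and 0 <= cy < shape[0]:
                    accumulator[cy, cx] += 1
        # most-voted location = detected reference point of the target shape
        return np.unravel_index(np.argmax(accumulator), shape)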
3.
One aim of detection proposal methods is to reduce the computational overhead of object detection. However, most of the existing methods have significant computational overhead for real-time detection on mobile devices. A fast and accurate proposal method of human detection called personness estimation is proposed, which facilitates real-time human detection on mobile devices and can be effectively integrated into part-based detection, achieving high detection performance at a low computational cost. Our work is based on two observations: (i) normed gradients, which are designed for generic objectness estimation, effectively generate high-quality detection proposals for the person category; (ii) fusing the normed gradients with color attributes improves the performance of proposal generation for human detection. Thus, the candidate windows generated by the personness estimation will very likely contain human subjects. The human detection is then guided by the candidate windows, offering high detection performance even when the detection task terminates prior to completion. This interruptible detection scheme, called anytime detection, enables real-time human detection on mobile devices. Furthermore, we introduce a new evaluation methodology called time-recall curves to practically evaluate our approach. The applicability of our proposed method is demonstrated in extensive experiments on a publicly available dataset and a real mobile device, facilitating acquisition and enhancement of portrait photographs (e.g., selfies) on widespread mobile platforms.
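As a rough illustration of the proposal-scoring idea (not the authors' trained model), the sketch below combines L1-normalized gradient magnitudes of a resized window with a color-attribute feature through an assumed linear scoring function; the window size, the nearest-neighbour resizing, and the weights w_ng and w_color are placeholders that would normally come from training.

    import numpy as np

    def normed_gradient_feature(window, size=8):
        """Resize a grayscale window to size x size and return its L1-normalized
        gradient magnitudes (the 'normed gradients' idea behind objectness scoring)."""
        h, w = window.shape
        ys = np.arange(size) * h // size          # naive nearest-neighbour resize
        xs = np.arange(size) * w // size
        small = window[np.ix_(ys, xs)].astype(np.float32)
        gy, gx = np.gradient(small)
        mag = np.abs(gx) + np.abs(gy)
        return mag / (mag.sum() + 1e-6)

    def personness_score(window_gray, window_color_hist, w_ng, w_color, bias=0.0):
        """Linear fusion of normed-gradient and color-attribute features;
        w_ng, w_color and bias are hypothetical learned parameters."""
        ng = normed_gradient_feature(window_gray).ravel()
        return float(ng @ w_ng + window_color_hist @ w_color + bias)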
4.
Kwontaeg Choi Hyeran Byun 《Pattern recognition》2011,44(2):386-400
Due to the increases in processing power and storage capacity of mobile devices over the years, incorporating real-time face recognition into mobile devices is no longer unattainable. However, the possibility of real-time learning of a large number of samples within mobile devices must be established. In this paper, we attempt to establish this possibility by presenting a real-time training algorithm on mobile devices for face recognition related applications. This differentiates our work from traditional algorithms, which focus on real-time classification. In order to solve the challenging real-time issue on mobile devices, we extract local face features using local random bases and then train a sequential neural network incrementally with these features. We demonstrate the effectiveness of the proposed algorithm and the feasibility of its application on mobile devices through empirical experiments. Our results show that the proposed algorithm significantly outperforms several popular face recognition methods while requiring dramatically less computation time. Moreover, only the proposed method is able to train additional samples incrementally in real time without memory failure or accuracy degradation on a recent mobile phone model.
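The abstract does not give implementation details, but a common way to realize "local random bases plus incremental training of a sequential network" is an OS-ELM-style recursive least-squares update of the output weights over fixed random projections. The sketch below is such a stand-in under that assumption, not the authors' exact algorithm.

    import numpy as np

    class IncrementalClassifier:
        """Fixed random-projection features + recursive-least-squares output layer
        (OS-ELM style); an illustrative stand-in for sequential training."""

        def __init__(self, in_dim, hidden_dim, n_classes, reg=1e-3, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.standard_normal((in_dim, hidden_dim))   # fixed random basis
            self.b = rng.standard_normal(hidden_dim)
            self.P = np.eye(hidden_dim) / reg                    # inverse covariance
            self.beta = np.zeros((hidden_dim, n_classes))        # output weights

        def _hidden(self, X):
            return np.tanh(X @ self.W + self.b)

        def partial_fit(self, X, Y):
            """Incrementally update output weights with a new mini-batch (Y one-hot)."""
            H = self._hidden(X)
            S = np.linalg.inv(np.eye(len(H)) + H @ self.P @ H.T)
            self.P -= self.P @ H.T @ S @ H @ self.P
            self.beta += self.P @ H.T @ (Y - H @ self.beta)
            return self

        def predict(self, X):
            return np.argmax(self._hidden(X) @ self.beta, axis=1)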
5.
Mobile devices are an important interactive platform. Due to their limited computation, memory, display area and energy, how to realize efficient, real-time interaction with 3D models on mobile devices is an important research topic. Considering the features of mobile devices, this paper adopts a remote rendering mode and point-based models, and proposes a transmission and rendering approach that supports real-time interaction. First, an improved simplification algorithm based on MLS and the display resolution of mobile devices is proposed. Then, a hierarchical selection of point models and a QoS transmission control strategy are given, based on the operator's area of interest, the degree of interest of objects in the virtual environment, and the rendering error; these reduce energy consumption. Finally, the rendering and interaction of point models are completed on mobile devices. The experiments show that our method is efficient.
Supported by the National Natural Science Foundation of China (Grant No. 60873159), the Program for New Century Excellent Talents in University (Grant No. NCET-07-0039), and the National High-Tech Research & Development Program of China (Grant No. 2006AA01Z333).
6.
Severe limitations in computational, memory, and energy resources make implementing high-quality speech recognition in embedded devices a difficult challenge. In this article, the authors investigate the energy consumption of computation and communication in an embedded distributed speech recognition system and propose optimizations that reduce overall energy consumption while maintaining adequate quality of service for the end user. This article considers the application of DSR traffic to both Bluetooth and 802.11b networks.
7.
Command and control (C&C) speech recognition allows users to interact with a system by speaking commands or asking questions restricted to a fixed grammar containing pre-defined phrases. Whereas C&C interaction has been commonplace in telephony and accessibility systems for many years, only recently have mobile devices had the memory and processing capacity to support client-side speech recognition. Given the personal nature of mobile devices, statistical models that can predict commands based in part on past user behavior hold promise for improving C&C recognition accuracy. For example, if a user calls a spouse at the end of every workday, the language model could be adapted to weight the spouse more than other contacts during that time. In this paper, we describe and assess statistical models learned from a large population of users for predicting the next user command of a commercial C&C application. We explain how these models were used for language modeling, and evaluate their performance in terms of task completion. The best performing model achieved a 26% relative reduction in error rate compared to the base system. Finally, we investigate the effects of personalization on performance at different learning rates via online updating of model parameters based on individual user data. Personalization significantly increased the relative reduction in error rate by an additional 5%.
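A minimal sketch of this kind of time-conditioned command prediction might interpolate a fixed base grammar probability with per-user, per-hour command counts. The class name, smoothing and interpolation weight below are assumptions for illustration, not the paper's model.

    from collections import Counter, defaultdict

    class TimeAwareCommandModel:
        """Bias a fixed-grammar C&C language model with per-user,
        time-of-day command statistics (illustrative sketch)."""

        def __init__(self, base_probs, alpha=0.3):
            self.base = base_probs                 # dict: command -> base P(command)
            self.alpha = alpha                     # weight given to personal history
            self.history = defaultdict(Counter)    # hour of day -> command counts

        def observe(self, command, hour):
            self.history[hour][command] += 1

        def prob(self, command, hour):
            counts = self.history[hour]
            total = sum(counts.values())
            personal = counts[command] / total if total else 0.0
            return (1 - self.alpha) * self.base.get(command, 0.0) + self.alpha * personal

        def predict(self, hour):
            return max(self.base, key=lambda c: self.prob(c, hour))

    # e.g. after repeatedly observing "call spouse" around 18:00,
    # prob("call spouse", 18) rises above the base grammar probability.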
8.
Xin Yang Jiabin Guo Tangli Xue Kwang-Ting Cheng 《Multimedia Tools and Applications》2018,77(6):6607-6628
This paper addresses robust and ultrafast pose tracking on mobile devices, such as smartphones and small drones. Existing methods, relying on either vision analysis or inertial sensing, are either too computationally heavy to achieve real-time performance on a mobile platform, or not sufficiently robust to address unique challenges in mobile scenarios, including rapid camera motions, long exposure times of mobile cameras, etc. This paper presents a novel hybrid tracking system which utilizes on-device inertial sensors to greatly accelerate the visual feature tracking process and improve its robustness. In particular, our system adaptively resizes each video frame based on inertial sensor data and applies a highly efficient binary feature matching method to track the object pose in each resized frame with little accuracy degradation. This tracking result is revised periodically by a model-based feature tracking method (Hare et al. 2012) to reduce accumulated errors. Furthermore, an inertial tracking method and a solution for fusing its results with the feature tracking results are employed to further improve robustness and efficiency. We first evaluate our hybrid system using a dataset consisting of 16 video clips with synchronized inertial sensing data and then assess its performance in a mobile augmented reality application. Experimental results demonstrate our method's superior performance to a state-of-the-art feature tracking method (Hare et al. 2012), a direct tracking method (Engel et al. 2014) and the Vuforia SDK (Ibañez and Figueras 2013), and show that it can run at more than 40 Hz on a standard smartphone. We will release the source code with the publication of this paper.
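One plausible (assumed, not documented) form of the inertial-guided frame resizing is to map gyroscope angular speed to a down-scaling factor before feature matching, as in this sketch; the constants and the linear mapping are illustrative only.

    def adaptive_resize_factor(gyro_rad_per_s, min_scale=0.25, max_scale=1.0, k=0.5):
        """Map instantaneous angular speed from the gyroscope to a frame
        down-scaling factor: fast rotation -> smaller frame -> cheaper matching.
        The linear mapping and constants are illustrative assumptions."""
        wx, wy, wz = gyro_rad_per_s
        speed = (wx * wx + wy * wy + wz * wz) ** 0.5
        scale = max_scale - k * speed
        return max(min_scale, min(max_scale, scale))

    # e.g. resize each frame before feature matching (OpenCV assumed available):
    # import cv2
    # s = adaptive_resize_factor(imu.angular_velocity)
    # small = cv2.resize(frame, None, fx=s, fy=s, interpolation=cv2.INTER_AREA)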
9.
Predicting performance of object recognition
Boshra M. Bhanu B. 《IEEE transactions on pattern analysis and machine intelligence》2000,22(9):956-969
We present a method for predicting fundamental performance of object recognition. We assume that both scene data and model objects are represented by 2D point features and a data/model match is evaluated using a vote-based criterion. The proposed method considers data distortion factors such as uncertainty, occlusion, and clutter, in addition to model similarity. This is unlike previous approaches, which consider only a subset of these factors. Performance is predicted in two stages. In the first stage, the similarity between every pair of model objects is captured by comparing their structures as a function of the relative transformation between them. In the second stage, the similarity information is used along with statistical models of the data-distortion factors to determine an upper bound on the probability of recognition error. This bound is directly used to determine a lower bound on the probability of correct recognition. The validity of the method is experimentally demonstrated using real synthetic aperture radar (SAR) data.
10.
Ville Könönen Jani Mäntyjärvi Heidi Similä Juha Pärkkä Miikka Ermes 《Pervasive and Mobile Computing》2010,6(2):181-197
Mobile devices contain several built-in sensor units and sources that provide data for context reasoning. More context sources can be attached via wireless network connections. Usually, the mobile devices and the context sources are battery powered and their computational and storage resources are limited. This sets special requirements for context recognition algorithms. In this paper, several classification and automatic feature selection algorithms are compared in the context recognition domain. The main goal of this study is to investigate how much advantage can be achieved by using sophisticated and complex classification methods compared with a simple method that can easily be implemented in mobile devices. The main result is that even a simple linear classification algorithm can achieve reasonably good accuracy if the features calculated from raw data are selected in a suitable way. Usually context recognition algorithms are fitted to a particular problem instance in an off-line manner, and modifying such methods for on-line learning is difficult or impossible. An on-line version of the Minimum-distance classifier is presented in this paper and is shown to yield considerably higher classification accuracies than the static off-line version of the algorithm. Moreover, we report superior performance for the Minimum-distance classifier compared to other classifiers from the viewpoint of computational load and power consumption of a smart phone.
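An on-line minimum-distance classifier of the kind discussed above can be realized with per-class running means updated sample by sample and nearest-mean classification. The sketch below is one plausible reading of such a variant, not the authors' code, and its per-sample cost is a single mean update, which fits the low-power argument.

    import numpy as np

    class OnlineMinimumDistanceClassifier:
        """Nearest class-mean classifier with on-line (incremental) mean updates."""

        def __init__(self):
            self.means = {}     # class label -> running mean feature vector
            self.counts = {}    # class label -> number of samples seen

        def update(self, x, label):
            x = np.asarray(x, dtype=float)
            if label not in self.means:
                self.means[label] = x.copy()
                self.counts[label] = 1
            else:
                self.counts[label] += 1
                self.means[label] += (x - self.means[label]) / self.counts[label]

        def predict(self, x):
            x = np.asarray(x, dtype=float)
            return min(self.means, key=lambda c: np.linalg.norm(x - self.means[c]))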
11.
Multimedia Tools and Applications - Recognition of moving objects in video images is mainly based on acquiring the target information in a certain time series. After image processing, relevant...
12.
Bonny Talal Rabie Tamer Baziyad Mohammed Balid Walid 《Multimedia Tools and Applications》2019,78(18):25781-25806
Multimedia Tools and Applications - Object recognition is a broad area that covers several topics including face recognition, gesture recognition, human gait recognition, traffic road signs...
13.
Ying-Hao Yu Tsu-Tian Lee Pei-Yin Chen Ngaiming Kwok 《Journal of Real-Time Image Processing》2018,15(2):249-264
Describing image features in a concise and perceivable manner is essential for focusing on candidate solutions for classification purposes. In addition to image recognition with geometric modeling and frequency domain transformation, this paper presents a novel 2D on-chip feature extraction method named semantics-based vague image representation (SVIR) to reduce the semantic gap of content-based image retrieval. SVIR successively deconstructs an object silhouette into intelligible features by pixel scans, and then evolves and combines the piecewise features into another pattern in linguistic form. Besides providing semantic annotations, SVIR is free of complicated calculations, so on-chip designs of SVIR can attain real-time processing performance without making use of a high-speed clock. The effectiveness of the SVIR algorithm was demonstrated with timing sequences and real-life operations on a field-programmable gate array (FPGA) development platform. With low hardware resource consumption on a single FPGA chip, the SVIR design can be used in portable machine vision for ambient intelligence in the future.
14.
15.
Rui Godinho Marielba Zacarias Fernando G. Lobo 《Behaviour & Information Technology》2015,34(2):135-150
This paper describes the development process of EasyWrite, a text-entry method for mobile devices that allows people with hand coordination problems to use small computing devices such as smartphones, tablet PCs, or other touchscreen machines. This text-entry method aims at improving typing accuracy and reducing the frustration of people affected by this motor disability when using small devices. EasyWrite was developed following an iterative and user-centred process. Starting from requirements elicited from observing potential users with mild and moderate motor disabilities and information provided by a literature review, a low-fidelity prototype was built and evaluated. This early prototype was refined throughout several design and evaluation iterations. Its current state is a functional prototype that works on Android phones, whose usability was evaluated through user tests. The result of this process is a small virtual keyboard for mobile devices that has fewer and bigger keys compared to other onscreen keyboards. The concept of EasyWrite is largely based on the notion of scanning group systems, but it allows users to navigate directly through groups and subgroups of characters by tapping on directional keys in order to find the desired character, rather than waiting for a visual cursor to advance through the options one at a time at a specific time rate. Though at its current stage the method proposed by EasyWrite shows some limitations, it appears to be appropriate for users with moderate motor disabilities. For this group of people, user test results indicate that EasyWrite could be a more adequate text-entry method than those provided by standard keyboards, both physical and onscreen, commonly found in mobile devices.
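The group-navigation idea can be illustrated with a toy layout in which two directional taps select first a group and then a character, with no timed scanning cursor. The grouping and key labels below are invented for illustration and do not reflect EasyWrite's actual layout.

    # Toy sketch of direct (non-timed) navigation through character groups
    # with directional keys, in the spirit of EasyWrite; grouping is invented.
    GROUPS = {
        "up":    {"up": "a", "down": "b", "left": "c", "right": "d"},
        "down":  {"up": "e", "down": "f", "left": "g", "right": "h"},
        "left":  {"up": "i", "down": "j", "left": "k", "right": "l"},
        "right": {"up": "m", "down": "n", "left": "o", "right": "p"},
    }

    def type_character(first_tap, second_tap):
        """First tap selects a group, second tap selects a character inside it."""
        return GROUPS[first_tap][second_tap]

    assert type_character("down", "left") == "g"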
16.
17.
J. Guerra-Casanova C. Sánchez-Ávila G. Bailador A. de Santos Sierra 《International Journal of Information Security》2012,11(2):65-83
This article proposes an innovative biometric technique based on the idea of authenticating a person on a mobile device by gesture recognition. To accomplish this aim, a user is prompted to be recognized by a gesture he/she performs moving his/her hand while holding a mobile device with an embedded accelerometer. As users are not able to repeat a gesture exactly in the air, an algorithm based on sequence alignment is developed to correct slight differences between repetitions of the same gesture. The robustness of this biometric technique has been studied in two different tests, analyzing a database of 100 users with real falsifications. Equal Error Rates of 2.01% and 4.82% have been obtained in a zero-effort and an active impostor attack, respectively. A permanence evaluation is also presented from the analysis of the repetition of the gestures of 25 users in 10 sessions over a month. Furthermore, two different gesture databases have been developed: one made up of 100 genuine identifying 3-D hand gestures and 3 impostors trying to falsify each of them, and another with 25 volunteers repeating their identifying 3-D hand gesture in 10 sessions over a month. To the best of our knowledge, these databases are the most extensive in published studies.
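The sequence-alignment step is not specified in detail in the abstract; a generic way to align accelerometer gesture repetitions is dynamic time warping, sketched below with a simple threshold-based acceptance rule. Both the alignment and the decision rule are assumptions, not the authors' algorithm.

    import numpy as np

    def dtw_distance(seq_a, seq_b):
        """Dynamic-time-warping distance between two 3-axis accelerometer
        sequences (arrays of shape (T, 3)); a generic alignment sketch."""
        a, b = np.asarray(seq_a, float), np.asarray(seq_b, float)
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def authenticate(probe, enrolled_templates, threshold):
        """Accept if the probe gesture aligns closely enough with any enrolled repetition."""
        return min(dtw_distance(probe, t) for t in enrolled_templates) <= threshold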
18.
Previous work described a biologically motivated object recognition system with Gabor wavelets as the basic feature type. These features are robust against slight distortion, rotation and variation in illumination. Here we describe extensions of the system that address image variance due to arbitrary in-plane rotation, substantial scale changes and moderate depth rotation of objects, and to background variation, using simple linear transformations of the Gabor filter responses. The performance of the system is enhanced significantly.
19.
Suk Kyu Lee Hyunsoon Kim Albert Yongjoon Chung Hwangnam Kim 《Multimedia Tools and Applications》2018,77(2):1811-1842
Watching 3D video on a 3D display is popular nowadays. However, it is still difficult to enjoy 3D multimedia content on a mobile device, even though mobile devices with 3D displays have been introduced into the market. The main technological challenges for watching 3D content on mobile devices are generating and streaming the 3D content. Generating 3D content requires extra computational resources, and streaming 3D content demands additional network bandwidth for receiving and transmitting the 3D data. To overcome these technological challenges, in this paper we propose ReMA, a novel 3D video streaming system. We devised a novel architecture for the transmitter, the receiver, and a distribution system to efficiently disseminate and generate 3D videos for mobile devices. We implemented ReMA in a real test-bed and conducted a thorough empirical evaluation to assess the feasibility of streaming 3D content to mobile devices. Based on our empirical study, the resulting system shows great promise for streaming 3D video in real time to mobile devices.
20.
Mobile robotics has achieved notable progress; however, to increase the complexity of the tasks that mobile robots can perform in natural environments, we need to provide them with a greater semantic understanding of their surroundings. In particular, identifying indoor scenes, such as an Office or a Kitchen, is a highly valuable perceptual ability for an indoor mobile robot, and in this paper we propose a new technique to achieve this goal. As a distinguishing feature, we use common objects, such as Doors or furniture, as a key intermediate representation to recognize indoor scenes. We frame our method as a generative probabilistic hierarchical model, where we use object category classifiers to associate low-level visual features with objects, and contextual relations to associate objects with scenes. The inherent semantic interpretation of common objects allows us to use rich sources of online data to populate the probabilistic terms of our model. In contrast to alternative computer vision based methods, we boost performance by exploiting the embedded and dynamic nature of a mobile robot. In particular, we increase detection accuracy and efficiency by using a 3D range sensor that allows us to implement a focus-of-attention mechanism based on geometric and structural information. Furthermore, we use concepts from information theory to propose an adaptive scheme that limits computational load by selectively guiding the search for informative objects. The operation of this scheme is facilitated by the dynamic nature of a mobile robot that is constantly changing its field of view. We test our approach using real data captured by a mobile robot navigating in Office and home environments. Our results indicate that the proposed approach outperforms several state-of-the-art techniques for scene recognition.
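A stripped-down, naive-Bayes-style version of the object-to-scene inference can be sketched as follows; the paper's hierarchical generative model also incorporates low-level features, contextual relations and the attention mechanism, and the priors and likelihoods here are assumed to be given (e.g., populated from online data).

    import math

    def scene_posterior(detected_objects, prior, likelihood):
        """Score scenes from detected object categories:
        P(scene | objects) ∝ P(scene) * prod_i P(object_i | scene)."""
        scores = {}
        for scene, p_scene in prior.items():
            log_p = math.log(p_scene)
            for obj in detected_objects:
                log_p += math.log(likelihood[scene].get(obj, 1e-6))  # crude smoothing
            scores[scene] = log_p
        # normalize log scores into a posterior over scenes
        m = max(scores.values())
        z = sum(math.exp(s - m) for s in scores.values())
        return {s: math.exp(v - m) / z for s, v in scores.items()}

    # e.g. scene_posterior(["desk", "monitor"],
    #                      {"Office": 0.5, "Kitchen": 0.5},
    #                      {"Office": {"desk": 0.4, "monitor": 0.3},
    #                       "Kitchen": {"stove": 0.5, "sink": 0.3}})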