Similar Documents
20 similar documents retrieved (search time: 15 ms).
1.
Optical Character Recognition (OCR) systems have been developed effectively for recognizing the printed characters of non-Indian languages. Efforts are under way to develop efficient OCR systems for Indian languages, especially for Kannada, a popular South Indian language. In this paper we present an OCR system for recognizing the basic characters (vowels and consonants) in printed Kannada text that can handle different font sizes and font types. Hu's invariant moments and Zernike moments, which are widely used in pattern recognition, are used to extract the features of printed Kannada characters, and neural classifiers are used to classify the characters based on these moment features. An encouraging recognition rate of 96.8% has been obtained. The methodology can be extended to the recognition of other South Indian languages, especially Telugu.
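Hu's invariant moments are built from normalized central moments of the image. The sketch below is a minimal pure-Python illustration on a toy binary glyph: it computes only the first two of the seven Hu invariants and checks their translation invariance (the paper's actual feature extractor uses all seven Hu moments plus Zernike moments, which are not reproduced here).

```python
def hu_first_two(image):
    """First two Hu invariant moments of a 2-D grid of pixel intensities.

    These moments are invariant to translation and scale (and, for the
    full set of seven, rotation), which is why they make font-independent
    character features.
    """
    # Raw moments M_pq = sum over pixels of x^p * y^q * I(x, y)
    def raw(p, q):
        return sum(x**p * y**q * v
                   for y, row in enumerate(image)
                   for x, v in enumerate(row))

    m00 = raw(0, 0)
    cx, cy = raw(1, 0) / m00, raw(0, 1) / m00  # centroid

    # Central moments mu_pq about the centroid (translation invariance)
    def mu(p, q):
        return sum((x - cx)**p * (y - cy)**q * v
                   for y, row in enumerate(image)
                   for x, v in enumerate(row))

    # Normalized central moments eta_pq = mu_pq / mu_00^(1 + (p+q)/2)
    def eta(p, q):
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2))**2 + 4 * eta(1, 1)**2
    return h1, h2

# A small "character" blob and a translated copy of the same blob.
glyph = [[0, 1, 1, 0],
         [0, 1, 1, 1],
         [0, 0, 1, 0]]
shifted = [[0, 0, 0, 0, 0, 0],
           [0, 0, 0, 1, 1, 0],
           [0, 0, 0, 1, 1, 1],
           [0, 0, 0, 0, 1, 0]]
```

Because central moments are taken about the centroid, the translated copy yields the same feature values, which is exactly the property that makes the features robust across page layouts.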

2.
In this paper, Krawtchouk moment-based shape features at lower orders are proposed for an Indian Sign Language (ISL) recognition system; they give local information about the shape of a specific region of interest. The shape recognition capability of Krawtchouk moment-based local features is verified on two databases: the standard Jochen Triesch database and 26 ISL alphabets collected from 72 different subjects, with variations in position, scale and rotation. Feature selection is performed to minimise redundancy, and the effect of moment order and feature dimensionality on different classifiers is studied. Results show that Krawtchouk moment-based local features exhibit user, scale, rotation and translation invariance, and have good shape identification capability.

3.
Objective: To improve the accuracy of continuous sign language recognition and ease the communication barrier between hearing-impaired and hearing people. Methods: A continuous sign language recognition algorithm based on a global attention mechanism and LSTM is proposed. Video data are pre-processed with inter-frame differencing to remove redundant frames, and feature sequences are extracted with a ResNet network. Global sign language state features are obtained through attention weighting, and an LSTM performs the temporal analysis, yielding a continuous sign language recognition algorithm based on the global attention mechanism and LSTM. Results: Experiments show that the algorithm achieves an average recognition rate of 90.08% and an average word error rate of 41.2% on the Chinese continuous sign language dataset CSL; compared with five other algorithms, it has advantages in recognition accuracy and translation performance. Conclusion: The algorithm based on the global attention mechanism and LSTM achieves continuous sign language recognition with good recognition and translation performance, and is of positive significance for helping hearing-impaired people integrate into society.
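The inter-frame differencing pre-processing step can be sketched as follows. This is a minimal illustration on toy grayscale frames; the mean-absolute-difference criterion and the threshold value are assumptions for the sketch, not the paper's exact rule.

```python
def drop_redundant_frames(frames, threshold):
    """Keep a frame only when it differs enough from the last kept frame.

    `frames` is a list of equally sized grayscale frames (lists of rows).
    The mean absolute pixel difference against the last kept frame is
    compared with `threshold` to decide whether the frame is redundant.
    """
    kept = [frames[0]]  # always keep the first frame
    for frame in frames[1:]:
        last = kept[-1]
        diff = sum(abs(a - b)
                   for ra, rb in zip(frame, last)
                   for a, b in zip(ra, rb))
        n_pixels = len(frame) * len(frame[0])
        if diff / n_pixels >= threshold:
            kept.append(frame)
    return kept

f0 = [[10, 10], [10, 10]]
f1 = [[10, 11], [10, 10]]   # nearly identical to f0, so redundant
f2 = [[90, 90], [90, 90]]   # large change, so kept
```

Only the retained frames are then passed to the ResNet feature extractor, which is where the computational savings come from.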

4.
5.
Road traffic sign recognition is an important task in intelligent transportation systems. Convolutional neural networks (CNNs) have achieved breakthroughs in computer vision and great success in traffic sign classification. This paper presents a road traffic sign recognition algorithm based on a convolutional neural network. In natural scenes, traffic signs are disturbed by factors such as illumination, occlusion, missing parts and deformation, which reduce recognition accuracy, so this paper proposes a model called Improved VGG (IVGG), inspired by the VGG model. The IVGG model has 9 layers; compared with the original VGG model, max-pooling and dropout operations are added after multiple convolutional layers to capture the main features and reduce training time. Dropout and Batch Normalization (BN) operations are also added after each fully-connected layer to further accelerate model convergence and obtain a better classification result. The experiments use the German Traffic Sign Recognition Benchmark (GTSRB) dataset. With data augmentation and transfer learning, the IVGG model improves the recognition rate and robustness for traffic signs while greatly reducing the time spent.
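Standard conv/pool size arithmetic explains how a VGG-style stack shrinks its input. The sketch below traces feature-map side lengths through a hypothetical 9-layer configuration; the exact IVGG layer layout, kernel sizes and input resolution are assumptions, since the abstract does not give them.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output side length of a conv or pooling layer (square input):
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_stack(size, layers):
    """Run an input side length through (kind, kernel, stride, padding) layers
    and return the side length after each layer (input size first)."""
    sizes = [size]
    for _kind, k, s, p in layers:
        size = out_size(size, k, s, p)
        sizes.append(size)
    return sizes

# Hypothetical 9-layer VGG-style stack on 48x48 sign crops:
# 3x3 "same" convolutions preserve the size, each 2x2 max-pool halves it.
stack = [
    ("conv", 3, 1, 1), ("conv", 3, 1, 1), ("pool", 2, 2, 0),
    ("conv", 3, 1, 1), ("conv", 3, 1, 1), ("pool", 2, 2, 0),
    ("conv", 3, 1, 1), ("conv", 3, 1, 1), ("pool", 2, 2, 0),
]
```

For a 48x48 input this yields 48 -> 24 -> 12 -> 6, the kind of sizing check one does before attaching the fully-connected layers.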

6.
7.
Sign language fills the communication gap for people with hearing and speech impairments. It includes both visual modalities: manual gestures, consisting of hand movements, and non-manual gestures, incorporating body movements of the head, facial expressions, eyes, shoulder shrugging, etc. Prior work has detected the two kinds of gestures separately, which may improve per-gesture accuracy but loses much communicative information; a proper sign language mechanism must detect manual and non-manual gestures together to convey the appropriate detailed message. Our proposed Sign Language Action Transformer Network (SLATN) localizes hand, body, and facial gestures in video sequences. We use a Transformer-style architecture as a "base network" to extract features from the spatiotemporal domain; the model learns to track individual persons and their action context across multiple frames. A "head network" then attends to hand movement and facial expression simultaneously, which is often crucial to understanding sign language, using its attention mechanism to create tight bounding boxes around classified gestures. Compared with traditional activity recognition methods, the model not only runs faster but also achieves better accuracy, reaching overall 82.66% testing accuracy with a very considerable computational performance of 94.13 Giga Floating-Point Operations per Second (G-FLOPS). A further contribution is a newly created Pakistan Sign Language dataset of Manual and Non-Manual (PkSLMNM) gestures.

8.
Human biometric analysis has received much attention due to its widespread use in research areas such as security, surveillance, health, and human identification and classification. Human gait is one of the key human traits that can identify and classify humans by age, gender, and ethnicity. Different approaches have been proposed for estimating human age from gait, but challenges remain for which an efficient, low-cost technique or algorithm is needed. In this paper, we propose a three-dimensional real-time gait-based age detection system using a machine learning approach. The system consists of training and testing phases. The training phase comprises gait feature extraction using the Microsoft Kinect (MS Kinect) controller, dataset generation based on joint positions, pre-processing of gait features, feature selection by calculating the standard error and standard deviation of the arithmetic mean, and best-model selection using R2 and adjusted R2. T-test and ANOVA analyses show that nine joints (right shoulder, right elbow, right hand, left knee, right knee, right ankle, left ankle, and left and right foot) are statistically significant at the 5% level for age estimation. The testing phase correctly predicts the age of a walking person using the results obtained from the training phase. The approach is evaluated on data recorded experimentally from users in a real-time scenario; fifty (50) volunteers of different ages participated in the study. Using this limited feature set, the method estimates age with 98.0% accuracy on experimental images acquired in real time via a classical general linear regression model.
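Model selection with R2 and adjusted R2 uses a standard penalty for the number of predictors: adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1). A minimal sketch (with made-up age data in the test, not the paper's measurements) is:

```python
def r_squared(y, y_hat):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, n_predictors):
    """Adjusted R^2 penalizes models for using more predictors, so it
    prefers the smaller joint set when extra joints add no real fit."""
    n = len(y)
    r2 = r_squared(y, y_hat)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
```

Since adjusted R2 is always at most R2 (for more than zero predictors), comparing the two across candidate joint subsets flags overfitting, which is how best-model selection proceeds here.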

9.
Objective: Traffic sign recognition is an important topic in intelligent driving and transportation systems research, with considerable theoretical value and application prospects. Text-based traffic signs in particular carry rich high-level semantic information and can provide extremely rich road information. We therefore design and implement a new end-to-end traffic sign text recognition system, with the aim of relieving traffic congestion and improving road safety. Methods: The system comprises two vision tasks, text region detection and text recognition, implemented with deep convolutional neural networks. ResNet-50 serves as the backbone for feature extraction, and an FPN-like structure performs multi-level feature fusion; the fused features are shared by the text detection and recognition branches. Text detection localizes text regions and outputs the coordinates of candidate text boxes, while text recognition outputs the character string corresponding to each word. Results: Experiments show that the system achieves satisfactory results on the Traffic Guide Panel Dataset, with a line recognition accuracy of 71.08%. Conclusion: End-to-end traffic sign text recognition is of great practical significance. Using deep convolutional neural networks, we propose an end-to-end traffic sign text recognition system and demonstrate its advantages on the open-source Traffic Guide Panel Dataset.

10.
The problem of non-recognition of road signs has many aspects of great importance for traffic safety. Considering all signs on a test road, and using eye-movement and recognition-rate methods, a temporal analysis was conducted for two driving conditions: driving with the time necessary to see, read and recognize each type of road sign, and free driving, to determine the actual time drivers spend reading these signs. The actual time spent yields total and partial recognition rates, as well as rates of non-recognition. Many of the factors involved were investigated, and the analysis was designed to estimate the effect of each factor separately. For more practical use of the results, a set of probabilistic models was estimated to characterize the different distributions of fixation durations. The parameters of these models were then used to develop a method for measuring an efficiency-level index of the road sign system.

11.
A fast and robust method to detect and recognize scaled and skewed road signs is proposed in this paper. In the detection stage, the input colour image is first quantized in the HSV colour model, and border tracing of regions with the same colours as road signs finds the regions of interest (ROIs). The ROIs are then automatically adjusted to fit road-sign shape models, which facilitates detection verification even for scaled and skewed road signs in complicated scenes. Moreover, because ROI adjustment and verification are performed only on border pixels, the proposed road sign detector is fast. In the recognition stage, the detected road sign is first normalized; histogram matching based on a polar mesh then measures the similarity between the scene and model road signs to accomplish recognition. Since histogram matching is fast, highly tolerant to distortion and deformation, and can still incorporate contextual information in a natural and elegant way, the method has high recognition accuracy and fast execution speed. Experimental results show a detection rate of 94.2% and recognition accuracy of 91.7%; on average, detection takes only 4–50 ms and recognition 10 ms. Thus, the proposed method is effective, yet efficient. © 2007 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 17, 28–39, 2007
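Histogram matching of this kind is often implemented as histogram intersection: the sum of bin-wise minima of two normalized histograms. The sketch below illustrates that similarity measure in pure Python; the paper's polar-mesh binning is not reproduced, and intersection is an assumed choice of matching function.

```python
def histogram_intersection(h1, h2):
    """Similarity of two histograms after normalizing each to sum to 1.

    Returns 1.0 for identical shapes and values near 0 for disjoint ones.
    Small local distortions move little mass between bins, which is why
    the measure tolerates deformation well.
    """
    s1, s2 = sum(h1), sum(h2)
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))
```

A scene sign is assigned the model class whose histogram gives the highest intersection score.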

12.
A Cyberglove data glove is used as the sign language input device, and a Dynamic Gaussian Mixture Model (DGMM) serves as the gesture-word recognition technique. A search strategy based on relative entropy is proposed and applied to gesture-word recognition with semi-continuous DGMMs to increase recognition speed. Experimental results show that with the search strategy the recognition performance remains comparable to the original, while the recognition speed improves by a factor of nearly 15.
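Relative entropy is the KL divergence, and for univariate Gaussians (the components of a Gaussian mixture) it has a closed form. The sketch below shows that quantity; it is a minimal illustration of the measure behind such a search strategy, not the paper's actual pruning rule.

```python
import math

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL divergence D(N(mu1, sigma1^2) || N(mu2, sigma2^2)):

        ln(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2 * s2^2) - 1/2

    A search strategy can rank or prune gesture-word model components by
    their divergence from the observation, evaluating full likelihoods
    for only the closest candidates.
    """
    return (math.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)
```

The divergence is zero only when the two Gaussians coincide and grows with the mean separation, so thresholding it gives a cheap candidate filter.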

13.
ABSTRACT

Sign language is a medium of communication for people with hearing disabilities. In video-based sign language recognition, static and dynamic gestures are identified and translated into humanly understandable phrases to achieve the communication objective. However, videos contain redundant key-frames that require additional processing, and their number can be reduced; selecting the right key-frames without losing the required information is a challenging task. A key-frame extraction algorithm is used to speed up the sign language recognition process by extracting only the essential key-frames, and the proposed framework eliminates computational overhead by picking distinct key-frames for the recognition process. The Discrete Wavelet Transform (DWT), Discrete Cosine Transform (DCT), and Histograms of Oriented Gradients (HOG) are used to extract distinctive features. Bagged-tree and boosted-tree ensemble methods, fine KNN, and SVM are used for classification. The methodology was tested on video-based datasets of Pakistani Sign Language and achieved an overall accuracy of 97.5% on 37 Urdu alphabets and 95.6% on 100 common words.
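Of the three feature transforms, the DCT has the simplest closed form. A naive 1-D DCT-II sketch is shown below; real pipelines use a fast 2-D variant, and keeping only the low-frequency coefficients is the usual (here assumed) feature-selection step.

```python
import math

def dct_ii(signal):
    """Naive 1-D DCT-II (unnormalized):

        X_k = sum_n x_n * cos(pi / N * (n + 1/2) * k)

    Energy concentrates in the low-frequency coefficients, so a handful
    of them summarizes a key-frame row or block compactly.
    """
    n_pts = len(signal)
    return [sum(x * math.cos(math.pi / n_pts * (n + 0.5) * k)
                for n, x in enumerate(signal))
            for k in range(n_pts)]
```

A constant signal, for example, puts all its energy in the k = 0 coefficient, which is the intuition behind truncating the high-frequency tail.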

14.
A human-machine speech interaction system is designed for use in RoboCup Middle Size League robot soccer matches. The relevant software is briefly introduced and the principles of speech recognition are analyzed; speech synthesis technology and its implementation steps are then discussed, and a set of voice commands is designed according to the actual needs of soccer robots in competition. Experiments were conducted using Kinect software, and multiple groups of data were tested with different speakers. The results show that the designed speech interaction system recognizes the voice commands effectively, with a high recognition rate. Both the referee machine and the player machines can quickly and accurately recognize players' voice commands and announce them, fully meeting the speech-interaction needs of human-machine matches.

15.
N. Usha Rani and P. N. Girija. Sadhana, 2012, 37(6): 747–761
Speech is one of the most important communication channels among people, and speech recognition occupies a prominent place in communication between humans and machines. Several factors affect the accuracy of a speech recognition system, and despite much effort to improve it, current systems still produce erroneous output. Telugu is one of the most widely spoken South Indian languages. In the proposed Telugu speech recognition system, the errors produced by the decoder are analysed to improve performance. The static pronunciation dictionary plays a key role in recognition accuracy, so modifications are made to the dictionary used in the decoder. These modifications reduce the number of confusion pairs, which improves system performance, and language model scores also vary with the modification. The hit rate increases considerably, and false alarms change as the pronunciation dictionary is modified. Variations are observed in different error measures, such as the F-measure, error rate and Word Error Rate (WER), when the proposed method is applied.
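Word Error Rate is conventionally computed with Levenshtein dynamic programming over word sequences; a minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    via standard edit-distance dynamic programming over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0, which is why hit rates and false alarms are tracked alongside it when tuning the pronunciation dictionary.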

16.
朱奇光, 董惠茹, 张孟颖. 《计量学报》, 2021, 42(9): 1136–1141
Based on imitation learning and human-robot interaction techniques, upper-body motion imitation for a humanoid robot is studied with the aid of a Kinect depth camera. First, an improved D-H model is applied to the arms of a NAO robot to accurately build and solve the arm kinematics model, resolving the singularity that arises in traditional modelling when two adjacent joints are parallel. Second, an improved gesture recognition algorithm based on depth images is proposed to identify and imitate the demonstrator's gestures; compared with traditional gesture recognition based on colour images, it is unaffected by illumination while improving recognition accuracy, with the improved algorithm reaching an average recognition accuracy of 96.2%. Finally, experiments with the NAO robot as the test platform show that the robot imitates the demonstrator's upper-body motions online in real time with smooth, stable trajectories, and it also demonstrates good accuracy in grasping experiments.
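A single link's homogeneous transform in the standard D-H convention can be sketched as below. The paper uses an improved D-H model precisely to avoid the parallel-joint singularity; this sketch shows only the classical convention it starts from.

```python
import math

def dh_matrix(a, alpha, d, theta):
    """4x4 homogeneous transform for one link, standard D-H convention:
    Rot_z(theta) * Trans_z(d) * Trans_x(a) * Rot_x(alpha),
    returned as a row-major list of lists."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [[ct, -st * ca,  st * sa, a * ct],
            [st,  ct * ca, -ct * sa, a * st],
            [0.0,      sa,       ca,      d],
            [0.0,     0.0,      0.0,    1.0]]
```

Chaining one such matrix per joint (matrix product, base to hand) gives the forward kinematics of the arm; the end-effector position is the last column of the product.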

17.
18.
Communication is a basic human need through which people learn, express their feelings and exchange ideas, but deaf people cannot hear or speak. For communication they use various hand gestures, known as Sign Language (SL), which they learn at special schools. Because hearing people have generally not taken SL classes, they are unable to sign everyday sentences (e.g., "What are the specifications of this mobile phone?"), and a technological solution can help bridge this communication gap. This paper presents the architecture of an application named Sign4PSL, which translates sentences into Pakistan Sign Language (PSL) for deaf people, with a visual representation rendered by a virtual signing character. The research aims at a generic, independent application that is lightweight, reusable on any platform including web and mobile, and able to perform offline text translation. Sign4PSL relies on a knowledge base that stores a corpus of PSL words and their coded form in a sign language notation system: it takes English text as input, translates it into PSL through the notation, and displays the gestures to the user via the virtual character. The system was tested on deaf students at a special school, and the results showed that the students understood the story presented to them appropriately.

19.
Over recent years, various virtual prototyping technologies have been developed to modernize the apparel industry. For each step of the garment design process, from body acquisition to garment modelling and simulation, dedicated tools exist to make the process easier and faster; however, most are based on expensive hardware and software. This paper focuses on the first step of made-to-measure garment design: the acquisition of the customer's measurements. We present a plug-in, named Tailor Tracking, which permits the tailor to take measurements by interacting with the customer's avatar using the hands, as in the traditional way. Tailor Tracking was developed with low-cost devices, such as the Microsoft Kinect sensor, the Leap Motion device and the Oculus Rift, and open-source libraries, such as the Visualisation Toolkit (VTK) and Qt. The approach uses multiple Kinect v2 sensors to simultaneously acquire the customer's body and motion, which permits emulating the customer's postures required to take correct measurements; in addition, a virtual measuring tape replicates the one commonly used by tailors. A men's shirt was considered as a case study, and a tailor together with 14 people with no garment design skills and different levels of experience in virtual reality technology were involved in preliminary tests of Tailor Tracking. The tests and the results reached so far are presented and discussed. The results were considered quite good, although some critical measures were identified along with future developments. Tailor Tracking can represent an alternative to existing approaches that automatically extract anthropometric measures from the customer's avatar.
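At its core, a virtual measuring tape reduces to measuring the length of a 3-D polyline traced on the avatar. The sketch below shows that reduction; how the points are picked on the mesh, and closing the polyline for girth measurements, are outside this sketch.

```python
import math

def tape_length(points):
    """Length of a 3-D polyline: the virtual tape is a chain of points
    picked on the avatar, and the measurement is the sum of the
    Euclidean lengths of its segments."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))
```

For a closed measurement such as a chest girth, one would append the first point at the end of the chain before summing.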

20.
Skin segmentation and tracking play an important role in sign language recognition. A framework for segmenting and tracking skin objects in signing videos is described. It consists of two main parts: a skin colour model and a skin object tracking system. The skin colour model is first built by combining support vector machine active learning with region segmentation; the model is then integrated with motion and position information to perform segmentation and tracking. The tracking system is able to predict occlusions among any of the skin objects using a Kalman filter (KF), and the skin colour model can be updated with the help of tracking to handle illumination variation. Experimental evaluations on real-world gesture videos and comparisons with other existing algorithms demonstrate the effectiveness of the proposed work.
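Occlusion prediction with a Kalman filter relies on the predict step running even when no measurement arrives. The sketch below is a deliberately simplified scalar constant-velocity filter; the class name, noise values and the crude velocity correction are illustrative assumptions, not the paper's filter.

```python
class Kalman1D:
    """Constant-velocity filter for one coordinate of a skin object.

    During an occlusion no measurement is available, so calling
    `predict` alone keeps extrapolating the track; `update` folds a
    measurement back in once the object reappears. State uncertainty
    is kept as a single scalar for clarity.
    """
    def __init__(self, pos, vel=0.0, p=1.0, q=0.01, r=1.0):
        self.pos, self.vel = pos, vel
        self.p = p              # scalar state uncertainty
        self.q, self.r = q, r   # process / measurement noise

    def predict(self, dt=1.0):
        self.pos += self.vel * dt
        self.p += self.q
        return self.pos

    def update(self, measured_pos):
        k = self.p / (self.p + self.r)   # Kalman gain
        innovation = measured_pos - self.pos
        self.pos += k * innovation
        self.vel += k * innovation       # crude velocity correction (simplification)
        self.p *= (1 - k)
        return self.pos
```

Predicting through a few occluded frames and then updating on the first post-occlusion measurement is exactly the pattern that keeps hand and face tracks from swapping identities when they cross.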
