Similar Articles
20 similar articles found (search time: 15 ms)
1.
Traditional fingerspelling recognition relies on convolutional neural networks, whose single model structure discards much information at the pooling layers. A capsule is a sub-network constructed and abstracted inside a neural network; each capsule focuses on a separate task while preserving the spatial features of the image. This work analyzes the characteristics of fingerspelling in Chinese Sign Language, builds and augments a fingerspelling image training set, and applies a CapsNet (capsule network) model to the fingerspelling recognition task. Recognition rates of CapsNet under different parameter settings are compared, as is CapsNet against the classic GoogLeNet convolutional network. Experimental results show that CapsNet achieves good recognition performance on the sign language recognition task.
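The capsule behaviour described in the abstract above, vector outputs whose length encodes presence while orientation preserves spatial features, rests on the "squash" non-linearity. A minimal NumPy sketch of that function (following the standard CapsNet formulation, not this paper's code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule 'squash' non-linearity: shrinks short vectors toward 0
    and long vectors toward unit length, preserving orientation."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)
```

The output length lies in (0, 1) and can be read as the probability that the entity a capsule represents is present.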

2.
Recent hardware technologies have enabled real-time acquisition of 3D point clouds from real-world scenes, opening the door to a variety of interactive applications. A main remaining problem, however, is that most processing techniques for such point clouds are computationally intensive and require optimized approaches, especially when real-time performance is needed. As a possible solution, we propose a 3D moving fovea based on a multiresolution technique that processes parts of the acquired scene at multiple levels of resolution. This approach can identify objects in point clouds efficiently. Experiments show that the moving fovea yields a sevenfold gain in processing time while keeping a 91.6% true recognition rate, compared with state-of-the-art 3D object recognition methods.
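The moving fovea in the abstract above processes the scene at resolution levels that fall off with distance from a fixation point. A toy NumPy sketch of that idea (shell radii and keep-ratios are invented for illustration, not the paper's values):

```python
import numpy as np

def foveate(points, fovea, radii=(0.2, 0.5, 1.0), keep=(1.0, 0.5, 0.1)):
    """Multiresolution subsampling: points near the fovea are kept at
    full density; farther shells are subsampled harder; points beyond
    the outermost radius are dropped entirely."""
    rng = np.random.default_rng(0)
    d = np.linalg.norm(points - fovea, axis=1)
    out, prev = [], 0.0
    for r, k in zip(radii, keep):
        shell = points[(d >= prev) & (d < r)]
        n = int(np.ceil(len(shell) * k))
        if n:
            out.append(shell[rng.choice(len(shell), n, replace=False)])
        prev = r
    return np.vstack(out) if out else np.empty((0, 3))
```

Downstream recognition then runs on the reduced cloud, which is where the reported processing-time gain comes from.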

3.
In this paper, a 3D object recognition algorithm is proposed. Objects are recognized by studying planar images corresponding to a sequence of views. Planar shape contours are represented by adaptively computed curvature functions, which are decomposed in the Fourier domain as a linear combination of a set of representative shapes. Finally, sequences of views are identified by means of Hidden Markov Models. The proposed system has been tested on artificial and real objects; distorted and noisy versions of the objects were correctly clustered together.
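The abstract above represents each contour by its curvature function and analyses it in the Fourier domain. A sketch of curvature-based Fourier descriptors (magnitude spectra, which are invariant to the choice of starting point on the contour; the decomposition into representative shapes is omitted):

```python
import numpy as np

def fourier_descriptors(curvature, n_coeffs=8):
    """Leading Fourier magnitudes of a (resampled) contour curvature
    function; the DC term is dropped and magnitudes are normalized."""
    c = np.fft.fft(curvature)
    return np.abs(c[1:n_coeffs + 1]) / len(curvature)
```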

4.
To bridge the communication gap between people with speech impairments and unimpaired speakers, a neural-network-based method for converting sign language to emotional speech is proposed. First, corpora of gestures, facial expressions, and emotional speech are built. Deep convolutional neural networks then perform gesture recognition and facial expression recognition; using Mandarin initials and finals as synthesis units, a speaker-adaptive deep neural network acoustic model and a speaker-adaptive hybrid long short-term memory network acoustic model for emotional speech are trained. Finally, context-dependent labels for the gesture semantics and the emotion labels corresponding to the facial expressions are fed into the emotional speech synthesis model to synthesize the corresponding emotional speech. Experimental results show gesture and facial expression recognition rates of 95.86% and 92.42%, respectively; the synthesized emotional speech achieves an EMOS score of 4.15, indicating a high degree of emotional expressiveness, so the method can support everyday communication between people with speech impairments and unimpaired speakers.

5.
Continuous hand-sign-spelled Bangla sign language (BdSL) recognition is challenging: traditional hand-sign segmentation and classification must cope with the many varieties of written Bangla, including joint letters and dependent vowels, and 51 written Bangla characters are represented by only 36 hand-signs. This paper presents a two-phase approach for automatic recognition of hand-sign-spelled BdSL. The first phase performs hand-sign classification; the second applies a Bangla language modeling algorithm (BLMA). For classification, we propose a two-step classifier using the normalized outer boundary vector (NOBV) and the window-grid vector (WGV), computing the maximum inter-correlation coefficient (ICC) between the test feature vector and pre-trained feature vectors. The system first classifies hand-signs using NOBV; if the classification score does not reach a specified threshold, a second classifier based on WGV is used. The system is trained on 5,200 images and tested on another 5,200 × 6 images of 52 hand-signs from 10 signers in 6 challenging environments, achieving a mean classification accuracy of 95.83% at a computational cost of 39.972 ms per frame. In the second phase, the BLMA recovers all "hidden" characters from the characters recognized among the 52 hand-signs, so that any Bangla word, composite numeral, or sentence can be formed in BdSL with no further training, based only on the output of the first phase. To the best of our knowledge, this is the first system for automatic recognition of hand-sign-spelled BdSL over a large lexicon. Tested on 500 hand-sign-spelled words, 100 composite numerals, and 80 sentences in BdSL, the BLMA achieves mean accuracies of 93.50%, 95.50%, and 90.50%, respectively.
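The first phase in the abstract above classifies by maximum inter-correlation coefficient, with a threshold fallback from NOBV to WGV features. A toy sketch of that two-step scheme (feature extraction itself is omitted; the vectors and threshold are illustrative):

```python
import numpy as np

def correlate(a, b):
    """Pearson correlation coefficient between two feature vectors."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

def two_step_classify(x_nobv, x_wgv, train_nobv, train_wgv, threshold=0.9):
    """Classify with the first feature (NOBV); if the best correlation
    falls below the threshold, fall back to the second feature (WGV)."""
    scores = [correlate(x_nobv, t) for t in train_nobv]
    best = int(np.argmax(scores))
    if scores[best] >= threshold:
        return best
    scores = [correlate(x_wgv, t) for t in train_wgv]
    return int(np.argmax(scores))
```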

6.
Objective: With advances in 3D scanning and virtual reality, 3D recognition of real objects has become an active research topic. Existing deep-learning-based methods suffer from long training times and unsatisfactory recognition performance, so a 3D object recognition method combining a perceptron residual network with an extreme learning machine (ELM) is proposed. Method: Within the ELM framework, a multilayer-perceptron residual network learns multi-view projection features of 3D objects, and the extracted features together with the known labels are used to train ELM, K-nearest-neighbor (KNN), and support vector machine (SVM) classification layers for 3D object recognition. The network replaces traditional convolutional layers with convolutional layers augmented by multilayer perceptrons. The convolutional network consists of improved residual units containing multiple parallel residual channels with a constant number of kernels, fitting residual functions of different mathematical forms. Half of the kernel and perceptron parameters are drawn randomly from a Gaussian distribution; the rest are obtained by training. Result: The proposed method reaches 94.18% accuracy on the Princeton 3D model dataset and 97.46% on the 2D NORB dataset, the best reported results on both international benchmarks at the time. The ELM framework also reduces training time by three orders of magnitude compared with deep-learning-based methods. Conclusion: The proposed multi-view 3D object recognition method achieves higher recognition rates and stronger robustness to interference than existing ELM and state-of-the-art deep learning methods, with few tuning parameters and fast convergence.
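The ELM framework in the abstract above trains output weights in closed form over a random hidden layer, which is what makes training orders of magnitude faster than back-propagation. A minimal ELM sketch (plain tanh features, not the paper's perceptron residual network):

```python
import numpy as np

def elm_train(X, Y, n_hidden=64, seed=0):
    """Extreme Learning Machine: random fixed hidden layer, then
    closed-form least-squares output weights via the pseudoinverse."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random, never trained
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                       # hidden activations
    beta = np.linalg.pinv(H) @ Y                 # least-squares solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```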

7.
8.
In this paper, a new appearance-based 3D object classification method is proposed, built on the Hidden Markov Model (HMM) approach, a widely used methodology for sequential data modelling of growing importance in recent years. In the proposed approach, each view is subdivided into regular, partially overlapping sub-images, and wavelet coefficients are computed for each window. These coefficients are arranged sequentially to compose a sequence vector, which is used to train an HMM, paying particular attention to model selection and to the initialization of the training procedure. A thorough experimental evaluation on a standard database has shown promising results, also in the presence of image distortions and occlusions, the latter being one of the most severe problems for recognition methods. This analysis suggests that the proposed approach is an interesting alternative to classic appearance-based methods for 3D object classification.
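The abstract above computes wavelet coefficients over regular, partially overlapping windows and arranges them as a sequence for the HMM. A simplified sketch using coarse Haar-style responses per window (window size and step are illustrative, not the paper's values):

```python
import numpy as np

def haar_features(image, win=8, step=4):
    """Slide a partially overlapping window over the image and keep
    coarse Haar responses (mean, horizontal and vertical detail) as a
    sequence vector suitable for HMM training."""
    seq = []
    h, w = image.shape
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            p = image[y:y + win, x:x + win]
            left, right = p[:, :win // 2], p[:, win // 2:]
            top, bot = p[:win // 2, :], p[win // 2:, :]
            seq.append([p.mean(), left.mean() - right.mean(),
                        top.mean() - bot.mean()])
    return np.array(seq)
```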

9.
10.
The text produced by video-based continuous sign language recognition suffers from semantic ambiguity and disordered word order. A two-step method is proposed to convert such recognized sign language text into fluent, intelligible Chinese text. First, the recognition output is reordered using natural sign language rules and an N-gram language model. Second, a bidirectional long short-term memory (Bi-LSTM) network is trained on a dataset of common Chinese measure words, to address sign language grammar's lack of ...
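The first step in the abstract above, reordering the recognition output with an N-gram model, can be sketched as choosing the permutation a smoothed bigram model scores highest (toy English vocabulary and counts; practical systems prune rather than enumerate all permutations):

```python
import math
from itertools import permutations

def bigram_score(seq, bigrams, unigrams, v):
    """Add-one-smoothed bigram log-probability over a vocabulary of size v."""
    s = 0.0
    for a, b in zip(seq, seq[1:]):
        s += math.log((bigrams.get((a, b), 0) + 1) / (unigrams.get(a, 0) + v))
    return s

def reorder(words, bigrams, unigrams, v):
    """Pick the word order the bigram model likes best (small inputs only)."""
    return max(permutations(words),
               key=lambda p: bigram_score(p, bigrams, unigrams, v))
```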

11.
A new algorithm is presented for interpreting two-dimensional (2D) line drawings as three-dimensional (3D) objects without models. Even though no explicit models or additional heuristics are included, the algorithm tends to reach the same 3D interpretations of 2D line drawings that humans do. The algorithm explicitly calculates the partial derivatives of Marill's Minimum Standard Deviation of Angles (MSDA) with respect to all adjustable parameters, and follows this gradient to minimize SDA. For an image with lines meeting at m points forming n angles, the gradient descent algorithm requires O(n) time to adjust all the points, while Marill's method required O(mn) time to do so. Experimental results on various line drawing objects show that this gradient descent algorithm running on a Macintosh II is one to two orders of magnitude faster than the MSDA algorithm running on a Symbolics, while still giving comparable results.
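The core idea in the abstract above, descending the gradient of the standard deviation of angles (SDA) over the adjustable depth parameters, can be sketched with numeric gradients on a single triangle (a simplification of Marill's setting; the paper computes analytic derivatives):

```python
import numpy as np

def angles(tri):
    """Interior angles of a triangle given as a (3, 3) array of 3D points."""
    out = []
    for i in range(3):
        a, b, c = tri[i], tri[(i + 1) % 3], tri[(i + 2) % 3]
        u, v = b - a, c - a
        cosang = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        out.append(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return np.array(out)

def minimize_sda(tri, lr=0.01, steps=2000, h=1e-5):
    """Numeric gradient descent on the standard deviation of angles,
    adjusting only the z (depth) coordinates of the vertices."""
    tri = tri.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros(3)
        base = angles(tri).std()
        for i in range(3):
            tri[i, 2] += h
            grad[i] = (angles(tri).std() - base) / h
            tri[i, 2] -= h
        tri[:, 2] -= lr * grad
    return tri
```

Lifting vertices out of the drawing plane lets the optimizer equalize the angles, which is the MSDA account of why flat drawings are seen as symmetric 3D shapes.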

12.
Computer vision has been extensively adopted in industry for the last two decades; it enhances productivity and quality management, and is flexible, efficient, fast, inexpensive, reliable and robust. This study presents a new translation-, rotation- and scaling-free object recognition method for 2D objects. The proposed method comprises two parts: the KRA feature extractor and the GRA classifier. The KRA feature extractor employs K-curvature, re-sampling, and autocorrelation transformation to extract unique features of objects, and gray relational analysis (GRA) then classifies the extracted invariant features. The boundary of the digital object is first represented as the K-curvature over a given region of support, then re-sampled and transformed with the autocorrelation function. The extracted features are thus invariant to translation, rotation and scaling. To verify and validate the proposed method, 50 synthetic and 50 real objects were digitized as standard patterns, and 10 extra images of each object (test images), taken at different positions, orientations and scales, were acquired and compared with the standard patterns. The experimental results reveal that the proposed method, with either the GRA or MD classifier, is effective and reliable for part recognition.
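The autocorrelation step in the abstract above is what removes the dependence on where the boundary trace starts (and hence on rotation) after re-sampling. A minimal sketch of a circular autocorrelation signature:

```python
import numpy as np

def autocorr_signature(signal, n=8):
    """Circular autocorrelation of a boundary signature; invariant to
    the starting point (circular shift) of the contour trace."""
    x = signal - signal.mean()
    return np.array([np.mean(x * np.roll(x, k)) for k in range(n)])
```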

13.
14.
15.
Three-dimensional models are widely used in multimedia, computer graphics, virtual reality, entertainment, design, and manufacturing because of the rich surface, color and texture information they preserve from real objects, so effective 3D object classification technology has become an urgent need. Previous methods usually convert classic 2D convolution directly into 3D form and apply it to objects in a binary voxel representation, which may lose internal information essential for recognition. In this paper, we propose a novel voxel-based three-view hybrid parallel network for 3D shape classification. The method first obtains depth projection views of the 3D model from the front, top and side, preserving the model's spatial information as far as possible, and outputs predicted class probabilities for each view; the three-view parallel network is then combined with a voxel sub-network by weighted fusion, and Softmax is used for classification. We conducted a series of experiments to validate the network design and achieved competitive performance on the ModelNet10 and ModelNet40 3D object classification tasks.
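The head described in the abstract above combines per-view predictions with a voxel sub-network through fusion weights followed by Softmax. A toy sketch of that weighted fusion (the weights and logits are illustrative; in the paper the weights are learned):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_predictions(view_logits, voxel_logits, w):
    """Weighted fusion of per-view and voxel sub-network logits.
    view_logits: (n_views, n_classes); voxel_logits: (n_classes,);
    w: (n_views + 1,) fusion weights."""
    stacked = np.vstack([view_logits, voxel_logits[None, :]])
    return softmax((w[:, None] * stacked).sum(axis=0))
```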

16.
To address the loss of feature information caused by the many pooling layers in traditional convolutional neural networks, and exploiting the capsule network's (CapsNet) use of vector neurons to preserve spatial feature information, a network model for 3D model recognition, 3DSPNCapsNet, is proposed. The new architecture extracts more representative features while reducing model complexity, and a DRL algorithm based on dynamic routing (DR) is proposed to optimize the iterative computation of capsule weights. Experimental results on ModelNet10 show better recognition than 3DCapsNet and VoxNet, with 3DSPNCapsNet reaching an average accuracy of 95% on the original test set; the network's ability to recognize rotated 3D models is also verified. After suitably augmenting the training set with rotations, the average recognition rate over models rotated to arbitrary angles reaches 81%. The results show that 3DSPNCapsNet recognizes 3D models and their rotations well.
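The dynamic routing (DR) procedure that the DRL algorithm above modifies iteratively re-estimates coupling coefficients by agreement between prediction vectors and output capsules. A NumPy sketch of standard DR (Sabour et al.'s routing-by-agreement, not the paper's DRL variant):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    n2 = np.sum(s * s, axis=axis, keepdims=True)
    return n2 / (1 + n2) * s / np.sqrt(n2 + eps)

def dynamic_routing(u_hat, iters=3):
    """Routing-by-agreement between capsule layers.
    u_hat: (n_in, n_out, d) prediction vectors; returns (n_out, d)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # couplings
        s = (c[..., None] * u_hat).sum(axis=0)                # (n_out, d)
        v = squash(s)
        b = b + (u_hat * v[None]).sum(axis=-1)                # agreement
    return v
```

Output capsules that many inputs agree on end up with long vectors; capsules receiving conflicting predictions stay short.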

17.
徐访, 黄俊, 陈权. 《计算机工程》, 2021, 47(11): 283-291
Dynamic gesture recognition on gesture videos that lack boundary (flag) frames tends to suffer reduced accuracy. A dynamic gesture recognition model with a hierarchical network structure is proposed: a gesture detection model serves as the first-level network and a gesture classification model as the second level, completing recognition in two steps. In addition, 3D convolution kernels are factorized into temporal-domain and spatial-domain convolutions applied in separate stages, alleviating the long training and inference times caused by the large parameter count of 3D convolutional neural networks. Experimental results show that, while remaining real-time, the model reaches 93.35% recognition accuracy on the EgoGesture dataset, outperforming models such as C3D, ResNeXt101, and MTUT.
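The factorization above, splitting a 3D kernel into a spatial and a temporal convolution, cuts parameters substantially in typical settings. A back-of-the-envelope sketch (the channel counts are illustrative, not taken from the paper):

```python
def conv3d_params(cin, cout, t, k):
    """Weights of a full 3D conv with a t x k x k kernel, ignoring bias."""
    return cin * cout * t * k * k

def conv2plus1d_params(cin, cout, t, k, mid):
    """Spatial (1 x k x k) then temporal (t x 1 x 1) convs with `mid`
    intermediate channels, the factorized alternative."""
    return cin * mid * k * k + mid * cout * t
```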

18.
Objective: 3D object detection with deep convolutional neural networks has made great progress, but CNN features capture neither the dependencies between different regions nor the dependencies between channels, and it is difficult to enlarge the receptive field without sacrificing spatial resolution. To address these shortcomings, a 3D object detection method combining mixed-domain attention and dilated convolution is proposed. Method: A spatial-domain attention mechanism at the input layer transforms the spatial layout of the input and preserves the region features that deserve attention; a channel-domain attention mechanism inside the network extracts channel weights to obtain key channel features; fusing the two yields mixed spatial-and-channel attention over the features. At the output of the feature extractor, a layer combining dilated convolution with channel attention enlarges the receptive field without losing spatial resolution, and the channel weights of features extracted at different receptive fields are fused to obtain key channel features with a global receptive field. A feature pyramid structure is introduced into the feature extractor to produce high-resolution feature maps, substantially improving detection performance, and a two-stage region proposal network regresses more accurate 3D bounding boxes. Result: On the KITTI dataset (a project of Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago), for the car category on the test set, the average precision of 3D boxes (AP3D) is 83.45%, 74.29%, and 67.92% as occlusion ranges from light to heavy, and the bird's-eye-view average precision (APBEV) is 89.61%, 87.05%, and 79.69%; for the pedestrian and cyclist categories, AP3D and APBEV likewise compare favorably with other methods. Conclusion: The proposed 3D detection network alleviates, to a degree, the lack of visual attention in CNN features for 3D detection tasks, making 3D object detection more effective for outdoor autonomous driving.
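The channel-domain attention above squeezes each channel to a scalar, passes the resulting vector through a small gate, and reweights the channels. A squeeze-and-excitation style NumPy sketch (the gate weights here are placeholders, not trained values):

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Channel attention: global-average pool per channel, a two-layer
    ReLU/sigmoid gate, then channel-wise reweighting.
    feat: (c, h, w); w1: (c, r); w2: (r, c)."""
    z = feat.mean(axis=(1, 2))                            # squeeze: (c,)
    a = 1 / (1 + np.exp(-(np.maximum(z @ w1, 0) @ w2)))   # excite: (c,)
    return feat * a[:, None, None]                        # reweight
```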

19.
Objective: Action recognition plays an increasingly important role in automated, intelligent modern manufacturing, but the complexity of real production environments makes it a challenging task. Methods based on 3D convolutional networks combined with optical flow currently perform well at action recognition, yet they handle occluded bodies poorly, and optical flow is too expensive to compute for real-time use. For the body-occlusion and optical-flow-cost problems found in a real industrial packing scenario, a packing action recognition method combining a dual-view 3D convolutional network is proposed. Method: First, stacked difference images (residual frames, RF) are used as model input to better capture motion features, replacing the optical flow that cannot be used in real-time settings. The original RGB images and the difference images are fed into two parallel 3D ResNeXt101 networks. Second, a dual-view structure addresses body occlusion: 3D ResNeXt101 is extended into a dual-view model in which a pooling layer with learnable weights fuses features from views at different angles, and this dual-view 3D ResNeXt101 performs the action recognition. Finally, to further improve the true negative rate (TNR), a denoising autoencoder and a two-class support vec...
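The residual frames (RF) above are simply stacked frame differences standing in for optical flow. A minimal sketch:

```python
import numpy as np

def residual_frames(clip):
    """Stack of frame differences used instead of optical flow:
    RF_t = frame_{t+1} - frame_t. clip: (T, H, W) array."""
    return clip[1:] - clip[:-1]
```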

20.
This paper presents a segment-based probabilistic approach to robustly recognize continuous sign language sentences. The recognition strategy is based on a two-layer conditional random field (CRF) model, where the lower layer processes the component channels and provides outputs to the upper layer for sign recognition. The continuously signed sentences are first segmented, and the sub-segments are labeled SIGN or ME (movement epenthesis) by a Bayesian network (BN) which fuses the outputs of independent CRF and support vector machine (SVM) classifiers. The sub-segments labeled as ME are discarded and the remaining SIGN sub-segments are merged and recognized by the two-layer CRF classifier; for this we have proposed a new algorithm based on the semi-Markov CRF decoding scheme. With eight signers, we obtained a recall rate of 95.7% and a precision of 96.6% for unseen samples from seen signers, and a recall rate of 86.6% and a precision of 89.9% for unseen signers.
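The abstract above discards sub-segments labeled ME and merges the remaining SIGN sub-segments before recognition. A toy sketch of that merge step (the segment representation is an assumption for illustration, not taken from the paper):

```python
def merge_sign_segments(labels, segments):
    """Drop ME (movement epenthesis) sub-segments and merge adjacent
    SIGN sub-segments that touch.
    labels: 'SIGN'/'ME' per sub-segment; segments: (start, end) pairs."""
    merged = []
    for lab, (s, e) in zip(labels, segments):
        if lab != 'SIGN':
            continue
        if merged and merged[-1][1] == s:
            merged[-1] = (merged[-1][0], e)   # extend previous segment
        else:
            merged.append((s, e))
    return merged
```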


Copyright©北京勤云科技发展有限公司  京ICP备09084417号