Similar Documents
20 similar documents found (search time: 125 ms)
1.
An Assistive Teaching System for the Deaf and Mute   Total citations: 4 (self-citations: 0, others: 4)
To facilitate communication between deaf-mute people and hearing people, this paper designs and implements an assistive teaching system for the deaf and mute. Driven by text, the system synthesizes in real time a three-dimensional virtual "graphical human" with facial expressions, lip shapes and hand gestures. For any text entered by the user, the system first segments it into a sequence of words, then sequentially drives the "graphical human" to show the expression matching the emotional tone of the current word, form the lip shape for pronouncing that word, and produce the corresponding sign-language gesture. The system can be applied directly to assistive teaching for the deaf and mute.
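A minimal sketch of how such a text-driven pipeline might be organized. The lookup tables, the toy segmenter and the `Avatar` stub are hypothetical stand-ins for illustration only, not the paper's actual interfaces:

```python
# Text-driven avatar pipeline sketch: segment text into words, then drive
# expression, lip shape, and sign gesture per word.
# The lookup tables and Avatar class are hypothetical, not the paper's API.

WORD_EMOTION = {"高兴": "happy", "难过": "sad"}               # word -> emotion label
WORD_SIGN    = {"你好": "sign_hello", "谢谢": "sign_thanks"}  # word -> gesture clip id

def segment(text):
    """Toy word segmenter; a real system would use a dictionary/statistical segmenter."""
    return text.split()

class Avatar:
    def show_expression(self, emotion): print(f"expression: {emotion}")
    def show_viseme(self, word):        print(f"lip shape for: {word}")
    def play_gesture(self, clip):       print(f"gesture clip: {clip}")

def drive(text, avatar):
    for word in segment(text):
        avatar.show_expression(WORD_EMOTION.get(word, "neutral"))
        avatar.show_viseme(word)
        avatar.play_gesture(WORD_SIGN.get(word, "fingerspell"))

drive("你好 谢谢", Avatar())
```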

2.
Addressing the pronunciation habits of Mandarin Chinese and the requirement for natural, continuous mouth animation in speech visualization, this paper proposes a method for generating mouth animation synchronized with speech, based on a muscle model and a co-articulation model. First, initial and final phonemes are grouped according to the viseme characteristics of the mouth during Mandarin pronunciation, and the corresponding key frames of mouth shapes are synthesized by data mapping. By analyzing the input text, 3D facial mouth animation synchronized with the speech is synthesized. To account for Mandarin pronunciation habits, a co-articulation modeling method based on a differential-geometric description is designed; by analyzing the influence weights of visemes between adjacent phonemes, it produces mouth animation consistent with Mandarin pronunciation habits. Finally, experimental comparison and analysis show that the mouth animation produced by this method is more realistic and conforms to Mandarin pronunciation habits.
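A rough numpy sketch of the co-articulation idea: blend a phoneme's key mouth shape with its neighbours using influence weights. The shape vectors and weights below are invented for illustration, not taken from the paper:

```python
import numpy as np

# Key mouth-shape parameter vectors per viseme (values invented for illustration).
KEYFRAMES = {
    "a": np.array([1.0, 0.8, 0.2]),   # e.g. jaw opening, lip width, lip protrusion
    "u": np.array([0.3, 0.2, 0.9]),
    "n": np.array([0.2, 0.5, 0.1]),
}

def coarticulated_shape(prev, cur, nxt, w_prev=0.2, w_next=0.2):
    """Blend the current viseme with its neighbours; weights model their influence."""
    w_cur = 1.0 - w_prev - w_next
    return w_prev * KEYFRAMES[prev] + w_cur * KEYFRAMES[cur] + w_next * KEYFRAMES[nxt]

print(coarticulated_shape("n", "a", "u"))
```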

3.
This paper constructs a dual-modal lip-sync control model driven jointly by text and read-aloud speech. The pinyin of the text supplies the correct mouth-shape visemes, while the recorded speech supplies the correct timing for those visemes. On this basis, the geometric and temporal parameters of the mouth shapes are discretized, and the mouth movements of speech are controlled reasonably according to the articulation mechanism of Mandarin Chinese.
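A toy sketch of the dual-modal idea: the pinyin sequence decides which viseme to show, while per-syllable durations measured from the recorded speech decide when. The viseme grouping and durations below are hypothetical:

```python
# Pinyin units -> viseme class (hypothetical grouping for illustration).
PINYIN_VISEME = {"b": "closed", "a": "open", "u": "round", "ni": "spread", "hao": "open"}

def schedule(pinyin_seq, durations):
    """Pair each pinyin unit's viseme with the duration measured from the audio."""
    t = 0.0
    timeline = []
    for unit, dur in zip(pinyin_seq, durations):
        timeline.append((t, t + dur, PINYIN_VISEME.get(unit, "neutral")))
        t += dur
    return timeline

# Durations (seconds) would come from aligning the read-aloud speech with the text.
print(schedule(["ni", "hao"], [0.28, 0.35]))
```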

4.
A Practical OCR System for Printed Chinese Characters for Office Automation   Total citations: 1 (self-citations: 0, others: 1)
This paper describes a recognition system for Level-1 printed Chinese character text, targeted at office automation (OA), implemented with a recognition approach that relies mainly on statistical pattern recognition, supplemented by structural pattern recognition. For practicality, the system scans page-level text images as input, segments the image text into individual characters and, based on the structural characteristics of Chinese characters, extracts multiple features such as inner-layer, outer-layer and local features. Recognition uses a multi-level classification method, and the result is output as a file of GB-standard quwei (area-position) codes. The system software provides a friendly user interface. The system was implemented on an IBM PC/XT; the recognition rate exceeds 99% for printed samples and 95% for various real office documents, at a speed of 1-2 characters per second.

5.
1. Introduction. In a Chinese character system we can easily input Chinese characters using Hanyu Pinyin, but it is not easy to obtain the pinyin annotation of a character or an article. In fact, we can use the pinyin module of the Chinese character system to obtain the pinyin of characters, which is useful in many situations. For example, Chinese characters can be sorted in dictionary order according to their pinyin (sorting by internal character code can only handle the frequently-used character region of the GB standard...
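As an aside, dictionary-style sorting by pinyin can be sketched with the third-party pypinyin package; this is an assumption for illustration, since the article relies on the pinyin module of the Chinese character system itself:

```python
from pypinyin import lazy_pinyin  # third-party package, used here as a stand-in

words = ["张三", "李四", "王五"]
# Sort by the pinyin reading of each word rather than by internal character code.
print(sorted(words, key=lambda w: lazy_pinyin(w)))
```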

6.
A touch-screen Chinese character processor was recently launched in the United Kingdom. The CP2054B processor, developed by Intech System Inc., produces Chinese character documents and outputs them to a 24-pin printer through a touch display and simple Hanyu Pinyin input. The processor gives direct access to 6,000 basic Chinese characters. Work is currently moving toward...

7.
Software
1. Characters you cannot read can still be entered with Intelligent ABC. Reader Yang Guolin from Beijing asks: I use the Windows XP operating system, and I normally enter Chinese characters only with the built-in Quanpin and Intelligent ABC input methods. When I come across a character whose pronunciation I do not know, is there any way to enter it other than looking it up in a dictionary? Please do not tell me to learn Wubi; I do not want to memorize all those radicals. Answer: Even if you do not know Hanyu Pinyin, or do not know how a character is pronounced, Intelligent ABC can still help: use its stroke-shape input mode. Intelligent ABC's stroke-shape input classifies the basic components of all Chinese characters into eight basic stroke shapes: horizontal (rising), vertical, left-falling, dot (right-falling), turning (vertical hook), bend, cross and square, entered respectively with the eight number keys 1-8 on the keyboard…
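A small sketch of the stroke-shape-to-key mapping the answer describes; the labels follow the text above, and the `encode` helper is only an illustrative stand-in:

```python
# Intelligent ABC stroke-shape input: eight basic stroke shapes on keys 1-8.
STROKE_KEYS = {
    1: "horizontal / rising stroke",
    2: "vertical stroke",
    3: "left-falling stroke",
    4: "dot / right-falling stroke",
    5: "turning stroke (vertical hook)",
    6: "bend",
    7: "cross",
    8: "square (enclosure)",
}

def encode(strokes):
    """Turn a stroke-shape sequence into the digit keys a user would press."""
    return "".join(str(k) for k in strokes)

print(encode([1, 2, 3]))  # e.g. horizontal, vertical, left-falling
```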

8.
In a computer hardware system the keyboard is an indispensable basic input device. Besides entering text such as Chinese characters, it is also used to operate windows and menus, so a keyboard failure affects the use of the computer. When a keyboard fails, the following repair methods can generally be applied according to the symptoms. When a computer keyboard...

9.
The general-purpose freehand drawing system is intelligent software for 2D engineering drawing and handwritten Chinese character recognition using the new-generation "汉瑞得" pen input technology. The system reduces computer drawing to a single "pen": sketches are automatically regularized, and personal ink strokes can be preserved or recognized as system characters. Compared with conventional computer drawing software, it offers a natural input mode, high intelligence, high efficiency and the removal of human-machine barriers. System functions: the system is a new-generation, highly intelligent CAD system integrating pen input, fuzzy recognition and dimension-driven parametric design, leaving the convenience to the user much as a point-and-shoot camera does. 1. Basic drawing functions: the system uses the "汉瑞得" pen to operate directly on an electronic drawing board, in the same way one would sketch casually with a pencil…

10.
An Automatic Computer Conversion System from Pinyin to Chinese Characters   Total citations: 4 (self-citations: 0, others: 4)
This paper introduces an automatic computer system that converts Hanyu Pinyin to Chinese characters using natural language understanding. Drawing on Chinese lexical, syntactic, semantic and pragmatic knowledge, the system builds a hierarchically structured knowledge base; for an article written in pinyin it performs sentence-by-sentence word segmentation, lexical analysis, syntactic analysis and semantic/pragmatic processing, and finally produces correct Chinese character sentences and articles. The system has been implemented on an IBM PC/XT microcomputer.
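A toy sketch of the conversion step only: segment the pinyin stream against a lexicon and emit character candidates. The tiny lexicon and greedy longest-match strategy are invented for illustration; the actual system uses a layered knowledge base with syntactic and semantic processing:

```python
# Greedy longest-match segmentation of a pinyin string against a toy lexicon.
LEXICON = {"zhong": "中", "guo": "国", "zhongguo": "中国", "ren": "人"}

def pinyin_to_hanzi(pinyin):
    out, i = [], 0
    while i < len(pinyin):
        for j in range(len(pinyin), i, -1):        # try the longest candidate first
            if pinyin[i:j] in LEXICON:
                out.append(LEXICON[pinyin[i:j]])
                i = j
                break
        else:
            i += 1                                  # skip characters we cannot match
    return "".join(out)

print(pinyin_to_hanzi("zhongguoren"))  # -> 中国人
```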

11.
We present a technique for accurate automatic visible speech synthesis from textual input. When provided with a speech waveform and the text of a spoken sentence, the system produces accurate visible speech synchronized with the audio signal. To develop the system, we collected motion capture data from a speaker's face during production of a set of words containing all diviseme sequences in English. The motion capture points from the speaker's face are retargeted to the vertices of the polygons of a 3D face model. When synthesizing a new utterance, the system locates the required sequence of divisemes, shrinks or expands each diviseme based on the desired phoneme segment durations in the target utterance, then moves the polygons in the regions of the lips and lower face to correspond to the spatial coordinates of the motion capture data. The motion mapping is realized by a key-shape mapping function learned from a set of viseme examples in the source and target faces. A well-posed numerical algorithm estimates the shape blending coefficients. Time warping and motion vector blending at the juncture of two divisemes, and an algorithm to search for the optimal concatenated visible speech, are also developed to provide the final concatenative motion sequence. Copyright © 2004 John Wiley & Sons, Ltd.
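The shape-blending-coefficient estimation can be sketched as a least-squares fit: express a captured mouth shape as a weighted combination of key shapes. The key shapes below are random placeholders, not the paper's data, and the fit is only an assumption about the general form of such an algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
key_shapes = rng.normal(size=(5, 30))       # 5 key shapes, 30 landmark coordinates each
target = 0.6 * key_shapes[0] + 0.4 * key_shapes[2]   # a captured frame to explain

# Solve for blending coefficients w minimizing ||key_shapes.T @ w - target||.
w, *_ = np.linalg.lstsq(key_shapes.T, target, rcond=None)
print(np.round(w, 3))   # recovers roughly [0.6, 0, 0.4, 0, 0]
```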

12.
This paper presents a novel data-driven expressive speech animation synthesis system with phoneme-level controls. The system is based on a pre-recorded facial motion capture database, where an actress was directed to recite a pre-designed corpus with four facial expressions (neutral, happiness, anger and sadness). Given new phoneme-aligned expressive speech and its emotion modifiers as inputs, a constrained dynamic programming algorithm searches for best-matched captured motion clips from the processed facial motion database by minimizing a cost function. Users optionally specify 'hard constraints' (motion-node constraints for expressing phoneme utterances) and 'soft constraints' (emotion modifiers) to guide this search process. We also introduce a phoneme-Isomap interface for visualizing and interacting with phoneme clusters that are typically composed of thousands of facial motion capture frames. On top of this novel visualization interface, users can conveniently remove contaminated motion subsequences from a large facial motion dataset. Facial animation synthesis experiments and objective comparisons between synthesized facial motion and captured motion showed that this system is effective for producing realistic expressive speech animations.
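A compact sketch of the search idea: for each phoneme, choose one candidate motion clip so that the summed matching and transition costs are minimal. The candidates and cost functions are invented; the real system additionally handles motion-node and emotion constraints:

```python
def dp_search(candidates, match_cost, trans_cost):
    """candidates[i] = list of clip ids for phoneme i; returns the min-cost clip path."""
    n = len(candidates)
    best = [{c: (match_cost(0, c), None) for c in candidates[0]}]
    for i in range(1, n):
        layer = {}
        for c in candidates[i]:
            prev, cost = min(
                ((p, best[i - 1][p][0] + trans_cost(p, c)) for p in candidates[i - 1]),
                key=lambda t: t[1],
            )
            layer[c] = (cost + match_cost(i, c), prev)
        best.append(layer)
    # Backtrack from the cheapest final clip.
    path = [min(best[-1], key=lambda c: best[-1][c][0])]
    for i in range(n - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

# Toy costs: prefer clip id equal to the phoneme index, penalize big clip-id jumps.
clips = [[0, 1], [1, 2], [2, 3]]
print(dp_search(clips,
                match_cost=lambda i, c: abs(c - i),
                trans_cost=lambda a, b: abs(b - a)))   # -> [0, 1, 2]
```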

13.
This paper presents a real-time speech-driven talking face system which provides low computational complexity and a smooth visual impression. A novel embedded confusable-set approach is proposed to generate an efficient phoneme-viseme mapping table, constructed by grouping phonemes with the Houtgast similarity approach based on viseme similarity estimated with histogram distance, following the observation that visemes are visually ambiguous. The generated mapping table simplifies the mapping problem and improves viseme classification accuracy. The implemented real-time speech-driven talking face system includes: 1) speech signal processing, including SNR-aware speech enhancement for noise reduction and ICA-based feature extraction for robust acoustic feature vectors; 2) recognition network processing, where HMM and MCSVM are combined as a recognition network for phoneme recognition and viseme classification: the HMM handles sequential inputs well, while the MCSVM shows superior classification performance with good generalization properties, especially for limited samples; the phoneme-viseme mapping table is used by the MCSVM to decide which viseme class the observation sequence from the HMM belongs to; 3) visual processing, which arranges lip-shape images of visemes in time sequence and adds realism using dynamic alpha blending with different alpha value settings. In the experiments, the speech signal processing on noisy speech, compared with clean speech, gained 1.1% (16.7% to 15.6%) and 4.8% (30.4% to 35.2%) improvements in PER and WER, respectively. For viseme classification, the error rate decreased from 19.22% to 9.37%. Finally, we simulated GSM communication between a mobile phone and a PC to rate visual quality and the speech-driven impression using mean opinion scores. Our method therefore reduces the number of visemes and lip-shape images through confusable sets and enables real-time operation.
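The dynamic alpha blending step can be sketched as a per-frame cross-fade between consecutive viseme lip images, with the alpha value varied over the transition. The image arrays here are placeholders:

```python
import numpy as np

def blend_transition(img_a, img_b, n_frames):
    """Cross-fade from one viseme lip image to the next with a varying alpha."""
    frames = []
    for k in range(n_frames):
        alpha = (k + 1) / n_frames          # alpha ramps up over the transition
        frames.append((1.0 - alpha) * img_a + alpha * img_b)
    return frames

lip_a = np.zeros((64, 64), dtype=float)     # placeholder viseme lip images
lip_b = np.ones((64, 64), dtype=float)
print(len(blend_transition(lip_a, lip_b, n_frames=5)))
```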

14.
Lip synchronization of 3D face models is now used in a multitude of important fields. It brings a more human, social and dramatic reality to computer games, films and interactive multimedia, and is growing in use and importance. A high level of realism can be achieved in demanding applications such as computer games and cinema. Authoring lip syncing with complex and subtle expressions is still difficult and fraught with problems of realism. This research proposes a lip-syncing method for a realistic, expressive 3D face model. Animating lips requires a 3D face model capable of representing the myriad shapes the human face assumes during speech, and a method to produce the correct lip shape at the correct time. The paper presents a 3D face model designed to support lip syncing aligned with an input audio file. It deforms using a Raised Cosine Deformation (RCD) function that is grafted onto the input facial geometry. The face model is based on the MPEG-4 Facial Animation (FA) standard. The paper proposes a method to animate the 3D face model over time to create animated lip syncing using a canonical set of visemes for all pairwise combinations of a reduced phoneme set called ProPhone. The proposed work integrates emotions, drawing on the Ekman model and Plutchik's wheel, with emotive eye movements by implementing the Emotional Eye Movements Markup Language (EEMML) to produce a realistic 3D face model.
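A raised-cosine deformation can be sketched as a displacement that falls off smoothly with distance from the deformation centre. The particular falloff formula below is an assumption about the general raised-cosine shape, not the paper's exact definition:

```python
import numpy as np

def rcd_weight(dist, radius):
    """Raised-cosine falloff: 1 at the centre, 0 at distance >= radius."""
    return 0.5 * (1.0 + np.cos(np.pi * np.minimum(dist, radius) / radius))

def deform(vertices, centre, direction, amplitude, radius):
    """Displace mesh vertices near `centre` along `direction`, weighted by the falloff."""
    dist = np.linalg.norm(vertices - centre, axis=1)
    return vertices + amplitude * rcd_weight(dist, radius)[:, None] * direction

verts = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(deform(verts, centre=np.array([0.0, 0.0, 0.0]),
             direction=np.array([0.0, 1.0, 0.0]), amplitude=0.3, radius=1.0))
```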

15.
Based on the active shape model, this paper proposes a new method for modeling facial features and describes the modeling of lip shape in detail. The method extracts a training set from a face database, then annotates, aligns and statistically analyzes the lip shapes to obtain the lip deformation modes. Each deformation mode corresponds to an eigenvalue of the shape covariance matrix, and the principal deformation modes corresponding to the largest N (N < 10) eigenvalues can recover 98% of the shape variation. Experiments show that the method describes the vast majority of the visual features with only a small number of parameters, at the cost of very little information loss.
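The statistical-analysis step can be sketched with a covariance eigen-decomposition: keep the smallest number of modes whose eigenvalues cover 98% of the shape variance. The landmark data here is random and only demonstrates the mechanics:

```python
import numpy as np

rng = np.random.default_rng(1)
shapes = rng.normal(size=(200, 40))        # 200 aligned lip shapes, 20 (x, y) landmarks

mean = shapes.mean(axis=0)
cov = np.cov(shapes - mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)     # returned in ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

ratio = np.cumsum(eigvals) / eigvals.sum()
n_modes = int(np.searchsorted(ratio, 0.98) + 1)
print(f"{n_modes} deformation modes explain 98% of the variance")

# Any lip shape is then approximated by the mean plus a few mode coefficients.
coeffs = (shapes[0] - mean) @ eigvecs[:, :n_modes]
approx = mean + eigvecs[:, :n_modes] @ coeffs
```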

16.
To help teachers teach Chinese language arts to deaf students, a teaching system that translates text into sign language was developed. An improved jieba word segmenter is used to segment the lesson text, converting sentences into word sequences; the system's editing function is then used to edit the word sequences so that they satisfy grammatical sign language requirements. A virtual human is built, sign-language animations are produced with keyframe techniques, and the Unity3D game engine is used to synthesize the sign animations and the transitions between them, yielding an assistive teaching system that automatically translates lesson text into sign language. The work has special significance for Chinese language teaching for deaf students and has practical value.
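The segmentation step can be illustrated directly with the jieba package; the custom-dictionary call only suggests how an "improved" segmenter might keep domain terms whole, and the example entry and sentence are hypothetical:

```python
import jieba

# Optionally extend the dictionary so sign-language-specific terms stay whole
# (the entry below is a hypothetical example).
jieba.add_word("文法手语")

sentence = "老师教聋哑学生语文"
words = jieba.lcut(sentence)   # sentence -> word sequence for the sign dictionary
print(words)
```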

17.
This paper proposes a real-time lip reading system (consisting of a lip detector, lip tracker, lip activation detector, and word classifier), which can recognize isolated Korean words. Lip detection is performed in several stages: face detection, eye detection, mouth detection, mouth end-point detection, and active appearance model (AAM) fitting. Lip tracking is then undertaken via a novel two-stage lip tracking method, where the model-based Lucas-Kanade feature tracker is used to track the outer lip, and then a fast block matching algorithm is used to track the inner lip. Lip activation detection is undertaken through a neural network classifier, the input for which is a combination of the lip motion energy function and the first dominant shape feature. In the last step, input words are defined and recognized by three different classifiers: HMM, ANN, and K-NN. We combine the proposed lip reading system with an audio-only automatic speech recognition (ASR) system to improve the word recognition performance in noisy environments. We then demonstrate the potential applicability of the combined system for use within hands-free in-vehicle navigation devices. Results from experiments undertaken on 30 isolated Korean words using the K-NN classifier at a speed of 15 fps demonstrate that the proposed lip reading system achieves a 92.67% word correct rate (WCR) for person-dependent tests, and a 46.50% WCR for person-independent tests. Also, the combined audio-visual ASR system increases the WCR from 0% to 60% in a noisy environment.

18.
A Chinese Sign Language Recognition System Based on ANN/HMM   Total citations: 5 (self-citations: 1, others: 4)
Sign language is the language used by deaf people. It is a relatively stable expression system whose symbols are hand shapes and movements supplemented by facial expressions and postures, a special language communicated through movement and vision. On one hand, sign language recognition can serve as a translator between hearing and deaf people, providing better services for the deaf; on the other hand, as part of human body-language understanding, it can serve as a means of human-computer interaction. This paper implements a sign language recognition system based on ANN/HMM. ANN methods are used to build feature mappers for hand shape, position and orientation, and a multi-feature multi-classifier fusion algorithm is given in the course of building the hand-shape feature mapper. Experiments show that the ANN/HMM-based sign language recognition system is feasible and practical.

19.
Chinese speech recognition has achieved considerable results. However, because of China's regional differences, with accents changing every ten li, speech recognition systems show low recognition rates and poor performance on dialects. To address this weakness, an isolated-word recognition system for the Hengyang dialect was built on HTK. The system uses the HTK 3.4.1 toolkit, takes phonemes as the basic recognition units, extracts 39-dimensional Mel-frequency cepstral coefficient (MFCC) speech features, builds hidden Markov models (HMM), and uses the Viterbi algorithm for model training and matching to achieve isolated-word recognition of the Hengyang dialect. Comparative experiments examined system performance under different phoneme models and different numbers of Gaussian mixtures. The results show that combining 39-dimensional MFCCs and 5 Gaussian mixtures with the HMM model greatly improves system performance.
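A 39-dimensional MFCC feature (13 static coefficients plus first- and second-order deltas) can be sketched with librosa; this is an assumption for illustration, since the paper extracts its features within the HTK toolkit, and the file name is a placeholder:

```python
import librosa
import numpy as np

y, sr = librosa.load("hengyang_word.wav", sr=16000)    # hypothetical isolated-word recording
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # 13 static coefficients per frame
delta = librosa.feature.delta(mfcc)                     # first-order differences
delta2 = librosa.feature.delta(mfcc, order=2)           # second-order differences

features = np.vstack([mfcc, delta, delta2])             # 39 x n_frames feature matrix
print(features.shape)
```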

20.
Lip synchronization affects human understanding of speech. This paper focuses on lip synchronization between Mandarin speech and mouth shapes. The mouth shapes corresponding to Mandarin are divided into 4 classes and two states (extreme-point state and transition state), and it is concluded that Mandarin lip-sync verification amounts to verifying the synchronization of extreme-state audio with extreme-state video. A lip-sync recognition and verification model based on a knowledge base of extreme-state audio/video is proposed, and the audio and video feature-analysis subsystems of the model are described. It is further proposed that the inter-frame difference method for moving-object detection can be combined with lip shape, colour and motion features to locate the lips precisely. Finally, the lip-sync verification procedure is given.
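The inter-frame difference step can be sketched with OpenCV: difference consecutive grayscale frames, threshold, and measure motion inside a mouth region of interest. The video path and ROI coordinates are placeholders; a real system would combine this score with lip shape and colour cues as the abstract describes:

```python
import cv2

cap = cv2.VideoCapture("talking_face.mp4")     # placeholder video path
ok, prev = cap.read()
if not ok:
    raise SystemExit("could not read video")
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)                    # inter-frame difference
    _, motion = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mouth_roi = motion[200:280, 160:320]                   # placeholder mouth region
    activity = cv2.countNonZero(mouth_roi)                 # motion energy in the ROI
    print(activity)
    prev_gray = gray

cap.release()
```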

