Similar Articles
 20 similar articles found (search time: 46 ms)
1.
The author's goal is to generate a virtual space close to the real communication environment between network users or between humans and machines. There should be an avatar in cyberspace that projects the features of each user with a realistic texture-mapped face to generate facial expression and action controlled by a multimodal input signal. Users can also get a view in cyberspace through the avatar's eyes, so they can communicate with each other by gaze crossing. The face fitting tool from multi-view camera images is introduced to make a realistic three-dimensional (3-D) face model with texture and geometry very close to the original. This fitting tool is a GUI-based system using easy mouse operation to pick up each feature point on a face contour and the face parts, which enables easy construction of a 3-D personal face model. When an avatar is speaking, the voice signal is essential in determining the mouth shape feature. Therefore, a real-time mouth shape control mechanism is proposed by using a neural network to convert speech parameters to lip shape parameters. This neural network can realize an interpolation between specific mouth shapes given as learning data. The emotional factor can sometimes be captured by speech parameters. This media conversion mechanism is described. For dynamic modeling of facial expression, a muscle structure constraint is introduced for generating facial expressions naturally with few parameters. We also tried to obtain muscle parameters automatically from a local motion vector on the face calculated by the optical flow in a video sequence.
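The speech-to-lip mapping described above can be sketched in miniature. The key vectors below are invented placeholders, and inverse-distance interpolation stands in for the paper's trained neural network, which likewise interpolates between mouth shapes given as learning data:

```python
import numpy as np

# Hypothetical key speech-parameter vectors paired with lip-shape
# parameters (e.g. mouth openness and width) given as learning data.
KEY_SPEECH = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
KEY_SHAPES = np.array([[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]])

def speech_to_lip(speech_vec, eps=1e-9):
    """Map a speech-parameter vector to lip-shape parameters by
    inverse-distance interpolation over the key mouth shapes
    (a simple stand-in for the paper's neural network)."""
    d = np.linalg.norm(KEY_SPEECH - speech_vec, axis=1)
    if np.any(d < eps):                    # exactly at a key shape
        return KEY_SHAPES[np.argmin(d)]
    w = 1.0 / d                            # closer key shapes weigh more
    return (w[:, None] * KEY_SHAPES).sum(axis=0) / w.sum()
```

Because the output is a convex combination of the key shapes, intermediate speech inputs always yield lip parameters inside the range spanned by the learning data.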

2.
倪奎  董兰芳 《电子技术》2009,36(12):64-67
Facial animation is widely used in the game industry, teleconferencing, agents, avatars, and many other fields, and has attracted much research in recent years; animating organs such as the mouth and eyes remains a major difficulty. This paper proposes a method that blends sample images of organs such as the mouth and eyes into a face image and generates facial animation from a single neutral face photograph. The method builds splines from feature points and interpolates the splines in polar coordinates to realize the spatial mapping, then resamples the image using backward mapping and interpolation to obtain the blended image. Experimental results show that the blended images produced by this method look natural, that motion of organs such as the mouth and eyeballs can be realized, and that the method meets the real-time requirements of facial animation generation.
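The backward-mapping resampling step described in this abstract can be sketched in a few lines. The spline-based spatial mapping is abstracted here into a precomputed per-pixel coordinate map, an assumption made purely for illustration:

```python
import numpy as np

def backward_warp(src, mapping):
    """Resample `src` by backward mapping: for every output pixel,
    `mapping[i, j]` gives the (row, col) it came from in the source;
    bilinear interpolation fills in non-integer positions."""
    h, w = mapping.shape[:2]
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            r, c = mapping[i, j]
            r0 = min(max(int(np.floor(r)), 0), src.shape[0] - 2)
            c0 = min(max(int(np.floor(c)), 0), src.shape[1] - 2)
            fr, fc = r - r0, c - c0
            out[i, j] = ((1 - fr) * (1 - fc) * src[r0, c0]
                         + (1 - fr) * fc * src[r0, c0 + 1]
                         + fr * (1 - fc) * src[r0 + 1, c0]
                         + fr * fc * src[r0 + 1, c0 + 1])
    return out
```

Backward (rather than forward) mapping guarantees every output pixel is defined, which is why it is the standard choice for image resampling.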

3.
The initial conception of a model-based analysis synthesis image coding (MBASIC) system is described and a construction method for a three-dimensional (3-D) facial model that includes synthesis methods for facial expressions is presented. The proposed MBASIC system is an image coding method that utilizes a 3-D model of the object which is to be reproduced. An input image is first analyzed and an output image using the 3-D model is then synthesized. A very low bit rate image transmission can be realized because the encoder sends only the required analysis parameters. Output images can be reconstructed without the noise corruption that reduces naturalness because the decoder synthesizes images from a similar 3-D model.

In order to construct a 3-D model of a person's face, a method is developed which uses a 3-D wire-frame face model. A full-face image is then projected onto this wire-frame model. For the synthesis of facial expressions, two different methods are proposed: a clip-and-paste method and a facial structure deformation method.


4.
Lifelike talking faces for interactive services   (Total citations: 1; self-citations: 0; citations by others: 1)
Lifelike talking faces for interactive services are an exciting new modality for man-machine interactions. Recent developments in speech synthesis and computer animation enable the real-time synthesis of faces that look and behave like real people, opening opportunities to make interactions with computers more like face-to-face conversations. This paper focuses on the technologies for creating lifelike talking heads, illustrating the two main approaches: model-based animations and sample-based animations. The traditional model-based approach uses three-dimensional wire-frame models, which can be animated from high-level parameters such as muscle actions, lip postures, and facial expressions. The sample-based approach, on the other hand, concatenates segments of recorded videos, instead of trying to model the dynamics of the animations in detail. Recent advances in image analysis enable the creation of large databases of mouth and eye images, suited for sample-based animations. The sample-based approach tends to generate more natural-looking animations, at the expense of a larger size and less flexibility than the model-based animations. Besides lip articulation, a talking head must show appropriate head movements in order to appear natural. We illustrate how such "visual prosody" is analyzed and added to the animations. Finally, we present four applications where the use of face animation in interactive services results in engaging user interfaces and an increased level of trust between user and machine. Using an RTP-based protocol, face animation can be driven with only 800 bits/s in addition to the rate for transmitting audio.

5.
A real-time algorithm for affine-structure-based video compression for facial images is presented. The face undergoing motion is segmented and triangulated to yield a set of control points. The set of control points generated by triangulation is tracked across a few frames using an intensity-based correlation technique. For accurate motion and structure estimation, a Kalman-filter-based algorithm is used to track features on the facial image. The structure information of the control points is transmitted only during the bootstrapping stage. After that, only the motion information is transmitted to the decoder. This reduces the number of motion parameters associated with control points in each frame. The local motion of the eyes and lips is captured using local 2-D affine transformations. For real-time implementation, a quad-tree-based search technique is adopted to solve the local correlation. Any remaining reconstruction error is accounted for using predictive encoding. Results on real image sequences demonstrate the applicability of the method.
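A local 2-D affine transformation like the one used above for the eyes and lips can be fitted to tracked control-point correspondences by standard least squares; this is a generic sketch, not the paper's exact estimator:

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2-D affine transform: solves dst ~= A @ src + t
    for the 2x2 matrix A and translation t, given n >= 3 point pairs."""
    n = len(src_pts)
    M = np.hstack([src_pts, np.ones((n, 1))])   # design matrix [x y 1]
    params, *_ = np.linalg.lstsq(M, dst_pts, rcond=None)
    A, t = params[:2].T, params[2]
    return A, t
```

With exactly three non-collinear points the fit is exact; with more points the residual absorbs tracking noise, which is what makes the affine model cheap to transmit per frame.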

6.
This paper describes an autostereoscopic image overlay technique that is integrated into a surgical navigation system to superimpose a real three-dimensional (3-D) image onto the patient via a half-silvered mirror. The images are created by employing a modified version of integral videography (IV), which is an animated extension of integral photography. IV records and reproduces 3-D images using a microconvex lens array and flat display; it can display geometrically accurate 3-D autostereoscopic images and reproduce motion parallax without the need for special devices. The use of semitransparent display devices makes it appear that the 3-D image is inside the patient's body. This is the first report of applying an autostereoscopic display with an image overlay system in surgical navigation. Experiments demonstrated that the fast IV rendering technique and patient-image registration method produce an average registration accuracy of 1.13 mm. Experiments using a target in phantom agar showed that the system can guide a needle toward a target with an average error of 2.6 mm. Improvement in the quality of the IV display will make this system practical, and its use will increase surgical accuracy and reduce invasiveness.

7.
Three-dimensional (3-D) subband/wavelet coding with motion compensation has been demonstrated to be an efficient technique for video coding applications in some recent research works. When motion compensation is performed with half-pixel accuracy, images need to be interpolated in both temporal subband analysis and synthesis stages. The resulting subband filter banks developed in these former algorithms were not invertible due to image interpolation. In this paper, an invertible temporal analysis/synthesis system with half-pixel-accurate motion compensation is presented. We look at temporal decomposition of image sequences as a kind of down-conversion of the sampling lattices. The earlier motion-compensated (MC) interlaced/progressive scan conversion scheme is extended for temporal subband analysis/synthesis. The proposed subband/wavelet filter banks allow perfect reconstruction of the decomposed video signal while retaining high energy compaction of subband transforms. The invertible filter banks are then utilized in our 3-D subband video coder. This video coding system does not contain the temporal DPCM loop employed in the conventional hybrid coder and the earlier MC 3-D subband coders. The experimental results show a significant PSNR improvement by the proposed method. The generalization of our algorithm for MC temporal filtering at arbitrary subpixel accuracy is also discussed.

8.
This paper presents an algorithm for fast image synthesis inside deformed volumes. Given the node displacements of a mesh and a reference 3-D image dataset of a predeformed volume, the method first maps the image pixels that need to be synthesized from the deformed configuration to the nominal predeformed configuration, where the pixel intensities are obtained easily through interpolation in the regular-grid structure of the reference voxel volume. This mapping requires the identification of the mesh element enclosing each pixel for every image frame. To accelerate this point location operation, a fast method of projecting the deformed mesh on image pixels is introduced in this paper. The method presented was implemented for ultrasound B-mode image simulation of a synthetic tissue phantom. The phantom deformation as a result of ultrasound probe motion was modeled using the finite element method. Experimental images of the phantom under deformation were then compared with the corresponding synthesized images using sum of squared differences and mutual information metrics. Both this quantitative comparison and a qualitative assessment show that realistic images can be synthesized using the proposed technique. An ultrasound examination system was also implemented to demonstrate that real-time image synthesis with the proposed technique can be successfully integrated into a haptic simulation.
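For triangular mesh elements, the per-pixel point-location test mentioned above (which element encloses a pixel) reduces to a barycentric-coordinate check; a minimal sketch, independent of the paper's acceleration scheme:

```python
import numpy as np

def barycentric(p, tri):
    """Barycentric coordinates of point p in triangle tri (3x2 array).
    All three coordinates lie in [0, 1] iff p is inside the element."""
    a, b, c = tri
    T = np.column_stack([b - a, c - a])   # edge vectors as columns
    l1, l2 = np.linalg.solve(T, p - a)
    return np.array([1.0 - l1 - l2, l1, l2])

def inside(p, tri, tol=1e-12):
    """Point-in-element test used when mapping a pixel back to the mesh."""
    return bool(np.all(barycentric(p, tri) >= -tol))
```

The same coordinates double as interpolation weights for the nodal displacements, so one solve answers both "which element" and "where inside it".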

9.
We propose a novel approach for face tracking, resulting in a visual feedback loop: instead of trying to adapt a more or less realistic artificial face model to an individual, we construct from precise range data a specific texture and wireframe face model, whose realism allows the analysis and synthesis modules to visually cooperate in the image plane, by directly using 2D patterns synthesized by the face model. Unlike other feedback loops found in the literature, we do not explicitly handle the 3D complex geometric data of the face model, to make real-time manipulations possible. Our main contribution is a complete face tracking and pose estimation framework, with few assumptions about the face rigid motion (allowing large rotations out of the image plane), and without marks or makeup on the user's face. Our framework feeds the feature-tracking procedure with synthesized facial patterns, controlled by an extended Kalman filter. Within this framework, we present original and efficient geometric and photometric modelling techniques, and a reformulation of a block-matching algorithm to make it match synthesized patterns with real images, and avoid background areas during the matching. We also offer some numerical evaluations, assessing the validity of our algorithms, and new developments in the context of facial animation. Our face-tracking algorithm may be used to recover the 3D position and orientation of a real face and generate a MPEG-4 animation stream to reproduce the rigid motion of the face with a synthetic face model. It may also serve as a pre-processing step for further facial expression analysis algorithms, since it locates the position of the facial features in the image plane, and gives precise 3D information to take into account the possible coupling between pose and expressions of the analysed facial images.

10.
Scalable low bit-rate video coding is vital for the transmission of video signals over wireless channels. A scalable model-based video coding scheme is proposed in this paper to achieve this. This paper mainly addresses automatic scalable face model design. Firstly, a robust and adaptive face segmentation method is proposed, which is based on piecewise skin-colour distributions. A total of 43 million skin pixels from 900 images are used to train the skin-colour model, which can identify skin-colour pixels reliably under different lighting conditions. Next, reliable algorithms are proposed for detecting the eyes, mouth and chin that are used to verify the face candidates. Then, based on the detected facial features and human face muscular distributions, a heuristic scalable face model is designed to represent the rigid and non-rigid motion of head and facial features. A novel motion estimation algorithm is proposed to estimate the object model motion hierarchically. Experimental results are provided to illustrate the performance of the proposed algorithms for facial feature detection and the accuracy of the designed scalable face model for representing face motion.

11.
This paper describes a system that can synthesize realistic sequential images of moving goldfish based on the image understanding result of real goldfish. To analyze and synthesize images in real-time, we have constructed a hardware system that consists of 32 parallel transputers with a high-speed visual-data interface called VIT (Visual Interface for Transputer Network). The system is very flexible and powerful for various types of image processing because it can be extended according to the required computational cost. In the understanding process, we assume that the target object, a goldfish in this case, deforms its shape pliably in 3-D space and moves only in a two-dimensional direction. A modeling scheme, called the Bone-Structured Solid Modeler, which is suitable for representing deformable objects such as living things, plays an important role in the understanding and synthesis processes of the deformable object. Three types of constraints for motion, namely, static, dynamic, and object, are utilized to verify the estimated pose and orientation of the object. In the motion synthesis process, realistic moving images are synthesized by controlling the model employing the motion understanding result. Simulation results are presented to show the effectiveness of the system. The technology discussed in this paper is expected to play a key role in the realization of future visual human interfaces.

12.
AMethodforHeadshoulderSegmentationandHumanFacialFeaturePositioningHuTianjianCaiDejunDepartmentofElectricalandInformationEngi...  相似文献   

13.
Although several algorithms have been proposed for facial model adaptation from image sequences, an insufficient feature set for adapting a full facial model, imperfect matching of feature points, and imprecise head motion estimation may degrade the accuracy of model adaptation. In this paper, we propose to resolve these difficulties by integrating facial model adaptation, texture mapping, and head pose estimation as cooperative and complementary processes. By using an analysis-by-synthesis approach, salient facial feature points and head profiles are reliably tracked and extracted to form a growing and more complete feature set for model adaptation. A more robust head motion estimation is achieved with the assistance of the textured facial model. The proposed scheme works with image sequences acquired with a single uncalibrated camera and requires only a little manual adjustment during initialization, which proves it to be a feasible approach for facial model adaptation.

14.
Described is a system for the multidimensional display and analysis of tomographic images utilizing the principle of variable focal (varifocal) length optics. The display system uses a vibrating mirror in the form of an aluminized membrane stretched over a loudspeaker, coupled with a cathode ray tube (CRT) display monitor suspended face down over the mirror, plus the associated digital hardware to generate a space filling display. The mirror is made to vibrate back and forth, as a spherical cap, by exciting the loudspeaker with a 30 Hz sine wave. "Stacks" of 2-D tomographic images are displayed, one image at a time, on the CRT in synchrony with the mirror motion. Because of the changing focal length of the mirror and the integrating nature of the human eye-brain combination, the time sequence of 2-D images, displayed on the CRT face, appears as a 3-D image in the mirror. The system simplifies procedures such as: reviewing large amounts of 3-D image information, exploring volume images in three dimensions, and gaining an appreciation or understanding of three-dimensional shapes and spatial relationships. The display system facilitates operator interactivity, e.g., the user can point at structures within the volume image, remove selected image regions to more clearly visualize underlying structure, and control the orientation of brightened oblique planes through the volume.

15.
Three-dimensional (3-D) displays are drawing attention as next-generation devices. Some techniques which can reproduce 3-D images prepared in advance have already been developed. However, technology for the transmission of 3-D moving pictures in real time is yet to be achieved. In this paper, we present a novel method for 360° viewable 3-D displays and the Transpost system in which we implement the method. The basic concept of our system is to project multiple images of the object, taken from different angles, onto a spinning screen. The key to the method is projection of the images onto a directionally reflective screen with a limited viewing angle. The images are reconstructed to give the viewer a 3-D image of the object displayed on the screen. The display system can present images of computer-graphics pictures, live pictures, and movies. Furthermore, the reverse optical process of that in the display system can be used to record images of the subject from multiple directions; the images can then be transmitted to the display in real-time. We have developed prototypes of a 3-D display and a 3-D human-image transmission system. Our preliminary working prototypes demonstrate new possibilities of expression and forms of communication.

16.
Oriented speckle reducing anisotropic diffusion.   (Total citations: 2; self-citations: 0; citations by others: 2)
Ultrasound imaging systems provide the clinician with noninvasive, low-cost, and real-time images that can help them in diagnosis, planning, and therapy. However, although the human eye is able to derive the meaningful information from these images, automatic processing is very difficult due to noise and artifacts present in the image. The speckle reducing anisotropic diffusion filter was recently proposed to adapt the anisotropic diffusion filter to the characteristics of the speckle noise present in the ultrasound images and to facilitate automatic processing of images. We analyze the properties of the numerical scheme associated with this filter, using a semi-explicit scheme. We then extend the filter to a matrix anisotropic diffusion, allowing different levels of filtering across the image contours and in the principal curvature directions. We also show a relation between the local directional variance of the image intensity and the local geometry of the image, which can justify the choice of the gradient and the principal curvature directions as a basis for the diffusion matrix. Finally, different filtering techniques are compared on a 2-D synthetic image with two different levels of multiplicative noise and on a 3-D synthetic image of a Y-junction, and the new filter is applied on a 3-D real ultrasound image of the liver.
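The diffusion machinery this filter builds on can be illustrated with one explicit iteration in the scalar Perona-Malik form; the paper's contribution replaces the conductance below with a speckle statistic and a full diffusion matrix, which this sketch does not attempt:

```python
import numpy as np

def diffuse_step(img, kappa=0.1, dt=0.2):
    """One explicit anisotropic-diffusion iteration (Perona-Malik form).
    Replicated borders give zero flux at the image boundary."""
    n = np.roll(img, -1, 0); n[-1] = img[-1]
    s = np.roll(img, 1, 0);  s[0] = img[0]
    e = np.roll(img, -1, 1); e[:, -1] = img[:, -1]
    w = np.roll(img, 1, 1);  w[:, 0] = img[:, 0]
    out = img.copy()
    for nb in (n, s, e, w):
        d = nb - img                              # neighbour difference
        g = 1.0 / (1.0 + (d / kappa) ** 2)        # edge-stopping conductance
        out += dt * g * d
    return out
```

Small differences (noise) diffuse strongly while large differences (edges) get a near-zero conductance, which is the property the speckle-adapted variant inherits.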

17.
In this paper, we present a novel deep generative facial parts swapping method: parts-swapping generative adversarial network (PSGAN). PSGAN independently handles facial parts, such as eyes (left eye and right eye), nose, mouth and jaw, which achieves facial parts swapping by replacing the target facial parts with source facial parts and reconstructing the entire face image with these parts. By separately modeling the facial parts in the form of region inpainting, the proposed method can successfully achieve highly photorealistic face swapping results, enabling users to freely manipulate facial parts. In addition, the proposed method is able to perform jaw editing based on sketch guidance information. Experimental results on the CelebA dataset suggest that our method achieves superior performance for facial parts swapping and provides higher user control flexibility.

18.
Tracking a dynamic set of feature points   (Total citations: 5; self-citations: 0; citations by others: 5)
We address the problems of tracking a set of feature points over a long sequence of monocular images as well as how to include and track new feature points detected in successive frames. Due to the 3-D movement of the camera, different parts of the images exhibit different image motion. Tracking discrete features can therefore be decomposed into several independent and local problems. Accordingly, we propose a localized feature tracking algorithm. The trajectory of each feature point is described by a 2-D kinematic model. Then to track a feature point, an interframe motion estimation scheme is designed to obtain the estimates of interframe motion parameters. Subsequently, using the estimates of motion parameters, corresponding points are identified to subpixel accuracy. Afterwards, the temporal information is processed to facilitate the tracking scheme. Since different feature points are tracked independently, the algorithm is able to handle the image motion arising from general 3-D camera movements. On the other hand, in addition to tracking feature points detected at the beginning, an efficient way to dynamically include new points extracted in subsequent frames is devised so that the information in a sequence is preserved. Experimental results for several image sequences are also reported.
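A per-feature tracker with a 2-D kinematic model and an interframe estimation step can be sketched as a small constant-velocity Kalman filter; the concrete model and noise levels here are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

# Constant-velocity state [x, y, vx, vy]: a minimal 2-D kinematic model
# applied independently to each tracked feature point.
F = np.eye(4); F[0, 2] = F[1, 3] = 1.0   # state transition (dt = 1 frame)
H = np.eye(2, 4)                          # only the position is observed

def kalman_step(x, P, z, q=1e-3, r=1e-2):
    """One predict/update cycle for a single tracked feature point,
    given the matched position z in the new frame."""
    x = F @ x                                  # predict state
    P = F @ P @ F.T + q * np.eye(4)            # predict covariance
    S = H @ P @ H.T + r * np.eye(2)            # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    x = x + K @ (z - H @ x)                    # correct with measurement
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

Because every feature carries its own small state, new points detected in later frames can be added by simply initializing another (x, P) pair.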

19.
A novel micro-grooved structure of lightguides and a sequential driving scheme of light sources were demonstrated to achieve image quality for 3-D displays comparable to that of 2-D displays. The modified distribution of micro-grooves not only locates viewing cones for the respective eyes but also suppresses the moiré pattern, which might occur when a periodic micro-grooved structure and a color filter are superimposed. The configuration of lightguides for a 1.8-in liquid crystal display (LCD) panel can yield acceptable 3-D perception at a viewing distance of 7-23 cm with brightness uniformity of greater than 83%. In addition, the driving scheme of light sources in synchronization with parallax images can project images to the viewer's respective eyes sequentially. With a refreshing sub-frame rate of 190 Hz and double displaying of parallax images, the image crosstalk of 3-D perception can be efficiently reduced for an LC response time of 7.0 ms.

20.
Four-dimensional (4-D) imaging to capture the three-dimensional (3-D) structure and motion of the heart in real time is an emerging trend. We present here our method of interactive multiplanar reformatting (MPR), i.e., the ability to visualize any chosen anatomical cross section of 4-D cardiac images and to change its orientation smoothly while maintaining the original heart motion. Continuous animation to show the time-varying 3-D geometry of the heart and smooth dynamic manipulation of the reformatted planes, as well as large image size (100-300 MB), make MPR challenging. Our solution exploits the hardware acceleration of 3-D texture mapping capability of high-end commercial PC graphics boards. Customization of volume subdivision and caching concepts to periodic cardiac data allows us to use this hardware effectively and efficiently. We are able to visualize and smoothly interact with real-time 3-D ultrasound cardiac images at the desired frame rate (25 Hz). The developed methods are applicable to MPR of one or more 3-D and 4-D medical images, including 4-D cardiac images collected in a gated fashion.
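At its core, multiplanar reformatting samples an arbitrarily oriented plane out of a volume; a toy CPU sketch with nearest-neighbour sampling (the paper instead performs this via hardware 3-D texture mapping):

```python
import numpy as np

def extract_slice(vol, origin, u, v, shape):
    """Sample a reformatted plane from `vol`: the plane is spanned by
    in-plane axis vectors u and v starting at `origin` (all in voxel
    coordinates); out-of-volume samples are left at zero."""
    h, w = shape
    out = np.zeros(shape)
    for i in range(h):
        for j in range(w):
            p = origin + i * u + j * v        # voxel-space position
            idx = np.round(p).astype(int)     # nearest-neighbour sample
            if np.all(idx >= 0) and np.all(idx < vol.shape):
                out[i, j] = vol[tuple(idx)]
    return out
```

Rotating the (u, v) pair re-orients the cross section, which is exactly the interaction MPR exposes; a real implementation would use trilinear filtering and run per time frame to preserve the heart motion.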
