Text-to-Audio-Visual System
Description:
This project presents VTalk, a system for text-to-audiovisual speech
synthesis (TTAVS), in which input text is converted into an audiovisual
speech stream that incorporates head and eye movements. It is an
image-based system: the face is modeled using a set of images of a human
subject. Visual speech is modeled as a concatenation of visemes, the lip
shapes corresponding to phonemes. Smooth transitions between visemes are
achieved by morphing along the correspondences between visemes, which are
obtained from optical flow. The phonemes and timing parameters given by
the text-to-speech synthesizer determine which visemes are used to
synthesize the visual stream. We provide a method that uses polymorphing
to incorporate co-articulation during speech in our TTAVS. We also include
nonverbal mechanisms of visual speech communication, such as eye blinks
and head nods, which make the talking-head model more lifelike. For eye
movement, a simple mask-based approach is employed, and view morphing is
used to generate the intermediate images for head movement. All these
features are integrated into a single system, which takes text and head
and eye movement parameters as input and produces the complete
audiovisual stream.
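To make the phoneme-to-viseme timing step concrete, here is a minimal sketch of how phonemes and durations from a text-to-speech synthesizer might be turned into a per-frame morphing schedule. It is not the VTalk implementation. The mapping table, the viseme names, the choice to place each viseme keyframe at the midpoint of its phoneme, the linear blend weight, and the frame rate are all assumptions made for illustration. The optical-flow warping and the polymorphing used for co-articulation are not shown; the sketch only computes which two visemes each frame lies between and how far along the transition it is.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-viseme table (illustrative only, far from complete).
PHONEME_TO_VISEME = {
    "p": "closed", "b": "closed", "m": "closed",
    "aa": "open", "ah": "open",
    "uw": "rounded", "ow": "rounded",
    "sil": "rest",
}


@dataclass
class Frame:
    time: float   # seconds from the start of the utterance
    src: str      # viseme the frame is morphing from
    dst: str      # viseme the frame is morphing to
    alpha: float  # 0.0 = pure src, 1.0 = pure dst


def viseme_keyframes(phonemes):
    """Place one viseme keyframe at the midpoint of each phoneme.

    phonemes: list of (phoneme, duration_in_seconds) pairs.
    Unknown phonemes fall back to the "rest" viseme (an assumption).
    """
    keys = []
    t = 0.0
    for phoneme, duration in phonemes:
        keys.append((t + duration / 2.0, PHONEME_TO_VISEME.get(phoneme, "rest")))
        t += duration
    return keys


def frame_schedule(phonemes, fps=25):
    """Return one Frame per video frame with a linear blend weight."""
    keys = viseme_keyframes(phonemes)
    if not keys:
        return []
    total = sum(duration for _, duration in phonemes)
    n_frames = int(round(total * fps))
    frames = []
    seg = 0  # index of the keyframe that starts the current transition
    for i in range(n_frames):
        t = i / fps
        if len(keys) == 1:
            frames.append(Frame(t, keys[0][1], keys[0][1], 0.0))
            continue
        # Advance to the transition that contains time t.
        while seg < len(keys) - 2 and t >= keys[seg + 1][0]:
            seg += 1
        (t0, v0), (t1, v1) = keys[seg], keys[seg + 1]
        raw = (t - t0) / (t1 - t0) if t1 > t0 else 1.0
        # Clamp so frames before the first or after the last keyframe hold still.
        alpha = max(0.0, min(1.0, raw))
        frames.append(Frame(t, v0, v1, alpha))
    return frames


if __name__ == "__main__":
    for frame in frame_schedule([("m", 0.25), ("aa", 0.25)], fps=8):
        print(frame)
```

In a full image-based pipeline, each `alpha` would drive the warp and blend between the two viseme images along their optical-flow correspondence; a linear weight is the simplest choice and could be replaced by a smoother easing function.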
Results:
- Sentence: Tea twenty two temporary food stew.
- Sentence: I miss you
- Sentence: How are you?
Reports:
- VTalk: A System for generating text-to-audio-visual speech.
  Prem Kalra, Ashish Kapoor, Udit K Goyal
  IETE Technical Review, Vol. 18, No. 4, pp. 307-314, July-August 2001.
- Modeling Co-articulation for text-to-audio-visual speech synthesizer
  Ashish Kapoor, Udit K Goyal, Prem Kalra
  Proc. of ICVGIP 2000, Bangalore.