The artistic outcomes and the scientific research depend on the coupling between the two areas. Thus, although we describe our activities separately, they are linked so that findings from one activity can feed into the others. The main activities are as follows.

Two technical aspects of the project require significant research. First, creating synthesizers for speech and song that include both audio and visual components is a challenging yet achievable task, and our work will advance the state of the art in data-driven audio-visual speech synthesis. This work is summarized under Face Synthesis and Speech Synthesis. Second, gestural control of the singing head will advance the state of the art in adaptive interfaces and new interfaces for musical expression (NIME). The functional and non-functional requirements that make these synthesizers useful will come directly from the performers as they learn to speak and sing with the devices, and from both qualitative and quantitative perceptual evaluation of their performance.

Artistically, the development of the gesture-controlled singing head creates new means of expressing human emotions, feelings, and ideas. Although the artistic component of the project requires the technology, the technology itself is not the focal point of the artistic work. The emphasis is instead on creating new means of vocal expression and, through them, exploring new connections, nuances, and subtleties in artistic expression.

Finally, a key scientific research direction, described under Evaluation, is to advance the understanding of the production and perception of human vocalization. The performers form a unique group of people who can improvise audible and visual speech simultaneously with their vocal tracts and their hands. This affords an extraordinary opportunity to study the mechanisms linking speech understanding and production, in addition to providing the crucial structural and perceptual validation of the DIVA system. These studies will provide clear protocols for future brain-function studies (EEG, fMRI) of the cognitive and neuromotor mechanisms underlying vocally and manually controlled production of linguistic and artistic behavior.

The project has three phases, each of which has interlocking scientific and artistic outcomes. The sophistication of both the performance and the technology increases from phase to phase, accommodating additional performers and experiments that assess the complex coordination among performers and its impact on perceptual processing.

Phase 1: Solo/Duet Performance using a Dictionary-based DIVA

The artistic goal of Phase 1 is to create a work for a solo DIVA accompanied by percussion. This work will be premiered in 2008 as part of the Nu:BC concert series. We will use a parallel formant speech synthesizer (Rye and Holmes, 1982) with a simple excitation model, connected to an articulated model of a face built within the ArtiSynth modeling framework (Fels et al., 2006). A dictionary of phonemes and facial configurations, combined with interpolation schemes, will provide the coordinated mapping between hand gesture, acoustics, and visual face movement. A simple adaptation procedure will help the performer learn the correspondence between her gestures and the resulting sounds. Development of the stage work will parallel the performer's learning process, as the composer and the librettist will work closely with the performer to refine and practice the text, music, and meanings. Additionally, the performer's production will be analyzed and evaluated perceptually, for example to compare and validate the audiovisual intelligibility of natural and gesturally synthesized talking and singing heads.
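The dictionary-plus-interpolation mapping and the parallel formant synthesizer can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the project's implementation: the hypothetical `VOWELS` table places three vowels at made-up positions on a normalized 2-D hand plane, its formant values are approximate textbook averages for adult male speakers, and the `jaw`/`lip_round` face parameters are invented placeholders. Inverse-distance weighting stands in for whichever interpolation scheme the project actually uses. The synthesizer drives one second-order digital resonator per formant in parallel from an impulse-train excitation and sums their scaled outputs.

```python
import math

# Hypothetical phoneme dictionary (illustrative values only): each vowel sits at a
# position on a normalized 2-D hand plane and stores approximate formant targets (Hz)
# plus placeholder face parameters in [0, 1].
VOWELS = {
    "i": {"pos": (0.0, 1.0), "formants": (270.0, 2290.0, 3010.0), "jaw": 0.2, "lip_round": 0.0},
    "a": {"pos": (0.5, 0.0), "formants": (730.0, 1090.0, 2440.0), "jaw": 1.0, "lip_round": 0.1},
    "u": {"pos": (1.0, 1.0), "formants": (300.0, 870.0, 2240.0), "jaw": 0.3, "lip_round": 1.0},
}


def interpolate(hand, power=2.0, eps=1e-9):
    """Blend dictionary entries by inverse-distance weighting around the hand position."""
    weights = {}
    for name, entry in VOWELS.items():
        d = math.dist(hand, entry["pos"])
        if d < eps:  # hand sits exactly on an entry: return it unchanged
            return {key: entry[key] for key in ("formants", "jaw", "lip_round")}
        weights[name] = 1.0 / d ** power
    total = sum(weights.values())

    def blend(get):
        return sum(w * get(VOWELS[name]) for name, w in weights.items()) / total

    return {
        "formants": tuple(blend(lambda e, k=k: e["formants"][k]) for k in range(3)),
        "jaw": blend(lambda e: e["jaw"]),
        "lip_round": blend(lambda e: e["lip_round"]),
    }


def impulse_train(f0, fs, n):
    """Simplest voiced excitation: one unit impulse per pitch period."""
    period = max(1, round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator_coeffs(freq, bw, fs):
    """Second-order resonator y[n] = a*x[n] + b*y[n-1] + c*y[n-2], unity gain at DC."""
    r = math.exp(-math.pi * bw / fs)
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    return 1.0 - b - c, b, c


def parallel_formant_synth(excitation, formants, fs,
                           bandwidths=(60.0, 90.0, 120.0), gains=(1.0, 0.5, 0.25)):
    """Feed the same excitation to one resonator per formant and sum the scaled outputs."""
    out = [0.0] * len(excitation)
    for freq, bw, gain in zip(formants, bandwidths, gains):
        a, b, c = resonator_coeffs(freq, bw, fs)
        y1 = y2 = 0.0
        for i, x in enumerate(excitation):
            y = a * x + b * y1 + c * y2
            out[i] += gain * y
            y2, y1 = y1, y
    return out


# Usage: a hand position between /i/ and /a/ yields a blended vowel, rendered for 0.5 s.
fs = 16000
params = interpolate((0.25, 0.5))
audio = parallel_formant_synth(impulse_train(120.0, fs, fs // 2), params["formants"], fs)
```

Because each resonator is normalized to unity gain at DC, the per-formant `gains` tuple alone sets the relative formant levels; a fuller sketch would also store those amplitudes and bandwidths in the dictionary and interpolate them along with the formant frequencies and face parameters.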

Phase 2: Duet/Quartet using Coordinated Acoustic Tube-Based DIVAs

A new work for two DIVAs and a small ensemble will be composed to explore the interaction of characters who have a real voice and a virtual voice and persona. This work will be premiered at the 2009 Open Ears Festival in Kitchener-Waterloo, one of Canada's most innovative new music festivals. As part of the research in this phase, we will replace the formant-based synthesizer with a two-dimensional acoustic tube model based on an articulatory synthesizer, so that facial movements and configurations are coordinated directly with sound production. Using the experimental results of Phase 1, we will improve the excitation source models to increase the expressive capacity of the synthesizer. Additionally, each synthesizer will be made fully portable for stage performance, enhancing the dramatic impact of performances. At the beginning of this phase, the experienced performer from Phase 1 will share her experience and knowledge with the new second performer, helping the newcomer learn to talk and sing with the device. Production and perception studies will continue, allowing us to assess and compare the two performers' acquisition experiences and to validate the communicative realism of the synthesized output. These studies, together with the performers' experience controlling a more sophisticated synthesizer, will inform many aspects of Phase 3.
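The tube model planned for this phase is two-dimensional and articulatory. As a much simpler illustration of how a tube model ties vocal-tract shape directly to sound, the sketch below assumes a lossless, one-dimensional chain of equal-length cylindrical sections running from the glottis (closed) to the lips (ideally open); this is not the project's model. The hypothetical `reflection_coefficients` returns the junction coefficients a time-domain Kelly-Lochbaum implementation would use, and `tube_formants` finds resonances as zeros of the chain-matrix term that governs glottis-to-lips volume-velocity transfer. The area values, the 17.5 cm tract length, and the 350 m/s speed of sound are illustrative choices.

```python
import math

SPEED_OF_SOUND = 350.0  # m/s, a common approximation for warm, humid vocal-tract air


def reflection_coefficients(areas):
    """Pressure reflection coefficient at each junction between adjacent sections
    (ordered glottis to lips), as used by Kelly-Lochbaum scattering junctions."""
    return [(a1 - a2) / (a1 + a2) for a1, a2 in zip(areas, areas[1:])]


def volume_velocity_denominator(areas, length_m, freq, c=SPEED_OF_SOUND):
    """D element of the lossless chain (ABCD) matrix from glottis to lips.

    With zero pressure at the lips, U_lips / U_glottis = 1 / D, so formants lie
    where D crosses zero. Only impedance ratios affect D, so each section uses
    the relative impedance 1 / area.
    """
    seg = length_m / len(areas)
    k = 2.0 * math.pi * freq / c
    cs, sn = math.cos(k * seg), math.sin(k * seg)
    # Track the running product as [[A, jB], [jC, D]] with A, B, C, D real.
    A, B, C, D = 1.0, 0.0, 0.0, 1.0
    for area in areas:
        z = 1.0 / area
        # Right-multiply by the section matrix [[cs, j*z*sn], [j*sn/z, cs]].
        A, B, C, D = (A * cs - B * sn / z,
                      A * z * sn + B * cs,
                      C * cs + D * sn / z,
                      -C * z * sn + D * cs)
    return D


def tube_formants(areas, length_m, f_max=4000.0, step=1.0):
    """Locate formants as zero crossings of D on a frequency grid (linear interpolation)."""
    formants = []
    prev_f = step
    prev_d = volume_velocity_denominator(areas, length_m, prev_f)
    for i in range(2, int(f_max / step) + 1):
        f = i * step
        d = volume_velocity_denominator(areas, length_m, f)
        if prev_d * d < 0.0:
            formants.append(prev_f + step * prev_d / (prev_d - d))
        elif d == 0.0:
            formants.append(f)
        prev_f, prev_d = f, d
    return formants


# Usage: a uniform (schwa-like) tube versus one with a wide back and a narrow front.
uniform = tube_formants([3.0] * 10, 0.175)
front_narrow = tube_formants([8.0] * 10 + [1.0] * 10, 0.175)
print([round(f) for f in uniform])  # → [500, 1500, 2500, 3500]
print(round(front_narrow[0]))       # F1 falls to roughly 216 Hz
```

The uniform case reproduces the quarter-wavelength resonances $(2n - 1)c / 4L$, which is a useful sanity check before moving to shaped area functions; changing the areas (standing in for articulator positions) moves the formants, which is the property that lets a tube-based synthesizer keep sound and face movement coordinated.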

Phase 3: Visual Voice Opera for 3D Articulatory Synthesis DIVAs

This phase brings our largest artistic production: a large stage work for three DIVAs and ensemble, to be premiered in 2010 as part of the Vancouver New Music concert series. Artistic issues to be addressed include sound localization and character identification, comprehension of multiple vocal streams, plot development with fluid morphing of character personae, and the interplay of artificial ensemble speech with acoustic instruments. Technically, the acoustic-tube-based model will be extended to support consonant production using 3D models of the vocal tract, which provide tighter coupling between the visual and acoustic modalities. The experiences of the two performers in Phase 2 will be used to refine the adaptive interface, and both will contribute to training the third performer. Performers will continue to take part in production and perception studies, with increased emphasis on characterizing the behavioral entrainment among the performers and the communicative coherence of the performance.
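One simple way to quantify behavioral entrainment, offered here only as an illustration and not as the project's stated method, is to find the lag at which two performers' time series (for example, hand-speed or loudness envelopes resampled to a common rate) are most strongly correlated. The hypothetical helpers below compute a lagged Pearson correlation in plain Python; the signal choice, sampling, and function names are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def lagged_correlation(x, y, max_lag):
    """Correlation of x[t] with y[t + lag] for every lag in [-max_lag, max_lag]."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    n = len(x)
    corr = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            corr[lag] = pearson(x[: n - lag], y[lag:])
        else:
            corr[lag] = pearson(x[-lag:], y[: n + lag])
    return corr


def entrainment(x, y, max_lag):
    """Return (lag, r) at the correlation peak; a positive lag means y trails x."""
    corr = lagged_correlation(x, y, max_lag)
    best = max(corr, key=corr.get)
    return best, corr[best]


# Synthetic demo: performer B repeats performer A's gesture contour 3 samples later.
a = [math.sin(0.3 * t) + 0.5 * math.sin(0.71 * t) for t in range(200)]
b = [0.0] * 3 + a[:-3]
lag, r = entrainment(a, b, max_lag=10)  # lag == 3, r ≈ 1.0
```

A consistently positive peak lag across a performance would suggest that one performer tends to follow the other; computing the same measure over sliding windows would show how that coupling changes over the course of a piece.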