
Analysis of emotional speech

Chairperson:
Donna Erickson - Gifu City Women's College, Japan

[14:20-16:00]   Room: Room H


We4.H.1 INVESTIGATION OF THE ACOUSTIC FEATURES OF EMOTIONAL SPEECH USING A PHYSIOLOGICAL ARTICULATORY MODEL

Shin'ichi Ito, Jianwu Dang, Masato Akagi

School of Information Science, Japan Advanced Institute of Science and Technology, Japan

Processing emotional speech is an important issue in speech information science, and many studies have addressed it. However, apart from fundamental frequency, we still do not know clearly which acoustic features are crucial for emotional speech, or how humans manipulate their speech organs to produce it. In this study, we investigate the acoustic features associated with emotional speech using a physiological articulatory model. Human behaviors in the production of emotional speech are introduced into the articulatory model, and the resulting changes in acoustic parameters are detected and analyzed. The crucial parameters are then identified using a listening test.


We4.H.2 CAN WE ESTIMATE THE SPEAKER'S EMOTIONAL STATE FROM HER/HIS PROSODIC FEATURES? - EFFECTS OF F0 CONTOUR'S SLOPE AND DURATION ON PERCEIVING DISAGREEMENT, HESITATION, AGREEMENT AND ATTENTION

Takanori Komatsu, Yasuko Nagasaki

Department of Media Architecture, Future University-Hakodate, Japan

We experimentally investigated the relationship between prosodic features of speech sounds (the slope and duration of the F0 contour) and the speaker's perceived emotional state (disagreement, hesitation, agreement, or attention). In the experiment, 44 synthesized sounds (11 slopes x 4 durations) were presented to participants, who were asked to answer the question "Do you think that this speaker expressed XXX?", where XXX was an emotional state selected at random from the four above. As a result, we confirmed that (1) upward slopes (rising intonation) were interpreted as "disagreement" regardless of duration, (2) slower downward slopes with longer durations as "hesitation," (3) downward slopes (falling intonation) with shorter durations as "agreement," and (4) steep downward slopes with the shortest duration as "attention."
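
The 11 x 4 stimulus design described above can be pictured with a minimal Python sketch. The abstract reports only the counts of slopes and durations, so the slope values, durations, starting F0, and frame step below are hypothetical placeholders, not the authors' settings.

```python
from itertools import product

# Hypothetical values: the abstract gives only the counts (11 slopes, 4 durations).
SLOPES_HZ_PER_S = [-100 + 20 * i for i in range(11)]  # -100, -80, ..., +100
DURATIONS_S = [0.2, 0.4, 0.6, 0.8]
START_F0_HZ = 200.0  # assumed starting pitch


def f0_contour(slope, duration, start_f0=START_F0_HZ, frame_s=0.01):
    """Return a linear F0 contour (Hz) sampled every frame_s seconds."""
    n_frames = round(duration / frame_s)
    return [start_f0 + slope * k * frame_s for k in range(n_frames + 1)]


# Cross every slope with every duration to obtain the full stimulus set.
stimuli = [
    {"slope": s, "duration": d, "f0": f0_contour(s, d)}
    for s, d in product(SLOPES_HZ_PER_S, DURATIONS_S)
]
```

Crossing the two factors fully is what yields one stimulus per slope-duration pair; any other spacing of slopes or durations would fit the same structure.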


We4.H.3 FUNDAMENTAL FREQUENCY AS A CUE TO ESTIMATE SPEAKERS' EMOTIONAL STATE

Yasuko Nagasaki, Takanori Komatsu

Department of Media Architecture, Future University-Hakodate, Japan

This study concerns the relationship between the prosodic features of utterances and listeners' estimation of the speaker's emotional state or context. In our previous studies, not only segmental phonemes but also prosodic features were considered important information for estimating a speaker's emotional state. In a previous study (Experiment 1), which investigated the Japanese interjection "eh," we found that the duration and slope of the F0 contour were generally sufficient to distinguish various kinds of "eh." The listeners' responses showed a high degree of congruence with the original contexts, even though each stimulus was presented in isolation from its original context. However, because the stimuli in that experiment were natural utterances, other features of the voice, such as intensity and voice quality, were also available to the listeners. Another experiment (Experiment 2) was therefore carried out with synthesized "eh" stimuli that varied only in pitch and duration. The results were as follows: stimuli with a rising tone were perceived as "disagreement," those with a flat tone and long duration as "hesitation," and those with a falling tone and short duration as "agreement." These results were congruent with those of Experiment 1. In the present study (Experiment 3), we presented as stimuli triangle waves with the same durations and F0 transitions as in Experiment 2. Twenty university students listened to the stimuli and were asked whether they perceived the sounds as "disagreement," "hesitation," or "agreement." The listeners' responses were closely related to the results of Experiment 2. Importantly, the stimuli presented in Experiment 3 contained no phoneme information, and the voice quality of all stimuli was identical. The results indicate that the duration and transition of the F0 contour appear to be important cues for estimating a speaker's emotional state or context.
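
As an illustration of the kind of stimulus used in Experiment 3, the following Python sketch generates a triangle wave whose F0 moves linearly from a start to an end frequency. It is only a sketch under stated assumptions: the function name, the frequencies, the durations, and the sample rate are hypothetical, and the abstract does not say how the authors synthesized their stimuli.

```python
def triangle_glide(f0_start, f0_end, duration, sample_rate=16000):
    """Triangle wave in [-1, 1] whose F0 changes linearly over `duration` seconds."""
    n = int(round(duration * sample_rate))
    samples = []
    phase = 0.0  # phase measured in cycles
    for i in range(n):
        p = phase % 1.0
        # Triangle shape: +1 at p = 0, -1 at p = 0.5, back to +1 at p = 1.
        samples.append(4.0 * abs(p - 0.5) - 1.0)
        # Linear interpolation of F0 between the start and end values.
        f0 = f0_start + (f0_end - f0_start) * i / max(n - 1, 1)
        # Accumulating phase keeps the waveform continuous while F0 changes.
        phase += f0 / sample_rate
    return samples


# Hypothetical examples of the three tone shapes named in the abstract.
rising = triangle_glide(180.0, 260.0, 0.3)
flat_long = triangle_glide(200.0, 200.0, 0.8)
falling_short = triangle_glide(240.0, 160.0, 0.2)
```

Because every stimulus uses the same waveform shape, the only things that differ between them are duration and F0 movement, which is the property the abstract relies on.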


We4.H.4 THE INFLUENCE OF F0 PATTERNS ON PERCEPTION OF DECLARATIVES AND DECLARATIVES WITH DISSATISFACTION: THE CASE OF EXPRESSION "AWANAINO"

Yukinori Tagawa (1), Sakuko Tabuchi (2)

(1) Osaka University Graduate School of Letters, Japan, (2) Sangmyung University, Korea

In speech communication, prosodic features convey not only linguistic information but also paralinguistic information, such as the speaker's emotion and intention. When teaching Japanese as a foreign language, it is very important to clarify the influence of prosodic features on the perception of emotion. Through such clarification, teachers can help learners develop their communication ability. We conducted experiments focusing on utterance intonation among the various prosodic features. In our previous study (TAGAWA, TABUCHI 2001), we examined the differences in intonation between neutral interrogatives and interrogatives expressing criticism. In this study, native speakers of Japanese judged utterances as neutral declaratives or as declaratives expressing dissatisfaction. We analyzed each sample in order to clarify its prosodic features, and found differences between the two types of speech in the final portion of the fundamental frequency (F0) contour. Neutral declarative speech has a falling shape at the end of the F0 contour. By contrast, declarative speech expressing dissatisfaction has a rise-fall shape at the end of the F0 contour. We synthesized forty-six stimuli by creating partial F0 contours. The listening experiment with these stimuli shows that a rise-fall pattern at the end of the F0 contour is an important factor in the perception of a declarative with dissatisfaction.


We4.H.5 FUNDAMENTAL FREQUENCY IN FEEDBACK WORDS IN SWEDISH

Mechtild Tronnier (1), Jens Allwood (2)

(1) Department of Language and Culture, Linköping University, Sweden, (2) Department of Linguistics, Göteborg University, Sweden

An investigation of the fundamental frequency in Swedish feedback words is presented. It is hypothesised that the F0 pattern differs between words signalling positive and negative feedback, i.e. agreement and disagreement with the utterance of the preceding speaker. When the contrast is imitated, the negative feedback word nej 'no' is often produced with a falling F0 stretching over a larger range, whereas the positive feedback word ja 'yes' is produced with a rather steady intonation ending in a rise. Data from spontaneous dialogues were investigated to test whether feedback words in discourse are handled in the described way. The data are part of the Göteborg Spoken Language Corpus (GSLC). Neither consistent use of these patterns nor of any other pattern was found in the data. An identification test was carried out to see whether the hummed F0 pattern contained enough information for listeners to separate positive feedback words from negative ones. The results show that listeners' ability to distinguish between positive and negative feedback words when relying solely on the F0 pattern is rather poor. A variety of F0 patterns are used for positive and negative feedback words, within and across the categories. Recognition of the contrast between the two categories presumably depends on more information, which may also be found in a broader scope of prosody, such as phrasal prosody, including intonation and pausing. Otherwise one might assume that in some cases, such as noticeable hesitation, information about the feedback type is conveyed solely by the words. It seems that the F0 patterns previously described reflect other factors, such as emphasis, rather than the contrast between positive and negative feedback.

