UU Database User's Manual

1 Overview

File Listings

		  /00README.txt           README
		  /Sessions               Speech data and XML documents for each session
		  /Sessions/C001          Data for Session C001
		  /Sessions/C001/C001.sd  Speech data for Session C001 (2 channels, ESPS format)
		  /Sessions/C001/C001.wav Speech data for Session C001 (2 channels, Microsoft WAVE format)
		  /Sessions/C001/C001.xml XML document for Session C001
		  /Sessions/C002          Data for Session C002
		  /Sessions/C003          Data for Session C003
		  /Sessions/C004          Data for Session C004
		  /Sessions/C005          Data for Session C005
		  /Sessions/C006          Data for Session C006
		  /Sessions/C007          Data for Session C007
		  /Sessions/C011          Data for Session C011
		  /Sessions/C012          Data for Session C012
		  /Sessions/C013          Data for Session C013
		  /Sessions/C021          Data for Session C021
		  /Sessions/C022          Data for Session C022
		  /Sessions/C023          Data for Session C023
		  /Sessions/C024          Data for Session C024
		  /Sessions/C031          Data for Session C031
		  /Sessions/C032          Data for Session C032
		  /Sessions/C033          Data for Session C033
		  /Sessions/C041          Data for Session C041
		  /Sessions/C042          Data for Session C042
		  /Sessions/C043          Data for Session C043
		  /Sessions/C051          Data for Session C051
		  /Sessions/C052          Data for Session C052
		  /Sessions/C053          Data for Session C053
		  /Sessions/C061          Data for Session C061
		  /Sessions/C062          Data for Session C062
		  /Sessions/C063          Data for Session C063
		  /Sessions/C064          Data for Session C064
		  /Sessions/speakers.xml  Speaker information file
		  /doc                    Documents
		  /doc/manual.pdf         UUDB User's Manual (Japanese)
		  /doc/resp.mat           Filter coefficients for downsampling
		  /tools                  XML-related utilities
		  /var                    Derivative data (Recoverable from speech data and XML documents)
		  /var/C001               Derivative data for Session C001
		  /var/C001/C001.list     Per-utterance speech file name list for Session C001 (in order)
		  /var/C001/C001.txt      Orthographic transcription file
		  /var/C001/C001.syl      Syllabic transcription file
		  /var/C001/C001.para     Averaged rating file of paralinguistic information
		  /var/C001/C001L_001.wav Speech file of 1st utterance by speaker on channel L
		  /var/C001/C001L_002.wav Speech file of 2nd utterance by speaker on channel L
		  .
		  .
		  /var/C001/C001R_001.wav Speech file of 1st utterance by speaker on channel R
		  /var/C001/C001R_002.wav Speech file of 2nd utterance by speaker on channel R
		  .
		  .

All the information that the UU Database provides is contained in the speech data and the XML documents stored under the directory /Sessions.
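The naming pattern in the listing above can be captured in a small helper. The following Python sketch is only an illustration: the function names are hypothetical, the database root is supplied by the caller, and the three-digit number in per-utterance file names is taken to be the per-channel ordinal shown in the listing (its relation to UtteranceID is not stated here).

```python
from pathlib import PurePosixPath


def session_files(root, session_id):
    """Return the paths of the three per-session files under /Sessions."""
    base = PurePosixPath(root) / "Sessions" / session_id
    return {
        "esps": base / f"{session_id}.sd",
        "wave": base / f"{session_id}.wav",
        "xml": base / f"{session_id}.xml",
    }


def utterance_wav(root, session_id, channel, number):
    """Return the path of a per-utterance speech file under /var.

    `number` is the 1-based ordinal of the utterance on the given channel,
    zero-padded to three digits as in the listing (assumption).
    """
    if channel not in ("L", "R"):
        raise ValueError("channel must be 'L' or 'R'")
    name = f"{session_id}{channel}_{number:03d}.wav"
    return PurePosixPath(root) / "var" / session_id / name


print(session_files("/", "C001")["xml"])
print(utterance_wav("/", "C001", "L", 2))
```

Passing the root explicitly keeps the helper independent of where the database is mounted.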

Session ID

In the UU Database, a series of utterances in a discourse is treated as a unit called a session. Each session corresponds to the series of utterances produced for a given 4-frame cartoon.

Each session is identified by a 4-character ID that starts with "C".

The character following "C" is always "0" in this release.

The next character is a serial number for the participant pair. For example, "3" is assigned to the pair of speakers FKC and FUE.

The last character is the session number for that pair.

Thus, for example, the Session ID C031 indicates the first session by the pair of speakers FKC and FUE.
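The ID layout described above can be decoded mechanically. The sketch below is a minimal illustration; the function and field names are hypothetical and not part of the database tools.

```python
import re
from typing import NamedTuple


class SessionID(NamedTuple):
    second_char: str     # always "0" in this release
    pair_number: int     # serial number of the participant pair
    session_number: int  # session number within that pair


def parse_session_id(session_id):
    """Split a Session ID such as "C031" into its components."""
    match = re.fullmatch(r"C([0-9])([0-9])([0-9])", session_id)
    if match is None:
        raise ValueError(f"not a Session ID: {session_id!r}")
    second, pair, session = match.groups()
    return SessionID(second, int(pair), int(session))


print(parse_session_id("C031"))
```

For "C031" this yields pair number 3 and session number 1, matching the example in the text.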

Speech Data

Speech data for each session are stored in the directory named after its Session ID under the directory /Sessions, in both ESPS format and Microsoft WAVE format. For example, the speech data for the session C031 are /Sessions/C031/C031.sd (ESPS format) and /Sessions/C031/C031.wav (WAVE format). Both files contain two-channel speech covering the whole span of the session, and they are identical except that the latter lacks the Session Start Time information. Both formats are supplied because converting between them requires ESPS, which is no longer available.
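The WAVE version can be inspected with the Python standard library alone. The following sketch is an illustration, not part of the database tools: the function names are hypothetical, and the sample width and sampling rate are read from the file header because they are not stated here. It also assumes the usual WAVE layout in which the first channel of each frame is the left one.

```python
import wave


def describe_wave(path):
    """Return (channel count, sampling rate in Hz, duration in seconds)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return channels, rate, frames / rate


def split_channels(path):
    """Return (left, right) raw sample bytes of a two-channel WAVE file.

    The whole file is read into memory, which may be heavy for long sessions.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a two-channel file")
        width = wav.getsampwidth()
        data = wav.readframes(wav.getnframes())
    step = 2 * width  # one frame holds one sample per channel, interleaved
    left = b"".join(data[i:i + width] for i in range(0, len(data), step))
    right = b"".join(data[i + width:i + step] for i in range(0, len(data), step))
    return left, right


# Example call (path relative to the database root):
# describe_wave("Sessions/C031/C031.wav")
```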

XML Document

The XML document for each session is stored as a UUDB XML Document. For example, the XML document for the session C031 is /Sessions/C031/C031.xml.

Derivative Data

For users who are busy or not familiar with XML, various kinds of ready-to-use data generated from the speech data and the XML documents are provided as derivative data under the directory /var. The derivative data can be regenerated with the accompanying utilities described in the section Using XML Utilities.

For a quick look at the database, the transcription files are a good starting point. Two kinds of transcription files are provided; both are in the Shift JIS encoding. Although this encoding is very popular among Japanese users, other users might find the files difficult to handle. Currently they have the following options (both require an XSLT processor):

Generate transcription files in the UTF-8 encoding, by modifying /tools/xml2txt.xsl.
Generate romanized phonetic transcription files, by modifying /tools/xml2syl.xsl.

The next release of UUDB will supply utilities to support non-Japanese environments.
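Apart from the stylesheet-based options above, the provided Shift JIS files can also be re-encoded directly, without an XSLT processor. The sketch below is one possible way to do this in Python; the function name is hypothetical. It assumes the files decode with Python's "shift_jis" codec. If a file uses vendor extensions, the "cp932" codec may be needed instead.

```python
def reencode_to_utf8(src_path, dst_path, src_encoding="shift_jis"):
    """Copy a text file, converting it from Shift JIS to UTF-8."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Example call (paths relative to the database root; output name is arbitrary):
# reencode_to_utf8("var/C001/C001.txt", "C001.utf8.txt")
```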

Using XML Utilities

The directory /tools contains the files listed below.

		  /tools/SplitUtterance    Source files of the utterance splitting utility
		  /tools/SplitUtterance.sh Shell script for batch execution of the utterance splitting utility
		  /tools/xml2txt.xsl       Stylesheet for generating orthographic transcription files (EUC encoding)
		  /tools/xml2txt-sjis.xsl  Stylesheet for generating orthographic transcription files (Shift JIS encoding)
		  /tools/xml2syl.xsl       Stylesheet for generating phonetic transcription files (EUC encoding)
		  /tools/xml2syl-sjis.xsl  Stylesheet for generating phonetic transcription files (Shift JIS encoding)
		  /tools/transcription.sh  Shell script for batch execution of transcription file generation
		  /tools/xml2list.xsl      Stylesheet for generating per-utterance speech file name lists
		  /tools/list.sh           Shell script for batch execution of the per-utterance speech file name list generation
		  /tools/xml2para.xsl      Stylesheet for generating averaged rating files of paralinguistic information
		  /tools/para.sh           Shell script for batch execution of averaged rating file generation
		  /tools/uudb.rng          RELAX NG schema of the UUDB XML Document

To run the utterance splitting utility, the Java SE development environment is required.

2 Design and Building of the UU Database

The Four-Frame Cartoon Sorting Task

(This section is a stub.)

Dialogue Recording

(This section is a stub.)

Identifying the Unit of Utterance

(This section is a stub.)

Transcription

(This section is a stub.)

Paralinguistic Information Annotation

(This section is a stub.)

3 Structure of the UU Database

Elements of the UUDB XML Document

In the UU Database, each individual discourse is treated as a session. An XML document of the UU Database is a single file for a session, and its root element is Session.

Attributes to `Session`

SessionID: Session ID of the session that this XML document describes.
Date: The date of recording.
LSpeakerID: ID of the speaker whose speech is recorded in the first channel (normally the left channel) for this session.
RSpeakerID: ID of the speaker whose speech is recorded in the second channel (normally the right channel) for this session.
MaterialNo: The cartoon material number used for this session.
Lhas: Enumerated frame numbers that the speaker with the ID LSpeakerID has. For example, if a speaker has the (originally) fourth and second frames, the attribute value is "4,2".
Rhas: Enumerated frame numbers that the speaker with the ID RSpeakerID has.
TimePressure: True if this session was performed under a time constraint.
SessionStartTime: The time at which this session started (in seconds). Identical to the start time recorded in the header of this session's speech data in the ESPS format. All start and end times given in the XML documents (UtteranceStartTime, UtteranceEndTime, StartTime, EndTime) are measured on the same time axis as this value, so subtracting SessionStartTime gives the offset from the beginning of the session. For example, if SessionStartTime is 2.0s, UtteranceStartTime is 10.1s and UtteranceEndTime is 11.1s, the utterance started 8.1s after the beginning of this session and ended 9.1s after the beginning of the session.
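The arithmetic in the SessionStartTime example can be written out as follows. This is a minimal sketch with hypothetical function names. The sampling rate is a caller-supplied parameter because it is not stated here, and the conversion to a sample index assumes that the first sample of the session's speech data corresponds to SessionStartTime.

```python
def offset_in_session(time_s, session_start_s):
    """Convert a time from an XML document to seconds from the session start."""
    return time_s - session_start_s


def to_sample_index(time_s, session_start_s, sample_rate_hz):
    """Convert a time from an XML document to a sample index (assumption:
    sample 0 of the speech data lies at SessionStartTime)."""
    return round(offset_in_session(time_s, session_start_s) * sample_rate_hz)


# The example from the text: SessionStartTime = 2.0 s.
start = offset_in_session(10.1, 2.0)
end = offset_in_session(11.1, 2.0)
print(f"{start:.1f} {end:.1f}")  # 8.1 9.1
```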

`SessionComment` element

Comments for this session. For development use only; users should not rely on this information.

`Comment` element

A comment.

Attributes to `Comment`

CommentStrings: Comment strings.

`Utterance` element

An utterance. The largest unit of description in the UU Database.

Each session is composed of a sequence of utterances. Utterances are arranged in ascending order of start time.

Attributes to `Utterance`

UtteranceID

ID of this utterance. Serial number where the first utterance of each session is "001".

Channel

Indicates the speaker of this utterance. If "L", this utterance was spoken by the speaker with the ID LSpeakerID. If "R", this utterance was spoken by the speaker with the ID RSpeakerID.

UtteranceStartTime

The start time of this utterance. cf. SessionStartTime

UtteranceEndTime

The end time of this utterance. cf. SessionStartTime

SlashUnit

A keyword that designates whether the end of this utterance coincides with the end of a slash unit. The default value is "complete".

If "complete", the end of this utterance is the end of a slash unit.

If "incomplete", the end of this utterance is not the end of a slash unit, and the slash unit continues into the next utterance.

If "irrelevant", this utterance is not involved in identifying slash units. This applies when the whole utterance is composed of nonlinguistic sounds or of short fragments that can hardly be a slash unit.

`UtteranceComment` element

Comments for this utterance. For development use only; users should not rely on this information.

`Comment` element

A comment.

Attributes to `Comment`

CommentStrings: Comment strings.

`EmotionalState` element

A set of paralinguistic information annotations for this utterance.

`Rating` element

The perceived emotional state of the speaker, as evaluated by one annotator for this utterance on a 7-point scale along six abstract dimensions.

Attributes to `Rating`

AnnotatorID

The annotator ID.

Pleasantness

The rating for the "pleasant-unpleasant" dimension. Evaluation of the speaker's feeling.

Extremely unpleasant
Very unpleasant
Somewhat unpleasant
Neutral
Somewhat pleasant
Very pleasant
Extremely pleasant

Arousal

The rating for the "aroused-sleepy" dimension. Evaluation of the speaker's mental activity.

Extremely sleepy
Very sleepy
Somewhat sleepy
Neutral
Somewhat aroused
Very aroused
Extremely aroused

Dominance

The rating for the "dominant-submissive" dimension. Evaluation of the degree to which the speaker leads the communication with the other party.

Extremely submissive
Very submissive
Somewhat submissive
Neutral
Somewhat dominant
Very dominant
Extremely dominant

Credibility

The rating for the "credible-doubtful" dimension. Evaluation of the degree to which the speaker believes the other party.

Extremely doubtful
Very doubtful
Somewhat doubtful
Neutral
Somewhat credible
Very credible
Extremely credible

Interest

The rating for the "interested-indifferent" dimension. Evaluation of the degree to which the speaker is interested in the other party or her/his utterance.

Extremely indifferent
Very indifferent
Somewhat indifferent
Neutral
Somewhat interested
Very interested
Extremely interested

Positivity

The rating for the "positive-negative" dimension. Evaluation of the degree to which the speaker evaluates the other party's utterance positively.

Extremely negative
Very negative
Somewhat negative
Neutral
Somewhat positive
Very positive
Extremely positive
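Since each `Rating` element holds one annotator's judgement, a per-utterance summary can be obtained by averaging over annotators. The following sketch shows one way to do this with the Python standard library. It rests on several assumptions that are not confirmed here: `Rating` elements are nested under `EmotionalState` inside `Utterance`, the elements carry no XML namespace, and the rating values are written as integers. The XML fragment, annotator IDs and values are made up for illustration.

```python
import xml.etree.ElementTree as ET
from statistics import mean

DIMENSIONS = ("Pleasantness", "Arousal", "Dominance",
              "Credibility", "Interest", "Positivity")

# Made-up fragment; annotator IDs and rating values are not from the database.
SAMPLE = """
<Utterance UtteranceID="001" Channel="L">
  <EmotionalState>
    <Rating AnnotatorID="A1" Pleasantness="4" Arousal="5" Dominance="4"
            Credibility="4" Interest="6" Positivity="5"/>
    <Rating AnnotatorID="A2" Pleasantness="5" Arousal="5" Dominance="3"
            Credibility="4" Interest="5" Positivity="5"/>
    <Rating AnnotatorID="A3" Pleasantness="3" Arousal="5" Dominance="5"
            Credibility="4" Interest="7" Positivity="5"/>
  </EmotionalState>
</Utterance>
"""


def average_ratings(utterance):
    """Return the mean rating per dimension over all annotators."""
    ratings = utterance.findall("./EmotionalState/Rating")
    if not ratings:
        raise ValueError("utterance has no Rating elements")
    return {dim: mean(int(r.get(dim)) for r in ratings) for dim in DIMENSIONS}


averages = average_ratings(ET.fromstring(SAMPLE))
print(averages)
```

Whether the provided averaged rating files are computed in exactly this way is not stated here; the sketch only illustrates the structure of the annotations.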

Child elements of `Utterance`

Each utterance is a sequence of constituents, each of which is a "nonlinguistic sound", a "short pause" or a "chunk". These constituents are contiguous and do not overlap. Therefore, the speech duration of an utterance can be calculated by subtracting the total duration of the nonlinguistic sounds and short pauses within the utterance from the duration of the utterance (= UtteranceEndTime − UtteranceStartTime).
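The subtraction described above can be sketched in Python as follows. The XML fragment is made up for illustration, and the code assumes that times are written as decimal seconds and that the elements carry no XML namespace; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up fragment; the times are not from the database.
SAMPLE = """
<Utterance UtteranceID="001" Channel="L"
           UtteranceStartTime="10.10" UtteranceEndTime="11.10">
  <NonLinguisticSound ChunkID="1" StartTime="10.10" EndTime="10.30"/>
  <Chunk ChunkID="2"/>
  <ShortPause ChunkID="3" StartTime="10.70" EndTime="10.80"/>
  <Chunk ChunkID="4"/>
</Utterance>
"""

NON_SPEECH = ("NonLinguisticSound", "ShortPause")


def speech_duration(utterance):
    """Utterance duration minus nonlinguistic sounds and short pauses."""
    total = (float(utterance.get("UtteranceEndTime"))
             - float(utterance.get("UtteranceStartTime")))
    non_speech = sum(
        float(child.get("EndTime")) - float(child.get("StartTime"))
        for child in utterance
        if child.tag in NON_SPEECH
    )
    return total - non_speech


print(f"{speech_duration(ET.fromstring(SAMPLE)):.2f}")  # 0.70
```

Here the utterance lasts 1.0s, of which 0.2s is a nonlinguistic sound and 0.1s a short pause, leaving 0.7s of speech.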

`NonLinguisticSound` element

A nonlinguistic sound produced by the speaker. It does not occur simultaneously with speech sound.

Unlike speech sounds (chunks), nonlinguistic sounds are not identified exhaustively.

Attributes to `NonLinguisticSound`

ChunkID

Constituent ID of this nonlinguistic sound. Serial number where the first constituent of each utterance is "1".

StartTime

The start time of this nonlinguistic sound. cf. SessionStartTime

EndTime

The end time of this nonlinguistic sound. cf. SessionStartTime

TagBreath

True if this nonlinguistic sound is a breathing sound.

TagLaugh

True if this nonlinguistic sound is a laughing sound.

Applies only to pure laughter; speech portions accompanied by laughter are not covered by this tag.

TagSigh

True if this nonlinguistic sound is a sigh.

TagCough

True if this nonlinguistic sound is a cough or throat clearing.

`ShortPause` element

A short pause within an utterance.

Attributes to `ShortPause`

ChunkID: Constituent ID of this short pause. Serial number where the first constituent of each utterance is "1".
StartTime: The start time of this short pause. cf. SessionStartTime
EndTime: The end time of this short pause. cf. SessionStartTime

`Chunk` element

A stretch of speech sound: a continuous span of speech that is not divided by nonlinguistic sounds, short pauses, or element boundaries.

Attributes to `Chunk`

ChunkID: Constituent ID of this chunk. Serial number where the first constituent of each utterance is "1".
OrthographicTranscription: Orthographic transcription for this chunk.
PhoneticTranscription: Phonetic transcription for this chunk in katakana, which can be converted directly to a phoneme sequence.
Disfluency: True if this chunk is a slip of the tongue (mispronunciation) or a repetition. Some disfluent chunks are followed by a self repair; others are not.
Filler: True if this chunk is a filler.
Backchannel: True if this chunk is a backchannel (aizuchi).
Conjunction: True if this chunk is a conjunction.
DiscourseMarker: True if this chunk is a discourse marker.
TagS: (experimental) True if this chunk is marked as S (something like shout or surprise).
EndOfSlashUnit: True if the slash unit ends prematurely at this chunk.

`Mora` element

A mora. A chunk is composed of a sequence of morae.

Attributes to `Mora`

MoraID: ID of this mora. Serial number where the first mora of each chunk is "1".
MoraEntity: The symbol for this mora, written in katakana in the same way as PhoneticTranscription.
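Transcriptions and mora sequences can be read off the `Chunk` and `Mora` elements. The sketch below is an illustration under assumptions: `Mora` elements are taken to be children of `Chunk`, which in turn is a child of `Utterance`, and the elements carry no XML namespace. The fragment and the function names are made up.

```python
import xml.etree.ElementTree as ET

# Made-up fragment for illustration.
SAMPLE = """
<Utterance UtteranceID="001" Channel="L">
  <Chunk ChunkID="1" OrthographicTranscription="はい" PhoneticTranscription="ハイ">
    <Mora MoraID="1" MoraEntity="ハ"/>
    <Mora MoraID="2" MoraEntity="イ"/>
  </Chunk>
  <ShortPause ChunkID="2" StartTime="1.20" EndTime="1.35"/>
  <Chunk ChunkID="3" OrthographicTranscription="そこ" PhoneticTranscription="ソコ">
    <Mora MoraID="1" MoraEntity="ソ"/>
    <Mora MoraID="2" MoraEntity="コ"/>
  </Chunk>
</Utterance>
"""


def orthographic_text(utterance):
    """Concatenate the orthographic transcriptions of all chunks, in order."""
    return "".join(c.get("OrthographicTranscription", "")
                   for c in utterance.findall("Chunk"))


def morae(utterance):
    """Return the mora symbols of all chunks, in document order."""
    return [m.get("MoraEntity") for m in utterance.findall("./Chunk/Mora")]


utt = ET.fromstring(SAMPLE)
print(orthographic_text(utt), morae(utt))
```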

`ExternalNoise` element

A set of external noises that occurred during this utterance.

`Noise` element

An external noise. (not comprehensive, not objective)

Attributes to `ExternalNoise`

NoiseID: ID of this external noise. Serial number where the first external noise that appears in each utterance is "1".
StartTime: The start time of this external noise. cf. SessionStartTime
EndTime: The end time of this external noise. cf. SessionStartTime

Structure of the speaker information file

The speaker information file is an XML document, whose root element is Speakers.

`Speaker` element

A speaker.

`SpeakerInfo` element

Personal information for this speaker.

Attributes to `SpeakerInfo`

SpeakerID: Speaker ID of this speaker.
SpeakerAge: This speaker's age as of the recording date.
SpeakerGender: Gender of this speaker. "F" means female. "M" means male.

`ResidentialHistory` element

The residential history of this speaker, in reverse chronological order.

`Sessions` element

A set of sessions in which this speaker participated.

`SessionInfo` element

Information about a session.

Attributes to `SessionInfo`

SessionID: Session ID of this session.
Channel: Indicates the channel in which this speaker's speech is recorded for this session.
PartnerID: The speaker ID of the speaker with whom this speaker is talking.
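The speaker information file can be turned into a simple lookup table. The sketch below is illustrative: it assumes that `SpeakerInfo` and `Sessions` are children of `Speaker`, that `SessionInfo` is a child of `Sessions`, and that no XML namespace is used. The fragment is made up; only the speaker IDs FKC and FUE and the Session ID C031 are taken from this manual, while the ages, genders and channel assignments are invented.

```python
import xml.etree.ElementTree as ET

# Made-up fragment; ages, genders and channels are invented for illustration.
SAMPLE = """
<Speakers>
  <Speaker>
    <SpeakerInfo SpeakerID="FKC" SpeakerAge="20" SpeakerGender="F"/>
    <Sessions>
      <SessionInfo SessionID="C031" Channel="L" PartnerID="FUE"/>
    </Sessions>
  </Speaker>
  <Speaker>
    <SpeakerInfo SpeakerID="FUE" SpeakerAge="21" SpeakerGender="F"/>
    <Sessions>
      <SessionInfo SessionID="C031" Channel="R" PartnerID="FKC"/>
    </Sessions>
  </Speaker>
</Speakers>
"""


def sessions_by_speaker(root):
    """Map each SpeakerID to a list of (SessionID, Channel, PartnerID)."""
    table = {}
    for speaker in root.findall("Speaker"):
        info = speaker.find("SpeakerInfo")
        if info is None:
            continue  # skip entries without personal information
        table[info.get("SpeakerID")] = [
            (s.get("SessionID"), s.get("Channel"), s.get("PartnerID"))
            for s in speaker.findall("./Sessions/SessionInfo")
        ]
    return table


print(sessions_by_speaker(ET.fromstring(SAMPLE)))
```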
