/00README.txt            README
/Sessions                Speech data and XML documents for each session
/Sessions/C001           Data for Session C001
/Sessions/C001/C001.sd   Speech data for Session C001 (2 channels, ESPS format)
/Sessions/C001/C001.wav  Speech data for Session C001 (2 channels, Microsoft WAVE format)
/Sessions/C001/C001.xml  XML document for Session C001
/Sessions/C002           Data for Session C002
/Sessions/C003           Data for Session C003
/Sessions/C004           Data for Session C004
/Sessions/C005           Data for Session C005
/Sessions/C006           Data for Session C006
/Sessions/C007           Data for Session C007
/Sessions/C011           Data for Session C011
/Sessions/C012           Data for Session C012
/Sessions/C013           Data for Session C013
/Sessions/C021           Data for Session C021
/Sessions/C022           Data for Session C022
/Sessions/C023           Data for Session C023
/Sessions/C024           Data for Session C024
/Sessions/C031           Data for Session C031
/Sessions/C032           Data for Session C032
/Sessions/C033           Data for Session C033
/Sessions/C041           Data for Session C041
/Sessions/C042           Data for Session C042
/Sessions/C043           Data for Session C043
/Sessions/C051           Data for Session C051
/Sessions/C052           Data for Session C052
/Sessions/C053           Data for Session C053
/Sessions/C061           Data for Session C061
/Sessions/C062           Data for Session C062
/Sessions/C063           Data for Session C063
/Sessions/C064           Data for Session C064
/Sessions/speakers.xml   Speaker information file
/doc                     Documents
/doc/manual.pdf          UUDB User's Manual (Japanese)
/doc/resp.mat            Filter coefficients for downsampling
/tools                   XML-related utilities
/var                     Derivative data (Recoverable from speech data and XML documents)
/var/C001                Derivative data for Session C001
/var/C001/C001.list      Per-utterance speech file name list for Session C001 (in order)
/var/C001/C001.txt       Orthographic transcription file
/var/C001/C001.syl       Syllabic transcription file
/var/C001/C001.para      Averaged rating file of paralinguistic information
/var/C001/C001L_001.wav  Speech file of 1st utterance by speaker on channel L
/var/C001/C001L_002.wav  Speech file of 2nd utterance by speaker on channel L
...
/var/C001/C001R_001.wav  Speech file of 1st utterance by speaker on channel R
/var/C001/C001R_002.wav  Speech file of 2nd utterance by speaker on channel R
...
All the information that the UU Database provides is contained in the speech data and the XML documents under the directory /Sessions.
In the UU Database, a series of utterances in a discourse is treated as a unit called a session. Each session corresponds to a series of utterances based on a given 4-frame cartoon.
Each session is identified by a 4-character ID starting with "C".
The character following "C" is always "0" in this release.
The next character is a serial number for the participant pair. For example, "3" is given to the pair of speakers FKC and FUE.
The last character is the session number for that pair.
Thus, for example, the Session ID C031 indicates the first session by the pair of speakers FKC and FUE.
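Because the ID layout is fixed, a Session ID can be decoded mechanically. The following Python sketch splits an ID into its pair number and session number; the function name parse_session_id and its tuple return value are hypothetical and not part of the UUDB utilities.

```python
import re

# Hypothetical helper. Layout per this README: "C", then "0" (always, in this
# release), then the participant-pair serial number, then the session number.
_SESSION_ID = re.compile(r"^C0(\d)(\d)$")


def parse_session_id(session_id: str) -> tuple[int, int]:
    """Return (pair_number, session_number) for a Session ID such as "C031"."""
    match = _SESSION_ID.match(session_id)
    if match is None:
        raise ValueError(f"not a UUDB Session ID: {session_id!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_session_id("C031"))  # (3, 1): first session of pair 3 (FKC and FUE)
```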
Speech data for each session are stored in the directory named after its Session ID under the directory /Sessions, in both ESPS format and Microsoft WAVE format. For example, the speech data for session C031 are /Sessions/C031/C031.sd (ESPS format) and /Sessions/C031/C031.wav (WAVE format). Both files contain two-channel speech covering the whole span of the session and are identical, except that the WAVE file lacks the Session Start Time information. Both formats are provided because converting between them requires ESPS, which is no longer available.
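The WAVE version can be read with standard tools. As a sketch, Python's standard wave module can load a session file and split it into its two channels (the first channel holds the speaker LSpeakerID, the second RSpeakerID). The function name read_stereo_wav is hypothetical, and the code assumes 16-bit linear PCM, which this README does not state; it rejects any other sample width.

```python
import array
import sys
import wave


def read_stereo_wav(path: str) -> tuple[int, array.array, array.array]:
    """Return (sample_rate, channel1_samples, channel2_samples).

    Assumption: 2-channel, 16-bit linear PCM (e.g. /Sessions/C031/C031.wav).
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 2-channel, 16-bit PCM data")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAVE sample data is little-endian
    # Frames are interleaved: channel 1, channel 2, channel 1, channel 2, ...
    return rate, samples[0::2], samples[1::2]
```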
The XML document for each session is stored as a UUDB XML Document. For example, the XML document for session C031 is /Sessions/C031/C031.xml.
For users who are busy or unfamiliar with XML, various kinds of ready-to-use data generated from the speech data and the XML documents are provided as derivative data under the directory /var. The derivative data can be regenerated with the accompanying utilities described in the Using XML Utilities section.
For a quick look at the database, the transcription files are a good starting point. Two kinds of transcription files are provided, both in Shift JIS encoding. Although this encoding is widely used in Japanese environments, other users may find the files difficult to handle. Currently, such users have the following options (an XSLT processor is required):
/tools/xml2txt.xsl (generates orthographic transcription files in EUC encoding)
/tools/xml2syl.xsl (generates phonetic transcription files in EUC encoding)
The next release of UUDB will supply utilities that support non-Japanese environments.
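Outside the XSLT route, a general-purpose language can also re-encode the existing files. The following Python sketch is such an alternative; it is not one of the UUDB utilities. The function name sjis_to_utf8 is hypothetical, and whether the standard "shift_jis" codec or its Windows variant "cp932" suits the files best is an assumption to verify on the actual data.

```python
from pathlib import Path


def sjis_to_utf8(src: str, dst: str, encoding: str = "shift_jis") -> None:
    """Re-encode a Shift JIS transcription file (e.g. /var/C001/C001.txt) as UTF-8.

    Bytes are decoded and re-encoded directly, so line endings are preserved.
    Pass encoding="cp932" if decoding fails on Windows-specific characters.
    """
    text = Path(src).read_bytes().decode(encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```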
The directory /tools contains the files listed below.
/tools/SplitUtterance     Source files of the utterance splitting utility
/tools/SplitUtterance.sh  Shell script for batch execution of the utterance splitting utility
/tools/xml2txt.xsl        Stylesheet for generating orthographic transcription files (EUC encoding)
/tools/xml2txt-sjis.xsl   Stylesheet for generating orthographic transcription files (Shift JIS encoding)
/tools/xml2syl.xsl        Stylesheet for generating phonetic transcription files (EUC encoding)
/tools/xml2syl-sjis.xsl   Stylesheet for generating phonetic transcription files (Shift JIS encoding)
/tools/transcription.sh   Shell script for batch execution of transcription file generation
/tools/xml2list.xsl       Stylesheet for generating per-utterance speech file name lists
/tools/list.sh            Shell script for batch execution of per-utterance speech file name list generation
/tools/xml2para.xsl       Stylesheet for generating averaged rating files of paralinguistic information
/tools/para.sh            Shell script for batch execution of averaged rating file generation
/tools/uudb.rng           RELAX NG schema of the UUDB XML Document
To run the utterance splitting utility, the Java SE development environment is required.
(This section is a stub.)
In the UU Database, each individual discourse is treated as a session. Each XML document of the UU Database is a single file describing one session, and its root element is Session.
Session
SessionID
Session ID of the session that this XML document describes.
Date
The date of recording.
LSpeakerID
ID of the speaker whose speech is recorded in the first channel (normally the left channel) for this session.
RSpeakerID
ID of the speaker whose speech is recorded in the second channel (normally the right channel) for this session.
MaterialNo
The cartoon material number used for this session.
Lhas
Comma-separated list of the frame numbers held by the speaker whose ID is LSpeakerID. For example, if that speaker holds the (originally) fourth and second frames, the attribute value is "4,2".
Rhas
Comma-separated list of the frame numbers held by the speaker whose ID is RSpeakerID.
TimePressure
True if this session was performed under a time constraint.
SessionStartTime
The time at which this session started (in seconds). Identical to the start time recorded in the header of this session's speech data in ESPS format. All start and end times in the XML documents (UtteranceStartTime, UtteranceEndTime, StartTime, EndTime) are measured on the same time axis as this value, so subtracting SessionStartTime from them gives the time elapsed since the beginning of the session. For example, if SessionStartTime is 2.0 s, UtteranceStartTime is 10.1 s and UtteranceEndTime is 11.1 s, the utterance started 8.1 s after the beginning of the session and ended 9.1 s after it.
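The Session attributes above can be read with any XML library. The Python sketch below uses the standard xml.etree.ElementTree module to collect them, split Lhas/Rhas values into frame lists, and convert XML times into offsets from the beginning of the speech data, as in the SessionStartTime example. All function names are hypothetical; the code assumes the documents use no XML namespace and that times are plain decimal seconds, and the sampling rate passed to to_sample_index must be taken from the speech file itself.

```python
import xml.etree.ElementTree as ET


def parse_frames(value: str) -> list[int]:
    """Turn an Lhas/Rhas value such as "4,2" into [4, 2], keeping the order."""
    return [int(part) for part in value.split(",") if part.strip()]


def read_session_header(path: str) -> dict:
    """Read the root Session attributes of a file such as /Sessions/C031/C031.xml."""
    root = ET.parse(path).getroot()
    if root.tag != "Session":  # assumes no XML namespace
        raise ValueError(f"unexpected root element: {root.tag}")
    attrs = root.attrib
    return {
        "session_id": attrs.get("SessionID"),
        "left_speaker": attrs.get("LSpeakerID"),
        "right_speaker": attrs.get("RSpeakerID"),
        "left_frames": parse_frames(attrs.get("Lhas", "")),
        "right_frames": parse_frames(attrs.get("Rhas", "")),
        "session_start": float(attrs["SessionStartTime"]),
    }


def to_session_offset(t: float, session_start: float) -> float:
    """Seconds elapsed since SessionStartTime for an XML time value t."""
    return t - session_start


def to_sample_index(t: float, session_start: float, sample_rate: int) -> int:
    """Nearest sample index in the session's speech data for an XML time t."""
    return round(to_session_offset(t, session_start) * sample_rate)


if __name__ == "__main__":
    # The example from the text: SessionStartTime 2.0 s, utterance 10.1-11.1 s.
    print(round(to_session_offset(10.1, 2.0), 6))  # 8.1
    print(round(to_session_offset(11.1, 2.0), 6))  # 9.1
```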
SessionComment
(element) Comments on this session, intended for development use only. Users should not rely on this information.
Comment
(element) A comment.
Comment
CommentStrings
Comment strings.
Utterance
(element) An utterance, the largest unit of description in the UU Database.
Each session is composed of a sequence of utterances, arranged in order of their start times.
Utterance
UtteranceID
ID of this utterance: a serial number starting from "001" for the first utterance of each session.
Channel
Indicates the speaker of this utterance: "L" means the speaker whose ID is LSpeakerID, and "R" means the speaker whose ID is RSpeakerID.
UtteranceStartTime
The start time of this utterance. cf. SessionStartTime
UtteranceEndTime
The end time of this utterance. cf. SessionStartTime
SlashUnit
A keyword indicating whether the end of this utterance coincides with the end of a slash unit. The default value is "complete".
If "complete", the end of this utterance is the end of a slash unit.
If "incomplete", the end of this utterance is not the end of a slash unit, and the slash unit continues into the next utterance.
If "irrelevant", this utterance is not involved in identifying slash units. This applies when the whole utterance consists of nonlinguistic sounds or short fragments that can hardly form a slash unit.
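The SlashUnit values above can be used to group utterances into slash units. The following Python sketch does so under an assumption this README does not state explicitly: an "incomplete" utterance continues into the next utterance on the same channel. The function name and the tuple input format are hypothetical, and "irrelevant" utterances are simply skipped.

```python
from collections import defaultdict


def group_slash_units(utterances):
    """Group utterances into slash units.

    utterances: (UtteranceID, Channel, SlashUnit) tuples in document order;
    SlashUnit may be None when the attribute is absent.
    Returns a list of slash units (lists of UtteranceIDs), in order of completion.
    Assumption: an "incomplete" utterance continues on the same channel.
    """
    open_units = defaultdict(list)  # channel -> IDs of the unfinished unit
    units = []
    for utt_id, channel, slash in utterances:
        slash = slash or "complete"  # the default value is "complete"
        if slash == "irrelevant":
            continue
        open_units[channel].append(utt_id)
        if slash == "complete":
            units.append(open_units.pop(channel))
    # Units still open at the end of the session are kept as they are.
    units.extend(unit for unit in open_units.values() if unit)
    return units
```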
UtteranceComment
(element) Comments on this utterance, intended for development use only. Users should not rely on this information.
Comment
(element) A comment.
Comment
CommentStrings
Comment strings.
EmotionalState
(element) A set of paralinguistic information annotations for this utterance.
Rating
(element) The emotional state of the speaker as perceived by one annotator, rated for this utterance on a 7-point scale along six abstract dimensions.
Rating
AnnotatorID
The annotator ID.
Pleasantness
The rating for the "pleasant-unpleasant" dimension. Evaluation of the speaker's feeling.
Arousal
The rating for the "aroused-sleepy" dimension. Evaluation of the speaker's mental activity.
Dominance
The rating for the "dominant-submissive" dimension. Evaluation of the degree to which the speaker leads the communication with the other party.
Credibility
The rating for the "credible-doubtful" dimension. Evaluation of the degree to which the speaker believes the other party.
Interest
The rating for the "interested-indifferent" dimension. Evaluation of the degree to which the speaker is interested in the other party or her/his utterance.
Positivity
The rating for the "positive-negative" dimension. Evaluation of the degree to which the speaker evaluates the other party's utterance positively.
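Since each Rating element holds one annotator's judgments, per-utterance averages can be computed across annotators. The Python sketch below does this with xml.etree.ElementTree; it is not claimed to reproduce /tools/xml2para.xsl or the /var/*.para files. It assumes the six ratings are numeric attribute values of Rating and ignores missing values; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from statistics import mean

DIMENSIONS = ("Pleasantness", "Arousal", "Dominance",
              "Credibility", "Interest", "Positivity")


def average_ratings(utterance: ET.Element) -> dict:
    """Mean rating per dimension over all Rating elements of one Utterance."""
    values = {dim: [] for dim in DIMENSIONS}
    for rating in utterance.iter("Rating"):  # Rating elements inside EmotionalState
        for dim in DIMENSIONS:
            raw = rating.get(dim)
            if raw is not None and raw.strip():
                values[dim].append(float(raw))
    # Dimensions with no ratings at all are left out of the result.
    return {dim: mean(vals) for dim, vals in values.items() if vals}
```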
Utterance
Each utterance is a sequence of constituents, each of which is a nonlinguistic sound, a short pause, or a chunk. These constituents are contiguous and do not overlap. Therefore, the speech duration of an utterance can be calculated by subtracting the total duration of the nonlinguistic sounds and short pauses within the utterance from the duration of the utterance (UtteranceEndTime − UtteranceStartTime).
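The speech-duration rule above translates directly into code. This Python sketch applies it to an Utterance element parsed with xml.etree.ElementTree; the function name is hypothetical, and it assumes times are plain decimal seconds stored in the StartTime/EndTime attributes described below.

```python
import xml.etree.ElementTree as ET


def speech_duration(utterance: ET.Element) -> float:
    """Utterance duration minus nonlinguistic sounds and short pauses (seconds)."""
    total = float(utterance.get("UtteranceEndTime")) - float(utterance.get("UtteranceStartTime"))
    for tag in ("NonLinguisticSound", "ShortPause"):
        for elem in utterance.iter(tag):
            total -= float(elem.get("EndTime")) - float(elem.get("StartTime"))
    return total
```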
NonLinguisticSound
(element) A nonlinguistic sound produced by the speaker. It never occurs simultaneously with speech sound.
Unlike speech sounds (chunks), nonlinguistic sounds are not exhaustively annotated.
NonLinguisticSound
ChunkID
Constituent ID of this nonlinguistic sound: a serial number starting from "1" for the first constituent of each utterance.
StartTime
The start time of this nonlinguistic sound. cf. SessionStartTime
EndTime
The end time of this nonlinguistic sound. cf. SessionStartTime
TagBreath
True if this nonlinguistic sound is a breathing sound.
TagLaugh
True if this nonlinguistic sound is a laughing sound.
Applies only to pure laughter; speech produced while laughing is not tagged.
TagSigh
True if this nonlinguistic sound is a sigh.
TagCough
True if this nonlinguistic sound is a cough or throat clearing.
ShortPause
(element) A short pause within an utterance.
ShortPause
ChunkID
Constituent ID of this short pause: a serial number starting from "1" for the first constituent of each utterance.
StartTime
The start time of this short pause. cf. SessionStartTime
EndTime
The end time of this short pause. cf. SessionStartTime
Chunk
(element) A stretch of speech sound: a continuous span of speech not interrupted by nonlinguistic sounds, short pauses, or element boundaries.
Chunk
ChunkID
Constituent ID of this chunk: a serial number starting from "1" for the first constituent of each utterance.
OrthographicTranscription
Orthographic transcription for this chunk.
PhoneticTranscription
Phonetic transcription of this chunk in katakana, which can be converted directly to a phoneme sequence.
Disfluency
True if this chunk is a slip of the tongue (mispronunciation) or a repetition. Some disfluent chunks are followed by a self-repair; others are not.
Filler
True if this chunk is a filler.
Backchannel
True if this chunk is a backchannel (aizuchi).
Conjunction
True if this chunk is a conjunction.
DiscourseMarker
True if this chunk is a discourse marker.
TagS
(experimental) True if this chunk is marked as S (something like shout or surprise).
EndOfSlashUnit
True if the slash unit ends prematurely at this chunk.
Mora
(element) A mora. Each chunk is composed of a sequence of morae.
Mora
MoraID
ID of this mora: a serial number starting from "1" for the first mora of each chunk.
MoraEntity
The symbol for this mora, written in katakana in the same way as PhoneticTranscription.
ExternalNoise
(element) A set of external noises that occurred during this utterance.
Noise
(element) An external noise. Annotation of external noise is neither comprehensive nor objective.
Noise
NoiseID
ID of this external noise: a serial number starting from "1" for the first external noise in each utterance.
StartTime
The start time of this external noise. cf. SessionStartTime
EndTime
The end time of this external noise. cf. SessionStartTime
The speaker information file is an XML document whose root element is Speakers.
Speaker
(element) A speaker.
SpeakerInfo
(element) Personal information about this speaker.
SpeakerInfo
SpeakerID
Speaker ID of this speaker.
SpeakerAge
This speaker's age as of the recording date.
SpeakerGender
Gender of this speaker. "F" means female. "M" means male.
ResidentialHistory
(element) This speaker's residential history, in reverse chronological order.
Sessions
(element) A set of sessions in which this speaker participated.
SessionInfo
(element) Information about a session.
SessionInfo
SessionID
Session ID of this session.
Channel
Indicates the channel in which this speaker's speech is recorded for this session.
PartnerID
The speaker ID of the speaker with whom this speaker is talking.
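As an illustration of how the speaker information file fits together, the Python sketch below lists, for each speaker in /Sessions/speakers.xml, the sessions they took part in with their channel and partner. The nesting it assumes (Speaker containing SpeakerInfo and Sessions, Sessions containing SessionInfo, with values stored as attributes) follows the description above but should be checked against the actual file; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def speaker_sessions(path: str) -> dict:
    """Map SpeakerID -> list of (SessionID, Channel, PartnerID) tuples."""
    root = ET.parse(path).getroot()  # root element: Speakers
    result = {}
    for speaker in root.iter("Speaker"):
        info = speaker.find("SpeakerInfo")
        if info is None:
            continue  # skip entries without personal information
        result[info.get("SpeakerID")] = [
            (s.get("SessionID"), s.get("Channel"), s.get("PartnerID"))
            for s in speaker.iter("SessionInfo")
        ]
    return result
```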