/00README.txt              README
/Sessions                  Speech data and XML documents for each session
/Sessions/C001             Data for Session C001
/Sessions/C001/C001.sd     Speech data for Session C001 (2 channels, ESPS format)
/Sessions/C001/C001.wav    Speech data for Session C001 (2 channels, Microsoft WAVE format)
/Sessions/C001/C001.xml    XML document for Session C001
/Sessions/C002             Data for Session C002
/Sessions/C003             Data for Session C003
/Sessions/C004             Data for Session C004
/Sessions/C005             Data for Session C005
/Sessions/C006             Data for Session C006
/Sessions/C007             Data for Session C007
/Sessions/C011             Data for Session C011
/Sessions/C012             Data for Session C012
/Sessions/C013             Data for Session C013
/Sessions/C021             Data for Session C021
/Sessions/C022             Data for Session C022
/Sessions/C023             Data for Session C023
/Sessions/C024             Data for Session C024
/Sessions/C031             Data for Session C031
/Sessions/C032             Data for Session C032
/Sessions/C033             Data for Session C033
/Sessions/C041             Data for Session C041
/Sessions/C042             Data for Session C042
/Sessions/C043             Data for Session C043
/Sessions/C051             Data for Session C051
/Sessions/C052             Data for Session C052
/Sessions/C053             Data for Session C053
/Sessions/C061             Data for Session C061
/Sessions/C062             Data for Session C062
/Sessions/C063             Data for Session C063
/Sessions/C064             Data for Session C064
/Sessions/speakers.xml     Speaker information file
/doc                       Documents
/doc/manual.pdf            UUDB User's Manual (Japanese)
/doc/resp.mat              Filter coefficients for downsampling
/tools                     XML-related utilities
/var                       Derivative data (Recoverable from speech data and XML documents)
/var/C001                  Derivative data for Session C001
/var/C001/C001.list        Per-utterance speech file name list for Session C001 (in order)
/var/C001/C001.txt         Orthographic transcription file
/var/C001/C001.syl         Syllabic transcription file
/var/C001/C001.para        Averaged rating file of paralinguistic information
/var/C001/C001L_001.wav    Speech file of 1st utterance by speaker on channel L
/var/C001/C001L_002.wav    Speech file of 2nd utterance by speaker on channel L
...
/var/C001/C001R_001.wav    Speech file of 1st utterance by speaker on channel R
/var/C001/C001R_002.wav    Speech file of 2nd utterance by speaker on channel R
...
All the information that the UU Database provides is contained in the speech data and the XML documents under the directory /Sessions.
In the UU Database, the series of utterances in a discourse is treated as a unit called a session. Each session corresponds to the series of utterances for a given 4-frame cartoon.
Each session is identified by a 4-character ID that starts with "C".
The character following "C" is always "0" in this release.
The next character is a serial number for the participant pair. For example, "3" is given to the pair of speakers FKC and FUE.
The following character is the session number for the pair.
Thus, for example, the Session ID C031 indicates the first session by the pair of speakers FKC and FUE.
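The ID layout described above can be sketched in Python as follows. This is only an illustration: the names `SessionID` and `parse_session_id` are hypothetical and not part of the UUDB tools, and the sketch assumes that the pair number and the session number are each a single decimal digit, as in the session IDs listed for this release.

```python
import re
from typing import NamedTuple


class SessionID(NamedTuple):
    release_digit: str  # character after "C"; always "0" in this release
    pair: int           # serial number of the participant pair
    session: int        # session number for that pair


# "C" followed by exactly three digits (assumption: one digit per field).
_SESSION_ID = re.compile(r"C(\d)(\d)(\d)")


def parse_session_id(session_id: str) -> SessionID:
    """Split a 4-character Session ID such as "C031" into its fields."""
    match = _SESSION_ID.fullmatch(session_id)
    if match is None:
        raise ValueError(f"not a valid Session ID: {session_id!r}")
    return SessionID(match.group(1), int(match.group(2)), int(match.group(3)))


parsed = parse_session_id("C031")
print(parsed.pair, parsed.session)
```

For "C031" the pair field is 3 (speakers FKC and FUE) and the session field is 1.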
Speech data for each session are stored in the directory named after its Session ID under /Sessions, in both ESPS format and Microsoft WAVE format. For example, the speech data for session C031 are /Sessions/C031/C031.sd (ESPS format) and /Sessions/C031/C031.wav (WAVE format). Both files contain two-channel speech covering the whole span of the session and are identical, except that the latter lacks the Session Start Time information. This redundancy exists because cross conversion requires ESPS, which is no longer available.
The XML document for each session is stored as a UUDB XML Document. For example, the XML document for the session C031 is /Sessions/C031/C031.xml.
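As a small sketch of this naming scheme, the three per-session files can be derived from a Session ID. The function name `session_files` and the `root` parameter are hypothetical; `root` stands for wherever the database is mounted on the local system.

```python
from pathlib import PurePosixPath


def session_files(session_id: str, root: str = "/") -> dict:
    """Return the paths of the ESPS, WAVE and XML files of one session."""
    base = PurePosixPath(root) / "Sessions" / session_id
    return {
        "sd": base / f"{session_id}.sd",    # speech data, ESPS format
        "wav": base / f"{session_id}.wav",  # speech data, Microsoft WAVE format
        "xml": base / f"{session_id}.xml",  # UUDB XML Document
    }


for kind, path in session_files("C031").items():
    print(kind, path)
```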
For users who are busy or not so familiar with XML, various types of ready-to-use data generated from the speech data and the XML documents are provided as derivative data, which can be found under the directory /var. The derivative data can be regenerated with the accompanying utilities described in the Using XML Utilities section.
For a quick glance at the database, the transcription files are a good starting point. Two kinds of transcription files are provided; both are in the Shift JIS encoding. Although this encoding is very popular among Japanese users, other users might find the files difficult to handle. Currently they have several options (each requires an XSLT processor):
/tools/xml2txt.xsl
/tools/xml2syl.xsl
The next release of UUDB will supply utilities to support non-Japanese environments.
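Separately from the XSLT-based options above, a Shift JIS transcription file can also be re-encoded directly. The following is a minimal sketch, not one of the UUDB utilities: the function names are hypothetical, and it assumes the files decode cleanly with Python's `shift_jis` codec (if a file uses Microsoft extensions, the `cp932` codec may be needed instead).

```python
from pathlib import Path


def sjis_bytes_to_utf8(data: bytes, source_encoding: str = "shift_jis") -> bytes:
    """Re-encode Shift JIS bytes as UTF-8, leaving line endings untouched."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read a Shift JIS file and write a UTF-8 copy (paths are up to the caller)."""
    Path(dst).write_bytes(sjis_bytes_to_utf8(Path(src).read_bytes()))


# In-memory demonstration, so the sketch runs without the database at hand.
sample = "これはテストです\r\n".encode("shift_jis")
print(sjis_bytes_to_utf8(sample).decode("utf-8"), end="")
```

A call such as `convert_file("C001.txt", "C001.utf8.txt")` would then produce a UTF-8 copy; the output file name here is only an example.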
The directory /tools contains the files listed below.
/tools/SplitUtterance      Source files of the utterance splitting utility
/tools/SplitUtterance.sh   Shell script for batch execution of the utterance splitting utility
/tools/xml2txt.xsl         Stylesheet for generating orthographic transcription files (EUC encoding)
/tools/xml2txt-sjis.xsl    Stylesheet for generating orthographic transcription files (Shift JIS encoding)
/tools/xml2syl.xsl         Stylesheet for generating phonetic transcription files (EUC encoding)
/tools/xml2syl-sjis.xsl    Stylesheet for generating phonetic transcription files (Shift JIS encoding)
/tools/transcription.sh    Shell script for batch execution of transcription file generation
/tools/xml2list.xsl        Stylesheet for generating per-utterance speech file name lists
/tools/list.sh             Shell script for batch execution of the per-utterance speech file name list generation
/tools/xml2para.xsl        Stylesheet for generating averaged rating files of paralinguistic information
/tools/para.sh             Shell script for batch execution of averaged rating file generation
/tools/uudb.rng            RELAX NG schema of the UUDB XML Document
To run the utterance splitting utility, the Java SE development environment is required.
(This section is a stub.)
In the UU Database, each individual discourse is treated as a session. An XML document of the UU Database is a single file for a session, and its root element is Session.
Session
SessionID: Session ID of the session that this XML document describes.
Date: The date of recording.
LSpeakerID: ID of the speaker whose speech is recorded in the first channel (normally the left channel) for this session.
RSpeakerID: ID of the speaker whose speech is recorded in the second channel (normally the right channel) for this session.
MaterialNo: The cartoon material number used for this session.
Lhas: Enumerated frame numbers that the speaker with the ID LSpeakerID has. For example, if the speaker has the (originally) fourth and second frames, the attribute value is "4,2".
Rhas: Enumerated frame numbers that the speaker with the ID RSpeakerID has.
TimePressure: True if this session was performed under a time constraint.
SessionStartTime: The time at which this session started (in seconds). Identical to the start time recorded in the header of this session's speech data in the ESPS format. All start and end times in the XML documents (UtteranceStartTime, UtteranceEndTime, StartTime, EndTime) use this time as their reference point: subtracting SessionStartTime gives the offset from the beginning of the session. For example, if SessionStartTime is 2.0 s, UtteranceStartTime is 10.1 s and UtteranceEndTime is 11.1 s, the utterance started 8.1 s after the beginning of this session and ended 9.1 s after the beginning of the session.
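The SessionStartTime arithmetic and the Lhas/Rhas value format can be illustrated with two small helpers. The function names are hypothetical; the numbers are the ones from the example above.

```python
def offset_from_session_start(time_s: float, session_start_time_s: float) -> float:
    """Convert a time from an XML document to seconds since the session began."""
    return time_s - session_start_time_s


def parse_frames(has_value: str) -> list:
    """Parse an Lhas/Rhas value such as "4,2" into a list of frame numbers."""
    return [int(part) for part in has_value.split(",") if part.strip()]


# SessionStartTime = 2.0 s, UtteranceStartTime = 10.1 s, UtteranceEndTime = 11.1 s
print(round(offset_from_session_start(10.1, 2.0), 3))  # 8.1
print(round(offset_from_session_start(11.1, 2.0), 3))  # 9.1
print(parse_frames("4,2"))                             # [4, 2]
```

The offsets are rounded only for display; the same offset can be used to locate an utterance inside the WAVE file, which carries no start time of its own.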
SessionComment element: Comments for this session. For development use only; users should not rely on this information.
Comment element: A comment.
Comment
CommentStrings: Comment strings.
Utterance element: An utterance. The largest unit of description in the UU Database.
Each session is composed of a sequence of utterances. Utterances are arranged in ascending order of start time.
Utterance
UtteranceID: ID of this utterance. A serial number, where the first utterance of each session is "001".
Channel: Indicates the speaker of this utterance. If "L", this utterance belongs to the speaker with the ID LSpeakerID. If "R", this utterance belongs to the speaker with the ID RSpeakerID.
UtteranceStartTime: The start time of this utterance. cf. SessionStartTime
UtteranceEndTime: The end time of this utterance. cf. SessionStartTime
SlashUnit: A keyword that designates whether the end of this utterance coincides with the end of a slash unit. The default value is "complete".
If "complete", the end of this utterance is the end of a slash unit.
If "incomplete", the end of this utterance is not the end of a slash unit, and the slash unit continues to the next utterance.
If "irrelevant", this utterance is not involved in identifying slash units. This applies when the whole utterance is composed of nonlinguistic sounds or short fragments that can hardly be a slash unit.
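One way to read the three SlashUnit values is as instructions for grouping utterances into slash units. The sketch below is an interpretation, not the method used to build the database: it assumes that "the next utterance" means the next utterance on the same channel, that "irrelevant" utterances are simply skipped, and that the caller has already substituted the default "complete" where the value is absent. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict


def group_slash_units(utterances):
    """Group (utterance_id, channel, slash_unit) tuples, given in session order.

    Returns a list of (channel, [utterance_id, ...]) in order of completion.
    """
    open_units = defaultdict(list)  # channel -> IDs of the unit still in progress
    units = []
    for utterance_id, channel, slash_unit in utterances:
        if slash_unit == "irrelevant":
            continue  # not involved in identifying slash units
        if slash_unit not in ("complete", "incomplete"):
            raise ValueError(f"unknown SlashUnit value: {slash_unit!r}")
        open_units[channel].append(utterance_id)
        if slash_unit == "complete":
            units.append((channel, open_units.pop(channel)))
    # Units still open when the session ends are kept as they are.
    for channel, ids in open_units.items():
        units.append((channel, ids))
    return units


example = [
    ("001", "L", "incomplete"),
    ("002", "R", "complete"),
    ("003", "L", "complete"),
    ("004", "R", "irrelevant"),
]
print(group_slash_units(example))
```

In this example, utterances 001 and 003 on channel L form one slash unit, 002 on channel R forms another, and 004 is ignored.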
UtteranceComment element: Comments for this utterance. For development use only; users should not rely on this information.
Comment element: A comment.
Comment
CommentStrings: Comment strings.
EmotionalState element: A set of paralinguistic information annotations for this utterance.
Rating element: The perceived emotional state of the speaker, as evaluated by one annotator for this utterance on a 7-point scale along six abstract dimensions.
Rating
AnnotatorID: The annotator ID.
Pleasantness: The rating for the "pleasant-unpleasant" dimension. Evaluation of the speaker's feeling.
Arousal: The rating for the "aroused-sleepy" dimension. Evaluation of the speaker's mental activity.
Dominance: The rating for the "dominant-submissive" dimension. Evaluation of the degree to which the speaker leads the communication with the other party.
Credibility: The rating for the "credible-doubtful" dimension. Evaluation of the degree to which the speaker believes the other party.
Interest: The rating for the "interested-indifferent" dimension. Evaluation of the degree to which the speaker is interested in the other party or her/his utterance.
Positivity: The rating for the "positive-negative" dimension. Evaluation of the degree to which the speaker evaluates the other party's utterance positively.
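Since each utterance carries one Rating per annotator, a per-utterance summary can be obtained by averaging each dimension over the annotators. The sketch below illustrates this idea only; it is not claimed to reproduce the averaged rating files under /var exactly. The function name, the dictionary layout, the annotator IDs and the assumption that ratings are coded as numbers on the 7-point scale are all hypothetical.

```python
from statistics import mean

DIMENSIONS = (
    "Pleasantness", "Arousal", "Dominance",
    "Credibility", "Interest", "Positivity",
)


def average_ratings(ratings):
    """Average each of the six dimensions over a list of per-annotator ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return {dim: mean(rating[dim] for rating in ratings) for dim in DIMENSIONS}


# Made-up values for one utterance rated by three annotators.
ratings = [
    {"AnnotatorID": "A1", "Pleasantness": 4, "Arousal": 5, "Dominance": 3,
     "Credibility": 4, "Interest": 6, "Positivity": 5},
    {"AnnotatorID": "A2", "Pleasantness": 5, "Arousal": 5, "Dominance": 4,
     "Credibility": 4, "Interest": 5, "Positivity": 5},
    {"AnnotatorID": "A3", "Pleasantness": 6, "Arousal": 5, "Dominance": 5,
     "Credibility": 4, "Interest": 4, "Positivity": 5},
]
print(average_ratings(ratings))
```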
Utterance
Each utterance is a sequence whose constituents are "nonlinguistic sound", "short pause", or "chunk" elements. These elements are contiguous and do not overlap. Therefore, the speech duration of an utterance can be calculated by subtracting the total duration of the nonlinguistic sounds and short pauses within the utterance from the duration of the utterance (= UtteranceEndTime − UtteranceStartTime).
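The speech duration calculation described above can be written out as a short function. The function name and the `(start, end)` tuple layout are hypothetical; the times are assumed to be the StartTime and EndTime values of the NonLinguisticSound and ShortPause elements of one utterance, in seconds.

```python
def speech_duration(utterance_start, utterance_end, non_speech_spans):
    """Duration of the chunks of an utterance, in seconds.

    non_speech_spans: (StartTime, EndTime) pairs of every nonlinguistic sound
    and short pause within the utterance.
    """
    utterance_duration = utterance_end - utterance_start
    non_speech = sum(end - start for start, end in non_speech_spans)
    return utterance_duration - non_speech


# A 2.0 s utterance containing a 0.25 s breath and a 0.25 s short pause.
print(speech_duration(10.0, 12.0, [(10.5, 10.75), (11.0, 11.25)]))  # 1.5
```

Because all times share the same reference point, SessionStartTime does not need to be subtracted before taking the differences.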
NonLinguisticSound element: A nonlinguistic sound originating from the speaker. It does not occur simultaneously with speech sound.
Unlike speech sounds (chunks), the identification of nonlinguistic sounds is not comprehensive.
NonLinguisticSound
ChunkID: Constituent ID of this nonlinguistic sound. A serial number, where the first constituent of each utterance is "1".
StartTime: The start time of this nonlinguistic sound. cf. SessionStartTime
EndTime: The end time of this nonlinguistic sound. cf. SessionStartTime
TagBreath: True if this nonlinguistic sound is a breathing sound.
TagLaugh: True if this nonlinguistic sound is a laughing sound.
Applies only to pure laughing sound; speech portions accompanied by laughter are excluded.
TagSigh: True if this nonlinguistic sound is a sigh.
TagCough: True if this nonlinguistic sound is a cough or throat clearing.
ShortPause element: A short pause within an utterance.
ShortPause
ChunkID: Constituent ID of this short pause. A serial number, where the first constituent of each utterance is "1".
StartTime: The start time of this short pause. cf. SessionStartTime
EndTime: The end time of this short pause. cf. SessionStartTime
Chunk element: A stretch of speech sound. A continuous stretch of speech that is not divided by nonlinguistic sounds, short pauses, or element boundaries.
Chunk
ChunkID: Constituent ID of this chunk. A serial number, where the first constituent of each utterance is "1".
OrthographicTranscription: Orthographic transcription for this chunk.
PhoneticTranscription: Phonetic transcription for this chunk in katakana, which can be transformed directly into a phoneme sequence.
Disfluency: True if this chunk is a slip of the tongue (mispronunciation) or a repetition. Some disfluent chunks are followed by a self repair; others are not.
Filler: True if this chunk is a filler.
Backchannel: True if this chunk is a backchannel (aizuchi).
Conjunction: True if this chunk is a conjunction.
DiscourseMarker: True if this chunk is a discourse marker.
TagS: (experimental) True if this chunk is marked as S (something like a shout or surprise).
EndOfSlashUnit: True if the slash unit ends prematurely at this chunk.
Mora element: A mora. A chunk is composed of a sequence of morae.
Mora
MoraID: ID of this mora. A serial number, where the first mora of each chunk is "1".
MoraEntity: The symbol for this mora. Written in katakana, in the same way as PhoneticTranscription.
ExternalNoise element: The set of external noises that occurred during this utterance.
Noise element: An external noise. (Identification is neither comprehensive nor objective.)
ExternalNoise
NoiseID: ID of this external noise. A serial number, where the first external noise in each utterance is "1".
StartTime: The start time of this external noise. cf. SessionStartTime
EndTime: The end time of this external noise. cf. SessionStartTime
The speaker information file is an XML document, whose root element is Speakers.
Speaker element: A speaker.
SpeakerInfo element: Personal information for this speaker.
SpeakerInfo
SpeakerID: Speaker ID of this speaker.
SpeakerAge: This speaker's age as of the recording date.
SpeakerGender: Gender of this speaker. "F" means female. "M" means male.
ResidentialHistory element: The history of residence of this speaker, in reverse chronological order.
Sessions element: The set of sessions in which this speaker participated.
SessionInfo element: Information about a session.
SessionInfo
SessionID: Session ID of this session.
Channel: Indicates the channel in which this speaker's speech is recorded for this session.
PartnerID: The speaker ID of the speaker with whom this speaker talks in this session.