This readme.txt file was generated on 20201117 by Hao Lu

-------------------
GENERAL INFORMATION
-------------------

1. Head and eye movements of normal-hearing and hearing-impaired participants during three-party conversations

2. Author Information
   Principal Investigator Contact Information
   Name: Hao Lu
   Institution: Department of Psychology, University of Minnesota
   Address: 75 East River Parkway, Minneapolis, Minnesota 55455, USA
   Email: luxx0489@umn.edu
   ORCID: https://orcid.org/0000-0001-5478-9244

3. Date of data collection (single date, range, approximate date):
   From 20170626 to 20171114

4. Geographic location of data collection (where was data collected?):
   Multi-sensory Perception Lab
   S39 Elliott Hall
   75 East River Parkway
   Minneapolis, MN

5. Information about funding sources that supported the collection of the data:
   The project was sponsored by Starkey Laboratories.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: CC BY-NC 3.0: https://creativecommons.org/licenses/by-nc/3.0/us/

2. Links to publications that cite or use the data: [insert paper info here]

3. Links to other publicly accessible locations of the data:

4. Links/relationships to ancillary data sets:

5. Was data derived from another source? If yes, list source(s):

6. Recommended citation for the data:

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   A. Filename: metadata.csv
      Short description: General information and experimental settings for each participant. The audiograms of subjects 24 and 26 were too good for them to be classified as older hearing-impaired participants, but not good enough for them to be classified as older normal-hearing participants. We recommend dropping these two participants from group analyses; their data were also dropped in [insert paper info here].

   The following files are within continuous_data.zip:

   B. Filename: Sub[num]\Sub[num]Confederates_location.mat
      Short description: The estimated locations of the two confederates for participant Sub[num].

   C. Filename: Sub[num]\Sub[num]Gaze_movement.mat
      Short description: The gaze movements recorded during the experiment for Sub[num].

   D. Filename: Sub[num]\Sub[num]Head_movement.mat
      Short description: The head movements recorded during the experiment for Sub[num].

   E. Filename: Sub[num]\Sub[num]VAD_speech_label.mat
      Short description: The start and end times of speech segments during the experiment, labeled by a voice activity detection (VAD) algorithm. All four background-noise conditions (no noise, 50 dB, 60 dB, 70 dB SPL) were labeled. We recommend using the VAD-labeled speech only for analyses of the participant's speech (all conditions) and of the two confederates' speech in the no-noise and 50-dB conditions.

   F. Filename: Sub[num]\Sub[num]Manual_speech_label.mat
      Short description: The start and end times of speech segments during the experiment, labeled manually. Only the 60-dB and 70-dB background-noise conditions were labeled. We recommend using the manually labeled speech for analyses of the two confederates' speech in the 60-dB and 70-dB conditions.

2. Relationship between files:
   Each row of metadata.csv provides the settings for one participant, and the files in the corresponding folder Sub[num] are the data recorded from that participant under those settings (see the MATLAB sketch at the end of this section).

3. Additional related data collected that was not included in the current data package: N/A

4. Are there multiple versions of the dataset? No
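As a quick illustration of how the files fit together, the following hypothetical MATLAB sketch loads the data files for one participant listed in metadata.csv. It is not part of the dataset; it assumes metadata.csv and the unzipped continuous_data folder sit in the current directory and that SubjectCode holds the numeric [num] part of the folder name. Adjust the paths and the SubjectCode handling if your copy differs.

    % Hypothetical loading sketch (not part of the dataset itself).
    meta = readtable('metadata.csv');                  % one row per participant
    num  = meta.SubjectCode(1);                        % assumed numeric, e.g. 1 -> folder Sub1
    folder = fullfile('continuous_data', sprintf('Sub%d', num));
    conf = load(fullfile(folder, sprintf('Sub%dConfederates_location.mat', num)));  % conf.ConfePo
    gaze = load(fullfile(folder, sprintf('Sub%dGaze_movement.mat', num)));          % gaze.HeadEyePo
    head = load(fullfile(folder, sprintf('Sub%dHead_movement.mat', num)));          % head.HeadEyePo
    vad  = load(fullfile(folder, sprintf('Sub%dVAD_speech_label.mat', num)));       % vad.multiseg
    man  = load(fullfile(folder, sprintf('Sub%dManual_speech_label.mat', num)));    % man.multiseg_manual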
--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:

For a detailed description of the methods, including a diagram, please see [insert paper info here].

The experiment took place inside an acoustic booth with internal dimensions of 10' by 13' by 8.5' (height) (ETS-Lindgren Acoustic Systems, Cedar Park, TX) at the University of Minnesota's Center for Applied and Translational Sensory Science (CATSS). The inside of the booth was carpeted, and all walls (including the ceiling) were lined with 4" foam to reduce reflections and reverberation. During each experimental session, a participant sat on a chair placed at the center of the acoustic chamber, in front of a square coffee table (60 cm by 60 cm). The room layout is shown in Figure 2 of [insert paper info here]. Two female graduate students were recruited as confederates in this experiment, and they sat around the coffee table with the participant to hold a group conversation. The three people in the booth (the participant and two confederates) were seated approximately equidistant from each other around the table, and the angle between the two confederates from the perspective of the participant was approximately 60°. The distance between any two of the three people in the booth was approximately 100 cm. The participant and the two confederates were seated facing the center of the coffee table, but they could freely move their heads during the conversation.

No script was given to the confederates before the experiment began, and the same instructions were given to the participant and confederates at the beginning of each experimental session: they could talk about any topic they were interested in as a group, while trying to avoid any one person dominating the conversation. Participants were aware that their head and eye movements during the experiment were recorded, but they did not know the hypothesis of the experiment. The same two confederates were used throughout the experiment, and the relative positions of the two confederates (to the left or right of the participant) were balanced across participants.

The participant and the two confederates each wore a Sennheiser lapel microphone (Wedemark, Germany) so their speech could be recorded. In addition to the microphone, the participant also wore a pair of Tobii Pro Glasses 2 (Danderyd, Sweden), which recorded the participant's eye movements at a sampling rate of 100 Hz. Corrective lenses were paired with the Tobii glasses when needed. A built-in wide-angle scene camera between the two lenses of the Tobii glasses recorded the scene in front of the participant during the conversation at 30 frames per second.

The background noise used in the experiment was recorded in an 80' x 56' x 13' restaurant during lunch hour with a CMC6 MK4 stereo cardioid microphone (Schoeps, Karlsruhe, Germany) in an ORTF configuration and an R-4 portable sound recorder (Roland, Los Angeles, CA) with two recording channels and 24-bit quantization at a 48-kHz sampling rate. The recorded background noise did not contain any intelligible speech. To provide a more immersive experience for the participant seated at the center of the acoustic chamber, the recording was reproduced in the chamber via six loudspeakers (A'Diva Ti, Anthony Gallo Acoustics, San Antonio, TX), separated by angles of 60 degrees around the center of the chamber in the horizontal plane at approximately ear height. The loudspeakers were connected to a personal computer through XLS 1500 power amplifiers (Crown, Elkhart, IN) and Lynx Aurora 16 24-bit D/A converters with Lynx AES16e sound cards (Lynx Studio Technology, Costa Mesa, CA). The left and right channels of the recorded noise were delivered to the three loudspeakers on the participant's left and the three loudspeakers on the participant's right, respectively.

The experiment began with a 5-minute acclimatization period, during which the participant and the two confederates conversed without any background noise. Following the acclimatization period, the next 20 minutes of conversation were divided into four 5-minute segments, during each of which the background noise was set to one of the following levels: 50 dB SPL, 60 dB SPL, 70 dB SPL, or no noise. The order of noise levels was balanced across participants through a repeated Latin-square design. The noise level was calibrated with a Bruel & Kjaer 2250 sound level meter (Nærum, Denmark) located at the participant's seat, at head height, in the center of the acoustic chamber.
2. Methods for processing the data:

The eye camera on the Tobii Pro Glasses 2 directly recorded the eye movements relative to the head during the conversation at a sampling rate of 100 Hz, and the recorded eye movements were exported via the Tobii Pro Lab software without filtering. On average, valid eye movements were recorded for over 80% of the total recording time. Missing values were imputed with linear interpolation for analysis; the shared data include only the valid eye-movement samples.

The wide-angle scene camera located between the user's eyes on the Tobii glasses recorded the scene in front of the participant. Because the scene camera is stationary relative to the participant's head, the participant's head movements during the conversation could be extracted from the video by estimating the camera movement over the course of the conversation. As the participant and the two confederates remained seated during the experiment, we assumed their heads were stationary relative to the room; therefore, the camera movements relative to the confederates' faces were taken as the participant's head movements relative to the room. The video files recorded by the scene camera were analyzed with the face-detection function in the MATLAB R2015b Computer Vision Toolbox (The MathWorks, Natick, MA) to label the confederates' faces in each frame.
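The original analysis code is not included in this package. As an illustration only, a minimal sketch of this face-detection step with the Computer Vision Toolbox might look as follows; the video file name and the pixel-to-degree conversion are placeholders and would need to be set for a real recording.

    % Illustrative sketch (not the original analysis code): track the horizontal
    % position of a detected face in the Tobii scene video.
    v = VideoReader('scene_video.mp4');           % placeholder file name
    detector = vision.CascadeObjectDetector();    % default frontal-face model
    faceX = [];                                   % horizontal face position per frame, in pixels
    while hasFrame(v)
        frame = readFrame(v);
        bbox  = step(detector, frame);            % one [x y width height] row per detected face
        if isempty(bbox)
            faceX(end+1) = NaN;                   % no face detected in this frame
        else
            faceX(end+1) = bbox(1,1) + bbox(1,3)/2;   % center of the first detected face
        end
    end
    % Because the scene camera moves with the participant's head, changes in the
    % detected face position (converted from pixels to visual angle) reflect head
    % rotation in the opposite direction.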
By summing the head movements relative to the room (head movement) and the eye movements relative to the participant's head (eye movement), the participant's eye movements relative to the room (head+eye movement) were estimated.

The three audio files recorded with the lapel microphones of the participant and the two confederates were analyzed with an unsupervised voice activity detection (VAD) algorithm (Lathoud et al., 2004) to label the beginnings and ends of the speech segments spoken by the participant and the two confederates during the group conversations. To test the reliability of the speech segments labeled by the algorithm, the audio recordings from three experimental sessions (one from each participant group) were also manually labeled. The proportion of the recording in which the algorithm-labeled speech was consistent with the manually labeled speech was consistently above 80% for all three groups under no noise and 50 dB SPL background noise. However, the consistency at 60 dB SPL and 70 dB SPL for the two confederates could be as low as 20-30% in some conditions. To ensure that the speech segments were labeled correctly, the speech segments of the two confederates at the 60 and 70 dB SPL noise levels were all replaced by manually labeled segments. The speech segments at these noise levels were labeled by two graders, and the consistency between the two graders was approximately 90%. Because the audio recordings from the participant and the two confederates were labeled separately, the segmentations produced by the algorithm and by the human raters allow multiple talkers to speak simultaneously.

Because the participant's speech was recorded by both the lapel microphone and the built-in microphone in the Tobii glasses, the delay between the recordings of the two devices was estimated by finding the maximum of the cross-correlation function (a built-in function of MATLAB R2015b; The MathWorks, Natick, MA) between the two audio files, and the recordings from the two devices were synchronized by compensating for the estimated delay.
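A minimal sketch of that synchronization step is shown below; it is illustrative only, x and y are placeholder variables holding the two mono recordings, fs is their shared sampling rate, and xcorr is the MATLAB Signal Processing Toolbox cross-correlation function.

    % Estimate the relative delay between two recordings of the same speech.
    [c, lags]    = xcorr(x, y);          % cross-correlation over all lags
    [~, iPeak]   = max(abs(c));
    delaySamples = lags(iPeak);          % relative delay between the two recordings, in samples
    delaySeconds = delaySamples / fs;    % the same delay in seconds
    % Shift or trim one of the recordings by delaySamples to align the two devices.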
The eye-tracking data were exported with the Tobii Pro Lab software. Both the eye-tracking data and the estimated head movements were converted to visual angles. The locations of the two confederates were also estimated with the face-detection algorithm. All data were deidentified. The speech segments were manually labeled with Aegisub 3.2.2 (http://www.aegisub.org/) by inspecting the synchronized audio files recorded with the lapel microphones and the synchronized video file recorded with the camera on the Tobii glasses.

3. Instrument- or software-specific information needed to interpret the data: N/A

4. Standards and calibration information, if appropriate: N/A

5. Environmental/experimental conditions: N/A

6. Describe any quality-assurance procedures performed on the data:

To test the reliability of the speech segments labeled by the algorithm, the audio recordings from three experimental sessions (one from each participant group) were also manually labeled. The proportion of the recording in which the algorithm-labeled speech was consistent with the manually labeled speech was consistently above 80% for all three groups under no noise and 50 dB SPL background noise. However, the consistency at 60 dB SPL and 70 dB SPL for the two confederates could be as low as 20-30% in some conditions. To ensure that the speech segments were labeled correctly, the speech segments of the two confederates at the 60 and 70 dB SPL noise levels were all replaced by manually labeled segments. The speech segments at these noise levels were labeled by two graders, and the consistency between the two graders was approximately 90%. Because the audio recordings from the participant and the two confederates were labeled separately, the segmentations produced by the algorithm and by the human raters allow multiple talkers to speak simultaneously.

7. People involved with sample collection, processing, analysis and/or submission:

The experiment was designed by Hao Lu, Martin F. McKinney, Tao Zhang, and Andrew J. Oxenham. We are grateful to Nathaniel Helwig for comments, Andrew Byrne for technical support, PuiYii Goh for data analysis, and Peggy Nelson for sharing the background-noise audio.

-----------------------------------------
File Tree for continuous_data.zip:
-----------------------------------------

continuous_data
├── Icon\r
├── Sub1
│   ├── Icon\r
│   ├── Sub1Confederates_location.mat
│   ├── Sub1Gaze_movement.mat
│   ├── Sub1Head_movement.mat
│   ├── Sub1Manual_speech_label.mat
│   └── Sub1VAD_speech_label.mat
├── Sub10
│   ├── Icon\r
│   ├── Sub10Confederates_location.mat
│   ├── Sub10Gaze_movement.mat
│   ├── Sub10Head_movement.mat
│   ├── Sub10Manual_speech_label.mat
│   └── Sub10VAD_speech_label.mat
├── Sub11
│   ├── Icon\r
│   ├── Sub11Confederates_location.mat
│   ├── Sub11Gaze_movement.mat
│   ├── Sub11Head_movement.mat
│   ├── Sub11Manual_speech_label.mat
│   └── Sub11VAD_speech_label.mat
├── Sub12
│   ├── Icon\r
│   ├── Sub12Confederates_location.mat
│   ├── Sub12Gaze_movement.mat
│   ├── Sub12Head_movement.mat
│   ├── Sub12Manual_speech_label.mat
│   └── Sub12VAD_speech_label.mat
├── Sub13
│   ├── Icon\r
│   ├── Sub13Confederates_location.mat
│   ├── Sub13Gaze_movement.mat
│   ├── Sub13Head_movement.mat
│   ├── Sub13Manual_speech_label.mat
│   └── Sub13VAD_speech_label.mat
├── Sub14
│   ├── Icon\r
│   ├── Sub14Confederates_location.mat
│   ├── Sub14Gaze_movement.mat
│   ├── Sub14Head_movement.mat
│   ├── Sub14Manual_speech_label.mat
│   └── Sub14VAD_speech_label.mat
├── Sub15
│   ├── Icon\r
│   ├── Sub15Confederates_location.mat
│   ├── Sub15Gaze_movement.mat
│   ├── Sub15Head_movement.mat
│   ├── Sub15Manual_speech_label.mat
│   └── Sub15VAD_speech_label.mat
├── Sub16
│   ├── Icon\r
│   ├── Sub16Confederates_location.mat
│   ├── Sub16Gaze_movement.mat
│   ├── Sub16Head_movement.mat
│   ├── Sub16Manual_speech_label.mat
│   └── Sub16VAD_speech_label.mat
├── Sub17
│   ├── Icon\r
│   ├── Sub17Confederates_location.mat
│   ├── Sub17Gaze_movement.mat
│   ├── Sub17Head_movement.mat
│   ├── Sub17Manual_speech_label.mat
│   └── Sub17VAD_speech_label.mat
├── Sub18
│   ├── Icon\r
│   ├── Sub18Confederates_location.mat
│   ├── Sub18Gaze_movement.mat
│   ├── Sub18Head_movement.mat
│   ├── Sub18Manual_speech_label.mat
│   └── Sub18VAD_speech_label.mat
├── Sub19
│   ├── Icon\r
│   ├── Sub19Confederates_location.mat
│   ├── Sub19Gaze_movement.mat
│   ├── Sub19Head_movement.mat
│   ├── Sub19Manual_speech_label.mat
│   └── Sub19VAD_speech_label.mat
├── Sub2
│   ├── Icon\r
│   ├── Sub2Confederates_location.mat
│   ├── Sub2Gaze_movement.mat
│   ├── Sub2Head_movement.mat
│   ├── Sub2Manual_speech_label.mat
│   └── Sub2VAD_speech_label.mat
├── Sub20
│   ├── Icon\r
│   ├── Sub20Confederates_location.mat
│   ├── Sub20Gaze_movement.mat
│   ├── Sub20Head_movement.mat
│   ├── Sub20Manual_speech_label.mat
│   └── Sub20VAD_speech_label.mat
├── Sub21
│   ├── Icon\r
│   ├── Sub21Confederates_location.mat
│   ├── Sub21Gaze_movement.mat
│   ├── Sub21Head_movement.mat
│   ├── Sub21Manual_speech_label.mat
│   └── Sub21VAD_speech_label.mat
├── Sub22
│   ├── Icon\r
│   ├── Sub22Confederates_location.mat
│   ├── Sub22Gaze_movement.mat
│   ├── Sub22Head_movement.mat
│   ├── Sub22Manual_speech_label.mat
│   └── Sub22VAD_speech_label.mat
├── Sub23
│   ├── Icon\r
│   ├── Sub23Confederates_location.mat
│   ├── Sub23Gaze_movement.mat
│   ├── Sub23Head_movement.mat
│   ├── Sub23Manual_speech_label.mat
│   └── Sub23VAD_speech_label.mat
├── Sub24
│   ├── Icon\r
│   ├── Sub24Confederates_location.mat
│   ├── Sub24Gaze_movement.mat
│   ├── Sub24Head_movement.mat
│   ├── Sub24Manual_speech_label.mat
│   └── Sub24VAD_speech_label.mat
├── Sub25
│   ├── Icon\r
│   ├── Sub25Confederates_location.mat
│   ├── Sub25Gaze_movement.mat
│   ├── Sub25Head_movement.mat
│   ├── Sub25Manual_speech_label.mat
│   └── Sub25VAD_speech_label.mat
├── Sub26
│   ├── Icon\r
│   ├── Sub26Confederates_location.mat
│   ├── Sub26Gaze_movement.mat
│   ├── Sub26Head_movement.mat
│   ├── Sub26Manual_speech_label.mat
│   └── Sub26VAD_speech_label.mat
├── Sub27
│   ├── Icon\r
│   ├── Sub27Confederates_location.mat
│   ├── Sub27Gaze_movement.mat
│   ├── Sub27Head_movement.mat
│   ├── Sub27Manual_speech_label.mat
│   └── Sub27VAD_speech_label.mat
├── Sub28
│   ├── Icon\r
│   ├── Sub28Confederates_location.mat
│   ├── Sub28Gaze_movement.mat
│   ├── Sub28Head_movement.mat
│   ├── Sub28Manual_speech_label.mat
│   └── Sub28VAD_speech_label.mat
├── Sub29
│   ├── Icon\r
│   ├── Sub29Confederates_location.mat
│   ├── Sub29Gaze_movement.mat
│   ├── Sub29Head_movement.mat
│   ├── Sub29Manual_speech_label.mat
│   └── Sub29VAD_speech_label.mat
├── Sub3
│   ├── Icon\r
│   ├── Sub3Confederates_location.mat
│   ├── Sub3Gaze_movement.mat
│   ├── Sub3Head_movement.mat
│   ├── Sub3Manual_speech_label.mat
│   └── Sub3VAD_speech_label.mat
├── Sub30
│   ├── Icon\r
│   ├── Sub30Confederates_location.mat
│   ├── Sub30Gaze_movement.mat
│   ├── Sub30Head_movement.mat
│   ├── Sub30Manual_speech_label.mat
│   └── Sub30VAD_speech_label.mat
├── Sub4
│   ├── Icon\r
│   ├── Sub4Confederates_location.mat
│   ├── Sub4Gaze_movement.mat
│   ├── Sub4Head_movement.mat
│   ├── Sub4Manual_speech_label.mat
│   └── Sub4VAD_speech_label.mat
├── Sub5
│   ├── Icon\r
│   ├── Sub5Confederates_location.mat
│   ├── Sub5Gaze_movement.mat
│   ├── Sub5Head_movement.mat
│   ├── Sub5Manual_speech_label.mat
│   └── Sub5VAD_speech_label.mat
├── Sub6
│   ├── Icon\r
│   ├── Sub6Confederates_location.mat
│   ├── Sub6Gaze_movement.mat
│   ├── Sub6Head_movement.mat
│   ├── Sub6Manual_speech_label.mat
│   └── Sub6VAD_speech_label.mat
├── Sub7
│   ├── Icon\r
│   ├── Sub7Confederates_location.mat
│   ├── Sub7Gaze_movement.mat
│   ├── Sub7Head_movement.mat
│   ├── Sub7Manual_speech_label.mat
│   └── Sub7VAD_speech_label.mat
├── Sub8
│   ├── Icon\r
│   ├── Sub8Confederates_location.mat
│   ├── Sub8Gaze_movement.mat
│   ├── Sub8Head_movement.mat
│   ├── Sub8Manual_speech_label.mat
│   └── Sub8VAD_speech_label.mat
└── Sub9
    ├── Icon\r
    ├── Sub9Confederates_location.mat
    ├── Sub9Gaze_movement.mat
    ├── Sub9Head_movement.mat
    ├── Sub9Manual_speech_label.mat
    └── Sub9VAD_speech_label.mat

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: metadata.csv
-----------------------------------------

1. Number of variables: 9

2. Number of cases/rows: 30

3. Missing data codes: N/A

4. Variable List

   A. Name: SubjectCode
      Description: The code that identifies the folder Sub[num] containing the data collected from a given participant.

   B. Name: Date
      Description: The date when the data were collected, in MMDDYYYY format.

   C. Name: Group
      Description: The group to which the participant belongs.
         NH-Y = Young normal hearing
         NH-A = Older normal hearing
         HI-A = Older hearing impaired

   D. Name: ConfederatePosition
      Description: The seating arrangement of the two confederates during an experiment. The arrangement was fixed for each participant across all conditions.
         1 = Confederate A on the left-hand side of the participant, Confederate B on the right-hand side of the participant
         2 = Confederate A on the right-hand side of the participant, Confederate B on the left-hand side of the participant

   E. Name: FirstConditionStart
      Description: The start time of the first 320-s session with a given level of background noise. Before this start time were the experimental setup and the 5-minute acclimatization period without background noise for the confederates and the participant to warm up. Use this start time to extract behavior under a given background-noise level from the continuous recordings in the Sub[num] folder. Unit: seconds.

   F. Name: Condition1
      Description: The background-noise level during the first 320-s session.

   G. Name: Condition2
      Description: The background-noise level during the second 320-s session.

   H. Name: Condition3
      Description: The background-noise level during the third 320-s session.

   I. Name: Condition4
      Description: The background-noise level during the fourth 320-s session.
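A hedged MATLAB sketch of how FirstConditionStart and the 320-s session length can be used to pull out the gaze samples recorded during the first noise condition. It assumes FirstConditionStart is in seconds, the HeadEyePo time stamps are in milliseconds (see the file descriptions below), SubjectCode is numeric, and the Sub[num] folders sit in the current directory.

    % Illustrative only: extract gaze samples recorded during Condition1.
    meta = readtable('metadata.csv');
    row  = meta(1, :);                                      % first participant as an example
    num  = row.SubjectCode;                                 % assumed numeric
    gaze = load(fullfile(sprintf('Sub%d', num), sprintf('Sub%dGaze_movement.mat', num)));
    t0 = row.FirstConditionStart * 1000;                    % condition start, in ms
    t1 = t0 + 320 * 1000;                                   % each condition lasts 320 s
    inCond1 = gaze.HeadEyePo(:,1) >= t0 & gaze.HeadEyePo(:,1) < t1;
    gazeCond1 = gaze.HeadEyePo(inCond1, :);                 % [time_ms, gaze_deg] during Condition1
    % The noise level of this segment is given by row.Condition1; the later
    % conditions presumably follow in consecutive 320-s windows (Condition2-Condition4).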
-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Sub[num]\Sub[num]Confederates_location.mat
-----------------------------------------

1. Number of variables: 1

2. Number of cases/rows: 2

3. Missing data codes: N/A

4. Variable List

   Note that a "variable" here is a MATLAB variable loaded from the *.mat file, which can be a data array or matrix; this differs from the column variables used in metadata.csv.

   A. Name: ConfePo
      Description: The locations of the two confederates, in degrees of visual angle from the participant's point of view. The two values always sum to zero: the negative value is the location of the confederate on the participant's left, and the positive value is the location of the confederate on the participant's right.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Sub[num]\Sub[num]Gaze_movement.mat
-----------------------------------------

1. Number of variables: 1

2. Number of cases/rows: not fixed

3. Missing data codes: NaN = failed to record head or eye movement

4. Variable List

   Note that a "variable" here is a MATLAB variable loaded from the *.mat file, which can be a data array or matrix; this differs from the column variables used in metadata.csv.

   A. Name: HeadEyePo
      Description: An n-by-2 matrix. The first column is the time stamp in ms, and the second column is the horizontal eye-gaze angle in degrees, where positive values are to the right and negative values are to the left.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Sub[num]\Sub[num]Head_movement.mat
-----------------------------------------

1. Number of variables: 1

2. Number of cases/rows: not fixed

3. Missing data codes: NaN = failed to record head or eye movement

4. Variable List

   Note that a "variable" here is a MATLAB variable loaded from the *.mat file, which can be a data array or matrix; this differs from the column variables used in metadata.csv.

   A. Name: HeadEyePo
      Description: An n-by-2 matrix. The first column is the time stamp in ms, and the second column is the horizontal estimated head-movement angle in degrees, where positive values are to the right and negative values are to the left.
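Because the Gaze_movement and Head_movement files share the same format but were sampled at different rates (100-Hz eye tracking versus head movement derived from 30-fps video), an eye-in-room ("head + eye") trace can be reconstructed by resampling one signal onto the other's time stamps and summing the two angles, as described under "Methods for processing the data" above. A hedged sketch follows; Sub1 is used as an example, the choice of linear interpolation is an assumption, and NaN samples will propagate into the result.

    % Illustrative only: combine head and eye angles into gaze relative to the room.
    g = load(fullfile('Sub1', 'Sub1Gaze_movement.mat'));    % g.HeadEyePo: [time_ms, eye_deg]
    h = load(fullfile('Sub1', 'Sub1Head_movement.mat'));    % h.HeadEyePo: [time_ms, head_deg]
    tEye    = g.HeadEyePo(:,1);                             % eye-movement time stamps, ms
    eyeDeg  = g.HeadEyePo(:,2);                             % eye-in-head angle, deg
    headDeg = interp1(h.HeadEyePo(:,1), h.HeadEyePo(:,2), tEye, 'linear');
    gazeInRoom = headDeg + eyeDeg;                          % head+eye angle, deg (positive = right)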
-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Sub[num]\Sub[num]Manual_speech_label.mat
-----------------------------------------

1. Number of variables: 1

2. Number of cases/rows: 3

3. Missing data codes: N/A

4. Variable List

   A. Name: multiseg_manual
      Description: A MATLAB struct array with three elements; each element contains a 2-by-n matrix called seg. The three elements are the speech segments of the participant, confederate A, and confederate B. Within each element, the two rows of the matrix are the start and end times of the speech segments. In multiseg_manual, only the speech segments during the 60-dB and 70-dB background-noise conditions were labeled.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Sub[num]\Sub[num]VAD_speech_label.mat
-----------------------------------------

1. Number of variables: 1

2. Number of cases/rows: 3

3. Missing data codes: N/A

4. Variable List

   Note that a "variable" here is a MATLAB variable loaded from the *.mat file, which can be a data array or matrix; this differs from the column variables used in metadata.csv.

   A. Name: multiseg
      Description: A MATLAB struct array with three elements; each element contains a 2-by-n matrix called seg. The three elements are the speech segments of the participant, confederate A, and confederate B. Within each element, the two rows of the matrix are the start and end times of the speech segments. In multiseg, all speech has been labeled, but we recommend using the manually labeled speech for analyses involving speech during the 60-dB and 70-dB noise conditions.
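As a final hedged example, the speech-label structs can be iterated over to compute simple quantities such as the total labeled speaking time per talker. Sub1 is a placeholder, and the unit of the start and end times is assumed to be consistent within each file; the printed totals are in that same unit.

    % Illustrative only: total labeled speech per talker from the VAD labels.
    vad = load(fullfile('Sub1', 'Sub1VAD_speech_label.mat'));     % vad.multiseg(1..3).seg
    talkers = {'participant', 'confederate A', 'confederate B'};
    for k = 1:3
        seg = vad.multiseg(k).seg;                                % 2-by-n: row 1 = starts, row 2 = ends
        fprintf('%s: %d segments, total duration %.1f (label time units)\n', ...
                talkers{k}, size(seg, 2), sum(seg(2,:) - seg(1,:)));
    end
    % For the confederates' speech in the 60-dB and 70-dB noise conditions, use
    % multiseg_manual from Sub1Manual_speech_label.mat instead, as recommended above.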