Classifying Heart Sounds Challenge
Peter Bentley, Glenn Nordehn, Miguel Coimbra, Shie Mannor, Rita Getz
According to the World Health Organisation, cardiovascular diseases (CVDs) are the number one cause of death globally: more people die annually from CVDs than from any other cause. An estimated 17.1 million people died from CVDs in 2004, representing 29% of all global deaths. Of these deaths, an estimated 7.2 million were due to coronary heart disease. Any method which can help to detect signs of heart disease could therefore have a significant impact on world health. This challenge is to produce methods to do exactly that. Specifically, we are interested in creating the first level of screening of cardiac pathologies both in a hospital environment by a doctor (using a digital stethoscope) and at home by the patient (using a mobile device).

The problem is of particular interest to machine learning researchers as it involves classification of audio sample data, where distinguishing between classes of interest is non-trivial. Data is gathered in real-world situations and frequently contains background noise of every conceivable type. The differences between heart sounds corresponding to different heart symptoms can also be extremely subtle and challenging to separate. Success in classifying this form of data requires extremely robust classifiers. Despite its medical significance, to date this is a relatively unexplored application for machine learning.
Data has been gathered from two sources: (A) from the general public via the iStethoscope Pro iPhone app, provided in Dataset A, and (B) from a clinical trial in hospitals using the digital stethoscope DigiScope, provided in Dataset B.

CHALLENGE 1 - Heart Sound Segmentation
The first challenge is to produce a method that can locate S1 (lub) and S2 (dub) sounds within audio data, segmenting the Normal audio files in both datasets. To enable your machine learning method to learn, we provide the exact location of S1 and S2 sounds for some of the audio files. You need to use them to identify and locate the S1 and S2 sounds of all the heartbeats in the unlabelled group. The locations of sounds are measured in audio samples for better precision. Your method must use the same unit.

CHALLENGE 2 - Heart Sound Classification
The task is to produce a method that can classify real heart audio (also known as “beat classification”) into one of four categories for Dataset A:
and three classes for Dataset B:
You may tackle either or both of these challenges. If you can solve the first challenge, the second will be considerably easier! The winner of each challenge will be the method best able to segment and/or classify two sets of unlabelled data into the correct categories after training on both datasets provided below. The creator of the winning method will receive a WiFi 32Gb iPad as the prize, awarded at a workshop at AISTATS 2012.
After downloading the data, please register your interest to participate in the challenge by clicking here.

There are two datasets:

Dataset A, containing 176 files in WAV format, organized as:
The same datasets are also available in aif format:
Segmentation data (updated 23 March 2012), giving locations of S1 and S2 sounds in Atraining_normal: Atraining_normal_seg.csv

Dataset B, containing 656 files in WAV format, organized as:
The same datasets are also available in aif format:
Segmentation data, giving locations of S1 and S2 sounds in Btraining_normal: Btraining_normal_seg.csv
Evaluation Scripts plus full details of the metrics and test procedure you must use in order to measure the effectiveness of your methods are available here: Evaluation.zip
Challenge 1 involves segmenting the audio files in Atraining_normal.zip and Btraining_normal.zip using the training segmentations provided above. Challenge 2 involves correctly labelling the sounds in Aunlabelledtest.zip and Bunlabelledtest.zip.
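Because S1 and S2 locations are expressed in audio samples, it can help to relate them to time and to heart rate as a sanity check. The following is a minimal sketch, not part of the official tooling: the function names are hypothetical, the S1 positions are assumed to be already loaded as a sorted list of integers, and the sample rate is read from the WAV header with Python's standard `wave` module.

```python
import wave


def read_sample_rate(path):
    """Return the sample rate (frames per second) stored in a WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def samples_to_seconds(positions, sample_rate):
    """Convert sample indices to time offsets in seconds."""
    return [p / sample_rate for p in positions]


def estimate_bpm(s1_positions, sample_rate):
    """Estimate heart rate from consecutive S1 locations (in samples).

    Assumes s1_positions is sorted and holds one entry per heartbeat.
    """
    if len(s1_positions) < 2:
        raise ValueError("need at least two S1 locations")
    span = s1_positions[-1] - s1_positions[0]
    if span <= 0:
        raise ValueError("S1 locations must be strictly increasing")
    beats = len(s1_positions) - 1  # number of complete beat intervals
    return 60.0 * sample_rate * beats / span
```

An estimate far outside a plausible range of heart rates may indicate that some S1 sounds were missed or that S1 and S2 were confused.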
Data Description and Organisation
Please use the following citation if the data is used: @misc{pascal-chsc-2011,

The audio files are of varying lengths, between 1 second and 30 seconds (some have been clipped to reduce excessive noise and provide the salient fragment of the sound). Most information in heart sounds is contained in the low frequency components, with noise in the higher frequencies. It is common to apply a low-pass filter at 195 Hz. Fast Fourier transforms are also likely to provide useful information about volume and frequency over time. More domain-specific knowledge about the difference between the categories of sounds is provided below.

Normal Category
…lub……….dub……………. lub……….dub……………. lub……….dub……………. lub……….dub…
In medicine we call the lub sound "S1" and the dub sound "S2". Most normal heart rates at rest will be between about 60 and 100 beats (‘lub dub’s) per minute. However, note that since the data may have been collected from children or adults in calm or excited states, the heart rates in the data may vary from 40 to 140 beats or higher per minute. Dataset B also contains noisy_normal data: normal data which includes a substantial amount of background noise or distortion. You may choose to use this or ignore it; however, the test set will include some equally noisy examples.

Murmur Category
…lub..****...dub……………. lub..****..dub ……………. lub..****..dub ……………. lub..****..dub …
or
…lub……….dub…******….lub………. dub…******….lub ………. dub…******….lub ……….dub…
Dataset B also contains noisy_murmur data: murmur data which includes a substantial amount of background noise or distortion. You may choose to use this or ignore it; however, the test set will include some equally noisy examples.

Extra Heart Sound Category (Dataset A)
…lub.lub……….dub………..………. lub. lub……….dub…………….lub.lub……..…….dub…….
or
…lub………. dub.dub………………….lub.……….dub.dub………………….lub……..…….dub. dub……

Artifact Category (Dataset A)

Extrasystole sounds may appear occasionally and can be identified because there is a heart sound that is out of rhythm involving extra or skipped heartbeats, e.g. a “lub-lub dub” or a “lub dub-dub”. (This is not the same as an extra heart sound, as the event is not regularly occurring.) An extrasystole may not be a sign of disease. It can happen normally in an adult and can be very common in children. However, in some situations extrasystoles can be caused by heart diseases. If these diseases are detected earlier, then treatment is likely to be more effective. Below, note the temporal description of the extra heart sounds:
…........lub……….dub………..………. lub. ………..……….dub…………….lub.lub……..…….dub…….
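The data description mentions that a low-pass filter at 195 Hz is commonly applied. The sketch below illustrates the idea with a simple first-order (single-pole) filter written in plain Python. The filter type is an assumption made for illustration only, since the challenge text does not specify one, and the function names and the 4000 Hz sample rate in the usage lines are hypothetical. A steeper filter from a signal-processing library would attenuate high-frequency noise more strongly.

```python
import math


def low_pass(samples, sample_rate, cutoff_hz=195.0):
    """Apply a first-order low-pass filter to a sequence of float samples.

    Illustrative only: a single-pole filter rolls off gently above the
    cutoff, so some high-frequency noise remains.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # filter time constant
    dt = 1.0 / sample_rate                  # time between samples
    alpha = dt / (rc + dt)                  # smoothing factor in (0, 1)
    filtered = []
    prev = 0.0
    for x in samples:
        prev += alpha * (x - prev)          # exponential smoothing step
        filtered.append(prev)
    return filtered


def sine(freq_hz, sample_rate, n_samples):
    """Generate a unit-amplitude sine wave, used here as test input."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


# Usage with a hypothetical 4000 Hz sample rate: a 50 Hz tone passes
# almost unchanged, while a 1500 Hz tone is strongly attenuated.
rate = 4000
low_tone = low_pass(sine(50.0, rate, 2000), rate)
high_tone = low_pass(sine(1500.0, rate, 2000), rate)
```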
To allow systems to be comparable, there are some guidelines that we would like participants to follow:
See the evaluation scripts in the downloads section for details of how accuracy of your results can be calculated. You must use this script to enable each system to be compared.