Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -78,7 +78,7 @@ Assume that the task is to train a neural network to detect segments in audio st | |
MUSAN [@musan2015] and GTZAN [@GTZAN] are two suitable datasets for this task because they provide a wide selection of music, speech, and noise samples. | ||
In the example below, we first download MUSAN and GTZAN to the local disk before creating `Loader` instances for each format that allow Audiomate to access both datasets using a unified interface. Then, we instruct Audiomate to merge both datasets. | ||
Afterwards, we use a `Splitter` to partition the merged dataset into a train and test set. | ||
By merely creating views instead of copying audio files, Audiomate avoids unnecessary disk I/O and is therefore well suited to large datasets in the range of tens or hundreds of gigabytes. | ||
Ultimately, we load the samples and labels by iterating over all utterances. | ||
Alternatively, it is possible to load the samples in batches, which is ideal for feeding them to a deep learning toolkit like PyTorch. | ||
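The merge-then-split-by-view idea described above can be illustrated with a minimal, standard-library-only sketch. This is not Audiomate's API: the `Utterance`, `Corpus`, and `Subview` classes, the `merge` and `split` helpers, and the file paths are all hypothetical and exist only to show why a view that stores utterance IDs needs no additional disk I/O.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical record: only metadata is held, never the audio itself."""
    idx: str
    path: str
    label: str


class Corpus:
    """Hypothetical in-memory index of utterances keyed by ID."""

    def __init__(self, utterances):
        self.utterances = {u.idx: u for u in utterances}


class Subview:
    """References a subset of a corpus by utterance ID; nothing is copied."""

    def __init__(self, corpus, utterance_ids):
        self.corpus = corpus
        self.utterance_ids = frozenset(utterance_ids)

    def __len__(self):
        return len(self.utterance_ids)

    def __iter__(self):
        # Resolve IDs lazily against the parent corpus.
        for idx in sorted(self.utterance_ids):
            yield self.corpus.utterances[idx]


def merge(corpora_by_name):
    """Combine corpora, prefixing IDs with the corpus name to avoid clashes."""
    merged = []
    for name, corpus in corpora_by_name.items():
        for u in corpus.utterances.values():
            merged.append(Utterance(f"{name}-{u.idx}", u.path, u.label))
    return Corpus(merged)


def split(corpus, train_fraction=0.8, seed=0):
    """Partition a corpus into two views using a seeded shuffle of the IDs."""
    ids = sorted(corpus.utterances)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return Subview(corpus, ids[:cut]), Subview(corpus, ids[cut:])


# Placeholder paths; no files are read or written in this sketch.
musan = Corpus([Utterance(str(i), f"/data/musan/{i}.wav", "speech") for i in range(6)])
gtzan = Corpus([Utterance(str(i), f"/data/gtzan/{i}.wav", "music") for i in range(4)])

full = merge({"musan": musan, "gtzan": gtzan})
train, test = split(full, train_fraction=0.8, seed=42)

for utterance in train:
    pass  # a real pipeline would load samples from utterance.path here
```

Because each view holds only a set of IDs and a reference to the parent corpus, creating the train and test partitions costs memory proportional to the number of utterances, not to the size of the audio data.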
|
||
|
@@ -129,4 +129,18 @@ Usually, `Reader` and `Downloader` are implemented for datasets, while `Writer` | |
|
||
Audiomate supports more than a dozen datasets and half as many toolkits. | ||
|
||
# Related Work | ||
|
||
A variety of frameworks and tools offer functionality similar to Audiomate. | ||
|
||
**Data loaders** Data loaders are libraries that focus on downloading and preprocessing datasets to make them easily accessible without requiring a specific tool or framework. | ||
In contrast to Audiomate, they cannot convert between formats, nor can they split or merge datasets. | ||
|
||
Examples of libraries in that category are [@mirdata], [@speechcorpusdownloader], and [@audiodatasets]. | ||
Furthermore, some of these libraries focus on a particular kind of data, such as music, and do not assist with speech datasets. | ||
|
||
|
||
**Tools for specific frameworks** Several machine learning tools and deep learning frameworks include the infrastructure needed to make datasets readily available to their users. | ||
One notable example is TensorFlow [@tensorflow], which ships data loaders for different kinds of data, among them image, speech, and music datasets, such as [@ardila2019common]. | ||
Another is torchaudio [@torchaudio] for PyTorch, which not only offers data loaders but can also convert between various formats. | ||
Those tools and libraries are each tied to a specific machine learning or deep learning framework (TensorFlow or PyTorch, respectively), whereas Audiomate is framework-agnostic. | ||
faroit
Contributor
|
||
|
||
# References |
I think this is not correct... See https://pytorch.org/docs/master/data.html#torch.utils.data.ConcatDataset