Gerard Roma, Owen Green, Pierre Alexandre Tremblay, University of Huddersfield
- is_blind: no
- additional_training_data: no
This system uses a convolutional neural network (CNN) followed by fully connected output layers. The network input is a slice of 11 consecutive STFT frames (about 200 ms), and the output is a ratio mask corresponding to a single spectral frame. We trained the network to minimize the mean squared error between the predicted masks and the ideal ratio masks, for all four sources simultaneously, using the training set of the musdb dataset.
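The data preparation and training target described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the ideal ratio mask is taken here as each source's magnitude divided by the sum of all source magnitudes (one common definition; the paper may use another), edge frames are zero-padded, and the CNN itself is omitted. The function names `ideal_ratio_masks`, `context_slices` and `mse_loss` are hypothetical.

```python
import numpy as np

N_CONTEXT = 11           # frames per input slice, as in the description
HALF = N_CONTEXT // 2    # frames on each side of the centre frame


def ideal_ratio_masks(source_mags, eps=1e-8):
    """Assumed IRM definition: |S_i| / sum_j |S_j|.

    source_mags: (n_sources, n_frames, n_bins) magnitude spectrograms.
    Returns masks of the same shape that sum to ~1 across sources.
    """
    total = source_mags.sum(axis=0, keepdims=True)
    return source_mags / (total + eps)


def context_slices(mix_mag):
    """Build one 11-frame input slice centred on every frame.

    mix_mag: (n_frames, n_bins) mixture magnitude spectrogram.
    Returns (n_frames, N_CONTEXT, n_bins); edges are zero-padded (assumption).
    """
    padded = np.pad(mix_mag, ((HALF, HALF), (0, 0)))
    n_frames = mix_mag.shape[0]
    return np.stack([padded[t:t + N_CONTEXT] for t in range(n_frames)])


def mse_loss(pred_masks, target_masks):
    """Mean squared error over all sources, frames and bins."""
    return float(np.mean((pred_masks - target_masks) ** 2))


# Toy usage with random "spectrograms" for four sources.
rng = np.random.default_rng(0)
sources = rng.random((4, 20, 8))      # 4 sources, 20 frames, 8 bins
mixture = sources.sum(axis=0)         # simplification: additive magnitudes
targets = ideal_ratio_masks(sources)  # training targets, one mask per source
inputs = context_slices(mixture)      # one network input per frame
```

A network trained on these pairs maps each `inputs[t]` to the four masks for frame `t`; multiplying those masks by the mixture magnitude at frame `t` then gives the per-source estimates.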
- G. Roma, O. Green, P. A. Tremblay, "Improving single-network single-channel separation of musical audio with convolutional layers," Proceedings of LVA/ICA, 2018.