A Multi-Resolution Approach to GAN-Based Speech Enhancement
Abstract
Recently, generative adversarial networks (GANs) have been successfully applied to speech enhancement. However, two issues remain to be addressed: (1) GAN training is typically unstable because of its non-convex optimization, and (2) most conventional methods do not fully exploit the characteristics of speech, which can lead to sub-optimal solutions. To address these problems, we propose a progressive generator that processes speech in a multi-resolution fashion. Additionally, we propose a multi-scale discriminator that distinguishes real from generated speech at various sampling rates, which stabilizes GAN training. The proposed structure was compared with conventional GAN-based speech enhancement algorithms on the VoiceBank-DEMAND dataset. Experimental results showed that the proposed approach makes training faster and more stable, and improves performance on various speech enhancement metrics.
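The idea of a discriminator that judges speech "at various sampling rates" can be illustrated with a small sketch. The code below is not the authors' implementation; it assumes that each coarser view is obtained by halving the sampling rate with simple pairwise averaging, that there is one (hypothetical) discriminator callable per scale, and that a least-squares GAN loss is summed over scales. All function names here are illustrative.

```python
import numpy as np

def downsample_by_two(x: np.ndarray) -> np.ndarray:
    """Halve the sampling rate by averaging adjacent sample pairs.

    Pairwise averaging acts as a crude low-pass filter before decimation
    (an assumption; a proper anti-aliasing filter could be used instead).
    """
    n = len(x) - (len(x) % 2)  # drop a trailing odd sample
    return x[:n].reshape(-1, 2).mean(axis=1)

def multi_resolution_views(x: np.ndarray, num_scales: int) -> list:
    """Return [x at fs, x at fs/2, x at fs/4, ...] with num_scales entries."""
    views = [x]
    for _ in range(num_scales - 1):
        views.append(downsample_by_two(views[-1]))
    return views

def multi_scale_d_loss(discriminators, clean, enhanced) -> float:
    """Least-squares discriminator loss summed over all scales (assumed loss).

    `discriminators` is a list of hypothetical callables, one per scale,
    each mapping a waveform to a scalar "realness" score.
    """
    real_views = multi_resolution_views(clean, len(discriminators))
    fake_views = multi_resolution_views(enhanced, len(discriminators))
    loss = 0.0
    for d, real, fake in zip(discriminators, real_views, fake_views):
        # Push scores of clean speech toward 1 and enhanced speech toward 0.
        loss += (d(real) - 1.0) ** 2 + d(fake) ** 2
    return float(loss)

# Usage: one second of 16 kHz audio seen at 16, 8 and 4 kHz.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
enhanced = clean + 0.1 * rng.standard_normal(16000)
print([len(v) for v in multi_resolution_views(clean, 3)])  # [16000, 8000, 4000]
toy_discriminators = [lambda w: float(np.tanh(np.mean(w ** 2)))] * 3
print(multi_scale_d_loss(toy_discriminators, clean, enhanced))
```

Because every scale contributes a term, the generator must produce speech that looks realistic both in its coarse, low-rate structure and in its fine, full-rate detail; a progressive generator can be sketched the same way, producing low-rate outputs first and refining them toward the full sampling rate.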
Audio Examples
Example 1: Clean | Noisy | SERGAN | Proposed
Example 2: Clean | Noisy | SERGAN | Proposed
Example 3: Clean | Noisy | SERGAN | Proposed
Example 4: Clean | Noisy | SERGAN | Proposed
Example 5: Clean | Noisy | SERGAN | Proposed
Example 6: Clean | Noisy | SERGAN | Proposed
Example 7: Clean | Noisy | SERGAN | Proposed
Example 8: Clean | Noisy | SERGAN | Proposed