A Multi-Resolution Approach to GAN-Based Speech Enhancement

Hyung Yong Kim, Ji Won Yoon, Sung Jun Cheon, Woo Hyun Kang, Nam Soo Kim

Abstract

Recently, generative adversarial networks (GANs) have been successfully applied to speech enhancement. However, two issues still need to be addressed: (1) GAN-based training is typically unstable due to the non-convex nature of its objective, and (2) most conventional methods do not fully exploit the characteristics of speech, which can lead to a sub-optimal solution. To deal with these problems, we propose a progressive generator that handles speech in a multi-resolution fashion. Additionally, we propose a multi-scale discriminator that discriminates between real and generated speech at various sampling rates to stabilize GAN training. The proposed structure was compared with conventional GAN-based speech enhancement algorithms on the VoiceBank-DEMAND dataset. Experimental results show that the proposed approach makes training faster and more stable, improving performance on various speech enhancement metrics.
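To make the multi-resolution idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation): a waveform is repeatedly downsampled to lower sampling rates, and a discriminator-style score is computed at every rate, as a multi-scale discriminator would judge real versus generated speech per resolution. All function names are assumptions for illustration, and pairwise averaging stands in for proper anti-aliased resampling.

```python
def downsample_by_2(wave):
    """Halve the sampling rate by averaging adjacent samples
    (a crude stand-in for anti-aliased resampling)."""
    return [(wave[i] + wave[i + 1]) / 2.0 for i in range(0, len(wave) - 1, 2)]

def multi_resolution_pyramid(wave, num_scales=3):
    """Return [full-rate, half-rate, quarter-rate, ...] versions of the wave,
    the kind of pyramid a progressive generator could operate on."""
    pyramid = [wave]
    for _ in range(num_scales - 1):
        pyramid.append(downsample_by_2(pyramid[-1]))
    return pyramid

def toy_score(wave):
    """Placeholder 'discriminator' score: mean absolute amplitude.
    A real discriminator would be a learned network."""
    return sum(abs(x) for x in wave) / len(wave)

def multi_scale_scores(wave, num_scales=3):
    """Score the waveform at every sampling rate, one score per scale."""
    return [toy_score(w) for w in multi_resolution_pyramid(wave, num_scales)]

# Toy 16-sample waveform: two periods of a coarse triangle wave.
wave = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5] * 2
print([len(w) for w in multi_resolution_pyramid(wave, 3)])  # [16, 8, 4]
print(len(multi_scale_scores(wave, 3)))                     # 3
```

In the actual model, each resolution would feed a separate (or shared) discriminator, and the generator would be trained progressively from the coarsest rate up; this sketch only shows the data-side decomposition.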


Audio Examples

Examples 1-8: for each example, audio clips are provided under four conditions: Clean (reference), Noisy (input), SERGAN (baseline), and Proposed.