COMPARISON OF VOICE ACTIVITY DETECTION METHODS IN REALISTIC NOISE SCENARIOS
In this paper, we compare several voice activity detection (VAD) methods on noisy data. The objective is to find methods that are suitable for improving speech communication over mobile phones in loud environments, e.g., in the presence of traffic noise. First, the methods are applied to synthetic data, where traffic noise is added to recordings made in a quiet environment. To account for the Lombard effect, which is typical of speech produced in loud environments, we use recordings from a specially designed database. Second, the methods are tested on real-world examples with highway traffic noise. For these examples, short telephone dialogs were recorded with one speaker situated near the street. The VAD methods are evaluated in terms of the true positive rate and the false positive rate, and by plotting the resulting receiver operating characteristic (ROC) curves. For the synthetic data, the reference VAD is obtained by applying an energy-based VAD algorithm to the clean recordings, and we test signal-to-noise ratios (SNRs) of 5, 0, and -5 dB. For the real-world data, the reference is derived from a laryngograph, which detects the voiced portions of the speech signal. The experiments show that an algorithm based on vowel detection achieves good results in all SNR conditions.
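The synthetic-data evaluation described above can be illustrated with a short sketch. This is a minimal illustration under our own assumptions and not the implementation used in the experiments. The helper names (`mix_at_snr`, `frame_energy_db`, `energy_vad`, `tpr_fpr`, `roc_points`), the frame length of 256 samples, the -30 dB reference threshold, and the tone-plus-white-noise test signal are all hypothetical. The sketch scales noise to a target SNR, derives a frame-wise reference by thresholding the energy of the clean signal, and sweeps a decision threshold over detector scores to obtain (FPR, TPR) points of an ROC curve. The frame energy of the noisy signal serves only as a placeholder detector score; it is not one of the compared methods, such as the vowel-detection algorithm.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that mean(clean**2) / mean(scaled_noise**2) equals snr_db, then add it."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

def frame_energy_db(x, frame_len=256):
    """Log energy per non-overlapping frame (trailing samples are dropped)."""
    n = len(x) // frame_len
    frames = x[: n * frame_len].reshape(n, frame_len)
    return 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)

def energy_vad(x, threshold_db, frame_len=256):
    """Hypothetical energy-based VAD: a frame counts as speech if its energy exceeds threshold_db."""
    return frame_energy_db(x, frame_len) > threshold_db

def tpr_fpr(decisions, reference):
    """True and false positive rates of boolean frame decisions w.r.t. reference labels."""
    decisions = np.asarray(decisions, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    pos, neg = np.sum(reference), np.sum(~reference)
    tpr = np.sum(decisions & reference) / pos if pos else 0.0
    fpr = np.sum(decisions & ~reference) / neg if neg else 0.0
    return float(tpr), float(fpr)

def roc_points(scores, reference, thresholds):
    """Sweep a threshold over frame scores; return (FPR, TPR) pairs, one per threshold."""
    points = []
    for t in thresholds:
        tpr, fpr = tpr_fpr(scores > t, reference)
        points.append((fpr, tpr))
    return points

# Usage with a synthetic stand-in for speech: a 200 Hz tone in the middle of 2 s at 8 kHz.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
clean = np.zeros(2 * fs)
clean[fs // 2 : fs // 2 + fs] = np.sin(2 * np.pi * 200 * t)
noise = rng.standard_normal(2 * fs)  # white noise as a placeholder for traffic noise

noisy = mix_at_snr(clean, noise, snr_db=0.0)
reference = energy_vad(clean, threshold_db=-30.0)  # reference labels from the clean signal
scores = frame_energy_db(noisy)                    # placeholder detector scores
points = roc_points(scores, reference, thresholds=np.linspace(-20, 10, 31))
```

Thresholding the clean signal keeps the reference independent of the added noise, which mirrors the idea of deriving the reference from the clean data; a real detector would replace `scores` with its own per-frame output.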