Setup

For this example, we have an 8 × 5 × 3 m room with a reverberation time of 0.5 s.

[Image: Room]
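As a rough sanity check on this setup, Sabine's formula relates the room dimensions and the reverberation time to an average absorption coefficient of the walls. The sketch below assumes Sabine's approximation is adequate for this room. The helper name `sabine_mean_absorption` is hypothetical, and the page does not state how the room was actually simulated.

```python
def sabine_mean_absorption(length_m, width_m, height_m, rt60_s):
    """Average absorption coefficient implied by Sabine's formula.

    Sabine: RT60 = 0.161 * V / (S * alpha), with V the room volume (m^3),
    S the total surface area (m^2) and alpha the mean absorption coefficient.
    """
    volume = length_m * width_m * height_m
    surface = 2.0 * (
        length_m * width_m + length_m * height_m + width_m * height_m
    )
    return 0.161 * volume / (surface * rt60_s)


# Room from the setup: 8 x 5 x 3 m, reverberation time 0.5 s.
alpha = sabine_mean_absorption(8.0, 5.0, 3.0, 0.5)
print(f"mean absorption coefficient: {alpha:.3f}")
```

Under these assumptions the room has a volume of 120 m³ and a surface area of 158 m², so the implied mean absorption coefficient is roughly 0.24.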

Mod-MFCC Based Clusters

[Images: Mod-MFCC-based clusters, one panel each for Cluster 1, Cluster 2, and the background cluster]

[Audio examples for clusters 1 and 2: reference microphone, masked reference signal, DSB signal]

Speaker Embedding Clusters

[Images: SpVer clusters, one panel each for Cluster 1, Cluster 2, and the background cluster]

[Audio examples for clusters 1 and 2: reference microphone, masked reference signal, DSB signal]

Discussion

For the Mod-MFCC-based features, the background cluster and the cluster around the first source are intertwined.

Closer look at the difference in DSB output for cluster 1


DSB signals for cluster 1 with Speaker Embedding features
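For readers unfamiliar with the DSB signals compared here, the following is a minimal sketch of a delay-and-sum beamformer, assuming that is what DSB denotes on this page. It assumes the per-microphone delays are already known as integer sample counts. The function name `delay_and_sum` and the toy signals are illustrative and not taken from the experiments above.

```python
import numpy as np


def delay_and_sum(signals, delays_samples):
    """Align each microphone signal by its known delay and average.

    signals: array of shape (n_mics, n_samples).
    delays_samples: integer delay of each microphone relative to the
        reference (positive means the source arrives later at that mic).
    Note: np.roll shifts circularly, which is fine for this toy example
    but wraps samples around the edges of real recordings.
    """
    signals = np.asarray(signals, dtype=float)
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for sig, delay in zip(signals, delays_samples):
        out += np.roll(sig, -int(delay))  # undo the propagation delay
    return out / n_mics


# Toy data: one source seen by three microphones with different delays
# plus independent noise at each microphone.
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
delays = [0, 3, 7]
mics = np.stack(
    [np.roll(source, d) + 0.5 * rng.standard_normal(1024) for d in delays]
)

enhanced = delay_and_sum(mics, delays)
err_single = np.mean((mics[0] - source) ** 2)
err_dsb = np.mean((enhanced - source) ** 2)
print(f"noise power, single mic: {err_single:.3f}, after DSB: {err_dsb:.3f}")
```

Because the aligned source adds coherently while the independent noise averages out, the residual noise power after the beamformer should be lower than at a single microphone. The quality of a cluster's DSB output therefore depends on which microphones are assigned to that cluster.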