For this example, we use a 3 x 4 x 3 m room with a reverberation time of 0.3 s. The fourth example uses the same room with more reverberant walls.
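To give a feel for what a 0.3 s reverberation time means in a room of this size, the sketch below estimates the average wall absorption coefficient with Sabine's formula, RT60 = 0.161 V / (S a). This is only an illustration: the source does not say how the room was simulated, and the function name `sabine_absorption` and the use of Sabine's approximation are assumptions made here.

```python
def sabine_absorption(dims_m, rt60_s):
    """Average absorption coefficient implied by Sabine's formula.

    Assumption: Sabine's approximation RT60 = 0.161 * V / (S * alpha)
    holds; the source does not state how its room was modelled.
    """
    length, width, height = dims_m
    volume = length * width * height                  # room volume in m^3
    surface = 2.0 * (length * width                   # floor and ceiling
                     + length * height                # two side walls
                     + width * height)                # two end walls, in m^2
    return 0.161 * volume / (surface * rt60_s)


# Room from this example: 3 x 4 x 3 m, reverberation time 0.3 s.
alpha = sabine_absorption((3.0, 4.0, 3.0), 0.3)
print(f"average absorption coefficient: {alpha:.2f}")
```

Under these assumptions the room corresponds to an average absorption coefficient of roughly 0.29. A version of the room with more reverberant walls has a longer reverberation time, and therefore a lower implied absorption coefficient.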
Mod-MFCC Based Clusters
[Figure: cluster allocation showing Cluster 1, Cluster 2, and the background cluster]
[Audio samples for clusters 1 and 2: reference microphone, masked reference signal, DSB signal]
Speaker Embedding Clusters
[Figure: cluster allocation showing Cluster 1, Cluster 2, and the background cluster]
[Audio samples for clusters 1 and 2: reference microphone, masked reference signal, DSB signal]
Discussion
Here, both the Mod-MFCC based features and the speaker embedding features give good cluster allocations.
Note, however, that the speaker embedding features result in a larger cluster for source 1.