For this example, the room measures 3 x 4 x 3 m and the reverberation time (RT60) is 0.3 s. The critical distances of the two speakers overlap here.
This makes for a more challenging setup than the previous examples.
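To get a feel for the scale involved, the critical distance can be estimated from the room volume and the RT60 with the common rule of thumb d_c ≈ 0.057 · sqrt(Q · V / RT60). The sketch below is only an illustration: the omnidirectional source (Q = 1), the function names, and the two speaker positions are assumptions and are not taken from the experiment.

```python
import math


def critical_distance(volume_m3, rt60_s, directivity=1.0):
    """Rule-of-thumb critical distance in metres.

    Uses d_c ~ 0.057 * sqrt(Q * V / RT60); directivity=1.0 assumes an
    omnidirectional source (an assumption, not stated in the text).
    """
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)


def critical_regions_overlap(pos_a, pos_b, radius):
    """True if two spheres of equal radius around the sources intersect."""
    return math.dist(pos_a, pos_b) < 2.0 * radius


# Room parameters from the text: 3 x 4 x 3 m, RT60 = 0.3 s.
room_dims = (3.0, 4.0, 3.0)
rt60 = 0.3
volume = math.prod(room_dims)  # 36 m^3

d_c = critical_distance(volume, rt60)
print(f"estimated critical distance: {d_c:.2f} m")

# Hypothetical speaker positions (NOT from the experiment), 1 m apart.
speaker_a = (1.0, 1.5, 1.5)
speaker_b = (2.0, 1.5, 1.5)
print("critical regions overlap:",
      critical_regions_overlap(speaker_a, speaker_b, d_c))
```

Under these assumptions the estimate comes out at roughly 0.62 m, so two speakers closer than about 1.25 m to each other would have overlapping critical-distance regions.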
Mod-MFCC Based Clusters
[Figure panels: Cluster 1, Cluster 2, Background cluster]
[Table, one row per cluster (1, 2), with columns: reference microphone, masked reference signal, DSB signal]
Speaker Embedding Clusters
[Figure panels: Cluster 1, Cluster 2, Background cluster]
[Table, one row per cluster (1, 2), with columns: reference microphone, masked reference signal, DSB signal]
Discussion
Even in this challenging case, the speaker embeddings prove to be good clustering features and yield plausible clusters.
In contrast, the clusters obtained with the Mod-MFCC features show no discernible pattern.
In this example, the Mod-MFCC features appear to cluster according to the signal-to-interference-plus-noise ratio (SINR) rather than speaker-specific characteristics.
Also note that the clustering based on speaker embeddings avoids selecting microphones located in the region where the speakers' critical distances overlap.