We humans subconsciously listen to our own voices to make sure our mouths are making the noises we want them to. This is why hearing yourself speak with a short delay, even a fraction of a second, can completely crash your brain.
In short, the brain relies on a fine-grained feedback loop, fed by sound from the ears, to modulate speech, form phonemes, and so forth. When that feedback arrives late, the brain can't tell what went wrong but still tries to correct for it. This is why singers need foldback speakers (stage monitors) or an earpiece when singing with a microphone. Without them, it is difficult for a singer to know whether they're singing in tune.
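
To see why a late signal is so disruptive, here is a toy sketch of a feedback loop that nudges a sung pitch toward a target based on what it "hears". It is not a model of the brain: the function names, the gain of 0.6, the 440 Hz target, and the step counts are all assumptions picked for illustration. With no delay the corrections settle on the target; with a delay of a few steps, each correction reacts to stale information and the pitch overshoots more and more.

```python
def simulate_pitch(delay_steps, gain=0.6, target=440.0, start=400.0, steps=60):
    """Return the pitch produced at each step of a simple correction loop.

    Hypothetical toy model: at every step the singer hears the pitch
    produced `delay_steps` ago and corrects by `gain` times the error.
    """
    pitches = [start]
    for t in range(steps):
        # What is heard now is what was produced delay_steps ago
        # (or the starting pitch, if nothing that old exists yet).
        heard = pitches[max(0, t - delay_steps)]
        error = target - heard
        # Correct the current output using the (possibly stale) error.
        pitches.append(pitches[-1] + gain * error)
    return pitches


def worst_late_error(pitches, target=440.0, tail=20):
    """Largest distance from the target over the last `tail` steps."""
    return max(abs(p - target) for p in pitches[-tail:])


if __name__ == "__main__":
    for delay in (0, 5):
        err = worst_late_error(simulate_pitch(delay))
        print(f"delay={delay} steps: worst late error = {err:.3g} Hz")
```

The only thing that changes between the two runs is the delay. The correction rule is identical, which mirrors the point above: the loop keeps trying to fix itself, but it is fixing a mistake that is already out of date.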