This is a good explanation!
I should have clarified, though. I understand in principle what a 1x1 convolution does. It’s more that I didn’t understand why MultiNet applies it in this precise place.
I’ve thought it through, and I think I get it now.