Good questions!
You could try both a single network with three output heads (steering, throttle, brake) and three separate networks, then compare the results. My guess is that one network is the way to go, since the features the model needs to learn for steering are probably similar to the ones it needs to learn for braking.
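To make the single-network option concrete, here is a rough NumPy sketch of a shared layer feeding three small output heads. The layer sizes, the names, and the random weights are all placeholders I made up for illustration; a real model would have convolutional layers in front and would be built and trained in a framework like Keras or PyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 128 flattened image features, 64 shared hidden units.
N_FEATURES, N_HIDDEN = 128, 64

# Shared trunk weights, used by every head.
W_shared = rng.normal(scale=0.1, size=(N_FEATURES, N_HIDDEN))
b_shared = np.zeros(N_HIDDEN)

# One small linear head per control output.
heads = {
    name: (rng.normal(scale=0.1, size=(N_HIDDEN, 1)), np.zeros(1))
    for name in ("steering", "throttle", "brake")
}


def forward(features):
    """Map a batch of flattened features to three control predictions."""
    hidden = np.maximum(0.0, features @ W_shared + b_shared)  # shared ReLU layer
    return {name: hidden @ W + b for name, (W, b) in heads.items()}


batch = rng.normal(size=(4, N_FEATURES))  # stand-in for features from 4 frames
outputs = forward(batch)
for name, values in outputs.items():
    print(name, values.shape)
```

The three-network alternative would simply repeat the trunk once per output, so nothing learned for steering would be shared with braking.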
You should try adding steering as an input to the network by concatenating it with the flattened output of the convolutional layers. I bet it helps!
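Roughly, feeding steering in after the flatten step could look like the sketch below. The helper name `merge_steering` and the feature-map shape are my own assumptions, not something from your model; in Keras or PyTorch you would do the same thing with the framework's flatten and concatenate operations.

```python
import numpy as np


def merge_steering(conv_maps, steering):
    """Flatten conv feature maps and append the steering angle as one extra feature."""
    batch_size = conv_maps.shape[0]
    flat = conv_maps.reshape(batch_size, -1)  # (batch, H*W*C)
    steering = np.asarray(steering, dtype=flat.dtype).reshape(batch_size, 1)
    return np.concatenate([flat, steering], axis=1)  # (batch, H*W*C + 1)


conv_maps = np.zeros((2, 3, 5, 8))  # hypothetical conv output for 2 frames
steering = [0.1, -0.25]             # one steering angle per frame
merged = merge_steering(conv_maps, steering)
print(merged.shape)
```

The merged vector then goes into the dense layers as usual, so they see both the image features and the current steering angle.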
You should ping my now-colleague Anthony Navarro, who is giving an upcoming talk on the self-racing car effort!