That's true. It could perhaps be represented as "I keep the last N blocks". Most likely the policy of any given node doesn't change all that fast, so if you know the best chain height you can calculate which nodes have which blocks.
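To make the calculation concrete, here is a minimal sketch. It assumes each peer advertises a single number N meaning "I keep the last N blocks"; the function names and the peer table are hypothetical and not part of any existing protocol or client.

```python
from typing import Dict, List


def serves_block(best_height: int, keep_last_n: int, block_height: int) -> bool:
    """Return True if a node keeping the last `keep_last_n` blocks has `block_height`.

    Assumption: "last N blocks" means heights best_height - N + 1 .. best_height.
    """
    if block_height < 0 or block_height > best_height:
        return False
    lowest_kept = max(0, best_height - keep_last_n + 1)
    return block_height >= lowest_kept


def peers_serving(peers: Dict[str, int], best_height: int, block_height: int) -> List[str]:
    """List the peers (name -> advertised N) expected to have the given block."""
    return [name for name, n in peers.items()
            if serves_block(best_height, n, block_height)]


# Hypothetical peers: "a" keeps 100 blocks, "b" keeps 2000, "c" advertises zero.
peers = {"a": 100, "b": 2000, "c": 0}
candidates = peers_serving(peers, best_height=1000, block_height=500)
```

A node advertising N = 0 never appears in the result, so it is never asked for blocks.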
 
Disconnecting when a peer requests something that isn't served seems like acceptable behaviour, yes. A specific message indicating that the data is pruned may be more flexible, but also more complex to handle.

Well, old nodes would ignore it and new nodes wouldn't need it?
 
The reason for splitting them is that I think over time these may be handled by different implementations. You could have stupid storage/bandwidth nodes that just keep the blockchain around, and others that validate it. Even if that doesn't happen implementation-wise, I think these are sufficiently independent functions to start thinking about them as such.

Maybe so. With a "last N blocks" value in addr messages, though, such nodes could just set their advertised history to zero and not have to deal with serving blocks to other nodes.

If you have a node that serves the chain but doesn't validate it, how does it know what the best chain is? Just whatever the hardest is?
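If "hardest" is read as "most cumulative proof-of-work", the choice could be sketched as below. This is only an illustration under that assumption: the chain names and targets are made up, and a real node would also have to check that each header actually meets its target.

```python
from typing import Dict, List


def block_work(target: int) -> int:
    """Expected number of hashes needed to find a block at this target."""
    return (1 << 256) // (target + 1)


def hardest_chain(chains: Dict[str, List[int]]) -> str:
    """Pick the chain (name -> list of per-block targets) with the most total work."""
    return max(chains, key=lambda name: sum(block_work(t) for t in chains[name]))


# Hypothetical candidates: three easy blocks versus one much harder block.
chains = {
    "long_easy": [2**240, 2**240, 2**240],
    "short_hard": [2**230],
}
best = hardest_chain(chains)
```

Note that the longer chain is not necessarily the harder one; here the single block at the lower target outweighs the three easy ones.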