Are you assuming no network protocol changes?

At root, the requirement is that peers can prove their total chain POW.

Since each block has the height in the coinbase, a peer can send a short proof of height for a disconnected header and could assert the POW for that header.

Each peer could send the the N strongest headers (lowest digest/most POW) for their main chain and prove the height of each one. 

The total chain work can be estimated as N times the POW for the lowest in the list.  This is an interesting property of how POW works.  The 10th best POW block will have about 10% of the total POW.

The N blocks would be spread along the chain and the peer could ask for all headers between any 2 of them and check the different in claimed POW.  If dishonesty is discovered, the peer can be banned and all info from that peer wiped.

You can apply the rule hierarchically.  The honest peers would have a much higher POW chain.  You could ask the peer to give you the N strongest headers between 2 headers that they gave for their best chain.  You can check that their height is between the two limits.

The peer would effectively be proving their total POW recursively.

This would require a new set of messages so you can request info about the best chain.

It also has the nice feature that it allows you to see if multiple peers are on the same chain, since they will have the same best blocks.

The most elegant would be something like using SNARKS to directly prove that your chain tip has a particular POW.  The download would go tip to genesis, unlike now when it is in the other direction.

------------------------------------------------------------------------

In regard to your proposal, I think the key is to limit things by peer, rather than globally.

The limit to header width should be split between peers.  If you have N outgoing peers, they get 1/N of your header download resources each.

You store the current best/most POW header chain and at least one alternative chain per outgoing peer. 

You could still prune old chains based on POW, but the best chain and the current chain for each outgoing peer should not be pruned.

The security assumption is that a node is connected to at least one honest node.

If you split resources between all peers, then it prevents the dishonest nodes from flooding and wiping out the progress for the honest peer.

- Message Limiting -

I have the same objection here.  The message limiting should be per peer.

An honest peer who has just been connected to shouldn't suffer a penalty.

Your point that it is only a few minutes anyway may make this point moot though.