On Mon, Nov 4, 2013 at 3:26 PM, Peter Todd <pete@petertodd.org> wrote:
The attacker now only needs to connect to every identified miner
with especially fast nodes. With judicious use of DoS attacks and low
latency .....

So you're back to a complicated sybil attack. I don't follow your thought process here - I didn't say anything about numerical advantage. The attack outlined in the paper requires you to be able to race the rest of the network and win some non-trivial fraction of the time. If you can't do that then all it means is that when you try to release a private block to compete with the other found block, you're quite likely to lose and you sacrifice the block rewards by doing so.
 
The correct, and rational, approach for a miner is to always mine to
extend the block that the majority of hashing power is trying to extend.

There's no stable way to know that. The whole purpose of the block chain to establish the majority. I think your near-miss headers solution is circular/unstable for that reason, it's essentially a recursive solution.
 
Mining strategy is now to mine to extend the first block you see, on the
assumption that the earlier one probably propagated to a large portion
of the total hashing power. But as you receive "near-blocks" that are
under the PoW target, use them to estimate the hashing power on each
fork, and if it looks like you are not on the majority side, switch.

But you can't reliably estimate that. You can't even reliably estimate the speed of the overall network especially not on a short term basis like a block interval.