If this is the underlying reason why SegWit is being delayed... that is pretty deplorable.

Probably too late now for bitcoin, but maybe it would be good to pre-mix the block header bits around before it even enters the SHA256 hash.  Not sure if best to use a hardcoded map, or to make the map with the tx merkle root as a seed.  Depends on how hard it is to find good nonce (etc) bit location collisions.

Maybe gmaxwell's solution is good enough for this particular problem... but the above recommendation might help improve bitcoin's available remaining puzzle difficulty.

Another thing that could be done is increase the number of times SHA256 is performed... but now we are really talking about altering the PoW algorithm.  Correct me if I'm wrong: The more number of times its performed, the less any patent-able pre or post calculation skipping/caching have an effect on efficiency.

Cheers,
Praxeology Guy