Alternatively I think instead of displaying a meaningless number we ought
to go by a percentage (the double spend improbability) and go by
'confidence'.

That is a great idea, and not too hard to implement.  A bit of code can determine over the last N blocks, how many blocks that were at the current depth of the present transaction were orphaned and divide that by the total number of blocks solved (orphaned or not) while those N blocks were solved.  That's the historical number (H), and then the "51% attack" number (A) can make an explicit assumptions like "Assuming a bad actor has 51% of the hashing power for 24 hours starting right now, the block holding this transaction has an X% chance of being orphaned."  Report "# confirmations" as "99.44% confidence" using [100% - max(H,A)].