Is there a reason such a validation check is a bad idea? We already have OP_RETURN for storing arbitrary data, which is limited to 80 bytes.

One reason not to ban storing arbitrary/non-functional data is that people will still want to store things, so they will start (ab)using fields meant for useful data to do so, which is worse. See Stamps[1], which stores Inscription-like data in fake outputs that consume UTXO set storage (using the Counterparty spec IIRC).

The UTXO set getting 'too big' is a much bigger problem than the chain growing at closer to 4MB per 10 minutes instead of the 'expected' ~1MB per 10 minutes (some nuance/argument to be had here, though).

Was it an oversight that arbitrary data can be inserted between OP_FALSE and OP_IF when the size limit for witness scripts was lifted as part of taproot?

Kinda? But if we want Taproot to enable large useful scripts, it's probably hard/impossible to come up with an undefeatable definition of 'not useful' to filter on. You could say "scripts must not have any unreachable code (dead code)", but then it'd be easy to come up with Inscriptions 2.0 where the code is reachable but never used. Rinse and repeat in a game of whack-a-mole.
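
To make the whack-a-mole point concrete, here's a minimal Python sketch of a naive filter. It's purely illustrative: scripts are modelled as lists of tokens rather than parsed from real script bytes, and the names `has_false_if_envelope`, `envelope` and `push_drop` are hypothetical ones of mine, not anything from Bitcoin Core or the Ordinals code. The second script assumes data is simply pushed and then dropped, which is one possible shape an 'Inscriptions 2.0' could take.

```python
# Illustrative model only: a script is a list of tokens, where a str is an
# opcode name and a bytes value is a data push. Not a real script parser.

def has_false_if_envelope(script):
    """Return True if OP_FALSE (or OP_0) is immediately followed by OP_IF."""
    return any(
        a in ("OP_FALSE", "OP_0") and b == "OP_IF"
        for a, b in zip(script, script[1:])
    )

payload = b"arbitrary data"
pubkey = bytes(32)  # placeholder key, assumption for illustration

# Data hidden in a branch that is never taken (dead code).
envelope = [pubkey, "OP_CHECKSIG", "OP_FALSE", "OP_IF", payload, "OP_ENDIF"]

# Same data, but pushed and dropped: every opcode here is reachable.
push_drop = [payload, "OP_DROP", pubkey, "OP_CHECKSIG"]

print(has_false_if_envelope(envelope))   # True
print(has_false_if_envelope(push_drop))  # False
```

The filter catches the first script and misses the second, even though both carry the same payload. Extending it to catch push-then-drop would just prompt the next variation.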

In my opinion, it'd be wise to not incentivize people to do something worse by attempting to censor what they're currently doing, given that it could be a fair bit worse!

[1]: https://github.com/mikeinspace/stamps/blob/main/BitcoinStamps.md

Angus