On Mon, Jan 8, 2018 at 7:39 AM, Pavol Rusnak via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
> On 08/01/18 05:22, Gregory Maxwell wrote:
>>> https://github.com/satoshilabs/slips/blob/master/slip-0039.md


>> The 16-bit "checksum" based on sha2 seems pretty poor since basing
>> small checksums on a cryptographic hash results in a fairly poor
>> checksum that is surprisingly likely to accept an errored string. Your
>> wordlist is 10 bits and you have much less than 1023*10 bits of input,
>> so you could easily have a 20 bit code (two words) which guaranteed
>> that up to two errored words would always be detected, and probably
>> could choose one which catches three words much more often than 1:2^20
>> (sipa's crc tools can help find codes like this).

> Originally, we wanted to use 16 bits of CRC32 as the checksum, but after
> the discussion with Daan Sprenkels we were advised to change this to a
> cryptographically strong function. The argument was that CRC32 contains
> less entropy and mixing high-entropy data (secret) with low-entropy data
> (checksum) is not a good idea.

This entropy argument seems confused.  The entropy of a checksum is the sum over all possible checksum values, i, of -p_i*log(p_i), where p_i = n_i/N and n_i is the number of times the i-th checksum value occurs over the space of all N possible inputs being checksummed.  In this application the checksum is being applied to a fixed m-bit blob of uniformly random data.
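
To make that concrete, here is a tiny Python sketch (mine, purely illustrative) that computes the entropy of a checksum directly from the counts n_i:

    import math

    def checksum_entropy(counts):
        # Shannon entropy (in bits) of a checksum whose i-th value occurs
        # counts[i] times over the space of all possible inputs.
        total = sum(counts)
        return -sum((n / total) * math.log2(n / total) for n in counts if n)

    # A 16-bit checksum whose values all occur equally often carries the
    # full 16 bits of entropy:
    print(checksum_entropy([1] * (1 << 16)))  # 16.0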

The entropy is maximized when every possible checksum value occurs equally frequently, that is, we achieve maximum entropy when all the n_i values are equal to each other.  Any error correcting code worth its salt will try to achieve this property, because the designers want every checksum value to have as much error correcting power as every other checksum value.  I'm almost certain that the algebraic properties of your typical error correcting codes allow you to prove that maximum entropy is perfectly achieved whenever the data-blob size is at least as large as the checksum size.
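
A quick experiment along these lines: checksum every 18-bit input both with a 16-bit CRC and with 16 bits of SHA-256, and compare how the checksum values are distributed.  The CRC polynomial and the 18-bit input size below are arbitrary choices of mine for illustration, not anything taken from SLIP-0039:

    import hashlib
    from collections import Counter

    def crc16(value, nbits, poly=0x1021):
        # Bit-by-bit CRC-16 (CCITT polynomial) of an nbits-bit integer,
        # MSB first, zero initial value -- a linear code.
        reg = 0
        for i in reversed(range(nbits)):
            reg ^= ((value >> i) & 1) << 15
            if reg & 0x8000:
                reg = ((reg << 1) ^ poly) & 0xFFFF
            else:
                reg = (reg << 1) & 0xFFFF
        return reg

    def sha16(value, nbits):
        # First 16 bits of SHA-256 of the same integer.
        data = value.to_bytes((nbits + 7) // 8, 'big')
        return int.from_bytes(hashlib.sha256(data).digest()[:2], 'big')

    NBITS = 18  # any input size of at least 16 bits will do
    crc_counts = Counter(crc16(v, NBITS) for v in range(1 << NBITS))
    sha_counts = Counter(sha16(v, NBITS) for v in range(1 << NBITS))

    # The linear code hits all 65536 checksum values exactly 2^(18-16) = 4
    # times each, i.e. it achieves exactly the maximum 16 bits of entropy.
    print(len(crc_counts), set(crc_counts.values()))   # 65536 {4}

    # The truncated hash misses some values entirely and hits others more
    # than their share, so its entropy falls slightly short of 16 bits.
    print(len(sha_counts), max(sha_counts.values()))

The perfect uniformity of the CRC isn't an accident of this particular polynomial; it follows from the linear structure of the code, which is exactly the kind of structure you give up when you truncate a hash.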

Meanwhile the truncated value of a cryptographic hash function is expected to be slightly under the maximum entropy value, under the assumption that the hash function behaves like a random function.

The main properties of a "strong cryptographic hash function" are that it is infeasible to find collisions and preimages.  However, these properties are lost when you truncate the hash down to 16 bits.  At this point it is entirely feasible to find collisions and preimages.
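
For example, a brute-force loop over nothing more than a counter finds a preimage of a 16-bit truncation of SHA-256 in about 2^16 attempts (the messages here are arbitrary):

    import hashlib

    def sha16(data):
        # First 16 bits of SHA-256.
        return hashlib.sha256(data).digest()[:2]

    target = sha16(b'some fixed message')

    # Try successive counters until one matches the target value;
    # on average this takes about 2^16 = 65536 attempts.
    i = 0
    while sha16(b'attempt %d' % i) != target:
        i += 1
    print(i)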

So using a truncated cryptographic hash function doesn't provide you with more entropy (and, in fact, probably a sliver less entropy), and doesn't provide you with any of the benefits of a strong cryptographic hash function.
 
> Also, there is an argument between using a checksum and an ECC. We
> discussed that ECC might not be a good idea, because it helps the
> attacker to compute missing information, while we only want to check
> for integrity. Also, the word mnemonic is itself an ECC, because if you
> see the word "acadornic" it is probably the word "academic".

Every checksum is error correcting.  Given a failed checksum, all you have to do is search the space of edits around the failed string for the smallest set of edits that yields a valid checksum.  With a 16-bit checksum one can expect to find a nearby string that passes the checksum within about 2^16 trials, even when using a truncated hash function.
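
As a toy illustration (my own simplified model of a share, not the actual SLIP-0039 encoding): take a list of 10-bit word indices with 16 bits of SHA-256 as its checksum, erase one word, and recover it by trying all 1024 possibilities against the checksum:

    import hashlib, random

    def checksum(words):
        # 16-bit checksum of a list of 10-bit word indices, from SHA-256.
        data = b''.join(w.to_bytes(2, 'big') for w in words)
        return hashlib.sha256(data).digest()[:2]

    random.seed(1)
    words = [random.randrange(1024) for _ in range(20)]
    check = checksum(words)

    # Pretend word 7 was lost, then search all 1024 candidates for it.
    damaged = list(words)
    candidates = []
    for guess in range(1024):
        damaged[7] = guess
        if checksum(damaged) == check:
            candidates.append(guess)

    # With a 16-bit checksum the surviving candidate is almost always unique.
    print(candidates, words[7])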

What an error-correcting code gives you isn't the ability to correct errors, which, as we have seen, is something that all short checksums provide; rather, it provides *guarantees* about the ability to detect (and correct) certain common classes of errors.  For example, we can have an ECC that guarantees to find the error where a word is accidentally written down twice (see https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-January/015506.html).
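
Here is a sketch of that difference, again with a made-up share format and an off-the-shelf CRC-16 rather than anything from the actual proposals.  The guarantee it shows (every single-word substitution is detected) is much weaker than the two-word guarantee Gregory mentions, but it is enough to show the difference in kind:

    import hashlib, random

    def crc16(words, poly=0x1021):
        # Bit-by-bit CRC-16 over the 10-bit word indices (a linear code).
        reg = 0
        for w in words:
            for i in reversed(range(10)):
                reg ^= ((w >> i) & 1) << 15
                if reg & 0x8000:
                    reg = ((reg << 1) ^ poly) & 0xFFFF
                else:
                    reg = (reg << 1) & 0xFFFF
        return reg

    def sha16(words):
        # First 16 bits of SHA-256 over the same word indices.
        data = b''.join(w.to_bytes(2, 'big') for w in words)
        return int.from_bytes(hashlib.sha256(data).digest()[:2], 'big')

    def undetected_substitutions(check, words):
        # Count the single-word substitutions the checksum fails to notice.
        original, misses = check(words), 0
        for pos in range(len(words)):
            for wrong in range(1024):
                if wrong != words[pos]:
                    mutated = list(words)
                    mutated[pos] = wrong
                    if check(mutated) == original:
                        misses += 1
        return misses

    random.seed(2)

    # The CRC detects every single wrong word in any share.
    share = [random.randrange(1024) for _ in range(10)]
    print(undetected_substitutions(crc16, share))   # 0

    # The truncated hash offers no such guarantee: each substitution escapes
    # with probability about 2^-16, so scanning random shares finds misses.
    total = 0
    for _ in range(100):
        share = [random.randrange(1024) for _ in range(10)]
        total += undetected_substitutions(sha16, share)
    print(total)                                    # almost certainly > 0

The zero for the CRC is not luck: a single wrong 10-bit word is a burst error shorter than the degree-16 polynomial, so detection is guaranteed for every possible share.  The truncated hash can only promise that each substitution slips through with probability around 2^-16.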

The advice you have been given will only result in losing any guarantees about detecting common classes of errors; it won't stop attackers from recovering missing information, and it won't provide a cryptographically strong function.