Hi Weiji, and Everyone,

I think this is an important topic so sharing my two cents in case in
helps: It makes sense for users to know that they can't merely just
translate a word from one language into another and expect the same
underlying entropy to be mapped, as the wordlists are not the same (i.e.
words differ at the same index values across languages).

However, while the words for each language cannot translate directly to
their equivalent in another language, in terms of entropy (bits), the
underlying entropy is, in fact, the same, when comparing mnemonics
generated across languages (see English/Spanish comparison below) when
sourced from the same initial entropy.

Importantly, the entropy is a pre-image of the resulting mnemonic and
doesn't change as the language changes, where the only changes are to the
resulting words which depend on the language chosen, for a given entropy
string. Ideally, the wallet/software should deal with these nuances, I
don't think the protocol needs any revision (except for how the BIP39 seed
is derived, perhaps), even if someone made up their own wordlist, as long
as the wallet/software has a copy of it to map those words to the
underlying index values, it's *those underlying index values and the
entropy they map too is what really matters**. *

I fully support the idea for users to back up this pre-image (initial
entropy) as it can also be used to check the validity of the mnemonic and
check that it mapped correctly, see Ian Coleman's BIP39 tool which shows
index values, a feature that I proposed last year and was since
implemented. Below is an example of how two mnemonics generated with the
same entropy will produce different BIP39 seeds.

* Example initial entropy of 128 bits +4 bit checksum derived from hash of
byte array: *

10001101000 01010100100 11011010000 11100001101 01010001101 00010010001
01100000010 10101110100 00100100011 11110000111 01100011010 1100010 (+1110
checksum)

*In English*: minimum fee sure ticket faculty banana gate purse caught
valley globe shift

The same initial entropy above (all 132 bits) produces this mnemonic:

*In Spanish*: mercado faja soledad tarea evadir aries gafas peine búho
tumor gerente reja

And the underlying index values below are the same for both the English and
Spanish mnemonics above:

Word Indexes: 1128, 676, 1744, 1805, 653, 145, 770, 1396, 291, 1927, 794,
1582

*ISSUE AT HAND*:  While the initial entropy is the same, and word indexes
the same for a given entropy, (i.e. same pre-image), the resulting BIP39
seed is not the same when comparing the above English mnemonic with its
Spanish counterpart:

   - *English BIP39 seed:*
   ce7618075099c89e986f18dc495daa3be190450ed07bef77d4334a54dbc1cd7e205797ffed2615ac0999a5d691f65bf316e2cdbfd2c9d7d90b03e77ff1e6a6f5
   - *Spanish BIP39 seed*:
   9f164de0fb09af51b5831886e424d6d2479d49b5e5a1b28f5c09467ea36089b144cd94bb9b636b3c27ccff96a8958e5b7ce43cf1dea81423fc66fa7fef0aea2c


*Option 1:* Without changing anything in terms of the entropy
generation/mapping process in the BIP39 spec, the wallet/client-side
software would ideally recognize the language and show the corresponding
index value per wordlist, and reverse-calculate the entropy and then re-map
it to the language selected.

*Option 2*: Perhaps a revision is needed to how the BIP39 seed is generated
in the first place, such as by hashing the entropy instead of the words.
Any thoughts on how viable that could be where the initial entropy is fed
into the PBKDF2 function and not the words?

*Closing thoughts and tiny checksum nitpick: *

      - The multiple BIP39 seeds per language lend some similarities to
BIP44 multi-account, so perhaps this can be an advantage, depends on how it
is applied in UI/UX's (compared to having one BIP39 seed regardless of
language, for a given initial entropy).
      - There is perhaps an opportunity to add greater detail to the BIP39
spec in terms of standards/best-practices for computing checksum values, as
some software may be hashing bits, versus hashing bytes, or hashing the
entropy as a hex string, etc.. for a given entropy, which will result in
different checksum values for the same "valid" mnemonic, that might not be
"valid" in another wallet which may format the data differently before
hashing to compute the checksum.


Best regards,

Steven Hatzakis

_______________
[bitcoin-dev] BIP- & SLIP-0039 -- better multi-language support*Weiji
Guo* weiji.g
at gmail.com
<bitcoin-dev%40lists.linuxfoundation.org?Subject=Re:%20Re%3A%20%5Bbitcoin-dev%5D%20%20BIP-%20%26%20SLIP-0039%20--%20better%20multi-language%20support&In-Reply-To=%3CCA%2Bydi%3DLM%2Bq-9WKewb%3D65tWCqM1cPMoWEeWq5XAxdqg4rz%3DZJ6g%40mail.gmail.com%3E>
*Tue Nov 6 16:16:41 UTC 2018*


   - Previous message: [bitcoin-dev] draft proposal: change forwarding
   (improved fungibility through wallet interoperability)
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/016469.html>
   - Next message: [bitcoin-dev] Considering starting a toy full-node
   implementation. Any advice?
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/016470.html>
   - *Messages sorted by:* [ date ]
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/date.html#16468>
    [ thread ]
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/thread.html#16468>
    [ subject ]
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/subject.html#16468>
    [ author ]
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/author.html#16468>

------------------------------

Hello everyone,

I just realized that BIP-0039 is language dependent. I was assuming the
other way till I looked closer. The way the seed is derived from a BIP-0039
entropy, as is shown below, depends on which language to generate the
mnemonic sentence:

   Entropy <=> Mnemonic Sentence => PBKDF2 => BIP-0032 Seed

Therefore when a user choose a non-English mnemonic code he or she is stuck
with that language. Meanwhile only a few native languages are supported.

SLIP-0039 does not solve this issue in a user friendly way by providing
only an English wordlist. That's understandable as it aims to provide SSS
capability. However those users who do not speak English or recognize
English words will suffer.

What I am trying to bring to attention of the community is that, no matter
if we make a new version of BIP-0039, or a new BIP (with SSS support), or
to enhance SLIP-0039, we really need to address this language issue.

Here are what I propose:

1. The mnemonic code should be only a representation of underlying entropy
or (pre) master secret, seed, whatever. In this way, the same seed/secret
could be displayed in English or in Chinese or other languages. Then there
could be 3rd party conversion tools to support translations in case any
wallet software or device does not support all specified languages. Now it
looks like:

   Mnemonic Sentence <=> Entropy => PBKDF2 => BIP-0032 Seed

2. Given that only 8 languages are supported in BIP-0039, we should allow
the seed/secret to be represented in decimal numbers, each ranging from 0
to 2047. So those who cannot find a native language support yet having
difficulty coping words in other languages could choose to just use numbers.

So far I don't have a preference how this should be implemented. I'd like
to hear from community first.

Thanks,

Weiji Guo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20181107/88677440/attachment.html>

------------------------------


   - Previous message: [bitcoin-dev] draft proposal: change forwarding
   (improved fungibility through wallet interoperability)
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/016469.html>
   - Next message: [bitcoin-dev] Considering starting a toy full-node
   implementation. Any advice?
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/016470.html>
   - *Messages sorted by:* [ date ]
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/date.html#16468>
    [ thread ]
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/thread.html#16468>
    [ subject ]
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/subject.html#16468>
    [ author ]
   <https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/author.html#16468>

------------------------------
More information about the bitcoin-dev mailing list
<https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>