Hi Michael,
Comments inline.

On Sat, Apr 10, 2021 at 7:34 PM Michael.flaxman <michael.flaxman@protonmail.com> wrote:

Hi Hugo,

I appreciate the effort you and everyone else is making to improve multisig in bitcoin!

Thanks.

I like that this BIP gets rid of SLIP132 version bytes, as those have been de-facto deprecated in favor of output descriptors for some time. Having a standard for how to communicate descriptor records (BSMS 1.0) also seems like a nice positive.

The most commonly raised issues from the 10x security guide are about how to properly verify that all hardware wallets are participants in the user's multisig quorum (and with the correct m-of-n). This shows up in two big ways:
The O(n^2) xpub validation problem creates a bad UX and is hard for non-advanced users.
The risk for stateless hardware wallets (like Trezor) to have their xpubs swapped out by a compromised Coordinator.
Unfortunately, this BIP does not improve either of these issues, while adding considerable complexity.

1. O(n^2) Xpub Validation

The proposed use of an output descriptor checksum has an obvious 40-bit MITM collision attack. A compromised Coordinator could trick a Signer into displaying an attacker's receive address, despite a correctly functioning Signers and the user properly validating the checksum (github link).

Using a checksum with much higher entropy would reduce xpub validation to O(n) and create a very nice UX for signers. This would be a huge win for multisig! Instead, the recommended solution from the BIP is to validate all the key records manually, which is how multisig is currently done and what we desperately want to move away from. With a proper checksum, there’s no reason for a user to ever see an xpub.

Users should not be shown a checksum and asked to validate it in meatspace (across Signers) if an attacker’s address could still be substituted! Validating a single address across devices does solve this problem, but if you’re going to validate an address there’s no reason to display the checksum at all. However, validating an address is confusing to non-experts:
Is it a wallet ID or a bitcoin address?
Am I supposed to send funds to this address?
If creating a new checksum standard for the descriptor record is undesirable, we could use a child address (from an unhardened BIP32 path) and encode that in some way for end-users to verify it matches across all Signers. It would be strongly preferable for the encoding to be an unambiguously different format from a bitcoin address / BIP39 seed phrase, so that it’s clear it’s just a wallet ID. One non-ideal but simple solution is to use a hash function (i.e. dsha256) to calculate the digest of the child address, and display this in hexadecimal format. While hexadecimal is non-ideal for manual verification, it is already trivial for any bitcoin library to perform these steps.

As I have responded to your previous comment about the same on Github (https://github.com/nunchuk-io/bips/pull/1), I do see the value of a longer checksum.

There are trade-offs when it comes to designing checksum. Mainly complexity and size. At one end of the spectrum you can have a single-byte XOR checksum. At the other end you can have something like HMAC-SHA256 (which we are using in the proposal to calculate the MACs for the key and descriptor records). And then there's everything in between. But we should know that nothing comes for free.

It's a good topic that warrants further discussion.

Confirming a single address is a promising direction, since it's something the user should do anyway prior to using the wallet. Currently the proposal recommends that the Signers show a preview of the first address(es) upon wallet creation. But we can elevate this and make it a mandatory part of the spec: have all Signers confirm that they have the same 1st receive address. If we go with this approach, the checksum can stay as-is, and only there for error detection. (We get the checksum for free with the descriptor language anyway, so there's no reason to remove it).

Also nice to see that you have come around and agree that moving away from manual inspection is desirable.

2. Allow Support for Stateless Wallets

The current BIP states:

"If all checks pass, the Signer must persist the descriptor record in its storage."

While persistence has a lot of benefits, it is not a feature of the most sold multisig hardware wallet: Trezor. A simple solution here is to have each Signer sign the entire descriptor record at the end of round 2, not just its own key record in round 1. Then the data can be stored anywhere (including on the Signer itself) and played back to each Signer for validation when needed. The end-user would have no idea this was happening, but the device could refuse to display information it hasn’t fully validated (or at least add a warning message). Even a device with persistent storage would be better served using a signature, so that an evil maid couldn't tamper with the device (say in the no-encryption case for simplicity).

I reiterate that I strongly disagree that going stateless is the direction we want to pursue when it comes to multisig.

In a multisig or any type of MPC smart contract, any Signer in the contract must know who the other co-Signers are at all times. You can choose to do this verification once at setup and persist this info on the Signer, or you'd have to re-do the verification for every single transaction. There is no other choice.

Signing the descriptor record is insufficient, while also introducing a great deal of complexity. Here are the problems:
1) The signature needs to be stored somewhere. Who stores it if it's not the Signer itself? What if it gets lost? (If the Signer stores its own signature, then the scheme is no longer stateless. You might as well store the full descriptor).
2) When the signature is "played back" to the Signer, a copy of the original descriptor must be included. Who stores the descriptor? What if it gets lost? This is an under-appreciated aspect of the stateful approach: every participant in the multisig has a full copy of the original contract, which adds resilience to the wallet backup / recovery process.
3) Because the full descriptor must be "played back" for every single transaction, this means every detail of the contract must be shared again and again, indefinitely. Not only does this add overhead (engineering and cognitive) to the spending process, it has massive privacy implications, since the descriptor contains everything you need to know about the wallets and its participants.

Here's an analogy in the physical world. Would you:
a) Enter any type of written contract and
b) Not keep a copy of the contract, forget about it, and
c) Later on rely on your counter-parties or a third-party to provide you with the original contract and your signature, when the terms get carried out?

One would be insane to enter such a contract in the real world.

I realize that some vendors are currently not stateful, but I take this as an unfortunate fact, because multisig wasn't a priority when these hardware were originally designed. But that is no reason to keep going with a broken architecture. The industry is green enough that we still can learn and recover from these sorts of flaws.

Since you mentioned Trezor, I want to thank Pavol in particular here, because as Trezor CTO Pavol knows best that Trezor is currently stateless, but he's still on-board with the general idea here, AFAIU.

Bottom line: IMO, signers in a multisig MUST be stateful.

This existing vulnerability in stateless wallets is particularly bad for hosted multisig services like Casa/Unchained, where the service might control m-1 keys. It’s far easier for a hosted service to potentially trick non-expert users into displaying an attacker's receive address on their stateless Signer.

For example, assume the user is doing 2-of-3 multisig, where the Coordinator (service) controls 1 key. Here is how the Coordinator could trick their end-users:
Coordinator swaps out 1 of the end-user’s xpubs, going from a 2-of-3 where the end-user has 2 seeds to a 2-of-3 where the Coordinator has 2 seeds.
The end-user logs into the service to get a new receive address, and the service (Coordinator) displays malicious receive address X (as part of a 2-of-3).
The end user connects stateless Signer 1 to the service (Coordinator), which under-the-hood gives stateless Signer 1 proof that it is included in this 2-of-3. Stateless singer 1 displays malicious receive address X!
The end-user doesn't verify the address on Signer 2, as many users unfortunately don't -- perhaps it is in a far away location and the end-user (incorrectly) thinks that it’s already been validated in 2 places -- and makes a large deposit to receive address X. These funds now belong to the attacker and can be swept at any time!
If stateless Signer 1 required a signature to be replayed at step 3, stateless Signer 1 would refuse to display malicious receive address X (or at a minimum warn the end-user that it did not have enough info to properly validate the address).

This is also a concern for self-hosted multisig, I just used the hosted services as the best example.

It's also not just Trezor that is stateless. For example, I wrote a simple CLI software multisig wallet as part of the buidl library to be used mostly for emergency recovery. At 800 lines of code, it's too simple/minimal to touch the file system.

BIP39

While unrelated, the use of BIP39 words for session tokens seems like a big mistake, as end-users have learned over years that BIP39 words are for private key material. A small percent of users may backup their token BIP39 mnemonic and not their seed phrase BIP39 mnemonic! My suggestion is to just stick with the other two Token options: decimal and hex.

Repost of my previous response. We discussed this at length on the linked Github PR:
"We decided to keep the TOKEN at 6-9 words, not 12 or anything above precisely for this reason. Please note that the user has to back up their Signers first, before proceeding to setting up the multisig wallet. So there's no writing both things down at once or mixing of the two flows here.

I also find it hard to believe that someone who wants to invest in a safe multisig solution (and therefore must know at minimum what keys and multisig represent) will not know the difference between (permanent) 12 words and (one-time use) 6 words. Also note that the TOKEN can be used without using BIP39 mnemonic at all."

We also made the decimal format, not BIP39 mnemonic, the recommended encoding in the spec.

Best,
Hugo