Riccardo wrote:
> The BIP references some Go code for how the parameter was selected, which
> I can hardly understand and run

The code you're linking to is for generating test vectors (to allow
implementations to check the correctness of their GCS filters). The name of
the file is 'gentestvectors.go'. It produces CSV files which contain test
vectors for various testnet blocks at various false positive rates.
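
For what it's worth, here's a minimal sketch of how one might consume such a
CSV to cross-check an implementation. The column layout is illustrative
rather than the exact output of gentestvectors.go, and buildBasicFilter is
just a stand-in for your own GCS construction code:

    package main

    import (
        "encoding/csv"
        "fmt"
        "log"
        "os"
    )

    // buildBasicFilter is a placeholder: a real test would deserialize the
    // block, construct the basic filter with your own GCS code, and return
    // its hex serialization.
    func buildBasicFilter(blockHex string) string {
        return ""
    }

    func main() {
        f, err := os.Open("testvectors.csv") // placeholder path
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        rows, err := csv.NewReader(f).ReadAll()
        if err != nil {
            log.Fatal(err)
        }
        for _, row := range rows[1:] { // skip the header row
            // Column indices are illustrative only.
            blockHash, blockHex, wantFilter := row[1], row[2], row[4]
            if got := buildBasicFilter(blockHex); got != wantFilter {
                fmt.Printf("filter mismatch for block %s\n", blockHash)
            }
        }
    }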

> it's totally my fault but if possible I would really like more details on
> the process, like charts and explanations

When we published the BIP draft last year (wow, time flies!), we put up code
(as well as an interactive website) showing the process we used to arrive at
the current false positive rate. The aim was to minimize the bandwidth
required to download each filter plus the expected bandwidth from
downloading "large-ish" full segwit blocks. The code simulated a few wallet
types (in terms of number of addrs, etc) focusing on a "mid-sized" wallet.
One could also model the selection as a Bernoulli process where we attempt
to compute the probability that after k queries (let's say you have k
addresses) we have k "successes". A success would mean the queried item
wasn't found in the filter, while a failure is a filter match (false
positive or not). A failure in the process requires fetching the entire
block.
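
To make the trade-off concrete, here's a rough sketch (not the original
simulation code) of the expected per-block bandwidth as a function of the
false positive exponent P. The block size, element count, and query count
are made-up example figures, and the ~(P+2) bits-per-element filter size is
only a back-of-the-envelope approximation for Golomb-Rice coded sets:

    package main

    import (
        "fmt"
        "math"
    )

    // expectedBandwidth returns the rough expected bytes downloaded per block
    // for a wallet that queries k items against a filter with false positive
    // rate 2^-p: the filter itself, plus the full block whenever at least one
    // query matches (ignoring true matches).
    func expectedBandwidth(p uint, numElements, blockSize, k float64) float64 {
        fpRate := math.Pow(2, -float64(p))

        // Approximate Golomb-Rice coded filter size in bytes (~P+2 bits per
        // element); an approximation, not the exact BIP figure.
        filterBytes := numElements * (float64(p) + 2) / 8

        // Probability that at least one of the k queries hits the filter,
        // forcing a full block download.
        pFetchBlock := 1 - math.Pow(1-fpRate, k)

        return filterBytes + pFetchBlock*blockSize
    }

    func main() {
        const (
            blockSize   = 1.5e6 // bytes; a "large-ish" segwit block (example)
            numElements = 10000 // filter elements per block (example)
            numQueries  = 1000  // wallet addresses/outpoints queried (example)
        )
        for p := uint(10); p <= 24; p += 2 {
            fmt.Printf("P=%2d  expected bytes/block: %.0f\n",
                p, expectedBandwidth(p, numElements, blockSize, numQueries))
        }
    }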

-- Laolu

On Fri, May 18, 2018 at 5:35 AM Riccardo Casatta via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
Another parameter which heavily affects the filter size is the false positive rate, which is empirically set to 2^-20.
The BIP references some Go code for how the parameter was selected, which I can hardly understand and run; it's totally my fault, but if possible I would really like more details on the process, like charts and explanations (for example, what is the number of elements to search for that the filter has been optimized for?)

Instinctively I feel 2^-20 is super low, and choosing a much higher alpha would shrink the total filter size by gigabytes at the cost of wastefully downloading just a few megabytes of blocks.


2018-05-17 18:36 GMT+02:00 Gregory Maxwell via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org>:
On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> I believe (1) could be skipped entirely - there is almost no reason why
> you'd not be able to filter for, eg, the set of output scripts in a
> transaction you know about

I think this is convincing for the txids themselves.

What about also making the input-prevout filtering based on the scriptpubkey
being _spent_?  Layering-wise in the processing it's a bit ugly, but
if you validated the block you have the data needed.

This would eliminate the multiple data type mixing entirely.



--
Riccardo Casatta - @RCasatta
_______________________________________________
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev