Riccardo wrote:
> The BIP recall some go code for how the parameter has been selected which
> I can hardly understand and run

The code you're linking to is for generating test vectors (to allow
implementations to check the correctness of their gcs filters. The name of
the file is 'gentestvectors.go'. It produces CSV files which contain test
vectors of various testnet blocks and at various false positive rates.

> it's totally my fault but if possible I would really like more details on
> the process, like charts and explanations

When we published the BIP draft last year (wow, time flies!), we put up code
(as well as an interactive website) showing the process we used to arrive at
the current false positive rate. The aim was to minimize the bandwidth
required to download each filter plus the expected bandwidth from
downloading "large-ish" full segwit blocks. The code simulated a few wallet
types (in terms of number of addrs, etc) focusing on a "mid-sized" wallet.
One could also model the selection as a Bernoulli process where we attempt
to compute the probability that after k queries (let's say you have k
addresses) we have k "successes". A success would mean the queries item
wasn't found in the filter, while a failure is a filter match (false
positive or not). A failure in the process requires fetching the entire
block.

-- Laolu

On Fri, May 18, 2018 at 5:35 AM Riccardo Casatta via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Another parameter which heavily affects filter size is the false positive
> rate which is empirically set
> <https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki#construction>
> to 2^-20
> The BIP recall some go code
> <https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe936dd0c9b99ecb9/gcs_light_client/gentestvectors.go>
> for how the parameter has been selected which I can hardly understand and
> run, it's totally my fault but if possible I would really like more details
> on the process, like charts and explanations (for example, which is the
> number of elements to search for which the filter has been optimized for?)
>
> Instinctively I feel 2^-20 is super low and choosing a lot higher alpha
> will shrink the total filter size by gigabytes at the cost of having to
> wastefully download just some megabytes of blocks.
>
>
> 2018-05-17 18:36 GMT+02:00 Gregory Maxwell via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org>:
>
>> On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
>> <bitcoin-dev@lists.linuxfoundation.org> wrote:
>> > I believe (1) could be skipped entirely - there is almost no reason why
>> > you'd not be able to filter for, eg, the set of output scripts in a
>> > transaction you know about
>>
>> I think this is convincing for the txids themselves.
>>
>> What about also making input prevouts filter based on the scriptpubkey
>> being _spent_?  Layering wise in the processing it's a bit ugly, but
>> if you validated the block you have the data needed.
>>
>> This would eliminate the multiple data type mixing entirely.
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>
>
>
>
> --
> Riccardo Casatta - @RCasatta <https://twitter.com/RCasatta>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>