I processed the historic blockchain to create a single filter per block, populated with both spent input scripts and output scripts. The Golomb parameter was P=2^20.
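
For reference, here is a minimal sketch (in Rust, like the patch below) of how such a Golomb-coded set can be built, reading P=2^20 as a false-positive target of 1/2^20, i.e. 20 remainder bits per Golomb-Rice code. BIP158 specifies a keyed SipHash-2-4 item hash; the std DefaultHasher below is only a dependency-free stand-in, and all names are illustrative:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Minimal MSB-first bit writer.
    struct BitWriter { bytes: Vec<u8>, nbits: usize }

    impl BitWriter {
        fn push_bit(&mut self, bit: bool) {
            if self.nbits % 8 == 0 { self.bytes.push(0); }
            if bit { *self.bytes.last_mut().unwrap() |= 1 << (7 - self.nbits % 8); }
            self.nbits += 1;
        }
        fn push_bits(&mut self, value: u64, count: u8) {
            for i in (0..count).rev() { self.push_bit((value >> i) & 1 == 1); }
        }
    }

    // Hash an item uniformly into [0, f). BIP158 uses keyed
    // SipHash-2-4 here; DefaultHasher is a stand-in.
    fn hash_to_range(item: &[u8], f: u64) -> u64 {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        ((h.finish() as u128 * f as u128) >> 64) as u64
    }

    // Build the filter for one block: hash every spent input script
    // and output script into [0, n*m), sort, then Golomb-Rice encode
    // the deltas with p remainder bits each.
    fn build_filter(items: &[Vec<u8>], m: u64, p: u8) -> Vec<u8> {
        let f = items.len() as u64 * m;
        let mut vs: Vec<u64> = items.iter().map(|i| hash_to_range(i, f)).collect();
        vs.sort_unstable();
        vs.dedup();

        let mut w = BitWriter { bytes: Vec::new(), nbits: 0 };
        let mut last = 0u64;
        for v in vs {
            let delta = v - last;
            for _ in 0..(delta >> p) { w.push_bit(true); } // unary quotient
            w.push_bit(false);                             // terminator
            w.push_bits(delta & ((1u64 << p) - 1), p);     // remainder
            last = v;
        }
        w.bytes
    }

A call like build_filter(&scripts, 1 << 20, 20) then targets a 1/2^20 false-positive rate per queried script.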

The resulting chart shows a volatile history of same-block address re-use, with notable drops in relative filter size during the early history and in the time window when SatoshiDICE was popular; since then the trend has been higher.
The last half year alone suggests a current filter size of between 2.0% and 2.5% of block size.

Since most outputs are spent within a short time period, but apparently not that often within the same block, I think it is worth considering a filter series that matches over windows of 2^n blocks (n = 0…10). Applications could then bracket the range of interest and narrow it down by requesting finer filters or blocks.
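
A rough sketch of that bracketing, assuming a hypothetical window_matches(start, n) that tests the filter covering the 2^n-block window starting at height start (the function and its shape are illustrative, not part of the patch):

    // Recursive client-side descent: a non-matching coarse filter
    // prunes 2^n blocks with a single query; matching windows are
    // split in half until single blocks remain, which the wallet
    // then downloads and scans.
    fn candidate_blocks(
        start: u64,
        n: u8,
        window_matches: &impl Fn(u64, u8) -> bool,
        out: &mut Vec<u64>,
    ) {
        if !window_matches(start, n) {
            return; // no observed script matches in this window
        }
        if n == 0 {
            out.push(start); // single block left: fetch it
            return;
        }
        candidate_blocks(start, n - 1, window_matches, out);
        candidate_blocks(start + (1u64 << (n - 1)), n - 1, window_matches, out);
    }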

Then I created 1600 random (P2SH) scripts and totaled the false-positive block download size when observing 100, 200, 400, 800, or 1600 of them.
The result suggests that even for 1600 observed scripts the false-positive overhead is less than 0.1% of the blockchain data size.
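
That is consistent with a back-of-envelope estimate, assuming a per-query false-positive rate of roughly 1/2^20:

    P(block falsely matches) ≈ 1 - (1 - 2^-20)^1600 ≈ 1600/2^20 ≈ 0.15%

so the expected false-positive download is on the order of a tenth of a percent of total block data; the measured figure comes out a little lower since a false positive in a block that genuinely matches costs nothing extra.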

I agree with Greg that we should optimize the parameters for a small observed set, as those will be running on mobile devices.
According to Pieter's findings, the simulation parameters were optimal for ca. 1000 observed scripts, which is maybe too many for a "small" application.
On the other hand, we do not know the needs of future popular mobile applications.
 
With the parameters of the simulation, the current minimal data burden on a mobile wallet would be ca. 0.1 GB/month.
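
For scale, that lines up with the filter-size estimate above, assuming ~144 blocks per day of roughly 1.1 MB each (as in the sample CSV row below):

    144 blocks/day * 30 days * 1.1 MB * ~2.25% ≈ 0.1 GB/month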

Simulations with other parameters can be executed using this patch branch: https://github.com/tamasblummer/rust-bitcoin-spv/tree/blockfilterstats. A run takes a few hours on a fast machine with a release build and a local bitcoind.
The calculation cannot be restricted to recent history, as the process builds an in-memory UTXO set from genesis.

The result of executing the binary is a CSV file containing:
block number, block size, UTXO set size, filter size, false-positive data size for 100, 200, 400, 800, and 1600 observed scripts
e.g.:
524994,1112181,57166825,21556,0,0,0,0,1112181
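
A minimal sketch for aggregating that CSV with the layout above (the file name is illustrative):

    use std::fs::File;
    use std::io::{BufRead, BufReader};

    // Sum block data, filter data and the false-positive download
    // for the largest observed set (1600 scripts, last column).
    fn main() -> std::io::Result<()> {
        let rdr = BufReader::new(File::open("blockfilterstats.csv")?);
        let (mut blocks, mut filters, mut fp1600) = (0u64, 0u64, 0u64);
        for line in rdr.lines() {
            let f: Vec<u64> = line?
                .split(',')
                .map(|s| s.trim().parse().unwrap())
                .collect();
            blocks += f[1];  // block size
            filters += f[3]; // filter size
            fp1600 += f[8];  // false-positive data size for 1600
        }
        println!("filter/block   = {:.2}%", 100.0 * filters as f64 / blocks as f64);
        println!("fp(1600)/block = {:.3}%", 100.0 * fp1600 as f64 / blocks as f64);
        Ok(())
    }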

Tamas Blummer

On May 29, 2018, at 06:01, Olaoluwa Osuntokun via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:

> The additional benefit of the input script/outpoint filter is to watch for
> unexpected spends (coins getting stolen or spent from another wallet) or
> transactions without a unique change or output address. I think this is a
> reasonable implementation, and it would be nice to be able to download that
> filter without any input elements. 

As someone who's implemented a complete integration of the filtering
technique into an existing wallet and a higher-level application, I disagree.
There's not much gain to be had in splitting up the filters: it'll result in
additional round trips (to fetch these distinct filters) during normal
operation, complicate routine seed rescanning logic, and is also detrimental
to privacy if one is fetching blocks from the same peer as they've
downloaded the filters from.

However, I'm now convinced that the savings had by including the prev output
script (addr re-use and outputs spent in the same block as they're created)
outweigh the additional book-keeping required in an implementation (when
extracting the precise tx that matched) compared to using regular outpoints
as we do currently. Combined with the recently proposed re-parametrization
of the gcs parameters[1], the filter size should shrink by quite a bit!

I'm very happy with the review the BIPs have been receiving as of late. It
would've been nice to have this 1+ year ago when the draft was initially
proposed, but better late than never!

Based on this thread, [1], and discussions on various IRC channels, I plan
to make the following modifications to the BIP:

  1. use P=2^19 and M=784931 as gcs parameters, and also bind these to the
     filter instance, so future filter types may use distinct parameters
     (a membership-test sketch with these parameters follows this list)
  2. use the prev output script rather than the prev input script in the
     regular filter
  3. remove the txid from the regular filter (as with some extra book-keeping
     the output script is enough)
  4. do away with the extended filter altogether, as our original use case
     for it has been nerfed since the filter size grew too large when doing
     recursive parsing. instead, we now watch for the outpoint being spent
     and extract the pre-image from it if it matches
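
For concreteness, a membership test against a filter built with such parameters might look like the sketch below (illustrative only: BIP158 specifies a keyed SipHash-2-4 item hash, for which std's DefaultHasher stands in here; p = 19, m = 784931 per item 1):

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Illustrative item hash into [0, f); BIP158 specifies keyed
    // SipHash-2-4 rather than DefaultHasher.
    fn hash_to_range(item: &[u8], f: u64) -> u64 {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        ((h.finish() as u128 * f as u128) >> 64) as u64
    }

    // Minimal MSB-first bit reader (no bounds handling, sketch only).
    struct BitReader<'a> { bytes: &'a [u8], pos: usize }

    impl<'a> BitReader<'a> {
        fn read_bit(&mut self) -> bool {
            let b = (self.bytes[self.pos / 8] >> (7 - self.pos % 8)) & 1 == 1;
            self.pos += 1;
            b
        }
        fn read_bits(&mut self, count: u8) -> u64 {
            (0..count).fold(0, |acc, _| (acc << 1) | self.read_bit() as u64)
        }
    }

    // Walk the Golomb-Rice deltas, rebuilding the sorted hash values
    // until the target value is reached or passed.
    fn filter_matches(filter: &[u8], n_items: u64, target: &[u8], m: u64, p: u8) -> bool {
        let t = hash_to_range(target, n_items * m);
        let mut r = BitReader { bytes: filter, pos: 0 };
        let mut v = 0u64;
        for _ in 0..n_items {
            let mut q = 0u64;
            while r.read_bit() { q += 1; }    // unary quotient
            v += (q << p) | r.read_bits(p);   // plus p remainder bits
            if v >= t { return v == t; }
        }
        false
    }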

The resulting changes should slash the size of the filters, yet still ensure
that they're useful enough for our target use case.

[1]: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-May/016029.html

-- Laolu