I processed the historic blockchain to create a single filter populated with spent input scripts and output scripts. The Golomb parameter was P=2^20 (a construction sketch follows below).

The resulting chart shows a volatile history of same-block address re-use, with notable drops in relative filter size during the early history and in the time window where SatoshiDICE was popular, and a trend higher since then. The last half year of history suggests a current filter size of between 2.0% and 2.5% of block size.

Since most outputs are spent within a short time period, but apparently not that often within the same block, I think it is worth considering a series of filters that match over windows of 2^n blocks (n = 0…10). Applications could then bracket the range of interest and narrow it down by requesting finer filters or blocks (see the second sketch below).

Then I created 1600 random (P2SH) scripts and totaled the false positive block download data size when observing 100, 200, 400, 800, or 1600 of them. The result suggests that even for 1600 the false positive overhead is less than 0.1% of the blockchain data size.

I agree with Greg that we should optimize the parameters for a small observed set, as those clients will be running on mobile devices. According to Pieter's findings, the simulation's parameters were optimal for ca. 1000 observed scripts, which is maybe too many for a "small" application. On the other hand, we do not know the needs of future popular mobile applications. With the parameters of the simulation, the current minimal data burden on a mobile wallet would be ca. 0.1 GB/month.

Simulations with other parameters could be executed using this patch branch:

https://github.com/tamasblummer/rust-bitcoin-spv/tree/blockfilterstats

A run takes a few hours on a fast machine with a release build and a local bitcoind. The calculation cannot be restricted to recent history, as the process builds the in-memory UTXO set from genesis.

The result of executing the binary is a CSV file with the columns:

block number, block size, UTXO set size, filter size, false positive data size for 100, 200, 400, 800 and 1600 observed scripts

e.g.:

524994,1112181,57166825,21556,0,0,0,0,1112181
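For clarity, here is a minimal sketch (in Rust, as in the patch branch) of the filter construction the simulation assumes: each script is hashed with SipHash-2-4, mapped into [0, N*M) with M = 2^20, and the sorted deltas are Golomb-Rice coded with P = 20 remainder bits, i.e. a 2^-20 false positive rate. The `siphasher` crate, the key derivation, the multiply-shift range reduction and the serialization of the element count are assumptions of this sketch, not a normative encoding.

use std::hash::Hasher;
use siphasher::sip::SipHasher24;

const P: u8 = 20;       // Golomb-Rice remainder bits
const M: u64 = 1 << 20; // inverse false-positive rate, 2^20

// Append single bits and fixed-width bit strings to a byte vector.
struct BitWriter { bytes: Vec<u8>, used: u8 }

impl BitWriter {
    fn new() -> Self { BitWriter { bytes: Vec::new(), used: 0 } }
    fn push_bit(&mut self, bit: bool) {
        if self.used == 0 { self.bytes.push(0); }
        if bit { *self.bytes.last_mut().unwrap() |= 0x80u8 >> self.used; }
        self.used = (self.used + 1) % 8;
    }
    fn push_bits(&mut self, value: u64, count: u8) {
        for i in (0..count).rev() { self.push_bit((value >> i) & 1 == 1); }
    }
}

// Hash a script uniformly into [0, n * M) with a multiply-shift reduction.
fn map_to_range(script: &[u8], n: u64, k0: u64, k1: u64) -> u64 {
    let mut h = SipHasher24::new_with_keys(k0, k1);
    h.write(script);
    ((h.finish() as u128 * (n * M) as u128) >> 64) as u64
}

// Build the filter body from the spent input scripts and output scripts
// of one block; k0/k1 would be derived from the block hash.
fn build_filter(scripts: &[Vec<u8>], k0: u64, k1: u64) -> Vec<u8> {
    let n = scripts.len() as u64;
    let mut values: Vec<u64> =
        scripts.iter().map(|s| map_to_range(s, n, k0, k1)).collect();
    values.sort_unstable();
    values.dedup();

    let mut w = BitWriter::new();
    let mut last = 0u64;
    for v in values {
        let delta = v - last;
        let mut q = delta >> P; // unary-coded quotient ...
        while q > 0 { w.push_bit(true); q -= 1; }
        w.push_bit(false);
        w.push_bits(delta & ((1u64 << P) - 1), P); // ... then P remainder bits
        last = v;
    }
    w.bytes
}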
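And here is a sketch of how a client could use the proposed 2^n-block filter series to bracket a range of interest and narrow it down. `filter_matches` and `fetch_block` are hypothetical helpers standing in for the network round trips; everything else is just the recursive descent.

const MAX_LEVEL: u8 = 10; // coarsest window covers 2^10 = 1024 blocks

// Hypothetical helper: download the level-n filter for window `index`
// (covering blocks index*2^n .. (index+1)*2^n) and query it for the
// observed scripts. One network round trip per call.
fn filter_matches(_level: u8, _index: u64, _scripts: &[Vec<u8>]) -> bool {
    unimplemented!()
}

// Hypothetical helper: download and scan a full block.
fn fetch_block(_height: u64) {
    unimplemented!()
}

// Recursively narrow a match in a coarse window down to individual blocks.
fn narrow_down(level: u8, index: u64, scripts: &[Vec<u8>]) {
    if !filter_matches(level, index, scripts) {
        return; // nothing of interest in this window of 2^level blocks
    }
    if level == 0 {
        fetch_block(index); // at level 0 the window index is the block height
        return;
    }
    // descend into the two half-windows using the finer filter series
    narrow_down(level - 1, 2 * index, scripts);
    narrow_down(level - 1, 2 * index + 1, scripts);
}

// Scan a chain of `height` blocks starting from the coarsest windows.
fn scan(height: u64, scripts: &[Vec<u8>]) {
    let window = 1u64 << MAX_LEVEL;
    for index in 0..(height + window - 1) / window {
        narrow_down(MAX_LEVEL, index, scripts);
    }
}

Note that a false positive in a coarse window only costs the download of finer filters for that window, not of full blocks, so the overhead of the extra levels should compound slowly.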
Tamas Blummer

> On May 29, 2018, at 06:01, Olaoluwa Osuntokun via bitcoin-dev wrote:
>
> > The additional benefit of the input script/outpoint filter is to watch for
> > unexpected spends (coins getting stolen or spent from another wallet) or
> > transactions without a unique change or output address. I think this is a
> > reasonable implementation, and it would be nice to be able to download that
> > filter without any input elements.
>
> As someone who's implemented a complete integration of the filtering
> technique into an existing wallet and a higher-level application, I disagree.
> There's not much gain to be had in splitting up the filters: it'll result in
> additional round trips (to fetch these distinct filters) during normal
> operation, complicate routine seed rescanning logic, and is also detrimental
> to privacy if one is fetching blocks from the same peer as they've
> downloaded the filters from.
>
> However, I'm now convinced that the savings had by including the prev output
> script (addr re-use and outputs spent in the same block as they're created)
> outweigh the additional bookkeeping required in an implementation (when
> extracting the precise tx that matched) compared to using the regular outpoint
> as we do currently. Combined with the recently proposed re-parametrization
> of the gcs parameters [1], the filter size should shrink by quite a bit!
>
> I'm very happy with the review the BIPs have been receiving as of late. It
> would've been nice to have this 1+ year ago when the draft was initially
> proposed, but better late than never!
>
> Based on this thread, [1], and discussions on various IRC channels, I plan
> to make the following modifications to the BIP:
>
>   1. use P=2^19 and M=784931 as gcs parameters, and also bind these to the
>      filter instance, so future filter types may use distinct parameters
>   2. use the prev output script rather than the prev input script in the
>      regular filter
>   3. remove the txid from the regular filter (as with some extra bookkeeping
>      the output script is enough)
>   4. do away with the extended filter altogether, as our original use case
>      for it has been nerfed as the filter size grew too large when doing
>      recursive parsing; instead we now watch for the outpoint being spent and
>      extract the pre-image from it if it matches
>
> The resulting changes should slash the size of the filters, yet still ensure
> that they're useful enough for our target use case.
>
> [1]: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-May/016029.html
>
> -- Laolu
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
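To illustrate the re-parametrization in [1] that Laolu refers to: the false positive modulus M is decoupled from the Golomb-Rice parameter, so items map into [0, N*784931) while deltas are still coded with 19 remainder bits. A sketch, reusing the 64-bit SipHash value from the map_to_range step above:

const P2: u8 = 19;      // Golomb-Rice remainder bits
const M2: u64 = 784931; // ≈ 1.497 * 2^19, derived in [1] to minimize filter size

fn map_to_range_v2(hash64: u64, n: u64) -> u64 {
    ((hash64 as u128 * (n * M2) as u128) >> 64) as u64
}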