Thanks, Jimpo!

This is very encouraging, I think. I sorta assumed that separating the
elements into their own sub-filters would hurt the compression a lot more.
Can the compression ratio/false positive rate be tweaked with the
sub-filters in mind?

With the total size of the separated filters being no larger than the
combined filters, I see no benefit of combined filters? Committing to them
all in the headers would also save space, and we could ensure nodes are
serving all sub-filters.

- Johan

On Wed, May 23, 2018 at 9:38 AM, Jim Posen <jim.posen@gmail.com> wrote:

> So I checked filter sizes (as a proportion of block size) for each of the
> sub-filters. The graph is attached.
>
> As interpretation, the first ~120,000 blocks are so small that the
> Golomb-Rice coding can't compress the filters that well, which is why the
> filter sizes are so high proportional to the block size. Except for the
> input filter, because the coinbase input is skipped, so many of them have 0
> elements. But after block 120,000 or so, the filter compression converges
> pretty quickly to near the optimal value. The encouraging thing here is
> that if you look at the ratio of the combined size of the separated filters
> vs the size of a filter containing all of them (currently known as the
> basic filter), they are pretty much the same size. The mean of the ratio
> between them after block 150,000 is 99.4%. So basically, not much
> compression efficiently is lost by separating the basic filter into
> sub-filters.
>
> On Tue, May 22, 2018 at 5:42 PM, Jim Posen <jim.posen@gmail.com> wrote:
>
>> My suggestion was to advertise a bitfield for each filter type the node
>>> serves,
>>> where the bitfield indicates what elements are part of the filters. This
>>> essentially
>>> removes the notion of decided filter types and instead leaves the
>>> decision to
>>> full-nodes.
>>>
>>
>> I think it makes more sense to construct entirely separate filters for
>> the different types of elements and allow clients to download only the ones
>> they care about. If there are enough elements per filter, the compression
>> ratio shouldn't be much worse by splitting them up. This prevents the
>> exponential blowup in the number of filters that you mention, Johan, and it
>> works nicely with service bits for advertising different filter types
>> independently.
>>
>> So if we created three separate filter types, one for output scripts, one
>> for input outpoints, and one for TXIDs, each signaled with a separate
>> service bit, are people good with that? Or do you think there shouldn't be
>> a TXID filter at all, Matt? I didn't include the option of a prev output
>> script filter or rolling that into the block output script filter because
>> it changes the security model (cannot be proven to be correct/incorrect
>> succinctly).
>>
>> Then there's the question of whether to separate or combine the headers.
>> I'd lean towards keeping them separate because it's simpler that way.
>>
>
>