Thanks, Jimpo!

This is very encouraging, I think. I sorta assumed that separating the
elements into their own sub-filters would hurt the compression a lot more.
Can the compression ratio/false positive rate be tweaked with the
sub-filters in mind?

With the total size of the separated filters being no larger than the
combined filters, I see no benefit of combined filters? Committing to them
all in the headers would also save space, and we could ensure nodes are
serving all sub-filters.

- Johan

On Wed, May 23, 2018 at 9:38 AM, Jim Posen wrote:

> So I checked filter sizes (as a proportion of block size) for each of the
> sub-filters. The graph is attached.
>
> As interpretation, the first ~120,000 blocks are so small that the
> Golomb-Rice coding can't compress the filters that well, which is why the
> filter sizes are so high proportional to the block size. Except for the
> input filter, because the coinbase input is skipped, so many of them have 0
> elements. But after block 120,000 or so, the filter compression converges
> pretty quickly to near the optimal value. The encouraging thing here is
> that if you look at the ratio of the combined size of the separated filters
> vs the size of a filter containing all of them (currently known as the
> basic filter), they are pretty much the same size. The mean of the ratio
> between them after block 150,000 is 99.4%. So basically, not much
> compression efficiency is lost by separating the basic filter into
> sub-filters.
>
> On Tue, May 22, 2018 at 5:42 PM, Jim Posen wrote:
>
>>> My suggestion was to advertise a bitfield for each filter type the node
>>> serves, where the bitfield indicates what elements are part of the
>>> filters. This essentially removes the notion of decided filter types and
>>> instead leaves the decision to full-nodes.
>>>
>>
>> I think it makes more sense to construct entirely separate filters for
>> the different types of elements and allow clients to download only the ones
>> they care about. If there are enough elements per filter, the compression
>> ratio shouldn't be much worse by splitting them up. This prevents the
>> exponential blowup in the number of filters that you mention, Johan, and it
>> works nicely with service bits for advertising different filter types
>> independently.
>>
>> So if we created three separate filter types, one for output scripts, one
>> for input outpoints, and one for TXIDs, each signaled with a separate
>> service bit, are people good with that? Or do you think there shouldn't be
>> a TXID filter at all, Matt? I didn't include the option of a prev output
>> script filter or rolling that into the block output script filter because
>> it changes the security model (cannot be proven to be correct/incorrect
>> succinctly).
>>
>> Then there's the question of whether to separate or combine the headers.
>> I'd lean towards keeping them separate because it's simpler that way.
>>
>
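
For anyone who wants to sanity-check the size comparison above without
processing the chain, here is a rough sketch in Python. It is not the code
behind the attached graph. It assumes P = 20 remainder bits, uses SHA-256 as a
stand-in for the real hash function, and ignores the element-count prefix and
byte padding that a serialized filter carries. The names gcs_size_bits and
size_ratio and the synthetic element sets are made up for illustration.

```python
import hashlib

P = 20  # assumed number of Golomb-Rice remainder bits


def gcs_size_bits(items, p=P):
    """Bit length of a Golomb-Rice coded set built from items (bytes)."""
    n = len(items)
    if n == 0:
        return 0
    modulus = n << p  # hash each item into the range [0, n * 2^p)
    hashed = sorted({
        # SHA-256 is only a stand-in hash for this sketch
        int.from_bytes(hashlib.sha256(item).digest()[:8], "big") % modulus
        for item in items
    })
    bits = 0
    prev = 0
    for value in hashed:
        delta = value - prev
        prev = value
        # unary quotient (q ones and a terminating zero), then p remainder bits
        bits += (delta >> p) + 1 + p
    return bits


def size_ratio(groups):
    """Total size of one filter per group, divided by one combined filter."""
    separated = sum(gcs_size_bits(g) for g in groups)
    combined = gcs_size_bits([item for g in groups for item in g])
    return separated / combined


# Synthetic stand-ins for one block's output scripts, outpoints, and TXIDs.
outputs = [b"script-%d" % i for i in range(2000)]
inputs = [b"outpoint-%d" % i for i in range(1800)]
txids = [b"txid-%d" % i for i in range(1000)]

print("bits per element:", gcs_size_bits(outputs) / len(outputs))
print("separated / combined:", size_ratio([outputs, inputs, txids]))
```

Because the hash range scales with the number of elements, each element costs
P + 1 bits plus a short unary quotient whatever the set size, so in this
simplified model splitting a set into sub-filters should barely change the
total. The fixed per-filter overhead that the sketch leaves out matters most
when a filter has very few elements.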