public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
From: Eric Voskuil <eric@voskuil•org>
To: Bram Cohen <bram@bittorrent•com>,
	Bitcoin Protocol Discussion
	<bitcoin-dev@lists•linuxfoundation.org>,
	Gregory Maxwell <greg@xiph•org>
Subject: Re: [bitcoin-dev] Using a storage engine without UTXO-index
Date: Fri, 7 Apr 2017 12:55:58 -0700	[thread overview]
Message-ID: <cb45ab6e-f3d0-85f2-4b41-8c7bc2bdf399@voskuil.org> (raw)
In-Reply-To: <CA+KqGko0cDY29bhznMxJJ7yAUTuB6GaDDNGBRwzssJUxM_53xQ@mail.gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 04/07/2017 11:39 AM, Bram Cohen via bitcoin-dev wrote:
> Expanding on this question a bit, it's optimized for parallel
> access, but hard drive access isn't parallel and memory accesses
> are very fast, so shouldn't the target of optimization be about
> cramming as much as possible in memory and minimizing disk
> accesses?

While this may seem to be the case it is not generally optimal. The
question is overly broad as one may or may not be optimizing for any
combination of:

startup time (first usability)
warm-up time (priming)
shutdown time (flush)
fault tolerance (hard shutdown survivability)
top block validation (read speed)
full chain validation (read/write speed)
RAM consumption
Disk consumption
Query response
Servers (big RAM)
Desktops (small RAM)
Mining (fast validation)
Wallets (background performance)
SSD vs. HDD

But even limiting the question to input validation, all of these
considerations (at least) are present.

Ideally one wants the simplest implementation that is optimal under
all considerations. While this may be a unicorn, it is possible to
achieve a simple implementation (relative to alternatives) that allows
for the trade-offs necessary to be managed through configuration (by
the user and/or implementation).

Shoving the entire data set into RAM has the obvious problem of
limited RAM. Eventually the OS will be paging more of the data back to
disk (as virtual RAM). In other words this does not scale, as a change
in hardware disproportionately impacts performance. Ideally one wants
the trade between "disk" and "memory" to be made by the underlying
platform, as that is its purpose. Creating one data structure for disk
and another for memory not only increases complexity, but denies the
platform visibility into this trade-off. As such the platform
eventually ends up working directly against the optimization.

An on-disk structure that is not mapped into memory by the application
allows the operating system to maintain as much or as little state in
memory as it considers optimal, given the other tasks that the user
has given it. In the case of memory mapped files (which are optimized
by all operating systems as central to their virtual memory systems)
it is possible for everything from zero to the full store to be memory
resident.

Optimization for lower memory platforms then becomes a process of
reducing the need for paging. This is the purpose of a cache. The seam
between disk and memory can be filled quite nicely by a small amount
of cache. On high RAM systems any cache is actually a de-optimization
but on low RAM systems it can prevent excessive paging. This is
directly analogous to a CPU cache. There are clear optimal points in
terms of cache size, and the implementation and management of such a
cache can and should be internal to a store. Of course a cache cannot
provide perfect scale all the way to zero RAM, but it scales quite
well for actual systems.

While a particular drive may not support parallel operations one
should not assume that a disk-based store does not benefit from
parallelism. Simply refer to the model described above and you will
see that with enough memory the entire blockchain can be
memory-resident, and for high performance operations a fraction of
that is sufficient for a high degree of parallelism.

In practice a cache of about 10k transactions worth of outputs is
optimal for 8GB RAM. This requires just a few blocks for warm-up,
which can be primed in inconsequential time at startup. Fault
tolerance can be managed by flushing after all writes, which also
reduces shutdown time to zero. For higher performance systems,
flushing can be disabled entirely, increasing shutdown time but also
dramatically increasing write performance. Given that the blockchain
is a cache, this is a very reasonable trade-off in some scenarios. The
model works just as well with HDD as SSD, although certainly SSD
performs better overall.

e
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQEcBAEBCAAGBQJY5+7GAAoJEDzYwH8LXOFOsAsH/3QK55aWH6sAi6OsTwV1FLZV
Y/2SSjwn1vUh55MDkPpCxDwV99JqVwpk0vGM8mGg5s4ZS8sxOPqwGiBz/SZWbF9v
oStJS0DjUPnbYtI/mrC30GuAYVcKnc5DFDHvjX6f0xrLIzViFR7eiW0npUH6Xipt
RI9Mockaf1CqqGExtbIqWal0YDEQGH0ekXRp7uEjh8nPUoKqTVvxDCgqVooQfvfx
EeKX9ruSv/r91EM1JQuH8HBBF7+R24tmMtwbpGx0zrDg5ytpIyrRzVH/ze1Mj2a3
ZxThvofGzhKcDiTPWiJI11DBYUvhSH4Kx0uWLzFUA0gxPfWkZQKJWNDl2CEwljk=
=C7rD
-----END PGP SIGNATURE-----


  reply	other threads:[~2017-04-07 19:55 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-06 22:12 Tomas
2017-04-06 23:38 ` Eric Voskuil
2017-04-07  0:17   ` Tomas
2017-04-08 22:37     ` Eric Voskuil
2017-04-08 23:58       ` Tomas
2017-04-11  1:44         ` Eric Voskuil
2017-04-11  8:43           ` Tomas
2017-04-11  9:41             ` Eric Voskuil
2017-04-11 10:04               ` Tomas
     [not found] ` <CAAS2fgTEMCkDWdhCWt1EsUrnt3+Z_8m+Y1PTsff5Rc0CBnCKWQ@mail.gmail.com>
2017-04-07  0:48   ` Tomas
2017-04-07  1:09     ` Gregory Maxwell
2017-04-07  1:29       ` Tomas
2017-04-07 18:52         ` Tom Harding
2017-04-07 19:42           ` Gregory Maxwell
2017-04-08 18:27             ` Tom Harding
2017-04-08 19:23               ` Tomas
2017-04-07  7:55 ` Marcos mayorga
2017-04-07  8:47   ` Tomas
2017-04-07 14:14     ` Greg Sanders
2017-04-07 16:02       ` Tomas
2017-04-07 18:18 ` Gregory Maxwell
2017-04-07 18:39   ` Bram Cohen
2017-04-07 19:55     ` Eric Voskuil [this message]
2017-04-07 21:44       ` Tomas
2017-04-07 23:51         ` Eric Voskuil
2017-04-07 21:14     ` Tomas
2017-04-08  0:44       ` Gregory Maxwell
2017-04-08  7:28         ` Tomas
2017-04-08 19:23           ` Johnson Lau
2017-04-08 19:56             ` Tomas
2017-04-08 20:21               ` Johnson Lau
2017-04-08 20:42                 ` Tomas
2017-04-08 22:12                 ` Gregory Maxwell
2017-04-08 22:34                   ` Tomas
2017-04-08 21:22     ` Troy Benjegerdes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cb45ab6e-f3d0-85f2-4b41-8c7bc2bdf399@voskuil.org \
    --to=eric@voskuil$(echo .)org \
    --cc=bitcoin-dev@lists$(echo .)linuxfoundation.org \
    --cc=bram@bittorrent$(echo .)com \
    --cc=greg@xiph$(echo .)org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox