public inbox for bitcoindev@googlegroups.com
* [Bitcoin-development] Building a node crawler to map network
@ 2011-09-06  7:42 Steve
  2011-09-06  8:29 ` Steve
  2011-09-06 14:36 ` Rick Wesson
  0 siblings, 2 replies; 10+ messages in thread
From: Steve @ 2011-09-06  7:42 UTC (permalink / raw)
  To: bitcoin-development

Hi All,

I started messing around today with building a node crawler to try and 
map out the bitcoin network and hopefully provide some useful 
statistics.  It's very basic so far, using a mutilated bitcoinj to 
connect (due to me being a Java developer and not having a clue with C/C++). 
If it's worthwhile I'll hack bitcoinj some more to run on top of Netty to 
take advantage of its NIO architecture (Netty has been shown to handle 
half a million concurrent connections, so it would be ideal for the purpose).

Hoping to get a bit of input into what would be useful as well as a 
strategy for getting max possible connections without distorted data.  I 
seem to recall Gavin talking about the need for some kind of network 
health monitoring so I assume there's a need for something like this...

Firstly, at the moment I'm just storing the version message and the 
results of getaddr for each node that I can connect to.  Is there any 
other useful info that can be extracted from a node that's worth collecting?
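
To make the bookkeeping concrete, here is a minimal sketch of a crawl frontier that queues addresses learned from seeds or getaddr replies and tracks what happened to each one. It assumes addresses are plain "host:port" strings; `CrawlFrontier`, its state names and its methods are made up for illustration and are not part of bitcoinj.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical bookkeeping for a crawler; none of these names come from bitcoinj.
class CrawlFrontier {
    enum State { NEW, IN_PROGRESS, CONTACTED, UNCONTACTABLE }

    private final Map<String, State> states = new HashMap<>();
    private final Queue<String> pending = new ArrayDeque<>();

    /** Registers an address from a seed or a getaddr reply; returns true if it was unseen. */
    boolean discover(String hostPort) {
        if (states.putIfAbsent(hostPort, State.NEW) != null) {
            return false; // already known, so do not queue it twice
        }
        pending.add(hostPort);
        return true;
    }

    /** Hands out the next address to try, or null when nothing is queued. */
    String next() {
        String addr = pending.poll();
        if (addr != null) {
            states.put(addr, State.IN_PROGRESS);
        }
        return addr;
    }

    /** Records the outcome of a connection attempt and queues any newly learned addresses. */
    void finish(String hostPort, boolean reachable, List<String> learnedAddresses) {
        State outcome = reachable ? State.CONTACTED : State.UNCONTACTABLE;
        if (states.replace(hostPort, outcome) == null) {
            return; // unknown address, ignore
        }
        for (String learned : learnedAddresses) {
            discover(learned);
        }
    }

    /** Counts how many known addresses are currently in the given state. */
    int count(State wanted) {
        int n = 0;
        for (State s : states.values()) {
            if (s == wanted) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        CrawlFrontier frontier = new CrawlFrontier();
        frontier.discover("10.0.0.1:8333"); // example address, not a real node
        String addr = frontier.next();
        frontier.finish(addr, true, Arrays.asList("10.0.0.2:8333"));
        System.out.println("contacted=" + frontier.count(State.CONTACTED)
                + " new=" + frontier.count(State.NEW));
    }
}
```

The actual connect, version handshake and getaddr exchange would sit between `next()` and `finish()`; they are left out because they depend on whichever client library does the networking.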

Second, and the main issue, is how to connect.  From my first very basic 
probing it seems the vast majority of nodes don't accept incoming 
connections, no doubt due to lack of UPnP.  So it seems the active crawl 
approach is not really ideal for the purpose.  Even if it were used, the 
resultant data would be hopelessly distorted.

A honeypot approach would probably be better if there was some way to 
make a node 'attractive' to other nodes to connect to.  That way it 
could capture non-listening nodes as well.  If there is some way to 
influence other nodes to connect to the crawler node that solves the 
problem.  If there isn't, which I suspect is the case, then perhaps 
another approach is to build an easy-to-deploy crawler node that many 
volunteers could run and that could then upload collected data to a 
central repository.

While I'm asking questions I'll add one more regarding the getaddr 
message.  It seems most nodes return about 1000 addresses in response to 
this message.  Obviously most of these nodes haven't actually talked to 
all 1000 on the list, so where does this list come from?  Is it a mixture 
of addresses obtained from other nodes, somehow sorted by timestamp? 
Does it include some nodes discovered by IRC/DNS? Or are those only used 
to find the first nodes to connect to?

Thanks for any input... Hopefully I can build something that's useful 
for the network...




* Re: [Bitcoin-development] Building a node crawler to map network
  2011-09-06  7:42 [Bitcoin-development] Building a node crawler to map network Steve
@ 2011-09-06  8:29 ` Steve
  2011-09-06  8:36   ` Christian Decker
  2011-09-06 14:36 ` Rick Wesson
  1 sibling, 1 reply; 10+ messages in thread
From: Steve @ 2011-09-06  8:29 UTC (permalink / raw)
  To: bitcoin-development



>
> While I'm asking questions I'll add one more regarding the getaddr 
> message.

Talking to myself here.  I just sent this message then found this 
brilliant set of articles in the Dev & Tech forum which answers the 
question very nicely: https://bitcointalk.org/index.php?topic=41722.0 
<https://bitcointalk.org/index.php?board=6.0>
Anyway just as an FYI I've been running v0.0.0.0.0.0.0.0.1 for about an 
hour.  It's only running 10 concurrent connections due to girlfriend 
complaining she couldn't watch youtube but here's some early results.

New nodes: 19319 // node address discovered but no contact attempt made yet
Contacted nodes: 754
Uncontactable nodes: 3253
Limbo nodes: 9 // not as exciting as it sounds, just nodes with a connect in progress
Total nodes: 23335 // about 5000 from initial IRC discovery, the rest are from getaddr

Versions: {
300=1,
31900=7,
31902=1,
32000=2,
32001=7,
32002=22,
32100=100,
32200=24,
32300=277,
32400=317,
32500=2}

Fails: {
ConnectException: Connection refused=377,
IOException: Socket is disconnected=87,
SocketException: Network is unreachable=2,
ProtocolException: Error deserializing message =1,
NoRouteToHostException: No route to host=115,
SocketException: Connection reset=149,
SocketTimeoutException: connect timed out=2521}
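
Tallies like the ones above could be kept with something along these lines. It is only a sketch, assuming failures are keyed by the exception's simple class name plus its message and versions by the integer from the version message; `CrawlStats` and its methods are hypothetical names, not bitcoinj API.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for counting crawl outcomes; not part of bitcoinj.
class CrawlStats {
    // TreeMap keeps the keys sorted so the printed summary is stable.
    private final Map<Integer, Integer> versions = new TreeMap<>();
    private final Map<String, Integer> fails = new TreeMap<>();

    /** Counts one node that reported the given protocol version. */
    void recordVersion(int protocolVersion) {
        versions.merge(protocolVersion, 1, Integer::sum);
    }

    /** Counts one failed attempt, keyed as "ExceptionName: message". */
    void recordFailure(Exception e) {
        String key = e.getClass().getSimpleName() + ": " + e.getMessage();
        fails.merge(key, 1, Integer::sum);
    }

    Map<Integer, Integer> versions() {
        return versions;
    }

    Map<String, Integer> fails() {
        return fails;
    }

    public static void main(String[] args) {
        CrawlStats stats = new CrawlStats();
        stats.recordVersion(32400);
        stats.recordVersion(32300);
        stats.recordFailure(new ConnectException("Connection refused"));
        stats.recordFailure(new SocketTimeoutException("connect timed out"));
        System.out.println("Versions: " + stats.versions());
        System.out.println("Fails: " + stats.fails());
    }
}
```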




* Re: [Bitcoin-development] Building a node crawler to map network
  2011-09-06  8:29 ` Steve
@ 2011-09-06  8:36   ` Christian Decker
  2011-09-06 12:49     ` Mike Hearn
  0 siblings, 1 reply; 10+ messages in thread
From: Christian Decker @ 2011-09-06  8:36 UTC (permalink / raw)
  To: shadders.del; +Cc: bitcoin-development


Hi Steve,

before attempting to hack BitcoinJ to use NIO you might want to take a look
at BitDroid (https://github.com/cdecker/BitDroid-Network), which is my
attempt to build an easily extensible network client (no crypto stuff so
far) on top of NIO and a simple publish-subscribe architecture. I built a
crawler like yours with just a single class that subscribes to the published
events and opens and closes connections to crawl.

HTH,
Christian

On Tue, Sep 6, 2011 at 10:29 AM, Steve <shadders.del@gmail•com> wrote:

> While I'm asking questions I'll add one more regarding the getaddr message.
>
>
>
> Talking to myself here.  I just sent this message then found this brilliant
> set of articles in the Dev & Tech forum which answers the question very
> nicely: https://bitcointalk.org/index.php?topic=41722.0
> <https://bitcointalk.org/index.php?board=6.0>
> Anyway just as an FYI I've been running v0.0.0.0.0.0.0.0.1 for about an
> hour.  It's only running 10 concurrent connections due to girlfriend
> complaining she couldn't watch youtube but here's some early results.
>
> New nodes: 19319 // node address discovered but no contact attempt made yet
> Contacted nodes: 754
> Uncontactable nodes: 3253
> Limbo nodes: 9 // not as exciting as it sounds, just nodes with a connect in progress
> Total nodes: 23335 // about 5000 from initial IRC discovery, the rest are from getaddr
>
> Versions: {
> 300=1,
> 31900=7,
> 31902=1,
> 32000=2,
> 32001=7,
> 32002=22,
> 32100=100,
> 32200=24,
> 32300=277,
> 32400=317,
> 32500=2}
>
> Fails: {
> ConnectException: Connection refused=377,
> IOException: Socket is disconnected=87,
> SocketException: Network is unreachable=2,
> ProtocolException: Error deserializing message =1,
> NoRouteToHostException: No route to host=115,
> SocketException: Connection reset=149,
> SocketTimeoutException: connect timed out=2521}
>
>
>
>


* Re: [Bitcoin-development] Building a node crawler to map network
  2011-09-06  8:36   ` Christian Decker
@ 2011-09-06 12:49     ` Mike Hearn
  2011-09-06 13:27       ` Steve
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Hearn @ 2011-09-06 12:49 UTC (permalink / raw)
  To: Christian Decker; +Cc: bitcoin-development


Actually Steve, take a look at the bitcoinj mailing list today. Somebody has
already built this and has it running. It's accumulating data at the moment,
they'll announce it more widely soon. But I think there's no need to
duplicate work.


* Re: [Bitcoin-development] Building a node crawler to map network
  2011-09-06 12:49     ` Mike Hearn
@ 2011-09-06 13:27       ` Steve
  2011-09-06 13:31         ` Mike Hearn
  0 siblings, 1 reply; 10+ messages in thread
From: Steve @ 2011-09-06 13:27 UTC (permalink / raw)
  To: Mike Hearn; +Cc: bitcoin-development

Hi Mike,

I've looked but can't find a post like you're talking about.  Can you 
point me to it?

If so then bollocks... I'm looking for something useful to do atm.  
PoolServerJ is in a holding pattern atm as I've stabilised all the bugs 
I know about and am waiting for several pools to finish testing and move 
into production so I'm twiddling thumbs trying to figure out how to 
spend my time.

On 06/09/11 22:49, Mike Hearn wrote:
> Actually Steve, take a look at the bitcoinj mailing list today. 
> Somebody has already built this and has it running. It's accumulating 
> data at the moment, they'll announce it more widely soon. But I think 
> there's no need to duplicate work.




* Re: [Bitcoin-development] Building a node crawler to map network
  2011-09-06 13:27       ` Steve
@ 2011-09-06 13:31         ` Mike Hearn
  2011-09-06 14:17           ` Steve
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Hearn @ 2011-09-06 13:31 UTC (permalink / raw)
  To: shadders.del; +Cc: bitcoin-development


>
> I've looked but can't find a post like you're talking about.  Can you point
> me to it?
>

https://groups.google.com/forum/?pli=1#!topic/bitcoinj/LSlZdUWcCdk


> If so then bollocks... I'm looking for something useful to do atm.
>  PoolServerJ is in a holding pattern atm as I've stabilised all the bugs I
> know about and am waiting for several pools to finish testing and move into
> production so I'm twiddling thumbs trying to figure out how to spend my
> time.
>

Patches to BitCoinJ are always welcome :-)

If you'd rather do your own thing, you could experiment with writing a proxy
that sits in front of bitcoind and multiplexes connections. Gavin is
concerned about socket exhaustion as users move to lightweight clients.
Multiplexing proxies are a battle-tested technique for reducing the strain
of this type of thing. BitCoinJ uses thread-per-connection so wouldn't do a
good job of that right now, but allowing it to use a mix of async io and
multi-threading would be a nice improvement. It'd need some changes to
bitcoind as well for a really good effort, to allow for IPs to be forwarded.
I'm happy to discuss it more with you over on the bitcoinj list if wanted.


* Re: [Bitcoin-development] Building a node crawler to map network
  2011-09-06 13:31         ` Mike Hearn
@ 2011-09-06 14:17           ` Steve
  2011-09-06 14:52             ` Mike Hearn
  0 siblings, 1 reply; 10+ messages in thread
From: Steve @ 2011-09-06 14:17 UTC (permalink / raw)
  To: Mike Hearn; +Cc: bitcoin-development


Hi Mike,

I expect I'll be submitting patches for bitcoinj sometime in the future 
but I'm not really across it yet to the point where I'd be confident 
submitting patches right now...

This proxy sounds like a good match for what I've been up to lately 
though so long as it wouldn't involve direct changes to bitcoind on my 
part.  My c/c++ skills are non-existent.

However I have been building a pool protocol using protobufs and netty 
for non-blocking IO and I'd imagine the kind of multiplexing proxy 
you're talking about could be easily implemented using netty.

I'm not really understanding the use case though.  I believe most 
bitcoind's have a default max connections of 8.  Is the goal to increase 
this without fundamentally altering the bitcoind concurrency model?  Or 
is it to provide capacity for a more hub/client oriented network?  If 
the latter then presumably this is functionality that should ideally be 
native to the client in the long term in the form of NIO?

On 06/09/11 23:31, Mike Hearn wrote:
>
>     I've looked but can't find a post like you're talking about.  Can
>     you point me to it?
>
> https://groups.google.com/forum/?pli=1#!topic/bitcoinj/LSlZdUWcCdk 
> <https://groups.google.com/forum/?pli=1#%21topic/bitcoinj/LSlZdUWcCdk>
>
>     If so then bollocks... I'm looking for something useful to do atm.
>      PoolServerJ is in a holding pattern atm as I've stabilised all
>     the bugs I know about and am waiting for several pools to finish
>     testing and move into production so I'm twiddling thumbs trying to
>     figure out how to spend my time.
>
>
> Patches to BitCoinJ are always welcome :-)
>
> If you'd rather do your own thing, you could experiment with writing a 
> proxy that sits in front of bitcoind and multiplexes connections. 
> Gavin is concerned about socket exhaustion as users move to 
> lightweight clients. Multiplexing proxies are a battle-tested 
> technique for reducing the strain of this type of thing. BitCoinJ uses 
> thread-per-connection so wouldn't do a good job of that right now, but 
> allowing it to use a mix of async io and multi-threading would be a 
> nice improvement. It'd need some changes to bitcoind as well for a 
> really good effort, to allow for IPs to be forwarded. I'm happy to 
> discuss it more with you over on the bitcoinj list if wanted.


* Re: [Bitcoin-development] Building a node crawler to map network
  2011-09-06  7:42 [Bitcoin-development] Building a node crawler to map network Steve
  2011-09-06  8:29 ` Steve
@ 2011-09-06 14:36 ` Rick Wesson
  1 sibling, 0 replies; 10+ messages in thread
From: Rick Wesson @ 2011-09-06 14:36 UTC (permalink / raw)
  To: shadders.del; +Cc: bitcoin-development


I've got minna patches for nio based on bitcoinj. I've enumerated the
network a few times and am working on a DNS seed service as well as some
weather reports.

Happy to start a branch when the committers are ready.

-rick


On Tue, Sep 6, 2011 at 12:42 AM, Steve <shadders.del@gmail•com> wrote:

> Hi All,
>
> I started messing around today with building a node crawler to try and
> map out the bitcoin network and hopefully provide some useful
> statistics.  It's very basic so far, using a mutilated bitcoinj to
> connect (due to me being a Java developer and not having a clue with C/C++).
> If it's worthwhile I'll hack bitcoinj some more to run on top of Netty to
> take advantage of its NIO architecture (Netty has been shown to handle
> half a million concurrent connections, so it would be ideal for the purpose).
>
> Hoping to get a bit of input into what would be useful as well as a
> strategy for getting max possible connections without distorted data.  I
> seem to recall Gavin talking about the need for some kind of network
> health monitoring so I assume there's a need for something like this...
>
> Firstly, at the moment I'm just storing the version message and the
> results of getaddr for each node that I can connect to.  Is there any
> other useful info that can be extracted from a node that's worth
> collecting?
>
> Second, and the main issue, is how to connect.  From my first very basic
> probing it seems the vast majority of nodes don't accept incoming
> connections, no doubt due to lack of UPnP.  So it seems the active crawl
> approach is not really ideal for the purpose.  Even if it were used, the
> resultant data would be hopelessly distorted.
>
> A honeypot approach would probably be better if there was some way to
> make a node 'attractive' to other nodes to connect to.  That way it
> could capture non-listening nodes as well.  If there is some way to
> influence other nodes to connect to the crawler node that solves the
> problem.  If there isn't, which I suspect is the case, then perhaps
> another approach is to build an easy-to-deploy crawler node that many
> volunteers could run and that could then upload collected data to a
> central repository.
>
> While I'm asking questions I'll add one more regarding the getaddr
> message.  It seems most nodes return about 1000 addresses in response to
> this message.  Obviously most of these nodes haven't actually talked to
> all 1000 on the list, so where does this list come from?  Is it a mixture
> of addresses obtained from other nodes, somehow sorted by timestamp?
> Does it include some nodes discovered by IRC/DNS? Or are those only used
> to find the first nodes to connect to?
>
> Thanks for any input... Hopefully I can build something that's useful
> for the network...
>
>


* Re: [Bitcoin-development] Building a node crawler to map network
  2011-09-06 14:17           ` Steve
@ 2011-09-06 14:52             ` Mike Hearn
  2011-09-06 15:25               ` Steve
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Hearn @ 2011-09-06 14:52 UTC (permalink / raw)
  To: shadders.del; +Cc: bitcoin-development


On Tue, Sep 6, 2011 at 4:17 PM, Steve <shadders.del@gmail•com> wrote:

> I'm not really understanding the use case though.  I believe most
> bitcoind's have a default max connections of 8.  Is the goal to increase
> this without fundamentally altering the bitcoind concurrency model?
>

bitcoind already uses asynchronous IO. That's not the problem.

The issue came up in a conversation about scalability. If Bitcoin's
popularity continues to grow, users are very likely to migrate away from
running full verifying nodes to lightweight clients, either a different mode
of the Satoshi client or different implementations like the Android Wallet
or MultiBit.

Lightweight clients cannot verify, and thus should not relay. And they'll be run
by users who just want to send/receive coins from time to time, so they won't
be left running 24/7. They therefore consume connection slots on full nodes
without offering any of their own, and the result could be running out of sockets
(like we have had problems with recently). It's especially true because
lightweight clients cannot check transactions for themselves. If they want
to show transactions appearing immediately (and they do), they have to use
"heard from lots of nodes" as a proxy for validity. So lightweight clients
are likely to be socket intensive.
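
As a rough illustration of the "heard from lots of nodes" heuristic, a lightweight client could count the distinct peers that have announced a transaction and only display it once some threshold is reached. This is a sketch under assumptions: `AnnouncementCounter`, the string identifiers and the threshold value are all hypothetical, and passing the threshold is only a hint, not proof that the transaction is valid.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts distinct peers announcing each transaction.
class AnnouncementCounter {
    private final Map<String, Set<String>> peersByTx = new HashMap<>();
    private final int threshold;

    AnnouncementCounter(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.threshold = threshold;
    }

    /**
     * Records that a peer announced a transaction.
     * Returns true once at least `threshold` distinct peers have announced it.
     */
    boolean onAnnounce(String txHash, String peerId) {
        Set<String> peers = peersByTx.computeIfAbsent(txHash, k -> new HashSet<>());
        peers.add(peerId); // repeat announcements from the same peer do not add weight
        return peers.size() >= threshold;
    }

    public static void main(String[] args) {
        AnnouncementCounter counter = new AnnouncementCounter(2);
        System.out.println(counter.onAnnounce("tx1", "peerA"));
        System.out.println(counter.onAnnounce("tx1", "peerB"));
    }
}
```

The higher the threshold, the more peers the client has to stay connected to, which is exactly why this approach is socket intensive.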

We could solve this by just hoping that lots of people run full nodes. The
problem is that a full node is quite an intensive thing already; it uses
lots of CPU and disk seeks, and will just get more expensive in future. And
as transaction traffic increases, that leaves less CPU time available to
service thousands of connected clients. The ROI of bringing up a new node
decreases at the same time as the userbase increases.

One traditional approach to solving this is frontend proxies. Jabber.com/org
used this technique many years ago, and Google has also used it to scale up the
lock service
<http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/chubby-osdi06.pdf>
(see section 3.1). It's effective because often maintaining connections to
thousands of clients doesn't involve much brainwork, just shifting bytes
around. This is especially true of Bitcoin. So if somebody is running a full
node already they could increase their client capacity by just bringing up a
frontend proxy and having it handle things like outbound tx
broadcasts/deduping inbound broadcasts, connection setup, relaying recently
found blocks etc. A well-written proxy could probably support tens of
thousands of simultaneous clients, which frees up bitcoind's time for
verification and wallet manipulation.
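
One piece of such a proxy, deduping inbound broadcasts, might look roughly like this: remember the most recently seen hashes in a bounded set and only pass a broadcast on to bitcoind the first time it shows up. This is a sketch, not a description of any existing proxy; `BroadcastDeduper`, the string hashes and the capacity are assumptions made for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: bounded "seen" set for transaction or block hashes.
class BroadcastDeduper {
    private final Set<String> seen;

    BroadcastDeduper(final int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        // Insertion-ordered map that evicts its oldest entry once capacity is exceeded.
        Map<String, Boolean> bounded = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
        this.seen = Collections.newSetFromMap(bounded);
    }

    /** Returns true if the hash has not been seen recently and should be forwarded. */
    boolean firstSighting(String hash) {
        return seen.add(hash);
    }

    public static void main(String[] args) {
        BroadcastDeduper deduper = new BroadcastDeduper(1000);
        System.out.println(deduper.firstSighting("abc")); // first time: forward it
        System.out.println(deduper.firstSighting("abc")); // repeat: drop it
    }
}
```

A proxy shared by several I/O threads would need to synchronize access to the set; the sketch leaves that out.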


* Re: [Bitcoin-development] Building a node crawler to map network
  2011-09-06 14:52             ` Mike Hearn
@ 2011-09-06 15:25               ` Steve
  0 siblings, 0 replies; 10+ messages in thread
From: Steve @ 2011-09-06 15:25 UTC (permalink / raw)
  To: Mike Hearn; +Cc: bitcoin-development


Thanks for the overview Mike.  I just bailed up Gavin on IRC and between 
that convo and what you've just written I'm starting to picture a plan 
in my head... This sounds right up my alley, I wish I didn't have to go 
to bed right now as I've got a ton of ideas buzzing around I'd like to 
get started on right now.  But I'll be onto it as soon as I've got a 
free moment...

On 07/09/11 00:52, Mike Hearn wrote:
> On Tue, Sep 6, 2011 at 4:17 PM, Steve <shadders.del@gmail•com 
> <mailto:shadders.del@gmail•com>> wrote:
>
>     I'm not really understanding the use case though.  I believe most
>     bitcoind's have a default max connections of 8.  Is the goal to
>     increase this without fundamentally altering the bitcoind
>     concurrency model?
>
>
> bitcoind already uses asynchronous IO. That's not the problem.
>
> The issue came up in a conversation about scalability. If Bitcoin's 
> popularity continues to grow, users are very likely to migrate away 
> from running full verifying nodes to lightweight clients, either a 
> different mode of the Satoshi client or different implementations like 
> the Android Wallet or MultiBit.
>
> Lightweight clients cannot verify, and thus should not relay. And they'll 
> be run by users who just want to send/receive coins from time to time, 
> so don't leave the programs running 24/7. The result could be running 
> out of sockets (like we have had problems with recently). It's 
> especially true because lightweight clients cannot check transactions 
> for themselves. If they want to show transactions appearing 
> immediately (and they do), they have to use "heard from lots of nodes" 
> as a proxy for validity. So lightweight clients are likely to be 
> socket intensive.
>
> We could solve this by just hoping that lots of people run full nodes. 
> The problem is that a full node is quite an intensive thing already; 
> it uses lots of CPU and disk seeks, and will just get more expensive 
> in future. And as transaction traffic increases, that leaves less CPU 
> time available to service thousands of connected clients. The ROI of 
> bringing up a new node decreases at the same time as the userbase 
> increases.
>
> One traditional approach to solving this is frontend proxies. 
> Jabber.com/org used this technique many years ago, and Google has also 
> used it to scale up the lockservice 
> <http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/chubby-osdi06.pdf> (see 
> section 3.1). It's effective because often maintaining connections to 
> thousands of clients doesn't involve much brainwork, just shifting 
> bytes around. This is especially true of Bitcoin. So if somebody is 
> running a full node already they could increase their client capacity 
> by just bringing up a frontend proxy and having it handle things like 
> outbound tx broadcasts/deduping inbound broadcasts, connection setup, 
> relaying recently found blocks etc. A well written proxy could 
> probably support tens of thousands of simultaneous clients which frees 
> up bitcoind's time for verification and wallet manipulation.

