--- Log opened Sun May 05 00:00:03 2019
00:22 -!- justanotheruser [~justanoth@unaffiliated/justanotheruser] has quit [Ping timeout: 252 seconds]
00:44 -!- Amperture [~amp@24.136.5.183] has joined #c-lightning
01:42 <@cdecker> We appreciate the help, but there was no need for you to hurt yourself for it :-)
03:43 -!- spinza [~spin@155.93.246.187] has quit [Quit: Coyote finally caught up with me...]
03:56 -!- spinza [~spin@155.93.246.187] has joined #c-lightning
04:21 < molz> cdecker, no pain, no gain? :D
05:12 < m-schmoock> :D
05:35 < k3tan> looking to send an invoiceless payment, who wants it?
06:04 -!- jtimon [~quassel@181.61.134.37.dynamic.jazztel.es] has joined #c-lightning
06:45 < t0mix> "wallet_utxoset_prune: database disk image is malformed" .. that doesn't seem right :( no server reboot, just this out of the blue
06:52 < m-schmoock> k3tan: 02a2c53bc475cb92e4ab2f38a5bca56df695034ce90ad78c2f47c05911e3f79e41 :D
06:53 < m-schmoock> I'm currently working on finalizing the 'receivedinvoiceless' call
07:01 -!- spinza [~spin@155.93.246.187] has quit [Quit: Coyote finally caught up with me...]
07:03 < m-schmoock> cdecker: my master mainnet node crashed this noon with this log: https://pastebin.com/3m2RrCrh (gossip related)
07:04 < m-schmoock> revision was aa9284ea
07:08 -!- spinza [~spin@155.93.246.187] has joined #c-lightning
07:21 < t0mix> did anyone ever try to recover an sqlite3 file?
07:33 < m-schmoock> nope
07:33 < t0mix> this will be fun
07:37 <@cdecker> m-schmoock: sounds to me like the bug fixed in https://github.com/ElementsProject/lightning/pull/2609
07:37 <@cdecker> t0mix: what happened to your sqlite3?
07:50 < m-schmoock> cdecker: that patch was definitely in my compiled binary
07:51 < m-schmoock> and the symptom was different this time: 'normally' gossip store truncation problems lead to a failed start where the second attempt worked. this time my node crashed during normal operation
07:53 <@cdecker> Hm, shall we reopen #2583 do you think?
07:54 < m-schmoock> i think so, but we can wait until the one runs into this
07:55 < m-schmoock> what's interesting is my node didn't even write a crash.log
07:56 < m-schmoock> i found it in the normal logs after I noticed that it was down
07:56 < m-schmoock> *until the next one
07:57 < m-schmoock> also it contained this line a second before the crash
07:57 < m-schmoock> 2019-05-05T11:57:27.746Z lightning_gossipd(4757): Failed reading 4250484004 from to gossip store @1692454: Success
07:57 < m-schmoock> (just rechecked everything)
07:58 < m-schmoock> *not the second before, but a page before in the log
07:58 <@cdecker> Interesting, that log line suggests we have a corrupt msglen when reading the gossip store
07:59 <@cdecker> It tried reading 4.25 GB into memory :-)
07:59 < m-schmoock> uh
07:59 < m-schmoock> not good
08:03 < t0mix> @cdecker, I'd really like to know. I just connected to the server and found CLN down. in the crash log are 2 lines:
08:03 < t0mix> lightningd(24630):BROKEN: wallet_utxoset_prune: database disk image is malformed
08:03 < t0mix> lightningd(24630):BROKEN: FATAL SIGNAL 6 (version v0.7.0-1-g0631490)
08:04 <@cdecker> Interesting
08:04 < t0mix> CLN is not starting now. I tried to run "PRAGMA integrity_check". here is the output
08:05 < t0mix> https://pastebin.com/6QYwftQf
08:05 < t0mix> no idea how to recover so far
08:05 <@cdecker> Ouch
08:05 <@cdecker> Any disk corruption? (dmesg should show you IO errors if that's the case)
08:06 <@cdecker> With a bit of luck only non-important tables are affected
08:06 < t0mix> f*ck..
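
(For reference: the "PRAGMA integrity_check" t0mix ran above can also be scripted against a copy of the database file with Python's standard sqlite3 module. This is only a rough sketch, the path is a placeholder, and it should always be pointed at a copy of lightningd.sqlite3, never at the file a running node is using.)

    #!/usr/bin/env python3
    # Rough sketch: run PRAGMA integrity_check against a *copy* of the
    # lightningd database. The path is a placeholder; note sqlite3.connect()
    # would silently create an empty DB if the file does not exist.
    import sqlite3

    DB_COPY = "/tmp/lightningd.sqlite3.copy"  # a copy, never the live file

    conn = sqlite3.connect(DB_COPY)
    try:
        # A healthy file yields a single row containing "ok"; a damaged one
        # yields a list of messages describing the corruption.
        for (msg,) in conn.execute("PRAGMA integrity_check"):
            print(msg)
    finally:
        conn.close()
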
08:06 < t0mix> [2620025.929026] EXT4-fs error (device md1): ext4_get_branch:171: inode #16: block 2416100466: comm lightningd: invalid block
08:06 < t0mix> [2620028.692280] EXT4-fs error (device md1): ext4_get_branch:171: inode #16: block 4294967295: comm lightningd: invalid block
08:07 <@cdecker> Ok, copy over the DB to some other disk and try to recover from there instead
08:07 <@cdecker> Seems the EXT4 FS is corrupt under the DB
08:08 < t0mix> mhm.. only, how to recover? a restart won't fix it, will it?
08:08 <@cdecker> Probably best not to try and restart the node in this condition
08:09 <@cdecker> Nope, need to export the data from the DB and re-initialize a new DB with the recovered data
08:09 <@cdecker> This seems to give some good instructions: https://techblog.dorogin.com/sqliteexception-database-disk-image-is-malformed-77e59d547c50?gi=99fb6fecea5c
08:09 < t0mix> I'll give it a try
08:10 <@cdecker> Let us know if it works, I'm keeping my fingers crossed
08:10 <@cdecker> fwiw the utxoset table mentioned in the error is not important, so hopefully it's just that table
08:10 -!- justanotheruser [~justanoth@unaffiliated/justanotheruser] has joined #c-lightning
08:15 -!- justanotheruser [~justanoth@unaffiliated/justanotheruser] has quit [Excess Flood]
08:19 -!- justanotheruser [~justanoth@unaffiliated/justanotheruser] has joined #c-lightning
08:24 < m-schmoock> cdecker: I appended my logs from the crash to the #2609 conversation in case we see this again
08:25 <@cdecker> Thanks ^^
08:25 < t0mix> node recovered
08:26 < m-schmoock> gz :D
08:26 < t0mix> lol.. I have removed the "suspicious" disk from the mirror and left the "hopefully better" disk in the array. .. the DB file seems healthy now.
08:26 < t0mix> maybe I should buy a new USB stick
08:27 < m-schmoock> cdecker: when I'm done with the plugin repo PR (invoiceless cleanups and receive), is it okay if I do the merge or do you still want to have a final look at the plugin repo?
08:28 < m-schmoock> I'm talking with zoltan about the code anyway
08:29 < t0mix> here, life-saving advice for you all.. use mirrored devices =D
08:29 < m-schmoock> already do
08:29 < m-schmoock> simplest and most effective raid of all
09:10 < t0mix> funny thing is that the disk was not kicked out of the raid, it appeared healthy. but it was not. it is the 2nd USB stick within 6 months. the first time it was kicked out of the raid. the very same HW type. it was the cheapest one I could get (5 eur). maybe it is worth investing a bit more in more reliable flash disks.
10:05 <@cdecker> m-schmoock: let me have a look and I'll add my ACK to it
10:05 <@cdecker> Let's keep the same process as we do for the ElementsProject/lightning repo :-)
10:28 <@cdecker> t0mix: glad you were able to recover the DB, had me worried for a second there ^^
10:34 < m-schmoock> ack
14:14 -!- belcher [~belcher@unaffiliated/belcher] has joined #c-lightning
14:19 < grubles> i think it would be useful to show the # of payments routed in the output of getinfo
14:20 < grubles> total routing fees collected is already included
14:24 < grubles> oh cool, at least we have listforwards now
15:39 -!- spinza [~spin@155.93.246.187] has quit [Quit: Coyote finally caught up with me...]
15:52 -!- rusty [~rusty@pdpc/supporter/bronze/rusty] has joined #c-lightning
16:07 -!- spinza [~spin@155.93.246.187] has joined #c-lightning
16:32 -!- rusty [~rusty@pdpc/supporter/bronze/rusty] has quit [Quit: Leaving.]
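
(The recovery route cdecker outlines above, exporting whatever is still readable and re-initializing a fresh database with it, is typically done with the sqlite3 shell's .dump command; the same idea can be sketched with Python's standard library. File names below are placeholders and this is only an illustration: a badly damaged file may still abort the dump partway through, in which case whatever was replayed before the failure is kept.)

    #!/usr/bin/env python3
    # Rough sketch of "export and re-initialize": replay whatever SQL can
    # still be read out of a damaged copy into a fresh database.
    # File names are placeholders; only ever work on copies of the DB.
    import sqlite3

    DAMAGED = "/tmp/lightningd.sqlite3.copy"         # copy of the malformed DB
    RECOVERED = "/tmp/lightningd.recovered.sqlite3"  # fresh DB to rebuild into

    src = sqlite3.connect(DAMAGED)
    # autocommit mode: the dump itself emits BEGIN TRANSACTION / COMMIT
    dst = sqlite3.connect(RECOVERED, isolation_level=None)
    try:
        for statement in src.iterdump():  # yields the schema and row INSERTs
            dst.execute(statement)
    except sqlite3.DatabaseError as err:
        # Hitting a corrupt page aborts the dump here; keep whatever was
        # replayed before the failure.
        print("dump stopped early:", err)
        dst.commit()
    finally:
        src.close()
        dst.close()
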
16:34 -!- belcher [~belcher@unaffiliated/belcher] has quit [Quit: Leaving]
16:49 -!- rusty [~rusty@pdpc/supporter/bronze/rusty] has joined #c-lightning
16:50 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has quit [Ping timeout: 256 seconds]
17:42 -!- drexl [~drexl@cpc130676-camd16-2-0-cust445.know.cable.virginm.net] has joined #c-lightning
17:59 -!- jb55 [~jb55@S010660e327dca171.vc.shawcable.net] has joined #c-lightning
18:55 -!- jtimon [~quassel@181.61.134.37.dynamic.jazztel.es] has quit [Ping timeout: 250 seconds]
19:51 -!- drexl [~drexl@cpc130676-camd16-2-0-cust445.know.cable.virginm.net] has quit [Quit: drexl]
20:08 -!- EagleTM [~EagleTM@unaffiliated/eagletm] has joined #c-lightning
20:09 -!- Eagle[TM] [~EagleTM@unaffiliated/eagletm] has quit [Ping timeout: 250 seconds]
21:46 -!- justanotheruser [~justanoth@unaffiliated/justanotheruser] has quit [Ping timeout: 248 seconds]
22:01 < rusty> Urg, I spent a day and a half rewriting all our packet creation code to optimize it. But it turns out that with '-O3 -flto' the compiler is smart enough that it makes no real difference.
22:13 < jb55> rusty: gcc9?
22:14 < rusty> jb55: Even gcc8. It's easier to run perf without optimization, so I've been doing that, and it deeply misled me :(
22:20 < jb55> rusty: what are the main hotspots atm?
22:22 < rusty> jb55: well, after my current patches, I've got all the obvious ones. If I could answer that, I would be working on it, I guess?
22:30 -!- EagleTM [~EagleTM@unaffiliated/eagletm] has quit [Ping timeout: 248 seconds]
22:38 < rusty> jb55: I'm mainly disturbed that we still use 550M with 1M channels, TBH
22:38 < jb55> rusty: do we have heap profiles?
22:51 < rusty> jb55: I was about to run massif and take a look...
--- Log closed Mon May 06 00:00:04 2019