--- Log opened Sun May 05 00:00:03 2019
00:22 -!- justanotheruser [~justanoth@unaffiliated/justanotheruser] has quit [Ping timeout: 252 seconds]
00:44 -!- Amperture [~amp@24.136.5.183] has joined #c-lightning
01:42 <@cdecker> We appreciate the help, but there was no need for you to hurt yourself for it :-)
03:43 -!- spinza [~spin@155.93.246.187] has quit [Quit: Coyote finally caught up with me...]
03:56 -!- spinza [~spin@155.93.246.187] has joined #c-lightning
04:21 < molz> cdecker, no pain, no gain? :D
05:12 < m-schmoock> :D
05:35 < k3tan> looking to send an invoiceless payment, who wants it?
06:04 -!- jtimon [~quassel@181.61.134.37.dynamic.jazztel.es] has joined #c-lightning
06:45 < t0mix> "wallet_utxoset_prune: database disk image is malformed" .. that doesn't seem right :( no server reboot, just this out of the blue
06:52 < m-schmoock> k3tan: 02a2c53bc475cb92e4ab2f38a5bca56df695034ce90ad78c2f47c05911e3f79e41 :D
06:53 < m-schmoock> I'm currently working on finalizing the 'receivedinvoiceless' call
07:01 -!- spinza [~spin@155.93.246.187] has quit [Quit: Coyote finally caught up with me...]
07:03 < m-schmoock> cdecker: my master mainnet node crashed this noon with this log: https://pastebin.com/3m2RrCrh (gossip related)
07:04 < m-schmoock> revision was aa9284ea
07:08 -!- spinza [~spin@155.93.246.187] has joined #c-lightning
07:21 < t0mix> did anyone ever try to recover an sqlite3 file?
07:33 < m-schmoock> nope
07:33 < t0mix> this will be fun
07:37 <@cdecker> m-schmoock: sounds to me like the bug fixed in https://github.com/ElementsProject/lightning/pull/2609
07:37 <@cdecker> t0mix: what happened to your sqlite3?
07:50 < m-schmoock> cdecker: that patch was definitely in my compiled binary
07:51 < m-schmoock> and the symptom was different this time: 'normally' gossip store truncation problems lead to a failed start where the second attempt worked. this time my node crashed during normal operation
07:53 <@cdecker> Hm, shall we reopen #2583 do you think?
07:54 < m-schmoock> i think so, but we can wait until the one runs into this
07:55 < m-schmoock> what's interesting is my node didn't even write a crash.log
07:56 < m-schmoock> i found it in the normal logs after I noticed that it was down
07:56 < m-schmoock> *until the next one
07:57 < m-schmoock> also it contained this line a second before the crash
07:57 < m-schmoock> 2019-05-05T11:57:27.746Z lightning_gossipd(4757): Failed reading 4250484004 from to gossip store @1692454: Success
07:57 < m-schmoock> (just rechecked everything)
07:58 < m-schmoock> *not the second before, but a page before in the log
07:58 <@cdecker> Interesting, that log line suggests we have a corrupt msglen when reading the gossip store
07:59 <@cdecker> It tried reading 4.25 GB into memory :-)
07:59 < m-schmoock> uh
07:59 < m-schmoock> not good
08:03 < t0mix> @cdecker, I'd really like to know. I just connected to the server and found CLN down. in the crash log are 2 lines:
08:03 < t0mix> lightningd(24630):BROKEN: wallet_utxoset_prune: database disk image is malformed
08:03 < t0mix> lightningd(24630):BROKEN: FATAL SIGNAL 6 (version v0.7.0-1-g0631490)
08:04 <@cdecker> Interesting
08:04 < t0mix> CLN is not starting now. I tried to run "PRAGMA integrity_check". here is the output
08:05 < t0mix> https://pastebin.com/6QYwftQf
08:05 < t0mix> no idea how to recover so far
08:05 <@cdecker> Ouch
08:05 <@cdecker> Any disk corruption? (dmesg should show you IO errors if that's the case)
08:06 <@cdecker> With a bit of luck only non-important tables are affected
08:06 < t0mix> f*ck..
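
(For reference: the "PRAGMA integrity_check" t0mix ran above can also be scripted against a copy of the database file with Python's standard sqlite3 module. This is only a rough sketch, the path is a placeholder, and it should always be pointed at a copy of lightningd.sqlite3, never at the file a running node is using.)

    #!/usr/bin/env python3
    # Rough sketch: run PRAGMA integrity_check against a *copy* of the
    # lightningd database. The path is a placeholder; note sqlite3.connect()
    # would silently create an empty DB if the file does not exist.
    import sqlite3

    DB_COPY = "/tmp/lightningd.sqlite3.copy"  # a copy, never the live file

    conn = sqlite3.connect(DB_COPY)
    try:
        # A healthy file yields a single row containing "ok"; a damaged one
        # yields a list of messages describing the corruption.
        for (msg,) in conn.execute("PRAGMA integrity_check"):
            print(msg)
    finally:
        conn.close()
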
08:06 < t0mix> [2620025.929026] EXT4-fs error (device md1): ext4_get_branch:171: inode #16: block 2416100466: comm lightningd: invalid block
08:06 < t0mix> [2620028.692280] EXT4-fs error (device md1): ext4_get_branch:171: inode #16: block 4294967295: comm lightningd: invalid block
08:07 <@cdecker> Ok, copy over the DB to some other disk and try to recover from there instead
08:07 <@cdecker> Seems the EXT4 FS is corrupt under the DB
08:08 < t0mix> mhm.. only, how to recover? a restart won't fix it, will it?
08:08 <@cdecker> Probably best not to try and restart the node in this condition
08:09 <@cdecker> Nope, need to export the data from the DB and re-initialize a new DB with the recovered data
08:09 <@cdecker> This seems to give some good instructions: https://techblog.dorogin.com/sqliteexception-database-disk-image-is-malformed-77e59d547c50?gi=99fb6fecea5c
08:09 < t0mix> I'll give it a try
08:10 <@cdecker> Let us know if it works, I'm keeping my fingers crossed
08:10 <@cdecker> fwiw the utxoset table mentioned in the error is not important, so hopefully it's just that table
08:10 -!- justanotheruser [~justanoth@unaffiliated/justanotheruser] has joined #c-lightning
08:15 -!- justanotheruser [~justanoth@unaffiliated/justanotheruser] has quit [Excess Flood]
08:19 -!- justanotheruser [~justanoth@unaffiliated/justanotheruser] has joined #c-lightning
08:24 < m-schmoock> cdecker: I appended my logs from the crash to the #2609 conversation in case we see this again
08:25 <@cdecker> Thanks ^^
08:25 < t0mix> node recovered
08:26 < m-schmoock> gz :D
08:26 < t0mix> lol.. I have removed the "suspicious" disk from the mirror and left the "hopefully better" disk in the array. .. the DB file seems healthy now.
08:26 < t0mix> maybe I should buy a new USB stick
08:27 < m-schmoock> cdecker: when I'm done with the plugin repo PR (invoiceless cleanups and receive), is it okay if I do the merge or do you still want to have a final look at the plugin repo?
08:28 < m-schmoock> I'm talking with zoltan about the code anyway
08:29 < t0mix> here, life-saving advice for you all.. use mirrored devices =D
08:29 < m-schmoock> already do
08:29 < m-schmoock> simplest and most effective raid of all
09:10 < t0mix> funny thing is that the disk was not kicked out of the raid, it appeared healthy. but it was not. it is the 2nd USB stick within 6 months. the first time it was kicked out of the raid. the very same HW type. it was the cheapest one I could get (5 eur). maybe it is worth investing a bit more in more reliable flash disks.
10:05 <@cdecker> m-schmoock: let me have a look and I'll add my ACK to it
10:05 <@cdecker> Let's keep the same process as we do for the ElementsProject/lightning repo :-)
10:28 <@cdecker> t0mix: glad you were able to recover the DB, had me worried for a second there ^^
10:34 < m-schmoock> ack
14:14 -!- belcher [~belcher@unaffiliated/belcher] has joined #c-lightning
14:19 < grubles> i think it would be useful to show the # of payments routed in the output of getinfo
14:20 < grubles> total routing fees collected is already included
14:24 < grubles> oh cool, at least we have listforwards now
15:39 -!- spinza [~spin@155.93.246.187] has quit [Quit: Coyote finally caught up with me...]
15:52 -!- rusty [~rusty@pdpc/supporter/bronze/rusty] has joined #c-lightning
16:07 -!- spinza [~spin@155.93.246.187] has joined #c-lightning
16:32 -!- rusty [~rusty@pdpc/supporter/bronze/rusty] has quit [Quit: Leaving.]
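
(The recovery route cdecker outlines above, exporting whatever is still readable and re-initializing a fresh database with it, is typically done with the sqlite3 shell's .dump command; the same idea can be sketched with Python's standard library. File names below are placeholders and this is only an illustration: a badly damaged file may still abort the dump partway through, in which case whatever was replayed before the failure is kept.)

    #!/usr/bin/env python3
    # Rough sketch of "export and re-initialize": replay whatever SQL can
    # still be read out of a damaged copy into a fresh database.
    # File names are placeholders; only ever work on copies of the DB.
    import sqlite3

    DAMAGED = "/tmp/lightningd.sqlite3.copy"         # copy of the malformed DB
    RECOVERED = "/tmp/lightningd.recovered.sqlite3"  # fresh DB to rebuild into

    src = sqlite3.connect(DAMAGED)
    # autocommit mode: the dump itself emits BEGIN TRANSACTION / COMMIT
    dst = sqlite3.connect(RECOVERED, isolation_level=None)
    try:
        for statement in src.iterdump():  # yields the schema and row INSERTs
            dst.execute(statement)
    except sqlite3.DatabaseError as err:
        # Hitting a corrupt page aborts the dump here; keep whatever was
        # replayed before the failure.
        print("dump stopped early:", err)
        dst.commit()
    finally:
        src.close()
        dst.close()
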
16:34 -!- belcher [~belcher@unaffiliated/belcher] has quit [Quit: Leaving]
16:49 -!- rusty [~rusty@pdpc/supporter/bronze/rusty] has joined #c-lightning
16:50 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has quit [Ping timeout: 256 seconds]
17:42 -!- drexl [~drexl@cpc130676-camd16-2-0-cust445.know.cable.virginm.net] has joined #c-lightning
17:59 -!- jb55 [~jb55@S010660e327dca171.vc.shawcable.net] has joined #c-lightning
18:55 -!- jtimon [~quassel@181.61.134.37.dynamic.jazztel.es] has quit [Ping timeout: 250 seconds]
19:51 -!- drexl [~drexl@cpc130676-camd16-2-0-cust445.know.cable.virginm.net] has quit [Quit: drexl]
20:08 -!- EagleTM [~EagleTM@unaffiliated/eagletm] has joined #c-lightning
20:09 -!- Eagle[TM] [~EagleTM@unaffiliated/eagletm] has quit [Ping timeout: 250 seconds]
21:46 -!- justanotheruser [~justanoth@unaffiliated/justanotheruser] has quit [Ping timeout: 248 seconds]
22:01 < rusty> Urg, I spent a day and a half rewriting all our packet creation code to optimize it. But it turns out that with '-O3 -flto' the compiler is smart enough that it makes no real difference.
22:13 < jb55> rusty: gcc9?
22:14 < rusty> jb55: Even gcc8. It's easier to run perf without optimization, so I've been doing that, and it deeply misled me :(
22:20 < jb55> rusty: what are the main hotspots atm?
22:22 < rusty> jb55: well, after my current patches, I've got all the obvious ones. If I could answer that, I would be working on it, I guess?
22:30 -!- EagleTM [~EagleTM@unaffiliated/eagletm] has quit [Ping timeout: 248 seconds]
22:38 < rusty> jb55: I'm mainly disturbed that we still use 550M with 1M channels, TBH
22:38 < jb55> rusty: do we have heap profiles?
22:51 < rusty> jb55: I was about to run massif and take a look...
--- Log closed Mon May 06 00:00:04 2019