--- Log opened Fri Feb 14 00:00:40 2020
03:42 -!- x7268 [~Funky0x@2a02:8070:a9e:8200:79c4:aed5:45fb:f3cf] has joined #c-lightning
03:44 < x7268> Hi everyone
03:44 < x7268> I am trying to restart my Lightning Node, which was shut down cleanly.
03:44 < x7268> But I get "Error executing statement: wallet/db.c:926: INSERT INTO payments (id, timestamp, status, payment_hash, destination, msatoshi, payment_preimage, path_secrets, route_nodes, route_channels, failonionreply, faildestperm, failindex, failcode, failnode, failchannel, failupdate, msatoshi_sent, faildetail, description, faildirection, bolt11)SELECT id, timestamp, status, payment_hash, destination, msatoshi, payment_preimage, path_secrets, route_nodes, route_channels, failonionreply, faildestperm, failindex, failcode, failnode, failchannel, failupdate, msatoshi_sent, faildetail, description, faildirection, bolt11 FROM temp_payments;: database disk image is malformed"
03:45 < x7268> I am aware of the risks of fiddling with channel backups (didn't take any), and I hope my hsm_secret is intact.
03:45 < x7268> How do I recover from this?
03:48 < x7268> Reading the source, I see this is in db_get_version(struct db *db), which is an attempt to detect the schema. It's unclear to me why this fails.
03:50 < vasild> x7268: can you run "sqlite3 ~/.lightning/bitcoin/lightningd.sqlite3" and manually execute the SELECT at least?
03:51 < vasild> SELECT id, timestamp, status,...bolt11 FROM temp_payments;
03:52 <@cdecker> The hsm_secret is not written to after the initial creation, x7268, so unless you have a disk issue it should be ok
03:52 < x7268> It returns "no such table"
03:52 < vasild> "database disk image is malformed" does not look nice; I guess it is returned by sqlite itself
03:52 < x7268> vasild: my guess as well. Unclear to me how.
03:52 <@cdecker> However, "database disk image" sounds like disk failure
03:53 <@cdecker> Try `sqlite3 /path/to/lightningd.sqlite3 .schema` to see if the schema is intact
03:53 < x7268> cdecker: it's a DigitalOcean node; I could go and run the xfs checks, but since I can open the sqlite and list the tables, it sounds a bit less likely
03:54 <@cdecker> Also `dmesg` should tell you if there are I/O errors
03:54 < x7268> Running .schema, it returns without error (long paste, avoiding it)
03:54 <@cdecker> Hm, ok. Make a copy of the DB just in case of further corruption
03:54 < vasild> yes, make a copy and try to execute the whole statement manually: insert...select...whatever;
03:55 < x7268> dmesg seems fine - no errors logged
03:55 <@cdecker> Depends on which parts of the file may be corrupted; reading the schema doesn't mean the entire file is intact
03:55 < vasild> maybe it only bricks due to the insert
03:55 < x7268> cdecker: that's right
03:55 < x7268> .tables does show there is no temp_payments
03:57 < x7268> Is there a risk in me manually creating the temp_payments table?
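For reference, the diagnostic steps suggested above map onto the sqlite3 command line roughly as follows. This is only a sketch, assuming the default database path, a stopped lightningd, and an illustrative backup filename:

    # snapshot the database (and its journal, if one exists) before touching it
    cp ~/.lightning/bitcoin/lightningd.sqlite3 ~/lightningd.sqlite3.bak
    cp ~/.lightning/bitcoin/lightningd.sqlite3-journal ~/lightningd.sqlite3-journal.bak 2>/dev/null

    # inspect the copy: list the tables, then dump the full schema
    sqlite3 ~/lightningd.sqlite3.bak '.tables'
    sqlite3 ~/lightningd.sqlite3.bak '.schema'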
03:57 < vasild> sqlite> pragma integrity_check;
03:57 < vasild> https://www.sqlite.org/pragma.html#pragma_integrity_check
03:58 < x7268> vasild: thx, details:
03:58 < x7268> sqlite> pragma integrity_check;
03:58 < x7268> *** in database main ***
03:58 < x7268> Main freelist: 7 of 32741 pages missing from overflow list starting at 139211
03:58 < x7268> On tree page 173664 cell 0: 2nd reference to page 149069
03:58 < x7268> row 978792 missing from index sqlite_autoindex_payments_1
03:58 < x7268> row 978793 missing from index sqlite_autoindex_payments_1
03:58 < x7268> row 978794 missing from index sqlite_autoindex_payments_1
03:58 < x7268> row 978795 missing from index sqlite_autoindex_payments_1
03:58 < x7268> wrong # of entries in index sqlite_autoindex_payments_1
03:58 < x7268> Error: database disk image is malformed
03:58 < x7268> mhm.
03:58 < vasild> :/
03:58 < x7268> there is the sqlite3-journal, of course, but I don't know if that contains any useful info
04:00 < x7268> I could export the sqlite3 into a .sql file and restore from there. But I am afraid stale channel state would lead to loss of funds.
04:01 < vasild> can you export everything without errors?
04:02 < x7268> .output yolo.db
04:02 < x7268> ;-)
04:03 < vasild> Also, does the entire statement insert...select..., when executed manually, result in "malformed disk image" or "no such table: temp_payments"?
04:03 < x7268> vasild: just a sec, dumping to file now
04:04 < x7268> but, judging from the error message I had, the SELECT could not possibly go through as the table is missing, and hence it must be the malformed db that already prevents execution of the SELECT
04:05 < x7268> vasild: OK - dump went through without errors. Very surprised.
04:05 < x7268> I have some hope this is only the indices.
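The log only shows the `.output` line, but a full export via the sqlite3 shell would look something like this (the filename `yolo.db` is from the chat; despite the extension, the result is a plain-text SQL script):

    sqlite> .output yolo.db
    sqlite> .dump
    sqlite> .output stdout
    sqlite> .quit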
04:07 < x7268> vasild: OK, ran the command manually. It just says "no such table: temp_payments"
04:08 < x7268> Trying this by starting the LN node, however, gets me the malformed error again
04:08 < x7268> What happens if I recreate temp_payments?
04:08 < x7268> In theory, that should be an empty insert, but would it possibly cause the node to broadcast old TX?
04:08 < x7268> In which case, I'd be in favour of restoring via the hsm_secret
04:09 < x7268> Is there a wiki/tutorial on how to restore via hsm_secret?
04:13 < x7268> This is a way, apparently, but it lacks some info for me to have confidence in what I need to do: https://www.gitmemory.com/issue/ElementsProject/lightning/2459/571367485
04:14 < x7268> Issue https://github.com/ElementsProject/lightning/issues/2459 is a bit better...
04:25 <@cdecker> There is no way to restore channels solely from the `hsm_secret`; only on-chain funds would be automatically returned to your wallet (for the exact same reason backups are dangerous in the LN protocol)
04:36 < x7268> cdecker: I understand that. If I am not mistaken, the channels should be automatically closed after some timeout if my node remains offline. They should go on-chain after that, and I should be able to get the funds via hsm_secret, correct?
04:37 < x7268> In which case, I'd try to go the way via hsm_secret. It seems dangerous to try and restart the node on what might be outdated channel info and commitments in the sqlite3 db.
04:39 <@cdecker> No, there is a channel-dependent secret that is mixed into the on-chain outputs of a unilateral close (the per-commitment secret), so your funds will not just return without the DB
04:39 < x7268> oh, ouch.
04:40 < x7268> Seems I have to take a leap of faith here. What's the way forward? Or are you saying the funds are definitely lost?
04:42 < x7268> cdecker: may I ask whether https://github.com/ElementsProject/lightning/issues/1156#issuecomment-370280072 is wrong info, then?
04:42 <@cdecker> No, you should be fine. If the corruption happened during a write then it should have killed c-lightning, and therefore no update was committed
04:43 < x7268> O-K. So just create an empty table and restart, or should I try the route mentioned in the link to issue 1156?
04:43 <@cdecker> Well, that assumes that you have a backup, doesn't it?
04:44 <@cdecker> Not an expert on recovery, but I know people have successfully recovered just by extracting the per-commitment points and a CLI tool in `devtools`
04:52 < x7268> There is no other backup - the only thing I have is the successful dump of the DB.
04:54 < x7268> cdecker: I am not sure I am following entirely - are you saying I need the per-commitment points, or should I be fine with just the hsm_secret? The finer points of the LN protocols are hard to grasp...
04:56 <@cdecker> No, you definitely need the per-commitment points
04:58 < x7268> OK. Just to aid my understanding - if I go ahead and restore the dump, what would happen? Based on the assumption that only the indices are broken, the node might come up OK, right? If I recreate an empty temp_payments table, however, what would happen then?
04:59 < x7268> Looking at my dump, which is pure SQL, it seems very complete.
04:59 <@cdecker> Ok, so if you have a dump from the DB you should be able to close the channels. The important thing is not to start c-lightning on two systems in parallel, so make sure that the one on DO is definitely offline (I suggest renaming `hsm_secret` so an accidental start would not result in a conflict)
05:01 < x7268> You are advocating setting up a new node and replacing the lightningd.sqlite3 and the hsm_secret?
05:01 <@cdecker> Then copy the hsm_secret and dump over, then load it into the DB, then start c-lightning on this backup (start with `--offline` so it doesn't attempt to reconnect)
05:01 <@cdecker> Yes. I can't guarantee that you'll get back all the funds, but you should get a majority back
05:02 < x7268> Ok. Let me recap this: 1) set up a new, clean node. Don't go online. 2) copy hsm_secret and dump over, load into DB via sqlite3. 3) start this node. 4) hope?
05:02 <@cdecker> You can then go through the list of peers and manually connect one by one to see how they react
05:03 <@cdecker> That should then either re-establish the channel correctly, or the fields in the DB are corrupt, in which case we'd get an error and the connection is closed
05:03 < x7268> Makes sense.
05:04 < x7268> The important thing seems to be to keep the new node offline until everything is restored/loaded, then manually reconnect to peers to see if we can kick-start the channels. Is that right?
05:04 <@cdecker> For the channels that result in an error we need to do some extra work (and I need to read into the `mkcommit` tool)
05:04 <@cdecker> Yep, since you are using the latest version of the data you should not get punished by your peers
05:05 < x7268> Amen to that. I hope I do.
05:05 <@cdecker> The worst that can happen is that fields were corrupt and then we simply can't sign anything, which is better than signing an old state
05:05 <@cdecker> (because that'd then result in punishments)
05:06 < x7268> cdecker: Thanks a million (also contacted you via DM on a different topic)
05:06 < x7268> I will try this.
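A sketch of the recap above, assuming the SQL dump is the `yolo.db` file from earlier, default paths on the new host, and an illustrative `hsm_secret.disabled` name for disarming the old node:

    # on the OLD host: make an accidental start impossible
    mv ~/.lightning/bitcoin/hsm_secret ~/.lightning/bitcoin/hsm_secret.disabled

    # on the NEW host: restore the secret and the database, then start offline
    mkdir -p ~/.lightning/bitcoin
    cp hsm_secret ~/.lightning/bitcoin/hsm_secret
    chmod 0400 ~/.lightning/bitcoin/hsm_secret
    sqlite3 ~/.lightning/bitcoin/lightningd.sqlite3 < yolo.db
    lightningd --offline

    # in another terminal: reconnect to peers one at a time and watch how they react
    lightning-cli listpeers
    lightning-cli connect <node_id>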
05:06 <@cdecker> Keep us posted; a quick debrief can help us better address things in the future :-)
05:07 < x7268> Will do.
05:07 < x7268> It's a relevant engineering problem, I think. I should have used psql in the first place, but the work was split between two people and... well.
05:11 <@cdecker> Hehe, I understand. The postgresql server is also not perfectly safe against disk corruption, but it seems more stable
05:12 <@cdecker> Might be selection bias though: given the extra work to set psql up, the people using it tend to be more tech-savvy than those on the simple default sqlite3 database
05:13 <@cdecker> (also sqlite3 tends to be used on RPis and cheap SD cards)
05:26 < x7268> I get the tech-savvy part. And bias notwithstanding, sqlite3 has messed with us before.
05:26 < x7268> Whereas I have had psql DBs running since 2008, and not a single row ever lost.
05:26 < x7268> I think the psql people emphasize stability a lot
05:30 < darosior> rusty: "darosior: [...] there may be others, but this was a clear one..." -> I'm not sure I understand?.. Do you mean I introduced this?
05:31 < darosior> Ah ok, with dynamic plugins?..
07:43 < vasild> x7268: Did you confirm it is a hardware failure? Does dmesg show some errors? fsck on the filesystem? Also, the fact that manually running the insert gives one error and running it from within lightningd gives another error is puzzling.
08:22 < x7268> vasild: dmesg shows no errors. fsck gives all-clean.
08:22 < x7268> I don't think it is a hardware failure.
08:23 < vasild> an sqlite bug seems unlikely to me either
08:23 < vasild> or an OS/filesystem bug, unlikely
08:26 < x7268> I agree.
08:26 < x7268> I *think* only the indices are corrupt.
08:26 < x7268> Although in my understanding, the table temp_payments should exist, too.
08:26 < x7268> It's a puzzler.
08:27 < x7268> Then again, the filesystem is ext4, and that's the only FS where I had data loss in 90% of all installations. Literally, not figuratively. I avoid it personally at all costs, except when it's preinstalled in an image.
08:27 < x7268> I am old-school and use XFS when I can.
08:30 -!- x7268 [~Funky0x@2a02:8070:a9e:8200:79c4:aed5:45fb:f3cf] has quit [Quit: Leaving...]
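For completeness, the hardware checks vasild asked about might look like this; the device name is illustrative, and fsck should only be run on an unmounted (or read-only) filesystem:

    # kernel-reported disk problems
    dmesg | grep -iE 'i/o error|ata|ext4'
    # non-destructive filesystem check (unmount the volume first)
    fsck -n /dev/vda1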
08:56 <@cdecker> Oh, I missed him.
08:56 <@cdecker> `temp_payments` is a table that is only there for the sake of the migration: the `payments` table gets renamed, the new table gets created, and then we re-fill it with the original data
08:56 <@cdecker> I'll tell him next time I see him ^^
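A toy version of the rename-and-refill migration pattern cdecker describes, with the column list cut down to two for readability (the real statement names all 22 payment columns, as in the error at the top of the log):

    ALTER TABLE payments RENAME TO temp_payments;   -- move the old table aside
    CREATE TABLE payments (                         -- new schema goes here
        id INTEGER,
        timestamp INTEGER,
        PRIMARY KEY (id)
    );
    INSERT INTO payments (id, timestamp)            -- re-fill from the old rows
        SELECT id, timestamp FROM temp_payments;
    DROP TABLE temp_payments;                       -- clean up the scratch table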
--- Log closed Sat Feb 15 00:00:39 2020