--- Log opened Fri Feb 14 00:00:40 2020
03:42 -!- x7268 [~Funky0x@2a02:8070:a9e:8200:79c4:aed5:45fb:f3cf] has joined #c-lightning
03:44 < x7268> Hi everyone
03:44 < x7268> I am trying to restart my Lightning Node, which was shut down cleanly.
03:44 < x7268> But I get "Error executing statement: wallet/db.c:926: INSERT INTO payments (id, timestamp, status, payment_hash, destination, msatoshi, payment_preimage, path_secrets, route_nodes, route_channels, failonionreply, faildestperm, failindex, failcode, failnode, failchannel, failupdate, msatoshi_sent, faildetail, description, faildirection, bolt11)SELECT id, timestamp, status, payment_hash, destination, msatoshi, payment_preimage, path_secrets, route_nodes, route_channels, failonionreply, faildestperm, failindex, failcode, failnode, failchannel, failupdate, msatoshi_sent, faildetail, description, faildirection, bolt11 FROM temp_payments;: database disk image is malformed"
03:45 < x7268> I am aware of the risks of fiddling with channel backups (didn't take any), and I hope my hsm_secret is intact.
03:45 < x7268> How do I recover from this?
03:48 < x7268> Reading the source, I see this is in db_get_version(struct db *db), which is an attempt to detect the schema. It's unclear to me why this fails.
03:50 < vasild> x7268: can you run "sqlite3 ~/.lightning/bitcoin/lightningd.sqlite3" and manually execute the SELECT at least?
03:51 < vasild> SELECT id, timestamp, status,...bolt11 FROM temp_payments;
03:52 <@cdecker> The hsm_secret is not written to after the initial creation, x7268, so unless you have a disk issue it should be ok
03:52 < x7268> It returns "no such table"
03:52 < vasild> "database disk image is malformed" does not look nice; I guess it is returned by sqlite itself
03:52 < x7268> vasild: my guess as well. Unclear to me how.
03:52 <@cdecker> However, "database disk image" sounds like disk failure
03:53 <@cdecker> Try `sqlite3 /path/to/lightningd.sqlite3 .schema` to see if the schema is intact
03:53 < x7268> cdecker: it's a DigitalOcean node; I could go and run the xfs checks, but since I can open the sqlite and list the tables, it sounds a bit less likely
03:54 <@cdecker> Also `dmesg` should tell you if there are I/O errors
03:54 < x7268> Running .schema, it returns without error (long paste, avoiding it)
03:54 <@cdecker> Hm, ok. Make a copy of the DB just in case of further corruption
03:54 < vasild> yes, make a copy and try to execute the whole statement manually: insert...select...whatever;
03:55 < x7268> dmesg seems fine - no errors logged
03:55 <@cdecker> Depends on which parts of the file may be corrupted; reading the schema doesn't mean the entire file is intact
03:55 < vasild> maybe it only bricks due to the insert
03:55 < x7268> cdecker: that's right
03:55 < x7268> .tables does show there is no temp_payments
03:57 < x7268> Is there a risk in me manually creating the temp_payments table?
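For reference, the diagnostic steps suggested above map onto the sqlite3 command line roughly as follows. This is only a sketch, assuming the default database path, a stopped lightningd, and an illustrative backup filename:

    # snapshot the database (and its journal, if one exists) before touching it
    cp ~/.lightning/bitcoin/lightningd.sqlite3 ~/lightningd.sqlite3.bak
    cp ~/.lightning/bitcoin/lightningd.sqlite3-journal ~/lightningd.sqlite3-journal.bak 2>/dev/null

    # inspect the copy: list the tables, then dump the full schema
    sqlite3 ~/lightningd.sqlite3.bak '.tables'
    sqlite3 ~/lightningd.sqlite3.bak '.schema'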
03:57 < vasild> sqlite> pragma integrity_check;
03:57 < vasild> https://www.sqlite.org/pragma.html#pragma_integrity_check
03:58 < x7268> vasild: thx, details:
03:58 < x7268> sqlite> pragma integrity_check;
03:58 < x7268> *** in database main ***
03:58 < x7268> Main freelist: 7 of 32741 pages missing from overflow list starting at 139211
03:58 < x7268> On tree page 173664 cell 0: 2nd reference to page 149069
03:58 < x7268> row 978792 missing from index sqlite_autoindex_payments_1
03:58 < x7268> row 978793 missing from index sqlite_autoindex_payments_1
03:58 < x7268> row 978794 missing from index sqlite_autoindex_payments_1
03:58 < x7268> row 978795 missing from index sqlite_autoindex_payments_1
03:58 < x7268> wrong # of entries in index sqlite_autoindex_payments_1
03:58 < x7268> Error: database disk image is malformed
03:58 < x7268> mhm.
03:58 < vasild> :/
03:58 < x7268> there is the sqlite3-journal, of course, but I don't know if that contains any useful info
04:00 < x7268> I could export the sqlite3 into a .sql file and restore from there. But I am afraid stale channel state would lead to loss of funds.
04:01 < vasild> can you export everything without errors?
04:02 < x7268> .output yolo.db
04:02 < x7268> ;-)
04:03 < vasild> Also, does the entire statement insert...select..., when executed manually, result in "malformed disk image" or "no such table: temp_payments"?
04:03 < x7268> vasild: just a sec, dumping to file now
04:04 < x7268> but, judging from the error message I had, the SELECT could not possibly go through as the table is missing, and hence it must be the malformed db that already prevents execution of the SELECT
04:05 < x7268> vasild: OK - dump went through without errors. Very surprised.
04:05 < x7268> I have some hope this is only the indices.
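The log only shows the `.output` line, but a full export via the sqlite3 shell would look something like this (the filename `yolo.db` is from the chat; despite the extension, the result is a plain-text SQL script):

    sqlite> .output yolo.db
    sqlite> .dump
    sqlite> .output stdout
    sqlite> .quit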
04:07 < x7268> vasild: OK, ran the command manually. It just says "no such table: temp_payments"
04:08 < x7268> Trying this by starting the LN node, however, gets me the malformed error again
04:08 < x7268> What happens if I recreate temp_payments?
04:08 < x7268> In theory, that should be an empty insert, but would it possibly cause the node to broadcast old TX?
04:08 < x7268> In which case, I'd be in favour of restoring via the hsm_secret
04:09 < x7268> Is there a wiki/tutorial on how to restore via hsm_secret?
04:13 < x7268> This is a way, apparently, but it lacks some info for me to have confidence in what I need to do: https://www.gitmemory.com/issue/ElementsProject/lightning/2459/571367485
04:14 < x7268> Issue https://github.com/ElementsProject/lightning/issues/2459 is a bit better...
04:25 <@cdecker> There is no way to restore channels solely from the `hsm_secret`; only on-chain funds would be automatically returned to your wallet (for the exact same reason backups are dangerous in the LN protocol)
04:36 < x7268> cdecker: I understand that. If I am not mistaken, the channels should be automatically closed after some timeout if my node remains offline. They should go on-chain after that, and I should be able to get the funds via hsm_secret, correct?
04:37 < x7268> In which case, I'd try to go the way via hsm_secret. It seems dangerous to try and restart the node on what might be outdated channel info and commitments in the sqlite3 db.
04:39 <@cdecker> No, there is a channel-dependent secret that is mixed into the on-chain outputs of a unilateral close (the per-commitment secret), so your funds will not just return without the DB
04:39 < x7268> oh, ouch.
04:40 < x7268> Seems I have to take a leap of faith here. What's the way forward? Or are you saying the funds are definitely lost?
04:42 < x7268> cdecker: may I ask whether https://github.com/ElementsProject/lightning/issues/1156#issuecomment-370280072 is wrong info, then?
04:42 <@cdecker> No, you should be fine. If the corruption happened during a write then it should have killed c-lightning, and therefore no update was committed
04:43 < x7268> O-K. So just create an empty table and restart, or should I try the route mentioned in the link to issue 1156?
04:43 <@cdecker> Well, that assumes that you have a backup, doesn't it?
04:44 <@cdecker> Not an expert on recovery, but I know people have successfully recovered just by extracting the per-commitment points and a CLI tool in `devtools`
04:52 < x7268> There is no other backup - the only thing I have is the successful dump of the DB.
04:54 < x7268> cdecker: I am not sure I am following entirely - are you saying I need the per-commitment points, or should I be fine with just the hsm_secret? The finer points of the LN protocols are hard to grasp...
04:56 <@cdecker> No, you definitely need the per-commitment points
04:58 < x7268> OK. Just to aid my understanding - if I go ahead and restore the dump, what would happen? Based on the assumption that only the indices are broken, the node might come up OK, right? If I recreate an empty temp_payments table, however, what would happen then?
04:59 < x7268> Looking at my dump, which is pure SQL, it seems very complete.
04:59 <@cdecker> Ok, so if you have a dump from the DB you should be able to close the channels. The important thing is not to start c-lightning on two systems in parallel, so make sure that the one on DO is definitely offline (I suggest renaming `hsm_secret` so an accidental start would not result in a conflict)
05:01 < x7268> You are advocating setting up a new node and replacing the lightningd.sqlite3 and the hsm_secret?
05:01 <@cdecker> Then copy the hsm_secret and dump over, then load it into the DB, then start c-lightning on this backup (start with `--offline` so it doesn't attempt to reconnect)
05:01 <@cdecker> Yes. I can't guarantee that you'll get back all the funds, but you should get a majority back
05:02 < x7268> Ok. Let me recap this: 1) set up a new, clean node. Don't go online. 2) copy hsm_secret and dump over, load into DB via sqlite3. 3) start this node. 4) hope?
05:02 <@cdecker> You can then go through the list of peers and manually connect one by one to see how they react
05:03 <@cdecker> That should then either re-establish the channel correctly, or the fields in the DB are corrupt, in which case we'd get an error and the connection is closed
05:03 < x7268> Makes sense.
05:04 < x7268> The important thing seems to be to keep the new node offline until everything is restored/loaded, then manually reconnect to peers to see if we can kick-start the channels. Is that right?
05:04 <@cdecker> For the channels that result in an error we need to do some extra work (and I need to read into the `mkcommit` tool)
05:04 <@cdecker> Yep, since you are using the latest version of the data you should not get punished by your peers
05:05 < x7268> Amen to that. I hope I do.
05:05 <@cdecker> The worst that can happen is that fields were corrupt and then we simply can't sign anything, which is better than signing an old state
05:05 <@cdecker> (because that'd then result in punishments)
05:06 < x7268> cdecker: Thanks a million (also contacted you via DM on a different topic)
05:06 < x7268> I will try this.
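A sketch of the recap above, assuming the SQL dump is the `yolo.db` file from earlier, default paths on the new host, and an illustrative `hsm_secret.disabled` name for disarming the old node:

    # on the OLD host: make an accidental start impossible
    mv ~/.lightning/bitcoin/hsm_secret ~/.lightning/bitcoin/hsm_secret.disabled

    # on the NEW host: restore the secret and the database, then start offline
    mkdir -p ~/.lightning/bitcoin
    cp hsm_secret ~/.lightning/bitcoin/hsm_secret
    chmod 0400 ~/.lightning/bitcoin/hsm_secret
    sqlite3 ~/.lightning/bitcoin/lightningd.sqlite3 < yolo.db
    lightningd --offline

    # in another terminal: reconnect to peers one at a time and watch how they react
    lightning-cli listpeers
    lightning-cli connect <node_id>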
05:06 <@cdecker> Keep us posted; a quick debrief can help us better address things in the future :-)
05:07 < x7268> Will do.
05:07 < x7268> It's a relevant engineering problem, I think. I should have used psql in the first place, but the work was split between two people and... well.
05:11 <@cdecker> Hehe, I understand. The postgresql server is also not perfectly safe against disk corruption, but it seems more stable
05:12 <@cdecker> Might be selection bias though: given the extra work to set psql up, the people using it tend to be more tech-savvy than those on the simple default sqlite3 database
05:13 <@cdecker> (also sqlite3 tends to be used on RPis and cheap SD cards)
05:26 < x7268> I get the tech-savvy part. And bias notwithstanding, sqlite3 has messed with us before.
05:26 < x7268> Whereas I have had psql DBs running since 2008, and not a single row ever lost.
05:26 < x7268> I think the psql people emphasize stability a lot
05:30 < darosior> rusty: "darosior: [...] there may be others, but this was a clear one..." -> I'm not sure I understand?.. Do you mean I introduced this?
05:31 < darosior> Ah ok, with dynamic plugins?..
07:43 < vasild> x7268: Did you confirm it is a hardware failure? Does dmesg show some errors? fsck on the filesystem? Also, the fact that manually running the insert gives one error and running it from within lightningd gives another error is puzzling.
08:22 < x7268> vasild: dmesg shows no errors. fsck gives all-clean.
08:22 < x7268> I don't think it is a hardware failure.
08:23 < vasild> an sqlite bug seems unlikely to me either
08:23 < vasild> or an OS/filesystem bug, unlikely
08:26 < x7268> I agree.
08:26 < x7268> I *think* only the indices are corrupt.
08:26 < x7268> Although in my understanding, the table temp_payments should exist, too.
08:26 < x7268> It's a puzzler.
08:27 < x7268> Then again, the filesystem is ext4, and that's the only FS where I had data loss in 90% of all installations. Literally, not figuratively. I avoid it personally at all costs, except when it's preinstalled in an image.
08:27 < x7268> I am old-school and use XFS when I can.
08:30 -!- x7268 [~Funky0x@2a02:8070:a9e:8200:79c4:aed5:45fb:f3cf] has quit [Quit: Leaving...]
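For completeness, the hardware checks vasild asked about might look like this; the device name is illustrative, and fsck should only be run on an unmounted (or read-only) filesystem:

    # kernel-reported disk problems
    dmesg | grep -iE 'i/o error|ata|ext4'
    # non-destructive filesystem check (unmount the volume first)
    fsck -n /dev/vda1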
08:56 <@cdecker> Oh, I missed him.
08:56 <@cdecker> `temp_payments` is a table that is only there for the sake of the migration: the `payments` table gets renamed, the new table gets created, and then we re-fill it with the original data
08:56 <@cdecker> I'll tell him next time I see him ^^
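A toy version of the rename-and-refill migration pattern cdecker describes, with the column list cut down to two for readability (the real statement names all 22 payment columns, as in the error at the top of the log):

    ALTER TABLE payments RENAME TO temp_payments;   -- move the old table aside
    CREATE TABLE payments (                         -- new schema goes here
        id INTEGER,
        timestamp INTEGER,
        PRIMARY KEY (id)
    );
    INSERT INTO payments (id, timestamp)            -- re-fill from the old rows
        SELECT id, timestamp FROM temp_payments;
    DROP TABLE temp_payments;                       -- clean up the scratch table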
--- Log closed Sat Feb 15 00:00:39 2020