--- Log opened Wed Apr 14 00:00:25 2021
00:05 -!- csknk [~csknk@unaffiliated/csknk] has joined #c-lightning
00:11 -!- csknk [~csknk@unaffiliated/csknk] has quit [Quit: leaving]
02:17 -!- jasan [~j@n.bublina.eu.org] has joined #c-lightning
02:50 <@cdecker> It means the channel that is reporting the error has insufficient balance to perform the payment. The solution is to find another route (which `pay` should do automatically) or have the operators of the channel endpoints rebalance.
02:55 -!- vincenzopalazzo [~vincenzop@host-80-181-199-140.pool80181.interbusiness.it] has joined #c-lightning
03:03 -!- Teoti [~teoti@216.154.58.117] has joined #c-lightning
03:51 < openoms> Has anyone looked into building docker images for aarch64 (ARM)? Would like to use them with https://github.com/bottlepay/lightning-benchmark
04:10 < darosior> openoms: iirc we have one ?
04:10 < darosior> openoms: Yeah, there: https://github.com/ElementsProject/lightning/blob/master/contrib/linuxarm64v8.Dockerfile
04:12 < openoms> ahh cool, it is not on dockerhub yet right?
04:12 < openoms> no problem I can build myself
04:50 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has left #c-lightning []
04:50 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has joined #c-lightning
05:03 <@cdecker> It's mostly used to run tests on alternative supported archs, hence we did not publish it
05:14 -!- HelloShitty [~psysc0rpi@bl20-171-222.dsl.telepac.pt] has quit [Ping timeout: 252 seconds]
05:20 < rny> openoms: i have a docker build that's slimmer
05:46 -!- jasan [~j@n.bublina.eu.org] has quit [Quit: jasan]
06:29 -!- belcher_ is now known as belcher
07:57 < vincenzopalazzo> Hello cdecker, related to this comment https://github.com/ElementsProject/lightning/issues/4425#issuecomment-818851499, do you mean that a solution could be to pass a flag to the command to query the db and get the info about the closed channels?
08:10 -!- vasild [~vd@gateway/tor-sasl/vasild] has quit [Ping timeout: 240 seconds]
08:10 -!- vasild [~vd@gateway/tor-sasl/vasild] has joined #c-lightning
08:11 <@cdecker> Yep, that might be a solution, though we need to look at what we are currently using to back the query (DB or in-memory) in order not to duplicate results
08:36 < az0re> Hello all, currently being bitten by the bad commit_sig signature bug again. This time it's only a warning: "CHANNELD_NORMAL: channeld WARNING: Bad commit_sig signature ..."
08:36 < az0re> What can I do to recover the channel? Anything?
08:36 < az0re> The channel is still open but the peer appears offline
08:41 < vincenzopalazzo> cdecker, do you think it can be a good first issue, or is it more complex than "Add just a flag and make the correct operation with the database"?
08:42 < vincenzopalazzo> ps: Sorry for the delay in the answer
08:42 <@cdecker> Not sure how much work is involved, but it does give a nice tour through the code
08:42 <@cdecker> az0re: any idea what the combination of implementations / versions is that are involved in this case?
08:43 <@cdecker> fwiw, force closing will recover the funds, but let's see if we can get them to agree and have a graceful close first
08:43 < az0re> There's no way to avoid a close and just ask them to re-sign the commitment tx?
08:44 <@cdecker> The peer being disconnected in this case is a result of us (or them) getting upset and disconnecting
08:44 <@cdecker> az0re: reconnecting can work, but if we disagree on a commitment tx we might end up just recreating the same failing signature over and over again
08:44 < az0re> I don't understand the channel state machine. Is this recoverable?
08:45 < az0re> Hmmm
08:45 < az0re> Can we reconnect while forwarding a 1msat HTLC to get them to sign a new commitment tx?
08:46 < az0re> This is an improvement in behavior, at least--my channel didn't get force closed on me yet :)
08:46 <@cdecker> Hm, that is unlikely to do much good: we first confirm the prior state, resending commits and changes since that commit, and only then make progress
08:46 < az0re> Any way to hack it to ignore the signature status?
08:46 <@cdecker> This is because we need to have a common notion of the state of the channel before we can continue using it
08:47 < az0re> So I can pretend everything is fine, then forward an HTLC, shut down the node, restart without the hack
08:47 <@cdecker> No, that'd be a security nightmare
08:47 < az0re> In general yes
08:47 < az0re> I mean just for right now to recover this channel
08:47 < az0re> I would rather put my whole channel funds at risk than take yet another 25k sat hit
08:48 < az0re> I've already spent well over 100k sat on channel bugs like this
08:48 <@cdecker> If you disagree on the commitment transaction then your peer will get upset and just do the same with you. So no unilateral change can fix this
08:48 < az0re> So there are two issues here:
08:48 < az0re> 1) Disagreement on the channel state
08:48 < az0re> 2) bad signature
08:48 < az0re> as I understand, the problem my channel has is problem 2, not 1
08:48 < az0re> Is that wrong?
08:48 <@cdecker> 1 is likely the root cause for 2, hence really one issue here
08:49 < az0re> I see, so it's not a bad commit signature, it's a bad commit tx
08:49 <@cdecker> The signature signs the commitment tx, which represents the current state. Either the signer code is wrong (unlikely) or the state we or they are signing is not the same
08:49 < az0re> Can I quickly view the respective channel states we disagree on?
08:50 <@cdecker> Well, we can't really compare the state / committx, so we report the symptom we can witness
08:50 < az0re> And how difficult would it be to simply ignore our state and use theirs?
08:50 <@cdecker> Sadly no, the damn protocol is optimized with differential changes being exchanged, there is no way to retrieve the full state over the wire atm.
08:51 < az0re> So I am pretty sure this is a C-Lightning bug, BTW
08:51 <@cdecker> What gives?
08:51 < az0re> It's happened to me on multiple channels with different entities
08:51 < az0re> And on multiple physical machines
08:51 <@cdecker> Which version?
08:51 < az0re> The common variable is... C-Lightning
08:52 < az0re> Everything from very old to nearly-current-git-HEAD
08:52 <@cdecker> Well, it is likely that the other side was also always the same implementation :-)
08:52 < az0re> Could be, but unlikely
08:52 <@cdecker> It might also be a spec disagreement, which is way harder to attribute the error to either side :-)
08:53 < az0re> Right
08:53 < az0re> And how difficult would it be to simply ignore our state and use theirs?
08:53 < az0re> YOLO
08:54 <@cdecker> Very hard, it requires us to manually change the DB and go fish in the code
08:54 < az0re> Happy to go fishing
08:54 < az0re> Not happy to burn another 25k sat
08:55 < az0re> I've built up a lot of infrastructure around C-Lightning but frankly it will be cheaper for me to wait for a low-fee period, close all my channels, blow away my node and start over with LND, which doesn't have this problem
08:56 <@cdecker> I'd rather fix the issue than spend a month trying to build a workaround (we are trying to reproduce this btw)
08:56 < az0re> Would it really be a month?
08:56 < az0re> All I want is a quick shim I can put in/take out to ignore our state and sign whatever commit tx they present
08:58 < az0re> > (we are trying to reproduce this btw)
08:59 < az0re> I'm happy to help however I can but honestly I don't see a clear path to resolution, or much effort being put into fixing this
08:59 <@cdecker> Well, the thing is that the commitment engine is the very basis of the entire project, and just finding the correct place to inject the state is hard, not to speak of what that state should look like, since the remote end won't tell us what it is expecting
09:00 < az0re> OK, where does my node actually sign a commit tx? let's start there
09:01 < az0re> > since the remote end won't tell us what it is expecting
09:01 < az0re> Maybe I misunderstand
09:01 < az0re> I guess we are not passing PSBTs?
09:01 < az0re> We just have to both "know" what the state already is and sign the tx blindly, just throwing the signature over the wall?
09:01 <@cdecker> The first step to fix this would be to reproduce the issue in a clean setting, so we can see where things are going wrong. Since you have hit it a couple of times, can you see some commonalities between the cases? What implementation is the other side running, which version, etc (sometimes that info can be gathered)
09:02 <@cdecker> No, the protocol doesn't pass PSBTs around
09:02 < az0re> I don't have that information about channel peers
09:02 <@cdecker> We pass operations around that are then reflected against the state
09:03 <@cdecker> Yeah, that's the main problem we have with reproducing the issue, we don't have good info about when the issue happens and with what setup
09:03 < az0re> So if my peer and I disagree about some state transformation, I have no idea how my peer disagreed, I only have a signature signing some tx I don't agree with
09:03 <@cdecker> Yep... I was against this design, but people wanted efficiency over debuggability...
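The exchange above (operations are passed around and applied to each side's state, and only a signature over the resulting commitment tx crosses the wire) can be pictured with a toy model. This is only a sketch under loud assumptions: `ChannelView`, its fields, the operation names and the shared `KEY` are all hypothetical, a SHA-256 digest stands in for building the commitment tx, and an HMAC stands in for the real signature scheme. It is not how C-Lightning is implemented; it only shows why the receiver can report "bad signature" but cannot say which part of the state differs.

```python
import hashlib
import hmac
from dataclasses import dataclass, field

# Hypothetical shared key: a stand-in for real channel keys, illustration only.
KEY = b"demo-key"


@dataclass
class ChannelView:
    """One peer's local view of the channel (hypothetical toy model)."""
    feerate: int = 253
    htlcs: dict = field(default_factory=dict)  # htlc_id -> amount_msat

    def apply(self, op):
        """Apply one differential update to the local view."""
        kind = op[0]
        if kind == "add_htlc":
            _, htlc_id, amount_msat = op
            self.htlcs[htlc_id] = amount_msat
        elif kind == "update_fee":
            self.feerate = op[1]
        else:
            raise ValueError(f"unknown op: {kind}")

    def digest(self) -> bytes:
        """Stand-in for serializing the commitment tx built from this view."""
        parts = [f"feerate={self.feerate}"]
        parts += [f"{i}:{a}" for i, a in sorted(self.htlcs.items())]
        return hashlib.sha256("|".join(parts).encode()).digest()

    def sign(self) -> bytes:
        """Sign our own view; only this value is sent to the peer."""
        return hmac.new(KEY, self.digest(), hashlib.sha256).digest()

    def verify(self, sig: bytes) -> bool:
        """Check the peer's signature against OUR view of the state."""
        return hmac.compare_digest(self.sign(), sig)


ours, theirs = ChannelView(), ChannelView()
for op in [("add_htlc", 0, 1000), ("update_fee", 1000)]:
    ours.apply(op)
    theirs.apply(op)
agree = ours.verify(theirs.sign())

# Simulate the two sides ending up with slightly different values
# after the same updates (here: an off-by-one feerate, chosen arbitrarily).
theirs.feerate = 1001
disagree = ours.verify(theirs.sign())

print(agree, disagree)
```

In this toy model the second check fails, and all `ours` learns is that the signature does not match its own digest. Nothing in the exchange reveals that the feerate was the differing field, which mirrors the debuggability complaint in the conversation.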
09:03 < az0re> Then frankly sigging in the db is unavoidable to fix this bug
09:04 < az0re> digging*
09:05 < az0re> The way to go is to find S', my version of current channel state, and reconstruct S, the state before the most recent update_commit attempt
09:05 < az0re> Then bang our heads against the wall until we figure out why that transition gave a bug
09:05 < az0re> Compare to behavior in LND, especially
09:06 <@cdecker> Yep, guess why https://github.com/ElementsProject/lightning/issues/4152 is marked as a compat issue
09:06 < az0re> > reproduce the issue in a clean setting
09:06 <@cdecker> (which was one of your prior instances)
09:06 < az0re> Very unlikely to happen
09:07 <@cdecker> My best guess is that we somehow fail to compute the same feerate, so the issue only happens with large-ish feerates and between implementations
09:07 < az0re> I guess the bug happens on some HTLC forward, and there is no way I can get a "clean" environment forwarding enough HTLC traffic to expose this bug
09:08 <@cdecker> That's not a huge issue, since we can pump hundreds of HTLCs through a test network
09:30 < az0re> So is there a plan to deal with this bug?
09:30 < az0re> Or is it just going to rot in the issue tracker for another 6 months?
09:32 < az0re> I'm trying to determine if I should just abandon C-Lightning as broken and move fully to LND, or just stop adding liquidity to this node and making a new LND node my primary
09:45 < darosior> az0re: are you using the feeadjuster plugin by any chance?
09:45 <@cdecker> Well, we are doing our best to resolve this issue, but if you don't feel like it's worth waiting I won't try to persuade you not to switch impls. I personally think that depending on where the issue lies it might end up exacerbating the issue if everybody but a few people switch to one impl, since if that impl is at fault it forces everyone to
09:45 <@cdecker> implement that error that way (see internet explorer forcing all browsers to add a quirksmode because all webmasters built for IE...)
09:46 <@cdecker> darosior: would that have any impact on the commitment tx? iirc that's just triggering channel_updates isn't it?
09:46 < darosior> grass is always greener on the other side
09:47 < darosior> Right, i was trying to think if there could be something with the HTLC amounts
09:47 < az0re> darosior: Nope, mostly static fees, occasionally tweaked manually
09:47 < darosior> Maybe in concordance with the feerate
09:48 <@cdecker> Nah, at the time we commit we already disambiguated whether we wanted the HTLC or not
09:48 < az0re> I totally agree WRT centralization of implementations
09:48 < az0re> I *want* to use C-Lightning
09:48 < az0re> But I keep getting burned by this bug, which is really quite a bit more serious than it sounds
09:49 <@cdecker> Maybe a dumb question, but do you get bad sigs when trying to reconnect? Could have just been a transient thing (should have asked that hours ago...)
09:49 < az0re> Yes, it keeps trying to reconnect and fails again
09:49 <@cdecker> Ok, it's not a feerate update message going missing then, damn :-(
09:50 < darosior> Will grep my node's logs for bad commit sigs if it can help
09:50 < az0re> I *think* I can get in contact with the peer operator
09:50 < az0re> It will take me some time to remember how
09:54 < az0re> If anyone has pointers on how to recover the relevant channel states S and S' from the DB, please let me know ASAP
10:49 < vincenzopalazzo> cdecker, yeah I think it is a good point to restart from where I was stopped with the delpay command. I will check it. Thanks
11:38 -!- cryptosoap [~cryptosoa@gateway/tor-sasl/cryptosoap] has quit [Quit: %bye%]
11:38 -!- cryptoso- [~cryptosoa@gateway/tor-sasl/cryptosoap] has joined #c-lightning
12:03 -!- HelloShitty [~psysc0rpi@bl20-171-222.dsl.telepac.pt] has joined #c-lightning
12:58 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has left #c-lightning []
12:58 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has joined #c-lightning
13:12 -!- jasan [~j@n.bublina.eu.org] has joined #c-lightning
13:46 -!- EndFiat [EndFiat@gateway/vpn/mullvad/endfiat] has quit [Ping timeout: 252 seconds]
13:56 -!- jasan [~j@n.bublina.eu.org] has quit [Quit: jasan]
13:56 -!- jasan [~j@n.bublina.eu.org] has joined #c-lightning
13:57 -!- jasan [~j@n.bublina.eu.org] has quit [Client Quit]
14:02 -!- EndFiat [EndFiat@gateway/vpn/mullvad/endfiat] has joined #c-lightning
14:02 -!- jasan [~j@n.bublina.eu.org] has joined #c-lightning
14:03 -!- jasan [~j@n.bublina.eu.org] has quit [Client Quit]
14:29 -!- vincenzopalazzo [~vincenzop@host-80-181-199-140.pool80181.interbusiness.it] has quit [Ping timeout: 240 seconds]
14:30 -!- vincenzopalazzo [~vincenzop@host-87-10-115-59.retail.telecomitalia.it] has joined #c-lightning
14:52 -!- jonasschnelli [~jonasschn@unaffiliated/jonasschnelli] has quit [Remote host closed the connection]
14:54 -!- jonasschnelli [~jonasschn@unaffiliated/jonasschnelli] has joined #c-lightning
14:54 -!- mrostecki [mrostecki@nat/suse/x-uikgeaxofcyrvhsq] has quit [Remote host closed the connection]
14:55 -!- mrostecki [mrostecki@nat/suse/x-dsxgooamcbgldcek] has joined #c-lightning
15:01 -!- grubles_ [~user@gateway/tor-sasl/grubles] has joined #c-lightning
15:01 -!- grubles [~user@gateway/tor-sasl/grubles] has quit [Ping timeout: 240 seconds]
15:44 -!- vincenzopalazzo [~vincenzop@host-87-10-115-59.retail.telecomitalia.it] has quit [Ping timeout: 246 seconds]
15:45 -!- vincenzopalazzo [~vincenzop@host-79-18-34-139.retail.telecomitalia.it] has joined #c-lightning
15:48 -!- harrigan [~harrigan@ptr-93-89-242-235.ip.airwire.ie] has quit [Read error: Connection reset by peer]
15:50 -!- harrigan [~harrigan@ptr-93-89-242-235.ip.airwire.ie] has joined #c-lightning
16:28 -!- spinza [~spin@102.132.245.16] has quit [Quit: Coyote finally caught up with me...]
16:45 -!- EndFiat [EndFiat@gateway/vpn/mullvad/endfiat] has quit [Ping timeout: 240 seconds]
16:47 -!- EndFiat [EndFiat@gateway/vpn/mullvad/endfiat] has joined #c-lightning
16:51 -!- spinza [~spin@102.132.245.16] has joined #c-lightning
17:00 -!- k3tan [~pi@gateway/tor-sasl/k3tan] has quit [Ping timeout: 240 seconds]
17:40 -!- az0re [~az0re@gateway/tor-sasl/az0re] has quit [Remote host closed the connection]
17:54 -!- k3tan [~pi@gateway/tor-sasl/k3tan] has joined #c-lightning
18:40 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has joined #c-lightning
18:45 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has quit [Ping timeout: 240 seconds]
18:57 -!- cdecker [~cdecker@mail.snyke.net] has quit [Ping timeout: 246 seconds]
18:58 -!- snyke [~cdecker@mail.snyke.net] has joined #c-lightning
18:58 -!- mode/#c-lightning [+o snyke] by ChanServ
19:01 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has joined #c-lightning
19:32 -!- vincenzopalazzo [~vincenzop@host-79-18-34-139.retail.telecomitalia.it] has quit [Remote host closed the connection]
20:06 -!- grubles_ is now known as grubles
20:26 -!- belcher [~belcher@unaffiliated/belcher] has quit [Ping timeout: 252 seconds]
20:36 -!- Teoti [~teoti@216.154.58.117] has quit [Ping timeout: 240 seconds]
20:40 -!- belcher [~belcher@unaffiliated/belcher] has joined #c-lightning
21:06 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has left #c-lightning []
21:06 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has joined #c-lightning
21:13 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has quit [Ping timeout: 240 seconds]
21:15 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has joined #c-lightning
21:23 -!- gleb [~gleb@178.150.137.228] has quit [Ping timeout: 252 seconds]
23:51 -!- gleb [~gleb@178.150.137.228] has joined #c-lightning
--- Log closed Thu Apr 15 00:00:26 2021
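For anyone wanting to follow up on darosior's 09:50 offer to grep node logs for bad commit sigs, a rough sketch of such a scan is below. The only thing taken from the log is the message text "Bad commit_sig signature" quoted by az0re at 08:36. Everything else is an assumption: the function names are hypothetical, the sample lines are made up, and the idea that a 66-hex-character node id appears somewhere on the same log line may not match a given node's log format.

```python
import re
from collections import Counter

# Message text as quoted in the channel at 08:36.
BAD_SIG = "Bad commit_sig signature"

# Assumption: the peer's node id (33-byte compressed pubkey, 66 hex chars,
# starting with 02 or 03) appears somewhere on the same log line.
PEER_RE = re.compile(r"\b(0[23][0-9a-f]{64})\b")


def bad_commit_sig_lines(lines):
    """Return the log lines that mention a bad commit_sig signature."""
    return [ln.rstrip("\n") for ln in lines if BAD_SIG in ln]


def count_by_peer(lines):
    """Count bad commit_sig lines per peer id, or 'unknown' if none is found."""
    counts = Counter()
    for ln in bad_commit_sig_lines(lines):
        match = PEER_RE.search(ln)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Made-up sample lines, only to exercise the functions.
peer = "02" + "ab" * 32
sample = [
    f"INFO {peer}-channeld-chan#1: Bad commit_sig signature (made-up sample)",
    "INFO something unrelated",
    "WARN no peer id on this line: Bad commit_sig signature",
]
counts = count_by_peer(sample)
print(dict(counts))
```

On a real node the same functions could be fed an open log file instead of `sample`, since a file object iterates over its lines. Grouping by peer is a design choice aimed at cdecker's question about commonalities between the affected channels.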