--- Log opened Wed Apr 14 00:00:25 2021
00:05 -!- csknk [~csknk@unaffiliated/csknk] has joined #c-lightning
00:11 -!- csknk [~csknk@unaffiliated/csknk] has quit [Quit: leaving]
02:17 -!- jasan [~j@n.bublina.eu.org] has joined #c-lightning
02:50 <@cdecker> It means the channel that is reporting the error has insufficient balance to perform the payment. The solution is to find another route (which `pay` should do automatically) or have the operators of the channel endpoints rebalance.
02:55 -!- vincenzopalazzo [~vincenzop@host-80-181-199-140.pool80181.interbusiness.it] has joined #c-lightning
03:03 -!- Teoti [~teoti@216.154.58.117] has joined #c-lightning
03:51 < openoms> Has anyone looked into building docker images for aarch64 (ARM)? Would like to use them with https://github.com/bottlepay/lightning-benchmark
04:10 < darosior> openoms: iirc we have one ?
04:10 < darosior> openoms: Yeah, there: https://github.com/ElementsProject/lightning/blob/master/contrib/linuxarm64v8.Dockerfile
04:12 < openoms> ahh cool, it is not on dockerhub yet right?
04:12 < openoms> no problem I can build myself
04:50 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has left #c-lightning []
04:50 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has joined #c-lightning
05:03 <@cdecker> It's mostly used to run tests on alternative supported archs, hence we did not publish it
05:14 -!- HelloShitty [~psysc0rpi@bl20-171-222.dsl.telepac.pt] has quit [Ping timeout: 252 seconds]
05:20 < rny> openoms: i have a docker build that's slimmer
05:46 -!- jasan [~j@n.bublina.eu.org] has quit [Quit: jasan]
06:29 -!- belcher_ is now known as belcher
07:57 < vincenzopalazzo> Hello cdecker, related to this comment https://github.com/ElementsProject/lightning/issues/4425#issuecomment-818851499, do you mean that a solution could be to pass a flag to the command so that it queries the db and gets the info about the closed channels?
08:10 -!- vasild [~vd@gateway/tor-sasl/vasild] has quit [Ping timeout: 240 seconds]
08:10 -!- vasild [~vd@gateway/tor-sasl/vasild] has joined #c-lightning
08:11 <@cdecker> Yep, that might be a solution, though we need to look at what we are currently using to back the query (DB or in-memory) in order not to duplicate results
08:36 < az0re> Hello all, currently being bitten by the bad commit_sig signature bug again. This time it's only a warning: "CHANNELD_NORMAL: channeld WARNING: Bad commit_sig signature ..."
08:36 < az0re> What can I do to recover the channel? Anything?
08:36 < az0re> The channel is still open but the peer appears offline
08:41 < vincenzopalazzo> cdecker, do you think it can be a good first issue, or is it more complex than "Just add a flag and do the correct operation on the database"?
08:42 < vincenzopalazzo> ps: Sorry for the delay in answering
08:42 <@cdecker> Not sure how much work is involved, but it does give a nice tour through the code
08:42 <@cdecker> az0re: any idea what the combination of implementations / versions is that are involved in this case?
08:43 <@cdecker> fwiw, force closing will recover the funds, but let's see if we can get them to agree and have a graceful close first
08:43 < az0re> There's no way to avoid a close and just ask them to re-sign the commitment tx?
08:44 <@cdecker> The peer being disconnected in this case is a result of us (or them) getting upset and disconnecting
08:44 <@cdecker> az0re: reconnecting can work, but if we disagree on a commitment tx we might end up just recreating the same failing signature over and over again
08:44 < az0re> I don't understand the channel state machine. Is this recoverable?
08:45 < az0re> Hmmm
08:45 < az0re> Can we reconnect while forwarding a 1msat HTLC to get them to sign a new commitment tx?
08:46 < az0re> This is an improvement in behavior, at least--my channel didn't get force closed on me yet :)
08:46 <@cdecker> Hm, that is unlikely to do much good: we first confirm the prior state, resending commits and changes since that commit, and only then make progress
08:46 < az0re> Any way to hack it to ignore the signature status?
08:46 <@cdecker> This is because we need to have a common notion of the state of the channel before we can continue using it
08:47 < az0re> So I can pretend everything is fine, then forward an HTLC, shut down the node, restart without the hack
08:47 <@cdecker> No, that'd be a security nightmare
08:47 < az0re> In general yes
08:47 < az0re> I mean just for right now to recover this channel
08:47 < az0re> I would rather put my whole channel funds at risk than take yet another 25k sat hit
08:48 < az0re> I've already spent well over 100k sat on channel bugs like this
08:48 <@cdecker> If you disagree on the commitment transaction then your peer will get upset and just do the same with you. So no unilateral change can fix this
08:48 < az0re> So there are two issues here:
08:48 < az0re> 1) Disagreement on the channel state
08:48 < az0re> 2) bad signature
08:48 < az0re> as I understand, the problem my channel has is problem 2, not 1
08:48 < az0re> Is that wrong?
08:48 <@cdecker> 1 is likely the root cause for 2, hence really one issue here
08:49 < az0re> I see, so it's not a bad commit signature, it's a bad commit tx
08:49 <@cdecker> The signature signs the commitment tx, which represents the current state. Either the signer code is wrong (unlikely) or the state we or they are signing is not the same
08:49 < az0re> Can I quickly view the respective channel states we disagree on?
08:50 <@cdecker> Well, we can't really compare the state / committx, so we report the symptom we can witness
08:50 < az0re> And how difficult would it be to simply ignore our state and use theirs?
08:50 <@cdecker> Sadly no, the damn protocol is optimized with differential changes being exchanged, there is no way to retrieve the full state over the wire atm.
08:51 < az0re> So I am pretty sure this is a C-Lightning bug, BTW
08:51 <@cdecker> What gives?
08:51 < az0re> It's happened to me on multiple channels with different entities
08:51 < az0re> And on multiple physical machines
08:51 <@cdecker> Which version?
08:51 < az0re> The common variable is... C-Lightning
08:52 < az0re> Everything from very old to nearly-current-git-HEAD
08:52 <@cdecker> Well, it is likely that the other side was also always the same implementation :-)
08:52 < az0re> Could be, but unlikely
08:52 <@cdecker> It might also be a spec disagreement, in which case it is way harder to attribute the error to either side :-)
08:53 < az0re> Right
08:53 < az0re> <az0re> And how difficult would it be to simply ignore our state and use theirs?
08:53 < az0re> YOLO
08:54 <@cdecker> Very hard, it requires us to manually change the DB and go fish in the code
08:54 < az0re> Happy to go fishing
08:54 < az0re> Not happy to burn another 25k sat
08:55 < az0re> I've built up a lot of infrastructure around C-Lightning but frankly it will be cheaper for me to wait for a low-fee period, close all my channels, blow away my node and start over with LND, which doesn't have this problem
08:56 <@cdecker> I'd rather fix the issue than spend a month trying to build a workaround (we are trying to reproduce this btw)
08:56 < az0re> Would it really be a month?
08:56 < az0re> All I want is a quick shim I can put in/take out to ignore our state and sign whatever commit tx they present
08:58 < az0re> > (we are trying to reproduce this btw)
08:59 < az0re> I'm happy to help however I can but honestly I don't see a clear path to resolution, or much effort being put into fixing this
08:59 <@cdecker> Well, the thing is that the commitment engine is the very basis of the entire project, and just finding the correct place to inject the state is hard, not to speak of what that state should look like, since the remote end won't tell us what it is expecting
09:00 < az0re> OK, where does my node actually sign a commit tx? let's start there
09:01 < az0re> > since the remote end won't tell us what it is expecting
09:01 < az0re> Maybe I misunderstand
09:01 < az0re> I guess we are not passing PSBTs?
09:01 < az0re> We just have to both "know" what the state already is and sign the tx blindly, just throwing the signature over the wall?
09:01 <@cdecker> The first step to fix this would be to reproduce the issue in a clean setting, so we can see where things are going wrong. Since you have hit it a couple of times, can you see some commonalities between the cases? What implementation is the other side running, which version, etc (sometimes that info can be gathered)
09:02 <@cdecker> No, the protocol doesn't pass PSBTs around
09:02 < az0re> I don't have that information about channel peers
09:02 <@cdecker> We pass operations around that are then reflected against the state
09:03 <@cdecker> Yeah, that's the main problem we have with reproducing the issue, we don't have good info about when the issue happens and with what setup
09:03 < az0re> So if my peer and I disagree about some state transformation, I have no idea how my peer disagreed, I only have a signature signing some tx I don't agree with
09:03 <@cdecker> Yep... I was against this design, but people wanted efficiency over debuggability...
09:03 < az0re> Then frankly digging in the db is unavoidable to fix this bug
09:05 < az0re> The way to go is to find S', my version of current channel state, and reconstruct S, the state before the most recent update_commit attempt
09:05 < az0re> Then bang our heads against the wall until we figure out why that transition gave a bug
09:05 < az0re> Compare to behavior in LND, especially
09:06 <@cdecker> Yep, guess why https://github.com/ElementsProject/lightning/issues/4152 is marked as a compat issue
09:06 < az0re> > reproduce the issue in a clean setting
09:06 <@cdecker> (which was one of your prior instances)
09:06 < az0re> Very unlikely to happen
09:07 <@cdecker> My best guess is that we somehow fail to compute the same feerate, so the issue only happens with large-ish feerates and between implementations
09:07 < az0re> I guess the bug happens on some HTLC forward, and there is no way I can get a "clean" environment forwarding enough HTLC traffic to expose this bug
09:08 <@cdecker> That's not a huge issue, since we can pump hundreds of HTLCs through a test network
09:30 < az0re> So is there a plan to deal with this bug?
09:30 < az0re> Or is it just going to rot in the issue tracker for another 6 months?
09:32 < az0re> I'm trying to determine if I should just abandon C-Lightning as broken and move fully to LND, or just stop adding liquidity to this node and making a new LND node my primary
09:45 < darosior> az0re: are you using the feeadjuster plugin by any chance?
09:45 <@cdecker> Well, we are doing our best to resolve this issue, but if you don't feel like it's worth waiting I won't try to persuade you not to switch impls. I personally think that depending on where the issue lies it might end up exacerbating the issue if everybody but a few people switch to one impl, since if that impl is at fault it forces everyone to
09:45 <@cdecker> implement that error that way (see internet explorer forcing all browsers to add a quirksmode because all webmasters built for IE...)
09:46 <@cdecker> darosior: would that have any impact on the commitment tx? iirc that's just triggering channel_updates isn't it?
09:46 < darosior> grass is always greener on the other side
09:47 < darosior> Right, i was trying to think if there could be something with the HTLC amounts
09:47 < az0re> darosior: Nope, mostly static fees, occasionally tweaked manually
09:47 < darosior> Maybe in concordance with the feerate
09:48 <@cdecker> Nah, at the time we commit we already disambiguated whether we wanted the HTLC or not
09:48 < az0re> I totally agree WRT centralization of implementations
09:48 < az0re> I *want* to use C-Lightning
09:48 < az0re> But I keep getting burned by this bug, which is really quite a bit more serious than it sounds
09:49 <@cdecker> Maybe a dumb question, but do you get bad sigs when trying to reconnect? Could have just been a transient thing (should have asked that hours ago...)
09:49 < az0re> Yes, it keeps trying to reconnect and fails again
09:49 <@cdecker> Ok, it's not a feerate update message going missing then, damn :-(
09:50 < darosior> Will grep my node's logs for bad commit sigs if it can help
09:50 < az0re> I *think* I can get in contact with the peer operator
09:50 < az0re> It will take me some time to remember how
09:54 < az0re> If anyone has pointers on how to recover the relevant channel states S and S' from the DB, please let me know ASAP
10:49 < vincenzopalazzo> cdecker, yeah I think it's a good point to restart from where I stopped with the delpay command. I will check it. Thanks
11:38 -!- cryptosoap [~cryptosoa@gateway/tor-sasl/cryptosoap] has quit [Quit: %bye%]
11:38 -!- cryptoso- [~cryptosoa@gateway/tor-sasl/cryptosoap] has joined #c-lightning
12:03 -!- HelloShitty [~psysc0rpi@bl20-171-222.dsl.telepac.pt] has joined #c-lightning
12:58 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has left #c-lightning []
12:58 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has joined #c-lightning
13:12 -!- jasan [~j@n.bublina.eu.org] has joined #c-lightning
13:46 -!- EndFiat [EndFiat@gateway/vpn/mullvad/endfiat] has quit [Ping timeout: 252 seconds]
13:56 -!- jasan [~j@n.bublina.eu.org] has quit [Quit: jasan]
13:56 -!- jasan [~j@n.bublina.eu.org] has joined #c-lightning
13:57 -!- jasan [~j@n.bublina.eu.org] has quit [Client Quit]
14:02 -!- EndFiat [EndFiat@gateway/vpn/mullvad/endfiat] has joined #c-lightning
14:02 -!- jasan [~j@n.bublina.eu.org] has joined #c-lightning
14:03 -!- jasan [~j@n.bublina.eu.org] has quit [Client Quit]
14:29 -!- vincenzopalazzo [~vincenzop@host-80-181-199-140.pool80181.interbusiness.it] has quit [Ping timeout: 240 seconds]
14:30 -!- vincenzopalazzo [~vincenzop@host-87-10-115-59.retail.telecomitalia.it] has joined #c-lightning
14:52 -!- jonasschnelli [~jonasschn@unaffiliated/jonasschnelli] has quit [Remote host closed the connection]
14:54 -!- jonasschnelli [~jonasschn@unaffiliated/jonasschnelli] has joined #c-lightning
14:54 -!- mrostecki [mrostecki@nat/suse/x-uikgeaxofcyrvhsq] has quit [Remote host closed the connection]
14:55 -!- mrostecki [mrostecki@nat/suse/x-dsxgooamcbgldcek] has joined #c-lightning
15:01 -!- grubles_ [~user@gateway/tor-sasl/grubles] has joined #c-lightning
15:01 -!- grubles [~user@gateway/tor-sasl/grubles] has quit [Ping timeout: 240 seconds]
15:44 -!- vincenzopalazzo [~vincenzop@host-87-10-115-59.retail.telecomitalia.it] has quit [Ping timeout: 246 seconds]
15:45 -!- vincenzopalazzo [~vincenzop@host-79-18-34-139.retail.telecomitalia.it] has joined #c-lightning
15:48 -!- harrigan [~harrigan@ptr-93-89-242-235.ip.airwire.ie] has quit [Read error: Connection reset by peer]
15:50 -!- harrigan [~harrigan@ptr-93-89-242-235.ip.airwire.ie] has joined #c-lightning
16:28 -!- spinza [~spin@102.132.245.16] has quit [Quit: Coyote finally caught up with me...]
16:45 -!- EndFiat [EndFiat@gateway/vpn/mullvad/endfiat] has quit [Ping timeout: 240 seconds]
16:47 -!- EndFiat [EndFiat@gateway/vpn/mullvad/endfiat] has joined #c-lightning
16:51 -!- spinza [~spin@102.132.245.16] has joined #c-lightning
17:00 -!- k3tan [~pi@gateway/tor-sasl/k3tan] has quit [Ping timeout: 240 seconds]
17:40 -!- az0re [~az0re@gateway/tor-sasl/az0re] has quit [Remote host closed the connection]
17:54 -!- k3tan [~pi@gateway/tor-sasl/k3tan] has joined #c-lightning
18:40 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has joined #c-lightning
18:45 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has quit [Ping timeout: 240 seconds]
18:57 -!- cdecker [~cdecker@mail.snyke.net] has quit [Ping timeout: 246 seconds]
18:58 -!- snyke [~cdecker@mail.snyke.net] has joined #c-lightning
18:58 -!- mode/#c-lightning [+o snyke] by ChanServ
19:01 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has joined #c-lightning
19:32 -!- vincenzopalazzo [~vincenzop@host-79-18-34-139.retail.telecomitalia.it] has quit [Remote host closed the connection]
20:06 -!- grubles_ is now known as grubles
20:26 -!- belcher [~belcher@unaffiliated/belcher] has quit [Ping timeout: 252 seconds]
20:36 -!- Teoti [~teoti@216.154.58.117] has quit [Ping timeout: 240 seconds]
20:40 -!- belcher [~belcher@unaffiliated/belcher] has joined #c-lightning
21:06 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has left #c-lightning []
21:06 -!- blockstream_bot [blockstrea@gateway/shell/sameroom/x-appjcgjzwowenres] has joined #c-lightning
21:13 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has quit [Ping timeout: 240 seconds]
21:15 -!- bitdex [~bitdex@gateway/tor-sasl/bitdex] has joined #c-lightning
21:23 -!- gleb [~gleb@178.150.137.228] has quit [Ping timeout: 252 seconds]
23:51 -!- gleb [~gleb@178.150.137.228] has joined #c-lightning
--- Log closed Thu Apr 15 00:00:26 2021