Let n be the non-segwit bytes. Let the seg/noseg ratio be 1.7. 

Segwit with 75% discount: (let WITNESS_SCALE_FACTOR=4)
n*WITNESS_SCALE_FACTOR+n*1.7 = 4,000,000
Then n=4,000,000 / 5.7 = 701K
Average block size = 701K*(1+1.7)=1.8 Mbytes
Maximum block size = 4 MBytes

Segwit with 50% discount + 2MB HF: (let WITNESS_SCALE_FACTOR=2)
n*2+n*1.7 = 4,000,000
n = 4,000,000/ 3.7 = 1.08M
Average block size = 1.08M*(1+1.7)=2.9 Mbytes
Maximum block size = 4 MBytes

The capacity of Segwit(50%)+2MbHF is 50% more than Segwit, and the maximum block size is the same. 


On Tue, May 9, 2017 at 3:58 PM, Sergio Demian Lerner <sergio.d.lerner@gmail.com> wrote:


You suggested "If the maximum block weight is set to 2.7M, each byte of
non-witness block costs 1.7", but these numbers dont work out - setting
the discount to 1.7 gets you a maximum block size of 1.7MB (in a soft
fork), not 2.7MB.

Yes. In a soft-fork is true.
I was thinking about what a HF could do to optimize the balance, and I forgot I was in the context of a SF.