I noticed that this Merkle tree we are building is made more expensive by including repetitive data. So, this proposal draws inspiration from LZ78, an algorithm which constructs a dictionary of repetitive strings which are then referred to by their index. What if the blockchain already built a dictionary for us, and now we just need to index it? --- Abstract This BIP describes how a new OP code can be used to construct smaller, more compact transactions. With a public reference (PubRef), a newly created transaction can reuse elements of a previously confirmed transaction by representing this information with a smaller numeric offset or “pointer”. Motivation Giving scripts the ability to refer to data on the blockchain will reduce transaction sizes because key material does not have to be repeated in every Script. Users of the network are rewarded with smaller transaction sizes, and miners are able to fit more transactions into new blocks. Pointers are a common feature and it felt like this was missing from Bitcoin Script. Specification This BIP defines a new Script opcode that can be introduced with BIP-0141 (Segregated Witness (Consensus layer)). This new opcode is as follows: Word Opcode Hex Input Output Description OP_PUBREF4 TBD TBD uint4 data The next four bytes is an integer reference to a previously defined PUSHDATA Let there be an ordered list of all invocations of PUSHDATA (OP codes; 0x4c,0x4d,0x4e) across all scripts starting from the genesis block: [t0,t2,...,tn]. With this list a newly created script can refer to a specific PUSHDATA that was used in any previously confirmed block. The values referenced by this list are immutable and uniform to all observers. Lets assume there is some transaction containing a ScriptSig and outputScript pair, here is what an input and an output script would look like: ScriptSig PUSHDATA(72)[30450221008f906b9fe728cb17c81deccd6704f664ed1ac920223bb2eca918f066269c703302203b1c496fd4c3fa5071262b98447fbca5e3ed7a52efe3da26aa58f738bd342d3101] PUSHDATA (65)[04bca69c59dc7a6d8ef4d3043bdcb626e9e29837b9beb143168938ae8165848bfc788d6ff4cdf1ef843e6a9ccda988b323d12a367dd758261dd27a63f18f56ce77] outputScript DUP HASH160 PUSHDATA(20)[dd6cce9f255a8cc17bda8ba0373df8e861cb866e] EQUALVERIFY CHECKSIG The ScriptSig of an input typically contains the full public key which is ~65 bytes. Outputs will typically contain a hash of the public key which is 20 bytes. Using PubRef, the public-key material shown above no longer need to be repeated, instead the needed values can be resolved from previously confirmed transaction. The signature of the input must still be unique, however, the public key can be replaced with a call to PUBREF, as shown below: ScriptSig PUSHDATA(72)[30450221008f906b9fe728cb17c81deccd6704f664ed1ac920223bb2eca918f066269c703302203b1c496fd4c3fa5071262b98447fbca5e3ed7a52efe3da26aa58f738bd342d3101] PUBREF[43123] outputScript DUP HASH160 PUBREF[12123] EQUALVERIFY CHECKSIG The above call to PUBREF removed the need to include the full public-key, or hash of the public key in a given transaction. This is only possible because these values where used previously in a confirmed block. If for example a user was sending Bitcoins to a newly formed address, then no PUBREF can be created in this case - and a outputScript using PUSHDATA would need to be used at least once. If a newly created wallet has never been used on the Bitcoin network, then the full public-key will need to be included in the ScriptSig. Once these transactions have been confirmed, then these values will be indexed and any future script can refer to these public-key values with a PUBREF operation. PubRef is not susceptible to malleability attacks because the blockchain is immutable. The PUSHDATA operation can store at most 520 bytes on the stack, therefore a single PUBREF operation can never obtain more than 520 bytes. In order for a client to make use of the PUBREF operations they’ll need access to a database that look up public-keys and resolve their PUBREF index. A value can be resolved to an index with a hash-table lookup in O(1) constant time. Additionally, all instances of PUSHDATA can be indexed as an ordered list, resolution of a PUBREF index to the intended value would be an O(1) array lookup. Although the data needed to build and resolve public references is already included with every full node, additional computational effort is needed to build and maintain these indices - a tradeoff which provides smaller transaction sizes and relieving the need to store repetitive data on the blockchain.