String Compression and Its Impact on Transaction Gas

The Gas Problem

Gas is a fact of life in web3, even if there is no direct charge for it. Many factors affect it, but it ultimately comes down to input: the greater the input, the greater the cost. What if there were a reliable way to reduce that input in a consistent and reproducible fashion?

Data on Jackal Protocol is stored both on-chain and off-chain. Files are stored off-chain; however, keeping them secure without compromising their accessibility requires a matching on-chain record for each file and folder. This adds up quickly. More specifically, on-chain data transactions cost roughly 200 additional gas units per character stored. That covers everything from the encryption keys protecting the file to its recorded last-modified date.
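To make the per-character figure concrete, here is a minimal sketch of the arithmetic. The 200 gas units per character is the approximate figure quoted above, not a protocol constant, and the helper name and the sample metadata fields are hypothetical.

```javascript
// Approximate figure quoted in the article; an assumption, not a protocol constant.
const GAS_PER_CHARACTER = 200;

// Hypothetical helper: rough gas attributable to storing one string on-chain.
function estimateStringGas(value) {
  return value.length * GAS_PER_CHARACTER;
}

// Hypothetical metadata record, serialized the way it might be stored.
const metadata = JSON.stringify({ name: "report.pdf", lastModified: 1700000000000 });
console.log(`${metadata.length} characters, roughly ${estimateStringGas(metadata)} gas`);
```

Because the estimate is linear in string length, every character removed before a string reaches the chain saves about the same amount of gas.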

The Solution

To mitigate this gas issue, the development team turned to string compression. Using the LZ-String compression library, the team has been able to losslessly compress and decompress character strings. Strings that chain operations do not need to read (the required strings include, for example, the owner address and merkle source values) have achieved a 40-90% reduction in byte size, depending on their length and how repetitive their characters were before compression. Additionally, this process has maintained 1:1 parity in character count before and after AES encryption. All compression is performed before encryption, as plain Latin-character text contains significantly more repetition than encrypted data, which is rendered with the full UTF-16 set.

How does this work?

In short, the Latin characters in a JavaScript string look like single-byte UTF-8 characters, but each one actually occupies a full 16 bits to maintain compatibility with the expanded UTF-16 character set. LZ-String reads in those pseudo-UTF-8 characters, identifies substrings that it can represent as distinct bit sequences, and packs those sequences so that they fill the full 16-bit space of each “character”. Efficiency increases as strings grow longer and/or more repetitive. This trait is particularly important for our on-chain folder data, as file paths quickly become repetitive.
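The general idea can be illustrated with a toy dictionary coder. This is a simplified LZW-style sketch, not LZ-String's implementation: it assumes Latin-1 input only, and it stores exactly one dictionary index in each 16-bit code unit rather than packing bit sequences. The function names and the sample paths are hypothetical.

```javascript
// Toy LZW: every output code unit is a dictionary index (0..65535).
// Indices 0..255 stand for single Latin-1 characters; higher indices
// stand for longer substrings learned while scanning the input.
const MAX_CODES = 0x10000;

function toyCompress(input) {
  const dict = new Map();
  for (let i = 0; i < 256; i++) dict.set(String.fromCharCode(i), i);
  let phrase = "";
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch.charCodeAt(0) > 255) {
      throw new RangeError("toy example handles Latin-1 input only");
    }
    const candidate = phrase + ch;
    if (dict.has(candidate)) {
      phrase = candidate; // keep extending the current match
    } else {
      out += String.fromCharCode(dict.get(phrase)); // emit longest known match
      if (dict.size < MAX_CODES) dict.set(candidate, dict.size); // learn new substring
      phrase = ch;
    }
  }
  if (phrase !== "") out += String.fromCharCode(dict.get(phrase));
  return out;
}

function toyDecompress(packed) {
  if (packed === "") return "";
  const dict = [];
  for (let i = 0; i < 256; i++) dict.push(String.fromCharCode(i));
  let prev = dict[packed.charCodeAt(0)];
  let out = prev;
  for (let i = 1; i < packed.length; i++) {
    const code = packed.charCodeAt(i);
    // A code equal to dict.length refers to the entry that is about to be added.
    const entry = code < dict.length ? dict[code] : prev + prev[0];
    out += entry;
    if (dict.length < MAX_CODES) dict.push(prev + entry[0]);
    prev = entry;
  }
  return out;
}

// Hypothetical, repetitive folder paths of the kind described above.
const paths = "s/Home/docs/a.txt;s/Home/docs/b.txt;s/Home/docs/c.txt";
const packedPaths = toyCompress(paths);
console.log(paths.length, "characters in,", packedPaths.length, "code units out");
console.log(toyDecompress(packedPaths) === paths); // true: the round trip is lossless
```

Each repeated prefix such as "s/Home/docs/" collapses into fewer and fewer code units as the dictionary grows, which is why longer and more repetitive inputs benefit most.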

Side Effects

There are some drawbacks to working with string compression. Working with files, file metadata, and string compression is challenging in a browser because of built-in safety systems. An excellent example of this is using the default TextEncoder/TextDecoder process to read strings into memory. These tools are designed to be HTML-safe and are thus excellent for nearly all use cases. However, they create a problem on our end because they mutate the compressed string, thereby rendering decompression impossible.

To solve this problem, we had to create a custom reader that converts each compressed UTF-16 value into an in-memory string element, bypassing the normal HTML sanitization step. This only works because we never try to use this compressed string data on the web page.
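The sketch below shows one way such a mutation can arise, alongside a reader in the spirit of the one described. It assumes the problematic values are unpaired surrogates, which TextEncoder replaces with U+FFFD; whether that is the exact failure the team encountered is an assumption. The function names and the little-endian byte layout are hypothetical choices, not the team's actual code.

```javascript
// Serialize every 16-bit code unit as two bytes (little-endian), untouched.
function codeUnitsToBytes(str) {
  const bytes = new Uint8Array(str.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < str.length; i++) {
    view.setUint16(i * 2, str.charCodeAt(i), true);
  }
  return bytes;
}

// Rebuild the string one code unit at a time, with no text decoding step.
function bytesToCodeUnits(bytes) {
  if (bytes.byteLength % 2 !== 0) {
    throw new RangeError("expected an even number of bytes");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let out = "";
  for (let i = 0; i < bytes.byteLength; i += 2) {
    out += String.fromCharCode(view.getUint16(i, true));
  }
  return out;
}

// "\uD83D" is an unpaired surrogate: a legal 16-bit value in a JavaScript
// string, but not valid Unicode text on its own.
const rawSample = "ab\uD83Dcd";

const viaText = new TextDecoder().decode(new TextEncoder().encode(rawSample));
const viaRaw = bytesToCodeUnits(codeUnitsToBytes(rawSample));

console.log(viaText === rawSample); // false: the lone surrogate became U+FFFD
console.log(viaRaw === rawSample);  // true: every code unit survived
```

Treating the compressed string as opaque 16-bit values rather than as text is what keeps it intact, at the cost of producing strings that should never be rendered on a page.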

Another drawback of data compression algorithms is the increase in computational load, which is incurred whenever values are compressed or decompressed. We find this an acceptable tradeoff, as we want to reduce on-chain computation as much as possible, again to reduce gas needs. By leveraging the user’s device for computations as opposed to the chain, we achieve, in most instances, a significant reduction in gas requirements. Most users have reasonably modern devices and browsers, which makes even involved local processing virtually unnoticeable. This is particularly true when compared with the user’s internet bandwidth, which is typically the primary bottleneck.

An important consideration is that this approach only works for data that is used exclusively off-chain. Strings that the chain would need to compress or decompress to complete some action would generally result in a net loss due to the computational cost.

Conclusion

The payment of gas fees for on-chain transactions is inevitable. However, implementing string compression significantly alleviates one of the major contributors to high gas fees. With reductions of up to 90%, we can do more and store more at a lower cost. Systems like LZ-String allow us to repurpose wasted space in traditional data to optimize gas requirements, without sacrificing privacy or security standards.
