This One Weird Mistake

I’m working on some code to encrypt and decrypt compressed files. Take a file like a database dump. You don’t want to store the data in the open, so you want to encrypt it. You also don’t want to store raw data. It probably takes up a lot more space then it needs. Naturally, you compress and then encrypt the data?

But which compression scheme and which encryption scheme. Making the selection of encryption and compression simple is the basic idea behind the library. There are only a handful of compression algorithms most people will try. With encryption, it’s largely AES, but maybe other suites would be attractive. These are small, finite universes.

Compression is Simple

The compression libraries in the standard package essentially implement the io.WriteCloser interface. They wrap an existing write. The standard approach is to use two files. One is a reader on the file with the data to compress and the other is a writer to hold the compressed data. The io.Copy function can copy the data between the two.

For the library I was developing, compression would be in-memory. The source data would be a few kilobytes already in memory. The data would immediately be sent over the network after compression. The source is a bytes.Reader and the writer is a bytes.Buffer.

func CompressData(data []byte) ([]byte, error) {
	src := bytes.NewReader(data)
	dest := new(bytes.Buffer)

	zipper := gzip.NewWriter(dest)
	defer zipper.Close()

	_, err := io.Copy(zipper, src)
	if err != nil {
		return nil, err
	}
	
	return dest.Bytes(), nil
}

Nothing Is Working

Once you see the problem with the code, you can’t un-see it. I create my data source (src). I create a destination (dest). I wrap the destination in a writer (zipper). I use io.Copy to move the bytes from the source to the GZip writer. And then I return the bytes, if there’s no error.

I would get back empty buffers.

In the majority of cases when a writer is opened, it’s a good policy to defer a Close. I do that. If someone implements a WriteCloser, they can clean up any temporary buffers or structures at that time. If they have open file descriptors, they can close those. It’s good practice to call close when you’re done with a WriteCloser. It’s good practice to defer the call immediately after successfully opening the writer. That way you are less likely to delete it when you refactor the code.

Except what’s going on here? I return the bytes from the underlying byte buffer. The deferred call runs after the return. That’s when the data is actually flushed to the underlying byte buffer. Until the Close call, all my data is buffered inside the GZip writer and hasn’t made it to the underlying destination.

The solution is trivial. Move the Close call to after the io.Copy but before the return. The downside is that I can accidentally remove it, but testing should pick that up.

func CompressData(data []byte) ([]byte, error) {
	src := bytes.NewReader(data)
	dest := new(bytes.Buffer)

	zipper := gzip.NewWriter(dest)

	_, err := io.Copy(zipper, src)
	if err != nil {
		return nil, err
	}
	
	zipper.Close()
	return dest.Bytes(), nil
}

The Normal Case

I would not have run into this except for the fact I’m returning the compressed bytes. A typical use of compression wouldn’t run into this problem because it’s likely sending data to a file. Maybe we would return the number of bytes, but most likely just the error.

func CompressToFile(data []byte) error {
	src := bytes.NewReader(data)

	dest, err := os.Create("/tmp/somefile.gzip")
	if err != nil {
		return err
	}
	defer dest.Close()

	zipper := gzip.NewWriter(dest)
	defer zipper.Close()

	_, err = io.Copy(zipper, src)
	if err != nil {
		return err
	}

	return nil
}

In this version I correctly close the underlying file (the docs indicate closing the GZip writer doesn’t close the underlying file). I defer the close to zipper as well. The order the two deferred calls are executed is to first flush zipper and then flush dest. This will ensure the data is written out to the file. The calls to defer are handled as a stack. The actual writing doesn’t complete until the CompressToFile function returns. But that’s fine, because CompressToFile doesn’t depend on the data being written prior to returning.

Conclusion

The CompressData function, however, relies on the data being flushed to return a valid value. This is one of the few times I’ve been bitten by defer and not reading the docs. If I were to make any error, I would rather err on the side of always adding a defer with a Close. For me, at least, Go seems to follow the principle of least surprise. Most APIs work the way I would expect and I often don’t bother reading the docs.