
A primer on VLQ
Variable-Length Quantity (VLQ) is an way of representing arbitrarly
large integers in bytes. In this blog post we will explore the
details of how this is achieved.
Table of Contents
1 Representation Problem
Lets say you are representing a music score. You need to represent the
duration of various notes. For each note you will tell how many
milliseconds it lasts. Although this story is apocryphal it offers a
nice conundrum: how do you represent numbers on an open ended scale?
2 VLQ
One way of doing this is Variable-Length Quantity, a code that uses a
a variable amount of bytes and a such is amendable to use in a
computer.
VLQ is composed of a number of octets, i.e. eight bits. The most
significant bit of an octet is used to signal if an other octet
follows. The least significant bits of all octets together form the
integer.
3 Example
Lets say I want to represent the number 3435
in VLQ. 3435
in
binary is 110101101011
. We can not fit this in a byte. So we will
chop it up from the end in 7-bit blocks
Septet | 7 | 6 | 5 | 4 | 3 | 2 | 1 | |
---|---|---|---|---|---|---|---|---|
#1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | |
#2 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
Now we prepend all but the last with a 1-bit to indicate that an octet
follows and prepend a 0-bit to the last, signalling the final octet.
Octet | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
---|---|---|---|---|---|---|---|---|
#1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
#2 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
Finally we concatenate them, most significant octet first, into
1001101001101011
. An implementation in java is provided.
4 Variations
There are a few variations on VLQ possible. First and foremost is the
order of octets. We agreed to send the most significant octet first,
but you can switch the order around. You would have to change which
groups get a continuation bit prepended, but all in all it is not
very different.
A more drastic change is to represent both positive and negative
numbers. In the current code there is no room to encode a sign for
the number. So we need to free up a bit to encode the sign. One way
of doing this is to keep the most significant bit as a continuation
bit, i.e. it signals if an other octet follows. Then use the next
most significant bit to encode the sign of the number.
Under this representation -3435
would be represented as
1111010101101011
.