
Test Driving Base64
Base64 is a workhorse on the internet. It allows binary data to be
encoded in an ASCII string. This way it can be stored and transported
without danger of data being corrupted.
Because the encoding/decoding scheme is so ubiquitous one wonders why
you should implement it yourself instead of using a library.
Table of Contents
1 Reinventing the Wheel
Numerous cases have been made against reinventing the
wheel in production. Arguments usually fall in the following
categories.
- Libraries are better tuned for memory usage and speed.
- Libraries are better tested because of the greater usage base.
These are valid objections. But reinventing the wheel is not about
building software that already is widely available. The benefits of
reinventing the wheel are numerous.
- One could invent a different wheel that is better suited in some
situations. - One gains the experience of making various design decision under
the given constraints. - One becomes familiar with the details of a technology.
- The time spend counts for the 10000 hour rule.
Just don’t use it in production.
2 Specification
Base64 encodes a multiple of three bytes into four characters via the
following mechanism.
byte | #0 | #1 | #2 | |||||||||||||||||||||
bit pattern | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 |
index | 3 | 26 | 46 | 45 | ||||||||||||||||||||
encoding | C | a | t | s |
Three bytes are placed after each other. Each group of six bits are
encoded via the following table.
Index | Character | Index | Character | Index | Character | Index | Character |
---|---|---|---|---|---|---|---|
0 | A | 16 | Q | 32 | g | 48 | w |
1 | B | 17 | R | 33 | h | 49 | x |
2 | C | 18 | S | 34 | i | 50 | y |
3 | D | 19 | T | 35 | j | 51 | z |
4 | E | 20 | U | 36 | k | 52 | 0 |
5 | F | 21 | V | 37 | l | 53 | 1 |
6 | G | 22 | W | 38 | m | 54 | 2 |
7 | H | 23 | X | 39 | n | 55 | 3 |
8 | I | 24 | Y | 40 | o | 56 | 4 |
9 | J | 25 | Z | 41 | p | 57 | 5 |
10 | K | 26 | a | 42 | q | 58 | 6 |
11 | L | 27 | b | 43 | r | 59 | 7 |
12 | M | 28 | c | 44 | s | 60 | 8 |
13 | N | 29 | d | 45 | t | 61 | 9 |
14 | O | 30 | e | 46 | u | 62 | + |
15 | P | 31 | f | 47 | v | 63 | / |
If the total number of bytes is not a multiple of three pad with zero
bytes until it is. If there was one byte left over take the first two
characters and append ==
. If there were two bytes left over take
the first three characters and append =
.
This will result in a string with a number of characters that is
divisible by four.
3 Test Driven Development
Test Driven Development (TDD) is the a process where a developer sets
up a very short feedback loop by writing test and only then writing
code to make the test pass. When all the tests pass one cleans up the
codebase before new tests are introduced.
So we will focus on the tests that are used to drive our
implementation. The test is a number of test cases created with the
following code.
verifyThat(0b0, 0b0, 0b0).encodesAs("AAAA");
The above code is used to create a test case that, together with a
Parameterized
test runner, actually performs the verification
described in the snippit. If you want to know how this is achieved,
you can look at the implementation.
4 Test Cases
What test cases should we consider? The simplest input would be three
zero bytes. Performing the encoding by hand produces the following
test case.
verifyThat(0b0, 0b0, 0b0).encodesAs("AAAA");
The following few test cases increase the number of zero bytes to
multiples of three.
verifyThat(0b0, 0b0, 0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAAAA"); verifyThat(0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAAAAAAAA");
For the next test cases we take a few steps back. When the number of
available bytes is not divisible by three we need to introduce =
in
the output. The simplest case we can think of is one and two zero
bytes.
verifyThat(0b0, 0b0).encodesAs("AAA="); verifyThat(0b0).encodesAs("AA==");
In order to make sure that longer sequences of zero bytes are still
being encoded correctly we introduce the following test cases.
verifyThat(0b0, 0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAAA="); verifyThat(0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAAAAAAA="); verifyThat(0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAA=="); verifyThat(0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAAAAAA==");
Now we have our work cut out for us, because the next thing to
consider is encoding something different than zero bytes. The
simplest thing that springs to mind is
verifyThat(0b00000000, 0b00000000, 0b00000001).encodesAs("AAAB");
We switch to writing our bytes with zeros prepend to see the
structure of the bits. The next few test cases are a bit dull because
they go through every output symbol. The end result is
verifyThat(0b00000000, 0b00000000, 0b00000010).encodesAs("AAAC"); <<testcase:AAAD-AAA+>> verifyThat(0b00000000, 0b00000000, 0b00111111).encodesAs("AAA/");
Now that we confidently encoded the fourth character, we can expand
our territory by taking on the third character. Because we have
working code that can be reused we only focus on the edge cases.
verifyThat(0b00000000, 0b00000000, 0b01000000).encodesAs("AABA"); verifyThat(0b00000000, 0b00001111, 0b11000000).encodesAs("AA/A");
Similar for the second and first character
verifyThat(0b00000000, 0b00010000, 0b00000000).encodesAs("ABAA"); verifyThat(0b00000011, 0b11110000, 0b00000000).encodesAs("A/AA"); verifyThat(0b00000100, 0b00000000, 0b00000000).encodesAs("BAAA"); verifyThat(0b11111100, 0b00000000, 0b00000000).encodesAs("/AAA");
Now that we have a proper implementation for all the four characters
lets check our encoding on smaller sizes.
data.add(verifyThat(0b00000001).encodesAs("AQ==")); data.add(verifyThat(0b11111111).encodesAs("/w==")); data.add(verifyThat(0b00000000, 0b000000001).encodesAs("AAE=")); data.add(verifyThat(0b11111111, 0b111111111).encodesAs("//8="));
With a proper set of test cases we can confidently refactor our
implementation without losing our solution. A similar procedure can
be used to implement the decode
method.
5 Theories
Now that we have a proper understanding of how the Base64 encoding is
performed, it is time to reflect on its properties. Although the test
cases that were provided during TDD phase have merit, they offer a
myopic view on the process. JUnit Theories can cast a wider net and
provide a way to check theories about code.
For example, the whole purpose of the encode
and decode
methods
is to be inverses of each other. This can be expressed with the
following test that uses the Theories
test runner.
@RunWith(Theories.class) public class Base64Theories { private Kata kata; @Before public void createKata() { kata = new Kata(); } @DataPoints public static byte[][] dataPoints = new byte[][]{ new byte[]{0b0}, new byte[]{0b0, 0b0}, new byte[]{0b0, 0b0, 0b0}, new byte[]{0b0, 0b0, 0b0, 0b0}, new byte[]{0b0, 0b0, 0b0, 0b0, 0b0}, new byte[]{0b0, 0b0, 0b0, 0b0, 0b0, 0b0}, new byte[]{0b111111}, new byte[]{0b111111, 0b111111}, new byte[]{0b111111, 0b111111, 0b111111}, new byte[]{0b111111, 0b111111, 0b111111, 0b111111}, new byte[]{0b111111, 0b111111, 0b111111, 0b111111, 0b111111}, new byte[]{0b111111, 0b111111, 0b111111, 0b111111, 0b111111, 0b111111} }; @Theory public void decodeIsTheInverseOfEncode(byte[] input) { assumeThat(input.length, is(greaterThan(0))); assertThat(kata.decode(kata.encode(input)), is(input)); } }
Most notable is the @Theory
annotation. It is used to validate the
assertions for all data points that adhere to the assumptions. The
data points are provided by an array annotated with the @DataPoints
annotation.
For other theories about encoding and decoding Base64 see the full
source of the Base64Theories.
6 Conclusions
Base64 encoding/decoding is an ideal problem to use Test Driven
Development on. It has the benefits of being widely used but
succintly expressed. It offers wide range of decision moments, both
in the the selection of test cases and in implementation choice.