After being triggered by a post on OpenXmlDeveloper.org, I have been trying to implement the password hashing algorithm used by SpreadsheetML files. After some discussion with Doug and the Excel team, we found out that the algorithm being displayed in the final draft of the Open XML specs is unfortunately incorrect. With this post I will display code which calculates the correct hash.
First of all, let’s talk about the process of verifying your password. Actually, lets not. It is unbelievable sometimes that I forget the public I am writing for... :)
As I have mentioned the displayed hash algorithm in the ECMA specs produces the wrong value compared to Excel. Now we can debate on who generates the wrong value, Excel or the ECMA spec. Because the ECMA spec is of course derived from Excel capabilities, and Excel has been hashing passwords using the same algorithm since the beginning of time, I’d say the specs is the thing which will need updating. This in turn requires a new hashing method to be displayed in the spec.
I’ve taken the liberty to produce the C# equivalent of the hashing algorithm used by Excel. The ECMA spec displays the C version, but most of us will handle Open XML files using the Packaging API, which means .NET code all around. One note about this code is that it does not take 2-byte wide character encodings in to account (Unicode etc...). The specs state that these characters need to be converted to 1-byte wide characters using a specific conversion. I believe the conversion method is specified by W3C or something similar. As long as you use basic ASCII characters you’ll be fine.
static string HashPassword(
byte[] passwordCharacters)
{
int hash = 0;
if (passwordCharacters.Length > 0)
{
int charIndex = passwordCharacters.Length;
while (charIndex-- > 0)
{
hash = ((hash >> 14) & 0x01) | ((hash << 1) & 0x7fff);
hash ^= passwordCharacters[charIndex];
}
// Main difference from spec, also hash with charcount
hash = ((hash >> 14) & 0x01) | ((hash << 1) & 0x7fff);
hash ^= passwordCharacters.Length;
hash ^= (0x8000 | ('N' << 8) | 'K');
}
return Convert.ToString(hash, 16).ToUpper();
}
The main difference with the code found inside the ECMA specs, is the extra hashing operation using the character count right after the while loop. The reason for this extra hash operation is the way Excel stores a string (your password). The C code displayed in the spec uses a 0-terminated string, Excel uses the first byte as the character count. Besides hashing with each character, Excel also hashes with the first ‘character’, or the character count value. This was left out in the ECMA spec.
In the near future I plan to extract this method into its own HashAlgorithm class, a .NET base class for hashing functions. Also the conversion of Unicode to single-byte characters will need to be addressed later on.
Hope this helps