Subject: Hash compatibility: Unicode considerations
Date: Sat, 20 Apr 2002 10:29:55 -0400
From: Michel Gallant
Organization: Bell Sympatico
Newsgroups: microsoft.public.platformsdk.security,comp.lang.java.security

Obtaining a hash (digest) of textual data can introduce compatibility problems between different tools, depending on how the text is converted to bytes before being passed to the hash algorithm. For example, text represented internally as Unicode (as in JScript) might be passed to the hash algorithm as a little-endian Unicode byte encoding. This, of course, shows up as differences in signed hashes and in PKCS#7-encoded messages.

To help troubleshoot and resolve some of these issues, I have updated the MD5/SHA-1 signed Java applet calculator with an option to show the UnicodeLittleUnmarked hash value (mainly of interest on Win32/Intel) as well as the hash of the usual default character-to-byte encoding (typically UTF-8):

http://home.istar.ca/~neutron/messagedigest/

Some examples:

In Java, an internally represented String object (Unicode), when converted to a byte array with string.getBytes(), is encoded by default using the platform's default charset rather than a Unicode (UTF-16) byte encoding.

The CAPICOM 2 hashing examples (e.g. CHashData.vbs) use the little-endian Unicode byte representation for the HashedData.Hash content.

The .NET SDK makes it easy to control the encoding of the byte data passed to the hash algorithm, for example (the first two lines are alternative encodings of the same string):

  Byte[] data2hash = (new UTF8Encoding()).GetBytes(s);
  Byte[] data2hash = (new UnicodeEncoding()).GetBytes(s);
  byte[] hashvalue2 = (new MD5CryptoServiceProvider()).ComputeHash(data2hash);

Sometimes examples that hash text data explicitly append a null byte to the data to be hashed (for example, the MS PSDK CryptoAPI signature and hash demos), which of course changes the hash value and any subsequent signatures.

A short Java sketch tying these cases together is appended after the signature.

- Mitch Gallant
  http://home.istar.ca/~neutron/wsh
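
P.S. Here is a minimal, self-contained Java sketch (my own illustration, not code from the applet) that hashes the same string under the platform default charset, UTF-8, and UnicodeLittleUnmarked (UTF-16LE with no byte-order mark), plus a fourth variant with a single null byte appended. The class name and sample string are arbitrary placeholders.

  import java.security.MessageDigest;

  public class HashEncodingDemo {

      // Render a byte array as lowercase hex.
      private static String toHex(byte[] bytes) {
          StringBuilder sb = new StringBuilder();
          for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
          return sb.toString();
      }

      // MD5 digest of the given bytes, as a hex string.
      private static String md5Hex(byte[] data) throws Exception {
          return toHex(MessageDigest.getInstance("MD5").digest(data));
      }

      public static void main(String[] args) throws Exception {
          String s = "Hash compatibility";   // arbitrary sample text

          byte[] defaultBytes = s.getBytes();                        // platform default charset
          byte[] utf8Bytes    = s.getBytes("UTF-8");
          byte[] utf16le      = s.getBytes("UnicodeLittleUnmarked"); // UTF-16LE, no byte-order mark

          // Mimic samples that also hash a terminating null byte:
          // copy the UTF-8 bytes and leave one extra zero byte at the end.
          byte[] utf8WithNull = new byte[utf8Bytes.length + 1];
          System.arraycopy(utf8Bytes, 0, utf8WithNull, 0, utf8Bytes.length);

          System.out.println("default charset       : " + md5Hex(defaultBytes));
          System.out.println("UTF-8                 : " + md5Hex(utf8Bytes));
          System.out.println("UnicodeLittleUnmarked : " + md5Hex(utf16le));
          System.out.println("UTF-8 + null byte     : " + md5Hex(utf8WithNull));
      }
  }

The UTF-8, UnicodeLittleUnmarked, and null-terminated variants each produce a different digest; the platform-default line matches the UTF-8 line only when the default charset is ASCII-compatible and the text is plain ASCII. Comparing the UTF-8 and UnicodeLittleUnmarked lines of output against the applet's two displayed values is a quick way to tell which byte encoding a given tool is actually hashing.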