=pod

=encoding utf8

=head1 NAME

passphrase-encoding
- How diverse parts of OpenSSL treat pass phrase character encoding

=head1 DESCRIPTION

In a modern world with all sorts of character encodings, the treatment of pass
phrases has become increasingly complex.
This manual page attempts to give an overview of how this problem is
currently addressed in different parts of the OpenSSL library.

=head2 The general case

As a general rule, the OpenSSL library doesn't treat pass phrases in any
special way, and trusts the application or user to choose a suitable character
set and stick to that throughout the lifetime of affected objects.
This means that for an object that was encrypted using a pass phrase encoded in
ISO-8859-1, that object needs to be decrypted using a pass phrase encoded in
ISO-8859-1.
Using the wrong encoding is expected to cause a decryption failure.

=head2 PKCS#12

PKCS#12 is a bit different regarding pass phrase encoding.
The standard stipulates that the pass phrase shall be encoded as an ASN.1
BMPString, which consists of the code points of the basic multilingual plane,
encoded in big endian (UCS-2 BE).
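
As a minimal sketch of what the BMPString requirement means at the byte level,
the following Python snippet encodes a pass phrase as UCS-2 BE.
The helper name C<to_bmpstring> is hypothetical and is not part of OpenSSL; it
only illustrates the two-bytes-per-character layout and ignores any terminator
that a particular implementation may append.

```python
def to_bmpstring(pass_phrase: str) -> bytes:
    """Encode a pass phrase as UCS-2 big endian (hypothetical helper)."""
    # UCS-2 can only represent the basic multilingual plane (U+0000..U+FFFF).
    if any(ord(ch) > 0xFFFF for ch in pass_phrase):
        raise ValueError("character outside the basic multilingual plane")
    # For BMP characters, UTF-16 BE and UCS-2 BE produce identical bytes.
    return pass_phrase.encode("utf-16-be")


# "na\u00efve" is the word "naive" spelled with U+00EF.
print(to_bmpstring("na\u00efve").hex(" "))
# prints: 00 6e 00 61 00 ef 00 76 00 65
```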

OpenSSL tries to adapt to this requirement in one of the following manners:

=over 4

=item 1.

Treats the received pass phrase as UTF-8 encoded and tries to re-encode it to
UTF-16 (which is the same as UCS-2 for characters U+0000 to U+D7FF and U+E000
to U+FFFF, but becomes an expansion for any other character), or failing that,
proceeds with step 2.

=item 2.

Assumes that the pass phrase is encoded in ASCII or ISO-8859-1 and
opportunistically prepends each byte with a zero byte to obtain the UCS-2
encoding of the characters, which it stores as a BMPString.

Note that since there is no check of your locale, this may produce UCS-2 /
UTF-16 characters that do not correspond to the original pass phrase characters
for other character sets, such as any ISO-8859-X encoding other than
ISO-8859-1 (or for Windows, CP 1252 with the exception of the extra "graphical"
characters in the 0x80-0x9F range).

=back

OpenSSL versions older than 1.1.0 do variant 2 only, which is why OpenSSL
still supports it: to be able to read files produced with older versions.

It should be noted that this approach isn't entirely fault free.
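
As a rough illustration of the two variants listed above, the following Python
sketch tries a UTF-8 interpretation first and falls back to the byte-by-byte
expansion.
The function names are hypothetical, and Python's strict UTF-8 decoder is only
assumed to approximate the validity check that OpenSSL performs; this is not
the library's actual code.

```python
def legacy_to_bmp(raw: bytes) -> bytes:
    """Variant 2: treat each byte as ISO-8859-1 and prefix it with 0x00."""
    return b"".join(b"\x00" + bytes([b]) for b in raw)


def passphrase_to_bmp(raw: bytes) -> bytes:
    """Variant 1 with fallback to variant 2 (hypothetical sketch)."""
    try:
        # Variant 1: interpret the bytes as UTF-8 and re-encode to UTF-16 BE.
        return raw.decode("utf-8").encode("utf-16-be")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume ASCII / ISO-8859-1.
        return legacy_to_bmp(raw)


utf8_bytes = bytes.fromhex("6e61c3af7665")   # "naive" with U+00EF, in UTF-8
latin1_bytes = bytes.fromhex("6e61ef7665")   # same word in ISO-8859-1

print(passphrase_to_bmp(utf8_bytes).hex(" "))    # 00 6e 00 61 00 ef 00 76 00 65
print(passphrase_to_bmp(latin1_bytes).hex(" "))  # 00 6e 00 61 00 ef 00 76 00 65
print(legacy_to_bmp(utf8_bytes).hex(" "))        # 00 6e 00 61 00 c3 00 af 00 76 00 65
```

In this sketch, both inputs end up as the same BMPString when the fallback is
available, while applying variant 2 alone to UTF-8 input gives a different
byte sequence.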

A pass phrase encoded in ISO-8859-2 could very well have a sequence such as
0xC3 0xAF (which is the two characters "LATIN CAPITAL LETTER A WITH BREVE"
and "LATIN CAPITAL LETTER Z WITH DOT ABOVE" in ISO-8859-2 encoding), but it
would be misinterpreted as the perfectly valid UTF-8 encoded code point U+00EF
(LATIN SMALL LETTER I WITH DIAERESIS) I<if the pass phrase doesn't contain
anything that would be invalid UTF-8>.
A pass phrase that contains this kind of byte sequence will give a different
outcome in OpenSSL 1.1.0 and newer than in OpenSSL older than 1.1.0.

    0x00 0xC3 0x00 0xAF    # OpenSSL older than 1.1.0
    0x00 0xEF              # OpenSSL 1.1.0 and newer

By the same token, anything encoded in UTF-8 that was given to OpenSSL older
than 1.1.0 was misinterpreted as ISO-8859-1 sequences.

=head2 OSSL_STORE

L<ossl_store(7)> acts as a general interface to access all kinds of objects,
potentially protected with a pass phrase, a PIN or something else.
This API stipulates that pass phrases should be UTF-8 encoded, and that any
other pass phrase encoding may give undefined results.
This API relies on the application to ensure UTF-8 encoding, and doesn't check
that this is the case, so whatever it gets is passed on as is to the underlying
loader.

=head1 RECOMMENDATIONS

This section assumes that you know what pass phrase was used for encryption,
but that it may have been encoded in a different character encoding than the
one used by your current input method.
For example, the pass phrase may have been used at a time when your default
encoding was ISO-8859-1 (i.e. "naïve" resulting in the byte sequence 0x6E 0x61
0xEF 0x76 0x65), and you're now in an environment where your default encoding
is UTF-8 (i.e. "naïve" resulting in the byte sequence 0x6E 0x61 0xC3 0xAF 0x76
0x65).
Whenever it's mentioned that you should use a certain character encoding, it
should be understood that you either change the input method to use the
mentioned encoding when you type in your pass phrase, or use some suitable tool
to convert your pass phrase from your default encoding to the target encoding.

Also note that the sub-sections below discuss human readable pass phrases.
This is particularly relevant for PKCS#12 objects, where human readable pass
phrases are assumed.
For other objects, it's as legitimate to use any byte sequence (such as a
sequence of bytes from F</dev/urandom> that's been saved away), which makes any
character encoding discussion irrelevant; in such cases, simply use the same
byte sequence as it is.
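
The conversion from a default encoding to a target encoding mentioned above
can be sketched in Python as follows.
The helper name C<reencode> is hypothetical; any conversion tool that produces
the same bytes would serve equally well.

```python
def reencode(raw: bytes, source: str, target: str) -> bytes:
    """Convert pass phrase bytes from one character encoding to another."""
    return raw.decode(source).encode(target)


# The pass phrase as typed in a UTF-8 environment ("naive" with U+00EF).
typed_utf8 = "na\u00efve".encode("utf-8")
print(typed_utf8.hex(" "))   # 6e 61 c3 af 76 65

# The same pass phrase as it was encoded in the old ISO-8859-1 environment.
as_latin1 = reencode(typed_utf8, "utf-8", "iso-8859-1")
print(as_latin1.hex(" "))    # 6e 61 ef 76 65
```

The resulting bytes, rather than the text as typed, are what needs to be
handed to OpenSSL when the object was protected under the older encoding.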

=head2 Creating new objects

For creating new pass phrase protected objects, make sure the pass phrase is
encoded using UTF-8.
This is the default on most modern Unixes, but may involve an effort on other
platforms.
Specifically for Windows, setting the environment variable
B<OPENSSL_WIN32_UTF8> will have anything entered at the Windows console prompt
converted to UTF-8 (command line and separately prompted pass phrases alike).

=head2 Opening existing objects

For opening pass phrase protected objects where you know what character
encoding was used for the encryption pass phrase, make sure to use the same
encoding again.

For opening pass phrase protected objects where the character encoding that was
used is unknown, or where the producing application is unknown, try one of the
following:

=over 4

=item 1.

Try the pass phrase that you have as it is in the character encoding of your
environment.
It's possible that its byte sequence is exactly right.

=item 2.

Convert the pass phrase to UTF-8 and try with the result.
Specifically with PKCS#12, this should open up any object that was created
according to the specification.

=item 3.

Do a naïve (i.e. purely mathematical) ISO-8859-1 to UTF-8 conversion and try
with the result.
This differs from the previous attempt because ISO-8859-1 maps directly to
U+0000 to U+00FF, which other non-UTF-8 character sets do not.

This also takes care of the case when a UTF-8 encoded string was used with
OpenSSL older than 1.1.0.
(For example, C<ï>, which is 0xC3 0xAF when encoded in UTF-8, would become 0xC3
0x83 0xC2 0xAF when re-encoded in the naïve manner.
The conversion to BMPString would then yield 0x00 0xC3 0x00 0xAF 0x00 0x00, the
erroneous/non-compliant encoding used by OpenSSL older than 1.1.0.)

=back

=head1 SEE ALSO

L<evp(7)>,
L<ossl_store(7)>,
L<EVP_BytesToKey(3)>, L<EVP_DecryptInit(3)>,
L<PEM_do_header(3)>,
L<PKCS12_parse(3)>, L<PKCS12_newpass(3)>,
L<d2i_PKCS8PrivateKey_bio(3)>

=head1 COPYRIGHT

Copyright 2018-2021 The OpenSSL Project Authors. All Rights Reserved.

Licensed under the Apache License 2.0 (the "License").
You may not use
this file except in compliance with the License.  You can obtain a copy
in the file LICENSE in the source distribution or at
L<https://www.openssl.org/source/license.html>.

=cut