=pod

=encoding utf8

=head1 NAME

passphrase-encoding
- How diverse parts of OpenSSL treat pass phrase character encoding

=head1 DESCRIPTION

In a modern world with all sorts of character encodings, the treatment of pass
phrases has become increasingly complex.
This manual page attempts to give an overview of how this problem is
currently addressed in different parts of the OpenSSL library.

=head2 The general case

As a general rule, the OpenSSL library doesn't treat pass phrases in any
special way, and trusts the application or user to choose a suitable character
set and stick to it throughout the lifetime of affected objects.
This means that an object that was encrypted using a pass phrase encoded in
ISO-8859-1 needs to be decrypted using a pass phrase encoded in ISO-8859-1.
Using the wrong encoding is expected to cause a decryption failure.
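
The following Python sketch illustrates why the encoding matters: a key
derivation function only ever sees bytes, so the same characters in two
encodings lead to two different keys.
PBKDF2 with the salt and iteration count shown here is an assumption chosen
for illustration only, not a statement about which derivation any particular
OpenSSL routine uses, and C<derive_key> is a hypothetical helper name.

```python
import hashlib


def derive_key(passphrase: bytes, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 stands in for whatever derivation protects the
    # object; it operates on bytes and knows nothing about characters.
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 10_000, dklen=32)


salt = b"example-salt"  # illustrative value only
text = "naïve"

key_latin1 = derive_key(text.encode("iso-8859-1"), salt)
key_utf8 = derive_key(text.encode("utf-8"), salt)

# Same characters, different bytes, therefore different keys.
print(key_latin1 == key_utf8)  # False
```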

=head2 PKCS#12

PKCS#12 is a bit different regarding pass phrase encoding.
The standard stipulates that the pass phrase shall be encoded as an ASN.1
BMPString, which consists of the code points of the basic multilingual plane,
encoded in big endian (UCS-2 BE).

OpenSSL tries to adapt to this requirement in one of the following manners:

=over 4

=item 1.

Treats the received pass phrase as UTF-8 encoded and tries to re-encode it to
UTF-16 (which is the same as UCS-2 for characters U+0000 to U+D7FF and U+E000
to U+FFFF, but becomes an expansion for any other character), or failing that,
proceeds with step 2.

=item 2.

Assumes that the pass phrase is encoded in ASCII or ISO-8859-1 and
opportunistically prepends each byte with a zero byte to obtain the UCS-2
encoding of the characters, which it stores as a BMPString.

Note that since there is no check of your locale, this may produce UCS-2 /
UTF-16 characters that do not correspond to the original pass phrase characters
for other character sets, such as any ISO-8859-X encoding other than
ISO-8859-1 (or, for Windows, CP 1252, with the exception of the extra
"graphical" characters in the 0x80-0x9F range).

=back

OpenSSL versions older than 1.1.0 do variant 2 only; OpenSSL still falls back
to it in order to be able to read files produced with older versions.
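
The two variants above can be sketched in Python as follows.
This is an approximation for illustration, not OpenSSL's implementation:
C<pkcs12_passphrase_to_bmp> is a hypothetical name, Python's strict UTF-8
decoder may accept or reject some edge cases differently from OpenSSL, and
any string terminator is left out.

```python
def pkcs12_passphrase_to_bmp(passphrase: bytes) -> bytes:
    """Hypothetical helper mirroring the two variants described above."""
    try:
        # Variant 1: treat the input as UTF-8 and re-encode it as UTF-16 BE.
        return passphrase.decode("utf-8").encode("utf-16-be")
    except UnicodeDecodeError:
        # Variant 2: assume ASCII / ISO-8859-1 and prepend a zero byte to
        # every input byte.
        return b"".join(b"\x00" + bytes([b]) for b in passphrase)


from_utf8 = pkcs12_passphrase_to_bmp("naïve".encode("utf-8"))
from_latin1 = pkcs12_passphrase_to_bmp("naïve".encode("iso-8859-1"))
outside_bmp = pkcs12_passphrase_to_bmp("\U0001F600".encode("utf-8"))

print(from_utf8.hex(" "))    # 00 6e 00 61 00 ef 00 76 00 65
print(from_latin1.hex(" "))  # 00 6e 00 61 00 ef 00 76 00 65
print(outside_bmp.hex(" "))  # d8 3d de 00
```

In this sketch, the ISO-8859-1 input is not valid UTF-8, so it takes the
fallback path and ends up with the same result as the UTF-8 input.
The last line shows the expansion mentioned in variant 1: a character outside
the basic multilingual plane takes four bytes (a surrogate pair) in UTF-16.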

It should be noted that this approach isn't entirely fault free.

A pass phrase encoded in ISO-8859-2 could very well have a sequence such as
0xC3 0xAF (which is the two characters "LATIN CAPITAL LETTER A WITH BREVE"
and "LATIN CAPITAL LETTER Z WITH DOT ABOVE" in ISO-8859-2 encoding), but it
would be misinterpreted as the perfectly valid UTF-8 encoded code point U+00EF
(LATIN SMALL LETTER I WITH DIAERESIS) I<if the pass phrase doesn't contain
anything that would be invalid UTF-8>.
A pass phrase that contains this kind of byte sequence will give a different
outcome in OpenSSL 1.1.0 and newer than in OpenSSL older than 1.1.0.

 0x00 0xC3 0x00 0xAF                    # OpenSSL older than 1.1.0
 0x00 0xEF                              # OpenSSL 1.1.0 and newer

By the same token, anything encoded in UTF-8 that was given to OpenSSL older
than 1.1.0 was misinterpreted as ISO-8859-1 sequences.
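
The ISO-8859-2 example can be reproduced with Python's standard codecs.
The snippet is a sketch of the byte arithmetic only; the variable names are
made up for illustration and no OpenSSL code is involved.

```python
raw = bytes([0xC3, 0xAF])

# What the user meant, given an ISO-8859-2 input method.
intended = raw.decode("iso-8859-2")
correct_bmp = intended.encode("utf-16-be")

# Older than 1.1.0: every byte is prefixed with a zero byte.
old_bmp = b"".join(b"\x00" + bytes([b]) for b in raw)

# 1.1.0 and newer: the bytes happen to be valid UTF-8, so they are decoded
# as a single code point and re-encoded as UTF-16 BE.
new_bmp = raw.decode("utf-8").encode("utf-16-be")

print(old_bmp.hex(" "))      # 00 c3 00 af
print(new_bmp.hex(" "))      # 00 ef
print(correct_bmp.hex(" "))  # 01 02 01 7b
```

Neither outcome matches the BMPString of the characters the user actually
typed (the last line), which is why the pass phrase has to be converted to
UTF-8 before it is handed to OpenSSL.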

=head2 OSSL_STORE

L<ossl_store(7)> acts as a general interface to access all kinds of objects,
potentially protected with a pass phrase, a PIN or something else.
This API stipulates that pass phrases should be UTF-8 encoded, and that any
other pass phrase encoding may give undefined results.
This API relies on the application to ensure UTF-8 encoding, and doesn't check
that this is the case, so it passes whatever it gets on to the underlying
loader.
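
Since the check is left to the application, an application could validate the
pass phrase itself before handing it over.
The Python sketch below shows one possible check; C<ensure_utf8> is a
hypothetical helper and not part of any OpenSSL API.

```python
def ensure_utf8(passphrase: bytes) -> bytes:
    """Hypothetical pre-flight check an application could run itself."""
    try:
        passphrase.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("pass phrase is not valid UTF-8") from exc
    return passphrase


ok = ensure_utf8("naïve".encode("utf-8"))

try:
    ensure_utf8("naïve".encode("iso-8859-1"))
    rejected = False
except ValueError:
    rejected = True
```

Such a check only catches byte sequences that are invalid as UTF-8.
Bytes from another encoding that happen to form valid UTF-8, such as
0xC3 0xAF, pass through unnoticed.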

=head1 RECOMMENDATIONS

This section assumes that you know what pass phrase was used for encryption,
but that it may have been encoded in a different character encoding than the
one used by your current input method.
For example, the pass phrase may have been used at a time when your default
encoding was ISO-8859-1 (i.e. "naïve" resulting in the byte sequence 0x6E 0x61
0xEF 0x76 0x65), and you're now in an environment where your default encoding
is UTF-8 (i.e. "naïve" resulting in the byte sequence 0x6E 0x61 0xC3 0xAF 0x76
0x65).
Whenever it's mentioned that you should use a certain character encoding, it
should be understood that you either change the input method to use the
mentioned encoding when you type in your pass phrase, or use some suitable tool
to convert your pass phrase from your default encoding to the target encoding.
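
As one example of such a tool, the conversion can be done with a few lines of
Python.
C<reencode> is a hypothetical helper name; the encoding names are the ones
understood by Python's standard codecs.

```python
def reencode(passphrase: bytes, source: str, target: str) -> bytes:
    # Decode from the encoding the bytes are currently in, then encode the
    # same characters in the target encoding.
    return passphrase.decode(source).encode(target)


utf8 = "naïve".encode("utf-8")
latin1 = reencode(utf8, "utf-8", "iso-8859-1")

print(utf8.hex(" "))    # 6e 61 c3 af 76 65
print(latin1.hex(" "))  # 6e 61 ef 76 65
```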

Also note that the sub-sections below discuss human readable pass phrases.
This is particularly relevant for PKCS#12 objects, where human readable pass
phrases are assumed.
For other objects, it's just as legitimate to use any byte sequence (such as a
sequence of bytes from F</dev/urandom> that's been saved away), which makes any
character encoding discussion irrelevant; in such cases, simply use the same
byte sequence as it is.

=head2 Creating new objects

For creating new pass phrase protected objects, make sure the pass phrase is
encoded using UTF-8.
This is the default on most modern Unixes, but may involve an effort on other
platforms.
Specifically for Windows, setting the environment variable
B<OPENSSL_WIN32_UTF8> will have anything entered at the Windows console prompt
converted to UTF-8 (command line and separately prompted pass phrases alike).

=head2 Opening existing objects

For opening pass phrase protected objects where you know what character
encoding was used for the encryption pass phrase, make sure to use the same
encoding again.

For opening pass phrase protected objects where the character encoding that was
used is unknown, or where the producing application is unknown, try one of the
following:

=over 4

=item 1.

Try the pass phrase that you have as it is in the character encoding of your
environment.
It's possible that its byte sequence is exactly right.

=item 2.

Convert the pass phrase to UTF-8 and try with the result.
Specifically with PKCS#12, this should open up any object that was created
according to the specification.

=item 3.

Do a naïve (i.e. purely mathematical) ISO-8859-1 to UTF-8 conversion and try
with the result.
This differs from the previous attempt because ISO-8859-1 maps directly to
U+0000 to U+00FF, which other non-UTF-8 character sets do not.

This also takes care of the case when a UTF-8 encoded string was used with
OpenSSL older than 1.1.0.
(For example, C<ï>, which is 0xC3 0xAF when encoded in UTF-8, would become 0xC3
0x83 0xC2 0xAF when re-encoded in the naïve manner.
The conversion to BMPString would then yield 0x00 0xC3 0x00 0xAF 0x00 0x00, the
erroneous/non-compliant encoding used by OpenSSL older than 1.1.0.)

=back
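
The three attempts above can be generated mechanically.
The Python sketch below builds the candidate byte sequences in the order
listed; C<candidate_passphrases> and its parameters are hypothetical names,
and actually trying each candidate against the object is left to the caller.

```python
from typing import List


def candidate_passphrases(typed: str, environment_encoding: str) -> List[bytes]:
    # 1. The pass phrase as it is, in the encoding of the environment.
    as_is = typed.encode(environment_encoding)
    # 2. The pass phrase converted to UTF-8.
    as_utf8 = typed.encode("utf-8")
    # 3. Naive ISO-8859-1 to UTF-8 conversion: every byte value is taken as
    #    the code point with the same number and then UTF-8 encoded.
    naive = as_is.decode("iso-8859-1").encode("utf-8")

    # Keep the order of the attempts but drop duplicates.
    candidates: List[bytes] = []
    for candidate in (as_is, as_utf8, naive):
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates


for candidate in candidate_passphrases("ï", "utf-8"):
    print(candidate.hex(" "))
# c3 af
# c3 83 c2 af
```

In a UTF-8 environment, attempts 1 and 2 coincide, so only two candidates
remain.
The second one decodes as UTF-8 to U+00C3 U+00AF, the code points that
OpenSSL older than 1.1.0 stored for a UTF-8 encoded C<ï>.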

=head1 SEE ALSO

L<evp(7)>,
L<ossl_store(7)>,
L<EVP_BytesToKey(3)>, L<EVP_DecryptInit(3)>,
L<PEM_do_header(3)>,
L<PKCS12_parse(3)>, L<PKCS12_newpass(3)>,
L<d2i_PKCS8PrivateKey_bio(3)>

=head1 COPYRIGHT

Copyright 2018-2021 The OpenSSL Project Authors. All Rights Reserved.

Licensed under the Apache License 2.0 (the "License").  You may not use
this file except in compliance with the License.  You can obtain a copy
in the file LICENSE in the source distribution or at
L<https://www.openssl.org/source/license.html>.

=cut