Signed-off-by: Manuel Pégourié-Gonnard <manuel.pegourie-gonnard@arm.com>
This document explains the strategy that was used so far in starting the migration to PSA Crypto and mentions future perspectives and open questions.
Goals
Several benefits are expected from migrating to PSA Crypto:
G1. Use PSA Crypto drivers when available.
G2. Allow isolation of long-term secrets (for example, private keys).
G3. Allow isolation of short-term secrets (for example, TLS session keys).
G4. Have a clean, unified API for Crypto (retire the legacy API).
G5. Code size: compile out our implementation when a driver is available.
As of Mbed TLS 3.2, most of (G1) and all of (G2) are implemented when MBEDTLS_USE_PSA_CRYPTO is enabled. For (G2) to take effect, the application needs to be changed to use new APIs. For a more detailed account of what's implemented, see docs/use-psa-crypto.md, where new APIs are about (G2), and internal changes implement (G1).
Generally speaking, the numbering above doesn't mean that each goal requires the preceding ones to be completed.
Compile-time options
We currently have three compile-time options that are relevant to the migration:
- MBEDTLS_PSA_CRYPTO_C - enabled by default, controls the presence of the PSA Crypto APIs.
- MBEDTLS_USE_PSA_CRYPTO - disabled by default (enabled in "full" config), controls usage of PSA Crypto APIs to perform operations in X.509 and TLS (G1 above), as well as the availability of some new APIs (G2 above).
- MBEDTLS_PSA_CRYPTO_CONFIG - disabled by default, supports builds with drivers and without the corresponding software implementation (G5 above).
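As an illustration, a build that opts in to all three options could carry the lines below in its configuration header. This is a minimal sketch, assuming the options are set by hand in mbedtls_config.h rather than through a configuration script; it is a fragment, not a complete configuration.

```c
/* Fragment of a configuration header (for example mbedtls_config.h). */

/* Provide the PSA Crypto APIs (enabled by default). */
#define MBEDTLS_PSA_CRYPTO_C

/* Make X.509 and TLS use PSA Crypto for their operations (G1) and
 * make the new APIs for key isolation available (G2). */
#define MBEDTLS_USE_PSA_CRYPTO

/* Allow builds that rely on drivers without the corresponding
 * software implementation (G5). */
#define MBEDTLS_PSA_CRYPTO_CONFIG
```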
The reasons why MBEDTLS_USE_PSA_CRYPTO is optional and disabled by default are:
- it's incompatible with MBEDTLS_ECP_RESTARTABLE;
- to avoid a hard/default dependency of TLS, X.509 and PK on MBEDTLS_PSA_CRYPTO_C, for backward compatibility reasons:
  - When MBEDTLS_PSA_CRYPTO_C is enabled and used, applications need to call psa_crypto_init() before TLS/X.509 uses PSA functions. (This prevents us from even enabling the option by default.)
  - MBEDTLS_PSA_CRYPTO_C has a hard dependency on MBEDTLS_ENTROPY_C || MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG but it's currently possible to compile TLS and X.509 without any of the options. Also, we can't just auto-enable MBEDTLS_ENTROPY_C as it doesn't build out of the box on all platforms, and even less MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG as it requires a user-provided RNG function.
The downside of this approach is that until we are able to make MBEDTLS_USE_PSA_CRYPTO non-optional (always enabled), we have to maintain two versions of some parts of the code: one using PSA, the other using the legacy APIs. However, see next section for strategies that can lower that cost. The rest of this section explains the reasons for the incompatibilities mentioned above.
At the time of writing (early 2022) it is unclear what could be done about the backward compatibility issues, and in particular if the cost of implementing solutions to these problems would be higher or lower than the cost of maintaining dual code paths until the next major version. (Note: these solutions would probably also solve other problems at the same time.)
MBEDTLS_ECP_RESTARTABLE
Currently this option controls not only the presence of restartable APIs in the crypto library, but also their use in the TLS and X.509 layers. Since PSA Crypto does not support restartable operations, there's a clear conflict: the TLS and X.509 layers can't both use only PSA APIs and get restartable behaviour.
Supporting this in PSA is on our roadmap and currently planned for end of 2022, see https://github.com/orgs/Mbed-TLS/projects/1#column-18883250.
It will then require follow-up work to make use of the new PSA API in PK/X.509/TLS in all places where we currently allow restartable operations.
Backward compatibility issues with making MBEDTLS_USE_PSA_CRYPTO always on
- Existing applications may not be calling psa_crypto_init() before using TLS, X.509 or PK. We can try to work around that by calling (the relevant part of) it ourselves under the hood as needed, but that would likely require splitting init between the parts that can fail and the parts that can't (see https://github.com/ARM-software/psa-crypto-api/pull/536 for that).
- It's currently not possible to enable MBEDTLS_PSA_CRYPTO_C in configurations that don't have MBEDTLS_ENTROPY_C, and we can't just auto-enable the latter, as it won't build or work out of the box on all platforms. There are two kinds of things we'd need to do if we want to work around that:
  - Make it possible to enable the parts of PSA Crypto that don't require an RNG (typically, public key operations, symmetric crypto, some key management functions (destroy etc)) in configurations that don't have ENTROPY_C. This requires going through the PSA code base to adjust dependencies. Risk: there may be annoying dependencies, some of which may be surprising.
  - For operations that require an RNG, provide an alternative function accepting an explicit f_rng parameter (see #5238), that would be available in entropy-less builds. (Then code using those functions still needs to have one version using it, for entropy-less builds, and one version using the standard function, for driver support in builds with entropy.)
See https://github.com/Mbed-TLS/mbedtls/issues/5156.
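The second work-around (an alternative function accepting an explicit f_rng parameter) could look roughly like the sketch below. It follows the usual Mbed TLS callback shape for f_rng (context pointer, output buffer, length), but every name prefixed with demo_ is hypothetical and introduced only for illustration; the counter-based generator is a deterministic toy, not a usable RNG.

```c
#include <assert.h>
#include <stddef.h>

/* RNG callback in the style of Mbed TLS f_rng parameters:
 * fill `output` with `len` bytes and return 0 on success. */
typedef int (*demo_f_rng_t)(void *p_rng, unsigned char *output, size_t len);

/* Hypothetical entropy-less variant of an operation that needs randomness:
 * the caller supplies the RNG instead of relying on a built-in one. */
int demo_generate_nonce_ext(unsigned char *nonce, size_t nonce_len,
                            demo_f_rng_t f_rng, void *p_rng)
{
    if (nonce == NULL || f_rng == NULL) {
        return -1;
    }
    return f_rng(p_rng, nonce, nonce_len);
}

/* Toy deterministic generator for demonstration only. It is NOT random
 * and must never be used for real keys or nonces. */
int demo_counter_rng(void *p_rng, unsigned char *output, size_t len)
{
    unsigned char *counter = p_rng;
    assert(counter != NULL);
    for (size_t i = 0; i < len; i++) {
        output[i] = (*counter)++;
    }
    return 0;
}
```

As noted in the list, code calling such a function would still need a second variant that uses the standard function when entropy is available.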
Taking advantage of the existing abstraction layers - or not
The Crypto library in Mbed TLS currently has 3 abstraction layers that offer algorithm-agnostic APIs for a class of algorithms:
- MD for message digests aka hashes (including HMAC)
- Cipher for symmetric ciphers (including AEAD)
- PK for asymmetric (aka public-key) cryptography (excluding key exchange)
Note: key exchange (FFDH, ECDH) is not covered by an abstraction layer.
These abstraction layers typically provide, in addition to the API for crypto operations, types and numerical identifiers for algorithms (for example mbedtls_cipher_mode_t and its values). The current strategy is to keep using those identifiers in most of the code, in particular in existing structures and public APIs, even when MBEDTLS_USE_PSA_CRYPTO is enabled. (This is not an issue for G1, G2, G3 above, and is only potentially relevant for G4.)
There are multiple strategies that can be used regarding the place of those layers in the migration to PSA.
Silently call to PSA from the abstraction layer
- Provide a new definition (conditionally on USE_PSA_CRYPTO) of wrapper functions in the abstraction layer, that calls PSA instead of the legacy crypto API.
- Upside: changes contained to a single place, no need to change TLS or X.509 code anywhere.
- Downside: tricky to implement if the PSA implementation is currently done on top of that layer (dependency loop).
This strategy is currently (early 2022) used for all operations in the PK layer.
This strategy is not very well suited to the Cipher layer, as the PSA implementation is currently done on top of that layer.
This strategy will probably be used for some time for the PK layer, while we figure out what the future of that layer is: parts of it (parse/write, ECDSA signatures in the format that X.509 & TLS want) are not covered by PSA, so they will need to keep existing in some way. (Also, the PK layer is a good place for dispatching to either PSA or mbedtls_xxx_restartable while that part is not covered by PSA yet, if we decide to do that.)
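The "silent" strategy can be pictured with the sketch below: the wrapper in the abstraction layer keeps its signature, and a compile-time switch decides which backend it calls, so callers in TLS or X.509 do not change. All names are stand-ins invented for this illustration (DEMO_USE_PSA_CRYPTO plays the role of MBEDTLS_USE_PSA_CRYPTO), and the backends only return a tag instead of signing anything.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for MBEDTLS_USE_PSA_CRYPTO; remove to select the legacy path. */
#define DEMO_USE_PSA_CRYPTO

/* Each stand-in backend returns a tag so that the caller can tell which
 * one ran. */
enum { DEMO_BACKEND_LEGACY = 1, DEMO_BACKEND_PSA = 2 };

int demo_legacy_sign(const unsigned char *hash, size_t hash_len)
{
    (void) hash;
    (void) hash_len;
    return DEMO_BACKEND_LEGACY;
}

int demo_psa_sign(const unsigned char *hash, size_t hash_len)
{
    (void) hash;
    (void) hash_len;
    return DEMO_BACKEND_PSA;
}

/* Wrapper living in the abstraction layer: same prototype in both
 * builds, different implementation behind it. */
int demo_pk_sign(const unsigned char *hash, size_t hash_len)
{
    assert(hash != NULL);
#if defined(DEMO_USE_PSA_CRYPTO)
    return demo_psa_sign(hash, hash_len);
#else
    return demo_legacy_sign(hash, hash_len);
#endif
}
```

The dependency-loop downside shows up when the PSA backend is itself implemented by calling back into the same wrapper.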
Replace calls for each operation
- For every operation that's done through this layer in TLS or X.509, just replace the function call with calls to PSA (conditionally on USE_PSA_CRYPTO).
- Upside: conceptually simple, and if the PSA implementation is currently done on top of that layer, avoids concerns about dependency loops.
- Upside: opens the door to building TLS/X.509 without that layer, saving some code size.
- Downside: TLS/X.509 code has to be changed for each operation.
This strategy is currently (early 2022) used for the MD layer and the Cipher layer.
Opt-in use of PSA from the abstraction layer
- Provide a new way to set up a context that causes operations on that context to be done via PSA.
- Upside: changes mostly contained in one place, TLS/X.509 code only needs to be changed when setting up the context, but not when using it. In particular, no changes to/duplication of existing public APIs that expect a key to be passed as a context of this layer (eg, mbedtls_pk_context).
- Upside: avoids dependency loop when PSA is implemented on top of that layer.
- Downside: when the context is typically set up by the application, requires changes in application code.
This strategy is not useful when no context is used, for example with the one-shot function mbedtls_md().
There are two variants of this strategy: one where using the new setup function also allows for key isolation (the key is only held by PSA, supporting both G1 and G2 in that area), and one without isolation (the key is still stored outside of PSA most of the time, supporting only G1).
This strategy, with support for key isolation, is currently (early 2022) used for private-key operations in the PK layer - see mbedtls_pk_setup_opaque(). This allows use of PSA-held private ECDSA keys in TLS and X.509 with no change to the TLS/X.509 code, but a contained change in the application.
This strategy, without key isolation, was also previously used (until 3.1 included) in the Cipher layer - see mbedtls_cipher_setup_psa(). This allowed use of PSA for cipher operations in TLS with no change to the application code, and a contained change in TLS code. (It only supported a subset of ciphers.)
Note: for private key operations in the PK layer, both the "silent" and the "opt-in" strategy can apply, and can complement each other: one provides support for key isolation, but at the (unavoidable) cost of a change in application code, while the other requires no application change to get support for drivers, but fails to provide isolation support.
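The opt-in strategy with key isolation can be sketched as a context that records how it was set up and dispatches at run time. This is only a model under stated assumptions: demo_key_id_t stands in for a PSA key identifier, the demo_ functions are hypothetical, and the "sign" function merely reports which path it would take.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a PSA key identifier. */
typedef uint32_t demo_key_id_t;

/* Hypothetical context of an abstraction layer with an opt-in PSA mode. */
typedef struct {
    int is_opaque;             /* set only by the opt-in setup function */
    demo_key_id_t key_id;      /* meaningful when is_opaque != 0 */
    const unsigned char *raw;  /* locally held key material otherwise */
} demo_pk_context;

enum { DEMO_PATH_LOCAL = 1, DEMO_PATH_PSA = 2 };

/* Classic setup: the context holds the key material itself. */
void demo_pk_setup_raw(demo_pk_context *ctx, const unsigned char *raw)
{
    ctx->is_opaque = 0;
    ctx->key_id = 0;
    ctx->raw = raw;
}

/* Opt-in setup: only an identifier is stored, the key stays inside PSA. */
void demo_pk_setup_opaque(demo_pk_context *ctx, demo_key_id_t key_id)
{
    ctx->is_opaque = 1;
    ctx->key_id = key_id;
    ctx->raw = NULL;
}

/* Users of the context call this the same way in both cases. */
int demo_ctx_sign(const demo_pk_context *ctx)
{
    assert(ctx != NULL);
    return ctx->is_opaque ? DEMO_PATH_PSA : DEMO_PATH_LOCAL;
}
```

Only the code that sets up the context (typically the application) changes; every later use of the context is untouched.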
Summary
Strategies currently (early 2022) used with each abstraction layer:
- PK (for G1): silently call PSA
- PK (for G2): opt-in use of PSA (new key type)
- Cipher (G1): replace calls at each call site
- MD (G1): replace calls at each call site
Supporting builds with drivers without the software implementation
This section presents a plan towards G5: save code size by compiling out our software implementation when a driver is available.
Additionally, we want to save code size by compiling out the abstraction layers that we are not using when MBEDTLS_USE_PSA_CRYPTO is enabled (see previous section): MD and Cipher.
Let's expand a bit on the definition of the goal: in such a configuration (driver used, software implementation and abstraction layer compiled out), we want:
a. the library to build in a reasonably-complete configuration,
b. with all tests passing,
c. and no more tests skipped than the same configuration with software implementation.
Criterion (c) ensures not only test coverage, but that driver-based builds are at feature parity with software-based builds.
We can roughly divide the work needed to get there in the following steps:
0. Have a working driver interface for the algorithms we want to replace.
1. Have users of these algorithms call to PSA, not the legacy API, for all operations. (This is G1, and for PK, X.509 and TLS this is controlled by MBEDTLS_USE_PSA_CRYPTO.) This needs to be done in the library and tests.
2. Have users of these algorithms not depend on the legacy API for information management (getting a size for a given algorithm, etc.)
3. Adapt compile-time guards used to query availability of a given algorithm; this needs to be done in the library (for crypto operations and data) and tests.
Note: the first two steps enable use of drivers, but not by themselves removal of the software implementation.
Note: the fact that step 1 is not achieved for all of libmbedcrypto (see below) is the reason why criterion (a) has "a reasonably-complete configuration", to allow working around internal crypto dependencies when working on other parts such as X.509 and TLS - for example, a configuration without RSA PKCS#1 v2.1 still allows reasonable use of X.509 and TLS.
Note: this is a conceptual division that will sometimes translate to how the work is divided into PRs, sometimes not. For example, in situations where it's not possible to achieve good test coverage at the end of step 1 or step 2, it is preferable to group with the next step(s) in the same PR until good test coverage can be reached.
Status as of Mbed TLS 3.2:
- Step 0 is achieved for most algorithms, with only a few gaps remaining.
- Step 1 is achieved for most of PK, X.509, and TLS when MBEDTLS_USE_PSA_CRYPTO is enabled with only a few gaps remaining (see docs/use-psa-crypto.md).
- Step 1 is not achieved for a lot of the crypto library including the PSA core. For example, entropy.c calls the legacy API mbedtls_sha256 (or mbedtls_sha512 optionally); hmac_drbg.c calls the legacy API mbedtls_md and ctr_drbg.c calls the legacy API mbedtls_aes; the PSA core depends on the entropy module and at least one of the DRBG modules (unless MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG is used). Further, several crypto modules have similar issues, for example RSA PKCS#1 v2.1 calls mbedtls_md directly.
- Step 2 is achieved for most of X.509 and TLS (same gaps as step 1) when MBEDTLS_USE_PSA_CRYPTO is enabled - this was tasks like #5795, #5796, #5797. It is being done in PK and RSA PKCS#1 v1.5 by PR #6065.
- Step 3 was mostly not started at all before 3.2; it is being done for PK by PR #6065.
Strategy for step 1:
Regarding PK, X.509, and TLS, this is mostly achieved with only a few gaps. (The strategy was outlined in the previous section.)
Regarding libmbedcrypto, outside of the RNG subsystem, for modules that currently depend on other legacy crypto modules, this can be achieved without backwards compatibility issues, by using the software implementation if available, and "falling back" to PSA only if it's not. The compile-time dependency changes from the current one (say, MD_C or AES_C) to "the previous dependency OR PSA Crypto with needed algorithms". When building without software implementation, users need to call psa_crypto_init() before calling any function from these modules. This condition does not constitute a break of backwards compatibility, as it was previously impossible to build in those configurations, and in configurations where the build was possible, application code keeps working unchanged. A work-in-progress example of applying this strategy, for RSA PKCS#1 v2.1, is here: https://github.com/Mbed-TLS/mbedtls/pull/6141
There is a problem with the modules used for the PSA RNG, as currently the RNG is initialized before drivers and the key store. This part will need further study, but in the meantime we can proceed with everything that's not the entropy module or one of the DRBG modules, and that does not depend on one of those modules.
Strategy for step 2:
The most satisfying situation here is when we can just use the PSA Crypto API for information management as well. However sometimes it may not be convenient, for example in parts of the code that accept old-style identifiers (such as mbedtls_md_type_t) in their API and can't assume PSA to be compiled in (such as rsa.c).
It is suggested that, as a temporary solution until we clean this up later when removing the legacy API including its identifiers (G4), we may occasionally use ad-hoc internal functions, such as the ones introduced by PR #6065 in library/hash_info.[ch].
An alternative would be to have two different code paths depending on whether MBEDTLS_PSA_CRYPTO_C is defined or not. However this is not great for readability or testability.
Strategy for step 3:
There are currently two (complementary) ways for crypto-using code to check if a particular algorithm is supported: using MBEDTLS_xxx macros, and using PSA_WANT_xxx macros. For example, PSA-based code that wants to use SHA-256 will check for PSA_WANT_ALG_SHA_256, while legacy-based code that wants to use SHA-256 will check for MBEDTLS_SHA256_C if using the mbedtls_sha256 API, or for MBEDTLS_MD_C && MBEDTLS_SHA256_C if using the mbedtls_md API.
Code that obeys MBEDTLS_USE_PSA_CRYPTO will want to use one of the two dependencies above depending on whether MBEDTLS_USE_PSA_CRYPTO is defined: if it is, the code wants the algorithm available in PSA, otherwise, it wants it available via the legacy API(s) it is using (MD and/or low-level).
The strategy for steps 1 and 2 above will introduce new situations: code that currently computes hashes using MD (resp. a low-level hash module) will gain the ability to "fall back" to using PSA if the legacy dependency isn't available. Data related to a certain hash (OID, sizes, translations) should only be included in the build if it is possible to use that hash in some way.
In order to cater to these new needs, new families of macros are introduced in legacy_or_psa.h, see its documentation for details.
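To make the idea concrete, the sketch below shows how a combined "legacy or PSA" availability macro can be derived from two independent switches, and how code can prefer the software implementation while falling back to PSA. The DEMO_ macros are hypothetical stand-ins chosen for this example; they are not the names defined in legacy_or_psa.h, whose documentation remains the reference.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for configuration symbols. In a real build the equivalent
 * information comes from MBEDTLS_xxx and PSA_WANT_xxx macros. */
/* #define DEMO_LEGACY_MD_SHA256 */   /* MD layer plus software SHA-256 */
#define DEMO_PSA_WANT_SHA256          /* SHA-256 available through PSA */

/* Hypothetical combined macro: SHA-256 is usable via MD or via PSA. */
#if defined(DEMO_LEGACY_MD_SHA256) || defined(DEMO_PSA_WANT_SHA256)
#define DEMO_HAS_SHA256_VIA_MD_OR_PSA
#endif

enum { DEMO_NONE = 0, DEMO_VIA_MD = 1, DEMO_VIA_PSA = 2 };

/* Prefer the software implementation when it is built in, and fall
 * back to PSA only when it is not. */
int demo_sha256_backend(void)
{
#if defined(DEMO_LEGACY_MD_SHA256)
    return DEMO_VIA_MD;
#elif defined(DEMO_PSA_WANT_SHA256)
    return DEMO_VIA_PSA;
#else
    return DEMO_NONE;
#endif
}

/* Data tied to the hash (here its output size) is compiled in only
 * when the hash can be used in some way. */
size_t demo_sha256_size(void)
{
#if defined(DEMO_HAS_SHA256_VIA_MD_OR_PSA)
    return 32;
#else
    return 0;
#endif
}
```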
It should be noted that there are currently:
- too many different ways of computing a hash (low-level, MD, PSA);
- too many different ways to configure the library that influence which of these ways is available and will be used (MBEDTLS_USE_PSA_CRYPTO, MBEDTLS_PSA_CRYPTO_CONFIG, mbedtls_config.h + psa/crypto_config.h).
As a result, we need more families of dependency macros than we'd like. This is a temporary situation until we move to a place where everything is based on PSA Crypto. In the meantime, long and explicit names were chosen for the new macros in the hope of avoiding confusion.
Note: the new macros supplement but do not replace the existing macros:
- code that always uses PSA Crypto (for example, code specific to TLS 1.3) should use PSA_WANT_xxx;
- code that always uses the legacy API (for example, crypto modules that have not undergone step 1 yet) should use MBEDTLS_xxx_C;
- code that may use one of the two APIs, either based on MBEDTLS_USE_PSA_CRYPTO (X.509, TLS 1.2, shared between TLS 1.2 and 1.3), or based on availability (crypto modules after step 1), should use one of the new macros from legacy_or_psa.h.
Executing step 3 will mostly consist of using the right dependency macros in the right places (once the previous steps are done).
Note on testing
Since supporting driver-only builds is not about adding features, but about supporting existing features in new types of builds, testing will not involve adding cases to the test suites, but instead adding new components in all.sh that build and run tests in newly-supported configurations. For example, if we're making some part of the library work with hashes provided only by drivers when MBEDTLS_USE_PSA_CRYPTO is defined, there should be a place in all.sh that builds and runs tests in such a configuration.
There is however a risk, especially in step 3 where we change how dependencies are expressed (sometimes in bulk), to get things wrong in a way that would result in more tests being skipped, which is easy to miss. Care must be taken to ensure this does not happen. The following criteria can be used:
- The sets of tests skipped in the default config and the full config must be the same before and after the PR that implements step 3. This is tested manually for each PR that changes dependency declarations by using the script outcome-analysis.sh in the present directory.
- The set of tests skipped in the driver-only build is the same as in an equivalent software-based configuration. This is tested automatically by the CI in the "Results analysis" stage, by running tests/scripts/analyze_outcomes.py. See the analyze_driver_vs_reference_xxx actions in the script and the comments above their declaration for how to do that locally.
Migrating away from the legacy API
This section briefly introduces questions and possible plans towards G4, mainly as they relate to choices in previous stages.
The role of the PK/Cipher/MD APIs in user migration
We're currently taking advantage of the existing PK layer in order to reduce the number of places where library code needs to be changed. It's only natural to consider using the same strategy (with the PK, MD and Cipher layers) for facilitating migration of application code.
Note: a necessary first step for that would be to make sure PSA is no longer implemented on top of the concerned layers.
Zero-cost compatibility layer?
The most favourable case is if we can have a zero-cost abstraction (no runtime, RAM usage or code size penalty), for example just a bunch of #defines, essentially mapping mbedtls_ APIs to their psa_ equivalent.
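A zero-cost layer of this kind would consist of preprocessor aliases only, as in the sketch below. The names are stand-ins invented for illustration (demo_psa_ for the new API, demo_md_ and DEMO_MD_ for the legacy-style names), and the identifier values are arbitrary rather than real PSA constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PSA side: an algorithm type, two identifiers with
 * arbitrary values, and a size query. */
typedef uint32_t demo_psa_algorithm_t;
#define DEMO_PSA_ALG_SHA_256 ((demo_psa_algorithm_t) 1)
#define DEMO_PSA_ALG_SHA_512 ((demo_psa_algorithm_t) 2)

size_t demo_psa_hash_length(demo_psa_algorithm_t alg)
{
    switch (alg) {
        case DEMO_PSA_ALG_SHA_256:
            return 32;
        case DEMO_PSA_ALG_SHA_512:
            return 64;
        default:
            return 0;
    }
}

/* The whole compatibility layer: aliases resolved by the preprocessor,
 * so no extra code or data ends up in the binary. */
#define DEMO_MD_SHA256 DEMO_PSA_ALG_SHA_256
#define DEMO_MD_SHA512 DEMO_PSA_ALG_SHA_512
#define demo_md_get_size(alg) demo_psa_hash_length(alg)
```

This only works where the two APIs have the same shape, which is why the approach is limited in practice.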
Unfortunately that's unlikely to fully work. For example, the MD layer uses the same context type for hashes and HMACs, while the PSA API (rightfully) has distinct operation types. Similarly, the Cipher layer uses the same context type for unauthenticated and AEAD ciphers, which again the PSA API distinguishes.
It is unclear how much value, if any, a zero-cost compatibility layer that's incomplete (for example, for MD covering only hashes, or for Cipher covering only AEAD) or differs significantly from the existing API (for example, introducing new context types) would provide to users.
Low-cost compatibility layers?
Another possibility is to keep most or all of the existing API for the PK, MD and Cipher layers, implemented on top of PSA, aiming for the lowest possible cost. For example, mbedtls_md_context_t would be defined as a (tagged) union of psa_hash_operation_t and psa_mac_operation_t, then mbedtls_md_setup() would initialize the correct part, and the rest of the functions would be simple wrappers around PSA functions. This would vastly reduce the complexity of the layers compared to the existing ones (no need to dispatch through function pointers, just call the corresponding PSA API).
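The tagged-union idea can be sketched as follows. The two operation structures are placeholders for psa_hash_operation_t and psa_mac_operation_t, the setup function is simplified compared to mbedtls_md_setup() (it takes no algorithm information), and all demo_ names are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholders for psa_hash_operation_t and psa_mac_operation_t. */
typedef struct { int active; } demo_hash_operation_t;
typedef struct { int active; int has_key; } demo_mac_operation_t;

typedef enum { DEMO_MD_NONE = 0, DEMO_MD_HASH, DEMO_MD_HMAC } demo_md_kind_t;

/* One context type for both uses, with a tag telling which member of
 * the union is live. */
typedef struct {
    demo_md_kind_t kind;
    union {
        demo_hash_operation_t hash;
        demo_mac_operation_t mac;
    } op;
} demo_md_context_t;

/* Simplified setup: initialize only the member selected by `hmac`. */
int demo_md_setup(demo_md_context_t *ctx, int hmac)
{
    if (ctx == NULL) {
        return -1;
    }
    if (hmac) {
        ctx->kind = DEMO_MD_HMAC;
        ctx->op.mac = (demo_mac_operation_t) { 0, 0 };
    } else {
        ctx->kind = DEMO_MD_HASH;
        ctx->op.hash = (demo_hash_operation_t) { 0 };
    }
    return 0;
}

/* Later calls switch on the tag instead of going through function
 * pointers; a real wrapper would call the matching PSA function. */
int demo_md_starts(demo_md_context_t *ctx)
{
    assert(ctx != NULL);
    switch (ctx->kind) {
        case DEMO_MD_HASH:
            ctx->op.hash.active = 1;
            return 0;
        case DEMO_MD_HMAC:
            ctx->op.mac.active = 1;
            return 0;
        default:
            return -1;  /* context was never set up */
    }
}
```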
Since this would still represent a non-zero cost, not only in terms of code size, but also in terms of maintenance (testing, etc.) this would probably be a temporary solution: for example keep the compatibility layers in 4.0 (and make them optional), but remove them in 5.0.
Again, this provides the most value to users if we can manage to keep the existing API unchanged. There might be conflicts between this goal and that of reducing the cost, and judgment calls may need to be made.
Note: when it comes to holding public keys in the PK layer, depending on how the rest of the code is structured, it may be worth holding the key data in memory controlled by the PK layer as opposed to a PSA key slot, moving it to a slot only when needed (see current ecdsa_verify_wrap when MBEDTLS_USE_PSA_CRYPTO is defined). For example, when parsing a large number, N, of X.509 certificates (for example the list of trusted roots), it might be undesirable to use N PSA key slots for their public keys as long as the certs are loaded. On the other hand, this could also be addressed by merging the "X.509 parsing on-demand" (#2478), and then the public key data would be held as bytes in the X.509 CRT structure, and only moved to a PK context / PSA slot when it's actually used.
Note: the PK layer actually consists of two relatively distinct parts: crypto operations, which will be covered by PSA, and parsing/writing (exporting) from/to various formats, which is currently not fully covered by the PSA Crypto API.
Algorithm identifiers and other identifiers
It should be easy to provide the user with a bunch of #defines for algorithm identifiers, for example #define MBEDTLS_MD_SHA256 PSA_ALG_SHA_256; most of those would be in the MD, Cipher and PK compatibility layers mentioned above, but there might be some in other modules that may be worth considering, for example identifiers for elliptic curves.
Lower layers
Generally speaking, we would retire all of the low-level, non-generic modules, such as AES, SHA-256, RSA, DHM, ECDH, ECP, bignum, etc, without providing compatibility APIs for them. People would be encouraged to switch to the PSA API. (The compatibility implementation of the existing PK, MD, Cipher APIs would mostly benefit people who already used those generic APIs rather than the low-level, alg-specific ones.)
APIs in TLS and X.509
Public APIs in TLS and X.509 may be affected by the migration in at least two ways:
- APIs that rely on a legacy mbedtls_ crypto type: for example mbedtls_ssl_conf_own_cert() to configure a (certificate and the associated) private key. Currently the private key is passed as a mbedtls_pk_context object, which would probably change to a psa_key_id_t. Since some users would probably still be using the compatibility PK layer, it would need a way to easily extract the PSA key ID from the PK context.
- APIs that accept a list of identifiers: for example mbedtls_ssl_conf_curves() taking a list of mbedtls_ecp_group_ids. This could be changed to accept a list of pairs (psa_ecc_family_t, size) but we should probably take this opportunity to move to an identifier independent from the underlying crypto implementation and use TLS-specific identifiers instead (based on IANA values or custom enums), as is currently done in the new mbedtls_ssl_conf_groups() API (see #4859).
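As a rough picture of TLS-specific identifiers sitting in front of crypto-level ones, the sketch below maps IANA named-group values to a (family, size) pair. The numeric values 23, 24 and 29 are the IANA "TLS Supported Groups" entries for secp256r1, secp384r1 and x25519; the family enum and all demo_ names are placeholders, not the PSA or Mbed TLS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for curve families (not the psa_ecc_family_t constants). */
typedef enum { DEMO_FAMILY_SECP_R1, DEMO_FAMILY_MONTGOMERY } demo_ecc_family_t;

typedef struct {
    uint16_t tls_id;           /* IANA "TLS Supported Groups" value */
    demo_ecc_family_t family;  /* crypto-level curve family */
    size_t bits;               /* crypto-level key size in bits */
} demo_group_info_t;

static const demo_group_info_t demo_groups[] = {
    { 23, DEMO_FAMILY_SECP_R1,    256 },  /* secp256r1 */
    { 24, DEMO_FAMILY_SECP_R1,    384 },  /* secp384r1 */
    { 29, DEMO_FAMILY_MONTGOMERY, 255 },  /* x25519 */
};

/* Translate a TLS identifier into the pair the crypto layer needs.
 * Returns NULL for groups this table does not know. */
const demo_group_info_t *demo_group_from_tls_id(uint16_t tls_id)
{
    for (size_t i = 0; i < sizeof(demo_groups) / sizeof(demo_groups[0]); i++) {
        if (demo_groups[i].tls_id == tls_id) {
            return &demo_groups[i];
        }
    }
    return NULL;
}
```

With this split, the public TLS API can stay stable even if the crypto-level identifiers change again.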
Testing
A question that needs careful consideration when we come around to removing the low-level crypto APIs and making PK, MD and Cipher optional compatibility layers is how to preserve testing quality. A lot of the existing test cases use the low-level crypto APIs; we would need to either keep using that API for tests, or manually migrate tests to the PSA Crypto API. Perhaps a combination of both, perhaps evolving gradually over time.