Fix bug with UDP proxy not forwarding enough

We previously introduced a safety check ensuring that if a datagram had
already been dropped twice, it would no longer be dropped or delayed
after that.

This missed an edge case: if a datagram is dropped once, it can be
delayed any number of times. Since "delay" is not defined in terms of
time (x seconds) but in terms of ordering with respect to other messages
(will be forwarded after the next message is forwarded), depending on
the RNG results this could result in an endless loop where all messages
are delayed until the next, which is itself delayed, etc. and no message
is ever forwarded.

The probability of this happening n times in a row is (1/d)^n, where d
is the value passed as delay=d, so for delay=5 and n=5 it's around 0.03%
which seems small but we still happened on such an occurrence in real
life:

    tests/ssl-opt.sh --seed 1625061502 -f 'DTLS proxy: 3d, min handshake, resumption$'

results (according to debug statements added for the investigation) in
the ClientHello of the second handshake being dropped once then delayed
5 times, after which the client stops re-trying and the test fails for
no interesting reason.

Make sure this doesn't happen again by putting a cap on the number of
times we fail to forward a given datagram immediately.

Signed-off-by: Manuel Pégourié-Gonnard <manuel.pegourie-gonnard@arm.com>
This commit is contained in:
Manuel Pégourié-Gonnard 2021-07-06 12:39:43 +02:00
parent bd3bfbf5c2
commit 71ce4ef981

View file

@ -681,26 +681,21 @@ int send_delayed()
} }
/* /*
* Avoid dropping or delaying a packet that was already dropped twice: this * Avoid dropping or delaying a packet that was already dropped or delayed
* only results in uninteresting timeouts. We can't rely on type to identify * ("held") twice: this only results in uninteresting timeouts. We can't rely
* packets, since during renegotiation they're all encrypted. So, rely on * on type to identify packets, since during renegotiation they're all
* size mod 2048 (which is usually just size). * encrypted. So, rely on size mod 2048 (which is usually just size).
*/ *
static unsigned char dropped[2048] = { 0 }; * We only hold packets at the level of entire datagrams, not at the level
#define DROP_MAX 2
/* We only drop packets at the level of entire datagrams, not at the level
* of records. In particular, if the peer changes the way it packs multiple * of records. In particular, if the peer changes the way it packs multiple
* records into a single datagram, we don't necessarily count the number of * records into a single datagram, we don't necessarily count the number of
* times a record has been dropped correctly. However, the only known reason * times a record has been held correctly. However, the only known reason
* why a peer would change datagram packing is disabling the latter on * why a peer would change datagram packing is disabling the latter on
* retransmission, in which case we'd drop involved records at most * retransmission, in which case we'd hold involved records at most
* DROP_MAX + 1 times. */ * HOLD_MAX + 1 times.
void update_dropped( const packet *p ) */
{ static unsigned char held[2048] = { 0 };
size_t id = p->len % sizeof( dropped ); #define HOLD_MAX 2
++dropped[id];
}
int handle_message( const char *way, int handle_message( const char *way,
mbedtls_net_context *dst, mbedtls_net_context *dst,
@ -727,7 +722,7 @@ int handle_message( const char *way,
cur.dst = dst; cur.dst = dst;
print_packet( &cur, NULL ); print_packet( &cur, NULL );
id = cur.len % sizeof( dropped ); id = cur.len % sizeof( held );
if( strcmp( way, "S <- C" ) == 0 ) if( strcmp( way, "S <- C" ) == 0 )
{ {
@ -769,10 +764,10 @@ int handle_message( const char *way,
! ( opt.protect_hvr && ! ( opt.protect_hvr &&
strcmp( cur.type, "HelloVerifyRequest" ) == 0 ) && strcmp( cur.type, "HelloVerifyRequest" ) == 0 ) &&
cur.len != (size_t) opt.protect_len && cur.len != (size_t) opt.protect_len &&
dropped[id] < DROP_MAX && held[id] < HOLD_MAX &&
rand() % opt.drop == 0 ) ) rand() % opt.drop == 0 ) )
{ {
update_dropped( &cur ); ++held[id];
} }
else if( ( opt.delay_ccs == 1 && else if( ( opt.delay_ccs == 1 &&
strcmp( cur.type, "ChangeCipherSpec" ) == 0 ) || strcmp( cur.type, "ChangeCipherSpec" ) == 0 ) ||
@ -782,9 +777,10 @@ int handle_message( const char *way,
! ( opt.protect_hvr && ! ( opt.protect_hvr &&
strcmp( cur.type, "HelloVerifyRequest" ) == 0 ) && strcmp( cur.type, "HelloVerifyRequest" ) == 0 ) &&
cur.len != (size_t) opt.protect_len && cur.len != (size_t) opt.protect_len &&
dropped[id] < DROP_MAX && held[id] < HOLD_MAX &&
rand() % opt.delay == 0 ) ) rand() % opt.delay == 0 ) )
{ {
++held[id];
delay_packet( &cur ); delay_packet( &cur );
} }
else else
@ -895,7 +891,7 @@ accept:
* 3. Forward packets forever (kill the process to terminate it) * 3. Forward packets forever (kill the process to terminate it)
*/ */
clear_pending(); clear_pending();
memset( dropped, 0, sizeof( dropped ) ); memset( held, 0, sizeof( held ) );
nb_fds = client_fd.fd; nb_fds = client_fd.fd;
if( nb_fds < server_fd.fd ) if( nb_fds < server_fd.fd )