Reject "weird" characters in text files, especially control characters that
might be escape sequences or that might cause other text to appear garbled
(as in https://trojansource.codes/).
Also reject byte sequences that aren't valid UTF-8.
Accept only ASCII (except most control characters), letters, some non-ASCII
punctuation and some mathematical and technical symbols. This covers
everything that's currently present in Mbed TLS ( §áèéëñóöüłŽ–—’“”…≥).
Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>