Make invalid UTF-8 characters in strings non-fatal

STRLEN and STRSUB report the erroneous bytes

Fixes #848
Rangi
2021-04-20 12:24:01 -04:00
committed by Eldred Habert
parent e596dbfc80
commit 4d21588eb2
5 changed files with 92 additions and 11 deletions

@@ -0,0 +1,23 @@
; characters:
; 1: U+0061 a
; 2: U+00E4 a with diaeresis (0xC3 0xA4)
; 3: U+0062 b
; 4: U+6F22 kanji (0xE6 0xBC 0xA2)
; 5: U+002C ,
; 6: U+0061 a
; 7: invalid byte 0xA3
; 8: invalid byte 0xA4
; 9: U+0062 b
; 10: invalid bytes 0xE6 0xF0
; 11: invalid byte 0xA2
; 12: U+0021 !
invalid EQUS "aäb漢,a<><61>b<EFBFBD><62><EFBFBD>!"
n = STRLEN("{invalid}")
copy EQUS STRSUB("{invalid}", 1)
println "\"{invalid}\" == \"{copy}\" ({d:n})"
mid1 EQUS STRSUB("{invalid}", 5, 2)
mid2 EQUS STRSUB("{invalid}", 9, 1)
println "\"{mid2}{mid1}\""
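
The character numbering in the comment at the top of this test can be checked with a small model. The following is a minimal Python sketch, not RGBDS's actual decoder: `count_chars` and `sample` are hypothetical names, and the rule it assumes (an interrupted multi-byte sequence plus the byte that interrupts it count as one character, and a stray byte counts as one character) is inferred only from the comment list above. It also skips the stricter checks a full UTF-8 validator performs, such as rejecting overlong encodings and surrogates.

```python
def count_chars(data: bytes) -> int:
    """Count characters, treating each invalid UTF-8 run as one character.

    Simplified model (assumption): only lead-byte ranges and continuation
    bytes are checked; overlong forms and surrogates are not rejected.
    """
    count = 0
    pending = 0  # continuation bytes still expected for the current character
    for byte in data:
        if pending:
            if 0x80 <= byte <= 0xBF:
                pending -= 1
                if pending == 0:
                    count += 1  # multi-byte character completed
            else:
                # Bad continuation: the partial sequence and this byte
                # are counted together as one erroneous character.
                pending = 0
                count += 1
        elif byte < 0x80:
            count += 1  # plain ASCII
        elif 0xC2 <= byte <= 0xDF:
            pending = 1
        elif 0xE0 <= byte <= 0xEF:
            pending = 2
        elif 0xF0 <= byte <= 0xF4:
            pending = 3
        else:
            count += 1  # stray continuation byte or invalid lead byte
    if pending:
        count += 1  # sequence truncated at end of input
    return count


# Bytes of the test string, written out from the comment list above.
sample = b"a\xc3\xa4b\xe6\xbc\xa2,a\xa3\xa4b\xe6\xf0\xa2!"
print(count_chars(sample))  # prints 12
```

Under these assumptions the model agrees with the twelve numbered entries in the comment: 0xA3 and 0xA4 are counted separately, while 0xE6 0xF0 is counted once and the following 0xA2 once more.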