Character Encoding

From FileZilla Wiki
Technical details

The FTP protocol is specified in RFC 959, which was published in 1985. FTP is built on top of the original Telnet protocol, which is specified in RFC 854. The parts of the Telnet specification relevant to FTP are those covering the Network Virtual Terminal (NVT). According to RFC 854, the NVT requires 7-bit ASCII as its character set; any other character set is subject to explicit negotiation. ASCII contains only 128 characters: English letters, digits, punctuation characters and a number of control characters. Accented letters, umlauts and other scripts are not part of the ASCII character set.
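The 7-bit limit can be illustrated with a short Python sketch. The helper name is_seven_bit_ascii and the sample filenames are hypothetical and chosen only for illustration; the sketch simply checks whether every character of a name falls into the ASCII range described above.

```python
def is_seven_bit_ascii(filename: str) -> bool:
    """Return True if every character is in the 7-bit ASCII range (0-127)."""
    return all(ord(ch) < 128 for ch in filename)


# Hypothetical sample names: one plain ASCII, one containing an umlaut.
for name in ("report.txt", "Übersicht.txt"):
    print(name, "fits in ASCII:", is_seven_bit_ascii(name))

# Trying to encode the umlaut as ASCII fails, because ASCII has no such character.
try:
    "Übersicht.txt".encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot be encoded as ASCII:", exc.reason)
```

A name that fails this check cannot be expressed in plain NVT ASCII without some additional agreement between client and server about the encoding.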

To support non-English characters, the FTP specification was extended in 1999 by RFC 2640. This extension requires the use of UTF-8 as the character set. UTF-8 is a strict superset of ASCII: every valid ASCII character is encoded identically in UTF-8. UTF-8 can represent any valid Unicode character, including umlauts, accented letters and other scripts. The extension is fully backwards compatible: as long as you do not use any non-English (non-ASCII) characters, it does not matter whether the software in use supports RFC 2640. Note, however, that if you used non-English characters before switching to RFC 2640 compatible software, there will be problems. These problems are entirely self-inflicted, caused by not following the specifications.
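The following Python sketch illustrates both points: ASCII names produce identical bytes in UTF-8, while a name containing an umlaut is misread when the two sides disagree about the encoding. The filenames are hypothetical, and ISO-8859-1 (Latin-1) is only an assumed example of a legacy encoding that older software might have used; the text above does not name one.

```python
# ASCII-only names yield exactly the same bytes in ASCII and in UTF-8,
# which is why RFC 2640 is backwards compatible for such names.
ascii_name = "readme.txt"
same_bytes = ascii_name.encode("ascii") == ascii_name.encode("utf-8")
print("identical bytes:", same_bytes)

# A name with an umlaut differs depending on the encoding used.
name = "Müller.txt"
utf8_bytes = name.encode("utf-8")      # what RFC 2640 software sends
legacy_bytes = name.encode("latin-1")  # assumption: legacy software used Latin-1

# UTF-8 bytes read by software that assumes Latin-1: two characters per umlaut.
utf8_read_as_legacy = utf8_bytes.decode("latin-1")
print(utf8_read_as_legacy)

# Latin-1 bytes read by software that expects UTF-8: an invalid byte sequence,
# shown here with the Unicode replacement character.
legacy_read_as_utf8 = legacy_bytes.decode("utf-8", errors="replace")
print(legacy_read_as_utf8)
```

In both mismatch cases the filename the user sees no longer matches the name that was originally stored, which is the kind of problem described above.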