== Overview ==
FTP is a rather old protocol, and many things we take for granted today were not even considered when it was designed. One of these is support for non-English characters. At the time, computers mostly spoke English and were unable to display non-English characters such as umlauts, accented letters, or entirely different scripts such as Chinese.

As such, the FTP protocol was designed to be used with English characters only, namely 7-bit ASCII.

The problem is that many FTP clients and servers deliberately violate the FTP specifications in order to support other, non-standard character sets. Which character set is used is not negotiated and is completely arbitrary. For any character set in existence you can find a server using it, and the client has no way of detecting the proper encoding.

To solve this problem, the FTP protocol has been extended in a backwards compatible way to use UTF-8 as its character set. UTF-8 is a strict superset of the previously used 7-bit ASCII. Note that this can only be backwards compatible with servers that comply with the original specifications.

If you have problems with filenames containing non-English characters, there are two possible causes:
* The server or client follows the original specifications to the letter and rightfully rejects those filenames.
* The server or client violates the specifications and uses a custom encoding.

Note that both FileZilla Client and Server are fully compliant with the updated specifications and use UTF-8.

If you have problems with other clients or servers, please upgrade to FTP software that supports UTF-8, or refrain from using non-English characters. Anything else violates the FTP specifications and cannot be expected to work.

== Technical details ==

The FTP protocol is specified in [http://filezilla-project.org/specs/rfc0959.txt <nowiki>RFC 959</nowiki>], which was published in 1985. FTP is built on top of the original Telnet protocol, which is specified in [http://filezilla-project.org/specs/rfc0854.txt <nowiki>RFC 854</nowiki>]. The sections of the Telnet specification relevant to FTP are those covering the Network Virtual Terminal (NVT).

According to [http://filezilla-project.org/specs/rfc0854.txt <nowiki>RFC 854</nowiki>], the NVT requires 7-bit ASCII as its character set; any other character set is subject to explicit negotiation. This character set contains only 128 characters (code points 0 through 127): English letters, digits, punctuation characters and a number of control characters. Accented letters, umlauts and other scripts are not part of the ASCII character set.
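A filename is usable under the original NVT rules only if every character falls within those 128 code points. The following Python sketch checks exactly that; the function name and sample filenames are hypothetical.

```python
ASCII_SIZE = 2 ** 7  # 7 bits give 128 code points: 0 through 127


def fits_7bit_ascii(name: str) -> bool:
    """True if every character of `name` is one of the 128 ASCII code points."""
    return all(ord(ch) < ASCII_SIZE for ch in name)


for name in ["report.txt", "M\u00fcller.txt", "\u6587\u4ef6.txt"]:
    print(f"{name!r} fits 7-bit ASCII: {fits_7bit_ascii(name)}")
```

On Python 3.7 and later, the built-in `str.isascii()` performs the same check.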

In order to support non-English characters, the FTP specifications were extended in 1999 by [http://filezilla-project.org/specs/rfc2640.txt <nowiki>RFC 2640</nowiki>]. This extension requires the use of UTF-8 as character set. UTF-8 is a strict superset of ASCII: every ASCII character is encoded as the same single byte in UTF-8. UTF-8 can encode every Unicode character, which includes umlauts, accented letters and entirely different scripts.
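The superset property can be checked directly. This Python sketch (helper names are hypothetical) shows that a pure-ASCII name yields identical bytes in ASCII and UTF-8, while characters outside ASCII become multi-byte sequences whose bytes are all 0x80 or higher, so they can never be mistaken for ASCII characters.

```python
def same_bytes_in_ascii_and_utf8(text: str) -> bool:
    """For pure-ASCII text, UTF-8 produces exactly the ASCII bytes."""
    return text.encode("ascii") == text.encode("utf-8")


def utf8_bytes(ch: str) -> str:
    """Hex dump of the UTF-8 encoding of a single character."""
    return ch.encode("utf-8").hex(" ")


print(same_bytes_in_ascii_and_utf8("readme.txt"))

# Characters outside ASCII become multi-byte sequences. Every byte of such a
# sequence is >= 0x80, so it never collides with an ASCII byte.
for ch in ["\u00fc", "\u00e9", "\u4e2d"]:  # ü, é, 中
    print(ch, utf8_bytes(ch))
```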

This extension is fully backwards compatible. As long as you are not using any non-English characters, it does not matter whether the software involved supports RFC 2640 or not. However, if non-English characters were used with non-compliant software before switching to RFC 2640 compatible software, there will be problems. These problems are entirely self-made, caused by not obeying the specifications.
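The following Python sketch shows where such self-made problems come from. Latin-1 is only an assumed example of a legacy encoding: a filename stored that way by non-compliant software is not valid UTF-8, so an RFC 2640 client cannot decode it, whereas pure-ASCII names are unaffected.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Latin-1 stands in for any legacy encoding used by non-compliant software.
legacy = "M\u00fcller.txt".encode("latin-1")  # b"M\xfcller.txt"
compliant = "M\u00fcller.txt".encode("utf-8")  # b"M\xc3\xbcller.txt"

print(is_valid_utf8(b"readme.txt"))  # ASCII-only name: unaffected
print(is_valid_utf8(legacy))         # the byte 0xFC is not valid UTF-8
print(is_valid_utf8(compliant))
print(legacy == compliant)           # the two sides disagree on the bytes
```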

=== UTF8 feature negotiation ===

An [http://filezilla-project.org/specs/rfc2640.txt <nowiki>RFC 2640</nowiki>] compliant server ''must'' support the FEAT command and ''must'' include a line containing the exact string UTF8 in its response:

<pre>
Command:  FEAT
Response: 211-Features:
[...]
Response: UTF8
[...]
Response: 211 End
</pre>
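A client has to find the UTF8 line in such a multi-line reply. The Python sketch below is one possible parser, assuming the usual RFC 2389 layout in which the first and last lines carry the 211 code and each line in between (normally indented by a single space) names one feature. The sample reply is made up.

```python
def feat_features(reply: str) -> set[str]:
    """Return the feature names listed in a multi-line FEAT reply.

    The first line ("211-...") and the last line ("211 End") only frame the
    list; each line in between names one feature, optionally followed by
    parameters (for example " REST STREAM").
    """
    features = set()
    for line in reply.splitlines()[1:-1]:
        parts = line.split()
        if parts:
            features.add(parts[0])
    return features


def advertises_utf8(reply: str) -> bool:
    # RFC 2640: the UTF-8 feature is a line containing the exact string UTF8.
    return "UTF8" in feat_features(reply)


sample = "211-Features:\n MDTM\n REST STREAM\n SIZE\n UTF8\n211 End"
print(sorted(feat_features(sample)))
print(advertises_utf8(sample))
```

With Python's `ftplib`, `FTP.sendcmd("FEAT")` returns the multi-line reply as one string with the lines joined by newlines, which fits this function; a server without FEAT support answers with a 5xx error, which `ftplib` raises as `ftplib.error_perm`.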

=== Conflicting specification ===

There is a long-expired [http://tools.ietf.org/html/draft-ietf-ftpext-utf-8-option-00 IETF draft] that conflicts with [http://filezilla-project.org/specs/rfc2640.txt <nowiki>RFC 2640</nowiki>]. This draft also requires the FEAT response to include UTF8, but additionally requires the client to send '''OPTS UTF-8 ON''' to enable UTF-8 support.

If an [http://filezilla-project.org/specs/rfc2640.txt <nowiki>RFC 2640</nowiki>] compliant client sends '''OPTS UTF-8 ON''', it has to use UTF-8 regardless of whether '''OPTS UTF-8 ON''' succeeds or not.

[http://filezilla-project.org/specs/rfc2640.txt <nowiki>RFC 2640</nowiki>] compliant servers ''must not'' make UTF-8 dependent on '''OPTS UTF-8 ON'''.
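The client-side rules above can be sketched in Python as follows. This is a hypothetical helper, not FileZilla's implementation: `send_command` is assumed to behave like `ftplib.FTP.sendcmd` (return the reply text, raise an `ftplib.Error` subclass on an error reply), and returning `None` for servers that do not advertise UTF8 is an assumption, since this article does not say what a client should fall back to.

```python
import ftplib
from typing import Callable, Optional

# Command string as written in this article; in practice some servers
# recognize the spelling "OPTS UTF8 ON" instead.
OPTS_COMMAND = "OPTS UTF-8 ON"


def negotiate_utf8(send_command: Callable[[str], str]) -> Optional[str]:
    """Pick the filename encoding for a session in the RFC 2640 manner.

    Returns "utf-8" if the server advertises UTF8, otherwise None (unknown).
    """
    try:
        reply = send_command("FEAT")
    except ftplib.Error:
        return None  # FEAT unsupported, so the server is not RFC 2640 compliant

    features = {line.split()[0] for line in reply.splitlines()[1:-1] if line.strip()}
    if "UTF8" not in features:
        return None

    # Courtesy for servers following the expired draft. Whether this succeeds
    # does not matter: an RFC 2640 client uses UTF-8 either way.
    try:
        send_command(OPTS_COMMAND)
    except ftplib.Error:
        pass
    return "utf-8"
```

With a live connection one could pass the bound method `ftp.sendcmd` of an `ftplib.FTP` instance. Note that since Python 3.9, `ftplib.FTP` already uses UTF-8 by default.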

== SFTP ==

The situation for SFTP is similar to that of FTP. Current versions of the SFTP specifications, beginning with version 4, require filenames to be encoded as UTF-8.

However, the most commonly used SFTP protocol version is version 3, as implemented in OpenSSH. This version of the SFTP specifications does not require UTF-8; in fact, it does not specify any filename encoding at all.

Nevertheless, it is reasonable to assume UTF-8 on these servers for the following reasons:
* Later protocol versions require UTF-8.
* The SSH protocol, over which SFTP runs, already requires UTF-8.
* Even version 3 of the protocol already uses UTF-8 in some places.
* The native character set on most modern Unix(-like) operating systems is UTF-8.

In essence, this means that wherever SFTP is available, the infrastructure needed to use UTF-8 is already in place.
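A client talking to an SFTP version 3 server might therefore decode filenames as UTF-8. The Python sketch below does so; its fallback for bytes that are not valid UTF-8 is an assumption of this example, not something the article prescribes. Such bytes are kept as lone surrogates (Python's `surrogateescape` error handler) instead of raising, so the name can still be turned back into the exact original bytes when the client refers to the file again.

```python
def decode_sftp_name(raw: bytes) -> str:
    """Decode a filename from an SFTP v3 server, assuming UTF-8.

    Invalid UTF-8 bytes become lone surrogates instead of raising an error
    (assumed fallback, chosen so that names round-trip losslessly).
    """
    return raw.decode("utf-8", errors="surrogateescape")


def encode_sftp_name(name: str) -> bytes:
    """Inverse of decode_sftp_name: recovers the original bytes."""
    return name.encode("utf-8", errors="surrogateescape")


for raw in [b"M\xc3\xbcller.txt", b"M\xfcller.txt"]:  # UTF-8, then legacy Latin-1
    name = decode_sftp_name(raw)
    print(repr(name), encode_sftp_name(name) == raw)
```

This mirrors how Python's own `os.fsdecode()` handles undecodable bytes on POSIX systems.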