Editing Character Encoding
From FileZilla Wiki
Latest revision | Your text | ||
Line 1: | Line 1: | ||
== Overview == | == Overview == | ||
− | + | FTP is a rather old protocol, and things we take for granted now were not even considered 20 years ago. One of these things is support for non-English characters. When the FTP protocol was designed, computers mostly spoke English and were unable to display non-English characters such as umlauts, accented letters or entirely different scripts such as Chinese. | |
+ | As such, the FTP protocol has been designed to be used with English characters only, namely 7-bit ASCII. | ||
− | FTP | + | The problem is that many FTP clients and servers purposely violate the FTP specifications in order to support other, non-standard character sets. Which of these character sets is used is not subject to any negotiation and is completely arbitrary. For any character set in existence, you can find a server using it, with no way of detecting the proper encoding. |
− | + | To solve this problem, the FTP protocol has been extended in a backwards compatible way to use UTF-8 as the character set, which is a strict superset of the previously used character set. Note that this can only be backwards compatible with servers that comply with the original specifications. | |
− | |||
− | To solve this problem, the FTP protocol has been extended in a backwards compatible way to use UTF-8 as the character set. | ||
If you have problems with filenames containing any foreign characters, this can have two reasons: | If you have problems with filenames containing any foreign characters, this can have two reasons: | ||
* The server or client follows the original specifications by the letter and rightfully rejects those filenames | * The server or client follows the original specifications by the letter and rightfully rejects those filenames | ||
− | * The server or client violates the specifications and uses a custom encoding | + | * The server or client violates the specifications and uses a custom encoding |
− | + | Note that both FileZilla Client and Server are fully compliant with the updated specifications and use UTF-8. | |
− | If you have problems with other clients or servers, please upgrade | + | If you have problems with other clients or servers, please upgrade to FTP software capable of UTF-8 or refrain from using foreign characters. Anything else is in violation of the FTP specifications and does not work. |
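The superset relationship described above can be illustrated with a minimal Python sketch. The helper names <code>is_ascii_only</code> and <code>encode_filename</code> are hypothetical and chosen only for this illustration; they are not part of FileZilla or of any FTP library.

```python
def is_ascii_only(name: str) -> bool:
    """Return True if the filename uses only 7-bit ASCII characters."""
    return all(ord(ch) < 128 for ch in name)


def encode_filename(name: str) -> bytes:
    """Encode a filename as UTF-8 before sending it over the control connection."""
    return name.encode("utf-8")


# An ASCII-only name produces the same bytes under ASCII and UTF-8,
# which is why the extension stays backwards compatible.
plain = "readme.txt"
same_bytes = encode_filename(plain) == plain.encode("ascii")

# A name with an umlaut cannot be represented in 7-bit ASCII at all,
# but UTF-8 encodes it as a multi-byte sequence.
umlaut = "\u00fcbung.txt"
umlaut_bytes = encode_filename(umlaut)
```

A peer that only understands 7-bit ASCII sees nothing unusual for <code>plain</code>, while the bytes for <code>umlaut</code> are only meaningful to software that expects UTF-8.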
== Technical details == | == Technical details == | ||
Line 25: | Line 24: | ||
This extension is fully backwards compatible with [http://filezilla-project.org/specs/rfc0959.txt <nowiki>RFC 959</nowiki>]. | This extension is fully backwards compatible with [http://filezilla-project.org/specs/rfc0959.txt <nowiki>RFC 959</nowiki>]. | ||
− | As long as you're using | + | As long as you're not using any non-English characters, it doesn't matter if the software you are using supports RFC 2640 or not. However, if you use non-English characters without using RFC 2640 compatible software, there will be problems--problems which are entirely self-made by not obeying the specifications. |
=== UTF8 feature negotiation === | === UTF8 feature negotiation === | ||
Line 45: | Line 44: | ||
[http://filezilla-project.org/specs/rfc2640.txt <nowiki>RFC 2640</nowiki>] compliant servers ''must not'' make UTF-8 dependent on '''OPTS UTF-8 ON'''. | [http://filezilla-project.org/specs/rfc2640.txt <nowiki>RFC 2640</nowiki>] compliant servers ''must not'' make UTF-8 dependent on '''OPTS UTF-8 ON'''. | ||
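As a rough illustration of the client side of this negotiation, the following Python sketch checks whether a FEAT reply lists the UTF8 feature. The function name <code>server_advertises_utf8</code> and the sample reply are assumptions made for this example; the sketch parses a reply string only and does not talk to a real server.

```python
def server_advertises_utf8(feat_reply: str) -> bool:
    """Return True if a FEAT reply contains a UTF8 feature line."""
    for line in feat_reply.splitlines():
        # Feature lines are indented by a single space; the 211 status
        # lines that open and close the reply are not.
        if not line.startswith(" "):
            continue
        feature = line.strip().split(" ")[0]
        if feature.upper() == "UTF8":
            return True
    return False


# Hypothetical reply, shaped like a typical multi-line 211 response.
SAMPLE_REPLY = "211-Features:\r\n MDTM\r\n UTF8\r\n211 End"
uses_utf8 = server_advertises_utf8(SAMPLE_REPLY)
```

If the feature is listed, a client can treat pathnames as UTF-8 directly, in line with the requirement above that UTF-8 support must not depend on an additional OPTS command.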
− | + | == SFTP == | |
− | The situation for SFTP is similar to the one | + | The situation for SFTP is similar to that of FTP. Beginning with version 4, the SFTP specifications require filenames to be encoded as UTF-8. |
However, the most commonly used SFTP protocol version is version 3 as implemented in OpenSSH. This version of the SFTP specifications does not require UTF-8. In fact it does not say anything about the encoding. | However, the most commonly used SFTP protocol version is version 3 as implemented in OpenSSH. This version of the SFTP specifications does not require UTF-8. In fact it does not say anything about the encoding. | ||
− | It is however reasonable to assume UTF-8 on those servers for the following reasons: | + | It is however reasonable to assume UTF-8 on those servers anyhow for the following reasons: |
* The later protocol versions require UTF-8 | * The later protocol versions require UTF-8 | ||
− | * The SSH protocol, under | + | * The SSH protocol, under which SFTP operates, already requires UTF-8 |
* Even in version 3 of the protocol, some parts of the protocol already use UTF-8 | * Even in version 3 of the protocol, some parts of the protocol already use UTF-8 | ||
* The native character set on most modern Unix(-like) operating systems is UTF-8 | * The native character set on most modern Unix(-like) operating systems is UTF-8 | ||
In essence this means that everywhere where SFTP is available, the necessary infrastructure to use UTF-8 is in place. | In essence this means that everywhere where SFTP is available, the necessary infrastructure to use UTF-8 is in place. |
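One way a client might act on this assumption is sketched below in Python: filenames received from a version 3 server are decoded as UTF-8, and bytes that are not valid UTF-8 are preserved through the <code>surrogateescape</code> error handler so the original name can still be sent back unchanged. The helper names are hypothetical, and this is not a description of how FileZilla handles such names.

```python
def decode_sftp_name(raw: bytes) -> str:
    """Decode a filename from the wire, assuming UTF-8.

    Invalid bytes are mapped to lone surrogates instead of raising,
    so no information is lost.
    """
    return raw.decode("utf-8", errors="surrogateescape")


def encode_sftp_name(name: str) -> bytes:
    """Encode a filename for the wire, restoring any escaped bytes."""
    return name.encode("utf-8", errors="surrogateescape")


# A well-formed UTF-8 name decodes to the expected text.
good = decode_sftp_name(b"caf\xc3\xa9.txt")

# A name in a legacy single-byte encoding is not valid UTF-8, but it
# still round-trips byte for byte.
legacy = b"caf\xe9.txt"
round_trip = encode_sftp_name(decode_sftp_name(legacy))
```

The design choice here is to never reject a name outright: a server that does not use UTF-8 yields names that may display incorrectly, but they can still be addressed by their original bytes.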