Character Encoding

From FileZilla Wiki

Overview

Users sometimes encounter problems with FTP transfers that garble non-English characters in filenames, such as umlauts, accented letters, or characters from entirely different scripts like Chinese or Arabic.

FTP is a rather old protocol, and many things we take for granted today were not even considered when it was designed. One of them is support for non-English characters in filenames. At the time, computers mostly worked in English and generally could not display non-English characters, so FTP was designed to be used with English characters only, namely 7-bit ASCII.

The problem is that many FTP clients and servers deliberately violate the FTP specifications in order to support other, non-standard character sets. Which character set is used is not negotiated between client and server. For practically any character set in existence, you can find a server using it, and a client has no reliable way of detecting which encoding is in use. The result: non-English characters are not transferred correctly.
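To see why the encoding cannot be detected, consider the raw bytes of a single filename. The sketch below is illustrative only: `RAW_NAME` and `candidate_decodings` are hypothetical names, and the list of encodings is an arbitrary sample. It decodes the same bytes with several encodings and keeps every result that does not raise an error.

```python
# Hypothetical example: the raw bytes of a filename as they might arrive on the
# control connection. Here the sender encoded "Müller.txt" as UTF-8, but nothing
# in the protocol tells the receiver that.
RAW_NAME = "Müller.txt".encode("utf-8")  # b'M\xc3\xbcller.txt'


def candidate_decodings(raw: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode `raw` with each encoding and keep every result that does not raise.

    More than one entry in the result means the bytes alone cannot tell us
    which encoding the sender actually used.
    """
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            pass  # these bytes are invalid in this encoding, so rule it out
    return results


if __name__ == "__main__":
    for encoding, name in candidate_decodings(
        RAW_NAME, ["utf-8", "cp1252", "cp1251", "latin-1"]
    ).items():
        print(f"{encoding:8} -> {name}")
```

All four decodings succeed, yet only the UTF-8 one yields "Müller.txt"; cp1252 and latin-1 both produce "MÃ¼ller.txt", and cp1251 produces Cyrillic letters instead. Nothing in the bytes marks one of these as the right answer.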

To solve this problem, FTP has been extended (RFC 2640) to use UTF-8 as the character set. Because 7-bit ASCII characters are encoded identically in UTF-8, the extension is backwards compatible, but only with servers that comply with the original specifications.
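A client can check whether a server supports UTF-8 by looking at its reply to the FEAT command; RFC 2640 describes servers announcing this with a `UTF8` feature line. The sketch below assumes the reply has already been read into a string and that it uses the multi-line FEAT layout of RFC 2389. `SAMPLE_FEAT_REPLY`, `server_advertises_utf8` and `encode_pathname` are hypothetical names, and the sample reply is made up rather than captured from a real server. The last line also illustrates the backwards compatibility: a pure 7-bit ASCII name yields the same bytes whether it is encoded as ASCII or as UTF-8.

```python
# A made-up FEAT reply in the RFC 2389 layout: one feature per line,
# each feature line starting with a single space.
SAMPLE_FEAT_REPLY = (
    "211-Features:\r\n"
    " MDTM\r\n"
    " SIZE\r\n"
    " UTF8\r\n"
    "211 End\r\n"
)


def server_advertises_utf8(feat_reply: str) -> bool:
    """Return True if a FEAT reply lists the UTF8 feature."""
    for line in feat_reply.splitlines():
        if line.startswith(" "):  # only feature lines are indented
            tokens = line.split()
            if tokens and tokens[0].upper() == "UTF8":
                return True
    return False


def encode_pathname(name: str) -> bytes:
    """Encode a pathname for the control connection as UTF-8."""
    return name.encode("utf-8")


if __name__ == "__main__":
    print(server_advertises_utf8(SAMPLE_FEAT_REPLY))  # True
    # 7-bit ASCII names produce identical bytes under ASCII and UTF-8,
    # so a compliant ASCII-only server never sees anything unexpected.
    print(encode_pathname("readme.txt") == "readme.txt".encode("ascii"))  # True
```

With Python's standard `ftplib`, `ftp.sendcmd("FEAT")` returns the reply text with its lines joined by newlines, which can be passed directly to a check like `server_advertises_utf8`.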

If you have problems with filenames containing non-English characters, there are two possible causes:

  • The server or client follows the original specifications to the letter and rightfully rejects those filenames
  • The server or client violates the specifications and uses a custom encoding that the other party does not understand
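The two causes listed above can be simulated in a few lines. Both server functions below are hypothetical stand-ins, not real server code: `strict_server_receive` rejects any pathname byte outside 7-bit ASCII (its error text borrows FTP's 553 "File name not allowed" reply), while `legacy_server_receive` assumes the Windows-1252 code page, chosen here only as one example of a custom encoding.

```python
# What a UTF-8 client sends for the name "Müller.txt".
CLIENT_BYTES = "Müller.txt".encode("utf-8")


def strict_server_receive(raw: bytes) -> str:
    """Hypothetical server that follows the original specifications to the letter.

    Any byte above 0x7F lies outside 7-bit ASCII, so the name is rejected.
    """
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        raise ValueError("553 File name not allowed") from None


def legacy_server_receive(raw: bytes, server_encoding: str = "cp1252") -> str:
    """Hypothetical non-compliant server that silently assumes its own code page."""
    return raw.decode(server_encoding, errors="replace")


if __name__ == "__main__":
    try:
        strict_server_receive(CLIENT_BYTES)
    except ValueError as err:
        print("strict server:", err)  # first cause: the name is rejected
    # Second cause: the name arrives garbled as "MÃ¼ller.txt".
    print("legacy server:", legacy_server_receive(CLIENT_BYTES))
```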

Both FileZilla Client and Server are fully compliant with the updated specifications and use UTF-8. FileZilla will not break FTP specifications by supporting non-standard encodings in order to accommodate the user.

If you have problems with other clients or servers, please upgrade to FTP software that supports UTF-8 (or ask the server administrator to upgrade), or refrain from using non-English characters in filenames. Anything else violates the FTP specifications and will only work if you manually ensure that the server and client use the same character encoding (which may not even be possible).
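Why manually matching encodings is fragile, and sometimes impossible, can be sketched as follows. The hypothetical `round_trip` function encodes a name with the client's encoding and decodes the bytes with the server's. Windows-1252 stands in for an arbitrary legacy code page; it is an assumption for illustration, not something FileZilla uses.

```python
from typing import Optional


def round_trip(name: str, client_encoding: str, server_encoding: str) -> Optional[str]:
    """Return the filename as the server sees it after a client sends it.

    The client encodes `name` with `client_encoding`; the server decodes the
    bytes with `server_encoding`, substituting U+FFFD for invalid bytes.
    Returns None if the client's encoding cannot represent the name at all.
    """
    try:
        raw = name.encode(client_encoding)
    except UnicodeEncodeError:
        return None
    return raw.decode(server_encoding, errors="replace")


if __name__ == "__main__":
    # Matching legacy encodings: works, but only because both sides agree.
    print(round_trip("Müller.txt", "cp1252", "cp1252"))
    # Mismatch: the server sees a replacement character instead of "ü".
    print(round_trip("Müller.txt", "cp1252", "utf-8"))
    # Impossible: Windows-1252 has no Chinese characters, so agreeing on it cannot help.
    print(round_trip("漢字.txt", "cp1252", "cp1252"))
    # UTF-8 on both sides can represent both names.
    print(round_trip("漢字.txt", "utf-8", "utf-8"))
```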
