Difference between revisions of "Data Type"

From FileZilla Wiki
Files can be transferred between an FTP client and server in different ways. The FTP specification ([http://filezilla-project.org/specs/rfc0959.txt <nowiki>RFC 959</nowiki>]) calls them "data types", but they are commonly referred to as "transfer modes". This is not correct, because the specification uses "transfer mode" for a different setting (stream, block or compressed).
 
 
The different data types are:
 
*ASCII
 
*binary (called "image" in the specification)
 
*EBCDIC
 
*local
 
 
 
In practice, however, usually only the ASCII and binary types are used or even implemented.
 
 
 
ASCII type is used to transfer text files. The problem with text files is that different platforms use different line endings. Microsoft Windows, for example, uses a CR+LF pair (carriage return and line feed), Unix(-like) systems, including Linux and MacOS X, use only LF, and traditional MacOS systems (MacOS 9 or older) use only CR. The purpose of ASCII type is to ensure that line endings are converted to the convention of the receiving platform. According to the FTP specification, ASCII files are always transferred with a CR+LF pair as the line ending.
 
 
 
So when a file is transferred from the client to the server, the client has to make sure CR+LF is used. To do so, it adds nothing (on Microsoft Windows), adds a CR before each LF (on Unix) or adds an LF after each CR (on legacy MacOS). The server then adjusts the line endings again to what is used on the platform the server runs on. On Microsoft Windows nothing has to be removed, on Unix the superfluous CR is removed, and on legacy MacOS the unneeded LF is removed.
 
 
 
The same happens when a file is downloaded from the server to the client: the server makes sure the line endings are CR+LF when sending the file and the client then strips away whatever is not needed as line ending on its platform.
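
The translation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular FTP client or server implements it; the function names <code>to_network</code> and <code>from_network</code> are made up for this example, and the sketch assumes the input uses the sender's native line ending consistently.

```python
def to_network(data: bytes, local_eol: bytes) -> bytes:
    """Convert the sender's native line endings to CR+LF for the transfer."""
    if local_eol == b"\r\n":
        return data  # Windows: already CR+LF, nothing to add
    # Unix (LF) or legacy MacOS (CR): expand each line ending to CR+LF.
    # Simplification: assumes the data contains no CR+LF pairs yet.
    return data.replace(local_eol, b"\r\n")


def from_network(data: bytes, local_eol: bytes) -> bytes:
    """Convert CR+LF from the transfer to the receiver's native line ending."""
    if local_eol == b"\r\n":
        return data  # Windows: nothing to remove
    # Unix drops the CR, legacy MacOS drops the LF.
    return data.replace(b"\r\n", local_eol)


# Upload from a Unix client to a legacy MacOS server.
unix_text = b"one\ntwo\n"
wire = to_network(unix_text, b"\n")
stored = from_network(wire, b"\r")
print(wire)    # b'one\r\ntwo\r\n'
print(stored)  # b'one\rtwo\r'
```

The same two functions cover downloads as well: the server calls <code>to_network</code> with its own line ending and the client calls <code>from_network</code> with its own.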
 
 
 
Because the file is changed if client and server are not running on the same kind of platform, this data type cannot be used for files with arbitrary byte values, so-called binary files, such as images and videos. If it is used anyway, the binary files will most likely be corrupted and will no longer work as expected.
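
As a small illustration of this kind of corruption, the following sketch applies the ASCII-type translation to the first eight bytes of a PNG image, which happen to contain a CR+LF pair. The helper name <code>ascii_upload_windows_to_unix</code> is hypothetical, and the translation is reduced to the Windows-client, Unix-server case.

```python
# The 8-byte signature at the start of every PNG file; it contains CR+LF.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def ascii_upload_windows_to_unix(data: bytes) -> bytes:
    """Simulate an ASCII-type upload from a Windows client to a Unix server."""
    # Windows client: CR+LF is already the transfer format, so nothing is added.
    wire = data
    # Unix server: every CR+LF pair is reduced to a bare LF.
    return wire.replace(b"\r\n", b"\n")


received = ascii_upload_windows_to_unix(PNG_SIGNATURE)
print(len(PNG_SIGNATURE), len(received))  # 8 7
print(received == PNG_SIGNATURE)          # False
```

One byte is lost, so the file no longer starts with a valid PNG signature. With binary type the bytes would arrive unchanged.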
 
 
 
Compared to ASCII type, binary type is the easier one: the file is just transferred as-is, and no line ending translation is done.
 
 
 
So when you are not sure what to use, always go for binary type. Nowadays, nearly all (good) text editors can handle the three possible line endings, and other text files, such as source files of scripting languages like Perl or PHP as well as XML files, (nearly) always work with any line ending too.
 
 
 
== Example ==
 
 
 
Client system: Windows (CRLF line endings)
 
 
 
Server system: Some Linux distribution (LF line endings)
 
 
 
If you upload a text file with 200 lines and a total size of 5768 bytes using ASCII type, it will have a size of 5568 bytes on the server, because the server removes one CR from each of the 200 line endings.
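
The sizes in this example can be checked with a short sketch. The file contents are invented purely so that the numbers match (200 lines, 5768 bytes with CR+LF endings), and the server-side conversion is simplified to a plain replacement.

```python
# 200 lines with 5368 bytes of text in total (168 * 27 + 32 * 26).
lines = [b"x" * 27] * 168 + [b"x" * 26] * 32

# On the Windows client every line ends with CR+LF (2 bytes per line).
windows_file = b"".join(line + b"\r\n" for line in lines)

# The Linux server stores each line ending as a bare LF (1 byte per line).
server_file = windows_file.replace(b"\r\n", b"\n")

print(len(lines))         # 200
print(len(windows_file))  # 5768
print(len(server_file))   # 5568
```

The difference of 200 bytes is exactly one CR per line.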
 

Revision as of 04:44, 7 May 2012
