In computing, end-of-file (commonly abbreviated EOF) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream.
——Wikipedia
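Pity the people who have to write definitions: the description can't be too complicated, yet it has to capture the essence of a thing as clearly as possible.
Right. From the passage above, we can extract the following about EOF:
Scope: computer operating systems. Other fields apparently have no use for this thing.
Meaning: a condition. What condition? One indicating that no more data can be read from the data source (usually a file or a stream).
If you stop reading here, EOF looks like nothing more than an abstract concept: it should be independent of the kind of operating system, and independent of whichever programming language can be compiled under that operating system. Then again, everything before 'but' is bullshit.
But in the paragraph immediately following the definition, Wikipedia says: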
In the C Standard Library, the character reading functions such as getchar return a value equal to the symbolic value (macro) EOF to indicate that an end-of-file condition has occurred. The actual value of EOF is implementation-dependent (but is commonly -1, such as in glibc[2]) and is distinct from all valid character codes. Block-reading functions return the number of bytes read, and if this is fewer than asked for, then the end of file was reached or an error occurred (checking of errno or dedicated function, such as ferror is often required to determine which).
——Wikipedia
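The last sentence of that quote, about block-reading functions, can be sketched in C roughly as follows. This is only a minimal illustration: `read_block` and its out-parameters `hit_eof` and `hit_error` are hypothetical names made up for this example, while `fread`, `feof`, and `ferror` are the standard library calls.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to `want` bytes from `fp` into `buf`.
 * Returns the number of bytes actually read, and reports through
 * *hit_eof and *hit_error why the read came up short (if it did). */
size_t read_block(FILE *fp, char *buf, size_t want, int *hit_eof, int *hit_error)
{
    size_t got = fread(buf, 1, want, fp);

    *hit_eof = 0;
    *hit_error = 0;
    if (got < want) {
        /* A short count alone is ambiguous, so ask the stream which it was. */
        *hit_eof = feof(fp) != 0;
        *hit_error = ferror(fp) != 0;
    }
    return got;
}
```

Asking for 16 bytes from a stream that holds only 5 should return 5 with `hit_eof` set and `hit_error` clear.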
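Unpacking this:
The definition says EOF indicates the condition "no more data can be read", so what is this condition actually recognized by?
There are thousands of programming languages, yet only C was picked to describe how EOF is implemented. One gets the vague sense that, although different languages may implement EOF differently, C's approach is very representative.
Anything that serves to identify where the end is will do. No standard fixes a specific value for EOF (the C standard only requires it to be a negative int constant), but it is always a value distinct from all valid character codes, for example -1 in glibc. Ah, this is getting more and more concrete, closer and closer to the compare instructions and registers inside the CPU.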
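To make "distinct from all valid character codes" a little more tangible, here is a small sketch. It relies on the fact that the character reading functions return each byte as an `unsigned char` converted to `int`, so valid results fall in the range 0 to `UCHAR_MAX`. The function name `eof_is_outside_char_range` is made up for this example.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if no value that an unsigned char can hold equals EOF,
 * i.e. EOF can never be mistaken for a byte returned by getc(). */
int eof_is_outside_char_range(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        if (c == EOF)
            return 0;
    }
    return 1;
}
```

Because EOF is negative and every byte value is non-negative, the check should hold on any conforming implementation, whatever the actual value of EOF is.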
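Don't confuse the word "file" in Linux's "everything is a file" with the files we are talking about now. The files meant here are the ordinary files usually kept on external storage (such as a disk), text files in particular.
The static case is relatively simple, so let's start there: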
Some MS-DOS programs, including parts of the Microsoft MS-DOS shell (COMMAND.COM) and operating-system utility programs (such as EDLIN), treat a Control-Z in a text file as marking the end of meaningful data, and/or append a Control-Z to the end when writing a text file. This was done for two reasons:
Backward compatibility with CP/M. The CP/M file system only recorded the lengths of files in multiples of 128-byte "records", so by convention a Control-Z character was used to mark the end of meaningful data if it ended in the middle of a record. The MS-DOS filesystem has always recorded the exact byte-length of files, so this was never necessary on MS-DOS.
It allows programs to use the same code to read input from both a terminal and a text file.
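The CP/M convention described above can be sketched as a small C helper. This is an illustration of the idea only, not code from any of those systems; `meaningful_length` and `CTRL_Z` are names invented for this example.

```c
#include <stddef.h>
#include <string.h>

#define CTRL_Z 0x1A /* ASCII 26, the Control-Z character */

/* Returns the number of meaningful bytes in a record-padded buffer:
 * everything before the first Control-Z, or the whole buffer if the
 * buffer contains no Control-Z at all. */
size_t meaningful_length(const unsigned char *data, size_t len)
{
    const unsigned char *mark = memchr(data, CTRL_Z, len);
    return mark ? (size_t)(mark - data) : len;
}
```

For a 128-byte record that holds four bytes of text followed by Control-Z padding, the helper reports 4. A file system that records exact byte lengths, as MS-DOS does, makes this scan unnecessary.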
Input from a terminal never really "ends" (unless the device is disconnected), but it is useful to enter more than one "file" into a terminal, so a key sequence is reserved to indicate end of input. In UNIX the translation of the keystroke to EOF is performed by the terminal driver, so a program does not need to distinguish terminals from other input files. By default, the driver converts a Control-D character at the start of a line into an end-of-file indicator. To insert an actual Control-D (ASCII 04) character into the input stream, the user precedes it with a "quote" command character (usually Control-V). AmigaDOS is similar but uses Control-\ instead of Control-D.
In DOS and Windows (and in CP/M and many DEC operating systems such as RT-11 or VMS), reading from the terminal will never produce an EOF. Instead, programs recognize that the source is a terminal (or other "character device") and interpret a given reserved character or sequence as an end-of-file indicator; most commonly this is an ASCII Control-Z, code 26.
Macro: int EOF
This macro is an integer value that is returned by a number of narrow stream functions to indicate an end-of-file condition, or some other error situation. With the GNU C Library, EOF is -1. In other libraries, its value may be some other negative number.
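In other words, when a program makes this check, EOF and -1 are not interchangeable. Well, isn't the same true of NULL and '\0'? Cases like this, where a specification does not pin down the concrete implementation, are everywhere in the CS (Computer Science) world: package imports in Go, the side effects of the increment and decrement operators in C++, and... stop, we haven't even finished with the matter at hand.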
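In practice this means a read loop should store the result in an `int` and compare it with the macro itself. A minimal sketch, with `count_bytes` being a hypothetical helper name:

```c
#include <stdio.h>

/* Counts the bytes remaining in `fp` by reading until getc() returns EOF. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c; /* int, not char: it must be able to hold EOF as well as any byte */

    while ((c = getc(fp)) != EOF) /* compare with the macro, not a literal -1 */
        n++;
    return n;
}
```

Note that `getc` also returns EOF on a read error, so a caller that cares about the difference would still need `feof` or `ferror` afterwards. A byte with value 0xFF in the data does not end the loop early, because `getc` returns it as 255 rather than as a negative number.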
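As for where EOF actually comes from, there is also an English-language article, EOF is not a character. It is well illustrated, with both a side-by-side comparison across languages and an in-depth explanation of the underlying mechanism. It is much better than what I have written, so take a look when you have time.
Here is an excerpt: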
How do the high-level I/O routines in the examples above determine the end-of-file condition? On Linux systems the routines either directly or indirectly use the read() system call provided by the kernel. The getc() function (or macro) in C, for example, uses the read() system call and returns EOF if read() indicated the end-of-file condition. The read() system call returns 0 to indicate the EOF condition.
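The behavior described in the excerpt can be observed directly with the POSIX `read()` call. The sketch below assumes a POSIX system; `drain_fd` is a hypothetical helper name, and for brevity it treats every negative return (including an interrupted call) as an error.

```c
#include <unistd.h>

/* Reads from file descriptor `fd` until end-of-file.
 * Returns the total number of bytes read, or -1 on error. */
long drain_fd(int fd)
{
    char buf[64];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0)
            return -1;    /* error: errno says which */
        if (n == 0)
            return total; /* zero bytes read: the end-of-file condition */
        total += n;
    }
}
```

There is no EOF byte anywhere in the data. With a pipe, for instance, `read()` starts returning 0 once the write end is closed and the buffered bytes have been consumed.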
EOF
Special character on input, which is recognized if the ICANON flag is set. When received, all the bytes waiting to be read are immediately passed to the process without waiting for a <newline>, and the EOF is discarded. Thus, if there are no bytes waiting (that is, the EOF occurred at the beginning of a line), a byte count of zero shall be returned from the read(), representing an end-of-file indication. If ICANON is set, the EOF character shall be discarded when processed.
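The EOF special character in this description is a terminal setting, stored in the `c_cc[VEOF]` slot of `struct termios`. A small sketch of how a program might look it up, assuming a POSIX system; `canonical_eof_char` is a name invented for this example.

```c
#include <termios.h>

/* Returns the configured EOF character (commonly Control-D, ASCII 4)
 * if canonical mode is enabled in `t`, or -1 if ICANON is off, in which
 * case the EOF special character is not recognized on input. */
int canonical_eof_char(const struct termios *t)
{
    if (!(t->c_lflag & ICANON))
        return -1;
    return t->c_cc[VEOF];
}
```

A caller would typically fill the structure with `tcgetattr(STDIN_FILENO, &t)` first, which only succeeds when standard input is a terminal.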
; Hello World Program (Getting input)
; Compile with: nasm -f elf helloworld-input.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-input.o -o helloworld-input
; Run with: ./helloworld-input

%include 'functions.asm'

SECTION .data
msg1    db  'Please enter your name: ', 0h  ; message string asking user for input
msg2    db  'Hello, ', 0h                   ; message string to use after user has entered their name

SECTION .bss
sinput: resb 255        ; reserve a 255 byte space in memory for the users input string

SECTION .text
global _start

_start:

    mov eax, msg1
    call sprint

    mov edx, 255        ; number of bytes to read
    mov ecx, sinput     ; reserved space to store our input (known as a buffer)
    mov ebx, 0          ; read from the STDIN file
    mov eax, 3          ; invoke SYS_READ (kernel opcode 3)
    int 80h

    mov eax, msg2
    call sprint

    mov eax, sinput     ; move our buffer into eax (Note: input contains a linefeed)
    call sprint         ; call our print function

    call quit