I have a Mercurial repository for Blogger templates. I download all templates with a Python script, then edit them whenever I need to make a change.
Every time I save a file, I see the following in the diff. The "No newline at end of file" message actually comes from diff, not from Hg or Git:
diff -r 27343c68b285 3803541356848955053-blog.xml
--- a/3803541356848955053-blog.xml Thu Nov 15 18:34:08 2012 +0800
+++ b/3803541356848955053-blog.xml Fri Nov 16 18:53:12 2012 +0800
@@ -1011,4 +1011,4 @@
</footer>
<b:include data='blog' name='google-analytics'/>
</body>
-</html>
\ No newline at end of file
+</html>
It’s not a big deal, but it is quite annoying, because every commit carries an unnecessary changed line. Vim adds an end-of-line (EOL) at the end of the file when it saves. The last byte of the downloaded XML file is >, not \n (0x0A).
$ hexdump -C "$FILE" | tail -2
0000a770 62 6f 64 79 3e 0a 3c 2f 68 74 6d 6c 3e |body>.</html>|
0000a77d
$ vim "$FILE" # do save
$ hexdump -C "$FILE" | tail -2
0000a770 62 6f 64 79 3e 0a 3c 2f 68 74 6d 6c 3e 0a |body>.</html>.|
0000a77e
The file size increases by 1 byte, which is the EOL character.
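The same check that hexdump does by eye can be sketched in Python. This is only a minimal sketch; the function name ends_with_newline is made up for illustration and is not part of my download script.

```python
import os


def ends_with_newline(path):
    """Return True if the last byte of the file is LF (0x0A)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # an empty file has no last byte
        f.seek(-1, os.SEEK_END)  # step back to the last byte
        return f.read(1) == b"\n"
```

Calling it on a freshly downloaded template should give False, and True after a normal save in Vim.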
1 Keep untouched
If I want to leave the parts of the source I did not change untouched, an easy way to save is:
:set binary noeol
:w
binary is needed in order to get noeol to work; see :help eol.
Or I can write a script to do:
head -c -1 "$FILE" > "$FILE.tmp" && mv "$FILE"{.tmp,}
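This removes the last byte, but it does not check whether that byte is \n. Note that a negative count for head -c is a GNU extension, so it may not work with other head implementations.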
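A safer variant would look at the last byte before removing it. Here is a minimal Python sketch of that idea; the name strip_final_newline is hypothetical, and it only handles a single trailing LF, not CRLF.

```python
import os


def strip_final_newline(path):
    """Remove one trailing LF if present. Return True if the file changed."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return False  # nothing to remove from an empty file
        f.seek(size - 1)
        if f.read(1) != b"\n":
            return False  # last byte is not an EOL, leave the file alone
        f.truncate(size - 1)  # drop only the final LF
        return True
```

Unlike the head one-liner, running it twice on the same file does no harm, because the second call finds no trailing \n and does nothing.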
2 Fix it
From what I have read on the Internet, having an EOL at the end of a file seems to be the correct way. So I changed my download shell script to add an EOL to the files:
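#!/bin/bash
# 2008-10-25T20:16:46+0800
python ~/p/yjl/Blogger/BackupTemplate.py
for tmpl in *-20??????-??????.xml; do
  # skip the literal pattern when nothing was downloaded
  [ -e "$tmpl" ] || continue
  # append an EOL, since the downloaded file ends without one
  echo >> "$tmpl"
  # strip the date part from the filename
  mv "$tmpl" "${tmpl%-20??????-??????.xml}.xml"
done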
The script also removes the date part from the filenames of the downloaded XML files. I don’t need the dates since I use version control.
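The echo in the loop appends an EOL unconditionally, so a file that already ends with one would get a second. A more careful version could check first. The following Python sketch shows both steps; the names ensure_final_newline, strip_date_suffix and DATE_SUFFIX are made up for illustration, the regex assumes the date part is all digits (the shell glob accepts any characters), and the dated filename in the usage note is a hypothetical example.

```python
import os
import re

# Assumed shape of the date part: -20YYMMDD-HHMMSS before .xml
DATE_SUFFIX = re.compile(r"-20\d{6}-\d{6}\.xml$")


def ensure_final_newline(path):
    """Append LF only if the file is non-empty and does not end with one."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return False  # leave empty files alone
        f.seek(size - 1)
        if f.read(1) == b"\n":
            return False  # already ends with an EOL
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True


def strip_date_suffix(name):
    """Return the filename without the date part, if it has one."""
    return DATE_SUFFIX.sub(".xml", name)
```

For example, strip_date_suffix("3803541356848955053-blog-20121116-185312.xml") gives "3803541356848955053-blog.xml", and a name without a date part is returned unchanged.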
I found this entry while looking for whether EOLs at the end of, specifically, HTML document files are valid HTML. Here's the relevant link to HTML5 as of today, 2013-10-10 Z, which currently points to a page of the HTML5 "W3C Candidate Recommendation 6 August 2013": http://www.w3.org/TR/html5/syntax.html#writing (HTML5 section 8.1 "Writing HTML documents", until start of section 8.1.1 "The DOCTYPE")
It lists the allowed structure of an HTML file's contents; the last point, numbered 6, also allows "Any number of [...] space characters."; the definition of those includes regular blanks U+0020, as well as both U+000A (LF) and U+000D (CR), among others. This point 6 specifies what is allowed to occur after a document's closing tag of the root (html) element.
Interestingly, apparently standard C specifies that any C language source code file which isn't empty not only is allowed to but rather "shall" end in an EOL, as described here: http://gcc.gnu.org/ml/gcc/2003-11/msg01568.html (referencing ISO C90 (also C99) section 5.1.1.2, based on ANSI C89 section 2.1.1.2; point 2)