{
    "href": "/post/2012/03/15/on-preferring-spaces-over-tabs-in-php/",
    "relId": "2012/03/15/on-preferring-spaces-over-tabs-in-php",
    "title": "On Preferring Spaces Over Tabs in PHP",
    "author": "pmjones",
    "markup": "html",
    "tags": [
        {
            "href": "/tag/php/",
            "relId": "php",
            "title": "PHP",
            "author": null,
            "created": null,
            "updated": [],
            "markup": "markdown"
        },
        {
            "href": "/tag/programming/",
            "relId": "programming",
            "title": "Programming",
            "author": null,
            "created": null,
            "updated": [],
            "markup": "markdown"
        }
    ],
    "created": "2012-03-15 16:02:38 UTC",
    "updated": [
        "2012-03-15 16:02:38 UTC"
    ],
    "html": "<p><em>The best lack all conviction, while the worst are full of passionate intensity. \u2014 \u201cThe Second Coming\u201d, William Butler Yeats</em></p>\n<p>Keep the above in mind when considering either side of the debate. ;-)</p>\n<h3>tl;dr</h3>\n<p>Herein I assert and discuss the following:</p>\n<ul>\n<li>\n<p>Using spaces has <em>subtle</em> advantages over using tabs in collaborative environments.</p>\n</li>\n<li>\n<p>The \u201ctabs reduce file size\u201d argument is factually true, but is a case of optimizing on the wrong resource.</p>\n</li>\n<li>\n<p>The \u201ctabs allow each developer to set his own indent widths\u201d argument sounds good in theory but leads to problems in practice regarding line length recognition and inter-line alignment.</p>\n</li>\n</ul>\n<h3>Introduction</h3>\n<p>In the PHP world, there are effectively two competing indenting practices: \u201c4-spaces\u201d and \u201ctab.\u201d  (There are some in the 2-space camp as well but they are very few.)</p>\n<p>I want to point out a couple things about why spaces might be considered preferable in a collaborative environment when style is important on a PHP project. And yes, it turns out <a href=\"http://www.codinghorror.com/blog/2009/04/death-to-the-space-infidels.html\">this discussion <em>does</em> matter</a>.</p>\n<p>I used to use tabs, and slowly migrated over to spaces. Over the course of several years, I have found there is a slight but useful advantage to using spaces for indentation when working with other developers, and I want to discuss that one advantage in this essay.</p>\n<p>Note that I am not asserting an overwhelming, absolutely obvious, infallible moral rule that clearly favors spaces over tabs as the One True Path. It is merely a noticeable improvement regarding more sophisticated rules of style.</p>\n<p>Do I expect this essay to change anybody\u2019s mind on using tabs?  
No, but I do hope it will give some food for thought.</p>\n<h3>Regarding Tabs</h3>\n<p>When making an argument, it is important to state the alternative viewpoint in a way that people who hold that viewpoint would actually agree with.</p>\n<p>What are the reasons for preferring a <code>tab</code> indent? As far as I can tell, they are:</p>\n<ul>\n<li>\n<p>The tab is a single character, so files are smaller.</p>\n</li>\n<li>\n<p>Using a tab character allows each developer to change the level of indent that he sees, without actually modifying the on-disk file.</p>\n</li>\n</ul>\n<p>If there are other reasons I have missed, please let me know.</p>\n<h3>File Size</h3>\n<p>In general, I assert that the \u201cfile size\u201d argument is a case of \u201coptimizing on the wrong resource.\u201d</p>\n<p>By way of example, let\u2019s take one file from a real project that uses 4-space indenting, <code>Zend_Db_Abstract</code>, and use <code>wc -c</code> to count the number of bytes in the file.</p>\n<pre><code>$ wc -c Abstract.php\n40953 Abstract.php\n</code></pre>\n<p>Now, let\u2019s convert each 4-space indent to a tab.</p>\n<pre><code>$ unexpand -t 4 Abstract.php &gt; Abstract-tabs.php\n$ wc -c Abstract-tabs.php\n34632 Abstract-tabs.php\n</code></pre>\n<p>We save 6K of space on a 40K file, or roughly 15%, by using a tab character for indents instead of a 4-space indent.</p>\n<p>Now, to get an idea of how that compares to another way to reduce size, let\u2019s remove all the comments from the original 4-space file and see what that does. 
We\u2019ll use a tool I found after two minutes of Googling (you may need to change the hashbang line of <code>remccoms3.sed</code> to point to your <code>sed</code>):</p>\n<pre><code>$ wget http://sed.sourceforge.net/grabbag/scripts/remccoms3.sed\n$ chmod +x remccoms3.sed\n$ ./remccoms3.sed Abstract.php &gt; Abstract-no-comments.php\n$ wc -c Abstract-no-comments.php\n21022 Abstract-no-comments.php\n</code></pre>\n<p>That\u2019s about a 50% reduction. If disk storage is <em>really</em> a concern, we\u2019d be much better off to remove comments than to convert spaces to tabs. Of course, we could do both.</p>\n<p>This example makes me believe that the \u201cfile size\u201d argument, while factually correct, is a case of \u201coptimizing on the wrong resource.\u201d That is, the argument gives strong consideration to a low-value item.  Disk space is pretty cheap, after all.</p>\n<p>A followup argument about this is usually, \u201cEven so, it\u2019s less for the PHP interpreter to deal with. Fewer characters means faster code.\u201d  Well, not exactly.  Whitespace is tokenized, so the parser sees it all the same.</p>\n<h3>Developer Tab Stop Preferences</h3>\n<p>This, to me, seems to be the primary argument for preferring tabs over spaces for indenting.  Essentially, the idea is to allow each individual developer on a project to make the code look the way that individual developer prefers.</p>\n<p>This is a non-trivial argument.  
It\u2019s very appealing for the individual developers to be able to work on a project where Developer A sees a tab stop every 4 characters, and Developer B sees a tab stop every 2 or 8 or whatever characters, without changing the actual bytes on disk.</p>\n<p>I have two arguments against this; they seem to be minor, until we examine them in practice:</p>\n<ul>\n<li>\n<p>It becomes difficult to recognize line-length violations with over-wide tab stop settings.</p>\n</li>\n<li>\n<p>Under sophisticated style guides, inter-line alignment for readability becomes inconsistent between developers using different tab stops.</p>\n</li>\n</ul>\n<p>These arguments require a little exposition.</p>\n<h3>Line Length Recognition</h3>\n<p>Because of limitations of this blog, let\u2019s say that our coding style guide has a line length limit of 40 characters.  (I know, that\u2019s half or less of what it should be, but it serves as an easy illustration.)</p>\n<p>The following code, with 4-character tab stops, shows what that line length limit looks like:</p>\n<pre><code>         1         2         3         4\n1234567890123456789012345678901234567890\nfunction funcFoo()\n{\n    $varname = '12' . funcBar() . '34';\n}\n</code></pre>\n<p>It\u2019s clearly within the line length limit.  But it looks like this under an 8-character tab stop:</p>\n<pre><code>         1         2         3         4\n1234567890123456789012345678901234567890123\nfunction funcFoo()\n{\n        $varname = '12' . funcBar() . '34';\n}\n</code></pre>\n<p>A developer who sees this code under 8-character stops will think the line is past the limit, and attempt to reformat it in some way.  After that reformatting, the developer working with 4-character tab stops will think the line is too short, and reformat it back to being longer.  This is not particularly productive.</p>\n<p>Some will say this just shows that line length limits are dumb. 
<a href=\"http://paul-m-jones.com/archives/276\">I disagree.</a></p>\n<h3>Inter-Line Alignment</h3>\n<p>By \u201cinter-line alignment\u201d I mean the practice where, if we have several lines of code that are similar, we align the corresponding parts of each line in columns.  To be clear, it\u2019s not that unaligned code is <em>impossible</em> to read; it\u2019s just <em>noticeably easier</em> to read when it\u2019s aligned.</p>\n<p>Typically, inter-line alignment is applied to variable assignment.  For example, the following unaligned code \u2026</p>\n<pre><code>$foo = 'bar';\n$bazdib = 'gir';\n$zim = 'irk';\n</code></pre>\n<p>\u2026 is easier to scan in columns aligned on the <code>=</code> sign:</p>\n<pre><code>$foo    = 'bar';\n$bazdib = 'gir';\n$zim    = 'irk';\n</code></pre>\n<p>We can see clearly what the variables are in the one column, and what the assigned values are in the next column.</p>\n<p>Alternatively, we may need to break an over-long line across several lines, and make it glaringly obvious during even a cursory scan that it\u2019s all one statement.</p>\n<p>Now, let\u2019s say we have a bit of code that should be aligned across two or more lines, whether for readability or to adhere to a line length limit.  We begin with this contrived example using 4-space indents (the spaces are indicated by \u2022 characters):</p>\n<pre><code>function funcName()\n{\n\u2022\u2022\u2022\u2022$varname = '1234' . aVeryLongFunctionName() . 'foo' . otherFunction();\n}\n</code></pre>\n<p>Under a style guide where we align on <code>=</code> to keep within a line length limit, we can do so regardless of tab stops:</p>\n<pre id=\"scroll_to_here\"><code>function funcName()\n{\n\u2022\u2022\u2022\u2022$varname = '1234' . aVeryLongFunctionName()\n\u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022 . 'foo' . 
otherFunction();\n}\n</code></pre>\n<p>Under a guide where we use tabs, and Developer A uses 4-character tab stops, we need to push the alignment out to the tab stops to line things up (tabs are indicated by \u2192 characters):</p>\n<pre><code>function funcName()\n{\n\u2192   $varname\u2192   = '1234' . aVeryLongFunctionName()\n\u2192   \u2192   \u2192   \u2192   . 'foo' . otherFunction();\n}\n</code></pre>\n<p>However, if a Developer B uses an 8-character tab stop, the same code looks like this on Developer B\u2019s terminal:</p>\n<pre><code>function funcName()\n{\n\u2192       $varname\u2192       = '1234' . aVeryLongFunctionName()\n\u2192       \u2192       \u2192       \u2192       . 'foo' . otherFunction();\n}\n</code></pre>\n<p>The second example has the same tabbing as in the first example, but the alignment looks broken under 8-character tab stops. Developers who prefer the 8-character stop are likely to try to reformat that code to make it look right on their terminal. That, in turn, will make it look broken for those developers who prefer a 4-character stop.</p>\n<p>Thus, the argument that \u201ceach developer can set tab stops wherever he likes\u201d is fine in theory, but is flawed in practice.</p>\n<p>The first response to alignment arguments is generally: \u201cUse tabs for indenting and spaces for alignment.\u201d Let\u2019s try that.</p>\n<p>First, a 4-character tab stop indent, followed by spaces for alignment:</p>\n<pre><code>function funcName()\n{\n\u2192   $varname = '1234' . aVeryLongFunctionName()\n\u2192   \u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022 . 'foo' . otherFunction();\n}\n</code></pre>\n<p>Now, an 8-character tab stop indent, followed by spaces for alignment:</p>\n<pre><code>function funcName()\n{\n\u2192       $varname = '1234' . aVeryLongFunctionName()\n\u2192       \u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022 . 'foo' . otherFunction();\n}\n</code></pre>\n<p>That looks OK, right?  
Sure \u2026 until a developer, through habit (and we <em>are</em> creatures of habit) hits the tab key for alignment when he should have used spaces. They are both invisible, so the developer won\u2019t notice on his own terminal \u2014 it will only be noticed by developers with other tab stop preferences. It is the same problem as before: misalignment under the different tab stop preferences of different developers.</p>\n<p>The general response at this point is to modify the tab-oriented style guide to disallow that kind of inter-line alignment. I suppose that is reasonable if we are committed to using tabs, but I find code of that sort to be less readable overall.</p>\n<h3>Solution: Use Spaces Instead</h3>\n<p>The solution to these subtle and sophisticated issues, for me and for lots of other PHP developers, is to use spaces for indentation and alignment.  All professional text editor software allows what are called \u201csoft tabs\u201d where pressing the tab key inserts a user-defined number of spaces.  When using spaces for indentation and alignment, all code looks the same everywhere, and does not mess up alignment under different tab stop preferences of different developers.</p>\n<h3>Conclusion</h3>\n<p>I realize this is a point of religious fervor among developers. Even though I have a <em>preference</em> for spaces, I am not a spaces zealot.  This post is not evangelism; it is a dissection of the subtle and long-term issues related to tabs-vs-spaces discovered only after years of professional collaboration.</p>\n<p>Please feel free to leave comments, criticism, etc.  Because this is such a touchy subject, please be especially careful to be civil and maintain a respectful tone in the comments.  If you have a very long comment, please consider pinging/tracking this post with a blog entry of your own, instead of commenting directly.  I reserve the right to do as I wish with uncivil commentary.</p>\n<p>Thanks for reading, all!</p>\n"
}
