NPath complexity and cyclomatic complexity explained

If you happen to be using PHP Mess Detector (which you should for any larger project) you have probably stumbled upon these two, but do you really know what they stand for? NPath complexity and cyclomatic complexity sounds really scary, but they are fancy words for quite simple concepts. So let’s go through them and also find out why they’re important for maintainable and testable code. Both of these concepts are used in static code analysis and are measurements of how complex a function is.

Cyclomatic complexity

This is a very straight forward concept, it’s pretty well documented in PHPMD’s documentation and what does is pretty much count some statements. The standard threshold for this complexity is 10 points, so if you have a function with higher complexion than that, you should try to reduce it.

It will begin by adding 1 point for the function declaration, after that it will add 1 point for every if, while, for and case. I’ll just copy and paste the example code from PHPMD’s documentation to illustrate this, of how this function ends up with a cyclomatic complexity of 12.

<?php 
class Foo 
{ 
1   public function example() { 
2       if ($a == $b) { 
3           if ($a1 == $b1) { 
                fiddle(); 
4           } else if ($a2 == $b2) { 
                fiddle(); 
            } else { 
                fiddle(); 
            } 
5       } else if ($c == $d) { 
6           while ($c == $d) { 
                fiddle(); 
            } 
7       } else if ($e == $f) { 
8           for ($n = 0; $n > $h; $n++) {
                fiddle();
            }
        } else {
            switch ($z) {
9               case 1:
                    fiddle();
                    break;
10              case 2:
                    fiddle();
                    break;
11              case 3:
                    fiddle();
                    break;
12              default:
                    fiddle();
                    break;
            }
        }
    }
}

Not that hard, huh?

NPath complexity

Now this one is a bit trickier since the PHPMD documentation really suck. This is what is says about it:

The NPath complexity of a method is the number of acyclic execution paths through that method.

Wait, what? At least to me that made zero sense when I first read it. But as previously stated, it’s just a lot of fancy talk for simple stuff. The simple explanation is that how many “paths” there are in the flow of your code in the function. Let’s write ut a simple example function.

<?php 
function foo($a, $b) 
{ 
    if ($a > 10) {
        echo 1;
    } else {
        echo 2;
    }
    if ($a > $b) {
        echo 3;
    } else {
        echo 4;
    }
}

foo(1, 2); // Outputs 24
foo(11, 1); // Outputs 13
foo(11, 20); // Outputs 14
foo(5, 1); // Outputs 23

So here we have function with 4 possible outcomes, since we have 2 statements that have 2 possible outcomes each (2 * 2 = 4). That means that the functions Npath complexity is 4. If we would add another statement with 2 possible outcomes we would get a complexity of 8 since 2 * 2 * 2 = 8.

So the NPath complexity is exponential and could easily get out of hand, in old legacy code don’t be surprised if you find functions with complexity over 100 000. The default value of the threshold for this complexity is 200, staying under that value is something you should strive for.

So why is the complexity important?

Simple code is always better than complex code. This applies to areas as readability, maintainability and testability. Consider writing a unit test and you have a function with an NPath complexity of 16. This means that if you need want 100% code coverage you need to test for 16 possible outcomes and that would end up in pretty messy tests. This goes again the entire philosophy of unit testing since you want as much isolation as possible when writing your tests. That’s why you should always try to reduce complexity in your application.

So don’t be scared of these seemingly scary concepts but instead embrace them and use them to your advantage.

  • Somebody give that man a medal

  • Mahder

    Just love how you presented these concepts….simple and down to the point! Perfect!

  • Larry

    “the number of ACYCLIC execution paths”

    acyclic: not displaying or forming part of a cycle.

    In other words: An execution path through the code (routine or block) that does not cycle back and execute the same code along a different path based on variable or state changes. This begs the question of: How does one count the N-path of a feature having cyclic (e.g. not acyclic) loops?

    Moreover, note that putting an “a” in front of the word is a means of saying “not”. It stems from ancient Greek (and other languages) as well as semitic languages like Hebrew, where “a” is aleph (or alpha in Greek). See WikiPedia on Alpha, which contains the following:

    Privative a is the Ancient Greek prefix ἀ- or ἀν- a-, an-, added to words to negate them. It originates from the Proto-Indo-European *n̥- (syllabic nasal) and is cognate with English un-.

    Thus, in English, we might say: uncyclic, which is not a word, but you get the idea.

    • It’s an interesting question. I don’t know if there is a way of measuring it. One simple way would be to count “N-paths * number of possible recursive iterations”. But it might be too simplified, because what happens if a recursive call could have different N-paths?

      Not really sure I’m following what you’re saying about acyclic/uncyclic. Did I get it wrong somewhere? English is my second language 🙂

      • Larry

        I believe there is a math-based way of measuring N-path and there might even be a version of that where we (as common folk programmers) can use it in a work-a-day world. I will explore that and let you know what I find. Still, with a proper testing paradigm, one may find the answer to the question of measurement inconsequential to the work at-hand.

        re: “uncyclic”—I am meaning to communicate that acyclic is a way of saying (made up word) “uncyclic”—that is—code that executes from top to bottom without repeating itself (perhaps by recursion?).

        • Great, I see your point now. Let me know if you find anything! The whole intent of this post was to try and lower the bar of understanding and using these concepts for people so they could use it in their day to day work.

          • Larry

            Glad I was able to clarify for you.

            The work-a-day world is certainly a key goal. I know there is a method of N-path calculation. In Eiffel Studio, there is a new tool: Eiffel Inspector. This is a “code-analyzer” (not a new concept), which knows how to calculate N-Path and actually presents a Warning if the N-Path count goes to high on a method (feature). I rather like that I have this tool in Eiffel Studio. I presume that there are other code analysis tools that do the same for other languages and invite readers of your post to explore code analysis engines as a means of improving their code!

            Cheers!! 🙂

            P.S. Excellent post on your part and fantastic follow-ups from everyone. I am delighted to know people are striving for excellence in their work!

          • Cool! I’ve never heard of Eiffel Studio before. What I’ve been using is vim-phpqa (https://github.com/joonty/vim-phpqa) for Vim and/or sublime-phpmd (https://github.com/madedotcom/sublime-phpmd) for Sublime. What they do is use the local phpmd binary and then hook the output into the editor. Since phpmd is capable of calculating both cyclomatic and NPath complexity, you can get warning on both of those straight in your editor.

            Pretty sure there is tools for various editors out there, I know PhpStorm has this feature as well.

  • Larry

    Also—I will point out that measuring acyclic execution paths or N-path is not as straight-forward or simple as it might seem. However, if you learn to pay attention to the matter, your code will improve! Nevertheless, there is an 80% value you can get with 20% effort!

    Whatever routine or block of code you are examining will be one of two flavors:

    1. Has-not subordinate client calls (e.g. your code does not call another feature or method).
    2. Has subordinate client calls.

    In the case of #1, the work is pretty simple. One examines the code, finds the calls with arguments and states that will execute each conditionally-controlled pathway. In doing so, one will write tests that “prove” the code performs as required by its specification (you do have one of those don’t you?).

    In the case of #2, the work is much more challenging and tedious! For each client call (e.g. routine component-call), one will isolate the called routine first, exercising it with N-Path tests, working one’s way down the call-stack towards routines that fit condition #1 (above). Each #1-conditioned routine will have its own tests that reasonably exhaust the N-paths. I say reasonably because exhaustive may not be humanly possible!

    Once one handles the bottom layer of #1-conditioned-routines, then one can come up a layer in the call-stack. At the next layer up, the necessity for being extremely thorough will diminish, as one has proven the supporting layer in greater detail. However, there are some interesting facets to note as we rise up in the call-stack!

    First, one may find that doing the work of proving lower layers makes suggestions about poor or improper design of the upper. Perhaps the upper layer routine calls are mutually exclusive of each other? This might suggest that they get pulled out into their own routine and the upper layer routine is done away with completely.

    Second, the conditions that control which lower-layer calls are made are essentially “contracts” or stateful conditions that say, “This routine is only legal to call under these circumstances of state!” In a language with Design-by-Contract (e.g. Eiffel, D, .NET, et al), such things will suggest controlling contracts in the lower-layer routines, and even invariants in the enclosing class.

    Third, now that the underlying routines are satisfactorily tested, the upper layer calling routine may only need to be spot-checked with primary N-path use-case calls, and the exhaustive approach can be relaxed.

    Finally, what one must note about all of this is how it literally defies the ruthless dogma of the TDD religion that states that one MUST write a failing test first and then one can write the code! This presumes that one is writing from an exhaustively detailed design, wherein one understands the complete scope and nature of the routine and code and the job it is to perform from the start. Reality tends to say this: 80/20 rule—that is—even with the best of planning, design, instruction, research, and so on, one may yet only ever expect to know 20% of what a job fully takes from the beginning, and learns the remaining 80% along the pathway to completion!

    Writing software is a messy human and imperfect business. We do well and better to not treat it with undo pride, which suggests that we know all as we code: If that were true, we would need no testing, no proving, no refactoring, and we would encounter no bugs and no flaws, eh?

    Cheers!!

  • Ben Derisgreat

    Thanks for the explanation! A bit annoying that PHPMD will count null coalescing toward the NPath complexity. 7 null coalesces already reaches to 156000 complexity.

    • Same happens for guard clauses. All the more reason to treat the metric as an useful tool for finding problems, and not for replacing good programmer judgment (as should be done with all metrics).

  • Jessica Mauerhan

    I just took a function that was 600+ lines, and started with an N-Path of 1478464112885979 (That’s 1.4 quadrillion) and got it under 200. It took 3 hours, but now we’ll be able to maintain that code a lot easier.

  • David Kaštánek

    Excellent explanation! Thanks 🙂

  • Pingback: PHP Unconference Europe 2016 – Entropy Wins()

  • Pingback: Cyclomatic Complexity How To Calculate | Information()

  • denis631

    Great article. Short and straight to the point!

    • Thanks, my intention was to make it short and straight for two short and straight concepts that sound a lot more complicated then they are 🙂