I attended a talk on Monday morning hosted by the Ottawa SPIN (Software Process Improvement Network) forum titled “The Cost, Quality, and Security of Open Source Software: Fact and Fiction”. The presenter, Khaled El Emam, who among his many varied titles is an associate professor at the University of Ottawa, has done a study of open source software. In his study, he made very few distinctions between types of software. He did distinguish between “commercial open source” and “free open source”, but did not distinguish between things like open source operating systems, applications, utilities, and libraries. I imagine that, if he had made such divisions, he might have come up with some different results (and maybe some useful conclusions).
He did make several interesting points concerning misconceptions about open source software. He spent perhaps the most effort on the notion that open source software is free. His point was that even free open source software isn’t technically free: you still need to invest time and resources into training, implementation, and maintenance. Of course, he is absolutely correct. However, you need to do all that for closed source commercial software as well.
He also attacked the notion that open source is good because you can change it if you need to. He cautioned that changing open source software comes with risk. Of course it does. Sure, you can change the source, but that change can become a liability. When the next version is released, you may need to reapply the change. I don’t think anybody will dispute that you need to think before you make such changes and see if you can find better alternatives. If you do need to make a change, the theory is that you can push that change back into the code base. For many projects this may be possible under the right set of circumstances. If the change is really specific to your business, however, don’t count on the change being accepted.
I made the argument that anytime you write a line of code anywhere, you’re taking on risk. Change happens. To ground the conversation, we discussed the possibility of modifying Apache Web Server. Let’s say that, for whatever reason, you need to modify Apache. I can’t dispute that the change becomes a liability. However, is it more or less of a risk than writing your own web server from scratch to get the custom functionality? (For those readers who have drifted off to sleep by this point, the answer is “less”.) Is it more or less of a risk than getting a web server vendor to modify and support the change in their product? (You’ll have to come up with your own answer for this.)
He discussed the notion that open source software is good because it has many thousands of people looking at it and testing it. He did make an interesting point that there is some motivation to actually do less testing of an open source project because you can count on the thousands of eyes. This all sounds very wonderful until you look at the numbers. Apparently, in the ~46K projects his team researched, the median number of developers was one, and some 95% of all projects have fewer than 5 developers. He showed statistics stating that most (80%) open source projects have fewer than 11 users. That’s not a lot of eyeballs.
Of course, this doesn’t apply to Eclipse. We really do have thousands of eyeballs looking at our stuff. This, according to Khaled’s research, puts Eclipse in the top 1% of open source software projects with more than 100 users.
In the end, I found myself disappointed with the talk. I think it was good stuff, but when you roll ~46K open source projects into the mix, the results can’t possibly be anything but a bunch of abstract thoughts and numbers. When you lump efforts like Eclipse, which have the backing of several involved member organizations providing full-time staff to projects, together with stuff cobbled together by some guy in his garage (not to disparage the efforts of the dedicated lone coder), I don’t think you can trust the results of the study.
What insight can you take from the fact that approximately 80% of ~46K open source projects have fewer than eleven users? Does this mean that nobody’s using open source software? Does it mean that nobody should use open source software? Frankly, I think it means that statistics such as these, taken over a ridiculously varied sample set, are useless for making any actual decisions. I guess that the only real message was “let’s be careful out there”.
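To make the point concrete, here is a minimal sketch in Python using numbers I made up for illustration. They are not taken from the study, and the `summarize` helper is my own invention. It describes 100 hypothetical projects in which 80% have fewer than eleven users, yet nearly all of the actual users belong to a single large project.

```python
from statistics import median

# Hypothetical user counts for 100 projects. These values are invented
# for illustration only; they are NOT data from the study.
user_counts = [5] * 80 + [50] * 19 + [100_000]


def summarize(counts):
    """Return a few summary figures for a list of per-project user counts."""
    total_users = sum(counts)
    small_projects = sum(1 for c in counts if c < 11)
    return {
        # Fraction of projects with fewer than eleven users.
        "share_small_projects": small_projects / len(counts),
        # The "typical" project as seen by the median.
        "median_users": median(counts),
        # Fraction of all users who use the single largest project.
        "share_of_users_in_largest": max(counts) / total_users,
    }


summary = summarize(user_counts)
print(summary)
```

With these made-up numbers, the per-project figures look dismal while the per-user figures say the opposite. Which view matters depends entirely on the question you are asking, and that is exactly what gets lost when every project is lumped into one sample.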
I’m reminded of a calculation made in my all-time favourite book series, The Hitchhiker’s Guide to the Galaxy (no, I haven’t seen the movie yet). The calculation was this: it is understood that there is an infinite number of planets in the universe, and it is known that there is a finite number of inhabited planets in the universe. When you divide any finite number (even a really big one) by infinity, the result is so small as to effectively be zero. Therefore, the average population of a planet is zero and, by extension, the population of the universe is also zero.
At one point in the talk, he did acknowledge that Eclipse was a different beast altogether (or something to that effect), which was good to hear.
I did have one minor nit with his talk. He lumped iterative and incremental software development together and called them the same thing. He’s wrong.