Tuesday, March 10, 2009

Schedule, LOC, and Budget? You're Lying

Software engineering is an imprecise science. Anyone who tells you differently is either not working very hard, or doesn't understand that fact (or both). Actually, calling it engineering is a bit of a misnomer itself, since the engineering work in building systems is as different as engineering a submarine vs. a bridge. Engineering involves creating repeatable systems of common features. That is, there are a set of rules for building sets of bridges that have little in common with the rules for building submarines.

The problem with defining software as engineering lies in the identification of the dividing line between engineering and manufacturing. Determining the amount of effort (time, code, and people) for software is much less defined than traditional "peoplespace" engineering because of the fluidity of the tools that are used. Imagine writing a proposal to build the first transcontinental railroad. 50 miles per year, thousands of workers. Now image the proposal today: much more production, much less workers. Computers allow the creation of these monstrously efficient production mechanisms. Hence the statement "hardware is cheap, people are expensive."

Looking at schedules, we see there are two types of release schedules: time-constrained (release on a certain date, like Ubuntu), or effort-constrained (all these features are in release X, like, say, Apache Httpd). Time-constrained releases on a set date, with a variable number of features. Effort-constrained delivers when the set features are done. Neither schedule mechanism has any concern with people or time. So either you release on a scheduled date, with whatever is done at that time, or you release when everything is done, regardless of date.

It would be silly to create a release schedule that released every 10,000 lines of code, but that's what our snakeoil salesman are proposing. Here's how it works: a budget is calculated based on the estimated lines of code, times the number of man/hours that would take to create, based on historical estimates of productivity. So, like Earl Sheib, you're saying "I can build that system with 100,000 lines of code!" Calculating budget is now a function of hourly billing rate times productivity and estimated lines of code.

Here's an example: customer says "build me a new SuperWidget." Contractor looks thoughtful and says "We think SuperWidget should take 50,000 lines of code, since it's kind of like TurboSquirrel we built before. Our normal efficiency is 1 LOC per hour (seriously), and billing rate is $200 per hour. So budget is $10 million dollars. There's 2,000 man/hours per year, so we need 25 people and you can have it in one year." There's your budget, schedule, and estimate, all rolled into one. If one changes, the others have to change as well. SuperWidget is a large project, obviously.

That's seriously how it works. Oh sure, there's supposed to be more analysis to determine how similar the project will be to previous projects, but the number is still, at best, a guess. It's not like building a bridge: you can't estimate (very well) the amount of steel and concrete, simply because software is not bound to physical properties.

So how do you get this model to work? The "fudge factor" is in the productivity. You set your productivity so low that you know, absolutely, you won't miss the estimate. Why do you only produce 1 LOC per hour? StarOffice, considered one of the largest open source projects, is estimated around 9 million LOC. That's 4,500 man/years, using our previous calculation of 1 LOC per day. 4.5 years with 1,000 developers, or 45 years with 100 developers. Obviously, something else is going on. Estimates show that productivity can be around 10 times our estimate. But how do you get there? That's the topic for next time.

No comments: